AMD’s RDNA 4 introduces memory subsystem enhancements aimed at removing false cross-wave memory dependencies. RDNA 3 enforced strict ordering on data returns, so a wave whose data had already arrived could be stalled behind an earlier, slower request issued by a different wave. RDNA 4 allows requests from different waves to be satisfied out of order, effectively splitting the shared memory access queue into per-wave queues. This is particularly valuable in raytracing workloads, where traversal and result handling run concurrently and long-latency traversal requests would otherwise hold up unrelated requests from other waves. While RDNA 4’s memory subsystem enhancements are significant, they build on techniques already seen in GCN and other GPU architectures. The changes aim to improve performance and efficiency and lay groundwork for future AMD GPUs.
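The difference between a single in-order return queue and per-wave queues can be illustrated with a toy timing model. This is a minimal sketch under stated assumptions, not a description of AMD's actual hardware. The hypothetical `Request` type and the functions `shared_queue_returns` and `per_wave_queue_returns` are illustrative names. The latency values are made up, and the model assumes that ordering is still preserved within each wave. In the "shared" model, a request's data is released only after every earlier-issued request has been released, regardless of which wave issued it. In the "per-wave" model, a request waits only on earlier requests from its own wave.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    wave: int     # ID of the wave that issued the request
    issue: int    # cycle at which the request was issued
    latency: int  # cycles until its data arrives (made-up numbers)


def shared_queue_returns(reqs: List[Request]) -> List[int]:
    """Toy in-order model (assumption): one queue for all waves.
    `reqs` must be in issue order; each request is released no earlier
    than the request issued before it, from any wave."""
    releases = []
    last = 0
    for r in reqs:
        arrival = r.issue + r.latency
        last = max(arrival, last)  # wait behind every earlier request
        releases.append(last)
    return releases


def per_wave_queue_returns(reqs: List[Request]) -> List[int]:
    """Toy out-of-order model (assumption): ordering is kept only
    within a wave, so requests from other waves never block it."""
    releases = []
    last_by_wave: Dict[int, int] = {}
    for r in reqs:
        arrival = r.issue + r.latency
        release = max(arrival, last_by_wave.get(r.wave, 0))  # same-wave order only
        last_by_wave[r.wave] = release
        releases.append(release)
    return releases


# Hypothetical scenario: wave 0 issues a slow request (e.g. a cache miss
# during traversal), while wave 1 issues two fast requests right after it.
EXAMPLE = [
    Request(wave=0, issue=0, latency=500),
    Request(wave=1, issue=1, latency=50),
    Request(wave=1, issue=2, latency=50),
    Request(wave=0, issue=3, latency=40),
]

if __name__ == "__main__":
    print("shared queue:", shared_queue_returns(EXAMPLE))       # [500, 500, 500, 500]
    print("per-wave queues:", per_wave_queue_returns(EXAMPLE))  # [500, 51, 52, 500]
```

In the shared-queue model, wave 1's fast requests are released at cycle 500 only because they sit behind wave 0's slow request. This is a false cross-wave dependency. With per-wave queues they are released at cycles 51 and 52. Wave 0's second request still waits for its first request, because the sketch assumes that intra-wave ordering is kept.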
https://chipsandcheese.com/p/rdna-4s-out-of-order-memory-accesses