SIMT, or "single instruction, multiple threads," is the programming model used by GPUs: the programmer writes a program from the perspective of a single thread, and the hardware runs many copies of it in parallel, each with a different thread ID. GPUs implement this efficiently by packing multiple threads into vector registers and executing them with vector instructions. Control flow is the tricky part: when threads diverge, the GPU uses predicated instructions and execution bitmasks to decide which lanes actually run each block of code. AVX-512 also supports predication, via its opmask registers, which makes a SIMT-style transformation possible on a CPU.

The author implemented a compiler for a C-like toy language (with pointers, loops, and structs) that performs this SIMT transformation and targets AVX-512; it is a learning exercise rather than a performance-oriented implementation. The post walks through how variables, arithmetic, l-values and r-values, pointers, copying, and function calls are each rewritten to respect the predication bitmask. Pointer handling becomes more involved once struct assignment enters the picture, and function calls need to be deduplicated so that a callee runs once for all active lanes rather than once per lane.
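As a rough illustration of the control-flow idea (a minimal sketch, not code from the post; the function name and the scalar snippet are invented), a per-thread branch like `if (x < 10) y += 1;` can be expressed with AVX-512 intrinsics by treating each of the 16 lanes as one thread and a `__mmask16` as the execution mask:

```c
#include <immintrin.h>

// Scalar source, from one "thread's" point of view:
//   if (x < 10) y += 1;
//
// Vectorized form: 16 threads at once. Lanes disabled by the mask are
// left untouched, mirroring how a GPU predicates divergent code.
// Compile with -mavx512f.
__m512i branch_step(__m512i x, __m512i y) {
    __mmask16 active = _mm512_cmplt_epi32_mask(x, _mm512_set1_epi32(10));
    return _mm512_mask_add_epi32(y, active, y, _mm512_set1_epi32(1));
}
```

In the same spirit, the pointer discussion can be pictured as each lane carrying its own address or index, so a scalar load turns into a masked gather (again a hypothetical sketch, assuming 32-bit element indices rather than the post's exact pointer representation):

```c
#include <immintrin.h>
#include <stdint.h>

// Scalar source, from one "thread's" point of view:
//   y = base[idx];
//
// Each lane has its own index; inactive lanes keep their previous y.
__m512i gather_step(const int32_t *base, __m512i idx, __m512i y,
                    __mmask16 active) {
    return _mm512_mask_i32gather_epi32(y, active, idx, base,
                                       4 /* scale: sizeof(int32_t) */);
}
```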
http://litherum.blogspot.com/2023/10/implementing-gpus-programming-model-on.html