Optimizing a WebGPU Matmul Kernel for 1 TFLOP

Summary:
The author works at Nomic, where colleagues build t-SNE-like visualizations in the browser. Deepscatter, developed by Ben Schmidt, addresses the scaling problems those visualizations run into. Conversations at Nomic often revolve around TypeScript and the benefits of WebGPU. The author created Surfgrad, a high-performance, WebGPU-powered autograd library for browser-based tensor operations. WebGPU lets GPU code run on any device with a web browser. It supports subgroups, which let threads share data, and its shaders can be compiled to Vulkan and Metal. The post explains how to optimize a WebGPU matrix multiplication kernel and notes where the approach differs from CUDA. It discusses several strategies, such as unrolling loops to improve kernel performance, and explores both the limitations and the advances of WebGPU for large matrix operations.
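As a rough illustration of the loop unrolling mentioned in the summary, here is a minimal CPU-side sketch in TypeScript. It is not the WGSL kernel from the post: the function names (`matmulNaive`, `matmulUnrolled4`, `matmulFlops`), the row-major `Float32Array` layout, and the unroll factor of 4 are all assumptions chosen for the example. It shows only the shape of the transformation, in which the inner dot-product loop handles four elements per iteration and a short tail loop covers the remainder.

```typescript
// Hypothetical CPU reference, not the post's WebGPU kernel.
// Computes C = A * B with A of shape (m x k) and B of shape (k x n), row-major.
function matmulNaive(a: Float32Array, b: Float32Array, m: number, k: number, n: number): Float32Array {
  const c = new Float32Array(m * n);
  for (let row = 0; row < m; row++) {
    for (let col = 0; col < n; col++) {
      let sum = 0;
      for (let i = 0; i < k; i++) {
        sum += a[row * k + i] * b[i * n + col];
      }
      c[row * n + col] = sum;
    }
  }
  return c;
}

// Same computation with the inner loop unrolled by a factor of 4 (an assumed factor).
function matmulUnrolled4(a: Float32Array, b: Float32Array, m: number, k: number, n: number): Float32Array {
  const c = new Float32Array(m * n);
  const kMain = k - (k % 4); // largest multiple of 4 that is <= k
  for (let row = 0; row < m; row++) {
    const aBase = row * k;
    for (let col = 0; col < n; col++) {
      let sum = 0;
      let i = 0;
      // Unrolled body: four multiply-adds per iteration, fewer loop checks.
      for (; i < kMain; i += 4) {
        sum +=
          a[aBase + i] * b[i * n + col] +
          a[aBase + i + 1] * b[(i + 1) * n + col] +
          a[aBase + i + 2] * b[(i + 2) * n + col] +
          a[aBase + i + 3] * b[(i + 3) * n + col];
      }
      // Tail loop for the leftover elements when k is not a multiple of 4.
      for (; i < k; i++) {
        sum += a[aBase + i] * b[i * n + col];
      }
      c[row * n + col] = sum;
    }
  }
  return c;
}

// A matmul performs one multiply and one add per (row, col, i) triple.
function matmulFlops(m: number, k: number, n: number): number {
  return 2 * m * k * n;
}
```

Dividing `matmulFlops(m, k, n)` by the measured run time in seconds gives the FLOP/s figure that a target such as 1 TFLOP refers to. Because unrolling changes the order of the floating-point additions, results on real data may differ from the naive version in the last few bits.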

https://zanussbaum.substack.com/p/optimizing-a-webgpu-matmul-kernel
