MLC-LLM is a machine-learning-compilation-based solution that enables the deployment of LLMs on AMD GPUs using ROCm. For Llama2-7B/13B, the AMD Radeon RX 7900 XTX delivers performance competitive with NVIDIA's GeForce RTX 4090 and RTX 3090 Ti; the benchmarks show AMD GPUs reaching roughly 80% of the speed of NVIDIA GPUs for LLM inference. AMD GPU performance has historically lagged because of limited software support and optimization, but investment in the ROCm stack and in machine learning compilation technology is starting to close the gap. MLC-LLM supports multiple backends, including CUDA, Metal, ROCm, Vulkan, and OpenCL, and the article provides instructions for trying the solution on your own devices. It also highlights the need for continued development in machine learning systems engineering and acknowledges the open-source communities and technologies that make these advances possible.
https://blog.mlc.ai/2023/08/09/Making-AMD-GPUs-competitive-for-LLM-inference