MLC-LLM is a machine-learning-compilation-based solution that enables the deployment of LLMs on AMD GPUs using ROCm. For Llama2-7B/13B, the AMD Radeon RX 7900 XTX delivers performance competitive with NVIDIA's GeForce RTX 4090 and RTX 3090 Ti; the benchmarks show AMD GPUs reaching roughly 80% of the speed of NVIDIA GPUs for LLM inference. AMD GPU performance has historically lagged because of limited software support and optimization, but investment in the ROCm stack and in machine learning compilation technology is starting to close the gap. MLC-LLM supports multiple backends, including CUDA, Metal, ROCm, Vulkan, and OpenCL, and the article provides instructions for trying the solution on your own devices. It also highlights the need for continued development in machine learning systems engineering and acknowledges the open-source communities and technologies that make these advances possible.
https://blog.mlc.ai/2023/08/09/Making-AMD-GPUs-competitive-for-LLM-inference