The AMD GPU Inference project provides a Docker-based engine for running Large Language Models (LLMs) on AMD GPUs. It is built on the Hugging Face Transformers library and AMD's ROCm stack, and is optimized for Hugging Face models, particularly the LLaMA family. Prerequisites are an AMD GPU with ROCm support, the ROCm drivers, and Docker. The repository ships the files and scripts needed for setup, so users can quickly run inference against a chosen model and prompt, and customization hooks allow swapping in different models or modifying the inference logic. Troubleshooting guidance for common issues is included, and contributions are welcome.
https://github.com/slashml/amd_inference
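To give a sense of what the inference path looks like under the hood, here is a minimal sketch of Hugging Face generation on an AMD GPU. The model name and prompt are placeholders, not the repo's defaults, and the project wraps this kind of logic in its own Docker image and scripts; the key point is that ROCm builds of PyTorch expose the AMD GPU through the usual `torch.cuda` interface, so no CUDA-specific code changes are needed.

```python
# Minimal sketch of LLaMA-family inference on an AMD GPU via ROCm PyTorch.
# Hypothetical model ID and prompt; the repo's own run scripts may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumption: any LLaMA-family checkpoint

# On ROCm builds of PyTorch, torch.cuda.is_available() returns True
# when an AMD GPU is visible, and "cuda" addresses that GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to(device)

inputs = tokenizer("Explain ROCm in one sentence.", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```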