LLaMA-Omni is a speech-language model that enables seamless, low-latency speech interaction with high-quality responses. Built on Llama-3.1-8B-Instruct, it can generate text and speech responses simultaneously in answer to spoken instructions. The model was trained in under 3 days on just 4 GPUs. Installation is straightforward, and a Gradio demo is available for trying LLaMA-3.1-8B-Omni firsthand; note that autoplay is disabled in the demo due to issues with streaming audio playback. Instructions for local inference are provided, and the code is released under the Apache-2.0 License.
https://github.com/ictnlp/LLaMA-Omni
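The simultaneous text-and-speech generation can be pictured as a decoder that, at each step, emits a text token together with a chunk of discrete speech units that a vocoder would later turn into a waveform. Below is a minimal conceptual sketch of that streaming pattern; the function name, unit counts, and dummy integer units are illustrative assumptions, not the project's actual API:

```python
from typing import Iterator, List, Tuple

def generate_streaming(text_tokens: List[str],
                       units_per_token: int = 2) -> Iterator[Tuple[str, List[int]]]:
    """Yield (text_token, speech_units) pairs as decoding proceeds.

    In a LLaMA-Omni-style model the speech units are discrete codes
    consumed by a vocoder; here they are dummy integers for illustration.
    """
    unit_id = 0
    for token in text_tokens:
        # Each text token is paired with a small batch of speech units,
        # so audio can start playing before the full text is decoded.
        units = list(range(unit_id, unit_id + units_per_token))
        unit_id += units_per_token
        yield token, units

# A client can accumulate the transcript and play audio chunks
# while the rest of the response is still streaming in.
transcript: List[str] = []
audio_stream: List[int] = []
for token, units in generate_streaming(["Hello", ",", " world"]):
    transcript.append(token)
    audio_stream.extend(units)
```

The point of the interleaved yield is latency: audio playback can begin as soon as the first units arrive, rather than after the full text response is complete.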