OuteTTS-0.1-350M is a text-to-speech synthesis model that uses a pure language modeling approach, with no external adapters or complex architectures. Built on the LLaMa architecture and the Oute3-350M-DEV base model, it demonstrates that high-quality speech synthesis is achievable through crafted prompts and audio tokens. The model supports voice cloning and is compatible with llama.cpp and the GGUF format. As an experimental release it has several limitations: a constrained vocabulary, string-only input, and a tendency to alter words in the generated speech. Its audio processing follows three steps: audio tokenization with WavTokenizer, CTC forced alignment to map words to audio tokens, and structured prompt creation. It can also generate speech in a custom voice. Planned improvements include scaling up the parameter count and exploring alternative alignment methods. The model card closes with a disclaimer about the risks of using the model.
https://huggingface.co/OuteAI/OuteTTS-0.1-350M
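
To make the prompt-based approach more concrete, here is a minimal sketch of the last of the three processing steps: turning word-aligned audio tokens into a single text prompt that a language model can be trained on. It assumes the layout is the full transcription followed by one entry per word (word, duration, audio codes). The tag spellings (`<text>`, `<t_...>`, `<a...>`), the `AlignedWord` type, the `build_prompt` helper, and the rate of 75 audio tokens per second are illustrative assumptions, not the model's actual special tokens or API.

```python
from dataclasses import dataclass
from typing import List

# Assumed audio token rate; adjust to the audio tokenizer actually in use.
TOKENS_PER_SECOND = 75


@dataclass
class AlignedWord:
    """One word plus the audio codes that forced alignment assigned to it."""
    word: str
    codes: List[int]


def build_prompt(words: List[AlignedWord]) -> str:
    """Lay out the full transcription, then word / duration / codes per line.

    All tag spellings here are hypothetical placeholders.
    """
    transcription = " ".join(w.word for w in words)
    lines = [f"<text>{transcription}</text>"]
    for w in words:
        # Duration in seconds follows from the number of codes for the word.
        duration = len(w.codes) / TOKENS_PER_SECOND
        code_str = "".join(f"<a{c}>" for c in w.codes)
        lines.append(f"{w.word}<t_{duration:.2f}>{code_str}")
    return "\n".join(lines)


# Toy alignment with made-up code values.
example = [
    AlignedWord("hello", [12, 840, 7]),
    AlignedWord("world", [301, 301, 55, 9, 1023, 4]),
]
prompt = build_prompt(example)
print(prompt)
```

In a layout like this, voice cloning would amount to prefixing the prompt with a reference speaker's transcription and audio codes and letting the model continue with codes for the new text; that reading is an interpretation of the summary, not a documented procedure.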