ChatTTS is a text-to-speech model optimized for dialogue scenarios. It supports English and Chinese and offers fine-grained prosodic control. The open-source version on HuggingFace is a model pre-trained on 40,000 hours of audio, without supervised fine-tuning (SFT). According to the project, it excels at conversational TTS and surpasses most open-source models in prosody. To deter misuse, a small amount of high-frequency noise was added during training, and an internal detection model is being developed with a view to open-sourcing it later. Users can currently control laughter, pauses, and intonation, and additional emotional control is planned for future models. The project acknowledges the other models it builds on, emphasizes responsible and ethical use, and states that it should not be used for commercial or legal purposes.
https://github.com/2noise/ChatTTS
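
As a rough illustration of the laughter and pause controls mentioned above, the sketch below marks up plain text with inline control tokens before it would be handed to the model. The token spellings `[laugh]` and `[uv_break]` are assumptions here (check the repository for the current set), and `annotate` is a hypothetical helper, not part of ChatTTS. The inference call itself is left out because the loading and inference API may differ between releases.

```python
# Hypothetical helper: inserts ChatTTS-style control tokens into plain text.
# The token spellings below are assumptions, not confirmed by this summary.
LAUGH_TOKEN = "[laugh]"     # assumed token for laughter
PAUSE_TOKEN = "[uv_break]"  # assumed token for a short pause

CUE_TOKENS = {"laugh": LAUGH_TOKEN, "pause": PAUSE_TOKEN, None: ""}


def annotate(segments):
    """Join (text, cue) pairs into one string, placing each cue's token after its text.

    cue is "laugh", "pause", or None (no token).
    """
    parts = []
    for text, cue in segments:
        if cue not in CUE_TOKENS:
            raise ValueError(f"unknown cue: {cue!r}")
        parts.append(text.strip())
        token = CUE_TOKENS[cue]
        if token:
            parts.append(token)
    return " ".join(parts)


script = annotate([
    ("So what did you think of the demo?", "pause"),
    ("Honestly, it cracked me up", "laugh"),
    ("Let's run it again.", None),
])
print(script)
```

The resulting string is what would be passed to the model as input text. Keeping the token spellings in one mapping makes them easy to update if the project changes or extends its control tokens.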