Pushing the frontiers of audio generation

Our speech generation technologies give digital assistants and AI tools natural, engaging voices for more human-like interactions. By developing models that generate high-quality speech from a range of inputs, we are enhancing Google products such as Gemini Live and YouTube’s auto dubbing. Our research in audio generation also lets us produce multi-speaker dialogue that makes complex content more accessible, as in NotebookLM Audio Overviews and Illuminate. Our latest advances generate longer speech segments faster and at higher quality, opening up new possibilities for richer speech experiences combined with other modalities such as video.

https://deepmind.google/discover/blog/pushing-the-frontiers-of-audio-generation/