Generating audio for video

Video-to-audio (V2A) research aims to generate rich soundtracks from video pixels combined with natural language text prompts. The generated audio is synchronized with the on-screen action, and users gain creative control through optional prompts: a positive prompt steers the output toward desired sounds, and a negative prompt steers it away from undesired ones. The system uses a diffusion-based approach to synchronize video and audio realistically, with no need for manual alignment. The technology is still being improved, but it shows promise for bringing generated movies to life and is being developed with a commitment to safety and transparency.
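
The summary does not say how positive and negative prompts enter the diffusion process. As a rough illustration only, the sketch below shows one common way open-source diffusion samplers combine the two, in the style of classifier-free guidance: at each denoising step the noise estimate is pushed away from the negative-prompt prediction and toward the positive-prompt prediction. The function name, its parameters, and the toy numbers are hypothetical and are not taken from the V2A system.

```python
from typing import List, Sequence


def guided_noise_estimate(
    eps_positive: Sequence[float],
    eps_negative: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Blend two noise predictions, classifier-free-guidance style.

    eps_positive: noise predicted when conditioning on the positive prompt.
    eps_negative: noise predicted when conditioning on the negative prompt.
    guidance_scale: 0 returns the negative prediction, 1 returns the positive
        prediction, and values above 1 extrapolate past the positive one.

    Hypothetical helper for illustration; not the V2A implementation.
    """
    if len(eps_positive) != len(eps_negative):
        raise ValueError("noise predictions must have the same length")
    # Start from the negative prediction and move along the direction
    # that points from "negative" to "positive".
    return [
        neg + guidance_scale * (pos - neg)
        for pos, neg in zip(eps_positive, eps_negative)
    ]


# Toy two-element "noise" vectors standing in for a model's predictions.
print(guided_noise_estimate([1.0, 2.0], [0.0, 1.0], guidance_scale=2.0))
```

In a real sampler the two predictions would come from running the denoising network twice per step, once per prompt, and the blended estimate would feed the usual update rule.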

https://deepmind.google/discover/blog/generating-audio-for-video/