DiffRhythm: Fast End-to-End Full-Length Song Generation with Latent Diffusion

The DiffRhythm model, presented by Ziqian Ning et al., addresses the limitations of existing music generation systems by synthesizing complete songs, with both vocals and accompaniment, in roughly ten seconds. The model avoids complex data preparation pipelines and intricate multi-stage architectures, which keeps it scalable while maintaining high musicality and intelligibility. Its simplicity comes from a non-autoregressive, latent-diffusion design that requires only lyrics and a style prompt at inference time. DiffRhythm stands out for rapidly generating full-length songs of up to 4 minutes 45 seconds while preserving a high level of creativity and efficiency.
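To make the non-autoregressive idea concrete, here is a minimal toy sketch of diffusion-style sampling in which every frame of a full-length latent sequence is refined jointly at each step, rather than emitted one frame at a time. This is a generic illustration, not DiffRhythm's actual code: `toy_denoiser`, the linear update rule, and the fixed `style` vector are all hypothetical stand-ins for a trained network and a learned style embedding.

```python
import numpy as np

def toy_denoiser(x, t, cond):
    # Hypothetical stand-in for a trained denoising network: it simply
    # nudges the latent toward the conditioning vector each step.
    return x + 0.1 * (cond - x)

def sample_nonautoregressive(num_frames, dim, cond, steps=50, seed=0):
    """Generate ALL latent frames jointly: the whole sequence is
    refined in parallel at every step (non-autoregressive), instead of
    being produced frame by frame."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_frames, dim))  # start from pure noise
    for t in range(steps):
        x = toy_denoiser(x, t, cond)  # every frame updated at once
    return x

# Illustrative "style prompt" as a fixed embedding; a real system would
# derive this from reference audio or a text description.
style = np.ones(8)
latents = sample_nonautoregressive(num_frames=16, dim=8, cond=style)
print(latents.shape)  # (16, 8)
```

Because each step touches the entire sequence, the cost of sampling does not grow with an autoregressive token count, which is what makes generating minutes of audio in seconds plausible.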

https://aslp-lab.github.io/DiffRhythm.github.io/