In their blog post “MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices,” Yang Zhao and Tingbo Hou introduce MobileDiffusion, an efficient latent diffusion model designed specifically for mobile devices. They address the two main factors that make text-to-image diffusion models computationally expensive to run on mobile hardware: the iterative denoising process and the complexity of the network architecture. MobileDiffusion overcomes these challenges by optimizing the model’s architecture for efficiency and adopting one-step sampling during inference. Tested on premium iOS and Android devices, it can generate a high-quality image in about half a second, and its compact size of 520 million parameters makes it suitable for mobile deployment. By enabling rapid on-device text-to-image generation, MobileDiffusion opens up new possibilities for enhancing user experience and addressing privacy concerns.
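The efficiency gain from one-step sampling can be illustrated with a toy sketch. The code below is not MobileDiffusion’s actual implementation; `denoiser` is a hypothetical stand-in for the diffusion network, and the point is simply the count of network evaluations: a conventional iterative sampler calls the network once per timestep, while a one-step sampler calls it once in total.

```python
# Toy contrast between iterative denoising and one-step sampling.
# `denoiser` is a hypothetical stand-in for the diffusion UNet; the
# call counts show why fewer network evaluations mean faster inference.
calls = {"iterative": 0, "one_step": 0}

def denoiser(x, t, mode):
    calls[mode] += 1
    return x * (1.0 - t)  # pretend to remove part of the noise at time t

def iterative_sample(x, steps=50):
    # Conventional diffusion: sequentially denoise over many timesteps.
    for i in reversed(range(steps)):
        x = denoiser(x, i / steps, "iterative")
    return x

def one_step_sample(x):
    # One-step sampling: a single network evaluation produces the image.
    return denoiser(x, 0.0, "one_step")

noise = 1.0
iterative_sample(noise)
one_step_sample(noise)
print(calls)  # → {'iterative': 50, 'one_step': 1}
```

On a mobile device, where each network evaluation dominates latency, cutting 50 sequential calls down to one is what makes sub-second generation plausible.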
https://blog.research.google/2024/01/mobilediffusion-rapid-text-to-image.html