Today, we’re excited to share our latest research on Stable Diffusion 3, a text-to-image generation system that, in human-preference evaluations, outperforms current state-of-the-art models such as DALL·E 3 and Midjourney v6 in typography and prompt adherence. Our new Multimodal Diffusion Transformer (MMDiT) architecture improves text understanding by using separate sets of weights for the image and language representations, while still allowing the two modalities to attend to one another. In comparisons against other open and closed-source models, Stable Diffusion 3 performs strongly across visual aesthetics, prompt following, and typography, and the family spans a range of model sizes, so the approach scales from lightweight to high-capacity deployments. Check out the full research paper for more details.
https://stability.ai/news/stable-diffusion-3-research-paper
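To make the "separate weights per modality, joint attention" idea concrete, here is a minimal NumPy sketch of an MMDiT-style block. This is an illustration under stated assumptions, not the paper's implementation: the class name, weight layout, and single-head attention are all simplifications chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MMDiTBlockSketch:
    """Toy MMDiT-style block: each modality gets its own projection
    weights, but attention runs over the concatenated token sequence,
    so image and text tokens can attend to each other."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        # Separate Q/K/V/output weight matrices per modality
        # (hypothetical initialization; real models train these).
        self.w = {
            m: {k: rng.standard_normal((dim, dim)) / np.sqrt(dim)
                for k in ("q", "k", "v", "o")}
            for m in ("image", "text")
        }

    def __call__(self, img_tokens, txt_tokens):
        # Project each modality with its own weights.
        qi, ki, vi = (img_tokens @ self.w["image"][k] for k in ("q", "k", "v"))
        qt, kt, vt = (txt_tokens @ self.w["text"][k] for k in ("q", "k", "v"))
        # Joint attention: concatenate both streams into one sequence.
        q = np.concatenate([qi, qt])
        k = np.concatenate([ki, kt])
        v = np.concatenate([vi, vt])
        attn = softmax(q @ k.T / np.sqrt(self.dim)) @ v
        # Split the sequence back and apply per-modality output projections.
        n_img = img_tokens.shape[0]
        img_out = attn[:n_img] @ self.w["image"]["o"]
        txt_out = attn[n_img:] @ self.w["text"]["o"]
        return img_out, txt_out

rng = np.random.default_rng(1)
block = MMDiTBlockSketch(dim=16)
img_out, txt_out = block(rng.standard_normal((8, 16)),   # 8 image tokens
                         rng.standard_normal((4, 16)))   # 4 text tokens
print(img_out.shape, txt_out.shape)  # (8, 16) (4, 16)
```

The key design point this sketch captures is that neither modality is forced through the other's learned projections: each stream keeps its own parameters, and the shared attention step is the only place where information flows between them.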