Diffusion models have become a mainstay of generative modeling, achieving exceptional results at sampling from multimodal distributions. They power text-to-image systems such as Stable Diffusion and are increasingly applied in other domains, including audio, video, protein design, and robotics. This tutorial implements diffusion models from scratch through the lens of optimization theory: a neural network is trained to predict the noise direction, and iteratively following the corresponding denoising direction produces samples that resemble the training set. The tutorial walks through the training process step by step and introduces several noise schedules. On the theory side, it interprets denoisers as approximate projections onto the data manifold, which gives insight into convergence and motivates sampling algorithms. The ideal denoiser moves a noisy point toward the closest points in the dataset and is closely related to a smoothed squared-distance function over the data. The tutorial concludes with a demonstration of sampling from learned denoisers.
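The "ideal denoiser" mentioned above can be sketched as a posterior-mean estimator over an empirical dataset: a softmax-weighted average of the training points, with weights set by the smoothed squared distances. This is a minimal NumPy sketch under that assumption; the function name and the toy two-point dataset are illustrative, not taken from the tutorial.

```python
import numpy as np

def ideal_denoiser(x, data, sigma):
    """Posterior-mean denoiser for an empirical dataset under Gaussian noise.

    Returns a weighted average of the dataset rows, where point i gets
    weight proportional to exp(-||x - x_i||^2 / (2 sigma^2)). As sigma -> 0
    this approaches projection onto the nearest data point, matching the
    "closest point in the dataset" interpretation.
    """
    d2 = np.sum((data - x) ** 2, axis=1)   # squared distance to each data point
    logw = -d2 / (2.0 * sigma ** 2)
    logw -= logw.max()                     # stabilize the softmax numerically
    w = np.exp(logw)
    w /= w.sum()
    return w @ data

# Toy dataset: two points on the real line, stored as 1-D vectors.
data = np.array([[-1.0], [1.0]])

# With small noise, the denoiser snaps to the nearest data point.
print(ideal_denoiser(np.array([0.9]), data, sigma=0.1))
```

With a large sigma the same formula instead blends the dataset (here, a symmetric input returns the midpoint), which is the "smoothed" regime of the squared-distance function.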
https://www.chenyang.co/diffusion.html