Diffusion Lab
Learn the real diffusion loop: corrupt clean structure in closed form, then spend sampler budget bringing it back with the right amount of guidance.
Training runs in the main thread, ~30 steps per animation frame. Sampling does 1000 reverse steps per point and blocks briefly. DDIM and CFG controls land in later phases.
How to read this
A small MLP is being trained right now in your tab to denoise samples from the chosen 2D distribution. The forward process adds Gaussian noise on the cosine schedule of Nichol & Dhariwal: at step t a clean point x_0 becomes x_t = sqrt(bar_alpha_t) · x_0 + sqrt(1 − bar_alpha_t) · eps. The network learns to predict that eps from x_t with class and time conditioning. Reverse-time samplers turn those eps predictions back into samples.
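The closed-form corruption above can be sketched in a few lines. This is an illustrative Python translation (the lab itself runs in the browser); `alpha_bar` and `q_sample` are hypothetical names, not the lab's actual code:

```python
import math
import random

def alpha_bar(t, T=1000, s=0.008):
    """Cosine schedule of Nichol & Dhariwal: abar(t) = f(t)/f(0),
    with f(u) = cos^2(((u/T + s)/(1 + s)) * pi/2)."""
    f = lambda u: math.cos(((u / T + s) / (1 + s)) * math.pi / 2) ** 2
    return f(t) / f(0)

def q_sample(x0, t, T=1000, rng=random):
    """Closed-form forward corruption of a 2D point:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I)."""
    ab = alpha_bar(t, T)
    return tuple(math.sqrt(ab) * c + math.sqrt(1 - ab) * rng.gauss(0.0, 1.0)
                 for c in x0)
```

Note that any x_t can be drawn in one call, without simulating the intermediate steps, which is what makes training cheap.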
Sampler choice
DDPM walks the full T = 1000 reverse chain with fresh noise at each step (Ho 2020). DDIM keeps the same trained network but takes a sub-schedule of N deterministic steps through the predicted x_0 (Song, Meng & Ermon 2020). Heun (EDM) rewrites the trajectory in sigma-space using σ = sqrt((1 − bar_alpha_t)/bar_alpha_t) and integrates the probability-flow ODE with a 2nd-order predictor-corrector on a Karras ρ = 7 schedule (Karras 2022). The quality chart makes the trade-off visible: at low step counts Heun usually wins on MMD; at high step counts all three converge.
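The deterministic DDIM update "through the predicted x_0" is compact enough to write out. A minimal sketch, assuming the network's eps prediction is passed in and `ab_t`/`ab_prev` are the bar_alpha values at the current and previous sub-schedule steps (names are illustrative):

```python
import math

def ddim_step(x_t, eps, ab_t, ab_prev):
    """One deterministic DDIM step (eta = 0): invert the forward formula
    to get x0_hat, then re-noise to the previous abar level with the
    same predicted eps instead of fresh noise."""
    x0_hat = [(x - math.sqrt(1 - ab_t) * e) / math.sqrt(ab_t)
              for x, e in zip(x_t, eps)]
    return [math.sqrt(ab_prev) * x0 + math.sqrt(1 - ab_prev) * e
            for x0, e in zip(x0_hat, eps)]
```

Because no noise is injected, the same starting x_T always maps to the same sample, which is what lets DDIM skip most of the 1000 steps.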
Classifier-free guidance
The network is trained on both class labels and an empty token (Ho & Salimans 2022). At sample time the two predictions are combined as eps = (1 + w) · eps_c − w · eps_∅. The vector field plot lets you watch the ε-arrows on a 16×16 grid stretch and sharpen as w increases; w = 0 recovers the plain class-conditional field.
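The guidance rule is a one-line extrapolation past the conditional prediction. A sketch with an illustrative function name:

```python
def cfg_eps(eps_c, eps_null, w):
    """Classifier-free guidance: eps = (1 + w) * eps_c - w * eps_null.
    w = 0 returns the plain class-conditional prediction; larger w pushes
    the sample further toward what distinguishes the class."""
    return [(1 + w) * ec - w * en for ec, en in zip(eps_c, eps_null)]
```

Each guided step costs two network calls (conditional and unconditional) instead of one.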
Vector field
Each arrow plots the score s_θ = −eps_θ / sqrt(1 − bar_alpha_t) at one (x, y) location. At small t the arrows snap clean points into mode centers; at large t they point inward toward the origin because almost all of x_t is noise. Watch the field morph as training progresses.
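The grid of arrows can be reproduced directly from that formula. A sketch where `eps_fn` stands in for the trained network's eps prediction at a fixed t (the real lab conditions on class and time as well):

```python
import math

def score_field(eps_fn, ab_t, lo=-3.0, hi=3.0, n=16):
    """Evaluate score arrows s = -eps / sqrt(1 - abar_t) on an n x n grid.
    Returns a list of ((x, y), (sx, sy)) pairs."""
    step = (hi - lo) / (n - 1)
    scale = -1.0 / math.sqrt(1 - ab_t)
    field = []
    for i in range(n):
        for j in range(n):
            x, y = lo + i * step, lo + j * step
            ex, ey = eps_fn(x, y)
            field.append(((x, y), (scale * ex, scale * ey)))
    return field
```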
Quality vs steps
Each point on the chart is the kernel MMD between 200 generated samples and a fresh batch from the target, computed with a mixture of RBF kernels at bandwidths σ ∈ {0.5, 1, 2} × the median pairwise distance (Gretton 2012). Lower is better. The DDIM and Heun curves use the same trained weights and the same step counts, but the network-call cost of Heun is roughly twice that of DDIM at a given step count because each Heun step is a predictor-corrector pair.
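The metric itself is short to state in code. A minimal sketch of the biased MMD² estimator with the mixture-of-bandwidths kernel described above (plain Python, not the lab's implementation):

```python
import math

def mmd_rbf(xs, ys, scales=(0.5, 1.0, 2.0)):
    """Biased kernel MMD^2 between two 2D point sets, averaging RBF
    kernels at bandwidths scales * (median pairwise distance)."""
    pts = xs + ys
    dists = sorted(math.dist(p, q)
                   for i, p in enumerate(pts) for q in pts[i + 1:])
    med = dists[len(dists) // 2] or 1.0  # guard against a degenerate 0
    def k(p, q):
        d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
        return sum(math.exp(-d2 / (2 * (s * med) ** 2))
                   for s in scales) / len(scales)
    def avg(a, b):
        return sum(k(p, q) for p in a for q in b) / (len(a) * len(b))
    return avg(xs, xs) + avg(ys, ys) - 2 * avg(xs, ys)
```

Identical sets score 0; mismatched mode coverage or spread pushes the value up, which is why it tracks sample quality at small step counts.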
References
- Ho, Jain, Abbeel. Denoising Diffusion Probabilistic Models. NeurIPS 2020. arXiv:2006.11239
- Song, Meng, Ermon. Denoising Diffusion Implicit Models. ICLR 2021. arXiv:2010.02502
- Nichol, Dhariwal. Improved Denoising Diffusion Probabilistic Models. ICML 2021. arXiv:2102.09672
- Ho, Salimans. Classifier-Free Diffusion Guidance. NeurIPS 2021 Workshop. arXiv:2207.12598
- Karras, Aittala, Aila, Laine. Elucidating the Design Space of Diffusion-Based Generative Models. NeurIPS 2022. arXiv:2206.00364
- Gretton et al. A Kernel Two-Sample Test. JMLR 2012.