Diffusion Lab

Learn the real diffusion loop: corrupt clean structure in closed form, then spend sampler budget bringing it back with the right amount of guidance.

[Interactive demo. Controls: Target, Sampler, CFG, Train, Sample, Quality vs steps. Panels: Target distribution (300 fresh samples from p(x, c)); Model samples; Score field (arrows point toward higher density at noise level t); Sample quality vs steps, DDIM (η=0) vs Heun (Karras), 4–128 steps (network forwards: DDIM = N, Heun ≈ 2N); Training loss.]

Training runs in the main thread, ~30 steps per animation frame. Sampling does 1000 reverse steps per point and blocks briefly. DDIM and CFG controls land in later phases.

How to read this

A small MLP is being trained right now in your tab to denoise samples from the chosen 2D distribution. The forward process adds Gaussian noise on the cosine schedule of Nichol & Dhariwal (2021): at step t a clean point x_0 becomes x_t = sqrt(bar_alpha_t) · x_0 + sqrt(1 − bar_alpha_t) · eps. The network learns to predict that eps from x_t with class and time conditioning. Reverse-time samplers turn those eps predictions back into samples.
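Concretely, the schedule and the corruption step fit in a few lines. A minimal TypeScript sketch, assuming T = 1000 and the standard s = 0.008 offset; all identifiers here are illustrative, not the lab's actual source:

    // Cosine noise schedule (Nichol & Dhariwal 2021) and the closed-form
    // forward corruption. Illustrative sketch, not the lab's source.
    const T = 1000;
    const S = 0.008; // small offset so bar_alpha stays near 1 at t = 0

    function f(t: number): number {
      const c = Math.cos(((t / T + S) / (1 + S)) * (Math.PI / 2));
      return c * c;
    }

    // bar_alpha_t = f(t) / f(0), one value per step
    const abar: number[] = Array.from({ length: T + 1 }, (_, t) => f(t) / f(0));

    // Box-Muller draw from a standard normal
    function randn(): number {
      const u = 1 - Math.random();
      const v = Math.random();
      return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
    }

    // Closed-form corruption: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    // Returns both x_t and the eps the network is trained to predict.
    function forward(x0: [number, number], t: number): { xt: [number, number]; eps: [number, number] } {
      const a = Math.sqrt(abar[t]);
      const b = Math.sqrt(1 - abar[t]);
      const eps: [number, number] = [randn(), randn()];
      return { xt: [a * x0[0] + b * eps[0], a * x0[1] + b * eps[1]], eps };
    }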

Sampler choice

DDPM walks the full T = 1000 reverse chain with fresh noise at each step (Ho et al. 2020). DDIM keeps the same trained network but takes a sub-schedule of N deterministic steps through the predicted x_0 (Song, Meng & Ermon 2021). Heun (EDM) rewrites the trajectory in sigma-space using σ = sqrt((1 − bar_alpha_t)/bar_alpha_t) and integrates the probability-flow ODE with a 2nd-order predictor-corrector step on a Karras ρ = 7 schedule. The quality chart makes the trade-off visible: at low step counts Heun usually wins on MMD; at high step counts all three samplers converge.
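A sketch of the two budget-limited steppers, in TypeScript. `epsNet` stands in for the trained eps-prediction network and `D` for the sigma-space denoiser; both, like every name below, are assumptions rather than the lab's API:

    type Vec2 = [number, number];

    // One DDIM step (eta = 0): jump from step t to an earlier step tPrev
    // through the predicted x_0.
    function ddimStep(x: Vec2, t: number, tPrev: number, abar: number[], epsNet: (x: Vec2, t: number) => Vec2): Vec2 {
      const e = epsNet(x, t);
      const sa = Math.sqrt(abar[t]);
      const sb = Math.sqrt(1 - abar[t]);
      // Predicted clean point, then re-noise deterministically to level tPrev.
      const x0: Vec2 = [(x[0] - sb * e[0]) / sa, (x[1] - sb * e[1]) / sa];
      const pa = Math.sqrt(abar[tPrev]);
      const pb = Math.sqrt(1 - abar[tPrev]);
      return [pa * x0[0] + pb * e[0], pa * x0[1] + pb * e[1]];
    }

    // Karras sigma schedule (rho = 7): N noise levels from sigMax down to sigMin, then 0.
    function karrasSigmas(n: number, sigMin: number, sigMax: number, rho = 7): number[] {
      const inv = (s: number) => Math.pow(s, 1 / rho);
      const out: number[] = [];
      for (let i = 0; i < n; i++) {
        out.push(Math.pow(inv(sigMax) + (i / (n - 1)) * (inv(sigMin) - inv(sigMax)), rho));
      }
      out.push(0);
      return out;
    }

    // One Heun step on the probability-flow ODE dx/dsigma = (x - D(x, sigma)) / sigma,
    // where x lives in sigma-space (x_t / sqrt(abar_t)) and D is the denoised estimate.
    function heunStep(x: Vec2, sig: number, sigNext: number, D: (x: Vec2, s: number) => Vec2): Vec2 {
      const slope = (p: Vec2, s: number): Vec2 => {
        const d = D(p, s);
        return [(p[0] - d[0]) / s, (p[1] - d[1]) / s];
      };
      const h = sigNext - sig;
      const d1 = slope(x, sig);                     // predictor (Euler) slope
      const xe: Vec2 = [x[0] + h * d1[0], x[1] + h * d1[1]];
      if (sigNext === 0) return xe;                 // last step stays Euler
      const d2 = slope(xe, sigNext);                // corrector slope at sigNext
      return [x[0] + h * 0.5 * (d1[0] + d2[0]), x[1] + h * 0.5 * (d1[1] + d2[1])];
    }

Chaining heunStep over karrasSigmas(N, …) costs two D-calls per step except the final Euler step, which is where the ≈ 2N forward count in the chart comes from.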

Classifier-free guidance

The network is trained on both class labels and an empty token (Ho & Salimans 2022). At sample time the eps used is (1 + w) · eps_c − w · eps_∅. The vector field plot lets you watch the eps-arrows on a 16×16 grid stretch and sharpen as w increases; w = 0 recovers the plain class-conditional field.
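The guidance rule itself is one line per coordinate. A sketch, assuming the conditional and unconditional predictions are already in hand (the helper name is hypothetical):

    type Vec2 = [number, number];

    // Classifier-free guidance: extrapolate from the unconditional prediction
    // past the conditional one. w = 0 returns eps_c unchanged.
    function guidedEps(epsC: Vec2, epsNull: Vec2, w: number): Vec2 {
      // (1 + w) * eps_c - w * eps_null, per coordinate
      return [(1 + w) * epsC[0] - w * epsNull[0], (1 + w) * epsC[1] - w * epsNull[1]];
    }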

Vector field

Each arrow plots the score s_θ = −eps_θ / sqrt(1 − bar_alpha_t) at one (x, y) location. At small t the arrows snap clean points into mode centers; at large t they point inward toward the origin because almost all of x_t is noise. Watch the field morph as training progresses.
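The arrows are a direct rescaling of the network output. A sketch, with `epsNet` and `abar` assumed as in the earlier snippets:

    type Vec2 = [number, number];

    // Score of the noised marginal at step t, recovered from the eps prediction:
    // s_theta(x, t) = -eps_theta(x, t) / sqrt(1 - abar_t).
    function score(x: Vec2, t: number, abar: number[], epsNet: (x: Vec2, t: number) => Vec2): Vec2 {
      const e = epsNet(x, t);
      const k = -1 / Math.sqrt(1 - abar[t]);
      return [k * e[0], k * e[1]];
    }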

Quality vs steps

Each point on the chart is the kernel MMD between 200 generated samples and a fresh batch from the target, computed with RBF kernels at bandwidths σ ∈ {0.5, 1, 2} × the median pairwise distance (Gretton et al. 2012). Lower is better. The DDIM and Heun curves use the same trained weights and the same step counts; Heun costs roughly twice as many network calls as DDIM at the same step count because each Heun step is a predictor-corrector pair.
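The metric is a few nested loops over the two point sets. A sketch of a biased MMD² estimate that sums the three RBF bandwidths; the exact estimator and kernel combination used by the lab are assumptions here:

    type Vec2 = [number, number];

    function sqDist(a: Vec2, b: Vec2): number {
      const dx = a[0] - b[0];
      const dy = a[1] - b[1];
      return dx * dx + dy * dy;
    }

    function medianPairwise(pts: Vec2[]): number {
      const d: number[] = [];
      for (let i = 0; i < pts.length; i++)
        for (let j = i + 1; j < pts.length; j++) d.push(Math.sqrt(sqDist(pts[i], pts[j])));
      d.sort((a, b) => a - b);
      return d[Math.floor(d.length / 2)];
    }

    // Biased (V-statistic) MMD^2 between generated X and target Y,
    // summing RBF kernels at {0.5, 1, 2} x the median pairwise distance.
    function mmd2(X: Vec2[], Y: Vec2[]): number {
      const med = medianPairwise(X.concat(Y));
      const sigmas = [0.5, 1, 2].map(s => s * med);
      const k = (a: Vec2, b: Vec2) =>
        sigmas.reduce((acc, s) => acc + Math.exp(-sqDist(a, b) / (2 * s * s)), 0);
      const mean = (A: Vec2[], B: Vec2[]) => {
        let sum = 0;
        for (const a of A) for (const b of B) sum += k(a, b);
        return sum / (A.length * B.length);
      };
      return mean(X, X) + mean(Y, Y) - 2 * mean(X, Y);
    }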

References

  • Ho, Jain, Abbeel. Denoising Diffusion Probabilistic Models. NeurIPS 2020. arXiv:2006.11239
  • Song, Meng, Ermon. Denoising Diffusion Implicit Models. ICLR 2021. arXiv:2010.02502
  • Nichol, Dhariwal. Improved Denoising Diffusion Probabilistic Models. ICML 2021. arXiv:2102.09672
  • Ho, Salimans. Classifier-Free Diffusion Guidance. NeurIPS 2021 Workshop. arXiv:2207.12598
  • Karras, Aittala, Aila, Laine. Elucidating the Design Space of Diffusion-Based Generative Models. NeurIPS 2022. arXiv:2206.00364
  • Gretton, Borgwardt, Rasch, Schölkopf, Smola. A Kernel Two-Sample Test. JMLR 2012.