ML Methods
Probability Flow ODE
Song et al. 2021: every diffusion SDE has a deterministic ODE that produces the same time-marginals. The deterministic dual of Anderson's reverse SDE; the basis of DDIM, DPM-Solver, EDM samplers, exact-likelihood computation, and the conceptual bridge to flow matching.
Why This Matters
Every fast deterministic sampler used in production diffusion models — DDIM, DPM-Solver, DPM-Solver++, EDM, UniPC — is integrating the same underlying object: the probability flow ODE of Song, Sohl-Dickstein, Kingma, Kumar, Ermon, and Poole (ICLR 2021). Anderson's reverse SDE is the stochastic way to invert a forward noising process; the probability flow ODE is the deterministic way that produces the same intermediate marginals. The choice between them happens purely at sampling time, with the same trained score network.
The ODE form is what enables three things that the SDE form cannot give. It admits exact log-likelihood computation via the instantaneous change-of-variables formula (Chen, Rubanova, Bettencourt, and Duvenaud, 2018), so a diffusion model becomes a continuous normalizing flow at inference time. It allows high-order solvers (DPM-Solver, Heun, exponential integrators) that take 20-50 steps where Euler-Maruyama on the reverse SDE needs hundreds or thousands. It is deterministic: the same noise sample maps to the same image, which is the property that makes DDIM-style image editing, latent-space interpolation, and consistency models possible at all.
The probability flow ODE is also the conceptual bridge from score-based diffusion to flow matching (Lipman, Chen, Ben-Hamu, Nickel, and Le, ICLR 2023) and rectified flow (Liu, Gong, and Liu, ICLR 2023). Flow matching trains a velocity field to directly match the probability flow drift, bypassing the score parameterization entirely. Once you see the ODE, the score is just one way to specify a transport field; flow matching is what you get if you specify the transport field directly.
Beyond generative modeling, the same construction shows up in optimal transport (Otto's gradient-flow / JKO scheme), Fokker-Planck-driven particle methods (Maoutsa, Reich, and Opper, 2020), and any setting where you want to replace a stochastic sampler with a deterministic transport map without changing the marginals.
Mental Model
A Fokker-Planck equation is a continuity equation for densities: $\partial_t p_t = -\nabla\cdot J_t$, where $J_t$ is the probability current. The current can be split into a part that comes from the SDE drift, $f\,p_t$, and a part that comes from the diffusion, $-\tfrac12 g^2 \nabla p_t$. The first key observation is that many different vector fields $v_t$ produce the same density evolution, because only the divergence of the current $p_t v_t$ enters the continuity equation. The second is that for a given Fokker-Planck flow there is a canonical deterministic transport that reproduces it: the drift $\tilde f$ obtained by absorbing the diffusion into the drift, satisfying $\partial_t p_t = -\nabla\cdot(p_t \tilde f)$ with no diffusion term.
That deterministic drift is the probability flow ODE field. It carries probability mass through time along noiseless characteristic curves whose distribution at every time matches the SDE's marginal. Pathwise the SDE and the ODE are very different (Brownian fluctuations versus smooth ODE trajectories), but their single-time marginals are identical.
A useful aphorism: the Fokker-Planck equation is what the densities do; the SDE and the ODE are two different particle realizations of those densities. Anderson's reverse SDE adds noise to average over many trajectories; the probability flow ODE removes noise to follow one trajectory deterministically. Both are correct, and both are useful, for different jobs.
Formal Statement
Probability Flow ODE
Let $X_t$ solve the forward Itô SDE $dX_t = f(X_t, t)\,dt + g(t)\,dW_t$ on $\mathbb{R}^d$ with marginal density $p_t$ (assume the diffusion coefficient $g$ is independent of $x$ for simplicity). The probability flow ODE associated to this SDE is the deterministic ODE
$$\frac{dx_t}{dt} = f(x_t, t) - \tfrac{1}{2}\,g(t)^2\,\nabla_x \log p_t(x_t).$$
If $x_0 \sim p_0$ and $x_t$ solves the ODE with initial condition $x_0$, then $x_t$ has marginal density $p_t$ for every $t$. The score field $\nabla_x \log p_t$ is the same object that appears in Anderson's reverse-time SDE; the only difference is the factor $\tfrac12$ in front of the score correction.
In the general case where the diffusion matrix $G$ depends on $x$, the formula picks up an additional divergence term and reads $\frac{dx_t}{dt} = f(x_t, t) - \tfrac12\,\nabla\cdot\big[G(x_t,t)G(x_t,t)^\top\big] - \tfrac12\,G(x_t,t)G(x_t,t)^\top \nabla_x \log p_t(x_t)$. For variance-preserving and variance-exploding diffusion schedules $g$ is $x$-independent, and the simpler form above is the one used in practice.
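To make the statement concrete, here is a minimal numerical sketch (a toy 1D setting, not the paper's code): the forward SDE is $dX_t = dW_t$, so $f = 0$ and $g = 1$, the data distribution is taken to be $\mathcal{N}(0, 1)$, and the score of $p_t = \mathcal{N}(0, 1+t)$ is available in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_steps, n_samples = 9.0, 500, 200_000
dt = T / n_steps

def score(x, t):
    # Analytic score of p_t = N(0, 1 + t) for the forward SDE dX_t = dW_t.
    return -x / (1.0 + t)

# Probability flow drift: f - (1/2) g^2 * score, with f = 0 and g = 1.
# Start from the noisy marginal p_T = N(0, 1 + T) and Euler-step backward.
x = rng.normal(0.0, np.sqrt(1.0 + T), size=n_samples)
for i in range(n_steps):
    t = T - i * dt
    drift = -0.5 * score(x, t)
    x -= dt * drift
print(x.mean(), x.std())  # close to (0.0, 1.0), the data marginal p_0
```

Swapping the Euler step for a higher-order rule is exactly the route the fast deterministic samplers take; the drift being integrated is the same.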
Marginal Equivalence Theorem
Probability Flow ODE Marginal Equivalence
Statement
Under the assumptions above, let $p_t$ be the marginal density of the SDE $dX_t = f(X_t,t)\,dt + g(t)\,dW_t$ with $X_0 \sim p_0$. Define the deterministic vector field $\tilde f(x,t) = f(x,t) - \tfrac12 g(t)^2 \nabla_x \log p_t(x)$ and let $x_t$ solve the ODE $\frac{dx_t}{dt} = \tilde f(x_t,t)$ with $x_0 \sim p_0$. Then for every $t$ the law of $x_t$ has density $p_t$, the same density the SDE produces.
Intuition
Both processes must satisfy a continuity equation $\partial_t p_t + \nabla\cdot(p_t v_t) = 0$ for some velocity field $v_t$. The SDE's Fokker-Planck operator rewrites as a continuity equation with $v_t = f - \tfrac12 g^2 \nabla_x \log p_t$. That is the deterministic drift compatible with the marginal flow. The diffusion has been absorbed into the drift via the score; the noise term disappears because deterministic transport already carries the right amount of mass.
Proof Sketch
Start from the Fokker-Planck equation $\partial_t p_t = -\nabla\cdot(f\,p_t) + \tfrac12 \sum_{i,j}\partial_i\partial_j\big[(gg^\top)_{ij}\,p_t\big]$. With $g$ independent of $x$, the second term simplifies to $\tfrac12 g(t)^2 \Delta p_t$. Use the identity $\Delta p_t = \nabla\cdot(p_t \nabla \log p_t)$ to rewrite this as $\tfrac12 g(t)^2\,\nabla\cdot(p_t \nabla\log p_t)$. Substitute back into the Fokker-Planck equation and collect terms: $\partial_t p_t = -\nabla\cdot\big[\big(f - \tfrac12 g^2 \nabla\log p_t\big)\,p_t\big]$. This is exactly the continuity equation for $\tilde f = f - \tfrac12 g^2 \nabla\log p_t$. The characteristic curves of this continuity equation are the trajectories of the ODE $\frac{dx_t}{dt} = \tilde f(x_t,t)$, and mass is carried along characteristics, so the marginal of any ODE solution started from $p_0$ is $p_t$.
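The single identity doing the work in the sketch, $\Delta p = \nabla\cdot(p\,\nabla\log p)$, can be checked symbolically; the 1D Gaussian density below is just a concrete test case.

```python
import sympy as sp

x, s = sp.symbols('x s', positive=True)
p = sp.exp(-x**2 / (2 * s**2)) / sp.sqrt(2 * sp.pi * s**2)  # 1D Gaussian density

lhs = sp.diff(p, x, 2)                        # Laplacian of p (1D)
rhs = sp.diff(p * sp.diff(sp.log(p), x), x)   # div of (p * grad log p)
print(sp.simplify(lhs - rhs))  # 0
```

The identity is just the product rule in disguise: $p\,\nabla\log p = \nabla p$, so both sides are $\Delta p$.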
Why It Matters
Two practical consequences follow. First, you can sample from a diffusion model by integrating a deterministic ODE instead of a stochastic SDE. This allows higher-order ODE solvers (Heun, RK4, exponential integrators, DPM-Solver multistep) that match SDE-sampler quality in 20-50 function evaluations instead of 250-1000. Second, because the ODE is a continuous normalizing flow, you can compute exact log-likelihoods via the instantaneous change-of-variables formula (next theorem), which the stochastic sampler cannot do. The deterministic ODE produces marginal density identical to the SDE's marginal $p_t$ at every $t$.
Failure Mode
The score becomes singular near the data manifold as $t \to 0$. For data concentrated on a low-dimensional submanifold of $\mathbb{R}^d$ (the realistic case for images), $p_0$ is not a density on $\mathbb{R}^d$ at all, and $p_t$ for small $t$ has a nearly singular score near the manifold. The ODE then becomes stiff: standard explicit solvers either blow up or take vanishingly small steps near $t = 0$. EDM (Karras et al., 2022) addresses this by reparameterizing the ODE so the singular behavior is absorbed into a preconditioner, and DPM-Solver uses exponential integrators that are stable for the linear part of the dynamics. Naive Euler integration of the probability flow ODE on unpreconditioned forward SDEs is the canonical failure mode.
Instantaneous Change of Variables and Exact Likelihood
Instantaneous Change of Variables (Chen et al. 2018)
Statement
Let $x_t$ solve $\frac{dx_t}{dt} = v(x_t, t)$ with initial condition $x_0 \sim p_0$, and let $p_t$ be the density of the random variable $x_t$. Then along the trajectory,
$$\frac{d}{dt} \log p_t(x_t) = -\nabla\cdot v(x_t, t).$$
Integrating from $0$ to $T$ gives the exact log-likelihood identity $\log p_0(x_0) = \log p_T(x_T) + \int_0^T \nabla\cdot v(x_t, t)\,dt$, where $x_t$ is the ODE trajectory connecting $x_0$ and $x_T$.
Intuition
The density along the flow is the initial density divided by the Jacobian determinant of the flow map. Differentiating in time gives the trace of the time-derivative of the Jacobian, which is the divergence $\nabla\cdot v$. The minus sign comes from the continuity equation: where the velocity field has positive divergence, mass is being stretched apart and density must drop.
Proof Sketch
By the continuity equation and the chain rule along characteristics, $\frac{d}{dt} p_t(x_t) = \partial_t p_t + v\cdot\nabla p_t = -\nabla\cdot(p_t v) + v\cdot\nabla p_t = -p_t\,\nabla\cdot v$ at $x = x_t$. Divide by $p_t(x_t)$ to get $\frac{d}{dt}\log p_t(x_t) = -\nabla\cdot v(x_t, t)$. The integral form follows by the fundamental theorem of calculus.
Why It Matters
Combined with the probability flow ODE, this gives an exact likelihood estimator for diffusion models. Algorithm: integrate the ODE forward from a data point $x_0$ to noise $x_T$, simultaneously accumulating the trace integral $\int_0^T \nabla\cdot v(x_t, t)\,dt$. The log-likelihood is $\log p_0(x_0) = \log p_T(x_T) + \int_0^T \nabla\cdot v(x_t, t)\,dt$, where $\log p_T$ is the standard Gaussian log-density at the noise endpoint. This is the only way to get exact (not lower-bound) likelihoods from a diffusion model, and it is what Song et al. 2021 used to report bits-per-dim on CIFAR-10 competitive with autoregressive models. The trace in high dimensions is estimated stochastically via Hutchinson's trick: $\nabla\cdot v(x) = \mathbb{E}_\epsilon\big[\epsilon^\top \tfrac{\partial v}{\partial x}\,\epsilon\big]$ for $\epsilon \sim \mathcal{N}(0, I)$ or Rademacher, which costs one Jacobian-vector product per estimate. The log-density along an ODE trajectory evolves as $\frac{d}{dt}\log p_t(x_t) = -\nabla\cdot v(x_t, t)$.
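A sketch of this estimator in a toy model where everything is closed-form (forward SDE $dX_t = dW_t$, data density $p_0 = \mathcal{N}(0,1)$, so the probability flow drift is $v(x,t) = x/(2(1+t))$ with divergence $1/(2(1+t))$; the data point $x_0 = 0.7$ is an arbitrary choice):

```python
import numpy as np

def gauss_logpdf(x, var):
    return -0.5 * x**2 / var - 0.5 * np.log(2 * np.pi * var)

T, n_steps = 9.0, 2000
dt = T / n_steps
x0 = 0.7                 # data point; true density is p_0 = N(0, 1)
x, log_det = x0, 0.0
for i in range(n_steps):
    t = i * dt
    v = x / (2.0 * (1.0 + t))          # probability flow drift for dX = dW
    div_v = 1.0 / (2.0 * (1.0 + t))    # closed-form divergence (1D)
    x += dt * v                         # push the sample toward noise
    log_det += dt * div_v               # accumulate the trace integral

log_lik = gauss_logpdf(x, 1.0 + T) + log_det   # log p_T(x_T) + integral of div v
print(log_lik, gauss_logpdf(x0, 1.0))          # the two numbers agree closely
```

In a real model the analytic drift and divergence are replaced by the learned score network and a Hutchinson trace estimate.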
Failure Mode
The Hutchinson estimator has variance that grows with the dimension; for high-dimensional images, getting a tight likelihood estimate requires many samples per data point. The ODE integration error also compounds with the trace-integral error; halving the step size doubles both the integration work and the trace-estimator work. Reported "exact" likelihoods always carry an integration tolerance that practitioners sometimes skip past in the comparison tables.
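A toy illustration of the estimator and its noise: the matrix $A$ below stands in for the Jacobian $\partial v/\partial x$ of a linear field $v(x) = Ax$ (synthetic, chosen so the true divergence is just $\mathrm{tr}(A)$).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 100
A = rng.normal(size=(d, d)) / np.sqrt(d)   # stand-in for the Jacobian dv/dx

def hutchinson(n_probes):
    # tr(A) estimated as the mean of eps^T A eps over Rademacher probes eps.
    eps = rng.choice([-1.0, 1.0], size=(n_probes, d))
    return np.mean(np.einsum('ni,ij,nj->n', eps, A, eps))

# Few probes give a noisy estimate; many probes converge to the true trace.
print(np.trace(A), hutchinson(10), hutchinson(10_000))
```

The per-probe variance scales with the squared off-diagonal mass of $A$, which grows with dimension; that is the variance issue described above.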
Connection to DDIM, EDM, and Flow Matching
DDIM (Song, Meng, and Ermon, ICLR 2021) was discovered before the continuous-time formulation; the original paper described a non-Markovian discrete forward process whose deterministic reverse is closed form. Once Song et al. 2021 introduced the SDE framework, DDIM was recognized as exactly the discretization of the probability flow ODE for the variance-preserving diffusion schedule. The "deterministic DDIM" sampler that ships in every diffusion library is integrating the probability flow ODE with a first-order Euler step in a transformed time variable. There is no separate trained model; it is the same DDPM weights, sampled differently.
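A sketch of the deterministic DDIM update under toy assumptions: the "data distribution" is a single point mass, for which the optimal $\epsilon$-predictor has a closed form, and the $\bar\alpha$ schedule is an ad hoc cosine grid. With the exact predictor, DDIM recovers the data point from any noise sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps = 50
# Ad hoc cosine-style alpha_bar grid, with alpha_bar[0] = 1 (clean data).
t_grid = np.linspace(0.0, 0.98, n_steps + 1)
alpha_bar = np.cos(0.5 * np.pi * t_grid) ** 2

x0 = np.array([1.5, -0.3])   # toy "dataset": a single point mass at x0

def eps_exact(xt, k):
    # Exact noise predictor when the data distribution is a point mass at x0.
    return (xt - np.sqrt(alpha_bar[k]) * x0) / np.sqrt(1.0 - alpha_bar[k])

# Deterministic DDIM: start from the exact noisy marginal at the last step.
x = np.sqrt(alpha_bar[-1]) * x0 + np.sqrt(1 - alpha_bar[-1]) * rng.normal(size=2)
for k in range(n_steps, 0, -1):
    eps = eps_exact(x, k)
    x0_hat = (x - np.sqrt(1 - alpha_bar[k]) * eps) / np.sqrt(alpha_bar[k])
    x = np.sqrt(alpha_bar[k - 1]) * x0_hat + np.sqrt(1 - alpha_bar[k - 1]) * eps
print(x)  # recovers x0 = [1.5, -0.3] up to float error
```

The update is the standard DDIM step (predict $\hat x_0$, then re-noise deterministically to the next level); only the $\epsilon$-predictor and schedule here are toy stand-ins for a trained network.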
EDM (Karras, Aittala, Aila, and Laine, NeurIPS 2022) reparameterizes the probability flow ODE in terms of a noise level $\sigma$ rather than a time $t$, and applies preconditioning to the network so the singular behavior near the data manifold is absorbed into the parameterization. With Heun's second-order solver and the EDM preconditioner, 30-40 function evaluations match the FID of 1000-step DDPM sampling. DPM-Solver (Lu, Zhou, Bao, Chen, Li, and Zhu, NeurIPS 2022) goes further by exploiting the linear structure of the variance-preserving forward process: the linear term has a closed-form solution, so only the score-driven nonlinear term needs numerical integration, yielding 10-20 step samplers with little quality loss.
Flow matching (Lipman et al. 2023) takes the reframing one step further. Train a network $v_\theta$ to directly regress the probability flow drift, conditional on a chosen forward path between data and noise. The training loss is $\mathbb{E}_{t,\,x_0,\,x_1}\big\|v_\theta(x_t, t) - u_t(x_t \mid x_1)\big\|^2$, where $u_t$ is the conditional velocity along the chosen path. The score parameterization disappears entirely; what remains is a direct regression on the velocity field of a continuous normalizing flow. For Gaussian source paths this reproduces score-matching diffusion exactly, but the framework also accommodates other interpolation paths (rectified flow, optimal-transport-coupled paths) and is the formulation behind Stable Diffusion 3.
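A minimal sketch of the conditional flow matching objective under the linear (rectified-flow-style) path $x_t = (1-t)x_0 + t\,x_1$, whose conditional velocity is $u = x_1 - x_0$. The toy target with $x_1 = 0$ is chosen so the optimal velocity field is known in closed form; everything here is illustrative, not a training recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_loss(v_theta, x0, x1, t):
    # Conditional flow matching loss for the linear path
    # x_t = (1 - t) x0 + t x1, whose conditional velocity is u = x1 - x0.
    xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    u = x1 - x0
    return np.mean(np.sum((v_theta(xt, t) - u) ** 2, axis=1))

# Toy check: x0 ~ N(0, I), x1 = 0 (point mass at the origin). The optimal
# velocity field is v(x, t) = -x / (1 - t), which should drive the loss to 0.
x0 = rng.normal(size=(10_000, 2))
x1 = np.zeros((10_000, 2))
t = rng.uniform(0.0, 0.9, size=10_000)

opt = cfm_loss(lambda x, tt: -x / (1.0 - tt)[:, None], x0, x1, t)
zero = cfm_loss(lambda x, tt: np.zeros_like(x), x0, x1, t)
print(opt, zero)  # optimal field scores ~0; the zero field does not
```

In practice $v_\theta$ is a neural network and this loss is minimized by gradient descent; sampling then integrates $\frac{dx}{dt} = v_\theta(x, t)$ exactly as with the probability flow ODE.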
Worked Example: Variance-Preserving SDE
The variance-preserving (VP) forward SDE is $dX_t = -\tfrac12\beta(t)\,X_t\,dt + \sqrt{\beta(t)}\,dW_t$ for a positive schedule $\beta(t)$. The diffusion coefficient is $g(t) = \sqrt{\beta(t)}$, so $\tfrac12 g(t)^2 = \tfrac12\beta(t)$. The probability flow ODE is
$$\frac{dx_t}{dt} = -\tfrac12\beta(t)\,\Big[x_t + \nabla_x \log p_t(x_t)\Big].$$
Compare with Anderson's reverse SDE for the same forward process: $dX_t = \big[-\tfrac12\beta(t)\,X_t - \beta(t)\,\nabla_x\log p_t(X_t)\big]\,dt + \sqrt{\beta(t)}\,d\bar W_t$. The deterministic ODE has a half score correction ($\tfrac12\beta$) where the SDE has a full one ($\beta$). The noise term $\sqrt{\beta(t)}\,d\bar W_t$ is what makes the SDE trajectory wander stochastically while the ODE trajectory stays on a single deterministic curve.
As a sanity check, suppose the VP process has reached its stationary marginal $\mathcal{N}(0, I)$ (this is the limit as $t \to \infty$ for the VP schedule). The score is $\nabla_x\log p(x) = -x$, and the ODE becomes $\frac{dx_t}{dt} = -\tfrac12\beta(t)\,[x_t - x_t] = 0$. The deterministic transport correctly identifies that once the marginals have stopped evolving, no transport is needed. For finite-time intermediate marginals, integrate the ODE numerically with a learned score $s_\theta(x, t)$ in place of the unknown true score; this is what every deterministic diffusion sampler does at inference time.
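A numerical check of the VP probability flow ODE with a constant schedule $\beta(t) = 1$ and toy Gaussian data $p_0 = \mathcal{N}(2, 0.5^2)$ (all choices illustrative; the score is analytic because VP marginals of Gaussian data stay Gaussian):

```python
import numpy as np

rng = np.random.default_rng(0)
m, s = 2.0, 0.5            # toy data distribution p_0 = N(m, s^2)
beta, T = 1.0, 8.0         # constant VP schedule beta(t) = 1
n_steps, n = 2000, 100_000
dt = T / n_steps

def alpha(t):
    # VP signal coefficient: alpha(t) = exp(-0.5 * integral of beta)
    return np.exp(-0.5 * beta * t)

def score(x, t):
    # Analytic score of the VP marginal N(alpha*m, alpha^2 s^2 + 1 - alpha^2).
    a = alpha(t)
    var = a**2 * s**2 + 1.0 - a**2
    return -(x - a * m) / var

# Sample the exact marginal at time T, then integrate the probability flow
# ODE dx/dt = -0.5*beta*(x + score) backward to t = 0 with Euler steps.
aT = alpha(T)
x = rng.normal(aT * m, np.sqrt(aT**2 * s**2 + 1 - aT**2), size=n)
for i in range(n_steps):
    t = T - i * dt
    drift = -0.5 * beta * (x + score(x, t))
    x -= dt * drift
print(x.mean(), x.std())  # close to (2.0, 0.5), recovering p_0
```

Replacing the analytic `score` with a trained network turns this loop into an actual deterministic diffusion sampler.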
Common Confusions
Same marginals are not the same joint distribution
The probability flow ODE and the SDE produce the same time-marginals $p_t$, but they are very different processes pathwise. The SDE's trajectory is a Brownian-driven random function with Hölder-1/2 regularity; the ODE's trajectory is a smooth deterministic curve. Quantities that depend on more than one time slice — joint distributions of $(X_s, X_t)$, exit times, path-functional expectations — generally differ between the two processes. The marginal-equivalence theorem only asserts equality at single times.
The probability flow ODE is unique among gradient-class transports, not among all transports
Many vector fields produce the same density evolution $\{p_t\}$. The probability flow ODE is the unique drift you get by absorbing the diffusion into the deterministic transport via the identity $\Delta p = \nabla\cdot(p\,\nabla\log p)$. If you allow vector fields outside this gradient-class structure (for example, adding a perturbation $w$ with $\nabla\cdot(p_t w) = 0$), you get a different ODE that produces the same marginals. Rectified flow exploits exactly this freedom by post-processing trajectories to make them straighter. The probability flow ODE is canonical because it is the unique gradient-class deterministic transport, not because it is the only deterministic transport.
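This freedom is easy to exhibit symbolically: for a standard 2D Gaussian density $p$, the rotation field $w(x, y) = (-y, x)$ satisfies $\nabla\cdot(p\,w) = 0$, so adding $w$ to any transport field leaves the continuity equation, and hence the marginals, unchanged. A SymPy check of this particular example:

```python
import sympy as sp

x, y = sp.symbols('x y')
p = sp.exp(-(x**2 + y**2) / 2) / (2 * sp.pi)   # standard 2D Gaussian density
w = (-y, x)                                     # rotation field: w = A x, A antisymmetric

# div(p w) = d/dx (p * w_x) + d/dy (p * w_y); zero means w can be added
# to any drift without changing the density evolution.
div_pw = sp.diff(p * w[0], x) + sp.diff(p * w[1], y)
print(sp.simplify(div_pw))  # 0
```

The cancellation works for any antisymmetric linear field against a rotationally symmetric density; it is one concrete instance of the non-uniqueness described above.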
DDIM is not a different model from DDPM
A common belief: DDPM and DDIM are different generative models, with DDIM being faster and lower quality. They are the same model. DDPM and DDIM share weights, training loss, and forward process. They differ only at sampling time: DDPM integrates the reverse SDE (stochastic) with first-order Euler steps; DDIM integrates the probability flow ODE (deterministic). The trained network is identical. This is why you can take any pretrained DDPM checkpoint and run DDIM sampling on it without retraining. The same identity applies to score-SDE models and EDM samplers.
Exercises
Problem
Verify the probability flow ODE for standard Brownian motion: $dX_t = dW_t$ with $X_0 \sim \mathcal{N}(0, 1)$ on $\mathbb{R}$. Compute the marginal $p_t$ and the score $\nabla_x \log p_t$. Write the probability flow ODE explicitly. Solve the ODE starting from $x_0 \sim \mathcal{N}(0, 1)$ and confirm that $x_t \sim \mathcal{N}(0, 1 + t)$ for every $t$.
Problem
Derive the instantaneous change of variables formula from the continuity equation along characteristics of the ODE $\frac{dx_t}{dt} = v(x_t, t)$.
Next Topics
- Diffusion Models: the generative-modeling family in which the probability flow ODE is the deterministic sampler.
- Score Matching: the training objective for the score that appears in the ODE drift.
- Time Reversal of SDEs: Anderson's stochastic dual to the probability flow ODE; same marginals, different paths.
- Fokker-Planck Equation: the PDE machinery the marginal-equivalence proof reduces to.
- Stochastic Differential Equations: the parent framework for both the SDE and its ODE counterpart.
Last reviewed: April 18, 2026
Prerequisites
Foundations this topic depends on.
- Stochastic Differential Equations (Layer 3)
- Brownian Motion (Layer 2)
- Measure-Theoretic Probability (Layer 0B)
- Martingale Theory (Layer 0B)
- Ito's Lemma (Layer 3)
- Stochastic Calculus for ML (Layer 3)
- Fokker–Planck Equation (Layer 3)
- PDE Fundamentals for Machine Learning (Layer 1)
- Fast Fourier Transform (Layer 1)
- Exponential Function Properties (Layer 0A)
- Eigenvalues and Eigenvectors (Layer 0A)
- Matrix Operations and Properties (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Functional Analysis Core (Layer 0B)
- Metric Spaces, Convergence, and Completeness (Layer 0A)
- Inner Product Spaces and Orthogonality (Layer 0A)
- Vectors, Matrices, and Linear Maps (Layer 0A)
- Score Matching (Layer 3)