Beyond LLMs
Flow Matching
Learn a velocity field that transports noise to data along straight-line paths. Simpler training than diffusion, faster sampling, and cleaner math.
Why This Matters
Diffusion models generate stunning samples but come with baggage: complex noise schedules, many sampling steps, and a training objective derived through variational bounds that obscure the underlying geometry. Flow matching strips this away.
The idea is direct: learn a velocity field that pushes a simple distribution (e.g., Gaussian) to the data distribution along smooth paths. Training is a simple regression problem. Sampling is an ODE solve. The math is cleaner, the design choices are fewer, and the framework extends naturally beyond images to molecules, audio, video, and any domain where you can define a continuous path from noise to data.
Flow matching is rapidly becoming the default framework for frontier generative models.
Mental Model
Picture a cloud of Gaussian noise particles and a cloud of data points. Flow matching learns a velocity field, like a wind map, that blows each noise particle to a data point. If you design the paths to be straight lines (the optimal transport choice), each particle takes the shortest route.
Training is simple: pick a random data point $x_1$, pick a random noise point $x_0$, define a straight-line path between them, and train a neural network to predict the velocity along that path. At generation time, start from noise and follow the learned velocity field.
Formal Setup and Notation
Let $p_0$ be the source (noise) distribution and $p_1$ be the target (data) distribution. We want to construct a time-dependent probability path $p_t$ for $t \in [0, 1]$ that interpolates between $p_0$ and $p_1$.
Continuous Normalizing Flow
A continuous normalizing flow (CNF) defines a time-dependent velocity field $v_t(x)$ that generates a flow $\phi_t$ via the ODE:

$$\frac{d}{dt}\phi_t(x) = v_t(\phi_t(x)), \qquad \phi_0(x) = x.$$

The density $p_t$ evolves according to the continuity equation:

$$\frac{\partial p_t(x)}{\partial t} + \nabla \cdot \big(p_t(x)\, v_t(x)\big) = 0.$$

If the velocity field transports $p_0$ to $p_1$, then $\phi_1(x_0)$ with $x_0 \sim p_0$ is a sample from the data distribution.
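Given a trained velocity field, generation is just numerical ODE integration. A minimal sketch (NumPy, fixed-step Euler; `velocity_field` is a placeholder for a trained network):

```python
import numpy as np

def euler_sample(velocity_field, x0, n_steps=50):
    """Integrate dx/dt = v(t, x) from t = 0 to t = 1 with fixed-step Euler."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * velocity_field(t, x)
    return x

# Toy check: a constant velocity field v(t, x) = mu translates every
# sample by mu, and Euler integrates a constant field exactly.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 2))
mu = np.array([3.0, -1.0])
out = euler_sample(lambda t, x: mu, x0)
```

In practice a higher-order solver (e.g., Heun or Dormand-Prince) reaches the same quality in fewer function evaluations, and the straighter the learned paths, the fewer steps are needed.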
Flow Matching Objective
The flow matching (FM) objective trains a neural network $v_\theta(t, x)$ to match a target velocity field $u_t(x)$:

$$\mathcal{L}_{\mathrm{FM}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x \sim p_t} \big\| v_\theta(t, x) - u_t(x) \big\|^2.$$

The target $u_t$ is the velocity field that generates the desired probability path $p_t$. The problem: we cannot easily sample from $p_t$ or compute $u_t$ for arbitrary paths.
Conditional Flow Matching
Conditional flow matching (CFM) resolves the intractability by conditioning on individual data points. For a data point $x_1 \sim p_1$ and a noise point $x_0 \sim p_0$, define a conditional path:

$$x_t = (1 - t)\,x_0 + t\,x_1, \qquad t \in [0, 1].$$

The conditional velocity is $\frac{dx_t}{dt} = x_1 - x_0$, which is simply the direction from noise to data. The CFM objective is:

$$\mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x_0 \sim p_0,\; x_1 \sim p_1} \big\| v_\theta(t, x_t) - (x_1 - x_0) \big\|^2.$$
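The CFM objective reduces to a few lines of array code. A minimal sketch (NumPy; `v_theta` stands in for the neural network and can be any function of `(t, x)`):

```python
import numpy as np

def cfm_loss(v_theta, x1_batch, rng):
    """Monte Carlo estimate of the CFM objective for one batch of data."""
    x0 = rng.standard_normal(x1_batch.shape)      # noise x0 ~ N(0, I)
    t = rng.uniform(size=(len(x1_batch), 1))      # t ~ U[0, 1], one per sample
    xt = (1 - t) * x0 + t * x1_batch              # straight-line interpolant
    target = x1_batch - x0                        # conditional velocity x1 - x0
    pred = v_theta(t, xt)
    return np.mean(np.sum((pred - target) ** 2, axis=-1))

rng = np.random.default_rng(0)
x1 = rng.standard_normal((8, 2))                  # stand-in "data" batch
loss = cfm_loss(lambda t, x: np.zeros_like(x), x1, rng)
```

A real training loop would backpropagate this loss through the network parameters; everything else stays this simple, with no noise schedule and no variational bound.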
Main Theorems
FM and CFM Objectives Have the Same Gradients
Statement
The flow matching objective and the conditional flow matching objective have identical gradients with respect to the network parameters $\theta$:

$$\nabla_\theta \mathcal{L}_{\mathrm{FM}}(\theta) = \nabla_\theta \mathcal{L}_{\mathrm{CFM}}(\theta).$$
Therefore, minimizing the tractable CFM objective is equivalent to minimizing the intractable FM objective.
Intuition
The marginal velocity field $u_t(x)$ is the conditional expectation of the conditional velocities $x_1 - x_0$ over all noise-data pairs whose interpolant passes through $x$ at time $t$. The CFM loss averages the regression error over all conditionals. Because the network $v_\theta$ does not depend on the conditioning variables $(x_0, x_1)$, the gradient passes through the expectation and the two objectives agree.
Proof Sketch
Write the FM loss as $\mathbb{E}_{t,\, x \sim p_t} \|v_\theta(t, x) - u_t(x)\|^2$. Expand the square and note that the cross term $\mathbb{E}\big[\langle v_\theta(t, x),\, u_t(x) \rangle\big]$ equals $\mathbb{E}\big[\langle v_\theta(t, x_t),\, x_1 - x_0 \rangle\big]$ by the law of total expectation. Since $u_t(x) = \mathbb{E}\big[\,x_1 - x_0 \mid x_t = x\,\big]$ (the marginal velocity is the conditional expectation), the cross terms in the FM and CFM losses are equal. The remaining terms either do not depend on $\theta$ or are identical between the two losses.
Why It Matters
This is the key result that makes flow matching practical. Without it, you would need to sample from the marginal velocity field $u_t$, which requires knowing the data distribution. CFM only requires sampling noise-data pairs and interpolating, which is trivial.
Optimal Transport Conditional Paths
Statement
The straight-line conditional path $x_t = (1 - t)\,x_0 + t\,x_1$ with $x_0 \sim p_0$ and $x_1 \sim p_1$ generates a marginal velocity field $u_t$ such that the induced probability path $p_t$ continuously interpolates between $p_0$ and $p_1$. The conditional velocity $x_1 - x_0$ is constant along each path.
For Gaussian $p_0$ and $p_1$, the marginal path recovers the displacement interpolation from optimal transport theory.
Intuition
Each noise-data pair is connected by a straight line. The velocity is constant: you move from $x_0$ to $x_1$ at uniform speed. This is the simplest possible path and, in the Gaussian case, it is the optimal one in the sense of minimizing total transport cost.
Proof Sketch
By construction $x_t = (1 - t)\,x_0 + t\,x_1$, so $\frac{dx_t}{dt} = x_1 - x_0$. The marginal distribution $p_t$ is the pushforward of the joint distribution of $(x_0, x_1)$ through the map $(x_0, x_1) \mapsto (1 - t)\,x_0 + t\,x_1$. For Gaussians, this pushforward is Gaussian, and the resulting path matches McCann's displacement interpolation.
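The Gaussian claim is easy to check empirically. A sketch (NumPy, 1D, independent coupling of source and target; the $\mu$ and $\sigma$ values are arbitrary):

```python
import numpy as np

# Straight-line interpolation of two independent 1D Gaussians: under the
# independent coupling, the marginal at time t has mean (1-t)*mu0 + t*mu1
# and variance (1-t)^2 * s0^2 + t^2 * s1^2.
rng = np.random.default_rng(1)
mu0, s0, mu1, s1, t = 0.0, 1.0, 5.0, 2.0, 0.3
x0 = rng.normal(mu0, s0, size=1_000_000)
x1 = rng.normal(mu1, s1, size=1_000_000)
xt = (1 - t) * x0 + t * x1

mean_pred = (1 - t) * mu0 + t * mu1                # 1.5
var_pred = (1 - t) ** 2 * s0**2 + t**2 * s1**2     # 0.85
```

Note that McCann's displacement interpolation corresponds to the optimal (monotone) coupling rather than the independent one; both give Gaussian marginals, but with different variances.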
Why It Matters
Straight-line paths are the key innovation of conditional flow matching over earlier CNF methods. They make the velocity field as simple as possible (nearly constant in direction, varying mainly in magnitude), which is easy for a neural network to learn. This leads to fast convergence and high-quality generation with fewer sampling steps than diffusion models.
Failure Mode
Straight-line paths can cross when the source and target distributions are very different, leading to multi-valued velocity fields. In practice, this is handled by the neural network averaging over crossings, but it can cause blurriness. Minibatch optimal transport coupling (sorting noise-data pairs to minimize total distance) reduces crossing and improves quality.
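In 1D the minibatch OT coupling is just sorting: the monotone pairing minimizes the total squared transport distance. A sketch (NumPy; in higher dimensions the pairing comes from an assignment solver such as the Hungarian algorithm):

```python
import numpy as np

def ot_pair_1d(x0, x1):
    """Minibatch OT coupling in 1D: sorting both batches gives the
    monotone pairing, which minimizes total squared distance."""
    return np.sort(x0), np.sort(x1)

rng = np.random.default_rng(2)
x0 = rng.standard_normal(256)           # noise batch
x1 = rng.standard_normal(256) + 4.0     # shifted "data" batch
indep = np.mean((x1 - x0) ** 2)         # random (independent) pairing
a, b = ot_pair_1d(x0, x1)
ot = np.mean((b - a) ** 2)              # OT pairing: shorter paths
```

Shorter, non-crossing conditional paths make the marginal velocity field closer to single-valued, which is exactly what reduces the blurriness from averaging over crossings.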
Canonical Examples
Flow matching vs. diffusion for image generation
In diffusion, you design a noise schedule $\beta_t$, derive the forward process $q(x_t \mid x_0)$ (diffusion convention: $x_0$ is data), derive the reverse SDE, compute a variational bound, and train a score network. In flow matching, you define $x_t = (1 - t)\,x_0 + t\,x_1$ and train $v_\theta$ to predict $x_1 - x_0$. The FM training loop is simpler: sample $x_1$ from data, sample $x_0 \sim \mathcal{N}(0, I)$, sample $t \sim \mathcal{U}[0, 1]$, compute $x_t$, and regress $v_\theta(t, x_t)$ onto $x_1 - x_0$.
Flow matching for molecular generation
To generate 3D molecular conformations, let $x_1$ be the atom coordinates. Flow matching transports Gaussian noise to valid molecular geometries. The straight-line paths respect 3D structure without requiring domain-specific noise schedules. Equivariant flow matching adds rotation and translation invariance by using SE(3)-equivariant velocity networks.
Common Confusions
Flow matching is not just rebranded diffusion
While diffusion and flow matching can be derived as special cases of a general framework (stochastic interpolants), they differ in practice. Diffusion uses stochastic differential equations with noise injection at each step; flow matching uses ordinary differential equations with deterministic paths. FM paths are straighter, leading to faster sampling (fewer ODE steps) and simpler training (no noise schedule).
You do not need optimal transport to use flow matching
The straight-line paths in CFM are inspired by optimal transport but do not require solving an OT problem. The connection to OT is exact only for Gaussians. For general distributions, CFM uses the straight-line interpolation as a heuristic that works well in practice. Minibatch OT coupling is an optional improvement, not a requirement.
Summary
- Flow matching learns a velocity field that transports noise to data via an ODE
- Conditional flow matching makes training tractable: regress on the velocity along straight-line paths
- The CFM and FM objectives have identical gradients, so the simple training procedure is theoretically justified
- Straight-line paths connect to optimal transport and yield faster sampling than diffusion (fewer ODE steps)
- The framework is domain-agnostic: images, molecules, audio, video, and any continuous data
Exercises
Problem
Derive the conditional velocity field $u_t = \frac{dx_t}{dt}$ for the straight-line path $x_t = (1 - t)\,x_0 + t\,x_1$. Verify that $x_0$ is recovered at $t = 0$ and $x_1$ at $t = 1$.
Problem
Consider a Gaussian source $p_0 = \mathcal{N}(\mu_0, \Sigma_0)$ and Gaussian target $p_1 = \mathcal{N}(\mu_1, \Sigma_1)$. Show that the marginal distribution $p_t$ under straight-line interpolation is Gaussian and compute its mean and covariance.
Problem
Explain the path-crossing problem in conditional flow matching. When do straight-line conditional paths cross, and how does minibatch optimal transport coupling mitigate this?
References
Canonical:
- Lipman et al., Flow Matching for Generative Modeling (ICLR 2023)
- Liu et al., Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow (ICLR 2023)
Current:
- Tong et al., Improving and Generalizing Flow-Based Generative Models with Minibatch Optimal Transport (ICML 2024)
- Esser et al., Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (2024)
- Goodfellow, Bengio, Courville, Deep Learning (2016), Chapters 14-20
- Zhang et al., Dive into Deep Learning (2023), Chapters 14-17
Next Topics
Flow matching connects to many active research directions:
- Rectified flow for distillation and few-step generation
- Equivariant flows for molecular and protein generation
- Discrete flow matching for text and categorical data
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Diffusion Models (Layer 4)
- Variational Autoencoders (Layer 3)
- Autoencoders (Layer 2)
- Feedforward Networks and Backpropagation (Layer 2)
- Differentiation in R^n (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Matrix Calculus (Layer 1)
- The Jacobian Matrix (Layer 0A)
- The Hessian Matrix (Layer 0A)
- Activation Functions (Layer 1)
- Convex Optimization Basics (Layer 1)
- Matrix Operations and Properties (Layer 0A)
- Vectors, Matrices, and Linear Maps (Layer 0A)
- Maximum Likelihood Estimation (Layer 0B)
- Common Probability Distributions (Layer 0A)