
Beyond LLMs

Flow Matching

Learn a velocity field that transports noise to data along straight-line paths. Simpler training than diffusion, faster sampling, and cleaner math.

Research · Tier 2 · Frontier · ~55 min


Why This Matters

Diffusion models generate stunning samples but come with baggage: complex noise schedules, many sampling steps, and a training objective derived through variational bounds that obscure the underlying geometry. Flow matching strips this away.

The idea is direct: learn a velocity field that pushes a simple distribution (e.g., Gaussian) to the data distribution along smooth paths. Training is a simple regression problem. Sampling is an ODE solve. The math is cleaner, the design choices are fewer, and the framework extends naturally beyond images to molecules, audio, video, and any domain where you can define a continuous path from noise to data.

Flow matching is rapidly becoming the default framework for frontier generative models.

Mental Model

Picture a cloud of Gaussian noise particles and a cloud of data points. Flow matching learns a velocity field, like a wind map, that blows each noise particle to a data point. If you design the paths to be straight lines (the optimal transport choice), each particle takes the shortest route.

Training is simple: pick a random data point $x_1$, pick a random noise point $x_0$, define a straight-line path between them, and train a neural network to predict the velocity along that path. At generation time, start from noise and follow the learned velocity field.
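This recipe can be sketched in a few lines of numpy. The 2D Gaussian "data" distribution and all variable names here are illustrative, not from any particular implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "data": samples from a 2D Gaussian shifted away from the origin.
# In a real model, x1 would be an image, a molecule, etc.
def sample_data(n):
    return rng.normal(loc=[4.0, 4.0], scale=0.5, size=(n, 2))

# One conditional-flow-matching training pair:
x1 = sample_data(1)              # data point
x0 = rng.normal(size=(1, 2))     # noise point, x0 ~ N(0, I)
t = rng.uniform()                # random time in [0, 1]

xt = (1 - t) * x0 + t * x1       # straight-line interpolation
target_velocity = x1 - x0        # what the network must predict at (t, xt)
```

A training step would feed `(t, xt)` to the network and regress its output onto `target_velocity` with a squared loss.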

Formal Setup and Notation

Let $p_0 = \mathcal{N}(0, I)$ be the source (noise) distribution and $p_1$ be the target (data) distribution. We want to construct a time-dependent probability path $p_t$ for $t \in [0, 1]$ that interpolates between $p_0$ and $p_1$.

Definition

Continuous Normalizing Flow

A continuous normalizing flow (CNF) defines a time-dependent velocity field $v_t: \mathbb{R}^d \to \mathbb{R}^d$ that generates a flow via the ODE:

$$\frac{dx_t}{dt} = v_t(x_t), \quad x_0 \sim p_0$$

The density $p_t$ evolves according to the continuity equation:

$$\frac{\partial p_t}{\partial t} + \nabla \cdot (p_t v_t) = 0$$

If the velocity field transports $p_0$ to $p_1$, then $x_1$ is a sample from the data distribution.
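Sampling is then just an ODE solve. As an illustrative check (not from the text): for a 1D Gaussian source $\mathcal{N}(0,1)$ and target $\mathcal{N}(\mu, \sigma^2)$ under straight-line paths with independent coupling, the marginal velocity field $u_t(x) = \mathbb{E}[x_1 - x_0 \mid x_t = x]$ has a closed affine form, and plain Euler integration transports noise to the target:

```python
import numpy as np

# For p0 = N(0,1), p1 = N(mu, sigma^2), independent coupling, and
# x_t = (1-t) x0 + t x1, the pair (x1 - x0, x_t) is jointly Gaussian,
# so E[x1 - x0 | x_t = x] is affine in x.
mu, sigma = 3.0, 0.5

def u_t(t, x):
    a, b = 1.0 - t, t
    var = a**2 + (b * sigma) ** 2        # Var(x_t)
    c = (b * sigma**2 - a) / var         # Cov(x1 - x0, x_t) / Var(x_t)
    return mu + c * (x - b * mu)         # E[x1 - x0 | x_t = x]

rng = np.random.default_rng(0)
x = rng.normal(size=50_000)              # particles from p0 = N(0, 1)
n_steps = 1_000
dt = 1.0 / n_steps
for i in range(n_steps):
    x = x + dt * u_t(i * dt, x)          # explicit Euler step

print(x.mean(), x.std())                 # approximately mu and sigma
```

Real models replace `u_t` with the learned network $v_\theta$ and typically use a higher-order ODE solver with far fewer steps.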

Definition

Flow Matching Objective

The flow matching (FM) objective trains a neural network $v_\theta$ to match a target velocity field $u_t$:

$$\mathcal{L}_{\text{FM}}(\theta) = \mathbb{E}_{t \sim U[0,1],\, x_t \sim p_t} \|v_\theta(t, x_t) - u_t(x_t)\|^2$$

The target $u_t$ is the velocity field that generates the desired probability path $p_t$. The problem: we cannot easily sample from $p_t$ or compute $u_t$ for arbitrary paths.

Definition

Conditional Flow Matching

Conditional flow matching (CFM) resolves the intractability by conditioning on individual data points. For a data point $x_1 \sim p_1$, define a conditional path:

$$x_t = (1 - t) x_0 + t x_1, \quad x_0 \sim p_0$$

The conditional velocity is $u_t(x_t \mid x_1) = x_1 - x_0$, which is simply the direction from noise to data. The CFM objective is:

$$\mathcal{L}_{\text{CFM}}(\theta) = \mathbb{E}_{t, x_0, x_1} \|v_\theta(t, x_t) - (x_1 - x_0)\|^2$$
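The expectation is easy to estimate by Monte Carlo. A minimal 1D sketch (toy distributions and candidate fields chosen for illustration) shows that a better velocity field scores a lower CFM loss:

```python
import numpy as np

# Monte Carlo estimate of the CFM loss for two candidate velocity fields
# on a 1D Gaussian target p1 = N(mu, 1) with source p0 = N(0, 1).
rng = np.random.default_rng(0)
mu = 3.0
n = 200_000

x1 = rng.normal(loc=mu, size=n)          # data samples
x0 = rng.normal(size=n)                  # noise samples
t = rng.uniform(size=n)
xt = (1 - t) * x0 + t * x1
target = x1 - x0                         # conditional velocity

def cfm_loss(v):
    return np.mean((v(t, xt) - target) ** 2)

loss_zero = cfm_loss(lambda t, x: np.zeros_like(x))     # predict nothing
loss_mean = cfm_loss(lambda t, x: np.full_like(x, mu))  # predict E[x1 - x0] = mu
print(loss_zero, loss_mean)              # the constant-mu field scores lower
```

Training a real model amounts to minimizing this same quantity over the parameters of $v_\theta$ by stochastic gradient descent.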

Main Theorems

Theorem

FM and CFM Objectives Have the Same Gradients

Statement

The flow matching objective $\mathcal{L}_{\text{FM}}(\theta)$ and the conditional flow matching objective $\mathcal{L}_{\text{CFM}}(\theta)$ have identical gradients with respect to $\theta$:

$$\nabla_\theta \mathcal{L}_{\text{FM}}(\theta) = \nabla_\theta \mathcal{L}_{\text{CFM}}(\theta)$$

Therefore, minimizing the tractable CFM objective is equivalent to minimizing the intractable FM objective.

Intuition

The marginal velocity field $u_t(x)$ is the conditional expectation of the conditional velocity fields over all data points that could produce $x$ at time $t$. The CFM loss averages the regression error over all conditionals. Because the network $v_\theta$ does not depend on the conditioning variable $x_1$, the gradient passes through the expectation and the two objectives agree.

Proof Sketch

Write the FM loss as $\mathbb{E}_{t, x_t} \|v_\theta - u_t\|^2$. Expand the square and note that the cross term $\mathbb{E}_{x_t}[v_\theta(t, x_t)^\top u_t(x_t)]$ equals $\mathbb{E}_{x_1}\mathbb{E}_{x_t \mid x_1}[v_\theta(t, x_t)^\top u_t(x_t \mid x_1)]$ by the law of total expectation. Since $u_t(x_t) = \mathbb{E}[u_t(x_t \mid x_1) \mid x_t]$ (the marginal velocity is the conditional expectation), the cross terms in the FM and CFM losses are equal. The remaining terms either do not depend on $\theta$ or are identical between the two losses.
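Spelled out, the expansion behind the sketch (dropping function arguments where unambiguous):

```latex
\begin{aligned}
\mathcal{L}_{\text{FM}}
  &= \mathbb{E}_{t, x_t}\!\left[\|v_\theta\|^2\right]
   - 2\,\mathbb{E}_{t, x_t}\!\left[v_\theta^{\top} u_t(x_t)\right]
   + \underbrace{\mathbb{E}_{t, x_t}\!\left[\|u_t(x_t)\|^2\right]}_{\text{independent of } \theta}, \\
\mathcal{L}_{\text{CFM}}
  &= \mathbb{E}_{t, x_0, x_1}\!\left[\|v_\theta\|^2\right]
   - 2\,\mathbb{E}_{t, x_0, x_1}\!\left[v_\theta^{\top} u_t(x_t \mid x_1)\right]
   + \underbrace{\mathbb{E}_{t, x_0, x_1}\!\left[\|u_t(x_t \mid x_1)\|^2\right]}_{\text{independent of } \theta}, \\
\mathbb{E}_{x_1}\mathbb{E}_{x_t \mid x_1}\!\left[v_\theta^{\top} u_t(x_t \mid x_1)\right]
  &= \mathbb{E}_{x_t}\!\left[v_\theta^{\top}\,\mathbb{E}\!\left[u_t(x_t \mid x_1) \mid x_t\right]\right]
   = \mathbb{E}_{x_t}\!\left[v_\theta^{\top} u_t(x_t)\right].
\end{aligned}
```

The last line shows the two cross terms coincide, so the $\theta$-dependent parts of the losses are identical and their gradients agree.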

Why It Matters

This is the key result that makes flow matching practical. Without it, you would need to sample from the marginal $p_t$, which requires knowing the data distribution. CFM only requires sampling noise-data pairs and interpolating, which is trivial.

Proposition

Optimal Transport Conditional Paths

Statement

The straight-line conditional path $x_t = (1 - t)x_0 + tx_1$ with $x_0 \sim \mathcal{N}(0, I)$ and $x_1 \sim p_1$ generates a marginal velocity field $u_t$ such that the induced probability path $p_t$ continuously interpolates between $p_0$ and $p_1$. The conditional velocity $u_t(x_t \mid x_1) = x_1 - x_0$ is constant along each path.

For Gaussian $p_0$ and $p_1$, the marginal path $p_t$ recovers the displacement interpolation from optimal transport theory.

Intuition

Each noise-data pair is connected by a straight line. The velocity is constant: you move from $x_0$ to $x_1$ at uniform speed. This is the simplest possible path and, in the Gaussian case, it is the optimal one in the sense of minimizing total transport cost.

Proof Sketch

By construction, $x_t = (1-t)x_0 + tx_1$, so $dx_t/dt = x_1 - x_0$. The marginal distribution $p_t$ is the pushforward of the joint distribution of $(x_0, x_1)$ through the map $(x_0, x_1) \mapsto (1-t)x_0 + tx_1$. For Gaussians, this pushforward is Gaussian, and the resulting path $p_t = \mathcal{N}((1-t)\mu_0 + t\mu_1, \ldots)$ matches McCann's displacement interpolation.
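The Gaussian pushforward is easy to verify numerically. In 1D with independent coupling, $x_t = (1-t)x_0 + tx_1$ is a sum of independent Gaussians, so its mean is $t\mu$ and its variance is $(1-t)^2 + t^2\sigma^2$ (toy parameters below are illustrative):

```python
import numpy as np

# Numerical check of the Gaussian marginal path: for p0 = N(0, 1) and
# p1 = N(mu, sigma^2) with independent coupling, x_t should be Gaussian
# with mean t*mu and variance (1-t)^2 + t^2 * sigma^2.
rng = np.random.default_rng(0)
mu, sigma, t = 2.0, 0.5, 0.3
n = 500_000

x0 = rng.normal(size=n)
x1 = rng.normal(loc=mu, scale=sigma, size=n)
xt = (1 - t) * x0 + t * x1

print(xt.mean(), t * mu)                          # both ~ 0.6
print(xt.var(), (1 - t) ** 2 + (t * sigma) ** 2)  # both ~ 0.5125
```

The advanced exercise below asks for the same computation in general dimension with covariance $\Sigma$.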

Why It Matters

Straight-line paths are the key innovation of conditional flow matching over earlier CNF methods. They make the velocity field as simple as possible (nearly constant in direction, varying mainly in magnitude), which is easy for a neural network to learn. This leads to fast convergence and high-quality generation with fewer sampling steps than diffusion models.

Failure Mode

Straight-line paths can cross when the source and target distributions are very different, leading to multi-valued velocity fields. In practice, this is handled by the neural network averaging over crossings, but it can cause blurriness. Minibatch optimal transport coupling (sorting noise-data pairs to minimize total distance) reduces crossing and improves quality.
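In 1D the optimal coupling is simply obtained by sorting, which makes the effect easy to demonstrate; in higher dimensions minibatch OT instead solves a linear assignment problem per batch. A toy sketch (distributions chosen for illustration):

```python
import numpy as np

# In 1D, pairing sorted noise with sorted data is the optimal transport
# coupling. It lowers the average squared path length versus independent
# (random) pairing, which means fewer crossing paths.
rng = np.random.default_rng(0)
n = 10_000
x0 = rng.normal(size=n)                       # noise batch, N(0, 1)
x1 = rng.normal(loc=4.0, scale=0.5, size=n)   # data batch, N(4, 0.25)

random_cost = np.mean((x1 - x0) ** 2)                     # independent coupling
ot_cost = np.mean((np.sort(x1) - np.sort(x0)) ** 2)       # sorted (OT) coupling
print(random_cost, ot_cost)                   # OT coupling is strictly cheaper
```

With shorter, less-crossing paths, the conditional velocities at a given $(t, x_t)$ disagree less, so the learned marginal field averages over less conflicting targets.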

Canonical Examples

Example

Flow matching vs. diffusion for image generation

In diffusion, you design a noise schedule $\beta_t$, derive the forward process $q(x_t \mid x_0)$, derive the reverse SDE, compute a variational bound, and train a score network. In flow matching, you define $x_t = (1-t)x_0 + tx_1$ and train $v_\theta(t, x_t)$ to predict $x_1 - x_0$. The FM training loop is simpler: sample $x_1$ from data, sample $x_0 \sim \mathcal{N}(0, I)$, sample $t \sim U[0,1]$, compute $x_t = (1-t)x_0 + tx_1$, and regress $v_\theta(t, x_t)$ onto $x_1 - x_0$.
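That entire loop can be run end to end with the simplest possible model. As a sketch (the linear model and all names here are illustrative, not from the papers), take $v_\theta(t, x) = w \cdot [x, t, 1]$ and fit $w$ by ordinary least squares, which solves the regression exactly in one step:

```python
import numpy as np

# FM training loop on a 1D Gaussian target, with a linear velocity model
# fit in closed form instead of SGD. This is a crude approximation: the
# true marginal field is affine in x with t-dependent coefficients.
rng = np.random.default_rng(0)
mu = 3.0
n = 100_000

x1 = rng.normal(loc=mu, size=n)      # sample data
x0 = rng.normal(size=n)              # sample noise
t = rng.uniform(size=n)              # sample time
xt = (1 - t) * x0 + t * x1           # interpolate
target = x1 - x0                     # regression target

features = np.stack([xt, t, np.ones(n)], axis=1)
w, *_ = np.linalg.lstsq(features, target, rcond=None)
print(w)  # coefficients of the fitted linear velocity field
```

Swapping the linear model for a neural network and `lstsq` for minibatch SGD recovers the real training procedure.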

Example

Flow matching for molecular generation

To generate 3D molecular conformations, let $x_1$ be atom coordinates. Flow matching transports Gaussian noise to valid molecular geometries. The straight-line paths respect 3D structure without requiring domain-specific noise schedules. Equivariant flow matching adds rotation and translation invariance by using SE(3)-equivariant velocity networks.

Common Confusions

Watch Out

Flow matching is not just rebranded diffusion

While diffusion and flow matching can be derived as special cases of a general framework (stochastic interpolants), they differ in practice. Diffusion uses stochastic differential equations with noise injection at each step; flow matching uses ordinary differential equations with deterministic paths. FM paths are straighter, leading to faster sampling (fewer ODE steps) and simpler training (no noise schedule).

Watch Out

You do not need optimal transport to use flow matching

The straight-line paths in CFM are inspired by optimal transport but do not require solving an OT problem. The connection to OT is exact only for Gaussians. For general distributions, CFM uses the straight-line interpolation as a heuristic that works well in practice. Minibatch OT coupling is an optional improvement, not a requirement.

Summary

  • Flow matching learns a velocity field $v_\theta(t, x)$ that transports noise $p_0$ to data $p_1$ via an ODE
  • Conditional flow matching makes training tractable: regress on the velocity $x_1 - x_0$ along straight-line paths $x_t = (1-t)x_0 + tx_1$
  • The CFM and FM objectives have identical gradients, so the simple training procedure is theoretically justified
  • Straight-line paths connect to optimal transport and yield faster sampling than diffusion (fewer ODE steps)
  • The framework is domain-agnostic: images, molecules, audio, video, and any continuous data

Exercises

ExerciseCore

Problem

Derive the conditional velocity field for the straight-line path $x_t = (1-t)x_0 + tx_1$. Verify that $x_0$ is recovered at $t = 0$ and $x_1$ at $t = 1$.

ExerciseAdvanced

Problem

Consider a Gaussian source $p_0 = \mathcal{N}(0, I)$ and Gaussian target $p_1 = \mathcal{N}(\mu, \Sigma)$. Show that the marginal distribution $p_t$ under straight-line interpolation is Gaussian and compute its mean and covariance.

ExerciseResearch

Problem

Explain the path-crossing problem in conditional flow matching. When do straight-line conditional paths cross, and how does minibatch optimal transport coupling mitigate this?

References

Canonical:

  • Lipman et al., Flow Matching for Generative Modeling (ICLR 2023)
  • Liu et al., Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow (ICLR 2023)

Current:

  • Tong et al., Improving and Generalizing Flow-Based Generative Models with Minibatch Optimal Transport (ICML 2024)

  • Esser et al., Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (2024)

  • Goodfellow, Bengio, Courville, Deep Learning (2016), Chapters 14-20

  • Zhang et al., Dive into Deep Learning (2023), Chapters 14-17

Next Topics

Flow matching connects to many active research directions:

  • Rectified flow for distillation and few-step generation
  • Equivariant flows for molecular and protein generation
  • Discrete flow matching for text and categorical data

Last reviewed: April 2026

Prerequisites

Foundations this topic depends on.