
Beyond LLMs

Flow Matching

Learn a velocity field that transports noise to data along straight-line paths. Simpler training than diffusion, faster sampling, and cleaner math.

Research · Tier 2 · Frontier · ~55 min


Why This Matters

Diffusion models generate stunning samples but come with baggage: complex noise schedules, many sampling steps, and a training objective derived through variational bounds that obscure the underlying geometry. Flow matching strips this away.

The idea is direct: learn a velocity field that pushes a simple distribution (e.g., Gaussian) to the data distribution along smooth paths. Training is a simple regression problem. Sampling is an ODE solve. The math is cleaner, the design choices are fewer, and the framework extends naturally beyond images to molecules, audio, video, and any domain where you can define a continuous path from noise to data.

Flow matching is rapidly becoming the default framework for frontier generative models.

Mental Model

Picture a cloud of Gaussian noise particles and a cloud of data points. Flow matching learns a velocity field, like a wind map, that blows each noise particle to a data point. If you design the paths to be straight lines (the optimal transport choice), each particle takes the shortest route.

Training is simple: pick a random data point $x_1$, pick a random noise point $x_0$, define a straight-line path between them, and train a neural network to predict the velocity along that path. At generation time, start from noise and follow the learned velocity field.
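This recipe can be sketched in a few lines of numpy. The 2D Gaussian "data" distribution and all variable names here are illustrative, not from any particular implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "data": samples from a 2D Gaussian shifted away from the origin.
# In a real model, x1 would be an image, a molecule, etc.
def sample_data(n):
    return rng.normal(loc=[4.0, 4.0], scale=0.5, size=(n, 2))

# One conditional-flow-matching training pair:
x1 = sample_data(1)              # data point
x0 = rng.normal(size=(1, 2))     # noise point, x0 ~ N(0, I)
t = rng.uniform()                # random time in [0, 1]

xt = (1 - t) * x0 + t * x1       # straight-line interpolation
target_velocity = x1 - x0        # what the network must predict at (t, xt)
```

A training step would feed `(t, xt)` to the network and regress its output onto `target_velocity` with a squared loss.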

Formal Setup and Notation

Let $p_0 = \mathcal{N}(0, I)$ be the source (noise) distribution and $p_1$ be the target (data) distribution. We want to construct a time-dependent probability path $p_t$ for $t \in [0, 1]$ that interpolates between $p_0$ and $p_1$.

Definition

Continuous Normalizing Flow

A continuous normalizing flow (CNF) defines a time-dependent velocity field $v_t: \mathbb{R}^d \to \mathbb{R}^d$ that generates a flow via the ODE:

$$\frac{dx_t}{dt} = v_t(x_t), \quad x_0 \sim p_0$$

The density $p_t$ evolves according to the continuity equation:

$$\frac{\partial p_t}{\partial t} + \nabla \cdot (p_t v_t) = 0$$

If the velocity field transports $p_0$ to $p_1$, then $x_1$ is a sample from the data distribution.
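Sampling is then just an ODE solve. As an illustrative check (not from the text): for a 1D Gaussian source $\mathcal{N}(0,1)$ and target $\mathcal{N}(\mu, \sigma^2)$ under straight-line paths with independent coupling, the marginal velocity field $u_t(x) = \mathbb{E}[x_1 - x_0 \mid x_t = x]$ has a closed affine form, and plain Euler integration transports noise to the target:

```python
import numpy as np

# For p0 = N(0,1), p1 = N(mu, sigma^2), independent coupling, and
# x_t = (1-t) x0 + t x1, the pair (x1 - x0, x_t) is jointly Gaussian,
# so E[x1 - x0 | x_t = x] is affine in x.
mu, sigma = 3.0, 0.5

def u_t(t, x):
    a, b = 1.0 - t, t
    var = a**2 + (b * sigma) ** 2        # Var(x_t)
    c = (b * sigma**2 - a) / var         # Cov(x1 - x0, x_t) / Var(x_t)
    return mu + c * (x - b * mu)         # E[x1 - x0 | x_t = x]

rng = np.random.default_rng(0)
x = rng.normal(size=50_000)              # particles from p0 = N(0, 1)
n_steps = 1_000
dt = 1.0 / n_steps
for i in range(n_steps):
    x = x + dt * u_t(i * dt, x)          # explicit Euler step

print(x.mean(), x.std())                 # approximately mu and sigma
```

Real models replace `u_t` with the learned network $v_\theta$ and typically use a higher-order ODE solver with far fewer steps.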

Definition

Flow Matching Objective

The flow matching (FM) objective trains a neural network $v_\theta$ to match a target velocity field $u_t$:

$$\mathcal{L}_{\text{FM}}(\theta) = \mathbb{E}_{t \sim U[0,1],\, x_t \sim p_t} \|v_\theta(t, x_t) - u_t(x_t)\|^2$$

The target $u_t$ is the velocity field that generates the desired probability path $p_t$. The problem: we cannot easily sample from $p_t$ or compute $u_t$ for arbitrary paths.

Definition

Conditional Flow Matching

Conditional flow matching (CFM) resolves the intractability by conditioning on individual data points. For a data point $x_1 \sim p_1$, define a conditional path:

$$x_t = (1 - t) x_0 + t x_1, \quad x_0 \sim p_0$$

The conditional velocity is $u_t(x_t \mid x_1) = x_1 - x_0$, which is simply the direction from noise to data. The CFM objective is:

$$\mathcal{L}_{\text{CFM}}(\theta) = \mathbb{E}_{t, x_0, x_1} \|v_\theta(t, x_t) - (x_1 - x_0)\|^2$$
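The expectation is easy to estimate by Monte Carlo. A minimal 1D sketch (toy distributions and candidate fields chosen for illustration) shows that a better velocity field scores a lower CFM loss:

```python
import numpy as np

# Monte Carlo estimate of the CFM loss for two candidate velocity fields
# on a 1D Gaussian target p1 = N(mu, 1) with source p0 = N(0, 1).
rng = np.random.default_rng(0)
mu = 3.0
n = 200_000

x1 = rng.normal(loc=mu, size=n)          # data samples
x0 = rng.normal(size=n)                  # noise samples
t = rng.uniform(size=n)
xt = (1 - t) * x0 + t * x1
target = x1 - x0                         # conditional velocity

def cfm_loss(v):
    return np.mean((v(t, xt) - target) ** 2)

loss_zero = cfm_loss(lambda t, x: np.zeros_like(x))     # predict nothing
loss_mean = cfm_loss(lambda t, x: np.full_like(x, mu))  # predict E[x1 - x0] = mu
print(loss_zero, loss_mean)              # the constant-mu field scores lower
```

Training a real model amounts to minimizing this same quantity over the parameters of $v_\theta$ by stochastic gradient descent.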

Main Theorems

Theorem

FM and CFM Objectives Have the Same Gradients

Statement

The flow matching objective $\mathcal{L}_{\text{FM}}(\theta)$ and the conditional flow matching objective $\mathcal{L}_{\text{CFM}}(\theta)$ have identical gradients with respect to $\theta$:

$$\nabla_\theta \mathcal{L}_{\text{FM}}(\theta) = \nabla_\theta \mathcal{L}_{\text{CFM}}(\theta)$$

Therefore, minimizing the tractable CFM objective is equivalent to minimizing the intractable FM objective.

Intuition

The marginal velocity field $u_t(x)$ is the conditional expectation of the conditional velocity fields over all data points that could produce $x$ at time $t$. The CFM loss averages the regression error over all conditionals. Because the network $v_\theta$ does not depend on the conditioning variable $x_1$, the gradient passes through the expectation and the two objectives agree.

Proof Sketch

Write the FM loss as $\mathbb{E}_{t, x_t} \|v_\theta - u_t\|^2$. Expand the square and note that the cross term $\mathbb{E}_{x_t}[v_\theta(t, x_t)^\top u_t(x_t)]$ equals $\mathbb{E}_{x_1}\mathbb{E}_{x_t \mid x_1}[v_\theta(t, x_t)^\top u_t(x_t \mid x_1)]$ by the law of total expectation. Since $u_t(x_t) = \mathbb{E}[u_t(x_t \mid x_1) \mid x_t]$ (the marginal velocity is the conditional expectation), the cross terms in the FM and CFM losses are equal. The remaining terms either do not depend on $\theta$ or are identical between the two losses.
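Spelled out, the expansion behind the sketch (dropping function arguments where unambiguous):

```latex
\begin{aligned}
\mathcal{L}_{\text{FM}}
  &= \mathbb{E}_{t, x_t}\!\left[\|v_\theta\|^2\right]
   - 2\,\mathbb{E}_{t, x_t}\!\left[v_\theta^{\top} u_t(x_t)\right]
   + \underbrace{\mathbb{E}_{t, x_t}\!\left[\|u_t(x_t)\|^2\right]}_{\text{independent of } \theta}, \\
\mathcal{L}_{\text{CFM}}
  &= \mathbb{E}_{t, x_0, x_1}\!\left[\|v_\theta\|^2\right]
   - 2\,\mathbb{E}_{t, x_0, x_1}\!\left[v_\theta^{\top} u_t(x_t \mid x_1)\right]
   + \underbrace{\mathbb{E}_{t, x_0, x_1}\!\left[\|u_t(x_t \mid x_1)\|^2\right]}_{\text{independent of } \theta}, \\
\mathbb{E}_{x_1}\mathbb{E}_{x_t \mid x_1}\!\left[v_\theta^{\top} u_t(x_t \mid x_1)\right]
  &= \mathbb{E}_{x_t}\!\left[v_\theta^{\top}\,\mathbb{E}\!\left[u_t(x_t \mid x_1) \mid x_t\right]\right]
   = \mathbb{E}_{x_t}\!\left[v_\theta^{\top} u_t(x_t)\right].
\end{aligned}
```

The last line shows the two cross terms coincide, so the $\theta$-dependent parts of the losses are identical and their gradients agree.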

Why It Matters

This is the key result that makes flow matching practical. Without it, you would need to sample from the marginal $p_t$, which requires knowing the data distribution. CFM only requires sampling noise-data pairs and interpolating, which is trivial.

Proposition

Optimal Transport Conditional Paths

Statement

The straight-line conditional path $x_t = (1 - t)x_0 + tx_1$ with $x_0 \sim \mathcal{N}(0, I)$ and $x_1 \sim p_1$ generates a marginal velocity field $u_t$ such that the induced probability path $p_t$ continuously interpolates between $p_0$ and $p_1$. The conditional velocity $u_t(x_t \mid x_1) = x_1 - x_0$ is constant along each path.

For Gaussian $p_0$ and $p_1$, the marginal path $p_t$ recovers the displacement interpolation from optimal transport theory.

Intuition

Each noise-data pair is connected by a straight line. The velocity is constant: you move from $x_0$ to $x_1$ at uniform speed. This is the simplest possible path and, in the Gaussian case, it is the optimal one in the sense of minimizing total transport cost.

Proof Sketch

By construction, $x_t = (1-t)x_0 + tx_1$, so $dx_t/dt = x_1 - x_0$. The marginal distribution $p_t$ is the pushforward of the joint distribution of $(x_0, x_1)$ through the map $(x_0, x_1) \mapsto (1-t)x_0 + tx_1$. For Gaussians, this pushforward is Gaussian, and the resulting path $p_t = \mathcal{N}((1-t)\mu_0 + t\mu_1, \ldots)$ matches McCann's displacement interpolation.
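The Gaussian pushforward is easy to verify numerically. In 1D with independent coupling, $x_t = (1-t)x_0 + tx_1$ is a sum of independent Gaussians, so its mean is $t\mu$ and its variance is $(1-t)^2 + t^2\sigma^2$ (toy parameters below are illustrative):

```python
import numpy as np

# Numerical check of the Gaussian marginal path: for p0 = N(0, 1) and
# p1 = N(mu, sigma^2) with independent coupling, x_t should be Gaussian
# with mean t*mu and variance (1-t)^2 + t^2 * sigma^2.
rng = np.random.default_rng(0)
mu, sigma, t = 2.0, 0.5, 0.3
n = 500_000

x0 = rng.normal(size=n)
x1 = rng.normal(loc=mu, scale=sigma, size=n)
xt = (1 - t) * x0 + t * x1

print(xt.mean(), t * mu)                          # both ~ 0.6
print(xt.var(), (1 - t) ** 2 + (t * sigma) ** 2)  # both ~ 0.5125
```

The advanced exercise below asks for the same computation in general dimension with covariance $\Sigma$.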

Why It Matters

Straight-line paths are the key innovation of conditional flow matching over earlier CNF methods. They make the velocity field as simple as possible (nearly constant in direction, varying mainly in magnitude), which is easy for a neural network to learn. This leads to fast convergence and high-quality generation with fewer sampling steps than diffusion models.

Failure Mode

Straight-line paths can cross when the source and target distributions are very different, leading to multi-valued velocity fields. In practice, this is handled by the neural network averaging over crossings, but it can cause blurriness. Minibatch optimal transport coupling (sorting noise-data pairs to minimize total distance) reduces crossing and improves quality.
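In 1D the optimal coupling is simply obtained by sorting, which makes the effect easy to demonstrate; in higher dimensions minibatch OT instead solves a linear assignment problem per batch. A toy sketch (distributions chosen for illustration):

```python
import numpy as np

# In 1D, pairing sorted noise with sorted data is the optimal transport
# coupling. It lowers the average squared path length versus independent
# (random) pairing, which means fewer crossing paths.
rng = np.random.default_rng(0)
n = 10_000
x0 = rng.normal(size=n)                       # noise batch, N(0, 1)
x1 = rng.normal(loc=4.0, scale=0.5, size=n)   # data batch, N(4, 0.25)

random_cost = np.mean((x1 - x0) ** 2)                     # independent coupling
ot_cost = np.mean((np.sort(x1) - np.sort(x0)) ** 2)       # sorted (OT) coupling
print(random_cost, ot_cost)                   # OT coupling is strictly cheaper
```

With shorter, less-crossing paths, the conditional velocities at a given $(t, x_t)$ disagree less, so the learned marginal field averages over less conflicting targets.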

Canonical Examples

Example

Flow matching vs. diffusion for image generation

In diffusion, you design a noise schedule $\beta_t$, derive the forward process $q(x_t \mid x_0)$, derive the reverse SDE, compute a variational bound, and train a score network. In flow matching, you define $x_t = (1-t)x_0 + tx_1$ and train $v_\theta(t, x_t)$ to predict $x_1 - x_0$. The FM training loop is simpler: sample $x_1$ from data, sample $x_0 \sim \mathcal{N}(0, I)$, sample $t \sim U[0,1]$, compute $x_t = (1-t)x_0 + tx_1$, and regress $v_\theta(t, x_t)$ onto $x_1 - x_0$.
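That entire loop can be run end to end with the simplest possible model. As a sketch (the linear model and all names here are illustrative, not from the papers), take $v_\theta(t, x) = w \cdot [x, t, 1]$ and fit $w$ by ordinary least squares, which solves the regression exactly in one step:

```python
import numpy as np

# FM training loop on a 1D Gaussian target, with a linear velocity model
# fit in closed form instead of SGD. This is a crude approximation: the
# true marginal field is affine in x with t-dependent coefficients.
rng = np.random.default_rng(0)
mu = 3.0
n = 100_000

x1 = rng.normal(loc=mu, size=n)      # sample data
x0 = rng.normal(size=n)              # sample noise
t = rng.uniform(size=n)              # sample time
xt = (1 - t) * x0 + t * x1           # interpolate
target = x1 - x0                     # regression target

features = np.stack([xt, t, np.ones(n)], axis=1)
w, *_ = np.linalg.lstsq(features, target, rcond=None)
print(w)  # coefficients of the fitted linear velocity field
```

Swapping the linear model for a neural network and `lstsq` for minibatch SGD recovers the real training procedure.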

Example

Flow matching for molecular generation

To generate 3D molecular conformations, let $x_1$ be atom coordinates. Flow matching transports Gaussian noise to valid molecular geometries. The straight-line paths respect 3D structure without requiring domain-specific noise schedules. Equivariant flow matching adds rotation and translation invariance by using SE(3)-equivariant velocity networks.

Common Confusions

Watch Out

Flow matching is not just rebranded diffusion

While diffusion and flow matching can be derived as special cases of a general framework (stochastic interpolants), they differ in practice. Diffusion uses stochastic differential equations with noise injection at each step; flow matching uses ordinary differential equations with deterministic paths. FM paths are straighter, leading to faster sampling (fewer ODE steps) and simpler training (no noise schedule).

Watch Out

You do not need optimal transport to use flow matching

The straight-line paths in CFM are inspired by optimal transport but do not require solving an OT problem. The connection to OT is exact only for Gaussians. For general distributions, CFM uses the straight-line interpolation as a heuristic that works well in practice. Minibatch OT coupling is an optional improvement, not a requirement.

Summary

  • Flow matching learns a velocity field $v_\theta(t, x)$ that transports noise $p_0$ to data $p_1$ via an ODE
  • Conditional flow matching makes training tractable: regress on the velocity $x_1 - x_0$ along straight-line paths $x_t = (1-t)x_0 + tx_1$
  • The CFM and FM objectives have identical gradients, so the simple training procedure is theoretically justified
  • Straight-line paths connect to optimal transport and yield faster sampling than diffusion (fewer ODE steps)
  • The framework is domain-agnostic: images, molecules, audio, video, and any continuous data

Exercises

ExerciseCore

Problem

Derive the conditional velocity field for the straight-line path $x_t = (1-t)x_0 + tx_1$. Verify that $x_0$ is recovered at $t = 0$ and $x_1$ at $t = 1$.

ExerciseAdvanced

Problem

Consider a Gaussian source $p_0 = \mathcal{N}(0, I)$ and Gaussian target $p_1 = \mathcal{N}(\mu, \Sigma)$. Show that the marginal distribution $p_t$ under straight-line interpolation is Gaussian and compute its mean and covariance.

ExerciseResearch

Problem

Explain the path-crossing problem in conditional flow matching. When do straight-line conditional paths cross, and how does minibatch optimal transport coupling mitigate this?

References

Canonical:

  • Lipman et al., Flow Matching for Generative Modeling (ICLR 2023)
  • Liu et al., Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow (ICLR 2023)

Current:

  • Tong et al., Improving and Generalizing Flow-Based Generative Models with Minibatch Optimal Transport (ICML 2024)

  • Esser et al., Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (2024)

  • Goodfellow, Bengio, Courville, Deep Learning (2016), Chapters 14-20

  • Zhang et al., Dive into Deep Learning (2023), Chapters 14-17

Next Topics

Flow matching connects to many active research directions:

  • Rectified flow for distillation and few-step generation
  • Equivariant flows for molecular and protein generation
  • Discrete flow matching for text and categorical data

Last reviewed: April 2026

Prerequisites

Foundations this topic depends on.