Normalizing Flows
Generative models that transform a simple base distribution through invertible mappings, enabling exact log-likelihood computation via the change of variables formula.
Why This Matters
Normalizing flows are the only deep generative model family that provides exact, tractable log-likelihoods without variational approximations or adversarial training. This makes them theoretically clean: you know exactly what you are optimizing. They also provide exact sampling and exact density evaluation in one model. However, flows have largely been displaced by diffusion models for image generation because of architectural constraints imposed by invertibility.
Understanding flows is still valuable: they clarify what you gain and lose by requiring invertibility, and the change-of-variables formula underlies many other methods including continuous normalizing flows and flow matching.
The Core Idea
Start with a simple base distribution $p_Z(z)$, typically a standard Gaussian. Apply a sequence of invertible, differentiable transformations $f_1, \dots, f_K$ to get $x = f_K \circ \cdots \circ f_1(z)$. The density of $x$ is determined by the change of variables formula.
Normalizing Flow
A normalizing flow is a sequence of invertible transformations mapping a base distribution $p_Z$ to a target distribution $p_X$. "Normalizing" refers to the change of variables that ensures the transformed density integrates to 1. "Flow" refers to the successive transformations that warp the density.
The Change of Variables Formula
Change of Variables for Normalizing Flows
Statement
If $x = f(z)$ where $f$ is a diffeomorphism and $z \sim p_Z$, then:

$$\log p_X(x) = \log p_Z(f^{-1}(x)) + \log \left| \det \frac{\partial f^{-1}(x)}{\partial x} \right|$$

For a composition $f = f_K \circ \cdots \circ f_1$:

$$\log p_X(x) = \log p_Z(z_0) - \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k(z_{k-1})}{\partial z_{k-1}} \right|$$

where $z_0 = z$, $z_k = f_k(z_{k-1})$, and $z_K = x$.
Intuition
The Jacobian determinant $\left|\det \frac{\partial f}{\partial z}\right|$ measures how much $f$ locally stretches or compresses volume. If $f$ expands a region by a factor of 10, the density in that region must decrease by a factor of 10 to keep the total probability at 1. The log-determinant accounts for this volume change.
Proof Sketch
Start from the requirement that probability mass is conserved: $p_X(x)\,|dx| = p_Z(z)\,|dz|$. Substitute $z = f^{-1}(x)$, so $|dz| = \left|\det \frac{\partial f^{-1}}{\partial x}\right| |dx|$. Then $p_X(x)\,|dx| = p_Z(f^{-1}(x)) \left|\det \frac{\partial f^{-1}}{\partial x}\right| |dx|$, giving $p_X(x) = p_Z(f^{-1}(x)) \left|\det \frac{\partial f^{-1}}{\partial x}\right|$. Take logs.
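As a concrete instance of this argument (a standard worked example, assuming a standard Gaussian base), take the 1D flow $f(z) = e^z$; the change of variables recovers the log-normal density:

$$x = e^z \;\implies\; z = \log x, \qquad \left|\frac{dz}{dx}\right| = \frac{1}{x}$$

$$p_X(x) = p_Z(\log x) \cdot \frac{1}{x} = \frac{1}{x\sqrt{2\pi}} \exp\!\left(-\frac{(\log x)^2}{2}\right), \qquad x > 0$$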
Why It Matters
This is the entire basis of normalizing flows. Unlike VAEs (which optimize a lower bound) or GANs (which use adversarial training), flows optimize the exact log-likelihood directly. No approximation, no mode collapse, no posterior gap. The cost is that you must design $f$ so that both $f^{-1}$ and $\log\left|\det \frac{\partial f}{\partial z}\right|$ are tractable to compute.
Failure Mode
Computing $\det J$ for a general $D \times D$ Jacobian costs $O(D^3)$. For high-dimensional data (images with $D$ on the order of $10^5$ or more), this is prohibitive unless the Jacobian has special structure (triangular, block-diagonal, etc.). This architectural constraint is the central limitation of flows.
Architectural Solutions
Coupling Layers (RealNVP)
Coupling Layer Jacobian is Triangular
Statement
For a coupling layer that splits $x = (x_{1:d}, x_{d+1:D})$ and computes:

$$y_{1:d} = x_{1:d}, \qquad y_{d+1:D} = x_{d+1:D} \odot \exp\!\big(s(x_{1:d})\big) + t(x_{1:d})$$

where $s$ and $t$ are arbitrary neural networks, the Jacobian is lower triangular with determinant:

$$\det \frac{\partial y}{\partial x} = \prod_{j=d+1}^{D} \exp\!\big(s(x_{1:d})\big)_j = \exp\!\Big(\sum_{j} s(x_{1:d})_j\Big)$$

This costs $O(D)$ to compute, not $O(D^3)$.
Intuition
Since $y_{1:d} = x_{1:d}$ (identity), the top-left block of the Jacobian is $I$ and the top-right block is $0$: the first $d$ outputs do not depend on $x_{d+1:D}$. Since $y_{d+1:D}$ depends on $x_{d+1:D}$ only through an elementwise scale, the bottom-right block is diagonal. This block structure makes the Jacobian lower triangular, and the determinant of a triangular matrix is the product of its diagonal entries.
Proof Sketch
Write the full Jacobian in block form:

$$\frac{\partial y}{\partial x} = \begin{pmatrix} I & 0 \\ \dfrac{\partial y_{d+1:D}}{\partial x_{1:d}} & \mathrm{diag}\!\big(\exp(s(x_{1:d}))\big) \end{pmatrix}$$

The determinant of a block-triangular matrix is the product of the determinants of the diagonal blocks: $\det I \cdot \det \mathrm{diag}\!\big(\exp(s(x_{1:d}))\big) = \prod_j \exp\!\big(s(x_{1:d})\big)_j$.
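The triangular structure is easy to verify numerically. The sketch below (NumPy; the tiny `s` and `t` functions are illustrative stand-ins for neural networks, not from any library) compares the $O(D)$ log-determinant $\sum_j s(x_{1:d})_j$ against a brute-force Jacobian built by finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 6, 3
x = rng.normal(size=D)

# Stand-in "networks": any functions of x[:d] work for this check.
def s(x1):
    return np.tanh(x1)            # hypothetical scale network

def t(x1):
    return 0.5 * x1               # hypothetical translation network

def coupling_forward(x):
    x1, x2 = x[:d], x[d:]
    y = np.concatenate([x1, x2 * np.exp(s(x1)) + t(x1)])
    log_det = np.sum(s(x1))       # O(D): just a sum over the scale outputs
    return y, log_det

y, log_det = coupling_forward(x)

# Brute-force check: build the full D x D Jacobian by central differences
# and take its log-determinant the O(D^3) way.
eps = 1e-6
J = np.zeros((D, D))
for j in range(D):
    e = np.zeros(D)
    e[j] = eps
    J[:, j] = (coupling_forward(x + e)[0] - coupling_forward(x - e)[0]) / (2 * eps)
sign, log_det_full = np.linalg.slogdet(J)
print(np.allclose(log_det, log_det_full, atol=1e-5))  # True
```

The agreement holds regardless of how complicated `s` and `t` are, which is exactly the point of the coupling construction.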
Why It Matters
This is the key architectural trick that makes flows practical. The networks $s$ and $t$ can be arbitrarily complex (deep ResNets, attention layers) without affecting the cost of the log-determinant computation. Expressiveness comes from stacking many coupling layers with alternating partitions.
Failure Mode
A single coupling layer leaves half the dimensions unchanged. You need to alternate which dimensions are "active" across layers. With poor alternation patterns, some dimensions may never interact, limiting expressiveness.
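A minimal sketch of the alternation (NumPy; the random affine maps are illustrative stand-ins for the $s$ and $t$ networks): flipping the partition between layers ensures every dimension has been transformed after two layers:

```python
import numpy as np

rng = np.random.default_rng(1)
D, d = 4, 2
# Random weights standing in for per-layer scale/translation networks.
params = [(rng.normal(size=(d, d)) * 0.3, rng.normal(size=(d, d)) * 0.3)
          for _ in range(2)]

def coupling(x, Ws, Wt, flip):
    """Affine coupling layer; `flip` swaps which half passes through unchanged."""
    if flip:
        x = x[::-1]
    x1, x2 = x[:d], x[d:]
    y2 = x2 * np.exp(np.tanh(Ws @ x1)) + Wt @ x1
    y = np.concatenate([x1, y2])
    return y[::-1] if flip else y

x = rng.normal(size=D)
h = coupling(x, *params[0], flip=False)  # transforms dims d..D-1 only
y = coupling(h, *params[1], flip=True)   # transforms dims 0..d-1 only
print(np.allclose(h[:d], x[:d]), np.allclose(y[:d], x[:d]))  # True False
```

After the first layer the first half is untouched; after the flipped second layer it has changed, so no dimension survives the pair unmodified.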
Autoregressive Flows
Autoregressive flows (MAF, IAF) use the autoregressive property: dimension $x_i$ depends only on $x_{1:i-1}$. The Jacobian is triangular by construction.
MAF (Masked Autoregressive Flow): fast density evaluation (parallel), slow sampling (sequential, one dimension at a time).
IAF (Inverse Autoregressive Flow): fast sampling (parallel), slow density evaluation. The inverse of MAF.
The tradeoff between MAF and IAF is a direct consequence of the asymmetry between forward and inverse passes in autoregressive models.
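A toy sketch of that asymmetry (NumPy; the strictly lower-triangular linear maps are illustrative stand-ins for MAF's masked autoregressive networks): the density pass computes all conditioner outputs from $x$ in one call, while the sampling pass must fill dimensions one at a time:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 5
# Strictly lower-triangular weights: output i depends only on inputs 0..i-1,
# standing in for the masked autoregressive networks (illustrative only).
W_mu = np.tril(rng.normal(size=(D, D)), k=-1)
W_alpha = np.tril(rng.normal(size=(D, D)), k=-1) * 0.1

def conditioner(x):
    """Autoregressive conditioner: mu_i and alpha_i use only x_{1:i-1}."""
    return W_mu @ x, W_alpha @ x

def density_pass(x):
    """MAF density direction (x -> u): one parallel conditioner call."""
    mu, alpha = conditioner(x)
    return (x - mu) * np.exp(-alpha)

def sample_pass(u):
    """MAF sampling direction (u -> x): sequential, one dimension at a time."""
    x = np.zeros(D)
    for i in range(D):
        mu, alpha = conditioner(x)  # only the already-filled x_{1:i-1} matter
        x[i] = u[i] * np.exp(alpha[i]) + mu[i]
    return x

u = rng.normal(size=D)
x = sample_pass(u)
print(np.allclose(density_pass(x), u))  # True: the passes are exact inverses
```

IAF is the same construction with the roles of the two passes swapped, which is why its fast and slow directions are reversed.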
Why Flows Lost to Diffusion
Flows require exact invertibility, which constrains architecture: input and output must have the same dimensionality, and every layer must be invertible. This prevents using standard architectures (U-Nets, standard ResNets). Diffusion models avoid this by learning a denoising process that does not require invertibility, allowing more expressive architectures. The result: diffusion models achieve better sample quality on images with simpler training procedures.
Flows remain useful for density estimation, variational inference (as flexible posterior approximations), and physics simulations where exact likelihood matters.
Common Confusions
Flows are not just fancy coordinate transforms
While each layer is a coordinate transformation, the composition of many layers with learned parameters can represent highly complex distributions. The universal approximation results for flows (Huang et al., 2018) show that sufficiently deep flows can approximate any target density.
The base distribution choice matters less than you think
A standard Gaussian base is used in nearly all flow models. The flow layers are expressive enough to warp any unimodal base into a complex multimodal target. Using a more complex base distribution rarely helps in practice.
Key Takeaways
- Flows compute exact log-likelihoods via the change of variables formula
- The computational bottleneck is the Jacobian determinant, which costs $O(D^3)$ in general but $O(D)$ with coupling or autoregressive structure
- Coupling layers (RealNVP) let $s$ and $t$ be arbitrary networks while keeping the determinant tractable
- MAF is fast for density evaluation; IAF is fast for sampling
- Diffusion models displaced flows for image generation because invertibility constrains architecture, but flows remain valuable where exact likelihood is needed
Exercises
Problem
Write the change of variables formula for a 1D normalizing flow $x = f(z)$ with $f$ invertible and differentiable. What is $\log p_X(x)$ in terms of $p_Z$, $f^{-1}$, and the derivative of $f$?
Problem
A coupling layer splits $x \in \mathbb{R}^D$ into $x_{1:d}$ and $x_{d+1:D}$. Given the scale network output $s(x_{1:d})$ and the translation network output $t(x_{1:d})$, write the output $y$ and the log-determinant of the Jacobian in terms of $s$ and $t$.
References
Canonical:
- Dinh et al., "Density estimation using Real NVP" (2017), Section 3
- Rezende & Mohamed, "Variational Inference with Normalizing Flows" (2015), Section 3
Current:
- Papamakarios et al., "Normalizing Flows for Probabilistic Modeling and Inference" (2021), Chapters 3-4
- Kobyzev et al., "Normalizing Flows: An Introduction and Review" (2020)
- Bishop, Pattern Recognition and Machine Learning (2006), Chapters 1-14
Next Topics
- Diffusion models: the generative paradigm that displaced flows
- Energy-based models: density modeling without normalization
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- The Jacobian Matrix (Layer 0A)