Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.


Integration and Change of Variables

Riemann integration, improper integrals, the substitution rule, multivariate change of variables via the Jacobian determinant, and Fubini's theorem. The computational backbone of probability and ML.

Core · Tier 2 · Stable · ~40 min

Why This Matters

Integration is the computational engine of probability and statistics. Every expectation, every marginal distribution, every normalizing constant, and every Bayesian posterior requires evaluating an integral. The change-of-variables formula is what allows you to transform distributions (e.g., pushing a Gaussian through a smooth invertible map to obtain a new distribution). Fubini's theorem is what lets you compute multivariate integrals by iterating single-variable integrals.

Riemann Integral Review

Definition

Riemann Integral

For a bounded function $f: [a, b] \to \mathbb{R}$, the Riemann integral $\int_a^b f(x)\,dx$ is defined as the limit of Riemann sums:

$$\int_a^b f(x)\,dx = \lim_{\|P\| \to 0} \sum_{i=1}^n f(x_i^*)\,\Delta x_i$$

where $P = \{x_0, x_1, \ldots, x_n\}$ is a partition of $[a, b]$ with $a = x_0 < x_1 < \cdots < x_n = b$, $x_i^*$ is a sample point in $[x_{i-1}, x_i]$, $\Delta x_i = x_i - x_{i-1}$, and $\|P\| = \max_i \Delta x_i$ is the mesh of the partition. The limit exists (and is independent of the choice of partitions and sample points) when $f$ is Riemann integrable. Every continuous function on $[a, b]$ is Riemann integrable.
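A minimal numerical sketch of the definition, assuming the test function $f(x) = x^2$ on $[0, 1]$ (exact integral $1/3$) and uniform partitions. Left-endpoint and midpoint sample points give different sums on a coarse partition, but both approach the same limit as the mesh shrinks, as the definition requires. The helper `riemann_sum` is illustrative only, not a library function.

```python
def riemann_sum(f, a, b, n, rule="midpoint"):
    """Riemann sum of f over [a, b] on a uniform partition into n subintervals."""
    dx = (b - a) / n  # every Delta x_i equals dx, so the mesh is dx
    total = 0.0
    for i in range(n):
        left = a + i * dx  # left endpoint x_{i-1} of the i-th subinterval
        if rule == "left":
            x_star = left
        elif rule == "midpoint":
            x_star = left + dx / 2
        else:
            raise ValueError(f"unknown rule: {rule}")
        total += f(x_star) * dx
    return total


def f(x):
    return x * x  # exact integral over [0, 1] is 1/3


for n in (10, 100, 1000):
    print(n, riemann_sum(f, 0.0, 1.0, n, "left"), riemann_sum(f, 0.0, 1.0, n, "midpoint"))
```

The midpoint sums converge noticeably faster here, but for a Riemann integrable function any choice of sample points reaches the same limit.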

Improper Integrals

Definition

Improper Integral

When the domain is unbounded or the integrand is unbounded, define the integral as a limit:

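$$\int_a^\infty f(x)\,dx = \lim_{R \to \infty} \int_a^R f(x)\,dx, \qquad \int_a^b f(x)\,dx = \lim_{\epsilon \to 0^+} \int_{a+\epsilon}^b f(x)\,dx \quad \text{(if } f \text{ is unbounded near } a\text{)}$$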

The integral converges if this limit exists and is finite. Example: the Gaussian normalizing constant $\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}$ is an improper integral that converges.
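To see the improper integral as a limit of proper ones, the sketch below integrates $e^{-x^2/2}$ over growing intervals $[-R, R]$ with a midpoint rule (a simplifying assumption; any accurate quadrature would do) and compares the result with $\sqrt{2\pi}$. Both tails are truncated symmetrically, which is harmless here because the integral converges absolutely.

```python
import math


def midpoint_integral(f, a, b, n=100_000):
    """Midpoint-rule approximation of the proper integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def gauss(x):
    return math.exp(-x * x / 2)


target = math.sqrt(2 * math.pi)
for R in (1.0, 2.0, 4.0, 8.0):
    approx = midpoint_integral(gauss, -R, R)  # proper integral over [-R, R]
    print(f"R={R}: {approx:.10f}  (sqrt(2*pi) = {target:.10f})")
```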

Improper integrals arise constantly in ML: the normalization of probability density functions, expectations over unbounded domains, and integrals involving heavy-tailed distributions.

Change of Variables (One Dimension)

Definition

Substitution Rule

If $g: [a, b] \to \mathbb{R}$ is $C^1$ and $f$ is continuous on an interval containing $g([a, b])$, then:

$$\int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du$$

This is the substitution $u = g(x)$, $du = g'(x)\,dx$.
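A quick numerical check of the substitution rule, assuming $f = \cos$ and $g(x) = x^2$ on $[0, 1]$, so both sides should equal $\sin 1$. The second check uses $g(x) = \sin x$ on $[0, \pi]$, which is not injective; the one-dimensional rule as stated only needs $g$ to be $C^1$, and since $g(0) = g(\pi) = 0$ the left side must vanish.

```python
import math


def midpoint_integral(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


# f(u) = cos(u), g(x) = x^2, g'(x) = 2x on [a, b] = [0, 1].
lhs = midpoint_integral(lambda x: math.cos(x * x) * 2 * x, 0.0, 1.0)
rhs = midpoint_integral(math.cos, 0.0, 1.0)  # integral over [g(0), g(1)] = [0, 1]
print(lhs, rhs, math.sin(1.0))

# g(x) = sin(x) on [0, pi] is not injective, yet the rule still applies:
# g(0) = g(pi) = 0, so the integral of exp(sin x) * cos x over [0, pi] is 0.
lhs_non_injective = midpoint_integral(lambda x: math.exp(math.sin(x)) * math.cos(x), 0.0, math.pi)
print(lhs_non_injective)
```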

Multivariate Change of Variables

Theorem

Change of Variables Formula

Statement

Let $\phi: U \to V$ be a $C^1$ diffeomorphism between open subsets $U, V \subseteq \mathbb{R}^n$. For any integrable function $f: V \to \mathbb{R}$:

$$\int_V f(y)\,dy = \int_U f(\phi(x))\,|\det D\phi(x)|\,dx$$

where $D\phi(x)$ is the $n \times n$ Jacobian matrix of $\phi$ at $x$, and $|\det D\phi(x)|$ is the absolute value of its determinant.

Intuition

The Jacobian determinant measures how $\phi$ stretches or compresses volume. A small cube of volume $dV$ at $x$ maps to a region of approximate volume $|\det D\phi(x)|\,dV$ at $\phi(x)$. The formula says: to integrate over the image, integrate over the preimage and multiply by this volume scaling factor.
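The volume-scaling picture can be checked directly for the polar map $\phi(r, \theta) = (r\cos\theta, r\sin\theta)$ used in the polar coordinates example of this section. The sketch maps the corners of a small square of side $h$ in $(r, \theta)$ space, measures the area of the resulting quadrilateral with the shoelace formula, and divides by $h^2$; it also estimates $\det D\phi$ by central finite differences. Both should be close to $r$. The point $(r, \theta)$ and the step sizes are arbitrary choices.

```python
import math


def phi(r, theta):
    """Polar-to-Cartesian map phi(r, theta) = (r cos theta, r sin theta)."""
    return (r * math.cos(theta), r * math.sin(theta))


def polygon_area(points):
    """Shoelace formula for the area of a simple polygon listed in order."""
    s = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2


def jacobian_det_fd(f, u, v, eps=1e-6):
    """Determinant of the 2x2 Jacobian of f at (u, v) via central differences."""
    fu_plus, fu_minus = f(u + eps, v), f(u - eps, v)
    fv_plus, fv_minus = f(u, v + eps), f(u, v - eps)
    a = (fu_plus[0] - fu_minus[0]) / (2 * eps)  # d f_1 / du
    c = (fu_plus[1] - fu_minus[1]) / (2 * eps)  # d f_2 / du
    b = (fv_plus[0] - fv_minus[0]) / (2 * eps)  # d f_1 / dv
    d = (fv_plus[1] - fv_minus[1]) / (2 * eps)  # d f_2 / dv
    return a * d - b * c


r, theta, h = 2.0, 0.7, 1e-3
corners = [phi(r, theta), phi(r + h, theta), phi(r + h, theta + h), phi(r, theta + h)]
ratio = polygon_area(corners) / (h * h)  # image area / area of the small square
print(ratio, jacobian_det_fd(phi, r, theta), r)  # all approximately 2.0
```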

Proof Sketch

For a linear map $\phi(x) = Ax$, the result follows from the definition of the determinant as the volume scaling factor. For nonlinear $\phi$, approximate it locally by its linearization $D\phi(x)$ on small cubes, apply the linear result, and sum. The rigorous proof uses Lebesgue measure and approximation by simple functions.

Why It Matters

This formula is used everywhere in ML and statistics:

  1. Probability: if $X$ has density $p_X$ and $Y = \phi(X)$, then $p_Y(y) = p_X(\phi^{-1}(y)) \cdot |\det D\phi^{-1}(y)|$
  2. Normalizing flows: the log-likelihood involves $\log |\det D\phi(x)|$, and flow architectures are designed to make this determinant cheap to compute
  3. Bayesian inference: computing posteriors requires integrating over parameter spaces, often after a change of variables
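A one-dimensional instance of the density formula in item 1, assuming $X \sim \mathcal{N}(0, 1)$ and the illustrative map $\phi(x) = \sinh x$, whose inverse $\operatorname{asinh}$ has derivative $1/\sqrt{1 + y^2}$. Integrating the transformed density over $(0, 1)$ should reproduce $P(0 < Y < 1) = P(0 < X < \operatorname{asinh} 1)$, which the sketch computes independently from the normal CDF via `math.erf`.

```python
import math


def std_normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def p_y(y):
    """Density of Y = sinh(X), X ~ N(0, 1): p_X(phi^{-1}(y)) * |d phi^{-1} / dy|."""
    x = math.asinh(y)                   # phi^{-1}(y)
    jac = 1.0 / math.sqrt(1.0 + y * y)  # derivative of asinh at y
    return std_normal_pdf(x) * abs(jac)


# P(0 < Y < 1) two ways: integrate the transformed density, or map the event back to X.
n = 10_000
dy = 1.0 / n
prob_from_density = sum(p_y((i + 0.5) * dy) for i in range(n)) * dy
prob_from_x = std_normal_cdf(math.asinh(1.0)) - std_normal_cdf(0.0)
print(prob_from_density, prob_from_x)
```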

Failure Mode

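The formula requires $\phi$ to be a diffeomorphism (smooth with smooth inverse). If $\phi$ is not injective, you must partition the domain into regions where it is injective and sum the contributions. If $\det D\phi = 0$ at some points (critical points), $\phi$ is not a diffeomorphism there, but for an injective $C^1$ map the formula still holds: the factor $|\det D\phi|$ vanishes at those points, and by Sard's theorem their image has measure zero, so they contribute nothing to either side.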
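For a non-injective map, the contributions of all injective pieces add up. A sketch, assuming $X \sim \mathcal{N}(0, 1)$ and $Y = X^2$: each $y > 0$ has two preimages $\pm\sqrt{y}$, each contributing $p_X(\pm\sqrt{y}) \cdot \frac{1}{2\sqrt{y}}$. Keeping only one branch loses half the mass, which the check on $P(1 < Y < 4) = P(1 < |X| < 2)$ makes visible. The interval $[1, 4]$ is chosen to stay away from the integrable singularity of $p_Y$ at $0$.

```python
import math


def std_normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def p_y(y):
    """Density of Y = X^2, X ~ N(0, 1): sum over both preimages +sqrt(y) and -sqrt(y)."""
    if y <= 0:
        return 0.0
    s = math.sqrt(y)
    jac = 1.0 / (2.0 * s)  # |d sqrt(y) / dy|, the same for both branches
    return (std_normal_pdf(s) + std_normal_pdf(-s)) * jac


def p_y_one_branch(y):
    """Wrong on purpose: keeps only the x = +sqrt(y) branch."""
    s = math.sqrt(y)
    return std_normal_pdf(s) / (2.0 * s)


def midpoint(f, a, b, n=10_000):
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


exact = 2 * (std_normal_cdf(2.0) - std_normal_cdf(1.0))  # P(1 < |X| < 2) = P(1 < Y < 4)
print(midpoint(p_y, 1.0, 4.0), midpoint(p_y_one_branch, 1.0, 4.0), exact)
```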

Example

Polar coordinates

The transformation $\phi(r, \theta) = (r\cos\theta, r\sin\theta)$ maps the open set $(0, \infty) \times (0, 2\pi)$ diffeomorphically onto $\mathbb{R}^2$ minus the ray $\{(x, 0) : x \ge 0\}$. That ray has measure zero, so removing it does not change any integral. The Jacobian:

$$D\phi = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}, \quad |\det D\phi| = r$$

So $\int_{\mathbb{R}^2} f(x, y)\,dx\,dy = \int_0^{2\pi} \int_0^\infty f(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta$.

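This is how you compute the Gaussian integral: $\int_{\mathbb{R}^2} e^{-(x^2+y^2)/2}\,dx\,dy = \int_0^{2\pi} \int_0^\infty e^{-r^2/2}\,r\,dr\,d\theta = 2\pi$, since $\int_0^\infty e^{-r^2/2}\,r\,dr = 1$. By Fubini's theorem the left side equals $\left(\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right)^2$, which gives $\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}$.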
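Both routes to the Gaussian integral can be compared numerically. The sketch assumes a truncation radius $R = 10$ and a midpoint rule; because the integrand does not depend on $\theta$, the angular integral contributes exactly $2\pi$ and only the radial integral needs quadrature.

```python
import math


def midpoint(f, a, b, n=20_000):
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


R = 10.0  # truncation; the neglected radial tail equals exp(-R^2 / 2), about 2e-22
# Polar form: integrate e^{-r^2/2} * r over r (the Jacobian factor r is included), times 2*pi.
radial = midpoint(lambda r: math.exp(-r * r / 2) * r, 0.0, R)
polar = 2 * math.pi * radial
# Cartesian form via Fubini: the double integral factors into the square of a 1D integral.
one_d = midpoint(lambda x: math.exp(-x * x / 2), -R, R)
print(polar, one_d ** 2, 2 * math.pi)
```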

Fubini's Theorem

Theorem

Fubini's Theorem

Statement

If $f: X \times Y \to \mathbb{R}$ is integrable (i.e., $\int_{X \times Y} |f(x, y)|\,d(x, y) < \infty$), where $X$ and $Y$ are σ-finite measure spaces such as subsets of Euclidean space with Lebesgue measure, then:

$$\int_{X \times Y} f(x, y)\,d(x, y) = \int_X \left(\int_Y f(x, y)\,dy\right) dx = \int_Y \left(\int_X f(x, y)\,dx\right) dy$$

The order of integration can be swapped.

Intuition

If the total integral is finite, you can compute a double integral by integrating one variable at a time, in either order. This is what makes multivariate integration tractable: you reduce it to a sequence of one-dimensional integrals.
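A sketch of iterated integration in both orders, assuming the illustrative integrand $f(x, y) = e^{-xy}$ on $[0, 1] \times [0, 2]$ (bounded and non-negative, so both Fubini and Tonelli apply). Each order is a 1D integral of a 1D integral. Because both orders here use the same midpoint grid, they evaluate the same finite double sum and agree up to rounding; Fubini is what guarantees the two iterated integrals still agree in the limit of finer grids.

```python
import math


def midpoint(f, a, b, n=400):
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def f(x, y):
    return math.exp(-x * y)  # bounded and non-negative on the rectangle, hence integrable


# Rectangle X x Y = [0, 1] x [0, 2].
dy_first = midpoint(lambda x: midpoint(lambda y: f(x, y), 0.0, 2.0), 0.0, 1.0)  # inner over y
dx_first = midpoint(lambda y: midpoint(lambda x: f(x, y), 0.0, 1.0), 0.0, 2.0)  # inner over x
print(dy_first, dx_first)
```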

Proof Sketch

The proof uses the monotone convergence theorem and the construction of product measures. For non-negative functions, the result follows from Tonelli's theorem (which does not require integrability, only measurability and non-negativity). Fubini extends this to signed functions by decomposing into positive and negative parts.

Why It Matters

Fubini's theorem is the justification for: (1) computing marginal distributions by integrating out variables, (2) switching the order of expectation and summation, (3) computing normalizing constants by iterated integration, and (4) the tower property of conditional expectation.

Failure Mode

The integrability condition is necessary. If $\int |f(x, y)|\,d(x, y) = \infty$, the iterated integrals may exist but give different values depending on the order of integration. The classic counterexample is $f(x, y) = (x^2 - y^2)/(x^2 + y^2)^2$ on $[0, 1]^2$: integrating over $y$ first gives $\int_0^1 \frac{dx}{1 + x^2} = \pi/4$, while integrating over $x$ first gives $-\pi/4$.
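The counterexample can be reproduced numerically. The sketch below computes each inner integral with a small adaptive Simpson routine (written out so the example needs no third-party packages) and the outer integral with a midpoint rule, which never evaluates at the singular corner $(0, 0)$. The two orders should come out near $+\pi/4$ and $-\pi/4$. Tolerances and grid size are arbitrary choices.

```python
import math


def f(x, y):
    """The classic counterexample; |f| is not integrable on [0, 1]^2."""
    return (x * x - y * y) / (x * x + y * y) ** 2


def adaptive_simpson(g, a, b, tol=1e-8, max_depth=50):
    """Adaptive Simpson quadrature of a 1D function g over [a, b]."""
    def simpson(lo, hi, f_lo, f_mid, f_hi):
        return (hi - lo) / 6 * (f_lo + 4 * f_mid + f_hi)

    def refine(lo, hi, f_lo, f_mid, f_hi, whole, tol, depth):
        mid = (lo + hi) / 2
        f_lm, f_rm = g((lo + mid) / 2), g((mid + hi) / 2)
        left = simpson(lo, mid, f_lo, f_lm, f_mid)
        right = simpson(mid, hi, f_mid, f_rm, f_hi)
        delta = left + right - whole
        if depth <= 0 or abs(delta) <= 15 * tol:
            return left + right + delta / 15  # Richardson-style correction
        return (refine(lo, mid, f_lo, f_lm, f_mid, left, tol / 2, depth - 1)
                + refine(mid, hi, f_mid, f_rm, f_hi, right, tol / 2, depth - 1))

    f_a, f_m, f_b = g(a), g((a + b) / 2), g(b)
    return refine(a, b, f_a, f_m, f_b, simpson(a, b, f_a, f_m, f_b), tol, max_depth)


def midpoint(h, a, b, n=200):
    """Outer integral; midpoints never hit the singular value 0."""
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx


# Over y first, then x: each inner integral equals 1 / (1 + x^2).
dy_then_dx = midpoint(lambda x: adaptive_simpson(lambda y: f(x, y), 0.0, 1.0), 0.0, 1.0)
# Over x first, then y: each inner integral equals -1 / (1 + y^2).
dx_then_dy = midpoint(lambda y: adaptive_simpson(lambda x: f(x, y), 0.0, 1.0), 0.0, 1.0)
print(dy_then_dx, dx_then_dy, math.pi / 4)
```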

Applications in ML

Marginalizing Distributions

Given a joint density $p(x, y)$, the marginal density of $x$ is:

$$p(x) = \int p(x, y)\,dy$$

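This uses Fubini (in its Tonelli form, since densities are non-negative) to integrate out $y$ one variable at a time; in particular $\int p(x)\,dx = \iint p(x, y)\,dy\,dx = 1$, so the marginal is itself a valid density.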
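A sketch of marginalization by numerical integration, assuming a standard bivariate normal with correlation $\rho = 0.6$, whose $x$-marginal is known to be $\mathcal{N}(0, 1)$. The integral over $y$ is truncated to $[-12, 12]$ and approximated with a midpoint rule; both choices are assumptions that are safe for this light-tailed density.

```python
import math


def joint_pdf(x, y, rho=0.6):
    """Standard bivariate normal density with correlation rho."""
    det = 1 - rho * rho
    q = (x * x - 2 * rho * x * y + y * y) / det
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(det))


def marginal_x(x, lo=-12.0, hi=12.0, n=20_000):
    """p(x) = integral of p(x, y) dy, truncated to [lo, hi], via the midpoint rule."""
    dy = (hi - lo) / n
    return sum(joint_pdf(x, lo + (i + 0.5) * dy) for i in range(n)) * dy


for x in (0.0, 1.0, 2.5):
    exact = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)  # known N(0, 1) marginal
    print(x, marginal_x(x), exact)
```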

Computing Normalizing Constants

Given an unnormalized density $\tilde{p}$, the normalized density is $p(x) = \frac{1}{Z}\tilde{p}(x)$ with $Z = \int \tilde{p}(x)\,dx$. In Bayesian inference, computing $Z$ (the evidence) often requires a change of variables to make the integral tractable.
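A small illustration of how a change of variables can make a normalizing constant easier to compute, assuming the unnormalized Gamma-type density $\tilde{p}(\sigma) = \sigma^{a-1} e^{-\sigma}$ on $(0, \infty)$ with $a = 1/2$, so that $Z = \Gamma(1/2) = \sqrt{\pi}$. Directly, the integrand blows up at $\sigma = 0$ and a midpoint rule converges slowly. With $\sigma = e^t$, $d\sigma = e^t\,dt$, the integral becomes $\int_{\mathbb{R}} e^{a t - e^t}\,dt$, which is smooth and decays quickly in both directions. The truncation limits and grid sizes are arbitrary choices.

```python
import math

a = 0.5  # shape parameter; the exact normalizing constant is Gamma(a) = sqrt(pi)


def p_tilde(sigma):
    """Unnormalized density sigma^(a-1) exp(-sigma) on (0, inf); singular at 0 when a < 1."""
    return sigma ** (a - 1) * math.exp(-sigma)


def midpoint(f, lo, hi, n):
    dx = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * dx) for i in range(n)) * dx


# Direct attempt on a truncated interval: the 1/sqrt(sigma) spike at 0 slows convergence.
Z_direct = midpoint(p_tilde, 0.0, 50.0, 20_000)
# Change of variables sigma = exp(t), d sigma = exp(t) dt: the integrand is smooth on R.
Z_log = midpoint(lambda t: p_tilde(math.exp(t)) * math.exp(t), -60.0, 5.0, 20_000)
print(Z_direct, Z_log, math.gamma(a))
```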

Normalizing Flows

A normalizing flow transforms a simple base distribution $p_Z(z)$ through a diffeomorphism $f$ to get $p_X(x) = p_Z(f^{-1}(x)) \cdot |\det Df^{-1}(x)|$. The change-of-variables formula makes this exact.
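A minimal sketch of the flow log-density, assuming a hypothetical two-dimensional affine flow $x = f(z) = Az + b$ with lower-triangular $A$ and a standard normal base distribution. Triangular structure is one common way flows keep the determinant cheap: here $\det A = a_{11} a_{22}$. For this particular flow $X \sim \mathcal{N}(b, AA^\top)$, so the change-of-variables log-density can be checked against the closed-form Gaussian log-density. The parameter values are arbitrary.

```python
import math

# Hypothetical flow parameters: A = [[a11, 0], [a21, a22]], b = (b1, b2).
a11, a21, a22 = 2.0, 0.5, 0.8
b1, b2 = 1.0, -1.0


def f_inverse(x1, x2):
    """Solve A z = x - b by forward substitution."""
    z1 = (x1 - b1) / a11
    z2 = (x2 - b2 - a21 * z1) / a22
    return z1, z2


def log_p_base(z1, z2):
    """Log density of the base distribution N(0, I_2)."""
    return -0.5 * (z1 * z1 + z2 * z2) - math.log(2 * math.pi)


def log_p_flow(x1, x2):
    """log p_X(x) = log p_Z(f^{-1}(x)) + log|det Df^{-1}(x)|, where det Df^{-1} = 1 / det A."""
    z1, z2 = f_inverse(x1, x2)
    return log_p_base(z1, z2) - math.log(abs(a11 * a22))


def log_p_gaussian(x1, x2):
    """Closed form for comparison: X ~ N(b, Sigma) with Sigma = A A^T."""
    s11, s12, s22 = a11 * a11, a11 * a21, a21 * a21 + a22 * a22
    det = s11 * s22 - s12 * s12
    d1, d2 = x1 - b1, x2 - b2
    quad = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return -0.5 * quad - math.log(2 * math.pi) - 0.5 * math.log(det)


for point in [(0.0, 0.0), (1.0, -1.0), (3.0, 0.5)]:
    print(point, log_p_flow(*point), log_p_gaussian(*point))
```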

Common Confusions

Watch Out

The Jacobian determinant is the absolute value

In the change-of-variables formula for integrals, you use $|\det D\phi|$, not $\det D\phi$. Volumes are non-negative, so the absolute value keeps the volume scaling factor positive even when $\phi$ reverses orientation ($\det D\phi < 0$); without it, an orientation-reversing map would flip the sign of the integral. For probability density transformations, forgetting the absolute value gives wrong (even negative) densities.
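A concrete case where the sign matters, assuming $X \sim \text{Exp}(1)$ and the decreasing map $Y = e^{-X}$ on $(0, 1)$. The inverse $x = -\log y$ has derivative $-1/y < 0$. With the absolute value the formula gives $p_Y(y) = y \cdot \frac{1}{y} = 1$, the uniform density; without it, the result is $-1$, which is not a density.

```python
import math


def p_x(x):
    """Exp(1) density."""
    return math.exp(-x) if x >= 0 else 0.0


def p_y(y, use_abs=True):
    """Density of Y = exp(-X) on (0, 1); the inverse x = -log(y) has derivative -1/y < 0."""
    x = -math.log(y)   # phi^{-1}(y)
    dxdy = -1.0 / y    # negative: the map reverses orientation
    jac = abs(dxdy) if use_abs else dxdy
    return p_x(x) * jac


for y in (0.1, 0.5, 0.9):
    print(y, p_y(y), p_y(y, use_abs=False))  # approximately 1.0 (uniform) vs -1.0
```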

Watch Out

Fubini requires integrability, Tonelli does not

Tonelli's theorem (for non-negative functions) allows you to swap integration order without checking integrability first. This is useful because you can establish integrability by computing the iterated integral of $|f|$. Fubini applies to signed functions but requires you to verify integrability of $f$, i.e. $\int |f| < \infty$, first.

Summary

  • Substitution: $\int f(g(x))\,g'(x)\,dx = \int f(u)\,du$ with $u = g(x)$
  • Multivariate change of variables: multiply by $|\det D\phi(x)|$ when transforming coordinates
  • The Jacobian determinant measures local volume change
  • Fubini: swap integration order when $\int |f| < \infty$
  • These tools compute expectations, marginals, normalizing constants, and flow densities

Exercises

Exercise · Core

Problem

Compute $\int_0^\infty x e^{-x^2/2}\,dx$ using the substitution $u = x^2/2$.

Exercise · Advanced

Problem

Let $X \sim \mathcal{N}(0, 1)$ and $Y = e^X$ (so $Y$ is log-normal). Use the change-of-variables formula to derive the density of $Y$.


Last reviewed: April 2026