Foundations
Integration and Change of Variables
Riemann integration, improper integrals, the substitution rule, multivariate change of variables via the Jacobian determinant, and Fubini's theorem. The computational backbone of probability and ML.
Why This Matters
Integration is the computational engine of probability and statistics. Every expectation, every marginal distribution, every normalizing constant, and every Bayesian posterior requires evaluating an integral. The change-of-variables formula is what allows you to transform distributions (e.g., from a Gaussian to any other distribution via a smooth map). Fubini's theorem is what lets you compute multivariate integrals by iterating single-variable integrals.
Riemann Integral Review
Riemann Integral
For a bounded function $f : [a,b] \to \mathbb{R}$, the Riemann integral is defined as the limit of Riemann sums:

$$\int_a^b f(x)\,dx = \lim_{\|P\| \to 0} \sum_{i=1}^{n} f(x_i^*)\,\Delta x_i$$

where $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ is a partition of $[a,b]$, $x_i^*$ is a sample point in $[x_{i-1}, x_i]$, and $\Delta x_i = x_i - x_{i-1}$. The limit exists (and is independent of the choice of partitions and sample points) when $f$ is Riemann integrable. Every continuous function on $[a,b]$ is Riemann integrable.
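As a quick numerical illustration (a sketch using NumPy; the function name `riemann_sum` and the test integrand $x^2$ are illustrative choices, not from the text), a left-endpoint Riemann sum converges to the integral as the partition is refined:

```python
import numpy as np

def riemann_sum(f, a, b, n):
    """Left-endpoint Riemann sum of f over [a, b] with n equal subintervals."""
    x = np.linspace(a, b, n, endpoint=False)  # left endpoints x_0, ..., x_{n-1}
    dx = (b - a) / n
    return float(np.sum(f(x)) * dx)

# Approximate the integral of x^2 over [0, 1]; the exact value is 1/3.
for n in (10, 100, 10_000):
    print(n, riemann_sum(lambda x: x**2, 0.0, 1.0, n))
```

The error shrinks like $O(1/n)$ for left-endpoint sums, consistent with the limit definition above.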
Improper Integrals
Improper Integral
When the domain is unbounded or the integrand is unbounded, define the integral as a limit:

$$\int_a^{\infty} f(x)\,dx = \lim_{b \to \infty} \int_a^b f(x)\,dx$$

The integral converges if this limit exists and is finite. Example: the Gaussian normalizing constant $\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}$ is an improper integral that converges.
Improper integrals arise constantly in ML: the normalization of probability density functions, expectations over unbounded domains, and integrals involving heavy-tailed distributions.
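The limit definition suggests a direct numerical check (a sketch; `truncated_integral` is an illustrative helper, not a library function): integrate the Gaussian over growing finite intervals and watch the values approach $\sqrt{2\pi}$.

```python
import numpy as np

def truncated_integral(f, a, b, n=100_001):
    """Composite trapezoidal approximation of the integral of f over [a, b]."""
    x = np.linspace(a, b, n)
    y = f(x)
    return float((np.sum(y) - (y[0] + y[-1]) / 2) * (b - a) / (n - 1))

# int_{-inf}^{inf} exp(-x^2/2) dx = sqrt(2*pi): truncate at +/- b and let b grow.
gaussian = lambda x: np.exp(-x**2 / 2)
for b in (2.0, 5.0, 10.0):
    print(b, truncated_integral(gaussian, -b, b))
print("sqrt(2*pi) =", np.sqrt(2 * np.pi))
```

By $b = 10$ the truncation error is far below floating-point display precision, reflecting the Gaussian's rapid tail decay.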
Change of Variables (One Dimension)
Substitution Rule
If $g : [a,b] \to \mathbb{R}$ is $C^1$ and $f$ is continuous, then:

$$\int_a^b f(g(t))\,g'(t)\,dt = \int_{g(a)}^{g(b)} f(u)\,du$$

This is the substitution $u = g(t)$, $du = g'(t)\,dt$.
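Both sides of the substitution rule can be checked numerically (a sketch; the choices $f = \cos$ and $g(t) = t^2$ on $[0,2]$ are illustrative, not from the text):

```python
import numpy as np

def trap(f, lo, hi, n=200_001):
    """Composite trapezoidal rule for the integral of f over [lo, hi]."""
    x = np.linspace(lo, hi, n)
    y = f(x)
    return float((np.sum(y) - (y[0] + y[-1]) / 2) * (hi - lo) / (n - 1))

# Check int_a^b f(g(t)) g'(t) dt = int_{g(a)}^{g(b)} f(u) du
# with f = cos, g(t) = t^2 on [0, 2]; both sides equal sin(4).
f, g, gprime = np.cos, (lambda t: t**2), (lambda t: 2 * t)
lhs = trap(lambda t: f(g(t)) * gprime(t), 0.0, 2.0)
rhs = trap(f, 0.0, 4.0)   # g(0) = 0, g(2) = 4
print(lhs, rhs, np.sin(4.0))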
Multivariate Change of Variables
Change of Variables Formula
Statement
Let $T : U \to V$ be a diffeomorphism between open subsets $U, V \subseteq \mathbb{R}^n$. For any integrable function $f : V \to \mathbb{R}$:

$$\int_V f(y)\,dy = \int_U f(T(x))\,\lvert\det J_T(x)\rvert\,dx$$

where $J_T(x)$ is the Jacobian matrix of $T$ at $x$, and $\lvert\det J_T(x)\rvert$ is the absolute value of its determinant.
Intuition
The Jacobian determinant measures how $T$ stretches or compresses volume. A small cube of volume $dx$ at $x$ maps to a region of approximate volume $\lvert\det J_T(x)\rvert\,dx$ at $T(x)$. The formula says: to integrate over the image, integrate over the preimage and multiply by this volume scaling factor.
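The volume-scaling interpretation can be verified directly for a linear map (a Monte Carlo sketch; the matrix `A` and bounding box are illustrative choices): the image of the unit square under $A$ has area $\lvert\det A\rvert$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0], [0.5, 3.0]])   # linear map; |det A| = |6 - 0.5| = 5.5

# Monte Carlo estimate of the area of A([0,1]^2): sample a bounding box and
# count points whose preimage A^{-1} y lies in the unit square.
box_lo, box_hi = np.array([-1.0, -1.0]), np.array([4.0, 5.0])
pts = rng.uniform(box_lo, box_hi, size=(1_000_000, 2))
pre = pts @ np.linalg.inv(A).T            # A^{-1} y for each sampled y
inside = np.all((pre >= 0) & (pre <= 1), axis=1)
box_area = float(np.prod(box_hi - box_lo))
print(inside.mean() * box_area)           # ~ |det A| = 5.5
print(abs(np.linalg.det(A)))
```

For a nonlinear $T$, the same statement holds infinitesimally, with $\lvert\det J_T(x)\rvert$ playing the role of $\lvert\det A\rvert$ at each point.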
Proof Sketch
For a linear map $T(x) = Ax$, the result follows from the definition of $\lvert\det A\rvert$ as the volume scaling factor. For nonlinear $T$, approximate it locally by its linearization on small cubes, apply the linear result, and sum. The rigorous proof uses the Lebesgue measure and approximation by simple functions.
Why It Matters
This formula is used everywhere in ML and statistics:
- Probability: if $X$ has density $p_X$ and $Y = T(X)$, then $p_Y(y) = p_X(T^{-1}(y))\,\lvert\det J_{T^{-1}}(y)\rvert$
- Normalizing flows: the log-likelihood involves $\log\lvert\det J_T\rvert$, and flow architectures are designed to make this determinant cheap to compute
- Bayesian inference: computing posteriors requires integrating over parameter spaces, often after a change of variables
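The density-transformation use case can be sketched concretely (an illustrative example, not from the text: $X \sim \mathcal{N}(0,1)$ pushed through the affine map $T(x) = 2x + 1$, so $Y$ should be $\mathcal{N}(1, 4)$):

```python
import numpy as np

# If X has density p_X and Y = T(X) with T a diffeomorphism, then
# p_Y(y) = p_X(T^{-1}(y)) |d T^{-1}/dy|.  Here X ~ N(0,1), T(x) = 2x + 1.
p_X = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
T_inv = lambda y: (y - 1.0) / 2.0
p_Y = lambda y: p_X(T_inv(y)) * 0.5       # |d T^{-1}/dy| = 1/2

# Y should be N(1, 2^2); compare against that density written out directly.
normal_pdf = lambda y, mu, s: np.exp(-((y - mu) / s)**2 / 2) / (s * np.sqrt(2 * np.pi))
y = np.linspace(-10.0, 12.0, 1001)
print(np.max(np.abs(p_Y(y) - normal_pdf(y, 1.0, 2.0))))   # ~ 0
```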
Failure Mode
The formula requires $T$ to be a diffeomorphism (smooth with smooth inverse). If $T$ is not injective, you must partition the domain into regions where it is injective and sum the contributions. If $\det J_T(x) = 0$ at some points (critical points), the formula still holds but the contribution from those points is zero.
Polar coordinates
The transformation $T(r, \theta) = (r\cos\theta, r\sin\theta)$ maps $(0,\infty) \times (0, 2\pi)$ to $\mathbb{R}^2$ minus a ray (a set of measure zero). The Jacobian:

$$J_T = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}, \qquad \det J_T = r$$

So $dx\,dy = r\,dr\,d\theta$.

This is how you compute the Gaussian integral: $\left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)^2 = \int_0^{2\pi}\!\int_0^{\infty} e^{-r^2}\,r\,dr\,d\theta = \pi$, so $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$.
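Both sides of the polar-coordinates computation can be confirmed numerically (a sketch; the truncation points and grid sizes are arbitrary choices):

```python
import numpy as np

# (int e^{-x^2} dx)^2 should equal 2*pi * int_0^inf e^{-r^2} r dr = pi.
x = np.linspace(-10.0, 10.0, 1_000_001)
g = np.exp(-x**2)
one_d = float((np.sum(g) - (g[0] + g[-1]) / 2) * (x[1] - x[0]))

r = np.linspace(0.0, 10.0, 1_000_001)
hr = np.exp(-r**2) * r
inner = float((np.sum(hr) - (hr[0] + hr[-1]) / 2) * (r[1] - r[0]))  # = 1/2
polar = 2 * np.pi * inner    # the theta integral contributes a factor 2*pi

print(one_d**2, polar, np.pi)   # all ~ pi
```

The extra factor $r$ from the Jacobian is exactly what makes the radial integral elementary.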
Fubini's Theorem
Fubini's Theorem
Statement
If $f : \mathbb{R}^2 \to \mathbb{R}$ is integrable (i.e., $\iint \lvert f(x,y)\rvert\,dx\,dy < \infty$), then:

$$\iint f(x,y)\,dx\,dy = \int\!\left(\int f(x,y)\,dy\right)dx = \int\!\left(\int f(x,y)\,dx\right)dy$$

The order of integration can be swapped.
Intuition
If the total integral is finite, you can compute a double integral by integrating one variable at a time, in either order. This is what makes multivariate integration tractable: you reduce it to a sequence of one-dimensional integrals.
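A small numerical sketch of swapping the order (the integrand $f(x,y) = e^{xy^2}$ on $[0,1]^2$, chosen non-symmetric, is illustrative; `trap` is an ad hoc helper, not a library call):

```python
import numpy as np

def trap(vals, h):
    """Composite trapezoidal rule with uniform spacing h."""
    return float((np.sum(vals) - (vals[0] + vals[-1]) / 2) * h)

# Integrable f on [0,1]^2: iterate in either order, same answer.
n = 2001
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
F = np.exp(np.outer(x, x**2))        # F[i, j] = f(x_i, y_j) = exp(x_i * y_j^2)

dy_then_dx = trap(np.array([trap(F[i, :], h) for i in range(n)]), h)
dx_then_dy = trap(np.array([trap(F[:, j], h) for j in range(n)]), h)
print(dy_then_dx, dx_then_dy)        # agree, as Fubini guarantees
```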
Proof Sketch
The proof uses the monotone convergence theorem and the construction of product measures. For non-negative functions, the result follows from Tonelli's theorem (which does not require integrability, only measurability and non-negativity). Fubini extends this to signed functions by decomposing into positive and negative parts.
Why It Matters
Fubini's theorem is the justification for: (1) computing marginal distributions by integrating out variables, (2) switching the order of expectation and summation, (3) computing normalizing constants by iterated integration, and (4) the tower property of conditional expectation.
Failure Mode
The integrability condition is necessary. If $\iint \lvert f\rvert = \infty$, the iterated integrals may exist but give different values depending on the order of integration. The classic counterexample uses $f(x,y) = \dfrac{x^2 - y^2}{(x^2 + y^2)^2}$ on $(0,1]^2$.
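The two iterated integrals of this counterexample can be evaluated numerically, using the closed-form inner integrals to sidestep the singularity at the origin (a sketch; the antiderivatives in the comments are standard calculus):

```python
import numpy as np

# f(x, y) = (x^2 - y^2) / (x^2 + y^2)^2 on (0, 1]^2 is NOT absolutely integrable.
# Its inner integrals exist in closed form:
#   int_0^1 f dy = [ y / (x^2 + y^2) ]_{y=0}^{1} =  1 / (1 + x^2)
#   int_0^1 f dx = [-x / (x^2 + y^2) ]_{x=0}^{1} = -1 / (1 + y^2)
t = np.linspace(0.0, 1.0, 1_000_001)
h = t[1] - t[0]
outer = lambda v: float((np.sum(v) - (v[0] + v[-1]) / 2) * h)

dy_then_dx = outer(1.0 / (1.0 + t**2))     # ~ +pi/4
dx_then_dy = outer(-1.0 / (1.0 + t**2))    # ~ -pi/4
print(dy_then_dx, dx_then_dy)
```

The two orders give $+\pi/4$ and $-\pi/4$: both iterated integrals exist, yet they disagree, exactly because $\iint \lvert f\rvert = \infty$.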
Applications in ML
Marginalizing Distributions
Given a joint density $p(x, y)$, the marginal density of $X$ is:

$$p_X(x) = \int p(x, y)\,dy$$
This uses Fubini to reduce a multivariate integral to a single-variable one.
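Marginalization can be checked numerically on a concrete joint (an illustrative choice, not from the text: a bivariate normal with correlation $\rho = 0.7$, whose $x$-marginal is exactly $\mathcal{N}(0,1)$):

```python
import numpy as np

# Integrate out y numerically and compare with the exact N(0, 1) marginal of x.
rho = 0.7
c = 1.0 / (2 * np.pi * np.sqrt(1 - rho**2))
joint = lambda x, y: c * np.exp(-(x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2)))

y = np.linspace(-12.0, 12.0, 200_001)
h = y[1] - y[0]
results = {}
for x0 in (-1.0, 0.0, 2.0):
    vals = joint(x0, y)
    marginal = float((np.sum(vals) - (vals[0] + vals[-1]) / 2) * h)  # int p(x0, y) dy
    exact = float(np.exp(-x0**2 / 2) / np.sqrt(2 * np.pi))
    results[x0] = (marginal, exact)
    print(x0, marginal, exact)
```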
Computing Normalizing Constants
A density is often known only up to scale: $p(x) = \tilde{p}(x)/Z$ where $Z = \int \tilde{p}(x)\,dx$. In Bayesian inference, computing $Z = \int p(\mathcal{D} \mid \theta)\,p(\theta)\,d\theta$ (the evidence) often requires a change of variables to make the integral tractable.
Normalizing Flows
A normalizing flow transforms a simple base distribution $p_Z$ through a diffeomorphism $T$ to get $p_X(x) = p_Z(T^{-1}(x))\,\lvert\det J_{T^{-1}}(x)\rvert$. The change-of-variables formula makes this exact.
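A minimal flow log-likelihood can be sketched with a single affine layer (an illustrative toy, not a real flow library: base $p_Z = \mathcal{N}(0, I)$, diffeomorphism $T(z) = Az + b$ with the matrix `A` and shift `b` chosen arbitrarily). Since $AZ + b \sim \mathcal{N}(b, AA^{\top})$, the change-of-variables log-density can be checked against the exact multivariate normal log-density:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2
A = np.array([[1.5, 0.0], [0.7, 0.8]])   # invertible
b = np.array([1.0, -2.0])

def log_likelihood(x):
    """log p_X(x) via change of variables: log p_Z(T^{-1}(x)) - log |det J_T|."""
    z = np.linalg.solve(A, x - b)                    # T^{-1}(x)
    log_base = -0.5 * (z @ z) - 0.5 * d * np.log(2 * np.pi)
    log_det = np.log(abs(np.linalg.det(A)))          # log |det J_T| (constant here)
    return float(log_base - log_det)

# Exact N(b, A A^T) log-density for comparison.
x = rng.normal(size=d)
S = A @ A.T
diff = x - b
exact = float(-0.5 * diff @ np.linalg.solve(S, diff)
              - 0.5 * d * np.log(2 * np.pi) - 0.5 * np.log(np.linalg.det(S)))
print(log_likelihood(x), exact)
```

Real flow architectures stack many such layers and structure each Jacobian (triangular, low-rank, etc.) so the log-determinant stays cheap.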
Common Confusions
Use the absolute value of the Jacobian determinant
In the change-of-variables formula for integrals, you use $\lvert\det J_T\rvert$, not $\det J_T$. The absolute value ensures the volume scaling factor is positive regardless of whether the transformation preserves or reverses orientation. For probability density transformations, forgetting the absolute value gives wrong densities.
Fubini requires integrability, Tonelli does not
Tonelli's theorem (for non-negative functions) allows you to swap integration order without checking integrability first. This is useful because you can establish integrability by computing the iterated integral of $\lvert f\rvert$. Fubini applies to signed functions but requires you to verify integrability of $\lvert f\rvert$ first.
Summary
- Substitution: $\int_a^b f(g(t))\,g'(t)\,dt = \int_{g(a)}^{g(b)} f(u)\,du$ with $u = g(t)$
- Multivariate change of variables: multiply by $\lvert\det J_T\rvert$ when transforming coordinates
- The Jacobian determinant measures local volume change
- Fubini: swap integration order when $\iint \lvert f\rvert < \infty$
- These tools compute expectations, marginals, normalizing constants, and flow densities
Exercises
Problem
Compute $\int_0^{\infty} x\,e^{-x^2}\,dx$ using the substitution $u = x^2$.
Problem
Let $Z \sim \mathcal{N}(0, 1)$ and $X = e^Z$ (so $X$ is log-normal). Use the change-of-variables formula to derive the density of $X$.
References
Canonical:
- Rudin, Principles of Mathematical Analysis (1976), Chapters 6 and 10
- Folland, Real Analysis (1999), Chapter 2 (Lebesgue integration and product measures)
- Apostol, Mathematical Analysis (1974), Chapters 10-11 (Riemann integration and multivariable change of variables)
Current:
- Kobyzev et al., "Normalizing Flows: An Introduction and Review of Current Methods" (2021). Change-of-variables in deep learning.
- Billingsley, Probability and Measure (1995), Chapter 3 (integration and Fubini's theorem in measure-theoretic context)
- Spivak, Calculus on Manifolds (1965), Chapter 3 (integration on R^n and the change-of-variables formula)
Next Topics
- Common probability distributions: where these integration tools are applied
- Measure-theoretic probability: the rigorous foundation for integration in probability
Last reviewed: April 2026