Cramér-Wold Theorem
A multivariate distribution is uniquely determined by all of its one-dimensional projections. This reduces multivariate convergence in distribution to checking univariate projections, and is the standard tool for proving multivariate CLT.
Why This Matters
The central limit theorem in one dimension says $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2)$. But in statistics and ML, you almost always work with vectors: the MLE $\hat{\theta}_n$, the gradient $\nabla L(\theta)$, the sample covariance matrix entries. The multivariate CLT says $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \Sigma)$, but proving convergence in distribution for random vectors is harder than for scalars.
The Cramér-Wold theorem solves this: to prove a random vector converges in distribution, it suffices to prove that every one-dimensional projection converges. This reduces a $d$-dimensional problem to infinitely many one-dimensional problems, each of which can be handled with scalar tools such as the univariate CLT.
The Theorem
Cramér-Wold Theorem
Statement
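Let $X_n$ and $X$ be random vectors in $\mathbb{R}^d$. Then: $X_n \xrightarrow{d} X$ if and only if $t^\top X_n \xrightarrow{d} t^\top X$ for every $t \in \mathbb{R}^d$.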
A multivariate distribution is uniquely determined by the collection of all its one-dimensional marginals (projections onto arbitrary directions).
Intuition
If two distributions agree on every 1D shadow (projection), they must be the same distribution. Likewise, if a sequence of distributions converges to a limit in every 1D shadow, it converges to that limit in the full $d$-dimensional space. The projection $t^\top X$ is a scalar random variable, so you can use all the scalar tools (characteristic functions, univariate CLT, moment conditions) to check convergence direction by direction.
Proof Sketch
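The characteristic function of $t^\top X$ is $\varphi_{t^\top X}(s) = E[e^{i s\, t^\top X}] = \varphi_X(st)$, the characteristic function of $X$ evaluated at $st$.
If $t^\top X_n \xrightarrow{d} t^\top X$ for all $t$, then by Lévy's continuity theorem, $\varphi_{t^\top X_n}(s) \to \varphi_{t^\top X}(s)$ for each $s$. But $\varphi_{t^\top X_n}(1) = \varphi_{X_n}(t)$, so $\varphi_{X_n}(t) \to \varphi_X(t)$ for all $t$. By the multivariate Lévy continuity theorem, $X_n \xrightarrow{d} X$. The converse direction follows from the continuous mapping theorem, since $x \mapsto t^\top x$ is continuous.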
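The identity $\varphi_{t^\top X}(s) = \varphi_X(st)$ can be checked numerically in a case where both sides have closed forms. The sketch below assumes $X \sim N(\mu, \Sigma)$, so that $t^\top X \sim N(t^\top \mu, t^\top \Sigma t)$; the particular values of `mu`, `sigma`, `t`, and `s` and the helper names are illustrative choices, not part of the theorem.

```python
import cmath

def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))

def quad_form(u, sigma):
    """Quadratic form u^T Sigma u."""
    d = len(u)
    return sum(u[i] * sigma[i][j] * u[j] for i in range(d) for j in range(d))

def gaussian_cf_vector(u, mu, sigma):
    """Characteristic function of X ~ N(mu, Sigma) evaluated at u in R^d."""
    return cmath.exp(1j * dot(u, mu) - 0.5 * quad_form(u, sigma))

def gaussian_cf_scalar(s, mean, var):
    """Characteristic function of a scalar N(mean, var) evaluated at s."""
    return cmath.exp(1j * s * mean - 0.5 * s * s * var)

# Illustrative (assumed) parameters.
mu = [1.0, -2.0]
sigma = [[2.0, 0.6], [0.6, 1.0]]
t = [0.5, -1.5]
s = 0.7

# Left side: characteristic function of the scalar projection t^T X at s.
lhs = gaussian_cf_scalar(s, dot(t, mu), quad_form(t, sigma))
# Right side: characteristic function of the vector X at the point s * t.
rhs = gaussian_cf_vector([s * ti for ti in t], mu, sigma)

gap = abs(lhs - rhs)
print(gap)  # should be zero up to floating-point rounding
```

The same check works for any direction `t` and any `s`, which is the point of the proof: knowing every projection's characteristic function is the same as knowing the vector's characteristic function everywhere.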
Why It Matters
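The standard proof of the multivariate CLT uses Cramér-Wold: to show $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \Sigma)$, fix any $t \in \mathbb{R}^d$ and note that $t^\top \bar{X}_n$ is a sample mean of the scalars $t^\top X_i$, each with variance $t^\top \Sigma t$. The scalar CLT gives $\sqrt{n}\, t^\top(\bar{X}_n - \mu) \xrightarrow{d} N(0, t^\top \Sigma t)$. Since this holds for all $t$, Cramér-Wold gives the full multivariate result.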
This same technique proves asymptotic normality of multivariate MLE, multivariate delta method results, and joint convergence of multiple statistics.
Failure Mode
You must check ALL directions $t \in \mathbb{R}^d$, not just the coordinate directions. Checking only $t = e_1, \ldots, e_d$ (the standard basis) establishes convergence of each coordinate marginally, but marginal convergence does not imply joint convergence. The full collection of projections captures the dependence structure that marginals miss.
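A small exact computation illustrates the failure. The sketch below uses a hypothetical sequence $V_n = (Z, (-1)^n Z)$ with $Z \sim N(0, 1)$, chosen for illustration. Every projection $t^\top V_n$ is a mean-zero Gaussian, so its distribution is fixed by its variance $t^\top \Sigma_n t$, where $\Sigma_n$ is the covariance of $V_n$.

```python
def projection_variance(t, sigma):
    """Variance t^T Sigma t of the projection t^T V for a mean-zero vector V."""
    return sum(t[i] * sigma[i][j] * t[j] for i in range(2) for j in range(2))

def covariance(n):
    """Covariance of the assumed sequence V_n = (Z, (-1)^n Z), Z ~ N(0, 1)."""
    c = 1.0 if n % 2 == 0 else -1.0
    return [[1.0, c], [c, 1.0]]

basis = ([1.0, 0.0], [0.0, 1.0])
indices = range(1, 7)

# Coordinate directions: both marginals are N(0, 1) for every n.
coordinate_vars = [
    [projection_variance(e, covariance(n)) for e in basis] for n in indices
]

# Diagonal direction t = (1, 1): the variance oscillates with n,
# so t^T V_n has no limit in distribution.
diagonal_vars = [projection_variance([1.0, 1.0], covariance(n)) for n in indices]

print(coordinate_vars)
print(diagonal_vars)
```

Both coordinates converge (they are constant in distribution), yet the projection onto $(1, 1)$ alternates between a point mass at zero and $N(0, 4)$, so the vectors $V_n$ do not converge jointly.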
Application: Multivariate CLT Proof
The multivariate CLT follows immediately from the scalar CLT plus Cramér-Wold:
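- Let $X_1, \ldots, X_n$ be i.i.d. random vectors in $\mathbb{R}^d$ with mean $\mu$ and covariance $\Sigma$.
- Fix any $t \in \mathbb{R}^d$. Define $Y_i = t^\top X_i$. Then the $Y_i$ are i.i.d. scalars with mean $t^\top \mu$ and variance $t^\top \Sigma t$.
- By the scalar CLT: $\sqrt{n}(\bar{Y}_n - t^\top \mu) \xrightarrow{d} N(0, t^\top \Sigma t)$.
- But $N(0, t^\top \Sigma t)$ is the distribution of $t^\top Z$ where $Z \sim N(0, \Sigma)$.
- Since the scalar limit holds for every $t$, Cramér-Wold gives $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \Sigma)$.
This proof takes only a few lines once you have the scalar CLT and Cramér-Wold. Without Cramér-Wold, you would need to work directly with multivariate characteristic functions, which is messier.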
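The projection step can also be checked by simulation. The sketch below is only illustrative and rests on assumed choices: vectors $X_i = (U, U + W)$ with $U \sim \text{Exp}(1)$ and $W \sim \text{Uniform}(0, 1)$ independent, the direction `t = (2, -1)`, a sample size of 200, and 2000 replications. It compares the empirical variance of $\sqrt{n}\, t^\top(\bar{X}_n - \mu)$ with the predicted $t^\top \Sigma t$.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# Mean and covariance of the assumed vector X = (U, U + W).
MU = (1.0, 1.5)
SIGMA = ((1.0, 1.0), (1.0, 1.0 + 1.0 / 12.0))

def draw_vector():
    """Draw one non-Gaussian vector (U, U + W)."""
    u = random.expovariate(1.0)
    w = random.random()
    return (u, u + w)

def scaled_projection(n, t):
    """Return sqrt(n) * t^T (sample mean - MU) for one sample of size n."""
    total = 0.0
    for _ in range(n):
        x = draw_vector()
        total += t[0] * x[0] + t[1] * x[1]
    centered = total / n - (t[0] * MU[0] + t[1] * MU[1])
    return math.sqrt(n) * centered

t = (2.0, -1.0)
# Variance predicted by the scalar CLT for this direction: t^T Sigma t.
target_var = sum(t[i] * SIGMA[i][j] * t[j] for i in range(2) for j in range(2))

draws = [scaled_projection(200, t) for _ in range(2000)]
sample_var = statistics.variance(draws)

# Fraction of draws inside the central 95% interval of N(0, t^T Sigma t).
half_width = 1.96 * math.sqrt(target_var)
inside = sum(abs(v) <= half_width for v in draws) / len(draws)

print(target_var, sample_var, inside)
```

A simulation can only probe finitely many directions, so it is a sanity check rather than a substitute for the argument above, which handles every $t$ at once.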
Common Confusions
Marginal convergence is not the same as joint convergence
If $(X_n, Y_n) \xrightarrow{d} (X, Y)$, then $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$ (marginals converge). But the converse is false: marginal convergence does not imply joint convergence. Cramér-Wold fixes this by checking ALL linear combinations, not just the individual coordinates. The projection $aX_n + bY_n$ captures the dependence between $X_n$ and $Y_n$.
Cramér-Wold does not require checking infinitely many directions in practice
In theory, you must check all $t \in \mathbb{R}^d$. In practice, you usually verify the condition for a generic $t$ treated as a symbol (by computing the variance $t^\top \Sigma t$ and applying the scalar CLT), so a single argument covers every $t$ at once. You almost never need to check directions one by one.
Exercises
Problem
Use the Cramér-Wold theorem to show that if $X_n \xrightarrow{d} X$ in $\mathbb{R}^d$ and $A$ is a fixed $k \times d$ matrix, then $AX_n \xrightarrow{d} AX$.
Problem
Give an example of random vectors $(X_n, Y_n)$ in $\mathbb{R}^2$ such that $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$ but $(X_n, Y_n)$ does not converge in distribution to $(X, Y)$.
References
Canonical:
- Billingsley, Convergence of Probability Measures (2nd ed., 1999), Section 29
- van der Vaart, Asymptotic Statistics (1998), Theorem 2.4 (Cramér-Wold device)
- Durrett, Probability: Theory and Examples (5th ed., 2019), Theorem 3.9.5
Historical:
- Cramér & Wold, "Some Theorems on Distribution Functions" (1936)
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Central Limit Theorem (Layer 0B)
- Law of Large Numbers (Layer 0B)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Measure-Theoretic Probability (Layer 0B)