Cramér-Wold Theorem

A multivariate distribution is uniquely determined by all of its one-dimensional projections. This reduces multivariate convergence in distribution to checking univariate projections, and it is the standard tool for proving the multivariate CLT.

Why This Matters

The central limit theorem in one dimension says $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$. But in statistics and ML, you almost always work with vectors: the MLE $\hat{\theta} \in \mathbb{R}^d$, the gradient $\nabla \mathcal{L} \in \mathbb{R}^d$, the sample covariance matrix entries. The multivariate CLT says $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} \mathcal{N}(0, \Sigma)$, but proving convergence in distribution for random vectors is harder than for scalars.

The Cramér-Wold theorem solves this: to prove that a random vector converges in distribution, it suffices to prove that every one-dimensional projection converges. This reduces a $d$-dimensional problem to infinitely many one-dimensional problems, each of which can be handled by the scalar CLT.

The Theorem

Theorem

Cramér-Wold Theorem

Statement

Let $X_n, X$ be random vectors in $\mathbb{R}^d$. Then:

$$X_n \xrightarrow{d} X \quad \text{if and only if} \quad t^\top X_n \xrightarrow{d} t^\top X \quad \text{for all } t \in \mathbb{R}^d$$

A multivariate distribution is uniquely determined by the collection of all its one-dimensional marginals (projections onto arbitrary directions).

Intuition

If two distributions agree on every 1D shadow (projection), they must be the same distribution. Conversely, if two sequences of distributions get close in every 1D shadow, they get close in the full $d$-dimensional space. The projection $t^\top X$ is a scalar random variable, so you can use all the scalar tools (characteristic functions, univariate CLT, moment conditions) to check convergence direction by direction.
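As a quick numerical illustration of the projection idea (my own sketch, not from the original; the direction `t` and covariance `Sigma` are arbitrary choices): a projection of a Gaussian vector is a scalar Gaussian, and its variance matches $t^\top \Sigma t$.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
t = np.array([0.6, -1.3])          # an arbitrary direction

# Draw samples of X ~ N(0, Sigma) and project onto t.
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=200_000)
proj = X @ t                        # the scalar random variable t^T X

print(proj.var())                   # empirical variance of the projection
print(t @ Sigma @ t)                # theoretical variance t^T Sigma t
```

The two printed numbers agree up to Monte Carlo error, which is exactly what lets you study the vector $X$ one scalar direction at a time.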

Proof Sketch

The characteristic function of $t^\top X$ is $\varphi_{t^\top X}(s) = \mathbb{E}[e^{i s t^\top X}] = \varphi_X(st)$, i.e., the characteristic function of $X$ evaluated at $st$.

If $t^\top X_n \xrightarrow{d} t^\top X$ for all $t$, then by Lévy's continuity theorem, $\varphi_{t^\top X_n}(1) \to \varphi_{t^\top X}(1)$ for each $t$. But $\varphi_{t^\top X_n}(1) = \varphi_{X_n}(t)$, so $\varphi_{X_n}(t) \to \varphi_X(t)$ for all $t \in \mathbb{R}^d$. By the multivariate Lévy continuity theorem, $X_n \xrightarrow{d} X$.
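The identity $\varphi_{t^\top X}(1) = \varphi_X(t)$ that drives the proof can be sanity-checked by Monte Carlo (my own sketch, not from the original): for a centered Gaussian, $\varphi_X(t) = e^{-t^\top \Sigma t / 2}$ in closed form.

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
t = np.array([0.7, -0.4])

# Monte Carlo estimate of the scalar characteristic function of t^T X at s = 1,
# which equals the multivariate characteristic function phi_X evaluated at t.
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=400_000)
phi_mc = np.mean(np.exp(1j * (X @ t)))

# Closed form for a centered Gaussian: phi_X(t) = exp(-t^T Sigma t / 2).
phi_exact = np.exp(-0.5 * t @ Sigma @ t)

print(phi_mc.real, phi_exact)       # agree up to Monte Carlo error
```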

Why It Matters

The standard proof of the multivariate CLT uses Cramér-Wold: to show $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} \mathcal{N}(0, \Sigma)$, fix any $t \in \mathbb{R}^d$ and note that $t^\top \sqrt{n}(\bar{X}_n - \mu) = \sqrt{n}(t^\top \bar{X}_n - t^\top \mu)$ is a centered, scaled sample mean of the scalars $t^\top X_i$, which have variance $t^\top \Sigma t$. The scalar CLT gives $\sqrt{n}(t^\top \bar{X}_n - t^\top \mu) \xrightarrow{d} \mathcal{N}(0, t^\top \Sigma t)$. Since this holds for all $t$, Cramér-Wold gives the full multivariate result.

This same technique proves asymptotic normality of multivariate MLE, multivariate delta method results, and joint convergence of multiple statistics.

Failure Mode

You must check ALL directions $t$, not just the coordinate directions. Checking only $t = e_1, \ldots, e_d$ (the standard basis) establishes convergence of each coordinate marginally, but marginal convergence does not imply joint convergence. The full collection of projections captures the dependence structure that marginals miss.

Application: Multivariate CLT Proof

The multivariate CLT follows immediately from the scalar CLT plus Cramér-Wold:

  1. Let $X_1, \ldots, X_n \in \mathbb{R}^d$ be i.i.d. with mean $\mu$ and covariance $\Sigma$.
  2. Fix any $t \in \mathbb{R}^d$. Define $Y_i = t^\top X_i$. Then the $Y_i$ are i.i.d. scalars with mean $t^\top \mu$ and variance $t^\top \Sigma t$.
  3. By the scalar CLT: $\sqrt{n}(t^\top \bar{X}_n - t^\top \mu) \xrightarrow{d} \mathcal{N}(0, t^\top \Sigma t)$.
  4. But $\mathcal{N}(0, t^\top \Sigma t)$ is the distribution of $t^\top Z$ where $Z \sim \mathcal{N}(0, \Sigma)$.
  5. Since step 3 holds for all $t$, Cramér-Wold gives $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} \mathcal{N}(0, \Sigma)$.

This proof is a few lines once you have the scalar CLT and Cramér-Wold. Without Cramér-Wold, you would need to work directly with multivariate characteristic functions, which is messier.
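The five steps can be checked by simulation (a sketch of my own, not from the original; the mixing matrix `A` and the sample sizes are arbitrary choices). We use non-Gaussian increments built from exponentials, so the Gaussian limit is genuinely the CLT at work, and verify that $\operatorname{Var}(t^\top \sqrt{n}(\bar{X}_n - \mu)) \approx t^\top \Sigma t$ along several directions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 500, 5_000

# Non-Gaussian i.i.d. vectors: X_i = A @ E_i with E_i having i.i.d. Exp(1)
# coordinates, so mu = A @ [1, 1] and Sigma = A A^T (since Var(Exp(1)) = 1).
A = np.array([[1.0, 0.3],
              [0.0, 0.8]])
mu = A @ np.ones(2)
Sigma = A @ A.T

E = rng.exponential(scale=1.0, size=(reps, n, 2))
X = E @ A.T                                   # shape (reps, n, 2)
S = np.sqrt(n) * (X.mean(axis=1) - mu)        # sqrt(n)(X̄_n - mu), one per rep

# Along each direction t, the projection's variance should be t^T Sigma t.
results = []
for t in [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, -1.0])]:
    emp, theo = (S @ t).var(), t @ Sigma @ t
    results.append((emp, theo))
    print(np.round(emp, 3), np.round(theo, 3))
```

Checking a generic direction in one computation, rather than direction by direction, mirrors how the proof handles all $t$ simultaneously.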

Common Confusions

Watch Out

Marginal convergence is not the same as joint convergence

If $(X_n, Y_n) \xrightarrow{d} (X, Y)$, then $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$ (marginals converge). But the converse is false: marginal convergence does not imply joint convergence. Cramér-Wold fixes this by checking ALL linear combinations, not just the individual coordinates. The projection $t^\top (X_n, Y_n) = t_1 X_n + t_2 Y_n$ captures the dependence between $X_n$ and $Y_n$.
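A minimal numerical sketch of the classic counterexample (my own illustration, not from the original): take $X_n = Z$ and $Y_n = (-1)^n Z$ for a single $Z \sim \mathcal{N}(0, 1)$. Each marginal is exactly $\mathcal{N}(0, 1)$ for every $n$, yet the joint law alternates between correlation $+1$ and $-1$, and the single projection $t = (1, 1)$ detects the failure.

```python
import numpy as np

rng = np.random.default_rng(3)
Z = rng.standard_normal(100_000)

# X_n = Z for all n and Y_n = (-1)^n Z: both marginals are exactly N(0, 1),
# so X_n ->d N(0,1) and Y_n ->d N(0,1) trivially.  But (X_n, Y_n) has no
# joint limit, and the projection along t = (1, 1) oscillates:
vars_along_t = []
for n in [1, 2, 3, 4]:
    Yn = (-1) ** n * Z
    proj = Z + Yn                        # t^T (X_n, Y_n) with t = (1, 1)
    vars_along_t.append(proj.var())
    print(n, round(proj.var(), 2))       # ~0 for odd n, ~4 for even n
```

Coordinate directions ($t = e_1$ or $e_2$) would never see this: only a mixed direction exposes the oscillating dependence.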

Watch Out

Cramér-Wold does not require checking infinitely many directions in practice

In theory, you must check all $t \in \mathbb{R}^d$. In practice, you usually verify the condition for a generic $t$ (by computing the variance $t^\top \Sigma t$ and applying the scalar CLT), which works simultaneously for all $t$. You almost never need to check directions one by one.

Exercises

ExerciseCore

Problem

Use the Cramér-Wold theorem to show that if $X_n \xrightarrow{d} \mathcal{N}(0, I_d)$ and $A \in \mathbb{R}^{k \times d}$ is a fixed matrix, then $AX_n \xrightarrow{d} \mathcal{N}(0, AA^\top)$.

ExerciseAdvanced

Problem

Give an example of random vectors $(X_n, Y_n)$ in $\mathbb{R}^2$ such that $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$ but $(X_n, Y_n)$ does not converge in distribution to $(X, Y)$.

References

Canonical:

  • Billingsley, Convergence of Probability Measures (2nd ed., 1999), Section 29
  • van der Vaart, Asymptotic Statistics (1998), Theorem 2.4 (Cramér-Wold device)
  • Durrett, Probability: Theory and Examples (5th ed., 2019), Theorem 3.9.5

Historical:

  • Cramér & Wold, "Some Theorems on Distribution Functions" (1936)

Last reviewed: April 2026
