

Measure Concentration and Geometric Functional Analysis

High-dimensional geometry is counterintuitive: Lipschitz functions concentrate, random projections preserve distances, and most of a sphere's measure sits near the equator. Johnson-Lindenstrauss, Gaussian concentration, and Levy's lemma.

Why This Matters

High-dimensional spaces behave nothing like low-dimensional ones. In \mathbb{R}^d with d = 1000, a standard Gaussian vector has norm close to \sqrt{d} with overwhelming probability. Two independent random unit vectors are nearly orthogonal with overwhelming probability. A Lipschitz function of a Gaussian vector is, with overwhelming probability, close to its mean.

These phenomena are not curiosities. They are the reason dimensionality reduction works, random projections preserve structure, and empirical averages concentrate. The Johnson-Lindenstrauss lemma, which enables practical dimensionality reduction from d to O(\log n / \epsilon^2) dimensions, is a direct consequence of Gaussian concentration.

Gaussian Concentration

Definition

Lipschitz Function

A function f: \mathbb{R}^d \to \mathbb{R} is L-Lipschitz if |f(x) - f(y)| \leq L\|x - y\|_2 for all x, y. The smallest such L is the Lipschitz constant.

Theorem

Gaussian Concentration of Lipschitz Functions

Statement

Let X \sim N(0, I_d) and f: \mathbb{R}^d \to \mathbb{R} be L-Lipschitz. Then for all t > 0:

P(|f(X) - \mathbb{E}[f(X)]| \geq t) \leq 2\exp\left(-\frac{t^2}{2L^2}\right)

The random variable f(X) is sub-Gaussian with parameter L.
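
A quick numerical sanity check of the theorem (NumPy assumed), using the 1-Lipschitz function f(x) = \|x\|_2 and the empirical mean as a stand-in for \mathbb{E}[f(X)]; the dimension, sample size, and threshold t are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_samples, t = 500, 20000, 2.0

# f(x) = ||x||_2 is 1-Lipschitz (reverse triangle inequality), so the
# theorem predicts P(|f(X) - E f(X)| >= t) <= 2 exp(-t^2 / 2) with L = 1.
X = rng.standard_normal((n_samples, d))
norms = np.linalg.norm(X, axis=1)

empirical = np.mean(np.abs(norms - norms.mean()) >= t)
bound = 2 * np.exp(-t**2 / 2)  # L = 1
print(f"mean of ||X||: {norms.mean():.2f}  (sqrt(d) = {np.sqrt(d):.2f})")
print(f"empirical tail: {empirical:.4f}  <=  bound: {bound:.4f}")
```

The empirical tail is far below the bound, as expected: the sub-Gaussian inequality is not tight for the norm, but it is dimension-free.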

Intuition

Each coordinate of X contributes independently to f(X), and the Lipschitz condition limits how much each coordinate can influence the output. The d independent contributions each have bounded effect, and their sum concentrates by the same mechanism as a sum of bounded random variables. The dimension d does not appear in the bound.

Proof Sketch

Use the Gaussian Poincaré inequality: \text{Var}(f(X)) \leq \mathbb{E}[\|\nabla f(X)\|^2] \leq L^2. For the tail bound, apply the Herbst argument: compute the log-MGF \psi(\lambda) = \log \mathbb{E}[e^{\lambda(f(X) - \mathbb{E}[f(X)])}] and show \psi(\lambda) \leq \lambda^2 L^2/2 using the Gaussian log-Sobolev inequality. Then apply the Chernoff bound.
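
The Herbst step can be sketched in a few lines. Assume \mathbb{E}[f] = 0 and apply the Gaussian log-Sobolev inequality \mathrm{Ent}(g^2) \leq 2\,\mathbb{E}\|\nabla g\|^2 to g = e^{\lambda f/2}:

```latex
% Write H(\lambda) = \mathbb{E}[e^{\lambda f}], so that
% \mathrm{Ent}(e^{\lambda f}) = \lambda H'(\lambda) - H(\lambda)\log H(\lambda).
\begin{align*}
\mathrm{Ent}(e^{\lambda f})
  &\le 2\,\mathbb{E}\big[\|\nabla e^{\lambda f/2}\|^2\big]
   = \frac{\lambda^2}{2}\,\mathbb{E}\big[\|\nabla f\|^2 e^{\lambda f}\big]
  \le \frac{\lambda^2 L^2}{2}\, H(\lambda), \\
\text{hence}\quad
\frac{d}{d\lambda}\!\left(\frac{\log H(\lambda)}{\lambda}\right)
  &= \frac{\lambda H'(\lambda) - H(\lambda)\log H(\lambda)}{\lambda^2 H(\lambda)}
  \le \frac{L^2}{2}.
\end{align*}
% Since \log H(\lambda)/\lambda \to 0 as \lambda \to 0, integrating gives
% \psi(\lambda) = \log H(\lambda) \le \lambda^2 L^2/2; Chernoff yields the tail bound.
```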

Why It Matters

This single result has enormous consequences. It implies that the norm \|X\| concentrates around \sqrt{d}, inner products \langle X, Y \rangle concentrate around 0, and random projections preserve distances. Most of high-dimensional probability follows from applying this inequality to appropriately chosen Lipschitz functions.

Failure Mode

The result requires Gaussianity (or more generally, a distribution satisfying a log-Sobolev inequality). For heavy-tailed distributions, Lipschitz functions do not concentrate at the Gaussian rate. The bound also requires f to be globally Lipschitz; local Lipschitz conditions are not sufficient.

Johnson-Lindenstrauss Lemma

Lemma

Johnson-Lindenstrauss Lemma

Statement

For any set of n points x_1, \ldots, x_n \in \mathbb{R}^d and any \epsilon \in (0, 1), there exists a linear map A: \mathbb{R}^d \to \mathbb{R}^k with k = O(\log n / \epsilon^2) such that for all pairs i, j:

(1 - \epsilon)\|x_i - x_j\|^2 \leq \|Ax_i - Ax_j\|^2 \leq (1 + \epsilon)\|x_i - x_j\|^2

The map A = \frac{1}{\sqrt{k}} \Pi, where \Pi is a k \times d matrix with i.i.d. N(0,1) entries, works with high probability.
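
A direct simulation of the lemma (NumPy assumed). The constant 24 in the target dimension is a deliberately conservative choice for illustration, not the optimal constant:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, eps = 50, 2000, 0.25
k = int(np.ceil(24 * np.log(n) / eps**2))  # conservative O(log n / eps^2)

X = rng.standard_normal((n, d))               # n arbitrary points in R^d
A = rng.standard_normal((k, d)) / np.sqrt(k)  # random Gaussian JL map
Y = X @ A.T                                   # projected points in R^k

# All pairwise squared distances, before and after projection.
orig = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
proj = np.sum((Y[:, None, :] - Y[None, :, :])**2, axis=-1)

iu = np.triu_indices(n, 1)
ratios = proj[iu] / orig[iu]
print(f"k = {k}; distortion in [{ratios.min():.3f}, {ratios.max():.3f}]")
```

With this k, every one of the \binom{50}{2} = 1225 pairwise distance ratios lands inside (1 - \epsilon, 1 + \epsilon), even though the target dimension is independent of d.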

Intuition

A random projection of a vector x has squared norm that concentrates around \|x\|^2. This is because \|Ax\|^2 = \frac{1}{k}\sum_{j=1}^k (\pi_j^T x)^2, where \pi_j is the j-th row of \Pi and each \pi_j^T x \sim N(0, \|x\|^2). The sum of k squared Gaussians concentrates around its mean by chi-squared concentration. Taking a union bound over all \binom{n}{2} pairs gives the result.

Proof Sketch

Fix a single pair (x_i, x_j) and let v = x_i - x_j. Then \|Av\|^2 = \frac{\|v\|^2}{k} \sum_{l=1}^k Z_l^2 where the Z_l \sim N(0,1) are i.i.d. By sub-exponential concentration of the chi-squared distribution:

P\left(\left|\frac{\|Av\|^2}{\|v\|^2} - 1\right| > \epsilon\right) \leq 2\exp(-ck\epsilon^2)

for \epsilon < 1 and an absolute constant c > 0. With k = C\log n / \epsilon^2 and a union bound over \binom{n}{2} < n^2 pairs, the failure probability is at most 2n^2 \exp(-cC\log n) = 2n^{2 - cC}, which tends to 0 once C > 2/c.
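
The single-pair tail bound can be checked numerically (NumPy assumed). The sketch below uses the distributional identity \|Av\|^2/\|v\|^2 \sim \chi^2_k / k and takes c = 1/8, a standard but non-optimal choice of constant:

```python
import numpy as np

rng = np.random.default_rng(2)
k, eps, trials = 64, 0.5, 20000

# ||Av||^2 / ||v||^2 is distributed as (1/k) * chi^2_k regardless of v,
# since each row of Pi dotted with v/||v|| is a standard Gaussian.
Z = rng.standard_normal((trials, k))
ratios = np.mean(Z**2, axis=1)

empirical = np.mean(np.abs(ratios - 1) > eps)
bound = 2 * np.exp(-k * eps**2 / 8)  # c = 1/8, a standard choice
print(f"empirical tail: {empirical:.4f}  vs bound: {bound:.4f}")
```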

Why It Matters

This lemma says that n points in arbitrarily high dimension can be embedded into O(\log n) dimensions while approximately preserving all pairwise distances. The target dimension depends only on n and \epsilon, not on the original dimension d. This is the theoretical foundation for random projection methods in dimensionality reduction, approximate nearest neighbor search, and compressed sensing.

Failure Mode

The O(\log n / \epsilon^2) target dimension is tight: there exist point sets requiring \Omega(\log n / \epsilon^2) dimensions for any linear map. For very small \epsilon, the target dimension can still be large. The lemma also preserves only Euclidean distances; other metrics require different techniques.

Concentration on the Sphere

Lemma

Levy's Lemma (Concentration on the Sphere)

Statement

Let X be uniformly distributed on S^{d-1} and f: S^{d-1} \to \mathbb{R} be L-Lipschitz. Let M_f be the median of f(X). Then:

P(|f(X) - M_f| \geq t) \leq 2\exp\left(-\frac{(d-1)t^2}{2L^2}\right)

The concentration improves with dimension: higher dimension means tighter concentration.
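
A numerical illustration (NumPy assumed): the coordinate function f(x) = x_1 is 1-Lipschitz on the sphere with median 0 by symmetry, and its fluctuations shrink like 1/\sqrt{d}, matching the lemma's prediction:

```python
import numpy as np

rng = np.random.default_rng(3)
samples = 20000

# Normalizing a standard Gaussian gives a uniform point on S^{d-1}.
for d in (10, 100, 1000):
    G = rng.standard_normal((samples, d))
    U = G / np.linalg.norm(G, axis=1, keepdims=True)  # uniform on S^{d-1}
    spread = U[:, 0].std()  # fluctuation of f(x) = x_1 around its median 0
    print(f"d = {d:4d}: std of f = {spread:.4f}   (1/sqrt(d) = {1/np.sqrt(d):.4f})")
```

The observed spread tracks 1/\sqrt{d} (in fact \mathbb{E}[x_1^2] = 1/d exactly on the sphere), so the same Lipschitz measurement becomes nearly deterministic as the dimension grows.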

Intuition

On a high-dimensional sphere, most of the surface area is concentrated near any equator. A Lipschitz function cannot vary much on this concentrated strip. The effective number of "independent directions" on the sphere is d-1, which is why the exponent contains d-1.

Proof Sketch

The proof uses the spherical isoperimetric inequality: among all sets of a given measure on S^{d-1}, spherical caps have the smallest boundary. For a cap of measure 1/2 (the hemisphere), the t-enlargement has measure \geq 1 - \exp(-(d-1)t^2/2). For an L-Lipschitz function, the set \{f \leq M_f + t\} contains the t/L-enlargement of \{f \leq M_f\}.

Why It Matters

Levy's lemma makes precise the idea that high-dimensional spheres are "effectively one-dimensional" from the perspective of Lipschitz functions. Any continuous measurement on a high-dimensional sphere will give nearly the same value for almost every point. This is a key ingredient in proofs of the Johnson-Lindenstrauss lemma via spherical projections.

Failure Mode

The bound degrades for non-Lipschitz functions. For discontinuous functions or functions with large Lipschitz constant relative to the sphere's radius, the concentration is useless. Also, the exponent vanishes for d = 1, where S^0 consists of just two points and no concentration occurs.

Why High-Dimensional Geometry is Counterintuitive

Three specific phenomena that defy low-dimensional intuition:

  1. Norm concentration: if X \sim N(0, I_d), then \|X\| / \sqrt{d} \to 1 in probability. The norm is approximately deterministic.

  2. Near-orthogonality: if X, Y \sim N(0, I_d) are independent, then \langle X, Y \rangle / (\|X\|\|Y\|) has standard deviation 1/\sqrt{d}. Random vectors are nearly orthogonal.

  3. Volume far from the center: a uniform random point in the high-dimensional cube [-1, 1]^d has norm concentrating around \sqrt{d/3}, far outside the inscribed unit ball, whose volume is negligible for large d. Most of the cube's volume sits toward the corners.
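
All three phenomena are easy to observe in simulation (NumPy assumed; the dimension and sample sizes are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 1000, 5000

# 1. Norm concentration: ||X|| / sqrt(d) is nearly 1.
X = rng.standard_normal((n, d))
norms = np.linalg.norm(X, axis=1) / np.sqrt(d)
print(f"norm/sqrt(d): mean {norms.mean():.4f}, std {norms.std():.4f}")

# 2. Near-orthogonality: cosine between independent Gaussian vectors.
Y = rng.standard_normal((n, d))
cos = np.sum(X * Y, axis=1) / (np.linalg.norm(X, axis=1) * np.linalg.norm(Y, axis=1))
print(f"cosine: mean {cos.mean():.4f}, std {cos.std():.4f} (1/sqrt(d) = {1/np.sqrt(d):.4f})")

# 3. Cube volume: a uniform point in [-1,1]^d has norm near sqrt(d/3),
#    far outside the inscribed unit ball.
U = rng.uniform(-1, 1, size=(n, d))
r = np.linalg.norm(U, axis=1)
print(f"uniform-in-cube norm: mean {r.mean():.2f} (sqrt(d/3) = {np.sqrt(d/3):.2f})")
print(f"fraction inside inscribed ball: {np.mean(r <= 1):.4f}")
```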

Common Confusions

Watch Out

JL says nothing about structure preservation beyond distances

The JL lemma preserves pairwise Euclidean distances up to a (1 \pm \epsilon) factor, and nothing more. It does not preserve manifold geometry, topological properties, or any structure beyond what the pairwise distance matrix determines. For structure beyond distances, you need methods like t-SNE or UMAP that optimize different objectives.

Watch Out

Gaussian concentration dimension-free does not mean dimension irrelevant

The bound P(|f(X) - \mathbb{E}[f]| \geq t) \leq 2e^{-t^2/(2L^2)} does not contain d, but the Lipschitz constant L and the mean \mathbb{E}[f] can depend on d. For example, \|X\| is 1-Lipschitz with mean \approx \sqrt{d}. The bound says \|X\| deviates from \sqrt{d} by at most O(1), which is a relative deviation of O(1/\sqrt{d}).

Summary

  • Lipschitz functions of Gaussians are sub-Gaussian with parameter equal to the Lipschitz constant. No dependence on dimension.
  • JL: n points in \mathbb{R}^d can be projected to O(\log n / \epsilon^2) dimensions preserving distances up to (1 \pm \epsilon)
  • Levy's lemma: on S^{d-1}, concentration improves with dimension
  • High-dimensional norm concentrates: \|X\| \approx \sqrt{d} for X \sim N(0, I_d)
  • Random vectors are nearly orthogonal in high dimensions

Exercises

ExerciseCore

Problem

Let X \sim N(0, I_d). Show that f(X) = \|X\| is 1-Lipschitz and use Gaussian concentration to bound P(|\|X\| - \mathbb{E}[\|X\|]| \geq t).

ExerciseAdvanced

Problem

Prove the JL lemma for a single pair: if v \in \mathbb{R}^d and A = \frac{1}{\sqrt{k}}\Pi with \Pi having i.i.d. N(0,1) entries, show that P(|\|Av\|^2 / \|v\|^2 - 1| > \epsilon) \leq 2\exp(-ck\epsilon^2) for \epsilon \in (0, 1).

References


  • Boucheron, Lugosi, Massart, Concentration Inequalities, Chapter 5

  • Wainwright, High-Dimensional Statistics (2019), Chapter 2

  • van Handel, Probability in High Dimension (2016), Chapters 1-3

Last reviewed: April 2026
