Chi-Squared Concentration
Two-sided exponential concentration for chi-squared sums of squared standard Gaussians: P(|Z/k - 1| > t) <= 2 exp(-t^2 k / 6) for t in (0, 1/2). Drives sub-exponential tails, variance-component inference, and Lipschitz Gaussian concentration.
Why This Matters
A chi-squared random variable with $k$ degrees of freedom is a sum of $k$ independent squared standard Gaussians. Its concentration around its mean shows up everywhere: variance-component inference, goodness of fit, the squared norm of a high-dimensional Gaussian, the noise term in Hanson-Wright, and the proxy term in Lipschitz Gaussian concentration. The result quoted on this page is the one most often used as a black box in high-dimensional statistics:

$$\Pr\!\left[\left|\frac{Z}{k} - 1\right| \geq t\right] \leq 2\exp\!\left(-\frac{t^2 k}{6}\right), \qquad t \in (0, 1/2],$$

which is sub-exponential rather than sub-Gaussian: the exponent is linear in $k$ and quadratic in the relative deviation $t$, with the same $\sqrt{k}$-scaling as the central limit theorem but valid non-asymptotically. The displayed form is restricted to small relative deviations $t \leq 1/2$, the regime where the elementary inequality used in the proof is valid; for larger $t$, use the sharp Chernoff exponent directly.
Mental Model
Three orienting facts.
- Mean and variance. For $Z = \sum_{i=1}^k g_i^2$ with $g_i \sim \mathcal{N}(0,1)$ i.i.d., $\mathbb{E}[Z] = k$ and $\mathrm{Var}(Z) = 2k$. Standardized, $(Z - k)/\sqrt{2k}$ is asymptotically $\mathcal{N}(0,1)$ by the classical CLT. Sub-exponential concentration is what carries that asymptotic intuition into the finite-sample regime (see the numerical check after this list).
- Squared standard Gaussians are sub-exponential, not sub-Gaussian. A single $g^2$ has MGF $\mathbb{E}[e^{\lambda g^2}] = (1 - 2\lambda)^{-1/2}$, finite only for $\lambda < 1/2$. The MGF blows up at $\lambda = 1/2$, and that pole is exactly why the upper-tail exponent is $t^2 k / 6$ rather than the $t^2 k / 4$ that a Gaussian tail with variance $2k$ would give.
- The two tails are not symmetric for finite $k$. The lower tail is bounded by $\exp(-t^2 k / 4)$ via the same Chernoff-method argument; the upper tail is bounded by $\exp(-t^2 k / 6)$. The $1/6$ in the displayed two-sided bound is the worse constant of the two, chosen so that one statement covers both tails.
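A quick numerical sanity check of these orienting facts; a minimal sketch assuming NumPy, where the degrees of freedom $k = 50$, the seed, and the sample size are illustrative choices, not part of the result:

```python
# Empirically confirm E[Z] = k, Var(Z) = 2k, and the CLT normalization.
import numpy as np

rng = np.random.default_rng(0)
k, n = 50, 500_000
Z = rng.chisquare(k, size=n)          # n draws of a chi-squared with k dof

print(Z.mean(), k)                    # sample mean ~ k
print(Z.var(), 2 * k)                 # sample variance ~ 2k
W = (Z - k) / np.sqrt(2 * k)          # standardized version
print(W.mean(), W.std())              # ~0 and ~1, consistent with the CLT
```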
Formal Setup
Let $g_1, \dots, g_k$ be i.i.d. $\mathcal{N}(0,1)$ and define $Z = \sum_{i=1}^k g_i^2 \sim \chi^2_k$. The moment generating function of $Z$ is

$$\mathbb{E}\!\left[e^{\lambda Z}\right] = (1 - 2\lambda)^{-k/2}, \qquad \lambda < \tfrac{1}{2},$$

obtained by raising the per-summand MGF $\mathbb{E}[e^{\lambda g_1^2}] = (1 - 2\lambda)^{-1/2}$ to the $k$-th power.
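A minimal Monte Carlo check of the MGF formula, assuming NumPy; the values of $k$, $\lambda$, and the sample size are illustrative, and $\lambda$ is kept well below $1/2$ because the estimator's variance explodes near the pole:

```python
# Compare the empirical mean of exp(lambda * Z) against (1 - 2*lambda)^(-k/2).
import numpy as np

rng = np.random.default_rng(0)
k, n = 5, 1_000_000
Z = (rng.standard_normal((n, k)) ** 2).sum(axis=1)  # Z = sum of k squared Gaussians

for lam in (0.05, 0.1, 0.2):
    empirical = np.exp(lam * Z).mean()
    closed_form = (1.0 - 2.0 * lam) ** (-k / 2)
    print(f"lambda={lam:.2f}  monte carlo={empirical:.4f}  formula={closed_form:.4f}")
```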
Chi-Squared Upper Tail
Statement
For every $t \in (0, 1/2]$,

$$\Pr[Z \geq (1 + t) k] \leq \exp\!\left(-\frac{t^2 k}{6}\right).$$
Exact statement
LaTeX source for copy/export
\Pr[Z \geq (1 + t) k] \leq \exp\!\left(-\frac{t^2 k}{6}\right)
Proof Sketch
Step 1: Chernoff bound. For $\lambda \in (0, 1/2)$,

$$\Pr[Z \geq (1 + t) k] \leq e^{-\lambda (1 + t) k}\, \mathbb{E}\!\left[e^{\lambda Z}\right] = e^{-\lambda (1 + t) k} (1 - 2\lambda)^{-k/2}.$$

Step 2: optimize. Differentiating in $\lambda$ and setting to zero gives $\lambda^* = \frac{t}{2(1 + t)}$. Substituting back yields the sharp Chernoff exponent

$$\Pr[Z \geq (1 + t) k] \leq \exp\!\left(-\frac{k}{2}\big(t - \ln(1 + t)\big)\right).$$

Step 3: simplify on $(0, 1/2]$. The Taylor expansion $t - \ln(1 + t) = \frac{t^2}{2} - \frac{t^3}{3} + \frac{t^4}{4} - \cdots$ is an alternating series with strictly decreasing terms for $t \in (0, 1)$, so truncating after two terms gives the lower bound

$$t - \ln(1 + t) \geq \frac{t^2}{2} - \frac{t^3}{3}.$$

For $t \leq 1/2$, $\frac{t^3}{3} \leq \frac{t^2}{6}$ (since $t \leq 1/2$ means $t^3 \leq \frac{t^2}{2}$), so

$$t - \ln(1 + t) \geq \frac{t^2}{2} - \frac{t^2}{6} = \frac{t^2}{3}.$$

Plugging $t - \ln(1 + t) \geq \frac{t^2}{3}$ into the sharp exponent $\frac{k}{2}\big(t - \ln(1 + t)\big)$ gives the displayed bound.
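The optimization in Step 2 can be checked numerically; a sketch with an illustrative $(k, t)$ pair and grid resolution, not part of the proof:

```python
# Minimize the log Chernoff bound over lambda and compare with the closed form.
import numpy as np

k, t = 50, 0.3
lams = np.linspace(1e-4, 0.499, 100_000)
log_bound = -lams * (1 + t) * k - (k / 2) * np.log1p(-2 * lams)  # log of Step 1 bound
numeric_exponent = -log_bound.min()            # best exponent found on the grid
sharp_exponent = (k / 2) * (t - np.log1p(t))   # (k/2)(t - ln(1+t)) from Step 2
print(numeric_exponent, sharp_exponent)        # agree up to grid resolution
```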
Why It Matters
The sharp Chernoff exponent $\frac{k}{2}\big(t - \ln(1 + t)\big)$ is the Cramér rate function for the gamma distribution and is what large-deviations theory delivers in the $k \to \infty$ limit. The simplified form $\exp(-t^2 k / 6)$ on $(0, 1/2]$ is what the rest of high-dimensional statistics uses as a black box for small-deviation bounds.
Failure Mode
The simplification $t - \ln(1 + t) \geq \frac{t^2}{3}$ holds on $(0, 1/2]$ via the Taylor argument above and continues to hold by direct computation up to $t \approx 0.787$, but fails for larger $t$ (e.g., at $t = 1$, $1 - \ln 2 \approx 0.307 < \frac{1}{3}$). For $t$ outside the clean small-deviation regime, use the sharp Chernoff exponent directly, or pass to the linear-in-$t$ regime characteristic of sub-exponential tails.
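The cutoff quoted above can be located by a direct scan; a small deterministic sketch, with the grid resolution as the only tuning choice:

```python
# Find the first t where t - log(1+t) >= t^2/3 fails, and confirm failure at t = 1.
import numpy as np

t = np.linspace(1e-6, 2.0, 2_000_000)
gap = (t - np.log1p(t)) - t**2 / 3
print(t[np.argmax(gap < 0)])        # first failing grid point, ~0.787
print(1 - np.log(2) >= 1 / 3)       # False: the inequality fails at t = 1
```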
Chi-Squared Lower Tail
Statement
For every $t \in (0, 1)$,

$$\Pr[Z \leq (1 - t) k] \leq \exp\!\left(-\frac{t^2 k}{4}\right).$$
Exact statement
LaTeX source for copy/export
\Pr[Z \leq (1 - t) k] \leq \exp\!\left(-\frac{t^2 k}{4}\right)
Proof Sketch
Apply the Chernoff method with a negative tilt (i.e., bound $\Pr[Z \leq (1 - t) k] = \Pr\!\left[e^{-\mu Z} \geq e^{-\mu (1 - t) k}\right]$ for $\mu > 0$ and use Markov's inequality on $e^{-\mu Z}$):

$$\Pr[Z \leq (1 - t) k] \leq e^{\mu (1 - t) k}\, \mathbb{E}\!\left[e^{-\mu Z}\right] = e^{\mu (1 - t) k} (1 + 2\mu)^{-k/2}.$$

Optimizing over $\mu$ gives $\mu^* = \frac{t}{2(1 - t)}$ and the sharp exponent $\frac{k}{2}\big(-t - \ln(1 - t)\big)$, which on $(0, 1)$ satisfies $-t - \ln(1 - t) \geq \frac{t^2}{2}$. This last inequality is verified directly from the Taylor series $-\ln(1 - t) = t + \frac{t^2}{2} + \frac{t^3}{3} + \cdots$, whose terms beyond the second are nonnegative. Substituting yields the displayed exponent $\frac{t^2 k}{4}$.
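The elementary inequality doing the work here can be checked on a dense grid; a minimal deterministic sketch:

```python
# Verify -t - log(1-t) >= t^2/2 on (0, 1), the step yielding the t^2 k / 4 exponent.
import numpy as np

t = np.linspace(1e-6, 1 - 1e-6, 1_000_000)
assert np.all(-t - np.log1p(-t) >= t**2 / 2)  # holds everywhere on the grid
print("lower-tail inequality verified on the grid")
```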
Why It Matters
The lower-tail constant $\frac{1}{4}$ is sharper than the upper-tail $\frac{1}{6}$. The asymmetry is real: deviating below the mean is harder than deviating above, because squared Gaussians have nonnegative support and the upper tail can absorb mass from very large values of $g_i^2$, while the lower tail is cut off at $Z = 0$.
Failure Mode
The lower-tail bound is restricted to $t \in (0, 1)$ because the threshold $(1 - t) k$ must remain positive. As $t \to 1$ the displayed exponent saturates at $\frac{k}{4}$, while the sharp Chernoff exponent $\frac{k}{2}\big(-t - \ln(1 - t)\big)$ diverges and tracks the rate, on the order of $(1 - t)^{k/2}$, at which the chi-squared mass approaches the origin.
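The saturation is visible numerically; a small sketch with illustrative $k$ and $t$ values:

```python
# Near t = 1 the simplified exponent t^2 k / 4 plateaus while the sharp one diverges.
import numpy as np

k = 10
for t in (0.9, 0.99, 0.999):
    simplified = t**2 * k / 4
    sharp = (k / 2) * (-t - np.log1p(-t))   # (k/2)(-t - ln(1-t))
    print(f"t={t}: simplified={simplified:.2f}  sharp={sharp:.2f}")
```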
Chi-Squared Two-Sided Bound
Statement
For every $t \in (0, 1/2]$,

$$\Pr\!\left[\left|\frac{Z}{k} - 1\right| \geq t\right] \leq 2 \exp\!\left(-\frac{t^2 k}{6}\right).$$
Exact statement
LaTeX source for copy/export
\Pr\!\left[\left|Z/k - 1\right| \geq t\right] \leq 2 \exp\!\left(-\frac{t^2 k}{6}\right)
Intuition
Combining the upper tail $\exp(-t^2 k / 6)$ on $(0, 1/2]$ and the lower tail $\exp(-t^2 k / 4)$ on $(0, 1)$ via a union bound gives the two-sided statement on the common range $(0, 1/2]$, with the worse constant $\frac{1}{6}$ appearing in the exponent. The bound is sharp up to constants and matches the variance scaling $\mathrm{Var}(Z/k) = 2/k$.
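A Monte Carlo comparison of the empirical two-sided tail against the bound; a minimal sketch assuming NumPy, with illustrative $k$, seed, sample size, and deviation levels:

```python
# Empirical P(|Z/k - 1| >= t) should sit below 2 exp(-t^2 k / 6) on (0, 1/2].
import numpy as np

rng = np.random.default_rng(1)
k, n = 20, 1_000_000
Z = rng.chisquare(k, size=n)

for t in (0.1, 0.25, 0.5):
    empirical = np.mean(np.abs(Z / k - 1) >= t)
    bound = 2 * np.exp(-t**2 * k / 6)
    print(f"t={t}: empirical={empirical:.4f}  bound={bound:.4f}")
```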
Why It Matters
This is the single statement that carries chi-squared concentration into applications. It bounds the squared norm of a Gaussian vector, the variance estimate for normal data, and the noise residual after a projection.
Failure Mode
The two-sided constant $\frac{1}{6}$ is not optimal in either tail individually: the upper tail holds with $\frac{1}{6}$ in the form $\exp(-t^2 k / 6)$ on $(0, 1/2]$, and the lower tail holds with the sharper $\frac{1}{4}$ on $(0, 1)$. When the asymmetry matters (for example in sharp variance-estimation analyses) the two-sided bound is too crude. For deviations beyond $t = 1/2$ use the sharp Chernoff exponent on each tail separately.
Common Confusions
Chi-squared is sub-exponential, not sub-Gaussian
A standard Gaussian $g$ satisfies $\mathbb{E}\!\left[e^{\lambda g}\right] = e^{\lambda^2 / 2}$ for every $\lambda \in \mathbb{R}$. A squared standard Gaussian satisfies $\mathbb{E}\!\left[e^{\lambda g^2}\right] = (1 - 2\lambda)^{-1/2}$ only on $\lambda < \frac{1}{2}$. The MGF blows up at the boundary $\lambda = \frac{1}{2}$, which is the defining signature of a sub-exponential variable. Tail bounds that are quadratic in $t$ for small $t$ become linear in $t$ for large $t$.
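The blow-up at the pole is easy to see directly from the closed form; a two-line sketch with illustrative $\lambda$ values:

```python
# The squared-Gaussian MGF (1 - 2*lambda)^(-1/2) diverges as lambda -> 1/2.
for lam in (0.25, 0.4, 0.49, 0.499):
    print(f"lambda={lam}: MGF={(1 - 2 * lam) ** -0.5:.2f}")
# At lambda = 0.5 the defining integral itself diverges: no finite MGF exists there.
```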
The constants 1/3, 1/4, 1/6 are not arbitrary
Each constant traces back to a specific elementary inequality. The upper-tail $\frac{1}{3}$ comes from $t - \ln(1 + t) \geq \frac{t^2}{3}$ on $(0, 1/2]$ (via the Taylor truncation $t - \ln(1 + t) \geq \frac{t^2}{2} - \frac{t^3}{3}$ combined with $\frac{t^3}{3} \leq \frac{t^2}{6}$ when $t \leq \frac{1}{2}$). The lower-tail $\frac{1}{4}$ comes from $-t - \ln(1 - t) \geq \frac{t^2}{2}$ on $(0, 1)$. The two-sided $\frac{1}{6}$ is the worse upper-tail $\frac{1}{3}$ after absorbing the factor of $\frac{1}{2}$ from the cumulant-function prefactor $\frac{k}{2}$. The upper-tail inequality does not hold on all of $(0, \infty)$; it fails beyond $t \approx 0.787$, so the simplified form is restricted to $t \leq \frac{1}{2}$.
Independence of the squared variables, not of the original Gaussians
The chi-squared MGF factorizes because the $g_i$ are i.i.d., and the squares inherit independence. If the underlying Gaussians are correlated (e.g., a quadratic form $g^\top A g$ for non-diagonal $A$), the MGF is more complicated and the relevant concentration result is Hanson-Wright, not the displayed chi-squared bound.
Exercises
Problem
Verify the elementary inequality $t - \ln(1 + t) \geq \frac{t^2}{3}$ on $(0, 1/2]$ used in the upper-tail proof. Then exhibit a value of $t$ where the inequality fails, to show that the natural extension to larger $t$ is incorrect.
Problem
Let $g$ be a centered isotropic Gaussian in $\mathbb{R}^d$ (i.e., $g \sim \mathcal{N}(0, I_d)$). Use the chi-squared two-sided bound to prove that for every $t \in (0, 1/2]$,

$$\Pr\!\left[\left|\frac{\|g\|_2^2}{d} - 1\right| \geq t\right] \leq 2 \exp\!\left(-\frac{t^2 d}{6}\right).$$

Then translate this into a bound on $\|g\|_2$ itself.
References
Canonical:
- Laurent, B., & Massart, P. (2000). "Adaptive estimation of a quadratic functional by model selection." Annals of Statistics, 28(5), 1302-1338. The classical sharp form is Lemma 1; the displayed form follows by a change of variable.
- Boucheron, S., Lugosi, G., & Massart, P. (2013). Concentration Inequalities. Oxford University Press. Section 2.4 develops the chi-squared bound from the MGF and Section 5.1 puts it inside the gamma family.
- Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding Machine Learning. Cambridge University Press. Lemma B.12 in Appendix B states the displayed two-sided form.
Current:
- Wainwright, M. J. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press. Example 2.4 in Section 2.2 derives the chi-squared MGF and identifies $\chi^2_k$ as the canonical sub-exponential variable with parameters $(2\sqrt{k}, 4)$.
- Vershynin, R. (2018). High-Dimensional Probability. Cambridge University Press. Theorem 3.1.1 (concentration of the norm) gives the high-dimensional vector form.
- van Handel, R. (2016). Probability in High Dimension. Lecture notes, Princeton. Chapter 3 derives the chi-squared two-sided bound directly from the gamma MGF.
Next Topics
- Sub-exponential random variables: the abstract framework around the chi-squared MGF, where the squared Gaussian is the canonical example
- Hanson-Wright inequality: the generalization to quadratic forms $x^\top A x$ for sub-Gaussian $x$ and arbitrary $A$
- Bernstein's inequality: the variance-aware scalar cousin used when the summands are bounded rather than Gaussian
Last reviewed: May 8, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites (4)
- Common Probability Distributions (layer 0A · tier 1)
- Chernoff Bounds (layer 1 · tier 1)
- Concentration Inequalities (layer 1 · tier 1)
- Moment Generating Functions (layer 0A · tier 2)
Derived topics (0)
No published topic currently declares this as a prerequisite.