

Chi-Squared Concentration

Two-sided exponential concentration for chi-squared sums of squared standard Gaussians: P(|Z/k - 1| > t) <= 2 exp(-t^2 k / 6) for t in (0, 1/2). Drives sub-exponential tails, variance-component inference, and Lipschitz Gaussian concentration.

Advanced · Tier 1 · Stable · Supporting · ~25 min

Why This Matters

A chi-squared random variable with $k$ degrees of freedom is a sum of $k$ independent squared standard Gaussians. Its concentration around its mean $k$ shows up everywhere: variance-component inference, $\chi^2$ goodness of fit, the squared norm of a high-dimensional Gaussian, the noise term in Hanson-Wright, and the proxy term in Lipschitz Gaussian concentration. The result quoted on this page is the one most often used as a black box in high-dimensional statistics:

\Pr\!\left[\left|\tfrac{Z}{k} - 1\right| > t\right] \leq 2 \exp\!\left(-\frac{t^2 k}{6}\right) \qquad \text{for } t \in (0, 1/2),

which is sub-exponential rather than sub-Gaussian: the exponent is linear in $k$ and quadratic in the relative deviation $t$, with the same $k$-scaling as the central limit theorem but valid non-asymptotically. The displayed form is restricted to small relative deviations $t \in (0, 1/2)$, the regime where the elementary inequality $t - \log(1 + t) \geq t^2/3$ used in the proof is valid; for larger $t$, use the sharp Chernoff exponent $(k/2)(t - \log(1 + t))$ directly.
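
As a concrete sanity check, the sketch below (assuming scipy is available; the values of $k$ and $t$ are illustrative) compares the black-box bound against the exact two-sided tail probability:

```python
# Compare the two-sided bound 2*exp(-t^2 k / 6) with the exact
# tail probability P(|Z/k - 1| > t) for Z ~ chi^2_k.
import numpy as np
from scipy.stats import chi2

k = 100
for t in (0.05, 0.1, 0.2, 0.4):
    exact = chi2.sf((1 + t) * k, df=k) + chi2.cdf((1 - t) * k, df=k)
    bound = 2 * np.exp(-t**2 * k / 6)
    print(f"t={t:.2f}  exact={exact:.3e}  bound={bound:.3e}")
```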

Mental Model

Three orienting facts.

  1. Mean and variance. For $Z = \sum_{i=1}^k X_i^2$ with $X_i \sim \mathcal{N}(0, 1)$ i.i.d., $\mathbb{E}[Z] = k$ and $\mathrm{Var}(Z) = 2k$. Standardized, $(Z - k)/\sqrt{2k}$ is asymptotically $\mathcal{N}(0, 1)$ by the classical CLT. Sub-exponential concentration is what carries that asymptotic intuition into the finite-sample regime (see the Monte Carlo sketch after this list).
  2. A squared standard Gaussian is sub-exponential, not sub-Gaussian. A single $X_i^2$ has MGF $\mathbb{E}[e^{\lambda X_i^2}] = (1 - 2\lambda)^{-1/2}$ only for $\lambda < 1/2$. The MGF blows up at $\lambda = 1/2$, and that pole is exactly why the upper-tail constant is $1/6$ rather than the $1/2$ that a Gaussian tail would give.
  3. The two tails are not symmetric for arbitrary $t$. The lower tail is bounded by $\exp(-t^2 k / 4)$ via the same Chernoff-method argument; the upper tail is bounded by $\exp(-t^2 k / 6)$. The two-sided $1/6$ in the displayed bound is the worse constant of the two, chosen so that one statement covers both tails.
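
A minimal Monte Carlo sketch of fact 1 (the seed, $k$, and sample size are arbitrary choices):

```python
# Empirically check E[Z] = k, Var(Z) = 2k, and the CLT normalization.
import numpy as np

rng = np.random.default_rng(0)
k, n = 50, 200_000
Z = (rng.standard_normal((n, k)) ** 2).sum(axis=1)  # n draws of chi^2_k

print(Z.mean(), k)                       # sample mean vs. k
print(Z.var(), 2 * k)                    # sample variance vs. 2k
print(((Z - k) / np.sqrt(2 * k)).std())  # ~1, matching the CLT scaling
```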

Formal Setup

Let $X_1, \ldots, X_k$ be i.i.d. $\mathcal{N}(0, 1)$ and define $Z = \sum_{i=1}^k X_i^2 \sim \chi^2_k$. The cumulant generating function of $Z$ is

\Lambda_Z(\lambda) = -\tfrac{k}{2} \log(1 - 2\lambda), \qquad \lambda < 1/2,

obtained by raising the per-summand MGF $(1 - 2\lambda)^{-1/2}$ to the $k$-th power.
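
The closed form can be checked against a direct Monte Carlo estimate of $\log \mathbb{E}[e^{\lambda Z}]$; in the sketch below, $k$, $\lambda$, and the sample size are illustrative assumptions:

```python
# Verify Lambda_Z(lambda) = -(k/2) * log(1 - 2*lambda) by simulation.
import numpy as np

rng = np.random.default_rng(1)
k, lam, n = 5, 0.2, 1_000_000
Z = rng.chisquare(k, size=n)

mc = np.log(np.mean(np.exp(lam * Z)))
exact = -(k / 2) * np.log(1 - 2 * lam)
print(mc, exact)  # the two should agree to roughly two decimals
```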

Theorem

Chi-Squared Upper Tail

Statement

For every $t \in (0, 1/2)$,

\Pr[Z \geq (1 + t) k] \leq \exp\!\left(-\frac{t^2 k}{6}\right).


Proof Sketch

Step 1: Chernoff bound for $Z$. For $\lambda \in (0, 1/2)$,

\Pr[Z \geq (1 + t) k] \leq \mathbb{E}[e^{\lambda Z}]\, e^{-\lambda (1 + t) k} = \exp\!\left(-\tfrac{k}{2} \log(1 - 2\lambda) - \lambda (1 + t) k\right).

Step 2: optimize. Differentiating in $\lambda$ and setting to zero gives $\lambda^* = t/(2(1 + t)) \in (0, 1/2)$. Substituting back yields the sharp Chernoff exponent

\Pr[Z \geq (1 + t) k] \leq \exp\!\left(-\tfrac{k}{2}\bigl(t - \log(1 + t)\bigr)\right).

Step 3: simplify on $(0, 1/2)$. The Taylor expansion $t - \log(1 + t) = t^2/2 - t^3/3 + t^4/4 - \cdots$ is an alternating series with strictly decreasing terms for $t \in (0, 1)$, so truncating after two terms gives the lower bound

t - \log(1 + t) \geq \tfrac{t^2}{2} - \tfrac{t^3}{3}.

For $t \in (0, 1/2)$, $t^3/3 \leq t^2/6$ (since $t \leq 1/2$ implies $t^3 \leq t^2/2$), so

t - \log(1 + t) \geq \tfrac{t^2}{2} - \tfrac{t^2}{6} = \tfrac{t^2}{3}.

Plugging into the sharp exponent gives the displayed $t^2 k / 6$ bound.
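
The whole chain can be verified numerically; the sketch below (grid resolution is an arbitrary choice) checks that the sharp per-degree-of-freedom exponent dominates the simplified one on $(0, 1/2)$:

```python
# Steps 2-3 check: (1/2)(t - log(1+t)) >= t^2/6 on (0, 1/2), so
# exp(-(k/2)(t - log(1+t))) <= exp(-t^2 k / 6) on that interval.
import numpy as np

t = np.linspace(1e-6, 0.5, 100_000)
sharp = 0.5 * (t - np.log1p(t))  # sharp Chernoff exponent per degree of freedom
simple = t**2 / 6                # simplified exponent per degree of freedom
print(bool(np.all(sharp >= simple)))  # True
```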

Why It Matters

The sharp Chernoff exponent $(k/2)(t - \log(1 + t))$ is the Cramér rate function for the gamma distribution and is what large-deviations theory delivers in the $k \to \infty$ limit. The simplified $t^2 k / 6$ form on $(0, 1/2)$ is what the rest of high-dimensional statistics uses as a black box for small-deviation bounds.

Failure Mode

The simplification $t - \log(1 + t) \geq t^2/3$ holds on $(0, 1/2)$ via the Taylor argument above and continues to hold by direct computation up to $t \approx 0.787$, but fails for larger $t$ (e.g., at $t = 1$, $1 - \log 2 \approx 0.307 < 1/3 \approx 0.333$). For $t$ outside the clean small-deviation regime, use the sharp Chernoff exponent $(k/2)(t - \log(1 + t))$ directly, or pass to the linear-in-$t$ regime characteristic of sub-exponential tails.
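
Both claims (the $t \approx 0.787$ crossover and the failure at $t = 1$) are one-liners to confirm; the sketch below assumes scipy and brackets the sign change on $[0.5, 1]$:

```python
# Find where t - log(1+t) - t^2/3 changes sign, confirming t ~ 0.787.
from math import log
from scipy.optimize import brentq

f = lambda t: t - log(1 + t) - t**2 / 3
print(brentq(f, 0.5, 1.0))  # ~= 0.787
print(f(1.0))               # negative: the inequality fails at t = 1
```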

Theorem

Chi-Squared Lower Tail

Statement

For every $t \in (0, 1)$,

\Pr[Z \leq (1 - t) k] \leq \exp\!\left(-\frac{t^2 k}{4}\right).


Proof Sketch

Apply the Chernoff method with $\lambda < 0$ (i.e., bound $\mathbb{E}[e^{-\mu Z}]$ for $\mu > 0$ and apply Markov's inequality to $e^{-\mu Z}$):

\Pr[Z \leq (1 - t) k] \leq \mathbb{E}[e^{-\mu Z}]\, e^{\mu (1 - t) k} = \exp\!\left(-\tfrac{k}{2} \log(1 + 2\mu) + \mu (1 - t) k\right).

Optimizing over $\mu > 0$ gives $\mu^* = t/(2(1 - t))$ and the sharp exponent $\tfrac{k}{2}\bigl(-\log(1 - t) - t\bigr)$, i.e., $\Pr[Z \leq (1 - t) k] \leq \exp\!\left(-\tfrac{k}{2}\bigl(-\log(1 - t) - t\bigr)\right)$. On $t \in (0, 1)$ the Taylor series $-\log(1 - t) = t + t^2/2 + t^3/3 + \cdots$ gives $-\log(1 - t) - t \geq t^2/2$ directly. Substituting yields the displayed $t^2 k / 4$ exponent.
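
As with the upper tail, the bound can be compared against the exact lower-tail probability (a sketch assuming scipy; $k$ and the $t$ grid are illustrative). The widening gap as $t \to 1$ previews the failure mode below:

```python
# Compare exp(-t^2 k / 4) with the exact P(Z <= (1 - t) k) for Z ~ chi^2_k.
import numpy as np
from scipy.stats import chi2

k = 100
for t in (0.1, 0.3, 0.6, 0.9):
    exact = chi2.cdf((1 - t) * k, df=k)
    bound = np.exp(-t**2 * k / 4)
    print(f"t={t:.1f}  exact={exact:.3e}  bound={bound:.3e}")
```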

Why It Matters

The lower-tail constant $1/4$ is sharper than the upper-tail $1/6$. The asymmetry is real: deviating below the mean is harder than deviating above, because squared Gaussians have nonnegative support and the upper tail can absorb mass from very large $|X_i|$, while the lower tail is cut off at $0$.

Failure Mode

The lower-tail bound is restricted to $t \in (0, 1)$ because $(1 - t) k$ must remain positive. As $t \to 1^-$ the simplified bound saturates at $\exp(-k/4)$, while the true probability $\Pr[Z \leq (1 - t) k]$ decays like $(1 - t)^{k/2}$; the sharp Chernoff exponent $\tfrac{k}{2}\bigl(-\log(1 - t) - t\bigr)$ is needed to track the rate at which the chi-squared mass approaches the origin.

Theorem

Chi-Squared Two-Sided Bound

Statement

For every $t \in (0, 1/2)$,

\Pr\!\left[\left|\tfrac{Z}{k} - 1\right| \geq t\right] \leq 2 \exp\!\left(-\frac{t^2 k}{6}\right).


Intuition

Combining the upper-tail bound $\exp(-t^2 k / 6)$ on $(0, 1/2)$ and the lower-tail bound $\exp(-t^2 k / 4)$ on $(0, 1)$ via a union bound gives $2\exp(-t^2 k / 6)$ on the common range $(0, 1/2)$, with the worse constant $1/6$ appearing in the exponent. The bound is sharp up to constants and matches the variance scaling $\mathrm{Var}(Z/k) = 2/k$.
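
A quick Monte Carlo check of the $\mathrm{Var}(Z/k) = 2/k$ scaling (seed and sample size are arbitrary choices):

```python
# Check Var(Z/k) = 2/k across several degrees of freedom.
import numpy as np

rng = np.random.default_rng(2)
for k in (10, 100, 1000):
    Z = rng.chisquare(k, size=200_000)
    print(k, float(np.var(Z / k)), 2 / k)
```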

Why It Matters

This is the single statement that carries chi-squared concentration into applications. It bounds the squared norm of a Gaussian vector, the variance estimate for normal data, and the noise residual after a projection.

Failure Mode

The constant $1/6$ is not optimal in either tail individually: the upper tail carries the constant $1/3$ in the $t - \log(1 + t) \geq t^2/3$ form on $(0, 1/2)$, and the lower tail carries $1/2$ on $(0, 1)$. When the asymmetry matters (for example in sharp variance-estimation analyses), the two-sided bound is too crude. For deviations beyond $t = 1/2$, use the sharp Chernoff exponent on each tail separately.

Common Confusions

Watch Out

Chi-squared is sub-exponential, not sub-Gaussian

A standard Gaussian satisfies $\mathbb{E}[e^{\lambda X}] = e^{\lambda^2/2}$ for every $\lambda$. A squared standard Gaussian satisfies $\mathbb{E}[e^{\lambda X^2}] = (1 - 2\lambda)^{-1/2}$ only for $\lambda < 1/2$. The MGF blows up at the boundary $\lambda = 1/2$, the hallmark of a sub-exponential variable. Tail bounds that are quadratic in $t$ for small $t$ become linear in $t$ for large $t$.
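
The contrast is easy to see numerically; the sketch below evaluates both closed-form MGFs as $\lambda$ approaches the pole at $1/2$:

```python
# The sub-exponential MGF (1 - 2*lambda)^(-1/2) diverges as lambda -> 1/2,
# while the Gaussian MGF exp(lambda^2 / 2) stays bounded on any finite range.
import numpy as np

for lam in (0.40, 0.45, 0.49, 0.499):
    mgf_x2 = (1 - 2 * lam) ** (-0.5)  # E[exp(lam * X^2)], X ~ N(0, 1)
    mgf_x = np.exp(lam**2 / 2)        # E[exp(lam * X)]
    print(f"lam={lam:.3f}  E[e^(lam X^2)]={mgf_x2:8.2f}  E[e^(lam X)]={mgf_x:.3f}")
```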

Watch Out

The constants 1/3, 1/4, 1/6 are not arbitrary

Each constant traces back to a specific elementary inequality. The upper-tail $1/3$ comes from $t - \log(1 + t) \geq t^2/3$ on $(0, 1/2)$, via the Taylor truncation $t - \log(1 + t) \geq t^2/2 - t^3/3$ combined with $t^3/3 \leq t^2/6$ when $t \leq 1/2$. The lower-tail $1/2$ comes from $-\log(1 - t) - t \geq t^2/2$ on $(0, 1)$; multiplied by the prefactor $k/2$ from the cumulant generating function, it gives the $t^2 k/4$ exponent. The two-sided $1/6$ is the worse upper-tail $1/3$ times the same prefactor: $(k/2)(t^2/3) = t^2 k/6$. The upper-tail inequality $t - \log(1 + t) \geq t^2/3$ does not hold on all of $(0, 3)$; it fails near $t = 1$, so the simplified $t^2 k/6$ form is restricted to $(0, 1/2)$.

Watch Out

Independence of the squared variables, not of the original Gaussians

The chi-squared MGF factorizes because the $X_i$ are i.i.d., and the squares $X_i^2$ inherit independence. If the underlying Gaussians are correlated (e.g., a quadratic form $X^\top A X$ for non-diagonal $A$), the MGF is more complicated and the relevant concentration result is Hanson-Wright, not the displayed chi-squared bound.

Exercises

ExerciseCore

Problem

Verify the elementary inequality $t - \log(1 + t) \geq t^2/3$ on $t \in (0, 1/2)$ used in the upper-tail proof. Then exhibit a value of $t \in (1/2, 3)$ where the inequality fails, showing that the natural extension to larger $t$ is incorrect.

ExerciseAdvanced

Problem

Let $X = (X_1, \ldots, X_k)$ be a centered isotropic Gaussian in $\mathbb{R}^k$ (i.e., $\mathbb{E}[X X^\top] = I_k$). Use the chi-squared two-sided bound to prove that for every $\eta \in (0, 1/2)$,

\Pr\!\left[\left|\,\|X\|^2 - k\,\right| > \eta k\right] \leq 2 \exp(-\eta^2 k / 6).

Then translate this into a bound on $\|X\|$ itself.

References

Canonical:

  • Laurent, B., & Massart, P. (2000). "Adaptive estimation of a quadratic functional by model selection." Annals of Statistics, 28(5), 1302-1338. The classical sharp form $\Pr[Z \geq k + 2\sqrt{kt} + 2t] \leq e^{-t}$ is Lemma 1; the displayed $t^2 k/6$ form follows by a change of variable.
  • Boucheron, S., Lugosi, G., & Massart, P. (2013). Concentration Inequalities. Oxford University Press. Section 2.4 develops the chi-squared bound from the MGF and Section 5.1 puts it inside the gamma family.
  • Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding Machine Learning. Cambridge University Press. Lemma B.12 in Appendix B states the displayed two-sided form.

Current:

  • Wainwright, M. J. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press. Example 2.4 in Section 2.2 derives the chi-squared MGF and identifies $\chi^2_k$ as the canonical sub-exponential variable with parameters $(\nu^2, b) = (4k, 4)$.
  • Vershynin, R. (2018). High-Dimensional Probability. Cambridge University Press. Theorem 3.1.1 (concentration of the norm) gives the high-dimensional vector form.
  • van Handel, R. (2016). Probability in High Dimension. Lecture notes, Princeton. Chapter 3 derives the chi-squared two-sided bound directly from the gamma MGF.

Last reviewed: May 8, 2026
