Chi-Squared Concentration
Two-sided exponential concentration for chi-squared sums of squared standard Gaussians: P(|Z/k - 1| > t) <= 2 exp(-t^2 k / 6) for t in (0, 1/2). Drives sub-exponential tails, variance-component inference, and Lipschitz Gaussian concentration.
Why This Matters
A chi-squared random variable with $k$ degrees of freedom is a sum of $k$ independent squared standard Gaussians. Its concentration around its mean shows up everywhere: variance-component inference, goodness of fit, the squared norm of a high-dimensional Gaussian, the noise term in Hanson-Wright, and the proxy term in Lipschitz Gaussian concentration. The result quoted on this page is the one most often used as a black box in high-dimensional statistics:

$$\Pr\!\left[\left|\frac{Z}{k} - 1\right| \geq t\right] \leq 2\exp\!\left(-\frac{t^2 k}{6}\right), \qquad t \in (0, 1/2],$$

which is sub-exponential rather than sub-Gaussian: the exponent is linear in $k$ and quadratic in the relative deviation $t$, with the same $\sqrt{k}$-scaling as the central limit theorem but valid non-asymptotically. The displayed form is restricted to small relative deviations $t \leq 1/2$, the regime where the elementary inequality used in the proof is valid; for larger $t$, use the sharp Chernoff exponent directly.
Mental Model
Three orienting facts.
- Mean and variance. For $Z = \sum_{i=1}^k g_i^2$ with $g_i \sim \mathcal{N}(0,1)$ i.i.d., $\mathbb{E}[Z] = k$ and $\mathrm{Var}(Z) = 2k$. Standardized, $(Z - k)/\sqrt{2k}$ is asymptotically $\mathcal{N}(0,1)$ by the classical CLT. Sub-exponential concentration is what carries that asymptotic intuition into the finite-sample regime (see the numerical check after this list).
- Squared standard Gaussians are sub-exponential, not sub-Gaussian. A single $g^2$ has MGF $\mathbb{E}[e^{\lambda g^2}] = (1 - 2\lambda)^{-1/2}$, finite only for $\lambda < 1/2$. The MGF blows up at $\lambda = 1/2$, and that pole is exactly why the upper-tail exponent is $t^2 k / 6$ rather than the $t^2 k / 4$ that a Gaussian tail with variance $2k$ would give.
- The two tails are not symmetric for finite $k$. The lower tail is bounded by $\exp(-t^2 k / 4)$ via the same Chernoff-method argument; the upper tail is bounded by $\exp(-t^2 k / 6)$. The $1/6$ in the displayed two-sided bound is the worse constant of the two, chosen so that one statement covers both tails.
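A quick numerical sanity check of these orienting facts; a minimal sketch assuming NumPy, where the degrees of freedom $k = 50$, the seed, and the sample size are illustrative choices, not part of the result:

```python
# Empirically confirm E[Z] = k, Var(Z) = 2k, and the CLT normalization.
import numpy as np

rng = np.random.default_rng(0)
k, n = 50, 500_000
Z = rng.chisquare(k, size=n)          # n draws of a chi-squared with k dof

print(Z.mean(), k)                    # sample mean ~ k
print(Z.var(), 2 * k)                 # sample variance ~ 2k
W = (Z - k) / np.sqrt(2 * k)          # standardized version
print(W.mean(), W.std())              # ~0 and ~1, consistent with the CLT
```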
Formal Setup
Let $g_1, \dots, g_k$ be i.i.d. $\mathcal{N}(0,1)$ and define $Z = \sum_{i=1}^k g_i^2 \sim \chi^2_k$. The moment generating function of $Z$ is

$$\mathbb{E}\!\left[e^{\lambda Z}\right] = (1 - 2\lambda)^{-k/2}, \qquad \lambda < \tfrac{1}{2},$$

obtained by raising the per-summand MGF $\mathbb{E}[e^{\lambda g_1^2}] = (1 - 2\lambda)^{-1/2}$ to the $k$-th power.
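A minimal Monte Carlo check of the MGF formula, assuming NumPy; the values of $k$, $\lambda$, and the sample size are illustrative, and $\lambda$ is kept well below $1/2$ because the estimator's variance explodes near the pole:

```python
# Compare the empirical mean of exp(lambda * Z) against (1 - 2*lambda)^(-k/2).
import numpy as np

rng = np.random.default_rng(0)
k, n = 5, 1_000_000
Z = (rng.standard_normal((n, k)) ** 2).sum(axis=1)  # Z = sum of k squared Gaussians

for lam in (0.05, 0.1, 0.2):
    empirical = np.exp(lam * Z).mean()
    closed_form = (1.0 - 2.0 * lam) ** (-k / 2)
    print(f"lambda={lam:.2f}  monte carlo={empirical:.4f}  formula={closed_form:.4f}")
```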
Chi-Squared Upper Tail
Statement
For every $t \in (0, 1/2]$,

$$\Pr[Z \geq (1 + t) k] \leq \exp\!\left(-\frac{t^2 k}{6}\right).$$
Exact statement
LaTeX source for copy/export
\Pr[Z \geq (1 + t) k] \leq \exp\!\left(-\frac{t^2 k}{6}\right)
Proof Sketch
Step 1: Chernoff bound. For $\lambda \in (0, 1/2)$,

$$\Pr[Z \geq (1 + t) k] \leq e^{-\lambda (1 + t) k}\, \mathbb{E}\!\left[e^{\lambda Z}\right] = e^{-\lambda (1 + t) k} (1 - 2\lambda)^{-k/2}.$$

Step 2: optimize. Differentiating in $\lambda$ and setting to zero gives $\lambda^* = \frac{t}{2(1 + t)}$. Substituting back yields the sharp Chernoff exponent

$$\Pr[Z \geq (1 + t) k] \leq \exp\!\left(-\frac{k}{2}\big(t - \ln(1 + t)\big)\right).$$

Step 3: simplify on $(0, 1/2]$. The Taylor expansion $t - \ln(1 + t) = \frac{t^2}{2} - \frac{t^3}{3} + \frac{t^4}{4} - \cdots$ is an alternating series with strictly decreasing terms for $t \in (0, 1)$, so truncating after two terms gives the lower bound

$$t - \ln(1 + t) \geq \frac{t^2}{2} - \frac{t^3}{3}.$$

For $t \leq 1/2$, $\frac{t^3}{3} \leq \frac{t^2}{6}$ (since $t \leq 1/2$ means $t^3 \leq \frac{t^2}{2}$), so

$$t - \ln(1 + t) \geq \frac{t^2}{2} - \frac{t^2}{6} = \frac{t^2}{3}.$$

Plugging $t - \ln(1 + t) \geq \frac{t^2}{3}$ into the sharp exponent $\frac{k}{2}\big(t - \ln(1 + t)\big)$ gives the displayed bound.
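The optimization in Step 2 can be checked numerically; a sketch with an illustrative $(k, t)$ pair and grid resolution, not part of the proof:

```python
# Minimize the log Chernoff bound over lambda and compare with the closed form.
import numpy as np

k, t = 50, 0.3
lams = np.linspace(1e-4, 0.499, 100_000)
log_bound = -lams * (1 + t) * k - (k / 2) * np.log1p(-2 * lams)  # log of Step 1 bound
numeric_exponent = -log_bound.min()            # best exponent found on the grid
sharp_exponent = (k / 2) * (t - np.log1p(t))   # (k/2)(t - ln(1+t)) from Step 2
print(numeric_exponent, sharp_exponent)        # agree up to grid resolution
```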
Why It Matters
The sharp Chernoff exponent $\frac{k}{2}\big(t - \ln(1 + t)\big)$ is the Cramér rate function for the gamma distribution and is what large-deviations theory delivers in the $k \to \infty$ limit. The simplified form $\exp(-t^2 k / 6)$ on $(0, 1/2]$ is what the rest of high-dimensional statistics uses as a black box for small-deviation bounds.
Failure Mode
The simplification $t - \ln(1 + t) \geq \frac{t^2}{3}$ holds on $(0, 1/2]$ via the Taylor argument above and continues to hold by direct computation up to $t \approx 0.787$, but fails for larger $t$ (e.g., at $t = 1$, $1 - \ln 2 \approx 0.307 < \frac{1}{3}$). For $t$ outside the clean small-deviation regime, use the sharp Chernoff exponent directly, or pass to the linear-in-$t$ regime characteristic of sub-exponential tails.
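The cutoff quoted above can be located by a direct scan; a small deterministic sketch, with the grid resolution as the only tuning choice:

```python
# Find the first t where t - log(1+t) >= t^2/3 fails, and confirm failure at t = 1.
import numpy as np

t = np.linspace(1e-6, 2.0, 2_000_000)
gap = (t - np.log1p(t)) - t**2 / 3
print(t[np.argmax(gap < 0)])        # first failing grid point, ~0.787
print(1 - np.log(2) >= 1 / 3)       # False: the inequality fails at t = 1
```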
Chi-Squared Lower Tail
Statement
For every $t \in (0, 1)$,

$$\Pr[Z \leq (1 - t) k] \leq \exp\!\left(-\frac{t^2 k}{4}\right).$$
Exact statement
LaTeX source for copy/export
\Pr[Z \leq (1 - t) k] \leq \exp\!\left(-\frac{t^2 k}{4}\right)
Proof Sketch
Apply the Chernoff method with a negative tilt (i.e., bound $\Pr[Z \leq (1 - t) k] = \Pr\!\left[e^{-\mu Z} \geq e^{-\mu (1 - t) k}\right]$ for $\mu > 0$ and use Markov's inequality on $e^{-\mu Z}$):

$$\Pr[Z \leq (1 - t) k] \leq e^{\mu (1 - t) k}\, \mathbb{E}\!\left[e^{-\mu Z}\right] = e^{\mu (1 - t) k} (1 + 2\mu)^{-k/2}.$$

Optimizing over $\mu$ gives $\mu^* = \frac{t}{2(1 - t)}$ and the sharp exponent $\frac{k}{2}\big(-t - \ln(1 - t)\big)$, which on $(0, 1)$ satisfies $-t - \ln(1 - t) \geq \frac{t^2}{2}$. This last inequality is verified directly from the Taylor series $-\ln(1 - t) = t + \frac{t^2}{2} + \frac{t^3}{3} + \cdots$, whose terms beyond the second are nonnegative. Substituting yields the displayed exponent $\frac{t^2 k}{4}$.
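The elementary inequality doing the work here can be checked on a dense grid; a minimal deterministic sketch:

```python
# Verify -t - log(1-t) >= t^2/2 on (0, 1), the step yielding the t^2 k / 4 exponent.
import numpy as np

t = np.linspace(1e-6, 1 - 1e-6, 1_000_000)
assert np.all(-t - np.log1p(-t) >= t**2 / 2)  # holds everywhere on the grid
print("lower-tail inequality verified on the grid")
```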
Why It Matters
The lower-tail constant $\frac{1}{4}$ is sharper than the upper-tail $\frac{1}{6}$. The asymmetry is real: deviating below the mean is harder than deviating above, because squared Gaussians have nonnegative support and the upper tail can absorb mass from very large values of $g_i^2$, while the lower tail is cut off at $Z = 0$.
Failure Mode
The lower-tail bound is restricted to $t \in (0, 1)$ because the threshold $(1 - t) k$ must remain positive. As $t \to 1$ the displayed exponent saturates at $\frac{k}{4}$, while the sharp Chernoff exponent $\frac{k}{2}\big(-t - \ln(1 - t)\big)$ diverges and tracks the rate, on the order of $(1 - t)^{k/2}$, at which the chi-squared mass approaches the origin.
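The saturation is visible numerically; a small sketch with illustrative $k$ and $t$ values:

```python
# Near t = 1 the simplified exponent t^2 k / 4 plateaus while the sharp one diverges.
import numpy as np

k = 10
for t in (0.9, 0.99, 0.999):
    simplified = t**2 * k / 4
    sharp = (k / 2) * (-t - np.log1p(-t))   # (k/2)(-t - ln(1-t))
    print(f"t={t}: simplified={simplified:.2f}  sharp={sharp:.2f}")
```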
Chi-Squared Two-Sided Bound
Statement
For every $t \in (0, 1/2]$,

$$\Pr\!\left[\left|\frac{Z}{k} - 1\right| \geq t\right] \leq 2 \exp\!\left(-\frac{t^2 k}{6}\right).$$
Exact statement
LaTeX source for copy/export
\Pr\!\left[\left|Z/k - 1\right| \geq t\right] \leq 2 \exp\!\left(-\frac{t^2 k}{6}\right)
Intuition
Combining the upper tail $\exp(-t^2 k / 6)$ on $(0, 1/2]$ and the lower tail $\exp(-t^2 k / 4)$ on $(0, 1)$ via a union bound gives the two-sided statement on the common range $(0, 1/2]$, with the worse constant $\frac{1}{6}$ appearing in the exponent. The bound is sharp up to constants and matches the variance scaling $\mathrm{Var}(Z/k) = 2/k$.
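A Monte Carlo comparison of the empirical two-sided tail against the bound; a minimal sketch assuming NumPy, with illustrative $k$, seed, sample size, and deviation levels:

```python
# Empirical P(|Z/k - 1| >= t) should sit below 2 exp(-t^2 k / 6) on (0, 1/2].
import numpy as np

rng = np.random.default_rng(1)
k, n = 20, 1_000_000
Z = rng.chisquare(k, size=n)

for t in (0.1, 0.25, 0.5):
    empirical = np.mean(np.abs(Z / k - 1) >= t)
    bound = 2 * np.exp(-t**2 * k / 6)
    print(f"t={t}: empirical={empirical:.4f}  bound={bound:.4f}")
```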
Why It Matters
This is the single statement that carries chi-squared concentration into applications. It bounds the squared norm of a Gaussian vector, the variance estimate for normal data, and the noise residual after a projection.
Failure Mode
The two-sided constant $\frac{1}{6}$ is not optimal in either tail individually: the upper tail holds with $\frac{1}{6}$ in the form $\exp(-t^2 k / 6)$ on $(0, 1/2]$, and the lower tail holds with the sharper $\frac{1}{4}$ on $(0, 1)$. When the asymmetry matters (for example in sharp variance-estimation analyses) the two-sided bound is too crude. For deviations beyond $t = 1/2$ use the sharp Chernoff exponent on each tail separately.
Common Confusions
Chi-squared is sub-exponential, not sub-Gaussian
A standard Gaussian $g$ satisfies $\mathbb{E}\!\left[e^{\lambda g}\right] = e^{\lambda^2 / 2}$ for every $\lambda \in \mathbb{R}$. A squared standard Gaussian satisfies $\mathbb{E}\!\left[e^{\lambda g^2}\right] = (1 - 2\lambda)^{-1/2}$ only on $\lambda < \frac{1}{2}$. The MGF blows up at the boundary $\lambda = \frac{1}{2}$, which is the defining signature of a sub-exponential variable. Tail bounds that are quadratic in $t$ for small $t$ become linear in $t$ for large $t$.
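The blow-up at the pole is easy to see directly from the closed form; a two-line sketch with illustrative $\lambda$ values:

```python
# The squared-Gaussian MGF (1 - 2*lambda)^(-1/2) diverges as lambda -> 1/2.
for lam in (0.25, 0.4, 0.49, 0.499):
    print(f"lambda={lam}: MGF={(1 - 2 * lam) ** -0.5:.2f}")
# At lambda = 0.5 the defining integral itself diverges: no finite MGF exists there.
```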
The constants 1/3, 1/4, 1/6 are not arbitrary
Each constant traces back to a specific elementary inequality. The upper-tail $\frac{1}{3}$ comes from $t - \ln(1 + t) \geq \frac{t^2}{3}$ on $(0, 1/2]$ (via the Taylor truncation $t - \ln(1 + t) \geq \frac{t^2}{2} - \frac{t^3}{3}$ combined with $\frac{t^3}{3} \leq \frac{t^2}{6}$ when $t \leq \frac{1}{2}$). The lower-tail $\frac{1}{4}$ comes from $-t - \ln(1 - t) \geq \frac{t^2}{2}$ on $(0, 1)$. The two-sided $\frac{1}{6}$ is the worse upper-tail $\frac{1}{3}$ after absorbing the factor of $\frac{1}{2}$ from the cumulant-function prefactor $\frac{k}{2}$. The upper-tail inequality does not hold on all of $(0, \infty)$; it fails beyond $t \approx 0.787$, so the simplified form is restricted to $t \leq \frac{1}{2}$.
Independence of the squared variables, not of the original Gaussians
The chi-squared MGF factorizes because the $g_i$ are i.i.d., and the squares inherit independence. If the underlying Gaussians are correlated (e.g., a quadratic form $g^\top A g$ for non-diagonal $A$), the MGF is more complicated and the relevant concentration result is Hanson-Wright, not the displayed chi-squared bound.
Exercises
Problem
Verify the elementary inequality $t - \ln(1 + t) \geq \frac{t^2}{3}$ on $(0, 1/2]$ used in the upper-tail proof. Then exhibit a value of $t$ where the inequality fails, to show that the natural extension to larger $t$ is incorrect.
Problem
Let $g$ be a centered isotropic Gaussian in $\mathbb{R}^d$ (i.e., $g \sim \mathcal{N}(0, I_d)$). Use the chi-squared two-sided bound to prove that for every $t \in (0, 1/2]$,

$$\Pr\!\left[\left|\frac{\|g\|_2^2}{d} - 1\right| \geq t\right] \leq 2 \exp\!\left(-\frac{t^2 d}{6}\right).$$

Then translate this into a bound on $\|g\|_2$ itself.
References
Canonical:
- Laurent, B., & Massart, P. (2000). "Adaptive estimation of a quadratic functional by model selection." Annals of Statistics, 28(5), 1302-1338. The classical sharp form is Lemma 1; the displayed form follows by a change of variable.
- Boucheron, S., Lugosi, G., & Massart, P. (2013). Concentration Inequalities. Oxford University Press. Section 2.4 develops the chi-squared bound from the MGF and Section 5.1 puts it inside the gamma family.
- Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding Machine Learning. Cambridge University Press. Lemma B.12 in Appendix B states the displayed two-sided form.
Current:
- Wainwright, M. J. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press. Example 2.4 in Section 2.2 derives the chi-squared MGF and identifies $\chi^2_k$ as the canonical sub-exponential variable with parameters $(2\sqrt{k}, 4)$.
- Vershynin, R. (2018). High-Dimensional Probability. Cambridge University Press. Theorem 3.1.1 (concentration of the norm) gives the high-dimensional vector form.
- van Handel, R. (2016). Probability in High Dimension. Lecture notes, Princeton. Chapter 3 derives the chi-squared two-sided bound directly from the gamma MGF.
Next Topics
- Sub-exponential random variables: the abstract framework around the chi-squared MGF, where the squared Gaussian is the canonical example
- Hanson-Wright inequality: the generalization to quadratic forms $x^\top A x$ for sub-Gaussian $x$ and arbitrary $A$
- Bernstein's inequality: the variance-aware scalar cousin used when the summands are bounded rather than Gaussian
Last reviewed: May 8, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites (4)
- Common Probability Distributions (layer 0A · tier 1)
- Chernoff Bounds (layer 1 · tier 1)
- Concentration Inequalities (layer 1 · tier 1)
- Moment Generating Functions (layer 0A · tier 2)
Derived topics (0)
No published topic currently declares this as a prerequisite.