Concentration Probability
Hanson-Wright Inequality
Concentration of quadratic forms X^T A X for sub-Gaussian random vectors: the two-term bound involving the Frobenius norm (Gaussian regime) and operator norm (extreme regime).
Why This Matters
Scalar concentration inequalities (Hoeffding, Bernstein) control linear functions of independent random variables: sums of the form $\sum_i a_i X_i$. But many quantities in statistics and machine learning are quadratic: sample covariance entries $\frac{1}{n}\sum_{k=1}^n X_{ki} X_{kj}$, kernel evaluations $\langle x, y \rangle$, chi-squared statistics, and second-order U-statistics. For these, you need concentration of the quadratic form $X^T A X$, where $X$ is a random vector with independent entries.
The Hanson-Wright inequality is the definitive tool for this. It gives a two-term bound that captures two different regimes of deviation, and it is tight up to constants.
Mental Model
Consider the quadratic form $X^T A X = \sum_{i,j} A_{ij} X_i X_j$, where $X \in \mathbb{R}^n$ has independent sub-Gaussian entries. This is a sum of $n^2$ terms involving products of random variables, not just single variables. Products are harder to control because the tails are heavier (the product of two sub-Gaussians is sub-exponential, not sub-Gaussian).
The Hanson-Wright bound says: the deviation of $X^T A X$ from its expectation is controlled by two terms:
- Frobenius term $\|A\|_F$: dominates for small deviations, behaves like Gaussian concentration (sub-Gaussian tail $e^{-ct^2}$)
- Operator term $\|A\|_{\mathrm{op}}$: dominates for large deviations, behaves like sub-exponential concentration (tail $e^{-ct}$)
The transition between regimes happens at $t^* = K^2 \|A\|_F^2 / \|A\|_{\mathrm{op}}$.
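The heavier-tail claim is easy to check numerically. A quick Monte Carlo sketch (illustrative setup: standard Gaussians standing in for general sub-Gaussians):

```python
import numpy as np

# Illustrative check: the product of two independent standard Gaussians
# (sub-Gaussian x sub-Gaussian) has a much heavier tail than a single Gaussian.
rng = np.random.default_rng(0)
N, t = 1_000_000, 5.0
g1, g2 = rng.standard_normal(N), rng.standard_normal(N)

tail_single = np.mean(np.abs(g1) > t)        # sub-Gaussian tail, ~ exp(-t^2/2)
tail_product = np.mean(np.abs(g1 * g2) > t)  # sub-exponential tail, ~ exp(-t)
print(tail_single, tail_product)
```

At $t = 5$ the single-Gaussian tail is essentially zero while the product tail is on the order of $10^{-3}$: the product decays exponentially, not Gaussian-fast.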
Formal Setup and Notation
Let $X = (X_1, \dots, X_n) \in \mathbb{R}^n$ be a random vector with independent, centered, sub-Gaussian entries: $\mathbb{E}[X_i] = 0$ and $\|X_i\|_{\psi_2} \le K$ for all $i$.
Let $A \in \mathbb{R}^{n \times n}$ be a fixed (deterministic) matrix.
Quadratic Form
The quadratic form associated with matrix $A$ and random vector $X$ is:
$$X^T A X = \sum_{i,j=1}^n A_{ij} X_i X_j.$$
Its expectation is $\mathbb{E}[X^T A X] = \sum_i A_{ii} \mathbb{E}[X_i^2]$. For isotropic $X$ ($\mathbb{E}[X_i^2] = 1$), this simplifies to $\operatorname{tr}(A)$.
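The expectation formula is easy to sanity-check with a small Monte Carlo sketch (the matrix here is arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))  # arbitrary fixed matrix, for illustration

# Monte Carlo estimate of E[X^T A X] for isotropic X (i.i.d. standard
# normal entries): the empirical mean should be close to tr(A).
X = rng.standard_normal((200_000, n))
quad = np.einsum("ki,ij,kj->k", X, A, X)  # X^T A X, one value per sample
print(quad.mean(), np.trace(A))
```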
Frobenius Norm
The Frobenius norm of $A$ is $\|A\|_F = \big(\sum_{i,j} A_{ij}^2\big)^{1/2}$. It measures the total "mass" of $A$ across all entries. In the Hanson-Wright bound, $\|A\|_F$ controls the variance of the quadratic form: the Gaussian-regime fluctuations scale like $\|A\|_F$.
Operator Norm
The operator norm is $\|A\|_{\mathrm{op}} = \sup_{\|x\|_2 = 1} \|Ax\|_2$, the largest singular value of $A$. It measures the maximum directional stretch. In Hanson-Wright, $\|A\|_{\mathrm{op}}$ controls the extreme-regime tail: large deviations are governed by the single largest singular value of $A$.
Core Relationship Between Norms
For any matrix $A \in \mathbb{R}^{n \times n}$:
$$\|A\|_{\mathrm{op}} \le \|A\|_F \le \sqrt{\operatorname{rank}(A)} \, \|A\|_{\mathrm{op}}.$$
The gap between the two norms measures how "spread out" the matrix is. If $A$ has rank 1, $\|A\|_F = \|A\|_{\mathrm{op}}$ and the two terms in Hanson-Wright are comparable. If $A = I_n$ (identity), $\|A\|_F = \sqrt{n}$ while $\|A\|_{\mathrm{op}} = 1$, and the Frobenius term dominates for all reasonable deviations.
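The sandwich between the two norms can be verified directly; a numpy sketch with the two extreme cases (identity and a rank-1 projection) plus a generic matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

def frob(A): return np.linalg.norm(A, "fro")
def op(A):   return np.linalg.norm(A, 2)   # largest singular value

n = 50
identity = np.eye(n)
u = rng.standard_normal(n)
rank1 = np.outer(u, u) / (u @ u)           # rank-1 projection onto span(u)

for A in (identity, rank1, rng.standard_normal((n, n))):
    r = np.linalg.matrix_rank(A)
    # sandwich: ||A||_op <= ||A||_F <= sqrt(rank A) * ||A||_op
    assert op(A) <= frob(A) + 1e-9
    assert frob(A) <= np.sqrt(r) * op(A) + 1e-9
    print(f"rank={r:2d}  op={op(A):.3f}  frob={frob(A):.3f}")
```

The identity hits the upper bound ($\|I_n\|_F = \sqrt{n} \cdot 1$) and the rank-1 projection hits the lower bound ($\|A\|_F = \|A\|_{\mathrm{op}} = 1$).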
Main Theorems
Hanson-Wright Inequality
Statement
Let $X = (X_1, \dots, X_n) \in \mathbb{R}^n$ have independent, centered, sub-Gaussian components with $\max_i \|X_i\|_{\psi_2} \le K$. For any matrix $A \in \mathbb{R}^{n \times n}$ and any $t \ge 0$:
$$\mathbb{P}\big(|X^T A X - \mathbb{E}[X^T A X]| > t\big) \le 2 \exp\left(-c \min\left(\frac{t^2}{K^4 \|A\|_F^2}, \frac{t}{K^2 \|A\|_{\mathrm{op}}}\right)\right),$$
where $c > 0$ is a universal constant.
Intuition
The bound has two regimes:
Small deviations ($t \le K^2 \|A\|_F^2 / \|A\|_{\mathrm{op}}$): The term $t^2 / (K^4 \|A\|_F^2)$ is smaller, so the bound is $2\exp(-c t^2 / (K^4 \|A\|_F^2))$. This is a sub-Gaussian tail. It arises because for small deviations, the quadratic form behaves like a sum of many weakly correlated terms, and the CLT-like behavior dominates.
Large deviations ($t > K^2 \|A\|_F^2 / \|A\|_{\mathrm{op}}$): The term $t / (K^2 \|A\|_{\mathrm{op}})$ is smaller, so the bound is $2\exp(-c t / (K^2 \|A\|_{\mathrm{op}}))$. This is a sub-exponential tail. It arises because extreme deviations are driven by the largest eigenvalue direction of $A$, where the quadratic form behaves like $\|A\|_{\mathrm{op}} \, g^2$ for a single sub-Gaussian variable $g$.
The crossover at $t^* = K^2 \|A\|_F^2 / \|A\|_{\mathrm{op}}$ is where the two terms are equal. Below $t^*$, Gaussian behavior; above $t^*$, exponential behavior.
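The crossover is mechanical to verify. A tiny sketch (taking $K = 1$ and illustrative norm values $\|A\|_F^2 = 100$, $\|A\|_{\mathrm{op}} = 2$) showing which term achieves the min on each side of $t^*$:

```python
# Which term achieves the min in the Hanson-Wright exponent?  With K = 1 and
# illustrative values ||A||_F^2 = 100, ||A||_op = 2, the switch happens at
# t* = ||A||_F^2 / ||A||_op = 50.
frob_sq, op_norm = 100.0, 2.0
t_star = frob_sq / op_norm

for t in (10.0, t_star, 90.0):
    gauss_exp = t**2 / frob_sq   # Frobenius (sub-Gaussian) term
    expo_exp = t / op_norm       # operator (sub-exponential) term
    regime = "Gaussian" if gauss_exp <= expo_exp else "exponential"
    print(f"t={t:5.1f}  min exponent={min(gauss_exp, expo_exp):6.2f}  {regime}")
```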
Proof Sketch
The proof proceeds in three steps:
Step 1 (Decoupling). Replace $X^T A X$ with a decoupled form $X^T A X'$, where $X'$ is an independent copy of $X$. The decoupling inequality states that for symmetric $A$ with zero diagonal, the tails of $\sum_{i \ne j} A_{ij} X_i X_j$ are controlled, up to universal constants, by the tails of $\sum_{i,j} A_{ij} X_i X_j'$.
This reduces the problem from a quadratic form (products $X_i X_j$) to a bilinear form (products $X_i X_j'$), which is easier to handle because $X$ and $X'$ are independent.
Step 2 (Condition and apply Hoeffding). Condition on $X'$. Then $X^T A X' = \sum_i X_i (A X')_i$ is a sum of independent sub-Gaussian variables with variance proxy controlled by $\|A X'\|_2^2$. Apply sub-Gaussian concentration to get a bound in terms of $\|A X'\|_2$.
Step 3 (Control $\|A X'\|_2$). Use the bound $\|A X'\|_2 \le \|A\|_{\mathrm{op}} \|X'\|_2$ for the operator-norm term and $\mathbb{E}\|A X'\|_2^2 = \|A\|_F^2$ (when $X'$ is isotropic) for the Frobenius term. Combine using a case split on whether $\|A X'\|_2$ is typical or large.
Why It Matters
The Hanson-Wright inequality is essential whenever you work with second-order statistics of random vectors:
- Chi-squared concentration: $\chi_n^2 = \sum_{i=1}^n g_i^2 = g^T I_n g$, so $\|I_n\|_F = \sqrt{n}$, $\|I_n\|_{\mathrm{op}} = 1$, and the bound gives $\mathbb{P}(|\chi_n^2 - n| > t) \le 2\exp(-c \min(t^2/n, t))$. The Frobenius regime ($t^2/n$) dominates for $t \le n$; the operator regime ($t$) dominates for $t > n$.
- Random kernel evaluations: inner products $\langle x, y \rangle$ for random $x, y$ are quadratic in the joint vector $(x, y)$.
- Covariance estimation: off-diagonal entries of $\frac{1}{n} \sum_k X_k X_k^T$ involve terms like $\frac{1}{n} \sum_k X_{ki} X_{kj}$, which are quadratic forms.
- Second-order chaos: any polynomial of degree 2 in sub-Gaussian variables.
Failure Mode
The constant $c$ in the inequality is universal but unspecified. For precise numerical bounds in applications, you may need to track it through the proof. Also, Hanson-Wright requires independent entries; for dependent sub-Gaussian vectors, you need modified versions (e.g., for vectors with sub-Gaussian norm bounds but dependent entries, the inequality may still hold but with $K$ replaced by the sub-Gaussian norm of the entire vector).
Decoupling Inequality for Quadratic Forms
Statement
Let $X = (X_1, \dots, X_n)$ have independent centered entries and let $A$ be a symmetric matrix with zero diagonal ($A_{ii} = 0$). Let $X'$ be an independent copy of $X$. Then for all convex, increasing functions $F$ on $[0, \infty)$:
$$\mathbb{E}\, F\Big(\Big|\sum_{i \ne j} A_{ij} X_i X_j\Big|\Big) \le \mathbb{E}\, F\Big(C \Big|\sum_{i,j} A_{ij} X_i X_j'\Big|\Big),$$
where $C$ is a universal constant.
Intuition
Decoupling replaces the "entangled" quadratic form $\sum_{i \ne j} A_{ij} X_i X_j$ (where each $X_i$ appears in multiple terms) with the "decoupled" bilinear form $\sum_{i,j} A_{ij} X_i X_j'$ (where $X$ and $X'$ appear in separate roles). The decoupled version is easier to analyze because once you condition on $X'$, the sum is linear in $X$, and all the standard sub-Gaussian tools apply.
Proof Sketch
The proof uses a symmetrization-style argument. Introduce the decoupled form $\sum_{i,j} A_{ij} X_i X_j'$ and use the independence of $X$ and $X'$ together with the symmetry of $A$ to show that the tails of the coupled form are controlled by those of the decoupled form. The universal constant $C$ absorbs a constant factor from the randomization step.
Why It Matters
Decoupling is the key technical device that makes Hanson-Wright provable. It converts a hard problem (concentrating a quadratic form) into an easier one (concentrating a bilinear form, which is linear once you condition on one factor). This technique appears throughout the theory of U-statistics and chaos processes.
Failure Mode
Decoupling requires the diagonal of $A$ to be zero (or handled separately). The diagonal terms $\sum_i A_{ii} X_i^2$ are not quadratic in the same sense: they are sums of independent sub-exponential variables and are controlled separately using standard sub-exponential concentration.
Two Regimes Explained
The Hanson-Wright bound can be rewritten in high-probability form. With probability at least $1 - \delta$:
$$|X^T A X - \mathbb{E}[X^T A X]| \le C K^2 \left( \|A\|_F \sqrt{\log(2/\delta)} + \|A\|_{\mathrm{op}} \log(2/\delta) \right).$$
The two terms correspond to the two norms:
| Regime | Tail behavior | Dominated by | Example |
|---|---|---|---|
| Gaussian ($t$ small) | $\exp(-c t^2 / (K^4 \|A\|_F^2))$ | $\|A\|_F$ | Chi-squared with $t \le n$ |
| Extreme ($t$ large) | $\exp(-c t / (K^2 \|A\|_{\mathrm{op}}))$ | $\|A\|_{\mathrm{op}}$ | Chi-squared with $t > n$ |
Canonical Examples
Chi-squared concentration
Let $g \sim N(0, I_n)$ and $A = I_n$. Then $g^T A g = \|g\|_2^2 \sim \chi_n^2$ with $\mathbb{E}[\|g\|_2^2] = n$. The norms are $\|I_n\|_F = \sqrt{n}$ and $\|I_n\|_{\mathrm{op}} = 1$. Hanson-Wright gives:
$$\mathbb{P}\big(|\|g\|_2^2 - n| > t\big) \le 2 \exp\big(-c \min(t^2/n,\, t)\big).$$
For $t = \varepsilon n$ (relative deviation $\varepsilon$): the bound is $2\exp(-c \varepsilon^2 n)$ when $\varepsilon \le 1$ (Gaussian regime) and $2\exp(-c \varepsilon n)$ when $\varepsilon > 1$ (extreme regime). This matches the known chi-squared tail behavior.
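A Monte Carlo sketch comparing the empirical chi-squared tail with a bound of this shape. The constant $c = 1/8$ below is an illustrative choice, not the optimal constant from the proof:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 100_000
c = 1 / 8  # illustrative constant, not the optimal c from the proof

# Empirical tails of chi^2_n = ||g||^2 versus the shape 2*exp(-c*min(t^2/n, t))
chi2 = (rng.standard_normal((trials, n)) ** 2).sum(axis=1)
results = []
for t in (20, 50, 100):
    emp = np.mean(np.abs(chi2 - n) > t)
    bound = 2 * np.exp(-c * min(t**2 / n, t))
    results.append((t, emp, bound))
    print(f"t={t:3d}  empirical={emp:.2e}  bound={bound:.2e}")
```

With this $c$ the bound dominates the empirical tail at every $t$, and both decay Gaussian-fast for $t \le n$.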
Random kernel evaluation
Let $x, y \in \mathbb{R}^n$ be independent vectors with i.i.d. sub-Gaussian entries of norm $K$. The inner product $\langle x, y \rangle = z^T A z$, where $z = (x, y) \in \mathbb{R}^{2n}$ and $A$ is a $2n \times 2n$ block matrix with off-diagonal blocks $\frac{1}{2} I_n$ and zero diagonal blocks.
Then $\|A\|_F = \sqrt{n/2}$ and $\|A\|_{\mathrm{op}} = \frac{1}{2}$. Hanson-Wright gives:
$$\mathbb{P}\big(|\langle x, y \rangle| > t\big) \le 2 \exp\left(-c \min\left(\frac{t^2}{K^4 n}, \frac{t}{K^2}\right)\right).$$
So random inner products in $\mathbb{R}^n$ concentrate around 0 with fluctuations of order $\sqrt{n}$.
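A quick simulation of the $\sqrt{n}$ fluctuation scale (Gaussian entries, so $K$ is of order 1):

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 20_000
ratios = []
for n in (25, 100, 400):
    x = rng.standard_normal((trials, n))
    y = rng.standard_normal((trials, n))
    ip = (x * y).sum(axis=1)              # <x, y> for independent pairs
    ratios.append(ip.std() / np.sqrt(n))  # fluctuation scale relative to sqrt(n)
    print(n, ratios[-1])
```

The ratio stays near 1 across $n$, confirming that the standard deviation of $\langle x, y \rangle$ grows like $\sqrt{n}$.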
Quadratic form with rank-r matrix
If $A$ has rank $r$ with nonzero eigenvalues $\lambda_1, \dots, \lambda_r$ (ordered so that $|\lambda_1| \ge \dots \ge |\lambda_r|$), then $\|A\|_F = \big(\sum_{i=1}^r \lambda_i^2\big)^{1/2}$ and $\|A\|_{\mathrm{op}} = |\lambda_1|$.
The crossover point is $t^* = K^2 \|A\|_F^2 / \|A\|_{\mathrm{op}} = K^2 \sum_i \lambda_i^2 / |\lambda_1|$. If $A$ is rank-1 ($A = \lambda_1 v v^T$), then $\|A\|_F = \|A\|_{\mathrm{op}} = |\lambda_1|$ and the two terms coincide: the sub-Gaussian tail $\exp(-c t^2 / (K^4 \lambda_1^2))$ transitions directly to the sub-exponential tail $\exp(-c t / (K^2 |\lambda_1|))$ at $t^* = K^2 |\lambda_1|$.
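A numeric check of the rank-$r$ norm formulas, using an illustrative spectrum $(4, 2, 1)$:

```python
import numpy as np

# Build A = sum_i lambda_i v_i v_i^T with an illustrative spectrum (4, 2, 1)
# and check ||A||_F = sqrt(sum lambda_i^2), ||A||_op = max |lambda_i|.
rng = np.random.default_rng(0)
n, lam = 30, np.array([4.0, 2.0, 1.0])
Q, _ = np.linalg.qr(rng.standard_normal((n, len(lam))))  # orthonormal columns
A = (Q * lam) @ Q.T                                      # rank-3 symmetric matrix

frob = np.linalg.norm(A, "fro")
op = np.linalg.norm(A, 2)
print(frob, np.sqrt(np.sum(lam**2)))   # both sqrt(21)
print(op, lam.max())                   # both 4.0
print("crossover t* =", frob**2 / op)  # 21/4 = 5.25 (taking K = 1)
```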
Common Confusions
Hanson-Wright is not just Hoeffding applied to products
A naive approach would be to note that each term $A_{ij} X_i X_j$ is sub-exponential (product of two sub-Gaussians) and apply a sub-exponential concentration bound. Beyond the problem that the terms are not independent ($X_i$ appears in many of them), this at best yields a bound involving only the Frobenius-type term, which captures the Gaussian regime but misses the operator-norm regime for large deviations. Hanson-Wright is strictly stronger because it also captures the $\|A\|_{\mathrm{op}}$ term through the decoupling argument.
The matrix A need not be symmetric or positive semidefinite
The Hanson-Wright inequality applies to any matrix $A$, not just symmetric or PSD ones. For a general $A$, $X^T A X = X^T \big(\frac{A + A^T}{2}\big) X$, because $X^T B X = 0$ for any antisymmetric $B$ (since $X^T B X$ is a scalar that equals its own negative). So you can always reduce to the symmetric part.
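The reduction to the symmetric part is a one-line identity to verify numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))  # deliberately non-symmetric
S = (A + A.T) / 2                # symmetric part
x = rng.standard_normal(n)

# x^T A x depends only on the symmetric part of A ...
print(x @ A @ x, x @ S @ x)
# ... because the antisymmetric part contributes exactly zero:
print(x @ ((A - A.T) / 2) @ x)
```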
Sub-Gaussian entries, not sub-Gaussian vectors
Hanson-Wright requires the entries of $X$ to be independent sub-Gaussian, not just the vector $X$ to have sub-Gaussian norm. For random vectors with dependent entries (like the uniform distribution on the sphere), the standard Hanson-Wright does not apply directly. Modified versions exist but require different techniques (e.g., transportation-cost arguments).
Summary
- Hanson-Wright controls $X^T A X - \mathbb{E}[X^T A X]$ for sub-Gaussian $X$ with independent entries
- Two-term bound: $2\exp\big(-c \min\big(t^2 / (K^4 \|A\|_F^2),\ t / (K^2 \|A\|_{\mathrm{op}})\big)\big)$
- Frobenius norm $\|A\|_F$ controls the Gaussian (small deviation) regime
- Operator norm $\|A\|_{\mathrm{op}}$ controls the extreme (large deviation) regime
- Crossover at $t^* = K^2 \|A\|_F^2 / \|A\|_{\mathrm{op}}$
- Decoupling is the key proof technique: replace $X^T A X$ with $X^T A X'$ using an independent copy $X'$
- For $A = I_n$: recovers chi-squared concentration
- Applies to random kernel evaluations, covariance estimation, second-order chaos
Exercises
Problem
Let $X \in \mathbb{R}^n$ have i.i.d. sub-Gaussian entries with $\mathbb{E}[X_i] = 0$ and $\mathbb{E}[X_i^2] = 1$, and let $A = \frac{1}{n} \mathbf{1}\mathbf{1}^T$, where $\mathbf{1}$ is the all-ones vector. The quadratic form is $X^T A X = \frac{1}{n}\big(\sum_i X_i\big)^2$. Compute $\|A\|_F$ and $\|A\|_{\mathrm{op}}$, and use Hanson-Wright to bound the deviation of $X^T A X$ from its expectation.
Problem
Let $X \in \mathbb{R}^n$ have i.i.d. sub-Gaussian entries with parameter $K$ and $\mathbb{E}[X_i^2] = 1$, and let $P$ be the orthogonal projection onto a $k$-dimensional subspace. Use Hanson-Wright to show that $\|P X\|_2^2$ concentrates around $k$ with sub-Gaussian fluctuations of order $\sqrt{k}$.
Problem
The Hanson-Wright inequality gives a bound of order $\exp(-c t^2 / \|A\|_F^2)$ in the Gaussian regime. Show that this is tight (up to constants) by computing the variance of $g^T A g$ when $g \sim N(0, I_n)$ and verifying that $\operatorname{Var}(g^T A g) = 2 \|A\|_F^2$ for symmetric $A$.
References
Canonical:
- Hanson & Wright, "A Bound on Tail Probabilities for Quadratic Forms in Independent Random Variables" (1971)
- Rudelson & Vershynin, "Hanson-Wright Inequality and Sub-Gaussian Concentration" (2013)
- Vershynin, High-Dimensional Probability (2018), Chapter 6
Current:
- Wainwright, High-Dimensional Statistics (2019), Chapter 6
- Adamczak, "A Note on the Hanson-Wright Inequality for Random Vectors with Dependencies" (2015)
- Boucheron, Lugosi, Massart, Concentration Inequalities (2013), Chapters 2-6
Next Topics
Building on quadratic form concentration:
- Random matrix theory overview: asymptotic behavior of eigenvalues and eigenvectors of large random matrices
- Kernels and RKHS: random kernel evaluations use Hanson-Wright for concentration of kernel matrices
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Sub-Gaussian Random Variables (Layer 2)
- Concentration Inequalities (Layer 1)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Expectation, Variance, Covariance, and Moments (Layer 0A)
- Matrix Concentration (Layer 3)
- Sub-Exponential Random Variables (Layer 2)