Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.

Concentration Probability

Hanson-Wright Inequality

Concentration of quadratic forms X^T A X for sub-Gaussian random vectors: the two-term bound involving the Frobenius norm (Gaussian regime) and operator norm (extreme regime).

Advanced · Tier 2 · Stable · ~55 min

Why This Matters

Scalar concentration inequalities (Hoeffding, Bernstein) control linear functions of independent random variables: sums of the form iaiXi\sum_i a_i X_i. But many quantities in statistics and machine learning are quadratic: sample covariance entries 1nkXk(i)Xk(j)\frac{1}{n}\sum_k X_k^{(i)} X_k^{(j)} for iji \neq j, kernel evaluations k(x,x)=ϕ(x)ϕ(x)k(x, x') = \phi(x)^\top \phi(x'), chi-squared statistics, and second-order U-statistics. For these, you need concentration of the quadratic form XAXX^\top A X where XX is a random vector with independent entries.

The Hanson-Wright inequality is the definitive tool for this. It gives a two-term bound that captures two different regimes of deviation, and it is tight up to constants.

Mental Model

Consider the quadratic form XAXX^\top A X where XRnX \in \mathbb{R}^n has independent sub-Gaussian entries. This is a sum of terms AijXiXjA_{ij} X_i X_j involving products of random variables, not just single variables. Products are harder to control because the tails are heavier (the product of two sub-Gaussians is sub-exponential, not sub-Gaussian).

The Hanson-Wright bound says: the deviation of XAXX^\top A X from its expectation is controlled by two terms:

  1. Frobenius term AF\|A\|_F: dominates for small deviations, behaves like Gaussian concentration (sub-Gaussian tail ect2e^{-ct^2})
  2. Operator term Aop\|A\|_{\text{op}}: dominates for large deviations, behaves like sub-exponential concentration (tail ecte^{-ct})

The transition between regimes happens at tAF2/Aopt \approx \|A\|_F^2 / \|A\|_{\text{op}}.
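To make the two-term structure concrete, here is a minimal numpy sketch that evaluates the exponent inside the bound for an example diagonal matrix, taking the sub-Gaussian constant K = 1 for illustration (the matrix and the name `hw_exponent` are arbitrary choices for this demo, not part of any standard API):

```python
import numpy as np

# Two-term Hanson-Wright exponent, with K = 1 for illustration:
# P(|X^T A X - E[X^T A X]| >= t) <= 2 exp(-c * min(t^2/||A||_F^2, t/||A||_op))
A = np.diag([3.0, 1.0, 1.0, 1.0])        # example matrix
fro = np.linalg.norm(A, "fro")           # ||A||_F = sqrt(12)
op = np.linalg.norm(A, 2)                # ||A||_op = 3

def hw_exponent(t):
    """The min(...) inside the exponential, up to the universal constant c."""
    return min(t**2 / fro**2, t / op)

# The two terms are equal at the crossover t* = ||A||_F^2 / ||A||_op = 12/3 = 4.
t_star = fro**2 / op
print("crossover t*:", t_star)
print("t=1 (Gaussian regime):", hw_exponent(1.0))  # quadratic term is the min
print("t=6 (extreme regime):", hw_exponent(6.0))   # linear term is the min
```

Below the crossover the quadratic (Frobenius) term is active; above it, the linear (operator-norm) term takes over.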

Formal Setup and Notation

Let X=(X1,,Xn)RnX = (X_1, \ldots, X_n) \in \mathbb{R}^n be a random vector with independent, centered, sub-Gaussian entries: E[Xi]=0\mathbb{E}[X_i] = 0 and Xiψ2K\|X_i\|_{\psi_2} \leq K for all ii.

Let ARn×nA \in \mathbb{R}^{n \times n} be a fixed (deterministic) matrix.

Definition

Quadratic Form

The quadratic form associated with matrix AA and random vector XX is:

XAX=i,j=1nAijXiXjX^\top A X = \sum_{i,j=1}^n A_{ij} X_i X_j

Its expectation is E[XAX]=iAiiE[Xi2]=tr(Adiag(E[Xi2]))\mathbb{E}[X^\top A X] = \sum_i A_{ii} \mathbb{E}[X_i^2] = \text{tr}(A \cdot \text{diag}(\mathbb{E}[X_i^2])). For isotropic XX (E[Xi2]=1\mathbb{E}[X_i^2] = 1), this simplifies to tr(A)\text{tr}(A).
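The expectation formula can be checked by simulation. A small numpy sketch, using standard Gaussian entries as the isotropic example (so E[XAX]=tr(A)\mathbb{E}[X^\top A X] = \text{tr}(A)):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))          # arbitrary fixed matrix

# Monte Carlo estimate of E[X^T A X] for isotropic X (standard normal entries)
X = rng.standard_normal((200_000, n))
quad = ((X @ A) * X).sum(axis=1)         # batch of quadratic forms X^T A X

print(quad.mean(), np.trace(A))          # the two should agree closely
```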

Definition

Frobenius Norm

The Frobenius norm of AA is AF=i,jAij2=tr(AA)\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2} = \sqrt{\text{tr}(A^\top A)}. It measures the total "mass" of AA across all entries. In the Hanson-Wright bound, AF\|A\|_F controls the variance of the quadratic form: the Gaussian-regime fluctuations scale like AF\|A\|_F.

Definition

Operator Norm

The operator norm is Aop=supv=1Av\|A\|_{\text{op}} = \sup_{\|v\|=1} \|Av\|. It measures the maximum directional stretch. In Hanson-Wright, Aop\|A\|_{\text{op}} controls the extreme-regime tail: large deviations are governed by the single largest singular value of AA.

Core Relationship Between Norms

For any n×nn \times n matrix AA:

AopAFnAop\|A\|_{\text{op}} \leq \|A\|_F \leq \sqrt{n} \|A\|_{\text{op}}

The gap between the two norms measures how "spread out" the matrix is. If AA has rank 1, AF=Aop\|A\|_F = \|A\|_{\text{op}} and the two terms in Hanson-Wright are comparable. If A=InA = I_n (identity), AF=n\|A\|_F = \sqrt{n} while Aop=1\|A\|_{\text{op}} = 1, and the Frobenius term dominates for all reasonable deviations.
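A quick numerical check of the norm sandwich on the three cases just discussed (identity, rank one, and a generic matrix), as a minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

cases = [
    np.eye(n),                                              # identity: F = sqrt(n) * op
    np.outer(rng.standard_normal(n), rng.standard_normal(n)),  # rank 1: F = op
    rng.standard_normal((n, n)),                            # generic matrix
]
for A in cases:
    op = np.linalg.norm(A, 2)            # operator (spectral) norm
    fro = np.linalg.norm(A, "fro")       # Frobenius norm
    # ||A||_op <= ||A||_F <= sqrt(n) ||A||_op
    assert op <= fro + 1e-9 and fro <= np.sqrt(n) * op + 1e-9
```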

Main Theorems

Theorem

Hanson-Wright Inequality

Statement

Let X=(X1,,Xn)X = (X_1, \ldots, X_n) have independent, centered, sub-Gaussian components with Xiψ2K\|X_i\|_{\psi_2} \leq K. For any n×nn \times n matrix AA and any t>0t > 0:

P ⁣(XAXE[XAX]t)2exp ⁣(cmin ⁣(t2K4AF2,  tK2Aop))\mathbb{P}\!\bigl(|X^\top A X - \mathbb{E}[X^\top A X]| \geq t\bigr) \leq 2\exp\!\Bigl(-c \min\!\Bigl(\frac{t^2}{K^4 \|A\|_F^2},\; \frac{t}{K^2 \|A\|_{\text{op}}}\Bigr)\Bigr)

where c>0c > 0 is a universal constant.

Intuition

The bound has two regimes:

Small deviations (tK2AF2/Aopt \lesssim K^2 \|A\|_F^2 / \|A\|_{\text{op}}): The t2/AF2t^2 / \|A\|_F^2 term is smaller, so the bound is exp(ct2/(K4AF2))\exp(-ct^2/(K^4\|A\|_F^2)). This is a sub-Gaussian tail. It arises because for small deviations, the quadratic form behaves like a sum of many weakly correlated terms, and the CLT-like behavior dominates.

Large deviations (tK2AF2/Aopt \gtrsim K^2 \|A\|_F^2 / \|A\|_{\text{op}}): The t/Aopt / \|A\|_{\text{op}} term is smaller, so the bound is exp(ct/(K2Aop))\exp(-ct/(K^2\|A\|_{\text{op}})). This is a sub-exponential tail. It arises because extreme deviations are driven by the largest eigenvalue direction of AA, where the quadratic form behaves like λmax(A)Xv2\lambda_{\max}(A) \cdot X_v^2 for a single sub-Gaussian variable Xv=vXX_v = v^\top X.

The crossover at t=K2AF2/Aopt^* = K^2 \|A\|_F^2 / \|A\|_{\text{op}} is where the two terms are equal. Below tt^*, Gaussian behavior; above tt^*, exponential behavior.

Proof Sketch

The proof proceeds in three steps:

Step 1 (Decoupling). Replace XAXX^\top A X with the decoupled form XAYX^\top A Y, where YY is an independent copy of XX. The decoupling inequality states that for symmetric AA with zero diagonal (the diagonal terms are handled separately):

P(XAXE[XAX]t)CP(XAYct)\mathbb{P}(|X^\top A X - \mathbb{E}[X^\top A X]| \geq t) \leq C \cdot \mathbb{P}(|X^\top A Y| \geq ct)

This reduces the problem from a quadratic form (products XiXjX_i X_j) to a bilinear form (products XiYjX_i Y_j), which is easier to handle because XX and YY are independent.

Step 2 (Condition and apply Hoeffding). Condition on YY. Then XAY=iXi(AY)iX^\top A Y = \sum_i X_i (AY)_i is a sum of independent sub-Gaussian variables with variance proxy controlled by AY2\|AY\|^2. Apply sub-Gaussian concentration to get a bound in terms of AY\|AY\|.

Step 3 (Control AY\|AY\|). Use the bound AYAopY\|AY\| \leq \|A\|_{\text{op}} \|Y\| for the operator-norm term and E[AY2]=AF2\mathbb{E}[\|AY\|^2] = \|A\|_F^2 (when YY is isotropic) for the Frobenius term. Combine using a case split on whether Y\|Y\| is typical or large.
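The decoupling step can be illustrated numerically: with the diagonal zeroed out, both the coupled form XAXX^\top A X and the decoupled form XAYX^\top A Y are mean zero, and their tails are comparable. A minimal simulation sketch (Gaussian entries as the sub-Gaussian example; the matrix and threshold are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                        # symmetrize
np.fill_diagonal(A, 0.0)                 # decoupling needs zero diagonal

N = 100_000
X = rng.standard_normal((N, n))
Y = rng.standard_normal((N, n))          # independent copy of X

coupled = ((X @ A) * X).sum(axis=1)      # X^T A X (mean zero: zero diagonal)
decoupled = ((X @ A) * Y).sum(axis=1)    # X^T A Y (bilinear, decoupled)

t = 3 * np.linalg.norm(A, "fro")         # a few Frobenius-norm units
p_coupled = (np.abs(coupled) >= t).mean()
p_decoupled = (np.abs(decoupled) >= t).mean()
print(p_coupled, p_decoupled)            # both tail probabilities are small
```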

Why It Matters

The Hanson-Wright inequality is essential whenever you work with second-order statistics of random vectors:

  • Chi-squared concentration: X2=XIX\|X\|^2 = X^\top I X, so A=IA = I and the bound gives P(X2nt)2exp(cmin(t2/n,t))\mathbb{P}(|\|X\|^2 - n| \geq t) \leq 2\exp(-c\min(t^2/n, t)). The Frobenius regime (t2/nt^2/n) dominates for tnt \leq n; the operator regime (tt) dominates for tnt \geq n.

  • Random kernel evaluations: inner products xyx^\top y for random x,yx, y are quadratic in the joint vector (x,y)(x, y).

  • Covariance estimation: off-diagonal entries of Σ^\hat{\Sigma} involve terms like 1nkXk(i)Xk(j)\frac{1}{n}\sum_k X_k^{(i)} X_k^{(j)}, which are quadratic forms.

  • Second-order chaos: any polynomial of degree 2 in sub-Gaussian variables.

Failure Mode

The constant cc in the inequality is universal but unspecified; for precise numerical bounds in applications, you may need to track it through the proof. Also, Hanson-Wright requires independent entries; for dependent sub-Gaussian vectors, you need modified versions (e.g., for vectors with sub-Gaussian norm bounds but dependent entries, the inequality may still hold but with KK replaced by the sub-Gaussian norm of the entire vector).

Lemma

Decoupling Inequality for Quadratic Forms

Statement

Let XX have independent centered entries and let AA be a symmetric matrix with Aii=0A_{ii} = 0. Let YY be an independent copy of XX. Then for all convex, increasing functions Φ\Phi on [0,)[0, \infty):

E ⁣[Φ ⁣(XAX)]E ⁣[Φ ⁣(CXAY)]\mathbb{E}\!\Bigl[\Phi\!\bigl(|X^\top A X|\bigr)\Bigr] \leq \mathbb{E}\!\Bigl[\Phi\!\bigl(C \cdot |X^\top A Y|\bigr)\Bigr]

where CC is a universal constant.

Intuition

Decoupling replaces the "entangled" quadratic form ijAijXiXj\sum_{i \neq j} A_{ij} X_i X_j (where each XiX_i appears in multiple terms) with the "decoupled" bilinear form i,jAijXiYj\sum_{i,j} A_{ij} X_i Y_j (where each XiX_i and YjY_j appear in separate roles). The decoupled version is easier to analyze because once you condition on YY, the sum is linear in XX, and all the standard sub-Gaussian tools apply.

Proof Sketch

The proof uses a symmetrization-style argument. Introduce the decoupled form and use the independence of XX and YY together with the symmetry of AA to show that the tails of the coupled form are controlled by those of the decoupled form. The universal constant CC absorbs a factor from the randomization step.

Why It Matters

Decoupling is the key technical device that makes Hanson-Wright provable. It converts a hard problem (concentrating a quadratic form) into an easier one (concentrating a bilinear form, which is linear once you condition on one factor). This technique appears throughout the theory of U-statistics and chaos processes.

Failure Mode

Decoupling requires the diagonal of AA to be zero (or handled separately). The diagonal terms AiiXi2A_{ii} X_i^2 are not quadratic in the same sense: they are sums of independent sub-exponential variables and are controlled separately using standard sub-exponential concentration.

Two Regimes Explained

The Hanson-Wright bound can be rewritten in high-probability form. With probability at least 1δ1 - \delta:

XAXE[XAX]K2 ⁣(AFlog(2/δ)+Aoplog(2/δ))|X^\top A X - \mathbb{E}[X^\top A X]| \lesssim K^2\!\left(\|A\|_F \sqrt{\log(2/\delta)} + \|A\|_{\text{op}} \log(2/\delta)\right)

The two terms correspond to the two norms:

  • Gaussian regime (tt small): tail ect2e^{-ct^2}, dominated by AF\|A\|_F; example: chi-squared with tnt \ll n
  • Extreme regime (tt large): tail ecte^{-ct}, dominated by Aop\|A\|_{\text{op}}; example: chi-squared with tnt \gg n
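The high-probability form above follows by inverting the tail bound; a short derivation sketch (the universal constant cc is absorbed into \lesssim):

```latex
% Set the tail bound equal to \delta and solve for t:
2\exp\Bigl(-c\min\Bigl(\tfrac{t^2}{K^4\|A\|_F^2},\,\tfrac{t}{K^2\|A\|_{\mathrm{op}}}\Bigr)\Bigr) = \delta
\;\Longleftrightarrow\;
\min\Bigl(\tfrac{t^2}{K^4\|A\|_F^2},\,\tfrac{t}{K^2\|A\|_{\mathrm{op}}}\Bigr) = \tfrac{1}{c}\log\tfrac{2}{\delta}.
% Solving each branch for t and taking the larger of the two gives
t \;\lesssim\; K^2\Bigl(\|A\|_F\sqrt{\log(2/\delta)} + \|A\|_{\mathrm{op}}\log(2/\delta)\Bigr).
```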

Canonical Examples

Example

Chi-squared concentration

Let XN(0,In)X \sim \mathcal{N}(0, I_n) and A=InA = I_n. Then XAX=X2χn2X^\top A X = \|X\|^2 \sim \chi^2_n with E[X2]=n\mathbb{E}[\|X\|^2] = n. The norms are IF=n\|I\|_F = \sqrt{n} and Iop=1\|I\|_{\text{op}} = 1. Hanson-Wright gives:

P(X2nt)2exp ⁣(cmin(t2/n,  t))\mathbb{P}(|\|X\|^2 - n| \geq t) \leq 2\exp\!\bigl(-c\min(t^2/n,\; t)\bigr)

For t=ϵnt = \epsilon n (relative deviation ϵ\epsilon): the bound is exp(cϵ2n)\exp(-c\epsilon^2 n) when ϵ1\epsilon \leq 1 (Gaussian regime) and exp(cϵn)\exp(-c\epsilon n) when ϵ1\epsilon \geq 1 (extreme regime). This matches the known chi-squared tail behavior.
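A Monte Carlo check of the chi-squared case, sketched in numpy: fluctuations of X2\|X\|^2 around nn are of order n\sqrt{n} (since Var(χn2)=2n\text{Var}(\chi^2_n) = 2n), and deviations several standard deviations out are rare, consistent with the Gaussian-regime bound:

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 100, 50_000
X = rng.standard_normal((N, n))
dev = np.abs((X**2).sum(axis=1) - n)     # | ||X||^2 - n |, chi-squared deviation

# Typical deviation is O(sqrt(n)): Var(chi^2_n) = 2n
print("mean deviation / sqrt(n):", dev.mean() / np.sqrt(n))

t = 4 * np.sqrt(2 * n)                   # four standard deviations out
print("tail fraction beyond t:", (dev >= t).mean())
```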

Example

Random kernel evaluation

Let x,yRdx, y \in \mathbb{R}^d be independent vectors with i.i.d. sub-Gaussian entries of norm KK. The inner product xy=ZAZx^\top y = Z^\top A Z where Z=(x,y)R2dZ = (x, y) \in \mathbb{R}^{2d} and AA is a 2d×2d2d \times 2d block matrix with off-diagonal blocks Id/2I_d/2 and zero diagonal blocks.

Then AF=d/2\|A\|_F = \sqrt{d}/\sqrt{2} and Aop=1/2\|A\|_{\text{op}} = 1/2. Hanson-Wright gives:

P(xyt)2exp ⁣(cmin(t2/d,  t))\mathbb{P}(|x^\top y| \geq t) \leq 2\exp\!\bigl(-c\min(t^2/d,\; t)\bigr)

So random inner products in Rd\mathbb{R}^d concentrate around 0 with fluctuations of order d\sqrt{d}.
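A simulation sketch of this conclusion (Gaussian entries as the sub-Gaussian example): the inner product xyx^\top y has mean 0 and variance exactly dd, so its standard deviation is d\sqrt{d}:

```python
import numpy as np

rng = np.random.default_rng(4)
d, N = 200, 50_000
x = rng.standard_normal((N, d))
y = rng.standard_normal((N, d))          # independent of x
ip = (x * y).sum(axis=1)                 # x^T y for each sample pair

# Each coordinate product has variance 1, so Var(x^T y) = d.
print("std / sqrt(d):", ip.std() / np.sqrt(d))   # should be close to 1
```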

Example

Quadratic form with rank-r matrix

If AA has rank rr with eigenvalues λ1λr>0\lambda_1 \geq \cdots \geq \lambda_r > 0, then AF=iλi2\|A\|_F = \sqrt{\sum_i \lambda_i^2} and Aop=λ1\|A\|_{\text{op}} = \lambda_1.

The crossover point is tiλi2/λ1t^* \approx \sum_i \lambda_i^2 / \lambda_1. If AA is rank-1 (A=λvvA = \lambda vv^\top), then AF=λ=Aop\|A\|_F = |\lambda| = \|A\|_{\text{op}} and the two regimes are identical: ect2/λ2e^{-ct^2/\lambda^2} transitions directly to ect/λe^{-ct/|\lambda|} at tλt \approx |\lambda|.
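A numeric check of the rank-rr formulas, as a minimal sketch: build a rank-3 matrix with a chosen spectrum (an illustrative example) and confirm the norms and crossover point:

```python
import numpy as np

# Rank-r matrix with eigenvalues lam: ||A||_F = sqrt(sum lam_i^2), ||A||_op = max lam_i
lam = np.array([4.0, 2.0, 1.0])          # example spectrum, r = 3
rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.standard_normal((10, 3)))   # orthonormal columns
A = Q @ np.diag(lam) @ Q.T               # rank-3 PSD matrix in R^{10 x 10}

fro = np.linalg.norm(A, "fro")
op = np.linalg.norm(A, 2)
# Crossover t* = sum(lam^2) / max(lam) = 21 / 4
print("t* =", fro**2 / op)
```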

Common Confusions

Watch Out

Hanson-Wright is not just Hoeffding applied to products

A naive approach would be to note that each term AijXiXjA_{ij} X_i X_j is sub-exponential (product of two sub-Gaussians) and apply a sub-exponential concentration bound. This gives a bound in terms of ijAij2=AF2\sum_{ij} A_{ij}^2 = \|A\|_F^2, which captures the Frobenius regime but misses the tighter operator-norm regime for large deviations. Hanson-Wright is strictly stronger because it also captures the Aop\|A\|_{\text{op}} term through the decoupling argument.

Watch Out

The matrix A need not be symmetric or positive semidefinite

The Hanson-Wright inequality applies to any matrix AA, not just symmetric or PSD ones. For a general AA, XAX=X((A+A)/2)XX^\top A X = X^\top ((A + A^\top)/2) X because XBX=0X^\top B X = 0 for any antisymmetric BB (since XBXX^\top B X is a scalar that equals its own negative). So you can always reduce to the symmetric part.
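A one-line numerical check of this reduction: the antisymmetric part of AA contributes nothing to the quadratic form, so XAXX^\top A X matches the form of the symmetric part exactly:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 8
A = rng.standard_normal((n, n))          # general, non-symmetric matrix
S = (A + A.T) / 2                        # symmetric part of A
x = rng.standard_normal(n)

# x^T A x = x^T S x: the antisymmetric part (A - A^T)/2 contributes zero
print(x @ A @ x, x @ S @ x)
```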

Watch Out

Sub-Gaussian entries, not sub-Gaussian vectors

Hanson-Wright requires the entries of XX to be independent sub-Gaussian, not just the vector XX to have sub-Gaussian norm. For random vectors with dependent entries (like uniform on the sphere), the standard Hanson-Wright does not apply directly. Modified versions exist but require different techniques (e.g., transportation-cost arguments).

Summary

  • Hanson-Wright controls XAXE[XAX]|X^\top A X - \mathbb{E}[X^\top A X]| for sub-Gaussian XX
  • Two-term bound: exp(cmin(t2/AF2,t/Aop))\exp(-c\min(t^2/\|A\|_F^2, t/\|A\|_{\text{op}}))
  • Frobenius norm AF\|A\|_F controls the Gaussian (small deviation) regime
  • Operator norm Aop\|A\|_{\text{op}} controls the extreme (large deviation) regime
  • Crossover at t=AF2/Aopt^* = \|A\|_F^2 / \|A\|_{\text{op}}
  • Decoupling is the key proof technique: replace XAXX^\top A X with XAYX^\top A Y using an independent copy YY
  • For A=IA = I: recovers chi-squared concentration
  • Applies to random kernel evaluations, covariance estimation, second-order chaos

Exercises

ExerciseCore

Problem

Let XRnX \in \mathbb{R}^n have i.i.d. N(0,1)\mathcal{N}(0, 1) entries and let A=1n(11I)A = \frac{1}{n}(11^\top - I) where 11 is the all-ones vector. The quadratic form XAX=1n(iXi)21niXi2X^\top A X = \frac{1}{n}(\sum_i X_i)^2 - \frac{1}{n}\sum_i X_i^2. Compute AF\|A\|_F and Aop\|A\|_{\text{op}}, and use Hanson-Wright to bound the deviation of XAXX^\top A X from its expectation.

ExerciseAdvanced

Problem

Let XRnX \in \mathbb{R}^n have i.i.d. sub-Gaussian entries with parameter KK and let PP be the orthogonal projection onto a kk-dimensional subspace. Use Hanson-Wright to show that PX2=XPX\|PX\|^2 = X^\top P X concentrates around kk with sub-Gaussian fluctuations of order k\sqrt{k}.

ExerciseResearch

Problem

The Hanson-Wright inequality gives a bound of order AFlog(1/δ)\|A\|_F\sqrt{\log(1/\delta)} in the Gaussian regime. Show that this is tight (up to constants) by computing the variance of XAXX^\top A X when XN(0,I)X \sim \mathcal{N}(0, I) and verifying that Var(XAX)=2AF2\text{Var}(X^\top A X) = 2\|A\|_F^2 for symmetric AA.

References

Canonical:

  • Hanson & Wright, "A Bound on Tail Probabilities for Quadratic Forms in Independent Random Variables" (1971)
  • Rudelson & Vershynin, "Hanson-Wright Inequality and Sub-Gaussian Concentration" (2013)
  • Vershynin, High-Dimensional Probability (2018), Chapter 6

Current:

  • Wainwright, High-Dimensional Statistics (2019), Chapter 6
  • Adamczak, "A Note on the Hanson-Wright Inequality for Random Vectors with Dependencies" (2015)
  • Boucheron, Lugosi, Massart, Concentration Inequalities (2013), Chapters 2-6

Last reviewed: April 2026