Sub-Exponential Random Variables
The distributional class between sub-Gaussian and heavy-tailed: heavier tails than Gaussian, the psi_1 norm, Bernstein condition, and the two-regime concentration bound.
Why This Matters
Sub-Gaussian random variables have tails decaying like $e^{-ct^2}$ and give the cleanest concentration bounds. But many quantities in ML are not sub-Gaussian:
- Chi-squared random variables (sums of squared Gaussians)
- Products of sub-Gaussian random variables
- Exponential random variables (waiting times)
- Squared losses in regression with Gaussian noise
These have tails decaying like $e^{-ct}$: exponentially, but with a linear exponent rather than a quadratic one. The sub-exponential class captures exactly this level of tail behavior, and the Bernstein inequality for sub-exponential sums provides the right concentration bound: sub-Gaussian for small deviations, exponential for large deviations.
Mental Model
The tail decay hierarchy is:
| Class | Tail decay | MGF | Example |
|---|---|---|---|
| Sub-Gaussian | $e^{-ct^2}$ | Finite for all $\lambda$ | Bounded, Gaussian |
| Sub-exponential | $e^{-ct}$ | Finite for $|\lambda| < c$ | Chi-squared, exponential |
| Heavy-tailed | Polynomial | May not exist | Cauchy, Pareto |
Sub-exponential is the intermediate class. The MGF exists in a neighborhood of zero (not everywhere), which means the Chernoff method works for small $\lambda$ but breaks for large $\lambda$. This produces a concentration bound with two regimes.
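To make the gap between the first two rows concrete, here is a minimal sketch that compares two exact tail probabilities: $P(|Z| \ge t)$ for a standard normal and $P(X \ge t)$ for $X \sim \mathrm{Exp}(1)$. The choice of these two distributions and the function names are illustrative assumptions, not part of the text above.

```python
import math

def gaussian_tail(t):
    """Exact P(|Z| >= t) for Z ~ N(0, 1), via the complementary error function."""
    return math.erfc(t / math.sqrt(2.0))

def exponential_tail(t):
    """Exact P(X >= t) for X ~ Exp(1)."""
    return math.exp(-t)

# The ratio grows rapidly with t: the linear exponent loses to the quadratic one.
for t in [1.0, 2.0, 4.0, 8.0]:
    g, e = gaussian_tail(t), exponential_tail(t)
    print(f"t={t:4.1f}  Gaussian={g:.3e}  Exp(1)={e:.3e}  ratio={e / g:.3e}")
```

The two tails are comparable near $t = 1$, but the ratio grows rapidly as $t$ increases, which is the practical meaning of "heavier than Gaussian."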
Formal Setup and Definitions
Let $X$ be a centered random variable ($\mathbb{E}[X] = 0$).
Sub-Exponential Random Variable (MGF characterization)
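A centered random variable $X$ is sub-exponential with parameters $(\nu^2, b)$ if for all $|\lambda| < 1/b$:
$$\mathbb{E}\left[e^{\lambda X}\right] \le \exp\left(\frac{\lambda^2 \nu^2}{2}\right)$$
The parameter $\nu^2$ plays the role of a "variance proxy" (controlling the sub-Gaussian regime), and $b$ controls the radius $1/b$ of the interval on which the MGF bound holds.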
Sub-Exponential Norm (Orlicz psi_1 norm)
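The sub-exponential norm (or $\psi_1$-norm) of $X$ is:
$$\|X\|_{\psi_1} = \inf\left\{ t > 0 : \mathbb{E}\left[\exp(|X|/t)\right] \le 2 \right\}$$
A random variable is sub-exponential if and only if $\|X\|_{\psi_1} < \infty$.
Compare to the sub-Gaussian norm: $\|X\|_{\psi_2} = \inf\{ t > 0 : \mathbb{E}[\exp(X^2/t^2)] \le 2 \}$. The $\psi_1$ norm uses $|X|/t$ (linear in $|X|$) while $\psi_2$ uses $X^2/t^2$ (quadratic). The linear growth in the exponent is what makes sub-exponential tails heavier than sub-Gaussian.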
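The definition $\|X\|_{\psi_1} = \inf\{t > 0 : \mathbb{E}[\exp(|X|/t)] \le 2\}$ can be evaluated numerically whenever the exponential moment is available. The sketch below assumes $X \sim \mathrm{Exp}(1)$, for which $\mathbb{E}[\exp(X/t)] = t/(t-1)$ when $t > 1$ and is infinite otherwise. The helper names and the bisection bracket are illustrative choices.

```python
def exp_moment(t):
    """E[exp(X / t)] for X ~ Exp(1): equals t / (t - 1) for t > 1, else diverges."""
    if t <= 1.0:
        return float("inf")
    return t / (t - 1.0)

def psi1_norm(moment_fn, lo=1e-6, hi=1e6, iters=200):
    """Smallest t with moment_fn(t) <= 2, by bisection.

    Assumes moment_fn is non-increasing in t and moment_fn(hi) <= 2.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if moment_fn(mid) <= 2.0:
            hi = mid   # mid is feasible, so the infimum is at most mid
        else:
            lo = mid   # mid is infeasible, so the infimum is larger
    return hi

print(psi1_norm(exp_moment))  # numerically close to 2
```

Solving $t/(t-1) = 2$ by hand gives $t = 2$, so the bisection should land at that value up to floating-point precision.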
Bernstein Condition
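A centered random variable $X$ satisfies the Bernstein condition with parameters $(\sigma^2, b)$ if for all integers $k \ge 2$:
$$\left|\mathbb{E}[X^k]\right| \le \frac{1}{2}\, k!\, \sigma^2\, b^{k-2}$$
This is equivalent (up to constants) to being sub-exponential with parameters $(\sigma^2, b)$. The condition controls all moments: the $k$-th moment grows at most like $k!\, b^k$, compared to sub-Gaussian where moments grow like $\sqrt{k!}$ (up to factors of the form $C^k$).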
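As a sanity check, the sketch below tests the condition $|\mathbb{E}[X^k]| \le \frac{1}{2} k!\, \sigma^2 b^{k-2}$ for a centered exponential, $X = Y - 1$ with $Y \sim \mathrm{Exp}(1)$. The example distribution and the choice $\sigma^2 = 1$, $b = 1$ are assumptions made for illustration. The moments are computed exactly from $\mathbb{E}[Y^j] = j!$ and the binomial theorem.

```python
from math import comb, factorial

def central_moment_exp1(k):
    """E[(Y - 1)^k] for Y ~ Exp(1), using E[Y^j] = j! and the binomial theorem."""
    return sum(comb(k, j) * factorial(j) * (-1) ** (k - j) for j in range(k + 1))

def satisfies_bernstein(k, sigma2=1, b=1):
    """Check |E[(Y-1)^k]| <= (1/2) k! sigma2 b^(k-2) in exact integer arithmetic."""
    return 2 * abs(central_moment_exp1(k)) <= factorial(k) * sigma2 * b ** (k - 2)

for k in range(2, 8):
    print(k, central_moment_exp1(k), satisfies_bernstein(k))
```

The case $k = 2$ holds with equality, since the variance of $\mathrm{Exp}(1)$ is $1 = \frac{1}{2} \cdot 2! \cdot \sigma^2$.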
Equivalent Characterizations
The following are equivalent (up to constants in parameters):
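- MGF condition: $\mathbb{E}[e^{\lambda X}] \le e^{\lambda^2 \nu^2 / 2}$ for $|\lambda| < 1/b$
- Tail condition: $P(|X| \ge t) \le 2 e^{-t/K}$ for all $t \ge 0$
- Moment condition: $(\mathbb{E}|X|^p)^{1/p} \le K p$ for all $p \ge 1$
- Orlicz norm: $\|X\|_{\psi_1} < \infty$
- Bernstein condition: $|\mathbb{E}[X^k]| \le \frac{1}{2} k!\, \sigma^2 b^{k-2}$ for $k \ge 2$
Here $K$ denotes a constant that may differ from line to line. Compare the moment condition to sub-Gaussian characterization 3: $(\mathbb{E}|X|^p)^{1/p} \le K\sqrt{p}$. Sub-exponential moments grow like $p$; sub-Gaussian moments grow like $\sqrt{p}$. The extra factor of $\sqrt{p}$ is the precise difference between the two classes.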
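The two growth rates can be checked on distributions with closed-form moments. This sketch assumes $X \sim \mathrm{Exp}(1)$, where $\mathbb{E}[X^p] = \Gamma(p+1)$, and $Z \sim N(0,1)$, where $\mathbb{E}|Z|^p = 2^{p/2}\,\Gamma((p+1)/2)/\sqrt{\pi}$. Both examples and the function names are illustrative.

```python
import math

def lp_norm_exp1(p):
    """(E|X|^p)^(1/p) for X ~ Exp(1), using E[X^p] = Gamma(p + 1)."""
    return math.exp(math.lgamma(p + 1.0) / p)

def lp_norm_gauss(p):
    """(E|Z|^p)^(1/p) for Z ~ N(0, 1), from the closed-form absolute moment."""
    log_moment = (0.5 * p * math.log(2.0)
                  + math.lgamma(0.5 * (p + 1.0))
                  - 0.5 * math.log(math.pi))
    return math.exp(log_moment / p)

# Normalizing by p (exponential) and sqrt(p) (Gaussian) keeps both ratios bounded.
for p in [2, 8, 32, 128]:
    print(p, lp_norm_exp1(p) / p, lp_norm_gauss(p) / math.sqrt(p))
```

Both normalized ratios stay bounded above and below as $p$ grows, which is what "moments grow like $p$" and "moments grow like $\sqrt{p}$" mean here.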
Main Theorems
Products of Sub-Gaussians are Sub-Exponential
Statement
If $X$ and $Y$ are sub-Gaussian random variables (not necessarily independent), then the product $XY$ is sub-exponential, with:
$$\|XY\|_{\psi_1} \le \|X\|_{\psi_2}\, \|Y\|_{\psi_2}$$
Intuition
Multiplying two "Gaussian-like" variables produces something "exponential-like." A concrete example: if $X \sim N(0,1)$, then $X$ is sub-Gaussian, but $X^2$ is chi-squared (with 1 degree of freedom) and sub-exponential. The tails get heavier because multiplying two variables that can be large makes the product even larger.
Proof Sketch
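For any $t > 0$, the event $\{|XY| \ge t\}$ forces at least one factor to be large:
$$\{|XY| \ge t\} \subseteq \{|X| \ge \sqrt{t}\} \cup \{|Y| \ge \sqrt{t}\}$$
Split into cases: either $|X| \ge \sqrt{t}$ or $|Y| \ge \sqrt{t}$ (or both). By the union bound:
$$P(|XY| \ge t) \le P(|X| \ge \sqrt{t}) + P(|Y| \ge \sqrt{t})$$
Since $X$ is sub-Gaussian: $P(|X| \ge \sqrt{t}) \le 2\exp(-ct/\|X\|_{\psi_2}^2)$. Similarly for $Y$. So $P(|XY| \ge t) \le 4\exp\left(-ct/\max(\|X\|_{\psi_2}^2, \|Y\|_{\psi_2}^2)\right)$, which is a sub-exponential tail.
For the norm bound, combine the elementary inequality $|XY| \le \frac{1}{2}(X^2 + Y^2)$ with the relationship $\|X^2\|_{\psi_1} = \|X\|_{\psi_2}^2$ between the $\psi_1$ and $\psi_2$ norms.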
Why It Matters
This explains why chi-squared variables, quadratic forms, and many statistics in learning theory are sub-exponential: they involve products or squares of sub-Gaussian quantities. When you see $X^2$ or $XY$ in a bound, expect sub-exponential concentration, not sub-Gaussian.
Failure Mode
The product of two sub-exponential variables is generally not sub-exponential (it may be even heavier-tailed). Sub-exponential is not closed under multiplication, unlike sub-Gaussian being closed under addition. This limits the composability of sub-exponential bounds.
Bernstein Inequality for Sub-Exponential Sums
Statement
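Let $X_1, \ldots, X_n$ be independent centered sub-exponential random variables with parameters $(\nu_i^2, b_i)$. Then for any $t \ge 0$:
$$P\left(\left|\sum_{i=1}^n X_i\right| \ge t\right) \le 2\exp\left(-c \min\left(\frac{t^2}{\sum_{i=1}^n \nu_i^2},\ \frac{t}{\max_i b_i}\right)\right)$$
where $c > 0$ is a universal constant. For i.i.d. variables with common parameters $(\nu^2, b)$ and sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$:
$$P\left(|\bar{X}_n| \ge t\right) \le 2\exp\left(-c\, n \min\left(\frac{t^2}{\nu^2},\ \frac{t}{b}\right)\right)$$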
Intuition
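The sample-mean bound $2\exp\left(-cn\min(t^2/\nu^2,\ t/b)\right)$ has two regimes separated by the threshold $t^* = \nu^2/b$:
Small deviations ($t \le \nu^2/b$): The $t^2/\nu^2$ term is the smaller of the two and determines the minimum. The bound is $2\exp(-cnt^2/\nu^2)$, a sub-Gaussian tail. For moderate deviations, the sub-exponential variable behaves as if it were sub-Gaussian.
Large deviations ($t > \nu^2/b$): The $t/b$ term determines the minimum. The bound is $2\exp(-cnt/b)$, an exponential (not Gaussian) tail. For large deviations, the heavier tails kick in and concentration is weaker.
The two expressions agree at $t = \nu^2/b$, so the bound is continuous across the threshold. The sub-Gaussian regime gives the familiar $O(1/\sqrt{n})$ rates for moderate confidence levels.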
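A bound of the form $2\exp(-cn\min(t^2/\nu^2,\ t/b)) \le \delta$ can be inverted to get a deviation radius at confidence level $1-\delta$. Writing $L = \log(2/\delta)/(cn)$, the requirement is $\min(t^2/\nu^2, t/b) \ge L$, which holds exactly when $t \ge \max(\nu\sqrt{L},\ bL)$. The sketch below implements this inversion. The value $c = 1/2$ is an assumption (it is what the Chernoff argument yields under the MGF definition), and the function names are hypothetical.

```python
import math

def bernstein_tail(t, n, nu2, b, c=0.5):
    """Two-regime bound 2 * exp(-c * n * min(t^2 / nu2, t / b)); c = 1/2 is assumed."""
    return 2.0 * math.exp(-c * n * min(t * t / nu2, t / b))

def bernstein_radius(n, delta, nu2, b, c=0.5):
    """Smallest t for which bernstein_tail(t, ...) <= delta."""
    L = math.log(2.0 / delta) / (c * n)
    gaussian_part = math.sqrt(nu2 * L)   # dominates when n is large (rate 1/sqrt(n))
    exponential_part = b * L             # dominates when n is small (rate 1/n)
    return max(gaussian_part, exponential_part)

for n in [2, 10, 100, 1000]:
    print(n, bernstein_radius(n, delta=0.05, nu2=4.0, b=4.0))
```

For large $n$ the $\sqrt{\log(2/\delta)/n}$ term wins, which is the usual sub-Gaussian rate. The $\log(2/\delta)/n$ term matters only for small samples or very small $\delta$.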
Proof Sketch
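Use the Chernoff method. Write $\nu_*^2 = \sum_i \nu_i^2$ and $b_* = \max_i b_i$. For $0 < \lambda < 1/b_*$, Markov's inequality, independence, and the MGF bound give:
$$P\left(\sum_{i=1}^n X_i \ge t\right) \le e^{-\lambda t} \prod_{i=1}^n \mathbb{E}\left[e^{\lambda X_i}\right] \le \exp\left(-\lambda t + \frac{\lambda^2 \nu_*^2}{2}\right)$$
Optimize over $\lambda$:
- If the unconstrained optimum $\lambda^* = t/\nu_*^2$ satisfies $\lambda^* < 1/b_*$, we are in the sub-Gaussian regime: bound is $\exp(-t^2/(2\nu_*^2))$.
- If $\lambda^* \ge 1/b_*$, set $\lambda = 1/b_*$ (at the boundary, taken as a limit): bound is $\exp(-t/b_* + \nu_*^2/(2b_*^2))$. Since $t \ge \nu_*^2/b_*$ in this case, this gives at most $\exp(-t/(2b_*))$.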
Taking the minimum of the two cases gives the stated bound.
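The constrained optimization in the sketch can be written out directly. The code below minimizes the Chernoff exponent $-\lambda t + \lambda^2 \nu_*^2/2$ over $0 < \lambda \le 1/b_*$ by clipping the unconstrained minimizer $t/\nu_*^2$, then compares the result with the simplified two-regime form $\exp(-\frac{1}{2}\min(t^2/\nu_*^2,\ t/b_*))$. The symbols `nu2_sum` and `b` stand for $\nu_*^2 = \sum_i \nu_i^2$ and $b_* = \max_i b_i$; the function names are hypothetical, and the bound shown is one-sided.

```python
import math

def chernoff_exponent(lam, t, nu2_sum):
    """Exponent -lam * t + lam^2 * nu2_sum / 2 of the Chernoff bound at a given lam."""
    return -lam * t + 0.5 * lam * lam * nu2_sum

def optimized_bound(t, nu2_sum, b):
    """Chernoff bound with lam clipped to the region where the MGF bound holds."""
    lam_star = t / nu2_sum            # unconstrained minimizer of the exponent
    lam = min(lam_star, 1.0 / b)      # boundary value 1/b is used as a limit
    return math.exp(chernoff_exponent(lam, t, nu2_sum))

def two_regime_bound(t, nu2_sum, b):
    """Simplified one-sided form exp(-min(t^2 / nu2_sum, t / b) / 2)."""
    return math.exp(-0.5 * min(t * t / nu2_sum, t / b))

for t in [0.5, 1.0, 2.0, 5.0]:
    print(t, optimized_bound(t, 4.0, 4.0), two_regime_bound(t, 4.0, 4.0))
```

Below the threshold $t = \nu_*^2/b_*$ the two expressions coincide; above it the clipped Chernoff bound is slightly tighter, and the two-regime form is the convenient relaxation.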
Why It Matters
This is the correct concentration bound for quantities like $\frac{1}{n}\sum_{i=1}^n X_i^2$ where each $X_i$ is Gaussian. The sub-Gaussian bound does not apply (these are not sub-Gaussian), but the Bernstein bound does. The two-regime behavior is what you actually observe in practice: moderate deviations look Gaussian, but extreme deviations reveal the heavier tails.
Failure Mode
The bound requires knowing (or bounding) both $\nu^2$ and $b$. For the sample variance of Gaussian data with variance $\sigma^2$, $\nu^2$ is of order $\sigma^4$ and $b$ is of order $\sigma^2$. If these parameters are poorly estimated, the bound may be loose in one or both regimes.
Key Examples of Sub-Exponential Variables
Chi-squared with k degrees of freedom
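If $Z_1, \ldots, Z_k \sim N(0,1)$ independently, then $\chi^2_k = \sum_{i=1}^k Z_i^2$ with $\mathbb{E}[\chi^2_k] = k$ and $\mathrm{Var}(\chi^2_k) = 2k$.
The centered variable $\chi^2_k - k$ is sub-exponential with parameters $\nu^2 = 4k$ and $b = 4$. It is not sub-Gaussian because each $Z_i^2$ has MGF $\mathbb{E}[e^{\lambda Z_i^2}] = (1 - 2\lambda)^{-1/2}$, which diverges at $\lambda = 1/2$. A sub-Gaussian MGF would be finite for all $\lambda$.
The Bernstein bound gives: $P\left(\left|\chi^2_k/k - 1\right| \ge t\right) \le 2\exp\left(-c\,k \min(t^2, t)\right)$.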
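The parameters for a single squared Gaussian can be checked against the closed-form MGF. For $Z \sim N(0,1)$, $\mathbb{E}[e^{\lambda(Z^2 - 1)}] = e^{-\lambda}/\sqrt{1 - 2\lambda}$ for $\lambda < 1/2$. The sketch below tests the bound $e^{\nu^2\lambda^2/2}$ with $\nu^2 = 4$ on a grid inside $|\lambda| < 1/4$, and shows that it fails at $\lambda = 0.4$, outside that interval. The grid resolution and function names are illustrative choices.

```python
import math

def centered_chi2_mgf(lam):
    """E[exp(lam * (Z^2 - 1))] for Z ~ N(0, 1); infinite for lam >= 1/2."""
    if lam >= 0.5:
        return float("inf")
    return math.exp(-lam) / math.sqrt(1.0 - 2.0 * lam)

def mgf_proxy(lam, nu2=4.0):
    """Sub-exponential MGF bound exp(nu2 * lam^2 / 2)."""
    return math.exp(0.5 * nu2 * lam * lam)

# Grid strictly inside |lam| < 1/4, i.e. b = 4.
grid = [i / 1000.0 for i in range(-249, 250)]
holds_inside = all(centered_chi2_mgf(lam) <= mgf_proxy(lam) for lam in grid)
print("bound holds for |lam| < 1/4:", holds_inside)
print("at lam = 0.4:", centered_chi2_mgf(0.4), "vs proxy", mgf_proxy(0.4))
```

The failure at $\lambda = 0.4$ is the reason the radius parameter $b$ is part of the definition: the quadratic MGF bound is only a local statement around $\lambda = 0$.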
Exponential distribution
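If $X \sim \mathrm{Exp}(1)$, then $X - 1$ is centered with $\mathbb{E}[e^{\lambda(X - 1)}] = \frac{e^{-\lambda}}{1 - \lambda}$ for $\lambda < 1$.
This is finite only for $\lambda < 1$, confirming sub-exponential (not sub-Gaussian) behavior. The $\psi_1$ norm of the uncentered variable is $\|X\|_{\psi_1} = 2$, since $\mathbb{E}[e^{X/t}] = t/(t-1) \le 2$ exactly when $t \ge 2$.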
Product of two independent standard normals
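Let $X, Y \sim N(0,1)$ independently. The product $XY$ has $\mathbb{E}[XY] = 0$ and the tail bound:
$$P(|XY| \ge t) \le P(|X| \ge \sqrt{t}) + P(|Y| \ge \sqrt{t}) \le 4e^{-t/2}$$
which is sub-exponential (the Gaussian tail $2e^{-s^2/2}$ becomes $2e^{-t/2}$ after the substitution $s = \sqrt{t}$). Indeed $\|XY\|_{\psi_1} \le \|X\|_{\psi_2}\,\|Y\|_{\psi_2}$.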
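A quick Monte Carlo experiment illustrates the union-bound argument for the product of two independent standard normals. It uses the Gaussian tail bound $P(|X| \ge s) \le 2e^{-s^2/2}$, which gives $P(|XY| \ge t) \le 4e^{-t/2}$. The trial count, seed, and function names below are arbitrary illustrative choices, and the estimate is subject to sampling error.

```python
import math
import random

def product_tail_estimate(t, trials=50_000, seed=0):
    """Monte Carlo estimate of P(|XY| >= t) for independent X, Y ~ N(0, 1)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials)
               if abs(rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0)) >= t)
    return hits / trials

def union_bound(t):
    """4 * exp(-t / 2), capped at 1 since it bounds a probability."""
    return min(1.0, 4.0 * math.exp(-t / 2.0))

for t in [2.0, 4.0, 8.0]:
    print(t, product_tail_estimate(t), union_bound(t))
```

The union bound is loose by a sizable factor, but it captures the correct qualitative behavior: the tail of the product decays exponentially in $t$, not in $t^2$.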
Sub-Exponential Closure Properties
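- Sum of independents: If $X_1, \ldots, X_n$ are independent sub-exponential, then $\sum_i X_i$ is sub-exponential with $\left\|\sum_i X_i\right\|_{\psi_1} \le \sum_i \|X_i\|_{\psi_1}$. (Note: unlike sub-Gaussian, the norm adds linearly, not in quadrature.)
- Scalar multiplication: $\|cX\|_{\psi_1} = |c|\, \|X\|_{\psi_1}$.
- Product of sub-Gaussians: $\|XY\|_{\psi_1} \le \|X\|_{\psi_2}\, \|Y\|_{\psi_2}$.
- Not closed under products: The product of two sub-exponential variables may be heavier than sub-exponential.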
Connection to Bernstein's Classical Inequality
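The classical Bernstein inequality (from the concentration-inequalities page) for bounded random variables with mean $\mu$, variance $\sigma^2$, and $|X_i - \mu| \le M$:
$$P\left(\left|\frac{1}{n}\sum_{i=1}^n X_i - \mu\right| \ge t\right) \le 2\exp\left(-\frac{n t^2}{2(\sigma^2 + Mt/3)}\right)$$
is a special case of the sub-exponential Bernstein inequality. Bounded random variables are sub-exponential (since they are even sub-Gaussian), and the denominator $\sigma^2 + Mt/3$ captures both regimes: sub-Gaussian when $t \ll \sigma^2/M$ and exponential when $t \gg \sigma^2/M$.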
Common Confusions
Sub-exponential is BETWEEN sub-Gaussian and heavy-tailed
Every sub-Gaussian variable is sub-exponential, but not vice versa. Sub-exponential is a weaker condition (allows heavier tails). The hierarchy is: bounded $\subset$ sub-Gaussian $\subset$ sub-exponential $\subset$ finite variance $\subset$ all distributions.
The MGF bound holds only for small lambda
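For sub-Gaussian variables, $\mathbb{E}[e^{\lambda X}] \le e^{\lambda^2 \sigma^2/2}$ for all $\lambda \in \mathbb{R}$. For sub-exponential, the bound holds only for $|\lambda| < 1/b$. Beyond this, the MGF may diverge. This is why the Chernoff method produces two regimes: you can only optimize up to the boundary $\lambda = 1/b$.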
psi_1 norm vs psi_2 norm
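$\psi_2$ (sub-Gaussian) uses $\exp(X^2/t^2)$ in the definition: the square of $X$ in the exponent. $\psi_1$ (sub-exponential) uses $\exp(|X|/t)$: the absolute value of $X$. The square in $\psi_2$ is what forces the tails to be Gaussian-like. Every sub-Gaussian variable is sub-exponential because $\exp(|X|/t) \le e \cdot \exp(X^2/t^2)$ for every $t > 0$, so $\|X\|_{\psi_1} \le C\|X\|_{\psi_2}$ for an appropriate universal constant $C$.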
Summary
- Sub-exponential tails: $P(|X| \ge t) \le 2e^{-ct}$ (linear exponent)
- Sub-Gaussian tails: $P(|X| \ge t) \le 2e^{-ct^2}$ (quadratic exponent)
- Products of sub-Gaussians are sub-exponential ($\|XY\|_{\psi_1} \le \|X\|_{\psi_2}\,\|Y\|_{\psi_2}$)
- Chi-squared and squared losses are sub-exponential, not sub-Gaussian
- Bernstein bound has two regimes: sub-Gaussian for small $t$, exponential for large $t$
- Threshold between regimes: $t^* = \nu^2/b$
- $\psi_1$ norm adds linearly for sums; $\psi_2$ norm adds in quadrature
Exercises
Problem
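Show that $X \sim \mathrm{Exp}(1)$ is sub-exponential but not sub-Gaussian. Compute the MGF and show it is finite only for $\lambda < 1$.
Problem
Let $Z \sim N(0,1)$. Show that $Z^2 - 1$ (centered chi-squared with 1 degree of freedom) is sub-exponential by verifying the tail bound $P(|Z^2 - 1| \ge t) \le 2e^{-ct}$ for large $t$.
Problem
Let $X_1, \ldots, X_n \sim N(0,1)$ independently. Using the Bernstein inequality for sub-exponential variables, bound $P\left(\left|\frac{1}{n}\sum_{i=1}^n X_i^2 - 1\right| \ge t\right)$. Identify the two regimes.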
References
Canonical:
- Vershynin, High-Dimensional Probability (2018), Chapter 2.7-2.8
- Boucheron, Lugosi, Massart, Concentration Inequalities (2013), Chapter 2
Current:
- Wainwright, High-Dimensional Statistics (2019), Chapter 2
- Rigollet & Hutter, High-Dimensional Statistics (MIT lecture notes, 2023)
- van Handel, Probability in High Dimension (2016), Chapters 1-3
Next Topics
Building on sub-exponential theory:
- Matrix concentration: extending sub-exponential bounds to matrix-valued random variables
- Epsilon-nets and covering numbers: combining concentration with geometric discretization
Last reviewed: April 2026