What Each Measures
Both sub-Gaussian and sub-exponential are tail-decay classes: they describe how fast $\Pr(|X| > t)$ decreases as $t \to \infty$. A random variable $X$ is classified by how its tails compare to those of specific reference distributions.
Sub-Gaussian: tails decay at least as fast as a Gaussian's, i.e., like $e^{-ct^2}$. Prototypical example: any bounded random variable.
Sub-Exponential: tails decay at least as fast as an exponential's, i.e., like $e^{-ct}$. The class strictly contains the sub-Gaussian class because its tails may be heavier, but they are still lighter than any polynomial tail.
Side-by-Side Definitions
Sub-Gaussian Random Variable
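A centered random variable $X$ is sub-Gaussian with parameter $\sigma > 0$ if any of the following conditions holds. The conditions are equivalent in the sense that each implies the others with $\sigma$ changed by at most a universal constant factor ($C$ below denotes such a constant):
- Tail condition: $\Pr(|X| \ge t) \le 2\exp\left(-t^2/(2\sigma^2)\right)$ for all $t \ge 0$
- MGF condition: $\mathbb{E}[e^{\lambda X}] \le \exp\left(\lambda^2\sigma^2/2\right)$ for all $\lambda \in \mathbb{R}$
- Moment condition: $\left(\mathbb{E}|X|^p\right)^{1/p} \le C\sigma\sqrt{p}$ for all $p \ge 1$
The sub-Gaussian norm is $\lVert X\rVert_{\psi_2} = \inf\{t > 0 : \mathbb{E}[\exp(X^2/t^2)] \le 2\}$.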
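A numerical sketch of the $\psi_2$ norm, assuming $\mathbb{E}[\exp(X^2/t^2)]$ is available in closed form: because that expectation is non-increasing in $t$, the infimum can be located by bisection. The function names and the search bracket are illustrative choices. The two test cases use standard closed forms: a Rademacher sign (norm $1/\sqrt{\ln 2}$) and a standard Gaussian (norm $\sqrt{8/3}$, from $\mathbb{E}[\exp(X^2/t^2)] = (1 - 2/t^2)^{-1/2}$ for $t^2 > 2$).

```python
import math

def psi2_norm(expect_exp_sq, lo=1e-3, hi=100.0, iters=200):
    """Bisection for inf{t > 0 : E[exp(X^2 / t^2)] <= 2}.

    expect_exp_sq(t) must return E[exp(X^2 / t^2)], or math.inf if it diverges.
    That expectation is non-increasing in t, which is what makes bisection valid.
    Assumes the answer lies in [lo, hi].
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expect_exp_sq(mid) <= 2.0:
            hi = mid  # mid is feasible: the infimum is at most mid
        else:
            lo = mid  # mid is infeasible: the infimum is above mid
    return hi

def rademacher(t):
    # X = +1 or -1 with probability 1/2 each, so X^2 = 1 and E[exp(X^2/t^2)] = exp(1/t^2).
    exponent = 1.0 / t**2
    return math.inf if exponent > 700.0 else math.exp(exponent)

def std_gaussian(t):
    # X ~ N(0, 1): E[exp(X^2/t^2)] = (1 - 2/t^2)^(-1/2) for t^2 > 2, and infinite otherwise.
    if t**2 <= 2.0:
        return math.inf
    return (1.0 - 2.0 / t**2) ** -0.5

print(psi2_norm(rademacher), 1.0 / math.sqrt(math.log(2.0)))
print(psi2_norm(std_gaussian), math.sqrt(8.0 / 3.0))
```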
Sub-Exponential Random Variable
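A centered random variable $X$ is sub-exponential with parameters $(\nu, \alpha)$, where $\nu, \alpha > 0$, if any of the following conditions holds. They define the same class of variables: each implies the others, with parameters that agree up to universal constants once $\nu$ and $\alpha$ are both replaced by a single $K \asymp \max(\nu, \alpha)$:
- Tail condition: $\Pr(|X| \ge t) \le 2\exp\left(-\tfrac{1}{2}\min\left(\tfrac{t^2}{\nu^2}, \tfrac{t}{\alpha}\right)\right)$ for all $t \ge 0$; for sufficiently large $t$ (namely $t > \nu^2/\alpha$) this reads $2\exp\left(-t/(2\alpha)\right)$
- MGF condition: $\mathbb{E}[e^{\lambda X}] \le \exp\left(\nu^2\lambda^2/2\right)$ for all $|\lambda| < 1/\alpha$
- Moment condition: $\left(\mathbb{E}|X|^p\right)^{1/p} \le Kp$ for all $p \ge 1$
The sub-exponential norm is $\lVert X\rVert_{\psi_1} = \inf\{t > 0 : \mathbb{E}[\exp(|X|/t)] \le 2\}$.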
Tail Behavior: The Core Difference
The fundamental distinction is in the tail decay rate:
| Property | Sub-Gaussian | Sub-Exponential |
|---|---|---|
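| Tail decay | $\Pr(\lvert X\rvert \ge t) \le 2e^{-ct^2}$ for all $t$ | $\Pr(\lvert X\rvert \ge t) \le 2e^{-ct}$ for large $t$ |
| MGF finite | For all $\lambda \in \mathbb{R}$ | Only for $\lvert\lambda\rvert < 1/\alpha$ |
| Moments | $(\mathbb{E}\lvert X\rvert^p)^{1/p} = O(\sqrt{p})$ | $(\mathbb{E}\lvert X\rvert^p)^{1/p} = O(p)$ |
| Orlicz norm | $\lVert X\rVert_{\psi_2}$ finite | $\lVert X\rVert_{\psi_1}$ finite |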
The difference is sharpest in the tails. For small $t$, both classes can behave similarly (Gaussian-like). For large $t$, sub-Gaussian tails decay quadratically in the exponent ($e^{-ct^2}$), while sub-exponential tails may decay only linearly ($e^{-ct}$). As a result, large values of a sub-exponential variable can be far more likely than a Gaussian with the same variance would predict.
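To see the tail gap numerically, the sketch below compares a standard Gaussian with a Laplace variable of the same (unit) variance. The Laplace distribution is an assumed example, not one used elsewhere in this section: it is sub-exponential but not sub-Gaussian, with $\Pr(|X| > t) = e^{-t/b}$ for scale $b$, and variance $2b^2$.

```python
import math

def gaussian_tail(t):
    # P(|Z| > t) for Z ~ N(0, 1)
    return math.erfc(t / math.sqrt(2.0))

def laplace_tail(t, b=1.0 / math.sqrt(2.0)):
    # P(|X| > t) for X ~ Laplace(0, b); b = 1/sqrt(2) gives variance 2 b^2 = 1
    return math.exp(-t / b)

for t in [0.5, 1.0, 2.0, 3.0, 5.0]:
    g, l = gaussian_tail(t), laplace_tail(t)
    print(f"t={t:3.1f}  Gaussian={g:.3e}  Laplace={l:.3e}  ratio={l / g:.3g}")
```

For $t \le 1$ the Laplace tail is actually the smaller of the two, but by $t = 5$ the ratio exceeds $10^3$: the linear exponent eventually dominates.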
Concentration for Sums
The tail classes directly determine how sums concentrate:
Hoeffding-type Bound for Sub-Gaussian Sums
Statement
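If $X_1, \dots, X_n$ are independent, centered, sub-Gaussian with parameters $\sigma_1, \dots, \sigma_n$, then for all $t \ge 0$:
$$\Pr\left(\left|\sum_{i=1}^n X_i\right| \ge t\right) \le 2\exp\left(-\frac{t^2}{2\sum_{i=1}^n \sigma_i^2}\right)$$
The sum is itself sub-Gaussian with parameter $\sqrt{\sum_{i=1}^n \sigma_i^2}$, so it enjoys Gaussian-quality concentration at every scale of $t$.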
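The bound can be checked against an exact computation. For a sum $S_n$ of $n$ independent Rademacher signs (each sub-Gaussian with $\sigma = 1$, since $\cosh\lambda \le e^{\lambda^2/2}$), $S_n = 2B - n$ with $B \sim \mathrm{Binomial}(n, 1/2)$, so $\Pr(|S_n| \ge t)$ can be summed exactly. The Rademacher example and the helper names are illustrative assumptions.

```python
import math

def rademacher_sum_tail(n, t):
    """Exact P(|S_n| >= t) for S_n = sum of n independent +/-1 signs.

    Uses S_n = 2B - n with B ~ Binomial(n, 1/2).
    """
    count = sum(math.comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= t)
    return count / 2**n

def hoeffding_bound(t, sigmas):
    # 2 exp(-t^2 / (2 sum sigma_i^2)) for independent, centered sub-Gaussian summands
    return 2.0 * math.exp(-t**2 / (2.0 * sum(s**2 for s in sigmas)))

n = 100
for t in [10, 20, 30, 40]:
    exact = rademacher_sum_tail(n, t)
    bound = hoeffding_bound(t, [1.0] * n)  # each sign is sub-Gaussian with sigma = 1
    print(f"t={t}: exact={exact:.3e}  Hoeffding={bound:.3e}")
```

The exact tail stays below the bound at every $t$, and both decay with the same $t^2/(2n)$ exponent.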
Bernstein-type Bound for Sub-Exponential Sums
Statement
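If $X_1, \dots, X_n$ are independent, centered, sub-exponential with parameters $(\nu_i, \alpha_i)$, then, writing $\nu_*^2 = \sum_{i=1}^n \nu_i^2$ and $\alpha_* = \max_i \alpha_i$, for all $t \ge 0$:
$$\Pr\left(\left|\sum_{i=1}^n X_i\right| \ge t\right) \le 2\exp\left(-\frac{1}{2}\min\left(\frac{t^2}{\nu_*^2}, \frac{t}{\alpha_*}\right)\right)$$
Two regimes: sub-Gaussian ($e^{-t^2/(2\nu_*^2)}$) for small $t$, sub-exponential ($e^{-t/(2\alpha_*)}$) for large $t$. The transition occurs at $t = \nu_*^2/\alpha_*$.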
This is exactly the Bernstein phenomenon: sums of sub-exponential variables have Gaussian concentration near the mean and exponential concentration in the tails.
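A sketch comparing the one-sided form of this bound, $\Pr(\sum_i X_i \ge t) \le \exp(-\tfrac{1}{2}\min(t^2/\nu_*^2, t/\alpha_*))$ with $\nu_*^2 = \sum_i \nu_i^2$ and $\alpha_* = \max_i \alpha_i$, against an exact chi-squared tail. It relies on the standard fact that for $Z \sim N(0, 1)$, $\mathbb{E}[e^{\lambda(Z^2 - 1)}] \le e^{4\lambda^2/2}$ for $|\lambda| < 1/4$, i.e. $Z^2 - 1$ is sub-exponential with $(\nu, \alpha) = (2, 4)$. For $Q - k$ with $Q \sim \chi^2_k$ this gives $\nu_*^2 = 4k$, $\alpha_* = 4$, and a transition at $t = k$. The exact tail uses the closed form $\Pr(\chi^2_k > x) = e^{-x/2}\sum_{j < k/2}(x/2)^j/j!$, valid only for even $k$; the helper names are illustrative.

```python
import math

def chi2_sf_even(x, k):
    """Exact P(Q > x) for Q ~ chi-squared with an even number k of degrees of freedom."""
    assert k % 2 == 0 and k > 0
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(k // 2):
        if j > 0:
            term *= half / j  # term is now (x/2)^j / j!
        total += term
    return math.exp(-half) * total

def bernstein_upper_tail(t, nu_sq_sum, alpha_star):
    """One-sided Bernstein-type bound exp(-0.5 * min(t^2 / nu_sq_sum, t / alpha_star))."""
    return math.exp(-0.5 * min(t**2 / nu_sq_sum, t / alpha_star))

# Q - k is a sum of k terms Z_i^2 - 1, each sub-exponential with (nu, alpha) = (2, 4).
k = 10
nu_sq_sum, alpha_star = 4.0 * k, 4.0
transition = nu_sq_sum / alpha_star  # = k: Gaussian regime below, exponential regime above
for t in [2.0, 5.0, 10.0, 20.0, 40.0]:
    exact = chi2_sf_even(k + t, k)
    bound = bernstein_upper_tail(t, nu_sq_sum, alpha_star)
    regime = "Gaussian" if t <= transition else "exponential"
    print(f"t={t:4.1f} [{regime:11s}] exact={exact:.3e}  bound={bound:.3e}")
```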
The Fundamental Relationship: Products and Squares
Products of Sub-Gaussians Are Sub-Exponential
Statement
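If $X$ and $Y$ are sub-Gaussian, then $XY$ is sub-exponential. Specifically:
$$\lVert XY\rVert_{\psi_1} \le \lVert X\rVert_{\psi_2}\,\lVert Y\rVert_{\psi_2}$$
In particular, if $X$ is sub-Gaussian, then $X^2$ is sub-exponential, with $\lVert X^2\rVert_{\psi_1} = \lVert X\rVert_{\psi_2}^2$.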
Intuition
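A sub-Gaussian variable with parameter $\sigma$ has $\Pr(|X| \ge t) \le 2e^{-t^2/(2\sigma^2)}$. Squaring it gives $\Pr(X^2 \ge s) = \Pr(|X| \ge \sqrt{s}) \le 2e^{-s/(2\sigma^2)}$, an exponential tail in $s$; correspondingly, $\mathbb{E}[e^{\lambda X^2}]$ is guaranteed finite only for sufficiently small $\lambda$ (for $X \sim N(0, 1)$, exactly when $\lambda < 1/2$). This is the sub-exponential condition, not the sub-Gaussian one. Squaring "promotes" the tail from $e^{-ct^2}$ in $t$ to $e^{-cs}$ in $s = t^2$: the quadratic exponent becomes linear.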
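A quick numerical check of the squaring argument for $X \sim N(0, 1)$ (so $\sigma = 1$): the exact tail $\Pr(X^2 \ge s) = \operatorname{erfc}(\sqrt{s/2})$ stays below $2e^{-s/2}$, and $-\log\Pr(X^2 \ge s)/s$ settles near $1/2$, a linear exponent in $s$. The choice of $s$ values is arbitrary.

```python
import math

def sq_gaussian_tail(s):
    # P(Z^2 >= s) = P(|Z| >= sqrt(s)) for Z ~ N(0, 1)
    return math.erfc(math.sqrt(s / 2.0))

for s in [2.0, 10.0, 50.0, 200.0]:
    p = sq_gaussian_tail(s)
    bound = 2.0 * math.exp(-s / 2.0)  # sub-Gaussian tail bound (sigma = 1) at t = sqrt(s)
    # -log(p) / s approaches 1/2: the exponent is linear in s, i.e. an exponential tail
    print(f"s={s:5.1f}  P={p:.3e}  2exp(-s/2)={bound:.3e}  -log(P)/s={-math.log(p) / s:.3f}")
```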
Canonical Examples
Chi-squared: sub-exponential but not sub-Gaussian
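Let $Z_1, \dots, Z_k$ be i.i.d. $N(0, 1)$. Each $Z_i$ is sub-Gaussian. The chi-squared statistic $Q = \sum_{i=1}^k Z_i^2$ is a sum of the $Z_i^2$, each of which is sub-exponential (as the square of a sub-Gaussian). So $Q$ is sub-exponential, and so is its centered version $Q - k$.
But $Q$ is not sub-Gaussian. Its tail decays like $e^{-t/2}$ (up to a polynomial factor in $t$) for large $t$, not like $e^{-ct^2}$. The MGF $\mathbb{E}[e^{\lambda Q}] = (1 - 2\lambda)^{-k/2}$ is finite only for $\lambda < 1/2$, not for all $\lambda$.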
This is the prototypical example: chi-squared statistics arise everywhere in statistics and ML (e.g., quadratic forms, variance estimates, kernel evaluations), and they require sub-exponential theory.
Bounded variables: sub-Gaussian
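If $a \le X \le b$ almost surely, then $X - \mathbb{E}X$ is sub-Gaussian with parameter $\sigma = (b - a)/2$; that is, $\mathbb{E}[e^{\lambda(X - \mathbb{E}X)}] \le e^{\lambda^2(b - a)^2/8}$ for all $\lambda$. This is Hoeffding's lemma. Bounded variables are the best-behaved class: they are sub-Gaussian, and Hoeffding's inequality is a special case of sub-Gaussian concentration.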
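Hoeffding's lemma can be spot-checked numerically for a Bernoulli($p$) variable, which lies in $[0, 1]$, so the lemma predicts $\log\mathbb{E}[e^{\lambda(X - p)}] \le \lambda^2/8$. The grid of $p$ and $\lambda$ values is an arbitrary choice, and a grid check illustrates the inequality rather than proving it.

```python
import math

def centered_bernoulli_log_mgf(lam, p):
    """log E[exp(lam * (X - p))] for X ~ Bernoulli(p), a variable bounded in [0, 1]."""
    return math.log((1.0 - p) * math.exp(-lam * p) + p * math.exp(lam * (1.0 - p)))

# Hoeffding's lemma with [a, b] = [0, 1]: log MGF <= lam^2 (b - a)^2 / 8 = lam^2 / 8
worst_gap = max(
    centered_bernoulli_log_mgf(lam, p) - lam**2 / 8.0
    for p in [i / 20 for i in range(1, 20)]
    for lam in [j / 4 for j in range(-40, 41)]
)
# At most 0, up to floating-point rounding (the gap is exactly 0 at lam = 0)
print(f"largest log-MGF minus lam^2/8 on the grid: {worst_gap:.3e}")
```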
Exponential distribution: sub-exponential but not sub-Gaussian
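If $X \sim \mathrm{Exp}(\beta)$ with rate $\beta > 0$, then $X - 1/\beta$ is sub-exponential with parameters of order $1/\beta$ (for instance, $\lVert X\rVert_{\psi_1} = 2/\beta$). Its tail is $\Pr(X \ge t) = e^{-\beta t}$, whose exponent is linear in $t$, not quadratic. The MGF $\mathbb{E}[e^{\lambda X}] = \beta/(\beta - \lambda)$ exists only for $\lambda < \beta$.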
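For $X \sim \mathrm{Exp}(\beta)$ with rate $\beta$, so that $\mathbb{E}[e^{\lambda X}] = \beta/(\beta - \lambda)$ for $\lambda < \beta$, solving $\beta/(\beta - 1/t) \le 2$ gives $\lVert X\rVert_{\psi_1} = 2/\beta$ for the uncentered variable. The sketch below recovers this by bisection; the function names and the bracket are illustrative.

```python
import math

def exp_mgf(lam, beta):
    # E[exp(lam * X)] for X ~ Exp(rate=beta): beta / (beta - lam), finite only for lam < beta
    return beta / (beta - lam) if lam < beta else math.inf

def psi1_norm_exponential(beta, lo=1e-6, hi=1e6, iters=200):
    """Bisection for inf{t > 0 : E[exp(|X| / t)] <= 2} with X ~ Exp(rate=beta), so |X| = X."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if exp_mgf(1.0 / mid, beta) <= 2.0:
            hi = mid  # feasible
        else:
            lo = mid  # infeasible (including the region where the MGF diverges)
    return hi

for beta in [0.5, 1.0, 4.0]:
    print(beta, psi1_norm_exponential(beta), 2.0 / beta)  # the last two columns should agree
```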
The Bernstein Condition
The Bernstein condition provides a clean characterization of when a variable is sub-exponential via its moments:
Bernstein Condition
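A centered random variable $X$ with variance $\sigma^2 = \mathbb{E}[X^2]$ satisfies the Bernstein condition with parameter $b > 0$ if for all integers $k \ge 2$:
$$\left|\mathbb{E}[X^k]\right| \le \tfrac{1}{2}\,k!\,\sigma^2\,b^{k-2}$$
Any variable satisfying it is sub-exponential with parameters $(\sqrt{2}\,\sigma, 2b)$, and conversely every sub-exponential variable satisfies it for some finite $b$, so the two notions describe the same class. The condition says that higher moments grow at most factorially (like those of an exponential distribution), not faster.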
The contrast: sub-Gaussian variables have $\mathbb{E}|X|^p \le (C\sigma\sqrt{p})^p$, i.e. growth like $p^{p/2}$, which is slower than factorial. Sub-exponential variables allow $\mathbb{E}|X|^p \le (Kp)^p$, which is factorial growth up to a factor $C^p$: faster, but still controlled.
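For a fixed $k \ge 3$, the smallest $b$ that satisfies the Bernstein inequality is $b_k = \left(2|\mathbb{E}X^k| / (k!\,\sigma^2)\right)^{1/(k-2)}$, and a single $b$ works for all $k$ exactly when $\sup_k b_k < \infty$. The sketch below computes $b_k$ exactly for two assumed examples: a centered $\mathrm{Exp}(1)$ variable (moments $\mathbb{E}X^j = j!$), where $b_k$ stays below $1$, and a centered log-normal $e^G$ with $G \sim N(0, 1)$ (moments $\mathbb{E}X^j = e^{j^2/2}$), where $b_k$ grows without bound because the moments grow faster than factorially.

```python
import math

def centered_exp_moment(k):
    """E[(X - 1)^k] for X ~ Exp(1), using E[X^j] = j! (exact integer arithmetic)."""
    return sum(math.comb(k, j) * math.factorial(j) * (-1) ** (k - j) for j in range(k + 1))

def centered_lognormal_moment(k):
    """E[(X - mu)^k] for X = exp(G), G ~ N(0, 1), using E[X^j] = exp(j^2 / 2)."""
    mu = math.exp(0.5)
    return sum(math.comb(k, j) * math.exp(j * j / 2.0) * (-mu) ** (k - j) for j in range(k + 1))

def smallest_b(moment_k, k, sigma_sq):
    """Smallest b with |E X^k| <= 0.5 * k! * sigma^2 * b^(k-2), for one fixed k >= 3."""
    return (2.0 * abs(moment_k) / (math.factorial(k) * sigma_sq)) ** (1.0 / (k - 2))

exp_var = centered_exp_moment(2)          # variance of Exp(1): 1
logn_var = centered_lognormal_moment(2)   # variance of the log-normal: e^2 - e
for k in [3, 5, 10, 15, 20]:
    b_exp = smallest_b(centered_exp_moment(k), k, exp_var)
    b_logn = smallest_b(centered_lognormal_moment(k), k, logn_var)
    print(f"k={k:2d}  Exp(1): b_k={b_exp:.3f}   log-normal: b_k={b_logn:.3e}")
```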
Where Each Fails
Sub-Gaussian fails for heavy-tailed data
Many real-world distributions have heavier tails than sub-Gaussian ones: log-normal returns in finance, power-law degree distributions in networks, and noise in robust statistics. For these, even sub-exponential can be too restrictive (a log-normal variable has every moment finite, yet $\mathbb{E}[e^{\lambda X}] = \infty$ for every $\lambda > 0$), and you may need polynomial tail bounds (finite moments only) or heavy-tailed concentration tools like the median-of-means estimator.
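The median-of-means estimator can be sketched in a few lines: split the sample into blocks, average each block, and take the median of the block averages, so that a single extreme value can corrupt at most one block. This sketch assumes the samples are i.i.d. and already in random order, uses contiguous blocks, and leaves the number of blocks to the caller (it is typically taken proportional to $\log(1/\delta)$ for a $1 - \delta$ confidence level). The function name is illustrative.

```python
import statistics

def median_of_means(xs, num_blocks):
    """Median-of-means estimate of the mean of xs.

    Splits xs into num_blocks contiguous, non-empty blocks of nearly equal size,
    averages each block, and returns the median of the block averages.
    """
    n = len(xs)
    if not 1 <= num_blocks <= n:
        raise ValueError("num_blocks must be between 1 and len(xs)")
    # Integer block boundaries; consecutive boundaries differ by at least 1 when num_blocks <= n
    bounds = [i * n // num_blocks for i in range(num_blocks + 1)]
    block_means = [statistics.fmean(xs[bounds[i]:bounds[i + 1]]) for i in range(num_blocks)]
    return statistics.median(block_means)
```

For example, on 99 copies of `1.0` plus one value `1e9`, the plain mean is about $10^7$, while `median_of_means(data, 10)` returns `1.0`.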
Sub-Exponential fails for the heaviest tails
If $\mathbb{E}[e^{\lambda|X|}] = \infty$ for all $\lambda > 0$, the variable is not sub-exponential. Pareto distributions, Cauchy distributions, and other power-law-tailed variables fall outside both classes. For these you need truncation or robust estimation techniques.
What to Memorize
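| | Sub-Gaussian | Sub-Exponential |
|---|---|---|
| Tail | $e^{-ct^2}$ | $e^{-ct}$ (large $t$) |
| MGF domain | All $\lambda$ | $\lvert\lambda\rvert < 1/\alpha$ |
| Norm | $\lVert X\rVert_{\psi_2}$ | $\lVert X\rVert_{\psi_1}$ |
| Moment growth | $(\mathbb{E}\lvert X\rvert^p)^{1/p} \lesssim \sqrt{p}$ | $(\mathbb{E}\lvert X\rvert^p)^{1/p} \lesssim p$ |
| Inclusion | Sub-Gaussian $\subset$ Sub-Exponential | (strictly larger) |
| Closed under | Addition, linear combinations | Addition, linear combinations |
| Products | $XY$ is sub-exponential | No clean closure |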
Key facts to internalize:
- Every sub-Gaussian variable is sub-exponential (but not vice versa)
- sub-Gaussian $\times$ sub-Gaussian $=$ sub-exponential (in particular, squares of sub-Gaussians are sub-exponential)
- $\chi^2$ is the canonical sub-exponential-but-not-sub-Gaussian example
- Bernstein's inequality is the concentration result for sub-exponential sums
When a Researcher Would Use Each
Bounding the sample mean of bounded losses
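A loss taking values in a bounded interval $[a, b]$ is sub-Gaussian after centering (by Hoeffding's lemma, with parameter $(b - a)/2$). Use sub-Gaussian concentration (Hoeffding). This is the standard setting in learning theory and gives the cleanest bounds.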
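Inverting the two-sided Hoeffding bound $2\exp(-2n\varepsilon^2/(b - a)^2) \le \delta$ gives the number of i.i.d. losses in $[a, b]$ needed for the empirical mean to be within $\varepsilon$ of the true mean with probability at least $1 - \delta$. A minimal sketch; the function name and the default range $[0, 1]$ are illustrative.

```python
import math

def hoeffding_sample_size(eps, delta, a=0.0, b=1.0):
    """Smallest n with 2 exp(-2 n eps^2 / (b - a)^2) <= delta.

    Assumes i.i.d. losses in [a, b]; then P(|empirical mean - true mean| >= eps) <= delta.
    """
    return math.ceil((b - a) ** 2 * math.log(2.0 / delta) / (2.0 * eps**2))

print(hoeffding_sample_size(0.05, 0.05))
```

With $\varepsilon = \delta = 0.05$ and losses in $[0, 1]$ this gives $n = 738$.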
Bounding quadratic forms or variance estimates
The sample variance involves squared terms $(X_i - \bar{X})^2$. Each $X_i^2$ is sub-exponential when $X_i$ is sub-Gaussian. Use sub-exponential concentration (Bernstein-type bounds). This arises in covariance estimation, kernel methods, and random matrix theory.
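The same inversion works for a Bernstein-type bound. As an illustrative assumption, take $X_1, \dots, X_n$ i.i.d. $N(0, 1)$ with known mean $0$, so the variance estimate is $\frac{1}{n}\sum_i X_i^2$ and each $X_i^2 - 1$ is sub-exponential with $(\nu, \alpha) = (2, 4)$, meaning $\mathbb{E}[e^{\lambda(X_i^2 - 1)}] \le e^{\nu^2\lambda^2/2}$ for $|\lambda| < 1/\alpha$. Applying the two-regime bound to the sum with $t = n\varepsilon$ gives $\Pr(|\frac{1}{n}\sum_i X_i^2 - 1| \ge \varepsilon) \le 2\exp(-n\min(\varepsilon^2/(2\nu^2), \varepsilon/(2\alpha)))$. The function name is hypothetical.

```python
import math

def second_moment_sample_size(eps, delta, nu=2.0, alpha=4.0):
    """Smallest n with 2 exp(-n * min(eps^2 / (2 nu^2), eps / (2 alpha))) <= delta.

    Assumes X_i i.i.d. N(0, 1) with known mean 0, so each X_i^2 - 1 is sub-exponential
    with (nu, alpha) = (2, 4); the bound controls P(|(1/n) sum X_i^2 - 1| >= eps).
    """
    rate = min(eps**2 / (2.0 * nu**2), eps / (2.0 * alpha))  # Gaussian vs exponential regime
    return math.ceil(math.log(2.0 / delta) / rate)

print(second_moment_sample_size(0.1, 0.05))
```

For $\varepsilon = 0.1$ and $\delta = 0.05$ this gives $n = 2952$, in the Gaussian regime ($\varepsilon^2/8 < \varepsilon/8$); for $\varepsilon > 1$ the exponential term takes over.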
Analyzing inner products of random vectors
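If $u, v \in \mathbb{R}^n$ have independent sub-Gaussian entries (independent across coordinates and between the two vectors), their inner product $\langle u, v\rangle = \sum_{i=1}^n u_i v_i$ is a sum of independent products of sub-Gaussians, hence a sum of independent sub-exponential variables. Use sub-exponential concentration for the sum. This is central to compressed sensing and random projection arguments.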
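A seeded Monte Carlo sketch, assuming $u$ and $v$ have i.i.d. $N(0, 1)$ entries and are independent of each other. Each product $u_i v_i$ then has MGF $(1 - \lambda^2)^{-1/2}$, which is at most $e^{\lambda^2}$ for $|\lambda| < 1/2$, so it is sub-exponential with $(\nu^2, \alpha) = (2, 2)$ in the sense $\mathbb{E}[e^{\lambda X}] \le e^{\nu^2\lambda^2/2}$ for $|\lambda| < 1/\alpha$. The Bernstein-type bound for the sum is then $2\exp(-\tfrac{1}{2}\min(t^2/(2n), t/2))$. The helper names, dimension, and trial counts are illustrative.

```python
import math
import random

def bernstein_inner_product_bound(t, n, nu_sq=2.0, alpha=2.0):
    """2 exp(-0.5 * min(t^2 / (n nu^2), t / alpha)) for <u, v> with n sub-exponential summands."""
    return 2.0 * math.exp(-0.5 * min(t**2 / (n * nu_sq), t / alpha))

def empirical_inner_product_tail(n, t, trials, seed=0):
    """Fraction of trials with |<u, v>| >= t, for fresh standard Gaussian vectors u, v in R^n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        dot = sum(rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0) for _ in range(n))
        if abs(dot) >= t:
            hits += 1
    return hits / trials

n = 50
for t in [10.0, 20.0, 30.0]:
    print(t, empirical_inner_product_tail(n, t, trials=2000), bernstein_inner_product_bound(t, n))
```

Here $\langle u, v\rangle$ has variance $n$, and the transition between regimes sits at $t = n\nu^2/\alpha = n$, so these values of $t$ all fall in the Gaussian regime.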
Common Confusions
Sub-exponential does not mean exponential distribution
The term "sub-exponential" means "tails no heavier than exponential." An exponential random variable is sub-exponential, but so are many others (chi-squared, squared Gaussians, products of Gaussians). The name describes a tail class, not a specific distribution.
The two-regime behavior of Bernstein is not a weakness
Bernstein's bound has a sub-Gaussian regime (small $t$) and a sub-exponential regime (large $t$). This is not a flaw: it accurately reflects how sub-exponential sums behave. Up to constants in the exponent, neither regime can be improved in general; chi-squared sums exhibit both. The transition point, where the two terms inside the minimum are equal, is where the tail character changes from Gaussian to exponential.