Concentration Probability
Chernoff Bounds
The Chernoff method: the universal technique for deriving exponential tail bounds by optimizing over the moment generating function, yielding the tightest exponential concentration obtainable from the MGF.
Prerequisites
Why This Matters
The Chernoff method is not a single inequality. It is a technique, and it is the single most important proof technique in concentration inequalities. Every exponential tail bound you have seen (Hoeffding, Bernstein, sub-Gaussian bounds, Bennett) is derived by the same recipe:
- Exponentiate: convert $P(X \ge t)$ into $P(e^{\lambda X} \ge e^{\lambda t})$ for $\lambda > 0$
- Apply Markov: $P(e^{\lambda X} \ge e^{\lambda t}) \le e^{-\lambda t}\,E[e^{\lambda X}]$
- Optimize: choose $\lambda$ to minimize the bound
This three-step recipe, the Chernoff method, produces the tightest exponential bound achievable from the moment generating function. If you master this one idea, you can derive Hoeffding, Bernstein, and sub-Gaussian bounds from scratch.
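The recipe translates directly into a few lines of code. The sketch below assumes the log-MGF is available as a Python callable; `chernoff_bound` and its crude grid search are illustrative helpers, not a library API:

```python
import math

def chernoff_bound(log_mgf, t, lam_max=50.0, steps=10_000):
    """Generic Chernoff bound: inf over lam > 0 of exp(log_mgf(lam) - lam * t).

    Grid search over lam in (0, lam_max]; enough to illustrate the recipe.
    """
    best = 1.0  # the trivial bound P(X >= t) <= 1
    for i in range(1, steps + 1):
        lam = lam_max * i / steps
        expo = log_mgf(lam) - lam * t   # log of the Markov bound at this lam
        if expo < 0:                    # only exponents < 0 beat the trivial bound
            best = min(best, math.exp(expo))
    return best

# Standard Gaussian: log-MGF is lam^2 / 2, so the optimized bound is exp(-t^2 / 2).
print(chernoff_bound(lambda lam: lam * lam / 2, 3.0))  # ~ exp(-4.5) ~ 0.0111
```

The grid search stands in for the calculus step; in the named inequalities below, the optimal $\lambda$ is found in closed form.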
Mental Model
Think of the Chernoff method as applying Markov's inequality to the "best possible" monotone function of $X$. Markov says $P(X \ge t) = P(\varphi(X) \ge \varphi(t)) \le E[\varphi(X)]/\varphi(t)$ for any non-negative increasing $\varphi$. The exponential $\varphi(x) = e^{\lambda x}$ is the optimal choice because it grows the fastest (among functions whose expectation is finite), penalizing large values of $X$ most aggressively. The free parameter $\lambda > 0$ lets you tune the exponential to the specific threshold $t$.
Why is this tighter than Chebyshev? Chebyshev uses $\varphi(x) = (x - \mu)^2$, which gives polynomial tails ($\sim 1/t^2$). The exponential gives exponential tails ($e^{-ct}$ or $e^{-ct^2}$), which are dramatically better for large $t$.
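For a standard Gaussian the gap is easy to see numerically: Chebyshev with unit variance gives $1/t^2$, while the Chernoff bound gives $e^{-t^2/2}$:

```python
import math

# Standard Gaussian tail bounds: Chebyshev (1/t^2) vs Chernoff (exp(-t^2/2)).
for t in (2.0, 4.0, 6.0):
    chebyshev = 1 / t**2
    chernoff = math.exp(-t * t / 2)
    print(f"t={t}: Chebyshev {chebyshev:.2e}, Chernoff {chernoff:.2e}")
```

At $t = 6$ the gap is roughly seven orders of magnitude.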
Formal Setup and Notation
Let $X$ be a real-valued random variable with moment generating function (MGF) $M_X(\lambda) = E[e^{\lambda X}]$.
Moment Generating Function
The moment generating function of $X$ is $M_X(\lambda) = E[e^{\lambda X}]$ for those $\lambda$ where the expectation is finite. The MGF "generates" moments: $M_X^{(k)}(0) = E[X^k]$. The existence of the MGF in a neighborhood of zero is equivalent to the distribution having exponentially decaying tails.
Log-Moment Generating Function (Cumulant Generating Function)
The log-MGF or cumulant generating function is $\psi_X(\lambda) = \log M_X(\lambda)$. It is convex in $\lambda$. The Chernoff bound becomes: $P(X \ge t) \le \exp(-\psi_X^*(t))$. The Legendre transform $\psi_X^*(t) = \sup_{\lambda > 0}(\lambda t - \psi_X(\lambda))$ is the rate function of large deviations theory.
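As a sanity check, for a Bernoulli($p$) variable the rate function $\psi_X^*(t)$ evaluates to the KL divergence $t\log(t/p) + (1-t)\log\frac{1-t}{1-p}$, foreshadowing the information-theoretic connection at the end of this page. A grid-search sketch (the helper `rate_function` is illustrative):

```python
import math

def rate_function(log_mgf, t, lam_max=20.0, steps=20_000):
    # psi*(t) = sup over lam > 0 of (lam * t - log_mgf(lam)), by grid search
    return max(lam_max * i / steps * t - log_mgf(lam_max * i / steps)
               for i in range(1, steps + 1))

p, t = 0.3, 0.5
bern_log_mgf = lambda lam: math.log(1 - p + p * math.exp(lam))
kl = t * math.log(t / p) + (1 - t) * math.log((1 - t) / (1 - p))
print(rate_function(bern_log_mgf, t), kl)  # both ~ 0.087
```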
Core Method
The Chernoff Method (General Upper Tail)
Statement
For any random variable $X$ whose MGF exists for some $\lambda > 0$, and any $t$:
$$P(X \ge t) \le \inf_{\lambda > 0} e^{-\lambda t}\, E[e^{\lambda X}] = \exp(-\psi_X^*(t))$$
where $\psi_X^*(t) = \sup_{\lambda > 0}(\lambda t - \psi_X(\lambda))$ is the Fenchel-Legendre dual of the log-MGF $\psi_X(\lambda) = \log E[e^{\lambda X}]$.
Intuition
The Chernoff method converts a tail probability into an optimization problem. For each $\lambda > 0$, the bound $P(X \ge t) \le e^{-\lambda t} M_X(\lambda)$ is valid. Different values of $\lambda$ give different bounds, and you pick the best one. The optimal $\lambda^*$ satisfies $\psi_X'(\lambda^*) = t$: the tilted distribution has mean exactly $t$.
Geometrically: $\psi_X^*(t)$ is the Legendre transform of $\psi_X$, and it measures how "surprising" the event $\{X \ge t\}$ is relative to the distribution of $X$. Larger $\psi_X^*(t)$ means more surprising, hence less probable.
Proof Sketch
For any $\lambda > 0$:
$$P(X \ge t) = P(e^{\lambda X} \ge e^{\lambda t}) \le e^{-\lambda t}\, E[e^{\lambda X}]$$
The first equality is because $x \mapsto e^{\lambda x}$ is strictly increasing. The inequality is Markov applied to the non-negative random variable $e^{\lambda X}$. Since this holds for all $\lambda > 0$, take the infimum.
Why It Matters
The Chernoff method is a meta-theorem: it transforms the problem of bounding tails into the problem of bounding the MGF, which is often easier.
Different MGF bounds lead to different named inequalities:
- If $E[e^{\lambda X}] \le e^{\sigma^2\lambda^2/2}$ for all $\lambda$: sub-Gaussian bound (Hoeffding-type)
- If $E[e^{\lambda X}] \le \exp\left(\frac{\lambda^2 v/2}{1 - b\lambda}\right)$ for $0 < \lambda < 1/b$: Bernstein-type bound
- If $E[e^{\lambda X_i}] \le \exp(p_i(e^{\lambda} - 1))$ for Bernoulli summands: multiplicative Chernoff for Binomials
- If $E[e^{\lambda X}] < \infty$ only for $|\lambda| \le 1/b$ (sub-exponential): exponential tail bound
Every exponential concentration inequality is a Chernoff bound with a specific MGF estimate plugged in.
Failure Mode
The Chernoff method requires the MGF to exist in a neighborhood of $0$. For heavy-tailed distributions (Pareto, Cauchy), the MGF is infinite for all $\lambda > 0$, and the Chernoff method gives nothing. For these, you need moment-based methods (Chebyshev for $1/t^2$, or higher-moment bounds for $1/t^k$).
Multiplicative Chernoff Bounds
The most important special case: sums of independent Bernoulli (or more generally, $[0,1]$-valued) random variables. The "multiplicative" form expresses deviations as a fraction of the mean, which gives tighter bounds than additive Hoeffding when the mean is small.
Multiplicative Chernoff Bound
Statement
Let $X_1, \ldots, X_n$ be independent random variables with $X_i \in \{0,1\}$ and $P(X_i = 1) = p_i$. Let $X = \sum_{i=1}^n X_i$ and $\mu = E[X] = \sum_i p_i$. Then:
Upper tail: For any $\delta > 0$: $P(X \ge (1+\delta)\mu) \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}$
Simplified upper tail: For $0 < \delta \le 1$: $P(X \ge (1+\delta)\mu) \le e^{-\delta^2\mu/3}$
Lower tail: For $0 < \delta < 1$: $P(X \le (1-\delta)\mu) \le e^{-\delta^2\mu/2}$
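A quick numerical check that the exact-form upper-tail bound really dominates the true binomial tail; the parameters $n = 1000$, $p = 0.01$, $\delta = 1$ are illustrative:

```python
import math

n, p, delta = 1000, 0.01, 1.0
mu = n * p                               # expected value: 10
k = math.ceil((1 + delta) * mu)          # threshold: 20
# Exact tail P(X >= k) for X ~ Binomial(n, p)
exact = sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))
# Exact-form multiplicative Chernoff bound
bound = (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
print(f"exact tail {exact:.4f} <= Chernoff bound {bound:.4f}")
```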
Intuition
The multiplicative form measures deviations relative to the mean. If $\mu = 10$, a deviation of $5$ is a 50% overshoot ($\delta = 0.5$). The bound depends on $\delta^2\mu$: the product of the mean and the squared relative deviation.
Why is this better than Hoeffding for small means? Hoeffding gives $P(X \ge \mu + s) \le e^{-2s^2/n}$. With $s = \delta\mu$ and small $\mu$: Hoeffding gives $e^{-2\delta^2\mu^2/n}$ while multiplicative Chernoff gives $e^{-\delta^2\mu/3}$. When $\mu \ll n$ (rare events), Chernoff is dramatically tighter.
Proof Sketch
Step 1: Apply the Chernoff method to $X$ with parameter $\lambda > 0$: $P(X \ge (1+\delta)\mu) \le e^{-\lambda(1+\delta)\mu}\, E[e^{\lambda X}]$.
Step 2: Bound the MGF of each $X_i$: $E[e^{\lambda X_i}] = 1 + p_i(e^{\lambda} - 1) \le \exp(p_i(e^{\lambda} - 1))$, using $1 + x \le e^x$.
Step 3: Multiply: $E[e^{\lambda X}] = \prod_i E[e^{\lambda X_i}] \le \exp(\mu(e^{\lambda} - 1))$.
Step 4: Total bound: $P(X \ge (1+\delta)\mu) \le \exp(\mu(e^{\lambda} - 1) - \lambda(1+\delta)\mu)$.
Step 5: Optimize: set $\lambda = \ln(1+\delta)$ to get $\left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}$. The simplified form follows from the inequality $(1+\delta)\ln(1+\delta) - \delta \ge \delta^2/3$ for $0 < \delta \le 1$.
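You can confirm Step 5 numerically: over a grid of $\lambda$ values, none beats $\lambda = \ln(1+\delta)$, and the bound there equals the exact form. The values $\mu = 10$, $\delta = 0.5$ are arbitrary:

```python
import math

mu, delta = 10.0, 0.5
# The Step 4 bound as a function of lambda
bound = lambda lam: math.exp(mu * (math.exp(lam) - 1) - lam * (1 + delta) * mu)
lam_star = math.log(1 + delta)          # the claimed optimizer
grid_min = min(bound(0.001 * i) for i in range(1, 2001))
print(bound(lam_star), grid_min)        # bound(lam_star) <= grid_min
```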
Why It Matters
The multiplicative Chernoff bound is the standard tool for:
- Randomized algorithms: bounding the probability that a random hash function has too many collisions
- Sampling: determining how many samples are needed to estimate a proportion within relative error $\delta$
- Network analysis: bounding node degrees in random graphs
- Binomial concentration: any setting where you sum independent 0/1 indicators and the expected sum may be small
The multiplicative form is especially important when $\mu \ll n$: for rare events, Hoeffding wastes a factor of $n/\mu$ in the exponent.
Failure Mode
The simplified form $e^{-\delta^2\mu/3}$ is valid only for $\delta \le 1$ (at most doubling the mean). For $\delta > 1$, you must use the exact form or the weaker bound $P(X \ge (1+\delta)\mu) \le e^{-\delta\mu/3}$, valid for $\delta \ge 1$. Also, the bound requires independence; for dependent indicators, you need Azuma-Hoeffding or other martingale methods.
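A numeric illustration of this failure mode, with illustrative values $\mu = 10$, $\delta = 5$: outside its range, the simplified exponent claims a probability far smaller than the valid exact-form bound, while the linear-exponent form stays on the safe side.

```python
import math

mu, delta = 10.0, 5.0                                    # delta > 1: simplified form invalid
exact_form = (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
wrong_simplified = math.exp(-delta**2 * mu / 3)          # NOT a valid bound here
valid_linear = math.exp(-delta * mu / 3)                 # weaker but valid for delta >= 1
print(f"{exact_form:.2e} (valid), {wrong_simplified:.2e} (invalid), {valid_linear:.2e} (valid)")
```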
Why Chernoff Gives the Tightest Exponential Bounds
The Chernoff bound is the best possible exponential bound derivable from the MGF. More precisely:
Claim. For any random variable $X$, among all bounds of the form $P(X \ge t) \le e^{-r(t)}$ that hold for all distributions with the same MGF, the Chernoff bound with $r(t) = \psi_X^*(t)$ is optimal.
Why. Each fixed $\lambda > 0$ yields the bound $\exp(-(\lambda t - \psi_X(\lambda)))$, and any exponential bound certified by the MGF alone can have rate at most $\lambda t - \psi_X(\lambda)$ for the worst-case distribution with that MGF. Taking the supremum over $\lambda$ gives exactly $\psi_X^*(t)$.
Connection to large deviations. For i.i.d. sums $S_n = X_1 + \cdots + X_n$, the Chernoff bound gives $P(S_n \ge nt) \le e^{-n\psi^*(t)}$. The remarkable fact (Cramér's theorem) is that this is asymptotically exact:
$$\lim_{n\to\infty} \frac{1}{n} \log P(S_n \ge nt) = -\psi^*(t)$$
So the Chernoff method not only gives an upper bound; it finds the correct exponential rate. This is the starting point of large deviations theory.
Proof Ideas and Templates Used
The Chernoff method is a proof template consisting of three steps:
- Exponentiate: introduce the free parameter $\lambda$ via $P(X \ge t) = P(e^{\lambda X} \ge e^{\lambda t})$
- Apply Markov: convert the tail probability to the MGF, $P(e^{\lambda X} \ge e^{\lambda t}) \le e^{-\lambda t}\, E[e^{\lambda X}]$
- Optimize: choose $\lambda$ to minimize the bound
This template is universal. The art lies in step 2.5: bounding the MGF. Different MGF bounds produce different named inequalities:
| MGF bound | Named result | Tail type |
|---|---|---|
| Hoeffding's lemma ($E[e^{\lambda(X - EX)}] \le e^{\lambda^2(b-a)^2/8}$) | Hoeffding's inequality | $e^{-2t^2/\sum_i (b_i - a_i)^2}$ |
| Bernstein condition (variance + bound) | Bernstein's inequality | $e^{-t^2/(2(v + bt/3))}$ |
| Sub-Gaussian definition ($E[e^{\lambda X}] \le e^{\sigma^2\lambda^2/2}$) | Sub-Gaussian tail | $e^{-t^2/(2\sigma^2)}$ |
| Exact Binomial MGF | Multiplicative Chernoff | $\left(e^{\delta}/(1+\delta)^{1+\delta}\right)^{\mu}$ |
Canonical Examples
Chernoff for a standard Gaussian
Let $Z \sim N(0,1)$. The MGF is $E[e^{\lambda Z}] = e^{\lambda^2/2}$. The Chernoff bound gives:
$$P(Z \ge t) \le \inf_{\lambda > 0} e^{-\lambda t + \lambda^2/2}$$
Optimize: $\lambda = t$ gives $P(Z \ge t) \le e^{-t^2/2}$.
This is the standard Gaussian tail bound. The exact Gaussian tail is $P(Z \ge t) \sim \frac{1}{t\sqrt{2\pi}}e^{-t^2/2}$, so Chernoff is tight up to the polynomial factor $\frac{1}{t\sqrt{2\pi}}$.
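Comparing the Chernoff bound with the exact Gaussian tail (computable via `math.erfc`) makes the polynomial gap visible:

```python
import math

for t in (1.0, 2.0, 3.0, 4.0):
    exact = 0.5 * math.erfc(t / math.sqrt(2))   # exact P(Z >= t)
    chernoff = math.exp(-t * t / 2)             # Chernoff bound
    print(f"t={t}: exact {exact:.2e}, Chernoff {chernoff:.2e}, ratio {chernoff / exact:.1f}")
```

The ratio grows roughly like $t\sqrt{2\pi}$, confirming that only a polynomial factor is lost.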
Estimating a rare event probability
Flip a coin with heads probability $p$ a total of $n$ times; for concreteness, take $p = 0.001$ and $n = 10{,}000$. Let $X$ = number of heads, so $\mu = np = 10$.
What is $P(X \ge 20)$? This is $P(X \ge (1+\delta)\mu)$ with $\delta = 1$, i.e., a 100% overshoot of the mean.
Multiplicative Chernoff: $P(X \ge 20) \le e^{-\delta^2\mu/3} = e^{-10/3} \approx 0.036$.
Hoeffding: $P(X \ge 20) \le e^{-2(\delta\mu)^2/n} = e^{-0.02} \approx 0.98$.
Chernoff gives $\approx 0.036$ vs. Hoeffding's vacuous $\approx 0.98$. The multiplicative Chernoff is dramatically tighter because it uses the fact that $\mu$ is small relative to $n$.
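The two bounds in this rare-event setting, computed directly (the numbers $n = 10{,}000$ flips with heads probability $p = 0.001$, threshold $2\mu$, are illustrative):

```python
import math

n, p = 10_000, 0.001
mu = n * p                                      # expected number of heads: 10
delta = 1.0                                     # threshold (1 + delta) * mu = 20 heads
chernoff = math.exp(-delta**2 * mu / 3)         # simplified multiplicative Chernoff
hoeffding = math.exp(-2 * (delta * mu)**2 / n)  # additive Hoeffding with s = delta * mu
print(f"Chernoff {chernoff:.3f} vs Hoeffding {hoeffding:.3f}")  # ~0.036 vs ~0.980
```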
Common Confusions
Chernoff is a method, not a single inequality
When people say "Chernoff bound," they sometimes mean the general method (optimize over $\lambda$ in the MGF bound) and sometimes mean the specific multiplicative bound for sums of independent $[0,1]$-valued variables. Be aware of which one is meant. Hoeffding's inequality is a Chernoff bound, derived by plugging Hoeffding's lemma into the Chernoff method. The method is more general than any single application.
The Chernoff bound is one-sided
The basic Chernoff method with $\lambda > 0$ bounds the upper tail $P(X \ge t)$. For the lower tail $P(X \le t)$, use $\lambda < 0$ (equivalently, apply the upper-tail bound to $-X$). For two-sided bounds, combine the two one-sided bounds via a union bound, picking up a factor of 2.
MGF must exist for Chernoff to work
The Chernoff method requires $E[e^{\lambda X}] < \infty$ for some $\lambda > 0$. This fails for heavy-tailed distributions: the Cauchy distribution has no finite moments at all, and the exponential distribution with rate 1 has $E[e^{\lambda X}] = \infty$ for $\lambda \ge 1$. For exponential random variables, you can still apply Chernoff but only for $\lambda < 1$, limiting the range of $\lambda$ values you can optimize over.
Connection to Large Deviations: Sanov's Theorem Preview
The Chernoff bound for i.i.d. sums is the starting point of large deviations theory. Cramér's theorem says $\frac{1}{n}\log P(S_n \ge nt) \to -\psi^*(t)$, and this extends to a much more general setting.
Sanov's theorem generalizes this to empirical distributions: if $\hat{P}_n$ is the empirical distribution of $n$ i.i.d. draws from $P$, then the probability that $\hat{P}_n$ lands in a set $A$ of distributions decays as $e^{-n \inf_{Q \in A} D(Q \| P)}$, where $D(Q \| P)$ is the KL divergence.
This connects Chernoff bounds to information theory: the "rate" at which empirical observations deviate from the truth is measured by KL divergence.
Summary
- The Chernoff method: $P(X \ge t) \le \inf_{\lambda > 0} e^{-\lambda t}\, E[e^{\lambda X}]$, optimized over $\lambda$
- This gives the tightest exponential bound from the MGF
- Multiplicative Chernoff for Binomials: $P(X \ge (1+\delta)\mu) \le e^{-\delta^2\mu/3}$ for $0 < \delta \le 1$
- Multiplicative form is tighter than Hoeffding when $\mu \ll n$
- Every exponential concentration inequality (Hoeffding, Bernstein, etc.) is a Chernoff bound with a specific MGF estimate
- The optimal rate $\psi_X^*(t)$ equals the large deviations rate function (Cramér's theorem)
- Requires the MGF to exist; fails for heavy-tailed distributions
Exercises
Problem
Let $X \sim \text{Exp}(1)$ (exponential with rate 1). Use the Chernoff method to derive a tail bound for $P(X \ge t)$ for $t > 1$.
Problem
A randomized algorithm succeeds with probability $p > 1/2$ on each independent trial. You run it $100$ times and take a majority vote. Use the multiplicative Chernoff bound to bound the probability that fewer than 50 trials succeed (i.e., the majority vote fails).
Problem
Derive Hoeffding's inequality from the Chernoff method. Specifically, let $X_1, \ldots, X_n$ be independent with $X_i \in [a_i, b_i]$. Starting from the Chernoff bound, use Hoeffding's lemma ($E[e^{\lambda(X_i - EX_i)}] \le e^{\lambda^2(b_i - a_i)^2/8}$) to derive:
$$P\left(\sum_{i=1}^n (X_i - EX_i) \ge t\right) \le \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)$$
Related Comparisons
References
Canonical:
- Mitzenmacher & Upfal, Probability and Computing (2017), Chapter 4
- Boucheron, Lugosi, Massart, Concentration Inequalities (2013), Chapter 2
- Vershynin, High-Dimensional Probability (2018), Chapter 2
Current:
- Motwani & Raghavan, Randomized Algorithms (1995), Chapter 4
- Wainwright, High-Dimensional Statistics (2019), Chapter 2
- van Handel, Probability in High Dimension (2016), Chapters 1-3
Next Topics
Building on the Chernoff method:
- Sub-Gaussian random variables: the abstract framework that captures all distributions whose MGF satisfies a Gaussian-type bound
- Sub-exponential random variables: the next distributional class, for distributions whose MGF exists only in a limited range
Last reviewed: April 2026