McDiarmid's Inequality
The bounded-differences inequality: if changing one input to a function changes the output by at most c_i, the function concentrates around its mean with sub-Gaussian tails.
Why This Matters
McDiarmid's inequality is the "Swiss army knife" of concentration inequalities in machine learning theory. Any time you have a function of independent random variables where changing one variable doesn't change the output too much, you get exponential concentration around the mean.
This single inequality is used to prove:
- Generalization bounds (empirical risk concentrates around population risk)
- Rademacher complexity concentration (the empirical complexity $\hat{\mathfrak{R}}_S(\mathcal{F})$ concentrates around its expectation $\mathfrak{R}_n(\mathcal{F})$)
- Stability-based generalization (changing one training example changes the output by at most the stability parameter $\beta$)
- Cross-validation concentration
- Random graph properties
Mental Model
Imagine a function $f(X_1, \dots, X_n)$ of independent random inputs. If replacing any single $X_i$ with a different value changes $f$ by at most $c_i$, then $f$ is "not too sensitive" to any single input. The bounded-differences condition quantifies this sensitivity. McDiarmid says: low sensitivity implies concentration.
Formal Setup and Notation
Let $X_1, \dots, X_n$ be independent random variables (not necessarily identically distributed) taking values in sets $\mathcal{X}_1, \dots, \mathcal{X}_n$.
Bounded Differences Condition
A function $f : \mathcal{X}_1 \times \cdots \times \mathcal{X}_n \to \mathbb{R}$ satisfies the bounded differences condition with constants $c_1, \dots, c_n$ if for every $i \in \{1, \dots, n\}$ and all $x_1 \in \mathcal{X}_1, \dots, x_n \in \mathcal{X}_n$ and $x_i' \in \mathcal{X}_i$:

$$|f(x_1, \dots, x_i, \dots, x_n) - f(x_1, \dots, x_i', \dots, x_n)| \le c_i$$

Changing the $i$-th input while keeping all others fixed changes $f$ by at most $c_i$.
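Before invoking the theorem, it can help to sanity-check candidate constants numerically. Below is a minimal randomized checker; the function name `check_bounded_differences` and its interface are illustrative, not from any library, and a randomized search can only find violations, never prove the condition:

```python
import random

def check_bounded_differences(f, samplers, cs, trials=2000, seed=0):
    """Randomized sanity check that |f(..., x_i, ...) - f(..., x_i', ...)| <= c_i.

    samplers[i] draws a random value for coordinate i.  This can only
    detect violations; it cannot certify the worst case."""
    rng = random.Random(seed)
    n = len(cs)
    for _ in range(trials):
        x = [samplers[i](rng) for i in range(n)]
        i = rng.randrange(n)
        y = x[:]
        y[i] = samplers[i](rng)          # resample one coordinate
        if abs(f(x) - f(y)) > cs[i] + 1e-12:
            return False
    return True

# The empirical mean of n values in [0, 1] has c_i = 1/n.
n = 20
mean = lambda x: sum(x) / len(x)
uniform = [lambda rng: rng.random()] * n
print(check_bounded_differences(mean, uniform, [1.0 / n] * n))  # expect True
```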
Main Theorems
McDiarmid's Inequality (Bounded Differences)
Statement
If $f$ satisfies the bounded differences condition with constants $c_1, \dots, c_n$, then for all $t > 0$:

$$\Pr\big[f(X_1, \dots, X_n) - \mathbb{E}[f] \ge t\big] \le \exp\left(-\frac{2t^2}{\sum_{i=1}^n c_i^2}\right)$$

and

$$\Pr\big[f(X_1, \dots, X_n) - \mathbb{E}[f] \le -t\big] \le \exp\left(-\frac{2t^2}{\sum_{i=1}^n c_i^2}\right)$$
Intuition
The concentration is sub-Gaussian with effective variance proxy $\frac{1}{4}\sum_{i=1}^n c_i^2$. If each $c_i = c/n$ (as in empirical averages), the bound becomes $\exp(-2nt^2/c^2)$, recovering Hoeffding-style concentration. The key insight: you don't need the function to be a sum. Any function with bounded sensitivity concentrates.
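As a sanity check, the empirical-average case can be computed and compared against a Monte Carlo estimate of the actual deviation probability; a sketch assuming Uniform[0, 1] inputs:

```python
import math
import random

def mcdiarmid_bound(cs, t):
    """One-sided McDiarmid tail bound exp(-2 t^2 / sum_i c_i^2)."""
    return math.exp(-2.0 * t**2 / sum(c * c for c in cs))

# Empirical mean of n variables in [0, 1]: each c_i = 1/n, so the
# bound is exp(-2 n t^2).
n, t = 100, 0.1
bound = mcdiarmid_bound([1.0 / n] * n, t)

# Monte Carlo estimate of P(mean - E[mean] >= t) for Uniform[0, 1] samples.
random.seed(0)
trials = 20_000
hits = sum(
    sum(random.random() for _ in range(n)) / n - 0.5 >= t
    for _ in range(trials)
)
print(f"bound = {bound:.4f}, empirical = {hits / trials:.4f}")
```

The bound holds but is not tight here: for a sum of uniforms the true tail is far smaller, since McDiarmid only uses the range of each variable, not its variance.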
Proof Sketch
Step 1: Construct a Doob martingale. Define $Z_k = \mathbb{E}[f(X_1, \dots, X_n) \mid X_1, \dots, X_k]$ for $k = 0, 1, \dots, n$, so $Z_0 = \mathbb{E}[f]$ and $Z_n = f(X_1, \dots, X_n)$.
Step 2: The differences $D_k = Z_k - Z_{k-1}$ form a martingale difference sequence. By the bounded-differences condition, conditionally on $X_1, \dots, X_{k-1}$, each $D_k$ takes values in an interval of length at most $c_k$ almost surely.
Step 3: Apply Azuma-Hoeffding to the martingale $(Z_k)_{k=0}^n$:

$$\Pr[Z_n - Z_0 \ge t] \le \exp\left(-\frac{2t^2}{\sum_{k=1}^n c_k^2}\right)$$
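For a concrete instance of Step 1, take $f$ to be a sum: the conditional expectations $Z_k$ then have a closed form, and the bounded increments can be checked directly. A sketch assuming Uniform[0, 1] inputs with mean $1/2$:

```python
import random

random.seed(1)
n, mu = 10, 0.5                          # E[X_i] = 1/2 for Uniform[0, 1]
xs = [random.random() for _ in range(n)]

# Doob martingale for f(x) = x_1 + ... + x_n:
#   Z_k = E[f | X_1, ..., X_k] = x_1 + ... + x_k + (n - k) * mu
Z = [sum(xs[:k]) + (n - k) * mu for k in range(n + 1)]

assert abs(Z[0] - n * mu) < 1e-12        # Z_0 = E[f]
assert abs(Z[n] - sum(xs)) < 1e-12       # Z_n = f(X_1, ..., X_n)

# Increments D_k = Z_k - Z_{k-1} = x_k - mu lie in [-1/2, 1/2],
# an interval of length c_k = 1 (the range of each X_k).
increments = [Z[k] - Z[k - 1] for k in range(1, n + 1)]
print("max |D_k| =", max(abs(d) for d in increments))
```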
Why It Matters
McDiarmid's inequality is the workhorse behind most high-probability generalization bounds in learning theory. The proof template (Doob martingale + Azuma-Hoeffding) is reusable: any time you can bound how sensitive a quantity is to individual data points, you get concentration for free.
Failure Mode
The bound requires the bounded-differences constants to be worst-case over all inputs. If the function is usually insensitive but occasionally very sensitive (e.g., outliers cause large changes), the worst-case can be much larger than the typical sensitivity, making the bound loose. Variance-sensitive versions (Bernstein-type) can help.
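A quick simulation illustrates this gap between worst-case and typical sensitivity, using $f = \max$ of $n$ points in $[0, 1]$: the worst-case constant is $c_i = 1$, but resampling a coordinate rarely moves the maximum at all. (The setup is illustrative; the numbers below are one seeded run.)

```python
import random

random.seed(4)
n, trials = 50, 5000

# f = max of n points in [0, 1].  Worst case: c_i = 1 (e.g. the input
# (1, 0, ..., 0) with the single 1 resampled to 0).  Typical case:
# resampling a coordinate usually leaves the max unchanged.
typical = 0.0
for _ in range(trials):
    xs = [random.random() for _ in range(n)]
    i = random.randrange(n)
    ys = xs[:i] + [random.random()] + xs[i + 1:]
    typical += abs(max(xs) - max(ys))
typical /= trials

print(f"worst-case c_i = 1.0, typical |change| = {typical:.4f}")
```

With $c_i = 1$, McDiarmid gives only $\exp(-2t^2/n)$ for the max, even though the max actually concentrates on a much finer scale.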
Proof Ideas and Templates Used
The McDiarmid proof uses two standard tools:
- Doob martingale construction: build a martingale by sequentially conditioning on each variable. This is the standard way to reduce a function of independent variables to a sum of bounded increments.
- Azuma-Hoeffding inequality: concentration for bounded-increment martingales. This is the martingale version of Hoeffding's inequality.
This "Doob + Azuma" pattern is used throughout learning theory and high-dimensional probability.
Canonical Examples
Empirical risk concentrates
Let $\hat{R}_n(h) = \frac{1}{n} \sum_{i=1}^n \ell(h, Z_i)$ for a fixed hypothesis $h$ with loss $\ell \in [0, 1]$. Changing one sample $Z_i$ changes $\hat{R}_n(h)$ by at most $1/n$. So:

$$\Pr\big[|\hat{R}_n(h) - \mathbb{E}[\hat{R}_n(h)]| \ge t\big] \le 2\exp(-2nt^2)$$
This is just Hoeffding's inequality. McDiarmid generalizes it to non-sum functions.
Rademacher complexity concentrates
The empirical Rademacher complexity $\hat{\mathfrak{R}}_S(\mathcal{F}) = \mathbb{E}_\sigma\big[\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n \sigma_i f(z_i)\big]$ is a function of the sample $z_1, \dots, z_n$. Changing one $z_i$ changes $\hat{\mathfrak{R}}_S(\mathcal{F})$ by at most $2B/n$ (where $B$ bounds the function values, $|f| \le B$). McDiarmid gives:

$$\Pr\big[|\hat{\mathfrak{R}}_S(\mathcal{F}) - \mathbb{E}[\hat{\mathfrak{R}}_S(\mathcal{F})]| \ge t\big] \le 2\exp\left(-\frac{nt^2}{2B^2}\right)$$
This is why we can use the empirical Rademacher complexity (computed from data) in generalization bounds instead of the population version.
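The $2B/n$ sensitivity can be verified numerically for a small finite class, where the expectation over sign vectors can be enumerated exactly. A sketch with an illustrative class of threshold functions (values in $\{-1, +1\}$, so $B = 1$):

```python
import itertools
import random

random.seed(2)
n = 8
zs = [random.random() for _ in range(n)]

# Small finite class: thresholds z -> sign(z - theta), values in {-1, +1}.
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]

def evaluate(theta, z):
    return 1.0 if z >= theta else -1.0

def empirical_rademacher(samples):
    """Exact E_sigma[ sup_f (1/n) sum_i sigma_i f(z_i) ], enumerating all 2^n signs."""
    m = len(samples)
    total = 0.0
    for sigma in itertools.product([-1, 1], repeat=m):
        total += max(
            sum(s * evaluate(th, z) for s, z in zip(sigma, samples)) / m
            for th in thetas
        )
    return total / 2 ** m

base = empirical_rademacher(zs)

# Bounded differences: replacing any one z_i changes the complexity by at most 2B/n.
worst = 0.0
for i in range(n):
    for z_new in (0.0, 0.25, 0.75, 1.0):
        perturbed = zs[:i] + [z_new] + zs[i + 1:]
        worst = max(worst, abs(empirical_rademacher(perturbed) - base))

print(f"max observed change = {worst:.4f}, bound 2B/n = {2.0 / n:.4f}")
```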
Longest increasing subsequence
Let $L_n$ be the length of the longest increasing subsequence of a random permutation of $\{1, \dots, n\}$ (equivalently, of $n$ i.i.d. Uniform$[0, 1]$ draws). Changing one element changes $L_n$ by at most 1 (you can only extend or shorten the LIS by 1). So $c_i = 1$ for all $i$ and:

$$\Pr\big[|L_n - \mathbb{E}[L_n]| \ge t\big] \le 2\exp(-2t^2/n)$$
This is a non-trivial concentration result for a combinatorial quantity.
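A simulation makes this concrete, using i.i.d. Uniform[0, 1] draws (independent inputs, as McDiarmid requires, inducing a uniformly random permutation of ranks). The mean sits near $2\sqrt{n}$ and the observed deviations stay well inside the bound:

```python
import bisect
import math
import random

def lis_length(seq):
    """Longest increasing subsequence via patience sorting, O(n log n)."""
    tails = []
    for x in seq:
        j = bisect.bisect_left(tails, x)
        if j == len(tails):
            tails.append(x)
        else:
            tails[j] = x
    return len(tails)

random.seed(3)
n, trials = 400, 2000
samples = [lis_length([random.random() for _ in range(n)]) for _ in range(trials)]

mean = sum(samples) / trials
t = 2.0 * math.sqrt(n)                    # deviation on the sqrt(n) scale
bound = 2 * math.exp(-2 * t**2 / n)       # McDiarmid with c_i = 1
empirical = sum(abs(s - mean) >= t for s in samples) / trials
print(f"mean LIS = {mean:.1f}, P(|L - mean| >= {t:.0f}): "
      f"empirical = {empirical:.4f}, bound = {bound:.4f}")
```

In fact the true fluctuations of $L_n$ are of order $n^{1/6}$ (Tracy-Widom), much smaller than the $\sqrt{n}$ scale McDiarmid certifies, another instance of the worst-case looseness discussed above.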
Common Confusions
McDiarmid is NOT just Hoeffding for sums
Hoeffding's inequality applies to sums of independent bounded random variables. McDiarmid's inequality applies to any function of independent variables that satisfies bounded differences. Sums are a special case ($f(x_1, \dots, x_n) = \sum_{i=1}^n x_i$ with $c_i = b_i - a_i$, where $[a_i, b_i]$ is the range of $X_i$), but McDiarmid also handles non-linear, non-additive functions like suprema, medians, and combinatorial quantities.
The bounded-differences condition is worst-case
Each $c_i$ must be a uniform bound over all possible inputs. If the function is usually stable but occasionally jumps, the worst-case $c_i$ may be much larger than the typical change. In such cases, variance-sensitive inequalities (like the self-bounding property or entropy methods) give tighter results.
Summary
- McDiarmid: if changing the $i$-th input changes $f$ by at most $c_i$, then $f$ concentrates around its mean with rate $\exp(-2t^2 / \sum_i c_i^2)$
- The proof goes: Doob martingale → Azuma-Hoeffding
- Special case of sums: recovers Hoeffding's inequality
- Used everywhere in learning theory: generalization bounds, Rademacher concentration, stability arguments
- The bounded-differences constants $c_i$ must be worst-case
Exercises
Problem
Let $X_1, \dots, X_n$ be i.i.d. with values in $[0, 1]$, and let $f(X_1, \dots, X_n) = \max_{1 \le i \le n} X_i$. Find the bounded-differences constants $c_i$ and apply McDiarmid to bound $\Pr[|f - \mathbb{E}[f]| \ge t]$.
Problem
Prove that the sample median of $n$ i.i.d. observations in $[0, 1]$ satisfies the bounded-differences condition with $c_i \le 1$ (changing one observation moves the median to an adjacent order statistic, so the typical change is much smaller, approximately the order-statistic spacing), and use McDiarmid to show the median concentrates around its mean.
References
Canonical:
- McDiarmid, "On the Method of Bounded Differences" (1989), Surveys in Combinatorics
- Shalev-Shwartz & Ben-David, Understanding Machine Learning, Chapter 26
- Boucheron, Lugosi, Massart, Concentration Inequalities (2013), Chapter 6
Current:
- Vershynin, High-Dimensional Probability (2018), Chapter 2
- Wainwright, High-Dimensional Statistics (2019), Chapter 2
- van Handel, Probability in High Dimension (2016), Chapters 1-3
Next Topics
Natural next steps from McDiarmid:
- Algorithmic stability: uses McDiarmid to convert stability bounds into generalization bounds
- Rademacher complexity: uses McDiarmid to show empirical Rademacher concentrates around population Rademacher
Last reviewed: April 2026