
Concentration Probability

McDiarmid's Inequality

The bounded-differences inequality: if changing one input to a function changes the output by at most c_i, the function concentrates around its mean with sub-Gaussian tails.

Advanced · Tier 1 · Stable · ~55 min

Why This Matters

McDiarmid's inequality is the "Swiss army knife" of concentration inequalities in machine learning theory. Any time you have a function of independent random variables where changing one variable doesn't change the output too much, you get exponential concentration around the mean.

This single inequality is used to prove:

  • Generalization bounds (empirical risk concentrates around population risk)
  • Rademacher complexity concentration ($\hat{\mathfrak{R}}_S$ concentrates around $\mathfrak{R}_n$)
  • Stability-based generalization (changing one training example changes the output by at most $1/n$)
  • Cross-validation concentration
  • Random graph properties

Mental Model

Imagine a function $f(X_1, \ldots, X_n)$ of $n$ independent random inputs. If replacing any single $X_i$ with a different value $X_i'$ changes $f$ by at most $c_i$, then $f$ is "not too sensitive" to any single input. The bounded-differences condition quantifies this sensitivity. McDiarmid says: low sensitivity implies concentration.

Formal Setup and Notation

Let $X_1, \ldots, X_n$ be independent random variables (not necessarily identically distributed) taking values in sets $\mathcal{X}_1, \ldots, \mathcal{X}_n$.

Definition

Bounded Differences Condition

A function $f: \mathcal{X}_1 \times \cdots \times \mathcal{X}_n \to \mathbb{R}$ satisfies the bounded differences condition with constants $c_1, \ldots, c_n$ if for every $i \in \{1, \ldots, n\}$ and all $x_1, \ldots, x_n$ and $x_i' \in \mathcal{X}_i$:

$$\sup_{x_1, \ldots, x_n, x_i'} |f(x_1, \ldots, x_i, \ldots, x_n) - f(x_1, \ldots, x_i', \ldots, x_n)| \leq c_i$$

Changing the $i$-th input while keeping all others fixed changes $f$ by at most $c_i$.
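The bounded-differences condition can be probed numerically. The sketch below is illustrative (the helper `sensitivity` is my own name, not a library function): it replaces one coordinate with a few candidate values and records the largest change in $f$, which lower-bounds the true sup. For the sample mean, every observed change stays within $c_i = 1/n$.

```python
import random

def sensitivity(f, xs, candidate_values, i):
    """Largest change in f observed when coordinate i is replaced
    by one of the candidate values (a lower bound on the true sup)."""
    base = f(xs)
    worst = 0.0
    for v in candidate_values:
        ys = list(xs)
        ys[i] = v
        worst = max(worst, abs(base - f(ys)))
    return worst

# Sample mean of n points in [0, 1]: the bounded-differences constant is c_i = 1/n.
n = 10
mean = lambda xs: sum(xs) / len(xs)
random.seed(0)
xs = [random.random() for _ in range(n)]
observed = max(sensitivity(mean, xs, [0.0, 1.0], i) for i in range(n))
print(observed <= 1.0 / n + 1e-12)  # True: no single coordinate moves the mean by more than 1/n
```

Probing only the endpoints $\{0, 1\}$ suffices for the mean because the change $|x_i - v|/n$ is maximized at an endpoint; for other functions more candidates may be needed.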

Main Theorems

Theorem

McDiarmid's Inequality (Bounded Differences)

Statement

If $f$ satisfies the bounded differences condition with constants $c_1, \ldots, c_n$, then for all $t > 0$:

$$\Pr[f(X_1, \ldots, X_n) - \mathbb{E}[f(X_1, \ldots, X_n)] \geq t] \leq \exp\!\left(-\frac{2t^2}{\sum_{i=1}^n c_i^2}\right)$$

and

$$\Pr[|f(X_1, \ldots, X_n) - \mathbb{E}[f]| \geq t] \leq 2\exp\!\left(-\frac{2t^2}{\sum_{i=1}^n c_i^2}\right)$$

Intuition

The concentration is sub-Gaussian with effective variance $\sigma^2 = \frac{1}{4}\sum_i c_i^2$. If each $c_i = c/n$ (as in empirical averages), the bound becomes $\exp(-2nt^2/c^2)$, recovering Hoeffding-style concentration. The key insight: you don't need the function to be a sum. Any function with bounded sensitivity concentrates.
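The tail bound is a one-line formula, so it is easy to evaluate. The sketch below (the helper name `mcdiarmid_bound` is my own) computes it and confirms that with $c_i = c/n$ it reduces to the Hoeffding bound $2\exp(-2nt^2/c^2)$:

```python
import math

def mcdiarmid_bound(t, cs, two_sided=True):
    """McDiarmid tail bound exp(-2 t^2 / sum_i c_i^2), doubled for two sides
    and clipped at 1 since it is a probability bound."""
    b = math.exp(-2 * t * t / sum(c * c for c in cs))
    return min(1.0, (2 if two_sided else 1) * b)

# Empirical average of n variables bounded in [0, c]: each c_i = c/n, so the
# bound equals 2 exp(-2 n t^2 / c^2) -- exactly Hoeffding's inequality.
n, c, t = 100, 1.0, 0.2
print(mcdiarmid_bound(t, [c / n] * n))  # equals 2 * exp(-8)
```

Note how the bound depends on the data only through $\sum_i c_i^2$: many small sensitivities are as good as a few tiny ones with the same sum of squares.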

Proof Sketch

Step 1: Construct a Doob martingale. Define $Z_k = \mathbb{E}[f \mid X_1, \ldots, X_k]$ for $k = 0, 1, \ldots, n$, so $Z_0 = \mathbb{E}[f]$ and $Z_n = f(X_1, \ldots, X_n)$.

Step 2: The differences $D_k = Z_k - Z_{k-1}$ form a martingale difference sequence. By the bounded-differences condition, $|D_k| \leq c_k$ almost surely.

Step 3: Apply Azuma-Hoeffding to the martingale $(Z_k)$:

$$\Pr[Z_n - Z_0 \geq t] = \Pr\!\left[\sum_{k=1}^n D_k \geq t\right] \leq \exp\!\left(-\frac{2t^2}{\sum_k c_k^2}\right)$$
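The Doob martingale can be made concrete when the conditional expectations have a closed form. A minimal sketch, assuming $f$ is the sample mean of $n$ Uniform$[0,1]$ variables: each revealed coordinate contributes its value, each unrevealed one its mean $1/2$, and the increments stay within the bounded-differences constant $c_k = 1/n$.

```python
import random

# Doob martingale for f = sample mean of n Uniform[0, 1] variables.
random.seed(1)
n = 20
xs = [random.random() for _ in range(n)]

def Z(k):
    """Z_k = E[f | X_1, ..., X_k]: revealed terms plus mean 1/2 per unrevealed term."""
    return (sum(xs[:k]) + (n - k) * 0.5) / n

increments = [Z(k) - Z(k - 1) for k in range(1, n + 1)]
c = 1.0 / n  # bounded-differences constant for the mean

print(Z(0) == 0.5)                                  # True: Z_0 = E[f]
print(all(abs(d) <= c for d in increments))         # True: |D_k| <= c_k
print(abs(Z(n) - sum(xs) / n) < 1e-12)              # True: Z_n = f(X_1, ..., X_n)
```

Here the increments are actually $(X_k - 1/2)/n$, so they are even smaller than $c_k$; for non-sum functions the increments can use the full budget $c_k$.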

Why It Matters

McDiarmid's inequality is the workhorse behind most high-probability generalization bounds in learning theory. The proof template (Doob martingale + Azuma-Hoeffding) is reusable: any time you can bound how sensitive a quantity is to individual data points, you get concentration for free.

Failure Mode

The bound requires the bounded-differences constants $c_i$ to be worst-case over all inputs. If the function is usually insensitive but occasionally very sensitive (e.g., outliers cause large changes), the worst-case $c_i$ can be much larger than the typical sensitivity, making the bound loose. Variance-sensitive versions (Bernstein-type) can help.

Proof Ideas and Templates Used

The McDiarmid proof uses two standard tools:

  1. Doob martingale construction: build a martingale by sequentially conditioning on each variable. This is the standard way to reduce a function of independent variables to a sum of bounded increments.

  2. Azuma-Hoeffding inequality: concentration for bounded-increment martingales. This is the martingale version of Hoeffding's inequality.

This "Doob + Azuma" pattern is used throughout learning theory and high-dimensional probability.

Canonical Examples

Example

Empirical risk concentrates

Let $f(z_1, \ldots, z_n) = \hat{R}_n(h) = \frac{1}{n}\sum_{i=1}^n \ell(h(x_i), y_i)$ for a fixed hypothesis $h$ with loss $\ell \in [0, 1]$. Changing one sample $z_i$ changes $f$ by at most $c_i = 1/n$. So:

$$\Pr[|\hat{R}_n(h) - R(h)| \geq t] \leq 2\exp(-2nt^2)$$

This is just Hoeffding's inequality. McDiarmid generalizes it to non-sum functions.
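A Monte Carlo sanity check makes this concrete. The sketch below (my own setup, not from the text) uses a 0-1 loss that is Bernoulli$(1/2)$ per sample and verifies that the observed frequency of large deviations stays below $2\exp(-2nt^2)$:

```python
import math
import random

# Empirical frequency of |R_hat - R| >= t for n i.i.d. Bernoulli(1/2) losses,
# compared against the McDiarmid/Hoeffding bound 2 exp(-2 n t^2).
random.seed(2)
n, t, trials, p = 50, 0.15, 20000, 0.5

exceed = 0
for _ in range(trials):
    avg = sum(random.random() < p for _ in range(n)) / n  # empirical risk R_hat
    exceed += abs(avg - p) >= t                           # deviation from R = p
freq = exceed / trials
bound = 2 * math.exp(-2 * n * t * t)

print(freq <= bound)  # True: the bound holds (with room to spare)
```

The bound is typically conservative: for this Bernoulli example the true deviation probability is several times smaller, which is the usual price of a distribution-free inequality.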

Example

Rademacher complexity concentrates

The empirical Rademacher complexity $\hat{\mathfrak{R}}_S(\mathcal{F})$ is a function of $S = (z_1, \ldots, z_n)$. Changing one $z_i$ changes $\hat{\mathfrak{R}}_S$ by at most $c_i = 2B/n$ (where $B$ bounds the function values). McDiarmid gives:

$$\Pr[|\hat{\mathfrak{R}}_S - \mathfrak{R}_n| \geq t] \leq 2\exp\!\left(-\frac{nt^2}{2B^2}\right)$$

This is why we can use the empirical Rademacher complexity (computed from data) in generalization bounds instead of the population version.
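For a small sample and a simple class, the empirical Rademacher complexity can be computed exactly and the sensitivity claim checked directly. A sketch under my own assumptions: the class is one-dimensional threshold classifiers $f_t(z) = \mathbf{1}\{z \le t\}$ (so $B = 1$), and the expectation over $\sigma \in \{\pm 1\}^n$ is done by brute-force enumeration.

```python
import itertools
import random

def empirical_rademacher(zs):
    """Exact empirical Rademacher complexity of threshold classifiers
    f_t(z) = 1{z <= t} on the sample zs, enumerating all sign vectors."""
    n = len(zs)
    # Distinct behaviors of the class on the sample: all-zeros, plus one
    # pattern per threshold placed at a sample point.
    thresholds = sorted(set(zs))
    patterns = [[1 if z <= t else 0 for z in zs] for t in [-1.0] + thresholds]
    total = 0.0
    for sigma in itertools.product([-1, 1], repeat=n):
        total += max(sum(s * p for s, p in zip(sigma, pat)) for pat in patterns) / n
    return total / 2 ** n

random.seed(4)
n = 8
zs = [random.random() for _ in range(n)]
base = empirical_rademacher(zs)

# Replace one sample point and check the bounded-differences constant 2B/n.
worst = 0.0
for v in (0.0, 0.5, 1.0):
    ys = list(zs)
    ys[0] = v
    worst = max(worst, abs(base - empirical_rademacher(ys)))

print(worst <= 2.0 / n + 1e-12)  # True: changing one z_i moves R_hat by at most 2B/n
```

Since these $f$ take values in $[0, 1]$, the actual change is at most $1/n$, comfortably within the stated $2B/n$ budget.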

Example

Longest increasing subsequence

Let $f(X_1, \ldots, X_n)$ be the length of the longest increasing subsequence of a random permutation. Changing one element changes $f$ by at most 1 (you can only extend or shorten the LIS by 1). So $c_i = 1$ for all $i$ and:

$$\Pr[|f - \mathbb{E}[f]| \geq t] \leq 2\exp(-2t^2/n)$$

This is a non-trivial concentration result for a combinatorial quantity.

Common Confusions

Watch Out

McDiarmid is NOT just Hoeffding for sums

Hoeffding's inequality applies to sums of independent bounded random variables. McDiarmid's inequality applies to any function of independent variables that satisfies bounded differences. Sums are a special case ($f = \sum_i g(X_i)$, where $c_i$ is the range of $g$), but McDiarmid also handles non-linear, non-additive functions like suprema, medians, and combinatorial quantities.

Watch Out

The bounded-differences condition is worst-case

$c_i$ must be a uniform bound over all possible inputs. If the function is usually stable but occasionally jumps, the worst-case $c_i$ may be much larger than the typical change. In such cases, variance-sensitive inequalities (like the self-bounding property or entropy methods) give tighter results.
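The gap between typical and worst-case sensitivity is easy to see numerically. A sketch under my own assumptions, using $f(x) = \max_i x_i$ on $[0,1]^n$: resampling a random coordinate of random data barely moves the max, yet an adversarial configuration forces the worst-case constant all the way to 1.

```python
import random

# Typical vs. worst-case sensitivity of f(xs) = max(xs) for inputs in [0, 1].
random.seed(5)
n = 100
xs = [random.random() for _ in range(n)]

def change_after_replacing(base_xs, i, v):
    ys = list(base_xs)
    ys[i] = v
    return abs(max(ys) - max(base_xs))

# Typical sensitivity: resampling one uniformly chosen coordinate rarely moves the max.
trials = 2000
typical = sum(
    change_after_replacing(xs, random.randrange(n), random.random())
    for _ in range(trials)
) / trials

# Worst case over ALL inputs: one coordinate at 1, the rest at 0 -- replacing
# that coordinate moves the max by a full 1, so McDiarmid must take c_i = 1.
adversarial = change_after_replacing([1.0] + [0.0] * (n - 1), 0, 0.0)

print(typical < 0.05)      # True: typical sensitivity is tiny
print(adversarial == 1.0)  # True: the worst-case constant is 1
```

With $c_i = 1$ the McDiarmid bound is $\exp(-2t^2/n)$, which is vacuous for the max of bounded variables; this is exactly the looseness the warning above describes.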

Summary

  • McDiarmid: if $|f(x) - f(x')| \leq c_i$ when the $i$-th input changes, then $f$ concentrates with rate $\exp(-2t^2/\sum_i c_i^2)$
  • The proof goes: Doob martingale → Azuma-Hoeffding
  • Special case of sums: recovers Hoeffding's inequality
  • Used everywhere in learning theory: generalization bounds, Rademacher concentration, stability arguments
  • The bounded-differences constants must be worst-case

Exercises

ExerciseCore

Problem

Let $X_1, \ldots, X_n$ be i.i.d. with $X_i \in [0, 1]$, and let $f(X_1, \ldots, X_n) = \max_i X_i$. Find the bounded-differences constants and apply McDiarmid to bound $\Pr[\max_i X_i - \mathbb{E}[\max_i X_i] \geq t]$.

ExerciseAdvanced

Problem

Prove that the sample median of i.i.d. observations in $[0, 1]$ satisfies the bounded-differences condition with $c_i = 1/\lceil n/2 \rceil$ (approximately $2/n$), and use McDiarmid to show the median concentrates.


References

Canonical:

  • McDiarmid, "On the Method of Bounded Differences" (1989), Surveys in Combinatorics
  • Shalev-Shwartz & Ben-David, Understanding Machine Learning, Chapter 26
  • Boucheron, Lugosi, Massart, Concentration Inequalities (2013), Chapter 6

Current:

  • Vershynin, High-Dimensional Probability (2018), Chapter 2
  • Wainwright, High-Dimensional Statistics (2019), Chapter 2
  • van Handel, Probability in High Dimension (2016), Chapters 1-3

Next Topics

Natural next steps from McDiarmid:

  • Algorithmic stability: uses McDiarmid to convert stability bounds into generalization bounds
  • Rademacher complexity: uses McDiarmid to show empirical Rademacher concentrates around population Rademacher

Last reviewed: April 2026
