
Concentration Probability

McDiarmid's Inequality

The bounded-differences inequality: if changing one input to a function changes the output by at most c_i, the function concentrates around its mean with sub-Gaussian tails.

Advanced · Tier 1 · Stable · ~55 min

Why This Matters

McDiarmid's inequality is the "Swiss army knife" of concentration inequalities in machine learning theory. Any time you have a function of independent random variables where changing one variable doesn't change the output too much, you get exponential concentration around the mean.

This single inequality is used to prove:

  • Generalization bounds (empirical risk concentrates around population risk)
  • Rademacher complexity concentration ($\hat{\mathfrak{R}}_S$ concentrates around $\mathfrak{R}_n$)
  • Stability-based generalization (changing one training example changes the output by at most $1/n$)
  • Cross-validation concentration
  • Random graph properties

Mental Model

Imagine a function $f(X_1, \ldots, X_n)$ of $n$ independent random inputs. If replacing any single $X_i$ with a different value $X_i'$ changes $f$ by at most $c_i$, then $f$ is "not too sensitive" to any single input. The bounded-differences condition quantifies this sensitivity. McDiarmid says: low sensitivity implies concentration.

Formal Setup and Notation

Let $X_1, \ldots, X_n$ be independent random variables (not necessarily identically distributed) taking values in sets $\mathcal{X}_1, \ldots, \mathcal{X}_n$.

Definition

Bounded Differences Condition

A function $f: \mathcal{X}_1 \times \cdots \times \mathcal{X}_n \to \mathbb{R}$ satisfies the bounded differences condition with constants $c_1, \ldots, c_n$ if for every $i \in \{1, \ldots, n\}$ and all $x_1, \ldots, x_n$ and $x_i' \in \mathcal{X}_i$:

$$\sup_{x_1, \ldots, x_n, x_i'} |f(x_1, \ldots, x_i, \ldots, x_n) - f(x_1, \ldots, x_i', \ldots, x_n)| \leq c_i$$

Changing the $i$-th input while keeping all others fixed changes $f$ by at most $c_i$.
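The bounded-differences condition can be probed numerically. The sketch below is illustrative (the helper `sensitivity` is my own name, not a library function): it replaces one coordinate with a few candidate values and records the largest change in $f$, which lower-bounds the true sup. For the sample mean, every observed change stays within $c_i = 1/n$.

```python
import random

def sensitivity(f, xs, candidate_values, i):
    """Largest change in f observed when coordinate i is replaced
    by one of the candidate values (a lower bound on the true sup)."""
    base = f(xs)
    worst = 0.0
    for v in candidate_values:
        ys = list(xs)
        ys[i] = v
        worst = max(worst, abs(base - f(ys)))
    return worst

# Sample mean of n points in [0, 1]: the bounded-differences constant is c_i = 1/n.
n = 10
mean = lambda xs: sum(xs) / len(xs)
random.seed(0)
xs = [random.random() for _ in range(n)]
observed = max(sensitivity(mean, xs, [0.0, 1.0], i) for i in range(n))
print(observed <= 1.0 / n + 1e-12)  # True: no single coordinate moves the mean by more than 1/n
```

Probing only the endpoints $\{0, 1\}$ suffices for the mean because the change $|x_i - v|/n$ is maximized at an endpoint; for other functions more candidates may be needed.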

Main Theorems

Theorem

McDiarmid's Inequality (Bounded Differences)

Statement

If $f$ satisfies the bounded differences condition with constants $c_1, \ldots, c_n$, then for all $t > 0$:

$$\Pr[f(X_1, \ldots, X_n) - \mathbb{E}[f(X_1, \ldots, X_n)] \geq t] \leq \exp\!\left(-\frac{2t^2}{\sum_{i=1}^n c_i^2}\right)$$

and

$$\Pr[|f(X_1, \ldots, X_n) - \mathbb{E}[f]| \geq t] \leq 2\exp\!\left(-\frac{2t^2}{\sum_{i=1}^n c_i^2}\right)$$

Intuition

The concentration is sub-Gaussian with effective variance $\sigma^2 = \frac{1}{4}\sum_i c_i^2$. If each $c_i = c/n$ (as in empirical averages), the bound becomes $\exp(-2nt^2/c^2)$, recovering Hoeffding-style concentration. The key insight: you don't need the function to be a sum. Any function with bounded sensitivity concentrates.
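The tail bound is a one-line formula, so it is easy to evaluate. The sketch below (the helper name `mcdiarmid_bound` is my own) computes it and confirms that with $c_i = c/n$ it reduces to the Hoeffding bound $2\exp(-2nt^2/c^2)$:

```python
import math

def mcdiarmid_bound(t, cs, two_sided=True):
    """McDiarmid tail bound exp(-2 t^2 / sum_i c_i^2), doubled for two sides
    and clipped at 1 since it is a probability bound."""
    b = math.exp(-2 * t * t / sum(c * c for c in cs))
    return min(1.0, (2 if two_sided else 1) * b)

# Empirical average of n variables bounded in [0, c]: each c_i = c/n, so the
# bound equals 2 exp(-2 n t^2 / c^2) -- exactly Hoeffding's inequality.
n, c, t = 100, 1.0, 0.2
print(mcdiarmid_bound(t, [c / n] * n))  # equals 2 * exp(-8)
```

Note how the bound depends on the data only through $\sum_i c_i^2$: many small sensitivities are as good as a few tiny ones with the same sum of squares.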

Proof Sketch

Step 1: Construct a Doob martingale. Define $Z_k = \mathbb{E}[f \mid X_1, \ldots, X_k]$ for $k = 0, 1, \ldots, n$, so $Z_0 = \mathbb{E}[f]$ and $Z_n = f(X_1, \ldots, X_n)$.

Step 2: The differences $D_k = Z_k - Z_{k-1}$ form a martingale difference sequence. By the bounded-differences condition, $|D_k| \leq c_k$ almost surely.

Step 3: Apply Azuma-Hoeffding to the martingale $(Z_k)$:

$$\Pr[Z_n - Z_0 \geq t] = \Pr\!\left[\sum_{k=1}^n D_k \geq t\right] \leq \exp\!\left(-\frac{2t^2}{\sum_k c_k^2}\right)$$
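The Doob martingale can be made concrete when the conditional expectations have a closed form. A minimal sketch, assuming $f$ is the sample mean of $n$ Uniform$[0,1]$ variables: each revealed coordinate contributes its value, each unrevealed one its mean $1/2$, and the increments stay within the bounded-differences constant $c_k = 1/n$.

```python
import random

# Doob martingale for f = sample mean of n Uniform[0, 1] variables.
random.seed(1)
n = 20
xs = [random.random() for _ in range(n)]

def Z(k):
    """Z_k = E[f | X_1, ..., X_k]: revealed terms plus mean 1/2 per unrevealed term."""
    return (sum(xs[:k]) + (n - k) * 0.5) / n

increments = [Z(k) - Z(k - 1) for k in range(1, n + 1)]
c = 1.0 / n  # bounded-differences constant for the mean

print(Z(0) == 0.5)                                  # True: Z_0 = E[f]
print(all(abs(d) <= c for d in increments))         # True: |D_k| <= c_k
print(abs(Z(n) - sum(xs) / n) < 1e-12)              # True: Z_n = f(X_1, ..., X_n)
```

Here the increments are actually $(X_k - 1/2)/n$, so they are even smaller than $c_k$; for non-sum functions the increments can use the full budget $c_k$.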

Why It Matters

McDiarmid's inequality is the workhorse behind most high-probability generalization bounds in learning theory. The proof template (Doob martingale + Azuma-Hoeffding) is reusable: any time you can bound how sensitive a quantity is to individual data points, you get concentration for free.

Failure Mode

The bound requires the bounded-differences constants $c_i$ to be worst-case over all inputs. If the function is usually insensitive but occasionally very sensitive (e.g., outliers cause large changes), the worst-case $c_i$ can be much larger than the typical sensitivity, making the bound loose. Variance-sensitive versions (Bernstein-type) can help.

Proof Ideas and Templates Used

The McDiarmid proof uses two standard tools:

  1. Doob martingale construction: build a martingale by sequentially conditioning on each variable. This is the standard way to reduce a function of independent variables to a sum of bounded increments.

  2. Azuma-Hoeffding inequality: concentration for bounded-increment martingales. This is the martingale version of Hoeffding's inequality.

This "Doob + Azuma" pattern is used throughout learning theory and high-dimensional probability.

Canonical Examples

Example

Empirical risk concentrates

Let $f(z_1, \ldots, z_n) = \hat{R}_n(h) = \frac{1}{n}\sum_{i=1}^n \ell(h(x_i), y_i)$ for a fixed hypothesis $h$ with loss $\ell \in [0, 1]$. Changing one sample $z_i$ changes $f$ by at most $c_i = 1/n$. So:

$$\Pr[|\hat{R}_n(h) - R(h)| \geq t] \leq 2\exp(-2nt^2)$$

This is just Hoeffding's inequality. McDiarmid generalizes it to non-sum functions.
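A Monte Carlo sanity check makes this concrete. The sketch below (my own setup, not from the text) uses a 0-1 loss that is Bernoulli$(1/2)$ per sample and verifies that the observed frequency of large deviations stays below $2\exp(-2nt^2)$:

```python
import math
import random

# Empirical frequency of |R_hat - R| >= t for n i.i.d. Bernoulli(1/2) losses,
# compared against the McDiarmid/Hoeffding bound 2 exp(-2 n t^2).
random.seed(2)
n, t, trials, p = 50, 0.15, 20000, 0.5

exceed = 0
for _ in range(trials):
    avg = sum(random.random() < p for _ in range(n)) / n  # empirical risk R_hat
    exceed += abs(avg - p) >= t                           # deviation from R = p
freq = exceed / trials
bound = 2 * math.exp(-2 * n * t * t)

print(freq <= bound)  # True: the bound holds (with room to spare)
```

The bound is typically conservative: for this Bernoulli example the true deviation probability is several times smaller, which is the usual price of a distribution-free inequality.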

Example

Rademacher complexity concentrates

The empirical Rademacher complexity $\hat{\mathfrak{R}}_S(\mathcal{F})$ is a function of $S = (z_1, \ldots, z_n)$. Changing one $z_i$ changes $\hat{\mathfrak{R}}_S$ by at most $c_i = 2B/n$ (where $B$ bounds the function values). McDiarmid gives:

$$\Pr[|\hat{\mathfrak{R}}_S - \mathfrak{R}_n| \geq t] \leq 2\exp\!\left(-\frac{nt^2}{2B^2}\right)$$

This is why we can use the empirical Rademacher complexity (computed from data) in generalization bounds instead of the population version.
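For a small sample and a simple class, the empirical Rademacher complexity can be computed exactly and the sensitivity claim checked directly. A sketch under my own assumptions: the class is one-dimensional threshold classifiers $f_t(z) = \mathbf{1}\{z \le t\}$ (so $B = 1$), and the expectation over $\sigma \in \{\pm 1\}^n$ is done by brute-force enumeration.

```python
import itertools
import random

def empirical_rademacher(zs):
    """Exact empirical Rademacher complexity of threshold classifiers
    f_t(z) = 1{z <= t} on the sample zs, enumerating all sign vectors."""
    n = len(zs)
    # Distinct behaviors of the class on the sample: all-zeros, plus one
    # pattern per threshold placed at a sample point.
    thresholds = sorted(set(zs))
    patterns = [[1 if z <= t else 0 for z in zs] for t in [-1.0] + thresholds]
    total = 0.0
    for sigma in itertools.product([-1, 1], repeat=n):
        total += max(sum(s * p for s, p in zip(sigma, pat)) for pat in patterns) / n
    return total / 2 ** n

random.seed(4)
n = 8
zs = [random.random() for _ in range(n)]
base = empirical_rademacher(zs)

# Replace one sample point and check the bounded-differences constant 2B/n.
worst = 0.0
for v in (0.0, 0.5, 1.0):
    ys = list(zs)
    ys[0] = v
    worst = max(worst, abs(base - empirical_rademacher(ys)))

print(worst <= 2.0 / n + 1e-12)  # True: changing one z_i moves R_hat by at most 2B/n
```

Since these $f$ take values in $[0, 1]$, the actual change is at most $1/n$, comfortably within the stated $2B/n$ budget.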

Example

Longest increasing subsequence

Let $f(X_1, \ldots, X_n)$ be the length of the longest increasing subsequence of a random permutation. Changing one element changes $f$ by at most 1 (you can only extend or shorten the LIS by 1). So $c_i = 1$ for all $i$ and:

$$\Pr[|f - \mathbb{E}[f]| \geq t] \leq 2\exp(-2t^2/n)$$

This is a non-trivial concentration result for a combinatorial quantity.

Common Confusions

Watch Out

McDiarmid is NOT just Hoeffding for sums

Hoeffding's inequality applies to sums of independent bounded random variables. McDiarmid's inequality applies to any function of independent variables that satisfies bounded differences. Sums are a special case ($f = \sum_i g(X_i)$, where $c_i$ is the range of $g$), but McDiarmid also handles non-linear, non-additive functions like suprema, medians, and combinatorial quantities.

Watch Out

The bounded-differences condition is worst-case

$c_i$ must be a uniform bound over all possible inputs. If the function is usually stable but occasionally jumps, the worst-case $c_i$ may be much larger than the typical change. In such cases, variance-sensitive inequalities (like the self-bounding property or entropy methods) give tighter results.
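The gap between typical and worst-case sensitivity is easy to see numerically. A sketch under my own assumptions, using $f(x) = \max_i x_i$ on $[0,1]^n$: resampling a random coordinate of random data barely moves the max, yet an adversarial configuration forces the worst-case constant all the way to 1.

```python
import random

# Typical vs. worst-case sensitivity of f(xs) = max(xs) for inputs in [0, 1].
random.seed(5)
n = 100
xs = [random.random() for _ in range(n)]

def change_after_replacing(base_xs, i, v):
    ys = list(base_xs)
    ys[i] = v
    return abs(max(ys) - max(base_xs))

# Typical sensitivity: resampling one uniformly chosen coordinate rarely moves the max.
trials = 2000
typical = sum(
    change_after_replacing(xs, random.randrange(n), random.random())
    for _ in range(trials)
) / trials

# Worst case over ALL inputs: one coordinate at 1, the rest at 0 -- replacing
# that coordinate moves the max by a full 1, so McDiarmid must take c_i = 1.
adversarial = change_after_replacing([1.0] + [0.0] * (n - 1), 0, 0.0)

print(typical < 0.05)      # True: typical sensitivity is tiny
print(adversarial == 1.0)  # True: the worst-case constant is 1
```

With $c_i = 1$ the McDiarmid bound is $\exp(-2t^2/n)$, which is vacuous for the max of bounded variables; this is exactly the looseness the warning above describes.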

Summary

  • McDiarmid: if $|f(x) - f(x')| \leq c_i$ when the $i$-th input changes, then $f$ concentrates with rate $\exp(-2t^2/\sum_i c_i^2)$
  • The proof goes: Doob martingale → Azuma-Hoeffding
  • Special case of sums: recovers Hoeffding's inequality
  • Used everywhere in learning theory: generalization bounds, Rademacher concentration, stability arguments
  • The bounded-differences constants must be worst-case

Exercises

ExerciseCore

Problem

Let $X_1, \ldots, X_n$ be i.i.d. with $X_i \in [0, 1]$, and let $f(X_1, \ldots, X_n) = \max_i X_i$. Find the bounded-differences constants and apply McDiarmid to bound $\Pr[\max_i X_i - \mathbb{E}[\max_i X_i] \geq t]$.

ExerciseAdvanced

Problem

Prove that the sample median of i.i.d. observations in $[0, 1]$ satisfies the bounded-differences condition with $c_i = 1/\lceil n/2 \rceil$ (approximately $2/n$), and use McDiarmid to show the median concentrates.


References

Canonical:

  • McDiarmid, "On the Method of Bounded Differences" (1989), Surveys in Combinatorics
  • Shalev-Shwartz & Ben-David, Understanding Machine Learning, Chapter 26
  • Boucheron, Lugosi, Massart, Concentration Inequalities (2013), Chapter 6

Current:

  • Vershynin, High-Dimensional Probability (2018), Chapter 2
  • Wainwright, High-Dimensional Statistics (2019), Chapter 2
  • van Handel, Probability in High Dimension (2016), Chapters 1-3

Next Topics

Natural next steps from McDiarmid:

  • Algorithmic stability: uses McDiarmid to convert stability bounds into generalization bounds
  • Rademacher complexity: uses McDiarmid to show empirical Rademacher concentrates around population Rademacher

Last reviewed: April 2026
