

Slud's Inequality

A lower bound on binomial tail probabilities. For X ~ Bin(n, p) with p = (1 - epsilon)/2, P(X >= n/2) is at least a normal-tail-style quantity. Used in VC lower bounds to show learning is genuinely hard.

Advanced · Tier 2 · Stable · Supporting · ~12 min

Why This Matters

Most concentration inequalities give upper bounds on tail probabilities. Slud's inequality goes the other way: it produces a non-trivial lower bound on the chance that a binomial random variable lands above its median when the success probability is just below $1/2$. That kind of two-sided control is exactly what's needed to prove learning is genuinely hard. In the PAC and VC lower-bound arguments of Shalev-Shwartz and Ben-David (Theorem 5.1 and the no-free-lunch construction), Slud is the inequality that converts an information-theoretic obstruction into a quantitative tail probability that a learner cannot beat.

The intuition is simple: a $\mathrm{Bin}(n, p)$ variable with $p = (1 - \epsilon)/2$ has mean slightly below $n/2$, so the median lies just below $n/2$ as well. For small $\epsilon$ and moderate $n$, the chance of crossing the threshold $n/2$ is bounded below by the corresponding standard normal tail, and Slud quantifies that comparison non-asymptotically.

Mental Model

Think of Slud's inequality as a non-asymptotic Berry-Esseen-style lower bound for the specific event $\{X \geq n/2\}$ when $p$ is close to $1/2$ from below. Two anchoring facts:

  1. Normal anti-concentration. A standard normal $\mathcal{N}(0, 1)$ has $\Pr[\mathcal{N} \geq z] = \tfrac{1}{2}\bigl(1 - \mathrm{erf}(z/\sqrt{2})\bigr)$, a strictly positive quantity decaying as $\exp(-z^2/2)$. Lower bounds on this tail are quantitative.
  2. Binomial-to-normal comparison. The CLT says $(X - np)/\sqrt{np(1-p)} \to \mathcal{N}(0, 1)$ in distribution, and Slud's inequality gives a one-sided non-asymptotic version of that comparison for the upper tail; see the numerical check after this list.
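
As a quick sanity check of fact 2, here is a minimal sketch (assuming NumPy and SciPy are available) comparing the exact binomial tail $\Pr[X \geq n/2]$ with the standard normal tail at the standardized threshold:

```python
# Compare the exact binomial tail P(X >= n/2) for X ~ Bin(n, (1 - eps)/2)
# with the standard normal tail at the standardized threshold z_n.
import numpy as np
from scipy.stats import binom, norm

eps = 0.1
p = (1 - eps) / 2

for n in [10, 50, 100, 500]:
    k = int(np.ceil(n / 2))                  # smallest integer >= n/2
    binom_tail = binom.sf(k - 1, n, p)       # sf(k-1) = P(X >= k) = P(X >= n/2)
    z_n = (n / 2 - n * p) / np.sqrt(n * p * (1 - p))
    normal_tail = norm.sf(z_n)               # P(N(0,1) >= z_n)
    print(f"n={n:4d}  P(X >= n/2) = {binom_tail:.4f}  P(N >= z_n) = {normal_tail:.4f}")
```

At every $n$ in this range the binomial tail sits above the normal tail, which is exactly the direction of Slud's comparison.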

Formal Statement

Theorem

Slud's Inequality

Statement

Let $X \sim \mathrm{Bin}(n, p)$ with $p = (1 - \epsilon)/2$ for some $\epsilon \in (0, 1)$, and assume $n\epsilon^2 \leq 1 - \epsilon^2$. Then

$$\Pr[X \geq n/2] \geq \frac{1}{2}\left(1 - \sqrt{1 - \exp\left(-\frac{n \epsilon^2}{1 - \epsilon^2}\right)}\right).$$


Intuition

The right-hand side is a non-asymptotic surrogate for $\Pr[\mathcal{N}(0, 1) \geq z]$ at $z = \epsilon\sqrt{n/(1 - \epsilon^2)}$. For $z$ near zero, $\Pr[\mathcal{N} \geq z] \approx 1/2 - z/\sqrt{2\pi}$, which matches the leading behavior of the right-hand side once the inner square root is expanded: $\tfrac{1}{2}(1 - \sqrt{1 - e^{-z^2}}) \approx 1/2 - z/2$. The inequality is sharpest when $n\epsilon^2$ is of moderate order; as $n\epsilon^2$ grows, the right-hand side decays like $\tfrac{1}{4}\exp(-n\epsilon^2/(1 - \epsilon^2))$, the square of the normal-tail order $\exp(-z^2/2)$, so the bound loosens in the large-deviation regime.
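
To see the bound in action, a small sketch (again assuming SciPy) comparing the exact tail with Slud's lower bound across the valid regime:

```python
# Compare the exact binomial tail with Slud's displayed lower bound
# inside the valid regime n * eps^2 <= 1 - eps^2.
import numpy as np
from scipy.stats import binom

def slud_bound(n, eps):
    return 0.5 * (1 - np.sqrt(1 - np.exp(-n * eps**2 / (1 - eps**2))))

eps = 0.05
n_max = int((1 - eps**2) / eps**2)           # largest n satisfying the condition
p = (1 - eps) / 2
for n in [50, 100, 200, n_max]:
    k = int(np.ceil(n / 2))
    exact = binom.sf(k - 1, n, p)            # P(X >= n/2)
    bound = slud_bound(n, eps)
    print(f"n={n:4d}  exact = {exact:.4f}  bound = {bound:.4f}  holds: {exact >= bound}")
```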

Proof Sketch

Slud's original argument compares the binomial cumulative distribution function to that of a standard normal via Stirling's approximation and careful analysis of the binomial median. The cleanest route uses the fact that the bulk of binomial mass concentrates around $np$ with variance $np(1-p)$, so

$$\Pr[X \geq n/2] = \Pr\left[\frac{X - np}{\sqrt{np(1-p)}} \geq \frac{n/2 - np}{\sqrt{np(1-p)}}\right] = \Pr[Z_n \geq z_n]$$

with $z_n = n\epsilon/(2\sqrt{np(1-p)}) = \epsilon\sqrt{n/(1-\epsilon^2)}$ after substituting $p = (1 - \epsilon)/2$. Slud (1977) shows that this standardized binomial probability is at least the standard normal tail at $z_n$, and bounding the normal tail from below by $\tfrac{1}{2}(1 - \sqrt{1 - e^{-z^2}})$ produces the displayed form.
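
The last step relies on an elementary normal-tail lower bound; here is a quick numerical confirmation (a sketch, assuming SciPy) that it holds on a grid of $z$ values:

```python
# Check the elementary normal-tail lower bound used in the final step:
# P(N(0,1) >= z) >= (1 - sqrt(1 - exp(-z^2))) / 2 for z >= 0.
import numpy as np
from scipy.stats import norm

z = np.linspace(0.0, 5.0, 501)
tail = norm.sf(z)                                 # exact P(N >= z)
lower = 0.5 * (1 - np.sqrt(1 - np.exp(-z**2)))    # claimed lower bound
print("holds everywhere on the grid:", bool(np.all(tail >= lower)))
```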

Why It Matters

Slud is the standard tool for proving that a learner cannot achieve sample complexity better than $\Omega(d/\epsilon^2)$ on a class of VC dimension $d$. The argument constructs a family of distributions on which any learner reduces to a binomial guessing game; Slud lower-bounds the probability of a wrong guess, and that probability is what becomes the PAC failure probability $\delta$.

Failure Mode

The condition $n\epsilon^2 \leq 1 - \epsilon^2$ is required for the displayed form, not merely a regime where the bound is useful. The right-hand side $\tfrac{1}{2}(1 - \sqrt{1 - e^{-z^2}})$ is positive for every $z \geq 0$, so positivity of the lower bound is not what the condition controls; rather, the condition is the regime in which the underlying binomial-to-normal comparison used in Slud's proof (and in Shalev-Shwartz and Ben-David, Lemma B.11) is established. When $n\epsilon^2 > 1 - \epsilon^2$, the displayed form is no longer guaranteed, and the standard practice is to fall back on Slud's sharper binomial-tail-vs-normal-tail comparison $\Pr[X \geq n/2] \geq \Pr[\mathcal{N}(0, 1) \geq z_n]$ at $z_n = \epsilon\sqrt{n/(1-\epsilon^2)}$ together with a direct normal-tail lower bound, or to compute the binomial tail directly. The useful regime for VC lower bounds, $n\epsilon^2 \asymp 1$, sits inside the displayed form's range.
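
A sketch of that fallback route (assuming SciPy; the parameters below are illustrative), in a regime where the displayed form does not apply:

```python
# Outside the displayed form's range (n * eps^2 > 1 - eps^2), fall back on the
# sharper comparison P(X >= n/2) >= P(N(0,1) >= z_n) directly.
import numpy as np
from scipy.stats import binom, norm

eps, n = 0.1, 500                         # n * eps^2 = 5 > 1 - eps^2 = 0.99
p = (1 - eps) / 2
z_n = eps * np.sqrt(n / (1 - eps**2))
k = int(np.ceil(n / 2))
exact = binom.sf(k - 1, n, p)             # exact P(X >= n/2)
fallback = norm.sf(z_n)                   # normal-tail lower bound at z_n
print(f"exact = {exact:.5f}  fallback bound = {fallback:.5f}")
```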

Exercises

Exercise (Advanced)

Problem

Let $X \sim \mathrm{Bin}(n, p)$ with $p = (1 - \epsilon)/2$, and assume $n\epsilon^2 \leq 1 - \epsilon^2$. Use the inequality $1 - \sqrt{1 - x} \geq x/2$ for $x \in [0, 1]$ to show that Slud's bound implies the simpler estimate

$$\Pr[X \geq n/2] \geq \frac{1}{4}\exp\left(-\frac{n\epsilon^2}{1 - \epsilon^2}\right).$$

Then use this simpler form to argue that any learner that wishes to drive the failure probability below $\delta$ must use $n \geq C\log(1/\delta)/\epsilon^2$ samples for some absolute constant $C > 0$.
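
Before attempting the derivation, it can help to confirm numerically that the target estimate is plausible (a sketch assuming SciPy; it does not substitute for the proof):

```python
# Sanity-check the simpler estimate P(X >= n/2) >= exp(-n eps^2/(1 - eps^2)) / 4
# at the boundary of the valid regime, where the relaxation costs the most.
import numpy as np
from scipy.stats import binom

for eps in [0.02, 0.05, 0.1]:
    n = int((1 - eps**2) / eps**2)        # largest n with n * eps^2 <= 1 - eps^2
    p = (1 - eps) / 2
    k = int(np.ceil(n / 2))
    exact = binom.sf(k - 1, n, p)
    simple = 0.25 * np.exp(-n * eps**2 / (1 - eps**2))
    print(f"eps={eps:.2f}  n={n:4d}  exact = {exact:.4f}  simple bound = {simple:.4f}")
```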

Where This Shows Up

The two canonical uses on this site:

  • VC sample-complexity lower bounds. When proving $\Omega(d/\epsilon^2)$ samples are necessary to learn a class of VC dimension $d$, the construction reduces a learner's failure probability to a binomial tail $\Pr[X \geq n/2]$ for a $\mathrm{Bin}(n, (1-\epsilon)/2)$ variable. Slud lower-bounds that tail, which lower-bounds the learner's failure probability.
  • No-free-lunch theorems. The same binomial reduction appears in no-free-lunch constructions where the adversary chooses the labeling distribution; Slud's lower bound is what makes the failure probability $\delta$ explicit.

References

Canonical:

  • Slud, E. V. (1977). "Distribution Inequalities for the Binomial Law." Annals of Probability, 5(3), 404-412. The original derivation.
  • Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding Machine Learning. Cambridge University Press. Lemma B.11 (Slud) in Appendix B; used in Theorem 5.1's no-free-lunch lower bound.

Current:

  • Anthony, M., & Bartlett, P. L. (1999). Neural Network Learning: Theoretical Foundations. Cambridge University Press. Chapter 5 develops VC sample-complexity lower bounds via Slud-style binomial comparisons.


Last reviewed: May 8, 2026
