Slud's Inequality
A lower bound on binomial tail probabilities. For X ~ Bin(n, p) with p = (1 - epsilon)/2, P(X >= n/2) is at least a normal-tail-style quantity. Used in VC lower bounds to show learning is genuinely hard.
Why This Matters
Most concentration inequalities give upper bounds on tail probabilities. Slud's inequality goes the other way: it produces a non-trivial lower bound on the chance that a binomial random variable lands above its median when the success probability is just below 1/2. That kind of two-sided control is exactly what's needed to prove learning is genuinely hard. In the PAC and VC lower-bound arguments of Shalev-Shwartz and Ben-David (Theorem 5.1 and the no-free-lunch construction), Slud is the inequality that converts an information-theoretic obstruction into a quantitative tail probability that a learner cannot beat.
The intuition is simple: a Bin(n, p) variable with p = (1 - ε)/2 has mean n(1 - ε)/2, slightly below n/2, so the median lies just below n/2 as well. For small ε and moderate n, the chance of crossing the n/2 threshold is bounded below by the corresponding standard normal tail, and Slud quantifies that comparison non-asymptotically.
Mental Model
Think of Slud's inequality as a non-asymptotic Berry-Esseen-style lower bound for the specific event {X ≥ n/2} when the success probability p is close to 1/2 from below. Two anchoring facts:
- Normal anti-concentration. A standard normal Z has P[Z ≥ x] > 0 for every x, a tail decaying roughly as e^(-x²/2). Lower bounds on this tail, such as P[Z ≥ x] ≥ (1/2)(1 - √(1 - e^(-x²))) for x ≥ 0, make that positivity quantitative.
- Binomial-to-normal comparison. The CLT says (X - np)/√(np(1 - p)) converges in distribution to N(0, 1), and Slud's inequality gives a one-sided, non-asymptotic version of that comparison for the upper tail; a numerical illustration follows this list.
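A quick way to see the comparison concretely is to put the exact binomial upper tail next to the normal tail at the standardized threshold. The snippet below is a minimal check using only the Python standard library; the helper names are ours, not a library API.

```python
# Compare the exact binomial upper tail P[X >= n/2] for p = (1 - eps)/2
# with the standard normal tail at the standardized threshold
# x = sqrt(n * eps^2 / (1 - eps^2)). Standard library only.
import math

def binom_tail_at_half(n: int, p: float) -> float:
    """Exact P[X >= n/2] for X ~ Bin(n, p), summed term by term."""
    k0 = math.ceil(n / 2)
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))

def normal_tail(x: float) -> float:
    """P[Z >= x] for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

for n in (20, 100, 500):
    for eps in (0.05, 0.1, 0.2):
        p = (1 - eps) / 2
        x = math.sqrt(n * eps**2 / (1 - eps**2))
        b, z = binom_tail_at_half(n, p), normal_tail(x)
        print(f"n={n:4d} eps={eps:.2f}  P[X >= n/2]={b:.4f}  P[Z >= x]={z:.4f}  "
              f"binomial >= normal: {b >= z}")
```

On this grid (even n, p below 1/2, threshold n/2 between np and n(1 - p)) the binomial tail sits above the normal tail, which is exactly the comparison Slud makes rigorous.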
Formal Statement
Slud's Inequality
Statement
Let X ~ Bin(n, p) with p = (1 - ε)/2 for some ε > 0, and assume ε < 1. Then

\Pr[X \geq n/2] \geq \frac{1}{2}\!\left(1 - \sqrt{1 - \exp\!\left(-\frac{n \epsilon^2}{1 - \epsilon^2}\right)}\right).

Intuition
The quantity inside the outer parentheses is a non-asymptotic stand-in for twice the standard normal tail P[Z ≥ x] at x = √(nε²/(1 - ε²)), the n/2 threshold measured in standard deviations. For x near zero, √(1 - e^(-x²)) ≈ x, so the right-hand side behaves like (1/2)(1 - x), matching the 1/2 - Θ(x) leading behavior of the normal tail. The inequality is sharpest when x is of moderate order; as x grows, the right-hand side decays like (1/4)e^(-x²) = (1/4)exp(-nε²/(1 - ε²)), an exponential-in-nε² rate of the same character as the normal tail at that threshold.
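The closed form is easy to sanity-check against the exact binomial tail. The sketch below (standard library only, illustrative function names) evaluates both sides over a small grid of n and ε; the assertion holds at every grid point.

```python
# Sanity check of the displayed bound: the exact binomial tail P[X >= n/2]
# for p = (1 - eps)/2 should never fall below Slud's closed-form lower bound
# (1/2) * (1 - sqrt(1 - exp(-n * eps^2 / (1 - eps^2)))).
import math

def exact_tail(n: int, eps: float) -> float:
    """Exact P[X >= n/2] for X ~ Bin(n, (1 - eps)/2)."""
    p = (1 - eps) / 2
    k0 = math.ceil(n / 2)
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))

def slud_lower_bound(n: int, eps: float) -> float:
    """Right-hand side of Slud's inequality at threshold n/2."""
    a = math.exp(-n * eps**2 / (1 - eps**2))
    return 0.5 * (1 - math.sqrt(1 - a))

for n in (10, 50, 200, 500):
    for eps in (0.02, 0.1, 0.2):
        lhs, rhs = exact_tail(n, eps), slud_lower_bound(n, eps)
        assert lhs >= rhs, (n, eps)
        print(f"n={n:4d} eps={eps:.2f}  exact tail={lhs:.3e}  Slud bound={rhs:.3e}")
```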
Proof Sketch
Slud's original argument compares the binomial cumulative distribution function to that of a standard normal via Stirling's approximation and careful analysis of the binomial median. The cleanest route uses the fact that the bulk of binomial mass concentrates around the mean np = n(1 - ε)/2 with variance np(1 - p) = n(1 - ε²)/4, so

P[X ≥ n/2] = P[(X - np)/√(np(1 - p)) ≥ x],

with x = (n/2 - np)/√(np(1 - p)) = √(nε²/(1 - ε²)) after substituting p = (1 - ε)/2. Slud (1977) shows that this standardized binomial probability is at least the standard normal tail P[Z ≥ x], and bounding the normal tail from below by (1/2)(1 - √(1 - e^(-x²))) produces the displayed form.
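For completeness, one standard way to obtain the normal-tail lower bound invoked in that last step (not necessarily the route Slud's paper takes) is the two-dimensional Gaussian comparison sketched below.

```latex
% The square [-x, x]^2 sits inside the disk of radius x*sqrt(2), and
% Z_1^2 + Z_2^2 is exponential with mean 2, so for x >= 0:
\[
\bigl(1 - 2\Pr[Z \ge x]\bigr)^{2}
  = \Pr[|Z_1| \le x]\,\Pr[|Z_2| \le x]
  \le \Pr\!\bigl[Z_1^{2} + Z_2^{2} \le 2x^{2}\bigr]
  = 1 - e^{-x^{2}}.
\]
```

Since 1 - 2·P[Z ≥ x] = P[|Z| ≤ x] is nonnegative for x ≥ 0, taking square roots preserves the inequality and rearranging gives P[Z ≥ x] ≥ (1/2)(1 - √(1 - e^(-x²))), which is then applied at x = √(nε²/(1 - ε²)).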
Why It Matters
Slud is the standard tool for proving that a learner cannot achieve sample complexity better than Ω(d/ε²) on a class of VC dimension d. The argument constructs a family of distributions on which any learner reduces to a binomial guessing game; Slud lower-bounds the probability of a wrong guess, and that probability is what becomes the PAC failure probability δ.
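As a rough illustration of that conversion (a sketch under our own naming, not the bound from Theorem 5.1 itself), one can invert the displayed form numerically: fix a target failure probability δ and find the smallest n at which Slud's lower bound finally drops to δ. The product nε² stays roughly constant across the rows, which is the 1/ε² scaling in the sample-complexity lower bound.

```python
# Illustrative sketch: numerically invert Slud's bound. For a given eps and
# target failure probability delta, find the smallest n at which the lower
# bound (1/2)(1 - sqrt(1 - exp(-n*eps^2/(1 - eps^2)))) on the learner's
# failure probability drops to delta; for smaller n the failure probability
# provably exceeds delta in the reduction.
import math

def slud_lower_bound(n: int, eps: float) -> float:
    """Right-hand side of Slud's inequality at threshold n/2."""
    a = math.exp(-n * eps**2 / (1 - eps**2))
    return 0.5 * (1 - math.sqrt(1 - a))

def min_samples_to_beat(eps: float, delta: float) -> int:
    """Smallest n with Slud's lower bound <= delta (a necessary condition only)."""
    n = 1
    while slud_lower_bound(n, eps) > delta:
        n += 1
    return n

delta = 0.01
for eps in (0.2, 0.1, 0.05, 0.025):
    n = min_samples_to_beat(eps, delta)
    print(f"eps={eps:5.3f}  delta={delta}  need n >= {n:6d}   n*eps^2 = {n * eps**2:.2f}")
```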
Failure Mode
The condition 0 < ε < 1 (equivalently 0 < p < 1/2) is required for the displayed form, not merely a regime where the bound is useful. The right-hand side is positive for every ε in that range, so positivity of the lower bound is not what the condition controls; rather, it marks the regime in which the underlying binomial-to-normal comparison used in Slud's proof (and in Shalev-Shwartz and Ben-David, Lemma B.11) is established: the comparison needs p ≤ 1/2 with the threshold n/2 lying between the mean np and n(1 - p). Outside that range, the displayed form is no longer guaranteed, and the standard practice is to fall back on Slud's sharper binomial-tail-vs-normal-tail comparison at the relevant threshold together with a direct normal-tail lower bound, or to compute the binomial tail directly. The useful regime for VC lower bounds, where ε is a small constant, sits well inside the displayed form's range.
Exercises
Problem
Let X ~ Bin(n, p) with p = (1 - ε)/2, and assume 0 < ε ≤ 1/√2. Use the inequality 1 - √(1 - a) ≥ a/2 for a ∈ [0, 1] to show that Slud's bound implies the simpler estimate

P[X ≥ n/2] ≥ (1/4) exp(-2nε²).

Then use this simpler form to argue that any learner that wishes to drive the failure probability below δ (for, say, δ ≤ 1/8) must use n ≥ c · log(1/δ)/ε² samples for some absolute constant c.
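The elementary inequality the problem relies on can be sanity-checked numerically before using it; the snippet below is a small grid check, not a proof.

```python
# Quick numerical sanity check (not a proof) of 1 - sqrt(1 - a) >= a/2 for
# a in [0, 1], the elementary inequality used to simplify Slud's bound.
import math

worst_gap = min(
    (1 - math.sqrt(1 - a)) - a / 2
    for a in (i / 1000 for i in range(1001))
)
print(f"smallest gap over the grid: {worst_gap:.6f}")  # should be >= 0
```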
Where This Shows Up
The two canonical uses on this site:
- VC sample-complexity lower bounds. When proving that Ω(d/ε²) samples are necessary to learn a class of VC dimension d, the construction reduces a learner's failure probability to a binomial tail for a Bin(n, (1 - ε)/2) variable. Slud lower-bounds that tail, which lower-bounds the learner's failure probability.
- No-free-lunch theorems. The same binomial reduction appears in no-free-lunch constructions where the adversary chooses the labeling distribution; Slud's lower bound is what makes the failure probability explicit.
References
Canonical:
- Slud, E. V. (1977). "Distribution Inequalities for the Binomial Law." Annals of Probability, 5(3), 404-412. The original derivation.
- Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding Machine Learning. Cambridge University Press. Lemma B.11 (Slud) in Appendix B; used in Theorem 5.1's no-free-lunch lower bound.
Current:
- Anthony, M., & Bartlett, P. L. (1999). Neural Network Learning: Theoretical Foundations. Cambridge University Press. Chapter 5 develops VC sample-complexity lower bounds via Slud-style binomial comparisons.
Next Topics
- VC dimension: the combinatorial parameter that appears in the sample-complexity lower bounds Slud helps prove
- PAC learning framework: where the failure probability that Slud lower-bounds enters formally
- Common probability distributions: the binomial distribution that Slud's inequality acts on
Last reviewed: May 8, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
- Common Probability Distributions · layer 0A · tier 1
- Concentration Inequalities · layer 1 · tier 1
Derived topics
No published topic currently declares this as a prerequisite.