

Slud's Inequality

A lower bound on binomial tail probabilities. For X ~ Bin(n, p) with p = (1 - epsilon)/2, P(X >= n/2) is at least a normal-tail-style quantity. Used in VC lower bounds to show learning is genuinely hard.

Advanced · Tier 2 · Stable · Supporting · ~12 min

Why This Matters

Most concentration inequalities give upper bounds on tail probabilities. Slud's inequality goes the other way: it produces a non-trivial lower bound on the chance that a binomial random variable lands above its median when the success probability is just below $1/2$. That kind of two-sided control is exactly what's needed to prove learning is genuinely hard. In the PAC and VC lower-bound arguments of Shalev-Shwartz and Ben-David (Theorem 5.1 and the no-free-lunch construction), Slud is the inequality that converts an information-theoretic obstruction into a quantitative tail probability that a learner cannot beat.

The intuition is simple: a $\mathrm{Bin}(n, p)$ variable with $p = (1 - \epsilon)/2$ has mean slightly below $n/2$, so the median lies just below $n/2$ as well. For small $\epsilon$ and moderate $n$, the chance of crossing the threshold $n/2$ is bounded below by the corresponding standard normal tail, and Slud quantifies that comparison non-asymptotically.

Mental Model

Think of Slud's inequality as a non-asymptotic Berry-Esseen-style lower bound for the specific event $\{X \geq n/2\}$ when $p$ is close to $1/2$ from below. Two anchoring facts:

  1. Normal anti-concentration. A standard normal $\mathcal{N}(0, 1)$ has $\Pr[\mathcal{N} \geq z] = \tfrac{1}{2}\bigl(1 - \mathrm{erf}(z/\sqrt{2})\bigr)$, a strictly positive quantity decaying as $\exp(-z^2/2)$. Lower bounds on this tail are quantitative.
  2. Binomial-to-normal comparison. The CLT says $(X - np)/\sqrt{np(1-p)} \to \mathcal{N}(0, 1)$ in distribution, and Slud's inequality gives a one-sided non-asymptotic version of that comparison for the upper tail; see the numerical check after this list.
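
As a quick sanity check of fact 2, here is a minimal sketch (assuming NumPy and SciPy are available) comparing the exact binomial tail $\Pr[X \geq n/2]$ with the standard normal tail at the standardized threshold:

```python
# Compare the exact binomial tail P(X >= n/2) for X ~ Bin(n, (1 - eps)/2)
# with the standard normal tail at the standardized threshold z_n.
import numpy as np
from scipy.stats import binom, norm

eps = 0.1
p = (1 - eps) / 2

for n in [10, 50, 100, 500]:
    k = int(np.ceil(n / 2))                  # smallest integer >= n/2
    binom_tail = binom.sf(k - 1, n, p)       # sf(k-1) = P(X >= k) = P(X >= n/2)
    z_n = (n / 2 - n * p) / np.sqrt(n * p * (1 - p))
    normal_tail = norm.sf(z_n)               # P(N(0,1) >= z_n)
    print(f"n={n:4d}  P(X >= n/2) = {binom_tail:.4f}  P(N >= z_n) = {normal_tail:.4f}")
```

At every $n$ in this range the binomial tail sits above the normal tail, which is exactly the direction of Slud's comparison.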

Formal Statement

Theorem

Slud's Inequality

Statement

Let $X \sim \mathrm{Bin}(n, p)$ with $p = (1 - \epsilon)/2$ for some $\epsilon \in (0, 1)$, and assume $n\epsilon^2 \leq 1 - \epsilon^2$. Then

$$\Pr[X \geq n/2] \geq \frac{1}{2}\left(1 - \sqrt{1 - \exp\left(-\frac{n \epsilon^2}{1 - \epsilon^2}\right)}\right).$$


Intuition

The right-hand side is a non-asymptotic surrogate for $\Pr[\mathcal{N}(0, 1) \geq z]$ at $z = \epsilon\sqrt{n/(1 - \epsilon^2)}$. For $z$ near zero, $\Pr[\mathcal{N} \geq z] \approx 1/2 - z/\sqrt{2\pi}$, which matches the leading behavior of the right-hand side once the inner square root is expanded: $\tfrac{1}{2}(1 - \sqrt{1 - e^{-z^2}}) \approx 1/2 - z/2$. The inequality is sharpest when $n\epsilon^2$ is of moderate order; as $n\epsilon^2$ grows, the right-hand side decays like $\tfrac{1}{4}\exp(-n\epsilon^2/(1 - \epsilon^2))$, the square of the normal-tail order $\exp(-z^2/2)$, so the bound loosens in the large-deviation regime.
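
To see the bound in action, a small sketch (again assuming SciPy) comparing the exact tail with Slud's lower bound across the valid regime:

```python
# Compare the exact binomial tail with Slud's displayed lower bound
# inside the valid regime n * eps^2 <= 1 - eps^2.
import numpy as np
from scipy.stats import binom

def slud_bound(n, eps):
    return 0.5 * (1 - np.sqrt(1 - np.exp(-n * eps**2 / (1 - eps**2))))

eps = 0.05
n_max = int((1 - eps**2) / eps**2)           # largest n satisfying the condition
p = (1 - eps) / 2
for n in [50, 100, 200, n_max]:
    k = int(np.ceil(n / 2))
    exact = binom.sf(k - 1, n, p)            # P(X >= n/2)
    bound = slud_bound(n, eps)
    print(f"n={n:4d}  exact = {exact:.4f}  bound = {bound:.4f}  holds: {exact >= bound}")
```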

Proof Sketch

Slud's original argument compares the binomial cumulative distribution function to that of a standard normal via Stirling's approximation and careful analysis of the binomial median. The cleanest route uses the fact that the bulk of binomial mass concentrates around $np$ with variance $np(1-p)$, so

$$\Pr[X \geq n/2] = \Pr\left[\frac{X - np}{\sqrt{np(1-p)}} \geq \frac{n/2 - np}{\sqrt{np(1-p)}}\right] = \Pr[Z_n \geq z_n]$$

with $z_n = n\epsilon/(2\sqrt{np(1-p)}) = \epsilon\sqrt{n/(1-\epsilon^2)}$ after substituting $p = (1 - \epsilon)/2$. Slud (1977) shows that this standardized binomial probability is at least the standard normal tail at $z_n$, and bounding the normal tail from below by $\tfrac{1}{2}(1 - \sqrt{1 - e^{-z^2}})$ produces the displayed form.
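
The last step relies on an elementary normal-tail lower bound; here is a quick numerical confirmation (a sketch, assuming SciPy) that it holds on a grid of $z$ values:

```python
# Check the elementary normal-tail lower bound used in the final step:
# P(N(0,1) >= z) >= (1 - sqrt(1 - exp(-z^2))) / 2 for z >= 0.
import numpy as np
from scipy.stats import norm

z = np.linspace(0.0, 5.0, 501)
tail = norm.sf(z)                                 # exact P(N >= z)
lower = 0.5 * (1 - np.sqrt(1 - np.exp(-z**2)))    # claimed lower bound
print("holds everywhere on the grid:", bool(np.all(tail >= lower)))
```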

Why It Matters

Slud is the standard tool for proving that a learner cannot achieve sample complexity better than $\Omega(d/\epsilon^2)$ on a class of VC dimension $d$. The argument constructs a family of distributions on which any learner reduces to a binomial guessing game; Slud lower-bounds the probability of a wrong guess, and that probability is what becomes the PAC failure probability $\delta$.

Failure Mode

The condition $n\epsilon^2 \leq 1 - \epsilon^2$ is required for the displayed form, not merely a regime where the bound is useful. The right-hand side $\tfrac{1}{2}(1 - \sqrt{1 - e^{-z^2}})$ is positive for every $z \geq 0$, so positivity of the lower bound is not what the condition controls; rather, the condition is the regime in which the underlying binomial-to-normal comparison used in Slud's proof (and in Shalev-Shwartz and Ben-David, Lemma B.11) is established. When $n\epsilon^2 > 1 - \epsilon^2$, the displayed form is no longer guaranteed, and the standard practice is to fall back on Slud's sharper binomial-tail-vs-normal-tail comparison $\Pr[X \geq n/2] \geq \Pr[\mathcal{N}(0, 1) \geq z_n]$ at $z_n = \epsilon\sqrt{n/(1-\epsilon^2)}$ together with a direct normal-tail lower bound, or to compute the binomial tail directly. The useful regime for VC lower bounds, $n\epsilon^2 \asymp 1$, sits inside the displayed form's range.
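
A sketch of that fallback route (assuming SciPy; the parameters below are illustrative), in a regime where the displayed form does not apply:

```python
# Outside the displayed form's range (n * eps^2 > 1 - eps^2), fall back on the
# sharper comparison P(X >= n/2) >= P(N(0,1) >= z_n) directly.
import numpy as np
from scipy.stats import binom, norm

eps, n = 0.1, 500                         # n * eps^2 = 5 > 1 - eps^2 = 0.99
p = (1 - eps) / 2
z_n = eps * np.sqrt(n / (1 - eps**2))
k = int(np.ceil(n / 2))
exact = binom.sf(k - 1, n, p)             # exact P(X >= n/2)
fallback = norm.sf(z_n)                   # normal-tail lower bound at z_n
print(f"exact = {exact:.5f}  fallback bound = {fallback:.5f}")
```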

Exercises

Exercise (Advanced)

Problem

Let $X \sim \mathrm{Bin}(n, p)$ with $p = (1 - \epsilon)/2$, and assume $n\epsilon^2 \leq 1 - \epsilon^2$. Use the inequality $1 - \sqrt{1 - x} \geq x/2$ for $x \in [0, 1]$ to show that Slud's bound implies the simpler estimate

$$\Pr[X \geq n/2] \geq \frac{1}{4}\exp\left(-\frac{n\epsilon^2}{1 - \epsilon^2}\right).$$

Then use this simpler form to argue that any learner that wishes to drive the failure probability below $\delta$ must use $n \geq C\log(1/\delta)/\epsilon^2$ samples for some absolute constant $C > 0$.
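
Before attempting the derivation, it can help to confirm numerically that the target estimate is plausible (a sketch assuming SciPy; it does not substitute for the proof):

```python
# Sanity-check the simpler estimate P(X >= n/2) >= exp(-n eps^2/(1 - eps^2)) / 4
# at the boundary of the valid regime, where the relaxation costs the most.
import numpy as np
from scipy.stats import binom

for eps in [0.02, 0.05, 0.1]:
    n = int((1 - eps**2) / eps**2)        # largest n with n * eps^2 <= 1 - eps^2
    p = (1 - eps) / 2
    k = int(np.ceil(n / 2))
    exact = binom.sf(k - 1, n, p)
    simple = 0.25 * np.exp(-n * eps**2 / (1 - eps**2))
    print(f"eps={eps:.2f}  n={n:4d}  exact = {exact:.4f}  simple bound = {simple:.4f}")
```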

Where This Shows Up

The two canonical uses on this site:

  • VC sample-complexity lower bounds. When proving $\Omega(d/\epsilon^2)$ samples are necessary to learn a class of VC dimension $d$, the construction reduces a learner's failure probability to a binomial tail $\Pr[X \geq n/2]$ for a $\mathrm{Bin}(n, (1-\epsilon)/2)$ variable. Slud lower-bounds that tail, which lower-bounds the learner's failure probability.
  • No-free-lunch theorems. The same binomial reduction appears in no-free-lunch constructions where the adversary chooses the labeling distribution; Slud's lower bound is what makes the failure probability $\delta$ explicit.

References

Canonical:

  • Slud, E. V. (1977). "Distribution Inequalities for the Binomial Law." Annals of Probability, 5(3), 404-412. The original derivation.
  • Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding Machine Learning. Cambridge University Press. Lemma B.11 (Slud) in Appendix B; used in Theorem 5.1's no-free-lunch lower bound.

Current:

  • Anthony, M., & Bartlett, P. L. (1999). Neural Network Learning: Theoretical Foundations. Cambridge University Press. Chapter 5 develops VC sample-complexity lower bounds via Slud-style binomial comparisons.


Last reviewed: May 8, 2026
