Skip to main content

Mathematical Infrastructure

Borel-Cantelli Lemmas

The two lemmas that bridge in-probability convergence to almost-sure convergence. First lemma: summable probabilities imply the events occur only finitely often; Second lemma: independence plus divergence implies infinitely often.

CoreTier 1Stable~30 min
0

Why This Matters

Most learning theorems give you in-probability statements: "with probability at least 1δ1 - \delta, the empirical risk is within ε\varepsilon of the true risk." To upgrade these to "almost-sure" guarantees (which the strong law of large numbers and many martingale arguments require), you need a tool that converts a sequence of high-probability statements into a single probability-1 statement about the entire trajectory. That tool is the first Borel-Cantelli lemma.

The second lemma points the other way: it tells you when events with non-summable probabilities are guaranteed to recur infinitely often. This is the source of every "the algorithm visits every state infinitely often" guarantee in stochastic approximation and reinforcement learning, and it underlies Borel's theorem that Lebesgue-almost every real number is normal.

Both lemmas are two-line statements; between them they supply the strong law of large numbers and every "visit every state infinitely often" guarantee in stochastic approximation.

The Setup: Limit Superior of Events

For a sequence of events A1,A2,FA_1, A_2, \ldots \in \mathcal{F}, the limit superior is

{An i.o.}:=lim supnAn=N=1n=NAn.\{A_n \text{ i.o.}\} := \limsup_{n \to \infty} A_n = \bigcap_{N=1}^\infty \bigcup_{n=N}^\infty A_n.

In words: ω{An i.o.}\omega \in \{A_n \text{ i.o.}\} iff ω\omega belongs to infinitely many of the AnA_n. The phrase "AnA_n infinitely often" abbreviates this and is standard notation. The complement is the event "AnA_n holds for only finitely many nn," equivalently "AnA_n eventually fails."

The First Lemma

The first lemma needs no independence assumption. It says: if the probabilities of the events sum, the events themselves can only happen finitely often.

Lemma

First Borel-Cantelli Lemma

Statement

If n=1P(An)<,\sum_{n=1}^\infty P(A_n) < \infty, then P(An i.o.)=0.P(A_n \text{ i.o.}) = 0. Equivalently, P(ω belongs to only finitely many An)=1P(\omega \text{ belongs to only finitely many } A_n) = 1.

Intuition

Summability of P(An)P(A_n) means the total expected number of AnA_n that occur is finite. A non-negative random variable with finite expectation cannot be infinite with positive probability. So the count N(ω)=n1An(ω)N(\omega) = \sum_n \mathbf{1}_{A_n}(\omega) is finite almost surely, which is exactly P(An i.o.)=0P(A_n \text{ i.o.}) = 0.

Proof Sketch

By definition {An i.o.}nNAn\{A_n \text{ i.o.}\} \subseteq \bigcup_{n \geq N} A_n for every NN. Apply countable subadditivity: P(An i.o.)P ⁣(nNAn)nNP(An)P(A_n \text{ i.o.}) \leq P\!\left(\bigcup_{n \geq N} A_n\right) \leq \sum_{n \geq N} P(A_n). Let NN \to \infty; the tail sum vanishes by summability. So P(An i.o.)=0P(A_n \text{ i.o.}) = 0.

Why It Matters

This is the workhorse of almost-sure convergence proofs. To show Xna.s.XX_n \xrightarrow{a.s.} X, it suffices to show that for every ε>0\varepsilon > 0, nP(XnX>ε)<\sum_n P(|X_n - X| > \varepsilon) < \infty, because then XnX>ε|X_n - X| > \varepsilon holds for only finitely many nn almost surely, and ε\varepsilon is arbitrary. This recipe converts any in-probability convergence with summable rate (e.g., 1/n21/n^2) into a.s. convergence. It is also how the strong law of large numbers is upgraded from the weak law in the simplest proofs.

Failure Mode

Summability is essential. If P(An)=1/nP(A_n) = 1/n, the sum diverges and the lemma gives no conclusion; one needs the second lemma (with independence) to conclude anything. The summable rate is also essential when applied to upgrade in-probability convergence: a 1/n1/n rate of in-probability convergence does not, by Borel-Cantelli alone, give a.s. convergence.

The Second Lemma

The second lemma is a partial converse: with independence, divergence of the sum forces the events to occur infinitely often almost surely.

Lemma

Second Borel-Cantelli Lemma

Statement

If A1,A2,A_1, A_2, \ldots are mutually independent and n=1P(An)=,\sum_{n=1}^\infty P(A_n) = \infty, then P(An i.o.)=1.P(A_n \text{ i.o.}) = 1.

Intuition

Without independence, divergent probabilities tell you nothing: the events could be highly correlated and effectively be the same event repeated. Independence forces the events to "use up" their probability mass on distinct trials, so divergence of P(An)\sum P(A_n) guarantees infinitely many successes.

Proof Sketch

Show P(An eventually fails)=0P(A_n \text{ eventually fails}) = 0, equivalently P ⁣(nNAnc)=0P\!\left(\bigcap_{n \geq N} A_n^c\right) = 0 for every NN. By independence, P ⁣(n=NMAnc)=n=NM(1P(An))n=NMeP(An)=en=NMP(An)P\!\left(\bigcap_{n=N}^M A_n^c\right) = \prod_{n=N}^M (1 - P(A_n)) \leq \prod_{n=N}^M e^{-P(A_n)} = e^{-\sum_{n=N}^M P(A_n)} using 1xex1 - x \leq e^{-x}. As MM \to \infty the exponent goes to -\infty, so the product is 0. Continuity of probability passes to the infinite intersection.

Why It Matters

This is what guarantees that random search, ε\varepsilon-greedy exploration, and stochastic approximation visit every region of state space infinitely often, given enough independent randomness. It also gives Borel's theorem on normal numbers: a Lebesgue-almost-every real number has each digit appearing with the right frequency, because each digit-occurrence event has summable probability bounded away from zero.

Failure Mode

Independence cannot be dropped. Counterexample: take An=AA_n = A for all nn with P(A)=1/2P(A) = 1/2. Then P(An)=\sum P(A_n) = \infty trivially, but {An i.o.}=A\{A_n \text{ i.o.}\} = A, so P(An i.o.)=1/21P(A_n \text{ i.o.}) = 1/2 \neq 1. Pairwise independence is also generally not enough; mutual (or at least sufficient) independence is what the proof needs. There are weakened versions of the second lemma using only pairwise independence with extra moment conditions (Erdos-Renyi, Kochen-Stone), useful when full independence fails.

The Zero-One Dichotomy

Together, the two lemmas yield a striking conclusion for independent events.

Borel zero-one law for independent events. If A1,A2,A_1, A_2, \ldots are mutually independent, then P(An i.o.)P(A_n \text{ i.o.}) is either 0 or 1, with the two cases distinguished by whether P(An)\sum P(A_n) converges or diverges.

So for independent sequences there is no "in-between" probability: either the events stop happening almost surely, or they keep happening almost surely. The summability of probabilities is the threshold. This is a miniature version of Kolmogorov's general 0-1 law for tail sigma-algebras, which extends the dichotomy beyond the structure of "AnA_n infinitely often" to any event in the tail sigma-algebra of an independent sequence.

Applications

Strong law of large numbers (sketch). Let X1,X2,X_1, X_2, \ldots be i.i.d. with finite fourth moment and mean zero. Markov's inequality gives P(Xˉn>ε)E[Xˉn4]/ε4P(|\bar X_n| > \varepsilon) \leq \mathbb{E}[\bar X_n^4] / \varepsilon^4. Compute E[Xˉn4]=O(1/n2)\mathbb{E}[\bar X_n^4] = O(1/n^2) using independence (cross terms vanish). Then nP(Xˉn>ε)Cn1/n2<\sum_n P(|\bar X_n| > \varepsilon) \leq C \sum_n 1/n^2 < \infty, and the first Borel-Cantelli lemma gives Xˉn0\bar X_n \to 0 almost surely. (The classical SLLN under just first moment requires more work, but the fourth-moment case follows directly from this recipe.) See the law of large numbers page for the full treatment.

Visit-every-state guarantees in MDPs. In a finite-state Markov decision process under ε\varepsilon-greedy exploration, every state is visited infinitely often almost surely. The events "state ss is visited at time nn" have probabilities bounded below by a positive constant under ε\varepsilon-greedy (for ergodic chains), and the visits are sufficiently independent across time for the second lemma to apply (with care; usually one applies a martingale strengthening rather than the raw second lemma). See the MDP page for the careful statement.

Borel's normal number theorem. Almost every real x[0,1]x \in [0, 1] has the property that, in its decimal expansion, each digit d{0,,9}d \in \{0, \ldots, 9\} appears with limiting frequency 1/101/10. Proof outline: under uniform distribution on [0,1][0, 1], the digit at position nn is independent across nn and uniform on {0,,9}\{0, \ldots, 9\}. Apply SLLN (which itself uses the first Borel-Cantelli lemma above) to the indicator of "digit nn equals dd" for each dd.

Convergence rate as a Borel-Cantelli condition. A common statement of the form "the high-probability bound holds for all sufficiently large nn" is shorthand for "by Borel-Cantelli with summable failure rate, the bound holds eventually almost surely." The recipe: if your bound has failure probability δn\delta_n with nδn<\sum_n \delta_n < \infty, you have an a.s. guarantee.

Common Confusions

Watch Out

The first lemma needs no independence; the second does

The first lemma works for any sequence of events whatsoever, independent or not. Independence is the price for the second lemma's stronger conclusion. This asymmetry is fundamental: divergent probabilities of dependent events tell you nothing about whether the events recur. It is independence that forces "divergent sum equals infinitely often."

Watch Out

Pairwise independence is generally not enough for the second lemma

The second lemma requires mutual independence in the standard proof. Pairwise independence (every pair independent) is strictly weaker and does not suffice. Erdos-Renyi and Kochen-Stone proved variants under weaker hypotheses, but the usual statement uses full mutual independence; do not apply the second lemma with only pairwise independence unless you are using a strengthened version explicitly.

Watch Out

A 1/n rate of in-probability convergence does not automatically give a.s.

To upgrade in-probability convergence to a.s. convergence via Borel-Cantelli, you need nP(XnX>ε)<\sum_n P(|X_n - X| > \varepsilon) < \infty. A rate of 1/n1/n is not summable. With rate 1/n1/n, the most you can extract directly is a.s. convergence along the sub-sequence nk=k2n_k = k^2 (whose probabilities 1/k21/k^2 do sum), giving sub-sequential a.s. convergence. To get full-sequence a.s. convergence with this rate, you typically need a martingale or maximal-inequality argument, not bare Borel-Cantelli.

Summary

  • The first Borel-Cantelli lemma needs no independence: summable probabilities imply the events occur only finitely often almost surely.
  • The second lemma needs mutual independence: divergent probabilities then imply the events occur infinitely often almost surely.
  • Together, for independent sequences, P(An i.o.)P(A_n \text{ i.o.}) is exactly 0 or 1, governed by convergence vs divergence of P(An)\sum P(A_n).
  • Standard recipe: to upgrade in-probability convergence to a.s. convergence, get a summable failure rate and apply the first lemma.
  • Standard application: visiting every state infinitely often in stochastic exploration, normal numbers, the simplest fourth-moment proof of SLLN.

Exercises

ExerciseCore

Problem

Let X1,X2,X_1, X_2, \ldots be i.i.d. with P(Xi>t)=etP(X_i > t) = e^{-t} for t0t \geq 0 (standard exponential). Show that lim supnXnlogn=1\limsup_{n \to \infty} \frac{X_n}{\log n} = 1 almost surely. (Treat it as given that the limsup is a tail event; the exercise is to show the value is 1.)

ExerciseAdvanced

Problem

Construct a sequence of events A1,A2,A_1, A_2, \ldots on a probability space such that nP(An)=\sum_n P(A_n) = \infty but P(An i.o.)=0P(A_n \text{ i.o.}) = 0. Then construct a sequence with nP(An)=\sum_n P(A_n) = \infty and P(An i.o.)=cP(A_n \text{ i.o.}) = c for some c(0,1)c \in (0, 1). Both constructions show why independence is necessary in the second Borel-Cantelli lemma.

References

Standard graduate texts:

  • Billingsley, "Probability and Measure" (3rd edition, Wiley, 1995), Section 4
  • Durrett, "Probability: Theory and Examples" (5th edition, Cambridge, 2019), Section 2.3
  • Williams, "Probability with Martingales" (Cambridge, 1991), Sections 2.7-2.8
  • Resnick, "A Probability Path" (Birkhauser, 1999), Section 4.5

Original sources:

  • Borel, "Les probabilites denombrables et leurs applications arithmetiques" (Rendiconti del Circolo Matematico di Palermo, 1909)
  • Cantelli, "Sulla probabilita come limite della frequenza" (Atti della Reale Accademia Nazionale dei Lincei, 1917)

Refinements and applications:

  • Erdos and Renyi, "On Cantor's series with convergent 1/qn\sum 1/q_n" (Annales Universitatis Scientiarum Budapestinensis, 1959) - relaxes independence
  • Kochen and Stone, "A note on the Borel-Cantelli lemma" (Illinois Journal of Mathematics, 1964)

Next Topics

Last reviewed: April 18, 2026

Prerequisites

Foundations this topic depends on.

Next Topics