Mathematical Infrastructure
Borel-Cantelli Lemmas
The two lemmas that bridge in-probability convergence to almost-sure convergence. First lemma: summable probabilities imply the events occur only finitely often; Second lemma: independence plus divergence implies infinitely often.
Why This Matters
Most learning theorems give you in-probability statements: "with probability at least , the empirical risk is within of the true risk." To upgrade these to "almost-sure" guarantees (which the strong law of large numbers and many martingale arguments require), you need a tool that converts a sequence of high-probability statements into a single probability-1 statement about the entire trajectory. That tool is the first Borel-Cantelli lemma.
The second lemma points the other way: it tells you when events with non-summable probabilities are guaranteed to recur infinitely often. This is the source of every "the algorithm visits every state infinitely often" guarantee in stochastic approximation and reinforcement learning, and it underlies Borel's theorem that Lebesgue-almost every real number is normal.
Both lemmas are two-line statements; between them they supply the strong law of large numbers and every "visit every state infinitely often" guarantee in stochastic approximation.
The Setup: Limit Superior of Events
For a sequence of events , the limit superior is
In words: iff belongs to infinitely many of the . The phrase " infinitely often" abbreviates this and is standard notation. The complement is the event " holds for only finitely many ," equivalently " eventually fails."
The First Lemma
The first lemma needs no independence assumption. It says: if the probabilities of the events sum, the events themselves can only happen finitely often.
First Borel-Cantelli Lemma
Statement
If then Equivalently, .
Intuition
Summability of means the total expected number of that occur is finite. A non-negative random variable with finite expectation cannot be infinite with positive probability. So the count is finite almost surely, which is exactly .
Proof Sketch
By definition for every . Apply countable subadditivity: . Let ; the tail sum vanishes by summability. So .
Why It Matters
This is the workhorse of almost-sure convergence proofs. To show , it suffices to show that for every , , because then holds for only finitely many almost surely, and is arbitrary. This recipe converts any in-probability convergence with summable rate (e.g., ) into a.s. convergence. It is also how the strong law of large numbers is upgraded from the weak law in the simplest proofs.
Failure Mode
Summability is essential. If , the sum diverges and the lemma gives no conclusion; one needs the second lemma (with independence) to conclude anything. The summable rate is also essential when applied to upgrade in-probability convergence: a rate of in-probability convergence does not, by Borel-Cantelli alone, give a.s. convergence.
The Second Lemma
The second lemma is a partial converse: with independence, divergence of the sum forces the events to occur infinitely often almost surely.
Second Borel-Cantelli Lemma
Statement
If are mutually independent and then
Intuition
Without independence, divergent probabilities tell you nothing: the events could be highly correlated and effectively be the same event repeated. Independence forces the events to "use up" their probability mass on distinct trials, so divergence of guarantees infinitely many successes.
Proof Sketch
Show , equivalently for every . By independence, using . As the exponent goes to , so the product is 0. Continuity of probability passes to the infinite intersection.
Why It Matters
This is what guarantees that random search, -greedy exploration, and stochastic approximation visit every region of state space infinitely often, given enough independent randomness. It also gives Borel's theorem on normal numbers: a Lebesgue-almost-every real number has each digit appearing with the right frequency, because each digit-occurrence event has summable probability bounded away from zero.
Failure Mode
Independence cannot be dropped. Counterexample: take for all with . Then trivially, but , so . Pairwise independence is also generally not enough; mutual (or at least sufficient) independence is what the proof needs. There are weakened versions of the second lemma using only pairwise independence with extra moment conditions (Erdos-Renyi, Kochen-Stone), useful when full independence fails.
The Zero-One Dichotomy
Together, the two lemmas yield a striking conclusion for independent events.
Borel zero-one law for independent events. If are mutually independent, then is either 0 or 1, with the two cases distinguished by whether converges or diverges.
So for independent sequences there is no "in-between" probability: either the events stop happening almost surely, or they keep happening almost surely. The summability of probabilities is the threshold. This is a miniature version of Kolmogorov's general 0-1 law for tail sigma-algebras, which extends the dichotomy beyond the structure of " infinitely often" to any event in the tail sigma-algebra of an independent sequence.
Applications
Strong law of large numbers (sketch). Let be i.i.d. with finite fourth moment and mean zero. Markov's inequality gives . Compute using independence (cross terms vanish). Then , and the first Borel-Cantelli lemma gives almost surely. (The classical SLLN under just first moment requires more work, but the fourth-moment case follows directly from this recipe.) See the law of large numbers page for the full treatment.
Visit-every-state guarantees in MDPs. In a finite-state Markov decision process under -greedy exploration, every state is visited infinitely often almost surely. The events "state is visited at time " have probabilities bounded below by a positive constant under -greedy (for ergodic chains), and the visits are sufficiently independent across time for the second lemma to apply (with care; usually one applies a martingale strengthening rather than the raw second lemma). See the MDP page for the careful statement.
Borel's normal number theorem. Almost every real has the property that, in its decimal expansion, each digit appears with limiting frequency . Proof outline: under uniform distribution on , the digit at position is independent across and uniform on . Apply SLLN (which itself uses the first Borel-Cantelli lemma above) to the indicator of "digit equals " for each .
Convergence rate as a Borel-Cantelli condition. A common statement of the form "the high-probability bound holds for all sufficiently large " is shorthand for "by Borel-Cantelli with summable failure rate, the bound holds eventually almost surely." The recipe: if your bound has failure probability with , you have an a.s. guarantee.
Common Confusions
The first lemma needs no independence; the second does
The first lemma works for any sequence of events whatsoever, independent or not. Independence is the price for the second lemma's stronger conclusion. This asymmetry is fundamental: divergent probabilities of dependent events tell you nothing about whether the events recur. It is independence that forces "divergent sum equals infinitely often."
Pairwise independence is generally not enough for the second lemma
The second lemma requires mutual independence in the standard proof. Pairwise independence (every pair independent) is strictly weaker and does not suffice. Erdos-Renyi and Kochen-Stone proved variants under weaker hypotheses, but the usual statement uses full mutual independence; do not apply the second lemma with only pairwise independence unless you are using a strengthened version explicitly.
A 1/n rate of in-probability convergence does not automatically give a.s.
To upgrade in-probability convergence to a.s. convergence via Borel-Cantelli, you need . A rate of is not summable. With rate , the most you can extract directly is a.s. convergence along the sub-sequence (whose probabilities do sum), giving sub-sequential a.s. convergence. To get full-sequence a.s. convergence with this rate, you typically need a martingale or maximal-inequality argument, not bare Borel-Cantelli.
Summary
- The first Borel-Cantelli lemma needs no independence: summable probabilities imply the events occur only finitely often almost surely.
- The second lemma needs mutual independence: divergent probabilities then imply the events occur infinitely often almost surely.
- Together, for independent sequences, is exactly 0 or 1, governed by convergence vs divergence of .
- Standard recipe: to upgrade in-probability convergence to a.s. convergence, get a summable failure rate and apply the first lemma.
- Standard application: visiting every state infinitely often in stochastic exploration, normal numbers, the simplest fourth-moment proof of SLLN.
Exercises
Problem
Let be i.i.d. with for (standard exponential). Show that almost surely. (Treat it as given that the limsup is a tail event; the exercise is to show the value is 1.)
Problem
Construct a sequence of events on a probability space such that but . Then construct a sequence with and for some . Both constructions show why independence is necessary in the second Borel-Cantelli lemma.
References
Standard graduate texts:
- Billingsley, "Probability and Measure" (3rd edition, Wiley, 1995), Section 4
- Durrett, "Probability: Theory and Examples" (5th edition, Cambridge, 2019), Section 2.3
- Williams, "Probability with Martingales" (Cambridge, 1991), Sections 2.7-2.8
- Resnick, "A Probability Path" (Birkhauser, 1999), Section 4.5
Original sources:
- Borel, "Les probabilites denombrables et leurs applications arithmetiques" (Rendiconti del Circolo Matematico di Palermo, 1909)
- Cantelli, "Sulla probabilita come limite della frequenza" (Atti della Reale Accademia Nazionale dei Lincei, 1917)
Refinements and applications:
- Erdos and Renyi, "On Cantor's series with convergent " (Annales Universitatis Scientiarum Budapestinensis, 1959) - relaxes independence
- Kochen and Stone, "A note on the Borel-Cantelli lemma" (Illinois Journal of Mathematics, 1964)
Next Topics
- Law of large numbers: the canonical application, where Borel-Cantelli upgrades the weak law to the strong law
- Modes of convergence of random variables: the parent context, where Borel-Cantelli is the bridge from in-probability to a.s.
- Stochastic approximation theory: where the second lemma supports "visit-every-state" guarantees
Last reviewed: April 18, 2026
Prerequisites
Foundations this topic depends on.