Borel-Cantelli Lemmas

Sneiderman, Robby

Mathematical Infrastructure

Borel-Cantelli Lemmas

The two lemmas that bridge in-probability convergence to almost-sure convergence. First lemma: summable probabilities imply the events occur only finitely often; Second lemma: independence plus divergence implies infinitely often.

CoreTier 1StableCore spine~30 min

Prerequisites

Measure Theoretic Probability Modes of Convergence Random Variables

Start 8-question practice · 6 available 3-question pulse check Prereq Map

Learning position

Read this page in the graph.

mathematical-infrastructure | layer 0B | tier 1. This page has 2 direct prerequisites and 2 published dependents.

Open Atlas Prerequisites Leads to

What next

Law of Large Numbers

This is the first curated or graph-derived continuation from the current page.

Evidence badge

Source-grounded page

This page has no public Lean mapping yet. Use the evidence page to inspect how claim status labels work.

Show the backing system

AtlasOpen the full prerequisite graph and run grounding traces.EvidenceInspect source support, claim labels, and public trust status.LeanReview the checked declaration list, scopes, and axiom profile.

Why This Matters

Borel–Cantelli: convergence of ΣP(A_n) decides whether events occur infinitely often

Most learning theorems give you in-probability statements: "with probability at least $1 - \delta$ , the empirical risk is within $\varepsilon$ of the true risk." To upgrade these to "almost-sure" guarantees (which the strong law of large numbers and many martingale arguments require), you need a tool that converts a sequence of high-probability statements into a single probability-1 statement about the entire trajectory. That tool is the first Borel-Cantelli lemma.

The second lemma points the other way: it tells you when events with non-summable probabilities are guaranteed to recur infinitely often. This is the source of every "the algorithm visits every state infinitely often" guarantee in stochastic approximation and reinforcement learning, and it underlies Borel's theorem that Lebesgue-almost every real number is normal.

The Setup: Limit Superior of Events

For a sequence of events $A_1, A_2, \ldots \in \mathcal{F}$ , the limit superior is

$\{A_n \text{ i.o.}\} := \limsup_{n \to \infty} A_n = \bigcap_{N=1}^\infty \bigcup_{n=N}^\infty A_n.$

In words: $\omega \in \{A_n \text{ i.o.}\}$ iff $\omega$ belongs to infinitely many of the $A_n$ . The phrase " $A_n$ infinitely often" abbreviates this and is standard notation. The complement is the event " $A_n$ holds for only finitely many $n$ ," equivalently " $A_n$ eventually fails."

The First Lemma

The first lemma needs no independence assumption. It says: if the probabilities of the events sum, the events themselves can only happen finitely often.

Lemma

First Borel-Cantelli Lemma

Statement

If $\sum_{n=1}^\infty P(A_n) < \infty,$ then $P(A_n \text{ i.o.}) = 0.$ Equivalently, $P(\omega \text{ belongs to only finitely many } A_n) = 1$ .

Intuition

Summability of $P(A_n)$ means the total expected number of $A_n$ that occur is finite. A non-negative random variable with finite expectation cannot be infinite with positive probability. So the count $N(\omega) = \sum_n \mathbf{1}_{A_n}(\omega)$ is finite almost surely, which is exactly $P(A_n \text{ i.o.}) = 0$ .

Proof Sketch

By definition $\{A_n \text{ i.o.}\} \subseteq \bigcup_{n \geq N} A_n$ for every $N$ . Apply countable subadditivity: $P(A_n \text{ i.o.}) \leq P\!\left(\bigcup_{n \geq N} A_n\right) \leq \sum_{n \geq N} P(A_n)$ . Let $N \to \infty$ ; the tail sum vanishes by summability. So $P(A_n \text{ i.o.}) = 0$ .

Why It Matters

This is the workhorse of almost-sure convergence proofs. To show $X_n \xrightarrow{a.s.} X$ , it suffices to show that for every $\varepsilon > 0$ , $\sum_n P(|X_n - X| > \varepsilon) < \infty$ , because then $|X_n - X| > \varepsilon$ holds for only finitely many $n$ almost surely, and $\varepsilon$ is arbitrary. This recipe converts any in-probability convergence with summable rate (e.g., $1/n^2$ ) into a.s. convergence. It is also how the strong law of large numbers is upgraded from the weak law in the simplest proofs.

Failure Mode

Summability is essential. If $P(A_n) = 1/n$ , the sum diverges and the lemma gives no conclusion; one needs the second lemma (with independence) to conclude anything. The summable rate is also essential when applied to upgrade in-probability convergence: a $1/n$ rate of in-probability convergence does not, by Borel-Cantelli alone, give a.s. convergence.

report a correction →

The Second Lemma

The second lemma is a partial converse: with independence, divergence of the sum forces the events to occur infinitely often almost surely.

Lemma

Second Borel-Cantelli Lemma

Statement

If $A_1, A_2, \ldots$ are mutually independent and $\sum_{n=1}^\infty P(A_n) = \infty,$ then $P(A_n \text{ i.o.}) = 1.$

Intuition

Without independence, divergent probabilities tell you nothing: the events could be highly correlated and effectively be the same event repeated. Independence forces the events to "use up" their probability mass on distinct trials, so divergence of $\sum P(A_n)$ guarantees infinitely many successes.

Proof Sketch

Show $P(A_n \text{ eventually fails}) = 0$ , equivalently $P\!\left(\bigcap_{n \geq N} A_n^c\right) = 0$ for every $N$ . By independence, $P\!\left(\bigcap_{n=N}^M A_n^c\right) = \prod_{n=N}^M (1 - P(A_n)) \leq \prod_{n=N}^M e^{-P(A_n)} = e^{-\sum_{n=N}^M P(A_n)}$ using $1 - x \leq e^{-x}$ . As $M \to \infty$ the exponent goes to $-\infty$ , so the product is 0. Continuity of probability passes to the infinite intersection.

Why It Matters

This is what guarantees that random search, $\varepsilon$ -greedy exploration, and stochastic approximation visit every region of state space infinitely often, given enough independent randomness. The second lemma also feeds into Borel's theorem on normal numbers: under uniform Lebesgue measure on $[0,1]$ , the digits of a random number are i.i.d. uniform on $\{0, \ldots, 9\}$ , so each digit-occurrence event has constant probability $1/10$ and the sum diverges. The second lemma immediately gives that each digit appears infinitely often almost surely. Frequency control (each digit appearing with limiting frequency exactly $1/10$ ) is the stronger statement and requires a law of large numbers applied to the digit indicators, not Borel-Cantelli alone, as shown in the Applications section below.

Failure Mode

Independence cannot be dropped. Counterexample: take $A_n = A$ for all $n$ with $P(A) = 1/2$ . Then $\sum P(A_n) = \infty$ trivially, but $\{A_n \text{ i.o.}\} = A$ , so $P(A_n \text{ i.o.}) = 1/2 \neq 1$ . Pairwise independence is also generally not enough; mutual (or at least sufficient) independence is what the proof needs. There are weakened versions of the second lemma using only pairwise independence with extra moment conditions (Erdos-Renyi, Kochen-Stone), useful when full independence fails.

report a correction →

The Zero-One Dichotomy

Together, the two lemmas yield a striking conclusion for independent events.

Borel zero-one law for independent events. If $A_1, A_2, \ldots$ are mutually independent, then $P(A_n \text{ i.o.})$ is either 0 or 1, with the two cases distinguished by whether $\sum P(A_n)$ converges or diverges.

So for independent sequences there is no "in-between" probability: either the events stop happening almost surely, or they keep happening almost surely. The summability of probabilities is the threshold. This is a miniature version of Kolmogorov's general 0-1 law for tail sigma-algebras, which extends the dichotomy beyond the structure of " $A_n$ infinitely often" to any event in the tail sigma-algebra of an independent sequence.

Applications

Strong law of large numbers (sketch). Let $X_1, X_2, \ldots$ be i.i.d. with finite fourth moment and mean zero. Markov's inequality gives $P(|\bar X_n| > \varepsilon) \leq \mathbb{E}[\bar X_n^4] / \varepsilon^4$ . Compute $\mathbb{E}[\bar X_n^4] = O(1/n^2)$ using independence (cross terms vanish). Then $\sum_n P(|\bar X_n| > \varepsilon) \leq C \sum_n 1/n^2 < \infty$ , and the first Borel-Cantelli lemma gives $\bar X_n \to 0$ almost surely. (The classical SLLN under just first moment requires more work, but the fourth-moment case follows directly from this recipe.) See the law of large numbers page for the full treatment.

Visit-every-state guarantees in MDPs. In a finite-state Markov decision process under $\varepsilon$ -greedy exploration, every state is visited infinitely often almost surely. The events "state $s$ is visited at time $n$ " have probabilities bounded below by a positive constant under $\varepsilon$ -greedy (for ergodic chains), and the visits are sufficiently independent across time for the second lemma to apply (with care; usually one applies a martingale strengthening rather than the raw second lemma). See the MDP page for the careful statement.

Borel's normal number theorem. Almost every real $x \in [0, 1]$ has the property that, in its decimal expansion, each digit $d \in \{0, \ldots, 9\}$ appears with limiting frequency $1/10$ . Proof outline: under uniform distribution on $[0, 1]$ , the digit at position $n$ is independent across $n$ and uniform on $\{0, \ldots, 9\}$ . Apply SLLN (which itself uses the first Borel-Cantelli lemma above) to the indicator of "digit $n$ equals $d$ " for each $d$ .

Convergence rate as a Borel-Cantelli condition. A common statement of the form "the high-probability bound holds for all sufficiently large $n$ " is shorthand for "by Borel-Cantelli with summable failure rate, the bound holds eventually almost surely." The recipe: if your bound has failure probability $\delta_n$ with $\sum_n \delta_n < \infty$ , you have an a.s. guarantee.

Common Confusions

Watch Out

The first lemma needs no independence; the second does

The first lemma works for any sequence of events whatsoever, independent or not. Independence is the price for the second lemma's stronger conclusion. This asymmetry is fundamental: divergent probabilities of dependent events tell you nothing about whether the events recur. It is independence that forces "divergent sum equals infinitely often."

Watch Out

Pairwise independence is generally not enough for the second lemma

The second lemma requires mutual independence in the standard proof. Pairwise independence (every pair independent) is strictly weaker and does not suffice. Erdos-Renyi and Kochen-Stone proved variants under weaker hypotheses, but the usual statement uses full mutual independence; do not apply the second lemma with only pairwise independence unless you are using a strengthened version explicitly.

Watch Out

A 1/n rate of in-probability convergence does not automatically give a.s.

To upgrade in-probability convergence to a.s. convergence via Borel-Cantelli, you need $\sum_n P(|X_n - X| > \varepsilon) < \infty$ . A rate of $1/n$ is not summable. With rate $1/n$ , the most you can extract directly is a.s. convergence along the sub-sequence $n_k = k^2$ (whose probabilities $1/k^2$ do sum), giving sub-sequential a.s. convergence. To get full-sequence a.s. convergence with this rate, you typically need a martingale or maximal-inequality argument, not bare Borel-Cantelli.

Summary

The first Borel-Cantelli lemma needs no independence: summable probabilities imply the events occur only finitely often almost surely.
The second lemma needs mutual independence: divergent probabilities then imply the events occur infinitely often almost surely.
Together, for independent sequences, $P(A_n \text{ i.o.})$ is exactly 0 or 1, governed by convergence vs divergence of $\sum P(A_n)$ .
Standard recipe: to upgrade in-probability convergence to a.s. convergence, get a summable failure rate and apply the first lemma.
Standard application: visiting every state infinitely often in stochastic exploration, normal numbers, the simplest fourth-moment proof of SLLN.

Exercises

ExerciseCore

Problem

Let $X_1, X_2, \ldots$ be i.i.d. with $P(X_i > t) = e^{-t}$ for $t \geq 0$ (standard exponential). Show that $\limsup_{n \to \infty} \frac{X_n}{\log n} = 1$ almost surely. (Treat it as given that the limsup is a tail event; the exercise is to show the value is 1.)

ExerciseAdvanced

Problem

Construct a sequence of events $A_1, A_2, \ldots$ on a probability space such that $\sum_n P(A_n) = \infty$ but $P(A_n \text{ i.o.}) = 0$ . Then construct a sequence with $\sum_n P(A_n) = \infty$ and $P(A_n \text{ i.o.}) = c$ for some $c \in (0, 1)$ . Both constructions show why independence is necessary in the second Borel-Cantelli lemma.

References

Standard graduate texts:

Billingsley, "Probability and Measure" (3rd edition, Wiley, 1995), Section 4
Durrett, "Probability: Theory and Examples" (5th edition, Cambridge, 2019), Section 2.3
Williams, "Probability with Martingales" (Cambridge, 1991), Sections 2.7-2.8
Resnick, "A Probability Path" (Birkhauser, 1999), Section 4.5

Original sources:

Borel, "Les probabilites denombrables et leurs applications arithmetiques" (Rendiconti del Circolo Matematico di Palermo, 1909)
Cantelli, "Sulla probabilita come limite della frequenza" (Atti della Reale Accademia Nazionale dei Lincei, 1917)

Refinements and applications:

Erdos and Renyi, "On Cantor's series with convergent $\sum 1/q_n$ " (Annales Universitatis Scientiarum Budapestinensis, 1959) - relaxes independence
Kochen and Stone, "A note on the Borel-Cantelli lemma" (Illinois Journal of Mathematics, 1964)

Next Topics

Law of large numbers: the canonical application, where Borel-Cantelli upgrades the weak law to the strong law
Modes of convergence of random variables: the parent context, where Borel-Cantelli is the bridge from in-probability to a.s.
Stochastic approximation theory: where the second lemma supports "visit-every-state" guarantees

Last reviewed: April 26, 2026

Canonical graph

Required before and derived from this topic

These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.

Full prerequisite chain All derived topics

Required prerequisites

2

Measure-Theoretic Probabilitylayer 0B · tier 1
Modes of Convergence of Random Variableslayer 0B · tier 1

Derived topics

2

Law of Large Numberslayer 0B · tier 1
Stochastic Approximation Theorylayer 2 · tier 2

Graph-backed continuations

Law of Large Numbers Stochastic Approximation Theory