

Weak Law vs. Strong Law of Large Numbers

Both promise the sample mean converges to the population mean. The weak law uses convergence in probability and is easier to prove. The strong law uses almost-sure convergence and is what Monte Carlo simulation actually needs. The canonical counter-example shows the gap is real.

Last reviewed: May 12, 2026

What Each Promises

Both statements say the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ of i.i.d. variables with finite mean $\mu$ converges to $\mu$. They differ in what "converges" means.

Weak law (WLLN). For every $\epsilon > 0$, $\lim_{n \to \infty} \Pr\!\left[\lvert\bar{X}_n - \mu\rvert > \epsilon\right] = 0$. This is convergence in probability. The probability of a fixed-size deviation vanishes, but for any given $n$ there is still some chance of a large deviation.

Strong law (SLLN). $\Pr\!\left[\lim_{n \to \infty} \bar{X}_n = \mu\right] = 1$. This is almost-sure convergence. With probability one, the entire sequence $\bar{X}_1, \bar{X}_2, \ldots$ actually converges to $\mu$ as a deterministic limit, viewed sample path by sample path.

The almost-sure form is strictly stronger. Anything that converges almost surely converges in probability; the reverse fails.
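The difference is easiest to feel numerically. A minimal Python sketch (illustrative only; the fair-coin distribution, $\epsilon = 0.1$, trial count, and seed are arbitrary choices, not from the source) estimates the WLLN quantity $\Pr[\lvert\bar{X}_n - \mu\rvert > \epsilon]$ at a few values of $n$:

```python
import random

def deviation_probability(n, eps, trials=4000):
    """Estimate Pr[|X_bar_n - mu| > eps] for n fair coin flips (mu = 0.5)."""
    bad = 0
    for _ in range(trials):
        heads = sum(random.random() < 0.5 for _ in range(n))
        if abs(heads / n - 0.5) > eps:
            bad += 1
    return bad / trials

random.seed(0)
probs = {n: deviation_probability(n, 0.1) for n in (10, 100, 1000)}
print(probs)
```

The estimated probability shrinks toward zero as $n$ grows, which is exactly the WLLN statement; note that it says nothing about any single trajectory.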

The Gap Is Real

The simplest way to see that the gap matters: a sequence that converges in probability but not almost surely.

Let $X_n$ be independent Bernoulli variables with $\Pr[X_n = 1] = 1/n$ and $\Pr[X_n = 0] = 1 - 1/n$.

In probability: $\Pr[\lvert X_n - 0 \rvert > \epsilon] = 1/n \to 0$ for any $\epsilon \in (0, 1)$. So $X_n \xrightarrow{P} 0$.

Almost surely: the events $\{X_n = 1\}$ are independent and $\sum 1/n = \infty$. By the second Borel-Cantelli lemma, $\Pr[X_n = 1 \text{ infinitely often}] = 1$. So along almost every sample path, $X_n$ keeps returning to $1$ infinitely often. The sequence $X_n$ does not converge to $0$ for almost every $\omega$.

This is the canonical separation. Convergence in probability lets the exceptional "bad" event keep happening, as long as its probability shrinks. Almost-sure convergence requires that for almost every sample path, the bad event eventually stops.
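The counter-example can be watched directly. A short sketch (the seed and horizon are arbitrary): because $\sum 1/n$ diverges, hits at $X_n = 1$ never stop arriving, and up to index $N$ there are on the order of $\log N$ of them:

```python
import random

random.seed(1)
N = 200000
# X_n = 1 exactly when random() < 1/n; since sum(1/n) diverges,
# the second Borel-Cantelli lemma says hits keep arriving forever.
hits = [n for n in range(1, N + 1) if random.random() < 1.0 / n]
print(len(hits), hits[:5], hits[-1])
```

Only a handful of hits occur, but the last one sits at a large index; run longer and more always appear, so the sequence never settles at $0$ on this path.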

Proof-Effort Comparison

Aspect | Weak law | Strong law
Convergence mode | In probability | Almost sure
Minimal i.i.d. assumption | $\mathbb{E}\lvert X \rvert < \infty$ | $\mathbb{E}\lvert X \rvert < \infty$
Quick proof tool | Chebyshev's inequality (if $\mathrm{Var}(X) < \infty$) | Borel-Cantelli + 4th moment, or Etemadi
Hard proof | Truncation argument (Khintchine) | Kolmogorov three-series + truncation
Generalization beyond i.i.d. | Easier, many variants | Harder, fewer variants

The 4th-moment SLLN proof is short and goes via Markov + first Borel-Cantelli: assuming $\mathbb{E}X^4 < \infty$, compute $\mathbb{E}[(\bar{X}_n - \mu)^4] = O(1/n^2)$, sum over $n$ to get $\sum_n \Pr[\lvert\bar{X}_n - \mu\rvert > \epsilon] < \infty$, and conclude that the deviation event happens only finitely often. The minimal-assumption SLLN proof (only $\mathbb{E}\lvert X \rvert < \infty$) needs Etemadi's argument, which truncates and exploits pairwise independence along geometrically spaced subsequences. The WLLN under the same assumption needs only a single truncation step.
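For reference, the fourth-moment computation behind that sketch (writing $Y_i = X_i - \mu$, i.i.d. and mean-zero, with $\mathbb{E}Y^4 < \infty$): expanding $(\sum_i Y_i)^4$, only the $n$ diagonal terms $\mathbb{E}[Y_i^4]$ and the $3n(n-1)$ paired terms $\mathbb{E}[Y_i^2]\mathbb{E}[Y_j^2]$ survive, so

```latex
\mathbb{E}\big[(\bar{X}_n - \mu)^4\big]
  = \frac{n\,\mathbb{E}[Y^4] + 3n(n-1)\,(\mathbb{E}[Y^2])^2}{n^4}
  = O\!\left(\frac{1}{n^2}\right),
\qquad
\Pr\big[\lvert\bar{X}_n - \mu\rvert > \epsilon\big]
  \le \frac{\mathbb{E}\big[(\bar{X}_n - \mu)^4\big]}{\epsilon^4}
  = O\!\left(\frac{1}{\epsilon^4 n^2}\right),
```

and the right-hand side is summable over $n$, which is what the first Borel-Cantelli lemma needs.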

When Each Form Is What You Need

The weak law is enough whenever the question is about a single value of nn:

  • Confidence intervals. $\Pr[\lvert\bar{X}_n - \mu\rvert > \epsilon] \leq \alpha$ is exactly the kind of finite-$n$ statement the WLLN (via Chebyshev-type bounds) controls.
  • Hypothesis testing power. Probability that the test statistic crosses the threshold at sample size nn.
  • Polling and survey statistics. "How big does $n$ need to be for the estimator to be within $\epsilon$ of the truth with probability $1 - \delta$?"

The strong law is required whenever the question is about the whole trajectory:

  • Monte Carlo correctness. A long-running simulation needs the guarantee that its running average will eventually settle at the right value, sample path by sample path. The WLLN does not give this; the running average might keep wandering, just less and less often.
  • Pathwise statements in stochastic processes. "The empirical measure of a positive-recurrent Markov chain converges to the stationary distribution" is an a.s. statement; the weak version is too weak to use for individual realized trajectories.
  • Almost-sure consistency of estimators in asymptotic statistics. The classical MLE consistency result is a.s. consistency, not just convergence in probability.

In ML practice the WLLN suffices for most generalization arguments (empirical risk near population risk with high probability), but the SLLN is what justifies the statement "if I train for long enough on fresh i.i.d. samples, the training loss converges to the population loss along almost every training trajectory".
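The Monte Carlo claim is easy to watch on one sample path. A minimal sketch (Uniform(0, 1) samples, so $\mu = 0.5$; the seed and checkpoint spacing are arbitrary) tracks a single running average:

```python
import random

random.seed(2)
mu = 0.5                      # population mean of Uniform(0, 1)
total = 0.0
running = []                  # running average recorded at checkpoints
for n in range(1, 100001):
    total += random.random()
    if n % 10000 == 0:
        running.append(total / n)

print("final average:", running[-1], "deviation:", abs(running[-1] - mu))
```

The SLLN is the statement that this settling happens on almost every such path, not merely that a deviation at one fixed $n$ is unlikely.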

The Practical Gap

For most well-behaved distributions, the two limits look identical in simulation: both versions converge fast enough that the trajectory settles within an envelope of $O(1/\sqrt{n})$. The visible distinction shows up only in pathological cases:

  • The Bernoulli example above, which is a sequence of independent but not identically distributed variables. For sample means of i.i.d. variables with finite mean, the SLLN itself delivers almost-sure convergence, so the gap vanishes.
  • Sequences constructed deliberately to converge in probability but not almost surely (the "typewriter" sequence of sliding indicator functions, a standard modes-of-convergence example).

What this means in practice: for i.i.d. data with finite mean, you get both versions of the LLN for free. The gap is a theoretical-foundations issue, not a practical one. But the gap is real, and knowing which version your argument needs prevents over-claiming or under-claiming guarantees.
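A quick pathwise sanity check of that $O(1/\sqrt{n})$ envelope (illustrative; fair coin flips with $\sigma = 0.5$, arbitrary seed and horizon). The normalized deviation $\lvert\bar{X}_n - \mu\rvert\sqrt{n}/\sigma$ should stay modest along the whole trajectory:

```python
import math
import random

random.seed(3)
sigma = 0.5           # standard deviation of one fair coin flip
total = 0.0
worst = 0.0           # largest normalized deviation seen along the path
for n in range(1, 50001):
    total += random.random() < 0.5          # add one Bernoulli(1/2) flip
    dev = abs(total / n - 0.5)              # |X_bar_n - mu|
    worst = max(worst, dev * math.sqrt(n) / sigma)
print(worst)
```

The maximum stays at a few units (law-of-the-iterated-logarithm-type behavior), which is why the WLLN/SLLN gap is invisible in ordinary simulations of well-behaved distributions.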

Common Confusions

"The strong law is just a stronger version of the weak law." Strictly true as a statement about convergence modes, but misleading as a guide to which to apply. They answer different questions. WLLN bounds a finite-nn probability; SLLN guarantees a limit along sample paths. The "stronger" version costs more to prove and is what stochastic-process arguments need.

"For i.i.d. data the two are the same." Almost. For i.i.d. with finite mean, both hold under the same assumption, so as theorems they are equivalent in their scope. But the conclusions are different: one bounds a probability at each nn, the other asserts pathwise convergence.

"Convergence in probability implies convergence almost surely along a subsequence." True (this is a classical lemma), but not the same as almost-sure convergence of the original sequence. The subsequence may depend on ω\omega, and the inter-subsequence behavior can be wild.

Quick Decision Rule

Question | Use
"What is $\Pr[\lvert \bar{X}_n - \mu\rvert > \epsilon]$ at this $n$?" | WLLN
"Will my running average settle at $\mu$?" | SLLN
"Is my MLE consistent?" | SLLN (in classical statement); WLLN sufficient if you only want "converges in probability"
"Is empirical risk close to population risk with high probability at $n$?" | WLLN
"Does the Markov chain time-average converge to the stationary mean?" | SLLN

References

Canonical:

  • Durrett, Probability: Theory and Examples (5th ed., 2019), Sections 2.2-2.4 (WLLN, SLLN, Etemadi's proof).
  • Billingsley, Probability and Measure (3rd ed., 1995), Sections 6 and 22.
  • Kallenberg, Foundations of Modern Probability (3rd ed., 2021), Chapter 5 (strong laws and the Kolmogorov three-series theorem).

Current:

  • Resnick, A Probability Path (1999; reprint 2014), Chapter 7 (compact pedagogical treatment with the Borel-Cantelli counter-example).