What Each Promises
Both statements say the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ of i.i.d. variables $X_1, X_2, \ldots$ with finite mean $\mu = \mathbb{E}[X_1]$ converges to $\mu$. They differ in what "converges" means.
Weak law (WLLN). For every $\varepsilon > 0$,

$$\lim_{n \to \infty} \mathbb{P}\big(|\bar{X}_n - \mu| > \varepsilon\big) = 0.$$

This is convergence in probability. The probability of a fixed-size deviation vanishes, but for any given $n$ there is still some chance of a large deviation.
Strong law (SLLN).

$$\mathbb{P}\Big(\lim_{n \to \infty} \bar{X}_n = \mu\Big) = 1.$$

This is almost-sure convergence. With probability one, the entire sequence $(\bar{X}_n)_{n \ge 1}$ actually converges to $\mu$ as a deterministic limit, viewed sample path by sample path.
The almost-sure form is strictly stronger. Anything that converges almost surely converges in probability; the reverse fails.
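The forward implication is a one-line dominated-convergence argument, sketched here: if $\bar{X}_n \to \mu$ almost surely, then for any fixed $\varepsilon > 0$ the indicators $\mathbf{1}\{|\bar{X}_n - \mu| > \varepsilon\}$ tend to $0$ almost surely and are bounded by $1$, so

$$\mathbb{P}\big(|\bar{X}_n - \mu| > \varepsilon\big) = \mathbb{E}\big[\mathbf{1}\{|\bar{X}_n - \mu| > \varepsilon\}\big] \longrightarrow 0.$$

The example in the next section shows why the reverse direction fails.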
The Gap Is Real
The simplest way to see that the gap matters: a sequence that converges in probability but not almost surely.
Let $X_1, X_2, \ldots$ be independent Bernoulli variables with $\mathbb{P}(X_n = 1) = 1/n$ and $\mathbb{P}(X_n = 0) = 1 - 1/n$.
In probability: $\mathbb{P}(|X_n - 0| > \varepsilon) \le \mathbb{P}(X_n = 1) = 1/n \to 0$ for any $\varepsilon > 0$. So $X_n \to 0$ in probability.
Almost surely: the events $A_n = \{X_n = 1\}$ are independent and $\sum_n \mathbb{P}(A_n) = \sum_n 1/n = \infty$. By the second Borel-Cantelli lemma, $\mathbb{P}(A_n \text{ infinitely often}) = 1$. So along almost every sample path, $X_n$ keeps returning to $1$ infinitely often. The sequence does not converge to $0$ for almost every $\omega$.
This is the canonical separation. Convergence in probability lets the exceptional "bad" event keep happening, as long as its probability shrinks. Almost-sure convergence requires that for almost every sample path, the bad event eventually stops.
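A minimal simulation (assuming NumPy; the constants are illustrative) makes the separation visible on one sample path: the marginal probability of seeing a $1$ at index $n$ shrinks like $1/n$, yet the path keeps producing $1$s at ever-larger indices.

```python
import numpy as np

rng = np.random.default_rng(0)

# One sample path of the counterexample: independent X_n with P(X_n = 1) = 1/n.
N = 100_000
n = np.arange(1, N + 1)
path = rng.random(N) < 1.0 / n   # X_n = 1 with probability 1/n

# In probability: the chance of a 1 at index n shrinks like 1/n.
# Almost surely: sum(1/n) diverges, so 1s keep appearing along this single path
# (second Borel-Cantelli), and the path never settles at 0.
hits = n[path]
print(f"indices with X_n = 1: {len(hits)} of {N}")
print(f"largest such index:   {hits.max()}")
```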
Proof-Effort Comparison
| Aspect | Weak law | Strong law |
|---|---|---|
| Convergence mode | In probability | Almost sure |
| Minimal i.i.d. assumption | $\mathbb{E}\lvert X_1\rvert < \infty$ | $\mathbb{E}\lvert X_1\rvert < \infty$ |
| Quick proof tool | Chebyshev inequality (if $\operatorname{Var}(X_1) < \infty$) | Borel-Cantelli + 4th moment, or Etemadi |
| Hard proof | Truncation argument (Khintchine) | Kolmogorov three-series + truncation |
| Generalization beyond i.i.d. | Easier, many variants | Harder, fewer variants |
The 4th-moment SLLN proof is short and goes via Markov + first Borel-Cantelli: compute $\mathbb{P}(|\bar{X}_n - \mu| > \varepsilon) \le \mathbb{E}[(\bar{X}_n - \mu)^4]/\varepsilon^4 \le C/(\varepsilon^4 n^2)$, sum over $n$ to get a convergent series, and conclude that the deviation event happens only finitely often. The minimal-assumption SLLN proof (only $\mathbb{E}\lvert X_1\rvert < \infty$) needs Etemadi's argument, which truncates, works along geometric subsequences, and in fact needs only pairwise independence. The WLLN under the same assumption needs only a single truncation step.
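Spelled out for the 4th-moment route (a sketch, assuming $\mathbb{E}[X_1^4] < \infty$ and centering so that $\mu = 0$): expanding $\big(\sum_{i=1}^{n} X_i\big)^4$ and using independence, only the $X_i^4$ and $X_i^2 X_j^2$ terms survive, giving

$$\mathbb{E}\big[\bar{X}_n^4\big] = \frac{n\,\mathbb{E}[X_1^4] + 3n(n-1)\big(\mathbb{E}[X_1^2]\big)^2}{n^4} \le \frac{C}{n^2}.$$

By Markov, $\sum_n \mathbb{P}(|\bar{X}_n| > \varepsilon) \le \sum_n C/(\varepsilon^4 n^2) < \infty$, and the first Borel-Cantelli lemma says the deviation events occur only finitely often, almost surely.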
When Each Form Is What You Need
The weak law is enough whenever the question is about a single value of $n$:
- Confidence intervals. $\mathbb{P}(|\bar{X}_n - \mu| > \varepsilon)$ is exactly the quantity the WLLN bounds.
- Hypothesis testing power. Probability that the test statistic crosses the threshold at sample size $n$.
- Polling and survey statistics. "How big does $n$ need to be for the estimator to be within $\varepsilon$ of the truth with probability $1 - \delta$?" (A back-of-envelope version of this calculation is sketched right after this list.)
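As a rough illustration of these finite-$n$ questions, here is a Chebyshev-based sample-size bound (a conservative sketch; the helper name and the worst-case Bernoulli variance bound $\sigma^2 \le 1/4$ are illustrative assumptions, not from any particular library):

```python
import math

def chebyshev_sample_size(eps: float, delta: float, var_bound: float = 0.25) -> int:
    """Smallest n with var_bound / (n * eps**2) <= delta, i.e. a WLLN-flavored
    finite-n guarantee P(|mean_n - mu| > eps) <= delta via Chebyshev."""
    return math.ceil(var_bound / (delta * eps ** 2))

# Example: a poll accurate to within 3 points with probability 0.95.
# Chebyshev is loose; a CLT-based bound would give a much smaller n.
print(chebyshev_sample_size(eps=0.03, delta=0.05))  # 5556
```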
The strong law is required whenever the question is about the whole trajectory:
- Monte Carlo correctness. A simulation that runs and runs needs the guarantee that its running average will eventually settle at the right value, sample path by sample path (see the running-average sketch after this list). WLLN does not give this; the running average might keep wandering, just less and less often.
- Pathwise statements in stochastic processes. "The empirical measure of a positive-recurrent Markov chain converges to the stationary distribution" is an a.s. statement; the weak version is too weak to use for individual realized trajectories.
- Almost-sure consistency of estimators in asymptotic statistics. The classical MLE consistency result is a.s. consistency, not just convergence in probability.
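A minimal Monte Carlo sketch of the pathwise statement (the integrand $g(u) = u^2$ and the constants are illustrative choices): the SLLN is what guarantees that the running average of a single run settles at the true value $\int_0^1 u^2\,du = 1/3$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate E[g(U)] for U ~ Uniform(0, 1) with g(u) = u**2; the true value is 1/3.
N = 200_000
samples = rng.random(N) ** 2
running_avg = np.cumsum(samples) / np.arange(1, N + 1)

# SLLN: along (almost) every single run, the running average converges to 1/3.
# The WLLN alone only says that at any fixed n the average is probably close.
for n in (100, 10_000, 200_000):
    print(f"n = {n:>7}: running average = {running_avg[n - 1]:.5f}  (truth = {1/3:.5f})")
```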
In ML practice the WLLN suffices for most generalization arguments (empirical risk near population risk with high probability), but the SLLN is what justifies the statement "if I train for long enough on fresh i.i.d. samples, the training loss converges to the population loss along almost every training trajectory".
The Practical Gap
For most well-behaved distributions, the two limits look identical in simulation: both versions converge fast enough that the trajectory settles within an envelope of order $1/\sqrt{n}$. The visible distinction shows up only in pathological cases:
- The Bernoulli example above, which relies on the variables not being identically distributed. For i.i.d. variables with finite mean, the SLLN applies directly, so the sample mean converges almost surely and the gap vanishes.
- Sequences constructed deliberately to converge in probability but not almost surely (the typewriter-style examples from modes-of-convergence discussions).
What this means in practice: for i.i.d. data with finite mean, you get both versions of the LLN for free. The gap is a theoretical-foundations issue, not a practical one. But the gap is real, and knowing which version your argument needs prevents over-claiming or under-claiming guarantees.
Common Confusions
"The strong law is just a stronger version of the weak law." Strictly true as a statement about convergence modes, but misleading as a guide to which to apply. They answer different questions. WLLN bounds a finite- probability; SLLN guarantees a limit along sample paths. The "stronger" version costs more to prove and is what stochastic-process arguments need.
"For i.i.d. data the two are the same." Almost. For i.i.d. with finite mean, both hold under the same assumption, so as theorems they are equivalent in their scope. But the conclusions are different: one bounds a probability at each , the other asserts pathwise convergence.
"Convergence in probability implies convergence almost surely along a subsequence." True (this is a classical lemma), but not the same as almost-sure convergence of the original sequence. The subsequence may depend on , and the inter-subsequence behavior can be wild.
Quick Decision Rule
| Question | Use |
|---|---|
| "What is at this ?" | WLLN |
| "Will my running average settle at ?" | SLLN |
| "Is my MLE consistent?" | SLLN (in classical statement); WLLN sufficient if you only want "converges in probability" |
| "Is empirical risk close to population risk with high probability at ?" | WLLN |
| "Does the Markov chain time-average converge to the stationary mean?" | SLLN |
References
Canonical:
- Durrett, Probability: Theory and Examples (5th ed., 2019), Sections 2.2-2.4 (WLLN, SLLN, Etemadi's proof).
- Billingsley, Probability and Measure (3rd ed., 1995), Sections 6 and 22.
- Kallenberg, Foundations of Modern Probability (3rd ed., 2021), Chapter 5 (strong laws and the Kolmogorov three-series theorem).
Current:
- Resnick, A Probability Path (1999; reprint 2014), Chapter 7 (compact pedagogical treatment with the Borel-Cantelli counter-example).