Foundations
Sequences and Series of Functions
Pointwise vs uniform convergence of function sequences, the Weierstrass M-test, and why uniform convergence preserves continuity. The concept that makes learning theory work.
Prerequisites
Why This Matters
Uniform convergence is the single most important concept connecting classical analysis to learning theory. The reason ERM (empirical risk minimization) works is that empirical risk converges uniformly to population risk over a hypothesis class. If the convergence were only pointwise, ERM would have no generalization guarantees. Understanding the distinction between pointwise and uniform convergence is prerequisite for understanding any generalization bound.
Pointwise Convergence
Pointwise Convergence
A sequence of functions $f_n : X \to \mathbb{R}$ converges pointwise to $f$ if for every $x \in X$:

$$\lim_{n \to \infty} f_n(x) = f(x)$$

Equivalently: for every $x \in X$ and every $\varepsilon > 0$, there exists $N(x, \varepsilon)$ such that $|f_n(x) - f(x)| < \varepsilon$ for all $n \geq N(x, \varepsilon)$.
The critical detail: $N$ may depend on $x$. Different points may require different numbers of terms to get close to the limit.
Pointwise but not uniform convergence
Let $f_n(x) = x^n$ on $[0, 1]$. For each $x \in [0, 1)$, $x^n \to 0$. At $x = 1$, $f_n(1) = 1$ for all $n$. So the pointwise limit is:

$$f(x) = \begin{cases} 0 & \text{if } 0 \leq x < 1 \\ 1 & \text{if } x = 1 \end{cases}$$

Each $f_n$ is continuous, but the pointwise limit $f$ is discontinuous. This shows that pointwise convergence does not preserve continuity.
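A quick numerical sketch of the example above, $f_n(x) = x^n$ on $[0, 1]$ (assuming NumPy is available; the grid resolution and the sampled values of $n$ are illustrative choices), shows the sup-norm gap refusing to shrink:

```python
import numpy as np

# f_n(x) = x^n on [0, 1]: continuous for every n, but the pointwise
# limit is 0 on [0, 1) and 1 at x = 1 -- a discontinuous function.
xs = np.linspace(0.0, 1.0, 10_001)
limit = np.where(xs < 1.0, 0.0, 1.0)  # the pointwise limit f

for n in [1, 10, 100, 1000]:
    sup_gap = np.max(np.abs(xs**n - limit))  # sup_x |f_n(x) - f(x)| on the grid
    print(f"n={n:5d}  sup|f_n - f| = {sup_gap:.4f}")
```

The printed gap stays near 1 for every $n$: points just below $x = 1$ always lag behind the limit, so the convergence is pointwise but not uniform.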
Uniform Convergence
Uniform Convergence
A sequence $f_n$ converges uniformly to $f$ if:

$$\sup_{x \in X} |f_n(x) - f(x)| \to 0 \quad \text{as } n \to \infty$$

Equivalently: for every $\varepsilon > 0$, there exists $N(\varepsilon)$ (independent of $x$) such that $|f_n(x) - f(x)| < \varepsilon$ for all $n \geq N(\varepsilon)$ and all $x \in X$.
The difference from pointwise convergence: in uniform convergence, a single $N$ works for all $x$ simultaneously. This "uniformity over $x$" is what gives the concept its power.
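For contrast, restricting the same sequence $x^n$ to a smaller domain makes the convergence uniform. A short check (an assumed variant of the example above, using NumPy):

```python
import numpy as np

# f_n(x) = x^n restricted to [0, 0.9]. The limit is identically 0, and
# sup_x |f_n(x) - 0| = 0.9**n -> 0: one N works for every x at once.
xs = np.linspace(0.0, 0.9, 10_001)

for n in [1, 10, 50, 100]:
    sup_gap = np.max(xs**n)  # attained at the right endpoint x = 0.9
    print(f"n={n:4d}  sup|f_n - f| = {sup_gap:.2e}")
```

Here the sup-norm gap is exactly $0.9^n$, which shrinks to 0; the trouble in the previous example came entirely from points arbitrarily close to $x = 1$.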
Main Theorems
Uniform Limit of Continuous Functions is Continuous
Statement
If $f_n$ is continuous for each $n$ and $f_n \to f$ uniformly on $X$, then $f$ is continuous on $X$.
Intuition
Uniform convergence means the entire graph of $f_n$ lies within an $\varepsilon$-tube around $f$ for large $n$. Since $f_n$ is continuous (no jumps) and $f$ is close to $f_n$ everywhere simultaneously, $f$ cannot have jumps either.
Proof Sketch
Fix $x_0 \in X$ and $\varepsilon > 0$. By uniform convergence, choose $n$ so that $\sup_x |f_n(x) - f(x)| < \varepsilon/3$. By continuity of $f_n$, choose $\delta > 0$ so that $|f_n(x) - f_n(x_0)| < \varepsilon/3$ whenever $|x - x_0| < \delta$. Then by the triangle inequality:

$$|f(x) - f(x_0)| \leq |f(x) - f_n(x)| + |f_n(x) - f_n(x_0)| + |f_n(x_0) - f(x_0)| < \varepsilon$$
Why It Matters
This theorem explains why uniform convergence is the right notion for learning theory. When empirical risk converges uniformly to population risk, the "landscape" of risk values is preserved: if a hypothesis has low empirical risk, it must have low population risk. Pointwise convergence would not give this guarantee.
Failure Mode
The theorem fails for pointwise convergence. The example $f_n(x) = x^n$ on $[0, 1]$ gives a discontinuous limit from continuous functions. In learning theory terms: if empirical risk converges to population risk only pointwise (for each fixed hypothesis), the ERM hypothesis could still have high population risk.
The Weierstrass M-Test
Weierstrass M-Test
Statement
Let $f_n : X \to \mathbb{R}$ satisfy $|f_n(x)| \leq M_n$ for all $x \in X$, where $\sum_{n=1}^{\infty} M_n < \infty$. Then the series $\sum_{n=1}^{\infty} f_n$ converges uniformly and absolutely on $X$.
Intuition
If you can bound each term by a constant $M_n$ that does not depend on $x$, and these constants form a convergent series, then the function series converges uniformly. The domination by the constant series controls the "worst case" over all $x$ simultaneously.
Proof Sketch
For any $m > k$, $\left| \sum_{n=k+1}^{m} f_n(x) \right| \leq \sum_{n=k+1}^{m} M_n$. Since $\sum M_n$ converges, its tail $\sum_{n > k} M_n \to 0$ as $k \to \infty$. Therefore $\sup_x \left| \sum_{n=k+1}^{m} f_n(x) \right| \to 0$, giving a uniform Cauchy condition and hence uniform convergence by completeness of $\mathbb{R}$.
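The tail estimate can be checked numerically on a concrete series. The example below is an assumed one (not from the text): $f_n(x) = \sin(nx)/n^2$ with $M_n = 1/n^2$, computed with NumPy on a finite grid and a truncated series:

```python
import numpy as np

# Assumed example: f_n(x) = sin(n x) / n^2, dominated by M_n = 1 / n^2.
# sum M_n converges, so the M-test gives uniform convergence of sum f_n.
xs = np.linspace(-10.0, 10.0, 1001)

def tail_sup(k, m=2000):
    # sup over the grid of |sum_{n=k+1}^{m} f_n(x)|: the uniform Cauchy gap.
    ns = np.arange(k + 1, m + 1)
    partial = (np.sin(np.outer(ns, xs)) / ns[:, None] ** 2).sum(axis=0)
    return np.max(np.abs(partial))

for k in [1, 10, 100, 1000]:
    m_tail = sum(1.0 / n**2 for n in range(k + 1, 2001))  # dominating tail
    print(f"k={k:5d}  sup-tail = {tail_sup(k):.2e}  <=  M-tail = {m_tail:.2e}")
```

Every sup-tail sits below the corresponding tail of $\sum M_n$, which tends to 0: exactly the uniform Cauchy estimate in the proof sketch.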
Why It Matters
The M-test is the standard tool for proving that series of functions (e.g., Fourier series, power series, series expansions of kernels) converge uniformly. In ML, it appears when establishing that certain function approximations converge uniformly, which is needed for generalization guarantees.
Failure Mode
The M-test is sufficient but not necessary. A series may converge uniformly even when no dominating summable sequence exists. The M-test also requires pointwise bounds that are independent of $x$; if the best available bounds grow with $x$ (e.g., on an unbounded domain), the test does not apply directly.
Connection to Learning Theory
In statistical learning theory, define:
- $L(h) = \mathbb{E}_{(x, y) \sim \mathcal{D}}[\ell(h(x), y)]$ (population risk)
- $\hat{L}_n(h) = \frac{1}{n} \sum_{i=1}^{n} \ell(h(x_i), y_i)$ (empirical risk)
The function $\hat{L}_n$ is a random function that approximates $L$.
Pointwise convergence: for each fixed $h$, $\hat{L}_n(h) \to L(h)$ by the law of large numbers. This is not useful for ERM because the ERM hypothesis depends on the data.
Uniform convergence: $\sup_{h \in \mathcal{H}} |\hat{L}_n(h) - L(h)| \to 0$. This guarantees that the ERM hypothesis has population risk close to its empirical risk, which is the foundation of generalization bounds.
The entire program of VC theory, Rademacher complexity, and covering numbers is devoted to establishing conditions under which this uniform convergence holds.
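This uniform gap can be simulated directly. The sketch below uses an assumed toy setup (not from the text): threshold classifiers $h_t(x) = \mathbf{1}[x > t]$ on Uniform$[0,1]$ data with noise-free labels $y = \mathbf{1}[x > 0.5]$, so the population risk is $L(h_t) = |t - 0.5|$ in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothesis class: 101 threshold classifiers h_t(x) = 1[x > t].
# Data: x ~ Uniform[0,1], y = 1[x > 0.5] (noise-free), so h_t errs
# exactly on the interval between t and 0.5: L(h_t) = |t - 0.5|.
thresholds = np.linspace(0.0, 1.0, 101)
pop_risk = np.abs(thresholds - 0.5)

for n in [10, 100, 1000, 10_000]:
    x = rng.uniform(0.0, 1.0, size=n)
    y = x > 0.5
    preds = x[None, :] > thresholds[:, None]         # shape (|H|, n)
    emp_risk = np.mean(preds != y[None, :], axis=1)  # L_hat(h_t) for each t
    gap = np.max(np.abs(emp_risk - pop_risk))        # sup_h |L_hat - L|
    print(f"n={n:6d}  sup_h |L_hat(h) - L(h)| = {gap:.4f}")
```

The uniform gap shrinks as $n$ grows, which is what licenses choosing a hypothesis by minimizing $\hat{L}_n$.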
Common Confusions
Pointwise convergence is not useless, just insufficient for ERM
The law of large numbers gives pointwise convergence of empirical risk to population risk for free. The hard work in learning theory is upgrading this to uniform convergence over the hypothesis class. The gap between pointwise and uniform is exactly the gap between "each hypothesis generalizes" and "the selected hypothesis generalizes."
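A minimal simulation of this gap (an assumed setup: pure-noise labels and random hypotheses, so every fixed hypothesis has population risk exactly $1/2$ on fresh data) shows why the data-dependent selection breaks the per-hypothesis guarantee:

```python
import numpy as np

rng = np.random.default_rng(1)

# n coin-flip labels; each "hypothesis" is a random label assignment on
# the sample, independent of y. Every FIXED h has population risk 0.5,
# and its empirical risk concentrates near 0.5 (law of large numbers).
n, num_hypotheses = 20, 10_000
y = rng.integers(0, 2, size=n)
H = rng.integers(0, 2, size=(num_hypotheses, n))  # h(x_i) for each hypothesis

emp_risk = np.mean(H != y, axis=1)
print(f"typical fixed h: empirical risk ~ {emp_risk.mean():.3f} (pop risk 0.5)")
print(f"ERM pick:        empirical risk = {emp_risk.min():.3f} (pop risk still 0.5)")
```

The minimum over 10,000 hypotheses is far below 0.5 even though every one of them is worthless: the selected hypothesis looks good only because it was selected. Only a bound that is uniform over the whole class controls this.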
Uniform convergence is not about rate, it is about uniformity
A sequence can converge uniformly at a slow rate or pointwise at a fast rate. The distinction is not about speed but about whether a single $N(\varepsilon)$ works for all $x$. In learning theory, the rate matters too (it determines sample complexity), but the uniformity is the conceptual breakthrough.
Summary
- Pointwise: for each $x$, $f_n(x) \to f(x)$. The convergence speed may vary with $x$
- Uniform: $\sup_x |f_n(x) - f(x)| \to 0$. One $N(\varepsilon)$ works for all $x$
- Uniform convergence preserves continuity; pointwise does not
- The Weierstrass M-test: if $|f_n(x)| \leq M_n$ and $\sum M_n < \infty$, then $\sum f_n$ converges uniformly
- Uniform convergence of empirical risk to population risk is the foundation of learning theory
Exercises
Problem
Let $f_n(x) = \frac{x}{n}$ on $\mathbb{R}$. Show that $f_n \to 0$ pointwise. Does $f_n \to 0$ uniformly?
Problem
Explain why the law of large numbers alone is insufficient to prove that ERM generalizes. What additional property of the hypothesis class is needed, and how does it relate to uniform convergence?
References
Canonical:
- Rudin, Principles of Mathematical Analysis, Chapter 7
- Shalev-Shwartz and Ben-David, Understanding Machine Learning, Chapter 4 (uniform convergence in learning)
Current:
- Wainwright, High-Dimensional Statistics, Chapter 4 (uniform laws of large numbers)
- Munkres, Topology (2000), Chapter 1 (set theory review)
Next Topics
- Empirical risk minimization: where uniform convergence meets learning theory
- Uniform convergence: the formal learning-theoretic framework
Last reviewed: April 2026