What Each Measures
Both describe how a sequence of functions $f_n$ approaches a limit function $f$. They differ in whether the convergence rate is allowed to depend on the input point.
Pointwise convergence: for each fixed $x$, $f_n(x) \to f(x)$ as $n \to \infty$. The speed of convergence can vary across different $x$.
Uniform convergence: $f_n(x) \to f(x)$ at the same rate for all $x$ simultaneously. The convergence is controlled by the worst case, $\sup_x |f_n(x) - f(x)| \to 0$.
Side-by-Side Statement
Pointwise Convergence
A sequence $f_n$ converges pointwise to $f$ if:

$$\forall x,\ \forall \varepsilon > 0,\ \exists N(x, \varepsilon) \text{ such that } |f_n(x) - f(x)| < \varepsilon \text{ for all } n \ge N.$$

The threshold $N$ is allowed to depend on $x$.
Uniform Convergence
A sequence $f_n$ converges uniformly to $f$ if:

$$\forall \varepsilon > 0,\ \exists N(\varepsilon) \text{ such that } \sup_x |f_n(x) - f(x)| < \varepsilon \text{ for all } n \ge N.$$

The threshold $N$ depends only on $\varepsilon$, not on $x$.
The quantifier order is the key difference. Pointwise: "for all $x$, there exists $N$" (a different $N$ per $x$). Uniform: "there exists $N$ such that for all $x$" (one $N$ works everywhere).
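As a concrete illustration of the quantifier order, the sketch below computes the smallest pointwise threshold for $f_n(x) = x^n$ with limit $0$ on $(0, 1)$; the function name `pointwise_N` is ours, chosen for illustration:

```python
import math

def pointwise_N(x: float, eps: float) -> int:
    """Smallest N with x**n < eps for all n >= N (requires 0 < x < 1)."""
    # Solve x**N < eps for N: N > log(eps) / log(x) since log(x) < 0.
    return math.ceil(math.log(eps) / math.log(x))

eps = 0.1
# The threshold blows up as x approaches 1, so no single N works for all x:
for x in [0.5, 0.9, 0.99, 0.999]:
    print(x, pointwise_N(x, eps))
```

The growing thresholds are exactly the failure of the "one $N$ for all $x$" quantifier: any candidate $N$ is beaten by some $x$ close enough to $1$.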
Where Each Is Stronger
Pointwise convergence is easier to establish
Any uniformly convergent sequence is pointwise convergent, but not vice versa. Pointwise convergence only requires checking each $x$ individually.
Uniform convergence preserves more structure
Uniform convergence preserves continuity: if each $f_n$ is continuous and $f_n \to f$ uniformly, then $f$ is continuous. Pointwise convergence does not guarantee this. Uniform convergence also allows interchange of limits with integration and differentiation under mild conditions.
The Classic Counterexample
Consider $f_n(x) = x^n$ on $[0, 1]$. For each $x \in [0, 1)$, $x^n \to 0$. At $x = 1$, $f_n(1) = 1$ for all $n$. So pointwise:

$$f(x) = \begin{cases} 0 & 0 \le x < 1 \\ 1 & x = 1 \end{cases}$$
Each $f_n$ is continuous, but the pointwise limit $f$ is discontinuous. The convergence is not uniform: consider $\sup_{x \in [0,1]} |f_n(x) - f(x)|$; this supremum is $1$ for all $n$ (take $x$ close to $1$). So the uniform distance never shrinks, even though pointwise convergence holds everywhere.
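A quick numerical sanity check of this counterexample (a sketch: `sup_deviation` only approximates the supremum on a finite grid, so the printed values undershoot the true value $1$):

```python
def f_n(x, n):
    return x ** n

def sup_deviation(n, num_points=10_000):
    """Approximate sup_{x in [0,1]} |f_n(x) - f(x)| on a grid.
    The pointwise limit f is 0 on [0,1) and 1 at x = 1, where f_n = f."""
    grid = [i / num_points for i in range(num_points)]  # points in [0, 1)
    return max(abs(f_n(x, n) - 0.0) for x in grid)

# Each fixed x sees f_n(x) -> 0 quickly, but the sup over x stays near 1:
print(f_n(0.5, 20))  # tiny
for n in [1, 10, 100, 1000]:
    print(n, round(sup_deviation(n), 3))
```

The grid points just below $1$ are exactly the "slow" points that keep the uniform distance from shrinking.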
Why This Matters for Learning Theory
In learning theory, the connection to ERM makes this distinction critical. Consider the empirical and population risks:

$$\hat{R}_n(h) = \frac{1}{n} \sum_{i=1}^n \ell(h, z_i), \qquad R(h) = \mathbb{E}_z[\ell(h, z)].$$

By the law of large numbers, for each fixed $h$, $\hat{R}_n(h) \to R(h)$ as $n \to \infty$. This is pointwise convergence over the hypothesis class $\mathcal{H}$ (think of each $h$ as a "point").
But ERM selects $\hat{h}_n = \arg\min_{h \in \mathcal{H}} \hat{R}_n(h)$, which depends on the data. To guarantee that $R(\hat{h}_n) - \hat{R}_n(\hat{h}_n)$ is small, we need:

$$\sup_{h \in \mathcal{H}} |\hat{R}_n(h) - R(h)| \to 0.$$

This is uniform convergence over $\mathcal{H}$. Without it, ERM can select a hypothesis that happens to have low empirical risk by luck (overfitting), with population risk far from the empirical risk.
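The overfitting mechanism can be simulated directly. In this illustrative sketch (names and setup are ours), every hypothesis predicts independent random labels, so every population risk is exactly $0.5$; ERM's empirical risk is nonetheless far below $0.5$ once the class is large:

```python
import random

random.seed(0)

def simulate(num_hypotheses=1000, n=100):
    """Each 'hypothesis' predicts labels by an independent fair coin, so the
    population risk R(h) = 0.5 for every h. ERM picks the hypothesis with
    the lowest empirical 0-1 risk on n random binary labels."""
    labels = [random.choice([0, 1]) for _ in range(n)]
    best_emp = 1.0
    for _ in range(num_hypotheses):
        preds = [random.choice([0, 1]) for _ in range(n)]
        emp_risk = sum(p != y for p, y in zip(preds, labels)) / n
        best_emp = min(best_emp, emp_risk)
    return best_emp

emp = simulate()
print(f"empirical risk of ERM pick: {emp:.2f}, true risk of every h: 0.50")
# Pointwise, each R_hat(h) is near 0.5; the minimum over 1000 h's is not,
# because the sup-deviation over the class does not vanish at this n.
```

Each individual hypothesis obeys the law of large numbers; the ERM minimum does not, which is exactly the pointwise/uniform gap.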
Where Each Fails
Pointwise convergence fails for optimization
If I know that $f_n \to f$ pointwise and I minimize $f_n$, the minimizer of $f_n$ need not converge to the minimizer of $f$. The minimizer can "chase" the points where convergence is slowest. This is exactly the overfitting phenomenon in ERM.
Uniform convergence can be too strong
For infinite hypothesis classes like neural networks, uniform convergence bounds (VC dimension, Rademacher complexity) can be vacuously large. Modern deep learning generalizes despite the failure of uniform convergence bounds to provide useful guarantees. This has led to research on alternatives: algorithmic stability, PAC-Bayes bounds, and compression-based arguments that do not require uniform convergence.
Key Assumptions That Differ
| | Pointwise | Uniform |
|---|---|---|
| Rate dependence | Can vary with $x$ | Same for all $x$ |
| Preserves continuity | No | Yes |
| Allows limit-integral swap | Not in general | Yes (on a bounded interval) |
| Suffices for ERM | No | Yes |
| Complexity measure needed | None | VC dim, Rademacher, covering numbers |
When a Researcher Would Use Each
Consistency of an estimator at a fixed parameter
To prove that $\hat{\theta}_n \to \theta_0$ in probability for a fixed true parameter $\theta_0$, pointwise convergence does the main work. This is the heart of the standard consistency argument for MLE: show that the average log-likelihood converges pointwise to its expectation (identifiability, plus a concavity or local-uniformity condition, then handles the argmax step).
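A small simulation of the pointwise step, assuming a Gaussian location model with known unit variance (the helper names are ours): at each fixed $\theta$, the average log-likelihood settles on its expectation.

```python
import math
import random

random.seed(1)

def avg_loglik(theta, data):
    """Average N(theta, 1) log-likelihood over the sample, at a fixed theta."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2
               for x in data) / len(data)

def expected_loglik(theta, true_mean=0.0):
    # E[-0.5*log(2*pi) - 0.5*(X - theta)^2] for X ~ N(true_mean, 1):
    # E[(X - theta)^2] = Var(X) + (true_mean - theta)^2 = 1 + (true_mean - theta)^2.
    return -0.5 * math.log(2 * math.pi) - 0.5 * (1.0 + (true_mean - theta) ** 2)

data = [random.gauss(0.0, 1.0) for _ in range(200_000)]
for theta in [-1.0, 0.0, 2.0]:
    print(theta, round(avg_loglik(theta, data), 3), round(expected_loglik(theta), 3))
```

Note that this checks convergence at each fixed $\theta$ separately, which is all the pointwise statement claims.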
Proving generalization bounds for ERM
To bound the excess risk $R(\hat{h}_n) - \min_{h \in \mathcal{H}} R(h)$, you need $\sup_{h \in \mathcal{H}} |\hat{R}_n(h) - R(h)| \to 0$, which is uniform convergence. The rate of this convergence depends on the complexity of $\mathcal{H}$.
M-estimation and argmax continuity
When proving that the maximizer of an empirical criterion converges to the maximizer of the population criterion, the standard approach uses uniform convergence of the criterion function. Pointwise convergence of the criterion does not suffice because the argmax is a discontinuous functional.
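The following sketch builds a toy criterion with a moving spike to show the argmax discontinuity (construction and names are ours): $M_n \to M$ pointwise at every $\theta$, yet the maximizers of $M_n$ converge to $1$ while the population maximizer is $0$.

```python
def M_limit(theta):
    return -theta ** 2  # population criterion, maximized at theta = 0

def M_n(theta, n):
    # Spike of fixed height 2 and width 2/n, centered at 1 + 1/n. For any
    # fixed theta, the spike's support eventually excludes theta, so
    # M_n(theta) -> -theta^2 pointwise; but the convergence is not uniform.
    bump = max(0.0, 1.0 - n * abs(theta - 1.0 - 1.0 / n))
    return -theta ** 2 + 2.0 * bump

def argmax_on_grid(f, lo=-2.0, hi=2.0, num=4001):
    grid = [lo + (hi - lo) * i / (num - 1) for i in range(num)]
    return max(grid, key=f)

# The argmax chases the spike toward 1 instead of converging to 0:
for n in [10, 100, 1000]:
    print(n, argmax_on_grid(lambda t: M_n(t, n)))
```

This is the sense in which the argmax is a discontinuous functional: pointwise closeness of the criterion says nothing about closeness of maximizers.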
Common Confusions
Uniform convergence of empirical risk is about the hypothesis class, not the data
The supremum $\sup_{h \in \mathcal{H}} |\hat{R}_n(h) - R(h)|$ is over hypotheses, not data points. A finite hypothesis class always has uniform convergence (by a union bound). An infinite class may or may not, depending on its complexity.
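For a finite class with losses in $[0, 1]$, Hoeffding's inequality plus the union bound gives $P(\sup_h |\hat{R}_n(h) - R(h)| > \varepsilon) \le 2|\mathcal{H}| e^{-2n\varepsilon^2}$. A small calculator (a sketch; the function name is ours) inverts this for the required sample size:

```python
import math

def sample_size(num_hypotheses, eps, delta):
    """Smallest n with 2*K*exp(-2*n*eps^2) <= delta, so that
    sup_h |R_hat(h) - R(h)| <= eps holds with probability >= 1 - delta
    for K hypotheses and losses bounded in [0, 1]."""
    return math.ceil(math.log(2 * num_hypotheses / delta) / (2 * eps ** 2))

# Sample size grows only logarithmically with the class size:
for K in [10, 10_000, 10_000_000]:
    print(K, sample_size(K, eps=0.05, delta=0.05))
```

The logarithmic dependence on $|\mathcal{H}|$ is why finite classes always enjoy uniform convergence, and why infinite classes need a substitute complexity measure.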
Pointwise convergence plus compactness does not give uniform convergence
A common error: "the hypothesis class is compact, so pointwise convergence implies uniform convergence." This is false in general. You need equicontinuity (Arzelà-Ascoli) or similar conditions. For function sequences, Dini's theorem gives uniform convergence on compact sets if the convergence is monotone and the limit is continuous, but these conditions do not always hold.
What to Memorize
- Pointwise: $\forall x\ \forall \varepsilon\ \exists N(x, \varepsilon)$. A different $N$ per point.
- Uniform: $\forall \varepsilon\ \exists N(\varepsilon)\ \forall x$. One $N$ for all points.
- ERM needs uniform convergence over the hypothesis class, not just pointwise.
- Classic counterexample: $f_n(x) = x^n$ on $[0, 1]$ converges pointwise but not uniformly.
- Learning theory implication: the complexity of $\mathcal{H}$ (VC dim, Rademacher) controls the rate of uniform convergence and therefore the sample complexity of ERM.