
Foundations

Sequences and Series of Functions

Pointwise vs uniform convergence of function sequences, the Weierstrass M-test, and why uniform convergence preserves continuity. The concept that makes learning theory work.


Why This Matters

Uniform convergence is the single most important concept connecting classical analysis to learning theory. The reason ERM (empirical risk minimization) works is that empirical risk converges uniformly to population risk over a hypothesis class. If the convergence were only pointwise, ERM would have no generalization guarantees. Understanding the distinction between pointwise and uniform convergence is a prerequisite for understanding any generalization bound.

Pointwise Convergence

Definition

Pointwise Convergence

A sequence of functions $f_n: X \to \mathbb{R}$ converges pointwise to $f: X \to \mathbb{R}$ if for every $x \in X$:

$$\lim_{n \to \infty} f_n(x) = f(x)$$

Equivalently: for every $x \in X$ and every $\epsilon > 0$, there exists $N = N(x, \epsilon)$ such that $|f_n(x) - f(x)| < \epsilon$ for all $n \geq N$.

The critical detail: $N$ may depend on $x$. Different points may require different numbers of terms to get close to the limit.

Example

Pointwise but not uniform convergence

Let $f_n(x) = x^n$ on $[0, 1]$. For each $x \in [0, 1)$, $f_n(x) \to 0$. At $x = 1$, $f_n(1) = 1$ for all $n$. So the pointwise limit is:

$$f(x) = \begin{cases} 0 & \text{if } x \in [0, 1) \\ 1 & \text{if } x = 1 \end{cases}$$

Each $f_n$ is continuous, but the pointwise limit $f$ is discontinuous. This shows that pointwise convergence does not preserve continuity.
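To see the failure of uniformity concretely, here is a small numerical sketch (the script and the witness point $x_n = 2^{-1/n}$ are illustrative choices, not from the text). At any fixed $x < 1$ the values $x^n$ collapse quickly, but for every $n$ there is a point where $f_n$ still equals $1/2$, so the sup-distance to the limit never shrinks:

```python
# Illustrative check (not from the source): f_n(x) = x**n on [0, 1).
# Pointwise: at any fixed x < 1, x**n -> 0.
# But for each n, the point x_n = 2**(-1/n) < 1 satisfies f_n(x_n) = 1/2,
# so sup_{x in [0,1)} |f_n(x) - 0| >= 1/2 for every n: not uniform.
for n in [1, 10, 100, 10_000]:
    x_fixed = 0.9
    x_n = 2.0 ** (-1.0 / n)  # witness point, moves toward 1 as n grows
    print(f"n={n:6d}  f_n(0.9)={x_fixed**n:.3e}  f_n(x_n)={x_n**n:.3f}")
```

The point $x_n$ depends on $n$, which is exactly what a pointwise statement (fixed $x$, then let $n \to \infty$) never examines.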

Uniform Convergence

Definition

Uniform Convergence

A sequence $f_n: X \to \mathbb{R}$ converges uniformly to $f: X \to \mathbb{R}$ if:

$$\sup_{x \in X} |f_n(x) - f(x)| \to 0 \text{ as } n \to \infty$$

Equivalently: for every $\epsilon > 0$, there exists $N = N(\epsilon)$ (independent of $x$) such that $|f_n(x) - f(x)| < \epsilon$ for all $n \geq N$ and all $x \in X$.

The difference from pointwise convergence: in uniform convergence, a single $N$ works for all $x$ simultaneously. This "uniformity over $x$" is what gives the concept its power.

Main Theorems

Theorem

Uniform Limit of Continuous Functions is Continuous

Statement

If $f_n: X \to \mathbb{R}$ is continuous for each $n$ and $f_n \to f$ uniformly on $X$, then $f$ is continuous on $X$.

Intuition

Uniform convergence means the entire graph of $f_n$ lies within an $\epsilon$-tube around $f$ for large $n$. Since $f_n$ is continuous (no jumps) and is close to $f$ everywhere simultaneously, $f$ cannot have jumps either.

Proof Sketch

Fix $x_0 \in X$ and $\epsilon > 0$. By uniform convergence, choose $N$ so that $|f_N(x) - f(x)| < \epsilon/3$ for all $x$. By continuity of $f_N$, choose $\delta$ so that $|f_N(x) - f_N(x_0)| < \epsilon/3$ when $|x - x_0| < \delta$. Then by the triangle inequality:

$$|f(x) - f(x_0)| \leq |f(x) - f_N(x)| + |f_N(x) - f_N(x_0)| + |f_N(x_0) - f(x_0)| < \epsilon$$

Why It Matters

This theorem explains why uniform convergence is the right notion for learning theory. When empirical risk converges uniformly to population risk, the "landscape" of risk values is preserved: if a hypothesis has low empirical risk, it must have low population risk. Pointwise convergence would not give this guarantee.

Failure Mode

The theorem fails for pointwise convergence. The example $f_n(x) = x^n$ on $[0, 1]$ gives a discontinuous limit from continuous functions. In learning theory terms: if empirical risk converges to population risk only pointwise (for each fixed hypothesis), the ERM hypothesis could still have high population risk.

The Weierstrass M-Test

Theorem

Weierstrass M-Test

Statement

Let $g_k: X \to \mathbb{R}$ satisfy $|g_k(x)| \leq M_k$ for all $x \in X$, where $\sum_{k=1}^{\infty} M_k < \infty$. Then the series $\sum_{k=1}^{\infty} g_k(x)$ converges uniformly and absolutely on $X$.

Intuition

If you can bound each term by a constant that does not depend on $x$, and these constants form a convergent series, then the function series converges uniformly. Domination by the constant series $M_k$ controls the worst case over all $x$ simultaneously.

Proof Sketch

For any $x$, $\left|\sum_{k=n}^{m} g_k(x)\right| \leq \sum_{k=n}^{m} M_k$. Since $\sum M_k$ converges, its tail $\sum_{k=n}^{\infty} M_k \to 0$. Therefore $\sup_x \left|\sum_{k=n}^{m} g_k(x)\right| \to 0$ as $n, m \to \infty$, so the partial sums are uniformly Cauchy and hence converge uniformly by completeness of $\mathbb{R}$.
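The tail bound in the proof sketch can be watched numerically. The example series below, $g_k(x) = \sin(kx)/k^2$ with $M_k = 1/k^2$, is a standard illustration chosen here (it is not from the text); the observed sup-error of a partial sum never exceeds the M-test tail:

```python
import math

# M-test sketch (example series chosen here, not from the source):
# g_k(x) = sin(k x)/k**2 satisfies |g_k(x)| <= M_k = 1/k**2 for all x,
# and sum M_k = pi**2/6 < inf. The tail sum_{k>n} M_k bounds the uniform
# error of the n-th partial sum at every x simultaneously.

def partial_sum(x, n):
    return sum(math.sin(k * x) / k**2 for k in range(1, n + 1))

n = 50
tail_bound = sum(1.0 / k**2 for k in range(n + 1, 100_000))  # approx. tail

xs = [0.1 * i for i in range(63)]  # grid over [0, 6.2]
# Use a much longer partial sum as a stand-in for the limit function.
worst = max(abs(partial_sum(x, 5000) - partial_sum(x, n)) for x in xs)
print(f"observed sup-error ~ {worst:.2e}  <=  M-test tail bound ~ {tail_bound:.2e}")
```

The same bound holds at every $x$ at once, which is precisely the "uniformity" the test delivers.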

Why It Matters

The M-test is the standard tool for proving that series of functions (e.g., Fourier series, power series, series expansions of kernels) converge uniformly. In ML, it appears when establishing that certain function approximations converge uniformly, which is needed for generalization guarantees.

Failure Mode

The M-test is sufficient but not necessary: a series may converge uniformly even when no dominating summable sequence $M_k$ exists. The M-test also requires pointwise bounds that are independent of $x$; if the bounds grow with $x$ (e.g., on an unbounded domain), the test does not apply directly.

Connection to Learning Theory

In statistical learning theory, define:

  • $R(h) = \mathbb{E}[\ell(h(x), y)]$ (population risk)
  • $\hat{R}_n(h) = \frac{1}{n} \sum_{i=1}^n \ell(h(x_i), y_i)$ (empirical risk)

The function $h \mapsto \hat{R}_n(h)$ is a random function that approximates $h \mapsto R(h)$.

Pointwise convergence: for each fixed $h$, $\hat{R}_n(h) \to R(h)$ by the law of large numbers. This is not useful for ERM because the ERM hypothesis $h_{\text{ERM}}$ depends on the data.

Uniform convergence: $\sup_{h \in \mathcal{H}} |\hat{R}_n(h) - R(h)| \to 0$. This guarantees that the ERM hypothesis has population risk close to its empirical risk, which is the foundation of generalization bounds.

The entire program of VC theory, Rademacher complexity, and covering numbers is devoted to establishing conditions under which this uniform convergence holds.
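A toy simulation makes the uniform law visible (the data model, threshold class, and grid below are illustrative choices, not from the text). With $x \sim \mathrm{Uniform}[0,1]$, $y = \mathbf{1}\{x > 0.5\}$, and hypotheses $h_t(x) = \mathbf{1}\{x > t\}$, the population risk is $|t - 0.5|$, so the supremum gap over the class can be estimated directly:

```python
import random

# Toy uniform-convergence simulation (illustrative, not from the source):
# x ~ Uniform[0,1], y = 1{x > 0.5}; hypotheses h_t(x) = 1{x > t}.
# We estimate sup_t |empirical risk - population risk| and watch it
# shrink as the sample size n grows.

random.seed(0)
thresholds = [i / 100 for i in range(101)]

def population_risk(t):
    # P(h_t(x) != y): the error set is the interval between t and 0.5.
    return abs(t - 0.5)

for n in [100, 1000, 10000]:
    xs = [random.random() for _ in range(n)]
    ys = [1 if x > 0.5 else 0 for x in xs]
    sup_gap = max(
        abs(sum((1 if x > t else 0) != y for x, y in zip(xs, ys)) / n
            - population_risk(t))
        for t in thresholds
    )
    print(f"n={n:6d}  sup_t |R_hat - R| ~ {sup_gap:.4f}")
```

For this simple class the gap shrinks roughly like $\sqrt{\log n / n}$; a richer hypothesis class would need more samples for the same gap, which is exactly what VC theory and Rademacher complexity quantify.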

Common Confusions

Watch Out

Pointwise convergence is not useless, just insufficient for ERM

The law of large numbers gives pointwise convergence of empirical risk to population risk for free. The hard work in learning theory is upgrading this to uniform convergence over the hypothesis class. The gap between pointwise and uniform is exactly the gap between "each hypothesis generalizes" and "the selected hypothesis generalizes."

Watch Out

Uniform convergence is not about rate, it is about uniformity

A sequence can converge uniformly at a slow rate or pointwise at a fast rate. The distinction is not about speed but about whether a single $N$ works for all $x$. In learning theory, the rate matters too (it determines sample complexity), but the uniformity is the conceptual breakthrough.
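Both halves of this point fit in a few lines of arithmetic (the two example sequences are chosen here for illustration, not from the text): $g_n(x) = 1/n$ converges to $0$ uniformly but only at rate $1/n$, while $f_n(x) = x^n$ converges exponentially fast at each fixed $x < 1$ yet never uniformly on $[0, 1)$:

```python
# Rate vs uniformity (example sequences chosen here, not from the source):
# g_n(x) = 1/n  -> 0 uniformly, but slowly: sup_x |g_n(x)| = 1/n exactly.
# f_n(x) = x**n -> 0 exponentially fast at each fixed x < 1, but the
# sup over [0, 1) stays bounded below by 1/2 (witness x_n = 2**(-1/n)).
for n in [10, 100, 1000]:
    sup_g = 1.0 / n                      # exact sup-error of g_n
    fast_pointwise = 0.5 ** n            # f_n at the fixed point x = 0.5
    near_one = (2.0 ** (-1.0 / n)) ** n  # f_n at x_n: equals 1/2 for all n
    print(f"n={n:5d}  sup|g_n|={sup_g:.1e}  "
          f"f_n(0.5)={fast_pointwise:.1e}  f_n(x_n)={near_one:.2f}")
```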

Summary

  • Pointwise: for each $x$, $f_n(x) \to f(x)$. The convergence speed may vary with $x$
  • Uniform: $\sup_x |f_n(x) - f(x)| \to 0$. One $N$ works for all $x$
  • Uniform convergence preserves continuity; pointwise does not
  • The Weierstrass M-test: if $|g_k(x)| \leq M_k$ and $\sum M_k < \infty$, then $\sum g_k$ converges uniformly
  • Uniform convergence of empirical risk to population risk is the foundation of learning theory

Exercises

ExerciseCore

Problem

Let $f_n(x) = \frac{nx}{1 + n^2 x^2}$ on $[0, 1]$. Show that $f_n \to 0$ pointwise. Does $f_n \to 0$ uniformly?

ExerciseAdvanced

Problem

Explain why the law of large numbers alone is insufficient to prove that ERM generalizes. What additional property of the hypothesis class is needed, and how does it relate to uniform convergence?

References

Canonical:

  • Rudin, Principles of Mathematical Analysis, Chapter 7
  • Shalev-Shwartz and Ben-David, Understanding Machine Learning, Chapter 4 (uniform convergence in learning)

Current:

  • Wainwright, High-Dimensional Statistics, Chapter 4 (uniform laws of large numbers)
  • Munkres, Topology (2000), Chapter 1 (set theory review)

Next Topics

  • Empirical risk minimization: where uniform convergence meets learning theory
  • Uniform convergence: the formal learning-theoretic framework

Last reviewed: April 2026