
Foundations

Metric Spaces, Convergence, and Completeness

Metric space axioms, convergence of sequences, Cauchy sequences, completeness, and the Banach fixed-point theorem.

Core · Tier 1 · Stable · ~55 min

Why This Matters

Convergence guarantees for optimization algorithms (gradient descent, EM, iterative methods) require a precise notion of distance and completeness. The Banach fixed-point theorem gives convergence of contraction mappings, which appears in value iteration (reinforcement learning), iterative solvers, and fixed-point equations throughout ML.

Core Definitions

Definition

Metric Space

A metric space is a set $X$ with a function $d: X \times X \to [0, \infty)$ satisfying:

  1. Identity of indiscernibles: $d(x, y) = 0 \iff x = y$
  2. Symmetry: $d(x, y) = d(y, x)$
  3. Triangle inequality: $d(x, z) \leq d(x, y) + d(y, z)$

Key Examples

Euclidean space. $\mathbb{R}^n$ with $d(x, y) = \|x - y\|_2 = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}$. More generally, any norm $\|\cdot\|$ on $\mathbb{R}^n$ induces a metric $d(x, y) = \|x - y\|$. The $\ell^p$ norms ($1 \leq p \leq \infty$) all give valid metrics.

Discrete metric. On any set $X$, define $d(x, y) = 1$ if $x \neq y$ and $d(x, x) = 0$. Every subset is open, every sequence that converges is eventually constant, and the space is always complete. This is useful as a degenerate case to test whether a theorem's hypotheses are too weak.

Function spaces. $C([a, b])$, the continuous functions on $[a, b]$, with the supremum metric $d(f, g) = \sup_{t \in [a,b]} |f(t) - g(t)|$. This is a complete metric space (a Banach space). Convergence in this metric is uniform convergence, which preserves continuity. The weaker pointwise convergence does not come from a metric on $C([a, b])$ and does not preserve continuity.

$\ell^p$ sequence spaces. Sequences $(x_1, x_2, \ldots)$ with $\sum |x_i|^p < \infty$, using $d(x, y) = \left(\sum |x_i - y_i|^p\right)^{1/p}$. Complete for $1 \leq p < \infty$; the endpoint $\ell^\infty$ (bounded sequences with the sup metric) is also complete. These spaces appear in functional analysis and approximation theory.
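These metrics are easy to compute directly. The sketch below (the helper names are ours, and the supremum metric is only approximated on a finite grid) evaluates the Euclidean, supremum, and discrete metrics:

```python
import math

def d_euclid(x, y):
    # l2 (Euclidean) metric on R^n
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def d_sup(f, g, ts):
    # Finite-grid approximation to the supremum metric on C([a, b]):
    # sup_t |f(t) - g(t)|, estimated over the sample points ts
    return max(abs(f(t) - g(t)) for t in ts)

def d_discrete(x, y):
    # Discrete metric: 1 if the points differ, 0 otherwise
    return 0 if x == y else 1

print(d_euclid((1.0, 2.0), (4.0, 6.0)))  # 5.0 (a 3-4-5 right triangle)

ts = [k / 100 for k in range(101)]       # grid on [0, 1]
print(d_sup(math.sin, math.cos, ts))     # 1.0, attained at t = 0
```

Any of the $\ell^p$ metrics could be swapped in for `d_euclid` by replacing the exponent.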

Definition

Open and Closed Sets

A set $U \subseteq X$ is open if for every $x \in U$ there exists $\epsilon > 0$ such that the ball $B(x, \epsilon) = \{y \in X : d(x, y) < \epsilon\}$ is contained in $U$. A set is closed if its complement is open; equivalently, if it contains all its limit points.

Convergence and Cauchy Sequences

Definition

Convergence of Sequences

A sequence $(x_n)$ in $(X, d)$ converges to $x \in X$ if for every $\epsilon > 0$ there exists $N$ such that $d(x_n, x) < \epsilon$ for all $n \geq N$. The limit is unique in a metric space.

Uniqueness follows from the triangle inequality: if $x_n \to x$ and $x_n \to y$, then $0 \leq d(x, y) \leq d(x, x_n) + d(x_n, y) \to 0$, so $d(x, y) = 0$ and $x = y$.

Definition

Cauchy Sequence

A sequence $(x_n)$ is Cauchy if for every $\epsilon > 0$ there exists $N$ such that $d(x_m, x_n) < \epsilon$ for all $m, n \geq N$. Every convergent sequence is Cauchy, but the converse requires completeness.

The Cauchy condition is an intrinsic property: it refers only to distances between terms of the sequence, not to a candidate limit. This matters because the limit is often unknown in advance, as when proving that an iteration converges at all, or when working in a possibly incomplete space.

Every convergent sequence is Cauchy. If $x_n \to x$, then $d(x_m, x_n) \leq d(x_m, x) + d(x, x_n) < \epsilon/2 + \epsilon/2 = \epsilon$ for $m, n$ large enough.

The converse fails without completeness. In $\mathbb{Q}$ with the usual metric, the sequence of rational approximations to $\sqrt{2}$ (e.g., via Newton's method) is Cauchy but does not converge in $\mathbb{Q}$.
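This failure can be watched numerically. The sketch below runs Newton's method for $x^2 = 2$ in exact rational arithmetic (`fractions.Fraction`), so every iterate stays in $\mathbb{Q}$; consecutive gaps shrink rapidly, yet no iterate squares to 2:

```python
from fractions import Fraction

# Newton's method for x^2 = 2, kept in exact rational arithmetic:
# x_{n+1} = (x_n + 2/x_n) / 2.  Every iterate is a rational number.
x = Fraction(1)
iterates = [x]
for _ in range(5):
    x = (x + 2 / x) / 2
    iterates.append(x)

# Gaps between consecutive terms shrink fast (quadratic convergence),
# so the sequence is Cauchy in Q -- yet no rational squares to 2,
# so it has no limit inside Q.
gaps = [abs(iterates[i + 1] - iterates[i]) for i in range(len(iterates) - 1)]
print([float(g) for g in gaps])  # 0.5, 1/12, 1/408, ... strictly decreasing
```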

Completeness

Definition

Complete Metric Space

A metric space $(X, d)$ is complete if every Cauchy sequence in $X$ converges to a point in $X$.

Completeness is a property of the space, not of individual sequences. It guarantees that "sequences that should converge" actually do converge within the space.

Closed subsets inherit completeness. Any closed subset of a complete metric space is itself complete. Conversely, a complete subspace of any metric space is closed.

Completion. Every metric space can be embedded isometrically as a dense subset of a complete metric space (its completion). $\mathbb{R}$ is the completion of $\mathbb{Q}$. This is analogous to how the space of Lebesgue integrable functions completes the Riemann-integrable functions.

Compactness in Metric Spaces

Definition

Compact Metric Space

A metric space is compact if every open cover has a finite subcover. In metric spaces, this is equivalent to: every sequence has a convergent subsequence (sequential compactness).

In $\mathbb{R}^n$, the Heine-Borel theorem characterizes compact sets as those that are closed and bounded. In infinite-dimensional spaces, closed and bounded does not imply compact: the closed unit ball in $\ell^2$ is closed and bounded but not compact.

Why compactness matters for ML. Compactness guarantees that continuous functions attain their extrema (extreme value theorem). Many existence proofs in optimization require compactness of the feasible set. If you are minimizing a continuous loss over a compact parameter space, a minimizer exists. Without compactness, you need additional arguments (coercivity, lower semicontinuity) to guarantee existence.

Contraction Mappings

Definition

Contraction Mapping

A function $T: X \to X$ is a contraction if there exists $\gamma \in [0, 1)$ such that $d(T(x), T(y)) \leq \gamma \, d(x, y)$ for all $x, y \in X$. The constant $\gamma$ is the contraction factor.

A contraction is automatically uniformly continuous (and hence continuous). The contraction factor $\gamma$ controls the convergence rate: the error after $n$ iterations decays as $\gamma^n$, giving geometric (linear) convergence.

Main Theorems

Theorem

Banach Fixed-Point Theorem (Contraction Mapping Theorem)

Statement

Let $(X, d)$ be a nonempty complete metric space and let $T: X \to X$ be a contraction with factor $\gamma \in [0, 1)$. Then $T$ has a unique fixed point $x^* \in X$ (i.e., $T(x^*) = x^*$). For any starting point $x_0 \in X$, the iterates $x_{n+1} = T(x_n)$ converge to $x^*$, with error bound:

$$d(x_n, x^*) \leq \frac{\gamma^n}{1 - \gamma} \, d(x_1, x_0)$$

Intuition

A contraction shrinks distances. Iterating it produces a Cauchy sequence (consecutive terms get geometrically closer). Completeness guarantees this sequence converges. Uniqueness follows because two fixed points would have to be distance zero apart.

Proof Sketch

For $m > n$: $d(x_n, x_m) \leq \sum_{k=n}^{m-1} d(x_k, x_{k+1}) \leq \sum_{k=n}^{m-1} \gamma^k d(x_0, x_1) \leq \frac{\gamma^n}{1-\gamma} d(x_0, x_1)$. This shows $(x_n)$ is Cauchy. By completeness, $x_n \to x^*$. Continuity of $T$ gives $T(x^*) = \lim T(x_n) = \lim x_{n+1} = x^*$. For uniqueness: if $T(y) = y$, then $d(x^*, y) = d(T(x^*), T(y)) \leq \gamma \, d(x^*, y)$, so $d(x^*, y) = 0$.
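A quick numerical illustration, using the affine map $T(x) = 0.5x + 1$ (our choice of example, with $\gamma = 0.5$ and fixed point $x^* = 2$), confirms the a priori error bound from the statement:

```python
# Banach fixed-point iteration for the affine contraction
# T(x) = 0.5*x + 1 on R.  Solving x = 0.5*x + 1 gives x* = 2.
gamma = 0.5

def T(x):
    return gamma * x + 1

x_star = 2.0
x0 = 10.0
x1 = T(x0)
xs = [x0, x1]
for _ in range(20):
    xs.append(T(xs[-1]))

# A priori error bound from the theorem:
# d(x_n, x*) <= gamma^n / (1 - gamma) * d(x_1, x_0)
for n, xn in enumerate(xs):
    bound = gamma ** n / (1 - gamma) * abs(x1 - x0)
    assert abs(xn - x_star) <= bound + 1e-12

print(abs(xs[-1] - x_star))  # geometric decay: roughly 8 * 0.5**21
```

For this affine map the bound is tight, since each step shrinks the error by exactly $\gamma$.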

Why It Matters

This theorem gives both existence and a constructive algorithm (iterate $T$) with a convergence rate. In RL, the Bellman operator is a contraction in the sup-norm with factor $\gamma$ (the discount factor), so value iteration converges. In optimization, proximal operators of strongly convex functions are contractions.
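The Bellman contraction can be checked on a toy problem. The sketch below uses a made-up 2-state, 2-action MDP (all transition probabilities and rewards here are invented for illustration) and verifies that the Bellman optimality operator shrinks sup-norm distances by at least the factor $\gamma$:

```python
# Hypothetical 2-state, 2-action MDP.  P[s][a] lists the transition
# probabilities to each next state; R[s][a] is the immediate reward.
gamma = 0.9
P = {0: {0: [0.8, 0.2], 1: [0.1, 0.9]},
     1: {0: [0.5, 0.5], 1: [0.3, 0.7]}}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: 0.5}}

def bellman(V):
    # Bellman optimality operator: (TV)(s) = max_a [ r + gamma * E[V] ]
    return [max(R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
                for a in (0, 1)) for s in (0, 1)]

def sup_dist(U, V):
    return max(abs(u - v) for u, v in zip(U, V))

V, W = [0.0, 0.0], [5.0, -3.0]  # two arbitrary starting value functions
for _ in range(50):
    # Contraction property in the sup-norm: d(TV, TW) <= gamma * d(V, W)
    assert sup_dist(bellman(V), bellman(W)) <= gamma * sup_dist(V, W) + 1e-12
    V, W = bellman(V), bellman(W)

print(sup_dist(V, W))  # tiny: both sequences approach the same V*
```

Because the operator is a $\gamma$-contraction, both value-function sequences converge to the unique fixed point $V^*$ regardless of initialization.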

Failure Mode

Fails if $\gamma = 1$ (the map is merely non-expansive, not a contraction). Example: $T(x) = x + 1$ on $\mathbb{R}$ preserves distances but has no fixed point. Fails if the space is not complete: $T(x) = x/2$ on $(0, 1)$ is a contraction, but the fixed point $0$ is not in the space.

Standard Metrics

| Space | Metric | Complete? |
| --- | --- | --- |
| $\mathbb{R}^n$ | $d(x,y) = \lVert x - y\rVert_2$ | Yes |
| $C([0,1])$ (continuous functions) | $d(f,g) = \sup_t \lvert f(t) - g(t)\rvert$ | Yes |
| $\ell^2$ (square-summable sequences) | $d(x,y) = \sqrt{\sum (x_i - y_i)^2}$ | Yes |
| $\mathbb{Q}$ (rationals) | $d(x,y) = \lvert x - y\rvert$ | No |

Common Confusions

Watch Out

Cauchy does not imply convergent without completeness

The sequence $x_n = 1/n$ in $(0, 1)$ is Cauchy but does not converge in $(0, 1)$ because the limit $0$ is not in the space. Completeness is exactly the property that rules out this pathology.

Watch Out

Contraction factor must be strictly less than 1

The map $T(x) = x/(1 + x)$ on $[0, \infty)$ satisfies $d(T(x), T(y)) < d(x, y)$ for $x \neq y$ but is not a contraction (no uniform $\gamma < 1$). It still has a fixed point ($x = 0$), but the Banach theorem does not apply, and convergence may be arbitrarily slow.
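This slow convergence is visible numerically: starting from $x_0 = 1$, the iterates of this map are exactly $x_n = 1/(n+1)$, which decays like $1/n$ rather than geometrically:

```python
# T(x) = x/(1+x) shrinks every distance strictly but has no uniform
# factor gamma < 1, so convergence to the fixed point 0 is not geometric.
def T(x):
    return x / (1 + x)

x = 1.0
for n in range(999):
    x = T(x)
# After 999 steps, x_999 = 1/1000: only O(1/n) decay
print(x)

# Compare with a genuine contraction (gamma = 0.9): geometric decay
y = 1.0
for n in range(999):
    y = 0.9 * y
print(y)  # astronomically smaller than 1/1000
```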

Watch Out

Closed and bounded does not imply compact in infinite dimensions

In $\mathbb{R}^n$, closed plus bounded equals compact (Heine-Borel). In infinite-dimensional spaces this fails. The closed unit ball in $\ell^2$ is closed and bounded but not compact: the sequence $e_1, e_2, e_3, \ldots$ of standard basis vectors has no convergent subsequence (all pairwise distances are $\sqrt{2}$). Compactness in infinite dimensions requires additional conditions such as total boundedness.
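The basis-vector computation is easy to verify with truncated representatives of $e_i$ (a finite-dimensional stand-in suffices, since each $e_i$ has only one nonzero coordinate):

```python
import math

def e(i, n=10):
    # i-th standard basis vector of l^2, truncated to n coordinates
    v = [0.0] * n
    v[i] = 1.0
    return v

def l2_dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Every pair of distinct basis vectors sits at distance sqrt(2),
# so no subsequence of (e_n) can be Cauchy, hence none converges.
dists = [l2_dist(e(i), e(j)) for i in range(10) for j in range(10) if i < j]
print(set(round(d, 12) for d in dists))  # a single value: sqrt(2)
```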

Exercises

ExerciseCore

Problem

Show that $d(x, y) = |x - y|$ is a metric on $\mathbb{R}$. Which of the three axioms is the hardest to verify?

ExerciseAdvanced

Problem

Let $T(x) = \cos(x)$. Show that $T$ is not a contraction on all of $\mathbb{R}$, but that it maps $[-1, 1]$ into itself and is a contraction there; find the contraction factor. What is the fixed point, numerically?

References

Canonical:

  • Rudin, Principles of Mathematical Analysis (1976), Chapters 2-3
  • Kreyszig, Introductory Functional Analysis with Applications (1989), Chapters 1-2
  • Munkres, Topology (2000), Chapter 2 (metric spaces, compactness, completeness)

Supplementary:

  • Sutherland, Introduction to Metric and Topological Spaces (2009), Chapters 3-7
  • Aliprantis & Border, Infinite Dimensional Analysis (2006), Chapter 3 (metric spaces in economics and optimization)

For ML context:

  • Bertsekas, Dynamic Programming and Optimal Control (2012), Chapter 1 (contraction mappings in RL)
  • Puterman, Markov Decision Processes (2005), Chapter 6.2 (contraction operators for value iteration)

Last reviewed: April 2026
