Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.

Continuity in R^n

Epsilon-delta continuity, uniform continuity, and Lipschitz continuity in Euclidean space. Lipschitz constants control how fast function values change and appear throughout optimization and generalization theory.

Why This Matters

Continuity is the weakest regularity condition that most results in analysis and optimization assume. Without it, few guarantees survive: you cannot guarantee that gradient descent approaches a minimum, that a loss function attains its infimum on a compact set, or that small changes in parameters produce small changes in predictions.

Lipschitz continuity is the variant that appears most in ML theory. Generalization bounds for neural networks often depend on the Lipschitz constant of the network. Wasserstein GANs enforce a Lipschitz constraint on the critic explicitly.

Core Definitions

Definition

Continuity at a Point

A function $f: \mathbb{R}^n \to \mathbb{R}^m$ (where $\mathbb{R}^n$ carries the standard metric) is continuous at $a \in \mathbb{R}^n$ if for every $\epsilon > 0$, there exists $\delta > 0$ such that:

$$\|x - a\| < \delta \implies \|f(x) - f(a)\| < \epsilon$$

The choice of $\delta$ can depend on both $\epsilon$ and the point $a$.
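To make the dependence of $\delta$ on both $\epsilon$ and $a$ concrete, here is a minimal sketch for the scalar function $f(x) = x^2$, an example chosen for illustration. For $|x - a| < \delta \leq 1$ we have $|x^2 - a^2| = |x - a|\,|x + a| < \delta(2|a| + 1)$, so $\delta = \min(1, \epsilon / (2|a| + 1))$ is one valid choice. The function names `delta_for_square` and `check_delta` are hypothetical.

```python
def delta_for_square(a: float, eps: float) -> float:
    """One valid delta for f(x) = x**2 at the point a (many others work)."""
    return min(1.0, eps / (2.0 * abs(a) + 1.0))


def check_delta(a: float, eps: float, samples: int = 1000) -> bool:
    """Sample points with |x - a| < delta and confirm |f(x) - f(a)| < eps."""
    delta = delta_for_square(a, eps)
    for k in range(1, samples):
        # Offsets sweep the open interval (-delta, delta).
        offset = delta * (2.0 * k / samples - 1.0)
        x = a + offset
        if abs(x * x - a * a) >= eps:
            return False
    return True


eps = 0.01
for a in (0.0, 1.0, 10.0, 100.0):
    # The admissible delta shrinks as |a| grows, for the same eps.
    print(a, delta_for_square(a, eps), check_delta(a, eps))
```

The shrinking $\delta$ for larger $|a|$ is exactly the point-dependence the definition allows.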

Definition

Uniform Continuity

$f: \mathbb{R}^n \to \mathbb{R}^m$ is uniformly continuous on a set $S$ if for every $\epsilon > 0$, there exists $\delta > 0$ such that for all $x, y \in S$:

$$\|x - y\| < \delta \implies \|f(x) - f(y)\| < \epsilon$$

The $\delta$ depends only on $\epsilon$, not on the specific points $x, y$.

Definition

Lipschitz Continuity

$f: \mathbb{R}^n \to \mathbb{R}^m$ is Lipschitz on a set $S$ if there exists $L \geq 0$ such that for all $x, y \in S$:

$$\|f(x) - f(y)\| \leq L \|x - y\|$$

In that case $f$ is called $L$-Lipschitz on $S$. The smallest such $L$ is the Lipschitz constant of $f$ on $S$.
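One way to build intuition for the definition is to sample pairs of points and record the largest difference quotient seen. The sketch below does this for $\sin$ on $[-3, 3]$, an example chosen here because $\sin$ is known to be 1-Lipschitz. The helper name `empirical_lipschitz_lower_bound` is hypothetical, and sampling can only ever give a lower bound on the Lipschitz constant, never a certificate.

```python
import math
import random


def empirical_lipschitz_lower_bound(f, lo, hi, pairs=10000, seed=0):
    """Largest observed |f(x) - f(y)| / |x - y| over random pairs in [lo, hi].

    This is a lower bound on the Lipschitz constant of f on [lo, hi]:
    unsampled pairs could have a larger ratio.
    """
    rng = random.Random(seed)
    best = 0.0
    for _ in range(pairs):
        x = rng.uniform(lo, hi)
        y = rng.uniform(lo, hi)
        if x != y:
            best = max(best, abs(f(x) - f(y)) / abs(x - y))
    return best


estimate = empirical_lipschitz_lower_bound(math.sin, -3.0, 3.0)
print(estimate)  # close to, and never meaningfully above, 1
```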

The hierarchy is strict: Lipschitz implies uniformly continuous, which implies continuous. The converses fail. The function $f(x) = \sqrt{x}$ on $[0,1]$ is uniformly continuous but not Lipschitz (its derivative blows up at 0). The function $f(x) = x^2$ on $\mathbb{R}$ is continuous but not uniformly continuous.
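The $\sqrt{x}$ example can be checked numerically. The sketch below, with hypothetical helper names, shows two things: the difference quotient at 0 equals $1/\sqrt{h}$ and grows without bound, so no finite $L$ works; and the inequality $|\sqrt{x} - \sqrt{y}| \leq \sqrt{|x - y|}$ holds on a grid, which is why $\delta = \epsilon^2$ witnesses uniform continuity. The small tolerance in the comparison is only a guard against floating-point rounding.

```python
import math


def sqrt_quotient(h: float) -> float:
    """Difference quotient |sqrt(h) - sqrt(0)| / |h - 0|, which is 1 / sqrt(h)."""
    return math.sqrt(h) / h


def sqrt_gap_bound_holds(x: float, y: float) -> bool:
    """Check |sqrt(x) - sqrt(y)| <= sqrt(|x - y|) up to rounding error."""
    return abs(math.sqrt(x) - math.sqrt(y)) <= math.sqrt(abs(x - y)) + 1e-12


# Quotients at h = 10^-1, ..., 10^-6 grow like 10^(k/2): not Lipschitz.
quotients = [sqrt_quotient(10.0 ** -k) for k in range(1, 7)]
print(quotients)

# The square-root bound holds for every pair on a grid of [0, 1].
grid = [i / 50 for i in range(51)]
bound_ok = all(sqrt_gap_bound_holds(x, y) for x in grid for y in grid)
print(bound_ok)
```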

Composition and Algebraic Properties

Continuous functions compose: if $f$ is continuous at $a$ and $g$ is continuous at $f(a)$, then $g \circ f$ is continuous at $a$. Sums of continuous functions are continuous, as are products and quotients (where the denominator is nonzero) of continuous real-valued functions.

For Lipschitz functions, the composition rule is quantitative. If $f$ is $L_f$-Lipschitz and $g$ is $L_g$-Lipschitz, then $g \circ f$ is $L_f L_g$-Lipschitz. This multiplicative blowup is why deep networks can have large Lipschitz constants: the bound multiplies across layers.
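The multiplicative rule can be sketched on a toy bias-free ReLU network. All names below are hypothetical, and the $\ell_\infty$ norm is an assumption made here because its induced operator norm (the maximum absolute row sum) is computable without any linear algebra library. Since ReLU is 1-Lipschitz, the product of the layers' operator norms upper-bounds the Lipschitz constant of the whole network; the sampled ratio never exceeds it, though the bound is often loose.

```python
import random


def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, t) for t in v]


def inf_norm(v):
    return max(abs(t) for t in v)


def op_norm_inf(W):
    """Operator norm induced by the l-infinity norm: max absolute row sum."""
    return max(sum(abs(w) for w in row) for row in W)


def network(layers, x):
    """Apply x -> relu(W x) for each weight matrix W in turn."""
    for W in layers:
        x = relu(matvec(W, x))
    return x


rng = random.Random(0)
dim, depth = 4, 5
layers = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
          for _ in range(depth)]

# Composition rule: the per-layer constants multiply.
bound = 1.0
for W in layers:
    bound *= op_norm_inf(W)

# Largest observed ratio over random input pairs (a lower bound on the true constant).
worst = 0.0
for _ in range(2000):
    x = [rng.uniform(-1, 1) for _ in range(dim)]
    y = [rng.uniform(-1, 1) for _ in range(dim)]
    num = inf_norm([a - b for a, b in zip(network(layers, x), network(layers, y))])
    den = inf_norm([a - b for a, b in zip(x, y)])
    if den > 0:
        worst = max(worst, num / den)

print(worst, bound)
```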

Main Theorems

Theorem

Extreme Value Theorem

Statement

If $f: S \to \mathbb{R}$ is continuous and $S \subseteq \mathbb{R}^n$ is compact and nonempty, then $f$ attains its maximum and minimum on $S$. That is, there exist $x_{\min}, x_{\max} \in S$ such that:

$$f(x_{\min}) \leq f(x) \leq f(x_{\max}) \quad \text{for all } x \in S$$

Intuition

A continuous function on a closed and bounded set cannot "escape to infinity" or "approach but never reach" its supremum. Compactness traps sequences and continuity preserves limits.

Proof Sketch

Since $f$ is continuous and $S$ is compact, $f(S)$ is compact in $\mathbb{R}$ (the continuous image of a compact set is compact). A nonempty compact subset of $\mathbb{R}$ is closed and bounded, so it contains its supremum and infimum.

Why It Matters

This theorem guarantees that optimization problems over compact sets have solutions. When you minimize a continuous loss over a closed and bounded parameter space, a minimizer exists. Without compactness, minimizers may not exist: $\inf_{x > 0} 1/x = 0$ but no $x > 0$ achieves it.

Failure Mode

Fails without compactness. On the open interval $(0,1)$, the function $f(x) = 1/x$ is continuous but has no maximum. It can also fail without continuity, although discontinuity alone does not force failure: the indicator function $\mathbf{1}_{\{0\}}$ on $[-1,1]$ is discontinuous yet attains its maximum, whereas the function on $[0,1]$ with $f(x) = x$ for $x < 1$ and $f(1) = 0$ has supremum 1 and never attains it.
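Both failure modes show up when maximizing over ever finer grids. In the sketch below, `f_open` is $1/x$ on $(0,1)$, and `f_jump` is a hypothetical discontinuous function on $[0,1]$ that equals $x$ for $x < 1$ and drops to 0 at $x = 1$. Grid maxima of `f_open` grow without bound, and grid maxima of `f_jump` creep toward 1 without reaching it.

```python
def f_open(x: float) -> float:
    """Continuous on the non-compact interval (0, 1)."""
    return 1.0 / x


def f_jump(x: float) -> float:
    """Discontinuous on the compact interval [0, 1]: jumps down at x = 1."""
    return x if x < 1.0 else 0.0


open_maxima = []
jump_maxima = []
for n in (10, 100, 1000, 10000):
    open_grid = [k / n for k in range(1, n)]        # interior points of (0, 1)
    closed_grid = [k / n for k in range(0, n + 1)]  # includes both endpoints
    open_maxima.append(max(f_open(x) for x in open_grid))
    jump_maxima.append(max(f_jump(x) for x in closed_grid))

print(open_maxima)  # increases with n: no maximum exists
print(jump_maxima)  # approaches 1 from below: supremum not attained
```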

Theorem

Heine-Cantor Theorem

Statement

If $f: S \to \mathbb{R}^m$ is continuous and $S \subseteq \mathbb{R}^n$ is compact, then $f$ is uniformly continuous on $S$.

Intuition

On a compact set, continuity cannot degrade from point to point. The worst-case $\delta$ over all points in $S$ is still positive because $S$ has no "escape to infinity" and no missing boundary point near which the admissible $\delta$ might shrink to zero.

Proof Sketch

Suppose $f$ is not uniformly continuous. Then there exist $\epsilon > 0$ and sequences $x_k, y_k \in S$ with $\|x_k - y_k\| \to 0$ but $\|f(x_k) - f(y_k)\| \geq \epsilon$ for all $k$. By compactness, extract a convergent subsequence $x_{k_j} \to a \in S$. Then $y_{k_j} \to a$ as well. By continuity at $a$, both $f(x_{k_j})$ and $f(y_{k_j})$ converge to $f(a)$, so $\|f(x_{k_j}) - f(y_{k_j})\| \to 0$, contradicting the lower bound $\epsilon$.

Why It Matters

This is why bounded parameter spaces simplify analysis. A continuous loss function on a compact parameter set is automatically uniformly continuous, which makes approximation arguments (like discretizing the parameter space) valid.
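A discretization argument can be sketched as follows. Heine-Cantor only guarantees that some $\delta$ exists for each $\epsilon$; to get an explicit number, this example additionally assumes the loss is $L$-Lipschitz, which is stronger. Every point of $[0,1]$ lies within $h/2$ of a grid point when the spacing is $h$, so the grid minimum exceeds the true minimum by at most $L h / 2$. The loss, its minimizer `THETA_STAR`, and the helper `grid_min` are all hypothetical.

```python
import math

THETA_STAR = 1.0 / math.pi  # true minimizer, deliberately off the grid


def loss(theta: float) -> float:
    """Toy loss on [0, 1] with minimum value 0 at THETA_STAR."""
    return (theta - THETA_STAR) ** 2


# |loss'(theta)| = 2 |theta - THETA_STAR| <= 2 (1 - THETA_STAR) on [0, 1].
LIP = 2.0 * (1.0 - THETA_STAR)


def grid_min(n: int) -> float:
    """Minimum of the loss over the grid {0, 1/n, ..., 1}."""
    return min(loss(k / n) for k in range(n + 1))


for n in (10, 100, 1000):
    gap = grid_min(n) - 0.0       # distance from the true minimum value
    guarantee = LIP * (1.0 / n) / 2.0
    print(n, gap, guarantee, gap <= guarantee)
```

For this smooth loss the actual gap is much smaller than the guarantee; the Lipschitz bound is a worst case.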

Failure Mode

Fails on non-compact domains. $f(x) = x^2$ is continuous on $\mathbb{R}$ but not uniformly continuous: for large $x$, a small change in $x$ produces a large change in $x^2$.
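A short worked computation makes the failure explicit. Fix $\epsilon = 1$ (a choice made here for illustration) and let $\delta > 0$ be arbitrary. Choosing the points below gives a pair closer than $\delta$ whose squares differ by more than 1, so no single $\delta$ works for $\epsilon = 1$:

```latex
\begin{align*}
x &= \tfrac{1}{\delta}, \qquad y = x + \tfrac{\delta}{2}, \qquad |x - y| = \tfrac{\delta}{2} < \delta, \\
|y^2 - x^2| &= |y - x|\,|y + x|
  = \tfrac{\delta}{2}\left(\tfrac{2}{\delta} + \tfrac{\delta}{2}\right)
  = 1 + \tfrac{\delta^2}{4} > 1.
\end{align*}
```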

Common Confusions

Watch Out

Lipschitz constant depends on the norm

The Lipschitz constant of a function depends on which norm you use. A function that is 1-Lipschitz in the $\ell_2$ norm may have a different Lipschitz constant in the $\ell_\infty$ norm. In ML, the $\ell_2$ norm is the default unless stated otherwise.
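A small example, chosen here for illustration, is the coordinate sum $f(x) = \sum_i x_i$ on $\mathbb{R}^n$. Its Lipschitz constant is 1 with respect to $\ell_1$, $\sqrt{n}$ with respect to $\ell_2$, and $n$ with respect to $\ell_\infty$, and all three are attained along the all-ones direction. The sketch below evaluates the three ratios for $n = 4$; the variable names are hypothetical.

```python
import math


def f(x):
    """Coordinate sum, a linear function on R^n."""
    return sum(x)


n = 4
x = [1.0] * n
y = [0.0] * n
diff = [a - b for a, b in zip(x, y)]  # the all-ones direction

gap = abs(f(x) - f(y))
ratio_l1 = gap / sum(abs(d) for d in diff)
ratio_l2 = gap / math.sqrt(sum(d * d for d in diff))
ratio_linf = gap / max(abs(d) for d in diff)

print(ratio_l1, ratio_l2, ratio_linf)  # 1.0 2.0 4.0
```

The same function and the same pair of points give three different ratios, so a stated Lipschitz constant is only meaningful together with its norm.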

Watch Out

Differentiable does not imply Lipschitz

A function can be differentiable everywhere without being Lipschitz. $f(x) = x^2$ on $\mathbb{R}$ is smooth but not Lipschitz because its derivative $f'(x) = 2x$ is unbounded. For a differentiable function, being Lipschitz on $\mathbb{R}^n$ requires that $\|\nabla f\|$ be bounded.

Exercises

Exercise (Core)

Problem

Prove that $f(x) = \|x\|_2$ is 1-Lipschitz on $\mathbb{R}^n$.

Exercise (Advanced)

Problem

Let $f: \mathbb{R}^n \to \mathbb{R}$ be differentiable with $\|\nabla f(x)\| \leq L$ for all $x$. Prove that $f$ is $L$-Lipschitz.


Last reviewed: April 2026
