Continuity in R^n
Epsilon-delta continuity, uniform continuity, and Lipschitz continuity in Euclidean space. Lipschitz constants control how fast function values change and appear throughout optimization and generalization theory.
Why This Matters
Continuity is the minimal regularity condition on functions. Without it, optimization is hopeless: you cannot guarantee that gradient descent approaches a minimum, that a loss function attains its infimum on a compact set, or that small changes in parameters produce small changes in predictions.
Lipschitz continuity is the variant that appears most in ML theory. Generalization bounds for neural networks often depend on the Lipschitz constant of the network. Wasserstein GANs enforce a Lipschitz constraint explicitly.
Core Definitions
Continuity at a Point
A function f : R^n → R^m (where R^n carries the standard Euclidean metric) is continuous at x₀ if for every ε > 0, there exists δ > 0 such that:
‖x − x₀‖ < δ implies ‖f(x) − f(x₀)‖ < ε.
The choice of δ can depend on both ε and the point x₀.
Uniform Continuity
f is uniformly continuous on a set S ⊆ R^n if for every ε > 0, there exists δ > 0 such that for all x, y ∈ S:
‖x − y‖ < δ implies ‖f(x) − f(y)‖ < ε.
The δ depends only on ε, not on the specific points x, y.
Lipschitz Continuity
f is L-Lipschitz on a set S if there exists L ≥ 0 such that for all x, y ∈ S:
‖f(x) − f(y)‖ ≤ L ‖x − y‖.
The smallest such L is the Lipschitz constant of f on S.
The hierarchy is strict: Lipschitz implies uniformly continuous implies continuous. The converses fail. The function f(x) = √x on [0, 1] is uniformly continuous but not Lipschitz (its derivative blows up at 0). The function f(x) = x² on R is continuous but not uniformly continuous.
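The failure of the Lipschitz condition for √x can be seen numerically. A minimal sketch (the helper `lipschitz_ratio` is a name introduced here for illustration): the difference quotients of √x for point pairs approaching 0 grow without bound, so no finite L can dominate them all.

```python
import numpy as np

def lipschitz_ratio(f, x, y):
    """Difference quotient |f(x) - f(y)| / |x - y|: any Lipschitz constant must bound this."""
    return abs(f(x) - f(y)) / abs(x - y)

f = np.sqrt  # uniformly continuous on [0, 1], but not Lipschitz

# The ratios blow up as the pair approaches 0: no finite L works.
for eps in [1e-2, 1e-4, 1e-6]:
    print(f"ratio for pair ({eps:.0e}, {2 * eps:.0e}): {lipschitz_ratio(f, eps, 2 * eps):.1f}")
```

Each factor-of-100 step toward 0 multiplies the ratio by about 10, mirroring the unbounded derivative 1/(2√x).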
Composition and Algebraic Properties
Continuous functions compose: if g is continuous at x₀ and f is continuous at g(x₀), then f ∘ g is continuous at x₀. Sums, products, and quotients (where the denominator is nonzero) of continuous functions are continuous.
For Lipschitz functions, the composition rule is quantitative. If f is L_f-Lipschitz and g is L_g-Lipschitz, then f ∘ g is (L_f · L_g)-Lipschitz. This multiplicative blowup is why deep networks can have large Lipschitz constants: each layer multiplies.
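A minimal sketch of this multiplicative bound, using a toy fully connected ReLU network with random weights (all names here are illustrative, not from a real library): each linear layer x ↦ Wx is ‖W‖₂-Lipschitz in the ℓ₂ norm and ReLU is 1-Lipschitz, so the product of the spectral norms upper-bounds the network's Lipschitz constant.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)  # 1-Lipschitz activation

# Toy network: 32 -> 64 -> 64 -> 10, ReLU between linear layers.
weights = [rng.standard_normal((64, 32)),
           rng.standard_normal((64, 64)),
           rng.standard_normal((10, 64))]

def net(x, weights):
    for W in weights[:-1]:
        x = relu(W @ x)
    return weights[-1] @ x

# Composition rule: the product of the per-layer spectral norms is an upper
# bound on the whole network's l2 Lipschitz constant.
upper = np.prod([np.linalg.norm(W, 2) for W in weights])

# Empirical check: random input pairs never violate the bound.
for _ in range(100):
    x, y = rng.standard_normal(32), rng.standard_normal(32)
    ratio = np.linalg.norm(net(x, weights) - net(y, weights)) / np.linalg.norm(x - y)
    assert ratio <= upper
print(f"upper bound on Lipschitz constant: {upper:.1f}")
```

The bound is typically loose in practice (the worst-case directions of the layers rarely align), but it is exactly the composition rule applied layer by layer.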
Main Theorems
Extreme Value Theorem
Statement
If f : K → R is continuous and K ⊆ R^n is compact and nonempty, then f attains its maximum and minimum on K. That is, there exist x_min, x_max ∈ K such that:
f(x_min) ≤ f(x) ≤ f(x_max) for all x ∈ K.
Intuition
A continuous function on a closed and bounded set cannot "escape to infinity" or "approach but never reach" its supremum. Compactness traps sequences and continuity preserves limits.
Proof Sketch
Since f is continuous and K is compact, f(K) is compact in R (the continuous image of a compact set is compact). A compact subset of R is closed and bounded, so it contains its supremum and infimum.
Why It Matters
This theorem guarantees that optimization problems over compact sets have solutions. When you minimize a continuous loss over a compact (closed and bounded) parameter set, a minimizer exists. Without compactness, minimizers may not exist: inf_{x ∈ R} e^(−x) = 0, but no x achieves it.
Failure Mode
Fails without compactness. On the open interval (0, 1), the function f(x) = x is continuous but has no maximum. Fails without continuity: the indicator function of [0, 1] on R achieves its max, but a discontinuous function in general need not (for example, f(x) = x on [0, 1) with f(1) = 0 has supremum 1 but no maximum on [0, 1]).
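A numerical illustration of the compactness requirement, using the example function f(x) = e^(−x) (chosen here for illustration): on each compact interval [0, B] the Extreme Value Theorem guarantees an attained minimum, while over all of [0, ∞) the infimum 0 is never reached.

```python
import numpy as np

def f(x):
    return np.exp(-x)  # continuous; inf over [0, infinity) is 0, but unattained

# On each compact interval [0, B] the minimum exists and is attained
# (here at the right endpoint, since f is strictly decreasing).
for B in [1.0, 10.0, 100.0]:
    grid = np.linspace(0.0, B, 10_001)
    i = f(grid).argmin()
    print(f"min over [0, {B:g}]: {f(grid)[i]:.3e}, attained at x = {grid[i]:g}")

# As B grows the minima approach the infimum 0, but no single point of
# [0, infinity) achieves it: compactness is what guarantees attainment.
```

The grid evaluation is of course only a proxy for the true minimum, but for this monotone function the grid minimum and the exact minimum f(B) coincide at the endpoint.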
Heine-Cantor Theorem
Statement
If f : K → R^m is continuous and K ⊆ R^n is compact, then f is uniformly continuous on K.
Intuition
On a compact set, continuity cannot degrade from point to point. The worst-case δ over all points in K is still positive because K has no "escape to infinity" where the modulus of continuity might shrink to zero.
Proof Sketch
Suppose f is not uniformly continuous. Then there exists ε > 0 and sequences (x_k), (y_k) in K with ‖x_k − y_k‖ → 0 but ‖f(x_k) − f(y_k)‖ ≥ ε. By compactness, extract a convergent subsequence x_{k_j} → x*. Then y_{k_j} → x* as well. By continuity at x*, ‖f(x_{k_j}) − f(y_{k_j})‖ → 0, contradicting ‖f(x_{k_j}) − f(y_{k_j})‖ ≥ ε.
Why It Matters
This is why compact parameter spaces simplify analysis. A continuous loss function on a compact parameter set is automatically uniformly continuous, which makes approximation arguments (like discretizing the parameter space) valid.
Failure Mode
Fails on non-compact domains. f(x) = x² is continuous on R but not uniformly continuous: for large x, a small change in x produces a large change in x².
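A quick sketch of this failure: for f(x) = x², a fixed input perturbation δ produces an output change of roughly 2xδ, which grows without bound in x, so no single δ can serve every point of R for a given ε.

```python
def f(x):
    return x * x  # continuous on R, but not uniformly continuous

delta = 1e-3  # one fixed small input perturbation
# The output change |f(x + delta) - f(x)| = 2*x*delta + delta**2 grows
# linearly in x, so uniform continuity fails on the whole real line.
for x in [1.0, 1e3, 1e6]:
    print(f"x = {x:g}: |f(x + delta) - f(x)| = {abs(f(x + delta) - f(x)):.4f}")
```

Restricting to any compact interval [−B, B] bounds the output change by (2B + δ)δ, which is exactly the Heine-Cantor guarantee in action.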
Common Confusions
Lipschitz constant depends on the norm
The Lipschitz constant of a function depends on which norm you use. A function that is 1-Lipschitz in the ℓ₂ norm may have a different Lipschitz constant in the ℓ∞ norm. In ML, the ℓ₂ norm is the default unless stated otherwise.
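For a concrete check, consider a linear map f(x) = Ax: its best Lipschitz constant with respect to a vector norm is the induced operator norm of A, and different norms give different values. A minimal sketch with an arbitrary 2×2 matrix chosen for illustration:

```python
import numpy as np

# For f(x) = A x, the tightest Lipschitz constant in a given vector norm
# is the corresponding induced operator norm of A.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

lip_l2 = np.linalg.norm(A, 2)         # spectral norm: largest singular value
lip_linf = np.linalg.norm(A, np.inf)  # induced infinity-norm: max absolute row sum

print(f"Lipschitz constant in l2:   {lip_l2:.3f}")
print(f"Lipschitz constant in linf: {lip_linf:.3f}")
```

For this matrix the ℓ₂ constant is the golden ratio (1 + √5)/2 ≈ 1.618, while the ℓ∞ constant is 2: same function, different Lipschitz constants.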
Differentiable does not imply Lipschitz
A function can be differentiable everywhere without being Lipschitz. f(x) = x² on R is smooth but not Lipschitz because its derivative is unbounded. For differentiable f, being Lipschitz on R requires that f′ be bounded.
Exercises
Problem
Prove that f(x) = ‖x‖ is 1-Lipschitz on R^n.
Problem
Let f : R → R be differentiable with |f′(x)| ≤ L for all x. Prove that f is L-Lipschitz.
References
Canonical:
- Rudin, Principles of Mathematical Analysis (1976), Chapters 4 and 7
- Apostol, Mathematical Analysis (1974), Chapter 4
- Folland, Real Analysis (1999), Chapter 4 (continuity and topology)
Current:
- Shalev-Shwartz & Ben-David, Understanding Machine Learning (2014), Section 26.1 (Lipschitz conditions in generalization)
- Vershynin, High-Dimensional Probability (2018), Section 5.2.2 (Lipschitz functions and concentration)
- Deisenroth, Faisal, Ong, Mathematics for Machine Learning (2020), Section 5.1 (continuity in the context of differentiation)
Last reviewed: April 2026