Foundations
Zermelo-Fraenkel Set Theory
The ZFC axioms form the standard foundation for mathematics. Extensionality, empty set, pairing, union, power set, infinity, separation, replacement, foundation, and choice avoid the known paradoxes of naive set theory while remaining expressive enough for nearly all of modern mathematics.
Why This Matters
Almost all of mathematics is built inside ZFC. When you use measure theory to define probability distributions, when you invoke the axiom of choice to prove the Hahn-Banach theorem in functional analysis, or when you construct the real numbers as Dedekind cuts, you are working in ZFC.
For ML theory, the connection is indirect but load-bearing. Probability theory requires measure theory. Measure theory requires the real numbers. The real numbers require the axiom of infinity and the power set axiom. Functional analysis (used in kernel methods, RKHS, optimization) relies on the axiom of choice for its general theorems, such as Hahn-Banach for arbitrary normed spaces. If you want to understand why these tools work, you need to know what axioms they rest on.
The Axioms
Axiom of Extensionality
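Two sets are equal if and only if they have the same elements:
$$\forall A\,\forall B\,\big[A = B \leftrightarrow \forall x\,(x \in A \leftrightarrow x \in B)\big]$$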
A set is determined entirely by its elements, not by how it is described.
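Python's `frozenset` compares by elements in the same way. This makes it a convenient, if limited, stand-in for finite sets. Below is a minimal illustration, with integers standing in for sets:

```python
# Three different descriptions of the same collection of elements.
by_listing = frozenset({0, 2, 4})
by_rule = frozenset(n for n in range(6) if n % 2 == 0)  # filter a range
with_repeats = frozenset([4, 0, 2, 2, 0])               # order and repeats are ignored

print(by_listing == by_rule == with_repeats)  # True: same elements, same set
```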
Axiom of Empty Set
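There exists a set with no elements:
$$\exists E\,\forall x\,(x \notin E)$$
By extensionality this set is unique. It is written $\emptyset$.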
Axiom of Pairing
For any two sets $a$ and $b$, there exists a set $\{a, b\}$ whose only elements are $a$ and $b$.
Axiom of Union
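For any set $\mathcal{F}$, there exists a set $\bigcup \mathcal{F}$ whose elements are exactly the elements of elements of $\mathcal{F}$:
$$x \in \bigcup \mathcal{F} \iff \exists A\,(A \in \mathcal{F} \wedge x \in A)$$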
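The sketch below shows pairing and union in action. It assumes hereditarily finite sets are modelled as nested Python `frozenset`s, which are hashable and can therefore be elements of other frozensets. The helper names `pair`, `big_union`, and `binary_union` are illustrative, not from any library. The sketch also shows that binary union $a \cup b$ needs no separate axiom, because it equals $\bigcup \{a, b\}$.

```python
def pair(a, b):
    """Pairing: the set whose only elements are a and b (just {a} when a == b)."""
    return frozenset({a, b})


def big_union(family):
    """Union: the set of all elements of elements of `family`."""
    result = frozenset()
    for member in family:
        result = result | member  # frozenset | frozenset gives a frozenset
    return result


def binary_union(a, b):
    """a ∪ b, obtained as the union of the pair {a, b}."""
    return big_union(pair(a, b))


empty = frozenset()
one = pair(empty, empty)  # {∅}
two = pair(empty, one)    # {∅, {∅}}

print(len(one), len(two))             # 1 2
print(binary_union(one, two) == two)  # True, since {∅} ⊆ {∅, {∅}}
```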
Axiom of Power Set
For any set $A$, there exists the power set $\mathcal{P}(A) = \{B : B \subseteq A\}$, whose elements are exactly the subsets of $A$.
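For a finite set, the power set can be enumerated directly. The sketch assumes the same nested-`frozenset` model of finite sets. `power_set` is an illustrative helper built on the standard `itertools.combinations`.

```python
from itertools import chain, combinations


def power_set(a):
    """Every subset of the finite set `a`, returned as a frozenset of frozensets."""
    elements = list(a)
    subsets = chain.from_iterable(
        combinations(elements, k) for k in range(len(elements) + 1)
    )
    return frozenset(frozenset(s) for s in subsets)


empty = frozenset()
one = frozenset({empty})       # {∅}
two = frozenset({empty, one})  # {∅, {∅}}

p = power_set(two)
print(len(p))                    # 4 = 2**2
print(all(s <= two for s in p))  # True: every element is a subset of `two`
```

For finite $A$ this gives $|\mathcal{P}(A)| = 2^{|A|}$. For infinite sets no such enumeration exists, and Cantor's theorem shows that $\mathcal{P}(A)$ is strictly larger than $A$.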
Axiom of Infinity
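There exists an infinite set. Specifically, there exists a set $I$ such that $\emptyset \in I$ and, for every $x \in I$, the successor $x \cup \{x\} \in I$. This set contains $0 = \emptyset$, $1 = \{\emptyset\}$, $2 = \{\emptyset, \{\emptyset\}\}$, and so on. The smallest set with these two properties (the intersection of all such subsets of $I$) serves as the construction of $\mathbb{N}$.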
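The von Neumann construction can be run for finitely many steps. Below is a minimal sketch that again models sets as nested `frozenset`s; `successor` and `von_neumann` are illustrative names. In this construction each natural number is the set of all smaller ones. A program can only ever build finitely many of these stages, which is why the existence of the completed set has to be postulated as an axiom.

```python
def successor(x):
    """Von Neumann successor: S(x) = x ∪ {x}."""
    return x | frozenset({x})


def von_neumann(n):
    """Build the von Neumann natural number n, with 0 = ∅ and n + 1 = S(n)."""
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x


naturals = [von_neumann(k) for k in range(5)]

# Each n is the set {0, ..., n - 1}, so it has exactly n elements ...
print([len(x) for x in naturals])  # [0, 1, 2, 3, 4]
# ... and m < n coincides with m ∈ n.
print(all((naturals[m] in naturals[n]) == (m < n)
          for m in range(5) for n in range(5)))  # True
```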
Axiom Schema of Separation (Comprehension)
For any set $A$ and any property $\varphi(x)$ expressible in the language of set theory, there exists the set $\{x \in A : \varphi(x)\}$. You can form subsets by filtering with a property, but only subsets of an already-existing set.
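Separation behaves like a filter that must be given an existing set to work on. Below is a minimal sketch with integers standing in for sets; `separate` is an illustrative name.

```python
def separate(a, prop):
    """Separation: form {x ∈ a : prop(x)}. The ambient set `a` must already exist."""
    return frozenset(x for x in a if prop(x))


a = frozenset(range(10))  # integers stand in for sets here
evens = separate(a, lambda x: x % 2 == 0)
print(sorted(evens))  # [0, 2, 4, 6, 8]
print(evens <= a)     # True: the result is always a subset of `a`
```

The function cannot be called without an ambient set `a`. This mirrors the restriction that blocks unrestricted comprehension.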
Axiom Schema of Replacement
If $F$ is a definable function (in the language of set theory) and $A$ is a set, then $F[A] = \{F(x) : x \in A\}$ is a set. The image of a set under a definable function is a set.
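Replacement, by contrast with separation, sends each element through a definable function, and the image need not be a subset of the original set. Below is a minimal sketch with integers standing in for sets; `image` is an illustrative name.

```python
def image(a, f):
    """Replacement: the image {f(x) : x ∈ a} of the set `a` under `f`."""
    return frozenset(f(x) for x in a)


a = frozenset(range(5))  # integers stand in for sets here
squares = image(a, lambda x: x * x)
print(sorted(squares))  # [0, 1, 4, 9, 16]: not a subset of `a`
```

The axiom does its real work on infinite sets. For example, it is what lets one collect $\omega, \mathcal{P}(\omega), \mathcal{P}(\mathcal{P}(\omega)), \ldots$ into a single set.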
Axiom of Foundation (Regularity)
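Every nonempty set $A$ contains an element $y$ such that $y \cap A = \emptyset$. This prevents circular membership chains like $a \in b \in a$. It also rules out a set being a member of itself: if $a \in a$, then $a \in a \cap \{a\}$, so $\{a\}$ has no element disjoint from it.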
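For hereditarily finite sets, the $\in$-minimal element that Foundation promises can be found by search. Below is a minimal sketch in the nested-`frozenset` model; `epsilon_minimal` is an illustrative name. The model has a matching property: a Python `frozenset` cannot contain itself, because its elements must exist before it is built.

```python
def epsilon_minimal(a):
    """Return some y ∈ a with y ∩ a = ∅, as Foundation guarantees for nonempty a."""
    for y in a:
        if not (y & a):  # y and a share no elements
            return y
    raise ValueError("no ∈-minimal element (impossible for hereditarily finite sets)")


empty = frozenset()
one = frozenset({empty})       # 1 = {0}
two = frozenset({empty, one})  # 2 = {0, 1}

print(epsilon_minimal(frozenset({one, two})) == one)  # True: {0} ∩ {1, 2} = ∅
```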
Axiom of Choice (AC)
For any collection of nonempty sets, there exists a function that selects one element from each set. Formally: if $\mathcal{F}$ is a family of nonempty sets, there exists a function $f : \mathcal{F} \to \bigcup \mathcal{F}$ with $f(A) \in A$ for all $A \in \mathcal{F}$.
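No axiom is needed when each set comes with an explicit rule for picking an element, because the rule itself is the choice function. Below is a minimal sketch for sets of integers that takes the least element; `choice_function` is an illustrative name. For finite families, ZF alone proves that a choice function exists. AC matters for infinite families with no such rule, such as the family of all nonempty subsets of $\mathbb{R}$.

```python
def choice_function(family):
    """Map each set in `family` (nonempty finite sets of integers) to its least element.

    The rule "take the minimum" is explicit, so this needs no appeal to AC.
    """
    if any(len(s) == 0 for s in family):
        raise ValueError("every set in the family must be nonempty")
    return {s: min(s) for s in family}


family = {frozenset({3, 1, 2}), frozenset({5}), frozenset({4, 6})}
f = choice_function(family)
print(all(f[s] in s for s in family))  # True: f(A) ∈ A for every A
```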
Why These Axioms
Russell's paradox (1901) showed that naive set theory is inconsistent. Consider $R = \{x : x \notin x\}$. Then $R \in R \iff R \notin R$, a contradiction.
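ZFC avoids this by restricting set formation. Separation only forms subsets of existing sets, so you cannot form "the set of all sets" or "the set of all sets not containing themselves." You can only filter elements from a set you already have. Applied to a set $A$, the Russell construction yields $R_A = \{x \in A : x \notin x\}$, and the same argument now proves only that $R_A \notin A$. In particular, no set contains every set.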
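Restricted to an existing set $A$, Russell's construction gives $R_A = \{x \in A : x \notin x\}$, and the argument then shows that no element of $A$ has exactly the members of $R_A$. This can be checked mechanically on tiny structures. The sketch treats a "membership relation" as any set of pairs $(x, y)$, read as $x \in y$, on a universe of `size` objects. It then exhaustively tests every relation and every choice of $A$. `russell_check` is an illustrative name, and this is a finite test, not a proof.

```python
from itertools import product


def russell_check(size):
    """Check every membership relation on `size` objects; return the cases checked."""
    universe = range(size)
    pairs = [(x, y) for x in universe for y in universe]
    cases = 0
    for bits in product([False, True], repeat=len(pairs)):
        mem = {p for p, b in zip(pairs, bits) if b}  # (x, y) in mem means x ∈ y

        def members(y):
            return frozenset(x for x in universe if (x, y) in mem)

        for a in universe:
            r_a = frozenset(x for x in members(a) if (x, x) not in mem)
            # If some y ∈ a had members(y) == r_a, then y ∈ y <-> y ∈ r_a <-> y ∉ y.
            assert all(members(y) != r_a for y in members(a))
            cases += 1
    return cases


print(russell_check(3))  # 1536 = 2**9 relations times 3 choices of a
```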
Main Theorems
Equivalence of AC, Zorn's Lemma, and Well-Ordering
Statement
The following are equivalent over ZF:
- Axiom of Choice: every family of nonempty sets has a choice function.
- Zorn's Lemma: every partially ordered set in which every chain has an upper bound contains a maximal element.
- Well-Ordering Theorem: every set can be well-ordered, that is, given a total order in which every nonempty subset has a least element.
Intuition
All three say the same thing in different mathematical languages. Choice says you can pick elements. Zorn says you can find maximal things. Well-ordering says you can line everything up. Each is useful in different contexts: choice in analysis, Zorn in algebra (proving every vector space has a basis), well-ordering in set theory.
Proof Sketch
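AC ⇒ Well-Ordering: given a set $A$, use a choice function on the nonempty subsets of $A$ to pick the "next" element from those not yet listed, and continue by transfinite recursion until $A$ is exhausted.

Well-Ordering ⇒ Zorn: suppose a poset $P$ satisfies Zorn's hypothesis but has no maximal element. Then every chain has a strict upper bound, and a fixed well-ordering of $P$ selects a specific one. Transfinite recursion then builds a strictly increasing sequence in $P$ indexed by all the ordinals, which is impossible for a set (Hartogs' lemma).

Zorn ⇒ AC: consider the partially ordered set of partial choice functions, ordered by extension. Every chain has an upper bound (its union). By Zorn, there is a maximal partial choice function, and maximality forces it to be total. If it skipped some set $X$ in the family, extending it by any element of $X$ would give a larger one.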
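The AC ⇒ Well-Ordering step can be illustrated in the finite case, where the recursion is an ordinary loop. In the sketch below, `choose` plays the role of a choice function on nonempty subsets, and the order in which elements are picked is the well-ordering. `well_order_from_choice` is an illustrative name. For infinite sets the loop becomes a transfinite recursion over the ordinals, which no program can run.

```python
def well_order_from_choice(s, choose):
    """List the finite set `s` by repeatedly applying `choose` to what remains.

    Position in the returned list is the well-order: earlier means smaller.
    """
    remaining = set(s)
    order = []
    while remaining:
        x = choose(frozenset(remaining))
        assert x in remaining, "a choice function must pick from the set it is given"
        order.append(x)
        remaining.remove(x)
    return order


print(well_order_from_choice(frozenset({3, 1, 4, 5, 9, 2, 6}), max))
# [9, 6, 5, 4, 3, 2, 1]
```

With `choose=max` the result lists the elements in decreasing order. This shows that the well-ordering depends on the choice function, not on any order the set already carries.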
Why It Matters
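The axiom of choice is used throughout mathematics, often without explicit mention. The theorem that every vector space has a basis, the Hahn-Banach theorem for arbitrary normed spaces, and Tychonoff's theorem for arbitrary products all go beyond ZF. Over ZF, the basis theorem and Tychonoff's theorem are each equivalent to AC, while Hahn-Banach needs a strictly weaker form of it. In ML, the existence of measurable selection functions and the completeness of certain function spaces rely on choice principles, often only weak ones such as countable or dependent choice.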
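For a finite spanning set no choice is needed. One greedy pass keeps each vector that is independent of those already kept, which produces a maximal linearly independent subset and therefore a basis of the span. Zorn's lemma replaces this pass when no explicit enumeration is available, for instance for a basis of $\mathbb{R}$ as a vector space over $\mathbb{Q}$. Below is a minimal sketch over $\mathrm{GF}(2)$. It assumes vectors are encoded as Python integers read as bit vectors; `maximal_independent_subset` is an illustrative name.

```python
def maximal_independent_subset(vectors):
    """Keep each vector that is independent of those kept so far (over GF(2)).

    Vectors are ints read as bit vectors, and addition over GF(2) is XOR.
    `pivots` maps a leading-bit position to a stored, partially reduced vector.
    """
    pivots = {}
    kept = []
    for v in vectors:
        r = v
        # Reduce r against stored pivots until its leading bit has no pivot.
        while r:
            lead = r.bit_length() - 1
            if lead not in pivots:
                break
            r ^= pivots[lead]
        if r:  # nonzero remainder: v is not in the span of the kept vectors
            pivots[r.bit_length() - 1] = r
            kept.append(v)
    return kept


print(maximal_independent_subset([0b011, 0b101, 0b110, 0b001]))  # [3, 5, 1]
```

Here `0b110` is dropped because it equals `0b011 ^ 0b101`.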
Failure Mode
AC has non-constructive consequences. It implies the existence of non-measurable sets (Vitali sets), which is why not every subset of $\mathbb{R}$ is Lebesgue measurable. It also implies the Banach-Tarski paradox: a solid ball in $\mathbb{R}^3$ can be decomposed into finitely many pieces and reassembled, using only rotations and translations, into two balls each the same size as the original. These consequences are theorems of ZFC, but they have no physical counterpart, because the pieces involved are not measurable and so have no well-defined volume.
What ZFC Does for ML
| ML concept | ZFC dependency |
|---|---|
| Real numbers | Axiom of infinity + power set (Dedekind cuts or Cauchy sequences) |
| Probability measures | Separation (sigma-algebras), power set (Borel sets) |
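| Existence of conditional expectations | Choice, in standard proofs usually only countable or dependent choice (Radon-Nikodym theorem) |
| Every vector space has a basis | Axiom of choice (Zorn's lemma) |
| RKHS completeness | Choice, in standard proofs usually only countable or dependent choice (completeness of spaces) |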
You do not need to think about ZFC when training a neural network. But if you want to prove that your loss function has a minimizer, or that a certain function space is complete, or that conditional expectations exist, you are relying on these axioms.
Common Confusions
Separation is not unrestricted comprehension
Naive set theory allows $\{x : \varphi(x)\}$ for any property $\varphi$. ZFC only allows $\{x \in A : \varphi(x)\}$ for an existing set $A$. This restriction is what prevents Russell's paradox. You cannot form the set of all sets; you can only form subsets of sets you already have.
The axiom of choice is not obviously true or obviously false
AC is independent of ZF: if ZF is consistent, then so are ZF + AC (Gödel, 1938) and ZF + ¬AC (Cohen, 1963). Most working mathematicians accept AC because it yields clean theorems (every vector space has a basis, products of compact spaces are compact). But its non-constructive nature means it asserts existence without providing an explicit construction.
Exercises
Problem
Explain why Russell's paradox does not arise in ZFC. Which axiom prevents it?
Problem
Prove, using the axioms of ZFC, that the set of natural numbers exists. Which axioms do you need?
Problem
Give an example of a mathematical result commonly used in ML that depends on the axiom of choice. State exactly where AC enters the proof.
References
Canonical:
- Enderton, Elements of Set Theory (1977), Chapters 1-7
- Kunen, Set Theory: An Introduction to Independence Proofs (1980), Chapter 1
Accessible:
- Halmos, Naive Set Theory (1960), full book
- Munkres, Topology (2000), Chapter 1 (set theory review)
Next Topics
- Cantor's theorem and uncountability: the first deep consequence of the power set axiom
- Measure-theoretic probability: where these axioms become load-bearing for ML
Last reviewed: April 2026