Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.Disclaimer

Foundations

Zermelo-Fraenkel Set Theory

The ZFC axioms form the standard foundation for mathematics. Extensionality, pairing, union, power set, infinity, separation, replacement, choice, and foundation prevent paradoxes while being expressive enough for all of modern mathematics.

CoreTier 2Stable~50 min
0

Why This Matters

Almost all of mathematics is built inside ZFC. When you use measure theory to define probability distributions, when you invoke the axiom of choice to prove the Hahn-Banach theorem in functional analysis, when you construct the real numbers as Dedekind cuts: you are working in ZFC.

For ML theory, the connection is indirect but load-bearing. Probability theory requires measure theory. Measure theory requires the real numbers. The real numbers require the axiom of infinity and the power set axiom. Functional analysis (used in kernel methods, RKHS, optimization) requires the axiom of choice. If you want to understand why these tools work, you need to know what axioms they rest on.

The Axioms

Definition

Axiom of Extensionality

Two sets are equal if and only if they have the same elements:

AB[x(xA    xB)    A=B]\forall A \, \forall B \, [\forall x \, (x \in A \iff x \in B) \implies A = B]

A set is determined entirely by its elements, not by how it is described.

Definition

Axiom of Empty Set

There exists a set with no elements:

x(x)\exists \emptyset \, \forall x \, (x \notin \emptyset)

Definition

Axiom of Pairing

For any two sets aa and bb, there exists a set {a,b}\{a, b\} whose only elements are aa and bb.

Definition

Axiom of Union

For any set AA, there exists a set A\bigcup A whose elements are exactly the elements of elements of AA:

xA    B(BAxB)x \in \bigcup A \iff \exists B \, (B \in A \land x \in B)

Definition

Axiom of Power Set

For any set AA, there exists the power set P(A)={B:BA}\mathcal{P}(A) = \{B : B \subseteq A\}.

Definition

Axiom of Infinity

There exists an infinite set. Specifically, there exists a set II such that I\emptyset \in I and for every xIx \in I, the successor x{x}Ix \cup \{x\} \in I. This set contains ,{},{,{}},\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}, \ldots, which serves as the construction of N\mathbb{N}.

Definition

Axiom Schema of Separation (Comprehension)

For any set AA and any property ϕ(x)\phi(x) expressible in the language of set theory, there exists the set {xA:ϕ(x)}\{x \in A : \phi(x)\}. You can form subsets by filtering with a property, but only subsets of an already-existing set.

Definition

Axiom Schema of Replacement

If FF is a definable function (in the language of set theory) and AA is a set, then {F(x):xA}\{F(x) : x \in A\} is a set. The image of a set under a definable function is a set.

Definition

Axiom of Foundation (Regularity)

Every nonempty set AA contains an element xx such that Ax=A \cap x = \emptyset. This prevents circular membership chains like abaa \in b \in a and rules out a set being a member of itself.

Definition

Axiom of Choice (AC)

For any collection of nonempty sets, there exists a function that selects one element from each set. Formally: if {Ai}iI\{A_i\}_{i \in I} is a family of nonempty sets, there exists a function f:IiIAif: I \to \bigcup_{i \in I} A_i with f(i)Aif(i) \in A_i for all ii.

Why These Axioms

Russell's paradox (1901) showed that naive set theory is inconsistent. Consider R={x:xx}R = \{x : x \notin x\}. Then RR    RRR \in R \iff R \notin R, a contradiction.

ZFC avoids this by restricting set formation. Separation only forms subsets of existing sets, so you cannot form "the set of all sets" or "the set of all sets not containing themselves." You can only filter elements from a set you already have.

Main Theorems

Theorem

Equivalence of AC, Zorn's Lemma, and Well-Ordering

Statement

The following are equivalent over ZF:

  1. Axiom of Choice: every family of nonempty sets has a choice function.
  2. Zorn's Lemma: every partially ordered set in which every chain has an upper bound contains a maximal element.
  3. Well-Ordering Theorem: every set can be well-ordered (given a total order where every nonempty subset has a least element).

Intuition

All three say the same thing in different mathematical languages. Choice says you can pick elements. Zorn says you can find maximal things. Well-ordering says you can line everything up. Each is useful in different contexts: choice in analysis, Zorn in algebra (proving every vector space has a basis), well-ordering in set theory.

Proof Sketch

AC     \implies Well-Ordering: use a choice function to build a well-ordering by transfinite recursion, selecting the "next" element at each step. Well-Ordering     \implies Zorn: given a well-ordering, any chain has an upper bound, and the maximal element argument goes through. Zorn     \implies AC: consider the partially ordered set of partial choice functions (ordered by extension). Every chain has an upper bound (the union). By Zorn, there is a maximal partial choice function, and maximality forces it to be total.

Why It Matters

The axiom of choice is used throughout mathematics, often without explicit mention. Every proof that "every vector space has a basis," every application of the Hahn-Banach theorem, and every invocation of Tychonoff's theorem depends on AC. In ML, the existence of measurable selection functions and the completeness of certain function spaces both rely on AC.

Failure Mode

AC has non-constructive consequences. It implies the existence of non-measurable sets (Vitali sets), which is why not every subset of R\mathbb{R} is Lebesgue measurable. It also implies the Banach-Tarski paradox: a solid ball in R3\mathbb{R}^3 can be decomposed into finitely many pieces and reassembled into two balls of the same size. These consequences are mathematically valid but physically nonsensical.

What ZFC Does for ML

ML conceptZFC dependency
Real numbersAxiom of infinity + power set (Dedekind cuts or Cauchy sequences)
Probability measuresSeparation (sigma-algebras), power set (Borel sets)
Existence of conditional expectationsAxiom of choice (Radon-Nikodym theorem)
Every vector space has a basisAxiom of choice (Zorn's lemma)
RKHS completenessAxiom of choice (completeness of L2L^2 spaces)

You do not need to think about ZFC when training a neural network. But if you want to prove that your loss function has a minimizer, or that a certain function space is complete, or that conditional expectations exist, you are relying on these axioms.

Common Confusions

Watch Out

Separation is not unrestricted comprehension

Naive set theory allows {x:ϕ(x)}\{x : \phi(x)\} for any property ϕ\phi. ZFC only allows {xA:ϕ(x)}\{x \in A : \phi(x)\} for an existing set AA. This restriction is what prevents Russell's paradox. You cannot form the set of all sets; you can only form subsets of sets you already have.

Watch Out

The axiom of choice is not obviously true or obviously false

AC is independent of ZF: you can consistently add either AC or its negation to ZF. Most working mathematicians accept AC because it yields clean theorems (every vector space has a basis, products of compact spaces are compact). But its non-constructive nature means it asserts existence without providing an explicit construction.

Exercises

ExerciseCore

Problem

Explain why Russell's paradox does not arise in ZFC. Which axiom prevents it?

ExerciseAdvanced

Problem

Prove, using the axioms of ZFC, that the set of natural numbers N={0,1,2,}\mathbb{N} = \{0, 1, 2, \ldots\} exists. Which axioms do you need?

ExerciseAdvanced

Problem

Give an example of a mathematical result commonly used in ML that depends on the axiom of choice. State exactly where AC enters the proof.

References

Canonical:

  • Enderton, Elements of Set Theory (1977), Chapters 1-7
  • Kunen, Set Theory: An Introduction to Independence Proofs (1980), Chapter 1

Accessible:

  • Halmos, Naive Set Theory (1960), full book

  • Munkres, Topology (2000), Chapter 1 (set theory review)

Next Topics

Last reviewed: April 2026

Builds on This

Next Topics