Foundations
Sets, Functions, and Relations
The language underneath all of mathematics: sets, Cartesian products, functions, injectivity, surjectivity, equivalence relations, and quotient sets.
Prerequisites
Why This Matters
Every mathematical object in ML theory is built on sets and functions. A hypothesis class is a set. A loss function is a function. Feature maps, kernels, probability measures: all defined in terms of sets, functions, and relations. Without precise language for these, no theorem statement is unambiguous. This page assumes familiarity with basic logic and proof techniques.
Core Definitions
Set
A set is a collection of distinct objects. We write to mean is an element of . The empty set is . Standard constructions: union , intersection , complement , difference .
Cartesian Product
The Cartesian product of sets and is the set of all ordered pairs:
More generally, is the set of -tuples. The space is ( times).
Function
A function assigns to each element exactly one element . The set is the domain, is the codomain. The image of is . The preimage of is .
Injective, Surjective, Bijective
A function is:
- Injective (one-to-one):
- Surjective (onto): for every , there exists with
- Bijective: both injective and surjective. A bijection has an inverse .
Equivalence Relation
A relation on a set is an equivalence relation if it is:
- Reflexive: for all
- Symmetric:
- Transitive: and
The equivalence class of is .
Quotient Set
The quotient set is the set of all equivalence classes . The equivalence classes partition into disjoint, exhaustive subsets.
Power Sets and Cardinality
Power Set
The power set of , written or , is the set of all subsets of :
If , then . The power set always includes and itself. Power sets appear in measure theory when defining sigma-algebras: a sigma-algebra on is a subset of closed under complements and countable unions.
Cardinality
Two sets and have the same cardinality (written ) if there exists a bijection . For finite sets, cardinality is the count of elements. For infinite sets, cardinality distinguishes countable sets (bijectable with ) from uncountable ones (like ). See cardinality and countability for a full treatment.
Function Composition and Inverses
Function Composition
Given and , the composition is defined by . Composition is associative: . It is not commutative in general.
Key facts about composition and injectivity/surjectivity:
- If is injective, then is injective (but need not be).
- If is surjective, then is surjective (but need not be).
- If and are both bijective, then is bijective and .
These facts are used constantly in linear algebra when reasoning about compositions of linear maps and matrix products.
Left and Right Inverses
A function has a left inverse if , and a right inverse if . A function has a left inverse if and only if it is injective (assuming the axiom of choice for the converse). A function has a right inverse if and only if it is surjective (requires the axiom of choice).
Relations and Partial Orders
Relation
A relation from to is a subset . We write to mean . A function is a special case: a relation where each appears in exactly one pair.
Partial Order
A relation on a set is a partial order if it is:
- Reflexive: for all
- Antisymmetric: and implies
- Transitive: and implies
A set with a partial order is called a poset (partially ordered set). A total order is a partial order where every pair is comparable: for all , either or .
Partial orders appear throughout ML theory. The subset relation on is a partial order. The divisibility relation on is a partial order. Lattices (posets where every pair has a least upper bound and greatest lower bound) appear in formal concept analysis and information theory.
Canonical Examples
Injective, surjective, and bijective functions
Consider defined by .
- Not injective: .
- Not surjective: has no preimage.
Restrict the domain: defined by . Now is bijective. The inverse is .
This pattern (restricting domain and codomain to make a function bijective) is used when defining the inverse of activation functions and when constructing normalizing flows in generative models.
Equivalence relations in ML
Define a relation on classifiers: if for all in the training set. This is an equivalence relation (reflexive, symmetric, transitive). The equivalence classes partition the hypothesis class into groups of classifiers that agree on training data but may differ on unseen data. The VC dimension counts how many distinct equivalence classes exist when varying the training set.
Cartesian products in feature spaces
If a model takes both an image feature vector and a text feature vector , the combined feature space is . The Cartesian product formalizes feature concatenation. More generally, a dataset of examples with features lives in , the -fold Cartesian product.
Main Theorems
Cantor's Theorem
Statement
There is no surjection from onto its power set . In particular, for every set .
Intuition
No matter how large a set is, its collection of subsets is strictly larger. This is why is uncountable: and Cantor's theorem gives .
Proof Sketch
Suppose is surjective. Define . Since is surjective, for some . Then , a contradiction.
Why It Matters
Cantor's theorem establishes that infinite sets come in different sizes. This underpins the distinction between countable and uncountable sets, which matters when defining probability measures and hypothesis classes.
Failure Mode
The proof is constructive and has no hidden assumptions. It works for all sets, finite or infinite. The only subtlety: it requires the axiom schema of specification from ZFC to form the set .
Schroeder-Bernstein Theorem
Statement
If there exist injections and , then there exists a bijection . Equivalently, and implies .
Intuition
If each set can be embedded into the other, they have the same size. This seems obvious for finite sets but is nontrivial for infinite sets. The proof constructs the bijection explicitly by partitioning into elements that "originate" from (use ) and elements that "originate" from (use ).
Proof Sketch
Define chains by iterating starting from elements of . An element is "A-originated" if it lies in such a chain and "B-originated" otherwise. Define if is A-originated and if is B-originated (here is well-defined on B-originated elements because they are in the image of ). Verify is a bijection.
Why It Matters
This theorem is the standard tool for proving two infinite sets have equal cardinality. Instead of constructing a bijection directly (often hard), you construct two injections (often easier). It is used in the proof that and in arguments about the cardinality of hypothesis classes.
Failure Mode
The theorem does not extend to partial orders in general. Having and does not imply for arbitrary preorders. The result is specific to cardinality comparisons. Note also that the proof does not require the axiom of choice.
Common Confusions
Preimage is not the inverse function
The notation for preimage does not require to be invertible. Every function has preimages of sets. Only bijections have inverse functions.
Surjective onto the codomain, not the image
Every function is surjective onto its image. Surjectivity is a statement about the codomain. defined by is not surjective because has no preimage, even though maps onto .
Equivalence relations are not partial orders
Both equivalence relations and partial orders are reflexive and transitive. The difference is the second axiom: equivalence relations are symmetric (), while partial orders are antisymmetric ( and implies ). These are mutually exclusive except for the identity relation. In probability, "same distribution" is an equivalence relation; "stochastically dominates" is a partial order.
A function is not its graph
Formally, a function is often defined as its graph together with the requirement that each appears in exactly one pair. But in practice, specifying domain and codomain matters. The function given by and the function given by have the same graph but different properties: is surjective, is not.
Exercises
Problem
Let and . Prove that if is injective, then is injective.
Problem
Let be an equivalence relation on and let be the canonical projection . Prove that is surjective. Under what condition is also injective?
Problem
Let . List all elements of . Verify that .
Problem
Define by if and if . Prove that is a bijection, establishing .
Problem
Let be a function. Prove that for any subsets :
- Give an example showing that equality need not hold in (2). Under what condition on does equality hold?
References
Canonical:
- Halmos, Naive Set Theory (1960), Chapters 1-8. The standard concise introduction.
- Enderton, Elements of Set Theory (1977), Chapters 1-3. More detailed than Halmos.
- Munkres, Topology (2000), Chapter 1. Sets, functions, and relations at the level needed for analysis.
For ML context:
- Shalev-Shwartz & Ben-David, Understanding Machine Learning, Appendix A
- Axler, Linear Algebra Done Right (2024), Chapter 1A. Uses set-theoretic language for vector spaces.
- Billingsley, Probability and Measure (1995), Chapter 1. Sets and sigma-algebras for measure theory.
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Basic Logic and Proof TechniquesLayer 0A
Builds on This
- Arrow's Impossibility TheoremLayer 2
- Cardinality and CountabilityLayer 0A
- Category TheoryLayer 0A
- Common Probability DistributionsLayer 0A
- Computability TheoryLayer 0A
- Differentiation in RnLayer 0A
- Formal Languages and AutomataLayer 0A
- Cryptographic Hash FunctionsLayer 2
- Matrix Operations and PropertiesLayer 0A
- Public-Key CryptographyLayer 2
- Relational AlgebraLayer 1