Foundations
Kolmogorov Probability Axioms
The three axioms (non-negativity, normalization, countable additivity) that every probability claim on this site implicitly invokes. Sample space, event sigma-algebra, probability measure, and the immediate consequences.
Why This Matters
Every probabilistic statement on this site, from a single-parameter Bernoulli likelihood to the convergence guarantee of stochastic gradient descent, rests on three axioms written down by Kolmogorov in 1933. The axioms do not say what probability means; they say what any consistent assignment of probabilities must satisfy. Frequentist long-run frequencies, Bayesian degrees of belief, and classical equally-likely-outcomes interpretations all produce probabilities obeying the same axioms. The interpretation is philosophical; the axioms are mathematical.
Reading this page is what lets every later result be unambiguous. When measure-theoretic probability writes "let $(\Omega, \mathcal{F}, P)$ be a probability space," this page is what that phrase means.
The Three Objects
Probability requires three objects, fixed before any random variable is introduced.
Sample Space
The sample space $\Omega$ is a non-empty set whose elements $\omega \in \Omega$ are called outcomes. An outcome represents a complete specification of one possible result of the random experiment. The set $\Omega$ is the entire space of "what could happen."
Event Sigma-Algebra
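A collection $\mathcal{F}$ of subsets of $\Omega$ is a sigma-algebra (or event space) if:
- $\Omega \in \mathcal{F}$,
- $A \in \mathcal{F} \implies A^c \in \mathcal{F}$ (closed under complements),
- $A_1, A_2, \ldots \in \mathcal{F} \implies \bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$ (closed under countable unions).
Elements of $\mathcal{F}$ are called events. Closure under complements and countable unions automatically gives closure under countable intersections, set differences, and limits.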
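On a finite sample space the closure conditions can be checked by brute force. The sketch below is illustrative only: the helper name `is_sigma_algebra` and the two example families are assumptions made for this example, and the check relies on the fact that on a finite space closure under pairwise unions already gives closure under countable unions.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the sigma-algebra conditions for a family of subsets of a finite omega."""
    omega = frozenset(omega)
    family = {frozenset(s) for s in family}
    # Condition 1: the whole sample space is an event.
    if omega not in family:
        return False
    # Condition 2: closed under complements.
    if any((omega - a) not in family for a in family):
        return False
    # Condition 3: closed under unions. On a finite space, pairwise unions suffice.
    return all((a | b) in family for a, b in combinations(family, 2))

omega = {1, 2, 3, 4}
coarse = [set(), {1, 2}, {3, 4}, omega]   # generated by the partition {1,2}, {3,4}
broken = [set(), {1}, omega]              # missing the complement {2, 3, 4}

print(is_sigma_algebra(omega, coarse))    # True
print(is_sigma_algebra(omega, broken))    # False
```

The second family fails only because the complement of `{1}` is absent, which is the typical way a hand-written event collection goes wrong.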
Probability Measure
A probability measure is a function $P: \mathcal{F} \to [0, 1]$ satisfying the three Kolmogorov axioms below. The triple $(\Omega, \mathcal{F}, P)$ is a probability space.
The reason events live in a sigma-algebra $\mathcal{F}$ rather than the full power set $2^{\Omega}$ is that not every subset of an uncountable $\Omega$ can be assigned a probability consistently. The Vitali set in $[0, 1]$ is not Lebesgue measurable; trying to define a translation-invariant, countably additive $P$ on it gives a contradiction. The sigma-algebra $\mathcal{F}$ records the collection of subsets on which $P$ is defined coherently.
The Three Axioms
Kolmogorov Axioms of Probability
Statement
A function $P: \mathcal{F} \to \mathbb{R}$ is a probability measure if and only if it satisfies:
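- Non-negativity. $P(A) \geq 0$ for all $A \in \mathcal{F}$.
- Normalization. $P(\Omega) = 1$.
- Countable additivity. For any countable collection $A_1, A_2, \ldots \in \mathcal{F}$ of pairwise disjoint events (so $A_i \cap A_j = \emptyset$ for $i \neq j$), $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.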
Intuition
Axiom 1 rules out negative probability. Axiom 2 fixes the total mass at 1 (otherwise we'd be doing measure theory, not probability theory). Axiom 3 says probability behaves like a mass: putting probability on a countable disjoint union is the same as adding the masses on each piece. The choice of countable (not just finite) additivity is what gives probability its analytic strength: it forces continuity properties used in every limit theorem.
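For a finite sample space with every subset taken as an event, the axioms can be verified exhaustively. The following is a minimal sketch under those assumptions; the names `satisfies_axioms`, `P`, and `Q` and the example weights are hypothetical. Countable additivity reduces to additivity over disjoint pairs here, because any disjoint family in a finite space has only finitely many non-empty members.

```python
from fractions import Fraction
from itertools import chain, combinations

def powerset(omega):
    """All subsets of a finite omega, as frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

def satisfies_axioms(omega, P):
    """Check the three axioms for P on the power set of a finite omega."""
    events = powerset(omega)
    non_negative = all(P(a) >= 0 for a in events)        # axiom 1
    normalized = P(frozenset(omega)) == 1                # axiom 2
    # Axiom 3, finite case: additivity over every disjoint pair of events.
    additive = all(P(a | b) == P(a) + P(b)
                   for a in events for b in events if not (a & b))
    return non_negative and normalized and additive

weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
omega = frozenset(weights)

def P(event):
    """Sum of point masses: a genuine probability measure."""
    return sum((weights[w] for w in event), Fraction(0))

def Q(event):
    """Non-negative and normalized, but not additive."""
    return Fraction(len(event), len(omega)) ** 2

print(satisfies_axioms(omega, P))   # True
print(satisfies_axioms(omega, Q))   # False
```

`Q` passes the first two axioms and fails only the third, which shows that additivity is the axiom carrying most of the content.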
Proof Sketch
This is a definition disguised as a theorem; there is nothing to prove. The content is in the consequences below, all of which follow from these three axioms by elementary set manipulation.
Why It Matters
Every result in probability and statistics derives from these three axioms plus the structure of the chosen $(\Omega, \mathcal{F}, P)$. The axioms are deliberately weak: they make no claim about how to assign probabilities to specific events, only about consistency requirements any such assignment must meet. This is what allows Bayesians, frequentists, and decision theorists to share a mathematical foundation while disagreeing on interpretation.
Failure Mode
A function $P$ satisfying only finite additivity (sums for finite disjoint unions) is a finitely additive probability, not a (countably additive) probability measure. Finitely additive probabilities exist on any algebra, but they can fail the continuity properties below, and the dominated convergence theorem is not available for them. Standard probability theory uses countable additivity because the analytic payoff (limit theorems, Lebesgue integration of expectations) is enormous. The cost is that not every subset of an uncountable $\Omega$ is an event.
Immediate Consequences
The next four properties follow directly from the three axioms. Every later page uses them without proof.
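Probability of the empty set: $P(\emptyset) = 0$. Proof. Apply axiom 3 to $A_i = \emptyset$ for all $i$ to get $P(\emptyset) = \sum_{i=1}^{\infty} P(\emptyset)$, forcing $P(\emptyset) = 0$.
Finite additivity: $P(A \cup B) = P(A) + P(B)$ when $A \cap B = \emptyset$. Proof. Apply countable additivity to $A_1 = A$, $A_2 = B$, $A_i = \emptyset$ for $i \geq 3$, using $P(\emptyset) = 0$.
Complement rule: $P(A^c) = 1 - P(A)$. Proof. $A$ and $A^c$ are disjoint with $A \cup A^c = \Omega$, so $P(A) + P(A^c) = P(\Omega) = 1$.
Monotonicity: $A \subseteq B \implies P(A) \leq P(B)$. Proof. Write $B = A \cup (B \setminus A)$ as a disjoint union, so $P(B) = P(A) + P(B \setminus A) \geq P(A)$ by axiom 1.
A useful corollary: $0 \leq P(A) \leq 1$ for every event $A$, by monotonicity applied to $A \subseteq \Omega$. The codomain $[0, 1]$ in the definition is forced by the axioms, not assumed.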
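These consequences can be spot-checked on a concrete space. The sketch below assumes a fair six-sided die with the uniform measure; the event names (`even`, `low`, `two`) are chosen only for illustration, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))   # outcomes of one roll of a fair die

def P(event):
    """Uniform probability on omega: |event| / |omega|."""
    return Fraction(len(event & omega), len(omega))

even = frozenset({2, 4, 6})
low = frozenset({1, 2})
two = frozenset({2})
five = frozenset({5})

assert P(frozenset()) == 0                   # probability of the empty set
assert P(two | five) == P(two) + P(five)     # finite additivity (disjoint events)
assert P(omega - even) == 1 - P(even)        # complement rule
assert two <= even and P(two) <= P(even)     # monotonicity
assert all(0 <= P(e) <= 1 for e in (even, low, two, omega))  # corollary

print(P(even), P(omega - low), P(two))       # 1/2 2/3 1/6
```

A passing run is not a proof, but a failing assertion would indicate that the function being tested is not a probability measure.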
Inclusion-Exclusion
For finite unions of overlapping events, additivity needs a correction.
Inclusion-Exclusion Principle
Statement
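For events $A_1, \ldots, A_n \in \mathcal{F}$, $P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{k=1}^{n} (-1)^{k+1} \sum_{1 \leq i_1 < \cdots < i_k \leq n} P(A_{i_1} \cap \cdots \cap A_{i_k})$.
For $n = 2$: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. For $n = 3$: $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$.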
Intuition
Adding $P(A) + P(B)$ double-counts the overlap, so subtract $P(A \cap B)$. With three sets, the three pairwise overlaps subtract too much from the triple overlap, so add it back. The alternating sign pattern generalizes this bookkeeping to any finite $n$.
Proof Sketch
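Induction on $n$. Base case $n = 2$ follows by writing $A \cup B = A \cup (B \setminus A)$ as a disjoint union and using $P(B \setminus A) = P(B) - P(A \cap B)$. Inductive step: apply the $n = 2$ case to $\bigcup_{i=1}^{n-1} A_i$ and $A_n$, distribute intersections, and collect.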
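The alternating sum can be computed directly and compared with the probability of the union. This is a small sketch, assuming a uniform measure on the integers 1 to 30 and three hypothetical events (divisibility by 2, 3, and 5); the function name `union_by_inclusion_exclusion` is made up for this example.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 31))

def P(event):
    """Uniform probability on omega."""
    return Fraction(len(event), len(omega))

def union_by_inclusion_exclusion(events):
    """Alternating sum over all non-empty sub-collections of events."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)               # + for odd k, - for even k
        for group in combinations(events, k):
            total += sign * P(frozenset.intersection(*group))
    return total

div2 = frozenset(w for w in omega if w % 2 == 0)
div3 = frozenset(w for w in omega if w % 3 == 0)
div5 = frozenset(w for w in omega if w % 5 == 0)
events = [div2, div3, div5]

direct = P(div2 | div3 | div5)
print(direct, union_by_inclusion_exclusion(events))   # 11/15 11/15
```

The seven terms are 15 + 10 + 6 - 5 - 3 - 2 + 1 = 22 favourable outcomes out of 30, matching the direct count.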
Why It Matters
Inclusion-exclusion is the workhorse for computing union probabilities when you can compute intersections. It appears in derangement counts, the union bound (a one-term truncation), and inclusion-exclusion bounds in combinatorial probability. The Bonferroni inequalities are obtained by truncating the alternating sum after an even or odd number of terms, yielding lower or upper bounds.
Failure Mode
The number of terms grows as $2^n - 1$. For large $n$, computing every intersection probability is infeasible, and inclusion-exclusion becomes a theoretical tool rather than a computational one. The union bound is the cheap one-sided alternative used throughout learning theory.
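Truncating the alternating sum makes the trade-off concrete. The sketch below reuses an assumed uniform space on the integers 1 to 30 with divisibility events; `truncated_sum` is a hypothetical helper that keeps only intersections of at most `depth` events. Depth 1 is the union bound, and depth 2 gives a lower bound.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

omega = frozenset(range(1, 31))
events = [frozenset(w for w in omega if w % d == 0) for d in (2, 3, 5)]

def P(event):
    """Uniform probability on omega."""
    return Fraction(len(event), len(omega))

def truncated_sum(events, depth):
    """Inclusion-exclusion sum over intersections of at most `depth` events."""
    total = Fraction(0)
    for k in range(1, depth + 1):
        for group in combinations(events, k):
            total += (-1) ** (k + 1) * P(frozenset.intersection(*group))
    return total

exact = P(frozenset.union(*events))
union_bound = truncated_sum(events, 1)     # upper bound: sum of P(A_i)
second_order = truncated_sum(events, 2)    # lower bound: subtract pairwise overlaps

# Number of terms in the full formula: 2^n - 1 non-empty sub-collections.
n_terms = sum(comb(len(events), k) for k in range(1, len(events) + 1))

print(second_order, exact, union_bound, n_terms)   # 7/10 11/15 31/30 7
```

Here the union bound exceeds 1 and is therefore vacuous, which is common when the events overlap heavily; it becomes useful when the individual probabilities are small.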
Continuity of Probability
Countable additivity is equivalent (given finite additivity and non-negativity) to a continuity property: probabilities respect monotone limits of events.
Continuity of Probability Measures
Statement
Let $P$ be non-negative, normalized, and finitely additive. Then $P$ is countably additive (and hence a probability measure) if and only if both of the following hold:
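Continuity from below. For any increasing sequence $A_1 \subseteq A_2 \subseteq \cdots$ with $A = \bigcup_{n=1}^{\infty} A_n$, $P(A) = \lim_{n \to \infty} P(A_n)$.
Continuity from above. For any decreasing sequence $A_1 \supseteq A_2 \supseteq \cdots$ with $A = \bigcap_{n=1}^{\infty} A_n$, $P(A) = \lim_{n \to \infty} P(A_n)$.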
Intuition
"Increasing union" means the events grow to fill out their limit; the probabilities should grow to fill out the limit's probability. Without continuity, a sequence of events could grow to include "more and more" of the limit event while their probabilities stayed pinned below the union's probability, which would break every limit argument in probability.
Proof Sketch
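Decompose the increasing union $A = \bigcup_{n} A_n$ as a countable disjoint union: $A = \bigcup_{n=1}^{\infty} B_n$ with $B_1 = A_1$ and $B_n = A_n \setminus A_{n-1}$ for $n \geq 2$. Apply countable additivity: $P(A) = \sum_{n=1}^{\infty} P(B_n)$. The partial sums telescope to $\sum_{n=1}^{N} P(B_n) = P(A_N)$, so the limit equals $\lim_{N \to \infty} P(A_N)$. Continuity from above follows by passing to complements. For the converse, take a pairwise disjoint sequence, apply continuity from below to its partial unions, and use finite additivity on each partial union.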
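The telescoping argument can be followed numerically on a countable space. The sketch below assumes the distribution on the positive integers with point masses $2^{-k}$ and the increasing events $A_n = \{1, \ldots, n\}$, whose union is the whole space; the function names are hypothetical.

```python
from fractions import Fraction

def point_mass(k):
    """P({k}) = 2^(-k) for k = 1, 2, ... (the masses sum to 1)."""
    return Fraction(1, 2 ** k)

def P_A(n):
    """P(A_n) for the increasing events A_n = {1, ..., n}."""
    return sum((point_mass(k) for k in range(1, n + 1)), Fraction(0))

# The disjoint pieces are B_1 = A_1 and B_n = A_n \ A_{n-1} = {n},
# so P(B_n) = point_mass(n) and the partial sums telescope to P(A_N).
assert all(P_A(n) - P_A(n - 1) == point_mass(n) for n in range(2, 30))

for n in (1, 2, 3, 10, 20):
    print(n, P_A(n), float(1 - P_A(n)))   # the gap 1 - P(A_n) equals 2^(-n)
```

The gap to $P(\bigcup_n A_n) = 1$ halves at each step, so $P(A_n)$ increases to 1, as continuity from below requires.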
Why It Matters
This is what countable additivity buys you. Every "$n \to \infty$" argument, every interchange of limit and probability, every dominated convergence application for indicator functions, sits on this continuity. The Borel-Cantelli lemmas and the modes of convergence of random variables both rely on it.
Failure Mode
Continuity from above requires the events to be decreasing and at least one of them to have finite measure. For probability measures this is automatic (everything has measure at most 1), but for general measures (like Lebesgue measure on $\mathbb{R}$) the assumption is necessary. Example: the sets $[n, \infty)$ decrease to $\emptyset$ but each has infinite Lebesgue measure.
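Written out, the failure for an infinite measure looks as follows. Here $\lambda$ denotes Lebesgue measure on $\mathbb{R}$, a symbol introduced only for this illustration.

```latex
\[
A_n = [n, \infty), \qquad
A_1 \supseteq A_2 \supseteq \cdots, \qquad
\bigcap_{n=1}^{\infty} A_n = \emptyset,
\]
\[
\lim_{n \to \infty} \lambda(A_n) = \infty
\;\neq\; 0 = \lambda(\emptyset) = \lambda\Bigl(\bigcap_{n=1}^{\infty} A_n\Bigr).
\]
```

No such example exists for a probability measure, because $P(A_1) \leq 1$ is finite and the complement argument goes through.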
Why Countable, Not Finite, Additivity
Finite additivity is the version most people guess when first writing down probability axioms. Why does the standard formulation insist on countable additivity?
Three reasons:
- Limit theorems. Without continuity from below, the law of large numbers cannot be stated as a single statement about a sequence of averages. The law of large numbers and the central limit theorem both produce countable disjoint unions in their proofs.
- Lebesgue integration. The expected value $E[X] = \int X \, dP$ is the Lebesgue integral against $P$. Lebesgue's monotone and dominated convergence theorems require countable additivity. Without them, you cannot interchange $\lim$ and $E$, which is required in nearly every consistency proof.
- Probability of unions of events with shrinking probability. Countable additivity yields countable subadditivity, $P\left(\bigcup_{i} A_i\right) \leq \sum_{i} P(A_i)$, so countably many "rare events" whose probabilities have a small sum cannot collectively have a large probability. This is the engine of the common inequalities used as union bounds throughout learning theory.
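The third point is often used by splitting a failure budget across countably many events. The sketch below assumes a total budget `delta` and the allocation `delta / 2**i` to the i-th event; both the name `budget` and the halving scheme are illustrative choices. Since the allocations sum to at most `delta`, countable subadditivity bounds the probability of any failure by `delta`.

```python
from fractions import Fraction

def budget(delta, i):
    """Failure probability allotted to the i-th event, i = 1, 2, ..."""
    return delta / 2 ** i

delta = Fraction(1, 20)

# Partial sums of the allocations stay strictly below delta for every n,
# so the sum over all i is at most delta.
partial = [sum(budget(delta, i) for i in range(1, n + 1)) for n in (1, 5, 10, 50)]
assert all(s < delta for s in partial)

for n, s in zip((1, 5, 10, 50), partial):
    print(n, float(s))
```

A finitely additive probability offers no such guarantee over an infinite collection, because the bound on the countable union is exactly what countable additivity supplies.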
The price is the existence of non-measurable sets. On uncountable $\Omega$, not every subset is in $\mathcal{F}$. For all of probability and statistics, this is a fair trade.
Common Confusions
Probability zero is not the same as impossible
For a continuous random variable $X$ with density $f$, $P(X = x) = 0$ for every fixed $x$, yet $X$ takes some value with probability 1. "Probability zero" means the event has measure zero, not that it cannot happen. Symmetrically, "probability one" (almost sure) does not mean "always," only that the exceptional set has measure zero.
Sigma-algebras are not optional bookkeeping
Many introductory treatments hide the sigma-algebra to keep notation light, writing $P(A)$ as if every subset were an event. This works on finite or countably infinite $\Omega$, where you can take $\mathcal{F} = 2^{\Omega}$. On $\mathbb{R}$ or $\mathbb{R}^n$, the standard well-behaved choice is the Borel sigma-algebra generated by open sets, which excludes pathological sets like the Vitali construction. Pretending $\mathcal{F} = 2^{\Omega}$ on $\mathbb{R}$ is what causes Banach-Tarski-style paradoxes when you try to define a uniform probability on $[0, 1]$.
The axioms do not pick an interpretation
The axioms tell you what arithmetic probabilities must obey. They do not tell you whether $P(A)$ means a long-run frequency, a betting rate, or a degree of belief. Frequentists, Bayesians, and subjectivists all use the same Kolmogorov axioms; they differ on what the numbers refer to. The mathematics is consistent across interpretations because the axioms are interpretation-free.
Summary
- A probability space is a triple $(\Omega, \mathcal{F}, P)$: a sample space, an event sigma-algebra, and a probability measure.
- The three axioms are non-negativity, normalization, and countable additivity.
- Immediate consequences: $P(\emptyset) = 0$, complement rule, monotonicity, finite additivity.
- Inclusion-exclusion handles finite unions of overlapping events; the union bound is its one-sided cheap relative.
- Countable additivity is equivalent to continuity of probability for monotone sequences of events; this continuity is what makes limit theorems possible.
- The axioms are silent on interpretation: frequentist, Bayesian, and classical accounts of probability all satisfy them.
Exercises
Problem
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $A, B \in \mathcal{F}$. Prove that $P(A \cup B) \leq P(A) + P(B)$ (the two-event union bound), with equality if and only if $A \cap B$ has probability zero.
Problem
Construct a finitely additive probability $P$ on $(\mathbb{N}, 2^{\mathbb{N}})$ that is not countably additive, by assigning $P(\{n\}) = 0$ for every singleton but $P(\mathbb{N}) = 1$. (Such a $P$ exists, by appeal to the Hahn-Banach theorem or a non-principal ultrafilter on $\mathbb{N}$.) Then explain which Kolmogorov axiom this violates and why it cannot be a probability measure in the standard sense.
References
Original:
- Kolmogorov, "Grundbegriffe der Wahrscheinlichkeitsrechnung" (Springer, 1933); English translation "Foundations of the Theory of Probability" (Chelsea, 1956), Chapter 1
Standard graduate texts:
- Billingsley, "Probability and Measure" (3rd edition, Wiley, 1995), Sections 2-3
- Durrett, "Probability: Theory and Examples" (5th edition, Cambridge, 2019), Section 1.1
- Williams, "Probability with Martingales" (Cambridge, 1991), Chapter 1
- Resnick, "A Probability Path" (Birkhauser, 1999), Chapters 1-2
Real analysis perspective:
- Folland, "Real Analysis: Modern Techniques and Their Applications" (2nd edition, Wiley, 1999), Chapter 1
- Rudin, "Real and Complex Analysis" (3rd edition, McGraw-Hill, 1987), Chapter 1
Next Topics
- Measure-theoretic probability: building expectation and Lebesgue integration on top of the axioms
- Joint, marginal, conditional distributions: the working notation that the axioms support
- Common inequalities: Markov, Chebyshev, Jensen, and the union bound
Last reviewed: April 18, 2026
Prerequisites
Foundations this topic depends on.
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)