Markov Chains and Steady State
Markov chains: the Markov property, transition matrices, stationary distributions, irreducibility, aperiodicity, the ergodic theorem, and mixing time. The backbone of PageRank, MCMC, and reinforcement learning.
Why This Matters
Markov chains are the probabilistic backbone of MCMC sampling, reinforcement learning, and web search. Every Metropolis-Hastings sampler is a Markov chain designed to converge to a target distribution. Every MDP under a fixed policy reduces to a Markov chain. PageRank computes the stationary distribution of a Markov chain on the web graph.
Mental Model
A system has states. At each time step, it jumps to a new state with probabilities that depend only on the current state, not on how it got there. This memorylessness is the Markov property. The central question: if you run the chain long enough, does the fraction of time spent in each state converge to a fixed distribution?
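This mental model is easy to simulate. A minimal sketch using a hypothetical three-state weather chain (states and probabilities are made up for illustration): the long-run fraction of time in each state stabilizes regardless of the starting state.

```python
import random

# Hypothetical three-state weather chain; the next state depends
# only on the current state (the Markov property).
P = {
    "sunny":  [("sunny", 0.7), ("cloudy", 0.2), ("rainy", 0.1)],
    "cloudy": [("sunny", 0.3), ("cloudy", 0.4), ("rainy", 0.3)],
    "rainy":  [("sunny", 0.2), ("cloudy", 0.4), ("rainy", 0.4)],
}

rng = random.Random(0)
state = "sunny"
counts = {s: 0 for s in P}
n_steps = 100_000
for _ in range(n_steps):
    nxt, weights = zip(*P[state])
    state = rng.choices(nxt, weights=weights, k=1)[0]
    counts[state] += 1

# Fractions of time in each state settle toward a fixed distribution.
freq = {s: c / n_steps for s, c in counts.items()}
print(freq)
```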
Formal Setup and Notation
Markov chain
A sequence of random variables $(X_0, X_1, X_2, \dots)$ taking values in a state space $S$ is a Markov chain if for all $n \ge 0$ and all states $x_0, \dots, x_{n+1} \in S$:

$$P(X_{n+1} = x_{n+1} \mid X_n = x_n, \dots, X_0 = x_0) = P(X_{n+1} = x_{n+1} \mid X_n = x_n)$$
The future is conditionally independent of the past given the present.
Transition matrix
For a finite state space $S = \{1, \dots, n\}$, the transition matrix $P$ has entries:

$$P_{ij} = P(X_{t+1} = j \mid X_t = i)$$
Each row sums to 1: $\sum_j P_{ij} = 1$ for all $i$. Such a matrix is called stochastic. The $k$-step transition probabilities are given by the matrix power: $P(X_{t+k} = j \mid X_t = i) = (P^k)_{ij}$.
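These facts are easy to check numerically. A sketch with a hypothetical 3-state transition matrix:

```python
import numpy as np

# Hypothetical 3-state transition matrix; row i is the distribution
# of the next state given the current state i.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.4, 0.4],
])

assert np.allclose(P.sum(axis=1), 1.0)  # row-stochastic

# k-step transition probabilities come from matrix powers:
# (P^k)[i, j] = P(X_{t+k} = j | X_t = i).
P3 = np.linalg.matrix_power(P, 3)
assert np.allclose(P3.sum(axis=1), 1.0)  # powers stay stochastic
print(P3[0])  # distribution after 3 steps, starting from state 0
```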
Stationary distribution
A distribution $\pi$ over $S$ is stationary for $P$ if:

$$\pi P = \pi, \quad \text{i.e.} \quad \pi_j = \sum_i \pi_i P_{ij} \ \text{for all } j$$
That is, $\pi$ is a left eigenvector of $P$ with eigenvalue 1. If the chain starts in distribution $\pi$, it stays in $\pi$ forever.
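One way to compute a stationary distribution numerically is to extract the left eigenvector for eigenvalue 1. A sketch (the 3-state matrix is a made-up example):

```python
import numpy as np

P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.4, 0.4],
])

# Left eigenvectors of P are right eigenvectors of P.T; pick the one
# with eigenvalue 1 and normalize it into a probability distribution.
eigvals, eigvecs = np.linalg.eig(P.T)
idx = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()

assert np.allclose(pi @ P, pi)  # pi P = pi: start in pi, stay in pi
print(pi)
```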
Irreducibility
A Markov chain is irreducible if every state can be reached from every other state in a finite number of steps. Formally, for all $i, j \in S$, there exists $n \ge 1$ such that $(P^n)_{ij} > 0$.
Aperiodicity
State $i$ has period $d(i) = \gcd\{n \ge 1 : (P^n)_{ii} > 0\}$. A chain is aperiodic if every state has period 1. A self-loop ($P_{ii} > 0$ for some state $i$) guarantees aperiodicity for irreducible chains.
Detailed balance (reversibility)
A distribution $\pi$ satisfies detailed balance with respect to $P$ if:

$$\pi_i P_{ij} = \pi_j P_{ji} \quad \text{for all } i, j$$
Detailed balance implies stationarity (sum both sides over $i$). MCMC methods construct chains satisfying detailed balance with respect to a target distribution.
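The MCMC construction can be sketched for a small finite chain. Below, a Metropolis-style chain is built for an arbitrary target distribution (uniform proposals are an illustrative choice), and both detailed balance and the resulting stationarity are verified:

```python
import numpy as np

# Target distribution we want the chain to converge to (arbitrary).
pi = np.array([0.5, 0.3, 0.2])
n = len(pi)

P = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            # Propose j uniformly among the other states,
            # accept with probability min(1, pi_j / pi_i).
            P[i, j] = (1.0 / (n - 1)) * min(1.0, pi[j] / pi[i])
    P[i, i] = 1.0 - P[i].sum()  # rejected proposals stay put

# Detailed balance: pi_i P_ij == pi_j P_ji for all i, j.
for i in range(n):
    for j in range(n):
        assert np.isclose(pi[i] * P[i, j], pi[j] * P[j, i])

# Detailed balance implies stationarity: pi P = pi.
assert np.allclose(pi @ P, pi)
```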
Main Theorems
Ergodic Theorem for Markov Chains
Statement
If a Markov chain on a finite state space is irreducible and aperiodic, then it has a unique stationary distribution $\pi$ and for any initial state $i$:

$$\lim_{n \to \infty} (P^n)_{ij} = \pi_j \quad \text{for all } j$$
Moreover, the time-average converges: for any function $f$ on $S$,

$$\frac{1}{n} \sum_{t=1}^{n} f(X_t) \to \sum_{i \in S} \pi_i f(i) \quad \text{almost surely}$$
Intuition
Run the chain long enough and the distribution over states converges to $\pi$ regardless of where you started. Time averages converge to ensemble averages. This is why MCMC works: you sample a trajectory and average over it to approximate expectations under the target distribution.
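The time-average claim can be checked by simulation (hypothetical 3-state chain whose stationary distribution works out to $(6/13, 4/13, 3/13)$; $f$ is an arbitrary function on the states):

```python
import numpy as np

rng = np.random.default_rng(0)

P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.4, 0.4],
])
pi = np.array([6/13, 4/13, 3/13])  # stationary distribution of this P
f = np.array([1.0, 5.0, 10.0])     # an arbitrary function on the states

# Average f along one long trajectory from an arbitrary start.
n_steps = 100_000
x, total = 0, 0.0
for _ in range(n_steps):
    x = rng.choice(3, p=P[x])
    total += f[x]

time_avg = total / n_steps
ensemble_avg = pi @ f  # expectation of f under pi
print(time_avg, ensemble_avg)
assert abs(time_avg - ensemble_avg) < 0.1  # time average ~ ensemble average
```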
Proof Sketch
The Perron-Frobenius theorem guarantees that a positive stochastic matrix (which $P^k$ becomes for large enough $k$ under irreducibility and aperiodicity) has a unique left eigenvector with eigenvalue 1, and all other eigenvalues have modulus strictly less than 1. The powers $P^n$ therefore converge to the matrix with all rows equal to $\pi$.
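The convergence of powers in this sketch can be observed directly (hypothetical 3-state example):

```python
import numpy as np

P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.4, 0.4],
])

# For an irreducible, aperiodic chain, P^n converges to a rank-one
# matrix whose rows are all equal to the stationary distribution.
P50 = np.linalg.matrix_power(P, 50)
assert np.allclose(P50[0], P50[1]) and np.allclose(P50[1], P50[2])

pi = P50[0]
assert np.allclose(pi @ P, pi)  # each row is the stationary distribution
print(P50)
```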
Why It Matters
This theorem is the theoretical foundation of MCMC. It guarantees that Metropolis-Hastings and Gibbs sampling converge to the correct distribution. It also justifies PageRank: the stationary distribution of a random walk on the web graph is the importance vector.
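A toy version of the PageRank computation (the 4-page link graph below is made up; 0.85 is the conventional damping factor, added so the chain is irreducible and aperiodic):

```python
import numpy as np

# Hypothetical 4-page link graph: page -> pages it links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n, d = 4, 0.85

# Row-stochastic random-walk matrix: from page i, follow an outlink uniformly.
W = np.zeros((n, n))
for i, outs in links.items():
    for j in outs:
        W[i, j] = 1.0 / len(outs)

# Damping mixes in uniform teleportation, guaranteeing a unique
# stationary distribution.
G = d * W + (1 - d) / n

# Power iteration: repeatedly apply pi <- pi G until it stops changing.
pi = np.full(n, 1.0 / n)
for _ in range(200):
    pi = pi @ G

assert np.allclose(pi @ G, pi, atol=1e-9)
print(pi)  # importance scores; they sum to 1
```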
Failure Mode
If the chain is reducible (disconnected components), there are multiple stationary distributions. If it is periodic (e.g., a bipartite graph with period 2), the chain oscillates and never converges to a single distribution, though the time-average still converges. The ergodic theorem says nothing about how fast convergence happens. That requires mixing time analysis.
Perron-Frobenius for Stochastic Matrices
Statement
An irreducible, aperiodic stochastic matrix $P$ has eigenvalue 1 with algebraic multiplicity 1. All other eigenvalues satisfy $|\lambda| < 1$. The left eigenvector corresponding to eigenvalue 1, normalized to sum to 1, is the unique stationary distribution $\pi$.
Intuition
The largest eigenvalue controls long-term behavior. Since it is 1, the chain neither grows nor shrinks in total probability. The gap between the largest and second-largest eigenvalue magnitude, $\gamma = 1 - |\lambda_2|$, controls the rate of convergence.
Proof Sketch
Apply the Perron-Frobenius theorem for nonnegative matrices. Irreducibility of the chain implies that $P$ is irreducible as a nonnegative matrix. Aperiodicity plus irreducibility implies primitivity, which gives $|\lambda| < 1$ for every eigenvalue $\lambda \ne 1$.
Why It Matters
The spectral gap is the single most important quantity for Markov chain convergence speed. A large spectral gap means fast mixing. This connects Markov chain theory to spectral graph theory.
Failure Mode
For countably infinite state spaces, eigenvalue 1 may be in the continuous spectrum and the chain can be null-recurrent (stationary distribution does not exist even though the chain is irreducible). This does not happen for finite chains.
Mixing Time
Mixing time
The mixing time is:

$$t_{\mathrm{mix}}(\varepsilon) = \min\left\{ t : \max_{i \in S} \left\| P^t(i, \cdot) - \pi \right\|_{TV} \le \varepsilon \right\}$$

where $\|\mu - \nu\|_{TV} = \frac{1}{2} \sum_i |\mu_i - \nu_i|$ is the total variation distance. Conventionally, $t_{\mathrm{mix}} = t_{\mathrm{mix}}(1/4)$.
The spectral gap $\gamma = 1 - |\lambda_2|$ bounds mixing time. For reversible chains:

$$t_{\mathrm{mix}}(\varepsilon) \le \frac{1}{\gamma} \log\left(\frac{1}{\varepsilon \, \pi_{\min}}\right)$$

where $\pi_{\min} = \min_i \pi_i$; when $\pi$ is uniform this is $\frac{1}{\gamma} \log\frac{n}{\varepsilon}$, where $n$ is the number of states. A small spectral gap implies slow mixing. Random walks on expander graphs have large spectral gaps and mix in $O(\log n)$ steps.
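Both quantities can be computed exactly for a small chain. A sketch (hypothetical 3-state example; the mixing time is found by brute force over the definition):

```python
import numpy as np

P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.4, 0.4],
])
pi = np.array([6/13, 4/13, 3/13])  # stationary distribution of this P

# Spectral gap: 1 minus the second-largest eigenvalue magnitude.
mags = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
gap = 1.0 - mags[1]

def tv_from_worst_start(t):
    """Worst-case total variation distance to pi after t steps."""
    Pt = np.linalg.matrix_power(P, t)
    return max(0.5 * np.abs(Pt[i] - pi).sum() for i in range(len(pi)))

# Mixing time by brute force over the definition (epsilon = 1/4).
t_mix = next(t for t in range(1, 100) if tv_from_worst_start(t) <= 0.25)
print("spectral gap:", gap, "t_mix(1/4):", t_mix)
```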
Canonical Examples
Two-state chain
States $\{0, 1\}$ with transition matrix:

$$P = \begin{pmatrix} 1 - p & p \\ q & 1 - q \end{pmatrix}$$

The stationary distribution is $\pi = \left(\frac{q}{p+q}, \frac{p}{p+q}\right)$. The second eigenvalue is $\lambda_2 = 1 - p - q$, so the spectral gap is $p + q$. When $p$ and $q$ are both small, the chain mixes slowly because it stays in each state for a long time.
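The closed-form answers for the two-state chain can be verified numerically ($p$ and $q$ below are arbitrary):

```python
import numpy as np

p, q = 0.1, 0.3  # arbitrary switching probabilities
P = np.array([[1 - p, p],
              [q, 1 - q]])

# Closed-form stationary distribution.
pi = np.array([q / (p + q), p / (p + q)])
assert np.allclose(pi @ P, pi)

# Eigenvalues are 1 and 1 - p - q, so the spectral gap is p + q.
eig = np.sort(np.linalg.eigvals(P).real)
assert np.isclose(eig[0], 1 - p - q)
assert np.isclose(eig[1], 1.0)
print("spectral gap:", p + q)
```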
Common Confusions
Stationary does not mean converges to
A chain can have a stationary distribution without converging to it. A periodic chain has a stationary distribution but oscillates forever. Convergence requires both irreducibility and aperiodicity.
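A minimal demonstration with the deterministic two-state flip chain (period 2): the stationary distribution exists, point-mass starts oscillate forever, but the time-average still converges.

```python
import numpy as np

# Deterministic flip between two states: period 2.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

pi = np.array([0.5, 0.5])
assert np.allclose(pi @ P, pi)  # pi is stationary...

# ...but from a point mass the chain oscillates and never converges:
start = np.array([1.0, 0.0])
assert np.allclose(start @ np.linalg.matrix_power(P, 10), [1.0, 0.0])  # even steps
assert np.allclose(start @ np.linalg.matrix_power(P, 11), [0.0, 1.0])  # odd steps

# The time-average still converges to pi:
avg = np.mean([start @ np.linalg.matrix_power(P, t) for t in range(1000)], axis=0)
assert np.allclose(avg, pi, atol=0.001)
```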
Detailed balance is sufficient, not necessary
Many chains have stationary distributions without satisfying detailed balance. Detailed balance is a convenient sufficient condition used in MCMC design, but non-reversible chains (which violate detailed balance) can mix faster.
Markov property is about conditional independence, not memorylessness of states
A Markov chain can revisit states and have complex long-run behavior. The Markov property only says that the transition probability depends on the current state alone. The chain absolutely "remembers" where it has been in the sense that past states influence the current state; it just does not use that memory for the next transition.
Summary
- Markov property: the future depends only on the present state
- Transition matrix is row-stochastic; $\pi P = \pi$ defines the stationary distribution
- Irreducibility + aperiodicity = unique stationary distribution and convergence (ergodic theorem)
- Spectral gap controls mixing speed
- Detailed balance is the design principle behind MCMC samplers
- MDP under a fixed policy is a Markov chain on the state space
Exercises
Problem
Compute the stationary distribution of the transition matrix:
Problem
A random walk on a cycle graph $C_n$ (move left or right with probability $1/2$ each) is periodic with period 2 when $n$ is even. How would you modify the chain to make it aperiodic without changing the stationary distribution?
References
Canonical:
- Levin, Peres, Wilmer, Markov Chains and Mixing Times (2009), Chapters 1-4, 12
- Norris, Markov Chains (1997), Chapters 1-3
Current:
- Bremaud, Markov Chains: Gibbs Fields, Monte Carlo Simulation and Queues (2020), Part I
- Munkres, Topology (2000), Chapter 1 (set theory review)
Next Topics
From Markov chains, the natural continuations are:
- Metropolis-Hastings: designing Markov chains with a target stationary distribution
- Markov decision processes: adding actions and rewards to Markov chains
- PageRank algorithm: computing the stationary distribution of the web graph
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Eigenvalues and Eigenvectors (Layer 0A)
- Matrix Operations and Properties (Layer 0A)