Beta. Content is under active construction and has not been peer-reviewed.

Foundations

Markov Chains and Steady State

Markov chains: the Markov property, transition matrices, stationary distributions, irreducibility, aperiodicity, the ergodic theorem, and mixing time. The backbone of PageRank, MCMC, and reinforcement learning.


Why This Matters

Markov chains are the probabilistic backbone of MCMC sampling, reinforcement learning, and web search. Every Metropolis-Hastings sampler is a Markov chain designed to converge to a target distribution. Every MDP under a fixed policy reduces to a Markov chain. PageRank computes the stationary distribution of a Markov chain on the web graph.

Mental Model

A system has states. At each time step, it jumps to a new state with probabilities that depend only on the current state, not on how it got there. This memorylessness is the Markov property. The central question: if you run the chain long enough, does the fraction of time spent in each state converge to a fixed distribution?

Formal Setup and Notation

Definition

Markov chain

A sequence of random variables $X_0, X_1, X_2, \ldots$ taking values in a state space $\mathcal{S}$ is a Markov chain if for all $n$ and all states $s_0, \ldots, s_{n+1}$:

$$P(X_{n+1} = s_{n+1} \mid X_n = s_n, \ldots, X_0 = s_0) = P(X_{n+1} = s_{n+1} \mid X_n = s_n)$$

The future is conditionally independent of the past given the present.

Definition

Transition matrix

For a finite state space $\mathcal{S} = \{1, \ldots, N\}$, the transition matrix $P \in \mathbb{R}^{N \times N}$ has entries:

$$P_{ij} = P(X_{n+1} = j \mid X_n = i)$$

Each row sums to 1: $\sum_j P_{ij} = 1$. Such a matrix is called stochastic. The $n$-step transition probabilities are given by $P^n$.
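These properties are easy to check numerically. A minimal sketch (the 3-state matrix below is illustrative, not from the text): rows sum to 1, and powers of $P$ give the $n$-step transition probabilities.

```python
import numpy as np

# Illustrative 3-state transition matrix; rows index the current state
# and must each sum to 1 (row-stochastic).
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])
assert np.allclose(P.sum(axis=1), 1.0)

# Entry (i, j) of P^n is the n-step probability P(X_n = j | X_0 = i).
P5 = np.linalg.matrix_power(P, 5)
assert np.allclose(P5.sum(axis=1), 1.0)  # powers of a stochastic matrix stay stochastic
print(P5[0])  # distribution after 5 steps, starting from state 0
```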

Definition

Stationary distribution

A distribution $\pi$ over $\mathcal{S}$ is stationary for $P$ if:

$$\pi = \pi P$$

That is, $\pi$ is a left eigenvector of $P$ with eigenvalue 1. If the chain starts in distribution $\pi$, it stays in $\pi$ forever.
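In practice $\pi$ can be computed exactly this way: a left eigenvector of $P$ for eigenvalue 1 is a right eigenvector of $P^\top$. A minimal sketch with an illustrative matrix:

```python
import numpy as np

P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

# pi = pi P says pi is a left eigenvector of P with eigenvalue 1,
# i.e. a right eigenvector of P.T.
vals, vecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(vals - 1.0))   # locate the eigenvalue 1
pi = np.real(vecs[:, k])
pi = pi / pi.sum()                  # normalize to a probability distribution

assert np.allclose(pi @ P, pi)      # stationarity: pi = pi P
print(pi)
```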

Definition

Irreducibility

A Markov chain is irreducible if every state can be reached from every other state in a finite number of steps. Formally, for all $i, j$, there exists $n > 0$ such that $P^n_{ij} > 0$.

Definition

Aperiodicity

State $i$ has period $d_i = \gcd\{n \geq 1 : P^n_{ii} > 0\}$. A chain is aperiodic if every state has period 1. A self-loop ($P_{ii} > 0$ for some state) guarantees aperiodicity for irreducible chains.

Definition

Detailed balance (reversibility)

A distribution $\pi$ satisfies detailed balance with respect to $P$ if:

$$\pi_i P_{ij} = \pi_j P_{ji} \quad \text{for all } i, j$$

Detailed balance implies stationarity (sum both sides over $i$). MCMC methods construct chains satisfying detailed balance with respect to a target distribution.
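A classic reversible example is the random walk on an undirected weighted graph, where $\pi_i$ proportional to the weighted degree $d_i$ satisfies detailed balance, since $\pi_i P_{ij} = w_{ij} / \sum_k d_k$ is symmetric in $(i, j)$. A numerical sketch with illustrative weights:

```python
import numpy as np

# Random walk on an undirected weighted graph: from state i, move to j
# with probability P_ij = w_ij / d_i, where d_i = sum_j w_ij.
W = np.array([               # illustrative symmetric weights
    [0.0, 2.0, 1.0],
    [2.0, 0.0, 3.0],
    [1.0, 3.0, 0.0],
])
d = W.sum(axis=1)
P = W / d[:, None]
pi = d / d.sum()             # pi_i proportional to the weighted degree

flow = pi[:, None] * P       # flow[i, j] = pi_i * P_ij
assert np.allclose(flow, flow.T)   # detailed balance
assert np.allclose(pi @ P, pi)     # hence stationarity
```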

Main Theorems

Theorem

Ergodic Theorem for Markov Chains

Statement

If a Markov chain on a finite state space is irreducible and aperiodic, then it has a unique stationary distribution $\pi$ and for any initial state $i$:

$$\lim_{n \to \infty} P^n_{ij} = \pi_j \quad \text{for all } j$$

Moreover, the time average converges: for any function $f$ on $\mathcal{S}$,

$$\frac{1}{T} \sum_{t=0}^{T-1} f(X_t) \xrightarrow{a.s.} \sum_j \pi_j f(j)$$

Intuition

Run the chain long enough and the distribution over states converges to π\pi regardless of where you started. Time averages converge to ensemble averages. This is why MCMC works: you sample a trajectory and average over it to approximate expectations under the target distribution.
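The time-average statement can be checked by simulating a single long trajectory and comparing empirical state frequencies with the exact stationary distribution (illustrative chain; $\pi$ computed from the left eigenvector):

```python
import numpy as np

rng = np.random.default_rng(0)

P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

# Simulate one long trajectory and record the fraction of time in each state.
T = 200_000
x = 0
counts = np.zeros(3)
for _ in range(T):
    counts[x] += 1
    x = rng.choice(3, p=P[x])   # one Markov step from the current state
empirical = counts / T

# Exact stationary distribution for comparison.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = pi / pi.sum()

print(empirical, pi)   # the two should agree closely
```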

Proof Sketch

The Perron-Frobenius theorem guarantees that a positive stochastic matrix (which $P^n$ becomes for large enough $n$ under irreducibility and aperiodicity) has a unique left eigenvector with eigenvalue 1, and all other eigenvalues have modulus strictly less than 1. The powers $P^n$ therefore converge to the matrix with all rows equal to $\pi$.
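The convergence of $P^n$ to a rank-one matrix whose every row is $\pi$ can be seen directly by taking powers; a sketch with an illustrative chain:

```python
import numpy as np

P = np.array([               # illustrative irreducible, aperiodic chain
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

# For a primitive stochastic matrix, P^n converges to the rank-one
# matrix whose rows all equal the stationary distribution pi.
Pn = np.linalg.matrix_power(P, 60)
assert np.allclose(Pn, Pn[0])      # all rows are (nearly) identical

pi = Pn[0]
assert np.allclose(pi @ P, pi)     # the common row is stationary
print(pi)
```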

Why It Matters

This theorem is the theoretical foundation of MCMC. It guarantees that Metropolis-Hastings and Gibbs sampling converge to the correct distribution. It also justifies PageRank: the stationary distribution of a random walk on the web graph is the importance vector.

Failure Mode

If the chain is reducible (disconnected components), there are multiple stationary distributions. If it is periodic (e.g., a bipartite graph with period 2), the chain oscillates and never converges to a single distribution, though the time-average still converges. The ergodic theorem says nothing about how fast convergence happens. That requires mixing time analysis.
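The periodic failure mode is easy to exhibit with the deterministic two-state flip chain: it has a stationary distribution, but its $n$-step distributions oscillate forever while the time average still converges. A sketch:

```python
import numpy as np

# Deterministic flip between two states: period 2.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

pi = np.array([0.5, 0.5])
assert np.allclose(pi @ P, pi)     # pi is stationary...

# ...but the chain never converges to it: P^n alternates between
# the identity (n even) and P itself (n odd).
assert np.allclose(np.linalg.matrix_power(P, 2), np.eye(2))
assert np.allclose(np.linalg.matrix_power(P, 3), P)

# The time average of the state distributions still converges to pi.
dist = np.array([1.0, 0.0])        # start deterministically in state 0
total = np.zeros(2)
for _ in range(1000):
    total += dist
    dist = dist @ P
print(total / 1000)                # exactly [0.5, 0.5] over an even horizon
```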

Theorem

Perron-Frobenius for Stochastic Matrices

Statement

For an irreducible, aperiodic stochastic matrix $P$: $P$ has eigenvalue 1 with algebraic multiplicity 1, and all other eigenvalues $\lambda$ satisfy $|\lambda| < 1$. The left eigenvector corresponding to eigenvalue 1, normalized to sum to 1, is the unique stationary distribution $\pi$.

Intuition

The largest eigenvalue controls long-term behavior. Since it is 1, the chain neither grows nor shrinks in total probability. The gap $1 - |\lambda_2|$ between the largest and second-largest eigenvalue magnitude controls the rate of convergence.

Proof Sketch

Apply the Perron-Frobenius theorem for nonnegative matrices. Irreducibility of the chain implies that $P$ is irreducible as a nonnegative matrix. Aperiodicity plus irreducibility implies primitivity, which gives $|\lambda_2| < 1$.

Why It Matters

The spectral gap $1 - |\lambda_2|$ is the single most important quantity for Markov chain convergence speed. A large spectral gap means fast mixing. This connects Markov chain theory to spectral graph theory.

Failure Mode

For countably infinite state spaces, eigenvalue 1 may be in the continuous spectrum and the chain can be null-recurrent (stationary distribution does not exist even though the chain is irreducible). This does not happen for finite chains.

Mixing Time

Definition

Mixing time

The mixing time is:

$$t_{\text{mix}}(\epsilon) = \min\{t : \max_i \| P^t(i, \cdot) - \pi \|_{\text{TV}} \leq \epsilon\}$$

where $\| \cdot \|_{\text{TV}}$ is the total variation distance. Conventionally, $t_{\text{mix}} = t_{\text{mix}}(1/4)$.

The spectral gap bounds mixing time:

$$\frac{1}{1 - |\lambda_2|} \cdot \log\frac{1}{2\epsilon} \leq t_{\text{mix}}(\epsilon) \leq \frac{1}{1 - |\lambda_2|} \cdot \log\frac{N}{\epsilon}$$

where $N$ is the number of states. Small spectral gap implies slow mixing. Random walks on expander graphs have large spectral gaps and mix in $O(\log N)$ steps.
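For a small chain, $t_{\text{mix}}(1/4)$ can be found by brute force: compute the worst-case total variation distance at each step and stop when it drops to $1/4$. A sketch with an illustrative matrix:

```python
import numpy as np

P = np.array([               # illustrative irreducible, aperiodic chain
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

# Stationary distribution via the left eigenvector for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = pi / pi.sum()

def worst_case_tv(t):
    """max_i || P^t(i, .) - pi ||_TV, where TV = half the L1 distance."""
    Pt = np.linalg.matrix_power(P, t)
    return 0.5 * np.abs(Pt - pi).sum(axis=1).max()

# t_mix(1/4): first t at which the worst-case TV distance is <= 1/4.
t_mix = 1
while worst_case_tv(t_mix) > 0.25:
    t_mix += 1

gap = 1.0 - sorted(np.abs(vals))[-2]   # spectral gap 1 - |lambda_2|
print(t_mix, gap)
```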

Canonical Examples

Example

Two-state chain

States $\{0, 1\}$ with transition matrix:

$$P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}$$

The stationary distribution is $\pi = (\beta/(\alpha+\beta), \; \alpha/(\alpha+\beta))$. The second eigenvalue is $1 - \alpha - \beta$, so the spectral gap is $\alpha + \beta$. When $\alpha$ and $\beta$ are both small, the chain mixes slowly because it stays in each state for a long time.
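These closed-form results can be verified numerically for particular flip probabilities (the values below are illustrative):

```python
import numpy as np

alpha, beta = 0.1, 0.3           # illustrative flip probabilities
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

# Stationary distribution (beta, alpha) / (alpha + beta).
pi = np.array([beta, alpha]) / (alpha + beta)
assert np.allclose(pi @ P, pi)

# Eigenvalues are 1 and 1 - alpha - beta, so the gap is alpha + beta.
vals = np.sort(np.real(np.linalg.eigvals(P)))
assert np.isclose(vals[0], 1 - alpha - beta)
assert np.isclose(vals[1], 1.0)
```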

Common Confusions

Watch Out

Stationary does not mean converges to

A chain can have a stationary distribution without converging to it. A periodic chain has a stationary distribution but oscillates forever. Convergence requires both irreducibility and aperiodicity.

Watch Out

Detailed balance is sufficient, not necessary

Many chains have stationary distributions without satisfying detailed balance. Detailed balance is a convenient sufficient condition used in MCMC design, but non-reversible chains (which violate detailed balance) can mix faster.

Watch Out

Markov property is about conditional independence, not memorylessness of states

A Markov chain can revisit states and have complex long-run behavior. The Markov property only says that the transition probability depends on the current state alone. The chain absolutely "remembers" where it has been in the sense that past states influence the current state; it just does not use that memory for the next transition.

Summary

  • Markov property: the future depends only on the present state
  • Transition matrix $P$ is row-stochastic; $\pi = \pi P$ defines the stationary distribution
  • Irreducibility + aperiodicity = unique stationary distribution and convergence (ergodic theorem)
  • Spectral gap $1 - |\lambda_2|$ controls mixing speed
  • Detailed balance is the design principle behind MCMC samplers
  • An MDP under a fixed policy reduces to a Markov chain on the state space

Exercises

ExerciseCore

Problem

Compute the stationary distribution of the transition matrix:

$$P = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}$$

ExerciseAdvanced

Problem

A random walk on a cycle graph $C_n$ (move left or right with probability $1/2$ each) is periodic with period 2 when $n$ is even. How would you modify the chain to make it aperiodic without changing the stationary distribution?

References

Canonical:

  • Levin, Peres, Wilmer, Markov Chains and Mixing Times (2009), Chapters 1-4, 12
  • Norris, Markov Chains (1997), Chapters 1-3

Current:

  • Bremaud, Markov Chains: Gibbs Fields, Monte Carlo Simulation and Queues (2020), Part I

  • Munkres, Topology (2000), Chapter 1 (set theory review)

Next Topics

From Markov chains, the natural continuations are:

  • Metropolis-Hastings: designing Markov chains with a target stationary distribution
  • Markov decision processes: adding actions and rewards to Markov chains
  • PageRank algorithm: computing the stationary distribution of the web graph

Last reviewed: April 2026
