

PageRank Algorithm

PageRank as the stationary distribution of a random walk on a graph: damping factor, power iteration, eigenvector interpretation, and applications beyond web search.


Why This Matters

PageRank answers a simple but powerful question: given a graph, which nodes are most important? The answer, which defines importance as the stationary distribution of a random walk, turns out to be one of the most widely used ideas in machine learning and data science, far beyond its original application to web search.

PageRank appears in citation analysis (which papers are most influential), social networks (who are the key influencers), knowledge graphs (which entities are most central), recommendation systems (which items are most relevant), and even in understanding neural network architectures.

Mental Model

Imagine a person randomly browsing the web. At each page, they either click a random outgoing link (with probability α) or jump to a completely random page (with probability 1 − α). After a very long time, the fraction of time spent on each page converges to a fixed distribution. Pages where the random surfer spends more time are "more important."

PageRank is this limiting distribution.

Formal Setup

Let G = (V, E) be a directed graph with n = |V| nodes. Let A be the adjacency matrix where A_{ij} = 1 if there is an edge from j to i (note: column j lists j's outgoing links).

Definition

Transition Matrix

The transition matrix M of the random walk on G is:

M_{ij} = \frac{A_{ij}}{d_j^{\text{out}}}

where d_j^{\text{out}} = \sum_i A_{ij} is the out-degree of node j. Column j of M is a probability distribution over the neighbors of j. For dangling nodes (nodes with no outgoing links), define the column as 1/n for all entries (uniform random jump).
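As a concrete sketch, the column-stochastic construction above, including the dangling-node rule, can be written in a few lines of NumPy. The function name and toy graph are illustrative, not from the original text:

```python
import numpy as np

def transition_matrix(A):
    """Column-stochastic transition matrix from adjacency matrix A,
    where A[i, j] = 1 means an edge from node j to node i."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    out_deg = A.sum(axis=0)          # out-degree of each node j
    M = np.zeros_like(A)
    for j in range(n):
        if out_deg[j] == 0:          # dangling node: uniform column
            M[:, j] = 1.0 / n
        else:
            M[:, j] = A[:, j] / out_deg[j]
    return M

# Toy graph (0-indexed): 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
A = np.array([[0, 0, 1],
              [1, 0, 0],
              [1, 1, 0]])
M = transition_matrix(A)   # every column sums to 1
```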

Definition

PageRank Vector

The PageRank vector π is defined as the stationary distribution of the modified random walk:

\pi = \alpha M \pi + (1 - \alpha) \frac{\mathbf{1}}{n}

where α ∈ (0, 1) is the damping factor (typically α = 0.85) and \mathbf{1}/n is the uniform distribution over all nodes.

Equivalently, π is the solution to:

\pi = \left(\alpha M + \frac{1 - \alpha}{n} \mathbf{1}\mathbf{1}^\top\right) \pi

The matrix

\tilde{M} = \alpha M + \frac{1 - \alpha}{n} \mathbf{1}\mathbf{1}^\top

is the Google matrix. It represents a random walk where, at each step, with probability α you follow a random outgoing link and with probability 1 − α you teleport to a uniformly random node.
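Given a transition matrix M, forming the Google matrix is a one-liner; a sketch assuming NumPy, with an illustrative function name and toy graph:

```python
import numpy as np

def google_matrix(M, alpha=0.85):
    """Damped transition matrix plus uniform teleportation."""
    n = M.shape[0]
    return alpha * M + (1 - alpha) / n * np.ones((n, n))

# A column-stochastic transition matrix for a small 3-node graph
M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
G = google_matrix(M)   # every entry is at least (1 - 0.85)/3 = 0.05
```

Note that every entry of G is strictly positive, which is exactly the property the Perron-Frobenius argument below relies on.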

Main Theorems

Theorem

Existence and Uniqueness of PageRank

Statement

The PageRank vector π exists, is unique, and has all positive entries. Specifically, π is the unique probability vector satisfying M̃π = π, where M̃ is the Google matrix defined above.

Intuition

The teleportation term (1 − α)/n ensures that every node can reach every other node (the Markov chain is irreducible) and that the chain is aperiodic. By the Perron-Frobenius theorem, a positive stochastic matrix has a unique stationary distribution with all positive entries. Without teleportation, the random walk could get stuck in cycles or sinks, and the stationary distribution might not be unique.

Proof Sketch

The Google matrix M̃ is a column-stochastic matrix (columns sum to 1) with all entries strictly positive (each entry is at least (1 − α)/n > 0). By the Perron-Frobenius theorem for positive matrices, M̃ has a unique eigenvalue of magnitude 1 (which is 1 itself), and the corresponding eigenvector has all positive entries. Normalizing this eigenvector to sum to 1 gives the unique stationary distribution π.

Why It Matters

Existence and uniqueness mean that PageRank is well-defined for any graph. The positivity of all entries means every node gets some PageRank, which is important for practical applications where you do not want nodes to have zero importance.

Failure Mode

If α = 1 (no teleportation), uniqueness can fail. The random walk on a graph with multiple strongly connected components may have multiple stationary distributions. The damping factor is not just a tuning parameter; it is mathematically necessary for uniqueness.
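This failure mode is easy to demonstrate numerically: on two disjoint 2-cycles, the undamped walk (α = 1) admits more than one stationary distribution. A small NumPy check; the graph is an illustrative example, not from the original text:

```python
import numpy as np

# Two disjoint 2-cycles: 0 <-> 1 and 2 <-> 3.
# M is the undamped transition matrix; each column is a point mass.
M = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Both vectors below are fixed points of M, so the stationary
# distribution is not unique without teleportation.
pi_a = np.array([0.5, 0.5, 0.0, 0.0])
pi_b = np.array([0.0, 0.0, 0.5, 0.5])
```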

Computing PageRank: Power Iteration

Theorem

Power Iteration Convergence for PageRank

Statement

The power iteration π_{t+1} = M̃ π_t converges to the PageRank vector π from any starting distribution π_0. The convergence rate is geometric:

\|\pi_t - \pi\|_1 \leq 2\alpha^t

After t = O(log(1/ε) / log(1/α)) iterations, the error is at most ε.

Intuition

At each step, the random surfer's distribution gets "mixed" by the teleportation. The teleportation mixes at rate 1 − α per step, so after t steps, the unmixed fraction is α^t. For α = 0.85, you need about 50 iterations to reduce the error by a factor of 10^3.
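The geometric bound can be inverted to estimate the iteration count: solving 2α^t ≤ ε for t gives the sketch below (the function name is illustrative):

```python
import math

def iterations_needed(alpha, eps):
    """Smallest t with 2 * alpha**t <= eps, from the geometric bound
    ||pi_t - pi||_1 <= 2 * alpha**t."""
    return math.ceil(math.log(2 / eps) / math.log(1 / alpha))

# For alpha = 0.85 and eps = 1e-3 this lands in the high 40s,
# matching the "about 50 iterations" rule of thumb above.
```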

Proof Sketch

The second-largest eigenvalue of M̃ in absolute value is at most α (the teleportation "shrinks" all eigenvalues except the dominant one toward zero by a factor of α). The error after t steps is bounded by \|\pi_t - \pi\|_1 \leq c \cdot |\lambda_2|^t \leq c \cdot \alpha^t. More precisely, write \pi_0 - \pi = \sum_{k \geq 2} c_k v_k in the eigenbasis. After t multiplications, \pi_t - \pi = \sum_{k \geq 2} c_k \lambda_k^t v_k, and each |\lambda_k| \leq \alpha.

Why It Matters

Power iteration makes PageRank computation practical even for graphs with billions of nodes. Each iteration is a sparse matrix-vector multiplication (cost proportional to the number of edges), and convergence is fast. This is why Google could compute PageRank for the entire web.
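A minimal power-iteration sketch (NumPy assumed, names illustrative): it never materializes the dense Google matrix, only the update π ← αMπ + (1 − α)/n, which is why a sparse M keeps each step at cost proportional to the number of edges.

```python
import numpy as np

def pagerank(M, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank. M must be column-stochastic."""
    n = M.shape[0]
    pi = np.full(n, 1.0 / n)              # start from uniform
    for _ in range(max_iter):
        new = alpha * (M @ pi) + (1 - alpha) / n
        if np.abs(new - pi).sum() < tol:  # L1 convergence check
            return new
        pi = new
    return pi

# Toy graph (0-indexed): 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
pi = pagerank(M)   # a probability vector over the 3 nodes
```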

Failure Mode

Convergence slows as α → 1. For α = 0.99, you need about 16 times as many iterations as for α = 0.85 (the ratio is log(1/0.85) / log(1/0.99) ≈ 16). There is a trade-off: larger α gives a PageRank that more faithfully reflects the link structure, but takes longer to compute and is more sensitive to graph structure.

The Eigenvector Interpretation

PageRank is the dominant eigenvector of the Google matrix M̃. This connects PageRank to spectral graph theory.

The eigenvalue equation M̃π = π says that π is an eigenvector with eigenvalue 1. Since M̃ is a positive stochastic matrix, 1 is the largest eigenvalue (by Perron-Frobenius), and π is the corresponding eigenvector.

This means PageRank can also be understood as the solution to an eigenvalue problem: find the eigenvector of the modified adjacency matrix with the largest eigenvalue. This connects to the broader theory of spectral methods on graphs.

The Role of the Damping Factor

The damping factor α controls the balance between two forces:

  • Link following (α close to 1): PageRank heavily reflects the link structure. Important nodes are those that receive links from other important nodes (recursive definition). Risk: sensitive to graph structure, susceptible to link spam.

  • Teleportation (α close to 0): PageRank approaches the uniform distribution. Every node is roughly equally important. Safe but uninformative.

The standard choice α = 0.85 means that at each step, the random surfer follows a link 85% of the time and teleports 15% of the time. This was found empirically to give good results for web search.
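The two regimes are easy to see numerically; a toy comparison (NumPy assumed, fixed iteration count with no convergence check, for brevity):

```python
import numpy as np

def pagerank(M, alpha, iters=1000):
    """Bare power iteration: pi <- alpha * M @ pi + (1 - alpha)/n."""
    n = M.shape[0]
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        pi = alpha * (M @ pi) + (1 - alpha) / n
    return pi

# Toy graph (0-indexed): 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
near_uniform = pagerank(M, alpha=0.01)   # teleportation dominates
link_driven = pagerank(M, alpha=0.99)    # link structure dominates
```

With α near 0 the scores are nearly 1/3 each; with α near 1 the spread between nodes is much larger.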

Applications Beyond Web Search

Citation networks: Nodes are papers, edges are citations. PageRank identifies influential papers that are cited by other influential papers. Unlike simple citation counts, PageRank weights citations by the importance of the citing paper.

Social networks: Nodes are users, edges are follow/friend relationships. PageRank identifies influential users. Variants like Personalized PageRank (replace uniform teleportation with teleportation biased toward a specific user) power recommendation systems.

Knowledge graphs: Nodes are entities, edges are relations. PageRank helps identify central entities for question answering and information retrieval.

Recommendation systems: Random walks on user-item graphs, where the stationary distribution (or a personalized variant) ranks items by relevance to a user.

Common Confusions

Watch Out

PageRank is not just counting incoming links

A node with many low-quality links can have lower PageRank than a node with few high-quality links. PageRank is recursive: a link from an important node transfers more importance than a link from an unimportant node. This is the key insight that made PageRank work for web search.

Watch Out

The damping factor is not optional

Without damping (α = 1), the random walk can get trapped in sinks (nodes with no outgoing links) or cycle between disconnected components. The stationary distribution may not be unique. The damping factor is not a regularization hack; it is a mathematical necessity for well-definedness.

Watch Out

PageRank and eigenvector centrality are not identical

Eigenvector centrality is the dominant eigenvector of the adjacency matrix A. PageRank is the dominant eigenvector of the Google matrix M̃, which normalizes by out-degree and adds teleportation. PageRank is a refined version of eigenvector centrality that handles directed graphs, dangling nodes, and disconnected components.

Summary

  • PageRank is the stationary distribution of a random walk with teleportation
  • The Google matrix is \tilde{M} = \alpha M + \frac{1 - \alpha}{n} \mathbf{1}\mathbf{1}^\top
  • Damping factor α = 0.85 controls link-following vs teleportation
  • Existence and uniqueness follow from Perron-Frobenius (positive stochastic matrix)
  • Power iteration converges geometrically at rate α
  • PageRank = dominant eigenvector of the Google matrix
  • Applications: citation networks, social networks, knowledge graphs, recommendation systems

Exercises

ExerciseCore

Problem

Consider a 3-node graph: node 1 links to nodes 2 and 3, node 2 links to node 3, and node 3 links to node 1. Write down the transition matrix M and the Google matrix M̃ with α = 0.85.

ExerciseAdvanced

Problem

Prove that if α = 1 (no teleportation), the PageRank may not be unique. Give an example of a graph where the random walk has multiple stationary distributions.

ExerciseAdvanced

Problem

How many power iterations are needed to compute PageRank to within L_1 error ε = 10^{-6} when α = 0.85? What about α = 0.99?

References

Canonical:

  • Page, Brin, Motwani, Winograd, "The PageRank Citation Ranking: Bringing Order to the Web" (1999)
  • Langville & Meyer, Google's PageRank and Beyond (2006)

Current:

  • Gleich, "PageRank Beyond the Web" (SIAM Review, 2015)

  • Hamilton, Graph Representation Learning (2020), Chapter 2

  • Bishop, Pattern Recognition and Machine Learning (2006), Chapters 1-14

Next Topics

The natural next steps from PageRank:

  • Spectral clustering: using eigenvectors of graph matrices for clustering
  • Graph neural networks: generalizing PageRank-style message passing with learned functions
  • Markov chains: the full theory of random walks and stationary distributions

Last reviewed: April 2026
