
Algorithms Foundations

Graph Algorithms Essentials

The graph algorithms every ML practitioner needs: BFS, DFS, Dijkstra, MST, and topological sort. Why they matter for computational graphs, knowledge graphs, dependency resolution, and GNNs.


Why This Matters

Graphs are everywhere in ML, even when you do not see them. A neural network's computation is a directed acyclic graph. Backpropagation is reverse topological order traversal. Package dependencies form a DAG. Knowledge graphs encode relational data. Graph neural networks operate directly on graph-structured data.

You cannot understand automatic differentiation without topological sort. You cannot understand message passing in GNNs without BFS. You cannot understand shortest-path problems in reasoning without Dijkstra. These algorithms are the vocabulary of graph computation.

Mental Model

A graph is a set of nodes connected by edges. Algorithms on graphs answer fundamental questions: can I reach node B from node A? What is the shortest path? What is the cheapest way to connect all nodes? Is there a valid ordering of nodes that respects all dependencies?

Each algorithm uses a different strategy: BFS explores level by level (breadth first), DFS explores as deep as possible first, Dijkstra greedily expands the closest unvisited node, and MST algorithms greedily add the cheapest safe edge.

Core Definitions

Definition

Graph

A graph G = (V, E) consists of a set of vertices (nodes) V and a set of edges E. Edges can be directed (ordered pairs (u, v)) or undirected (unordered pairs {u, v}). A weighted graph assigns a weight w(u, v) to each edge. We use n = |V| for the number of vertices and m = |E| for the number of edges.

Definition

Directed Acyclic Graph (DAG)

A DAG is a directed graph with no directed cycles. Every DAG has at least one topological ordering. Computational graphs in neural networks are DAGs: nodes are operations, edges are data dependencies, and the absence of cycles means the computation can proceed in a well-defined order.

BFS: Breadth-First Search

Definition

Breadth-First Search (BFS)

BFS explores a graph layer by layer from a source node s. It uses a queue:

  1. Initialize: enqueue s, mark s as visited, set dist(s) = 0
  2. While the queue is not empty:
    • Dequeue node u
    • For each neighbor v of u that is unvisited:
      • Mark v as visited, set dist(v) = dist(u) + 1
      • Enqueue v

Time complexity: O(n + m). Space complexity: O(n).

BFS computes the shortest path in unweighted graphs: the first time BFS reaches a node, it has found the shortest path (fewest edges) from the source. This follows directly from the layer-by-layer exploration: all nodes at distance k are visited before any node at distance k + 1.
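The procedure above can be sketched in a few lines of Python; the adjacency-list representation and the example graph are illustrative choices, not part of the definition:

```python
from collections import deque

def bfs_distances(adj, s):
    """BFS from source s. adj maps node -> list of neighbors.
    Returns shortest edge-count distances to all reachable nodes."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:           # unvisited
                dist[v] = dist[u] + 1   # one layer deeper than u
                queue.append(v)
    return dist

# Unweighted graph: distances count edges from the source.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs_distances(adj, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```

Note that the visited check happens at enqueue time, which is exactly why the first assignment to dist(v) is already the final (shortest) distance.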

ML application: In GNNs, the k-hop neighborhood of a node (all nodes reachable in at most k steps) is exactly what k rounds of BFS-like message passing would explore. The number of GNN layers determines the receptive field, analogous to BFS depth.

DFS: Depth-First Search

Definition

Depth-First Search (DFS)

DFS explores as far as possible along each branch before backtracking. It uses a stack (or recursion):

  1. Initialize: mark all nodes as unvisited
  2. For each unvisited node u: call DFS-Visit(u)
  3. DFS-Visit(u): mark u as visited. For each neighbor v of u that is unvisited, recursively call DFS-Visit(v). After all neighbors are processed, record u's finish time.

Time complexity: O(n + m). Space complexity: O(n).

DFS provides two critical capabilities:

  1. Cycle detection: a directed graph has a cycle if and only if DFS encounters a back edge (an edge from a node to one of its ancestors in the DFS tree). This is used to validate that computational graphs are acyclic.
  2. Topological sort: process nodes in reverse order of DFS finish times. Nodes that finish later are placed earlier in the ordering.
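Both capabilities fall out of one recursive DFS. A minimal sketch using the standard three-color scheme (white = unvisited, gray = on the current path, black = finished); the example DAG is illustrative:

```python
def dfs_order_and_cycles(adj):
    """DFS over a directed graph given as node -> list of neighbors.
    Returns (finish_order, has_cycle); a gray neighbor means a back edge."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {u: WHITE for u in adj}
    finish_order = []
    has_cycle = False

    def visit(u):
        nonlocal has_cycle
        color[u] = GRAY                  # u is on the current DFS path
        for v in adj[u]:
            if color[v] == GRAY:         # back edge to an ancestor: cycle
                has_cycle = True
            elif color[v] == WHITE:
                visit(v)
        color[u] = BLACK
        finish_order.append(u)           # record finish time (by position)

    for u in adj:
        if color[u] == WHITE:
            visit(u)
    return finish_order, has_cycle

dag = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}
order, cyclic = dfs_order_and_cycles(dag)
print(list(reversed(order)), cyclic)  # ['a', 'c', 'b', 'd'] False
```

Reversing the finish order gives a topological ordering, exactly as described in capability 2.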

Topological Sort

Definition

Topological Sort

A topological ordering of a DAG is a linear ordering of its vertices such that for every directed edge (u, v), u appears before v. A topological sort can be computed by:

  1. Run DFS on the entire graph
  2. Output vertices in reverse order of finish times

Alternatively, use Kahn's algorithm: repeatedly remove nodes with in-degree zero and add them to the ordering.

Time complexity: O(n + m).
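Kahn's algorithm is a minimal sketch in Python; the chain of layer names in the example is illustrative:

```python
from collections import deque

def kahn_toposort(adj):
    """Kahn's algorithm: repeatedly remove in-degree-0 nodes.
    Returns a topological order, or None if the graph has a cycle."""
    indeg = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indeg[v] += 1
    queue = deque(u for u in adj if indeg[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:        # all of v's dependencies are placed
                queue.append(v)
    # If some vertices were never reached, they sit on a cycle.
    return order if len(order) == len(adj) else None

dag = {'in': ['h1'], 'h1': ['h2'], 'h2': ['out'], 'out': []}
print(kahn_toposort(dag))                        # ['in', 'h1', 'h2', 'out']
print(kahn_toposort({'a': ['b'], 'b': ['a']}))   # None (cycle)
```

The None return value is the cycle-detection behavior used by dependency systems: a nonempty graph with no in-degree-0 vertex left must contain a cycle.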

ML application: Automatic differentiation computes gradients by traversing the computational graph in reverse topological order. The forward pass evaluates operations in topological order (each operation runs after its inputs are available). The backward pass accumulates gradients in reverse topological order (each node's gradient is computed after all downstream gradients are available). This is why backpropagation requires a DAG structure.

Dijkstra's Algorithm

Theorem

Dijkstra's Algorithm Correctness

Statement

Dijkstra's algorithm computes the shortest-path distance from source s to all reachable vertices. It maintains a set S of vertices whose shortest distances are finalized and a priority queue of tentative distances. At each step, it extracts the vertex u with minimum tentative distance, adds u to S, and relaxes all edges leaving u:

dist(v) ← min(dist(v), dist(u) + w(u, v))

When u is extracted, dist(u) equals the true shortest-path distance δ(s, u).

Time complexity: O((n + m) log n) with a binary heap, O(m + n log n) with a Fibonacci heap.

Intuition

Dijkstra's algorithm is a greedy algorithm. At each step, it finalizes the unvisited vertex closest to the source. This is correct because all edge weights are non-negative: no future path through an unvisited vertex can be shorter than the direct path already found. The vertex with the smallest tentative distance cannot be improved by going through other unvisited vertices (which are farther away) and then traversing non-negative edges.
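This greedy loop maps directly onto a binary heap. A minimal sketch using the common lazy-deletion variant (stale heap entries are skipped instead of decreased in place); the small example graph is illustrative:

```python
import heapq

def dijkstra(adj, s):
    """adj maps node -> list of (neighbor, weight); weights non-negative.
    Returns shortest-path distances from s to all reachable nodes."""
    dist = {s: 0}
    done = set()
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:                # stale entry: u already finalized
            continue
        done.add(u)                  # dist[u] is now δ(s, u)
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):   # relax edge (u, v)
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

adj = {'s': [('a', 1), ('b', 4)], 'a': [('b', 2)], 'b': []}
print(dijkstra(adj, 's'))  # {'s': 0, 'a': 1, 'b': 3}
```

Each relaxation may push a new heap entry, giving the O((n + m) log n) bound quoted above.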

Proof Sketch

By induction on the number of finalized vertices. Base case: dist(s) = 0 is correct. Inductive step: suppose all vertices in S have correct distances, and let u be the next vertex extracted. Assume for contradiction that dist(u) > δ(s, u). The true shortest path from s to u must leave S at some edge (x, y) with x ∈ S and y ∉ S. Since x is finalized and the edge (x, y) has been relaxed, dist(y) = δ(s, y). Because all edge weights are non-negative, the prefix of the path ending at y is no longer than the whole path, so δ(s, y) ≤ δ(s, u) < dist(u). Then dist(y) < dist(u), contradicting the choice of u as the vertex with minimum tentative distance. Non-negative weights are essential: with negative edges, a vertex farther in the queue could lead to a shorter total path.

Why It Matters

Dijkstra's algorithm is the standard for single-source shortest paths with non-negative weights. In ML, shortest-path computations appear in knowledge graph reasoning (finding the closest entity), graph-based semi-supervised learning (label propagation weighted by distance), and network analysis.

Failure Mode

Dijkstra fails with negative edge weights. Consider: s → a with weight 1, s → b with weight 3, and b → a with weight −4. Dijkstra finalizes a with distance 1, but the true shortest path s → b → a has distance 3 + (−4) = −1. For negative weights (without negative cycles), use Bellman-Ford (O(nm)).
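For that counterexample, Bellman-Ford recovers the correct answer because it relaxes every edge n − 1 times instead of finalizing vertices greedily. A minimal sketch (the edge-list representation is an illustrative choice):

```python
def bellman_ford(edges, nodes, s):
    """Bellman-Ford: handles negative weights (assuming no negative cycles).
    edges: list of (u, v, w). Relax every edge n - 1 times; O(nm)."""
    dist = {u: float('inf') for u in nodes}
    dist[s] = 0
    for _ in range(len(nodes) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:    # relax edge (u, v)
                dist[v] = dist[u] + w
    return dist

# The counterexample from the text: s->a (1), s->b (3), b->a (-4).
edges = [('s', 'a', 1), ('s', 'b', 3), ('b', 'a', -4)]
print(bellman_ford(edges, ['s', 'a', 'b'], 's'))  # {'s': 0, 'a': -1, 'b': 3}
```

Unlike Dijkstra, no vertex is ever "finalized" early, so the later improvement of a via b is still applied.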

Minimum Spanning Trees

Theorem

MST Cut Property

Statement

For any cut (S, V \ S) of the graph, the minimum-weight edge crossing the cut is in every minimum spanning tree. Formally, if e is the unique lightest edge with one endpoint in S and the other in V \ S, then e belongs to every MST.

Intuition

A cut divides the graph into two parts. Any spanning tree must have at least one edge crossing the cut (otherwise the two parts are disconnected). Replacing a heavier crossing edge with the lightest crossing edge reduces the tree's total weight, so the lightest crossing edge must be in the MST.

Proof Sketch

Let T be an MST that does not contain e = (u, v), where u ∈ S and v ∈ V \ S. Adding e to T creates a cycle. This cycle must contain another edge e′ crossing the cut (the cycle enters S and must leave it again). Since w(e) < w(e′) (by the distinct-weights assumption), T′ = T ∪ {e} \ {e′} has lower total weight than T, contradicting T being an MST.

Why It Matters

The cut property is the foundation of both Kruskal's and Prim's algorithms. It provides a greedy certificate: if an edge is the lightest across some cut, it is safe to include it. In ML, minimum spanning trees appear in hierarchical clustering (single-linkage clustering is equivalent to computing the MST and cutting the heaviest edges) and in constructing sparse graph structures for graph-based learning.

Failure Mode

If edge weights are not distinct, the lightest crossing edge may not be unique, and there may be multiple MSTs. The cut property still guarantees that some lightest crossing edge is in some MST, but not that a specific edge is in every MST.

Kruskal's algorithm: Sort edges by weight. Add each edge if it does not create a cycle (check with union-find). Time: O(m log m).

Prim's algorithm: Grow the MST from a starting vertex, always adding the cheapest edge connecting the tree to a non-tree vertex. Time: O(m log n) with a binary heap.
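Kruskal's algorithm can be sketched with a simple union-find; the edge list below is an illustrative four-vertex example:

```python
def kruskal(n, edges):
    """Kruskal's MST. edges: list of (weight, u, v), vertices 0..n-1.
    Returns (mst_edges, total_weight). Uses union-find with path halving."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):           # lightest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                        # joins two components: safe by the cut property
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return mst, total

edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
mst, total = kruskal(4, edges)
print(total)  # 7: edges of weight 1, 2, and 4; weight-3 edge closes a cycle
```

Each accepted edge is the lightest edge crossing the cut between the two components it joins, which is exactly the greedy certificate from the cut property.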

Why ML People Need Graph Algorithms

  1. Computational graphs: neural network forward/backward passes are topological sort on a DAG
  2. Knowledge graphs: entity-relation triples form directed graphs; shortest paths and connectivity queries support reasoning
  3. Dependency resolution: package managers, data pipelines, and build systems use topological sort to order tasks
  4. GNNs: message-passing GNNs generalize BFS-like neighborhood aggregation; understanding graph traversal clarifies the GNN receptive field
  5. Clustering: single-linkage hierarchical clustering is MST-based; spectral clustering uses graph Laplacian eigenvectors

Common Confusions

Watch Out

BFS finds shortest paths only in unweighted graphs

BFS gives shortest paths when all edges have equal weight (or no weight). For weighted graphs, BFS can give incorrect distances. Dijkstra handles non-negative weights. Bellman-Ford handles negative weights (without negative cycles). Using BFS on a weighted graph is a common bug.
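The bug is easy to reproduce. A minimal sketch contrasting BFS hop counts with true weighted distances on an illustrative three-node graph, where the fewest-edge path is not the lightest path:

```python
from collections import deque
import heapq

# s -> t directly costs 10; the two-edge path s -> a -> t costs 2 + 3 = 5.
adj = {'s': [('a', 2), ('t', 10)], 'a': [('t', 3)], 't': []}

def bfs_hops(adj, s):
    """BFS on a weighted adjacency list, ignoring weights: counts edges only."""
    dist, q = {s: 0}, deque([s])
    while q:
        u = q.popleft()
        for v, _ in adj[u]:          # edge weight is discarded!
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def dijkstra(adj, s):
    """Lazy-deletion Dijkstra for comparison; weights non-negative."""
    dist, heap = {s: 0}, [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue                 # stale entry
        for v, w in adj[u]:
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

print(bfs_hops(adj, 's')['t'])   # 1: BFS prefers the direct edge of weight 10
print(dijkstra(adj, 's')['t'])   # 5: the true shortest weighted distance
```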

Watch Out

Topological sort exists only for DAGs

If a directed graph has a cycle, no topological ordering exists (you cannot place all cycle members before one another). Kahn's algorithm detects this: if the algorithm terminates without visiting all vertices, the graph contains a cycle. This is how dependency systems detect circular dependencies.

Watch Out

Dijkstra is greedy, but not all greedy shortest-path approaches work

Dijkstra's greedy choice (always finalize the closest vertex) works because of non-negative weights. A different greedy strategy (e.g., always relax the heaviest edge first) would not produce correct shortest paths. The correctness depends on the specific greedy criterion matching the problem structure.

Summary

  • BFS: queue-based, O(n + m), shortest paths in unweighted graphs
  • DFS: stack/recursion-based, O(n + m), cycle detection and topological sort
  • Topological sort: linear ordering respecting all directed edges; only for DAGs; used in backpropagation and dependency resolution
  • Dijkstra: greedy shortest path with non-negative weights, O((n + m) log n); fails with negative weights
  • MST (Kruskal/Prim): greedy via cut property; used in hierarchical clustering
  • Graph algorithms are the foundation for computational graphs, GNNs, and knowledge graph reasoning

Exercises

ExerciseCore

Problem

A neural network has 5 layers computed in sequence: input, hidden1, hidden2, hidden3, output. Draw the computational graph as a DAG. Give a topological ordering. In what order does backpropagation compute gradients?

ExerciseAdvanced

Problem

Dijkstra's algorithm on a graph with n = 1000 vertices and m = 5000 edges using a binary heap takes O((n + m) log n) time. Compute the concrete operation count and compare to Bellman-Ford's O(nm) time. When would you prefer Bellman-Ford?

ExerciseResearch

Problem

In a k-layer GNN with message passing, each node aggregates information from its k-hop neighborhood. Relate this to BFS and explain what happens as k increases in terms of the "over-smoothing" problem.

References

Canonical:

  • Cormen, Leiserson, Rivest, Stein, Introduction to Algorithms (CLRS), Chapters 20-23
  • Kleinberg & Tardos, Algorithm Design, Chapters 3-4

Current:

  • Hamilton, Graph Representation Learning (2020), Chapters 1-2
  • Bronstein et al., "Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges" (2021)

Next Topics

The natural next steps from graph algorithms:

Last reviewed: April 2026
