Algorithms Foundations
Graph Algorithms Essentials
The graph algorithms every ML practitioner needs: BFS, DFS, Dijkstra, MST, and topological sort. Why they matter for computational graphs, knowledge graphs, dependency resolution, and GNNs.
Why This Matters
Graphs are everywhere in ML, even when you do not see them. A neural network's computation is a directed acyclic graph. Backpropagation is reverse topological order traversal. Package dependencies form a DAG. Knowledge graphs encode relational data. Graph neural networks operate directly on graph-structured data.
You cannot understand automatic differentiation without topological sort. You cannot understand message passing in GNNs without BFS. You cannot understand shortest-path problems in reasoning without Dijkstra. These algorithms are the vocabulary of graph computation.
Mental Model
A graph is a set of nodes connected by edges. Algorithms on graphs answer fundamental questions: can I reach node B from node A? What is the shortest path? What is the cheapest way to connect all nodes? Is there a valid ordering of nodes that respects all dependencies?
Each algorithm uses a different strategy: BFS explores level by level (breadth first), DFS explores as deep as possible first, Dijkstra greedily expands the closest unvisited node, and MST algorithms greedily add the cheapest safe edge.
Core Definitions
Graph
A graph consists of a set of vertices (nodes) and a set of edges . Edges can be directed (ordered pairs ) or undirected (unordered pairs ). A weighted graph assigns a weight to each edge. We use for the number of vertices and for the number of edges.
Directed Acyclic Graph (DAG)
A DAG is a directed graph with no directed cycles. Every DAG has at least one topological ordering. Computational graphs in neural networks are DAGs: nodes are operations, edges are data dependencies, and the absence of cycles means the computation can proceed in a well-defined order.
BFS: Breadth-First Search
Breadth-First Search (BFS)
BFS explores a graph layer by layer from a source node . It uses a queue:
- Initialize: enqueue , mark as visited, set
- While the queue is not empty:
- Dequeue node
- For each neighbor of that is unvisited:
- Mark as visited, set
- Enqueue
Time complexity: . Space complexity: .
BFS computes the shortest path in unweighted graphs: the first time BFS reaches a node, it has found the shortest path (fewest edges) from the source. This follows directly from the layer-by-layer exploration: all nodes at distance are visited before any node at distance .
ML application: In GNNs, the -hop neighborhood of a node (all nodes reachable in at most steps) is exactly what rounds of BFS-like message passing would explore. The number of GNN layers determines the receptive field, analogous to BFS depth.
DFS: Depth-First Search
Depth-First Search (DFS)
DFS explores as far as possible along each branch before backtracking. It uses a stack (or recursion):
- Initialize: mark all nodes as unvisited
- For each unvisited node : call
- : mark as visited. For each neighbor of that is unvisited, recursively call . After all neighbors are processed, record 's finish time.
Time complexity: . Space complexity: .
DFS provides two critical capabilities:
- Cycle detection: a directed graph has a cycle if and only if DFS encounters a back edge (an edge from a node to one of its ancestors in the DFS tree). This is used to validate that computational graphs are acyclic.
- Topological sort: process nodes in reverse order of DFS finish times. Nodes that finish later are placed earlier in the ordering.
Topological Sort
Topological Sort
A topological ordering of a DAG is a linear ordering of its vertices such that for every directed edge , appears before . A topological sort can be computed by:
- Run DFS on the entire graph
- Output vertices in reverse order of finish times
Alternatively, use Kahn's algorithm: repeatedly remove nodes with in-degree zero and add them to the ordering.
Time complexity: .
ML application: Automatic differentiation computes gradients by traversing the computational graph in reverse topological order. The forward pass evaluates operations in topological order (each operation runs after its inputs are available). The backward pass accumulates gradients in reverse topological order (each node's gradient is computed after all downstream gradients are available). This is why backpropagation requires a DAG structure.
Dijkstra's Algorithm
Dijkstra's Algorithm Correctness
Statement
Dijkstra's algorithm computes the shortest-path distance from source to all reachable vertices. It maintains a set of vertices whose shortest distances are finalized and a priority queue of tentative distances. At each step, it extracts the vertex with minimum tentative distance, adds to , and relaxes all edges leaving :
When is extracted, equals the true shortest-path distance .
Time complexity: with a binary heap, with a Fibonacci heap.
Intuition
Dijkstra's algorithm is a greedy algorithm. At each step, it finalizes the unvisited vertex closest to the source. This is correct because all edge weights are non-negative: no future path through an unvisited vertex can be shorter than the direct path already found. The vertex with the smallest tentative distance cannot be improved by going through other unvisited vertices (which are farther away) and then traversing non-negative edges.
Proof Sketch
By induction on the number of finalized vertices. Base case: is correct. Inductive step: suppose all vertices in have correct distances. Let be the next vertex extracted. Assume for contradiction that . Then the true shortest path from to passes through some vertex before reaching . But , and since , (because was chosen as the minimum). But , contradicting the assumption. Non-negative weights are essential: with negative edges, a vertex farther in the queue could lead to a shorter total path.
Why It Matters
Dijkstra's algorithm is the standard for single-source shortest paths with non-negative weights. In ML, shortest-path computations appear in knowledge graph reasoning (finding the closest entity), graph-based semi-supervised learning (label propagation weighted by distance), and network analysis.
Failure Mode
Dijkstra fails with negative edge weights. Consider: with weight 1, with weight 3, and with weight . Dijkstra finalizes with distance 1, but the true shortest path has distance . For negative weights (without negative cycles), use Bellman-Ford ().
Minimum Spanning Trees
MST Cut Property
Statement
For any cut of the graph, the minimum-weight edge crossing the cut is in every minimum spanning tree. Formally, if is the unique lightest edge with one endpoint in and the other in , then belongs to every MST.
Intuition
A cut divides the graph into two parts. Any spanning tree must have at least one edge crossing the cut (otherwise the two parts are disconnected). Replacing a heavier crossing edge with the lightest crossing edge reduces the tree's total weight, so the lightest crossing edge must be in the MST.
Proof Sketch
Let be an MST that does not contain where and . Adding to creates a cycle. This cycle must contain another edge crossing the cut (the cycle enters and must leave it again). Since (by the distinct-weights assumption), has lower total weight than , contradicting being an MST.
Why It Matters
The cut property is the foundation of both Kruskal's and Prim's algorithms. It provides a greedy certificate: if an edge is the lightest across some cut, it is safe to include it. In ML, minimum spanning trees appear in hierarchical clustering (single-linkage clustering is equivalent to computing the MST and cutting the heaviest edges) and in constructing sparse graph structures for graph-based learning.
Failure Mode
If edge weights are not distinct, the lightest crossing edge may not be unique, and there may be multiple MSTs. The cut property still guarantees that some lightest crossing edge is in some MST, but not that a specific edge is in every MST.
Kruskal's algorithm: Sort edges by weight. Add each edge if it does not create a cycle (check with union-find). Time: .
Prim's algorithm: Grow the MST from a starting vertex, always adding the cheapest edge connecting the tree to a non-tree vertex. Time: with a binary heap.
Why ML People Need Graph Algorithms
- Computational graphs: neural network forward/backward passes are topological sort on a DAG
- Knowledge graphs: entity-relation triples form directed graphs; shortest paths and connectivity queries support reasoning
- Dependency resolution: package managers, data pipelines, and build systems use topological sort to order tasks
- GNNs: message-passing GNNs generalize BFS-like neighborhood aggregation; understanding graph traversal clarifies the GNN receptive field
- Clustering: single-linkage hierarchical clustering is MST-based; spectral clustering uses graph Laplacian eigenvectors
Common Confusions
BFS finds shortest paths only in unweighted graphs
BFS gives shortest paths when all edges have equal weight (or no weight). For weighted graphs, BFS can give incorrect distances. Dijkstra handles non-negative weights. Bellman-Ford handles negative weights (without negative cycles). Using BFS on a weighted graph is a common bug.
Topological sort exists only for DAGs
If a directed graph has a cycle, no topological ordering exists (you cannot place all cycle members before one another). Kahn's algorithm detects this: if the algorithm terminates without visiting all vertices, the graph contains a cycle. This is how dependency systems detect circular dependencies.
Dijkstra is greedy, but not all greedy shortest-path approaches work
Dijkstra's greedy choice (always finalize the closest vertex) works because of non-negative weights. A different greedy strategy (e.g., always relax the heaviest edge first) would not produce correct shortest paths. The correctness depends on the specific greedy criterion matching the problem structure.
Summary
- BFS: queue-based, , shortest paths in unweighted graphs
- DFS: stack/recursion-based, , cycle detection and topological sort
- Topological sort: linear ordering respecting all directed edges; only for DAGs; used in backpropagation and dependency resolution
- Dijkstra: greedy shortest path with non-negative weights, ; fails with negative weights
- MST (Kruskal/Prim): greedy via cut property; used in hierarchical clustering
- Graph algorithms are the foundation for computational graphs, GNNs, and knowledge graph reasoning
Exercises
Problem
A neural network has 5 layers computed in sequence: input, hidden1, hidden2, hidden3, output. Draw the computational graph as a DAG. Give a topological ordering. In what order does backpropagation compute gradients?
Problem
Dijkstra's algorithm on a graph with vertices and edges using a binary heap takes time. Compute the concrete operation count and compare to Bellman-Ford's time. When would you prefer Bellman-Ford?
Problem
In a -layer GNN with message passing, each node aggregates information from its -hop neighborhood. Relate this to BFS and explain what happens as increases in terms of the "over-smoothing" problem.
References
Canonical:
- Cormen, Leiserson, Rivest, Stein, Introduction to Algorithms (CLRS), Chapters 20-23
- Kleinberg & Tardos, Algorithm Design, Chapters 3-4
Current:
- Hamilton, Graph Representation Learning (2020), Chapters 1-2
- Bronstein et al., "Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges" (2021)
Next Topics
The natural next steps from graph algorithms:
- Dynamic programming: solving problems on graphs with overlapping subproblems
- PageRank algorithm: random walks and eigenvector centrality on graphs
Last reviewed: April 2026