Algorithms Foundations
Dynamic Programming
Solve complex optimization problems by decomposing them into overlapping subproblems with optimal substructure. The algorithmic backbone of sequence models, control theory, and reinforcement learning.
Why This Matters
Dynamic programming is how you solve problems that greedy cannot touch. Whenever an optimization problem has optimal substructure and overlapping subproblems, DP gives you an efficient exact solution by trading space for time. Beyond algorithms courses, the Bellman equation underlying DP is the same Bellman equation that drives all of reinforcement learning.
If you understand DP deeply, you understand value iteration, policy iteration, and Q-learning at the algorithmic level.
Mental Model
Instead of solving a problem from scratch, ask: can I express the solution to this problem in terms of solutions to smaller versions of the same problem? If yes, solve all the small problems first (or cache them as you encounter them) and build up to the full solution.
The key difference from divide-and-conquer: subproblems overlap. In merge sort, the subproblems are disjoint. In DP, the same subproblem appears many times, so you solve it once and store the result.
Core Definitions
Optimal Substructure
A problem exhibits optimal substructure if an optimal solution to the whole problem contains within it optimal solutions to subproblems. Formally: if $S^*$ is optimal for problem $P$, then the restriction of $S^*$ to any subproblem $P'$ of $P$ is optimal for $P'$.
Overlapping Subproblems
A recursive decomposition has overlapping subproblems if the same subproblem is encountered multiple times during the recursion. Without memoization, this causes exponential blowup. With memoization or bottom-up tabulation, each subproblem is solved exactly once.
Bellman Equation
The Bellman equation expresses the value of a state as the immediate reward plus the discounted value of the successor state under the optimal action:

$$V^*(s) = \max_{a} \Big[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big]$$
This is the DP recursion for sequential decision-making under uncertainty.
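To make the recursion concrete, here is a minimal value-iteration sketch that repeatedly applies the Bellman optimality update until it reaches a fixed point. The two-state MDP (states `s0`/`s1`, deterministic transitions, discount 0.9) is an invented toy example, not from the text.

```python
# Toy deterministic MDP (invented for illustration):
# transitions[state][action] = (reward, next_state)
GAMMA = 0.9
transitions = {
    "s0": {"left": (0.0, "s0"), "right": (1.0, "s1")},
    "s1": {"left": (0.0, "s0"), "right": (2.0, "s1")},
}

# Value iteration: apply the Bellman optimality update until (near) convergence.
V = {s: 0.0 for s in transitions}
for _ in range(200):
    V = {s: max(r + GAMMA * V[nxt] for r, nxt in acts.values())
         for s, acts in transitions.items()}

# Fixed point: V*(s1) = 2/(1-0.9) = 20, and V*(s0) = 1 + 0.9*20 = 19
assert abs(V["s1"] - 20.0) < 1e-6 and abs(V["s0"] - 19.0) < 1e-6
```

Because the update is a contraction with factor $\gamma$, the iterates converge geometrically to the unique fixed point regardless of initialization.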
Main Theorems
Bellman's Principle of Optimality
Statement
An optimal policy has the property that, regardless of the initial state and initial decision, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
Intuition
If you are on the optimal path from A to C through B, then the sub-path from B to C must itself be optimal. If it were not, you could replace it with a better sub-path and improve the whole solution, contradicting optimality.
Proof Sketch
By contradiction. Suppose the overall solution $S^*$ is optimal but its restriction to some subproblem is not. Then replacing the subproblem solution with a better one yields a strictly better overall solution, contradicting the optimality of $S^*$.
Why It Matters
This principle is the justification for DP. Without it, solving subproblems optimally and combining them would not yield a globally optimal solution. It is also the foundation of the Bellman equation in RL: the value of a state depends only on optimal behavior from that state onward.
Failure Mode
The principle fails when decisions interact in ways that prevent decomposition. For example, in problems with global constraints that couple all decisions (certain scheduling problems with resource sharing), optimal substructure may not hold.
Top-Down vs Bottom-Up
Top-down (memoization): Write the natural recursion, then add a cache. You only solve subproblems that are actually needed. Easier to code but has recursion overhead and potential stack depth issues.
Bottom-up (tabulation): Determine the order of subproblems, fill in a table from smallest to largest. No recursion overhead. Enables space optimization when you only need the previous row of the table.
Both approaches have the same asymptotic time complexity. The choice is usually a matter of convenience and constant factors.
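The two styles can be contrasted on the same recurrence. The example below (counting monotone lattice paths in an $m \times n$ grid, an invented illustration) writes the natural recursion with a cache, then the equivalent table filled bottom-up with only one row kept in memory.

```python
from functools import lru_cache

# Top-down (memoization): the natural recursion plus a cache.
@lru_cache(maxsize=None)
def paths_td(m, n):
    """Number of monotone lattice paths from (0,0) to (m,n)."""
    if m == 0 or n == 0:
        return 1
    return paths_td(m - 1, n) + paths_td(m, n - 1)

# Bottom-up (tabulation): fill rows from smallest subproblems up,
# keeping only one row (the space optimization mentioned above).
def paths_bu(m, n):
    row = [1] * (n + 1)           # row for m = 0
    for _ in range(m):
        for j in range(1, n + 1):
            row[j] += row[j - 1]  # dp[i][j] = dp[i-1][j] + dp[i][j-1]
    return row[n]

assert paths_td(3, 3) == paths_bu(3, 3) == 20
```

Both run in $O(mn)$ time; the bottom-up version additionally cuts space to $O(n)$.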
Canonical Examples
Fibonacci Numbers
The naive recursion takes exponential time because it recomputes the same values. DP reduces this to $O(n)$ time and $O(1)$ space (keep only the last two values).
This is the simplest possible illustration: overlapping subproblems ($F(n-2)$ is computed by both $F(n)$ and $F(n-1)$) and optimal substructure (trivially, since there is no optimization).
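A minimal sketch of the $O(n)$-time, $O(1)$-space version described above:

```python
def fib(n):
    """Bottom-up Fibonacci: O(n) time, O(1) space.
    Only the last two values are kept."""
    a, b = 0, 1  # F(0), F(1)
    for _ in range(n):
        a, b = b, a + b
    return a

assert [fib(i) for i in range(8)] == [0, 1, 1, 2, 3, 5, 8, 13]
```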
0/1 Knapsack
Given items with values $v_1, \dots, v_n$ and weights $w_1, \dots, w_n$, and capacity $W$, maximize total value. Define $dp[i][w]$ = maximum value using items $1..i$ with capacity $w$. The recurrence is:

$$dp[i][w] = \begin{cases} dp[i-1][w] & \text{if } w_i > w \\ \max\big(dp[i-1][w],\ dp[i-1][w - w_i] + v_i\big) & \text{otherwise} \end{cases}$$

Time: $O(nW)$. Space: $O(nW)$, reducible to $O(W)$ since each row depends only on the previous row. Note: this is pseudo-polynomial because $W$ may be exponential in the input size.
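The recurrence translates directly into a table fill; the sketch below uses the full $O(nW)$ table for clarity.

```python
def knapsack(values, weights, W):
    """0/1 knapsack: dp[i][w] = best value using items 1..i at capacity w.
    O(n*W) time and space."""
    n = len(values)
    dp = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, wt = values[i - 1], weights[i - 1]
        for w in range(W + 1):
            dp[i][w] = dp[i - 1][w]  # skip item i
            if wt <= w:              # or take it, if it fits
                dp[i][w] = max(dp[i][w], dp[i - 1][w - wt] + v)
    return dp[n][W]

assert knapsack([60, 100, 120], [10, 20, 30], 50) == 220
```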
Longest Common Subsequence (LCS)
Given strings $X$ and $Y$, find the length of their longest common subsequence. Define $dp[i][j]$ = LCS length of $X[1..i]$ and $Y[1..j]$.

$$dp[i][j] = \begin{cases} dp[i-1][j-1] + 1 & \text{if } X_i = Y_j \\ \max\big(dp[i-1][j],\ dp[i][j-1]\big) & \text{otherwise} \end{cases}$$

Time and space: $O(mn)$ where $m = |X|$, $n = |Y|$.
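A direct tabulation of the LCS recurrence:

```python
def lcs_length(X, Y):
    """LCS length via dp[i][j]; O(m*n) time and space."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1   # extend the match
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

assert lcs_length("ABCBDAB", "BDCABA") == 4  # e.g. "BCBA"
```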
Time-Space Tradeoffs
Many DP problems allow space reduction:
- 1D state depending on previous row only: reduce space from $O(nW)$ to $O(W)$ by keeping only the current and previous rows (knapsack).
- Hirschberg's trick: for LCS, compute the optimal value in $O(\min(m,n))$ space, and recover the actual subsequence in $O(mn)$ time and linear space using a divide-and-conquer over the DP table.
- Knuth's optimization, divide-and-conquer optimization: for specific recurrence structures, reduce time from $O(n^3)$ to $O(n^2)$ or from $O(n^2)$ to $O(n \log n)$.
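The first tradeoff in the list above can be sketched for knapsack with a single rolling array. Iterating capacity downward is the key detail: it guarantees `dp[w - wt]` still holds the previous item's row, so each item is used at most once.

```python
def knapsack_1d(values, weights, W):
    """Rolling-array 0/1 knapsack: O(W) space instead of O(n*W)."""
    dp = [0] * (W + 1)
    for v, wt in zip(values, weights):
        # Descend so dp[w - wt] is still the "previous row" value.
        for w in range(W, wt - 1, -1):
            dp[w] = max(dp[w], dp[w - wt] + v)
    return dp[W]

assert knapsack_1d([60, 100, 120], [10, 20, 30], 50) == 220
```

(Iterating upward instead would turn this into the unbounded knapsack, where items may repeat.)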
Common Confusions
DP is not just recursion with memoization
Memoization is a technique for implementing DP, not the essence of it. The essence is the principle of optimality: the problem must decompose into subproblems whose optimal solutions combine into an overall optimal solution. If optimal substructure does not hold, memoizing a recursion gives you the right answers to subproblems but combining them does not give the right answer to the whole problem. The recursion must encode a valid decomposition.
Pseudo-polynomial is not polynomial
The 0/1 knapsack DP runs in $O(nW)$ time. This looks polynomial, but $W$ is a number in the input, not the size of the input. The input encodes $W$ in $O(\log W)$ bits, so $W = 2^{\log W}$ is exponential in the input size. Knapsack is NP-hard, and the DP is a pseudo-polynomial time algorithm. This distinction matters for complexity theory.
DP vs greedy vs divide-and-conquer
- Greedy: make one choice, never reconsider. Works when matroid structure guarantees the local choice is globally safe.
- Divide-and-conquer: split into disjoint subproblems, solve recursively, combine. No overlapping subproblems (e.g., merge sort).
- DP: split into overlapping subproblems, solve each once, combine. Requires optimal substructure.
Summary
- DP requires two properties: optimal substructure and overlapping subproblems
- The Bellman equation is the DP recursion for sequential decision problems
- Top-down (memoization) and bottom-up (tabulation) are equivalent in theory
- Space optimization is often possible by keeping only the previous layer
- Pseudo-polynomial time (e.g., $O(nW)$) is not the same as polynomial time
- The principle of optimality is the justification for why DP gives correct answers
Exercises
Problem
Write the DP recurrence for the minimum number of coins needed to make change for amount $A$ using coin denominations $c_1, \dots, c_k$. What is the time and space complexity?
Problem
The naive recursive Fibonacci makes exponentially many calls. Explain precisely why memoization reduces this to $O(n)$, and why bottom-up DP reduces space from $O(n)$ to $O(1)$.
Problem
The matrix chain multiplication problem asks for the optimal parenthesization of the product $A_1 A_2 \cdots A_n$. Write the DP recurrence and give its time complexity. Why does greedy fail here?
References
Canonical:
- Cormen, Leiserson, Rivest, Stein, Introduction to Algorithms (CLRS), Chapters 14-15
- Bellman, Dynamic Programming (1957)
RL Connection:
- Sutton & Barto, Reinforcement Learning: An Introduction (2018), Chapters 3-4
Accessible:
- Kleinberg & Tardos, Algorithm Design, Chapter 6
Next Topics
- Greedy algorithms: the simpler paradigm for problems with matroid structure
- Convex optimization basics: continuous optimization with similar decomposition ideas
Last reviewed: April 2026
Builds on This
- Knapsack Problem (Layer 0A)