Algorithms Foundations
Dynamic Programming
Solve complex optimization problems by decomposing them into overlapping subproblems with optimal substructure. The algorithmic backbone of sequence models, control theory, and reinforcement learning.
Why This Matters
Dynamic programming is how you solve problems that greedy cannot touch. Whenever an optimization problem has optimal substructure and overlapping subproblems, DP gives you an efficient exact solution by trading space for time. Beyond algorithms courses, the Bellman equation underlying DP is the same Bellman equation that drives all of reinforcement learning.
If you understand DP deeply, you understand value iteration, policy iteration, and Q-learning at the algorithmic level.
Mental Model
Instead of solving a problem from scratch, ask: can I express the solution to this problem in terms of solutions to smaller versions of the same problem? If yes, solve all the small problems first (or cache them as you encounter them) and build up to the full solution.
The key difference from divide-and-conquer: subproblems overlap. In merge sort, the subproblems are disjoint. In DP, the same subproblem appears many times, so you solve it once and store the result.
Core Definitions
Optimal Substructure
A problem exhibits optimal substructure if an optimal solution to the whole problem contains within it optimal solutions to subproblems. Formally: if $S^*$ is optimal for problem $P$, then the restriction of $S^*$ to any subproblem $P'$ of $P$ is optimal for $P'$.
Overlapping Subproblems
A recursive decomposition has overlapping subproblems if the same subproblem is encountered multiple times during the recursion. Without memoization, this causes exponential blowup. With memoization or bottom-up tabulation, each subproblem is solved exactly once.
Bellman Equation
The Bellman equation expresses the value of a state as the immediate reward plus the discounted value of the successor state under the optimal action:

$$V^*(s) = \max_{a} \Big[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big]$$
This is the DP recursion for sequential decision-making under uncertainty.
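To make the recursion concrete, here is a minimal value-iteration sketch that repeatedly applies the Bellman optimality update until it reaches a fixed point. The two-state MDP (states `s0`/`s1`, deterministic transitions, discount 0.9) is an invented toy example, not from the text.

```python
# Toy deterministic MDP (invented for illustration):
# transitions[state][action] = (reward, next_state)
GAMMA = 0.9
transitions = {
    "s0": {"left": (0.0, "s0"), "right": (1.0, "s1")},
    "s1": {"left": (0.0, "s0"), "right": (2.0, "s1")},
}

# Value iteration: apply the Bellman optimality update until (near) convergence.
V = {s: 0.0 for s in transitions}
for _ in range(200):
    V = {s: max(r + GAMMA * V[nxt] for r, nxt in acts.values())
         for s, acts in transitions.items()}

# Fixed point: V*(s1) = 2/(1-0.9) = 20, and V*(s0) = 1 + 0.9*20 = 19
assert abs(V["s1"] - 20.0) < 1e-6 and abs(V["s0"] - 19.0) < 1e-6
```

Because the update is a contraction with factor $\gamma$, the iterates converge geometrically to the unique fixed point regardless of initialization.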
Main Theorems
Bellman's Principle of Optimality
Statement
An optimal policy has the property that, regardless of the initial state and initial decision, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
Intuition
If you are on the optimal path from A to C through B, then the sub-path from B to C must itself be optimal. If it were not, you could replace it with a better sub-path and improve the whole solution, contradicting optimality.
Proof Sketch
By contradiction. Suppose the overall solution $S^*$ is optimal but its restriction to some subproblem is not. Then replacing the subproblem solution with a better one yields a strictly better overall solution, contradicting the optimality of $S^*$.
Why It Matters
This principle is the justification for DP. Without it, solving subproblems optimally and combining them would not yield a globally optimal solution. It is also the foundation of the Bellman equation in RL: the value of a state depends only on optimal behavior from that state onward.
Failure Mode
The principle fails when decisions interact in ways that prevent decomposition. For example, in problems with global constraints that couple all decisions (certain scheduling problems with resource sharing), optimal substructure may not hold.
Top-Down vs Bottom-Up
Top-down (memoization): Write the natural recursion, then add a cache. You only solve subproblems that are actually needed. Easier to code but has recursion overhead and potential stack depth issues.
Bottom-up (tabulation): Determine the order of subproblems, fill in a table from smallest to largest. No recursion overhead. Enables space optimization when you only need the previous row of the table.
Both approaches have the same asymptotic time complexity. The choice is usually a matter of convenience and constant factors.
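The two styles can be contrasted on the same recurrence. The example below (counting monotone lattice paths in an $m \times n$ grid, an invented illustration) writes the natural recursion with a cache, then the equivalent table filled bottom-up with only one row kept in memory.

```python
from functools import lru_cache

# Top-down (memoization): the natural recursion plus a cache.
@lru_cache(maxsize=None)
def paths_td(m, n):
    """Number of monotone lattice paths from (0,0) to (m,n)."""
    if m == 0 or n == 0:
        return 1
    return paths_td(m - 1, n) + paths_td(m, n - 1)

# Bottom-up (tabulation): fill rows from smallest subproblems up,
# keeping only one row (the space optimization mentioned above).
def paths_bu(m, n):
    row = [1] * (n + 1)           # row for m = 0
    for _ in range(m):
        for j in range(1, n + 1):
            row[j] += row[j - 1]  # dp[i][j] = dp[i-1][j] + dp[i][j-1]
    return row[n]

assert paths_td(3, 3) == paths_bu(3, 3) == 20
```

Both run in $O(mn)$ time; the bottom-up version additionally cuts space to $O(n)$.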
Canonical Examples
Fibonacci Numbers
The naive recursion takes exponential time because it recomputes the same values. DP reduces this to $O(n)$ time and $O(1)$ space (keep only the last two values).
This is the simplest possible illustration: overlapping subproblems ($F(n-2)$ is computed by both $F(n)$ and $F(n-1)$) and optimal substructure (trivially, since there is no optimization).
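A minimal sketch of the $O(n)$-time, $O(1)$-space version described above:

```python
def fib(n):
    """Bottom-up Fibonacci: O(n) time, O(1) space.
    Only the last two values are kept."""
    a, b = 0, 1  # F(0), F(1)
    for _ in range(n):
        a, b = b, a + b
    return a

assert [fib(i) for i in range(8)] == [0, 1, 1, 2, 3, 5, 8, 13]
```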
0/1 Knapsack
Given items with values $v_1, \dots, v_n$ and weights $w_1, \dots, w_n$, and capacity $W$, maximize total value. Define $dp[i][w]$ = maximum value using items $1..i$ with capacity $w$. The recurrence is:

$$dp[i][w] = \begin{cases} dp[i-1][w] & \text{if } w_i > w \\ \max\big(dp[i-1][w],\ dp[i-1][w - w_i] + v_i\big) & \text{otherwise} \end{cases}$$

Time: $O(nW)$. Space: $O(nW)$, reducible to $O(W)$ since each row depends only on the previous row. Note: this is pseudo-polynomial because $W$ may be exponential in the input size.
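The recurrence translates directly into a table fill; the sketch below uses the full $O(nW)$ table for clarity.

```python
def knapsack(values, weights, W):
    """0/1 knapsack: dp[i][w] = best value using items 1..i at capacity w.
    O(n*W) time and space."""
    n = len(values)
    dp = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, wt = values[i - 1], weights[i - 1]
        for w in range(W + 1):
            dp[i][w] = dp[i - 1][w]  # skip item i
            if wt <= w:              # or take it, if it fits
                dp[i][w] = max(dp[i][w], dp[i - 1][w - wt] + v)
    return dp[n][W]

assert knapsack([60, 100, 120], [10, 20, 30], 50) == 220
```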
Longest Common Subsequence (LCS)
Given strings $X$ and $Y$, find the length of their longest common subsequence. Define $dp[i][j]$ = LCS length of $X[1..i]$ and $Y[1..j]$.

$$dp[i][j] = \begin{cases} dp[i-1][j-1] + 1 & \text{if } X_i = Y_j \\ \max\big(dp[i-1][j],\ dp[i][j-1]\big) & \text{otherwise} \end{cases}$$

Time and space: $O(mn)$ where $m = |X|$, $n = |Y|$.
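A direct tabulation of the LCS recurrence:

```python
def lcs_length(X, Y):
    """LCS length via dp[i][j]; O(m*n) time and space."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1   # extend the match
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

assert lcs_length("ABCBDAB", "BDCABA") == 4  # e.g. "BCBA"
```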
Time-Space Tradeoffs
Many DP problems allow space reduction:
- 1D state depending on previous row only: reduce space from $O(nW)$ to $O(W)$ by keeping only the current and previous rows (knapsack).
- Hirschberg's trick: for LCS, compute the optimal value in $O(\min(m,n))$ space, and recover the actual subsequence in $O(mn)$ time and linear space using a divide-and-conquer over the DP table.
- Knuth's optimization, divide-and-conquer optimization: for specific recurrence structures, reduce time from $O(n^3)$ to $O(n^2)$ or from $O(n^2)$ to $O(n \log n)$.
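The first tradeoff in the list above can be sketched for knapsack with a single rolling array. Iterating capacity downward is the key detail: it guarantees `dp[w - wt]` still holds the previous item's row, so each item is used at most once.

```python
def knapsack_1d(values, weights, W):
    """Rolling-array 0/1 knapsack: O(W) space instead of O(n*W)."""
    dp = [0] * (W + 1)
    for v, wt in zip(values, weights):
        # Descend so dp[w - wt] is still the "previous row" value.
        for w in range(W, wt - 1, -1):
            dp[w] = max(dp[w], dp[w - wt] + v)
    return dp[W]

assert knapsack_1d([60, 100, 120], [10, 20, 30], 50) == 220
```

(Iterating upward instead would turn this into the unbounded knapsack, where items may repeat.)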
Common Confusions
DP is not just recursion with memoization
Memoization is a technique for implementing DP, not the essence of it. The essence is the principle of optimality: the problem must decompose into subproblems whose optimal solutions combine into an overall optimal solution. If optimal substructure does not hold, memoizing a recursion gives you the right answers to subproblems but combining them does not give the right answer to the whole problem. The recursion must encode a valid decomposition.
Pseudo-polynomial is not polynomial
The 0/1 knapsack DP runs in $O(nW)$ time. This looks polynomial, but $W$ is a number in the input, not the size of the input. The input encodes $W$ in $O(\log W)$ bits, so $W = 2^{\log W}$ is exponential in the input size. Knapsack is NP-hard, and the DP is a pseudo-polynomial time algorithm. This distinction matters for complexity theory.
DP vs greedy vs divide-and-conquer
- Greedy: make one choice, never reconsider. Works when matroid structure guarantees the local choice is globally safe.
- Divide-and-conquer: split into disjoint subproblems, solve recursively, combine. No overlapping subproblems (e.g., merge sort).
- DP: split into overlapping subproblems, solve each once, combine. Requires optimal substructure.
Summary
- DP requires two properties: optimal substructure and overlapping subproblems
- The Bellman equation is the DP recursion for sequential decision problems
- Top-down (memoization) and bottom-up (tabulation) are equivalent in theory
- Space optimization is often possible by keeping only the previous layer
- Pseudo-polynomial time (e.g., $O(nW)$) is not the same as polynomial time
- The principle of optimality is the justification for why DP gives correct answers
Exercises
Problem
Write the DP recurrence for the minimum number of coins needed to make change for amount $A$ using coin denominations $c_1, \dots, c_k$. What is the time and space complexity?
Problem
The naive recursive Fibonacci makes exponentially many calls. Explain precisely why memoization reduces this to $O(n)$, and why bottom-up DP reduces space from $O(n)$ to $O(1)$.
Problem
The matrix chain multiplication problem asks for the optimal parenthesization of the product $A_1 A_2 \cdots A_n$. Write the DP recurrence and give its time complexity. Why does greedy fail here?
References
Canonical:
- Cormen, Leiserson, Rivest, Stein, Introduction to Algorithms (CLRS), Chapters 14-15
- Bellman, Dynamic Programming (1957)
RL Connection:
- Sutton & Barto, Reinforcement Learning: An Introduction (2018), Chapters 3-4
Accessible:
- Kleinberg & Tardos, Algorithm Design, Chapter 6
Next Topics
- Greedy algorithms: the simpler paradigm for problems with matroid structure
- Convex optimization basics: continuous optimization with similar decomposition ideas
Last reviewed: April 2026
Builds on This
- Knapsack Problem (Layer 0A)