
Algorithms Foundations

Dynamic Programming

Solve complex optimization problems by decomposing them into overlapping subproblems with optimal substructure. The algorithmic backbone of sequence models, control theory, and reinforcement learning.

Core · Tier 1 · Stable · ~50 min

Why This Matters

Dynamic programming is how you solve problems that greedy cannot touch. Whenever an optimization problem has optimal substructure and overlapping subproblems, DP gives you an efficient exact solution by trading space for time. Beyond algorithms courses, the Bellman equation underlying DP is the same Bellman equation that drives all of reinforcement learning.

If you understand DP deeply, you understand value iteration, policy iteration, and Q-learning at the algorithmic level.

[Figure: naive recursion vs. DP table. The naive call tree for F(5) makes O(2ⁿ) calls, with subproblems such as F(2) and F(3) recomputed redundantly. The DP table (n = 0..5, F(n) = 0, 1, 1, 2, 3, 5) is filled left to right via F(n) = F(n-1) + F(n-2), so each subproblem is computed exactly once, versus exponential recomputation in the naive tree.]

Mental Model

Instead of solving a problem from scratch, ask: can I express the solution to this problem in terms of solutions to smaller versions of the same problem? If yes, solve all the small problems first (or cache them as you encounter them) and build up to the full solution.

The key difference from divide-and-conquer: subproblems overlap. In merge sort, the subproblems are disjoint. In DP, the same subproblem appears many times, so you solve it once and store the result.

Core Definitions

Definition

Optimal Substructure

A problem exhibits optimal substructure if an optimal solution to the whole problem contains within it optimal solutions to subproblems. Formally: if x^* is optimal for problem P, then the restriction of x^* to any subproblem P' is optimal for P'.

Definition

Overlapping Subproblems

A recursive decomposition has overlapping subproblems if the same subproblem is encountered multiple times during the recursion. Without memoization, this causes exponential blowup. With memoization or bottom-up tabulation, each subproblem is solved exactly once.

Definition

Bellman Equation

The Bellman equation expresses the value of a state as the immediate reward plus the discounted expected value of the successor state under the optimal action:

V^*(s) = \max_{a} \left[ r(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V^*(s') \right]

This is the DP recursion for sequential decision-making under uncertainty.
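The Bellman update can be iterated to a fixed point (value iteration). A minimal sketch on a hypothetical two-state MDP — the transition table `P`, rewards `R`, and `gamma = 0.9` below are invented for illustration, not taken from the text:

```python
# Value iteration: repeatedly apply the Bellman optimality update
#   V(s) <- max_a [ r(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
# on a tiny hypothetical MDP with two states and two actions.

# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward
P = {
    0: {"stay": [(0, 1.0)], "go": [(1, 0.9), (0, 0.1)]},
    1: {"stay": [(1, 1.0)], "go": [(0, 1.0)]},
}
R = {
    0: {"stay": 0.0, "go": 1.0},
    1: {"stay": 2.0, "go": 0.0},
}
gamma = 0.9

def value_iteration(P, R, gamma, tol=1e-10):
    V = {s: 0.0 for s in P}
    while True:
        # One Bellman backup for every state.
        V_new = {
            s: max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in P[s]
            )
            for s in P
        }
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new

V = value_iteration(P, R, gamma)
```

Because the update is a contraction (for gamma < 1), the iteration converges to the unique fixed point V^* regardless of initialization.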

Main Theorems

Theorem

Bellman's Principle of Optimality

Statement

An optimal policy has the property that, regardless of the initial state and initial decision, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

Intuition

If you are on the optimal path from A to C through B, then the sub-path from B to C must itself be optimal. If it were not, you could replace it with a better sub-path and improve the whole solution, contradicting optimality.

Proof Sketch

By contradiction. Suppose the overall solution x^* is optimal but its restriction to some subproblem is not. Then replacing the subproblem solution with a better one yields a strictly better overall solution, contradicting the optimality of x^*.

Why It Matters

This principle is the justification for DP. Without it, solving subproblems optimally and combining them would not yield a globally optimal solution. It is also the foundation of the Bellman equation in RL: the value of a state depends only on optimal behavior from that state onward.

Failure Mode

The principle fails when decisions interact in ways that prevent decomposition. For example, in problems with global constraints that couple all decisions (certain scheduling problems with resource sharing), optimal substructure may not hold.

Top-Down vs Bottom-Up

Top-down (memoization): Write the natural recursion, then add a cache. You only solve subproblems that are actually needed. Easier to code but has recursion overhead and potential stack depth issues.

Bottom-up (tabulation): Determine the order of subproblems, fill in a table from smallest to largest. No recursion overhead. Enables space optimization when you only need the previous row of the table.

Both approaches have the same asymptotic time complexity. The choice is usually a matter of convenience and constant factors.
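The two styles can be sketched side by side on Fibonacci (`fib_memo` and `fib_tab` are illustrative names, not from the text above):

```python
from functools import lru_cache

# Top-down (memoization): write the natural recursion, add a cache.
# Only subproblems actually reached by the recursion are solved.
@lru_cache(maxsize=None)
def fib_memo(n):
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

# Bottom-up (tabulation): fill the table in dependency order.
# No recursion, no stack-depth concerns.
def fib_tab(n):
    table = [0, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]
```

Both run in O(n) time; the memoized version additionally pays recursion overhead and Python's stack depth limit for large n.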

Canonical Examples

Example

Fibonacci Numbers

The naive recursion F(n) = F(n-1) + F(n-2) has exponential time because it recomputes the same values. DP reduces this to O(n) time and O(1) space (keep only the last two values).

This is the simplest possible illustration: overlapping subproblems (F(n-2) is computed by both F(n) and F(n-1)) and optimal substructure (trivially, since there is no optimization).
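The O(1)-space form mentioned above keeps only the last two values; a minimal sketch:

```python
def fib(n):
    # Rolling pair of values: O(n) time, O(1) space.
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev
```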

Example

0/1 Knapsack

Given n items with values v_i and weights w_i, and capacity W, maximize total value. Define dp[i][w] = maximum value using items 1, …, i with capacity w. The recurrence is:

dp[i][w] = \max(dp[i-1][w], \; v_i + dp[i-1][w - w_i]), where the second term applies only when w_i \le w.

Time: O(nW). Space: O(nW), reducible to O(W) since each row depends only on the previous row. Note: this is pseudo-polynomial because W may be exponential in the input size.
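A direct translation of the recurrence into a table fill might look like this (function and variable names are illustrative):

```python
def knapsack(values, weights, W):
    n = len(values)
    # dp[i][w] = max value using items 1..i with capacity w.
    dp = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, wt = values[i - 1], weights[i - 1]
        for w in range(W + 1):
            dp[i][w] = dp[i - 1][w]                      # skip item i
            if wt <= w:                                  # take item i if it fits
                dp[i][w] = max(dp[i][w], v + dp[i - 1][w - wt])
    return dp[n][W]
```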

Example

Longest Common Subsequence (LCS)

Given strings X and Y, find the length of their longest common subsequence. Define dp[i][j] = LCS length of X[1..i] and Y[1..j].

dp[i][j] = \begin{cases} dp[i-1][j-1] + 1 & \text{if } X[i] = Y[j] \\ \max(dp[i-1][j], \, dp[i][j-1]) & \text{otherwise} \end{cases}

Time and space: O(mn), where m = |X| and n = |Y|.
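A straightforward tabulation of this recurrence (`lcs_length` is an illustrative name; Python strings are 0-indexed, so dp[i][j] covers the prefixes X[:i] and Y[:j]):

```python
def lcs_length(X, Y):
    m, n = len(X), len(Y)
    # dp[i][j] = LCS length of X[:i] and Y[:j]; row/column 0 = empty prefix.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]
```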

Time-Space Tradeoffs

Many DP problems allow space reduction:

  • 1D state depending on the previous row only: reduce from O(n × W) to O(W) by keeping only the current and previous rows (knapsack).
  • Hirschberg's trick: for LCS, compute the optimal value in O(n) space but recover the actual subsequence in O(mn) time using divide-and-conquer over the DP table.
  • Knuth's optimization and divide-and-conquer optimization: for specific recurrence structures, reduce time from O(n^3) to O(n^2 \log n) or O(n^2).
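The first bullet can be sketched on knapsack: iterating capacities downward lets a single row serve as both the previous and current row, because dp[w - wt] is still the previous row's value when it is read (a sketch under the same recurrence as the knapsack example above; the name `knapsack_1d` is illustrative):

```python
def knapsack_1d(values, weights, W):
    # One row replaces the full (n+1) x (W+1) table: O(W) space.
    # Iterating w downward ensures dp[w - wt] has not yet been
    # overwritten for the current item, so each item is used at most once.
    dp = [0] * (W + 1)
    for v, wt in zip(values, weights):
        for w in range(W, wt - 1, -1):
            dp[w] = max(dp[w], v + dp[w - wt])
    return dp[W]
```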

Common Confusions

Watch Out

DP is not just recursion with memoization

Memoization is a technique for implementing DP, not the essence of it. The essence is the principle of optimality: the problem must decompose into subproblems whose optimal solutions combine into an overall optimal solution. If optimal substructure does not hold, memoizing a recursion gives you the right answers to subproblems but combining them does not give the right answer to the whole problem. The recursion must encode a valid decomposition.

Watch Out

Pseudo-polynomial is not polynomial

The 0/1 knapsack DP runs in O(nW) time. This looks polynomial, but W is a number in the input, not the size of the input. The input size is O(n \log W), so O(nW) = O(n \cdot 2^{\log W}) is exponential in the input size. Knapsack is NP-hard, and the DP is a pseudo-polynomial time algorithm. This distinction matters for complexity theory.

Watch Out

DP vs greedy vs divide-and-conquer

  • Greedy: make one choice, never reconsider. Works when an exchange property (e.g., matroid structure) guarantees the local choice is globally safe.
  • Divide-and-conquer: split into disjoint subproblems, solve recursively, combine. No overlapping subproblems (e.g., merge sort).
  • DP: split into overlapping subproblems, solve each once, combine. Requires optimal substructure.
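A concrete case where greedy fails but DP succeeds is coin change with the hypothetical denominations {1, 3, 4}: for amount 6, largest-first greedy picks 4 + 1 + 1 (three coins), while DP finds 3 + 3 (two coins). A sketch:

```python
def greedy_change(amount, coins):
    # Largest-denomination-first heuristic: fast, but not always optimal.
    count = 0
    for c in sorted(coins, reverse=True):
        count += amount // c
        amount %= c
    return count if amount == 0 else None

def dp_change(amount, coins):
    # dp[a] = minimum coins to make amount a; every denomination is
    # reconsidered at every amount, so no locally greedy choice is locked in.
    INF = float("inf")
    dp = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and dp[a - c] + 1 < dp[a]:
                dp[a] = dp[a - c] + 1
    return dp[amount]
```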

Summary

  • DP requires two properties: optimal substructure and overlapping subproblems
  • The Bellman equation is the DP recursion for sequential decision problems
  • Top-down (memoization) and bottom-up (tabulation) are equivalent in theory
  • Space optimization is often possible by keeping only the previous layer
  • Pseudo-polynomial time (e.g., O(nW)) is not the same as polynomial time
  • The principle of optimality is the justification for why DP gives correct answers

Exercises

ExerciseCore

Problem

Write the DP recurrence for the minimum number of coins needed to make change for amount A using coin denominations c_1, \ldots, c_k. What is the time and space complexity?

ExerciseCore

Problem

The naive recursive Fibonacci has O(2^n) calls. Explain precisely why memoization reduces this to O(n), and why bottom-up DP reduces space from O(n) to O(1).

ExerciseAdvanced

Problem

The matrix chain multiplication problem asks for the optimal parenthesization of A_1 A_2 \cdots A_n. Write the DP recurrence and give its time complexity. Why does greedy fail here?

References

Canonical:

  • Cormen, Leiserson, Rivest, Stein, Introduction to Algorithms (CLRS), Chapters 14-15
  • Bellman, Dynamic Programming (1957)

RL Connection:

  • Sutton & Barto, Reinforcement Learning: An Introduction (2018), Chapters 3-4

Accessible:

  • Kleinberg & Tardos, Algorithm Design, Chapter 6


Last reviewed: April 2026
