
Decision Theory

Bounded Rationality

Real agents optimize under limited information, limited compute, and limited foresight. Simon's satisficing, heuristics, and the implications for search, planning, and agent design in ML.


Why This Matters

Classical decision theory assumes agents can enumerate all actions, compute all consequences, and select the optimal one. This assumption is false for every real system. A chess player cannot evaluate all $10^{120}$ possible games. A language model cannot score all possible continuations. A robot cannot plan over an infinite horizon. The computational cost of finding the optimum exceeds any real agent's budget.

Herbert Simon introduced bounded rationality in the 1950s to replace the fiction of perfect optimization with a realistic account of what agents actually do: search until they find something good enough, then stop. This is not a failure mode. It is the only feasible strategy when optimization is computationally intractable.

Every ML system that uses beam search, early stopping, pruning, or sampling with a temperature parameter is implementing a form of bounded rationality. Understanding the theory tells you when these shortcuts are justified, when they fail, and how to design better ones.

The Critique of Perfect Rationality

Classical rational choice theory (Savage, von Neumann-Morgenstern) assumes:

  1. The agent knows all available alternatives.
  2. The agent can compute the consequences of each alternative.
  3. The agent has a complete, transitive preference ordering over consequences.
  4. The agent selects the alternative that maximizes expected utility.

Simon's critique targets assumptions (1) and (2). In any nontrivial environment, the set of alternatives is too large to enumerate, and the consequences of each alternative are too expensive to compute. A chess player does not maximize over all strategies. A firm does not compute the profit-maximizing price to arbitrary precision. An LLM does not score all possible token sequences.

The question is not whether real agents are rational. The question is: what does rational behavior look like when computation itself is a scarce resource?

Satisficing

Definition

Satisficing

An agent satisfices if it searches through alternatives and selects the first one that meets or exceeds an aspiration level $\tau$. Formally, given a set of alternatives $\mathcal{A}$, a utility function $u: \mathcal{A} \to \mathbb{R}$, and a threshold $\tau$, the agent selects:

$$a^* = \text{first } a \in \mathcal{A} \text{ encountered such that } u(a) \geq \tau$$

The agent does not compare $a^*$ to all other alternatives. It stops searching as soon as it finds one that is good enough.
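The rule above can be sketched directly in Python (a minimal sketch; the toy utility function and alternatives below are illustrative):

```python
def satisfice(alternatives, utility, tau):
    """Return the first alternative whose utility meets the
    aspiration level tau, or None if the stream runs out."""
    for a in alternatives:
        if utility(a) >= tau:
            return a  # stop searching: good enough
    return None

# Pick the first number whose square is at least 50.
# Note that 9 (utility 81) is never even evaluated: the
# satisficer stops at 8 (utility 64).
choice = satisfice([3, 5, 8, 9], lambda x: x * x, tau=50)
print(choice)  # 8
```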

Definition

Aspiration Level

The aspiration level $\tau$ is the threshold that separates satisfactory from unsatisfactory alternatives. Simon proposed that aspiration levels are not fixed. They adapt based on experience: if the agent easily finds alternatives above $\tau$, it raises $\tau$. If it repeatedly fails to find satisfactory alternatives, it lowers $\tau$.

Definition

Search Cost

Let $c > 0$ be the cost of evaluating one additional alternative. The agent faces a tradeoff: searching more increases the expected quality of the selected alternative but incurs additional cost. The optimal stopping problem balances the marginal benefit of continued search against the marginal cost $c$.

The Satisficing Threshold Theorem

Theorem

Optimal Satisficing Threshold

Statement

Let alternatives be drawn i.i.d. from a distribution with CDF $F$ and PDF $f$ on $[0, 1]$. Each evaluation costs $c > 0$. The agent uses a satisficing strategy: evaluate alternatives sequentially and accept the first with utility $u \geq \tau$. The expected net payoff of this strategy is:

$$V(\tau) = \frac{\mathbb{E}[u \mid u \geq \tau] \cdot (1 - F(\tau)) - c}{1 - F(\tau)} = \mathbb{E}[u \mid u \geq \tau] - \frac{c}{1 - F(\tau)}$$

The optimal threshold $\tau^*$ satisfies:

$$\tau^* = \mathbb{E}[u \mid u \geq \tau^*] - \frac{c}{1 - F(\tau^*)}$$

This is the reservation value equation: the threshold equals the expected net payoff of continuing to search.
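The fixed point can be found numerically. A minimal sketch, assuming uniformly distributed alternatives (where $\mathbb{E}[u \mid u \geq \tau] = (1+\tau)/2$) and an illustrative cost $c = 0.005$; for the uniform case the closed form $\tau^* = 1 - \sqrt{2c}$ provides a check:

```python
def optimal_threshold(cond_mean, cdf, c, tau0=0.5, iters=200):
    """Solve tau = E[u | u >= tau] - c / (1 - F(tau)) by
    fixed-point iteration on the reservation value equation."""
    tau = tau0
    for _ in range(iters):
        tau = cond_mean(tau) - c / (1 - cdf(tau))
    return tau

# Uniform[0,1]: F(t) = t and E[u | u >= t] = (1 + t) / 2.
c = 0.005
tau_star = optimal_threshold(lambda t: (1 + t) / 2, lambda t: t, c)
print(round(tau_star, 4))  # 0.9, matching 1 - sqrt(2c)
```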

Intuition

The agent should set $\tau$ so that accepting any alternative at exactly the threshold gives the same expected payoff as rejecting it and continuing to search. If the threshold is too low, the agent accepts mediocre alternatives too quickly. If the threshold is too high, the agent wastes too much on search costs before finding anything acceptable. The optimal threshold balances these two errors.

Proof Sketch

The expected number of evaluations before finding $u \geq \tau$ is $1/(1-F(\tau))$, since each draw exceeds $\tau$ with probability $1 - F(\tau)$ (geometric distribution). The expected total search cost is $c/(1-F(\tau))$. The expected value of the accepted alternative is $\mathbb{E}[u \mid u \geq \tau]$. The net payoff is the difference. Setting $dV/d\tau = 0$ and rearranging gives the reservation value equation. The fixed-point characterization follows from the envelope condition of the optimal stopping problem.
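The geometric-distribution step can be checked by simulation; a minimal sketch with uniform draws and an illustrative threshold $\tau = 0.8$:

```python
import random

random.seed(0)
tau, trials = 0.8, 100_000

def draws_until_accept(tau):
    """Count evaluations until a Uniform[0,1] draw reaches tau."""
    n = 1
    while random.random() < tau:  # draw below threshold: keep searching
        n += 1
    return n

mean = sum(draws_until_accept(tau) for _ in range(trials)) / trials
print(mean)  # close to 1 / (1 - F(tau)) = 1 / 0.2 = 5
```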

Why It Matters

This theorem formalizes Simon's intuition that satisficing is not irrational but is instead the rational response to search costs. The optimal threshold depends on the distribution of alternatives and the cost of search. When search is cheap ($c \to 0$), the threshold rises toward the top of the support of $F$. When search is expensive, the threshold drops, and the agent accepts lower quality. This maps directly to ML: beam width in search, early stopping patience, and sampling temperature all control an implicit satisficing threshold.

Failure Mode

The theorem assumes the agent knows the distribution $F$. In practice, agents must estimate $F$ from experience, which introduces a learning problem on top of the search problem. The theorem also assumes i.i.d. draws. If alternatives arrive in a correlated or adversarial order (as in adversarial search or game trees), the analysis changes substantially. The optimal stopping literature (secretary problem, prophet inequalities) addresses some of these extensions.

Computational Complexity as a Foundation

Simon's argument gains formal teeth from computational complexity theory.

Definition

Intractability as Bounded Rationality Justification

A decision problem is NP-hard if every problem in NP reduces to it in polynomial time; under the standard conjecture that P ≠ NP, no polynomial-time algorithm solves all of its instances. Many optimization problems faced by real agents are NP-hard: traveling salesman, satisfiability, protein folding, optimal neural architecture search. For these problems, perfect optimization is not merely difficult; it is (under standard conjectures) provably infeasible for large instances.

When the optimization problem is NP-hard, satisficing is not a cognitive bias. It is the only available strategy. The question becomes: which heuristics achieve good approximation ratios, and under what conditions?

This connects bounded rationality to the theory of approximation algorithms. A heuristic that finds a solution within a factor $\alpha$ of optimal in polynomial time is a formal model of satisficing with a quantifiable performance guarantee.
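One standard instance of such a guarantee (my illustration, not an example from the text) is the greedy maximal-matching algorithm for vertex cover, which runs in linear time and is always within a factor $\alpha = 2$ of optimal:

```python
def vertex_cover_2approx(edges):
    """Greedy maximal-matching 2-approximation for vertex cover:
    repeatedly take an uncovered edge and add both endpoints."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover

# Path graph 1-2-3-4: the optimal cover {2, 3} has size 2, and
# the heuristic is guaranteed to return at most twice that.
edges = [(1, 2), (2, 3), (3, 4)]
cover = vertex_cover_2approx(edges)
print(cover)  # a valid cover of size at most 4
```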

Heuristics as Rational Strategies

Definition

Fast-and-Frugal Heuristic

A fast-and-frugal heuristic (Gigerenzer) is a decision rule that uses limited information and computation to achieve good performance in a specific environment. Examples:

  • Take-the-best: rank cues by validity, choose the first cue that discriminates between alternatives, and decide based on that cue alone.
  • $1/N$ rule: allocate resources equally among $N$ options, ignoring all information about correlations and expected returns.
  • Recognition heuristic: if you recognize one option but not the other, choose the recognized one.

These heuristics ignore most available information. They succeed because ignoring irrelevant information reduces overfitting to noise, especially in small-sample environments.
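Take-the-best can be sketched as follows; the cues, validities, and the two-city comparison task below are hypothetical:

```python
def take_the_best(option_a, option_b, cues):
    """Decide between two options by checking cues in order of
    validity; the first discriminating cue decides, and all
    remaining cues are ignored."""
    for cue in sorted(cues, key=lambda c: c["validity"], reverse=True):
        a, b = cue["value"](option_a), cue["value"](option_b)
        if a != b:
            return option_a if a > b else option_b
    return option_a  # no cue discriminates: default to the first

# Hypothetical task: which city is larger?
cues = [
    {"validity": 0.9, "value": lambda c: c["has_airport"]},
    {"validity": 0.7, "value": lambda c: c["has_team"]},
]
x = {"name": "A", "has_airport": 1, "has_team": 0}
y = {"name": "B", "has_airport": 1, "has_team": 1}
print(take_the_best(x, y, cues)["name"])  # "B": the second cue decides
```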

The $1/N$ portfolio allocation rule illustrates a key insight. DeMiguel, Garlappi, and Uppal (2009) showed that $1/N$ outperforms mean-variance optimization on real financial data when the estimation window is short. Mean-variance optimization is the theoretically optimal strategy, but it requires estimating $O(N^2)$ covariance parameters from limited data. The estimation error exceeds the optimization gain. The "naive" heuristic wins because it has zero estimation error.
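A minimal simulation of the estimation-error effect (illustrative parameters; assumes NumPy is available): the Frobenius error of the sample covariance shrinks as the estimation window $T$ grows, which is exactly what a short window denies the optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10
true_cov = np.eye(N)  # assume uncorrelated, unit-variance assets

def cov_error(T):
    """Frobenius error of the sample covariance from T periods."""
    returns = rng.standard_normal((T, N))
    return np.linalg.norm(np.cov(returns, rowvar=False) - true_cov)

err_short, err_long = cov_error(20), cov_error(2000)
print(err_short > err_long)  # True: more data, smaller estimation error
```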

Aspiration Adaptation

Definition

Aspiration Adaptation

Aspiration adaptation is the process by which an agent adjusts its satisficing threshold $\tau$ over time based on the outcomes of past searches. If the agent consistently finds alternatives above $\tau$, it raises the threshold. If it consistently fails, it lowers it. Formally:

$$\tau_{t+1} = \tau_t + \eta \cdot (u_t - \tau_t)$$

where $u_t$ is the utility of the best alternative found at time $t$ and $\eta \in (0, 1)$ is a learning rate. This is an exponential moving average of past experience.

This adaptation rule resembles learning rate schedules in optimization. A high initial $\tau$ (ambitious aspiration) corresponds to aggressive early stopping. A gradually decreasing $\tau$ corresponds to increasing patience. The analogy is precise: both aspiration adaptation and learning rate decay manage the tradeoff between exploration cost and solution quality.
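The update rule is a few lines; the starting aspiration and the outcome stream below are illustrative:

```python
def adapt_aspiration(tau, outcomes, eta=0.1):
    """Exponential-moving-average update of the aspiration level:
    tau moves a fraction eta toward each observed outcome."""
    for u in outcomes:
        tau = tau + eta * (u - tau)
    return tau

# If the agent keeps finding utility 0.8, tau converges to 0.8.
tau = adapt_aspiration(0.2, [0.8] * 200)
print(round(tau, 3))  # 0.8
```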

Sciences of the Artificial

Simon's broader framework, developed in The Sciences of the Artificial (1969), considers any system that is designed to achieve goals within constraints imposed by an environment. The key distinction:

  • Natural sciences study what is: the laws governing physical systems.
  • Sciences of the artificial study what could be: the space of possible designs, given goals and constraints.

An ML model is an artifact in Simon's sense. It is designed (trained) to achieve a goal (minimize loss) within constraints (architecture, computation budget, data). Understanding the model requires understanding both the goal and the constraints, not just the final weights. Two models with identical weights but different training constraints may generalize differently because the constraints shape the optimization trajectory.

Connections to ML

Beam Search as Satisficing

Beam search in sequence generation keeps only the top $k$ candidates at each step, discarding the rest. This is satisficing: the agent searches a restricted set and accepts the best within that set, rather than searching the full exponential space of all possible sequences. The beam width $k$ controls the tradeoff between search thoroughness and computational cost.
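A minimal beam search sketch over a toy vocabulary (the expansion and scoring functions are illustrative stand-ins for a real model):

```python
import heapq

def beam_search(start, expand, score, k, depth):
    """Keep only the top-k partial sequences at each step; the rest
    of the exponential search space is never examined."""
    beam = [start]
    for _ in range(depth):
        candidates = [seq + [tok] for seq in beam for tok in expand(seq)]
        beam = heapq.nlargest(k, candidates, key=score)
    return beam[0]

# Toy vocabulary {1, 2, 3}; score a sequence by its token sum.
best = beam_search([], lambda seq: [1, 2, 3], sum, k=2, depth=3)
print(best)  # [3, 3, 3]
```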

Early Stopping as Bounded Optimization

Early stopping halts training when validation loss stops improving. The agent could continue training (potentially finding a lower training loss) but accepts the current solution as good enough. The patience parameter is the aspiration adaptation rate: how many epochs of non-improvement the agent tolerates before stopping.
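A patience-based early-stopping loop might look like this (the validation-loss trace is illustrative):

```python
def train_with_patience(val_losses, patience):
    """Stop after `patience` consecutive epochs without a new best
    validation loss; return (best_loss, epochs_run)."""
    best, bad_epochs = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best, epoch  # good enough: stop here
    return best, len(val_losses)

# Losses plateau after epoch 3; with patience 2 the agent stops at
# epoch 5 and never sees the later improvement to 0.4.
print(train_with_patience([1.0, 0.6, 0.5, 0.55, 0.52, 0.4], patience=2))
```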

Pruning as Resource-Bounded Rationality

Alpha-beta pruning in game trees eliminates branches that cannot affect the final decision. This reduces the effective branching factor from $b$ to approximately $b^{1/2}$ (with perfect move ordering). The pruned agent makes the same decision as the unpruned agent but uses far less computation. This is bounded rationality without any loss in decision quality.
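A standard alpha-beta implementation over a toy two-ply tree (the nested-list tree representation and payoffs are illustrative):

```python
def alphabeta(node, depth, alpha, beta, maximizing, children, value):
    """Minimax with alpha-beta pruning: branches that cannot change
    the decision are cut, giving the same answer with fewer
    evaluations."""
    kids = children(node)
    if depth == 0 or not kids:
        return value(node)
    if maximizing:
        best = float("-inf")
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, children, value))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # prune: the opponent will never allow this line
        return best
    best = float("inf")
    for child in kids:
        best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                   True, children, value))
        beta = min(beta, best)
        if alpha >= beta:
            break
    return best

tree = [[3, 5], [6, 9]]  # two-ply toy tree; leaves are payoffs
result = alphabeta(tree, 2, float("-inf"), float("inf"), True,
                   lambda n: n if isinstance(n, list) else [],
                   lambda n: n)
print(result)  # 6: max over min(3,5)=3 and min(6,9)=6
```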

MCTS as Heuristic Search

Monte Carlo Tree Search (MCTS) combines tree search with random rollouts. Instead of evaluating all nodes (infeasible in large game trees), it allocates evaluation budget to promising subtrees using the UCB1 criterion. This is a formal version of satisficing: the agent allocates its bounded computational budget where it expects the highest information gain.
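The UCB1 selection step can be sketched as follows (the visit counts and accumulated values below are illustrative):

```python
import math

def ucb1_select(counts, values, c=math.sqrt(2)):
    """Pick the child index maximizing mean value plus an
    exploration bonus; budget flows to promising or
    under-explored subtrees."""
    total = sum(counts)
    def ucb(i):
        if counts[i] == 0:
            return float("inf")  # visit unexplored children first
        return values[i] / counts[i] + c * math.sqrt(
            math.log(total) / counts[i])
    return max(range(len(counts)), key=ucb)

# Child 0: 10 visits, total value 7.0. Child 1: 2 visits, value 1.0.
print(ucb1_select([10, 2], [7.0, 1.0]))  # 1: exploration bonus wins
```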

LLM Inference and Temperature

Sampling from an LLM with temperature $T$ controls the breadth of search. At $T = 0$ (greedy decoding), the model selects the single highest-probability token: maximally satisficing. At high $T$, the model explores a broader set of alternatives. The temperature parameter is an explicit satisficing threshold in the probability simplex.
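Temperature's effect on the candidate distribution in a few lines (the logits are illustrative):

```python
import math

def temperature_probs(logits, T):
    """Softmax over logits at temperature T: as T -> 0 the mass
    concentrates on the argmax (greedy); high T flattens the
    distribution and broadens the search."""
    scaled = [l / T for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.5]
print([round(p, 3) for p in temperature_probs(logits, T=0.1)])  # near one-hot
print([round(p, 3) for p in temperature_probs(logits, T=5.0)])  # near uniform
```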

Agent Tool Selection

When an LLM agent must choose which tool to call, it faces a search problem: evaluate available tools, estimate which one best serves the current subgoal, and select. With many tools, exhaustive evaluation is infeasible. Practical agents use heuristics: keyword matching, retrieval over tool descriptions, or learned routing functions. These are fast-and-frugal heuristics applied to tool selection.
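A keyword-overlap router is a few lines; the tool registry and query below are hypothetical:

```python
def route_tool(query, tools):
    """Fast-and-frugal tool selection: score each tool by keyword
    overlap with the query and take the best, with no deeper
    evaluation of any tool."""
    words = set(query.lower().split())
    def overlap(tool):
        return len(words & set(tool["description"].lower().split()))
    return max(tools, key=overlap)

# Hypothetical tool registry for an LLM agent.
tools = [
    {"name": "calculator", "description": "evaluate math arithmetic expressions"},
    {"name": "web_search", "description": "search the web for current news"},
]
print(route_tool("what is the latest news on rates", tools)["name"])  # web_search
```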

Common Confusions

Watch Out

Bounded rationality does not mean irrationality

Bounded rationality does not mean irrationality. It means rational behavior under real computational and informational constraints. Simon's agents are not stupid; they are smart about being resource-limited. A satisficing agent that finds a good solution in polynomial time outperforms a "perfectly rational" agent that cannot finish computing the optimum before the deadline. The bounded agent makes a better decision in practice because it makes a decision.

Watch Out

Satisficing is not laziness

Satisficing is not the claim that agents should accept the first option they see. The threshold $\tau$ matters. A well-calibrated satisficer with a high threshold and low search costs will find near-optimal solutions. Satisficing becomes suboptimal only when the threshold is poorly calibrated or when the search cost model is wrong. The theory prescribes how to set $\tau$ optimally given the search cost structure.

Watch Out

Heuristics do not always underperform optimal strategies

The bias-variance tradeoff applies to decision strategies, not just estimators. An optimal strategy estimated from limited data has high variance. A simple heuristic has high bias but zero variance. In small-sample regimes, the heuristic often wins. Gigerenzer's program shows this empirically across dozens of domains. The question is not "is the heuristic optimal?" but "does the heuristic outperform the estimated optimum in practice?"

Exercises

ExerciseCore

Problem

Alternatives are drawn uniformly from $[0, 1]$ (so $F(u) = u$ and $f(u) = 1$). The search cost is $c = 0.01$ per evaluation. Using the satisficing threshold theorem, find the optimal threshold $\tau^*$.

ExerciseCore

Problem

A chess engine has 0.5 seconds to choose a move. It can evaluate $10^6$ positions per second. The game tree has branching factor $b = 35$ and the engine uses minimax search. What is the maximum search depth the engine can explore exhaustively? If instead it uses alpha-beta pruning with perfect move ordering (effective branching factor $\sqrt{b}$), what depth can it reach? Relate this to bounded rationality.

ExerciseAdvanced

Problem

The $1/N$ portfolio rule allocates wealth equally across $N$ assets. Mean-variance optimization selects weights $w^* = \Sigma^{-1}\mu / (\mathbf{1}^\top \Sigma^{-1}\mu)$ where $\mu$ is the vector of expected returns and $\Sigma$ is the covariance matrix. With $T$ time periods of return data, the estimation error in $\hat{\Sigma}$ scales as $O(N^2/T)$. How large must $T$ be relative to $N$ for mean-variance optimization to outperform $1/N$? State the qualitative answer and explain why this illustrates bounded rationality.

ExerciseResearch

Problem

Design a formal model of an LLM agent that must decide when to stop "thinking" (generating chain-of-thought tokens) and commit to an answer. The agent pays cost $c$ per token generated and receives reward $r(q)$ where $q$ is the quality of the final answer (a function of the reasoning trace). Formulate this as an optimal stopping problem. Under what conditions does the optimal policy have a threshold structure (stop when estimated answer quality exceeds some $\tau$)?

References

Canonical:

  • Simon, Models of Bounded Rationality, Vols. 1-2 (1982), Chapters 1-2, 7-8
  • Simon, The Sciences of the Artificial (3rd ed., 1996), Chapters 1-4

Heuristics and ecological rationality:

  • Gigerenzer, Todd, and the ABC Research Group, Simple Heuristics That Make Us Smart (1999), Chapters 1-4
  • Gigerenzer and Selten (eds.), Bounded Rationality: The Adaptive Toolbox (2001), Chapters 1-3

Formal connections:

  • DeMiguel, Garlappi, and Uppal, "Optimal Versus Naive Diversification," Review of Financial Studies 22(5), 2009
  • Papadimitriou and Yannakakis, "On the Approximability of Trade-offs and Optimal Access of Web Sources," FOCS 2000

Search and stopping theory:

  • DeGroot, Optimal Statistical Decisions (1970), Chapters 12-13
  • Ferguson, Optimal Stopping and Applications (2006), Chapters 1-2

Next Topics

Natural extensions from bounded rationality:

  • Game theory: strategic interaction between agents, each of whom is boundedly rational
  • Expected utility: the idealized framework that bounded rationality critiques and refines

Last reviewed: April 2026
