Decision Theory
Bounded Rationality
Real agents optimize under limited information, limited compute, and limited foresight. Simon's satisficing, heuristics, and the implications for search, planning, and agent design in ML.
Why This Matters
Classical decision theory assumes agents can enumerate all actions, compute all consequences, and select the optimal one. This assumption is false for every real system. A chess player cannot evaluate all possible games. A language model cannot score all possible continuations. A robot cannot plan over an infinite horizon. The computational cost of finding the optimum exceeds any real agent's budget.
Herbert Simon introduced bounded rationality in the 1950s to replace the fiction of perfect optimization with a realistic account of what agents actually do: search until they find something good enough, then stop. This is not a failure mode. It is the only feasible strategy when optimization is computationally intractable.
Every ML system that uses beam search, early stopping, pruning, or sampling with a temperature parameter is implementing a form of bounded rationality. Understanding the theory tells you when these shortcuts are justified, when they fail, and how to design better ones.
The Critique of Perfect Rationality
Classical rational choice theory (Savage, von Neumann-Morgenstern) assumes:
- The agent knows all available alternatives.
- The agent can compute the consequences of each alternative.
- The agent has a complete, transitive preference ordering over consequences.
- The agent selects the alternative that maximizes expected utility.
Simon's critique targets the first two assumptions. In any nontrivial environment, the set of alternatives is too large to enumerate, and the consequences of each alternative are too expensive to compute. A chess player does not maximize over all strategies. A firm does not compute the profit-maximizing price to arbitrary precision. An LLM does not score all possible token sequences.
The question is not whether real agents are rational. The question is: what does rational behavior look like when computation itself is a scarce resource?
Satisficing
Satisficing
An agent satisfices if it searches through alternatives and selects the first one that meets or exceeds an aspiration level $a$. Formally, given alternatives $x_1, x_2, \ldots$ examined sequentially, a utility function $u$, and a threshold $a$, the agent selects:

$$x_\tau, \quad \text{where } \tau = \min\{t : u(x_t) \geq a\}$$
The agent does not compare to all other alternatives. It stops searching as soon as it finds one that is good enough.
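The stopping rule above fits in a few lines. This is an illustrative sketch, not Simon's own formulation; the uniform draw, the identity utility, and the evaluation cap are assumptions chosen for the demo:

```python
import random

def satisfice(draw, utility, aspiration, max_evals=10_000):
    """Evaluate alternatives sequentially; accept the first whose
    utility meets or exceeds the aspiration level."""
    for n in range(1, max_evals + 1):
        x = draw()
        if utility(x) >= aspiration:
            return x, n  # accepted alternative, number of evaluations used
    return None, max_evals  # search budget exhausted

# toy run: uniform alternatives, aspiration a = 0.9
random.seed(0)
x, n = satisfice(draw=random.random, utility=lambda v: v, aspiration=0.9)
```

With uniform draws and aspiration 0.9, the number of evaluations is geometric with mean 10; the agent never compares the accepted alternative against anything it has not yet drawn.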
Aspiration Level
The aspiration level $a$ is the threshold that separates satisfactory from unsatisfactory alternatives. Simon proposed that aspiration levels are not fixed. They adapt based on experience: if the agent easily finds alternatives above $a$, it raises $a$. If it repeatedly fails to find satisfactory alternatives, it lowers $a$.
Search Cost
Let $c > 0$ be the cost of evaluating one additional alternative. The agent faces a tradeoff: searching more increases the expected quality of the selected alternative but incurs additional cost. The optimal stopping problem balances the marginal benefit of continued search against the marginal cost $c$.
The Satisficing Threshold Theorem
Optimal Satisficing Threshold
Statement
Let alternatives be drawn i.i.d. from a distribution with CDF $F$ and PDF $f$ on $[0, \bar{u}]$. Each evaluation costs $c > 0$. The agent uses a satisficing strategy: evaluate alternatives sequentially and accept the first with utility $u \geq a$. The expected net payoff of this strategy is:

$$V(a) = \mathbb{E}[u \mid u \geq a] - \frac{c}{1 - F(a)}$$
The optimal threshold $a^*$ satisfies:

$$\int_{a^*}^{\bar{u}} (u - a^*) f(u)\, du = c$$

This is the reservation value equation: the threshold equals the expected net payoff of continuing to search, $a^* = V(a^*)$.
Intuition
The agent should set $a^*$ so that accepting any alternative at exactly the threshold gives the same expected payoff as rejecting it and continuing to search. If the threshold is too low, the agent accepts mediocre alternatives too quickly. If the threshold is too high, the agent wastes too much on search costs before finding anything acceptable. The optimal threshold balances these two errors.
Proof Sketch
The expected number of evaluations before finding $u \geq a$ is $1/(1 - F(a))$, since each draw exceeds $a$ with probability $1 - F(a)$ (geometric distribution). The expected total search cost is $c/(1 - F(a))$. The expected value of the accepted alternative is $\mathbb{E}[u \mid u \geq a]$. The net payoff is the difference. Setting $V'(a) = 0$ and rearranging gives the reservation value equation. The fixed-point characterization $a^* = V(a^*)$ follows from the envelope condition of the optimal stopping problem.
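For a concrete distribution, the reservation value equation is easy to solve numerically. A minimal sketch for Uniform(0, 1) alternatives with an illustrative cost of 0.02 per draw (both the distribution and the cost are assumptions for the demo):

```python
def reservation_threshold(c, lo=0.0, hi=1.0, iters=60):
    """Solve the reservation-value equation for Uniform(0, 1) alternatives
    by bisection: find the a with E[(u - a)^+] = c."""
    def marginal_benefit(a):
        return (1 - a) ** 2 / 2  # E[(u - a)^+] when u ~ Uniform(0, 1)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if marginal_benefit(mid) > c:
            lo = mid  # one more draw is still worth its cost: raise the threshold
        else:
            hi = mid
    return (lo + hi) / 2

a_star = reservation_threshold(c=0.02)
```

Raising the cost lowers the threshold, exactly as the theorem predicts: `reservation_threshold(0.08)` accepts noticeably worse alternatives than `reservation_threshold(0.005)`.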
Why It Matters
This theorem formalizes Simon's intuition that satisficing is not irrational but is instead the rational response to search costs. The optimal threshold depends on the distribution of alternatives and the cost of search. When search is cheap ($c \to 0$), the threshold rises toward the maximum of the support of $F$. When search is expensive, the threshold drops, and the agent accepts lower quality. This maps directly to ML: beam width in search, early stopping patience, and sampling temperature all control an implicit satisficing threshold.
Failure Mode
The theorem assumes the agent knows the distribution $F$. In practice, agents must estimate $F$ from experience, which introduces a learning problem on top of the search problem. The theorem also assumes i.i.d. draws. If alternatives arrive in a correlated or adversarial order (as in adversarial search or game trees), the analysis changes substantially. The optimal stopping literature (secretary problem, prophet inequalities) addresses some of these extensions.
Computational Complexity as a Foundation
Simon's argument gains formal teeth from computational complexity theory.
Intractability as Bounded Rationality Justification
A decision problem is NP-hard if every problem in NP reduces to it in polynomial time; under the standard conjecture that P ≠ NP, no polynomial-time algorithm solves all its instances. Many optimization problems faced by real agents are NP-hard: traveling salesman, satisfiability, protein folding, optimal neural architecture search. For these problems, perfect optimization is not merely difficult; it is provably infeasible for large instances.
When the optimization problem is NP-hard, satisficing is not a cognitive bias. It is the only available strategy. The question becomes: which heuristics achieve good approximation ratios, and under what conditions?
This connects bounded rationality to the theory of approximation algorithms. A heuristic that finds a solution within a factor $\alpha$ of optimal in polynomial time is a formal model of satisficing with a quantifiable performance guarantee.
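As a concrete instance, the classic maximal-matching heuristic for vertex cover runs in linear time with a provable factor-2 guarantee; the example path graph here is made up for illustration:

```python
def vertex_cover_2approx(edges):
    """Greedy matching heuristic for vertex cover: whenever an edge is
    uncovered, take both endpoints. Polynomial time, and the result is
    provably at most twice the size of the optimal cover."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover

# path graph 1-2-3-4: the optimum is {2, 3}; the heuristic may take all four
cover = vertex_cover_2approx([(1, 2), (2, 3), (3, 4)])
```

The heuristic is a satisficer with a certificate: it never inspects the exponential space of all covers, yet its output quality is bounded relative to the optimum.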
Heuristics as Rational Strategies
Fast-and-Frugal Heuristic
A fast-and-frugal heuristic (Gigerenzer) is a decision rule that uses limited information and computation to achieve good performance in a specific environment. Examples:
- Take-the-best: rank cues by validity, choose the first cue that discriminates between alternatives, and decide based on that cue alone.
- 1/N rule: allocate resources equally among $N$ options, ignoring all information about correlations and expected returns.
- Recognition heuristic: if you recognize one option but not the other, choose the recognized one.
These heuristics ignore most available information. They succeed because ignoring irrelevant information reduces overfitting to noise, especially in small-sample environments.
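A sketch of take-the-best over binary cues; the two-option comparison and the cue "validity ordering" are invented for the demo:

```python
def take_the_best(a, b, cues):
    """Take-the-best: scan cues in decreasing validity; decide on the first
    cue that discriminates, ignoring every cue after it."""
    for cue in cues:
        ca, cb = cue(a), cue(b)
        if ca != cb:
            return a if ca > cb else b
    return None  # no cue discriminates: fall back to guessing

# hypothetical binary cues for "which city is larger", most valid cue first
facts = {"A": {"capital": 1, "has_airport": 1},
         "B": {"capital": 0, "has_airport": 1}}
cues = [lambda c: facts[c]["capital"], lambda c: facts[c]["has_airport"]]
choice = take_the_best("A", "B", cues)  # decided by the first cue alone
```

The second cue is never consulted: the rule spends exactly as much computation as the first discriminating cue requires.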
The $1/N$ portfolio rule illustrates a key insight. DeMiguel, Garlappi, and Uppal (2009) showed that $1/N$ outperforms mean-variance optimization on real financial data when the estimation window is short. Mean-variance optimization is the theoretically optimal strategy, but it requires estimating mean and covariance parameters from limited data. The estimation error exceeds the optimization gain. The "naive" heuristic wins because it has zero estimation error.
Aspiration Adaptation
Aspiration Adaptation
Aspiration adaptation is the process by which an agent adjusts its satisficing threshold $a_t$ over time based on the outcomes of past searches. If the agent consistently finds alternatives above $a_t$, it raises the threshold. If it consistently fails, it lowers it. Formally:

$$a_{t+1} = (1 - \alpha)\, a_t + \alpha\, u_t^*$$

where $u_t^*$ is the utility of the best alternative found at time $t$ and $\alpha \in (0, 1]$ is a learning rate. This is an exponential moving average of past experience.
This adaptation rule resembles learning rate schedules in optimization. A high initial $a_0$ (ambitious aspiration) corresponds to aggressive early stopping. A gradually decreasing $\alpha$ corresponds to increasing patience. The analogy is precise: both aspiration adaptation and learning rate decay manage the tradeoff between exploration cost and solution quality.
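The update rule itself is one line; a minimal sketch with made-up outcome values:

```python
def adapt_aspiration(a0, outcomes, alpha=0.2):
    """EMA aspiration update: a_{t+1} = (1 - alpha) * a_t + alpha * u_t.
    Good outcomes pull the threshold up; failures pull it down."""
    a, history = a0, [a0]
    for u in outcomes:
        a = (1 - alpha) * a + alpha * u
        history.append(a)
    return history

# outcomes consistently above the current aspiration drag it upward
hist = adapt_aspiration(a0=0.5, outcomes=[0.8, 0.8, 0.8])
```

Feeding in outcomes below the aspiration produces the symmetric downward drift, so the threshold tracks what the environment actually delivers.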
Sciences of the Artificial
Simon's broader framework, developed in The Sciences of the Artificial (1969), considers any system that is designed to achieve goals within constraints imposed by an environment. The key distinction:
- Natural sciences study what is: the laws governing physical systems.
- Sciences of the artificial study what could be: the space of possible designs, given goals and constraints.
An ML model is an artifact in Simon's sense. It is designed (trained) to achieve a goal (minimize loss) within constraints (architecture, computation budget, data). Understanding the model requires understanding both the goal and the constraints, not just the final weights. Two models with identical weights but different training constraints may generalize differently because the constraints shape the optimization trajectory.
Connections to ML
Beam Search as Satisficing
Beam search in sequence generation keeps only the top $k$ candidates at each step, discarding the rest. This is satisficing: the agent searches a restricted set and accepts the best within that set, rather than searching the full exponential space of all possible sequences. The beam width $k$ controls the tradeoff between search thoroughness and computational cost.
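A minimal beam search over an abstract expansion function; the binary "vocabulary" and additive score are toy assumptions, standing in for token logits:

```python
import heapq

def beam_search(expand, score, start, width, depth):
    """Keep only the `width` best partial sequences at each step,
    discarding the rest of the exponential search tree."""
    beam = [start]
    for _ in range(depth):
        candidates = [seq + [tok] for seq in beam for tok in expand(seq)]
        beam = heapq.nlargest(width, candidates, key=score)
    return max(beam, key=score)

# toy: binary vocabulary, score = sum of chosen tokens
best = beam_search(expand=lambda seq: [0, 1], score=sum,
                   start=[], width=2, depth=3)
```

At each depth the candidate set would double without pruning; the beam caps it at `width`, so total work is linear in depth rather than exponential.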
Early Stopping as Bounded Optimization
Early stopping halts training when validation loss stops improving. The agent could continue training (potentially finding a lower training loss) but accepts the current solution as good enough. The patience parameter is the aspiration adaptation rate: how many epochs of non-improvement the agent tolerates before stopping.
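The patience rule as a standalone function, assuming we only observe the sequence of validation losses (the loss values below are invented):

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch of the best validation loss, stopping the scan once
    `patience` consecutive epochs fail to improve on it."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # aspiration unmet for `patience` epochs: accept best so far
    return best_epoch

stop = early_stop_epoch([1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74])
```

Larger `patience` buys more search (more epochs) for a chance at a better solution, which is exactly the marginal-benefit-versus-marginal-cost tradeoff of the satisficing theorem.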
Pruning as Resource-Bounded Rationality
Alpha-beta pruning in game trees eliminates branches that cannot affect the final decision. This reduces the effective branching factor from $b$ to approximately $\sqrt{b}$ (with perfect move ordering). The pruned agent makes the same decision as the unpruned agent but uses far less computation. This is bounded rationality without any loss in decision quality.
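A compact alpha-beta sketch over an abstract tree; the four-leaf example tree is made up so that one leaf is actually pruned:

```python
def alphabeta(node, depth, alpha, beta, maximizing, children, value):
    """Minimax with alpha-beta cutoffs: siblings that cannot change the
    root decision are never evaluated."""
    kids = children(node)
    if depth == 0 or not kids:
        return value(node)
    if maximizing:
        best = float("-inf")
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, children, value))
            alpha = max(alpha, best)
            if beta <= alpha:
                break  # cutoff: remaining siblings are irrelevant
        return best
    best = float("inf")
    for child in kids:
        best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                   True, children, value))
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best

tree = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"]}
leaves = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}
root_value = alphabeta("root", 2, float("-inf"), float("inf"), True,
                       children=lambda n: tree.get(n, []),
                       value=lambda n: leaves[n])
# leaf b2 is never evaluated: after b1 = 2, branch B cannot beat A's value of 3
```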
MCTS as Heuristic Search
Monte Carlo Tree Search (MCTS) combines tree search with random rollouts. Instead of evaluating all nodes (infeasible in large game trees), it allocates evaluation budget to promising subtrees using the UCB1 criterion. This is a formal version of satisficing: the agent allocates its bounded computational budget where it expects the highest information gain.
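The budget-allocation core of MCTS is the UCB1 selection rule; here is a two-armed sketch with made-up Gaussian payoffs standing in for rollout returns:

```python
import math
import random

def ucb1_allocate(arms, budget, c=1.4):
    """Spend each evaluation on the arm with the highest upper confidence
    bound: empirical mean plus an exploration bonus that shrinks with visits."""
    counts = [0] * len(arms)
    totals = [0.0] * len(arms)
    for t in range(1, budget + 1):
        if t <= len(arms):
            i = t - 1  # play every arm once first
        else:
            i = max(range(len(arms)),
                    key=lambda j: totals[j] / counts[j]
                    + c * math.sqrt(math.log(t) / counts[j]))
        counts[i] += 1
        totals[i] += arms[i]()  # one stochastic rollout of arm i
    return counts

random.seed(1)
counts = ucb1_allocate([lambda: random.gauss(0.2, 0.1),
                        lambda: random.gauss(0.8, 0.1)], budget=200)
```

The bounded budget concentrates on the more promising arm while still paying a small exploration tax on the weaker one; in MCTS the same rule is applied recursively at every tree node.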
LLM Inference and Temperature
Sampling from an LLM with temperature $T$ controls the breadth of search. At $T = 0$ (greedy decoding), the model selects the single highest-probability token: maximally satisficing. At high $T$, the model explores a broader set of alternatives. The temperature parameter is an explicit satisficing threshold in the probability simplex.
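Temperature scaling in isolation, with toy logits and $T = 0$ handled as the greedy limit:

```python
import math
import random

def sample_with_temperature(logits, T, rng=random):
    """Sample an index from softmax(logits / T); T = 0 is greedy decoding."""
    if T == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / T for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    r = rng.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(logits) - 1  # guard against floating-point residue

greedy = sample_with_temperature([2.0, 1.0, 0.1], T=0)     # always the top logit
explore = sample_with_temperature([2.0, 1.0, 0.1], T=2.0)  # any index possible
```

Low $T$ sharpens the distribution toward the greedy choice; high $T$ flattens it, spending probability mass on a wider set of alternatives.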
Agent Tool Selection
When an LLM agent must choose which tool to call, it faces a search problem: evaluate available tools, estimate which one best serves the current subgoal, and select. With many tools, exhaustive evaluation is infeasible. Practical agents use heuristics: keyword matching, retrieval over tool descriptions, or learned routing functions. These are fast-and-frugal heuristics applied to tool selection.
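A sketch of keyword-overlap routing; the tool registry and scoring rule are invented for illustration and are far simpler than retrieval or a learned router:

```python
def route_tool(query, tools):
    """Fast-and-frugal routing: score each tool by word overlap between the
    query and its description, and take the best without deeper evaluation."""
    words = set(query.lower().split())
    def overlap(tool):
        return len(words & set(tool["description"].lower().split()))
    return max(tools, key=overlap)["name"]

# hypothetical tool registry
tools = [
    {"name": "calculator", "description": "evaluate arithmetic and math expressions"},
    {"name": "web_search", "description": "search the web for documents and pages"},
]
chosen = route_tool("solve this math expression", tools)
```

Like take-the-best, the router ignores almost everything about each tool's actual behavior; it satisfices on a single cheap cue rather than simulating every tool call.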
Common Confusions
Bounded rationality does not mean irrationality
Bounded rationality does not mean irrationality. It means rational behavior under real computational and informational constraints. Simon's agents are not stupid; they are smart about being resource-limited. A satisficing agent that finds a good solution in polynomial time outperforms a "perfectly rational" agent that cannot finish computing the optimum before the deadline. The bounded agent makes a better decision in practice because it makes a decision.
Satisficing is not laziness
Satisficing is not the claim that agents should accept the first option they see. The threshold $a$ matters. A well-calibrated satisficer with a high threshold and low search costs will find near-optimal solutions. Satisficing becomes suboptimal only when the threshold is poorly calibrated or when the search cost model is wrong. The theory prescribes how to set $a$ optimally given the search cost structure.
Heuristics do not always underperform optimal strategies
The bias-variance tradeoff applies to decision strategies, not just estimators. An optimal strategy estimated from limited data has high variance. A simple heuristic has high bias but zero variance. In small-sample regimes, the heuristic often wins. Gigerenzer's program shows this empirically across dozens of domains. The question is not "is the heuristic optimal?" but "does the heuristic outperform the estimated optimum in practice?"
Exercises
Problem
Alternatives are drawn uniformly from $[0, 1]$ (so $F(u) = u$ and $f(u) = 1$). The search cost is $c$ per evaluation. Using the satisficing threshold theorem, find the optimal threshold $a^*$ as a function of $c$.
Problem
A chess engine has 0.5 seconds to choose a move. It can evaluate $N$ positions per second. The game tree has branching factor $b$ and the engine uses minimax search. What is the maximum search depth $d$ the engine can explore exhaustively? If instead it uses alpha-beta pruning with perfect move ordering (effective branching factor $\sqrt{b}$), what depth can it reach? Relate this to bounded rationality.
Problem
The $1/N$ portfolio rule allocates wealth equally across $N$ assets. Mean-variance optimization selects weights $w^* \propto \Sigma^{-1}\mu$, where $\mu$ is the vector of expected returns and $\Sigma$ is the covariance matrix. With $T$ time periods of return data, the estimation error in $\hat{\mu}$ scales as $O(1/\sqrt{T})$. How large must $T$ be relative to $N$ for mean-variance optimization to outperform $1/N$? State the qualitative answer and explain why this illustrates bounded rationality.
Problem
Design a formal model of an LLM agent that must decide when to stop "thinking" (generating chain-of-thought tokens) and commit to an answer. The agent pays cost $c$ per token generated and receives terminal reward $Q(\tau)$, where $Q$ is the quality of the final answer (a function of the reasoning trace $\tau$), so its net payoff is $Q(\tau) - c\,|\tau|$. Formulate this as an optimal stopping problem. Under what conditions does the optimal policy have a threshold structure (stop when estimated answer quality exceeds some $q^*$)?
References
Canonical:
- Simon, Models of Bounded Rationality, Vols. 1-2 (1982), Chapters 1-2, 7-8
- Simon, The Sciences of the Artificial (3rd ed., 1996), Chapters 1-4
Heuristics and ecological rationality:
- Gigerenzer, Todd, and the ABC Research Group, Simple Heuristics That Make Us Smart (1999), Chapters 1-4
- Gigerenzer and Selten (eds.), Bounded Rationality: The Adaptive Toolbox (2001), Chapters 1-3
Formal connections:
- DeMiguel, Garlappi, and Uppal, "Optimal Versus Naive Diversification," Review of Financial Studies 22(5), 2009
- Papadimitriou and Yannakakis, "On the Approximability of Trade-offs and Optimal Access of Web Sources," FOCS 2000
Search and stopping theory:
- DeGroot, Optimal Statistical Decisions (1970), Chapters 12-13
- Ferguson, Optimal Stopping and Applications (2006), Chapters 1-2
Next Topics
Natural extensions from bounded rationality:
- Game theory: strategic interaction between agents, each of whom is boundedly rational
- Expected utility: the idealized framework that bounded rationality critiques and refines
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Decision Theory Foundations (Layer 2)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Bayesian Estimation (Layer 0B)
- Maximum Likelihood Estimation (Layer 0B)
- Differentiation in $\mathbb{R}^n$ (Layer 0A)
- Convex Optimization Basics (Layer 1)
- Matrix Operations and Properties (Layer 0A)