Numerical Optimization
Trust Region Methods
Build a local quadratic model of the objective, minimize it within a trusted ball, and adjust the ball size based on prediction quality. Trust region methods provide convergence guarantees even for non-convex problems and inspired TRPO in reinforcement learning.
Why This Matters
Newton's method converges fast when it works, but it can diverge spectacularly when the quadratic approximation is poor (far from the optimum, non-convex regions, negative curvature). Line search methods address this by controlling the step direction and finding a good step length. Trust region methods take the opposite approach: they control the step size directly.
The idea is simple and powerful: build a local model you trust within a ball of radius $\Delta$. If the model predicts well, grow the ball. If it predicts poorly, shrink the ball. This adaptive strategy gives global convergence guarantees even for non-convex objectives, making trust regions the method of choice in many engineering optimization solvers.
Mental Model
You are standing on a hilly landscape in fog. You can see the ground clearly within a few meters (your trust region) but not further. You build a quadratic model of the terrain within your visible range and walk to the lowest point of that model. If the actual terrain matched your model well (you descended as expected), you trust yourself to see further next time (expand the region). If the terrain surprised you (you went up instead of down), you shrink your trusted range and try a more cautious step.
Formal Setup
Consider the unconstrained optimization problem $\min_{x \in \mathbb{R}^n} f(x)$, where $f$ is twice continuously differentiable.
Quadratic Model
At the current iterate $x_k$, the local quadratic model is:

$$m_k(p) = f(x_k) + \nabla f(x_k)^T p + \tfrac{1}{2} p^T B_k p$$

where $p$ is the step and $B_k$ is either the exact Hessian $\nabla^2 f(x_k)$ or an approximation (e.g., BFGS).
Trust Region Subproblem
The trust region subproblem minimizes the model within a ball:

$$\min_{p \in \mathbb{R}^n} m_k(p) \quad \text{subject to} \quad \|p\| \le \Delta_k$$

where $\Delta_k > 0$ is the trust region radius. When $B_k$ is positive definite and the unconstrained Newton step $p^N = -B_k^{-1} \nabla f(x_k)$ satisfies $\|p^N\| \le \Delta_k$, the trust region constraint is inactive and the step equals the Newton step.
Ratio of Actual to Predicted Reduction
The quality of the model is measured by:

$$\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}$$

The numerator is the actual reduction in $f$. The denominator is the predicted reduction from the model. If $\rho_k \approx 1$, the model is accurate. If $\rho_k < 0$, the step increased $f$.
The Trust Region Algorithm
The algorithm has a clean structure:
- Solve the subproblem. Compute (approximately) $p_k = \arg\min_{\|p\| \le \Delta_k} m_k(p)$
- Evaluate the ratio. Compute $\rho_k$
- Accept or reject. If $\rho_k > \eta$ (typically $\eta = 0.1$), accept the step: $x_{k+1} = x_k + p_k$. Otherwise, reject: $x_{k+1} = x_k$
- Update the radius.
  - If $\rho_k > \eta_2$ (e.g., 0.75) and $\|p_k\| = \Delta_k$: expand, e.g. $\Delta_{k+1} = 2\Delta_k$
  - If $\rho_k < \eta_1$ (e.g., 0.25): contract, e.g. $\Delta_{k+1} = \Delta_k / 4$
  - Otherwise: keep $\Delta_{k+1} = \Delta_k$
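The loop above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production solver: it uses the Cauchy point as the subproblem solver, the common 0.25/0.75 thresholds, and function names of our own choosing.

```python
import numpy as np

def trust_region_minimize(f, grad, hess, x0, delta=1.0, delta_max=10.0,
                          eta=0.1, max_iter=500, tol=1e-8):
    """Basic trust region loop with a Cauchy-point subproblem solver (a sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        gnorm = np.linalg.norm(g)
        if gnorm < tol:
            break
        # Cauchy step: steepest descent clipped to the trust region boundary
        gBg = g @ B @ g
        tau = 1.0 if gBg <= 0 else min(1.0, gnorm**3 / (delta * gBg))
        p = -tau * (delta / gnorm) * g
        # Ratio of actual to predicted reduction
        predicted = -(g @ p + 0.5 * p @ B @ p)
        actual = f(x) - f(x + p)
        rho = actual / predicted if predicted > 0 else -1.0
        # Radius update
        if rho < 0.25:
            delta *= 0.25
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2 * delta, delta_max)
        if rho > eta:          # accept the step
            x = x + p
    return x
```

On a convex quadratic the model is exact, so $\rho_k = 1$ at every iteration and all steps are accepted; the radius only matters while steps hit the boundary.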
Solving the Subproblem
The trust region subproblem is a constrained quadratic program. Exact solution requires finding a Lagrange multiplier $\lambda \ge 0$ such that:

$$(B_k + \lambda I)\, p = -\nabla f(x_k), \qquad \lambda \,(\Delta_k - \|p\|) = 0, \qquad B_k + \lambda I \succeq 0$$
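Because $\|p(\lambda)\|$ decreases monotonically as $\lambda$ grows, $\lambda$ can be found by a one-dimensional search. The sketch below (function name ours) ignores the so-called hard case and the safeguards real solvers use:

```python
import numpy as np

def solve_subproblem_exact(B, g, delta):
    """Solve min g.p + 0.5 p.B.p s.t. ||p|| <= delta by bisection on lambda.
    Assumes we are not in the 'hard case' (g not orthogonal to the
    eigenspace of the smallest eigenvalue of B)."""
    n = len(g)
    eigmin = np.linalg.eigvalsh(B)[0]
    # If B is positive definite, the unconstrained minimizer may be interior
    if eigmin > 0:
        p = np.linalg.solve(B, -g)
        if np.linalg.norm(p) <= delta:
            return p, 0.0
    # Otherwise search lambda > max(0, -eigmin) so B + lambda*I is PD
    lo = max(0.0, -eigmin) + 1e-12
    hi = lo + 1.0
    while np.linalg.norm(np.linalg.solve(B + hi * np.eye(n), -g)) > delta:
        hi *= 2
    for _ in range(200):                      # bisection: ||p(lambda)|| is decreasing
        mid = 0.5 * (lo + hi)
        p = np.linalg.solve(B + mid * np.eye(n), -g)
        if np.linalg.norm(p) > delta:
            lo = mid
        else:
            hi = mid
    return np.linalg.solve(B + hi * np.eye(n), -g), hi
```

For $B = 2I$, $g = (4, 0)$, $\Delta = 1$, the conditions give $\lambda = 2$ and $p = (-1, 0)$, which the bisection recovers.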
In practice, approximate solutions suffice. Two popular approaches:
Cauchy point. The steepest-descent step clipped to the trust region boundary. Cheap to compute and sufficient for convergence guarantees.
Dogleg method. Interpolate between the Cauchy point and the full Newton step along a piecewise linear path. Gives a better step than the Cauchy point with minimal additional cost.
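A minimal dogleg step might look like the following (assuming $B_k$ is positive definite; the function name is ours):

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Dogleg approximation to the trust region subproblem (B must be PD)."""
    pN = np.linalg.solve(B, -g)               # full Newton step
    if np.linalg.norm(pN) <= delta:
        return pN                             # constraint inactive
    # Unconstrained minimizer of the model along steepest descent
    pU = -(g @ g) / (g @ B @ g) * g
    if np.linalg.norm(pU) >= delta:
        return -delta * g / np.linalg.norm(g) # clip to the boundary
    # Walk from pU toward pN until ||pU + t (pN - pU)|| = delta
    d = pN - pU
    a, b, c = d @ d, 2 * pU @ d, pU @ pU - delta**2
    t = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return pU + t * d
```

When the Newton step fits inside the ball the dogleg returns it unchanged; otherwise the returned step lies exactly on the boundary.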
Cauchy Point Sufficient Decrease
Statement
The Cauchy point $p_k^C$ (the steepest-descent step restricted to the trust region) achieves a predicted reduction of at least:

$$m_k(0) - m_k(p_k^C) \ge \frac{1}{2} \|\nabla f(x_k)\| \min\!\left(\Delta_k, \frac{\|\nabla f(x_k)\|}{\|B_k\|}\right)$$
Intuition
The Cauchy point guarantees meaningful progress. Either the trust region is small (and you take the full steepest-descent step scaled to the boundary) or it is large (and you take an optimal steepest-descent step). Either way, the reduction is proportional to the gradient norm.
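The bound can be checked numerically: for random symmetric $B$, including indefinite ones, the Cauchy point's model decrease always dominates $\tfrac{1}{2}\|g\|\min(\Delta, \|g\|/\|B\|)$. A quick sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(100):
    n = 5
    A = rng.normal(size=(n, n))
    B = 0.5 * (A + A.T)                    # random symmetric, possibly indefinite
    g = rng.normal(size=n)
    delta = rng.uniform(0.1, 2.0)
    # Cauchy point: minimize the model along -g inside the ball
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    tau = 1.0 if gBg <= 0 else min(1.0, gnorm**3 / (delta * gBg))
    p = -tau * (delta / gnorm) * g
    decrease = -(g @ p + 0.5 * p @ B @ p)  # m(0) - m(p), the predicted reduction
    bound = 0.5 * gnorm * min(delta, gnorm / np.linalg.norm(B, 2))
    assert decrease >= bound - 1e-12       # the sufficient-decrease lemma
```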
Why It Matters
This bound is the engine of the global convergence proof. Any trust region method that achieves at least the Cauchy decrease is guaranteed to converge. More sophisticated subproblem solvers (dogleg, Steihaug-CG) do better but are not strictly necessary for convergence.
Global Convergence
Global Convergence of Trust Region Methods
Statement
If the trust region method produces iterates $\{x_k\}$ with steps achieving at least the Cauchy decrease, then:

$$\liminf_{k \to \infty} \|\nabla f(x_k)\| = 0$$

That is, the method finds points with gradient arbitrarily close to zero. If $f$ has a finite number of stationary points and the iterates are bounded, then $\{x_k\}$ converges to a stationary point.
Intuition
The argument is by contradiction. If the gradient norm stays bounded away from zero, then each accepted step achieves at least a fixed reduction (by the Cauchy point bound). Since $f$ is bounded below, there can be only finitely many accepted steps, which is a contradiction. The key mechanism: if a step is rejected, the radius shrinks, the model becomes more accurate locally, and eventually a step must be accepted.
Proof Sketch
Suppose $\|\nabla f(x_k)\| \ge \epsilon > 0$ for all $k$. By the Cauchy decrease lemma, each accepted step reduces $f$ by at least $c\,\epsilon \min(\Delta_k, \epsilon/\beta)$, where $\beta$ bounds $\|B_k\|$ and $c > 0$ is a constant. If $\Delta_k$ stays bounded below, the sum of reductions diverges, which contradicts $f$ being bounded below. If $\Delta_k \to 0$, then eventually $\rho_k \to 1$ (the model becomes exact locally by Taylor's theorem), so steps are accepted and the radius is no longer decreased. This contradicts $\Delta_k \to 0$.
Why It Matters
This is a global convergence result: it holds regardless of the starting point and without assuming convexity. Line search Newton methods also converge globally under similar conditions, but trust region methods handle negative curvature more naturally (the trust region constraint prevents steps along directions of negative curvature from being too long).
Failure Mode
Convergence to a stationary point does not mean convergence to a local minimum. The method can converge to a saddle point. Modifications (using negative curvature directions) can ensure convergence to a point with positive semidefinite Hessian (second-order stationary point), but this requires more sophisticated subproblem solvers.
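A concrete illustration of this failure mode, assuming a Cauchy-point subproblem solver: on $f(x, y) = x^2 - y^2$, an iterate started on the $x$-axis has a gradient with no $y$-component, so every step stays on the axis and the method converges to the saddle at the origin.

```python
import numpy as np

def cauchy_tr_step(g, B, delta):
    """Cauchy point: steepest descent clipped to the trust region."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    tau = 1.0 if gBg <= 0 else min(1.0, gnorm**3 / (delta * gBg))
    return -tau * (delta / gnorm) * g

B = np.diag([2.0, -2.0])                 # Hessian of f(x, y) = x^2 - y^2 (indefinite)
x = np.array([1.0, 0.0])                 # start exactly on the x-axis
for _ in range(50):
    g = B @ x                            # gradient of the quadratic
    if np.linalg.norm(g) < 1e-12:
        break
    x = x + cauchy_tr_step(g, B, 1.0)
# x ends at the origin: a stationary point, but a saddle, not a minimum
```

A subproblem solver that exploits the negative curvature direction $(0, 1)$ would instead step off the axis and escape.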
Trust Region vs Line Search
| Property | Trust Region | Line Search |
|---|---|---|
| Controls | Step size (radius $\Delta_k$) | Step length ($\alpha_k$) along a direction |
| Direction | Emerges from subproblem | Chosen first (gradient, Newton, etc.) |
| Negative curvature | Handled naturally | Requires modifications |
| Non-convex convergence | Global convergence with Cauchy decrease | Requires Wolfe conditions |
| Cost per iteration | Subproblem solve | Function evaluations for line search |
| Adjusts | Radius based on $\rho_k$ | Step length based on sufficient decrease |
The fundamental difference: line search first chooses a direction and then decides how far to go. Trust region decides both direction and length simultaneously by optimizing within a ball. Trust region is more robust in non-convex settings.
Connection to TRPO in Reinforcement Learning
Trust Region Policy Optimization (TRPO) applies the trust region idea to policy optimization. Instead of a Euclidean ball $\|p\| \le \Delta_k$, TRPO uses a KL divergence constraint:

$$\max_\theta \; \mathbb{E}\!\left[\frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)} \, A(s, a)\right] \quad \text{subject to} \quad \mathbb{E}\!\left[D_{\mathrm{KL}}\!\left(\pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s)\right)\right] \le \delta$$
The KL constraint ensures the new policy is "close" to the old policy in a distribution-theoretic sense. This prevents catastrophic policy updates, the same role that the trust region radius plays in numerical optimization. PPO (Proximal Policy Optimization) approximates TRPO by clipping the importance ratio instead of solving the constrained problem, trading some theoretical elegance for simplicity.
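As a sketch of the PPO side of that trade-off (the function name and vectorized form are ours), the clipped surrogate replaces the KL constraint with a hard clip on the importance ratio:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO's clipped surrogate, a cheap stand-in for TRPO's KL constraint.
    ratio = pi_new(a|s) / pi_old(a|s); both arguments are arrays over samples."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    # The minimum removes any incentive to push the ratio far from 1,
    # which is the same role the trust region radius plays in optimization
    return np.mean(np.minimum(unclipped, clipped))
```

With $\epsilon = 0.2$, a sample with ratio 1.5 and positive advantage contributes only as if the ratio were 1.2; a sample with ratio 0.5 and negative advantage is penalized as if the ratio were 0.8.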
Common Confusions
The trust region is not a constraint on the final solution
The trust region constrains each step, not the solution. The solution is unconstrained; the trust region only limits how far you move per iteration. It is a tool for safe exploration, not a constraint on the problem itself.
Shrinking the radius does not mean the method is failing
Radius contraction is a healthy part of the algorithm. It means the local model was inaccurate, so the method reduces its ambition and tries a more cautious step. The convergence proof relies on radius contraction: when $\Delta_k$ shrinks sufficiently, the quadratic Taylor approximation becomes accurate and the step is accepted.
Summary
- Trust region: minimize a quadratic model $m_k(p)$ subject to $\|p\| \le \Delta_k$
- Accept the step if the ratio $\rho_k$ of actual to predicted reduction is above a threshold
- Expand $\Delta_k$ when the model predicts well; shrink it when it does not
- Cauchy point guarantees sufficient decrease proportional to $\|\nabla f(x_k)\|$
- Global convergence: gradient norms converge to zero, even for non-convex $f$
- Trust region fixes the step SIZE and lets the direction emerge from the subproblem; line search fixes the direction first and then chooses the step length
- TRPO in RL is a trust region method with KL divergence as the distance metric
Exercises
Problem
At iteration $k$, the gradient is $g_k$ (so the steepest-descent direction is $-g_k$), the trust region radius is $\Delta_k$, and there is no curvature information ($B_k = 0$). What is the Cauchy point? What is the predicted reduction?
Problem
Explain why trust region methods handle negative curvature better than line search methods. Consider a 2D function where the Hessian has one positive and one negative eigenvalue at the current point.
Problem
TRPO uses a KL divergence constraint instead of a Euclidean ball. Why is the KL divergence a better choice for policy optimization? What goes wrong with a Euclidean constraint on the parameters $\theta$?
References
Canonical:
- Nocedal & Wright, Numerical Optimization (2006), Chapter 4
- Conn, Gould, Toint, Trust-Region Methods (2000)
Current:
- Schulman et al., "Trust Region Policy Optimization" (ICML 2015)
- Curtis, Robinson, "Exploiting Negative Curvature in Trust Region Methods" (2019)
Next Topics
Trust region methods connect to the broader optimization landscape in ML, including natural gradient methods (Fisher information as a trust region metric) and proximal policy optimization.
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Newton's Method (Layer 1)
- Convex Optimization Basics (Layer 1)
- Differentiation in $\mathbb{R}^n$ (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Matrix Operations and Properties (Layer 0A)