Bayesian State Estimation
The filtering problem: recursively estimate a hidden state from noisy observations using predict-update cycles. Kalman filter for linear Gaussian systems, particle filters for the general case.
Why This Matters
In robotics, control, and RL, the agent rarely observes the true state of the world. A robot sees noisy camera images, not exact positions. A financial model observes prices, not the hidden economic state. Bayesian state estimation provides the principled framework for maintaining a probability distribution over the hidden state given all observations so far.
Mental Model
At each time step, two things happen. First, the hidden state evolves according to some dynamics model (the predict step). Second, you receive a noisy observation (the update step). The Bayes filter combines these into a recursive formula: your belief at time $t$ is computed from your belief at time $t-1$, the dynamics model, and the new observation.
Formal Setup
State-Space Model
A discrete-time state-space model consists of:
- Hidden state $x_t$ evolving via $x_t \sim p(x_t \mid x_{t-1})$ (dynamics model)
- Observation $y_t$ generated via $y_t \sim p(y_t \mid x_t)$ (observation model)
- Prior $p(x_0)$ on the initial state
The filtering problem: compute $p(x_t \mid y_{1:t})$ recursively as new observations arrive.
Belief
The belief at time $t$ is the posterior distribution over the hidden state given all observations:
$$b_t(x_t) = p(x_t \mid y_{1:t})$$
The goal of Bayesian state estimation is to maintain $b_t$ as a sufficient statistic for decision-making.
Main Theorems
Bayes Filter Recursion
Statement
The posterior satisfies the predict-update recursion:
Predict: $\bar{b}_t(x_t) = \int p(x_t \mid x_{t-1}) \, b_{t-1}(x_{t-1}) \, dx_{t-1}$
Update: $b_t(x_t) = \eta \, p(y_t \mid x_t) \, \bar{b}_t(x_t)$
where $\eta = 1 / p(y_t \mid y_{1:t-1})$ is the normalizing constant.
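As a concrete sketch, the recursion can be run exactly when the state space is finite: the predict integral becomes a sum, and the update is an elementwise product followed by normalization. The 3-state transition matrix, observation model, and observation sequence below are illustrative assumptions, not from the text.

```python
import numpy as np

# Illustrative 3-state model (assumed for this sketch, not from the text).
T = np.array([[0.8, 0.1, 0.1],    # T[i, j] = p(x_t = j | x_{t-1} = i)
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
O = np.array([[0.9, 0.05, 0.05],  # O[i, y] = p(y_t = y | x_t = i)
              [0.05, 0.9, 0.05],
              [0.05, 0.05, 0.9]])
belief = np.array([1/3, 1/3, 1/3])  # uniform prior p(x_0)

def bayes_filter_step(belief, y):
    predicted = belief @ T          # predict: sum over x_{t-1}
    unnorm = O[:, y] * predicted    # update: weight by likelihood p(y | x)
    return unnorm / unnorm.sum()    # normalize (the constant eta)

for y in [0, 0, 1]:                 # a short observation sequence
    belief = bayes_filter_step(belief, y)
# belief is now p(x_3 | y_{1:3})
```

Two observations of symbol 0 concentrate the belief on state 0; the final observation of symbol 1 then shifts most of the mass to state 1, showing how the update step overrides a confident prediction when the evidence is strong.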
Intuition
The predict step propagates uncertainty forward through the dynamics model: if I was uncertain about $x_{t-1}$, I am even more uncertain about $x_t$ before seeing $y_t$. The update step sharpens the belief using the new observation: states that explain $y_t$ well get higher probability. This is just Bayes' rule applied recursively.
Proof Sketch
The predict step follows from marginalizing $x_{t-1}$ out of the joint and using the Markov property $p(x_t \mid x_{t-1}, y_{1:t-1}) = p(x_t \mid x_{t-1})$. The update step is a direct application of Bayes' theorem with the conditional independence assumption $p(y_t \mid x_t, y_{1:t-1}) = p(y_t \mid x_t)$.
Why It Matters
This recursion is the foundation of all Bayesian filters. Every specific algorithm (Kalman, extended Kalman, unscented Kalman, particle filter) is an approximation to this exact recursion under different assumptions about the state space and distributions.
Failure Mode
The predict step requires computing an integral that is analytically tractable only in special cases (linear Gaussian). For nonlinear dynamics or non-Gaussian noise, you must approximate. The quality of the filter depends entirely on how well you approximate this integral.
Kalman Filter
Statement
Under linear Gaussian assumptions ($x_t = A x_{t-1} + w_t$ with $w_t \sim \mathcal{N}(0, Q)$, and $y_t = C x_t + v_t$ with $v_t \sim \mathcal{N}(0, R)$), the Bayes filter recursion has a closed-form solution. The belief is Gaussian at every time step, $b_t = \mathcal{N}(\mu_t, \Sigma_t)$, with:
Predict: $\bar{\mu}_t = A \mu_{t-1}$, $\quad \bar{\Sigma}_t = A \Sigma_{t-1} A^\top + Q$
Update: $\mu_t = \bar{\mu}_t + K_t (y_t - C \bar{\mu}_t)$, $\quad \Sigma}_t = (I - K_t C) \bar{\Sigma}_t$
where $K_t = \bar{\Sigma}_t C^\top (C \bar{\Sigma}_t C^\top + R)^{-1}$ is the Kalman gain, $Q$ is the process noise covariance, and $R$ is the observation noise covariance.
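A minimal numpy sketch of one predict-update cycle. The 2D constant-velocity model (matrices A, C, Q, R and the observation sequence) is an illustrative assumption, not from the text.

```python
import numpy as np

# Assumed constant-velocity model: state = [position, velocity].
A = np.array([[1.0, 1.0], [0.0, 1.0]])  # position += velocity each step
C = np.array([[1.0, 0.0]])              # observe position only
Q = 0.01 * np.eye(2)                    # process noise covariance
R = np.array([[0.25]])                  # observation noise covariance

def kalman_step(mu, Sigma, y):
    # Predict: push the Gaussian belief through the linear dynamics.
    mu_bar = A @ mu
    Sigma_bar = A @ Sigma @ A.T + Q
    # Update: the Kalman gain blends prediction and observation.
    S = C @ Sigma_bar @ C.T + R             # innovation covariance
    K = Sigma_bar @ C.T @ np.linalg.inv(S)  # Kalman gain
    mu = mu_bar + K @ (y - C @ mu_bar)
    Sigma = (np.eye(2) - K @ C) @ Sigma_bar
    return mu, Sigma

mu, Sigma = np.zeros(2), np.eye(2)          # broad prior belief
for y in [np.array([1.1]), np.array([2.0]), np.array([2.9])]:
    mu, Sigma = kalman_step(mu, Sigma, y)
# mu now estimates position near 2.9 and velocity near 1,
# even though velocity was never observed directly.
```

Note that the filter infers the unobserved velocity purely from the correlation that the dynamics model induces between position and velocity.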
Intuition
The Kalman gain $K_t$ controls how much you trust the new observation versus your prediction. If observation noise $R$ is large, $K_t$ is small: you mostly trust the prediction. If prediction uncertainty $\bar{\Sigma}_t$ is large, $K_t$ is large: you mostly trust the observation. The filter automatically balances these two sources of information.
Proof Sketch
Since Gaussians are closed under linear transformations and conditioning, the predict step produces a Gaussian (sum of Gaussian variables) and the update step conditions a joint Gaussian on the observed variable, which is again Gaussian. The Kalman gain formula follows from the standard conditional Gaussian formula.
Why It Matters
The Kalman filter is one of the most widely used algorithms in engineering: GPS navigation, spacecraft tracking, econometrics. Its importance comes from being the exact Bayesian solution (not an approximation) for linear Gaussian systems, and from being computationally cheap ($O(n^3)$ per step, where $n$ is the state dimension).
Failure Mode
Linearity and Gaussianity are strong assumptions. Real systems with nonlinear dynamics (robot arm kinematics), multi-modal beliefs (robot unsure which side of a wall it is on), or heavy-tailed noise (financial data) violate these assumptions. The extended Kalman filter (EKF) linearizes locally, but this can diverge for highly nonlinear systems.
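To make the local linearization concrete, here is a sketch of an EKF predict step: the mean is propagated through the nonlinear dynamics $f$, while the covariance is propagated through the Jacobian of $f$ at the current mean. The pendulum-like dynamics and its Jacobian are illustrative assumptions, not from the text.

```python
import numpy as np

def f(x):
    # Assumed nonlinear dynamics: a discretized pendulum-like system.
    return np.array([x[0] + 0.1 * x[1],
                     x[1] - 0.1 * np.sin(x[0])])

def jacobian_f(x):
    # Analytic Jacobian of f, evaluated at x.
    return np.array([[1.0, 0.1],
                     [-0.1 * np.cos(x[0]), 1.0]])

Q = 0.01 * np.eye(2)  # process noise covariance

def ekf_predict(mu, Sigma):
    F = jacobian_f(mu)                  # linearize at the current mean
    return f(mu), F @ Sigma @ F.T + Q   # mean through f, covariance through F

mu, Sigma = np.array([0.5, 0.0]), 0.1 * np.eye(2)
mu, Sigma = ekf_predict(mu, Sigma)
```

The failure mode described above comes from exactly this step: if $f$ curves sharply within the region where the belief has mass, the Jacobian at the mean misrepresents how the whole distribution transforms, and repeated linearization errors can compound until the filter diverges.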
Particle Filters
When the state-space model is nonlinear or non-Gaussian, particle filters approximate the belief with a weighted set of samples (particles).
The algorithm: (1) Sample $x_t^{(i)} \sim p(x_t \mid x_{t-1}^{(i)})$ for each particle $i$ (predict). (2) Assign weight $w_t^{(i)} \propto p(y_t \mid x_t^{(i)})$ (update). (3) Resample particles according to weights to avoid weight degeneracy.
Particle filters converge to the true posterior as the number of particles $N \to \infty$, but suffer from the curse of dimensionality: for high-dimensional state spaces, the required number of particles grows exponentially.
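The three steps above can be sketched as a bootstrap particle filter. The 1D sine dynamics, Gaussian noise scales, and observation sequence are illustrative assumptions, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000  # number of particles

def pf_step(particles, y, proc_std=0.5, obs_std=0.5):
    # (1) Predict: sample each particle through the dynamics model.
    particles = np.sin(particles) + proc_std * rng.normal(size=N)
    # (2) Update: weight each particle by the observation likelihood p(y | x).
    weights = np.exp(-0.5 * ((y - particles) / obs_std) ** 2)
    weights /= weights.sum()
    # (3) Resample: draw N particles with probability proportional to weight.
    idx = rng.choice(N, size=N, p=weights)
    return particles[idx]

particles = rng.normal(size=N)   # samples from the prior p(x_0)
for y in [0.3, 0.5, 0.1]:        # noisy observations
    particles = pf_step(particles, y)
estimate = particles.mean()      # posterior mean estimate
```

Because the belief is just a bag of samples, this filter handles the multi-modal and nonlinear cases that break the Kalman filter, at the cost of the sampling error and dimensionality issues noted above.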
Common Confusions
Filtering is not smoothing
Filtering computes $p(x_t \mid y_{1:t})$: the belief at time $t$ given observations up to $t$. Smoothing computes $p(x_t \mid y_{1:T})$ for $T > t$: the belief at $t$ given future observations too. Smoothing uses more information and is more accurate, but requires waiting until time $T$.
The Kalman filter is not an approximation
For linear Gaussian systems, the Kalman filter computes the exact posterior. It is not a heuristic or approximation. The extended Kalman filter (EKF), which linearizes a nonlinear system, is an approximation. This distinction matters: the EKF can diverge, the Kalman filter cannot (given its assumptions).
Exercises
Problem
A 1D state evolves as $x_t = a x_{t-1} + w_t$ with $w_t \sim \mathcal{N}(0, q)$, and observations are $y_t = x_t + v_t$ with $v_t \sim \mathcal{N}(0, r)$. Starting from $b_0 = \mathcal{N}(\mu_0, \sigma_0^2)$, compute the Kalman filter belief after one observation $y_1$.
Problem
Explain why particle filters suffer from weight degeneracy and how resampling helps. What problem does resampling itself introduce?
References
Canonical:
- Anderson & Moore, Optimal Filtering (1979), Chapters 2-4
- Kalman, "A New Approach to Linear Filtering and Prediction Problems," ASME Journal (1960)
Current:
- Thrun, Burgard & Fox, Probabilistic Robotics (2005), Chapters 2-4
- Doucet & Johansen, "A Tutorial on Particle Filtering," Handbook of Nonlinear Filtering (2009)
- Murphy, Machine Learning: A Probabilistic Perspective (2012)
Next Topics
- Markov decision processes: when state estimation meets decision-making under uncertainty
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Bayesian Estimation (Layer 0B)
- Maximum Likelihood Estimation (Layer 0B)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Differentiation in $\mathbb{R}^n$ (Layer 0A)