RL Theory

Bayesian State Estimation

The filtering problem: recursively estimate a hidden state from noisy observations using predict-update cycles. Kalman filter for linear Gaussian systems, particle filters for the general case.

Why This Matters

In robotics, control, and RL, the agent rarely observes the true state of the world. A robot sees noisy camera images, not exact positions. A financial model observes prices, not the hidden economic state. Bayesian state estimation provides the principled framework for maintaining a probability distribution over the hidden state given all observations so far.

Mental Model

At each time step, two things happen. First, the hidden state evolves according to some dynamics model (the predict step). Second, you receive a noisy observation (the update step). The Bayes filter combines these into a recursive formula: your belief at time $t$ is computed from your belief at time $t-1$, the dynamics model, and the new observation.

Formal Setup

Definition

State-Space Model

A discrete-time state-space model consists of:

  • Hidden state $x_t \in \mathcal{X}$ evolving via $x_t \sim p(x_t \mid x_{t-1})$ (dynamics model)
  • Observation $z_t \in \mathcal{Z}$ generated via $z_t \sim p(z_t \mid x_t)$ (observation model)
  • Prior $p(x_0)$ on the initial state

The filtering problem: compute $p(x_t \mid z_{1:t})$ recursively as new observations arrive.
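To make the setup concrete, here is a minimal simulation of a 1D linear Gaussian state-space model (a sketch using NumPy; the noise variances, horizon, and seed are illustrative choices, not part of the definition above):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(T, A=1.0, H=1.0, q=1.0, r=4.0, x0_mean=0.0, x0_var=1.0):
    """Simulate a 1D linear Gaussian state-space model:
        x_t = A x_{t-1} + w_t,  w_t ~ N(0, q)   (dynamics model)
        z_t = H x_t     + v_t,  v_t ~ N(0, r)   (observation model)
    Returns the hidden states and the noisy observations."""
    xs, zs = [], []
    x = rng.normal(x0_mean, np.sqrt(x0_var))      # draw x_0 from the prior
    for _ in range(T):
        x = A * x + rng.normal(0.0, np.sqrt(q))   # hidden state evolves
        z = H * x + rng.normal(0.0, np.sqrt(r))   # we only see z, not x
        xs.append(x)
        zs.append(z)
    return np.array(xs), np.array(zs)

xs, zs = simulate(T=50)
```

A filter only ever sees `zs`; the point of the whole section is recovering a distribution over `xs` from it.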

Definition

Belief

The belief at time $t$ is the posterior distribution over the hidden state given all observations:

$$b_t(x) = p(x_t = x \mid z_1, z_2, \ldots, z_t)$$

The goal of Bayesian state estimation is to maintain $b_t$ as a sufficient statistic for decision-making.

Main Theorems

Theorem

Bayes Filter Recursion

Statement

The posterior $p(x_t \mid z_{1:t})$ satisfies the predict-update recursion:

Predict:

$$p(x_t \mid z_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}$$

Update:

$$p(x_t \mid z_{1:t}) = \frac{p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})}{p(z_t \mid z_{1:t-1})}$$

where $p(z_t \mid z_{1:t-1}) = \int p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})\, dx_t$ is the normalizing constant.

Intuition

The predict step propagates uncertainty forward through the dynamics model: if I was uncertain about $x_{t-1}$, I am even more uncertain about $x_t$ before seeing $z_t$. The update step sharpens the belief using the new observation: states that explain $z_t$ well get higher probability. This is just Bayes' rule applied recursively.

Proof Sketch

The predict step follows from marginalizing $x_{t-1}$ out of the joint $p(x_t, x_{t-1} \mid z_{1:t-1})$ and using the Markov property $p(x_t \mid x_{t-1}, z_{1:t-1}) = p(x_t \mid x_{t-1})$. The update step is a direct application of Bayes' theorem with the conditional independence assumption $p(z_t \mid x_t, z_{1:t-1}) = p(z_t \mid x_t)$.

Why It Matters

This recursion is the foundation of all Bayesian filters. Every specific algorithm (Kalman, extended Kalman, unscented Kalman, particle filter) is an approximation to this exact recursion under different assumptions about the state space and distributions.

Failure Mode

The predict step requires computing an integral that is analytically tractable only in special cases (linear Gaussian). For nonlinear dynamics or non-Gaussian noise, you must approximate. The quality of the filter depends entirely on how well you approximate this integral.
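One case where the integral is trivially tractable is a finite state space, where it becomes a sum and the recursion can be run exactly. A minimal sketch in NumPy (the two-state transition matrix and likelihood values below are made up for illustration):

```python
import numpy as np

def bayes_filter_step(belief, dynamics, likelihood):
    """One exact predict-update cycle of the Bayes filter on a finite
    state space, where the integral reduces to a matrix-vector product.
    belief:     (n,) array, p(x_{t-1} | z_{1:t-1})
    dynamics:   (n, n) array, dynamics[j, i] = p(x_t = j | x_{t-1} = i)
    likelihood: (n,) array, p(z_t | x_t = j) for the observed z_t."""
    predicted = dynamics @ belief        # predict: sum over x_{t-1}
    unnorm = likelihood * predicted      # update: weight by p(z_t | x_t)
    return unnorm / unnorm.sum()         # divide by p(z_t | z_{1:t-1})

# Toy example: two states, sticky dynamics, observation favoring state 1.
belief = np.array([0.5, 0.5])
dynamics = np.array([[0.9, 0.1],
                     [0.1, 0.9]])
likelihood = np.array([0.2, 0.8])
posterior = bayes_filter_step(belief, dynamics, likelihood)  # → [0.2, 0.8]
```

With a symmetric prior the predict step leaves the belief unchanged here, so the posterior is driven entirely by the likelihood, which is exactly the "states that explain $z_t$ well get higher probability" intuition above.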

Theorem

Kalman Filter

Statement

Under linear Gaussian assumptions (dynamics $x_t = Ax_{t-1} + w_t$ with $w_t \sim \mathcal{N}(0, Q)$, observations $z_t = Hx_t + v_t$ with $v_t \sim \mathcal{N}(0, R)$), the Bayes filter recursion has a closed-form solution. The belief $p(x_t \mid z_{1:t})$ is Gaussian $\mathcal{N}(\mu_t, \Sigma_t)$ at every time step, with:

Predict:

$$\mu_{t|t-1} = A\mu_{t-1}, \qquad \Sigma_{t|t-1} = A\Sigma_{t-1}A^\top + Q$$

Update:

$$K_t = \Sigma_{t|t-1} H^\top (H\Sigma_{t|t-1}H^\top + R)^{-1}$$

$$\mu_t = \mu_{t|t-1} + K_t(z_t - H\mu_{t|t-1})$$

$$\Sigma_t = (I - K_tH)\Sigma_{t|t-1}$$

where $K_t$ is the Kalman gain, $Q$ is the process noise covariance, and $R$ is the observation noise covariance.

Intuition

The Kalman gain $K_t$ controls how much you trust the new observation versus your prediction. If observation noise $R$ is large, $K_t$ is small: you mostly trust the prediction. If prediction uncertainty $\Sigma_{t|t-1}$ is large, $K_t$ is large: you mostly trust the observation. The filter automatically balances these two sources of information.

Proof Sketch

Since Gaussians are closed under linear transformations and conditioning, the predict step produces a Gaussian (a linear map of a Gaussian plus independent Gaussian noise), and the update step conditions a joint Gaussian on the observed variable, which is again Gaussian. The Kalman gain formula follows from the standard formula for conditioning a joint Gaussian.

Why It Matters

The Kalman filter is one of the most widely used algorithms in engineering: GPS navigation, spacecraft tracking, econometrics. Its importance comes from being the exact Bayesian solution (not an approximation) for linear Gaussian systems, and from being computationally cheap ($O(d^3)$ per step, where $d$ is the state dimension).

Failure Mode

Linearity and Gaussianity are strong assumptions. Real systems with nonlinear dynamics (robot arm kinematics), multi-modal beliefs (robot unsure which side of a wall it is on), or heavy-tailed noise (financial data) violate these assumptions. The extended Kalman filter (EKF) linearizes locally, but this can diverge for highly nonlinear systems.
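The predict and update equations above translate directly into code. A minimal matrix-form sketch using NumPy (the 1D numbers at the bottom are illustrative, chosen only to exercise the function):

```python
import numpy as np

def kalman_step(mu, Sigma, z, A, Q, H, R):
    """One predict-update step of the Kalman filter, following the
    equations in the theorem statement. All arguments are NumPy arrays."""
    # Predict: push the Gaussian belief through the linear dynamics.
    mu_pred = A @ mu
    Sigma_pred = A @ Sigma @ A.T + Q
    # Update: the Kalman gain balances prediction vs. observation trust.
    S = H @ Sigma_pred @ H.T + R               # innovation covariance
    K = Sigma_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    mu_new = mu_pred + K @ (z - H @ mu_pred)   # correct by the innovation
    Sigma_new = (np.eye(len(mu)) - K @ H) @ Sigma_pred
    return mu_new, Sigma_new

# Illustrative 1D random walk: Q = 1, R = 2, prior N(0, 1), observation 1.2.
mu, Sigma = np.array([0.0]), np.array([[1.0]])
A = H = Q = np.array([[1.0]])
R = np.array([[2.0]])
mu, Sigma = kalman_step(mu, Sigma, np.array([1.2]), A, Q, H, R)
# → mu = [0.6], Sigma = [[1.0]]: the gain is 0.5, splitting the
#   difference between the prediction (0) and the observation (1.2).
```

In production code one would use `np.linalg.solve` instead of an explicit inverse, and the Joseph-form covariance update for numerical stability; this sketch mirrors the textbook equations instead.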

Particle Filters

When the state-space model is nonlinear or non-Gaussian, particle filters approximate the belief with a weighted set of samples (particles).

The algorithm: (1) Sample $x_t^{(i)} \sim p(x_t \mid x_{t-1}^{(i)})$ for each particle (predict). (2) Assign weight $w_t^{(i)} \propto p(z_t \mid x_t^{(i)})$ (update). (3) Resample particles according to weights to avoid weight degeneracy.

Particle filters converge to the true posterior as the number of particles $N \to \infty$, but suffer from the curse of dimensionality: for high-dimensional state spaces, the required number of particles grows exponentially.
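The three steps can be sketched for a simple 1D random-walk model (an assumed example; the noise variances `q`, `r` and the particle count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter_step(particles, z, q=1.0, r=4.0):
    """One step of a bootstrap particle filter for the assumed model
    x_t = x_{t-1} + w_t, w_t ~ N(0, q);  z_t = x_t + v_t, v_t ~ N(0, r)."""
    N = len(particles)
    # (1) Predict: sample each particle through the dynamics model.
    particles = particles + rng.normal(0.0, np.sqrt(q), size=N)
    # (2) Update: weight each particle by the likelihood p(z_t | x_t^(i)).
    weights = np.exp(-0.5 * (z - particles) ** 2 / r)
    weights /= weights.sum()
    # (3) Resample: draw N particles with probability proportional to weight.
    idx = rng.choice(N, size=N, p=weights)
    return particles[idx]

particles = rng.normal(0.0, 1.0, size=1000)   # samples from the prior N(0, 1)
particles = particle_filter_step(particles, z=3.0)
```

For this linear Gaussian example the particle mean should land near the exact Kalman answer (about 1.0 here), which is a useful sanity check when debugging a particle filter.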

Common Confusions

Watch Out

Filtering is not smoothing

Filtering computes $p(x_t \mid z_{1:t})$: the belief at time $t$ given observations up to $t$. Smoothing computes $p(x_t \mid z_{1:T})$ for $t < T$: the belief at $t$ given future observations too. Smoothing uses more information and is more accurate, but requires waiting until time $T$.

Watch Out

The Kalman filter is not an approximation

For linear Gaussian systems, the Kalman filter computes the exact posterior. It is not a heuristic or approximation. The extended Kalman filter (EKF), which linearizes a nonlinear system, is an approximation. This distinction matters: the EKF can diverge, the Kalman filter cannot (given its assumptions).

Exercises

ExerciseCore

Problem

A 1D state evolves as $x_t = x_{t-1} + w_t$ with $w_t \sim \mathcal{N}(0, 1)$, and observations are $z_t = x_t + v_t$ with $v_t \sim \mathcal{N}(0, 4)$. Starting from $x_0 \sim \mathcal{N}(0, 1)$, compute the Kalman filter belief $(\mu_1, \Sigma_1)$ after one observation $z_1 = 3$.

ExerciseAdvanced

Problem

Explain why particle filters suffer from weight degeneracy and how resampling helps. What problem does resampling itself introduce?

References

Canonical:

  • Anderson & Moore, Optimal Filtering (1979), Chapters 2-4
  • Kalman, "A New Approach to Linear Filtering and Prediction Problems," ASME Journal (1960)

Current:

  • Thrun, Burgard, Fox, Probabilistic Robotics (2005), Chapters 2-4
  • Doucet & Johansen, "A Tutorial on Particle Filtering," Handbook of Nonlinear Filtering (2009)
  • Murphy, Machine Learning: A Probabilistic Perspective (2012)

Last reviewed: April 2026
