RL Theory

Bayesian State Estimation

The filtering problem: recursively estimate a hidden state from noisy observations using predict-update cycles. Kalman filter for linear Gaussian systems, particle filters for the general case.

Why This Matters

In robotics, control, and RL, the agent rarely observes the true state of the world. A robot sees noisy camera images, not exact positions. A financial model observes prices, not the hidden economic state. Bayesian state estimation provides the principled framework for maintaining a probability distribution over the hidden state given all observations so far.

Mental Model

At each time step, two things happen. First, the hidden state evolves according to some dynamics model (the predict step). Second, you receive a noisy observation (the update step). The Bayes filter combines these into a recursive formula: your belief at time $t$ is computed from your belief at time $t-1$, the dynamics model, and the new observation.

Formal Setup

Definition

State-Space Model

A discrete-time state-space model consists of:

  • Hidden state $x_t \in \mathcal{X}$ evolving via $x_t \sim p(x_t \mid x_{t-1})$ (dynamics model)
  • Observation $z_t \in \mathcal{Z}$ generated via $z_t \sim p(z_t \mid x_t)$ (observation model)
  • Prior $p(x_0)$ on the initial state

The filtering problem: compute $p(x_t \mid z_{1:t})$ recursively as new observations arrive.
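To make the setup concrete, here is a minimal simulation of a 1D linear Gaussian state-space model (a sketch using NumPy; the noise variances, horizon, and seed are illustrative choices, not part of the definition above):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(T, A=1.0, H=1.0, q=1.0, r=4.0, x0_mean=0.0, x0_var=1.0):
    """Simulate a 1D linear Gaussian state-space model:
        x_t = A x_{t-1} + w_t,  w_t ~ N(0, q)   (dynamics model)
        z_t = H x_t     + v_t,  v_t ~ N(0, r)   (observation model)
    Returns the hidden states and the noisy observations."""
    xs, zs = [], []
    x = rng.normal(x0_mean, np.sqrt(x0_var))      # draw x_0 from the prior
    for _ in range(T):
        x = A * x + rng.normal(0.0, np.sqrt(q))   # hidden state evolves
        z = H * x + rng.normal(0.0, np.sqrt(r))   # we only see z, not x
        xs.append(x)
        zs.append(z)
    return np.array(xs), np.array(zs)

xs, zs = simulate(T=50)
```

A filter only ever sees `zs`; the point of the whole section is recovering a distribution over `xs` from it.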

Definition

Belief

The belief at time $t$ is the posterior distribution over the hidden state given all observations:

$$b_t(x) = p(x_t = x \mid z_1, z_2, \ldots, z_t)$$

The goal of Bayesian state estimation is to maintain $b_t$ as a sufficient statistic for decision-making.

Main Theorems

Theorem

Bayes Filter Recursion

Statement

The posterior $p(x_t \mid z_{1:t})$ satisfies the predict-update recursion:

Predict:

$$p(x_t \mid z_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}$$

Update:

$$p(x_t \mid z_{1:t}) = \frac{p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})}{p(z_t \mid z_{1:t-1})}$$

where $p(z_t \mid z_{1:t-1}) = \int p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})\, dx_t$ is the normalizing constant.

Intuition

The predict step propagates uncertainty forward through the dynamics model: if I was uncertain about $x_{t-1}$, I am even more uncertain about $x_t$ before seeing $z_t$. The update step sharpens the belief using the new observation: states that explain $z_t$ well get higher probability. This is just Bayes' rule applied recursively.

Proof Sketch

The predict step follows from marginalizing $x_{t-1}$ out of the joint $p(x_t, x_{t-1} \mid z_{1:t-1})$ and using the Markov property $p(x_t \mid x_{t-1}, z_{1:t-1}) = p(x_t \mid x_{t-1})$. The update step is a direct application of Bayes' theorem with the conditional independence assumption $p(z_t \mid x_t, z_{1:t-1}) = p(z_t \mid x_t)$.

Why It Matters

This recursion is the foundation of all Bayesian filters. Every specific algorithm (Kalman, extended Kalman, unscented Kalman, particle filter) is an approximation to this exact recursion under different assumptions about the state space and distributions.

Failure Mode

The predict step requires computing an integral that is analytically tractable only in special cases (linear Gaussian). For nonlinear dynamics or non-Gaussian noise, you must approximate. The quality of the filter depends entirely on how well you approximate this integral.
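One case where the integral is trivially tractable is a finite state space, where it becomes a sum and the recursion can be run exactly. A minimal sketch in NumPy (the two-state transition matrix and likelihood values below are made up for illustration):

```python
import numpy as np

def bayes_filter_step(belief, dynamics, likelihood):
    """One exact predict-update cycle of the Bayes filter on a finite
    state space, where the integral reduces to a matrix-vector product.
    belief:     (n,) array, p(x_{t-1} | z_{1:t-1})
    dynamics:   (n, n) array, dynamics[j, i] = p(x_t = j | x_{t-1} = i)
    likelihood: (n,) array, p(z_t | x_t = j) for the observed z_t."""
    predicted = dynamics @ belief        # predict: sum over x_{t-1}
    unnorm = likelihood * predicted      # update: weight by p(z_t | x_t)
    return unnorm / unnorm.sum()         # divide by p(z_t | z_{1:t-1})

# Toy example: two states, sticky dynamics, observation favoring state 1.
belief = np.array([0.5, 0.5])
dynamics = np.array([[0.9, 0.1],
                     [0.1, 0.9]])
likelihood = np.array([0.2, 0.8])
posterior = bayes_filter_step(belief, dynamics, likelihood)  # → [0.2, 0.8]
```

With a symmetric prior the predict step leaves the belief unchanged here, so the posterior is driven entirely by the likelihood, which is exactly the "states that explain $z_t$ well get higher probability" intuition above.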

Theorem

Kalman Filter

Statement

Under linear Gaussian assumptions (dynamics $x_t = Ax_{t-1} + w_t$ with $w_t \sim \mathcal{N}(0, Q)$, observations $z_t = Hx_t + v_t$ with $v_t \sim \mathcal{N}(0, R)$), the Bayes filter recursion has a closed-form solution. The belief $p(x_t \mid z_{1:t})$ is Gaussian $\mathcal{N}(\mu_t, \Sigma_t)$ at every time step, with:

Predict:

$$\mu_{t|t-1} = A\mu_{t-1}, \qquad \Sigma_{t|t-1} = A\Sigma_{t-1}A^\top + Q$$

Update:

$$K_t = \Sigma_{t|t-1} H^\top (H\Sigma_{t|t-1}H^\top + R)^{-1}$$

$$\mu_t = \mu_{t|t-1} + K_t(z_t - H\mu_{t|t-1})$$

$$\Sigma_t = (I - K_tH)\Sigma_{t|t-1}$$

where $K_t$ is the Kalman gain, $Q$ is the process noise covariance, and $R$ is the observation noise covariance.

Intuition

The Kalman gain $K_t$ controls how much you trust the new observation versus your prediction. If observation noise $R$ is large, $K_t$ is small: you mostly trust the prediction. If prediction uncertainty $\Sigma_{t|t-1}$ is large, $K_t$ is large: you mostly trust the observation. The filter automatically balances these two sources of information.

Proof Sketch

Since Gaussians are closed under linear transformations and conditioning, the predict step produces a Gaussian (a linear map of a Gaussian plus independent Gaussian noise), and the update step conditions a joint Gaussian on the observed variable, which is again Gaussian. The Kalman gain formula follows from the standard formula for conditioning a joint Gaussian.

Why It Matters

The Kalman filter is one of the most widely used algorithms in engineering: GPS navigation, spacecraft tracking, econometrics. Its importance comes from being the exact Bayesian solution (not an approximation) for linear Gaussian systems, and from being computationally cheap ($O(d^3)$ per step, where $d$ is the state dimension).

Failure Mode

Linearity and Gaussianity are strong assumptions. Real systems with nonlinear dynamics (robot arm kinematics), multi-modal beliefs (robot unsure which side of a wall it is on), or heavy-tailed noise (financial data) violate these assumptions. The extended Kalman filter (EKF) linearizes locally, but this can diverge for highly nonlinear systems.
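The predict and update equations above translate directly into code. A minimal matrix-form sketch using NumPy (the 1D numbers at the bottom are illustrative, chosen only to exercise the function):

```python
import numpy as np

def kalman_step(mu, Sigma, z, A, Q, H, R):
    """One predict-update step of the Kalman filter, following the
    equations in the theorem statement. All arguments are NumPy arrays."""
    # Predict: push the Gaussian belief through the linear dynamics.
    mu_pred = A @ mu
    Sigma_pred = A @ Sigma @ A.T + Q
    # Update: the Kalman gain balances prediction vs. observation trust.
    S = H @ Sigma_pred @ H.T + R               # innovation covariance
    K = Sigma_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    mu_new = mu_pred + K @ (z - H @ mu_pred)   # correct by the innovation
    Sigma_new = (np.eye(len(mu)) - K @ H) @ Sigma_pred
    return mu_new, Sigma_new

# Illustrative 1D random walk: Q = 1, R = 2, prior N(0, 1), observation 1.2.
mu, Sigma = np.array([0.0]), np.array([[1.0]])
A = H = Q = np.array([[1.0]])
R = np.array([[2.0]])
mu, Sigma = kalman_step(mu, Sigma, np.array([1.2]), A, Q, H, R)
# → mu = [0.6], Sigma = [[1.0]]: the gain is 0.5, splitting the
#   difference between the prediction (0) and the observation (1.2).
```

In production code one would use `np.linalg.solve` instead of an explicit inverse, and the Joseph-form covariance update for numerical stability; this sketch mirrors the textbook equations instead.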

Particle Filters

When the state-space model is nonlinear or non-Gaussian, particle filters approximate the belief with a weighted set of samples (particles).

The algorithm: (1) Sample $x_t^{(i)} \sim p(x_t \mid x_{t-1}^{(i)})$ for each particle (predict). (2) Assign weight $w_t^{(i)} \propto p(z_t \mid x_t^{(i)})$ (update). (3) Resample particles according to weights to avoid weight degeneracy.

Particle filters converge to the true posterior as the number of particles $N \to \infty$, but suffer from the curse of dimensionality: for high-dimensional state spaces, the required number of particles grows exponentially.
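The three steps can be sketched for a simple 1D random-walk model (an assumed example; the noise variances `q`, `r` and the particle count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter_step(particles, z, q=1.0, r=4.0):
    """One step of a bootstrap particle filter for the assumed model
    x_t = x_{t-1} + w_t, w_t ~ N(0, q);  z_t = x_t + v_t, v_t ~ N(0, r)."""
    N = len(particles)
    # (1) Predict: sample each particle through the dynamics model.
    particles = particles + rng.normal(0.0, np.sqrt(q), size=N)
    # (2) Update: weight each particle by the likelihood p(z_t | x_t^(i)).
    weights = np.exp(-0.5 * (z - particles) ** 2 / r)
    weights /= weights.sum()
    # (3) Resample: draw N particles with probability proportional to weight.
    idx = rng.choice(N, size=N, p=weights)
    return particles[idx]

particles = rng.normal(0.0, 1.0, size=1000)   # samples from the prior N(0, 1)
particles = particle_filter_step(particles, z=3.0)
```

For this linear Gaussian example the particle mean should land near the exact Kalman answer (about 1.0 here), which is a useful sanity check when debugging a particle filter.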

Common Confusions

Watch Out

Filtering is not smoothing

Filtering computes $p(x_t \mid z_{1:t})$: the belief at time $t$ given observations up to $t$. Smoothing computes $p(x_t \mid z_{1:T})$ for $t < T$: the belief at $t$ given future observations too. Smoothing uses more information and is more accurate, but requires waiting until time $T$.

Watch Out

The Kalman filter is not an approximation

For linear Gaussian systems, the Kalman filter computes the exact posterior. It is not a heuristic or approximation. The extended Kalman filter (EKF), which linearizes a nonlinear system, is an approximation. This distinction matters: the EKF can diverge, the Kalman filter cannot (given its assumptions).

Exercises

ExerciseCore

Problem

A 1D state evolves as $x_t = x_{t-1} + w_t$ with $w_t \sim \mathcal{N}(0, 1)$, and observations are $z_t = x_t + v_t$ with $v_t \sim \mathcal{N}(0, 4)$. Starting from $x_0 \sim \mathcal{N}(0, 1)$, compute the Kalman filter belief $(\mu_1, \Sigma_1)$ after one observation $z_1 = 3$.

ExerciseAdvanced

Problem

Explain why particle filters suffer from weight degeneracy and how resampling helps. What problem does resampling itself introduce?

References

Canonical:

  • Anderson & Moore, Optimal Filtering (1979), Chapters 2-4
  • Kalman, "A New Approach to Linear Filtering and Prediction Problems," ASME Journal (1960)

Current:

  • Thrun, Burgard, Fox, Probabilistic Robotics (2005), Chapters 2-4
  • Doucet & Johansen, "A Tutorial on Particle Filtering," Handbook of Nonlinear Filtering (2009)
  • Murphy, Machine Learning: A Probabilistic Perspective (2012)

Last reviewed: April 2026
