
RL Theory

Particle Filters

Sequential Monte Carlo: represent the posterior over hidden states as a set of weighted particles, propagate through dynamics, reweight by likelihood, and resample to combat degeneracy.

Advanced · Tier 3 · Stable · ~50 min

Why This Matters

Many real-world problems involve tracking a hidden state that evolves over time, given noisy observations. A robot estimating its position from sensor readings. A financial model tracking volatility from asset prices. A radar system tracking aircraft from noisy returns.

The Kalman filter solves this optimally when the dynamics and observations are linear and Gaussian. When they are not (and they usually are not), you need particle filters. Particle filters approximate the posterior distribution over hidden states using a set of weighted samples (particles), applying importance sampling sequentially in time. They handle arbitrary nonlinear, non-Gaussian state-space models with no structural assumptions.

Mental Model

Imagine you are tracking a robot in a building. You have a map, a model of how the robot moves, and noisy sensor readings (e.g., distance to walls). You represent your belief about the robot's location as a cloud of dots (particles), each representing a hypothesis about where the robot might be.

At each time step: (1) move each particle according to the dynamics model (predict), (2) assign each particle a weight based on how well it explains the new sensor reading (update), (3) duplicate high-weight particles and discard low-weight ones (resample). Over time, the particle cloud concentrates on the true location.

Formal Setup and Notation

Definition

State-Space Model

A state-space model (also called a hidden Markov model in discrete settings) consists of:

  • State transition: $x_t \sim f(x_t \mid x_{t-1})$, where $x_t \in \mathbb{R}^d$ is the hidden state
  • Observation model: $y_t \sim g(y_t \mid x_t)$, where $y_t$ is the observation
  • Prior: $x_0 \sim p(x_0)$

The goal is to compute the filtering distribution $p(x_t \mid y_{1:t})$: the posterior over the current state given all observations so far.

Definition

Particle Filter (Bootstrap Filter)

The bootstrap particle filter approximates the filtering distribution using $N$ weighted particles $\{(x_t^{(i)}, w_t^{(i)})\}_{i=1}^N$:

  1. Initialize: Sample $x_0^{(i)} \sim p(x_0)$ for $i = 1, \ldots, N$. Set $w_0^{(i)} = 1/N$.

  2. Predict: Propagate each particle through the dynamics: $\tilde{x}_t^{(i)} \sim f(x_t \mid x_{t-1}^{(i)})$

  3. Update: Reweight by the likelihood of the observation: $\tilde{w}_t^{(i)} = w_{t-1}^{(i)} \cdot g(y_t \mid \tilde{x}_t^{(i)})$

  4. Normalize: $w_t^{(i)} = \tilde{w}_t^{(i)} / \sum_j \tilde{w}_t^{(j)}$

  5. Resample: Draw $N$ particles with replacement from $\{\tilde{x}_t^{(i)}\}$ with probabilities $\{w_t^{(i)}\}$. Reset all weights to $1/N$.

The filtering distribution is approximated by:

$$\hat{p}(x_t \mid y_{1:t}) = \sum_{i=1}^{N} w_t^{(i)} \, \delta_{x_t^{(i)}}(x_t)$$
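The five steps above can be sketched in a few dozen lines of NumPy. The sketch below runs the filter on a toy linear-Gaussian AR(1) model, chosen only so the result is easy to sanity-check, not because the particle filter needs linearity; all function and variable names are illustrative, not from any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_filter(y, N, propagate, log_lik, sample_prior):
    """Bootstrap particle filter; returns the posterior-mean estimate at each step."""
    x = sample_prior(N)                      # 1. initialize from the prior
    means = []
    for yt in y:
        x = propagate(x)                     # 2. predict: push particles through dynamics
        logw = log_lik(yt, x)                # 3. update: log-likelihood of the observation
        w = np.exp(logw - logw.max())        #    (log-space for numerical stability)
        w /= w.sum()                         # 4. normalize
        means.append(np.sum(w * x))
        x = x[rng.choice(N, size=N, p=w)]    # 5. resample (multinomial); weights reset to 1/N
    return np.array(means)

# Toy model: x_t = 0.9 x_{t-1} + N(0, 1),  y_t = x_t + N(0, 0.5^2)
propagate = lambda x: 0.9 * x + rng.normal(size=x.shape)
log_lik = lambda yt, x: -0.5 * ((yt - x) / 0.5) ** 2
sample_prior = lambda N: rng.normal(size=N)

# Simulate a trajectory and run the filter
T = 50
x_true = np.empty(T)
x_true[0] = rng.normal()
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t - 1] + rng.normal()
y = x_true + 0.5 * rng.normal(size=T)

est = bootstrap_filter(y, N=2000, propagate=propagate,
                       log_lik=log_lik, sample_prior=sample_prior)
print(np.mean((est - x_true) ** 2))  # tracking MSE, well below the prior variance
```

Because the toy model is linear-Gaussian, the same problem could be solved exactly with a Kalman filter; swapping in a nonlinear `propagate` or `log_lik` requires no change to `bootstrap_filter` itself.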

Main Theorems

Theorem

Particle Filter Consistency

Statement

For any bounded test function $\varphi$, the particle approximation converges to the true filtering expectation:

$$\left|\sum_{i=1}^{N} w_t^{(i)}\,\varphi(x_t^{(i)}) - \mathbb{E}[\varphi(X_t) \mid y_{1:t}]\right| \xrightarrow{a.s.} 0 \quad \text{as } N \to \infty$$

Moreover, for fixed $t$, the $L^2$ error satisfies:

$$\mathbb{E}\left[\left(\hat{I}_N - I\right)^2\right] = O(1/N)$$

where $\hat{I}_N = \sum_i w_t^{(i)} \varphi(x_t^{(i)})$ and $I = \mathbb{E}[\varphi(X_t) \mid y_{1:t}]$.

Intuition

The particle filter is applying importance sampling at each time step, with the transition prior as the proposal. The law of large numbers guarantees convergence as the number of particles grows. The resampling step prevents weight degeneracy from accumulating over time, keeping the effective sample size reasonable.

Why It Matters

This result justifies the use of particle filters as a practical algorithm: with enough particles, you can approximate the true posterior to any desired accuracy. The $O(1/N)$ mean-squared-error rate matches standard Monte Carlo, confirming that the sequential structure does not slow convergence (for fixed $t$).

Failure Mode

The constant in the $O(1/N)$ bound can grow exponentially in $t$ (and in the state dimension $d$) if the model is poorly suited to the bootstrap proposal. In practice, this means particle filters can degrade over long time horizons or in high-dimensional state spaces. The number of particles needed for a given accuracy can become prohibitively large.

The Degeneracy Problem

Proposition

Weight Degeneracy Without Resampling

Statement

Without resampling, the effective sample size (ESS) of the importance weights decreases over time. Specifically, the variance of the unnormalized weights grows exponentially:

$$\text{Var}\left[\frac{p(x_{0:t} \mid y_{1:t})}{q(x_{0:t} \mid y_{1:t})}\right] \geq C \cdot \exp(\alpha t)$$

for some constants $C, \alpha > 0$ depending on the model. This causes the ESS to collapse to 1, with one particle carrying nearly all the weight.

Intuition

Without resampling, each particle carries the accumulated weight from all past time steps. Small differences in weight compound multiplicatively over time, leading to a "rich get richer" dynamic where a single particle dominates. This is the curse of dimensionality applied to the path space $x_{0:t}$, which grows in dimension with each time step.

Why It Matters

This explains why resampling is essential, not optional. Without it, the particle filter degenerates into a single-particle approximation within a few time steps. Resampling "resets" the weights by duplicating good particles and discarding bad ones, preventing the exponential weight divergence.
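The collapse is easy to reproduce numerically. The sketch below runs sequential importance sampling without resampling on the same kind of toy linear-Gaussian model and tracks the ESS, $1/\sum_i (w^{(i)})^2$. The exact numbers depend on the random seed; the downward trend does not.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sequential importance sampling WITHOUT resampling: weights accumulate
# multiplicatively, and the effective sample size collapses over time.
N, T = 1000, 30
x_true = 0.0
x = rng.normal(size=N)           # particles
logw = np.zeros(N)               # accumulated log-weights
ess = []
for t in range(T):
    x_true = 0.9 * x_true + rng.normal()       # simulate the hidden state
    y_t = x_true + 0.5 * rng.normal()          # and a noisy observation
    x = 0.9 * x + rng.normal(size=N)           # predict
    logw += -0.5 * ((y_t - x) / 0.5) ** 2      # update: weights only ever multiply
    w = np.exp(logw - logw.max())
    w /= w.sum()
    ess.append(1.0 / np.sum(w ** 2))           # ESS = 1 / sum(w_i^2)

print(ess[0], ess[-1])  # ESS falls by orders of magnitude over the run
```

Adding a resample step whenever the ESS drops (as in the bootstrap filter) keeps the trace near $N$ instead of letting it decay toward 1.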

Resampling Strategies

Multinomial resampling: Draw $N$ samples with replacement from the current weighted particles. Simple but has high variance.

Systematic resampling: Use a single uniform random number plus equally spaced points on $[0, 1]$ to select particles. Lower variance than multinomial, widely used in practice.

Stratified resampling: Divide $[0, 1]$ into $N$ strata and draw one sample from each. Provably lower variance than multinomial.

Residual resampling: Deterministically allocate $\lfloor N w^{(i)} \rfloor$ copies of particle $i$, then randomly allocate the remaining particles. Combines deterministic and stochastic allocation.

When to resample: Resampling every step adds unnecessary noise when weights are already balanced. A common adaptive scheme: compute the effective sample size $\text{ESS} = 1 / \sum_i (w^{(i)})^2$ and resample only when $\text{ESS} < N/2$.
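For concreteness, here are minimal NumPy versions of two of the schemes plus the adaptive ESS check. The function names are our own; many SMC libraries ship equivalents.

```python
import numpy as np

def ess(w):
    """Effective sample size of normalized weights: 1 / sum(w_i^2)."""
    return 1.0 / np.sum(w ** 2)

def systematic_resample(w, rng):
    """Systematic: one uniform draw, then N equally spaced points on [0, 1)."""
    N = len(w)
    positions = (rng.uniform() + np.arange(N)) / N
    return np.searchsorted(np.cumsum(w), positions)

def stratified_resample(w, rng):
    """Stratified: one independent uniform draw per stratum [i/N, (i+1)/N)."""
    N = len(w)
    positions = (rng.uniform(size=N) + np.arange(N)) / N
    return np.searchsorted(np.cumsum(w), positions)

rng = np.random.default_rng(0)
w = np.array([0.7, 0.1, 0.1, 0.05, 0.05])
print(ess(w))                     # ~1.94, far below N = 5
if ess(w) < len(w) / 2:           # adaptive rule: resample only when ESS < N/2
    idx = systematic_resample(w, rng)
    print(idx)                    # particle 0 (weight 0.7) appears several times
```

Both samplers return indices into the particle array, so the resample step is just `particles = particles[idx]` followed by resetting the weights to $1/N$.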

Canonical Examples

Example

Robot localization

A robot moves in a 2D plane. State: $(x, y, \theta)$ (position and heading). Dynamics: noisy motion model based on wheel odometry. Observations: laser range measurements to walls (highly nonlinear). The Kalman filter cannot handle this because the observation model is nonlinear and multimodal (the robot might be in multiple possible locations before loop closure).

A particle filter with 1000 particles can track the robot's position. Each particle represents a hypothesis about the robot's pose. After a few observations, particles cluster around the true pose. After loop closure (recognizing a previously visited location), the particle cloud snaps to a precise estimate.
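A full localization stack is beyond one snippet, but the core weight update is simple. The sketch below uses a hypothetical room, beacon position, and noise level of our own choosing: particles are drawn uniformly over the room, and a single range measurement concentrates the weight on a circle around the beacon, exactly the kind of multimodal belief a Kalman filter cannot represent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a 10 m x 10 m room, one beacon at a known position,
# and a range-only sensor with Gaussian noise (sigma = 0.2 m).
N = 1000
particles = np.column_stack([
    rng.uniform(0, 10, N),           # x
    rng.uniform(0, 10, N),           # y
    rng.uniform(-np.pi, np.pi, N),   # heading (unused by a range-only sensor)
])
beacon = np.array([2.0, 8.0])
true_pose = np.array([6.0, 3.0])
sigma = 0.2

z = np.linalg.norm(true_pose - beacon) + sigma * rng.normal()   # measured range
pred = np.linalg.norm(particles[:, :2] - beacon, axis=1)        # per-particle predicted range
w = np.exp(-0.5 * ((z - pred) / sigma) ** 2)                    # likelihood weights
w /= w.sum()

# High-weight particles sit on the circle of radius z around the beacon:
top = particles[np.argsort(w)[-50:], :2]
print(np.abs(np.linalg.norm(top - beacon, axis=1) - z).max())   # all close to the circle
```

Additional measurements (a second beacon, or motion between readings) break the circular symmetry and collapse the belief toward the true pose, which is the loop-closure effect described above.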

Common Confusions

Watch Out

Particle filters are not MCMC

Both use random samples to approximate distributions, but the mechanisms differ. MCMC (e.g., Metropolis-Hastings) generates a chain of samples that converge to the stationary distribution. Particle filters generate a population of weighted samples that approximate the filtering distribution at each time step. MCMC is for static distributions; particle filters are for sequential (time-varying) distributions.

Watch Out

More particles do not always help

In high-dimensional state spaces, the number of particles needed for accurate approximation grows exponentially with the dimension. Throwing more particles at a 100-dimensional state estimation problem does not help unless you also improve the proposal distribution. This is the curse of dimensionality for particle methods.

Watch Out

Resampling introduces its own problems

Resampling combats weight degeneracy but introduces sample impoverishment: after resampling, many particles are duplicates. In the extreme, all particles could be copies of a single particle, losing diversity entirely. The tradeoff between degeneracy (not resampling enough) and impoverishment (resampling too often) is managed by adaptive resampling schemes.

Summary

  • Particle filters approximate the filtering distribution $p(x_t \mid y_{1:t})$ with weighted samples
  • Three steps per time step: predict (propagate), update (reweight), resample (duplicate/discard)
  • Consistent: approximation error is $O(1/\sqrt{N})$ for $N$ particles
  • Without resampling, weights degenerate exponentially over time
  • With resampling, diversity is maintained but sample impoverishment is a risk
  • Adaptive resampling (resample when ESS drops below threshold) balances both
  • Works for arbitrary nonlinear, non-Gaussian state-space models

Exercises

ExerciseCore

Problem

You run a particle filter with $N = 500$ particles. After the update step, the normalized weights are $w^{(1)} = 0.4$, $w^{(2)} = 0.3$, and the remaining 498 particles have weights summing to 0.3 (roughly 0.0006 each). What is the approximate ESS? Should you resample?

ExerciseAdvanced

Problem

Explain why the bootstrap particle filter uses the transition prior $f(x_t \mid x_{t-1})$ as the proposal distribution. What is the advantage of an "optimal" proposal $p(x_t \mid x_{t-1}, y_t)$ that also conditions on the current observation? Why is it rarely used in practice?

ExerciseResearch

Problem

Particle filters suffer from the curse of dimensionality: the number of particles needed grows exponentially with the state dimension. Describe two approaches that partially mitigate this problem and explain the structural assumptions each exploits.

References

Canonical:

  • Gordon, Salmond, Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation" (1993). The original bootstrap particle filter
  • Doucet, de Freitas, Gordon, Sequential Monte Carlo Methods in Practice (2001)

Current:

  • Chopin & Papaspiliopoulos, An Introduction to Sequential Monte Carlo (2020)
  • Naesseth et al., "Elements of Sequential Monte Carlo" (2019)

Next Topics

The natural next steps from particle filters:

  • Kalman filter: the linear-Gaussian special case where particles are not needed
  • Variational inference: an alternative approximation framework for intractable posteriors

Last reviewed: April 2026
