
RL Theory

Particle Filters

Sequential Monte Carlo: represent the posterior over hidden states as a set of weighted particles, propagate through dynamics, reweight by likelihood, and resample to combat degeneracy.

Advanced · Tier 3 · Stable · ~50 min

Why This Matters

Many real-world problems involve tracking a hidden state that evolves over time, given noisy observations. A robot estimating its position from sensor readings. A financial model tracking volatility from asset prices. A radar system tracking aircraft from noisy returns.

The Kalman filter solves this optimally when the dynamics and observations are linear and Gaussian. When they are not (and they usually are not), you need particle filters. Particle filters approximate the posterior distribution over hidden states using a set of weighted samples (particles), applying importance sampling sequentially in time. They handle arbitrary nonlinear, non-Gaussian state-space models with no structural assumptions.

Mental Model

Imagine you are tracking a robot in a building. You have a map, a model of how the robot moves, and noisy sensor readings (e.g., distance to walls). You represent your belief about the robot's location as a cloud of dots (particles), each representing a hypothesis about where the robot might be.

At each time step: (1) move each particle according to the dynamics model (predict), (2) assign each particle a weight based on how well it explains the new sensor reading (update), (3) duplicate high-weight particles and discard low-weight ones (resample). Over time, the particle cloud concentrates on the true location.

Formal Setup and Notation

Definition

State-Space Model

A state-space model (also called a hidden Markov model in discrete settings) consists of:

  • State transition: $x_t \sim f(x_t \mid x_{t-1})$, where $x_t \in \mathbb{R}^d$ is the hidden state
  • Observation model: $y_t \sim g(y_t \mid x_t)$, where $y_t$ is the observation
  • Prior: $x_0 \sim p(x_0)$

The goal is to compute the filtering distribution $p(x_t \mid y_{1:t})$: the posterior over the current state given all observations so far.

Definition

Particle Filter (Bootstrap Filter)

The bootstrap particle filter approximates the filtering distribution using $N$ weighted particles $\{(x_t^{(i)}, w_t^{(i)})\}_{i=1}^N$:

  1. Initialize: Sample $x_0^{(i)} \sim p(x_0)$ for $i = 1, \ldots, N$. Set $w_0^{(i)} = 1/N$.

  2. Predict: Propagate each particle through the dynamics: $\tilde{x}_t^{(i)} \sim f(x_t \mid x_{t-1}^{(i)})$

  3. Update: Reweight by the likelihood of the observation: $\tilde{w}_t^{(i)} = w_{t-1}^{(i)} \cdot g(y_t \mid \tilde{x}_t^{(i)})$

  4. Normalize: $w_t^{(i)} = \tilde{w}_t^{(i)} / \sum_j \tilde{w}_t^{(j)}$

  5. Resample: Draw $N$ particles with replacement from $\{\tilde{x}_t^{(i)}\}$ with probabilities $\{w_t^{(i)}\}$. Reset all weights to $1/N$.

The filtering distribution is approximated by:

$$\hat{p}(x_t \mid y_{1:t}) = \sum_{i=1}^{N} w_t^{(i)} \, \delta_{x_t^{(i)}}(x_t)$$
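The five steps above can be sketched in a few dozen lines of NumPy. The sketch below runs the filter on a toy linear-Gaussian AR(1) model, chosen only so the result is easy to sanity-check, not because the particle filter needs linearity; all function and variable names are illustrative, not from any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_filter(y, N, propagate, log_lik, sample_prior):
    """Bootstrap particle filter; returns the posterior-mean estimate at each step."""
    x = sample_prior(N)                      # 1. initialize from the prior
    means = []
    for yt in y:
        x = propagate(x)                     # 2. predict: push particles through dynamics
        logw = log_lik(yt, x)                # 3. update: log-likelihood of the observation
        w = np.exp(logw - logw.max())        #    (log-space for numerical stability)
        w /= w.sum()                         # 4. normalize
        means.append(np.sum(w * x))
        x = x[rng.choice(N, size=N, p=w)]    # 5. resample (multinomial); weights reset to 1/N
    return np.array(means)

# Toy model: x_t = 0.9 x_{t-1} + N(0, 1),  y_t = x_t + N(0, 0.5^2)
propagate = lambda x: 0.9 * x + rng.normal(size=x.shape)
log_lik = lambda yt, x: -0.5 * ((yt - x) / 0.5) ** 2
sample_prior = lambda N: rng.normal(size=N)

# Simulate a trajectory and run the filter
T = 50
x_true = np.empty(T)
x_true[0] = rng.normal()
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t - 1] + rng.normal()
y = x_true + 0.5 * rng.normal(size=T)

est = bootstrap_filter(y, N=2000, propagate=propagate,
                       log_lik=log_lik, sample_prior=sample_prior)
print(np.mean((est - x_true) ** 2))  # tracking MSE, well below the prior variance
```

Because the toy model is linear-Gaussian, the same problem could be solved exactly with a Kalman filter; swapping in a nonlinear `propagate` or `log_lik` requires no change to `bootstrap_filter` itself.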

Main Theorems

Theorem

Particle Filter Consistency

Statement

For any bounded test function $\varphi$, the particle approximation converges to the true filtering expectation:

$$\left|\sum_{i=1}^{N} w_t^{(i)}\,\varphi(x_t^{(i)}) - \mathbb{E}[\varphi(X_t) \mid y_{1:t}]\right| \xrightarrow{a.s.} 0 \quad \text{as } N \to \infty$$

Moreover, for fixed $t$, the $L^2$ error satisfies:

$$\mathbb{E}\left[\left(\hat{I}_N - I\right)^2\right] = O(1/N)$$

where $\hat{I}_N = \sum_i w_t^{(i)} \varphi(x_t^{(i)})$ and $I = \mathbb{E}[\varphi(X_t) \mid y_{1:t}]$.

Intuition

The particle filter is applying importance sampling at each time step, with the transition prior as the proposal. The law of large numbers guarantees convergence as the number of particles grows. The resampling step prevents weight degeneracy from accumulating over time, keeping the effective sample size reasonable.

Why It Matters

This result justifies the use of particle filters as a practical algorithm: with enough particles, you can approximate the true posterior to any desired accuracy. The $O(1/N)$ mean-squared-error rate matches standard Monte Carlo, confirming that the sequential structure does not slow convergence (for fixed $t$).

Failure Mode

The constant in the $O(1/N)$ bound can grow exponentially in $t$ (and in the state dimension $d$) if the model is poorly suited to the bootstrap proposal. In practice, this means particle filters can degrade over long time horizons or in high-dimensional state spaces. The number of particles needed for a given accuracy can become prohibitively large.

The Degeneracy Problem

Proposition

Weight Degeneracy Without Resampling

Statement

Without resampling, the effective sample size (ESS) of the importance weights decreases over time. Specifically, the variance of the unnormalized weights grows exponentially:

$$\text{Var}\left[\frac{p(x_{0:t} \mid y_{1:t})}{q(x_{0:t} \mid y_{1:t})}\right] \geq C \cdot \exp(\alpha t)$$

for some constants $C, \alpha > 0$ depending on the model. This causes the ESS to collapse to 1, with one particle carrying nearly all the weight.

Intuition

Without resampling, each particle carries the accumulated weight from all past time steps. Small differences in weight compound multiplicatively over time, leading to a "rich get richer" dynamic where a single particle dominates. This is the curse of dimensionality applied to the path space $x_{0:t}$, which grows in dimension with each time step.

Why It Matters

This explains why resampling is essential, not optional. Without it, the particle filter degenerates into a single-particle approximation within a few time steps. Resampling "resets" the weights by duplicating good particles and discarding bad ones, preventing the exponential weight divergence.
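The collapse is easy to reproduce numerically. The sketch below runs sequential importance sampling without resampling on the same kind of toy linear-Gaussian model and tracks the ESS, $1/\sum_i (w^{(i)})^2$. The exact numbers depend on the random seed; the downward trend does not.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sequential importance sampling WITHOUT resampling: weights accumulate
# multiplicatively, and the effective sample size collapses over time.
N, T = 1000, 30
x_true = 0.0
x = rng.normal(size=N)           # particles
logw = np.zeros(N)               # accumulated log-weights
ess = []
for t in range(T):
    x_true = 0.9 * x_true + rng.normal()       # simulate the hidden state
    y_t = x_true + 0.5 * rng.normal()          # and a noisy observation
    x = 0.9 * x + rng.normal(size=N)           # predict
    logw += -0.5 * ((y_t - x) / 0.5) ** 2      # update: weights only ever multiply
    w = np.exp(logw - logw.max())
    w /= w.sum()
    ess.append(1.0 / np.sum(w ** 2))           # ESS = 1 / sum(w_i^2)

print(ess[0], ess[-1])  # ESS falls by orders of magnitude over the run
```

Adding a resample step whenever the ESS drops (as in the bootstrap filter) keeps the trace near $N$ instead of letting it decay toward 1.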

Resampling Strategies

Multinomial resampling: Draw $N$ samples with replacement from the current weighted particles. Simple but has high variance.

Systematic resampling: Use a single uniform random number plus equally spaced points on $[0, 1]$ to select particles. Lower variance than multinomial, widely used in practice.

Stratified resampling: Divide $[0, 1]$ into $N$ strata and draw one sample from each. Provably lower variance than multinomial.

Residual resampling: Deterministically allocate $\lfloor N w^{(i)} \rfloor$ copies of particle $i$, then randomly allocate the remaining particles. Combines deterministic and stochastic allocation.

When to resample: Resampling every step adds unnecessary noise when weights are already balanced. A common adaptive scheme: compute the effective sample size $\text{ESS} = 1 / \sum_i (w^{(i)})^2$ and resample only when $\text{ESS} < N/2$.
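For concreteness, here are minimal NumPy versions of two of the schemes plus the adaptive ESS check. The function names are our own; many SMC libraries ship equivalents.

```python
import numpy as np

def ess(w):
    """Effective sample size of normalized weights: 1 / sum(w_i^2)."""
    return 1.0 / np.sum(w ** 2)

def systematic_resample(w, rng):
    """Systematic: one uniform draw, then N equally spaced points on [0, 1)."""
    N = len(w)
    positions = (rng.uniform() + np.arange(N)) / N
    return np.searchsorted(np.cumsum(w), positions)

def stratified_resample(w, rng):
    """Stratified: one independent uniform draw per stratum [i/N, (i+1)/N)."""
    N = len(w)
    positions = (rng.uniform(size=N) + np.arange(N)) / N
    return np.searchsorted(np.cumsum(w), positions)

rng = np.random.default_rng(0)
w = np.array([0.7, 0.1, 0.1, 0.05, 0.05])
print(ess(w))                     # ~1.94, far below N = 5
if ess(w) < len(w) / 2:           # adaptive rule: resample only when ESS < N/2
    idx = systematic_resample(w, rng)
    print(idx)                    # particle 0 (weight 0.7) appears several times
```

Both samplers return indices into the particle array, so the resample step is just `particles = particles[idx]` followed by resetting the weights to $1/N$.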

Canonical Examples

Example

Robot localization

A robot moves in a 2D plane. State: $(x, y, \theta)$ (position and heading). Dynamics: noisy motion model based on wheel odometry. Observations: laser range measurements to walls (highly nonlinear). The Kalman filter cannot handle this because the observation model is nonlinear and multimodal (the robot might be in multiple possible locations before loop closure).

A particle filter with 1000 particles can track the robot's position. Each particle represents a hypothesis about the robot's pose. After a few observations, particles cluster around the true pose. After loop closure (recognizing a previously visited location), the particle cloud snaps to a precise estimate.
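A full localization stack is beyond one snippet, but the core weight update is simple. The sketch below uses a hypothetical room, beacon position, and noise level of our own choosing: particles are drawn uniformly over the room, and a single range measurement concentrates the weight on a circle around the beacon, exactly the kind of multimodal belief a Kalman filter cannot represent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a 10 m x 10 m room, one beacon at a known position,
# and a range-only sensor with Gaussian noise (sigma = 0.2 m).
N = 1000
particles = np.column_stack([
    rng.uniform(0, 10, N),           # x
    rng.uniform(0, 10, N),           # y
    rng.uniform(-np.pi, np.pi, N),   # heading (unused by a range-only sensor)
])
beacon = np.array([2.0, 8.0])
true_pose = np.array([6.0, 3.0])
sigma = 0.2

z = np.linalg.norm(true_pose - beacon) + sigma * rng.normal()   # measured range
pred = np.linalg.norm(particles[:, :2] - beacon, axis=1)        # per-particle predicted range
w = np.exp(-0.5 * ((z - pred) / sigma) ** 2)                    # likelihood weights
w /= w.sum()

# High-weight particles sit on the circle of radius z around the beacon:
top = particles[np.argsort(w)[-50:], :2]
print(np.abs(np.linalg.norm(top - beacon, axis=1) - z).max())   # all close to the circle
```

Additional measurements (a second beacon, or motion between readings) break the circular symmetry and collapse the belief toward the true pose, which is the loop-closure effect described above.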

Common Confusions

Watch Out

Particle filters are not MCMC

Both use random samples to approximate distributions, but the mechanisms differ. MCMC (e.g., Metropolis-Hastings) generates a chain of samples that converge to the stationary distribution. Particle filters generate a population of weighted samples that approximate the filtering distribution at each time step. MCMC is for static distributions; particle filters are for sequential (time-varying) distributions.

Watch Out

More particles do not always help

In high-dimensional state spaces, the number of particles needed for accurate approximation grows exponentially with the dimension. Throwing more particles at a 100-dimensional state estimation problem does not help unless you also improve the proposal distribution. This is the curse of dimensionality for particle methods.

Watch Out

Resampling introduces its own problems

Resampling combats weight degeneracy but introduces sample impoverishment: after resampling, many particles are duplicates. In the extreme, all particles could be copies of a single particle, losing diversity entirely. The tradeoff between degeneracy (not resampling enough) and impoverishment (resampling too often) is managed by adaptive resampling schemes.

Summary

  • Particle filters approximate the filtering distribution $p(x_t \mid y_{1:t})$ with weighted samples
  • Three steps per time step: predict (propagate), update (reweight), resample (duplicate/discard)
  • Consistent: approximation error is $O(1/\sqrt{N})$ for $N$ particles
  • Without resampling, weights degenerate exponentially over time
  • With resampling, diversity is maintained but sample impoverishment is a risk
  • Adaptive resampling (resample when ESS drops below threshold) balances both
  • Works for arbitrary nonlinear, non-Gaussian state-space models

Exercises

ExerciseCore

Problem

You run a particle filter with $N = 500$ particles. After the update step, the normalized weights are $w^{(1)} = 0.4$, $w^{(2)} = 0.3$, and the remaining 498 particles have weights summing to 0.3 (roughly 0.0006 each). What is the approximate ESS? Should you resample?

ExerciseAdvanced

Problem

Explain why the bootstrap particle filter uses the transition prior $f(x_t \mid x_{t-1})$ as the proposal distribution. What is the advantage of an "optimal" proposal $p(x_t \mid x_{t-1}, y_t)$ that also conditions on the current observation? Why is it rarely used in practice?

ExerciseResearch

Problem

Particle filters suffer from the curse of dimensionality: the number of particles needed grows exponentially with the state dimension. Describe two approaches that partially mitigate this problem and explain the structural assumptions each exploits.

References

Canonical:

  • Gordon, Salmond, Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation" (1993). The original bootstrap particle filter
  • Doucet, de Freitas, Gordon, Sequential Monte Carlo Methods in Practice (2001)

Current:

  • Chopin & Papaspiliopoulos, An Introduction to Sequential Monte Carlo (2020)
  • Naesseth et al., "Elements of Sequential Monte Carlo" (2019)

Next Topics

The natural next steps from particle filters:

  • Kalman filter: the linear-Gaussian special case where particles are not needed
  • Variational inference: an alternative approximation framework for intractable posteriors

Last reviewed: April 2026
