

GraphSLAM and Factor Graphs

SLAM as graph optimization: poses as nodes, constraints as edges, factor graph representation, MAP estimation via nonlinear least squares, and the sparsity structure that makes large-scale mapping tractable.

Advanced · Tier 2 · Stable · ~55 min

Why This Matters

Simultaneous Localization and Mapping (SLAM) is the problem of building a map of an unknown environment while simultaneously tracking the agent's position within it. It is one of the most important problems in robotics: every autonomous vehicle, delivery robot, and AR headset solves some form of SLAM.

GraphSLAM reformulates SLAM as a graph optimization problem. Instead of maintaining a probability distribution over all poses (which is what filter-based SLAM does), GraphSLAM collects all constraints (odometry, landmark observations, loop closures) and solves for all poses simultaneously via nonlinear least squares. This batch approach produces more accurate results than filtering and naturally exploits sparsity in the constraint structure.

Mental Model

Imagine you walk through a building while recording your position at each step and noting observations of landmarks. Each step gives you a noisy estimate of the displacement (odometry). Each landmark observation gives you a noisy constraint between your current position and the landmark.

Build a graph: nodes are your poses at each time step plus landmark positions. Edges are the constraints from odometry and observations. Now find the configuration of all nodes that best satisfies all constraints simultaneously. This is a least-squares optimization problem, and its structure makes it efficiently solvable.

Formal Setup and Notation

Definition

SLAM as a Graphical Model

Let $x_{0:T} = (x_0, x_1, \ldots, x_T)$ be the robot's poses and $m = (m_1, \ldots, m_M)$ be the landmark positions. The SLAM problem seeks the maximum a posteriori (MAP) estimate:

$$x_{0:T}^*, m^* = \arg\max_{x_{0:T}, m} \, p(x_{0:T}, m \mid z_{1:T}, u_{1:T})$$

where $z_{1:T}$ are observations and $u_{1:T}$ are odometry inputs.

Definition

Factor Graph

A factor graph is a bipartite graph with two types of nodes:

  • Variable nodes: represent unknown quantities ($x_0, x_1, \ldots, x_T, m_1, \ldots, m_M$)
  • Factor nodes: represent probabilistic constraints between variables

Each factor $f_k$ connects to the variables it depends on. The joint distribution factorizes as:

$$p(x_{0:T}, m \mid z, u) \propto \prod_{k} f_k(\mathcal{X}_k)$$

where $\mathcal{X}_k$ is the set of variables connected to factor $k$.
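To make the bipartite structure concrete, here is a minimal illustrative container for variables and factors. The class and method names are invented for this sketch, not taken from any SLAM library:

```python
# Minimal bipartite factor-graph container (illustrative, not a real library).
# Variables are keyed by strings like "x0" or "m1"; each factor stores the
# names of the variables it connects plus an error function, mirroring f_k(X_k).

class FactorGraph:
    def __init__(self):
        self.variables = {}   # name -> current estimate
        self.factors = []     # list of (connected variable names, error function)

    def add_variable(self, name, initial):
        self.variables[name] = initial

    def add_factor(self, names, error_fn):
        # error_fn maps the values of the connected variables to a residual
        self.factors.append((tuple(names), error_fn))

    def neighbors(self, name):
        # variables sharing at least one factor with `name` -- the graph adjacency
        out = set()
        for names, _ in self.factors:
            if name in names:
                out.update(n for n in names if n != name)
        return out

g = FactorGraph()
g.add_variable("x0", 0.0)
g.add_variable("x1", 0.0)
g.add_variable("m1", 0.0)
g.add_factor(["x0", "x1"], lambda x0, x1: x1 - x0 - 1.0)  # odometry: moved +1
g.add_factor(["x1", "m1"], lambda x1, m1: m1 - x1 - 2.0)  # landmark seen at +2
```

The `neighbors` method makes the sparsity explicit: `x0` and `m1` are not adjacent because no factor connects them directly.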

Definition

SLAM Factors

The standard SLAM factor graph contains three types of factors:

  1. Prior factor: $f_0(x_0) = \mathcal{N}(x_0; \mu_0, \Sigma_0)$ anchoring the initial pose
  2. Odometry factors: $f_t^{\text{odom}}(x_{t-1}, x_t) = \mathcal{N}(x_t - g(x_{t-1}, u_t); 0, Q_t)$ constraining consecutive poses via the motion model $g$
  3. Observation factors: $f_t^{\text{obs}}(x_t, m_j) = \mathcal{N}(z_t - h(x_t, m_j); 0, R_t)$ constraining a pose and a landmark via the observation model $h$
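These three factor types can be written directly as residual functions. The sketch below assumes a 1D robot with motion model $g(x, u) = x + u$ and observation model $h(x, m) = m - x$ (a signed range), chosen purely for brevity:

```python
# Residual functions for the three standard SLAM factor types, in 1D:
# poses and landmarks are scalars, g(x, u) = x + u, h(x, m) = m - x.
# (Illustrative assumptions; real systems use SE(2)/SE(3) poses.)

def prior_error(x0, mu0):
    return x0 - mu0                       # deviation from the anchored pose

def odom_error(x_prev, x_curr, u):
    return x_curr - (x_prev + u)          # x_t - g(x_{t-1}, u_t)

def obs_error(x, m, z):
    return z - (m - x)                    # z_t - h(x_t, m_j)

# Each residual e enters the MAP objective as e^2 / sigma^2,
# the scalar form of the squared Mahalanobis norm.
```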

Main Theorems

Proposition

GraphSLAM as Nonlinear Least Squares

Statement

Under Gaussian noise assumptions, the MAP estimation problem is equivalent to a nonlinear least-squares problem:

$$\theta^* = \arg\min_{\theta} \sum_{k=1}^{N} \|e_k(\theta)\|_{\Sigma_k^{-1}}^2$$

where $e_k(\theta)$ is the error function for factor $k$, $\|v\|_{\Sigma^{-1}}^2 = v^\top \Sigma^{-1} v$ is the squared Mahalanobis norm, and $\Sigma_k$ is the noise covariance for factor $k$.

Intuition

Taking the negative log of the posterior (which is a product of Gaussians) converts multiplication into summation and Gaussian exponents into squared Mahalanobis norms. Maximizing the posterior becomes minimizing the sum of squared errors. Each factor contributes one error term, weighted by the inverse of its noise covariance.
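Spelled out, under the Gaussian factor assumption:

```latex
-\log \prod_k \mathcal{N}\!\big(e_k(\theta);\, 0, \Sigma_k\big)
  = \sum_k \tfrac{1}{2}\, e_k(\theta)^\top \Sigma_k^{-1}\, e_k(\theta) + \text{const}
```

so maximizing the posterior is the same as minimizing $\sum_k \|e_k(\theta)\|_{\Sigma_k^{-1}}^2$; the constant collects the Gaussian normalizers, which do not depend on $\theta$.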

Why It Matters

This equivalence transforms SLAM from a probabilistic inference problem into an optimization problem, giving access to the entire toolbox of nonlinear least squares: Gauss-Newton, Levenberg-Marquardt, and their sparse variants. These are well-understood algorithms with convergence guarantees for well-conditioned problems.

Failure Mode

The equivalence requires Gaussian noise. Non-Gaussian noise (e.g., heavy-tailed sensor errors, outlier measurements from incorrect data associations) breaks the least-squares formulation. Robust variants use M-estimators (Huber, Cauchy kernels) instead of squared error to downweight outliers, but lose the clean Gaussian interpretation.

Solving the Optimization

Gauss-Newton Method

Linearize each error function around the current estimate $\theta^{(k)}$:

$$e_i(\theta) \approx e_i(\theta^{(k)}) + J_i \, \delta\theta$$

where $J_i = \partial e_i / \partial \theta \,\big|_{\theta^{(k)}}$ is the Jacobian. Substituting into the cost function and setting the gradient to zero gives the normal equations:

$$H \, \delta\theta^* = -b$$

where:

$$H = \sum_i J_i^\top \Sigma_i^{-1} J_i \qquad b = \sum_i J_i^\top \Sigma_i^{-1} e_i(\theta^{(k)})$$

$H$ is the approximate Hessian (also called the information matrix). Update: $\theta^{(k+1)} = \theta^{(k)} + \delta\theta^*$.
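The assembly of $H$ and $b$ can be sketched on a toy 1D problem: three poses on a line with a prior, two odometry factors, and one absolute measurement, all with unit covariance. The factor list and Jacobians below are hand-coded for illustration; because these toy factors are linear, a single Gauss-Newton iteration already reaches the minimum:

```python
import numpy as np

# Gauss-Newton on a 3-pose 1D chain (toy, linear case):
# prior x0 ~ 0, odometry x1 - x0 ~ 1.0 and x2 - x1 ~ 1.0,
# plus a direct "GPS-like" measurement x2 ~ 2.5. Unit covariances.

theta = np.zeros(3)                       # initial estimate [x0, x1, x2]
factors = [                               # (Jacobian row, error function)
    (np.array([1.0, 0.0, 0.0]),  lambda t: t[0] - 0.0),          # prior
    (np.array([-1.0, 1.0, 0.0]), lambda t: t[1] - t[0] - 1.0),   # odometry
    (np.array([0.0, -1.0, 1.0]), lambda t: t[2] - t[1] - 1.0),   # odometry
    (np.array([0.0, 0.0, 1.0]),  lambda t: t[2] - 2.5),          # absolute meas.
]

for _ in range(3):                        # GN iterations (one suffices here)
    H = np.zeros((3, 3))
    b = np.zeros(3)
    for J, err in factors:
        e = err(theta)
        H += np.outer(J, J)               # J^T Sigma^{-1} J with Sigma = I
        b += J * e                        # J^T Sigma^{-1} e
    delta = np.linalg.solve(H, -b)        # normal equations: H delta = -b
    theta = theta + delta
```

The conflicting odometry ($x_2 \approx 2$) and absolute measurement ($x_2 \approx 2.5$) are reconciled in the least-squares sense, with the residual spread across all three factors.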

Levenberg-Marquardt

Add a damping term: $(H + \lambda I)\,\delta\theta = -b$. When $\lambda$ is large, the step is small and steepest-descent-like (safe but slow). When $\lambda$ is small, the step approaches the Gauss-Newton step (fast but may overshoot). Adaptive $\lambda$ interpolates between the two regimes.
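A minimal version of the accept/reject damping logic, assuming caller-provided helpers `assemble` (returning $H$ and $b$ at the current estimate) and `cost` (total squared error); the factor-of-10 schedule is a common default, not a requirement:

```python
import numpy as np

# One Levenberg-Marquardt step with adaptive damping (sketch; names are
# illustrative, not from any particular solver library).

def lm_step(theta, assemble, cost, lam):
    H, b = assemble(theta)
    delta = np.linalg.solve(H + lam * np.eye(len(theta)), -b)
    if cost(theta + delta) < cost(theta):
        return theta + delta, lam / 10.0   # step helped: trust Gauss-Newton more
    return theta, lam * 10.0               # step hurt: damp harder, keep old theta

# Toy usage: a single factor with error e(theta) = theta[0] - 1, unit covariance.
def assemble(t):
    return np.array([[1.0]]), np.array([t[0] - 1.0])   # H = J^T J, b = J^T e

def cost(t):
    return (t[0] - 1.0) ** 2

theta, lam = np.array([0.0]), 1.0
theta, lam = lm_step(theta, assemble, cost, lam)       # accepted, lam shrinks
```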

Sparsity Structure

Proposition

Sparsity of the SLAM Information Matrix

Statement

The information matrix $H \in \mathbb{R}^{n \times n}$ (where $n = \dim(\theta)$) has a nonzero entry $H_{ij}$ only if variables $i$ and $j$ appear together in at least one factor. For SLAM:

  • Odometry factors create a block-tridiagonal structure in the pose-pose block
  • Observation factors create sparse connections between poses and landmarks
  • The total number of nonzero entries is $O(T + L)$, where $L$ is the number of observations, compared to the dense matrix size $O(n^2)$

Solving $H\,\delta\theta = -b$ via sparse Cholesky factorization costs $O(n)$ to $O(n^{1.5})$ depending on the graph structure, compared to $O(n^3)$ for dense factorization.

Intuition

Each factor connects only a small number of variables (typically two). The information matrix inherits this sparse connectivity: most variable pairs never appear in the same factor, so $H_{ij} = 0$. A robot that has visited 1000 poses and observed 500 landmarks has a state vector of dimension $\sim$4500 (in 3D), but the information matrix has only $\sim$10,000 nonzero entries out of $\sim$20 million possible.
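The counting argument can be checked directly. The sketch below builds only the sparsity pattern (which variable pairs are nonzero) for a 1000-pose chain observing 500 landmarks round-robin, one observation per pose; scalar variables keep the bookkeeping trivial:

```python
# Count nonzero entries of the information matrix for a chain of T poses
# with one landmark observation per pose (round-robin over M landmarks).
# Scalar variables for simplicity; the structure, not the values, is the point.

T, M = 1000, 500
n = T + M
H = {}                                    # sparse pattern: (i, j) -> True

def connect(i, j):
    # a factor over variables i and j fills in both diagonals and both
    # off-diagonal entries of the information matrix
    for a, b in [(i, i), (j, j), (i, j), (j, i)]:
        H[(a, b)] = True

for t in range(1, T):
    connect(t - 1, t)                     # odometry factor: poses t-1 and t
for t in range(T):
    connect(t, T + (t % M))               # observation: pose t, landmark t % M

nnz = len(H)
print(nnz, "nonzeros of", n * n, "possible")
```

A few thousand nonzeros out of 2.25 million possible entries, matching the $O(T + L)$ count.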

Why It Matters

Sparsity is what makes large-scale SLAM tractable. EKF-SLAM maintains a dense covariance matrix, which requires $O(n^2)$ storage and $O(n^2)$ time per update. GraphSLAM exploits sparsity to solve problems with millions of poses and landmarks. The key insight: the information (inverse covariance) matrix is sparse even though the covariance matrix is dense. This is why GraphSLAM works in the information form.

Loop Closure

Loop closure occurs when the robot revisits a previously mapped location and recognizes it. This adds a constraint between the current pose and a distant past pose, creating a long-range edge in the factor graph.

Loop closure is critical because odometry drift accumulates over time. Without loop closures, the map becomes increasingly distorted. A single correct loop closure can correct drift accumulated over thousands of poses, because the optimization distributes the correction across the entire trajectory.

The danger of incorrect loop closure: A false positive (incorrectly believing you have revisited a location) introduces a constraint between unrelated poses. The optimizer satisfies this constraint by distorting the map, often catastrophically. Robust loop closure detection is one of the hardest problems in SLAM.
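A toy computation shows the correction being distributed. Below, a 1D robot walks a loop whose true net displacement is zero, but every odometry reading is biased by +0.1; a single loop-closure factor pulls the endpoint drift from 1.0 down to about 0.09, spreading the correction evenly over all ten steps (linear least squares with unit covariances, an illustrative setup):

```python
import numpy as np

# Effect of one loop closure on a drifting 1D trajectory (linear toy).
# The robot takes 10 steps around a loop and returns to the start; each
# odometry reading is biased by +0.1, so the raw estimate drifts to 1.0.

T = 10
steps = np.full(T, 0.1)                   # true displacement 0, measured +0.1

# Rows of the least-squares system: prior on x0, then odometry rows
# encoding x_t - x_{t-1} = steps[t-1].
A = [np.eye(T + 1)[0]]
r = [0.0]
for t in range(1, T + 1):
    row = np.zeros(T + 1); row[t] = 1.0; row[t - 1] = -1.0
    A.append(row); r.append(steps[t - 1])

without, *_ = np.linalg.lstsq(np.array(A), np.array(r), rcond=None)

# Loop closure: x_T - x_0 is measured as 0 (we recognize the start).
A.append(np.concatenate(([-1.0], np.zeros(T - 1), [1.0])))
r.append(0.0)
with_lc, *_ = np.linalg.lstsq(np.array(A), np.array(r), rcond=None)

print(without[-1], with_lc[-1])           # endpoint drift before and after
```

One extra row in the system changes every pose estimate, because the least-squares solution trades the loop-closure residual off against all ten odometry residuals at once.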

Practical Systems

Modern SLAM systems (ORB-SLAM, RTAB-Map, Google Cartographer) all use factor-graph-based backends. The frontend extracts features and detects loop closures. The backend (typically iSAM2 or g2o) solves the nonlinear least squares problem incrementally: when new factors arrive, the solution is updated efficiently without re-solving from scratch.

Incremental solving (iSAM2): Maintains a Bayes tree (a data structure encoding the sparse factorization) and updates only the affected portion of the tree when new factors are added. This gives amortized $O(1)$ cost per new variable for chain-like graphs.

Common Confusions

Watch Out

GraphSLAM is batch, not filtering

EKF-SLAM processes observations one at a time and maintains a running state estimate. GraphSLAM collects all constraints and optimizes over the entire trajectory simultaneously. Incremental solvers (iSAM2) blur this distinction by updating the batch solution efficiently, but the formulation is still batch optimization, not filtering.

Watch Out

The information matrix is sparse but the covariance matrix is dense

In SLAM, the information matrix $\Lambda = \Sigma^{-1}$ is sparse because each measurement connects only a few variables. But the covariance matrix $\Sigma = \Lambda^{-1}$ is dense because every variable is correlated with every other variable through the chain of constraints. This is why information-form algorithms (like GraphSLAM) are efficient while covariance-form algorithms (like EKF-SLAM) are not.
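The contrast is easy to verify numerically for a small chain (unit-variance prior and odometry, six scalar variables, chosen only for illustration):

```python
import numpy as np

# A chain of 6 variables linked by unit-precision odometry factors plus a
# prior on the first: the information matrix is tridiagonal (sparse), but
# its inverse -- the covariance -- has no zero entries at all.

n = 6
Lam = np.zeros((n, n))
Lam[0, 0] += 1.0                          # prior factor on x0
for t in range(1, n):                     # odometry factor between t-1 and t
    Lam[t - 1, t - 1] += 1.0
    Lam[t, t] += 1.0
    Lam[t - 1, t] -= 1.0
    Lam[t, t - 1] -= 1.0

Sigma = np.linalg.inv(Lam)
print(np.count_nonzero(Lam), "nonzeros in information matrix")
print(np.count_nonzero(np.abs(Sigma) > 1e-12), "nonzeros in covariance")
```

The information matrix has 16 nonzeros; the covariance has all 36 entries nonzero, because every variable is correlated with every other through the chain.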

Watch Out

SLAM is not just localization

Localization assumes the map is known and estimates only the robot's pose. Mapping assumes the robot's poses are known and builds the map. SLAM solves both simultaneously, which creates the chicken-and-egg difficulty: you need the map to localize, and you need to be localized to build the map.

Summary

  • SLAM as graph: nodes are poses and landmarks, edges are constraints
  • Factor graph factorizes the joint posterior as a product of local factors
  • MAP estimation under Gaussian noise reduces to nonlinear least squares
  • Solve with Gauss-Newton or Levenberg-Marquardt on the normal equations
  • The information matrix is sparse: $O(n)$ nonzeros, not $O(n^2)$
  • Sparsity enables solving SLAM with millions of variables
  • Loop closure corrects accumulated drift but false positives are catastrophic
  • Incremental solvers (iSAM2) update the solution efficiently as new data arrives

Exercises

ExerciseCore

Problem

A robot takes 100 steps, observing 20 landmarks along the way. Each pose is 3D ($x, y, \theta$) and each landmark is 2D ($x, y$). What is the dimension of the state vector $\theta$? If the robot makes 80 landmark observations (some landmarks are seen from multiple poses), how many factors are in the factor graph (including the prior and odometry)?

ExerciseAdvanced

Problem

Explain why a single loop closure observation can dramatically improve the map quality even though it is just one additional factor in a graph with hundreds of factors. What happens to the information matrix structure when a loop closure is added?

References

Canonical:

  • Thrun & Montemerlo, "The Graph-SLAM Algorithm with Applications to Large-Scale Mapping of Urban Structures" (2006)
  • Dellaert & Kaess, "Factor Graphs for Robot Perception" (2017)

Current:

  • Kaess et al., "iSAM2: Incremental Smoothing and Mapping Using the Bayes Tree" (2012)
  • Cadena et al., "Past, Present, and Future of Simultaneous Localization and Mapping" (2016)

Next Topics

The natural next steps from GraphSLAM and factor graphs:

  • Particle filters: the sequential alternative for nonlinear state estimation
  • Convex optimization basics: the optimization tools underlying least-squares solvers

Last reviewed: April 2026
