
Beyond LLMs

3D Gaussian Splatting

Represent a 3D scene as millions of 3D Gaussians, each with position, covariance, opacity, and color. Render by projecting to 2D and alpha-compositing. Real-time, high-quality novel view synthesis without neural networks at render time.

Advanced · Tier 3: Frontier · ~45 min

Why This Matters

Neural Radiance Fields (NeRFs) showed that you can reconstruct 3D scenes from photographs with stunning quality. But NeRFs are slow: rendering a single frame requires millions of neural network evaluations along each ray. 3D Gaussian Splatting (3DGS) achieves comparable or better quality at 100+ FPS by replacing the neural network with an explicit representation: a cloud of 3D Gaussians.

Instead of querying a neural network for every pixel, you project Gaussians onto the image plane and blend them. The result is differentiable (for optimization via gradient descent) and fast (for real-time rendering). Applications span VR/AR, robotics, autonomous driving, film production, and digital twins.

Mental Model

Imagine spraying millions of tiny, colored, semi-transparent blobs into a 3D scene. Each blob is an ellipsoidal Gaussian with a position, shape (covariance), opacity, and color that can vary with viewing direction. To render an image from a new viewpoint, project all the blobs onto the camera's image plane and blend them front-to-back. The blobs that are close to surfaces become dense and opaque; the blobs in empty space fade to transparent.

Formal Setup

Scene Representation

A scene is represented by a set of $N$ 3D Gaussians $\{G_i\}_{i=1}^{N}$, where each Gaussian has:

  • Position $\boldsymbol{\mu}_i \in \mathbb{R}^3$: the center of the Gaussian
  • Covariance $\boldsymbol{\Sigma}_i \in \mathbb{R}^{3 \times 3}$: a positive semidefinite matrix defining the shape and orientation of the ellipsoid
  • Opacity $\alpha_i \in [0, 1]$: how opaque the Gaussian is
  • Color: spherical harmonic (SH) coefficients, allowing view-dependent appearance

Definition

3D Gaussian

Each Gaussian $G_i$ defines a density in 3D space:

$$G_i(\mathbf{x}) = \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right)$$

The covariance $\boldsymbol{\Sigma}_i$ is parameterized as $\boldsymbol{\Sigma}_i = \mathbf{R}_i \mathbf{S}_i \mathbf{S}_i^T \mathbf{R}_i^T$, where $\mathbf{R}_i$ is a rotation matrix (stored as a quaternion) and $\mathbf{S}_i$ is a diagonal scaling matrix. This ensures positive semidefiniteness. The eigenvalues of $\boldsymbol{\Sigma}_i$ determine the extent of the ellipsoid along each principal axis.
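As a concrete sketch, the factored parameterization takes a few lines of NumPy. The function names here are illustrative, not from any particular codebase:

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def build_covariance(quat, scales):
    """Sigma = R S S^T R^T: symmetric positive semidefinite by construction."""
    R = quat_to_rotmat(quat)
    S = np.diag(scales)
    return R @ S @ S.T @ R.T

# An ellipsoid stretched along its local x-axis, rotated 90 degrees about z.
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
Sigma = build_covariance(q, np.array([2.0, 1.0, 1.0]))
# The eigenvalues of Sigma are the squared scales: {4, 1, 1}.
print(np.sort(np.linalg.eigvalsh(Sigma)))
```

Because only a quaternion and three scales are optimized, gradient steps can never produce an invalid (non-PSD) covariance.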

Definition

Spherical Harmonics for Color

Spherical harmonics (SH) encode view-dependent color. Each Gaussian stores SH coefficients $\{c_i^{lm}\}$ for each color channel. Given a viewing direction $\mathbf{d}$, the color is:

$$\mathbf{c}_i(\mathbf{d}) = \sum_{l=0}^{L} \sum_{m=-l}^{l} c_i^{lm} \, Y_l^m(\mathbf{d})$$

where $Y_l^m$ are the spherical harmonic basis functions. Degree $L = 3$ (16 coefficients per channel) is typical, capturing specular highlights and view-dependent shading effects.
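A minimal sketch of this evaluation at degree $L = 1$ (4 coefficients per channel) to keep the basis short. The constants are the standard real-SH normalization factors; the sign convention follows common splatting implementations, and the function name is illustrative:

```python
import numpy as np

# Real spherical harmonic constants for l = 0, 1.
SH_C0 = 0.28209479177387814   # Y_0^0
SH_C1 = 0.4886025119029199    # magnitude of the three l = 1 basis functions

def sh_to_color(coeffs, d):
    """Evaluate degree-1 real spherical harmonics in viewing direction d.

    coeffs: (4, 3) array of SH coefficients per RGB channel.
    d: viewing direction (x, y, z), normalized internally.
    """
    x, y, z = d / np.linalg.norm(d)
    basis = np.array([SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x])
    return basis @ coeffs  # (3,) RGB value

coeffs = np.zeros((4, 3))
coeffs[0] = [1.0, 0.5, 0.25]  # only the view-independent (l = 0) term is set
print(sh_to_color(coeffs, np.array([0.0, 0.0, 1.0])))
```

With only the $l = 0$ coefficient set, the color is constant over all directions; the $l = 1$ terms add a smooth directional tint, and higher degrees add sharper angular variation.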

Rendering

Proposition

Projection of 3D Gaussian to 2D

Statement

A 3D Gaussian with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$ projects to a 2D Gaussian on the image plane with mean $\boldsymbol{\mu}' = \pi(\boldsymbol{\mu})$ (the projected center) and covariance:

$$\boldsymbol{\Sigma}' = \mathbf{J} \mathbf{W} \boldsymbol{\Sigma} \mathbf{W}^T \mathbf{J}^T$$

where $\mathbf{W}$ is the world-to-camera rotation matrix and $\mathbf{J}$ is the Jacobian of the projective transformation evaluated at $\boldsymbol{\mu}$.

Intuition

Projecting a 3D ellipsoid through a camera gives a 2D ellipse. The Jacobian linearizes the perspective projection locally, turning the 3D covariance into a 2D covariance on the image plane. This is exact for affine projections and a good approximation for perspective when the Gaussian is not too close to the camera.

Why It Matters

This projection is computed once per Gaussian per frame and is the key to real-time performance. Unlike NeRF, which evaluates a neural network at many points along each ray, splatting projects each Gaussian to a 2D footprint and rasterizes it directly.
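The projection step can be sketched for a pinhole camera. Here the world-to-camera rotation $\mathbf{W}$ is assumed to be already folded into the covariance, and the function name is illustrative:

```python
import numpy as np

def project_covariance(Sigma_cam, mu_cam, fx, fy):
    """Project a 3D covariance to a 2D image-plane covariance.

    Sigma_cam: 3x3 covariance already in the camera frame (W Sigma W^T).
    mu_cam: Gaussian center in camera coordinates, z > 0 looking down +z.
    fx, fy: focal lengths in pixels.
    """
    x, y, z = mu_cam
    # Jacobian of the pinhole map (u, v) = (fx*x/z, fy*y/z), evaluated at mu_cam.
    J = np.array([
        [fx / z, 0.0,    -fx * x / z**2],
        [0.0,    fy / z, -fy * y / z**2],
    ])
    return J @ Sigma_cam @ J.T  # 2x2 covariance of the screen-space ellipse

# An axis-aligned ellipsoid 5 units in front of a 500px-focal-length camera.
Sigma_cam = np.diag([0.04, 0.01, 0.01])
Sigma2d = project_covariance(Sigma_cam, np.array([0.0, 0.0, 5.0]), 500.0, 500.0)
print(Sigma2d)
```

Note how depth enters the Jacobian: the same Gaussian pushed twice as far away gets a screen-space footprint with one quarter the variance, matching perspective foreshortening.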

Proposition

Alpha-Compositing Rendering

Statement

The color of a pixel $\mathbf{p}$ is computed by alpha-compositing the contributing Gaussians sorted by depth:

$$C(\mathbf{p}) = \sum_{i \in \mathcal{N}} \mathbf{c}_i \, \alpha_i \, G'_i(\mathbf{p}) \prod_{j=1}^{i-1} \left(1 - \alpha_j \, G'_j(\mathbf{p})\right)$$

where $\mathcal{N}$ is the set of Gaussians overlapping pixel $\mathbf{p}$, $G'_i(\mathbf{p})$ is the 2D Gaussian evaluated at $\mathbf{p}$, and the product term is the accumulated transmittance.

Intuition

Each Gaussian contributes its color weighted by its opacity and how much of it covers the pixel, attenuated by all the Gaussians in front of it. This is the standard volume rendering equation, but instead of integrating along a ray through a continuous field, you sum over discrete Gaussians. The sorting ensures correct occlusion.

Why It Matters

This rendering equation is fully differentiable with respect to all Gaussian parameters (position, covariance, opacity, color). Gradients flow from the image loss back to each Gaussian, enabling optimization by gradient descent. The tile-based rasterization makes this GPU-friendly and fast.
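A minimal per-pixel version of the compositing sum, assuming the contributing Gaussians are already sorted near-to-far (names are illustrative; real rasterizers run this per tile on the GPU):

```python
import numpy as np

def composite_pixel(colors, alphas, g_vals):
    """Front-to-back alpha compositing at a single pixel.

    colors: (N, 3) view-dependent colors, sorted near to far.
    alphas: (N,) per-Gaussian opacities.
    g_vals: (N,) 2D Gaussian values G'_i(p) at this pixel.
    """
    C = np.zeros(3)
    T = 1.0  # accumulated transmittance
    for c, a, g in zip(colors, alphas, g_vals):
        w = a * g            # effective alpha of this Gaussian at p
        C += T * w * c
        T *= 1.0 - w
        if T < 1e-4:         # early termination: pixel is effectively opaque
            break
    return C

# A half-transparent red Gaussian in front of an opaque blue one.
colors = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
C = composite_pixel(colors, np.array([0.5, 1.0]), np.array([1.0, 1.0]))
print(C)
```

The early-termination check mirrors what GPU rasterizers do: once transmittance is negligible, Gaussians further back cannot change the pixel and are skipped.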

Optimization and Adaptive Density Control

Training starts from a sparse point cloud (from Structure-from-Motion) and optimizes the Gaussian parameters by minimizing a photometric loss:

$$\mathcal{L} = (1 - \lambda) \, \mathcal{L}_1 + \lambda \, \mathcal{L}_{\text{D-SSIM}}$$

where $\mathcal{L}_1$ is the pixel-wise $L_1$ error and $\mathcal{L}_{\text{D-SSIM}}$ is a structural dissimilarity loss, $(1 - \text{SSIM})/2$.
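A sketch of this loss, using a simplified global SSIM rather than the windowed SSIM used in practice; the weight $\lambda = 0.2$ follows the original paper, and the helper names are illustrative:

```python
import numpy as np

def l1_loss(pred, gt):
    """Mean absolute pixel error."""
    return np.abs(pred - gt).mean()

def dssim(pred, gt, C1=0.01**2, C2=0.03**2):
    """Simplified global (non-windowed) SSIM, returned as D-SSIM = (1 - SSIM) / 2."""
    mx, my = pred.mean(), gt.mean()
    vx, vy = pred.var(), gt.var()
    cov = ((pred - mx) * (gt - my)).mean()
    ssim = ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx**2 + my**2 + C1) * (vx + vy + C2))
    return (1.0 - ssim) / 2.0

def photometric_loss(pred, gt, lam=0.2):
    """L = (1 - lambda) * L1 + lambda * D-SSIM."""
    return (1 - lam) * l1_loss(pred, gt) + lam * dssim(pred, gt)

img = np.linspace(0.0, 1.0, 64).reshape(8, 8)
print(photometric_loss(img, img))  # identical images give zero loss
```

In a real training loop `pred` is the differentiable rendered image, so this scalar is what gradients flow back from, through the rasterizer, into every Gaussian's parameters.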

Adaptive density control adjusts the number of Gaussians during optimization:

  • Densification by cloning: Gaussians with large positional gradients in under-reconstructed regions are duplicated
  • Densification by splitting: Large Gaussians covering too much area are split into smaller ones
  • Pruning: Gaussians with very low opacity ($\alpha < \epsilon$) are removed

This grow-and-prune strategy lets the representation adapt its capacity to the scene complexity.
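The grow-and-prune step can be sketched as a single pass over the parameter arrays. The thresholds, the split factor of 1.6, and the one-offspring split below are illustrative stand-ins, not the exact bookkeeping of the reference implementation:

```python
import numpy as np

def adaptive_density_step(mu, scales, opacity, grad_norm,
                          grad_thresh=0.0002, scale_thresh=0.01, eps=0.005):
    """One clone / split / prune pass over all Gaussians.

    mu: (N, 3) positions; scales: (N, 3) per-axis extents; opacity: (N,);
    grad_norm: (N,) accumulated view-space positional gradient magnitudes.
    """
    needs_density = grad_norm > grad_thresh      # under-reconstructed regions
    small = scales.max(axis=1) < scale_thresh
    clone = needs_density & small                # duplicate small Gaussians
    split = needs_density & ~small               # break up large Gaussians

    new_mu, new_scales, new_opacity = [mu], [scales], [opacity]
    if clone.any():
        new_mu.append(mu[clone])                 # copy; gradients separate the pair
        new_scales.append(scales[clone])
        new_opacity.append(opacity[clone])
    if split.any():
        jitter = np.random.randn(split.sum(), 3) * scales[split]
        new_mu.append(mu[split] + jitter)        # offspring sampled inside parent
        new_scales.append(scales[split] / 1.6)
        new_opacity.append(opacity[split])
        # Shrink the split parents as well.
        new_scales[0] = np.where(split[:, None], scales / 1.6, scales)

    mu = np.concatenate(new_mu)
    scales = np.concatenate(new_scales)
    opacity = np.concatenate(new_opacity)
    keep = opacity > eps                         # prune nearly transparent Gaussians
    return mu[keep], scales[keep], opacity[keep]
```

In the full method this runs every few hundred iterations, interleaved with gradient updates, so the Gaussian count tracks scene complexity rather than being fixed up front.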

Gaussian Splatting vs NeRF

| Property | NeRF | 3D Gaussian Splatting |
| --- | --- | --- |
| Representation | Implicit (MLP; see also occupancy networks and neural fields) | Explicit (point cloud of Gaussians) |
| Rendering | Ray marching through MLP | Project and rasterize Gaussians |
| Render speed | Seconds per frame | 100+ FPS |
| Training speed | Hours | Minutes |
| Memory | Low (MLP weights) | High (millions of Gaussians) |
| Editability | Hard (implicit) | Easy (move, delete, add Gaussians) |
| Quality | Excellent | Comparable or better |

Common Confusions

Watch Out

Gaussians are not points

3D Gaussian Splatting is sometimes described as "fancy point rendering," but each Gaussian is an ellipsoidal volume, not a point. The covariance matrix defines the shape and extent. This is why the method can represent smooth surfaces: overlapping Gaussians blend together to form continuous appearance.

Watch Out

Sorting is per-tile, not global

A full depth sort of all Gaussians per pixel would be expensive. In practice, the scene is divided into image-space tiles (e.g., 16x16 pixels), and Gaussians are sorted per tile. This makes the sorting step GPU-parallelizable and introduces only minor artifacts at tile boundaries.

Summary

  • A 3D scene is represented as millions of 3D Gaussians, each with position, covariance, opacity, and spherical harmonic color
  • Rendering: project each Gaussian to 2D using the camera Jacobian, then alpha-composite front-to-back
  • The entire pipeline is differentiable: optimize Gaussian parameters by backpropagating through the renderer
  • Adaptive density control (clone, split, prune) adjusts the number of Gaussians during training
  • Real-time rendering (100+ FPS) at quality comparable to NeRF
  • The explicit representation enables easy editing, compositing, and streaming

Exercises

ExerciseCore

Problem

A 3D Gaussian has covariance $\boldsymbol{\Sigma} = \mathrm{diag}(4, 1, 1)$. Describe the shape of this Gaussian in 3D space. If the camera looks along the x-axis, what shape does the projected 2D Gaussian have?

ExerciseAdvanced

Problem

Why is the covariance matrix parameterized as $\boldsymbol{\Sigma} = \mathbf{R}\mathbf{S}\mathbf{S}^T\mathbf{R}^T$ instead of optimizing $\boldsymbol{\Sigma}$ directly? What goes wrong with direct optimization?

ExerciseResearch

Problem

3D Gaussian Splatting uses per-tile sorting for depth ordering. Under what conditions can this approximation produce visible artifacts? Propose a method to reduce these artifacts without global per-pixel sorting.

References

Canonical:

  • Kerbl et al., "3D Gaussian Splatting for Real-Time Radiance Field Rendering" (SIGGRAPH 2023)
  • Zwicker et al., "EWA Splatting" (2002)

Current:

  • Huang et al., "2D Gaussian Splatting for Accurate Surface Reconstruction" (SIGGRAPH 2024)

  • Lu et al., "Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering" (CVPR 2024)

  • Zhang et al., Dive into Deep Learning (2023), Chapters 14-17

Next Topics

Gaussian splatting connects to the broader landscape of 3D representation learning and real-time rendering for robotics, VR, and autonomous systems.

Last reviewed: April 2026