
Beyond LLMs

3D Gaussian Splatting

Represent a 3D scene as millions of 3D Gaussians, each with position, covariance, opacity, and color. Render by projecting to 2D and alpha-compositing. Real-time, high-quality novel view synthesis without neural networks at render time.

Advanced · Tier 3: Frontier · ~45 min

Why This Matters

Neural Radiance Fields (NeRFs) showed that you can reconstruct 3D scenes from photographs with stunning quality. But NeRFs are slow: rendering a single frame requires millions of neural network evaluations along each ray. 3D Gaussian Splatting (3DGS) achieves comparable or better quality at 100+ FPS by replacing the neural network with an explicit representation: a cloud of 3D Gaussians.

Instead of querying a neural network for every pixel, you project Gaussians onto the image plane and blend them. The result is differentiable (for optimization via gradient descent) and fast (for real-time rendering). Applications span VR/AR, robotics, autonomous driving, film production, and digital twins.

Mental Model

Imagine spraying millions of tiny, colored, semi-transparent blobs into a 3D scene. Each blob is an ellipsoidal Gaussian with a position, shape (covariance), opacity, and color that can vary with viewing direction. To render an image from a new viewpoint, project all the blobs onto the camera's image plane and blend them front-to-back. The blobs that are close to surfaces become dense and opaque; the blobs in empty space fade to transparent.

Formal Setup

Scene Representation

A scene is represented by a set of $N$ 3D Gaussians $\{G_i\}_{i=1}^{N}$, where each Gaussian has:

  • Position $\boldsymbol{\mu}_i \in \mathbb{R}^3$: the center of the Gaussian
  • Covariance $\boldsymbol{\Sigma}_i \in \mathbb{R}^{3 \times 3}$: a positive semidefinite matrix defining the shape and orientation of the ellipsoid
  • Opacity $\alpha_i \in [0, 1]$: how opaque the Gaussian is
  • Color: spherical harmonic (SH) coefficients, allowing view-dependent appearance

Definition

3D Gaussian

Each Gaussian $G_i$ defines a density in 3D space:

$$G_i(\mathbf{x}) = \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right)$$

The covariance $\boldsymbol{\Sigma}_i$ is parameterized as $\boldsymbol{\Sigma}_i = \mathbf{R}_i \mathbf{S}_i \mathbf{S}_i^T \mathbf{R}_i^T$, where $\mathbf{R}_i$ is a rotation matrix (stored as a quaternion) and $\mathbf{S}_i$ is a diagonal scaling matrix. This ensures positive semidefiniteness. The eigenvalues of $\boldsymbol{\Sigma}_i$ determine the extent of the ellipsoid along each principal axis.
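As a concrete sketch, the factored parameterization takes a few lines of NumPy. The function names here are illustrative, not from any particular codebase:

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def build_covariance(quat, scales):
    """Sigma = R S S^T R^T: symmetric positive semidefinite by construction."""
    R = quat_to_rotmat(quat)
    S = np.diag(scales)
    return R @ S @ S.T @ R.T

# An ellipsoid stretched along its local x-axis, rotated 90 degrees about z.
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
Sigma = build_covariance(q, np.array([2.0, 1.0, 1.0]))
# The eigenvalues of Sigma are the squared scales: {4, 1, 1}.
print(np.sort(np.linalg.eigvalsh(Sigma)))
```

Because only a quaternion and three scales are optimized, gradient steps can never produce an invalid (non-PSD) covariance.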

Definition

Spherical Harmonics for Color

Spherical harmonics (SH) encode view-dependent color. Each Gaussian stores SH coefficients $\{c_i^{lm}\}$ for each color channel. Given a viewing direction $\mathbf{d}$, the color is:

$$\mathbf{c}_i(\mathbf{d}) = \sum_{l=0}^{L} \sum_{m=-l}^{l} c_i^{lm} \, Y_l^m(\mathbf{d})$$

where $Y_l^m$ are the spherical harmonic basis functions. Degree $L = 3$ (16 coefficients per channel) is typical, capturing specular highlights and view-dependent shading effects.
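A minimal sketch of this evaluation at degree $L = 1$ (4 coefficients per channel) to keep the basis short. The constants are the standard real-SH normalization factors; the sign convention follows common splatting implementations, and the function name is illustrative:

```python
import numpy as np

# Real spherical harmonic constants for l = 0, 1.
SH_C0 = 0.28209479177387814   # Y_0^0
SH_C1 = 0.4886025119029199    # magnitude of the three l = 1 basis functions

def sh_to_color(coeffs, d):
    """Evaluate degree-1 real spherical harmonics in viewing direction d.

    coeffs: (4, 3) array of SH coefficients per RGB channel.
    d: viewing direction (x, y, z), normalized internally.
    """
    x, y, z = d / np.linalg.norm(d)
    basis = np.array([SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x])
    return basis @ coeffs  # (3,) RGB value

coeffs = np.zeros((4, 3))
coeffs[0] = [1.0, 0.5, 0.25]  # only the view-independent (l = 0) term is set
print(sh_to_color(coeffs, np.array([0.0, 0.0, 1.0])))
```

With only the $l = 0$ coefficient set, the color is constant over all directions; the $l = 1$ terms add a smooth directional tint, and higher degrees add sharper angular variation.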

Rendering

Proposition

Projection of 3D Gaussian to 2D

Statement

A 3D Gaussian with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$ projects to a 2D Gaussian on the image plane with mean $\boldsymbol{\mu}' = \pi(\boldsymbol{\mu})$ (the projected center) and covariance:

$$\boldsymbol{\Sigma}' = \mathbf{J} \mathbf{W} \boldsymbol{\Sigma} \mathbf{W}^T \mathbf{J}^T$$

where $\mathbf{W}$ is the world-to-camera rotation matrix and $\mathbf{J}$ is the Jacobian of the projective transformation evaluated at $\boldsymbol{\mu}$.

Intuition

Projecting a 3D ellipsoid through a camera gives a 2D ellipse. The Jacobian linearizes the perspective projection locally, turning the 3D covariance into a 2D covariance on the image plane. This is exact for affine projections and a good approximation for perspective when the Gaussian is not too close to the camera.

Why It Matters

This projection is computed once per Gaussian per frame and is the key to real-time performance. Unlike NeRF, which evaluates a neural network at many points along each ray, splatting projects each Gaussian to a 2D footprint and rasterizes it directly.
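The projection step can be sketched for a pinhole camera. Here the world-to-camera rotation $\mathbf{W}$ is assumed to be already folded into the covariance, and the function name is illustrative:

```python
import numpy as np

def project_covariance(Sigma_cam, mu_cam, fx, fy):
    """Project a 3D covariance to a 2D image-plane covariance.

    Sigma_cam: 3x3 covariance already in the camera frame (W Sigma W^T).
    mu_cam: Gaussian center in camera coordinates, z > 0 looking down +z.
    fx, fy: focal lengths in pixels.
    """
    x, y, z = mu_cam
    # Jacobian of the pinhole map (u, v) = (fx*x/z, fy*y/z), evaluated at mu_cam.
    J = np.array([
        [fx / z, 0.0,    -fx * x / z**2],
        [0.0,    fy / z, -fy * y / z**2],
    ])
    return J @ Sigma_cam @ J.T  # 2x2 covariance of the screen-space ellipse

# An axis-aligned ellipsoid 5 units in front of a 500px-focal-length camera.
Sigma_cam = np.diag([0.04, 0.01, 0.01])
Sigma2d = project_covariance(Sigma_cam, np.array([0.0, 0.0, 5.0]), 500.0, 500.0)
print(Sigma2d)
```

Note how depth enters the Jacobian: the same Gaussian pushed twice as far away gets a screen-space footprint with one quarter the variance, matching perspective foreshortening.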

Proposition

Alpha-Compositing Rendering

Statement

The color of a pixel $\mathbf{p}$ is computed by alpha-compositing the contributing Gaussians sorted by depth:

$$C(\mathbf{p}) = \sum_{i \in \mathcal{N}} \mathbf{c}_i \, \alpha_i \, G'_i(\mathbf{p}) \prod_{j=1}^{i-1} \left(1 - \alpha_j \, G'_j(\mathbf{p})\right)$$

where $\mathcal{N}$ is the set of Gaussians overlapping pixel $\mathbf{p}$, $G'_i(\mathbf{p})$ is the 2D Gaussian evaluated at $\mathbf{p}$, and the product term is the accumulated transmittance.

Intuition

Each Gaussian contributes its color weighted by its opacity and how much of it covers the pixel, attenuated by all the Gaussians in front of it. This is the standard volume rendering equation, but instead of integrating along a ray through a continuous field, you sum over discrete Gaussians. The sorting ensures correct occlusion.

Why It Matters

This rendering equation is fully differentiable with respect to all Gaussian parameters (position, covariance, opacity, color). Gradients flow from the image loss back to each Gaussian, enabling optimization by gradient descent. The tile-based rasterization makes this GPU-friendly and fast.
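A minimal per-pixel version of the compositing sum, assuming the contributing Gaussians are already sorted near-to-far (names are illustrative; real rasterizers run this per tile on the GPU):

```python
import numpy as np

def composite_pixel(colors, alphas, g_vals):
    """Front-to-back alpha compositing at a single pixel.

    colors: (N, 3) view-dependent colors, sorted near to far.
    alphas: (N,) per-Gaussian opacities.
    g_vals: (N,) 2D Gaussian values G'_i(p) at this pixel.
    """
    C = np.zeros(3)
    T = 1.0  # accumulated transmittance
    for c, a, g in zip(colors, alphas, g_vals):
        w = a * g            # effective alpha of this Gaussian at p
        C += T * w * c
        T *= 1.0 - w
        if T < 1e-4:         # early termination: pixel is effectively opaque
            break
    return C

# A half-transparent red Gaussian in front of an opaque blue one.
colors = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
C = composite_pixel(colors, np.array([0.5, 1.0]), np.array([1.0, 1.0]))
print(C)
```

The early-termination check mirrors what GPU rasterizers do: once transmittance is negligible, Gaussians further back cannot change the pixel and are skipped.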

Optimization and Adaptive Density Control

Training starts from a sparse point cloud (from Structure-from-Motion) and optimizes the Gaussian parameters by minimizing a photometric loss:

$$\mathcal{L} = (1 - \lambda) \, \mathcal{L}_1 + \lambda \, \mathcal{L}_{\text{D-SSIM}}$$

where $\mathcal{L}_1$ is the pixel-wise $L_1$ error and $\mathcal{L}_{\text{D-SSIM}}$ is a structural dissimilarity loss, $(1 - \text{SSIM})/2$.
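A sketch of this loss, using a simplified global SSIM rather than the windowed SSIM used in practice; the weight $\lambda = 0.2$ follows the original paper, and the helper names are illustrative:

```python
import numpy as np

def l1_loss(pred, gt):
    """Mean absolute pixel error."""
    return np.abs(pred - gt).mean()

def dssim(pred, gt, C1=0.01**2, C2=0.03**2):
    """Simplified global (non-windowed) SSIM, returned as D-SSIM = (1 - SSIM) / 2."""
    mx, my = pred.mean(), gt.mean()
    vx, vy = pred.var(), gt.var()
    cov = ((pred - mx) * (gt - my)).mean()
    ssim = ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx**2 + my**2 + C1) * (vx + vy + C2))
    return (1.0 - ssim) / 2.0

def photometric_loss(pred, gt, lam=0.2):
    """L = (1 - lambda) * L1 + lambda * D-SSIM."""
    return (1 - lam) * l1_loss(pred, gt) + lam * dssim(pred, gt)

img = np.linspace(0.0, 1.0, 64).reshape(8, 8)
print(photometric_loss(img, img))  # identical images give zero loss
```

In a real training loop `pred` is the differentiable rendered image, so this scalar is what gradients flow back from, through the rasterizer, into every Gaussian's parameters.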

Adaptive density control adjusts the number of Gaussians during optimization:

  • Densification by cloning: Gaussians with large positional gradients in under-reconstructed regions are duplicated
  • Densification by splitting: Large Gaussians covering too much area are split into smaller ones
  • Pruning: Gaussians with very low opacity ($\alpha < \epsilon$) are removed

This grow-and-prune strategy lets the representation adapt its capacity to the scene complexity.
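The grow-and-prune step can be sketched as a single pass over the parameter arrays. The thresholds, the split factor of 1.6, and the one-offspring split below are illustrative stand-ins, not the exact bookkeeping of the reference implementation:

```python
import numpy as np

def adaptive_density_step(mu, scales, opacity, grad_norm,
                          grad_thresh=0.0002, scale_thresh=0.01, eps=0.005):
    """One clone / split / prune pass over all Gaussians.

    mu: (N, 3) positions; scales: (N, 3) per-axis extents; opacity: (N,);
    grad_norm: (N,) accumulated view-space positional gradient magnitudes.
    """
    needs_density = grad_norm > grad_thresh      # under-reconstructed regions
    small = scales.max(axis=1) < scale_thresh
    clone = needs_density & small                # duplicate small Gaussians
    split = needs_density & ~small               # break up large Gaussians

    new_mu, new_scales, new_opacity = [mu], [scales], [opacity]
    if clone.any():
        new_mu.append(mu[clone])                 # copy; gradients separate the pair
        new_scales.append(scales[clone])
        new_opacity.append(opacity[clone])
    if split.any():
        jitter = np.random.randn(split.sum(), 3) * scales[split]
        new_mu.append(mu[split] + jitter)        # offspring sampled inside parent
        new_scales.append(scales[split] / 1.6)
        new_opacity.append(opacity[split])
        # Shrink the split parents as well.
        new_scales[0] = np.where(split[:, None], scales / 1.6, scales)

    mu = np.concatenate(new_mu)
    scales = np.concatenate(new_scales)
    opacity = np.concatenate(new_opacity)
    keep = opacity > eps                         # prune nearly transparent Gaussians
    return mu[keep], scales[keep], opacity[keep]
```

In the full method this runs every few hundred iterations, interleaved with gradient updates, so the Gaussian count tracks scene complexity rather than being fixed up front.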

Gaussian Splatting vs NeRF

| Property | NeRF | 3D Gaussian Splatting |
| --- | --- | --- |
| Representation | Implicit (MLP; see also occupancy networks and neural fields) | Explicit (point cloud of Gaussians) |
| Rendering | Ray marching through MLP | Project and rasterize Gaussians |
| Render speed | Seconds per frame | 100+ FPS |
| Training speed | Hours | Minutes |
| Memory | Low (MLP weights) | High (millions of Gaussians) |
| Editability | Hard (implicit) | Easy (move, delete, add Gaussians) |
| Quality | Excellent | Comparable or better |

Common Confusions

Watch Out

Gaussians are not points

3D Gaussian Splatting is sometimes described as "fancy point rendering," but each Gaussian is an ellipsoidal volume, not a point. The covariance matrix defines the shape and extent. This is why the method can represent smooth surfaces: overlapping Gaussians blend together to form continuous appearance.

Watch Out

Sorting is per-tile, not global

A full depth sort of all Gaussians per pixel would be expensive. In practice, the scene is divided into image-space tiles (e.g., 16x16 pixels), and Gaussians are sorted per tile. This makes the sorting step GPU-parallelizable and introduces only minor artifacts at tile boundaries.

Summary

  • A 3D scene is represented as millions of 3D Gaussians, each with position, covariance, opacity, and spherical harmonic color
  • Rendering: project each Gaussian to 2D using the camera Jacobian, then alpha-composite front-to-back
  • The entire pipeline is differentiable: optimize Gaussian parameters by backpropagating through the renderer
  • Adaptive density control (clone, split, prune) adjusts the number of Gaussians during training
  • Real-time rendering (100+ FPS) at quality comparable to NeRF
  • The explicit representation enables easy editing, compositing, and streaming

Exercises

ExerciseCore

Problem

A 3D Gaussian has covariance $\boldsymbol{\Sigma} = \mathrm{diag}(4, 1, 1)$. Describe the shape of this Gaussian in 3D space. If the camera looks along the x-axis, what shape does the projected 2D Gaussian have?

ExerciseAdvanced

Problem

Why is the covariance matrix parameterized as $\boldsymbol{\Sigma} = \mathbf{R}\mathbf{S}\mathbf{S}^T\mathbf{R}^T$ instead of optimizing $\boldsymbol{\Sigma}$ directly? What goes wrong with direct optimization?

ExerciseResearch

Problem

3D Gaussian Splatting uses per-tile sorting for depth ordering. Under what conditions can this approximation produce visible artifacts? Propose a method to reduce these artifacts without global per-pixel sorting.

References

Canonical:

  • Kerbl et al., "3D Gaussian Splatting for Real-Time Radiance Field Rendering" (SIGGRAPH 2023)
  • Zwicker et al., "EWA Splatting" (2002)

Current:

  • Huang et al., "2D Gaussian Splatting for Accurate Surface Reconstruction" (SIGGRAPH 2024)

  • Lu et al., "Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering" (CVPR 2024)

  • Zhang et al., Dive into Deep Learning (2023), Chapters 14-17

Next Topics

Gaussian splatting connects to the broader landscape of 3D representation learning and real-time rendering for robotics, VR, and autonomous systems.

Last reviewed: April 2026