Beyond LLMs
3D Gaussian Splatting
Represent a 3D scene as millions of 3D Gaussians, each with position, covariance, opacity, and color. Render by projecting to 2D and alpha-compositing. Real-time, high-quality novel view synthesis without neural networks at render time.
Why This Matters
Neural Radiance Fields (NeRFs) showed that you can reconstruct 3D scenes from photographs with stunning quality. But NeRFs are slow: rendering a single frame requires hundreds of neural network evaluations along each ray, millions per image. 3D Gaussian Splatting (3DGS) achieves comparable or better quality at 100+ FPS by replacing the neural network with an explicit representation: a cloud of 3D Gaussians.
Instead of querying a neural network for every pixel, you project Gaussians onto the image plane and blend them. The result is differentiable (for optimization via gradient descent) and fast (for real-time rendering). Applications span VR/AR, robotics, autonomous driving, film production, and digital twins.
Mental Model
Imagine spraying millions of tiny, colored, semi-transparent blobs into a 3D scene. Each blob is an ellipsoidal Gaussian with a position, shape (covariance), opacity, and color that can vary with viewing direction. To render an image from a new viewpoint, project all the blobs onto the camera's image plane and blend them front-to-back. The blobs that are close to surfaces become dense and opaque; the blobs in empty space fade to transparent.
Formal Setup
Scene Representation
A scene is represented by a set of $N$ 3D Gaussians $\{G_i\}_{i=1}^{N}$, where each Gaussian has:
- Position $\mu_i \in \mathbb{R}^3$: the center of the Gaussian
- Covariance $\Sigma_i \in \mathbb{R}^{3 \times 3}$: a positive semidefinite matrix defining the shape and orientation of the ellipsoid
- Opacity $\alpha_i \in [0, 1]$: how opaque the Gaussian is
- Color $c_i$ represented by spherical harmonic (SH) coefficients, allowing view-dependent appearance
3D Gaussian
Each Gaussian defines a density in 3D space:

$$G(x) = \exp\left(-\tfrac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right)$$

The covariance is parameterized as $\Sigma = R S S^\top R^\top$, where $R$ is a rotation matrix (stored as a quaternion) and $S$ is a diagonal scaling matrix. This ensures positive semidefiniteness. The eigenvalues of $\Sigma$ determine the extent of the ellipsoid along each principal axis.
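The parameterization above can be sketched in a few lines of NumPy (variable names are illustrative, not from the reference implementation):

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)  # normalize so R is a valid rotation
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance(q, scales):
    """Sigma = R S S^T R^T: positive semidefinite by construction."""
    R = quat_to_rotmat(q)
    S = np.diag(scales)
    return R @ S @ S.T @ R.T

def density(x, mu, Sigma):
    """Unnormalized density G(x) = exp(-0.5 (x - mu)^T Sigma^-1 (x - mu))."""
    d = x - mu
    return float(np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)))
```

With the identity quaternion and scales $(s_1, s_2, s_3)$, this yields $\Sigma = \operatorname{diag}(s_1^2, s_2^2, s_3^2)$, an axis-aligned ellipsoid; any other quaternion rotates it.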
Spherical Harmonics for Color
Spherical harmonics (SH) encode view-dependent color. Each Gaussian stores SH coefficients $c_{\ell m}$ for each color channel. Given a viewing direction $d$, the color is:

$$c(d) = \sum_{\ell=0}^{L} \sum_{m=-\ell}^{\ell} c_{\ell m}\, Y_{\ell m}(d)$$

where $Y_{\ell m}$ are the spherical harmonic basis functions. Degree $L = 3$ (16 coefficients per channel) is typical, capturing specular highlights and view-dependent shading effects.
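For intuition, here is a degree-1 evaluation for a single color channel, a minimal sketch; the constants are the standard real-SH normalizations, but sign and ordering conventions vary between implementations:

```python
import numpy as np

# Real spherical harmonic constants for degrees 0 and 1
C0 = 0.28209479177387814   # Y_0^0 (constant term: diffuse base color)
C1 = 0.4886025119029199    # scale of the three linear Y_1^m terms

def sh_color(sh, d):
    """Evaluate a degree-1 SH expansion for one color channel.
    sh: 4 coefficients (1 for l=0, 3 for l=1); d: viewing direction."""
    x, y, z = d / np.linalg.norm(d)
    basis = np.array([C0, C1 * y, C1 * z, C1 * x])
    return float(sh @ basis)
```

The $\ell = 0$ term is view-independent (the average color); the $\ell = 1$ terms tilt the color linearly with viewing direction, and higher degrees add sharper angular variation.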
Rendering
Projection of 3D Gaussian to 2D
Statement
A 3D Gaussian with mean $\mu$ and covariance $\Sigma$ projects to a 2D Gaussian on the image plane with mean $\mu'$ (the projected center) and covariance:

$$\Sigma' = J W \Sigma W^\top J^\top$$

where $W$ is the world-to-camera rotation matrix and $J$ is the Jacobian of the projective transformation evaluated at $\mu$.
Intuition
Projecting a 3D ellipsoid through a camera gives a 2D ellipse. The Jacobian linearizes the perspective projection locally, turning the 3D covariance into a 2D covariance on the image plane. This is exact for affine projections and a good approximation for perspective when the Gaussian is not too close to the camera.
Why It Matters
This projection is computed once per Gaussian per frame and is the key to real-time performance. Unlike NeRF, which evaluates a neural network at many points along each ray, splatting projects each Gaussian to a 2D footprint and rasterizes it directly.
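Assuming a pinhole camera with image coordinates $(f_x x / z,\; f_y y / z)$, the projection formula can be sketched as:

```python
import numpy as np

def project_cov(mu_cam, Sigma_world, W, fx, fy):
    """2D covariance Sigma' = J W Sigma W^T J^T.
    mu_cam: Gaussian center in camera coordinates (z > 0);
    W: 3x3 world-to-camera rotation; fx, fy: focal lengths in pixels."""
    x, y, z = mu_cam
    # Jacobian of (x, y, z) -> (fx*x/z, fy*y/z): the local affine
    # approximation of perspective projection at the Gaussian center
    J = np.array([
        [fx / z, 0.0,    -fx * x / z**2],
        [0.0,    fy / z, -fy * y / z**2],
    ])
    return J @ W @ Sigma_world @ W.T @ J.T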
Alpha-Compositing Rendering
Statement
The color of a pixel $p$ is computed by alpha-compositing the contributing Gaussians sorted by depth:

$$C(p) = \sum_{i \in \mathcal{N}} c_i\, \alpha_i' \prod_{j=1}^{i-1} \left(1 - \alpha_j'\right)$$

where $\mathcal{N}$ is the set of Gaussians overlapping pixel $p$, $\alpha_i' = \alpha_i\, G_i^{2D}(p)$ is the opacity modulated by the 2D Gaussian evaluated at $p$, and the product term is the accumulated transmittance.
Intuition
Each Gaussian contributes its color weighted by its opacity and how much of it covers the pixel, attenuated by all the Gaussians in front of it. This is the standard volume rendering equation, but instead of integrating along a ray through a continuous field, you sum over discrete Gaussians. The sorting ensures correct occlusion.
Why It Matters
This rendering equation is fully differentiable with respect to all Gaussian parameters (position, covariance, opacity, color). Gradients flow from the image loss back to each Gaussian, enabling optimization by gradient descent. The tile-based rasterization makes this GPU-friendly and fast.
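A minimal single-pixel sketch of the compositing loop (assuming the per-pixel effective opacities $\alpha_i' = \alpha_i G_i^{2D}(p)$ have already been computed and the list is sorted near-to-far):

```python
import numpy as np

def composite_pixel(gaussians, background=np.zeros(3)):
    """Front-to-back alpha compositing for one pixel.
    gaussians: list of (color, alpha_eff) sorted near-to-far."""
    C = np.zeros(3)
    T = 1.0  # accumulated transmittance: product of (1 - alpha'_j) so far
    for color, alpha in gaussians:
        C += T * alpha * np.asarray(color, dtype=float)
        T *= (1.0 - alpha)
        if T < 1e-4:  # early termination once the pixel is effectively opaque
            break
    return C + T * background
```

The early-termination check is one reason the rasterizer is fast: once transmittance is negligible, Gaussians behind the surface are skipped entirely.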
Optimization and Adaptive Density Control
Training starts from a sparse point cloud (from Structure-from-Motion) and optimizes the Gaussian parameters by minimizing a photometric loss:

$$\mathcal{L} = (1 - \lambda)\, \mathcal{L}_1 + \lambda\, \mathcal{L}_{\text{D-SSIM}}$$

where $\mathcal{L}_1$ is the pixel-wise error and $\mathcal{L}_{\text{D-SSIM}}$ is a structural similarity loss (the original paper uses $\lambda = 0.2$).
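The loss mix can be sketched as follows. Note the simplification: the paper uses a windowed D-SSIM, while this version uses global image statistics purely to illustrate how the two terms combine:

```python
import numpy as np

def ssim_global(a, b, C1=0.01**2, C2=0.03**2):
    """Simplified SSIM from global image statistics (real implementations
    average a windowed SSIM over the image)."""
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + C1) * (2 * cov + C2)) / (
        (mu_a**2 + mu_b**2 + C1) * (va + vb + C2))

def photometric_loss(rendered, target, lam=0.2):
    """L = (1 - lambda) * L1 + lambda * D-SSIM, with D-SSIM = (1 - SSIM) / 2."""
    l1 = np.abs(rendered - target).mean()
    d_ssim = (1.0 - ssim_global(rendered, target)) / 2.0
    return (1.0 - lam) * l1 + lam * d_ssim
```

The SSIM term penalizes structural differences (blur, local contrast) that a pure L1 loss under-weights.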
Adaptive density control adjusts the number of Gaussians during optimization:
- Densification by cloning: Gaussians with large positional gradients in under-reconstructed regions are duplicated
- Densification by splitting: Large Gaussians covering too much area are split into smaller ones
- Pruning: Gaussians with very low opacity ($\alpha_i$ below a small threshold) are removed
This grow-and-prune strategy lets the representation adapt its capacity to the scene complexity.
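One grow-and-prune step can be sketched as below. The thresholds and the split factor of 1.6 follow the spirit of the paper, but the exact bookkeeping (e.g., sampling new positions from the split Gaussian) is simplified here:

```python
import numpy as np

def density_control(positions, scales, opacities, grads,
                    grad_thresh=2e-4, scale_thresh=0.01, min_opacity=0.005):
    """One illustrative grow-and-prune step over per-Gaussian arrays."""
    # Prune: drop nearly transparent Gaussians
    keep = opacities > min_opacity
    positions, scales, opacities, grads = (
        arr[keep] for arr in (positions, scales, opacities, grads))

    hot = grads > grad_thresh              # large positional gradient
    small = scales.max(axis=1) <= scale_thresh
    clone = hot & small                    # clone: duplicate small Gaussians
    split = hot & ~small                   # split: shrink large ones and copy

    scales = scales.copy()
    scales[split] /= 1.6                   # shrink the Gaussians being split
    positions = np.concatenate([positions, positions[clone], positions[split]])
    scales = np.concatenate([scales, scales[clone], scales[split]])
    opacities = np.concatenate([opacities, opacities[clone], opacities[split]])
    return positions, scales, opacities
```

Cloning adds capacity where gradients indicate missing geometry; splitting refines over-large blobs; pruning keeps the total count in check.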
Gaussian Splatting vs NeRF
| Property | NeRF | 3D Gaussian Splatting |
|---|---|---|
| Representation | Implicit (MLP); see also occupancy networks and neural fields | Explicit (point cloud of Gaussians) |
| Rendering | Ray marching through MLP | Project and rasterize Gaussians |
| Render speed | Seconds per frame | 100+ FPS |
| Training speed | Hours | Minutes |
| Memory | Low (MLP weights) | High (millions of Gaussians) |
| Editability | Hard (implicit) | Easy (move, delete, add Gaussians) |
| Quality | Excellent | Comparable or better |
Common Confusions
Gaussians are not points
3D Gaussian Splatting is sometimes described as "fancy point rendering," but each Gaussian is an ellipsoidal volume, not a point. The covariance matrix defines the shape and extent. This is why the method can represent smooth surfaces: overlapping Gaussians blend together to form continuous appearance.
Sorting is per-tile, not global
A full depth sort of all Gaussians per pixel would be expensive. In practice, the scene is divided into image-space tiles (e.g., 16x16 pixels), and Gaussians are sorted per tile. This makes the sorting step GPU-parallelizable and introduces only minor artifacts at tile boundaries.
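The binning-then-sorting idea can be sketched as follows (a serial sketch; the real rasterizer does this with parallel GPU radix sorts, and the function name and bounding-circle approximation are illustrative):

```python
import numpy as np

def bin_and_sort(centers_2d, depths, radii, width, height, tile=16):
    """Assign each Gaussian to every tile its 2D footprint touches,
    then depth-sort each tile's index list (near to far)."""
    tiles = {}
    for i, ((u, v), r) in enumerate(zip(centers_2d, radii)):
        # Tiles overlapped by the Gaussian's bounding circle
        tx0 = max(int((u - r) // tile), 0)
        tx1 = min(int((u + r) // tile), (width - 1) // tile)
        ty0 = max(int((v - r) // tile), 0)
        ty1 = min(int((v + r) // tile), (height - 1) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                tiles.setdefault((tx, ty), []).append(i)
    # One sort per tile, shared by all pixels in that tile
    for key in tiles:
        tiles[key].sort(key=lambda i: depths[i])
    return tiles
```

All 256 pixels of a 16x16 tile then composite the same depth-ordered list, which is why the sort cost is amortized so effectively.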
Summary
- A 3D scene is represented as millions of 3D Gaussians, each with position, covariance, opacity, and spherical harmonic color
- Rendering: project each Gaussian to 2D using the camera Jacobian, then alpha-composite front-to-back
- The entire pipeline is differentiable: optimize Gaussian parameters by backpropagating through the renderer
- Adaptive density control (clone, split, prune) adjusts the number of Gaussians during training
- Real-time rendering (100+ FPS) at quality comparable to NeRF
- The explicit representation enables easy editing, compositing, and streaming
Exercises
Problem
A 3D Gaussian has a diagonal covariance $\Sigma = \operatorname{diag}(s_x^2, s_y^2, s_z^2)$ with distinct values. Describe the shape of this Gaussian in 3D space. If the camera looks along the x-axis, what shape does the projected 2D Gaussian have?
Problem
Why is the covariance matrix parameterized as $\Sigma = R S S^\top R^\top$ instead of optimizing $\Sigma$ directly? What goes wrong with direct optimization?
Problem
3D Gaussian Splatting uses per-tile sorting for depth ordering. Under what conditions can this approximation produce visible artifacts? Propose a method to reduce these artifacts without global per-pixel sorting.
References
Canonical:
- Kerbl et al., "3D Gaussian Splatting for Real-Time Radiance Field Rendering" (SIGGRAPH 2023)
- Zwicker et al., "EWA Splatting" (2002)
Current:
- Huang et al., "2D Gaussian Splatting for Accurate Surface Reconstruction" (SIGGRAPH 2024)
- Lu et al., "Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering" (CVPR 2024)
- Zhang et al., Dive into Deep Learning (2023), Chapters 14-17
Next Topics
Gaussian splatting connects to the broader landscape of 3D representation learning and real-time rendering for robotics, VR, and autonomous systems.
Last reviewed: April 2026