Beyond LLMs
Occupancy Networks and Neural Fields
Representing 3D geometry and appearance as continuous functions parameterized by neural networks: NeRF, occupancy networks, DeepSDF, volume rendering, and the connection to Gaussian splatting.
Why This Matters
Traditional 3D representations (meshes, voxel grids, point clouds) are discrete and fixed-resolution. Neural fields represent 3D geometry and appearance as continuous functions parameterized by neural networks. This allows querying the scene at arbitrary resolution and learning 3D structure directly from 2D images.
NeRF (Neural Radiance Fields) demonstrated that a simple MLP can represent complex scenes with photorealistic quality, trained only from posed photographs. This opened new directions in 3D reconstruction, view synthesis, and scene understanding.
Mental Model
A neural field is a function where the input is a coordinate (position in space, or position plus viewing direction) and the output is a property at that coordinate (color, density, occupancy, signed distance). The network parameters encode the entire scene. Querying the function at a new coordinate gives you the scene property at that point.
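As a toy illustration of this interface, here is an untrained two-layer MLP in NumPy queried as a field. The sizes and weights are arbitrary; the point is only the coordinate-in, property-out shape of the abstraction:

```python
import numpy as np

# An untrained toy MLP standing in for a neural field: it maps a 3D
# coordinate to a scalar property (e.g., density). The weights encode
# the "scene"; querying is just a forward pass at any coordinate.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 1)), np.zeros(1)

def field(x):
    """Query the field at coordinates x of shape (N, 3)."""
    h = np.maximum(x @ W1 + b1, 0.0)   # ReLU hidden layer
    return h @ W2 + b2                 # one scalar property per point

# The same field can be queried at arbitrary points, at any density.
points = rng.uniform(-1, 1, size=(5, 3))
values = field(points)                 # shape (5, 1)
```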
Neural Radiance Fields (NeRF)
Neural Radiance Field
A NeRF represents a scene as a continuous function:

$$F_\Theta : (\mathbf{x}, \mathbf{d}) \mapsto (\mathbf{c}, \sigma)$$

where $\mathbf{x} \in \mathbb{R}^3$ is 3D position, $\mathbf{d}$ is viewing direction, $\mathbf{c}$ is emitted color, and $\sigma \geq 0$ is volume density. The density depends only on position (geometry is view-independent), while color depends on both position and direction (capturing view-dependent effects like specular highlights).
The network architecture is a simple MLP with positional encoding. The input coordinates are mapped through sinusoidal functions at multiple frequencies before being fed to the network:

$$\gamma(p) = \left(\sin(2^0 \pi p),\, \cos(2^0 \pi p),\, \ldots,\, \sin(2^{L-1} \pi p),\, \cos(2^{L-1} \pi p)\right)$$
This positional encoding lets the MLP represent high-frequency spatial detail that it would otherwise smooth over (due to the spectral bias of MLPs toward low-frequency functions).
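A minimal NumPy version of this encoding, using $L = 10$ frequency bands (NeRF's default for positions):

```python
import numpy as np

def positional_encoding(p, num_freqs=10):
    """Map coordinates p of shape (N, D) through sinusoids at
    frequencies 2^0 ... 2^(L-1), as in the NeRF positional encoding."""
    freqs = 2.0 ** np.arange(num_freqs)       # (L,)
    scaled = p[..., None] * freqs * np.pi     # (N, D, L)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1)     # (N, D * 2L)

x = np.array([[0.1, -0.3, 0.7]])
print(positional_encoding(x).shape)  # (1, 60) for D=3, L=10
```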
Volume Rendering
Volume Rendering for Neural Radiance Fields
Statement
The expected color of a camera ray $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$ is:

$$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\, \sigma(\mathbf{r}(t))\, \mathbf{c}(\mathbf{r}(t), \mathbf{d})\, dt, \qquad T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\, ds\right)$$

where $T(t)$ is the accumulated transmittance from the near plane $t_n$ to point $t$. The product $T(t)\,\sigma(\mathbf{r}(t))$ gives the probability density that the ray terminates at $t$.
Intuition
A ray travels through space, accumulating color from each point weighted by two factors: how dense the material is at that point ($\sigma$) and how much light has already been blocked before reaching that point ($T$). Dense regions contribute more color. Regions behind opaque surfaces contribute nothing because $T$ is near zero.
Proof Sketch
Model light transport as a 1D absorption-emission process along the ray. The transmittance satisfies $\frac{dT}{dt} = -\sigma(\mathbf{r}(t))\, T(t)$, giving the exponential form. The color integral follows from summing the emitted radiance at each point, weighted by the probability of the ray reaching that point and being absorbed there.
Why It Matters
This equation is differentiable with respect to $\sigma$ and $\mathbf{c}$, which are outputs of the neural network. By comparing the rendered pixel color to the observed pixel color in a training image, you can backpropagate through the volume rendering integral to train the NeRF. The only supervision needed is posed 2D images.
Failure Mode
The integral is approximated by quadrature (summing over discrete samples along the ray). Too few samples produce aliasing and miss thin structures. Too many samples are computationally expensive. Hierarchical sampling (coarse then fine) mitigates this but does not eliminate it. Training also requires accurate camera poses; errors in pose estimation produce blurry reconstructions.
In practice, the integral is approximated as:

$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i, \qquad T_i = \exp\!\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right)$$

where $\delta_i = t_{i+1} - t_i$ is the distance between adjacent samples $t_i$ and $t_{i+1}$.
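A direct NumPy translation of this quadrature for a single ray; the densities, colors, and spacings below are toy values (a real renderer batches this over all rays):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Quadrature approximation of the volume rendering integral
    along one ray, in the alpha-compositing form."""
    alphas = 1.0 - np.exp(-sigmas * deltas)      # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # T_i
    weights = trans * alphas                     # ray-termination probabilities
    return weights @ colors                      # expected pixel color

# Toy ray with 4 samples; the nearly opaque sample (index 1) should
# dominate, and everything behind it should be occluded.
sigmas = np.array([0.0, 50.0, 1.0, 1.0])
colors = np.array([[0, 0, 0], [1.0, 0.2, 0.1], [0, 1, 0], [0, 0, 1]])
deltas = np.full(4, 0.1)
print(render_ray(sigmas, colors, deltas))
```

Because the weights are products and sums of network outputs, gradients of the pixel loss flow back to every $\sigma_i$ and $\mathbf{c}_i$ along the ray.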
Occupancy Networks
Occupancy Network
An occupancy network represents a 3D surface as the decision boundary of a classifier:

$$o_\theta : \mathbb{R}^3 \to [0, 1]$$

where $o_\theta(\mathbf{x})$ is the probability that point $\mathbf{x}$ is inside the object. The surface is the level set $\{\mathbf{x} : o_\theta(\mathbf{x}) = 0.5\}$.
The surface can be extracted at any resolution using marching cubes on a grid of query points. Unlike voxel grids, the resolution is limited only by the density of the query grid, not by the representation itself.
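A sketch of this resolution independence, with an analytic occupancy function standing in for a trained network (a routine like scikit-image's `marching_cubes` would do the actual mesh extraction on each grid):

```python
import numpy as np

def occupancy(x):
    """Soft occupancy of a unit sphere: a stand-in for a trained network.
    Values above 0.5 are inside the surface."""
    return 1.0 / (1.0 + np.exp(10.0 * (np.linalg.norm(x, axis=-1) - 1.0)))

# Query the same function on grids of increasing density; the
# representation itself never changes, only the query resolution.
for res in (16, 32, 64):
    g = np.linspace(-1.5, 1.5, res)
    grid = np.stack(np.meshgrid(g, g, g, indexing="ij"), axis=-1)
    occ = occupancy(grid.reshape(-1, 3)).reshape(res, res, res)
    inside = (occ > 0.5).mean()   # fraction of grid points inside
    # marching cubes at level 0.5 would extract the mesh from occ here
    print(res, round(float(inside), 3))
```

The inside fraction converges toward the sphere-to-box volume ratio as the query grid gets denser, with no change to the underlying function.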
DeepSDF: Signed Distance Functions
DeepSDF
A neural signed distance function maps points to their signed distance from the surface:

$$f_\theta : \mathbb{R}^3 \to \mathbb{R}$$

where $f_\theta(\mathbf{x}) > 0$ outside the object, $f_\theta(\mathbf{x}) < 0$ inside, and $f_\theta(\mathbf{x}) = 0$ on the surface. The gradient $\nabla_{\mathbf{x}} f_\theta$ gives the surface normal at any point.
DeepSDF has a geometric advantage over occupancy networks: the SDF value gives the distance to the nearest surface point, enabling efficient sphere tracing for rendering and providing a natural regularizer ($\|\nabla_{\mathbf{x}} f_\theta\| = 1$ almost everywhere for a true SDF).
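Sphere tracing exploits exactly this property: the SDF value at a point is a safe step size along the ray, since no surface can be closer. A minimal sketch, with an analytic sphere SDF standing in for a trained network:

```python
import numpy as np

def sdf(x):
    """Analytic SDF of a unit sphere at the origin (stand-in for DeepSDF)."""
    return np.linalg.norm(x) - 1.0

def sphere_trace(origin, direction, max_steps=64, eps=1e-4):
    """March along the ray, stepping by the SDF value each time:
    the distance to the nearest surface guarantees no overshoot."""
    t = 0.0
    for _ in range(max_steps):
        d = sdf(origin + t * direction)
        if d < eps:
            return t        # hit: distance along the ray
        t += d
    return None             # miss within the step budget

origin = np.array([0.0, 0.0, -3.0])
direction = np.array([0.0, 0.0, 1.0])
print(sphere_trace(origin, direction))  # 2.0: the ray hits the sphere at z = -1
```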
Gaussian Splatting
3D Gaussian Splatting (2023) represents scenes as a collection of 3D Gaussian primitives, each with position, covariance, color, and opacity. Rendering projects these Gaussians onto the image plane and alpha-composites them.
This is an explicit representation (a finite set of primitives with explicit parameters) rather than an implicit one (a function evaluated at query points). The key advantages:
- Rendering speed: Rasterization of Gaussians is much faster than ray marching through a neural field. Real-time rendering at high resolution is possible.
- Optimization: Each Gaussian's parameters are optimized directly via gradient descent on the rendering loss. Adaptive densification adds Gaussians where the reconstruction error is high.
The tradeoff: Gaussian splatting requires storing millions of Gaussian parameters (memory-intensive), while NeRF compresses the scene into a compact MLP. NeRF generalizes better to unseen viewpoints; Gaussian splatting can have artifacts at extreme novel views.
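The per-pixel compositing step can be sketched as follows. All depths, opacities, and colors here are made-up toy values, not the paper's data structures; the real renderer evaluates each projected Gaussian's opacity at the pixel before compositing:

```python
import numpy as np

# Front-to-back alpha compositing of three primitives at one pixel.
depths = np.array([2.0, 0.5, 1.2])     # distance from camera
alphas = np.array([0.8, 0.3, 0.6])     # opacity of each primitive at this pixel
colors = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)

order = np.argsort(depths)             # sort front-to-back
pixel, trans = np.zeros(3), 1.0
for i in order:
    pixel += trans * alphas[i] * colors[i]   # contribution, attenuated by what's in front
    trans *= 1.0 - alphas[i]                 # remaining transmittance
print(pixel, trans)
```

Note the structural similarity to the discrete NeRF quadrature: both are transmittance-weighted sums, which is why the two representations can be trained with the same photometric loss.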
Common Confusions
Neural fields are not neural networks that output meshes
A neural field is a function from coordinates to properties, evaluated pointwise. It does not output a mesh or point cloud directly. Extracting a mesh requires querying the field on a dense grid and running marching cubes (for occupancy/SDF) or rendering many views (for NeRF). The representation is continuous and implicit; the mesh is a derived output.
NeRF requires posed images, not just any photo collection
NeRF needs accurate camera intrinsics and extrinsics (position and orientation) for each training image. These are typically obtained from structure-from-motion (SfM) tools like COLMAP. Without accurate poses, NeRF cannot learn a consistent 3D scene. Recent work (Nerfacto, BARF) jointly optimizes poses and the neural field, but this remains harder than the fixed-pose setting.
Gaussian splatting is not a neural network
3D Gaussian Splatting uses gradient-based optimization, but the scene representation is a set of Gaussians with explicit parameters, not a neural network. There are no hidden layers or activation functions; the optimized quantities are the primitives' own positions, covariances, colors, and opacities. It is a differentiable rendering framework, not a neural field.
Key Takeaways
- Neural fields represent 3D scenes as continuous functions parameterized by neural networks
- NeRF maps (position, direction) to (color, density) and renders via volume integration
- Volume rendering is differentiable, enabling training from 2D images alone
- Occupancy networks use a binary classifier; DeepSDF uses signed distance
- Gaussian splatting trades implicit compactness for explicit rendering speed
- Positional encoding is critical for representing high-frequency detail in MLPs
Exercises
Problem
A NeRF samples 64 points along each ray, and the image is 800x800 pixels. How many forward passes through the MLP are needed to render one image? If each forward pass takes 10 microseconds, how long does rendering take?
Problem
Explain why a standard MLP without positional encoding struggles to represent a scene with sharp edges and fine texture. What does the positional encoding specifically enable?
References
Canonical:
- Mildenhall et al., "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis" (ECCV 2020)
- Mescheder et al., "Occupancy Networks: Learning 3D Reconstruction in Function Space" (CVPR 2019)
Current:
- Kerbl et al., "3D Gaussian Splatting for Real-Time Radiance Field Rendering" (SIGGRAPH 2023)
- Park et al., "DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation" (CVPR 2019)
- Zhang et al., Dive into Deep Learning (2023), Chapters 14-17
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Feedforward Networks and Backpropagation (Layer 2)
- Differentiation in Rn (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Matrix Calculus (Layer 1)
- The Jacobian Matrix (Layer 0A)
- The Hessian Matrix (Layer 0A)
- Activation Functions (Layer 1)
- Convex Optimization Basics (Layer 1)
- Matrix Operations and Properties (Layer 0A)