Neural Radiance Fields (NeRFs)

What are neural radiance fields (NeRFs)?

Quick definition

Neural Radiance Fields (NeRFs) represent 3D scenes as neural networks trained on photographs or synthetically generated images. Given a set of images captured from different angles, a NeRF learns to predict how the scene looks from any viewpoint, including angles not present in the original captures. Train a NeRF on photos from 20 angles, and it can render convincing images from any angle without ever constructing a mesh or polygon model.

How NeRFs work

A NeRF encodes a scene as a continuous function. Given a 3D coordinate (x, y, z) and a viewing direction, it outputs the color and density at that point. The neural network (typically a multilayer perceptron) learns this function by training on photographs with known camera positions.
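As a rough sketch of that mapping (assuming PyTorch, with illustrative layer sizes, and omitting the positional encoding that real implementations apply to the inputs), the network might look like this:

```python
# Minimal sketch of the NeRF scene function: (position, view direction) -> (color, density).
# Not the reference implementation; layer sizes and names are illustrative.
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    def __init__(self, hidden: int = 256):
        super().__init__()
        # Position branch: produces a feature vector used for both density and color.
        self.trunk = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)
        # Color branch: conditioned on the viewing direction for view-dependent effects.
        self.color_head = nn.Sequential(
            nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, xyz: torch.Tensor, view_dir: torch.Tensor):
        feat = self.trunk(xyz)                                  # (N, hidden) point features
        sigma = torch.relu(self.density_head(feat))             # (N, 1) volume density
        rgb = self.color_head(torch.cat([feat, view_dir], -1))  # (N, 3) color in [0, 1]
        return rgb, sigma
```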

During training, the network compares its rendered predictions with ground-truth images and updates its weights to minimize the difference between them. After training, rendering a new view requires shooting virtual rays through the scene, sampling the network at points along each ray, and compositing colors based on predicted densities. This volumetric approach naturally handles complex phenomena like transparency, reflections, and fine geometric detail that challenge traditional polygon-based representations.
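The compositing step for a single ray can be sketched as follows. This assumes NumPy, that rgb and sigma have already been predicted at N samples along the ray, and uses the standard volume-rendering weights (opacity times accumulated transmittance); the function name and array layout are illustrative.

```python
# Composite one ray: rgb is (N, 3), sigma is (N,), deltas is (N,) spacing between samples.
import numpy as np

def composite_ray(rgb: np.ndarray, sigma: np.ndarray, deltas: np.ndarray) -> np.ndarray:
    alpha = 1.0 - np.exp(-sigma * deltas)            # opacity contributed by each sample
    trans = np.cumprod(1.0 - alpha + 1e-10)          # transmittance after each sample
    trans = np.concatenate([[1.0], trans[:-1]])      # light surviving to reach sample i
    weights = alpha * trans                          # per-sample contribution to the pixel
    return (weights[:, None] * rgb).sum(axis=0)      # final pixel color
```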

The original NeRF paper (2020) demonstrated remarkable visual quality but required hours of training and seconds per rendered frame. Subsequent research has dramatically improved both training and rendering speed. Methods such as Instant-NGP reduced training time to minutes and rendering to interactive rates, though real-time performance on consumer hardware remains challenging.

NeRF vs. traditional 3D representations

Traditional 3D pipelines operate on explicit geometry: meshes define surfaces as collections of triangles, textures map images onto those surfaces, and renderers compute how light interacts with the resulting scene. Creating these assets from real-world captures (photogrammetry) requires reconstructing geometry, unwrapping UVs, and baking textures, which together constitute a complex pipeline with multiple failure modes.

NeRFs bypass explicit geometry entirely. The neural network implicitly encodes shape, appearance, and lighting effects as learned weights. This makes NeRFs exceptionally good at capturing complex, hard-to-model phenomena like hair, fur, transparent materials, and intricate structures like foliage or fabric. Details that would require enormous polygon counts or sophisticated shaders emerge naturally from the learned representation.

The tradeoff is editability. Mesh-based assets can be modified, rigged, animated, and composited using established tools and workflows, whereas NeRFs encode everything in opaque network weights. Changes to a scene element mean retraining. NeRFs excel at capture and replay, less so at manipulation.

Practical applications

NeRFs have gained traction where photorealistic capture is more important than editability. Real estate virtual tours, cultural heritage preservation, visual effects reference capture, and product photography can all benefit from NeRF's ability to synthesize convincing novel views from relatively sparse photo sets.

The technology also enables new creative possibilities. Filmmakers can capture scenes and reframe shots in post-production. Photographers can generate additional angles from existing shoots. Researchers use NeRFs to study and reconstruct historical sites, natural environments, and scientific specimens.

NeRFs and Gaussian splatting

Gaussian splatting emerged partly in response to the rendering costs of NeRFs. While NeRFs require evaluating a neural network at many points along each ray, Gaussian splatting represents scenes as collections of 3D Gaussians that can be rasterized directly, offering a simpler, faster operation that modern GPUs handle efficiently.
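To make the contrast concrete, here is a toy sketch (NumPy, with hypothetical per-splat arrays) of the per-pixel blend a splatting rasterizer performs once Gaussians have been projected onto the image plane; no network evaluation is involved.

```python
# Front-to-back alpha blend of the splats covering one pixel.
# depth, alpha are (N,) arrays and color is (N, 3); all are assumed precomputed per splat.
import numpy as np

def blend_splats(depth: np.ndarray, alpha: np.ndarray, color: np.ndarray) -> np.ndarray:
    order = np.argsort(depth)                 # sort splats front to back
    out, transmittance = np.zeros(3), 1.0
    for i in order:
        out += transmittance * alpha[i] * color[i]
        transmittance *= 1.0 - alpha[i]       # remaining light after this splat
        if transmittance < 1e-4:              # early exit once the pixel is effectively opaque
            break
    return out
```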

Both approaches serve similar capture-to-render workflows and often achieve comparable visual quality. Gaussian splats tend to render faster, are more directly editable, and are easier to stream progressively; NeRFs can sometimes achieve higher fidelity for complex, view-dependent effects. In practice, though, the field is evolving rapidly, with hybrid approaches and new techniques appearing regularly.

For delivering NeRF-based content at scale, conversion to more streaming-friendly representations (e.g., Gaussian splats) is often warranted. Combining the capture quality of neural approaches with efficient delivery representations offers a practical path to photorealistic 3D on the web.
