Gaussian splatting

Quick definition

Gaussian splatting is a 3D representation technique that models scenes as collections of oriented 3D Gaussian functions, mathematical distributions that describe how color and opacity contribute to an image from any viewpoint. To help visualize this, imagine Gaussian splats as 'clouds of colored fog' or 'dabs of paint in space' that collectively form complex images. Unlike traditional polygon meshes, which define surfaces through connected vertices, Gaussian splatting represents scenes through millions of small, semi-transparent ellipsoids that blend together during rendering, producing photorealistic 3D representations with efficient computation and natural support for progressive refinement.

What is Gaussian splatting?

Gaussian splatting represents a fundamental shift in how 3D scenes are stored and rendered. For decades, 3D graphics have relied primarily on polygon meshes—surfaces defined by vertices, edges, and faces that approximate object geometry. While mesh-based approaches work well for hand-modeled content and enable precise geometric control, they struggle with the photorealistic reproduction of real-world scenes captured through photogrammetry or other scanning techniques. Converting complex real-world materials, lighting effects, and fine surface details into mesh geometry with textures requires significant manual optimization and often sacrifices fidelity.

Gaussian splatting takes a different approach, rooted in volumetric representations rather than surface geometry. Instead of defining explicit surfaces, Gaussian splat representations describe scenes through large collections of 3D Gaussian functions—each one a mathematical formula defining how a small region of space contributes color and opacity from any viewing angle. Think of each Gaussian as a semi-transparent, oriented ellipsoid floating in 3D space. Individually, these ellipsoids are simple. But arranged by the millions and rendered with appropriate blending, they can reconstruct photorealistic scenes with remarkable efficiency.
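To make this concrete, here is a minimal sketch (in Python with NumPy; variable names are illustrative) of the function each splat evaluates: an unnormalized anisotropic Gaussian whose covariance matrix encodes the ellipsoid's orientation and scale.

```python
import numpy as np

def gaussian_density(x, mean, cov):
    """Unnormalized density of an anisotropic 3D Gaussian at point x.

    mean: (3,) center of the splat; cov: (3, 3) covariance matrix whose
    eigenvectors and eigenvalues encode the ellipsoid's orientation and
    scale. A simplified sketch, not a production implementation.
    """
    d = x - mean
    return np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)

mean = np.array([0.0, 0.0, 0.0])
cov = np.diag([0.5, 0.1, 0.1])  # a long, thin splat stretched along x
print(gaussian_density(np.array([0.2, 0.0, 0.0]), mean, cov))
```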

The technique originated from research into neural radiance fields (NeRF) and related approaches that achieved impressive results in photorealistic scene reconstruction from photographs. NeRF models scenes using neural networks that learn to predict color and density at any given point in space, enabling the synthesis of new views from perspectives not included in the original training photographs. However, NeRF's rendering process involves evaluating neural networks thousands of times per pixel, which makes real-time performance difficult despite its visual quality. Gaussian splatting builds upon these foundations by representing scenes with explicit parameters for each Gaussian splat, which reduces computational overhead and simplifies rendering, making it feasible for real-time applications.

Gaussian splatting achieves similar visual fidelity to neural approaches while enabling real-time rendering through an explicit representation. Rather than encoding scenes in neural network weights that require expensive evaluation, Gaussian splats store explicit parameters for each Gaussian—position, orientation, scale, color, and opacity. Rendering becomes a process of projecting these Gaussians onto the image plane and blending them in proper depth order, operations that modern GPUs execute efficiently. The results can be photorealistic quality approaching neural methods but with performance suitable for interactive applications.

What makes Gaussian splatting particularly significant for content distribution is its inherent compatibility with progressive transmission and adaptive fidelity. The representation consists of discrete elements (individual Gaussians) that can be transmitted independently, prioritized by importance, and refined progressively. This structural characteristic aligns naturally with streaming architectures in ways that monolithic mesh representations do not.

How Gaussian splatting works

Gaussian splatting systems operate through several stages that transform captured scenes or synthetic content into renderable Gaussian representations, then display them through specialized rasterization techniques optimized for this format.

Scene capture and reconstruction typically begin with multiple photographs or video frames of a scene taken from different viewpoints. Structure-from-motion algorithms analyze these images to determine camera positions and estimate rough 3D geometry. Alternatively, engineers and artists can render images of a 3D asset to generate a synthetic dataset with precisely known camera positions and poses. The reconstruction process then optimizes a Gaussian splat representation to match the input photographs—initially placing Gaussians throughout the scene volume, then iteratively adjusting their parameters (position, size, orientation, color, opacity) to minimize differences between rendered views and actual photographs.

This optimization process uses gradient-based learning similar to neural network training, but the parameters being optimized are explicit Gaussian properties rather than network weights. The system automatically determines how many Gaussians are needed, where they should be positioned, how they should be oriented and scaled, and what colors and opacities produce an accurate reproduction of input views. Dense regions with complex appearance detail accumulate more Gaussians. Simpler regions use fewer, larger Gaussians. The representation emerges through this optimization rather than requiring manual modeling.
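As a rough illustration of this idea, the toy PyTorch sketch below fits a single axis-aligned 2D splat to a target image by gradient descent. Real systems differentiably rasterize millions of oriented 3D splats against many photographs; everything here (the render function, parameter names, image size) is deliberately simplified.

```python
import torch

def render_splat(mean, log_scale, color, H=32, W=32):
    """Render one axis-aligned 2D Gaussian splat onto an H x W image.
    Toy stand-in for a full differentiable rasterizer."""
    ys, xs = torch.meshgrid(
        torch.linspace(0, 1, H), torch.linspace(0, 1, W), indexing="ij")
    d2 = ((xs - mean[0]) ** 2 + (ys - mean[1]) ** 2) / torch.exp(log_scale) ** 2
    alpha = torch.exp(-0.5 * d2)       # smooth Gaussian falloff, shape (H, W)
    return alpha.unsqueeze(-1) * color  # (H, W, 3) colored splat

# A "ground truth" splat stands in for an input photograph.
target = render_splat(torch.tensor([0.7, 0.4]), torch.tensor(-2.0),
                      torch.tensor([0.9, 0.2, 0.1]))

# The optimized quantities are explicit splat parameters, not network weights.
mean = torch.tensor([0.5, 0.5], requires_grad=True)
log_scale = torch.tensor(-1.0, requires_grad=True)
color = torch.tensor([0.5, 0.5, 0.5], requires_grad=True)
opt = torch.optim.Adam([mean, log_scale, color], lr=0.05)

for step in range(300):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(
        render_splat(mean, log_scale, color), target)
    loss.backward()
    opt.step()
```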

Each Gaussian in the final representation is defined by several parameters. Position specifies its 3D location in space. A 3x3 covariance matrix describes its orientation and scale; in simpler terms, this matrix defines the shape of the ellipsoid by determining how far it stretches in each direction. Color is typically stored as spherical harmonic coefficients, a set of numbers that capture how an object's color changes when viewed from different angles, which helps simulate realistic lighting effects and reflections. Opacity determines how transparent or solid each Gaussian appears.
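Put together, the per-splat record in a typical implementation might look something like the following sketch. Field names and layouts vary between implementations; the factorization of the covariance into a rotation and per-axis scales follows a common convention.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian:
    """One splat's explicit parameters. Illustrative layout only."""
    position: np.ndarray   # (3,) center in world space
    rotation: np.ndarray   # (4,) unit quaternion: ellipsoid orientation
    scale: np.ndarray      # (3,) per-axis extents; with the rotation R and
                           # diagonal scale S, the covariance is R S S^T R^T
    sh_coeffs: np.ndarray  # (k, 3) spherical harmonic RGB coefficients
                           # encoding view-dependent color
    opacity: float         # blending weight in [0, 1]
```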

Rendering Gaussian splats projects these 3D Gaussians onto the 2D image plane from the current camera viewpoint. Each Gaussian becomes a 2D splat in screen space—an elliptical region with color and opacity that varies smoothly according to the Gaussian function. The renderer sorts these splats by depth and blends them from back to front, similar to alpha-blended transparency rendering but with mathematically defined falloff from each splat's center.
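The core compositing step can be sketched in a few lines. This toy assumes each splat has already been projected and reduced to a per-pixel alpha (its opacity times the Gaussian falloff at that pixel); the names and structure are illustrative.

```python
def composite(splats, pixel_color=(0.0, 0.0, 0.0)):
    """Blend already-projected splats covering one pixel, back to front.
    'depth' is the splat center's distance from the camera. Sketch only."""
    color = list(pixel_color)
    # Farthest first, so nearer splats are layered over farther ones.
    for s in sorted(splats, key=lambda s: s["depth"], reverse=True):
        a = s["alpha"]
        color = [a * c + (1 - a) * bg for c, bg in zip(s["color"], color)]
    return color

print(composite([
    {"depth": 5.0, "alpha": 0.8, "color": (1.0, 0.0, 0.0)},  # far, red
    {"depth": 1.0, "alpha": 0.5, "color": (0.0, 0.0, 1.0)},  # near, blue
]))
```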

Modern implementations optimize this rendering process through tile-based approaches that efficiently handle millions of Gaussians. The screen is divided into small tiles, and the system determines which Gaussians affect which tiles, enabling parallel processing across many GPU cores. This tiled rendering, combined with the mathematical simplicity of Gaussian evaluation, enables real-time performance even for scenes containing tens of millions of splats.
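A simplified version of the tile-binning step might look like this sketch, which conservatively assigns each projected splat to every tile its screen-space bounding box touches (the tile size and field names are assumptions).

```python
def bin_splats_to_tiles(splats, width, height, tile=16):
    """Map each projected splat to the screen tiles its bounds touch.
    Each tile's list can then be rasterized by an independent GPU work group.
    Illustrative sketch only."""
    tiles = {}
    for i, s in enumerate(splats):
        x, y, r = s["x"], s["y"], s["radius"]
        x0, x1 = int(max(x - r, 0)) // tile, int(min(x + r, width - 1)) // tile
        y0, y1 = int(max(y - r, 0)) // tile, int(min(y + r, height - 1)) // tile
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                tiles.setdefault((tx, ty), []).append(i)
    return tiles

tiles = bin_splats_to_tiles(
    [{"x": 20.0, "y": 12.0, "radius": 6.0}], width=64, height=64)
print(tiles)  # this splat's bounding box overlaps four tiles
```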

Adaptive level-of-detail emerges naturally from the representation. Gaussians can be selectively removed based on viewing distance, screen-space size, or contribution to the final image without restructuring the entire dataset. This characteristic makes Gaussian splatting particularly amenable to streaming—important splats transmit first, followed by progressively finer detail. Users can interact with reduced-fidelity versions immediately while additional Gaussians arrive and enhance the representation.
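One plausible way to sketch this selective pruning: score each splat by its opacity and apparent size from the current viewpoint, then keep only the top contributors. Real systems use more sophisticated importance metrics; the fields and scoring below are illustrative.

```python
def select_lod(splats, camera_pos, max_count):
    """Keep the splats that matter most from the current viewpoint.
    Scores by opacity times apparent size (which shrinks with distance),
    then drops the rest without restructuring the dataset. Sketch only."""
    def score(s):
        dist = sum((a - b) ** 2
                   for a, b in zip(s["position"], camera_pos)) ** 0.5
        return s["opacity"] * s["world_radius"] / max(dist, 1e-6)
    return sorted(splats, key=score, reverse=True)[:max_count]
```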

Why Gaussian splatting matters

Gaussian splatting addresses several limitations that have constrained photorealistic 3D content creation and distribution. The technique enables workflows and applications that weren't previously practical at acceptable quality or performance levels.

Photorealistic capture and reproduction become substantially more accessible. Converting real-world scenes into traditional 3D representations typically requires extensive manual optimization, such as simplifying geometry, baking lighting into textures, and adjusting materials to approximate complex real-world appearance under different lighting conditions. The process is time-consuming and often sacrifices fidelity for performance. Gaussian splatting automates this conversion while maintaining photographic quality, enabling rapid deployment of captured environments without manual asset optimization.

This automation matters particularly for applications requiring frequent content updates or large scene volumes. Retail product visualization benefits from rapid capture workflows—photograph products from multiple angles, generate Gaussian splat representations automatically, and deploy to e-commerce platforms at photorealistic quality. Real estate and architecture firms can capture properties through phone video, generate walkable 3D representations without manual modeling, and stream them to potential buyers instantly. Digital twin applications can regularly re-capture facilities to reflect current states without requiring 3D artists to manually update models.

Rendering performance at photorealistic quality shifts the feasibility boundary for real-time applications. Previous approaches to photorealistic rendering either required offline pre-computation (baked lighting, pre-rendered frames) or expensive real-time ray tracing with significant GPU requirements. Gaussian splatting achieves comparable visual quality with rendering costs closer to traditional rasterization, enabling photorealistic experiences on consumer devices including mobile phones and standalone VR headsets.

The explicit, discrete representation structure provides advantages for distribution and streaming. Unlike neural representations, where the scene is encoded holistically in network weights, Gaussian splats consist of independent elements that can be transmitted separately, cached individually, and refined progressively. This modularity enables streaming architectures that deliver content with immediate interactivity and continuously improving fidelity: users see reasonable quality within seconds, while additional Gaussians arrive to enhance detail.

File size efficiency relative to quality represents another practical advantage. High-quality Gaussian splat representations often compress to tens of megabytes for complete scenes that would require hundreds of megabytes or gigabytes as high-resolution meshes with detailed textures. This compression efficiency, combined with progressive refinement support, makes Gaussian splatting particularly suitable for web delivery where file size directly impacts user experience.

Industry adoption signals the technique's maturation from research to production use. Adobe integrated Gaussian splat support into Photoshop for 3D object generation. SideFX added native Gaussian splat handling to Houdini for VFX workflows. Snap's PlayCanvas engine supports Gaussian splat rendering for web-based experiences. This tooling ecosystem indicates Gaussian splatting is becoming a production-ready format alongside traditional meshes and textures, not just an experimental research technique.

Gaussian splatting vs. polygon meshes vs. neural radiance fields

These three approaches to representing 3D scenes reflect different tradeoffs between control, realism, performance, and workflow requirements. Polygon meshes define surfaces through vertices connected into triangular faces, with appearance controlled through texture maps and material properties. Meshes provide precise geometric control and efficient rendering on modern GPUs, making them ideal for hand-modeled content where artists need explicit control over every surface. Game engines, CAD systems, and animation pipelines are built around mesh representations.

However, meshes struggle with photorealistic reproduction of complex real-world appearance. Capturing fine geometric detail, subtle material variations, complex lighting interactions, and view-dependent effects in mesh format requires high polygon counts, multiple high-resolution texture maps, and sophisticated shaders. The manual optimization required to balance quality with performance makes mesh-based workflows time-consuming for photorealistic captured content.

Neural radiance fields (NeRF) represent the opposite extreme, where scenes are encoded entirely within neural network weights that learn to predict color and density at any point in space. NeRF achieves remarkable photorealistic quality from photographs, capturing subtle lighting effects and view-dependent appearance that meshes struggle with. But rendering requires evaluating neural networks thousands of times per pixel, making real-time performance impractical despite recent acceleration techniques. NeRF representations are also opaque, meaning you can't easily edit, segment, or manipulate specific scene elements because everything encodes holistically in network weights.

Gaussian splatting occupies a middle ground that captures many advantages of each approach while avoiding their primary limitations. Like NeRF, Gaussian splatting reconstructs scenes automatically from photographs without manual modeling. Like meshes, Gaussian splats render efficiently on standard GPU hardware at real-time frame rates. The explicit representation enables editing, segmentation, and progressive refinement that neural approaches make difficult.

The practical distinction manifests in workflow and deployment choices. Meshes remain optimal for content creation where artists need precise control—animated characters, architectural designs, product models built from CAD specifications. Neural approaches serve applications prioritizing ultimate realism where rendering performance isn't critical, such as offline rendering, computational photography, and research into scene understanding.

Gaussian splatting serves applications requiring photorealistic capture with real-time rendering and efficient distribution—retail product visualization captured from photographs, real estate walkthroughs generated from video, digital twins reconstructed from laser scans, and training simulations based on real environments. The combination of real-time performance and distribution-friendly representation characteristics makes Gaussian splatting particularly suitable for content that needs to reach users at scale.

Gaussian splatting and content streaming

The structural properties of Gaussian splat representations align particularly well with progressive streaming architectures. Unlike monolithic representations that must transmit completely before rendering, Gaussian splats consist of discrete, independent elements that can be transmitted selectively, prioritized by importance, and refined incrementally.

Spatial partitioning divides Gaussian splat scenes into transmittable regions based on 3D location. A room-scale environment might partition into spatial cells—transmit Gaussians for the currently-viewed region first, then progressively load adjacent areas as users navigate. This spatial streaming enables immediate interaction with local content while background data transfers handle surrounding areas.
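A minimal sketch of this idea, assuming splats carry world-space positions: bucket them into a uniform grid, then stream cells in order of distance from the viewer. Cell size and field names are assumptions.

```python
import math

def build_cells(splats, cell_size=2.0):
    """Bucket splats into a uniform grid of world-space cells. Sketch only."""
    cells = {}
    for s in splats:
        key = tuple(math.floor(c / cell_size) for c in s["position"])
        cells.setdefault(key, []).append(s)
    return cells

def streaming_order(cells, viewer_pos, cell_size=2.0):
    """Return cell keys nearest the viewer first, farther ones later."""
    def dist(key):
        center = [(k + 0.5) * cell_size for k in key]
        return sum((c - v) ** 2 for c, v in zip(center, viewer_pos))
    return sorted(cells, key=dist)
```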

Importance-based prioritization determines which Gaussians to transmit first based on their contribution to image quality. Gaussians covering larger screen areas, contributing more to visible pixels, or located in user focus regions receive priority. Less important Gaussians (e.g., small, distant, or outside the current view frustum) transmit later. This prioritization ensures users see reasonable quality immediately, with fidelity refining progressively as additional splats arrive.

Progressive density refinement transmits Gaussians in multiple passes with increasing detail. Initial transmission might send a sparse subset (perhaps 10% of Gaussians) sufficient for basic scene structure and navigation. Subsequent passes progressively fill in detail, adding Gaussians that capture finer appearance variations, subtle geometric features, and view-dependent effects. Users interact with functional representations within seconds while quality continuously improves.
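The prioritization and pass structure described in the last two paragraphs might be sketched as follows. Here importance is approximated as opacity times world-space volume, a crude stand-in for the per-viewpoint screen-space contribution a real streaming system would compute.

```python
def progressive_passes(splats, fractions=(0.1, 0.3, 0.6)):
    """Split a scene's splats into transmission passes by importance.
    The first pass (~10% of splats) gives basic structure; later passes
    fill in fine detail. Importance metric is illustrative only."""
    ranked = sorted(
        splats,
        key=lambda s: s["opacity"]
        * s["scale"][0] * s["scale"][1] * s["scale"][2],
        reverse=True)
    passes, start = [], 0
    for f in fractions:
        end = start + max(1, int(len(ranked) * f))
        passes.append(ranked[start:end])
        start = end
    passes[-1].extend(ranked[start:])  # any rounding remainder goes last
    return passes
```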

The mathematical properties of Gaussians support graceful quality degradation. Removing or deferring specific Gaussians doesn't create visual discontinuities or missing geometry; the remaining Gaussians simply blend together with slightly reduced accuracy. This contrasts with mesh streaming, where missing triangles create visible holes, or texture streaming, where absent texture data shows as obvious placeholder colors.

Adaptive fidelity adjusts Gaussian transmission based on network conditions and device capabilities. When bandwidth drops, streaming systems transmit fewer Gaussians prioritized by importance, maintaining interaction fluidity at reduced visual quality. When conditions improve, additional Gaussians transmit to enhance fidelity. This adaptation happens transparently—users experience appropriate quality for their current context rather than fixed-fidelity content that either works well or fails completely.

Caching efficiency benefits from Gaussian splats' discrete element structure. Frequently-viewed regions cache at edge servers as collections of Gaussians that serve multiple users without re-transmission from origins. Updates to scenes can transmit as differential Gaussian sets—add these splats, remove those splats—rather than replacing entire representations. This incremental update capability particularly benefits applications with dynamic content that changes over time.
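At its simplest, a differential update of this kind reduces to set arithmetic over stable splat IDs, as in this sketch (the patch format is hypothetical).

```python
def diff_update(old_ids, new_ids):
    """Compute the incremental patch between two versions of a scene.
    Splats are keyed by stable IDs; the patch lists IDs to fetch and IDs
    to drop, so only what actually changed is transmitted. Sketch only."""
    old, new = set(old_ids), set(new_ids)
    return {"add": sorted(new - old), "remove": sorted(old - new)}

print(diff_update(["a", "b", "c"], ["b", "c", "d"]))
# {'add': ['d'], 'remove': ['a']}
```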

Related terms & concepts

See also: 3D Streaming - The progressive delivery architecture that leverages Gaussian splatting's discrete structure for efficient, adaptive content transmission.

See also: Level of detail (LOD) - The rendering technique of adjusting detail based on viewing distance, which Gaussian splatting supports through selective splat transmission.
