Why field-based representations win for streaming 3D content
Laurie Koenig
January 28, 2026
11 min read
Summary
  • Meshes weren't designed for streaming. glTF assets with complex geometry and high-resolution textures can take seconds to download and parse on mobile devices. Field-based representations like Gaussian splatting decode faster and can begin rendering before the full asset arrives.
  • Degradation characteristics matter for variable networks. Reducing mesh resolution produces geometric distortion (jagged edges, broken silhouettes). Reducing splat density produces blur. Users tolerate blur better, making field-based formats more forgiving under bandwidth constraints.
  • The tooling and standards are catching up. PlayCanvas, Luma, and Unity now support Gaussian splat rendering. Khronos is developing a glTF extension for the format. This is moving from research into production infrastructure.

High-fidelity 3D assets often translate poorly to the web. Whether generated from photogrammetry, CAD exports, or dense authored meshes, a single glTF file can easily exceed hundreds of megabytes once geometry and high-resolution textures are bundled together. Attempting to load assets like this in any browser exposes platform limits quickly, and mobile browsers expose them fastest.

On mid-range Android devices, complex 3D assets routinely produce single-digit frame rates once rendering begins. On iOS, stricter memory limits and aggressive process management can terminate the browser session before parsing or GPU upload completes. The practical question isn't whether an asset can load. It's whether it can load reliably and render interactively within mobile constraints.

Polygonal meshes with textures were designed for offline-rendered animations and games running on engines optimized for specific hardware. They weren't designed as a delivery format for streaming 3D content over variable networks to unknown devices. As developers push content from controlled environments (render farms, high-powered workstations, consoles) to the web and beyond, these representations introduce friction at every stage of the pipeline: creation, storage, encoding, transmission, decode, and rendering.

Field-based representations offer a different approach. They're not universally better than meshes, but for streaming high-fidelity 3D content at scale, they address the right constraints.

Where traditional meshes break down

The architectural problems with mesh-based streaming start with file size, but the deeper issues emerge at load time and during rendering.

Loading a glTF asset involves JSON parsing, decompression of geometry and textures (Draco, Meshopt, Basis/KTX2), typed array construction, and GPU buffer uploads. On mobile CPUs, this sequence can consume hundreds of milliseconds for complex scenes. Main-thread execution, memory allocation, and GPU synchronization often block meaningful rendering until the entire pipeline completes.
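To make that decode chain concrete, here is a minimal sketch of wiring the stages together in Three.js. The decoder paths and asset URL are placeholders, and error handling is omitted:

```ts
import * as THREE from 'three';
import { GLTFLoader } from 'three/examples/jsm/loaders/GLTFLoader.js';
import { DRACOLoader } from 'three/examples/jsm/loaders/DRACOLoader.js';
import { KTX2Loader } from 'three/examples/jsm/loaders/KTX2Loader.js';

const renderer = new THREE.WebGLRenderer();
const scene = new THREE.Scene();

// Each loader wires in a decode stage: Draco for geometry, Basis/KTX2 for textures.
const draco = new DRACOLoader().setDecoderPath('/decoders/draco/'); // placeholder path
const ktx2 = new KTX2Loader()
  .setTranscoderPath('/decoders/basis/') // placeholder path
  .detectSupport(renderer); // picks a GPU-compressed target format for this device

const loader = new GLTFLoader().setDRACOLoader(draco).setKTX2Loader(ktx2);

// Nothing is drawable until the full chain completes:
// fetch → JSON parse → decompress → typed arrays → GPU upload.
loader.load('/assets/scene.glb', (gltf) => scene.add(gltf.scene));
```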

There are also practical limits to geometric and texture complexity. In frameworks like Three.js or Babylon.js, performance degrades as vertex counts, draw calls, material complexity, and texture memory increase. While loading meshes with hundreds of thousands of polygons is technically possible, doing so quickly becomes constrained by CPU submission costs, GPU throughput, and memory bandwidth, particularly on mobile and integrated GPUs. Beyond a certain threshold, frame times increase, rendering throughput drops, and visible frame rate degradation follows.

The conventional mitigation is levels of detail (LOD): authoring multiple mesh variants at fixed resolutions and switching between them based on screen coverage or camera distance. While effective in many real-time engines, this model fits streaming scenarios poorly. Each LOD represents redundant geometry that must be stored, managed, and often downloaded in full. Transitions between LOD levels can produce visible popping artifacts, and in immersive environments like AR and VR these discontinuities are especially jarring.
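In Three.js terms, the discrete-LOD model looks like the sketch below. The geometries and switch distances are illustrative:

```ts
import * as THREE from 'three';

const scene = new THREE.Scene();
const material = new THREE.MeshStandardMaterial();

// Three variants of the same object, each a separate chunk of geometry
// that must be authored, stored, and downloaded.
const lod = new THREE.LOD();
lod.addLevel(new THREE.Mesh(new THREE.SphereGeometry(1, 64, 64), material), 0);
lod.addLevel(new THREE.Mesh(new THREE.SphereGeometry(1, 16, 16), material), 50);
lod.addLevel(new THREE.Mesh(new THREE.SphereGeometry(1, 6, 6), material), 200);
scene.add(lod);

// Swaps between levels are hard cuts keyed on camera distance,
// which is where the visible popping comes from.
```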

Addressing these limitations typically requires labor-intensive optimization: retopology, texture baking, and manual LOD creation. Field-based workflows aren't optimization-free (e.g., splat data still needs filtering, compression, and spatial organization) but the process is more automatable and doesn't require the same destructive simplification of source geometry. The result is a shorter path from capture to delivery without the same ceiling on deliverable fidelity.

What "field-based" means

Field-based representations encode 3D scenes as continuous functions defined over space, rather than as discrete surface meshes. Geometry and appearance are represented implicitly (often as volumetric fields, signed-distance fields, or radiance fields) and evaluated by sampling the function at specific spatial locations. Rendering involves reconstructing visible surfaces or radiance through sampling and integration, rather than rasterizing explicit polygonal primitives.
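As a minimal illustration of "geometry as a function," consider a signed-distance field for a sphere:

```ts
type Vec3 = { x: number; y: number; z: number };

// A signed-distance field: negative inside the surface, positive outside,
// zero exactly on it. No vertices or triangles are stored anywhere.
function sphereSdf(p: Vec3, center: Vec3, radius: number): number {
  const dx = p.x - center.x;
  const dy = p.y - center.y;
  const dz = p.z - center.z;
  return Math.sqrt(dx * dx + dy * dy + dz * dz) - radius;
}

// A renderer samples the field only where it needs it, for example stepping
// along a camera ray until the distance crosses zero (sphere tracing).
```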

Several point-based and field-based rendering approaches have emerged as alternatives to mesh-centric pipelines, with 3D Gaussian Splatting (3DGS) among the most mature for real-time use. These methods represent scenes as collections of volumetric or point-based primitives (anisotropic Gaussians, surfels, voxels, or learned samples) rather than explicit polygonal topology. Each primitive encodes spatial position along with local shape, density or opacity, and appearance parameters, often including view-dependent color expressed via spherical harmonics.
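A single 3DGS primitive can be sketched as a plain record. The exact layout varies by format, so treat the field list below as illustrative rather than a spec:

```ts
// One splat, roughly as common 3DGS exports store it (illustrative layout).
interface GaussianSplat {
  position: [number, number, number];         // center in world space
  scale: [number, number, number];            // per-axis extent of the anisotropic Gaussian
  rotation: [number, number, number, number]; // orientation quaternion
  opacity: number;                            // density / alpha contribution
  sh: Float32Array;                           // spherical-harmonic coefficients
                                              // for view-dependent color
}
// Note what is absent: no index buffer, no shared vertices, no topology.
```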

This structural distinction matters for streaming. Mesh-based assets require the client to assemble indexed geometry and reconstruct explicit surfaces before rendering can begin. Point-based and field-based representations are typically organized as flat or loosely structured collections of primitives. Rendering can start incrementally as soon as an initial subset of samples arrives, enabling progressive refinement as additional data streams in.

Why fields win for streaming

The structural differences between meshes and field-based representations translate to three practical advantages for streaming workloads.

Simpler data, faster decode. Without topological dependencies between primitives, compression and decompression become simpler operations. The SPZ format, developed by Niantic and now open-sourced, achieves roughly 90% compression compared to raw exports. A 250 MB scan can shrink to approximately 25 MB with minimal perceptual degradation.
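Because each attribute can be compressed independently of any topology, even a naive scheme buys real savings. A generic sketch (not the SPZ format itself, whose scheme is more sophisticated): pack 32-bit float positions into 16-bit fixed point over a known bounding range:

```ts
// Quantize float positions into 16-bit fixed point over [min, max].
// Generic illustration only; not the actual SPZ encoding.
function quantizePositions(positions: Float32Array, min: number, max: number): Uint16Array {
  const out = new Uint16Array(positions.length);
  const scale = 65535 / (max - min);
  for (let i = 0; i < positions.length; i++) {
    const q = Math.round((positions[i] - min) * scale);
    out[i] = Math.min(65535, Math.max(0, q)); // clamp to the representable range
  }
  return out; // half the bytes before entropy coding even starts
}
```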

More importantly, the decode path is meaningfully faster. A viewer using an optimized splat format can render a recognizable frame while a comparable glTF asset is still parsing, a difference users notice immediately on constrained networks.

Continuous level of detail. Gaussian splat scenes naturally support progressive refinement. You can stream splats in order of their contribution to the final image (sorted by screen-space size, opacity, or visual importance) and render a recognizable version of the scene with a fraction of the total data. The scene sharpens as additional primitives arrive.
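A minimal sketch of that ordering, using opacity-weighted footprint as a stand-in for visual importance (production systems use better heuristics):

```ts
type Splat = { opacity: number; scale: [number, number, number] };

// Order splats so the most visually significant ones ship first.
function orderForStreaming(splats: Splat[]): Splat[] {
  const importance = (s: Splat) =>
    s.opacity * Math.max(s.scale[0], s.scale[1], s.scale[2]);
  return [...splats].sort((a, b) => importance(b) - importance(a));
}

// The client renders whatever prefix has arrived; each chunk refines the frame.
function* chunked(splats: Splat[], chunkSize = 65536): Generator<Splat[]> {
  for (let i = 0; i < splats.length; i += chunkSize) {
    yield splats.slice(i, i + chunkSize);
  }
}
```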

The degradation characteristics differ qualitatively from mesh LOD. When you reduce mesh resolution, you get geometric distortion: jagged edges, collapsed features, broken silhouettes. When you reduce density in a splat cloud, you get blur. Perceptual studies suggest users tolerate blur (which mimics optical defocus) better than geometric distortion (which reads as broken data). This property is important for adaptive streaming over variable networks, where graceful degradation determines the user experience.

Better fit for mobile constraints. Mesh rendering cost scales with geometric complexity. Every vertex passes through the vertex shader regardless of how much screen space the object occupies. Gaussian rendering cost scales primarily with screen coverage. A distant object is cheap to render because its splats project to only a few pixels.

The critical operation in Gaussian rendering is sorting primitives back-to-front for correct alpha blending. This must happen every frame. In WebGL, sorting happens on the CPU. At 2 million splats, you can consume 100% of a mobile CPU core on sort operations alone, and the data transfer to the GPU becomes a bottleneck.
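The per-frame CPU cost comes from exactly this kind of loop. A simplified sketch; production viewers typically quantize depths and use faster sorts:

```ts
// Compute view-space depth for every splat, then order back-to-front.
// In a WebGL viewer, this runs every frame on the main thread.
function sortBackToFront(
  positions: Float32Array, // packed xyz per splat
  view: Float32Array       // 4x4 view matrix, column-major
): Uint32Array {
  const count = positions.length / 3;
  const depth = new Float32Array(count);
  const order = new Uint32Array(count);
  for (let i = 0; i < count; i++) {
    const x = positions[3 * i], y = positions[3 * i + 1], z = positions[3 * i + 2];
    // Third row of the view matrix yields view-space z.
    depth[i] = view[2] * x + view[6] * y + view[10] * z + view[14];
    order[i] = i;
  }
  // Farthest first: the camera looks down -z, so most negative z is farthest.
  order.sort((a, b) => depth[a] - depth[b]);
  return order; // and this index array is re-uploaded to the GPU every frame
}
```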

WebGPU changes this equation. Compute shaders move the sort entirely to the GPU. A well-optimized radix sort can process millions of keys in under 1 ms on capable hardware. The CPU stays idle, the data never leaves video memory, and thermal load drops. Benchmarks show WebGPU implementations running 85-135× faster than CPU-sorted WebGL for equivalent scenes. Where WebGL struggles to maintain 30 fps with 2 million splats, WebGPU handles 10-30 million at 60 fps on desktop hardware.
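Here is a sketch of what moving this work to the GPU looks like: a WebGPU compute pass that writes a depth key per splat, which a radix-sort pipeline (omitted here) would then consume entirely in video memory. Buffer contents and the sort itself are left out, and the names are illustrative:

```ts
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter!.requestDevice();
const splatCount = 2_000_000;

// Storage buffers live in video memory; the CPU never reads them back.
const positions = device.createBuffer({
  size: splatCount * 16, // vec3<f32> has a 16-byte stride in storage arrays
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
});
const keys = device.createBuffer({ size: splatCount * 4, usage: GPUBufferUsage.STORAGE });
const viewBuf = device.createBuffer({ size: 64, usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST });

const pipeline = device.createComputePipeline({
  layout: 'auto',
  compute: {
    entryPoint: 'main',
    module: device.createShaderModule({
      code: /* wgsl */ `
        @group(0) @binding(0) var<storage, read> positions: array<vec3<f32>>;
        @group(0) @binding(1) var<storage, read_write> keys: array<f32>;
        @group(0) @binding(2) var<uniform> view: mat4x4<f32>;

        @compute @workgroup_size(256)
        fn main(@builtin(global_invocation_id) id: vec3<u32>) {
          if (id.x >= arrayLength(&positions)) { return; }
          keys[id.x] = (view * vec4<f32>(positions[id.x], 1.0)).z; // depth key
        }`,
    }),
  },
});

const bindGroup = device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [
    { binding: 0, resource: { buffer: positions } },
    { binding: 1, resource: { buffer: keys } },
    { binding: 2, resource: { buffer: viewBuf } },
  ],
});

// Per frame: the CPU only records commands; the GPU does all the work.
const encoder = device.createCommandEncoder();
const pass = encoder.beginComputePass();
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(Math.ceil(splatCount / 256));
pass.end();
device.queue.submit([encoder.finish()]);
```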

Recent research into mobile-optimized implementations demonstrates 1.6× memory reduction and 1.7× rendering speedup by reorganizing splat data to leverage mobile GPU texture cache hierarchies. These optimizations are shipping in production SDKs today.

What this means for developers

The tooling ecosystem has begun to mature alongside the underlying research.

PlayCanvas provides native Gaussian splat rendering along with the SuperSplat editor, enabling in-browser inspection, cleanup, and optimization of splat-based scenes. Luma's Three.js integration supports depth-aware compositing, allowing conventional meshes and splat primitives to correctly occlude one another in hybrid render pipelines. On the native side, Unity's Gaussian Splatting support targets VR/XR deployments on standalone headsets like Quest, demonstrating real-time rendering of multi-million-splat scenes at interactive frame rates.

Command-line tooling has also emerged to support production workflows. Utilities such as splat-transform enable preprocessing of raw captures (filtering, compressing, format conversion) as part of automated CI/CD pipelines, allowing assets to deploy directly to CDNs without manual intervention.

The standardization path is taking shape as well. Khronos has introduced the KHR_gaussian_splatting extension for glTF, while the Open Geospatial Consortium is incorporating splat-based representations into 3D Tiles Next for geospatial applications. These efforts signal a move toward interoperable, ecosystem-supported formats rather than isolated proprietary solutions.

Where this is heading

Meshes remain the right abstraction for rigged characters, physics-driven interactions, and high-quality offline rendering. When content must deform, simulate, or be authored algorithmically, explicit topology and connectivity are essential. Mesh workflows also benefit from decades of mature editing tools such as Blender, Maya, ZBrush, and Houdini, whereas splat-based authoring and compositing tooling is still early-stage. For production pipelines that require extensive post-capture modification, this gap matters.

For captured environments, product scans, and spatial assets that don't require runtime deformation, field-based representations offer a more favorable trade-off. They address a core streaming constraint: the tension between visual fidelity, bandwidth, and decode latency. In practice, these approaches enable high-quality real-time rendering, compact wire representations, and fast time-to-first-pixel. Techniques like 4D Gaussian Splatting extend this to pre-captured motion sequences, though content requiring skeletal animation, physics simulation, or interactive deformation still favors mesh-based pipelines.

Growing adoption across platforms and standards bodies (Khronos, Niantic, Meta, PlayCanvas, World Labs) suggests this shift is moving from experimentation into infrastructure. The technology is no longer the bottleneck. For teams building real-time 3D streaming systems, the question isn't whether field-based representations are viable. It's how, and how soon, to integrate them into the pipeline.
