


Before diving into limitations, it is worth stating clearly: glTF is a well-designed format that solved real problems. It established a common interchange standard for web 3D at a time when the ecosystem desperately needed one. It is widely supported, well-documented, and works reliably for a range of lightweight use cases.
If you are shipping a low-poly product viewer with modest material complexity, glTF remains a reasonable choice. The ecosystem around it (three.js, Babylon.js, model-viewer) is mature and accessible.
This post is not about replacing glTF where it works. It is about examining what happens when teams push beyond its design constraints, and why the engineering cost of that push reveals a structural limitation in how the industry approaches 3D delivery.

Talk to any team shipping 3D on the web at production scale, and you will hear some version of the same story. They started with glTF because it was the standard. It worked for initial prototypes. Then the requirements grew. Higher fidelity. Larger scenes. More device targets. And they ran into a ceiling.
The ceiling is not a single technical limitation. It is a constraint triangle. With glTF, teams can optimize for two of three outcomes: speed (fast load times), fidelity (visual quality that matches source assets), or reach (consistent experience across device classes). Achieving all three simultaneously is not possible within the format's architecture.
Optimizing for speed and reach means compressing assets aggressively, stripping material detail, and reducing polygon counts to keep file sizes small and rendering budgets low. Fidelity suffers. Optimizing for fidelity and speed means targeting capable hardware only, accepting that lower-end devices will struggle or fail. Reach suffers. Optimizing for fidelity and reach means maintaining multiple asset versions per device class, each hand-tuned. Speed to production suffers.
Every project begins with this negotiation. And every team absorbs the cost.
The cost has a name, even if most teams do not use it: the optimization tax. It is the cumulative engineering effort required to get a high-fidelity source asset into a production-ready glTF deployment.
The tax includes texture compression and resizing, where source textures at 4K or 8K resolution get reduced to meet file size targets, losing material nuance in the process. It includes polygon decimation, where geometry is simplified to stay within rendering budgets, removing the surface detail that communicates product quality. It includes manual LOD authoring, where teams create multiple levels of detail and implement custom loading logic to serve appropriate versions based on camera distance or device capability.
Then there is the versioning problem. glTF was initially positioned as an optimize-once format. In practice, any team deploying across multiple device classes discovers that a single optimization pass does not hold. Desktop browsers, mobile Safari, low-end Android devices, and emerging XR hardware each have different rendering capabilities, memory limits, and GPU budgets. The result is multiple compression configurations, multiple format variants, and a testing matrix that grows with every target platform.
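The shape of that versioning burden can be sketched in a few lines. The device classes, budgets, and format choices below are invented for illustration, not taken from any real pipeline, but the structure is what teams end up maintaining by hand:

```typescript
// Hypothetical per-device variant matrix. Every row is a separate
// hand-tuned export target, and the testing matrix grows with each one.
type DeviceClass = "desktop" | "mobile-safari" | "low-end-android" | "xr";

interface VariantConfig {
  textureSize: number;   // max texture dimension in pixels
  polygonBudget: number; // max triangles before decimation kicks in
  compression: "draco" | "meshopt";
  textureFormat: "ktx2" | "webp";
}

const variantMatrix: Record<DeviceClass, VariantConfig> = {
  "desktop":         { textureSize: 2048, polygonBudget: 500_000, compression: "meshopt", textureFormat: "ktx2" },
  "mobile-safari":   { textureSize: 1024, polygonBudget: 150_000, compression: "draco",   textureFormat: "ktx2" },
  "low-end-android": { textureSize: 512,  polygonBudget: 50_000,  compression: "draco",   textureFormat: "webp" },
  "xr":              { textureSize: 1024, polygonBudget: 100_000, compression: "meshopt", textureFormat: "ktx2" },
};

// Every source asset must be exported once per device class.
function variantCount(assetCount: number): number {
  return assetCount * Object.keys(variantMatrix).length;
}
```

A catalog of 100 SKUs against these four device classes already means 400 exported variants to build, store, and regression-test, before any platform update forces a re-export.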
Lighting adds another layer. glTF supports PBR (physically based rendering) materials, which is a meaningful capability. But lighting computation happens at runtime on the client device, which means results vary depending on the renderer, the device GPU, and the environment map configuration. Achieving consistent, realistic lighting across devices requires manual baking, environment tuning, and per-scene adjustments that do not transfer cleanly between projects.
The tax is not paid once. It recurs with every new asset, every catalog expansion, every platform update. For teams managing hundreds or thousands of SKUs, the engineering cost of the optimization pipeline can rival the cost of the 3D content creation itself.
Beyond the optimization tax, certain visual and technical capabilities sit outside what glTF can deliver well, regardless of engineering effort.
Complex material responses (reflections, refractions, subsurface scattering, translucency) require approximations in glTF that sacrifice accuracy. The materials that communicate product quality in high-value categories, such as the grain of leather, the reflection of polished metal, or the translucency of woven fabric, lose their defining characteristics after format conversion.
High polygon counts push glTF renderers into performance degradation on consumer hardware. Models above roughly one million polygons cause frame rate drops and rendering artifacts on standard devices. For applications requiring geometric precision (product visualization with fine detail, architectural models, CAD-derived assets), this creates a hard constraint.
Soft shadows and ambient occlusion are expensive to compute in real time and are often disabled entirely on mobile platforms. Pre-baking shadow maps is possible but adds file size and limits the interactivity that makes 3D valuable in the first place.
Large scenes and multi-object environments require downloading all geometry before any rendering can begin. Progressive loading is technically possible on top of glTF, but it requires significant custom engineering that the format itself does not provide.
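The LOD and progressive-loading logic that glTF leaves to the application typically looks something like the following sketch. The level names, file names, and distance thresholds are hypothetical; the point is that each level is a separately authored asset and the switching logic is custom code the team owns:

```typescript
// Illustrative distance-based LOD selector of the kind teams build on
// top of glTF. Each entry points at a separately authored export.
interface LodLevel {
  url: string;          // pre-authored glTF variant for this level
  maxDistance: number;  // camera distance (world units) it covers
}

const lodChain: LodLevel[] = [
  { url: "chair_high.glb",   maxDistance: 5 },
  { url: "chair_medium.glb", maxDistance: 15 },
  { url: "chair_low.glb",    maxDistance: Infinity },
];

// Pick the first level whose range covers the current camera distance.
function selectLod(cameraDistance: number, chain: LodLevel[]): string {
  for (const level of chain) {
    if (cameraDistance <= level.maxDistance) return level.url;
  }
  return chain[chain.length - 1].url; // fall back to the coarsest level
}
```

This selector is the easy part. Fetching, caching, and hot-swapping the variants without visible popping, per object and per device class, is where the custom engineering accumulates.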
These are not obscure edge cases. They are the exact scenarios where 3D content delivers the most business value: photorealistic product visualization, immersive virtual showrooms, high-fidelity configurators, and large-scale environment walkthroughs. The format works well inside its design constraints. Outside them, teams are building custom infrastructure to compensate.
The limitations above share a common root. glTF is a file format designed for transmission: package geometry, textures, and materials into a static file, transmit it, and render it on arrival. The delivery model is download-first. Every optimization exists to make the download smaller and the client-side rendering faster.
Adaptive spatial streaming starts from a different premise. Instead of packaging assets into static files and hoping the receiving device can handle them, spatial streaming converts source content into optimized representations that stream progressively, adapting in real time to network conditions, device capabilities, and user interaction.
Comparison between Miris streaming and traditional download-first glTF delivery
At Miris, this means teams upload their highest-fidelity source assets (OpenUSD, images, video, traditional 3D scenes) and the platform handles the rest. AI-driven optimization prepares content for streaming without destructive compression. Lighting is baked into the spatial representation during processing, producing consistent, high-realism results on every device rather than depending on variable client-side renderers.
Miris works by streaming content using adaptive fidelity: the maximum quality each user's device and network can handle, adjusted frame by frame. A flagship phone receives full photorealistic detail. An older tablet receives an intelligently adapted version. Every user gets the best experience their setup allows, with sub-second load times and instant interactivity, all from a single upload and without the team maintaining parallel asset versions. No complete downloads. No load screens. Automatic LOD management replaces manual authoring. The optimization tax disappears.

This is not an incremental improvement to the glTF workflow. It is a structural change in how 3D content moves from creation to consumption.
The practical difference comes down to two factors that matter most in production: speed to fidelity and total cost of ownership.
Speed to fidelity measures how quickly users see content at its intended visual quality. With glTF, speed and fidelity are inversely correlated. Reducing load time requires reducing quality. With adaptive streaming, the relationship is decoupled. Content loads instantly at initial quality and refines to full fidelity within seconds, regardless of asset complexity.
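The decoupling works on the same principle as adaptive bitrate selection in video streaming. The sketch below is a conceptual illustration of that principle, not Miris's actual algorithm; the tier names, bandwidth figures, and GPU budgets are invented:

```typescript
// Conceptual adaptive-fidelity tier selection, in the spirit of video
// ABR. Tiers are ordered from cheapest to richest.
interface FidelityTier {
  name: string;
  bitsPerSecond: number; // streaming bandwidth the tier requires
  gpuCost: number;       // abstract per-frame rendering cost
}

const tiers: FidelityTier[] = [
  { name: "preview",        bitsPerSecond: 500_000,    gpuCost: 1 },
  { name: "standard",       bitsPerSecond: 4_000_000,  gpuCost: 4 },
  { name: "photorealistic", bitsPerSecond: 20_000_000, gpuCost: 10 },
];

// Choose the richest tier the current network and device can sustain.
// Re-evaluating this continuously is what decouples load time from
// quality: playback starts at "preview" and climbs as conditions allow.
function selectTier(
  measuredBandwidth: number,
  gpuBudget: number,
  available: FidelityTier[] = tiers,
): FidelityTier {
  let chosen = available[0];
  for (const tier of available) {
    if (tier.bitsPerSecond <= measuredBandwidth && tier.gpuCost <= gpuBudget) {
      chosen = tier;
    }
  }
  return chosen;
}
```

Because the selection runs against live measurements rather than a one-time export decision, the same source asset serves every device without the author choosing a quality ceiling in advance.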
Total cost of ownership includes not just hosting and delivery, but the engineering time spent on optimization pipelines, the maintenance burden of multi-version asset libraries, and the opportunity cost of teams building delivery infrastructure instead of product features. Spatial streaming collapses the optimization pipeline into a single upload step. The platform handles adaptation, distribution, and device-specific delivery automatically.
For teams already feeling the constraints of their glTF workflows, this is not a theoretical improvement. It is a production-ready alternative.
glTF is not going away, and it should not. It serves a real purpose for lightweight web 3D use cases and will continue to do so. The ecosystem around it is strong.
But the industry's requirements are moving beyond what any static file format can deliver. Higher fidelity expectations, broader device diversity, larger content catalogs, and tighter production timelines are pushing teams toward delivery architectures that adapt dynamically rather than relying on pre-optimized static assets.
The same shift happened in video. Static file downloads gave way to adaptive streaming protocols that adjust quality in real time. The result was not just a better user experience; it was a fundamentally different economic model that made video distribution viable at internet scale.
3D delivery is at the same inflection point. The question is not whether the shift will happen, but which teams will make it first.
Join our beta launching March 24 and try it with your own assets.