Pixel streaming renders 3D graphics entirely on remote servers, then encodes the output as video and streams it to the user's device in real time. The user's inputs (mouse movements, clicks, touch) travel back to the server, which updates the scene and sends new frames. The device only ever displays video; it never processes heavy 3D files locally.
The architecture splits rendering from display. A powerful GPU server (often running Unreal Engine, Unity, or similar) handles all the computational work: lighting calculations, physics, texture sampling, everything required to produce a final image. That image gets compressed using video codecs (typically H.264 or VP9), transmitted over WebRTC or similar protocols, and decoded on the client for display.
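To make the client side concrete, here is a minimal sketch in TypeScript using the standard browser WebRTC API: the client asks to receive a video track and attaches whatever the server sends to a video element. The signaling endpoint is a hypothetical stand-in; how the offer and answer actually travel between client and rendering server is implementation-specific.

```typescript
// Minimal client-side sketch: receive the server's rendered output as a
// WebRTC video track and display it. No 3D work happens on the device.
const pc = new RTCPeerConnection();

// The server adds its encoded video (e.g. H.264) as a track; we simply
// attach the incoming stream to a <video> element.
pc.ontrack = (event: RTCTrackEvent) => {
  const video = document.querySelector<HTMLVideoElement>("#stream")!;
  video.srcObject = event.streams[0];
  void video.play();
};

async function connect(signalingUrl: string): Promise<void> {
  // Tell the server we only want to receive video, not send it.
  pc.addTransceiver("video", { direction: "recvonly" });

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // Hypothetical signaling endpoint that forwards our offer to the
  // rendering server and returns its SDP answer.
  const response = await fetch(signalingUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(pc.localDescription),
  });
  await pc.setRemoteDescription(await response.json());
}
```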
User interactions flow in the opposite direction. When someone clicks, scrolls, or moves a controller, those events travel to the server with minimal delay. The server updates the 3D scene accordingly and pushes new frames. This round-trip happens continuously, ideally 30 to 60 times per second.
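The return path is typically a lightweight data channel running alongside the video. The sketch below forwards pointer events from the video element to the server; it assumes the `pc` connection from the previous sketch, and the JSON message format is illustrative rather than any fixed protocol.

```typescript
// Sketch of the input path: forward pointer events to the rendering server
// over a WebRTC data channel so it can update the scene and push new frames.
const input = pc.createDataChannel("input", { ordered: true });

function sendInput(event: PointerEvent): void {
  if (input.readyState !== "open") return;
  input.send(
    JSON.stringify({
      type: event.type, // "pointermove", "pointerdown", ...
      x: event.offsetX,
      y: event.offsetY,
      buttons: event.buttons,
    })
  );
}

const surface = document.querySelector<HTMLVideoElement>("#stream")!;
surface.addEventListener("pointermove", sendInput);
surface.addEventListener("pointerdown", sendInput);
surface.addEventListener("pointerup", sendInput);
```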
Because the client only receives video, device capabilities matter less. A smartphone, an old laptop, or a thin client can display content that would otherwise require dedicated graphics hardware. The computational burden shifts entirely to the data center.
Pixel streaming appears most often in scenarios where local rendering isn't practical. Cloud gaming services like GeForce NOW and Xbox Cloud Gaming use this approach to deliver AAA titles to devices without discrete GPUs. Enterprise visualization platforms stream CAD models and architectural walkthroughs to stakeholders who don't have workstation-class hardware. Training simulators, virtual production environments, and collaborative design reviews also rely on remote rendering when content complexity exceeds what endpoint devices can handle.
Latency defines the pixel streaming experience. Every interaction requires a network round-trip: input travels to the server, the server renders a new frame, and video travels back. Even on fast connections, this adds 50–150 milliseconds of delay compared to local rendering. Users notice this as sluggishness, where the scene reacts a beat behind their movements. For applications requiring precise, responsive interaction, this lag creates friction at best and unusability at worst.
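To see where that figure comes from, it helps to sum a rough motion-to-photon budget. The numbers below are illustrative assumptions, not measurements, but they land inside the 50–150 ms range above.

```typescript
// Illustrative round-trip budget for a single interaction; the values are
// rough assumptions and vary widely with network and hardware.
const budgetMs = {
  inputToServer: 20,    // one-way network transit for the input event
  renderFrame: 16,      // server renders at ~60 fps
  encodeFrame: 8,       // hardware video encode
  videoToClient: 20,    // one-way network transit for the encoded frame
  decodeAndDisplay: 10, // client decode + compositing/vsync
};

const totalMs = Object.values(budgetMs).reduce((sum, ms) => sum + ms, 0);
console.log(`motion-to-photon ≈ ${totalMs} ms`); // ≈ 74 ms with these assumptions
```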
Video compression introduces visual artifacts: fine details, sharp edges, and subtle gradients become smoothed or blocky. The more aggressive the compression (to reduce bandwidth), the more visible these artifacts become. For applications where visual fidelity matters, such as product visualization, medical imaging, or quality inspection, compression artifacts can undermine the purpose of showing 3D content in the first place.
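The knob behind that tradeoff is the encoder's bitrate cap. As a sketch, here is how a sending side built on the standards WebRTC API could lower it; real pixel-streaming stacks (such as Unreal's plugin) expose their own equivalent settings, so treat this as an assumption about the transport rather than any specific product's API.

```typescript
// Capping maxBitrate forces the encoder to compress harder, which lowers
// bandwidth at the cost of more visible artifacts.
async function capVideoBitrate(sender: RTCRtpSender, maxKbps: number): Promise<void> {
  const params = sender.getParameters();
  if (!params.encodings || params.encodings.length === 0) return;
  params.encodings[0].maxBitrate = maxKbps * 1000; // bits per second
  await sender.setParameters(params);
}
```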
Infrastructure costs scale with concurrent users. Each active session requires dedicated GPU resources on the server side. A product page that handles a thousand simultaneous visitors needs a thousand GPU instances rendering in parallel. This makes pixel streaming expensive for public-facing, high-traffic applications where the economics favor controlled environments with limited concurrent access.
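A back-of-the-envelope model makes the scaling concrete. The hourly GPU rate below is a placeholder assumption, not any provider's pricing.

```typescript
// Back-of-the-envelope cost model: each concurrent session pins a GPU.
function monthlyGpuCost(concurrentSessions: number, hoursPerDay: number, gpuHourlyUsd: number): number {
  return concurrentSessions * hoursPerDay * 30 * gpuHourlyUsd;
}

// e.g. 1,000 concurrent sessions, 12 active hours per day, $1 per GPU-hour
// ≈ $360,000 per month, before bandwidth and orchestration overhead.
console.log(monthlyGpuCost(1000, 12, 1.0));
```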
Network dependency is absolute. If the connection drops or degrades, the experience fails completely. There's no local fallback, no cached geometry, no graceful degradation. Offline scenarios are impossible by definition.
These limitations are what break pixel streaming on headsets. VR and AR demand very low motion-to-photon latency; when every head movement has to make a network round-trip before the view updates, the virtual world lags perceptibly behind the user's motion, which can disorient or nauseate them.
The distinction matters. Pixel streaming sends rendered video—flat pixels that happen to depict 3D content. 3D streaming sends actual geometry, textures, and scene data that the client device renders locally. Both approaches deliver 3D experiences over networks, but they make fundamentally different architectural choices.
3D streaming preserves interactivity because rendering happens on the device. User inputs produce immediate visual responses without waiting for server round-trips. It also preserves fidelity because there's no video compression—the client renders at native resolution with whatever quality settings the hardware supports. The tradeoff is that 3D streaming requires clients capable of processing geometry, though progressive delivery techniques can scale content to match device capabilities.
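As a sketch of what progressive delivery can look like on the client, the snippet below picks a level of detail from a crude capability probe and fetches geometry for local rendering. The `/models/...` endpoint and the LOD tiers are hypothetical, and a production implementation would use a much richer capability signal.

```typescript
// Sketch of progressive 3D delivery: the client fetches geometry at a level
// of detail matched to its own hardware and renders locally, so interaction
// never waits on a server round-trip.
type LevelOfDetail = "low" | "medium" | "high";

function pickLevelOfDetail(): LevelOfDetail {
  // Crude capability probe; a real implementation would also consider GPU
  // tier, memory, and measured frame times.
  const cores = navigator.hardwareConcurrency ?? 4;
  if (cores >= 8) return "high";
  if (cores >= 4) return "medium";
  return "low";
}

async function loadModel(modelId: string): Promise<ArrayBuffer> {
  const lod = pickLevelOfDetail();
  // Geometry and textures stream down once and are rendered on-device;
  // higher LODs can be fetched in the background as refinements.
  const response = await fetch(`/models/${modelId}?lod=${lod}`);
  return response.arrayBuffer();
}
```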
For applications serving a limited number of concurrent users, tolerating some latency, and targeting devices that truly cannot render 3D locally, pixel streaming can work. For responsive, high-fidelity experiences at scale (especially on the modern web, where even mobile devices have capable GPUs), native 3D streaming typically delivers better results.