Stop building training environments. Start streaming them.



The training environment has always been a built artifact. A scene file. A level. Something an artist or engineer assembled, validated, and locked in before training began.
That made sense when the cost of variation was authoring time, and the cost of delivery was loading every variant into GPU memory before the run. Neither constraint holds anymore. We didn't design training environments to be small. We designed them to be possible.
Models that train against a fixed environment learn that environment. Models that train against a varying environment learn the task. The field has known this for years. Domain randomization and context randomization are standard practice in robotics and Physical AI for exactly this reason.
But there has always been a fidelity ceiling on randomization. Lighting, textures, and object placements are cheap to randomize when the visuals are stylized or low-poly. Try to randomize at photoreal fidelity, where every variant is a fully realized, materially correct, physically plausible asset, and the cost of variation collides with the cost of authoring and the cost of loading.
So most teams settle. Photoreal but static, or varied but stylized. The model has to pick which generalization gap to live with. Turns out randomization works; we have just never been able to afford it at fidelity.
A streamed photoreal asset is enough to train perception. The model sees the world as the world actually looks, and for any perception-based ML pipeline that alone is a meaningful upgrade.

Perception is actually half the loop. Agents being trained on manipulation, and workers being trained on procedural tasks, need an asset they can act on, one with the contact surfaces, mass behavior, and interaction boundaries that turn a visible object into a behavable one.
Miris streams the physics colliders alongside the asset. The same delivery layer that brings the visual model into the runtime brings the volumes and surfaces the simulator's physics engine needs to make that model real inside it.
The key takeaway is that photoreal teaches what the world looks like, while physics teaches what it does.
Miris does not render scenes on cloud GPUs and transmit video frames. The pipeline conditions assets once upstream, then streams 3D spatial data that reconstructs inside the runtime: Unity, Isaac Sim, or a custom Physical AI environment.
That distinction is what makes adaptive simulation possible. The simulator keeps doing its own rendering and physics. The asset library stops being a fixed set on disk and becomes a streamable library that scales like video, not like GPU rental. Variants do not have to be pre-loaded. They arrive when the runtime asks for them. The runtime stays. The asset library becomes fluid.
Real-time domain and context randomization at photoreal fidelity. Every episode can be a different version of the scene: different equipment on the line, different obstacles in the corridor, different objects on the conveyor, drawn from a streamable library instead of an artist's backlog. The training distribution widens without the authoring cost widening with it.
Curriculum that adapts to the learner. Because assets stream on demand, the environment can change while the model is still training. Agent overfitting to a specific gripper geometry? Swap in a hundred others. Worker missing the safety case? Stream more variants. The environment responds to where the learner is, not where the curriculum was planned to go.
Sim-to-real that actually transfers. Variability at photoreal fidelity is what begins to close the sim-to-real gap, and getting the physics right closes the rest of it. Models trained against environments that look and behave like the deployment environment generalize to it. Models trained against fixed scenes do not. And when the variants themselves are captured from the real deployment environment and streamed in as sim-ready assets, the gap narrows further. You are not just training against plausible worlds, you are training against the one the model will actually operate in.
These are not theoretical limits. They are operational ones, and adaptive streaming removes them.
The same shift applies to human training. A worker learning a procedure does not generalize from one rehearsed scenario any better than a model does. An adaptive environment, where equipment, layout, and conditions vary on every run, produces workers who can handle the variant they will actually encounter on the floor.
Humans and machines learn the same way: against worlds that surprise them.
The work in machine learning environments has been the environment as artifact: built, frozen, shipped, used. That model held when the cost of variation was authoring time and the cost of delivery was infinite.
Adaptive streaming changes that. The contents of the environment become as fluid as the policies being trained against them.
If you are building a Physical AI company and the cost of variation is shaping what your models can learn, that is a conversation worth having.