Scaling AI at Omi: Faster Cold Starts and Lower Costs with Inferless

Omi is a virtual photo studio powered by 3D AI that helps brands create photorealistic product photography. Their team recently launched features like PhotoDrop, which lets customers place digital twins of their products into realistic, AI-assisted scenes with accurate lighting, shadows, and surface properties. Behind the scenes, these features rely on a stack of custom-built AI models, each requiring significant GPU resources to run efficiently. But getting those models deployed, tested, and into production was becoming a bottleneck.
Here’s how Omi accelerated their AI workflows and dramatically reduced infrastructure costs using Inferless.
The Challenge: Slow Cold Starts and High Compute Costs
Before Inferless, Omi’s AI development process lacked velocity. “The cold boots on our previous provider took 5 to 15 minutes,” says James, Senior ML Engineer at Omi. “Sometimes the cold start times weren’t even guaranteed. And keeping the models always warm? That would’ve exploded our compute budget, easily 100x higher.”
Omi’s use case is unique:
- They run custom models for lighting prediction, surface understanding, and scene compositing.
- These models require powerful GPUs.
- The feedback loop involves both 3D artists and researchers constantly testing, iterating, and tuning results.
“Having to wait 10 minutes or keeping everything warm wasn’t an option,” the team explained. “And the backend team was already stretched thin. Offloading deployments to researchers seemed impossible—until Inferless.”
The Solution: Serverless GPU Inference with Inferless
Omi now uses Inferless across their entire model lifecycle, as illustrated by the sketch after this list:
- Research & Development: Spin up experimental models on-demand without needing persistent infrastructure.
- Testing with Artists: Cold starts are fast enough that artists barely notice. They run models, give feedback, and iterate quickly.
- Production Deployment: Once approved, models are flipped to “always warm” mode for customers—without changing pipelines.
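To make the workflow concrete, deploying a model on Inferless typically means wrapping it in a small Python entry point that the platform loads on each cold start. The sketch below is a minimal, hypothetical example of such a handler; the `InferlessPythonModel` class with `initialize`/`infer`/`finalize` methods follows the convention in Inferless's documentation, but the lighting-prediction model, file name, and input/output fields are illustrative assumptions, not Omi's actual code.

```python
# app.py - minimal, hypothetical Inferless entry point.
# The InferlessPythonModel class and its initialize/infer/finalize methods follow
# the convention described in Inferless's docs; the model itself is a stand-in,
# not Omi's actual lighting-prediction model.
import torch


class InferlessPythonModel:
    def initialize(self):
        # Runs once per container cold start: load weights onto the GPU if available.
        # "lighting_predictor.pt" is a placeholder artifact name.
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = torch.jit.load("lighting_predictor.pt", map_location=self.device)
        self.model.eval()

    def infer(self, inputs):
        # Called per request; `inputs` is the request payload as a dict.
        # Here we assume a flat list of floats describing the scene.
        features = torch.tensor(inputs["scene_features"], device=self.device)
        with torch.no_grad():
            prediction = self.model(features.unsqueeze(0))
        return {"lighting_params": prediction.squeeze(0).cpu().tolist()}

    def finalize(self):
        # Release the model when the container is scaled down.
        self.model = None
```

In a setup like this, the same handler can be spun up on demand during research and testing, then kept warm for production traffic, which is what lets one artifact move through the lifecycle above without changing pipelines.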
“Inferless cold boots are so fast, we can afford to run 5-6 custom models without needing them always up. It gave our small team the speed of a much larger org,” they shared.
Even more valuable: Researchers now own the deployment process end-to-end, freeing up backend engineers to focus on core infra.
The Impact
The results have been substantial:
- Up to 100x reduction in compute costs by avoiding always-on infrastructure.
- Faster iteration cycles across research, testing, and deployment.
- Independent deployment workflows for the research team, reducing backend overhead.
- Improved observability with logs and container usage insights via the Inferless console.
- Better end-user experience, with fast, on-demand inference during rendering pipelines.
“Inferless helped us offload GPU-heavy inference while still moving quickly,” said the Omi team. “We now manage our own deployments, control costs, and ship features faster.”