DevOps Engineer
Own the infrastructure behind a growing inference surface, from deploy safety and provider health to uptime, observability, and incident response.
You will improve the infrastructure and delivery systems that keep Direct Inference deployable, observable, and ready for growth.
About Direct Inference
Direct Inference is the endpoint that does everything frontier models can do. Customers bring the SDK and model ID they already use; Direct Inference handles capability, quality, cost, latency, failover, and provider churn behind the scenes.
The important product constraint is zero-knowledge: customers never see which model, provider, or version served a request. That lets them build on a stable surface while the model market keeps moving underneath it.
What you'll own
- Improve deploy pipelines, environment configuration, infrastructure automation, monitoring, and release verification.
- Harden production services for uptime, recovery, capacity, and operational clarity.
- Build tooling that makes local development, CI, and production operations easier to trust.
- Partner with SRE, backend, and inference engineering on incident response and preventive automation.
- Document operational procedures so production work is repeatable and auditable.
Projects you might ship
- Improve the production deploy path so releases are easier to build, ship, verify, and roll back.
- Create automation that checks real production state instead of relying on status messages alone.
- Harden secrets, config overlays, local setup, or CI checks so environment drift is caught early.
What we're looking for
- You have operated production infrastructure and know what breaks during real deploys.
- You are comfortable with Linux, containers, CI/CD, networking basics, secrets, logs, and metrics.
- You can automate without hiding important state from operators.
- You care about developer experience as part of reliability.
- You can balance speed, safety, and clarity when production is involved.
Nice to have
- Experience with Uncloud, Docker, Caddy, GitHub Actions, production Linux, or small-team infrastructure.
- A habit of writing runbooks that another engineer can execute under pressure.
- Comfort proving deploy state from workflow logs, running images, healthchecks, and persisted production data.
Your first 90 days
- Improve one deploy or verification workflow so releases are safer or easier to prove.
- Close a monitoring, configuration, or operational documentation gap.
- Ship an automation that removes repeated manual production work.
Benefits & support
Built for people doing serious work in a small team.
Interview process
A direct loop with the people doing the work.
Intro
A focused conversation about your background, what you want to build, and where this role should create leverage.
Technical
A practical working session on the kind of problem this role owns. We prefer realistic systems problems over puzzle interviews.
Team
Meet the people you would work with across product, engineering, reliability, and customer-facing work.
Offer
We align on scope, compensation, start timing, and the first problems you would take on.
Application
Apply for DevOps Engineer.
Share the practical context we should know before the first conversation. We read applications for ownership, clarity, and evidence of shipped work.
More openings
Other ways to build Direct Inference.
Forward Deployed Engineer
Engineering · Remote / San Francisco, CA
Work directly with high-intent customers to get production AI workloads running on Direct Inference, then bring the sharp edges back into the product and serving engine.
Senior Inference Engineer
Engineering · Remote / San Francisco, CA
Own and extend the serving engine: the quality, latency, health, and price signals that decide how every request is served.
Platform Reliability Engineer (SRE)
Infrastructure · Remote
Keep one endpoint dependable across a churning set of upstream providers: failover, rate-limit absorption, and the spend caps that fail closed.