Infrastructure engineer at Julius AI building and scaling code-execution sandboxes for a coding agent platform. Focus on Kubernetes, container orchestration, multi-tenant compute reliability, and cloud deployments across AWS/GCP.
Compensation: Competitive base salary and meaningful equity
Benefits: Health & dental insurance, gym reimbursement, daily team meals, commuter benefits
We’re an applied AI lab building coding agents. Julius executes ~1M lines of code every 36 hours for 1M+ users and has generated 3M+ visualizations. All code runs in code sandboxes (isolated remote containers) that we manage. We’re revenue‑generating and backed by AI Grant, Y Combinator, Bessemer Venture Partners, and founders of Vercel, Notion, Perplexity, Palantir, Replit, Zapier, Intercom, and Dropbox.
The Role
Build and scale the code‑execution sandboxes that power Julius across cloud environments (AWS and GCP). We orchestrate 500k+ containers per month, and that volume keeps growing. You’ll own reliability, performance, and security for multi‑tenant compute.
What You’ll Do
Design and operate secure, multi‑tenant container infrastructure with fast startup and smart autoscaling.
Ship cloud deployments (Helm/Terraform) with SSO, network controls, and audit logging.
Drive observability (metrics, traces, logs) with clear SLOs; lead incident response.
Optimize images, scheduling, networking, and cost; build fair‑use and rate‑limiting controls.
What You Bring
Production Kubernetes and container internals (Docker/containerd); strong networking fundamentals.
Cloud (AWS/GCP/Azure) and IaC (Terraform/Helm).
Monitoring and logging (Prometheus, Grafana, OpenTelemetry, ELK/Vector).
Security best practices for containerized, multi‑tenant systems.
Nice to Have
gVisor/Kata/Firecracker; Cilium/eBPF; GPU scheduling; serverless autoscaling (KEDA/Knative/Karpenter).
You’ve built an AI side project and enjoy tinkering with LLMs.
Why Julius
Small, senior team; massive impact surface; hard infra problems at meaningful scale.