
Director, Data Center Operations

Together AI
📍 San Francisco, US · 💰 $250K–$300K · Director
PDU · CDU · GPU infrastructure
TL;DR

Director of Data Center Operations at Together AI, responsible for designing and commissioning white space deployments across US and Asia sites, and building a 20-person break-fix and smart hands team from scratch to support high-density GPU infrastructure.

Apply at Together AI →

Job description

About the Role

Together AI is scaling its physical AI infrastructure rapidly — and we're looking for a Director of Data Center Operations to help us build it right. This is a ground-floor opportunity to own the operational foundation of Together's growing data center portfolio across the US and Asia.

You'll be responsible for designing and commissioning white space deployments — taking pre-built environments and fitting them out with the power distribution, cooling distribution, and systems infrastructure needed to run high-density GPU workloads at scale. At the same time, you'll be building the break-fix and smart hands team from scratch: hiring, defining the playbook, and standing up the function that keeps our sites running around the clock.

This is not a steady-state operations role. It's a builder role. You'll be joining a small but fast-moving team, with real ownership over outcomes and the autonomy to shape how Together AI operates its physical infrastructure for years to come. If you've scaled data center infrastructure through hypergrowth before and want to do it again with more ownership — this is that opportunity.

Responsibilities

Requirements

About Together AI

Together AI is a research-driven AI infrastructure company on a mission to dramatically lower the cost of modern AI by co-designing software, hardware, algorithms, and models. We believe open and transparent AI systems create the best outcomes for society — and we're building the physical and computational foundation to make that real. Our team has been behind landmark advances including FlashAttention, Hyena, FlexGen, and RedPajama.

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits. The US base salary range for this full-time position is $250,000–$300,000 + equity + benefits. Our salary ranges are determined by location, level, and role; individual compensation is determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Please see our privacy policy at https://www.together.ai/privacy  




More open roles at Together AI

AI Researcher, Core ML (Turbo)
📍 San Francisco, US · Senior
AI Researcher, Core ML at Together AI building efficient inference and RL/post-training systems. Role spans algorithms, inference engines (SGLang, vLLM), and production-scale RL pipelines to optimize model speed, cost, and capabilities.
Python · SGLang · vLLM · GRPO · RLHF · DPO

Research Engineer, Core ML
📍 San Francisco, US · Staff
Research Engineer, Core ML at Together AI building production inference and RL/post-training systems. Focus on efficient inference algorithms, speculative decoding, and scaling RL pipelines to optimize latency, throughput, and model quality.
Python · SGLang · vLLM · ATLAS · PyTorch · GRPO

Forward Deployed Engineer (GPU Clusters)
📍 San Francisco, US · 🌐 Remote · 💰 $270K–$300K · Senior
Forward Deployed Engineer at Together AI supporting large-scale GPU cluster deployments for AI model builders. Focuses on infrastructure optimization, cluster hardening, and technical partnership with strategic customers on Kubernetes/SLURM orchestration and high-performance networking.
Kubernetes · SLURM · InfiniBand · RoCE · NVLink · Python

Machine Learning Engineer - Inference
📍 San Francisco, US · 💰 $160K–$230K · Mid
Machine Learning Engineer at Together AI building the inference engine for large language models. Focus on optimizing runtime services, performance at scale, and high-performance systems using PyTorch and low-level systems concepts.
Python · PyTorch · CUDA · Triton · Rust · Cython

LLM Inference Frameworks and Optimization Engineer
📍 San Francisco, US · 🌐 Remote · 💰 $160K–$230K · Mid
LLM inference frameworks and optimization engineer at Together AI building distributed inference engines for large language models. Focus on GPU optimization, tensor parallelism, and software-hardware co-design for scalable model serving.
Python · C++ · CUDA · Triton · TensorRT · TensorRT-LLM