
Senior Cloud Platform Engineer

SambaNova
📍 Palo Alto, US · Senior · 5–8+ yrs
Python · Go · Java · Docker · Kubernetes · Terraform · Prometheus · Grafana · Datadog · AWS · GCP · Azure
TL;DR

Senior Cloud Platform Engineer at SambaNova building reliability and scalability for AI inference services. Focus on cloud infrastructure, monitoring, incident response, and automation across AWS, GCP, and on-premises environments.

Apply at SambaNova →

Job description

The era of pervasive AI has arrived. In this era, organizations will use generative AI to unlock hidden value in their data, accelerate processes, reduce costs, and drive efficiency and innovation, fundamentally transforming their businesses and operations at scale.

SambaNova Suite™ is the first full-stack, generative AI platform, from chip to model, optimized for enterprise and government organizations. Powered by the intelligent SN40L chip, the SambaNova Suite is a fully integrated platform, delivered on-premises or in the cloud, combined with state-of-the-art open-source models that can be easily and securely fine-tuned using customer data for greater accuracy. Once a model is adapted with customer data, the customer retains ownership of it in perpetuity, so they can turn generative AI into one of their most valuable assets.

About SambaNova Systems

Join the company that's building the future of AI computing. At SambaNova, we are disrupting the AI and high-performance computing space with our integrated hardware and software platform. Our DataScale systems and SambaFlow software are pushing the boundaries of what's possible with generative AI and large language models. We are a team of passionate innovators tackling some of the world's most challenging computational problems.

 

The Role

As a Senior Cloud Site Reliability Engineer (SRE) specializing in our AI Inferencing Service, you will be the guardian of its reliability, performance, and scalability. You will bridge the gap between software development and operations, applying an engineering mindset to solve operational challenges. Your primary focus will be ensuring our inference endpoints have exceptional uptime, low-latency response times, and efficient resource utilization, directly impacting the experience of our customers and the success of our AI products. This role includes participating in a shared on-call rotation to maintain 24/7 service reliability. 

 

What You'll Do

Service Ownership & On-Call: Take shared ownership of the production inferencing service, including its availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning across multiple regions. This includes implementing and supporting AI infrastructure in new regions, such as Asia, Europe, and Latin America, to support the growth of our business. Participate in a balanced on-call rotation to provide 24/7 support for the service.

On-Call & Work-Life Balance

We believe a sustainable on-call schedule is critical for long-term success and team health. Our on-call philosophy is built on the following principles:

 

What We're Looking For (Must-Haves)

 

What Will Make You Stand Out (Nice-to-Haves)

 

Why SambaNova?

Submission Guidelines
Please note that in order to be considered an applicant for any position at SambaNova Systems, you must submit an application form for each position for which you believe you are qualified. 

EEO Policy
SambaNova Systems is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to age (40 and over), color, disability, gender identity, genetic information, marital status, military or veteran status, national origin/ancestry, race, religion, creed, sex (including pregnancy, childbirth, breastfeeding), sexual orientation, or any other applicable status protected by federal, state, or local laws.

Benefits Summary for US-Based, Full-Time Employment Positions
SambaNova offers a competitive total rewards package, including base salary, equity, and benefits. We cover 95% of the medical insurance premium for employees and 77% for dependents, and we offer a Health Savings Account (HSA) with an employer contribution. We also offer Dental, Vision, Short/Long-Term Disability, Basic Life, Voluntary Life, and AD&D insurance plans, in addition to Flexible Spending Account (FSA) options such as Health Care, Limited Purpose, and Dependent Care. Our library of well-being benefits, available to you and your dependents, includes a full subscription to Headspace, a Gympass+ membership with access to physical gyms, a One Medical membership, counseling services through an Employee Assistance Program, and much more.
