
Site Reliability Engineer

Cognition
📍 San Francisco, US 💰 $260K–$300K 🛠 AI tools welcome at work · Senior
AWS · GCP · Azure · Kubernetes · Terraform
TL;DR

Site Reliability Engineer at Cognition, an applied AI lab building Devin (AI software engineer) and Windsurf (AI-native IDE). Own production reliability, SLOs, incident response, and platform engineering for products used by hundreds of thousands of developers daily.

Apply at Cognition →

Job description

Who We Are

We are an applied AI lab building end-to-end software agents. We're the team behind Devin, the first AI software engineer, and Windsurf, an AI-native IDE. These products represent our vision for AI that doesn't just assist engineers, but works alongside them as a genuine teammate.

Our team is small and talent-dense: world-class competitive programmers, former founders, and researchers from the frontier of AI, including Scale AI, Palantir, Cursor, Google DeepMind, and others.

Role Mission

Devin and Windsurf are used by hundreds of thousands of developers every day. When something goes wrong, it goes wrong for all of them at once. This role exists to make sure that doesn't happen, and when it does, to make sure it's resolved faster than anyone expects.

You will own both the production reliability of our user-facing products and the platform engineering that lets our team ship quickly and confidently. That means SLOs, incident response, and on-call on one side, and CI/CD pipelines, deployment infrastructure, and developer tooling on the other. At Cognition, these are not separate jobs. The best SREs here understand that reliability is engineered in, not bolted on.

What You'll Accomplish

Exceptional Candidates Have Demonstrated

Resources & Environment

Compensation & Benefits

Equal Opportunity

Cognition is an equal opportunity employer. We do not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other protected characteristic under applicable law. We are committed to providing reasonable accommodations for candidates with disabilities throughout the hiring process. Please let us know if you need any.


More open roles at Cognition

Cognition ⚡ AI-native · 🔄 synced 6h ago
Research, Post-Training
📍 San Francisco, US 🛠 AI tools welcome at work · Senior
Research scientist at Cognition building post-training methods for AI agents. Focus on training recipes, evaluation design, alignment techniques (RLHF, RLAIF), and scaling methodologies for Devin and Windsurf.
RLHF · RLAIF · PyTorch · JAX · distributed training
93
AI-core
Cognition ⚡ AI-native · 🔄 synced 6h ago
Research, Mid-Training
📍 San Francisco, US 🛠 AI tools welcome at work · Mid
Research scientist at Cognition focused on mid-training optimization for large language models. Responsibilities include data mix design, synthetic data pipelines, annealing schedules, and context length extension to improve model reasoning and coding capabilities.
Python · PyTorch
91
AI-core
Cognition ⚡ AI-native · 🔄 synced 6h ago
Product Engineer
📍 San Francisco, US 🛠 AI tools welcome at work · Mid
Product Engineer at Cognition building end-to-end software agents and AI-native developer tools. Own features across IDE, web, and CLI surfaces; work directly on Devin agent infrastructure and developer experience.
Python · TypeScript · React
83
AI-core
Cognition ⚡ AI-native · 🔄 synced 6h ago
Software Engineer
📍 San Francisco, US 💰 $260K–$300K 🛠 AI tools welcome at work · Mid
Software engineer at Cognition building core agent infrastructure for Devin and Windsurf. Focus on long-horizon task execution, tool use, subagent orchestration, and production reliability for AI-native developer tools.
Python · distributed systems · agent infrastructure · code execution environments
83
AI-core
Cognition ⚡ AI-native · 🔄 synced 6h ago
Research Engineer, Infrastructure
📍 San Francisco, US 🛠 AI tools welcome at work · Senior
Research Engineer, Infrastructure at Cognition building distributed training systems and experiment orchestration for large-scale AI agent development. Own GPU cluster infrastructure, data pipelines, and performance optimization across thousands of GPUs.
Python · C++ · PyTorch · GPU · Kubernetes · distributed systems
83
AI-core