TL;DR
Sr. Site Reliability Engineer at Pinterest building Kubernetes infrastructure and compute platforms. Focus on EKS, automation, reliability tooling, and using AI tools for incident analysis and operational workflows.
About Pinterest:
Millions of people around the world come to our platform to find creative ideas, dream about new possibilities and plan for memories that will last a lifetime. At Pinterest, we’re on a mission to bring everyone the inspiration to create a life they love, and that starts with the people behind the product.
Discover a career where you ignite innovation for millions, transform passion into growth opportunities, celebrate each other’s unique experiences and embrace the flexibility to do your best work. Creating a career you love? It’s Possible.
At Pinterest, AI isn't just a feature, it's a powerful partner that augments our creativity and amplifies our impact, and we’re looking for candidates who are excited to be a part of that. To get a complete picture of your experience and abilities, we’ll explore your foundational skills and how you collaborate with AI.
Through our interview process, what matters most is that you can always explain your approach, showing us not just what you know, but how you think. You can read more about our AI interview philosophy and how we use AI in our recruiting process here.
The Site Reliability Engineering organization at Pinterest is accountable for ensuring overall Pinterest availability as well as enhancing Engineering teams’ capability to design, build and operate robust systems at scale. We are hiring a Sr. SRE to join our Compute SRE team. This team is responsible for ensuring that all compute workloads run smoothly on Pinterest. We're building the future on Kubernetes and our job is to connect it with what Pinterest needs.
Pinterest’s applications and infrastructure handle billions of monthly page views and petabytes of data as Pinterest continues to grow and scale. As a Pinterest SRE, you will design and build systems, platforms, tools, frameworks and methodologies to assure the reliability of our large-scale distributed systems.
What You’ll Do:
- Tackle project challenges on EKS, such as implementing Karpenter. This work affects how every developer codes, tests, and improves their work
- Collaborate across various teams to drive projects forward using open-source tools
- Build a deep understanding of how Pinterest’s systems behave, scale, interact and fail, and use that insight to identify risks and opportunities for remediation
- Build tools and automation to eliminate toil and reduce operational overhead. Create frameworks, processes and best practices to be used across Pinterest Engineering
- Build meaningful, insightful and actionable SLIs
- Automate critical portions of Pinterest’s engineering processes, to minimize risk and maximize the speed of innovation
- Manage capacity and performance to help scale our infrastructure both on public and private clouds around the world
- Use AI to analyze incidents, operational signals, and system behaviors to help identify patterns, generate plans, and propose remediation approaches
- Leverage AI to speed development of runbooks, automation workflows, and reliability tooling by drafting, iterating on, and refining approaches
What We’re Looking For:
- Strong knowledge of Kubernetes (especially EKS), including deployment patterns, rollout safety, and core debugging workflows
- 4+ years of experience with programming languages (Python or Golang preferred)
- Strong experience managing projects and initiatives end-to-end
- Hands-on experience with AI-assisted development tools such as Cursor, GitHub Copilot or Claude for code generation, debugging, and documentation
- Demonstrated ability to write effective prompts to get high-quality, reliable outputs from LLMs
- Demonstrated ability to use AI to improve the speed and quality of your day-to-day work
- Strong track record of critical evaluation and verification of AI-assisted work (e.g., testing, source-checking, data validation, peer review)
- High integrity and ownership: you protect sensitive data, avoid over-reliance on AI, and remain accountable for final decisions and deliverables
- Experience with technologies such as Terraform, Buildkite, and/or ArgoCD is required
- Bachelor’s or Master’s degree in a relevant field such as Computer Science, or equivalent experience
In-Office Requirement Statement
We let the type of work you do guide the collaboration style. That means we're not always working in an office, but we continue to gather for key moments of collaboration and connection.
This role will need to be in the office for in-person collaboration 1-2 times per half and therefore can be situated anywhere in Ontario.
Our Commitment to Inclusion:
Pinterest is an equal opportunity employer and makes employment decisions on the basis of merit. We want to have the best qualified people in every job. All qualified applicants will receive consideration for employment without regard to race, color, ancestry, national origin, religion or religious creed, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, age, marital status, status as a protected veteran, physical or mental disability, medical condition, genetic information or characteristics (or those of a family member) or any other consideration made unlawful by applicable federal, state or local laws. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you require a medical or religious accommodation during the job application process, please complete this form for support.