Evaluation Engineer

Elicit
📍 Oakland, US 🌐 Remote/hybrid 🛠 AI tools welcome at work · Mid · 3+ yrs
Python · TypeScript · asyncio · PostgreSQL
TL;DR

Evaluation Engineer at Elicit building auto-evaluation systems for an AI research platform. Focus on infrastructure speed, interfaces for ML engineers and product managers, and statistical rigor for pharma decision-making assessments.


Job description

About Elicit

Elicit is an AI research platform that uses language models to help researchers figure out what's true and make better decisions, starting with common research tasks like literature review.

What we're aiming for:

  1. Elicit radically increases the amount of good reasoning in the world.

    • For experts, Elicit pushes the frontier forward.

    • For non-experts, Elicit makes good reasoning more affordable. People who don't have the tools, expertise, time, or mental energy to make well-reasoned decisions on their own can do so with Elicit.

  2. Elicit is a scalable ML system based on human-understandable task decompositions, with supervision of process, not outcomes. This expands our collective understanding of safe AGI architectures.

Visit our Twitter to learn more about how Elicit is helping researchers and making progress on our mission.

The mission of Elicit evals

Some orgs build evals to warn us about dangerous capabilities. Others build evals to understand trends and predict future developments. Yet others build evals to hill-climb towards models that users will like more.

At Elicit, we're focused on something different—we want to understand, and hill-climb towards, models that help us make better decisions.

This is harder than optimizing for "what will users like better." Decision support is hard to evaluate, and users' knee-jerk reactions may not match what actually improves their decisions. Because it's hard, and because the sales pitch is more complicated, few organizations are doing this well. If we nail this, we have a unique opportunity to push AI toward helping us make better decisions, both within Elicit and beyond.

Why we're hiring for this role

We need someone to own the technical foundation of our auto-evaluation systems. Our evals are currently much slower than they need to be, and our interfaces aren't optimized for the diverse set of people who need to use them—ML engineers iterating on models, product managers monitoring quality, and customers assessing trust in results.

The right person for this role won't just build infrastructure. You'll think deeply about what it actually means for Elicit to help with decision-making in pharma and encode that understanding into our evaluation systems.

What you'll own

The core auto-eval platform

You'll build a comprehensive system that runs fast, is easy to use, and supports quickly building new evals:

Ensuring evaluations are accurate and reliable

A month in your life

In a typical month, expect to spend:

What you bring to the role

Requirements

Will make you more competitive for the role

This is a diverse list of nice-to-haves. We expect the candidate we select to have some, but not all, of these. Other team members can fill in for skills you lack.

Location and travel

We have a lovely office in Oakland, CA, but we’re flexible about where you work. You’re welcome to work remotely, from our Oakland headquarters, or in a hybrid setup. The only in-person requirement is attending our quarterly team retreats, typically held on the West Coast.

Compensation, benefits, and perks

In addition to working on important problems as part of a productive and positive team, we also offer great benefits (with some variation based on location):

For all roles at Elicit, we use a data-backed compensation framework to keep salaries market-competitive, equitable, and simple to understand. For this role, we target starting ranges of:


More open roles at Elicit

ML Research Resident
📍 Oakland, US 🌐 Remote 💰 $144K–$180K 🛠 AI tools welcome at work · Entry
ML Research Resident at Elicit developing computational operators for a research agent that iteratively improves knowledge states over thousands of steps. Focus on designing transparent, epistemically sound reasoning procedures for scientific document analysis and reasoning tasks.
LLMs · Transformers
Machine Learning Engineer
📍 Oakland, US 🌐 Remote 🛠 AI tools welcome at work · Mid
Machine Learning Engineer at Elicit building AI-powered research and decision-making systems. Focus on combining language models with data integrations, evaluation systems, and product interfaces for scientific teams.
Python · language models · RAG · APIs · evaluation systems
Senior ML Product Manager
📍 Oakland, US 🌐 Remote 🛠 AI tools welcome at work · Senior
Senior ML Product Manager at Elicit, an AI research assistant using language models for literature review and research tasks. Lead ML-based product projects end-to-end, define product vision, and drive user research to scale reasoning capabilities.
LLMs · language models
AI Engineer
📍 Oakland, US 🌐 Remote 🛠 AI tools welcome at work · Mid
AI Engineer at Elicit building backend systems for an AI research assistant. Focus on prompt management, LLM orchestration, and distributed systems powering literature review and reasoning tasks.
Node.js · Python · Next.js · TypeScript · Kubernetes · GitHub
Senior Software Engineer
📍 Oakland, US 🌐 Remote 💰 $185K–$305K 🛠 AI tools welcome at work · Senior
Senior Software Engineer at Elicit, an AI research assistant using language models for literature review and reasoning tasks. Build and ship features across a full-stack Node/Python/Next.js system serving thousands of paying users.
Node.js · Python · Next.js · TypeScript · Tailwind · Kubernetes