
Software Engineer, Ray Data

📍 Bengaluru, IN · Senior · 5+ yrs
Python, Apache Arrow, Ray, Spark, Dask, C++
TL;DR

Software engineer at Anyscale building Ray Data, a distributed data processing library. Focus on performance optimization, ML pipeline integration, and fault-tolerant systems at scale.


Job description

About Anyscale:


At Anyscale, we're on a mission to democratize distributed computing and make it accessible to software developers of all skill levels. We’re commercializing Ray, a popular open-source project that's creating an ecosystem of libraries for scalable machine learning. Companies like OpenAI, Uber, Spotify, Instacart, Cruise, and many more have Ray in their tech stacks to accelerate the progress of AI applications out into the real world.


With Anyscale, we’re building the best place to run Ray, so that any developer or data scientist can scale an ML application from their laptop to the cluster without needing to be a distributed systems expert.


Proud to be backed by Andreessen Horowitz, NEA, and Addition with $250+ million raised to date.


About the role:

Ray aims to provide a universal API for building distributed applications (e.g. a machine learning pipeline of feature engineering, model training, and evaluation). Data is usually a core element connecting these different stages, and therefore plays a critical role in Ray’s usability, performance, and stability. We are looking for strong engineers to build, optimize, and scale Ray’s Datasets library and data processing capabilities in general.

About the Ray Data team:

The Ray Data team currently develops and maintains the Ray Datasets library, which is already powering critical production use cases (e.g. large scale data compaction at Amazon, and ML pipeline at Alibaba). Ray Datasets is a Python library built on top of Apache Arrow and Ray Core (Ray’s C++ backend), and the Ray Data team interacts closely with Ray Core components including the scheduler and the memory & I/O subsystems. The Ray Data team also works closely with Ray’s ML libraries including Train, RLlib, and Serve.

A snapshot of projects you will work on:

- Performance of Ray Datasets at large scale (leveraging Arrow primitives, optimizing Ray object manager, etc.)

- Integration with ML training and data sources

- Stability and stress testing infrastructure

- Lead future work integrating streaming workloads into Ray such as Beam on Ray

- Differentiate Data operations in the Anyscale-hosted Ray service

As part of this role, you will:

We'd love to hear from you if you have:

Anyscale Inc. is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law. 

Anyscale Inc. is an E-Verify company, and you may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.


More open roles at Anyscale

Anyscale ·
Software Engineer (Ray Data)
📍 San Francisco, US · Mid
Software engineer at Anyscale building Ray Data, a Python-native data processing engine for AI workloads. Focus on performance optimization, distributed systems scaling, and fault tolerance for production ML pipelines.
Python, Ray, distributed systems, data processing
75
AI-core
Anyscale ·
Tech Lead Manager, ML Developer Experience
📍 San Francisco, US · Lead
Tech Lead Manager for ML Developer Experience at Anyscale, building tools and infrastructure for Ray-based ML applications. Focus on workspaces, ML Ops tooling, SDKs, and developer-facing platform features.
Ray, PyTorch, MLFlow, Python
73
AI-fluent
Anyscale ·
Software Engineer (Ray Core)
📍 San Francisco, US · Senior
Software engineer at Anyscale building Ray Core, a distributed computing backend. Focus on performance optimization, fault tolerance, and reliability of the C++ scheduler and runtime systems.
C++, Ray, distributed systems, GPU programming
71
AI-fluent
Anyscale ·
Senior / Staff Product Manager - Ray Data
📍 San Francisco, US · Senior
Senior/Staff Product Manager at Anyscale leading Ray Data, a scalable data processing library for ML/AI workloads. Balance open source adoption with commercial differentiation for Anyscale Runtime while owning the product roadmap.
Ray, distributed systems, ML infrastructure, data processing
69
AI-fluent
Anyscale ·
Software Engineer (Ray Core)
📍 Bengaluru, IN · Mid
Software engineer at Anyscale building Ray Core, an open-source distributed computing framework. Focus on C++ backend development, performance optimization, fault tolerance, and testing infrastructure for scalable ML applications.
C++, Ray, Python, distributed systems, GPU programming
68
AI-fluent