About the Role
Ray aims to provide a universal API for building distributed applications. Achieving this goal requires a distributed system with high levels of performance and reliability. We're looking for engineers with systems software experience who are interested in contributing to the Ray backend. The Ray Core team develops and maintains the Ray C++ backend (e.g., distributed scheduler, language runtime integration, I/O and memory subsystems). We are responsible for the reliability, scalability, and performance of Ray, as well as for ensuring that Ray provides the right feature set to support higher-level libraries and use cases. The team balances work on new features and distributed libraries, test infrastructure improvements, debugging, and longer-term architectural improvements to Ray.
Responsibilities
- Develop high-quality open source software to simplify distributed programming (Ray)
- Identify, implement, and evaluate architectural improvements to Ray core
- Improve the testing process for Ray to make releases as smooth as possible
- Communicate your work to a broader audience through talks, tutorials, and blog posts
Requirements
- At least 2 years of relevant work experience
- Solid background in algorithms, data structures, and system design
- Experience in building scalable and fault-tolerant distributed systems
Qualifications
- Knowledge of distributed model training and inference (e.g., tensor parallelism, pipeline parallelism) is preferred
- Knowledge of GPU programming is preferred