About the Role
As a Distributed LLM Inference Engineer at Anyscale, you will design, develop, and optimize highly scalable, efficient inference systems for large language models (LLMs). The role centers on leveraging distributed computing frameworks, particularly Ray, to enable real-time, high-throughput LLM inference for a range of AI applications. You will work on cutting-edge challenges in model deployment, performance tuning, resource management, and the reliability of large-scale AI services. Your contributions will directly shape Anyscale's ability to deliver powerful, accessible AI capabilities to developers and enterprises, helping to commercialize Ray and extend its reach in the rapidly evolving field of generative AI. This is a critical role in bringing state-of-the-art LLMs to production at scale.
Benefits
- Base salary: $170,112 – $247,000
- Equity offered