About the Role
The Inference team at Anthropic builds and maintains the systems that serve Claude to millions of users globally, bringing the model to life through large-scale, hardware-agnostic inference deployments. The team manages the entire stack, from intelligent request routing to fleet-wide orchestration across diverse AI accelerators. Its dual mandate is to maximize compute efficiency in support of customer growth and to enable breakthrough research by providing high-performance inference infrastructure for next-generation model development. As a Staff Software Engineer, you will tackle complex distributed-systems challenges spanning multiple accelerator families and emerging AI hardware across cloud platforms. The role demands end-to-end ownership: identifying and resolving infrastructure blockers to serve Claude and support AI research.
Responsibilities
- Designing intelligent routing algorithms that optimize request distribution across thousands of accelerators
- Autoscaling our compute fleet to dynamically match supply with demand across production, research, and experimental workloads
- Building production-grade deployment pipelines for releasing new models to millions of users
- Integrating new AI accelerator platforms to maintain our hardware-agnostic competitive advantage
- Contributing to new inference features (e.g., structured sampling, prompt caching)
- Supporting inference for new model architectures
- Analyzing observability data to tune performance based on real-world production workloads
- Managing multi-region deployments and geographic routing for global customers
Requirements
- Significant software engineering experience, particularly with distributed systems
- A results-oriented mindset, with a bias towards flexibility and impact
- Willingness to pick up slack, even when it falls outside your job description
- A desire to learn more about machine learning systems and infrastructure
- The ability to thrive in environments where technical excellence directly drives both business results and research breakthroughs
- Concern for the societal impacts of your work
- A Bachelor's degree in a related field or equivalent experience
Qualifications
- Experience with high-performance, large-scale distributed systems
- Experience implementing and deploying machine learning systems at scale
- Experience with load balancing, request routing, or traffic management systems
- Familiarity with LLM inference optimization, batching, and caching strategies
- Experience with Kubernetes and cloud infrastructure (AWS, GCP)
- Proficiency in Python or Rust
Benefits
- Annual Salary: £325,000 - £390,000 GBP
- Competitive compensation and benefits
- Optional equity donation matching
- Generous vacation and parental leave
- Flexible working hours
- Lovely office space in which to collaborate with colleagues