How do I apply for this Performance Engineer, GPU position?

Click the "Apply Now" button on the job listing page to be directed to the application. You will be taken to the employer's application page.

Is this position remote?

This role is based in San Francisco, CA | New York City, NY | Seattle, WA (hybrid - 25% in office). Check the full description for remote or hybrid options.

What is the salary range?

The listed salary range for this position is $280,000 - $850,000. Final compensation may vary based on experience, qualifications, and location.

When was this job posted?

This position was posted 22 days ago. We recommend applying promptly as positions can fill quickly.

Performance Engineer, GPU at Anthropic

About the Role

Pioneering the next generation of AI requires breakthrough innovations in GPU performance and systems engineering. As a GPU Performance Engineer, you'll architect and implement the foundational systems that power Claude and push the frontiers of what's possible with large language models. You'll be responsible for maximizing GPU utilization and performance at unprecedented scale, developing cutting-edge optimizations that directly enable new model capabilities and dramatically improve inference efficiency.
Working at the intersection of hardware and software, you'll implement state-of-the-art techniques from custom kernel development to distributed system architectures. Your work will span the entire stack—from low-level tensor core optimizations to orchestrating thousands of GPUs in perfect synchronization.
Strong candidates will have a track record of delivering transformative GPU performance improvements in production ML systems and will be excited to shape the future of AI infrastructure alongside world-class researchers and engineers.

Responsibilities

Architect and implement the foundational systems that power Claude and push the frontiers of what's possible with large language models.
Maximize GPU utilization and performance at unprecedented scale.
Develop cutting-edge optimizations that directly enable new model capabilities and dramatically improve inference efficiency.
Implement state-of-the-art techniques from custom kernel development to distributed system architectures.
Span the entire stack—from low-level tensor core optimizations to orchestrating thousands of GPUs in perfect synchronization.
Co-design attention mechanisms and algorithms for next-generation hardware architectures.
Develop custom kernels for emerging quantization formats and mixed-precision techniques.
Design distributed communication strategies for multi-node GPU clusters.
Optimize end-to-end training and inference pipelines for frontier language models.
Build performance modeling frameworks to predict and optimize GPU utilization.
Implement kernel fusion strategies to minimize memory bandwidth bottlenecks.
Create resilient systems for planet-scale distributed training infrastructure.
Profile and eliminate performance bottlenecks in production serving infrastructure.
Partner with hardware vendors to influence future accelerator capabilities and software stacks.

Requirements

Have deep experience with GPU programming and optimization at scale
Are impact-driven, passionate about delivering measurable performance breakthroughs
Can navigate complex systems from hardware interfaces to high-level ML frameworks
Enjoy collaborative problem-solving and pair programming
Want to work on state-of-the-art language models with real-world impact
Care about the societal impacts of your work
Thrive in ambiguous environments where you define the path forward
At least a Bachelor's degree in a related field or equivalent experience.

Qualifications

GPU Kernel Development: CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization
ML Compilers & Frameworks: PyTorch/JAX internals, torch.compile, XLA, custom operators
Performance Engineering: Kernel fusion, memory bandwidth optimization, profiling with Nsight
Distributed Systems: NCCL, NVLink, collective communication, model parallelism
Low-Precision: INT8/FP8 quantization, mixed-precision techniques
Production Systems: Large-scale training infrastructure, fault tolerance, cluster orchestration

Benefits

Competitive compensation and benefits
Optional equity donation matching
Generous vacation and parental leave
Flexible working hours
Lovely office space
Visa sponsorship

Performance Engineer, GPU

About the Role

Responsibilities

Requirements

Qualifications

Benefits

Similar Job Alerts

Anthropic

Frequently Asked Questions

How do I apply for this Performance Engineer, GPU position?

Is this position remote?

What is the salary range?

When was this job posted?

Explore More

Career Resources