About the Role
NVIDIA is looking for a Senior Software Engineer specializing in AI Inference to join our remote team. In this role, you will optimize and deploy AI models for high-performance inference on NVIDIA GPUs, working across the entire inference stack, from model quantization and compression to efficient runtime engines and APIs. You will collaborate with researchers and product teams to bring state-of-the-art AI models from development to production, ensuring they perform optimally across a variety of deployment scenarios. This role is critical to enabling real-world applications of NVIDIA's AI technology.
Responsibilities
- Develop and optimize software for high-performance AI inference on NVIDIA GPUs.
- Work on model compression, quantization, and pruning techniques.
- Design and implement efficient AI inference runtime engines and APIs.
- Profile and analyze performance bottlenecks in inference workloads.
- Collaborate with AI researchers to integrate new models and features.
- Ensure robust and scalable deployment of AI models.
Requirements
- BS/MS in Computer Science, Electrical Engineering, or a related field.
- 5+ years of experience in software development, with significant focus on AI/ML inference.
- Strong programming skills in C++ and Python.
- Experience with deep learning frameworks (e.g., TensorFlow, PyTorch) and AI inference platforms (e.g., TensorRT, ONNX Runtime).
- Familiarity with GPU programming (CUDA).
- Solid understanding of computer architecture and performance optimization.
Preferred Qualifications
- Experience with distributed systems and cloud deployments.
- Knowledge of various hardware platforms for AI inference.
- Contributions to open-source AI projects.
Benefits
Competitive salary, comprehensive health benefits, retirement plans, paid time off, and opportunities for professional growth at a leading AI company. Flexible remote work environment.