About the Role
NVIDIA is seeking dynamic Solution Architects with specialized expertise in training Large Language Models (LLMs), implementing Retrieval-Augmented Generation (RAG) workflows, and building agentic inference systems. In this role, you will leverage the full NVIDIA software and hardware ecosystem to design, optimize, and deliver production-grade generative AI solutions for enterprise customers. NVIDIA is widely considered one of the world’s most desirable employers, offering competitive salaries and a generous benefits package. The company prides itself on attracting some of the most forward-thinking and hardworking people in the industry, and its best-in-class engineering teams are growing rapidly. If you are a creative, autonomous professional with a strong passion for technology, this is your opportunity to help build cutting-edge generative AI solutions.
Responsibilities
- Architect end-to-end solutions focused on LLM pretraining, fine-tuning, high-performance inference, RAG workflows, and agentic inference orchestration using NVIDIA’s hardware and software platforms.
- Collaborate with customers to understand their LLM-related business challenges and design tailored solutions aligned with the NVIDIA ecosystem.
- Lead LLM training, distributed optimization, and performance tuning to maximize throughput and memory efficiency while minimizing latency.
- Design and integrate RAG workflows and agentic inference pipelines into customer systems; provide technical guidance on best practices.
- Collaborate with NVIDIA engineering teams to provide feedback and support pre-sales technical activities (workshops, demos).
Requirements
- Master’s or Ph.D. in Computer Science, Artificial Intelligence, or a related field, or equivalent experience.
- 4+ years of hands-on experience in AI, focusing on open-source LLM training, fine-tuning, and production inference optimization.
- Deep understanding of mainstream LLM architectures and proficiency in LLM customization using PyTorch and Hugging Face Transformers.
- Solid knowledge of GPU computing, cluster architecture, and distributed parallel training/inference for LLMs.
- Competency in designing agentic inference systems and applying AI agents to solve business challenges.
- Strong communication skills, able to articulate complex technical concepts to technical and non-technical stakeholders.
Preferred Qualifications
- Hands-on experience with NVIDIA’s generative AI ecosystem (TensorRT-LLM, Megatron-LM, NVIDIA NeMo).
- Advanced skills in LLM optimization (quantization, KV cache tuning, memory footprint reduction).
- Experience with Docker and Kubernetes for deploying containerized LLM and agent workflows on-premises.
- In-depth knowledge of multi-GPU parallelism and large-scale GPU cluster management.
Benefits
Competitive salary and a generous benefits package.