About the Role
This Site Reliability Engineer will be critical to ensuring the reliability, scalability, and performance of NVIDIA's hardware infrastructure. The role involves designing, implementing, and maintaining automated systems; monitoring infrastructure health; and responding to incidents, all with the goal of minimizing downtime and optimizing operational efficiency across large-scale hardware deployments.