NVIDIA is seeking a Network Site Reliability Engineer to join its Enterprise Network Operations and SRE team. The ideal candidate will be passionate about network operations and committed to enhancing the user experience. They will have the opportunity to solve complex network challenges through hands-on debugging and a focus on network automation, observability, documentation, and operational excellence.
Requirements
- BS degree in Computer Science, Electrical Engineering, or a related technical field, or equivalent experience.
- A minimum of 10 years of industry experience in network operations or related fields, with a focus on automation and site reliability engineering.
- Familiarity with both enterprise and data center networks is critical.
- Proficiency in network fundamentals and in troubleshooting complex network issues, with expertise in technologies such as TCP/UDP, IPv4/IPv6, wireless, BGP, IS-IS, VPN, L2 switching, firewalls, load balancers, and data center networking.
- Monitoring Tools: Familiarity with network monitoring and management tools such as Prometheus, Grafana, Alertmanager, Nautobot/NetBox, and BigPanda.
- Network Automation: Expertise in automating networks using frameworks such as Salt, Ansible, or similar.
- Process & Service Tooling: Experience with ServiceNow and Jira, and foundational knowledge of the ITIL framework.
- System Administration: Knowledge of Linux system fundamentals.
- Problem-Solving and Communication: A detail-oriented approach to problem-solving and critical thinking, coupled with good interpersonal skills and a strong sense of ownership and drive.
Benefits
- Eligible for equity and benefits