We are looking for a Lead Site Reliability Engineer to join our team. The candidate will work with multiple engineering teams and may shift between engineering platforms as the work, vision, and/or criticality of the projects demands. The role focuses on maximizing availability, observability, reliability, security, and performance for Nike Digital Experiences.
Requirements
- Ability to observe, diagnose, and develop fixes for production issues quickly and efficiently
- Ability to develop and drive real-time monitoring solutions that provide visibility into site health and key performance indicators
- Strong communication skills (written and verbal)
- Highly confident and capable of reporting high-value metrics to leadership
- Working understanding of IT service management (Incident, Problem, Change and Knowledge management)
- Ability to work across teams (business and technical) to continuously analyze system performance in production, troubleshoot consumer-reported issues, and proactively identify areas in need of optimization
- Practical experience in managing and leading application reliability practices for consumer-facing web and mobile experiences
- Demonstrated negotiation and influencing skills
- Passion for coaching, teaching, mentoring and learning
- Bachelor’s degree in Computer Science, Information Systems, Business, or another relevant subject area
- 7+ years of professional experience in software development, operations, or support
- Strong design and development experience with Java
- Proficient with JavaScript on the frontend (React, Angular, etc.) and backend (Node.js) components
- Working knowledge of and experience with Kubernetes
- Experience in other modern enterprise languages (functional or otherwise, such as Scala, Python, Golang, etc.) is preferred
- Basic understanding of DNS, networking, virtualization, and Linux
- Expertise in designing, building, and supporting scalable cloud-based microservices
- Experience with Docker and/or Serverless patterns
- Experience with at least one NoSQL database, such as DynamoDB or Cassandra
- Good understanding of RESTful APIs
- Basic understanding of common tools for service management, agile, and observability: ServiceNow, Jira, Jenkins, Splunk, New Relic, SignalFx
- Background with ITIL or Lean is a plus