We are seeking a Site Reliability Engineer - Service Availability Manager to ensure peak performance and availability of our Enterprise IT infrastructure and services. This role combines proactive site reliability engineering with skilled incident command to lead our efforts to minimize service disruptions and improve our technology landscape.
Requirements
- 5+ years of experience in an information technology environment
- 3 years of experience in information technology focused on IT Operations, including troubleshooting complex network, server, storage, and/or application issues.
- Minimum of 2 years of operations experience in incident, problem, change, and release management, including leading calls and documenting outcomes.
- Undergraduate degree or equivalent experience/certification.
- Ability to work shifts in a 24x7x365 environment and take on on-call responsibilities.
- Proficiency in scripting languages (Python, Shell) and familiarity with automation tools (such as Ansible, Jenkins).
- Experience with cloud platforms (AWS, Azure, GCP), infrastructure as code, and containerization technologies.
- Experience in incident command or incident management in a technology environment.
- Strong problem-solving, organizational, and analytical skills.
- ITIL Foundation certification (v3 or later).
- Demonstrated experience with ITSM suites, e.g., ServiceNow.
- Demonstrated experience with various monitoring, performance, or capacity tools.
- Experience with continuous integration/continuous deployment (CI/CD) pipelines and DevOps practices.
- Familiarity with Site Reliability Engineering principles and concepts.
- Strong leadership qualities, including decisiveness, the ability to motivate teams, and the ability to manage stressful situations calmly and effectively.
- Ability to build constructive relationships with, influence, and communicate with associates and management at all levels.
- Ability to solve complex, cross-functional issues.
- Strong knowledge of server, storage, network, middleware, application, and cloud technologies.
- A high degree of curiosity and a drive to seek more efficient ways of delivering service.
Benefits
- medical
- dental
- vision
- health care flexible spending account
- dependent care flexible spending account
- life insurance
- disability insurance
- accident insurance
- adoption expense reimbursements
- paid parental leave
- 401(k) plan
- stock purchase plan
- discounts at Marriott properties
- commuter benefits
- employee assistance plan
- childcare discounts