The Senior Site Reliability Engineer will ensure the reliability, availability, and performance of Azure services by designing scalable, secure systems, automating operations, managing incidents, and collaborating across teams on continuous improvement and robust disaster recovery.
Requirements
- Ensure the reliability, availability, and performance of Azure-based services and infrastructure, meeting strict SLAs and business requirements.
- Design, implement, and maintain highly scalable, resilient, and secure systems within Azure environments.
- Automate repetitive operational and deployment tasks using scripting (Python, Go, Bash), infrastructure-as-code (Terraform, Bicep, Ansible), and CI/CD pipelines to streamline processes and reduce manual intervention.
- Monitor system performance using advanced tools (Azure Monitor, Prometheus, Grafana), proactively identify issues, and implement solutions to prevent service disruptions.
- Lead incident response, perform root cause analysis, and manage post-incident reviews to ensure continuous improvement and reliability.
- Develop, document, and enforce best practices for system operations, security, and compliance within Azure environments.
- Work closely with development, security, and operations teams to enhance system design, implement security controls, and support modern application platforms (Docker, Kubernetes).
- Participate in on-call rotations to provide rapid response and resolution for critical incidents.
- Utilize IT Service Management tools (ServiceNow, Jira) for incident tracking, change management, and security automation.
- Collaborate with cross-functional teams to analyze trends, resolve persistent issues, and implement enhancements to products and processes.
- Demonstrated experience in team leadership and mentoring is required.
- Must have knowledge of Scrum, ITIL, and Agile methodologies and of ISO 27001 ISMS processes and standards, as well as experience interfacing with external auditors.
Benefits
- Competitive remuneration package
- A host of perks, including healthcare, education support, leave benefits, and more
- Hybrid work policy
- Outstanding learning, development & growth opportunities
- Open, diverse and inclusive environment with a global vision