We are looking for a Site Reliability Engineer with at least 10 years of hands-on experience to maintain and improve the reliability, availability, and performance of our critical systems. You will take part in a 24x7 on-call rotation, ensuring that issues are swiftly identified, triaged, and resolved.
Requirements
- Strong Site Reliability Engineering fundamentals with hands-on experience in managing large-scale, highly available systems.
- CI/CD Experience with a deep understanding of continuous integration, delivery, and deployment pipelines, particularly using GitHub Actions.
- On-call Support experience in managing and resolving incidents in a 24x7 operational environment.
- Strong foundation in ITIL with knowledge of incident, change, and problem management processes.
- AWS Cloud expertise with experience in provisioning, configuring, and managing cloud resources.
- Certificate Management skills to ensure proper handling of digital certificates for system security.
- Observability Tools experience, particularly with Splunk and AppDynamics for system monitoring, performance analysis, and troubleshooting.
- Proficiency in at least one automation tool or scripting language (Python, Shell, etc.).
- ServiceNow Knowledge for incident management and workflow automation.
- Agile/Kanban Methodologies with experience using Jira and Confluence for project management and documentation.
Benefits
- Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions.
- Lilly does not discriminate on the basis of age, race, color, religion, gender, sexual orientation, gender identity, gender expression, national origin, protected veteran status, disability or any other legally protected status.