We are seeking an AIOps Lead (Principal, IT Software Engineer 2) to drive the adoption and execution of Artificial Intelligence for IT Operations (AIOps) practices across the organization. The ideal candidate will have a strong background in IT operations, SRE, and AIOps tools, along with experience leading cross-functional teams to advance IT operations automation and monitoring.
Requirements
- Bachelor's degree in computer science, engineering, or a related field.
- 5–7 years of experience (7+ years preferred) in IT operations, DevOps, or site reliability engineering, including at least 2 years in AIOps-related roles.
- Strong experience with AIOps tools such as Moogsoft, BigPanda, Splunk, Dynatrace, Datadog, ServiceNow, xMatters or similar.
- Solid understanding of machine learning algorithms and their application in IT operations.
- Hands-on experience with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).
- 3+ years of experience with Dynatrace SaaS, Dynatrace Query Language (DQL), and Logs on Grail, or similar platforms.
- Strong scripting/automation skills in Python, Perl, Shell, and JavaScript.
- Experience with automation, DevOps, GitOps, CI/CD, and IaC tools (Terraform, Jenkins, GitHub, Ansible).
- Experience integrating and automating ITSM tools such as ServiceNow, xMatters, PagerDuty, and JIRA.
- Hands-on experience building and operating open-source observability tools such as ELK, Grafana, Prometheus, Fluentd, Fluent Bit, Loki, OpenTelemetry, OpenSearch, and Thanos.
- Experience designing and implementing observability and AIOps solutions for complex, distributed systems.
- Ability to diagnose and troubleshoot complex distributed systems that handle high-volume transactions, on both the frontend and the backend.
- Experience with operating systems (Linux and Windows); Java, Node.js, and React; databases such as Oracle and Cassandra; Kafka, MuleSoft, and Salesforce; and networking.
- Expertise in incident management, monitoring systems, and ITSM processes.
- 2+ years of experience leading engineering teams in Observability, SRE, Platform, Infrastructure, or Application organizations.
- Excellent communication, collaboration, and problem-solving skills.
- Proficient in developing and maintaining technical documentation, runbooks, and operational processes.
Benefits
- Market-competitive compensation structure
- Total compensation package includes salary, bonus, and benefits
- Remote work opportunity