Narro.com is seeking a Senior Staff Engineer to join its cloud infrastructure team. This role focuses on building and optimizing the company's cloud-based products and services. The ideal candidate will have experience with observability and monitoring tools and a strong understanding of alert management, automation, and AI/ML for Ops. The company works at scale across multiple devices and digital mediums.
Requirements
- 10+ years of total experience.
- Strong working expertise with observability and monitoring tools: Splunk, Datadog, ELK, Prometheus, or similar.
- Proven experience in anomaly detection, alert tuning, event correlation, and custom dashboards.
- Deep understanding of alert deduplication, incident impact scoring, and automation frameworks.
- Hands-on experience with automation platforms (Rundeck, StackStorm, Jenkins, or custom scripting).
- Strong Python expertise (scripting & automation) and proficiency in Bash or other scripting languages.
- Experience applying AI/ML for Ops: log analysis, chatbot incident assistance, predictive alerts.
- Knowledge of multi-cloud platforms and tools like PolyCloud, Terraform, or CloudFormation.
- Strong experience with ITSM tools (ServiceNow, Remedy) and their integration into AIOps pipelines.
- Expertise in integrating ServiceNow via REST/SOAP APIs for incident automation, CMDB sync, and workflow orchestration.
- Working knowledge of ITIL processes and how AIOps enhances Incident, Problem, and Change Management.
- Exposure to CMDB integration, dependency graphs, and service maps for contextual alerting and automation.
- Excellent communication and collaboration skills.
- Ability to understand client business use cases and translate them into technical designs.