We are looking for a Senior Site Reliability Engineer to join our Cloud Operations team. As an SRE, you will be responsible for maintaining and improving the reliability, scalability, and performance of our cloud infrastructure. You will apply your software development, systems engineering, and networking expertise to proactively prevent recurring issues and drive initiatives that make our infrastructure more reliable and performant.
Requirements
- Experience leveraging AI, or thinking critically about how to integrate it, in work processes, decision-making, or problem-solving
- Solid understanding of Linux systems, networking, and container security
- Proficiency with infrastructure-as-code tools like Terraform and Ansible
- 4+ years of experience in an SRE, DevOps, or cloud infrastructure role
- 4+ years of programming/scripting experience in Python, Go, Bash, and JavaScript
- 4+ years of experience in Linux system administration, with deep knowledge of Linux systems
- 4+ years of experience operating and scaling Kubernetes in production environments
- Knowledge of database technologies including MySQL, MariaDB, and PostgreSQL
- Expertise with GitLab CI/CD and modern software delivery practices
- Experience with observability stacks (Prometheus, Grafana, OpenTelemetry, etc.)
- Experience with cloud technologies, including Azure, AWS, and GCP
- Ability to leverage AI technologies to enhance system reliability, automate operational tasks, and optimize performance monitoring and incident response processes
- Team-first attitude and an uncompromising attention to detail
- Excellent collaboration and communication skills
- Experience developing on the ServiceNow Platform is a bonus!
Benefits
- Base pay of $126,700 - $215,400
- Equity (when applicable)
- Variable/incentive compensation
- Benefits
- 401(k) Plan with company match
- Employee Stock Purchase Plan (ESPP)
- Matching donations
- Flexible time away plan
- Family leave programs