Our company is seeking an experienced SRE specialist to join its SRE Practice team. The successful candidate will be responsible for implementing and maintaining a comprehensive reliability solution for on-premises and cloud applications and services.
Requirements
- Solid expertise in IT reliability.
- Extensive experience with application performance management, IT infrastructure monitoring, and user experience monitoring.
- Technical leadership experience.
- Enterprise application, systems, and network monitoring expertise for on-premises and cloud applications.
- Hands-on experience instrumenting applications end to end with Dynatrace, Elasticsearch, and ServiceNow, with minimal supervision.
- Solid knowledge of AIOps, anomaly detection, and event correlation solutions.
- Comfortable with scripting or programming languages (Java, C++, Go, Python).
- Experience with OpenTelemetry.
- Good knowledge of infrastructure protocols to gather element-level event data.
- Good knowledge of open-source monitoring technologies.
- Proficient with data lifecycles and aggregation, reporting, and web dashboards.
- Proficient in ITIL event management, with a good grounding in ITIL foundational concepts.
- Hands-on experience with continuous integration tools.
- Deep knowledge of reliability and Site Reliability Engineering (SRE).
- Infrastructure and Networking: The candidate should be familiar with advanced networking tools like F5, Citrix, and Cloudflare, and be able to design custom hardware and software networking solutions.
- Troubleshooting: The candidate should be proficient with advanced log analysis tools like Dynatrace and be able to develop and maintain automated testing and deployment tools.
- Cloud Computing and Virtualization: The candidate should have hands-on experience with AWS, GCP, Azure, VirtualBox, Docker, and Kubernetes, as well as advanced cloud infrastructure tools like Terraform, Puppet, or Chef.
- Distributed Systems and Scalability: The candidate should have knowledge of advanced distributed systems tools like Kubernetes and service meshes, and of distributed data platforms like Cassandra, Hadoop, or Spark.
- Security and Compliance: The candidate should have knowledge of advanced security tools like HashiCorp Vault, AWS KMS, or Azure Key Vault, as well as security best practices, including firewalls, encryption, and SSL/TLS.
Benefits
- Financial rewards program that recognizes success
- Industry-leading Employee Share Purchase Plan; 50% of net shares purchased are matched
- Extensive flex pension and benefits package, with access to virtual healthcare
- Flexible work arrangements
- Possibility to purchase up to 5 extra days off per year
- Annual wellness account that promotes an active and healthy lifestyle
- Access to tools and resources that support physical and mental health, embracing change, and connecting with colleagues
- Dynamic workplace learning ecosystem complete with learning journeys, interactive online content, and inspiring programs
- Inclusive employee-led networks to educate, inspire, amplify voices, build relationships and provide development opportunities