The Guardian Technology Operations Team is seeking a TechOps Engineer to work closely with Development and DevOps teams to ensure that systems are reliable, scalable, and performant. The role combines software engineering and IT operations to manage infrastructure and build highly reliable software systems.
Requirements
- 5+ years of experience managing IT systems and OS environments, especially Linux
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent work experience
- Experience with infrastructure as code (IaC) tools (e.g., Terraform, CloudFormation)
- Flexibility to work rotational shifts in a 24x7x365 operation
- Must be comfortable working in a highly critical, fast-paced environment with shifting priorities
Responsibilities
- Automate operational tasks, such as deployments, changes, and incident response
- Use Infrastructure as Code (Terraform) and other configuration management tools to implement changes in the cloud environment
- Provide operational application deployment support using standardized code deployment pipelines
- Strong experience running and managing Linux and Windows server/compute environments, with Linux being primary
- Collaborate with DevOps and Application Support teams to improve system reliability and performance
- Actively participate in production incident response and root cause analysis, and implement preventative measures
- Monitor system performance, identify potential issues, and proactively address them
- Participate in on-call rotation to provide 24/7 support for production systems
- Document processes, procedures, and system configurations
- Good exposure to running and managing network, middleware, database, application, and InfoSec environments
- Excellent communication skills
- Good exposure to situation and outage management
- Build a good understanding of business applications and their dependencies on IT infrastructure
- Monitor the ServiceNow ticket queue and event monitoring tools (Zenoss) for incoming incidents & requests
- Initiate, attend, or participate in outage response and work towards resolution
- Report defects and suggest product/infrastructure enhancements to improve stability and automation