Coupang is looking for a Staff Reliability Engineer to ensure stable IT services by operating monitoring systems and processes for IT infrastructure and applications. The role involves defining and driving observability strategy, leading the design and implementation of observability platforms, and conducting gap assessments in existing monitoring setups.
Responsibilities
- Define and drive the observability strategy and roadmap
- Establish a mature observability framework
- Advocate for observability best practices across engineering, operations, and product teams
- Lead the design, implementation, and optimization of observability platforms
- Evaluate and onboard new tools and technologies
- Ensure scalable and resilient monitoring architectures
- Conduct gap assessments in existing monitoring setups
- Implement automated solutions to address low-hanging fruit
- Continuously refine monitoring configurations
- Build and maintain end-to-end visibility across infrastructure, network, applications, and user journeys
- Integrate observability tools with incident management, ticketing, and reporting systems
- Develop and enforce tagging strategies, metrics standards, and log enrichment practices
- Partner with DevOps, SRE, and application teams
- Provide technical guidance and training to teams
- Support incident response and post-mortem analysis
- Leverage observability data to generate actionable insights
- Create dashboards and reports that provide meaningful visibility to stakeholders
Benefits
- Generous Paid Time Off
- 401(k) Matching
- Retirement Plan
- Visa Sponsorship
- Four Day Work Week
- Generous Parental Leave
- Tuition Reimbursement
- Relocation Assistance