Job description

Posted on: June 18, 2025
Walmart's Enterprise Business Services (EBS) is a powerhouse of seven exceptional teams delivering world-class technology solutions and services that make a profound impact at every level of Walmart. Joining EBS means embarking on a journey of limitless growth, relentless innovation, and the chance to set new industry standards that shape the future of Walmart.
Requirements
- Incident Triage, Escalation and Resolution: Triage site-impacting production issues on large-scale enterprise systems by quantifying impact, severity, and urgency; analyzing systems for quick remediation; engaging the right teams for recovery (reducing MTTE, Mean Time to Engage); and focusing on immediate restoration (reducing MTTR, Mean Time to Restore).
- Alerting, Monitoring and Log Analysis: Analyze monitoring graphs and alerts to identify the systems causing production impact (reducing MTTD, Mean Time to Detect), using tools such as Grafana, Prometheus, MMS, ServiceNow, JIRA, Dynatrace, and Splunk.
- Enhance Alerting Solutions: Design and implement JavaScript that integrates alerting tools with the service API endpoints of tools such as ServiceNow, Spotlight, Splunk, and xMatters.
- Disaster Recovery Planning: Requires knowledge of disaster recovery procedures and processes, and of enterprise disaster recovery systems.
- Performance and Optimization: Requires knowledge of Unix/Linux performance tuning; Java/Node.js/Tomcat/Apache tuning and optimization; and chaos engineering tools that apply established criteria (for example, probability of failure, frequency of failure) to measure site reliability.
- Work on Product Enrichment & Content Services projects at Walmart: Develop enterprise monitoring and use tooling such as Grafana and Splunk to improve visibility, proactively detect issues, and restore system availability.
- Develop Tools and Support: Design and develop solutions for widespread internal communications in support of cloud applications, and workflows for infrastructure availability issues across various internal applications, using programming languages such as Java, JavaScript (React, Node.js), Python, and shell scripting, along with technologies such as Prometheus and database query languages.
Benefits
- Beyond our great compensation package, you can receive incentive awards for your performance. Other great perks include 401(k) match, stock purchase plan, paid maternity and parental leave, PTO, multiple health plans, and much more.
- Competitive pay as well as performance-based bonus awards.
- 401(k) match, stock purchase and company-paid life insurance.
- PTO (including sick leave), parental leave, family care leave, bereavement, jury duty, and voting.
- Short-term and long-term disability, company discounts, Military Leave Pay, adoption and surrogacy expense reimbursement, and more.
Requirements Summary
5+ years of hands-on experience in SRE, Operations, and Development, including experience with JavaScript, Java, RESTful services, Git, Maven, Jenkins, DevOps, containerization, Docker, Kubernetes, Azure, Google Cloud, Kafka, Azure Cosmos, Azure SQL, Mega cache, CI/CD, Prometheus, Grafana, and Splunk.