ServiceNow

Senior Staff Machine Learning Engineer - DevOps/Site Reliability Engineer

Join ServiceNow in Santa Clara as a Senior Staff Machine Learning Engineer. Leverage ServiceNow skills to enhance AI workloads, SRE practices, and DevOps. Benefits include generous PTO, 401k matching, and a four-day work week.

The Mothership

Senior

ServiceNow Modules: DevOps, Predictive Intelligence

Job description

Posted on: August 7, 2025

The Senior Staff Machine Learning Engineer - Site Reliability Engineer at ServiceNow is responsible for designing, developing, and implementing the infrastructure, platform, deployment, and observability features that power AI workloads. The role collaborates with researchers, AI engineers, and infrastructure teams, and contributes to the continuous improvement of the SRE practice.

Requirements

8+ years of experience with infrastructure and platform operations, deployments, SRE, and DevOps, with a continued focus on improving platform health.
6+ years of experience operating highly available distributed workloads on Kubernetes following a DevOps approach.
6+ years of development experience with Python, GoLang, Java, or similar languages.
Experience with DevOps tooling (e.g., Helm, Ansible, Kubernetes, Prometheus, Splunk, GitLab CI).
Strong working experience operating distributed systems built on Linux and J2EE.
Experience with software-defined networking, infrastructure as code, and configuration management.
Experience building software for compliance and security in regulated environments.
Ability to drive outcomes in projects with material technical risk.
Proficiency in prompt engineering and developing LLM-based features.
Experience with methods of training and fine-tuning large language models, such as distillation, supervised fine-tuning, and policy optimization.
Experience using AI productivity tools such as Cursor and Windsurf.

Benefits

Generous Paid Time Off
401k Matching
Retirement Plan
Visa Sponsorship
Four Day Work Week
Generous Parental Leave
Tuition Reimbursement
Relocation Assistance

Requirements Summary

8+ years of experience with infrastructure and platform operations, 6+ years of experience operating distributed workloads on Kubernetes, 6+ years of development experience with Python, GoLang, Java, or similar languages, and the ability to drive outcomes in projects with material technical risk.
