ServiceNow

Senior Staff Machine Learning Engineer - DevOps/Site Reliability Engineer

Join ServiceNow in Santa Clara as a Senior Staff Machine Learning Engineer. Leverage AI, Kubernetes, and DevOps skills to enhance platform observability.

ServiceNow Role Type:
ServiceNow Modules:
DevOps
Predictive Intelligence
ServiceNow Certifications (nice to have):

Job description

Posted on: June 4, 2025

The Senior Staff Machine Learning Engineer - DevOps/Site Reliability Engineer will design, develop, and implement infrastructure, platform, deployment, and observability features for AI workloads. Candidates must have experience operating LLMs on NVIDIA GPUs, as well as with prompt engineering and fine-tuning large language models.

Requirements

  • Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving.
  • Prompt engineering: proficient in prompt engineering and in developing LLM-based features.
  • Fine-tuning: experience with methods for training and fine-tuning large language models, such as distillation, supervised fine-tuning, and policy optimization.
  • 8+ years of experience with infrastructure and platform operations, deployments, SRE, and DevOps, with a continued focus on improving platform health.
  • 6+ years of experience operating highly-available distributed workloads on Kubernetes following a DevOps approach.
  • 6+ years of development experience with Python, GoLang, Java, or similar languages.
  • Experience with DevOps tooling (e.g. Helm / Ansible / Kubernetes / Prometheus / Splunk / GitLab CI).
  • Strong working experience operating distributed systems built on Linux and J2EE.
  • Experience with software-defined networking, infrastructure as code, and configuration management.
  • Experience building software for compliance and security in regulated environments.
  • Ability to drive outcomes in projects with material technical risk.

Benefits

  • Base pay, plus equity (when applicable), variable/incentive compensation and benefits
  • Health plans, including flexible spending accounts
  • 401(k) Plan with company match
  • ESPP
  • Matching donations
  • Flexible time away plan and family leave programs

Requirements Summary

8+ years of experience with infrastructure and platform operations, 6+ years of experience operating distributed workloads on Kubernetes, and 6+ years of development experience with Python, GoLang, Java, or similar languages.