Royal Caribbean Group

Director, Digital Reliability Engineering

Director, Digital Reliability Engineering at Royal Caribbean Group (Miami, FL). Lead global SRE strategy, manage incident, problem, and change processes via ServiceNow, and oversee reliability of a hybrid technology stack. 15+ years of operations experience, including 8+ years in leadership. ITIL/ServiceNow certification preferred.

Direct Hire
Job Level:
Expert/Leadership
ServiceNow Role Type:
ServiceNow Modules:
DevOps
IT Asset Management
IT Service Management
Incident Management
Problem Management
Security Operations
ServiceNow Certifications (nice to have):
Certified Implementation Specialist - IT Service Management

Job description

Posted on: November 14, 2025

The Director, Digital Reliability Engineering at Royal Caribbean Group leads the global Technology Operations portfolio for the Digital organization, ensuring the reliability, availability, and performance of guest-facing pre-cruise platforms.

Responsibilities

  • Define and execute the global SRE strategy for Digital Operations, aligning with business priorities and Royal Caribbean’s long-term technology vision.
  • Build and nurture a culture of reliability, resilience, and continuous improvement across all digital platforms.
  • Drive initiatives to maintain zero downtime by rapidly addressing issues, conducting root cause analysis, and implementing remediations.
  • Lead global site reliability and operations teams across onshore, nearshore, and offshore locations while actively engaging in day-to-day challenges.
  • Actively participate in major incident response, including log analysis, recovery validation, and executive updates.
  • Lead problem bridges, collaborating across technical and functional teams for timely issue resolution.
  • Partner with engineers to diagnose, troubleshoot, and resolve critical issues in real time, demonstrating technical credibility.
  • Strengthen ITSM processes (Incident, Problem, Change, Major Incident) using tools like ServiceNow, PagerDuty, and JIRA.
  • Lead engineering support for production issue remediation, ensuring timely root-cause analysis, resolution, and prevention of recurring problems.
  • Manage and prioritize ongoing maintenance activities, patches, upgrades, and operational improvements across the digital technology stack.
  • Establish strong feedback loops with product and engineering teams so that recurring issues and operational pain points are systematically eliminated.
  • Work directly with teams to ensure the reliability of a hybrid technology stack spanning: Mobile, Web, Backend Services, Commerce, and Cloud Infrastructure.
  • Champion observability and performance practices, leveraging platforms such as Splunk, Dynatrace, Prometheus, and Quantum Metric or other real user monitoring (RUM) tools.
  • Promote automation, chaos engineering, and AI-driven anomaly detection to strengthen system resilience.
  • Guide teams in Infrastructure as Code and modern operational tooling.
  • Oversee all environment activities, including new code deployments.
  • Recruit, mentor, and develop global SRE talent while modeling hands-on technical engagement.
  • Manage vendor and partner teams with the same “roll-up-your-sleeves” approach as internal teams.
  • Deliver executive-ready dashboards and insights to communicate the health of digital operations.
  • Own and manage the Operational Expenditure (OPEX) budget for Digital Operations, ensuring efficient allocation of resources while balancing reliability, scalability, and cost optimization.
  • Provide transparency into operational spend through regular reporting and executive updates.
  • Partner with Finance and Procurement to negotiate, track, and optimize vendor contracts and third-party services.
  • Ensure budget discipline while identifying opportunities for automation and efficiency improvements to reduce operational costs without compromising reliability.

Benefits

  • Competitive compensation and benefits package
  • Excellent career development opportunities
  • Global experience

Requirements Summary

15+ years of experience in technology operations, including 8+ years in global leadership roles. Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.