Zuora

Site Reliability Engineer


Job Description

Costa Rica

Company Overview

At Zuora, we power the world’s shift to Modern Business. We’re helping people and companies subscribe to a better way of doing business, one that’s built on recurring relationships instead of one-time transactions, creating more value for customers, companies, and the planet.

As pioneers of the Subscription Economy, our platform and expertise help the world’s most innovative organizations, from disruptive startups to global enterprises, monetize new business models, nurture long-term subscriber relationships, and optimize their digital experiences.

Join us as we transform industries and shape the future of how businesses grow.

Our Tech Stack: Linux Administration, Python, Docker, Kubernetes, MySQL, Kafka, ActiveMQ, Tomcat App & Web, Oracle, Load Balancers, Redis Cache, Debezium, AWS, WAF, LBs, Jenkins, GitOps, Terraform, Ansible, Puppet, Prometheus, Grafana, OpenTelemetry

The Team & Role

Join Zuora’s high-impact Operations team and help power the backbone of our industry-leading SaaS platform.

In this role, you’ll be at the center of maintaining and enhancing the reliability, scalability, and performance of Zuora’s core systems, ensuring our customers around the world enjoy a seamless experience every time.

We’re looking for an engineer who thrives on solving complex operational challenges, loves building automation-first solutions, and is passionate about driving innovation through AI and modern infrastructure practices.

  • Make a measurable impact: Your work directly affects system uptime, performance, and customer satisfaction across Zuora’s global platform.
  • Build the future of operations: Shape how we leverage AI/ML for predictive monitoring, self-healing systems, and intelligent automation.
  • Collaborate across disciplines: Partner with Product Engineering, Customer Support, Deal Desk, Global Services, and Sales to deliver a world-class, customer-centric operational model.
  • Work with cutting-edge tech: From Kubernetes to Kafka, Terraform to OpenTelemetry, you’ll use (and improve) the tools that define modern cloud infrastructure.

This is a hybrid position, so you’ll work both remotely and in the office.

What You’ll Do

  • Design and implement intelligent automation for infrastructure lifecycle management, including self-healing, anomaly detection, and automated remediation using IaC and AI-driven tooling.
  • Apply AI/ML techniques for predictive monitoring and proactive performance optimization to prevent outages before they happen.
  • Lead complex incident response and root cause analysis (RCA) efforts, embedding automation and learning into postmortems.
  • Identify and remove reliability bottlenecks using dynamic scaling, telemetry instrumentation, and automated tuning.
  • Continuously enhance runbooks and playbooks by integrating machine learning insights and automating manual tasks.
  • Stay on the cutting edge of AIOps, distributed systems, and cloud-native reliability practices, and bring those learnings to influence strategic engineering decisions.

Your Experience

  • Strong hands-on experience in Linux Administration and Python Development.
  • Experience working with Agentic AI or multi-agent frameworks to amplify operational capabilities.
  • Deep expertise with Docker and Kubernetes, managing scalable, high-availability environments.
  • Familiarity with Kafka, ActiveMQ, MySQL, Oracle, Redis, and modern caching/messaging systems.
  • Understanding of AI/ML-based anomaly detection and predictive operations.
  • Proven ability in incident management, RCA, and building systems that prevent recurrence.
  • Experience designing and maintaining CI/CD pipelines, with a strong focus on observability and reliability.
  • Proficiency with Prometheus, Grafana, and OpenTelemetry for real-time monitoring and anomaly detection.
  • A continuous learning mindset and a passion for automation, innovation, and operational excellence.
  • 1+ years of experience in a SaaS or cloud-native environment.

Nice to have

  • Certifications:
      • Red Hat Certified System Administrator (RHCSA)
      • AWS / Azure / GCP Certifications
      • Python Institute PCAP (Certified Associate in Python Programming)
      • Docker Certified Associate (DCA) or Certified Kubernetes Administrator (CKA)
      • SRE or advanced operations certifications
  • Experience with Jenkins, Terraform, and advanced infrastructure-as-code practices.

#ZEO

Life at Zuora

At Zuora, we’re constantly learning, innovating, and growing. Our people, known as ZEOs, are empowered to take ownership, challenge the status quo, and make a lasting impact.

We collaborate deeply, think boldly, and support one another to make what’s next possible for our customers, our communities, and each other.

We Offer

  • Competitive compensation, bonus opportunities, and retirement programs
  • Comprehensive medical, dental, and vision coverage
  • Generous, flexible time off
  • Paid holidays, wellness days, and a company-wide year-end break
  • 6 months of fully paid parental leave
  • Learning & development stipend
  • Opportunities to give back, including volunteer time and donation matching
  • Mental wellbeing resources and support

(Benefits may vary by location; details will be shared during the interview process.)

Location & Work Arrangements

Zuora teams are empowered to design flexible, intentional ways of working. Whether remote, hybrid, or in-office, we balance flexibility with accountability: to each other, our customers, and our mission.

For most roles, you’ll have the freedom to work where you’re most productive while staying connected to your team and the broader ZEO community.

Our Commitment to an Inclusive Workplace

Think, be and do you! At Zuora, different perspectives, experiences and contributions matter. Everyone counts.
Zuora is proud to be an Equal Opportunity Employer committed to creating an inclusive environment for all.

Zuora does not discriminate on the basis of, and considers individuals seeking employment with Zuora without regard to, race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics.

We encourage candidates from all backgrounds to apply. Applicants in need of special assistance or accommodation during the interview process or in accessing our website may contact us by emailing assistance@zuora.com.
