BMA Group Global

Director of Site Reliability Engineering (SRE)

Posted: 4 days ago

Job Description

About the Role

We are looking for a highly strategic Director of Site Reliability Engineering (SRE) to lead infrastructure reliability, observability, security operations, and business continuity across hybrid environments. This role ensures maximum uptime, proactive incident prevention, and robust service health for mission-critical data and analytics platforms.

Key Responsibilities

  • Define and lead the SRE strategy, establishing SLAs, SLOs, and SLIs across platforms.
  • Oversee hybrid infrastructure reliability including compute, networking, virtualization, and cloud systems.
  • Lead teams across IAM, RBAC/ABAC, observability, Kubernetes administration, and disaster recovery.
  • Supervise 24×7 SRE operations, major incident response, and root-cause analysis.
  • Partner with DevOps, Data Operations, Product, and Security leadership to embed reliability by design.
  • Implement automation, predictive alerting, telemetry, and capacity planning.
  • Ensure compliance with NIST, ISO, HIPAA, GxP, and other regulatory frameworks.

Qualifications

  • 10+ years in IT operations or infrastructure engineering.
  • 5+ years leading SRE or reliability teams in hybrid-cloud environments.
  • Deep expertise in Kubernetes, cloud platforms (Azure, AWS, GCP), IaC, CI/CD, and enterprise observability tools.
  • Strong leadership abilities with a track record in SOC operations, vulnerability management, and security incident response.
  • Excellent communication and executive stakeholder management skills.

Job Application Tips

  • Tailor your resume to highlight relevant experience for this position
  • Write a compelling cover letter that addresses the specific requirements
  • Research the company culture and values before applying
  • Prepare examples of your work that demonstrate your skills
  • Follow up on your application after a reasonable time period
