Service Delivery and Incident Response Lead
Posted: 4 days ago
Job Description
OpsWerks is a technical consulting company specializing in operational services for the high-tech industry. We help platform and infrastructure teams operate multi-cloud environments, execute complex migrations, and enable seamless app deployments.

About The Job
We’re looking for a Service Delivery & Incident Response Lead who thrives at the intersection of people leadership, operational reliability, and continuous improvement. You’ll lead engineers supporting mission-critical cloud and infrastructure environments, ensuring stability, responsiveness, and operational excellence 24×7.
This role combines real-time incident command with team development, process optimization, and cross-functional collaboration to keep our systems and our team performing at their best.

Your Role

People & Team Leadership
- Lead, coach, and mentor IT engineers to build strong technical and leadership capabilities.
- Set clear performance goals aligned with our Beliefs, Vision, Mission, Methods (BVMM).
- Conduct 1:1s, performance reviews, and career growth discussions.
- Foster a culture of ownership, collaboration, and continuous learning.
- Maintain balanced workloads, shift coverage, and clear succession plans to sustain healthy 24×7 operations.

Service Operations & Reliability
- Oversee daily service health, capacity, and reliability across all supported environments.
- Ensure compliance with operational KPIs through proactive planning and improvement.
- Balance demand vs. capacity and manage shift coverage to prevent burnout.
- Partner with engineering teams to maintain runbooks, knowledge bases, and escalation paths.
- Drive automation and workflow optimization to reduce manual overhead.
- Use data insights to guide decisions and improvements.

Incident & Problem Management
- Lead end-to-end incident response, triage, communication, and resolution in real time.
- Act as Incident Commander for high-impact events across a global environment.
- Track and improve metrics like MTTD, MTTM, and MTTR.
- Champion blameless Post-Incident Reviews (PIRs) and translate learnings into long-term system and process improvements.

Strategic & Cross-Functional Impact
- Represent in customer reviews, operational syncs, and briefings.
- Collaborate with SREs, product owners, and partner engineers to align priorities and reliability goals.
- Contribute to frameworks and governance initiatives.
- Lead service onboarding/off-boarding and strengthen operational readiness checkpoints.
- Identify and close systemic operational gaps through process and tool improvements.

Your Qualifications
- Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related discipline.
- 3+ years in Service Delivery, Incident Response, or Operations Leadership within enterprise-scale, 24×7 environments.
- Proven experience managing technical teams, driving performance, and leading through critical situations.
- Strong grounding in ITSM / ITIL principles (Incident & Problem Management).
- Familiarity with cloud, distributed systems, or enterprise infrastructure.
- Skilled in monitoring, alerting, and ticketing tools (e.g., PagerDuty, Datadog, Grafana, Splunk, ServiceNow).

Core Competencies
- People and Performance Leadership
- Incident Command and Escalation Management
- Analytical and Problem-Solving Skills
- Communication and Decision-Making Under Pressure
- Root Cause and Post-Incident Analysis
- Operational Planning and Service Governance
- Stakeholder and Partner Management
- IT Service Management (Incident & Problem Management)
- Observability, Monitoring, and Automation Tools
- Passion for People Development, Operational Discipline, and Continuous Improvement

Good to Have
- ITIL V3 or V4 certification
- AWS Certified SysOps Administrator
- SRE Foundation or Crisis/Incident Management certifications
- Background in SRE practices and operational frameworks that promote reliability and automation

What You’ll Help Us Maintain
- Enterprise-grade reliability: Ensuring highly available, resilient systems powering global business operations.
- Customer-grade experience: Seamless, always-on access to applications, cloud workloads, and core services.