SiFi

Sr. Site Reliability Engineer (SRE)

Posted: 24 minutes ago

Job Description

This is a remote position.

About SiFi

SiFi is a rapidly growing B2B Fin-Tech company transforming expense management for businesses in Saudi Arabia. As a licensed EMI from the Saudi Central Bank, we empower companies with innovative tools to simplify finance management.

Position Overview

We are looking for a Senior Site Reliability Engineer (SRE) who will take ownership of the reliability, performance, and scalability of our production systems. You will design, automate, and operate mission-critical environments that include Kubernetes clusters, database disaster recovery, workflow orchestration, and multi-region networking.

This role suits engineers who think deeply about systems, combining infrastructure, automation, and diagnostic reasoning to drive operational excellence.

Primary Responsibilities

Reliability, Availability & Infrastructure

  • Maintain and evolve multi-region cloud infrastructure using Terraform-based Infrastructure as Code (IaC).
  • Operate and optimize Kubernetes (OKE) clusters running microservices, data pipelines, and workflow orchestration.
  • Manage SQL Server backup/restore pipelines, DR testing, and performance optimization.
  • Ensure high availability for .NET and Python applications hosted behind load balancers and WAF.
  • Design and maintain cross-network connectivity (DRGs, LPGs, VCNs, subnets, and NSGs).

Observability & Automation

  • Build and maintain a centralized orchestration platform integrated with alerting and notification systems.
  • Develop self-healing, monitoring, and auto-remediation scripts for infrastructure and databases.
  • Implement logging, metrics, and tracing pipelines.
  • Automate recurring operational tasks using Python, Bash, and PowerShell to reduce manual effort and improve reliability.

DevOps, CI/CD & Security

  • Manage GitHub Actions and Octopus Deploy pipelines for backend and data services.
  • Apply strong security principles: least privilege, network segmentation, secure credentials, and encrypted communications.
  • Promote GitOps and Infrastructure-as-Code practices to ensure repeatable and traceable deployments.
  • Collaborate with developers to embed reliability and resilience into every release.

Collaboration & Incident Management

  • Lead incident response, run blameless post-mortems, and turn findings into lasting improvements.
  • Partner closely with engineering teams to drive design and code-level reliability improvements.
  • Conduct capacity planning, cost optimization, and system tuning for performance and scalability.
  • Mentor engineers in automation, observability, and root-cause analysis best practices.

Troubleshooting Mindset & Diagnostic Thinking

We value engineers who:

  • Approach issues systematically and validate assumptions with data.
  • Treat incidents as opportunities to improve design and automation.
  • Rely on metrics, logs, and tracing rather than guesswork.
  • Communicate findings clearly and document learnings for future reference.
  • Continuously refine how problems are detected, escalated, and resolved.
