Neo Group

Senior SRE Engineer

Posted: 3 days ago

Job Description

Come on board with Neo Group! Here's your chance to stir things up in the scene with us. We're not just expanding; we're revolutionising the entire game, mastering profitability with every new venture. But you know what truly fuels our drive? It's people like you.

Neo Group is on the lookout for a Senior SRE Engineer to join our Engineering Department.

Responsibilities:

  • Design, deploy, and maintain observability platforms including Zabbix, Grafana, and the OpenSearch Stack (OpenSearch, Logstash, Kibana)
  • Implement and maintain metrics, logs, traces, and synthetic monitoring across infrastructure and applications
  • Integrate Prometheus, Alertmanager, and OpenTelemetry where applicable to achieve unified observability
  • Maintain monitoring coverage for Linux, network devices, applications, and cloud services
  • Maintain and enhance the overall monitoring and logging infrastructure, including capacity, performance, and reliability
  • Develop meaningful dashboards and alerting logic to ensure timely and actionable incident notifications
  • Optimize alerting systems: reduce noise, tune thresholds, and focus on critical business and technical metrics
  • Improve observability processes and implement predictive failure analysis and early-warning signals
  • Analyze incidents, identify patterns, and drive proactive monitoring improvements
  • Define and maintain KPIs, SLIs, SLOs, and SLA measurement processes in coordination with service owners
  • Enhance reliability through structured incident management and post-mortem analysis
  • Automate deployment and configuration of monitoring components using Ansible and Terraform, following IaC principles
  • Manage configuration templates and Zabbix host provisioning through the same automation tools
  • Leverage APIs and scripting (e.g., Python, Go) for data collection, integrations, and automation
  • Collaborate closely with Developers, System Engineers, DevOps, and IT Operations teams to improve system reliability and reduce MTTR
  • Establish and evolve the Monitoring & Diagnostics foundation for the in-house 24/7 App Support team, including tooling, processes, knowledge base, training, runbooks, and troubleshooting guides
  • Create intelligent, step-by-step troubleshooting instructions to speed up incident resolution

Requirements:

  • 4+ years of experience as an SRE, Monitoring Engineer, or similar role in production environments
  • Advanced Linux user with strong command-line and diagnostic skills
  • Strong understanding of monitoring, logging, and observability concepts (metrics, logs, traces, SLIs/SLOs, alerting)
  • Hands-on experience with at least several of the following: Zabbix, Prometheus, Grafana, Elastic Stack (ELK), Alertmanager, OpenTelemetry
  • Experience managing both cloud-based and on-premise environments
  • Automation skills using Python or Go
  • Proficiency with configuration management / IaC tools (Ansible, Terraform, or similar)
  • Solid grasp of networking principles and protocols (TCP/IP, HTTP, DNS, load balancing, etc.)
  • Experience with CI/CD pipelines (GitLab, Jenkins, or similar)
  • Familiarity with container orchestration (Kubernetes, Rancher)
  • Experience documenting workflows and training support teams
  • Proven skills in incident analysis, pattern recognition, and driving preventive improvements
  • Good communication skills and ability to work with cross-functional teams

Nice to Have:

  • Experience with synthetic monitoring tools and user-experience monitoring
  • Background in capacity planning and performance tuning
  • Advanced knowledge of ML-driven monitoring and predictive analysis
  • Experience with automated incident response (self-healing systems)

Soft Skills:

  • Responsibility, initiative, and strong analytical thinking
  • Ability to collaborate effectively within a team
  • Focus on automation and process improvement
  • Strong documentation and knowledge-sharing skills
  • Capability to diagnose complex incidents and provide actionable insights

Benefits:

  • Enjoy 3 health days to focus on your well-being
  • Take advantage of 25 paid calendar vacation days to explore, relax, and unwind
  • Get a $30 net per month sports compensation to stay active and healthy
  • Benefit from top-notch medical insurance for peace of mind
  • Indulge in a variety of snacks available in the office
  • Join us for exciting corporate events that foster team spirit and fun!
