Tuesday, October 28, 2025
Smartbrain.io

Monitoring and Observability Engineer

Posted: 15 hours ago

Job Description

Full-time | Remote | Grafana | Python

This role involves designing, implementing, and managing comprehensive monitoring solutions using Prometheus, Grafana, SNMP-Exporter, Streaming Telemetry, OpenTelemetry, and related technologies.

Responsibilities

  • Design, implement, and manage Prometheus-based monitoring solutions, including configurations and alert rules.
  • Develop and maintain interactive, visually appealing Grafana dashboards.
  • Configure SNMP modules and jobs to scrape SNMP metrics efficiently across different network technologies.
  • Use Git fluently: clone working branches, develop, and commit to the main branch, or follow another workflow that demonstrates a strong command of Git.
  • Identify and onboard new metrics from various systems and applications, developing data pipelines for metrics collection and storage.
  • Optimize and scale monitoring environments to handle large volumes of metrics and ensure comprehensive monitoring coverage.
  • Implement and manage Streaming Telemetry solutions for real-time data collection and monitoring.
  • Integrate and manage OpenTelemetry for comprehensive tracing and observability across services.
  • Troubleshoot and resolve issues related to data collection, monitoring configurations, and dashboard performance.
  • Work with DevOps, development, and operations teams to ensure applications and infrastructure are properly instrumented.
  • Document configurations and procedures, and provide training to team members and stakeholders.

Skills

  • Familiarity with network monitoring tools and practices.
  • Extensive experience with Prometheus and related technologies (Alertmanager, Pushgateway, etc.).
  • Strong knowledge of time-series databases and monitoring concepts.
  • Proficiency in writing Prometheus queries (PromQL).
  • Strong experience with Grafana and its ecosystem.
  • Proficiency in creating and managing Grafana dashboards and panels.
  • Knowledge of data visualization principles and best practices.
  • Familiarity with monitoring and observability tools and practices.
  • Strong knowledge of SNMP protocols and network device management.
  • Experience with SNMP-Exporter and its integration with Prometheus.
  • Strong skills in creating SNMP modules and scrape configs for various network technologies.
  • Strong Git experience.
  • Strong understanding of metrics and monitoring concepts.
  • Experience with metrics collection tools (Prometheus, Telegraf, Collectd, etc.).
  • Experience with Streaming Telemetry solutions for real-time monitoring.
  • Experience with OpenTelemetry for tracing and observability.
  • Familiarity with Linux/Unix systems and scripting languages (Bash, Python).
  • Experience with containerization and orchestration tools (Docker, Kubernetes).

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • 5+ years of experience in monitoring and observability roles.
  • Proficiency in tools such as Prometheus, Grafana, PromQL, Alertmanager, Alert Framework, GitHub, SNMP-Exporter, Streaming Telemetry, and OTel.
  • Strong coding and scripting skills.
  • Excellent problem-solving abilities and attention to detail.
  • Strong communication and teamwork skills.

