Job Description

Mission

Join the engineering team to support our cloud migration roadmap to a modern, scalable, and compliant ML infrastructure on Google Cloud Platform (GCP). You will learn how to operate production-grade ML systems under the mentorship of a Lead MLOps Engineer, while contributing directly to infrastructure modernization, CI/CD pipelines, and MLOps workflows.

Responsibilities

  • Support our cloud migration roadmap to modern GCP architecture.
  • Implement security & performance improvements: Cloud Armor, CDN, KMS encryption, IAM policies.
  • Build observability infrastructure: Prometheus, Grafana dashboards, distributed tracing, SLO monitoring.
  • Deploy containerized workloads to Cloud Run and GKE Autopilot with autoscaling and GPU support.
  • Set up async messaging with Pub/Sub queues, idempotency, and dead-letter queues.
  • Build Infrastructure as Code with Terraform/Terraspace and CI/CD pipelines (GitHub Actions).
  • Optimize storage & costs: lifecycle policies, tiered storage, resource labeling, budget monitoring.
  • Support compliance initiatives: audit logs, retention policies, least-privilege access (FDA/HIPAA).
  • Write clear documentation and operational runbooks for all infrastructure components.

Stack You’ll Work With

  • Cloud: GCP (Cloud Run, GKE Autopilot, Cloud Armor, Pub/Sub, GCS, Cloud CDN, KMS)
  • IaC & CI/CD: Terraform, Terraspace, GitHub Actions, OPA/Conftest
  • Languages: Python, Node.js, Bash, YAML/HCL
  • Containers: Docker, Kubernetes
  • Database: MongoDB, Cloud SQL

Ideal Profile

  • 0–2 years of experience in DevOps or Cloud.
  • Basic understanding of cloud services (GCP preferred, AWS/Azure transferable).
  • Familiarity with Linux, Docker, and Git.
  • Curious about ML infrastructure and automation.
  • Eager to learn under mentorship from a Lead MLOps Engineer.
  • Good communication and documentation skills in English.

What You’ll Learn

  • Design and deploy production-grade infrastructure on GCP.
  • Build CI/CD pipelines and Infrastructure as Code with Terraform.
  • Implement observability with SLOs, distributed tracing, and monitoring dashboards.
  • Manage MLOps workflows (experiment tracking, model versioning, GPU workloads).
  • Apply cloud security best practices (IAM, encryption, compliance).
  • Optimize costs and performance in a regulated medical AI environment.

