Job Description

Position: Head of Cloud Engineering

Overview: A high-impact leadership role overseeing the design, development, and operation of a next-generation AI Cloud Platform built on top of high-performance compute and data center infrastructure. This position is responsible for transforming raw GPU compute resources into a developer-centric, scalable cloud environment optimized for AI applications, from model training and inference to full-stack deployment. The successful candidate will drive the platform vision, ensuring frictionless adoption by AI teams, enterprise customers, and developers.

Key Responsibilities:

  • Platform Architecture & Engineering: Lead the development of a robust AI Cloud layer, integrating GPU clusters, orchestration systems, and application deployment tools into a unified platform.
  • Infrastructure Integration: Align cloud architecture with underlying compute and data center layers, ensuring performance, scalability, and reliability.
  • AI Workload Enablement: Build services and frameworks to support AI/ML workloads, including training pipelines, inference endpoints, and secure, multi-tenant environments.
  • Cross-functional Leadership: Collaborate closely with engineering, product, and infrastructure leadership. Drive cloud platform initiatives from conceptualization through to production deployment.

Core Qualifications:

  • Cloud Platform Expertise: Proven experience architecting and operating cloud infrastructure across public cloud, high-performance computing (HPC), or GPU-native environments. Expertise in Kubernetes, container orchestration, service meshes, API gateways, and distributed systems.
  • AI Infrastructure Knowledge: Deep understanding of GPU-based cloud environments and AI compute layers, including technologies like MIG partitioning, Triton Inference Server, Ray, Slurm, and ML orchestration stacks. Familiarity with high-speed interconnects (RDMA, NVLink) and scalable storage systems.
  • AI Application Enablement: Experience creating developer-ready environments for machine learning workflows, including model deployment, MLOps integration, and multi-tenant runtime security.
  • Leadership & Execution: Demonstrated success leading cloud engineering, platform, or SRE teams at scale. Track record of delivering complex cloud platforms from ideation through implementation.

Job Application Tips

  • Tailor your resume to highlight relevant experience for this position
  • Write a compelling cover letter that addresses the specific requirements
  • Research the company culture and values before applying
  • Prepare examples of your work that demonstrate your skills
  • Follow up on your application after a reasonable time period