Job Description

We are currently seeking an experienced Lead AI DevOps/SRE to join our team.

In this pivotal role, you will collaborate closely with data scientists and software developers to ensure seamless integration and optimize the operational efficiency of our AI deployments. Your expertise will be key to deploying, maintaining, and scaling our cutting-edge AI solutions, encompassing LLMs and RAG systems. As a key team member, you will spearhead both traditional DevOps responsibilities and innovative approaches to MLOps. Your proactive involvement will be essential in driving the success of our AI initiatives and maximizing their impact across the organization.

Responsibilities

  • Implement and maintain CI/CD pipelines for AI and machine learning projects, ensuring robust deployment strategies and continuous integration
  • Monitor and ensure the reliability, availability, and performance of AI applications, particularly those involving LLMs and RAG
  • Collaborate with AI research teams to operationalize machine learning models and systems efficiently
  • Develop and enforce best practices for version control, configuration management, and testing of AI-driven software solutions
  • Utilize MLOps tools such as Kubeflow, MLflow, or TensorFlow Extended (TFX) to streamline the machine learning lifecycle from experimentation to production
  • Implement monitoring solutions that track both system metrics and model performance to facilitate proactive issue resolution
  • Participate in on-call rotations to support the operational health of critical systems, employing SRE principles to meet service-level objectives (SLOs) and reduce downtime

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • Proven experience as a DevOps Engineer or SRE, with a strong background in software development and automation
  • Expertise in deployment and management of LLMs, including technologies like RAG
  • Proficiency in CI/CD tools (Jenkins, GitLab CI, CircleCI) and infrastructure as code (Terraform, Ansible)
  • Solid knowledge of container orchestration technologies (Kubernetes, Docker)
  • Familiarity with MLOps tools and practices to support machine learning lifecycle management

Nice to have

  • Experience with cloud services (AWS, GCP, Azure), particularly in AI/ML deployments
  • Background in monitoring tools like Prometheus, Grafana, and ELK stack
  • Understanding of Python, particularly in data science and machine learning contexts
  • Certification in Kubernetes, AWS/GCP/Azure, or similar technologies

We offer

We connect like-minded people:

  • Delivering innovative solutions to industry leaders, making a global impact
  • Enjoyable working environment, whether it is the vibrant office or the comfort of your own home
  • Opportunity to work abroad for up to two months per year
  • Relocation opportunities within our offices in 55+ countries
  • Corporate and social events

We invest in your growth:

  • Leadership development, career advising, soft skills and well-being programs
  • Certifications, including GCP, Azure and AWS
  • Unlimited access to LinkedIn Learning and Get Abstract
  • Free English classes with certified teachers
  • Discounts at local language schools, including online courses for the Kazakh language

We cover it all:

  • Participation in the Employee Stock Purchase Plan
  • Monetary bonuses for engaging in the referral program
  • Medical & family care package
  • Six trust days per year (sick leave without a medical certificate)
  • Coverage of psychology sessions of your choice
  • Benefits package (sports activities, a variety of stores and services)

Immerse yourself in our collaborative culture by working on-site at our office in Astana, Almaty or Karaganda.

Unlock the potential of remote work in Kazakhstan, giving you the flexibility to work from home or access our offices in Astana, Almaty or Karaganda.

EPAM is a team of technologists and innovators united by a passion for technology. In Kazakhstan, we operate across all cities with offices in Astana, Almaty, and Karaganda and work with the world's leading companies from different industries. In 2023, EPAM received the Export Excellence Award at the esteemed Digital Bridge Awards, showcasing our commitment to excellence and innovation.

Job Application Tips

  • Tailor your resume to highlight relevant experience for this position
  • Write a compelling cover letter that addresses the specific requirements
  • Research the company culture and values before applying
  • Prepare examples of your work that demonstrate your skills
  • Follow up on your application after a reasonable time period
