Capgemini Engineering

Lead/Staff AI Runtime Engineer (Ukraine)

Posted: 1 day ago

Job Description

At Capgemini Engineering, the world leader in engineering services, we bring together a global team of engineers, scientists, and architects to help the world’s most innovative companies unleash their potential. From autonomous cars to life-saving robots, our digital and software technology experts think outside the box as they provide unique R&D and engineering services across all industries. Join us for a career full of opportunities. Where you can make a difference. Where no two days are the same.

Your Client

Our client is at the forefront of revolutionizing AI computing by re-engineering infrastructure at the system level. Its architecture, combined with sophisticated software intelligence, abstraction, and an orchestration layer, enables developers to leverage a diverse array of compute resources, achieving efficient and reliable computing at a fraction of the cost. Founded by industry veterans from Nvidia, Apple, Tesla, Intel, and Zoox, it's shaping the future of AI.

As the Lead/Staff AI Runtime Engineer, you’ll play a pivotal role in the design, development, and optimization of the core runtime infrastructure powering distributed training and deployment of large AI models. This is a hands-on leadership role, ideal for a systems-minded software engineer who thrives at the intersection of AI workloads, runtimes, and performance-critical infrastructure.

Your Role

  • Own the core runtime architecture supporting AI training and inference at scale.
  • Design resilient and elastic runtime features (for example, dynamic node scaling and job recovery) within the custom PyTorch-based stack.
  • Optimize distributed training reliability, orchestration, and job-level fault tolerance.
  • Profile and enhance low-level system performance across training and inference pipelines.
  • Improve packaging, deployment, and integration of customer models in production environments.
  • Design and maintain libraries and services that support the full model lifecycle: training, checkpointing, fault recovery, packaging, and deployment.
  • Implement observability hooks, diagnostics, and resilience mechanisms for deep-learning workloads.
  • Champion best practices in CI/CD, testing, and software quality across the AI Runtime stack.
  • Work cross-functionally with Research, Infrastructure, and Product teams to align runtime development with customer and platform needs.
  • Guide technical discussions, mentor junior engineers, and help scale the AI Runtime team’s capabilities.

Your Profile

  • 8+ years of experience in systems or software engineering, with deep exposure to AI runtime, distributed systems, or compiler/runtime interaction.
  • Experience in delivering PaaS services.
  • Proven experience optimizing and scaling deep-learning runtimes (such as PyTorch, TensorFlow, or JAX) for large-scale training or inference.
  • Strong programming skills in Python and C++; experience with Go or Rust is a plus.
  • Familiarity with distributed training frameworks, low-level performance tuning, and resource orchestration.
  • Experience working with multi-GPU, multi-node, or cloud-native AI workloads.
  • Solid understanding of containerized workloads, job scheduling, and failure recovery in production environments.

Nice to Have

  • Contributions to PyTorch internals or open-source deep learning infrastructure projects.
  • Experience with Intel OpenVINO.
  • Familiarity with LLM training pipelines, checkpointing, or elastic training orchestration.
  • Experience with Kubernetes, Ray, TorchElastic, or custom AI job orchestrators.
  • Background in systems research, compilers, or runtime architecture for high-performance computing (HPC) or machine learning.
  • Start-up experience.
  • Ability to travel to the EU.

What You Will Love About Working Here

  • We care about all our employees and want them to feel as comfortable as possible. That's why we offer health insurance from their first days, regardless of the probationary period.
  • A gift from the company: Christmas holidays from 25 December to 31 December.
  • Cooperation with the Superhumans Center and Veteran Hub. Capgemini Engineering has supported the launch of the psychological rehabilitation department at Superhumans. Our team also donated over UAH 500,000 toward prosthetics for three Ukrainian defenders. Currently, we support psychological counseling provided by the Veteran Hub, and, with the Hub's assistance, we have implemented an internal policy that makes the company friendly to military personnel and veterans.

Capgemini is a global business and technology transformation partner, helping organizations accelerate their dual transition to a digital and sustainable world while creating tangible impact for enterprises and society. It is a responsible and diverse group of 340,000 team members in more than 50 countries. With a strong heritage of over 55 years, Capgemini is trusted by its clients to unlock the value of technology to address the entire breadth of their business needs. It delivers end-to-end services and solutions, leveraging strengths from strategy and design to engineering, all fueled by its market-leading capabilities in AI, generative AI, cloud, and data, combined with its deep industry expertise and partner ecosystem.

