Bagel Labs

Machine Learning Engineer

Job Description

Bagel Labs is an Artificial Intelligence Research Lab developing novel methods for distributed training of frontier diffusion models on commodity hardware. Our work enables training of state-of-the-art generative models for image, video, and world modelling without centralized GPU superclusters, reducing training compute capex by up to 50%.

We ignore years of experience and pedigree. If you have high agency (meaning your default assumption is that you can control the outcome of whatever situation you are in), we want to hear from you. Every requirement below is flexible for a candidate with high enough agency and tolerance for ambiguity.

Role Description

You will build and run the systems that make decentralized diffusion training work in practice: training pipelines, inference serving, and GPU orchestration across commodity hardware. You own the engineering end-to-end.

Key Responsibilities

  • Build and maintain distributed training pipelines across heterogeneous, commodity GPU hardware.
  • Profile and optimize training throughput, memory usage, and fault tolerance, and write custom CUDA/Triton kernels when needed.
  • Design and operate inference infrastructure: batching, routing, and serving large generative models.
  • Ship experiment tracking, CI/CD, and reproducibility tooling for the ML stack.
  • Work directly with researchers to turn new algorithms into code that actually runs at scale.

Who You Might Be

  • Strong in Python and PyTorch; able to read and write C++/CUDA when performance requires it.
  • Experienced with distributed training: FSDP, DeepSpeed, Megatron-LM, or custom tensor/pipeline/data parallelism.
  • A systems thinker: you reason about networking, memory layouts, and failure modes upfront.
  • Comfortable with Linux, Docker/Kubernetes, job schedulers, and both bare-metal and cloud GPU setups.
  • Grounded in enough ML fundamentals (transformers, diffusion, optimization) to debug a training run end-to-end and hold your own with researchers.

What We Offer

  • Top-of-market compensation.
  • A deeply technical culture where bold, frontier ideas are debated, stress-tested, and built.
  • Paid travel to top ML conferences.