DriveNets

Team Leader - NPU Communication

Job Description

Location: Tel Aviv (Hybrid)

DriveNets is a leader in high-scale disaggregated networking solutions. Founded in 2015, DriveNets modernizes the way service providers, cloud providers and hyperscalers build networks. DriveNets' Network Cloud open disaggregated architecture supports the largest network in the world, carrying more than half of AT&T's backbone traffic. Having raised $587 million in three funding rounds, DriveNets is disrupting the networking market, from high-scale architecture to AI platforms, and is bringing on board the most talented people. We are seeking people who want to make an impact on the world's leading communication networks and are experienced in networking architecture or AI infrastructure solutions.

Job Summary

We are seeking an experienced technical leader to head our collective communication library development team. This role involves leading a team of engineers in developing high-performance collective communication implementations for multi-NPU and multi-node AI workloads.

Key Responsibilities

  • Lead the design and development of collective communication primitives (All-Reduce, All-to-All, Gather/Scatter, etc.)
  • Architect scalable communication protocols for multi-NPU and multi-node systems
  • Optimize communication performance for NPU architectures
  • Provide technical leadership to team members in NPU programming, distributed systems, and communication protocols
  • Work with a success-driven international team (Network, NPU, QA, AI, DL/ML Framework)
  • Define project milestones, deliverables, and technical roadmaps
  • Ensure compatibility with major AI frameworks (PyTorch, TensorFlow, JAX)

Requirements

Required Qualifications

  • BSc/MSc in computer science/computer engineering or equivalent
  • 8+ years of experience in systems programming and distributed computing
  • 5+ years of leadership experience managing technical teams
  • Expert-level C/C++ programming with a focus on performance optimization
  • Experience with NPU programming (Triton / CUDA / HIP / OpenCL)
  • Deep understanding of distributed systems, communication protocols, and network programming
  • Experience with DL/ML frameworks (PyTorch, TensorFlow) and distributed training / inference
  • Experience with performance profiling and optimization tools
  • Strong communication and interpersonal skills

Preferred Qualifications

  • Experience with NPU communication library development
  • Contributions to open-source projects (PyTorch, TensorFlow, communication libraries)
  • Familiarity with containerization and orchestration
  • Interoperability experience with partners, vendors and external teams
