Job Description

Position Overview: We are looking for a highly skilled engineer to design and optimize the GPU/AI infrastructure behind our Perception & Planning stack, covering object detection, segmentation, depth estimation, and trajectory planning. This role is technical: you will push the limits of GPU efficiency, distributed training, and real-time inference, turning state-of-the-art research into production-ready systems.

Responsibilities

Architect and optimize large-scale training pipelines with advanced techniques (FSDP/ZeRO-DP, tensor/pipeline parallelism, activation checkpointing, CPU/NVMe offloading, FlashAttention, mixed precision/bfloat16, comm/comp overlap).

Profile end-to-end pipelines (data → GPU kernels → inference) and eliminate bottlenecks using tools such as torch.profiler, Nsight Systems, Nsight Compute, TensorBoard Profiler, and low-level debuggers (perf, NVTX/NCCL tracing).

Implement performance-critical components in CUDA/C++ (custom kernels, TensorRT plugins, efficient memory layouts).

Tune GPU utilization, memory hierarchy (HBM, L2, shared), and communication efficiency (PCIe/NVLink/NCCL) to maximize throughput and minimize latency.

Drive model conversion and deployment workflows (ONNX/TensorRT, mixed precision, quantization) with strict real-time FPS requirements.

Lead distributed training scaling and orchestration (multi-node DDP/FSDP, NCCL tuning, experiment automation).

Build reliability and observability into systems with low-overhead logging, metrics, and health monitoring.

Maintain benchmarks, profiling reports, and best-practice documentation to guide the team.

Qualifications

Master’s or Ph.D. in Computer Science, Electrical/Computer Engineering, or related technical discipline.

Strong foundation in ML/CV with proven experience in GPU/AI infrastructure and performance optimization.

Expert-level coding in C++ and Python; ability to implement, debug, and optimize CUDA kernels.

Hands-on experience with GPU profiling and tuning, with a track record of improving throughput, utilization, and memory efficiency.

Familiarity with ONNX, TensorRT, NCCL, and other performance-oriented frameworks and libraries.

Demonstrated success deploying real-time inference systems on GPUs/edge devices.

Strong problem-solving, debugging, and performance-analysis skills; thrives in low-level, high-performance system challenges.

Engineer/Senior Engineer, AI Infrastructure (Perception & Planning)
