FPT Software

LLM Engineer / GenAI Engineer (RAG & LLMOps)


Job Description

Role Overview

Own the design, fine-tuning, optimization, and production deployment of large language models (LLMs) for domain-specific use cases. You will build high-performance RAG systems, optimize prompts and agents, operate inference at scale, and champion engineering best practices while driving research and innovation.

Key Responsibilities

  • LLM Engineering: Design, fine-tune, and optimize models such as GPT, Claude, Gemini, LLaMA, and Falcon for domain-specific applications.
  • RAG Systems: Build and operate retrieval-augmented generation pipelines (ingestion, chunking, embedding, indexing, retrieval, re-ranking) using vector databases (FAISS, Pinecone, Weaviate, etc.).
  • Prompt/Agent Optimization: Develop prompt templates, chains, and agents with LangChain/LlamaIndex; implement guardrails, tool use, and memory.
  • Model Deployment (LLMOps): Implement, monitor, and scale inference endpoints with MLflow, Docker, and Kubernetes; manage versioning/registry and safe rollouts (blue-green/canary).
  • Performance Optimization: Evaluate and continuously improve accuracy, latency, and cost (batching, caching/KV cache, quantization, speculative decoding).
  • Collaboration & Mentoring: Review code, set best practices for AI software engineering, and mentor junior engineers.
  • Research & Innovation: Track advances in LLMs, multimodal AI, and open source; lead PoCs, benchmarking, and knowledge sharing.

Required Qualifications

  • Education: Bachelor's or Master's in Computer Science, Artificial Intelligence, or a related field (PhD preferred).
  • Experience: 5+ years in machine learning/NLP; 2+ years working directly with LLMs or GenAI applications.
  • Technical Skills:
      • Proficiency in Python, ML frameworks (PyTorch/TensorFlow), and Hugging Face Transformers.
      • Hands-on experience with LangChain, LlamaIndex, or SDKs for OpenAI/Anthropic/Cohere/Gemini.
      • Strong understanding of embeddings, tokenization, and vector search/retrieval.
      • Familiarity with MLOps, CI/CD, and cloud platforms (AWS/Azure/GCP); containerization with Docker/Kubernetes.
      • Experience integrating AI APIs (OpenAI, Anthropic, Cohere, Google Gemini).
  • Soft Skills: Excellent problem-solving and communication; comfortable leading projects and mentoring teammates.

Preferred/Bonus

  • Experience with model distillation and fine-tuning open-source LLMs (LoRA/QLoRA, PEFT).
  • Exposure to multimodal AI (text + image + audio/voice), TTS/ASR, and VLMs.
  • Familiarity with AI safety, bias/fairness, privacy, and governance/compliance frameworks.
  • Cost/performance tuning: quantization (INT8/INT4), speculative decoding, throughput optimization.

Success Metrics (KPIs)

  • Model quality (task-specific metrics: accuracy/recall, hallucination rate, BLEU/ROUGE/WER as applicable).
  • System performance and cost (P95 latency, throughput, cost per request).
  • Reliability (SLO/SLA, error rates) and delivery velocity (lead time, deployment frequency).
  • Knowledge impact (PoC-to-production conversions, docs/best practices, mentoring outcomes).

Tools & Environment

  • Model/Serving: HF Transformers, vLLM/TensorRT-LLM, Triton, Ray/Modal (as applicable).
  • Vector/RAG: FAISS, Pinecone, Weaviate, Milvus; re-ranking (e.g., Cross-Encoder/ColBERT).
  • Ops/Observability: MLflow, Prometheus/Grafana, OpenTelemetry, Weights & Biases.
  • Data: Airflow/Prefect, dbt, Spark (as needed).

Benefits (customizable)

  • Competitive compensation with performance/PoC success bonuses.
  • Learning budget, certifications, and conference attendance.
  • Dedicated GPU credits/resources for R&D; open-source-friendly environment.
  • Comprehensive insurance and flexible work arrangements.
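For candidates unfamiliar with the RAG pipeline stages named in the responsibilities (ingestion, chunking, embedding, indexing, retrieval), here is a minimal illustrative sketch. The toy bag-of-words "embedding" and brute-force cosine search are stand-ins for a real embedding model and a vector database such as FAISS, Pinecone, or Weaviate; all names here are hypothetical, not part of this role's actual stack.

```python
# Toy RAG retrieval pipeline: ingestion -> chunking -> embedding ->
# indexing -> retrieval. Every component is a deliberately simplified
# stand-in for the production tools named in the job description.
import math
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    """Split a document into fixed-size word chunks (production systems
    typically add overlap and respect semantic boundaries)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase bag-of-words counts. A real system
    would call an embedding model here."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Brute-force top-k search; a vector DB replaces this in production."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Ingestion + indexing: chunk each document and store (chunk, embedding) pairs.
docs = [
    "LLMs generate text from prompts",
    "Vector databases store embeddings for retrieval",
]
index = [(c, embed(c)) for d in docs for c in chunk(d)]

# Retrieval: the chunk about vector databases ranks first for this query.
print(retrieve("how are embeddings stored", index, k=1))
```

In a real deployment the retrieved chunks would then be re-ranked (e.g., with a cross-encoder) and injected into the LLM prompt as context, which is the generation half of retrieval-augmented generation.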

