FPT Software

LLM Engineer / GenAI Engineer (RAG & LLMOps)


Job Description

Role Overview

Own the design, fine-tuning, optimization, and production deployment of large language models (LLMs) for domain-specific use cases. You will build high-performance RAG systems, optimize prompts and agents, operate inference at scale, and champion engineering best practices while driving research and innovation.

Key Responsibilities

  • LLM Engineering: Design, fine-tune, and optimize models such as GPT, Claude, Gemini, LLaMA, and Falcon for domain-specific applications.
  • RAG Systems: Build and operate retrieval-augmented generation pipelines (ingestion, chunking, embedding, indexing, retrieval, re-ranking) using vector databases (FAISS, Pinecone, Weaviate, etc.).
  • Prompt/Agent Optimization: Develop prompt templates, chains, and agents with LangChain/LlamaIndex; implement guardrails, tool use, and memory.
  • Model Deployment (LLMOps): Implement, monitor, and scale inference endpoints with MLflow, Docker, and Kubernetes; manage versioning/registry and safe rollouts (blue-green/canary).
  • Performance Optimization: Evaluate and continuously improve accuracy, latency, and cost (batching, caching/KV cache, quantization, speculative decoding).
  • Collaboration & Mentoring: Review code, set best practices for AI software engineering, and mentor junior engineers.
  • Research & Innovation: Track advances in LLMs, multimodal AI, and open source; lead PoCs, benchmarking, and knowledge sharing.

Required Qualifications

  • Education: Bachelor's or Master's in Computer Science, Artificial Intelligence, or a related field (PhD preferred).
  • Experience: 5+ years in machine learning/NLP; 2+ years working directly with LLMs or GenAI applications.
  • Technical Skills:
      • Proficiency in Python, ML frameworks (PyTorch/TensorFlow), and Hugging Face Transformers.
      • Hands-on experience with LangChain, LlamaIndex, or SDKs for OpenAI/Anthropic/Cohere/Gemini.
      • Strong understanding of embeddings, tokenization, and vector search/retrieval.
      • Familiarity with MLOps, CI/CD, and cloud platforms (AWS/Azure/GCP); containerization with Docker/Kubernetes.
      • Experience integrating AI APIs (OpenAI, Anthropic, Cohere, Google Gemini).
  • Soft Skills: Excellent problem-solving and communication; comfortable leading projects and mentoring teammates.

Preferred/Bonus

  • Experience with model distillation and fine-tuning open-source LLMs (LoRA/QLoRA, PEFT).
  • Exposure to multimodal AI (text + image + audio/voice), TTS/ASR, and VLMs.
  • Familiarity with AI safety, bias/fairness, privacy, and governance/compliance frameworks.
  • Cost/performance tuning: quantization (INT8/INT4), speculative decoding, throughput optimization.

Success Metrics (KPIs)

  • Model quality (task-specific metrics: accuracy/recall, hallucination rate, BLEU/ROUGE/WER as applicable).
  • System performance and cost (P95 latency, throughput, cost per request).
  • Reliability (SLO/SLA, error rates) and delivery velocity (lead time, deployment frequency).
  • Knowledge impact (PoC-to-production conversions, docs/best practices, mentoring outcomes).

Tools & Environment

  • Model/Serving: HF Transformers, vLLM/TensorRT-LLM, Triton, Ray/Modal (as applicable).
  • Vector/RAG: FAISS, Pinecone, Weaviate, Milvus; re-ranking (e.g., Cross-Encoder/ColBERT).
  • Ops/Observability: MLflow, Prometheus/Grafana, OpenTelemetry, Weights & Biases.
  • Data: Airflow/Prefect, dbt, Spark (as needed).

Benefits (customizable)

  • Competitive compensation with performance/PoC success bonuses.
  • Learning budget, certifications, and conference attendance.
  • Dedicated GPU credits/resources for R&D; open-source-friendly environment.
  • Comprehensive insurance and flexible work arrangements.
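For candidates unfamiliar with the RAG pipeline stages named in the responsibilities (ingestion, chunking, embedding, indexing, retrieval), here is a minimal illustrative sketch. The toy bag-of-words "embedding" and brute-force cosine search are stand-ins for a real embedding model and a vector database such as FAISS, Pinecone, or Weaviate; all names here are hypothetical, not part of this role's actual stack.

```python
# Toy RAG retrieval pipeline: ingestion -> chunking -> embedding ->
# indexing -> retrieval. Every component is a deliberately simplified
# stand-in for the production tools named in the job description.
import math
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    """Split a document into fixed-size word chunks (production systems
    typically add overlap and respect semantic boundaries)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase bag-of-words counts. A real system
    would call an embedding model here."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Brute-force top-k search; a vector DB replaces this in production."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Ingestion + indexing: chunk each document and store (chunk, embedding) pairs.
docs = [
    "LLMs generate text from prompts",
    "Vector databases store embeddings for retrieval",
]
index = [(c, embed(c)) for d in docs for c in chunk(d)]

# Retrieval: the chunk about vector databases ranks first for this query.
print(retrieve("how are embeddings stored", index, k=1))
```

In a real deployment the retrieved chunks would then be re-ranked (e.g., with a cross-encoder) and injected into the LLM prompt as context, which is the generation half of retrieval-augmented generation.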

