Job Description

About Codem

Codem is a technology services company specializing in eCommerce, SAP, custom applications, cloud infrastructure, DevOps, and systems integration. We work with global enterprises to build and modernize scalable platforms. We are now seeking an AI/ML Developer. Note: only candidates available to join immediately will be considered.

About the role

We're looking for an AI Developer to build and ship LLM-powered features (chat/search/agents, RAG pipelines, automations). You'll work closely with product and data teams to turn messy real-world data into reliable, low-latency experiences.

Responsibilities

  • RAG pipelines: Ingest, chunk, embed, and index documents; design retrieval strategies (hybrid BM25 + embeddings, metadata filtering, reranking).
  • App logic: Build APIs/serving layers around LLMs (prompt templates, tool/function calling, agents, streaming).
  • Vector DB ops: Create and maintain indexes, upserts, namespace/tenant design, TTLs, migrations, and recall/latency tuning.
  • Evaluation & quality: Set up offline/online evals (accuracy, grounding, toxicity, hallucination rate), A/B tests, and feedback loops.
  • Safety & reliability: Implement guardrails, prompt-injection defenses, PII redaction, rate limiting, retries, and fallbacks.
  • Cost & performance: Token budgeting, caching (prompt/result/embedding), batching, and observability for latency and spend.
  • Data pipelines: Build ETL for PDFs/HTML/docs, enrichment, and scheduled syncs from SaaS apps and data lakes.
  • DevOps/MLOps: CI/CD, environment config, secrets management, dataset/version control, and monitoring.

Must-have qualifications

  • 3+ years of software experience (ideally Python) delivering production code.
  • Hands-on with LLM APIs (OpenAI/Azure OpenAI, Anthropic, or local LLMs such as Llama), including prompting, tool/function calling, and streaming.
  • Practical RAG experience using vector databases (e.g., Pinecone, Weaviate, FAISS, pgvector) and embedding models.
  • Experience with LangChain or LlamaIndex (or equivalent in-house orchestration).
  • Strong with web APIs (FastAPI/Flask/Node), Git, testing, and debugging.
  • Solid understanding of security and privacy basics (PII handling, secrets, auth).

Nice to have

  • Reranking (Cohere/TEI), hybrid search (BM25 + embeddings), or Elasticsearch/OpenSearch.
  • Eval frameworks (Ragas, TruLens) and telemetry (Langfuse, OpenTelemetry).
  • Workflow/orchestration (Celery/Temporal/Airflow) and message queues (SQS/Kafka).
  • Cloud: AWS (Bedrock, Lambda), GCP (Vertex AI), Azure (AOAI), Docker; basic Terraform.
  • Frontend collaboration (React) for chat UIs, streaming tokens, and citations.
  • Fine-tuning/LoRA, prompt caching, distillation, or model-hosting experience.

Tools you might use here

  • Python (FastAPI), TypeScript/Node (optional), LangChain/LlamaIndex
  • Vector DBs: Pinecone, Weaviate, pgvector/FAISS
  • LLMs/Embeddings: GPT-4/4o/mini, Claude, Llama, instructor/sentence-transformers
  • Infra: AWS/GCP/Azure, Docker, GitHub Actions, Terraform (basic)
  • Observability & eval: Langfuse, Ragas/TruLens, Prometheus/Grafana

Success in 3–6 months

  • Ship a production RAG feature with measurable uplift in answer quality.
  • Reduce latency and cost via caching, batching, and better retrieval configs.
  • Establish an evaluation and feedback loop with clear QA dashboards and guardrails.
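To give candidates a concrete sense of the hybrid-retrieval work described above: one common pattern is to score documents lexically with BM25, rank them semantically with an embedding model, and fuse the two rankings with reciprocal rank fusion (RRF). The sketch below is illustrative only, assuming toy tokenized documents; the function names and the hard-coded "semantic" ranking are placeholders, not from any specific library.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against query_terms with classic BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()                       # document frequency per term
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)                  # term frequency in this doc
        s = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def rrf(rankings, k=60):
    """Fuse ranked lists of doc ids via reciprocal rank fusion."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            fused[doc_id] += 1.0 / (k + rank + 1)
    return [doc_id for doc_id, _ in fused.most_common()]

# Toy usage: lexical ranking from BM25, semantic ranking stubbed in.
docs = [["refund", "policy", "for", "orders"],
        ["shipping", "times", "and", "carriers"],
        ["how", "to", "request", "a", "refund"]]
scores = bm25_scores(["refund"], docs)
lexical_ranking = sorted(range(len(docs)), key=lambda i: -scores[i])
semantic_ranking = [2, 0, 1]             # pretend this came from a vector DB
fused = rrf([lexical_ranking, semantic_ranking])
```

In production the semantic ranking would come from a vector database query (Pinecone, Weaviate, pgvector, etc.), and a cross-encoder reranker would often refine the fused top-k; RRF is popular because it needs no score normalization across the two retrievers.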
