99brightminds

DevOps Engineer - AI Products

Posted: 4 days ago

Job Description

About the role

We're hiring a Mid–Senior DevOps Engineer with strong MLOps/LLMOps depth. You'll own the infrastructure that powers data pipelines, LLM fine-tuning and distillation, low-latency inference, and secure multi-tenant SaaS. You'll blend classical DevOps (Kubernetes, IaC, CI/CD, observability, security) with modern AI/ML tooling (MLflow/W&B, pgvector, vLLM/TGI, LoRA/QLoRA, evaluation/guardrails).

What you'll do

Platform & IaC

  • Design and operate cloud-native, multi-environment Kubernetes platforms (HPA/KEDA, autoscaling, node pools, GPU scheduling).
  • Implement IaC with Terraform (or Pulumi): modular, versioned, and policy-guarded (OPA/Gatekeeper/Kyverno).
  • Build secure networking (ingress/egress, service mesh with Istio or Linkerd, WAF, Cloudflare, TLS, mTLS).

CI/CD & Supply Chain

  • Own CI/CD (GitHub Actions preferred): build/test/release pipelines, blue-green/canary, feature-flag rollouts.
  • Enforce supply-chain security (SBOMs, image signing with Cosign/Sigstore, Trivy/Falco, SLSA-aligned controls).

Data & MLOps

  • Stand up reproducible ML pipelines (Airflow/Prefect, DVC/LakeFS, MLflow/W&B) with artifact/version tracking.
  • Orchestrate feature stores (Feast) and vector indexes (pgvector on Postgres; Qdrant/Weaviate/Pinecone when needed).

LLMOps & Inference

  • Build and operate LLM inference stacks (vLLM or TGI) with token streaming, batching, and caching to meet cost/latency SLAs (a minimal sketch follows below).
  • Manage provider abstraction/failover (OpenAI/Azure OpenAI/Anthropic plus open-weights models via Ollama/llama.cpp).
  • Operate retrieval (RAG) pipelines: ingestion, chunking, embedding jobs, re-ranking, and eval telemetry.

Training, Fine-Tuning & Distillation

  • Run fine-tuning at scale (LoRA/QLoRA, PEFT, DeepSpeed/FSDP) and GPU fleet ops (CUDA, NCCL, MIG, node affinity); see the LoRA sketch below.
  • Execute model compression (quantization: GPTQ/AWQ/GGUF) and distillation to smaller student models for edge/low-cost serving.
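To make the inference duties above concrete, here is a minimal, illustrative sketch of vLLM's offline API; the model name and prompt are placeholders, not part of this posting:

```python
from vllm import LLM, SamplingParams

# Hypothetical model choice; vLLM batches prompts internally.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.2, max_tokens=128)

# generate() accepts a batch of prompts and returns one RequestOutput per prompt.
outputs = llm.generate(["Summarize the benefits of canary deployments."], params)
for output in outputs:
    print(output.outputs[0].text)
```

In production, the same engine is typically run behind vLLM's OpenAI-compatible HTTP server, which adds continuous batching and token streaming on top of this API.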
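For the LoRA fine-tuning duties just above, a minimal sketch of attaching a LoRA adapter with Hugging Face peft; the base model and hyperparameters are illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.1-8B"  # hypothetical base model
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# LoRA trains small low-rank adapter matrices instead of the full weights.
config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of base parameters
```

The wrapped model then trains with a standard Hugging Face training loop; only the adapter weights are updated and shipped.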
Safety, Evals & Governance

  • Implement evaluation harnesses (golden sets, regression checks, RAGAS/Promptfoo/Langfuse Evals).
  • Add guardrails (PII redaction, moderation, allow/deny tools, timeouts/retries/rate limits).
  • Support compliance-by-design (secrets/Vault, least-privilege RBAC, audit logging, data residency).

Observability & FinOps

  • Build end-to-end telemetry (OpenTelemetry, Prometheus/Grafana, Loki/ELK, Jaeger) with SLOs and error budgets.
  • Track model/infra cost, token usage, and latency; drive optimizations (autoscaling, right-sizing, caching).

Core domains of expertise

  • DevOps: Kubernetes, containers, Helm/Kustomize, service mesh, networking, IAM/RBAC, secrets management (Vault/SM).
  • IaC & Policy: Terraform (or Pulumi), OPA/Kyverno, drift detection, reusable modules.
  • CI/CD: GitHub Actions (or GitLab CI), artifact registries, progressive delivery (ArgoCD/Argo Rollouts/Flux).
  • Data/MLOps: Airflow/Prefect, MLflow/W&B, DVC/LakeFS, Feast, Kafka (nice to have), object storage (S3/GCS).
  • LLMOps: vLLM/TGI, pgvector, embeddings, RAG patterns, prompt/version management, provider fallbacks.
  • Training/Optimization: LoRA/QLoRA, PEFT, DeepSpeed/FSDP, quantization (GPTQ/AWQ/GGUF), distillation pipelines.
  • Observability/Security: OTel, Prometheus/Grafana, Sentry, Sigstore/Cosign, Trivy/Falco, SSO/OIDC.

Tech you'll use

  • Cloud: AWS/GCP/Azure (pick two), Kubernetes, EKS/GKE/AKS; Cloudflare/WAF, CDN, DNS.
  • Storage/Data: Postgres + pgvector, Redis, S3/GCS; Qdrant/Weaviate/Pinecone (nice to have).
  • AI/LLM: OpenAI/Anthropic SDKs, Azure OpenAI; vLLM, TGI, Ollama/llama.cpp; LangChain/LangGraph (nice to have).
  • Pipelines: Terraform, GitHub Actions, ArgoCD/Rollouts, Helm/Kustomize.
  • Obs/Sec: OpenTelemetry, Prometheus/Grafana, Loki/ELK, Jaeger; Vault, OPA/Kyverno, Cosign, Trivy.

What success looks like in 90 days

  • Ship a production-ready LLM inference stack on K8s with autoscaling and token-level telemetry.
  • Land a fine-tuning or distillation workflow (data → train → evaluate → package → deploy) with CI/CD and rollback.
  • Reduce p95 latency and cost of at least one AI feature by ≥25% via batching/caching/right-sizing.
  • Establish baseline SLOs (availability, latency, quality) with dashboards and alerts tied to them.

You'll excel here if you have

  • A bias for automation and simplicity: fewer moving parts, more leverage.
  • A clear mental model of when to use LLMs vs. deterministic pipelines, treating prompts and checkpoints as first-class artifacts.
  • Strong security habits (secrets, least privilege, auditability) and operational empathy for developers and data teams.
  • Crisp, async communication and ownership from design to measurable outcomes.

Minimum qualifications

  • 4–8+ years in DevOps/SRE/Platform roles, including Kubernetes in production.
  • Strong Terraform (or Pulumi), GitHub Actions (or equivalent), and container runtime knowledge.
  • Hands-on MLOps experience: MLflow/W&B, data versioning, and at least one pipeline orchestrator.
  • Practical LLMOps: deployed inference with vLLM/TGI or provider SDKs at scale; basics of embeddings/RAG.

Nice to have

  • GPU fleet ops (A100/L4/T4), MIG, FSDP/DeepSpeed; LoRA/QLoRA fine-tuning and distillation experience.
  • Vector DB ops beyond pgvector (Qdrant/Weaviate/Pinecone); reranking (ColBERT/Cohere Rerank).
  • Policy-as-code (OPA/Kyverno), Sigstore/Cosign, SLSA practices; SOC 2 readiness.
  • Supabase/Postgres multi-tenant patterns; Kong/Traefik/Nginx; Cloudflare.
  • Experience with cost controls/FinOps for model and infra spend.
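The retrieval (RAG) and pgvector items in this posting come down to nearest-neighbor queries over an embeddings table. A minimal illustration with psycopg; the DSN, table, and dimensionality are hypothetical:

```python
import psycopg

# Assumes the pgvector extension is installed and a
# documents(id, chunk text, embedding vector(1536)) table
# has been populated by an ingestion/embedding job.
conn = psycopg.connect("postgresql://app@localhost/rag")

query_embedding = [0.0] * 1536  # in practice, the embedding of the user's query
vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"

# Top-k search; <=> is pgvector's cosine distance operator.
rows = conn.execute(
    """
    SELECT id, chunk, embedding <=> %s::vector AS distance
    FROM documents
    ORDER BY distance
    LIMIT 5
    """,
    (vector_literal,),
).fetchall()
for doc_id, chunk, distance in rows:
    print(doc_id, round(distance, 4), chunk[:80])
```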

