Knowledge Gate Group

Lead Data Engineer

Posted: 6 minutes ago

Job Description

Why we're hiring a Lead Data Engineer

We're building the expert intelligence layer for scientific research: a knowledge graph that connects the world to leading experts based on publications & clinical trials in precise ontologies. You'll design pipelines that ingest millions of life-science records, shaping a graph of how scientific knowledge is modelled, enriched, & served.

This is true greenfield work. Your decisions will lay the data foundations for our entire expert intelligence platform.

What You'll Do

You will be working at the intersection of science, data engineering & AI to build expert intelligence.

  • Own data end-to-end: design & run data pipelines turning millions of scientific records into a knowledge graph.
  • Implement precision entity resolution & enrichment: disambiguate & enrich experts from noisy data sources.
  • Utilise LLM workflows where they make sense, for entity extraction, relationship inference & quality validation.
  • Develop vector embeddings & semantic search capabilities to power expert discovery & similarity matching.
  • Model life-science entities & relationships: ontologies, author networks, publication & clinical trial metadata.
  • Build graph & vector data access that is performant, accessible, reliable, observable & testable.
  • Move fast & ship value incrementally: done-and-iterating beats perfect-and-pending.
  • Radiate intent & document your thinking openly, collaborating async-first in a hybrid environment.
  • Lead when you're the expert, follow when someone else is, challenging assumptions when necessary.
  • Use AI as a daily force multiplier across coding, schema design, debugging, optimisation & validation.
  • Destroy your colleagues at Geoguessr (optional but strongly encouraged).

What You'll Need

Technical Skills

  • Graph Databases: Neo4j, ArangoDB, Neptune; schema design, relationship modelling, query optimisation.
  • Python Data Engineering: ETL development; pandas/polars; distributed processing with Spark or Dask.
  • Entity Resolution: Deduplication, merging, enrichment across heterogeneous scientific data sources.
  • AI-Assisted Data Extraction: LLM entity extraction, schema generation & quality validation.
  • Vector Search: Experience with Pinecone, FAISS, Qdrant, or Weaviate; embeddings, hybrid retrieval.
  • Workflow Orchestration: Robust, observable pipelines using Airflow or Dagster.
  • Data Formats & Standards: Parquet, JSONL, RDF/Turtle; selecting formats for graph & semantic use cases.
  • Embedding Models: Understanding of HuggingFace/OpenAI models, dimensionality tradeoffs & cost.

Executive Skills

  • Ownership mindset: Treat data & schemas as products powering multiple domains.
  • Strategic evaluation: Choose tech aligned with our scale, latency expectations, & roadmap needs.
  • Process engineering: Build reliable, repeatable & maintainable workflows.
  • Cross-functional communication: Bridge product engineers & scientific domain teams.
  • Comfort with scientific data realities: Deep rabbit holes of sprawling complexity.

Strong Bonus

  • Life Sciences familiarity: Publication, clinical trial & institutional data; ontologies (MeSH, SNOMED, Gene Ontology).
  • Hands-on with scientific datasets: OpenAlex, PubMed/MEDLINE, ORCID, Semantic Scholar, ClinicalTrials.gov.

Why You Might Hate It Here

  • You want predictability & routine.
  • You dislike documenting or sharing your thinking openly.
  • You see AI as a threat rather than an amplifier.
  • You're looking for a "safe" corporate environment - we're not that.

We mean this sincerely: if those points do not work for you, you'll be happier elsewhere.

Why You'll Love Working Here

  • Real Autonomy: You'll own outcomes, not tickets. This is your domain - you'll define data strategy.
  • Greenfield Opportunity: Build from scratch. Your decisions shape our data capabilities for years.
  • Mission That Matters: Your work directly enables research - accelerating scientific breakthroughs.
  • AI-First Culture: We use AI as a creative & operational partner across every function.
  • High Impact: Every domain depends on what you build. Expert coverage directly drives our success.

Success Metrics (6-month target)

  • Expert Coverage: Knowledge graph spans 1+ million experts with rich profile data & relationships.
  • AI & Platform Enablement: AI & other domains consuming knowledge graph insights.
