Job Description

Why we're hiring a Lead Data Engineer

We're building the expert intelligence layer for scientific research: a knowledge graph that connects the world to leading experts based on publications & clinical trials in precise ontologies. You'll design pipelines that ingest millions of life-science records, shaping a graph of how scientific knowledge is modelled, enriched, & served.

This is true green-fields work. Your decisions will lay the data foundations for our entire expert intelligence platform.

What You'll Do

You will be working at the intersection of science, data engineering & AI to build expert intelligence.

- Own data end-to-end: design & run data pipelines turning millions of scientific records into a knowledge graph.
- Implement precision entity resolution & enrichment: disambiguate & enrich experts from noisy data sources.
- Utilise LLM workflows where they make sense, for entity extraction, relationship inference & quality validation.
- Develop vector embeddings & semantic search capabilities to power expert discovery & similarity matching.
- Model life-science entities & relationships: ontologies, author networks, publication & clinical trial metadata.
- Build graph & vector data access: performant, accessible, reliable, observable & testable.
- Move fast & ship value incrementally: done-and-iterating beats perfect-and-pending.
- Radiate intent & document your thinking openly, collaborating async-first in a hybrid environment.
- Lead when you're the expert, follow when someone else is, challenging assumptions when necessary.
- Use AI as a daily force multiplier across coding, schema design, debugging, optimisation & validation.
- Destroy your colleagues at Geoguessr (optional but strongly encouraged).

What You'll Need

Technical Skills

- Graph Databases: Neo4j, ArangoDB, Neptune; schema design, relationship modelling, query optimisation.
- Python Data Engineering: ETL development; pandas/polars; distributed processing with Spark or Dask.
- Entity Resolution: Deduplication, merging, enrichment across heterogeneous scientific data sources.
- AI-Assisted Data Extraction: LLM entity extraction, schema generation & quality validation.
- Vector Search: Experience with Pinecone, FAISS, Qdrant, or Weaviate; embeddings, hybrid retrieval.
- Workflow Orchestration: Robust, observable pipelines using Airflow or Dagster.
- Data Formats & Standards: Parquet, JSONL, RDF/Turtle; selecting formats for graph & semantic use cases.
- Embedding Models: Understanding of HuggingFace/OpenAI models, dimensionality tradeoffs & cost.

Executive Skills

- Ownership mindset: Treat data & schemas as products powering multiple domains.
- Strategic evaluation: Choose tech aligned with our scale, latency expectations, & roadmap needs.
- Process engineering: Build reliable, repeatable & maintainable workflows.
- Cross-functional communication: Bridge product engineers & scientific domain teams.
- Comfort with scientific data realities: Deep rabbit holes of sprawling complexity.

Strong Bonus

- Life Sciences familiarity: Publication, clinical trial, institutional, ontologies (MeSH, SNOMED, Gene Ontology).
- Hands-on with scientific datasets: OpenAlex, PubMed/MEDLINE, ORCID, Semantic Scholar, ClinicalTrials.gov.

Why You Might Hate It Here

- You want predictability & routine.
- You dislike documenting or sharing your thinking openly.
- You see AI as a threat rather than an amplifier.
- You're looking for a "safe" corporate environment - we're not that.

We mean this sincerely: if those points do not work, you'll be happier elsewhere.

Why You'll Love Working Here

- Real Autonomy: You'll own outcomes, not tickets. This is your domain - you'll define data strategy.
- Greenfield Opportunity: Build from scratch. Your decisions shape our data capabilities for years.
- Mission That Matters: Your work directly enables research - accelerating scientific breakthroughs.
- AI-First Culture: We use AI as a creative & operational partner across every function.
- High Impact: Every domain depends on what you build. Expert coverage directly drives our success.

Success Metrics (6-month target)

- Expert Coverage: Knowledge graph spans 1+ million experts with rich profile data & relationships.
- AI & Platform Enablement: AI & other domains consuming knowledge graph insights.

Lead Data Engineer

This Job Has Expired