Vast.ai

Data Engineer — Analytics Infrastructure (Foundational Hire)


Job Description

About Us

Vision: To make life substrate independent through Vast Artificial Intelligence
Mission: To organize, optimize, and orient the world's computation

Vast.ai’s cloud powers AI projects and businesses all over the world. We are democratizing and decentralizing AI computing—reshaping our future for the benefit of humanity.

We are a growing and highly motivated team dedicated to an ambitious technical plan. Our structure is flat, our ambitions are out‑sized, and leadership is earned by shipping excellence.

We seek a data engineer with strong intrinsic drive, a true passion for uncovering insights from data, and a mix of analytical, programming, and communication skills.

LOCATION: On‑site at our office in Westwood, Los Angeles
TYPE: Full‑time, on‑site; immediate start preferred
REPORTS TO: Operations (partnering closely with Engineering)

About the Role

This is a foundational role: you’ll own the 0→1 build of our data platform—ingestion, modeling, governance, and self‑serve analytics in QuickSight—for Marketing, Sales, Accounting, and leadership.
We’re hiring a Data Engineer to build and own the end‑to‑end data platform at Vast.ai. This is a hands‑on role for a builder who can move fast: designing schemas, implementing ELT/ETL, hardening data quality, and enabling secure, governed access to data across the company.

What You’ll Do

  • Own the data pipeline: design, build, and operate batch/streaming ingestion from product, billing, CRM, support, and marketing/ad platforms into a central warehouse.
  • Model the data: create clean, well‑documented staging and business marts (dimensional/star schemas) that map to the needs of Marketing, Sales, Accounting/Finance, and Operations.
  • Enable self‑serve analytics: publish certified datasets with row‑/column‑level security, manage refresh SLAs, and make it easy for teams to self‑serve.
  • Collaborate cross‑functionally: intake requirements, translate them into data contracts and models, and partner with Engineering on event/telemetry capture.
  • Document & scale: maintain clear docs, lineage, and a pragmatic data catalog so others can discover and trust the data.

Tech Stack

Our current environment includes PostgreSQL, Python, SQL, and QuickSight.
You’ll lead the next step‑function in maturity using a pragmatic, AWS‑centric stack such as:

  • AWS: S3, Glue/Athena or Redshift, Lambda/Step Functions, IAM/KMS
  • Orchestration & modeling: Airflow or Dagster; dbt (or equivalent SQL modeling)
  • Data quality & observability: built‑in checks or tools like Great Expectations
  • Source connectivity: APIs/webhooks; optionally Airbyte/Fivetran for managed connectors
  • Versioning/infra: Git/GitHub Actions; Terraform (nice to have)
  • Marketing attribution: Segment, PostHog, others

(We’re flexible on exact tools—strong fundamentals matter most.)

Qualifications

Must‑have:
  • 3+ years (typically 3–6) in a Data Engineering role building production ELT/ETL on a cloud platform (AWS strongly preferred)
  • Expert SQL and solid Python for data processing/automation
  • Proven experience designing data models (staging, marts, star schemas) and standing up a warehouse/lakehouse
  • Orchestration, scheduling, and operational ownership (SLAs, alerting, runbooks)
  • Experience enabling a BI layer (ideally QuickSight) with secure, governed datasets
  • Strong collaboration and communication; able to gather requirements from non‑technical stakeholders and translate them into data contracts

Nice‑to‑have:
  • Marketing/Sales/RevOps data (CRM, ads, attribution), Accounting/Finance integrations, or product telemetry/event pipelines
  • Stream processing (Kafka/Kinesis), CDC, or near‑real‑time ingestion
  • Data privacy/security best practices (e.g., CPRA), partitioning/performance tuning, and cost management on AWS

90‑Day Outcomes

  • Inventory & architecture: a clear map of sources, a proposed target architecture, and a prioritized backlog aligned with Ops/Engineering
  • First pipelines live: automated ingestion plus core staging tables with data quality checks and alerts
  • Business marts: at least two curated domains live (e.g., Marketing & Sales) powering certified QuickSight datasets for stakeholders
  • Runbook & docs: onboarding‑ready documentation, lineage, and incident playbooks

Interview Process (≈ 1 week)

  • 15 min — Initial screening (virtual)
  • 45 min — Architecture deep‑dive into our data environment and target platform (virtual)
  • 2 hours — On‑site practical: build/modify a small ETL + modeling exercise; discuss trade‑offs, quality, and ops

Annual Salary Range

$140,000 – $190,000 + equity + benefits

Benefits

  • Comprehensive health, dental, vision, and life insurance
  • 401(k) with company match
  • Meaningful early‑stage equity
  • Onsite meals, snacks, and close collaboration with founders/tech leaders
  • Ambitious, fast‑paced startup culture where initiative is rewarded
