Maincode

Data Scientist

Posted: 9 minutes ago

Job Description

Overview

Maincode is building Australian-made AI models from the ground up. We train foundation models from scratch, design new reasoning architectures, and deploy them on our own state-of-the-art GPU clusters. Our data and infrastructure are entirely homegrown, from curation to large-scale training, to ensure independence, transparency, and excellence in model performance.

We’re looking for a Data Scientist who thrives at the intersection of data engineering, machine learning, and creative experimentation. You’ll help shape the datasets and data systems that power the next generation of models.

This role bridges deep technical execution (pipelines, validation, distributed data processing) with the curiosity and innovation needed to push data science into new territory. You’ll work closely with researchers and engineers to make data the backbone of Australia’s AI capability.

What You’ll Do

  • Engineer and innovate with data: Design scalable data workflows that handle massive, heterogeneous datasets (text, code, multimodal, structured).
  • Prototype novel data science approaches: Apply advanced techniques for dataset synthesis, filtering, augmentation, and generation to improve downstream model reasoning.
  • Build production-grade pipelines: Automate ingestion, cleaning, transformation, and validation of large-scale data for model training.
  • Develop intelligent metrics: Develop tools and metrics for assessing dataset quality, diversity, and performance impact.
  • Collaborate across disciplines: Work with AI researchers to shape training corpora aligned with emerging model architectures and objectives.
  • Continuously refine systems: Improve how data flows through the entire training stack, from curation to evaluation.
  • Champion data quality and ethics: Help define standards for responsible, high-integrity data use in AI.

Who You Are

  • Strong foundation in Python, data processing frameworks (Pandas, PySpark, Dask, or Ray), and large-scale data systems.
  • Skilled in data analysis, feature engineering, and statistical reasoning.
  • Experienced working with multi-terabyte or distributed datasets in production environments.
  • Familiar with or curious about deep learning, data-centric AI, and model training pipelines.
  • Eager to experiment: combining scientific rigour with creative problem-solving.
  • Motivated to help shape Australia’s independent AI capability through world-class data infrastructure.

Why Maincode

Maincode is a small, highly technical team operating at the frontier of AI research and infrastructure. We build, train, and deploy foundation models from scratch - not fine-tune existing ones - and the data you work on will directly shape model behavior at scale.

You’ll Join a Team That

  • Treats data as a core differentiator of AI progress.
  • Values experimentation and scientific precision in equal measure.
  • Builds clean, transparent, and scalable systems from first principles.
  • Aims to make Australia a leader in independent AI innovation.

Job Application Tips

  • Tailor your resume to highlight relevant experience for this position
  • Write a compelling cover letter that addresses the specific requirements
  • Research the company culture and values before applying
  • Prepare examples of your work that demonstrate your skills
  • Follow up on your application after a reasonable time period
