OpenGeoHub Foundation

Internship: Multivariate Machine Learning for Digital Soil Mapping


Job Description

Hours: Full time (38 hours per week)
Location: On-site in Doorwerth
Working hours: to be set between 09:00 and 18:30
Internship allowance: 500 EUR/month (with an additional travel allowance if this is not already covered by the student card) for a full-time position
Employer: Stichting OpenGeoHub

We are looking for an intern to assist the Digital Soil Mapping (DSM) team at OpenGeoHub Foundation in exploring the role of multivariate machine learning (ML) approaches in soil prediction modeling. Are you passionate about geospatial data science and machine learning applications for environmental research? If so, this might be the internship for you!

The internship can start any time between November 2025 and January 2026, for a flexible duration of 4 to 6 months.

Project Background

OpenGeoHub Foundation is a non-profit organization that promotes free and open geodata and facilitates open science development. OpenGeoHub is home to one of the few fully cloud-free, open-access Landsat archives in Europe and provides open geospatial products that support global initiatives such as the European Green Deal, UNCCD, and Land and Carbon Lab.

This internship focuses on testing, comparing, and developing ML models that could contribute to the next generation of pan-European high-resolution soil maps. You will have access to OpenGeoHub's extensive geodata archive and receive technical support from our DSM experts.

Digital Soil Mapping (DSM) uses remote sensing, terrain modeling, and machine learning to predict soil properties across large regions. Traditional DSM models typically predict each soil property independently. However, multivariate machine learning can capture correlations between soil attributes (e.g., organic carbon, pH, clay content), potentially improving spatial consistency and predictive accuracy.

At OpenGeoHub, one flagship DSM product is the SoilHealthDataCube (SHDC): an EU-wide, 30 m resolution data stack covering major soil properties from 2000 to 2024+. (Curious? Visit EcoDataCube.eu)

Figure: Time series of high-resolution Soil Organic Carbon maps at 30 m resolution for different depth levels.

About Your Role

Although SHDC is already highly advanced, there is room for innovation. Currently, each soil property is modeled separately (univariate approach), which can lead to mismatches among properties in the resulting maps. This is partly due to differing data availability, but also because independent models cannot exploit inter-property relationships.

We would therefore like to explore whether multivariate Random Forest (RF) models can help overcome these inconsistencies. Your main tasks would be:

  • Task 1: Investigate which soil properties benefit most from multivariate modeling, and identify promising combinations that improve both predictive accuracy and map coherence.
  • Task 2: Compare the performance of multivariate vs. univariate Random Forests in predicting soil properties.
  • Task 3 (optional, if time allows and you are interested): Apply the models to selected test regions across Europe to assess spatial differences in the resulting soil maps.

What we expect from you

Must-have

  • Good command of English (OpenGeoHub is an international organization)
  • Enrolled at a university (in geoinformation, soil science, environmental science, data science, geoscience, remote sensing, or related fields)
  • Some programming experience in Python
  • Basic understanding of GIS and remote sensing, and familiarity with QGIS (basic operations)
  • Availability for full-time work (38 hours/week) in our Doorwerth office (on-site 4 days a week)

Nice-to-have

  • Basic knowledge of machine learning methods and concepts
  • Familiarity with Python libraries such as scikit-learn, pandas, and numpy
  • Some soil science knowledge (would be great)
  • Enthusiasm for open science and environmental data applications

What we can give you

  • Hands-on experience with large-scale environmental ML workflows (mainly Python).
  • Experience working with continental-scale geospatial datasets (e.g., SHDC, remote sensing, terrain data).
  • Insights into modern DSM pipelines and reproducible research practices at OpenGeoHub.
  • Team parties with food cooked by BBQ masters (at least one, hopefully).

Application

Interested? Send an email to the contact person, Xuemeng Tian (xuemeng.tian@opengeohub.org / xuemeng.tian@wur.nl), including:

  • A curriculum vitae, including references with contact details, and your availability;
  • Proof of enrollment that is valid for the entire duration of the internship;
  • Your institutional internship liaison (contact and brief procedure).

This work will be mainly supervised by Xuemeng, with support from the DSM team at OpenGeoHub Foundation.
