
EY Biodiversity Challenge

$3 500 USD
Under code review
Classification
Feature Engineering
Geospatial Data
Geospatial Analysis
1355 joined
557 active
Start
Mar 27, 26
Close
May 24, 26
Reveal
May 24, 26
Rule Clarification on Climate Data & Pseudo-Labeling
8 May 2026, 14:41 · 2

Hi organizers,

I would like to clarify a few rule-related points regarding external environmental data usage.

  1. Is it allowed to extract environmental/climate variables (for example TerraClimate raster values) at observation locations, as long as raw latitude/longitude coordinates are NOT used directly as model features?
  2. Are derived ecological features computed from those climate variables allowed?
  3. Is pseudo-labeling using only the provided Test.csv predictions permitted?
  4. Should external climate data be restricted to timestamps strictly before the observation period to avoid possible temporal leakage?

Thank you for the clarification.

Discussion 2 answers
meganomaly
Zindi

Thanks for the questions - happy to clarify.

  • Using latitude/longitude to extract TerraClimate (or other provided environmental raster) values is allowed. The restriction is on using latitude/longitude themselves (or features derived from them) as model inputs. In other words:
      ✅ Using coordinates to sample the provided climate rasters
      ❌ Using coordinates, distances, grid cells, clusters, nearest neighbours, or other spatial encodings as predictive features
  • Derived ecological/climate features computed purely from the extracted climate variables are allowed. For example, climate indices, seasonal aggregations, water balance variables, and interactions between TerraClimate variables are all fine, provided they are derived only from the allowed environmental data.
  • Pseudo-labeling is permitted only if it uses predictions generated from compliant models and the provided competition data. However, participants should avoid approaches that indirectly leak information from the test set or reconstruct labels using external spatial information.
  • Regarding timestamps and temporal leakage: Participants should avoid using information that would not realistically have been available at the time of observation. In practice, using climate summaries that overlap the observation period is acceptable if they come from the provided TerraClimate data, but care should be taken not to introduce future information.
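The following is a minimal sketch of the pattern allowed in the first and last points. It assumes the climate data has already been loaded into memory as a `{(year, month): 2-D grid}` mapping per variable, on a regular north-up latitude/longitude grid. A real pipeline would read the provided TerraClimate files instead. The helper names `cell_index`, `months_up_to` and `sample_climate` are hypothetical, and so is the choice of a 12-month window ending at the observation month. None of these were specified by the organizers. In the sketch, coordinates only locate the grid cell and are never copied into the output row, and no month after the observation month is read.

```python
from datetime import date


def cell_index(lat, lon, origin_lat, origin_lon, res):
    """Row/column of the grid cell containing (lat, lon).

    Assumes a north-up grid whose top-left corner is (origin_lat, origin_lon),
    with square cells `res` degrees wide, and that the point lies inside it.
    """
    row = int((origin_lat - lat) // res)
    col = int((lon - origin_lon) // res)
    return row, col


def months_up_to(obs: date, n=12):
    """The n (year, month) pairs ending at, and including, the observation month."""
    y, m = obs.year, obs.month
    window = []
    for _ in range(n):
        window.append((y, m))
        m -= 1
        if m == 0:
            y, m = y - 1, 12
    return window[::-1]  # oldest first


def sample_climate(record, cubes, origin_lat, origin_lon, res, n_months=12):
    """Sample each climate variable at the record's location.

    `cubes` maps a variable name (e.g. "ppt") to {(year, month): 2-D grid}.
    The record's lat/lon are used only to locate the cell; the returned
    dict contains climate series only, with no coordinates or cell ids.
    """
    row, col = cell_index(record["lat"], record["lon"], origin_lat, origin_lon, res)
    window = months_up_to(record["date"], n_months)  # never reads future months
    return {var: [cube[ym][row][col] for ym in window] for var, cube in cubes.items()}
```

Keeping the lookup and the feature row separate makes it easy to check that nothing spatial survives into the model inputs. The window length is a tunable assumption.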
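The kinds of derived features listed in the second point could be computed along these lines. The sketch assumes each variable is a list of at least three monthly values, with ppt and pet in mm and tmax and tmin in °C, as TerraClimate reports them. The function name `derived_features` and the specific formulas are illustrative choices, not an official feature set. These include a P/PET ratio as a simple aridity index, rolling 3-month totals for seasonality, and tmax minus tmin as a variable interaction.

```python
def derived_features(series):
    """Derive ecological features from monthly climate series.

    `series` maps variable names to equal-length monthly lists (length >= 3);
    only "ppt", "pet", "tmax" and "tmin" are used here.
    """
    ppt, pet = series["ppt"], series["pet"]
    tmax, tmin = series["tmax"], series["tmin"]
    n = len(ppt)
    ppt_sum, pet_sum = sum(ppt), sum(pet)
    # Seasonal aggregation: rolling 3-month precipitation totals.
    quarters = [sum(ppt[i:i + 3]) for i in range(n - 2)]
    return {
        "ppt_total": ppt_sum,
        "pet_total": pet_sum,
        # Water balance: precipitation minus potential evapotranspiration.
        "water_balance": ppt_sum - pet_sum,
        # Simple climate index: P/PET ratio (lower means drier).
        "aridity_index": ppt_sum / pet_sum if pet_sum else float("nan"),
        "wettest_quarter_ppt": max(quarters),
        "driest_quarter_ppt": min(quarters),
        # Count of months where evaporative demand exceeds precipitation.
        "dry_months": sum(1 for p, e in zip(ppt, pet) if p < e),
        # Interaction between variables: mean diurnal temperature range.
        "mean_diurnal_range": sum(hi - lo for hi, lo in zip(tmax, tmin)) / n,
    }
```

Every output is computed from the climate series alone, so none of these features encodes location.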
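A single round of confidence-thresholded pseudo-labeling might look like the following. This is a hypothetical sketch. The nearest-centroid model is a deliberately simple stand-in for whatever compliant model is actually used. The 0.9 threshold is arbitrary. The feature rows are assumed to contain only allowed climate-derived features, with no coordinates or spatial encodings.

```python
import math


def fit_centroids(X, y):
    """Mean feature vector per class (a simple stand-in model)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_proba(centroids, row):
    """Softmax over negative squared distances to each class centroid."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(row, mu)) for c, mu in centroids.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}


def pseudo_label(X_train, y_train, X_test, threshold=0.9):
    """One round of pseudo-labeling.

    Test rows whose top predicted probability reaches `threshold` are added to
    the training set with their predicted label, then the model is refit.
    Returns the refit model and the number of pseudo-labeled rows added.
    """
    model = fit_centroids(X_train, y_train)
    X_aug, y_aug = list(X_train), list(y_train)
    for row in X_test:
        proba = predict_proba(model, row)
        label = max(proba, key=proba.get)
        if proba[label] >= threshold:
            X_aug.append(row)
            y_aug.append(label)
    return fit_centroids(X_aug, y_aug), len(X_aug) - len(X_train)
```

Only confident predictions are recycled, so ambiguous test rows, such as a point equidistant from two centroids, are left out of the augmented training set.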

Hope that helps!

12 May 2026, 10:47
Upvotes 2
CodeJoe

Definitely!