The data mixes dates, categorical fields, and numeric features, including amount, rate, term, income, province, product type, disbursement channel, and more. Missing values appear in both train and test; the test set is crafted with somewhat more missingness to reward robust preprocessing and generalisation. Registration for the challenge is open from May 02 to May 09, 2026. Please note that the dataset will only become available from May 09, 2026.
Train.csv contains 38,932 loan records, including the binary target defaulted. Test.csv has 12,977 rows with the same feature columns but no defaulted column; you must predict it for every ID. Each row is one loan with identifiers, key dates (approval, disbursement, first payment, maturity), amount in USD, interest rate and term, payment frequency and purpose, client and household fields, employment and income, obligations and collateral, disbursement channel, and province. Both splits include missing values; the test split stresses robust handling. The data are tabular and suitable for classical ML, gradient-boosted trees, and other supervised learners.
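
Because the test split is described as having more missingness than the train split, a sensible first check is to compare per-column missing rates across the two files. The following is a minimal, dependency-free sketch of that check, not an official starter script. The column names `amount` and `province`, the set of tokens treated as missing, and the tiny in-memory CSV strings are all assumptions for illustration; only `ID` and `defaulted` are names taken from the description above.

```python
import csv
import io

# Tokens treated as missing. This set is an assumption; adjust it to the real files.
MISSING_TOKENS = {"", "na", "nan", "null", "none"}


def missing_rates(rows, columns):
    """Return {column: fraction of rows whose value is missing}."""
    counts = {c: 0 for c in columns}
    total = 0
    for row in rows:
        total += 1
        for c in columns:
            value = row.get(c)
            # DictReader yields None for fields absent from a short row.
            if value is None or value.strip().lower() in MISSING_TOKENS:
                counts[c] += 1
    return {c: (counts[c] / total if total else 0.0) for c in columns}


def compare_missingness(train_file, test_file, target="defaulted"):
    """Map each shared feature column to (train_rate, test_rate)."""
    train_reader = csv.DictReader(train_file)
    test_reader = csv.DictReader(test_file)
    # Compare only columns present in both splits, excluding the target.
    shared = [
        c for c in train_reader.fieldnames
        if c in test_reader.fieldnames and c != target
    ]
    train_rates = missing_rates(train_reader, shared)
    test_rates = missing_rates(test_reader, shared)
    return {c: (train_rates[c], test_rates[c]) for c in shared}


# Hypothetical stand-ins for Train.csv and Test.csv (made-up rows).
TRAIN_DEMO = "ID,amount,province,defaulted\n1,1000,A,0\n2,,B,1\n3,500,,0\n4,750,C,0\n"
TEST_DEMO = "ID,amount,province\n5,,A\n6,,\n7,300,B\n8,900,\n"

report = compare_missingness(io.StringIO(TRAIN_DEMO), io.StringIO(TEST_DEMO))
for column, (train_rate, test_rate) in report.items():
    print(f"{column}: train={train_rate:.2f} test={test_rate:.2f}")
```

With the real files, the two `io.StringIO` objects would be replaced by `open("Train.csv", newline="")` and `open("Test.csv", newline="")`. Columns whose test rate is clearly higher than their train rate are the ones where imputation or native missing-value handling (as offered by many gradient-boosted tree libraries) is most likely to matter.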