The dataset is imbalanced at a ratio of almost 1:5. Defaults are the least frequent class and also the most important one.
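A quick way to check the class balance is to count the values of the target column. The sketch below uses a hypothetical toy series in place of the real `loan_status` column, with 1 assumed to mean default.

```python
import pandas as pd

# Hypothetical toy target: 1 = default, 0 = non-default (encoding is an assumption).
loan_status = pd.Series([0] * 50 + [1] * 10, name="loan_status")

counts = loan_status.value_counts()

# Number of non-defaults for every default.
ratio = counts.loc[0] / counts.loc[1]
print(counts.to_dict(), f"ratio 1:{ratio:.0f}")
```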
Impute NaN and Drop Duplicates
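A minimal sketch of this step, assuming median imputation for numeric columns (the strategy actually used is not stated here). The toy frame and the `person_emp_length` column are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values and one duplicated row.
df = pd.DataFrame({
    "loan_int_rate": [10.0, np.nan, 14.0, 10.0],
    "person_emp_length": [2.0, 5.0, np.nan, 2.0],  # hypothetical column
    "loan_status": [0, 1, 0, 0],
})

# Assumption: fill missing numeric values with the column median.
numeric_cols = ["loan_int_rate", "person_emp_length"]
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Remove exact duplicate rows and rebuild a clean index.
df = df.drop_duplicates().reset_index(drop=True)
print(df)
```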
Correlation Analysis
The variable of interest `loan_status` shows a degree of correlation with the loan's interest rate `loan_int_rate` and with the percentage of the income that the rent represents.
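One way to inspect this relationship is to rank the features by their Pearson correlation with the target. The values below are made up for illustration, and `percent_income` is a hypothetical column name.

```python
import pandas as pd

# Hypothetical toy data; only loan_status and loan_int_rate are named in the text.
df = pd.DataFrame({
    "loan_int_rate": [7.5, 11.0, 15.2, 9.9, 16.0, 8.1],
    "percent_income": [0.10, 0.25, 0.40, 0.12, 0.45, 0.08],  # hypothetical name
    "loan_status": [0, 0, 1, 0, 1, 0],
})

# Correlation of every feature with the target, strongest first.
corr_with_target = (
    df.corr()["loan_status"]
    .drop("loan_status")
    .sort_values(ascending=False)
)
print(corr_with_target)
```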
One Hot Encoding
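A minimal sketch of one-hot encoding with `pandas.get_dummies`. The categorical column `loan_intent` and its values are assumptions used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "loan_intent": ["EDUCATION", "MEDICAL", "EDUCATION"],  # hypothetical column
    "loan_int_rate": [10.0, 12.5, 9.0],
})

# Each category becomes its own 0/1 indicator column.
encoded = pd.get_dummies(df, columns=["loan_intent"], dtype=int)
print(encoded)
```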
Fit the XGBoost Model in a Two-Fold Cross-Validated Grid Search
Top 10 Performing Models
Greater depth seems to have a positive impact on the model: 56% of the 100 best-performing models use the maximum depth in the grid. A lower `min_child_weight` also seems to produce better results, with 61% of the best-performing models using a value of 1.
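Shares like these can be computed from a table shaped like `GridSearchCV.cv_results_`. The sketch below uses a small hypothetical results table and a top 4 instead of a top 100, so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical results in the shape of GridSearchCV.cv_results_.
results = pd.DataFrame({
    "param_max_depth": [6, 6, 4, 6, 2, 4],
    "param_min_child_weight": [1, 1, 1, 5, 5, 5],
    "param_learning_rate": [0.1, 0.3, 0.1, 0.1, 0.1, 0.3],  # hypothetical extra parameter
    "mean_test_score": [0.82, 0.81, 0.80, 0.79, 0.70, 0.75],
})

top_n = 4
top = results.sort_values("mean_test_score", ascending=False).head(top_n)

# Fraction of the top models that use the deepest trees in the grid.
share_max_depth = (top["param_max_depth"] == results["param_max_depth"].max()).mean()

# Fraction of the top models with min_child_weight equal to 1.
share_mcw_1 = (top["param_min_child_weight"] == 1).mean()

print(f"max depth: {share_max_depth:.0%}, min_child_weight=1: {share_mcw_1:.0%}")
```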
Evaluate Predictions
The default class (the most important one) has an F1 score of 82%, a decent result for a production model.
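A per-class F1 score can be read from scikit-learn's `classification_report` or computed directly with `f1_score`. The labels and predictions below are hypothetical and do not reproduce the 82% figure.

```python
from sklearn.metrics import classification_report, f1_score

# Hypothetical labels and predictions: 1 = default, 0 = non-default.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# F1 of the default class only, the class that matters most here.
f1_default = f1_score(y_true, y_pred, pos_label=1)

print(classification_report(y_true, y_pred, target_names=["non-default", "default"]))
print(f"F1 (default class): {f1_default:.2f}")
```

Reporting the minority-class F1 rather than overall accuracy avoids the inflated scores that an imbalanced dataset would otherwise produce.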