Exploring and preparing loan data
Exploring the credit data
0  22  59000
1  21   9600
2  25   9600
3  23  65500
4  24  54400
5  21   9900
6  26  77100
7  24  78956
8  24  83000
9  21  10000
Cross tabulations and pivot tables
loan_status 0 1 All
loan_intent
DEBTCONSOLIDATION 3722 1490 5212
EDUCATION 5342 1111 6453
HOMEIMPROVEMENT 2664 941 3605
MEDICAL 4450 1621 6071
PERSONAL 4423 1098 5521
VENTURE 4872 847 5719
All 25473 7108 32581
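A table like the one above can be produced with `pd.crosstab`. The following is a minimal sketch on a made-up five-row frame; the name `cr_loan` and the toy values are assumptions, not the original data.

```python
import pandas as pd

# Toy stand-in for the loan data; the values are invented for illustration.
cr_loan = pd.DataFrame({
    "loan_intent": ["EDUCATION", "MEDICAL", "EDUCATION", "MEDICAL", "EDUCATION"],
    "loan_status": [0, 1, 0, 0, 1],
})

# Count loans per intent and status; margins=True adds the "All" totals.
intent_by_status = pd.crosstab(
    cr_loan["loan_intent"], cr_loan["loan_status"], margins=True
)
print(intent_by_status)
```

With `margins=True` the last row and column hold the totals, which is where the `All` entries in the output come from.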
loan_status               0                                    1
loan_grade                A     B     C    D    E   F  G      A     B    C     D    E   F   G
person_home_ownership
MORTGAGE               5219  3729  1934  658  178  36  0    239   324  321   553  161  61  31
OTHER                    23    29    11    9    2   0  0      3     5    6    11    6   2   0
OWN                     860   770   464  264   26   7  0     66    34   31    18   31   8   5
RENT                   3602  4222  2710  554  137  28  1    765  1338  981  1559  423  99  27
loan_status 0 1
person_home_ownership
MORTGAGE 0.146504 0.184882
OTHER 0.143784 0.300000
OWN 0.180013 0.297358
RENT 0.144611 0.264859
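The last table holds an average per cell rather than a count. One way to get that shape, sketched here under the assumption that the averaged column is `loan_percent_income`, is to pass `values` and `aggfunc` to `pd.crosstab`. The data below is invented.

```python
import pandas as pd

# Invented rows; only the column names follow the outputs shown in this notebook.
cr_loan = pd.DataFrame({
    "person_home_ownership": ["RENT", "RENT", "RENT", "OWN", "OWN"],
    "loan_status": [0, 0, 1, 0, 1],
    "loan_percent_income": [0.10, 0.20, 0.30, 0.20, 0.40],
})

# Average loan_percent_income for each ownership / status combination.
mean_pct_income = pd.crosstab(
    cr_loan["person_home_ownership"],
    cr_loan["loan_status"],
    values=cr_loan["loan_percent_income"],
    aggfunc="mean",
)
print(mean_pct_income)
```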
Finding outliers with cross tables
person_home_ownership MORTGAGE OTHER OWN RENT
loan_status
0 123.0 24.0 31.0 41.0
1 34.0 11.0 17.0 123.0
max min
person_home_ownership MORTGAGE OTHER OWN RENT MORTGAGE OTHER OWN RENT
loan_status
0 38.0 24.0 31.0 41.0 0.0 0.0 0.0 0.0
1 34.0 11.0 17.0 27.0 0.0 0.0 0.0 0.0
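The first table shows a maximum of 123.0, which is gone in the second. A plausible way to get from one to the other is to drop rows above a cutoff. The sketch below assumes the column is `person_emp_length` and uses a hypothetical cutoff of 60; neither the cutoff nor the data comes from the source.

```python
import pandas as pd

# Invented rows with one implausible employment length.
cr_loan = pd.DataFrame({
    "person_emp_length": [123.0, 5.0, 38.0, 0.0],
    "loan_status": [0, 1, 0, 1],
})

max_emp_length = 60  # hypothetical cutoff, chosen only for this example

# Collect the index labels of the outliers, then drop those rows.
outlier_index = cr_loan[cr_loan["person_emp_length"] > max_emp_length].index
cr_loan_clean = cr_loan.drop(outlier_index)

print(cr_loan["person_emp_length"].max(), cr_loan_clean["person_emp_length"].max())
```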
Visualizing credit outliers
Replacing missing credit data
Index(['person_emp_length', 'loan_int_rate'], dtype='object')
person_age person_income person_home_ownership person_emp_length \
105 22 12600 MORTGAGE NaN
222 24 185000 MORTGAGE NaN
379 24 16800 MORTGAGE NaN
407 25 52000 RENT NaN
408 22 17352 MORTGAGE NaN
loan_intent loan_grade loan_amnt loan_int_rate loan_status \
105 PERSONAL A 2000 5.42 1
222 EDUCATION B 35000 12.42 0
379 DEBTCONSOLIDATION A 3900 NaN 1
407 PERSONAL B 24000 10.74 1
408 EDUCATION C 2250 15.27 0
loan_percent_income cb_person_default_on_file cb_person_cred_hist_length
105 0.16 N 4
222 0.19 N 2
379 0.23 N 3
407 0.46 N 2
408 0.13 Y 3
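The output above lists the columns that contain nulls and a few affected rows. A minimal sketch of finding such columns and filling one of them follows; replacing with the median is an assumption for illustration, and the frame is invented.

```python
import numpy as np
import pandas as pd

# Invented rows with gaps in both columns named in the Index output.
cr_loan = pd.DataFrame({
    "person_emp_length": [2.0, np.nan, 4.0, 10.0],
    "loan_int_rate": [5.42, 12.42, np.nan, 10.74],
})

# Columns that contain at least one missing value.
null_columns = cr_loan.columns[cr_loan.isnull().any()]
print(null_columns)

# Fill missing employment lengths with the column median (an assumed strategy).
median_emp_length = cr_loan["person_emp_length"].median()
cr_loan["person_emp_length"] = cr_loan["person_emp_length"].fillna(median_emp_length)
```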
Removing missing data
3116
Logistic regression for probability of default
Logistic regression model
{'C': 1.0, 'class_weight': None, 'dual': False, 'fit_intercept': True, 'intercept_scaling': 1, 'l1_ratio': None, 'max_iter': 100, 'multi_class': 'auto', 'n_jobs': None, 'penalty': 'l2', 'random_state': None, 'solver': 'lbfgs', 'tol': 0.0001, 'verbose': 0, 'warm_start': False}
[-4.45785901]
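The two outputs above have the shape of `get_params()` and `intercept_` from a scikit-learn `LogisticRegression`. The sketch below fits a single-feature model on synthetic data to show where those values come from; the feature (an interest rate) and the data generation are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic interest rates; higher rates default more often by construction.
X = rng.uniform(5, 20, size=(200, 1))
p_default = 1 / (1 + np.exp(-(X[:, 0] - 12) / 2))
y = (rng.uniform(size=200) < p_default).astype(int)

clf_logistic_single = LogisticRegression(solver="lbfgs")
clf_logistic_single.fit(X, y)

print(clf_logistic_single.get_params())  # hyperparameters of the estimator
print(clf_logistic_single.intercept_)    # fitted intercept, one value for binary y
```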
Multivariate logistic regression
[-4.21645549]
Creating training and test sets
[[ 1.28517496e-09 -2.27622202e-09 -2.17211991e-05]]
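Before fitting, the data is split into training and test sets. The later reports show 11784 test rows, which is consistent with holding out about 40% of the data, so the sketch below uses `test_size=0.4`; that value, the seed, and the toy frame are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Invented ten-row frame with three features and a binary target.
X = pd.DataFrame({
    "loan_int_rate": [5.4, 12.4, 7.9, 10.7, 15.3, 11.1, 6.0, 13.5, 9.9, 14.2],
    "person_emp_length": [1, 5, 2, 8, 0, 3, 7, 4, 6, 2],
    "person_income": [59000, 9600, 65500, 54400, 9900, 77100, 78956, 83000, 10000, 12600],
})
y = pd.Series([0, 1, 0, 0, 1, 0, 0, 1, 0, 1], name="loan_status")

# Hold out 40% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=123
)
print(X_train.shape, X_test.shape)
```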
Dummy encoding
Index(['person_age', 'person_income', 'person_emp_length', 'loan_amnt',
'loan_int_rate', 'loan_status', 'loan_percent_income',
'cb_person_cred_hist_length', 'person_home_ownership_MORTGAGE',
'person_home_ownership_OTHER', 'person_home_ownership_OWN',
'person_home_ownership_RENT', 'loan_intent_DEBTCONSOLIDATION',
'loan_intent_EDUCATION', 'loan_intent_HOMEIMPROVEMENT',
'loan_intent_MEDICAL', 'loan_intent_PERSONAL', 'loan_intent_VENTURE',
'loan_grade_A', 'loan_grade_B', 'loan_grade_C', 'loan_grade_D',
'loan_grade_E', 'loan_grade_F', 'loan_grade_G',
'cb_person_default_on_file_N', 'cb_person_default_on_file_Y'],
dtype='object')
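Column names such as `loan_grade_A` and `cb_person_default_on_file_Y` are what `pd.get_dummies` produces from text columns. A minimal sketch on an invented two-row frame:

```python
import pandas as pd

# Invented rows: one numeric column and two text columns.
cr_loan = pd.DataFrame({
    "person_age": [22, 21],
    "loan_grade": ["A", "B"],
    "cb_person_default_on_file": ["N", "Y"],
})

# Text columns are expanded into one indicator column per category;
# numeric columns pass through unchanged.
cr_loan_prep = pd.get_dummies(cr_loan)
print(cr_loan_prep.columns)
```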
Predicting probabilities of default
loan_status prob_default
0 1 0.445779
1 1 0.223447
2 0 0.288558
3 0 0.169358
4 1 0.114182
5 1 0.490257
6 0 0.162057
7 0 0.396211
8 1 0.217428
9 1 0.481440
10 1 0.243327
11 1 0.404546
12 0 0.063662
13 0 0.254207
14 1 0.416166
15 0 0.367900
16 0 0.246369
17 1 0.027882
18 0 0.106589
19 0 0.347888
Default classification report
0 11175
1 609
Name: loan_status, dtype: int64
precision recall f1-score support
Non-Default 0.81 0.98 0.89 9198
Default 0.71 0.17 0.27 2586
accuracy 0.80 11784
macro avg 0.76 0.57 0.58 11784
weighted avg 0.79 0.80 0.75 11784
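A report like this is built by turning probabilities into labels with a threshold and passing them to `classification_report`. The sketch below assumes a 0.5 threshold and uses invented probabilities.

```python
import numpy as np
from sklearn.metrics import classification_report

# Invented true labels and predicted default probabilities.
y_true = np.array([1, 1, 0, 0, 1, 0, 0, 0])
prob_default = np.array([0.62, 0.22, 0.29, 0.17, 0.55, 0.49, 0.16, 0.71])

# Assign default (1) when the probability exceeds the assumed threshold.
threshold = 0.5
pred_status = (prob_default > threshold).astype(int)

report = classification_report(
    y_true, pred_status, target_names=["Non-Default", "Default"]
)
print(report)
```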
Selecting report metrics
(array([0.80742729, 0.71264368]), array([0.98097412, 0.16782676]), array([0.8857802 , 0.27167449]), array([9198, 2586]))
(array([0.80742729, 0.71264368]), array([0.98097412, 0.16782676]))
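The tuples above have the layout returned by `precision_recall_fscore_support`: precision, recall, F1 and support, each with one entry per class. A sketch of pulling out a single number, on invented labels:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Invented true and predicted labels (1 = default).
y_true = np.array([1, 1, 0, 0, 1, 0, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 0, 0, 1])

metrics = precision_recall_fscore_support(y_true, y_pred)

# metrics[1] is the recall array; index 1 within it is the default class.
default_recall = metrics[1][1]
print(metrics[:2])      # precision and recall only
print(default_recall)
```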
Plotting the credit scoring model
0.8025288526816021
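The value above agrees with the 0.80 accuracy in the classification report. The sketch below computes accuracy alongside the points of a ROC curve and its AUC on invented data; the plotting step is left out.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve

# Invented true labels and predicted default probabilities.
y_true = np.array([1, 1, 0, 0, 1, 0, 0, 0])
prob_default = np.array([0.62, 0.22, 0.29, 0.17, 0.55, 0.49, 0.16, 0.71])

# Accuracy needs hard labels, so apply an assumed 0.5 threshold first.
accuracy = accuracy_score(y_true, (prob_default > 0.5).astype(int))

# The ROC curve and AUC work directly on the probabilities.
fallout, sensitivity, thresholds = roc_curve(y_true, prob_default)
auc = roc_auc_score(y_true, prob_default)
print(accuracy, auc)
```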
Thresholds and confusion matrices
Performance by chosen threshold
9872265.223119883
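The figure above reads as a monetary amount tied to a threshold, but the source does not show how it was computed. One common back-of-the-envelope estimate multiplies the number of defaults by the average loan amount and by the share of defaults the model misses. The formula, the function name and the numbers below are all assumptions.

```python
def estimated_impact(num_defaults, avg_loan_amnt, default_recall):
    """Rough amount lost to defaults the model fails to catch.

    Assumes every missed default loses the average loan amount in full.
    """
    return num_defaults * avg_loan_amnt * (1 - default_recall)


# Hypothetical inputs: 100 defaults, 9500 average loan, 25% of defaults caught.
impact = estimated_impact(num_defaults=100, avg_loan_amnt=9500.0, default_recall=0.25)
print(impact)
```

Raising recall on defaults lowers this estimate, which is why the threshold choice matters for the portfolio.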
Selecting the threshold
Gradient boosted trees using XGBoost
Trees for loan default
/root/venv/lib/python3.7/site-packages/xgboost/sklearn.py:1224: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
warnings.warn(label_encoder_deprecation_msg, UserWarning)
[22:49:05] WARNING: ../src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
loan_status prob_default
0 1 0.990942
1 1 0.983987
2 0 0.000807
3 0 0.001239
4 1 0.084892
5 1 0.021722
6 0 0.010304
7 0 0.000375
8 1 0.976663
9 1 0.992004
10 1 0.173216
11 1 0.997188
12 0 0.002653
13 0 0.001659
14 1 0.992790
15 0 0.004816
16 0 0.051240
17 1 0.998005
18 0 0.413108
19 0 0.147902
Gradient boosted portfolio performance
28606  0.9909417033     0.4457786018
22585  0.9839872122     0.223446533
13888  0.0008073628414  0.288558257
 3145  0.001239418169   0.1693575271
14882  0.08489220589    0.1141819733
16677  0.02172206342    0.4902568814
21661  0.01030396204    0.1620574551
19872  0.0003745816357  0.3962111956
12534  0.9766631722     0.2174275844
 8873  0.9920044541     0.4814399584
LR expected loss: 5596776.979852879
GBT expected loss: 5383982.809227714
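Expected loss is conventionally the sum over loans of probability of default times loss given default times exposure. The sketch below assumes a flat loss given default of 20% and uses invented probabilities and amounts; none of these values come from the source.

```python
import numpy as np

# Invented portfolio of two loans.
prob_default = np.array([0.99, 0.02])   # model probability of default
loan_amnt = np.array([10000.0, 5000.0])  # exposure at default
lgd = 0.2                                # assumed loss given default

# Expected loss per loan is PD * LGD * exposure; sum over the portfolio.
expected_loss = float(np.sum(prob_default * lgd * loan_amnt))
print("expected loss:", expected_loss)
```

Running the same calculation with each model's probabilities gives two totals that can be compared directly, as in the two lines above.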
Evaluating gradient boosted trees
[1 1 0 ... 0 0 0]
precision recall f1-score support
Non-Default 0.93 0.99 0.96 9198
Default 0.94 0.74 0.83 2586
accuracy 0.93 11784
macro avg 0.94 0.86 0.89 11784
weighted avg 0.93 0.93 0.93 11784
Importance of variable selection in default prediction
{'person_income': 1299.0, 'loan_int_rate': 1001.0, 'loan_percent_income': 515.0, 'loan_amnt': 519.0, 'person_home_ownership_MORTGAGE': 116.0, 'loan_grade_F': 9.0}
Visualizing variable importance
Variable selection and model performance
precision recall f1-score support
Non-Default 0.91 0.95 0.93 9198
Default 0.79 0.66 0.72 2586
accuracy 0.89 11784
macro avg 0.85 0.81 0.83 11784
weighted avg 0.88 0.89 0.88 11784
precision recall f1-score support
Non-Default 0.91 0.97 0.94 9198
Default 0.88 0.66 0.75 2586
accuracy 0.90 11784
macro avg 0.89 0.82 0.85 11784
weighted avg 0.90 0.90 0.90 11784
Cross-validation
0  0.898182   0.001318178288
1  0.9092564  0.002052300037
2  0.9136208  0.002204611295
3  0.9185998  0.001092324201
4  0.9222516  0.001818467553
Limits of cross-validation testing
0  0.8975683  0.0009093306384
1  0.906985   0.002611366883
2  0.9136785  0.001403479052
3  0.9191233  0.0009208806709
4  0.9228642  0.001096840171
5  0.926411   0.001299419255
6  0.9304613  0.001149175709
7  0.933292   0.001214780474
8  0.9363102  0.001096736231
9  0.9392984  0.001157245799
0.94
Cross-validation scoring
[0.94048427 0.93256393 0.93324282 0.92462653]
Average accuracy: 0.93 (+/- 0.01)
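Four fold scores followed by a mean and a spread is the usual output pattern of `cross_val_score` with `cv=4`. The sketch below swaps in scikit-learn's `GradientBoostingClassifier` for the XGBoost model so that it has no extra dependency, and runs on synthetic data; both choices are assumptions, so the scores will differ from those above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data standing in for the loan features.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Stand-in gradient boosted model (not XGBoost).
gbt = GradientBoostingClassifier(n_estimators=50, random_state=0)

# One accuracy score per fold.
cv_scores = cross_val_score(gbt, X, y, cv=4)
print(cv_scores)
print("Average accuracy: %0.2f (+/- %0.2f)" % (cv_scores.mean(), cv_scores.std() * 2))
```

The reported spread here is two standard deviations of the fold scores, which is one common convention for the `+/-` figure.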
Undersampling the training data
0 3877
1 3877
Name: loan_status, dtype: int64
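Equal counts for both classes are the result of undersampling the majority class. A minimal sketch, assuming random sampling without replacement on an invented training frame:

```python
import pandas as pd

# Invented training data: seven non-defaults and three defaults.
train = pd.DataFrame({
    "loan_amnt": list(range(1000, 11000, 1000)),
    "loan_status": [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
})

nondefaults = train[train["loan_status"] == 0]
defaults = train[train["loan_status"] == 1]

# Randomly keep only as many non-defaults as there are defaults.
nondefaults_under = nondefaults.sample(len(defaults), random_state=0)

train_under = pd.concat(
    [nondefaults_under.reset_index(drop=True), defaults.reset_index(drop=True)],
    axis=0,
)
print(train_under["loan_status"].value_counts())
```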
Undersampled tree performance
precision recall f1-score support
Non-Default 0.93 0.99 0.96 9198
Default 0.94 0.74 0.83 2586
accuracy 0.93 11784
macro avg 0.94 0.86 0.89 11784
weighted avg 0.93 0.93 0.93 11784
precision recall f1-score support
Non-Default 0.95 0.91 0.93 9198
Default 0.73 0.82 0.77 2586
accuracy 0.89 11784
macro avg 0.84 0.87 0.85 11784
weighted avg 0.90 0.89 0.90 11784
[[9085 113]
[ 677 1909]]
[[8416 782]
[ 469 2117]]
0.8629602218579747
0.8668101710802659
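The two confusion matrices above are enough to recompute the Default rows of both reports, and the two trailing scores equal the mean of the per-class recalls of each matrix. The helper below is a hypothetical name; the matrices are copied from the output, with rows as true classes and columns as predicted classes.

```python
def default_metrics(cm):
    """Return (precision, recall, mean per-class recall) for the default class."""
    (tn, fp), (fn, tp) = cm
    precision_default = tp / (tp + fp)
    recall_default = tp / (tp + fn)
    recall_nondefault = tn / (tn + fp)
    mean_recall = (recall_default + recall_nondefault) / 2
    return precision_default, recall_default, mean_recall


cm_original = [[9085, 113], [677, 1909]]      # first matrix above
cm_undersampled = [[8416, 782], [469, 2117]]  # second matrix above

prec_o, rec_o, mean_o = default_metrics(cm_original)
prec_u, rec_u, mean_u = default_metrics(cm_undersampled)
print(round(prec_o, 2), round(rec_o, 2), mean_o)
print(round(prec_u, 2), round(rec_u, 2), mean_u)
```

Undersampling trades precision on defaults (0.94 down to 0.73) for recall (0.74 up to 0.82), while the mean per-class recall moves only slightly.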
Model evaluation and implementation
Comparing model reports
0  0.4457786018  1
1  0.223446533   0
2  0.288558257   0
3  0.1693575271  0
4  0.1141819733  0
5  0.4902568814  1
6  0.1620574551  0
7  0.3962111956  0
8  0.2174275844  0
9  0.4814399584  1
0  0.9823871851    1
1  0.9751634002    1
2  0.003473574528  0
3  0.005457147025  0
4  0.119876273     0
5  0.1467240006    0
6  0.01178901456   0
7  0.002285469789  0
8  0.940839529     1
9  0.9881092906    1
precision recall f1-score support
Non-Default 0.86 0.92 0.89 9198
Default 0.62 0.46 0.53 2586
accuracy 0.82 11784
macro avg 0.74 0.69 0.71 11784
weighted avg 0.81 0.82 0.81 11784
precision recall f1-score support
Non-Default 0.93 0.99 0.96 9198
Default 0.94 0.73 0.82 2586
accuracy 0.93 11784
macro avg 0.93 0.86 0.89 11784
weighted avg 0.93 0.93 0.93 11784
0.7108943782814463
0.8909014142736051
Comparing ROC curves
Logistic Regression AUC Score: 0.76
Gradient Boosted Tree AUC Score: 0.94
Calibration curves
Acceptance rates
count 11784.000000
mean 0.216866
std 0.333038
min 0.000354
25% 0.022246
50% 0.065633
75% 0.177804
max 0.999557
Name: prob_default, dtype: float64
0 10016
1 1768
Name: pred_loan_status, dtype: int64
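In the counts above, 1768 of 11784 loans (about 15%) are flagged, which corresponds to accepting roughly 85% of applications. One way to get such a cutoff is to take the matching quantile of the predicted probabilities. The sketch below does this for ten probabilities copied from the outputs in this notebook; using `np.quantile` with an 85% acceptance rate is an assumption about the method.

```python
import numpy as np

# Ten predicted default probabilities taken from the outputs in this notebook.
prob_default = np.array([
    0.9823871851, 0.9751634002, 0.003473574528, 0.005457147025, 0.119876273,
    0.1467240006, 0.01178901456, 0.002285469789, 0.940839529, 0.9881092906,
])

accept_rate = 0.85  # assumed share of applications to accept

# Probabilities above this quantile are treated as predicted defaults.
threshold = np.quantile(prob_default, accept_rate)
pred_loan_status = (prob_default > threshold).astype(int)
print(threshold, pred_loan_status)
```

With only ten rows the rejected share is 20% rather than 15%, because the quantile falls between two observations.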
Visualizing acceptance quantiles
Bad rates
0  1  0.9823871851
1  1  0.9751634002
2  0  0.003473574528
3  0  0.005457147025
4  1  0.119876273
5  1  0.1467240006
6  0  0.01178901456
7  0  0.002285469789
8  1  0.940839529
9  1  0.9881092906
0.08256789137380191
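The bad rate is the share of accepted loans that turn out to be defaults. The sketch below computes it for the ten rows shown above; the column names and the 85% acceptance rate are assumptions, so the result differs from the full-data figure.

```python
import numpy as np
import pandas as pd

# The ten rows shown above: true status and predicted default probability.
test_pred_df = pd.DataFrame({
    "true_loan_status": [1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    "prob_default": [
        0.9823871851, 0.9751634002, 0.003473574528, 0.005457147025, 0.119876273,
        0.1467240006, 0.01178901456, 0.002285469789, 0.940839529, 0.9881092906,
    ],
})

# Flag loans above the 85% quantile as predicted defaults (assumed rate).
threshold = np.quantile(test_pred_df["prob_default"], 0.85)
test_pred_df["pred_loan_status"] = (test_pred_df["prob_default"] > threshold).astype(int)

# Accepted loans are those predicted as non-default.
accepted = test_pred_df[test_pred_df["pred_loan_status"] == 0]
bad_rate = accepted["true_loan_status"].sum() / len(accepted)
print(bad_rate)
```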
Acceptance rate impact
0  1  0.9823871851
1  1  0.9751634002
2  0  0.003473574528
3  0  0.005457147025
4  1  0.119876273
5  1  0.1467240006
6  0  0.01178901456
7  0  0.002285469789
8  1  0.940839529
9  1  0.9881092906
count 11784.000000
mean 9556.283944
std 6238.005674
min 500.000000
25% 5000.000000
50% 8000.000000
75% 12000.000000
max 35000.000000
Name: loan_amnt, dtype: float64
pred_loan_status_15 0 1
true_loan_status
0 $87,812,693.16 $86,006.56
1 $7,903,046.82 $16,809,503.46
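Each cell of the table above is a whole number of loans multiplied by the mean `loan_amnt` of 9556.283944 (for example, 9 × 9556.283944 ≈ 86,006.56). A minimal sketch of building such a table, with invented rows and assumed column names:

```python
import pandas as pd

# Invented predictions and loan amounts.
test_pred_df = pd.DataFrame({
    "true_loan_status": [0, 0, 0, 1, 1, 0],
    "pred_loan_status": [0, 0, 1, 1, 0, 0],
    "loan_amnt": [5000, 8000, 12000, 10000, 6000, 7000],
})

avg_loan = test_pred_df["loan_amnt"].mean()

# Confusion-style counts scaled by the average loan amount.
counts = pd.crosstab(test_pred_df["true_loan_status"], test_pred_df["pred_loan_status"])
value_table = counts * avg_loan

# Format every cell as a dollar amount for display.
formatted = value_table.apply(lambda col: col.map("${:,.2f}".format))
print(formatted)
```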
Business strategy table
0  1  0.9823871851
1  1  0.9751634002
2  0  0.003473574528
3  0  0.005457147025
4  1  0.119876273
5  1  0.1467240006
6  0  0.01178901456
7  0  0.002285469789
8  1  0.940839529
9  1  0.9881092906
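A strategy table typically lists, for several acceptance rates, the probability threshold and the resulting bad rate. The sketch below builds one from the ten rows above; the choice of acceptance rates and the tuple layout are assumptions, and a real table would use the full test set.

```python
import numpy as np

# The ten rows shown above.
true_loan_status = np.array([1, 1, 0, 0, 1, 1, 0, 0, 1, 1])
prob_default = np.array([
    0.9823871851, 0.9751634002, 0.003473574528, 0.005457147025, 0.119876273,
    0.1467240006, 0.01178901456, 0.002285469789, 0.940839529, 0.9881092906,
])

accept_rates = [1.0, 0.8, 0.6]  # hypothetical acceptance rates to compare
strategy_rows = []
for rate in accept_rates:
    # Threshold is the quantile of the probabilities at this acceptance rate.
    threshold = np.quantile(prob_default, rate)
    accepted = prob_default <= threshold
    # Bad rate: share of accepted loans that actually defaulted.
    bad_rate = true_loan_status[accepted].mean()
    strategy_rows.append((rate, round(float(threshold), 3), round(float(bad_rate), 3)))

for row in strategy_rows:
    print(row)
```

Lower acceptance rates reject the riskiest loans first, so the bad rate falls as the acceptance rate drops.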