!pip install h2o
!pip install dabl
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from dabl import plot
train = pd.read_excel("/content/Data_Train.xlsx")
train.head()
test = pd.read_excel("/content/Test_set.xlsx")
test.head()
plot(train, target_col='Airline')
plt.show()
Target looks like classification
plot(train, target_col='Price')
plt.show()
Target looks like regression
UserWarning: Dropped 16 outliers in column Price.
plot(train, target_col='Source')
plt.show()
Target looks like classification
plot(train, target_col='Destination')
plt.show()
Target looks like classification
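# index=False keeps the pandas row index out of the CSV so it is not imported into H2O as an extra column
train.to_csv("C:\\Users\\Lulus\\Deck\\Desktop\\MachineHack\\Flight_Ticket_Participant_Datasets\\train_csv.csv", index=False)
test.to_csv("C:\\Users\\Lulus\\Deck\\Desktop\\MachineHack\\Flight_Ticket_Participant_Datasets\\test_csv.csv", index=False)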
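The Excel files above are read from /content, while these exports go to an absolute Windows path, so the notebook only runs end to end on one machine. A minimal sketch of keeping every file location in one place with pathlib follows; DATA_DIR and the helper name data_path are hypothetical and not part of the original notebook:

```python
from pathlib import Path

# Hypothetical base folder; point this at wherever the competition files live.
DATA_DIR = Path("data")


def data_path(filename: str) -> Path:
    """Return the location of `filename` inside DATA_DIR."""
    return DATA_DIR / filename


# File names taken from the notebook; only the folder is an assumption.
train_csv = data_path("train_csv.csv")
test_csv = data_path("test_csv.csv")
submission_csv = data_path("submission.csv")
```

These Path objects can be passed to DataFrame.to_csv directly. For h2o.import_file, wrapping them in str() is the safe choice.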
import h2o
from h2o.automl import H2OAutoML
# Start the H2O cluster (locally)
h2o.init()
# Load the exported train/test CSVs into H2O frames (the target, Price, is numeric, so this is a regression problem)
train = h2o.import_file("C:\\Users\\Lulus\\Deck\\Desktop\\MachineHack\\Flight_Ticket_Participant_Datasets\\train_csv.csv")
test = h2o.import_file("C:\\Users\\Lulus\\Deck\\Desktop\\MachineHack\\Flight_Ticket_Participant_Datasets\\test_csv.csv")
Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
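The cell that started the AutoML run is missing from this export. A minimal sketch of how such a run is usually started with H2OAutoML, assuming Price is the target, is given here. The function name run_automl and the max_runtime_secs and seed values are illustrative assumptions, not the settings behind the results reported in this notebook:

```python
def run_automl(train_frame, target="Price", max_runtime_secs=600, seed=1):
    """Train H2O AutoML on an H2OFrame and return the fitted AutoML object."""
    # Imported lazily so the function can be defined without h2o installed.
    from h2o.automl import H2OAutoML

    # Every column except the target is used as a predictor.
    predictors = [col for col in train_frame.columns if col != target]

    # The time budget and seed are placeholder values, not the original settings.
    automl = H2OAutoML(max_runtime_secs=max_runtime_secs, seed=seed)
    automl.train(x=predictors, y=target, training_frame=train_frame)
    return automl
```

Calling `aml = run_automl(train)` on the H2O frame would produce the `aml` object that the prediction step relies on. `aml.leaderboard` lists the candidate models and `aml.leader` is the top-ranked one.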
AutoML progress: |███████████████████████████████████████████████████████████████| (done) 100%
Model Details
=============
H2OStackedEnsembleEstimator : Stacked Ensemble
Model Key: StackedEnsemble_BestOfFamily_5_AutoML_2_20220326_164430
No model summary for this model
ModelMetricsRegressionGLM: stackedensemble
** Reported on train data. **
MSE: 3681776.4467453547
RMSE: 1918.795571900601
MAE: 1142.575136563195
RMSLE: 0.18183018315237182
R^2: 0.8254441274874895
Mean Residual Deviance: 3681776.4467453547
Null degrees of freedom: 10043
Residual degrees of freedom: 10039
Null deviance: 211850919177.83386
Residual deviance: 36979762631.110344
AIC: 180369.92835328373
ModelMetricsRegressionGLM: stackedensemble
** Reported on cross-validation data. **
MSE: 4846237.14133705
RMSE: 2201.417075734866
MAE: 1275.6310051412383
RMSLE: 0.20474552039080554
R^2: 0.7720773859290155
Mean Residual Deviance: 4846237.14133705
Null degrees of freedom: 10682
Residual degrees of freedom: 10678
Null deviance: 227161784458.47144
Residual deviance: 51772351380.9037
AIC: 194780.07790416625
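The figures in the two metric reports are related by simple arithmetic: RMSE is the square root of MSE, and dividing the residual deviance by the mean residual deviance (equal to the MSE here) gives the number of rows scored. A short check using the values printed above follows; the variable names are illustrative:

```python
import math

# Values copied from the metric reports above.
train_mse = 3681776.4467453547
train_residual_deviance = 36979762631.110344
cv_mse = 4846237.14133705
cv_residual_deviance = 51772351380.9037

# RMSE is the square root of MSE.
train_rmse = math.sqrt(train_mse)
cv_rmse = math.sqrt(cv_mse)

# Residual deviance divided by mean residual deviance gives the row count.
train_rows = round(train_residual_deviance / train_mse)
cv_rows = round(cv_residual_deviance / cv_mse)

# Relative increase in RMSE from training data to cross-validation.
rmse_gap = (cv_rmse - train_rmse) / train_rmse

print(f"train: RMSE={train_rmse:.2f} over {train_rows} rows")
print(f"cv:    RMSE={cv_rmse:.2f} over {cv_rows} rows")
print(f"cross-validated RMSE is {rmse_gap:.1%} higher than training RMSE")
```

The row counts come out as the null degrees of freedom plus one in both reports. The cross-validated RMSE is roughly 15% higher than the training RMSE, and it is the better guide to how the model will behave on unseen data.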
preds = aml.leader.predict(test)
preds.columns=['Price']
preds.head()
submission = preds.as_data_frame(use_pandas=True)
submission.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2671 entries, 0 to 2670
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Price 2671 non-null float64
dtypes: float64(1)
memory usage: 21.0 KB
# index=False writes only the Price column, without the pandas row index
submission.to_csv("C:\\Users\\Lulus\\Deck\\Desktop\\MachineHack\\Flight_Ticket_Participant_Datasets\\submission.csv", index=False)