House sales price - Reduction and prediction
Notebook under renovation! Sorry for the inconvenience. (All the code works, but I'm still working on the storytelling.)
Libraries
Data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1460 entries, 1 to 1460
Data columns (total 80 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 MSSubClass 1460 non-null int64
1 MSZoning 1460 non-null object
2 LotFrontage 1201 non-null float64
3 LotArea 1460 non-null int64
4 Street 1460 non-null object
5 Alley 91 non-null object
6 LotShape 1460 non-null object
7 LandContour 1460 non-null object
8 Utilities 1460 non-null object
9 LotConfig 1460 non-null object
10 LandSlope 1460 non-null object
11 Neighborhood 1460 non-null object
12 Condition1 1460 non-null object
13 Condition2 1460 non-null object
14 BldgType 1460 non-null object
15 HouseStyle 1460 non-null object
16 OverallQual 1460 non-null int64
17 OverallCond 1460 non-null int64
18 YearBuilt 1460 non-null int64
19 YearRemodAdd 1460 non-null int64
20 RoofStyle 1460 non-null object
21 RoofMatl 1460 non-null object
22 Exterior1st 1460 non-null object
23 Exterior2nd 1460 non-null object
24 MasVnrType 1452 non-null object
25 MasVnrArea 1452 non-null float64
26 ExterQual 1460 non-null object
27 ExterCond 1460 non-null object
28 Foundation 1460 non-null object
29 BsmtQual 1423 non-null object
30 BsmtCond 1423 non-null object
31 BsmtExposure 1422 non-null object
32 BsmtFinType1 1423 non-null object
33 BsmtFinSF1 1460 non-null int64
34 BsmtFinType2 1422 non-null object
35 BsmtFinSF2 1460 non-null int64
36 BsmtUnfSF 1460 non-null int64
37 TotalBsmtSF 1460 non-null int64
38 Heating 1460 non-null object
39 HeatingQC 1460 non-null object
40 CentralAir 1460 non-null object
41 Electrical 1459 non-null object
42 1stFlrSF 1460 non-null int64
43 2ndFlrSF 1460 non-null int64
44 LowQualFinSF 1460 non-null int64
45 GrLivArea 1460 non-null int64
46 BsmtFullBath 1460 non-null int64
47 BsmtHalfBath 1460 non-null int64
48 FullBath 1460 non-null int64
49 HalfBath 1460 non-null int64
50 BedroomAbvGr 1460 non-null int64
51 KitchenAbvGr 1460 non-null int64
52 KitchenQual 1460 non-null object
53 TotRmsAbvGrd 1460 non-null int64
54 Functional 1460 non-null object
55 Fireplaces 1460 non-null int64
56 FireplaceQu 770 non-null object
57 GarageType 1379 non-null object
58 GarageYrBlt 1379 non-null float64
59 GarageFinish 1379 non-null object
60 GarageCars 1460 non-null int64
61 GarageArea 1460 non-null int64
62 GarageQual 1379 non-null object
63 GarageCond 1379 non-null object
64 PavedDrive 1460 non-null object
65 WoodDeckSF 1460 non-null int64
66 OpenPorchSF 1460 non-null int64
67 EnclosedPorch 1460 non-null int64
68 3SsnPorch 1460 non-null int64
69 ScreenPorch 1460 non-null int64
70 PoolArea 1460 non-null int64
71 PoolQC 7 non-null object
72 Fence 281 non-null object
73 MiscFeature 54 non-null object
74 MiscVal 1460 non-null int64
75 MoSold 1460 non-null int64
76 YrSold 1460 non-null int64
77 SaleType 1460 non-null object
78 SaleCondition 1460 non-null object
79 SalePrice 1460 non-null int64
dtypes: float64(3), int64(34), object(43)
memory usage: 923.9+ KB
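A minimal sketch of loading and summarising the data. It assumes a CSV with an `Id` column that becomes the row index, which is consistent with `Int64Index: 1460 entries, 1 to 1460`. The inline CSV and its values are hypothetical stand-ins for the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the training CSV (values are made up).
CSV = """Id,MSSubClass,LotFrontage,Alley,SalePrice
1,60,65,,200000
2,20,80,,180000
3,60,,Grvl,220000
4,70,60,,140000
"""

# index_col="Id" makes Id the row index instead of a feature column.
df = pd.read_csv(io.StringIO(CSV), index_col="Id")

# Non-null counts per column, like the "Non-Null Count" column of df.info().
non_null = df.notna().sum()

# Number of numeric vs. non-numeric columns.
n_numeric = df.select_dtypes(include="number").shape[1]
n_other = df.shape[1] - n_numeric

print(non_null)
print(f"numeric: {n_numeric}, other: {n_other}")
```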
Summary statistics (shown for the first two numeric columns):
Statistic   MSSubClass    LotFrontage
count       1460          1201
mean        56.89726027   70.04995837
std         42.30057099   24.28475177
min         20            21
25%         20            59
50%         50            69
75%         70            80
max         190           313
SalePrice
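A sketch of checking the target's skewness before and after `np.log1p`. Synthetic lognormal prices stand in for the real column (an assumption chosen only because they are right-skewed like house prices). The plotting line is left as a comment: seaborn's `distplot` is deprecated in favour of `histplot` (axes-level) or `displot` (figure-level), and the data should be passed as a keyword argument.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed "prices" (assumption: lognormal, median around 160k).
rng = np.random.default_rng(0)
price = pd.Series(rng.lognormal(mean=12.0, sigma=0.5, size=1000), name="SalePrice")

# log1p(x) = log(1 + x); it compresses the long right tail.
log_price = np.log1p(price)

print(f"skew before: {price.skew():.3f}")
print(f"skew after:  {log_price.skew():.3f}")

# Non-deprecated plotting call (not executed here):
# sns.histplot(x=log_price, kde=True)
```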
Numeric
Id  MSSubClass  LotFrontage
1   60          65
2   20          80
3   60          68
4   70          60
5   60          84
6   50          85
7   20          75
8   60          NaN
9   50          51
10  190         50
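The notebook's numeric/categorical split is not shown. A minimal sketch using pandas `select_dtypes`, on a hypothetical mini-frame. Note that MSSubClass lands in the numeric group because it is stored as int64, even though its values are dwelling-class codes.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame mixing numeric and text columns.
df = pd.DataFrame(
    {
        "MSSubClass": [60, 20, 60],
        "LotFrontage": [65.0, 80.0, np.nan],
        "MSZoning": ["RL", "RL", "RM"],
        "SalePrice": [200000, 180000, 220000],
    },
    index=pd.Index([1, 2, 3], name="Id"),
)

# Numeric columns (ints and floats) vs. everything else.
numeric = df.select_dtypes(include="number")
categorical = df.select_dtypes(exclude="number")

print(list(numeric.columns))      # ['MSSubClass', 'LotFrontage', 'SalePrice']
print(list(categorical.columns))  # ['MSZoning']
```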
Dropping uncorrelated features
Dropping redundant features
Two strongest correlations for each feature (self-correlation of 1.0 omitted):
Feature        Highest                  Second highest
SalePrice      OverallQual 0.817185     GrLivArea 0.700927
OverallQual    SalePrice 0.817185       GarageCars 0.600671
GrLivArea      TotRmsAbvGrd 0.825489    SalePrice 0.700927
GarageCars     GarageArea 0.882475      SalePrice 0.680625
GarageArea     GarageCars 0.882475      SalePrice 0.650888
TotalBsmtSF    1stFlrSF 0.819530        SalePrice 0.612134
1stFlrSF       TotalBsmtSF 0.819530     SalePrice 0.596981
FullBath       GrLivArea 0.630012       SalePrice 0.594771
YearBuilt      GarageYrBlt 0.825667     YearRemodAdd 0.592855
YearRemodAdd   GarageYrBlt 0.642277     YearBuilt 0.592855
GarageYrBlt    YearBuilt 0.825667       YearRemodAdd 0.642277
TotRmsAbvGrd   GrLivArea 0.825489       2ndFlrSF 0.616423
Fireplaces     SalePrice 0.489450       GrLivArea 0.461679
MasVnrArea     SalePrice 0.430809       OverallQual 0.411876
BsmtFinSF1     TotalBsmtSF 0.522396     1stFlrSF 0.445863
LotFrontage    1stFlrSF 0.457181        GrLivArea 0.402797
WoodDeckSF     SalePrice 0.334135       GrLivArea 0.247433
OpenPorchSF    GrLivArea 0.330224       SalePrice 0.321053
2ndFlrSF       GrLivArea 0.687501       TotRmsAbvGrd 0.616423
HalfBath       2ndFlrSF 0.609707        GrLivArea 0.415772
Correlation of the retained features with SalePrice:
OverallQual    0.8171846144867666
GrLivArea      0.7009269871427152
GarageArea     0.6508876811435947
TotalBsmtSF    0.6121342283262258
FullBath       0.5947706649972516
YearBuilt      0.5865701927897158
Fireplaces     0.4894495451574793
MasVnrArea     0.43080895642003225
BsmtFinSF1     0.37202325313636714
LotFrontage    0.3558786203664004
WoodDeckSF     0.3341351729561154
OpenPorchSF    0.3210532515909112
HalfBath       0.31398222425673417
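The two steps above (dropping uncorrelated, then redundant features) can be sketched as a small correlation filter. The helper name and both thresholds (0.3 for the target, 0.8 between features) are assumptions, not the notebook's code. This greedy rule keeps the feature more correlated with the target, so on the table above it would keep GarageCars (0.680625) rather than GarageArea (0.650888); the notebook kept GarageArea, so its rule evidently differs.

```python
import numpy as np
import pandas as pd


def select_by_correlation(df, target, min_target_corr=0.3, max_pair_corr=0.8):
    """Hypothetical helper: two-step correlation filter.

    1. Drop features whose |corr| with the target is below min_target_corr.
    2. From the strongest to the weakest remaining feature, skip any feature
       whose |corr| with an already selected one is >= max_pair_corr.
    """
    corr = df.corr()  # Pearson correlation by default
    target_corr = corr[target].drop(target).abs()
    candidates = target_corr[target_corr >= min_target_corr].sort_values(ascending=False)

    selected = []
    for feat in candidates.index:
        if all(abs(corr.loc[feat, kept]) < max_pair_corr for kept in selected):
            selected.append(feat)
    return selected


# Synthetic example: "b" is a near-copy of "a"; "noise" is unrelated to y.
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
data = pd.DataFrame({
    "a": a,
    "b": a + 0.1 * rng.normal(size=n),
    "c": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
data["y"] = 2 * data["a"] + data["c"] + 0.5 * rng.normal(size=n)

print(select_by_correlation(data, "y"))
```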
other skewed features
Numeric data is ready...
Id  SalePrice (transformed)  OverallQual (transformed)
1   2.583823886              2.079441542
2   2.573300271              1.945910149
3   2.589054268              2.079441542
4   2.553297495              2.079441542
5   2.597432945              2.197224577
6   2.554946177              1.791759469
7   2.61261114               2.197224577
8   2.580677151              2.079441542
9   2.54745318               2.079441542
10  2.539903574              1.791759469
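The SalePrice values in this preview look like the result of applying `np.log1p` twice: for Id 1, whose SalePrice in the Kaggle training file is 208500, `np.log1p(np.log1p(208500))` ≈ 2.5838 matches the table, while a single `log1p` would give about 12.25. OverallQual, by contrast, matches a single `log1p` (`np.log1p(7)` ≈ 2.0794). If the target is already log-transformed earlier, one way to avoid a second pass is to exclude it from the skewed-feature loop. A sketch under that assumption; the helper name and the 0.75 skewness threshold are hypothetical.

```python
import numpy as np
import pandas as pd


def log_transform_skewed(df, exclude=(), threshold=0.75):
    """Hypothetical helper: np.log1p every numeric column whose skewness
    exceeds `threshold`, except the columns listed in `exclude`."""
    out = df.copy()
    cols = out.select_dtypes(include="number").columns.drop(list(exclude), errors="ignore")
    skewness = out[cols].skew()
    skewed = list(skewness[skewness > threshold].index)
    out[skewed] = np.log1p(out[skewed])
    return out, skewed


rng = np.random.default_rng(1)
raw = pd.DataFrame({
    "LotArea": rng.lognormal(9.0, 0.8, size=500),   # strongly right-skewed
    "OverallQual": rng.integers(1, 11, size=500),  # roughly symmetric
})
# The target is transformed once, up front, then excluded from the loop.
raw["SalePrice"] = np.log1p(rng.lognormal(12.0, 0.5, size=500))

transformed, skewed_cols = log_transform_skewed(raw, exclude=["SalePrice"])
print(skewed_cols)  # ['LotArea']
```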
Categorical
Dropping features that are not useful
Encoding
[nan 'Ex' 'Fa' 'Gd']
[nan 'MnPrv' 'GdWo' 'GdPrv' 'MnWw']
[0. 1.]
[0. 1.]
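Preview of two encoded columns (column names are not shown in the output):
Id  col 1  col 2
1   0      3
2   3      3
3   3      3
4   4      3
5   3      3
6   0      3
7   4      3
8   3      3
9   3      2
10  3      4
Preview of two more encoded columns (names not shown):
Id  col 1  col 2
1   4      4
2   3      4
3   4      4
4   3      3
5   4      4
6   3      4
7   4      5
8   3      4
9   3      3
10  3      3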
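The printed PoolQC categories ([nan 'Ex' 'Fa' 'Gd']) and the 0 to 5 range of the encoded previews suggest quality labels mapped to integers. The mapping used in the notebook is not shown; a common scheme, assumed here, is Po < Fa < TA < Gd < Ex, with a missing value (for example "no pool") mapped to 0.

```python
import numpy as np
import pandas as pd

# Assumed ordinal scale for quality columns; missing/unknown -> 0.
QUALITY_MAP = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}


def encode_quality(series):
    """Map quality labels to integers; NaN and unknown labels become 0."""
    return series.map(QUALITY_MAP).fillna(0).astype(int)


pool_qc = pd.Series([np.nan, "Ex", "Fa", "Gd", np.nan], name="PoolQC")
print(encode_quality(pool_qc).tolist())  # [0, 5, 2, 4, 0]
```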
back to train...
(Transformed SalePrice and OverallQual for Ids 1 to 10: same values as the preview printed after "Numeric data is ready...".)
outliers
(Transformed SalePrice and OverallQual for Ids 1 to 10: same values as the preview printed after "Numeric data is ready...".)
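The outlier rule itself is not shown in the output. One common choice is Tukey's IQR fences; a sketch under that assumption, applied to a single column of a hypothetical mini-frame.

```python
import pandas as pd


def iqr_mask(s, k=1.5):
    """True for values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return s.between(q1 - k * iqr, q3 + k * iqr)


# Hypothetical data: one very large living area with a low price.
df = pd.DataFrame({
    "GrLivArea": [1500, 1600, 1700, 1800, 1650, 5600],
    "SalePrice": [180000, 190000, 200000, 210000, 195000, 160000],
})
clean = df[iqr_mask(df["GrLivArea"])]
print(len(df), "->", len(clean))  # 6 -> 5
```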
Dealing with missing values (NA)
Missing values per column (count and fraction of rows):
Column        Missing  Fraction
LotFrontage   241      0.184532925
MasVnrArea    8        0.006125574273
SalePrice     0        0
47            0        0
58            0        0
57            0        0
56            0        0
55            0        0
54            0        0
53            0        0
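A sketch of a missing-value report like the one above (count and fraction per column), followed by one possible fill strategy. Median imputation is an assumption here; the notebook's actual choice for LotFrontage and MasVnrArea is not shown. The data is hypothetical.

```python
import numpy as np
import pandas as pd


def missing_report(df):
    """Count and fraction of missing values per column, largest first."""
    report = pd.DataFrame({"missing": df.isna().sum(), "fraction": df.isna().mean()})
    return report.sort_values("missing", ascending=False)


df = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, np.nan],
    "MasVnrArea": [196.0, 0.0, np.nan, 0.0],
    "SalePrice": [12.2, 12.1, 12.3, 11.8],
})
print(missing_report(df))

# Assumption: fill the remaining numeric gaps with each column's median.
filled = df.fillna(df.median())
```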
It's done.
(Transformed SalePrice and OverallQual for Ids 1 to 10: same values as the preview printed after "Numeric data is ready...".)
PCA
Id  Component 1     Component 2
1   -29.42217173    -1.874271452
2   -2.305921752    -1.820943173
3   -27.44928591    -1.726157305
4   58.64175324     -1.057385912
5   -26.65210112    -2.753840295
6   -19.28511855    -0.5855753891
7   -30.63994246    -3.160409722
8   0.3135960404    -3.908854095
9   42.75127877     3.719236787
10  34.73817524     -1.859999
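scikit-learn emits a FutureWarning (an error from version 1.2) when a DataFrame's column names mix integers and strings, which can happen after encoding produces integer-named columns. Casting all names to `str` avoids it. The PCA itself is sketched here with a plain NumPy SVD on centred data; `sklearn.decomposition.PCA` also centres the data and uses an SVD, so it yields the same components up to sign. The data and the helper name are hypothetical.

```python
import numpy as np
import pandas as pd


def pca_scores(X, n_components=2):
    """Project the rows of X onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)                    # centre each column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T            # scores = Xc · V_k


# Hypothetical frame with mixed int/str column names.
df = pd.DataFrame(
    np.random.default_rng(0).normal(size=(50, 4)),
    columns=[0, 1, "LotArea", "GrLivArea"],
)
df.columns = df.columns.astype(str)  # all-string names avoid the sklearn warning

scores = pca_scores(df.to_numpy(), n_components=2)
print(scores.shape)  # (50, 2)
```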
(Transformed SalePrice and OverallQual for Ids 1 to 10: same values as the preview printed after "Numeric data is ready...".)
Id  SalePrice (transformed)  Component 1
1   2.583823886              -29.42217173
2   2.573300271              -2.305921752
3   2.589054268              -27.44928591
4   2.553297495              58.64175324
5   2.597432945              -26.65210112
6   2.554946177              -19.28511855
7   2.61261114               -30.63994246
8   2.580677151              0.3135960404
9   2.54745318               42.75127877
10  2.539903574              34.73817524
model
MSE train: 0.000, test: 0.000
R^2 train: 0.960, test: 0.714
MSE train: 0.000, test: 0.000
R^2 train: 0.806, test: 0.801
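Both MSE lines print 0.000. With `:.3f` formatting any error below 0.0005 rounds to zero, and the transformed SalePrice values in the previews only range from about 2.54 to 2.61, so squared errors are tiny. A sketch that reports MSE in scientific notation instead; `LinearRegression` and the synthetic data are stand-ins, since the notebook's models are not named in the output.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic target on a small scale (assumption: mimics the transformed SalePrice).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 2.57 + 0.01 * X[:, 0] + 0.005 * X[:, 1] + 0.002 * rng.normal(size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)

mse_tr = mean_squared_error(y_tr, model.predict(X_tr))
mse_te = mean_squared_error(y_te, model.predict(X_te))
r2_tr = r2_score(y_tr, model.predict(X_tr))
r2_te = r2_score(y_te, model.predict(X_te))

# {:.3f} would show 0.000 for both; {:.3e} keeps the magnitude visible.
print(f"MSE train: {mse_tr:.3e}, test: {mse_te:.3e}")
print(f"R^2 train: {r2_tr:.3f}, test: {r2_te:.3f}")
```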