Car Sales Analysis
       Make    Model
0    Suzuki   Vitara
1     Honda    S2000
2       BMW       Z4
3    Toyota   Tacoma
4      Ford  Festiva
5     Buick  Skylark
6  Infiniti       QX
7       Ram      C/V
8       GMC   Safari
9    Nissan   Altima
            Make           Model
0       Infiniti               G
1      Chevrolet   Suburban 2500
2     Mitsubishi          Precis
3  Mercedes-Benz         E-Class
4       Plymouth          Breeze
5      Chevrolet       Avalanche
6          Dodge   Grand Caravan
7         Toyota          Celica
8         Spyker  C8 Double 12 S
9            GMC   Yukon XL 1500
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4998 entries, 0 to 4997
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Make 4998 non-null object
1 Model 4998 non-null object
2 Nickname 4998 non-null object
3 Car Gender 4998 non-null object
4 Buyer Gender 4998 non-null object
5 Buyer Age 4998 non-null int64
6 Buzzword 4998 non-null object
7 Country 4998 non-null object
8 City 4998 non-null object
9 Dealer Latitude 4998 non-null float64
10 Dealer Longitude 4998 non-null float64
11 Color 4998 non-null object
12 New Car 4998 non-null bool
13 Purchase Date 4998 non-null object
14 Sale Price 4998 non-null float64
15 Discount 4998 non-null float64
16 Resell Price 4998 non-null float64
17 5-yr Depreciation 4998 non-null float64
18 Top Speed 4998 non-null float64
19 0-60 Time 4998 non-null float64
dtypes: bool(1), float64(8), int64(1), object(10)
memory usage: 746.9+ KB
         Buyer Age  Dealer Latitude
count  4998         4998
mean     47.83313325  24.80011835
std      16.03538733  24.61560776
min      20          -46.5996116
25%      34            7.3938026
50%      48           30.9104435
75%      62           44.2729581
max      75           69.63186
        Make     Model
count   4998      4998
unique    69       876
top     Ford  Corvette
freq     428        31
Buyer Gender  New Car
Female        False      1252
              True       1266
Male          False      1267
              True       1213
Name: Model, dtype: int64
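The `Name: Model` footer suggests this table came from counting the `Model` column within each (`Buyer Gender`, `New Car`) group. A minimal sketch of that pattern follows; the five-row `sales` frame and its gender and new-car values are hypothetical stand-ins for the 4998-row table, not data from the notebook.

```python
import pandas as pd

# Hypothetical five-row sample; the gender and New Car values are made up.
sales = pd.DataFrame({
    "Model": ["Vitara", "S2000", "Z4", "Tacoma", "Festiva"],
    "Buyer Gender": ["Female", "Male", "Female", "Male", "Male"],
    "New Car": [False, False, True, False, True],
})

# Count rows per (gender, new/used) pair by counting a non-null column.
counts = sales.groupby(["Buyer Gender", "New Car"])["Model"].count()
print(counts)
```

Counting a column that is never null, such as `Model`, gives the same numbers as `.size()` on the groups.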
[Figure: plot by Buyer Gender (categories: Female, Male)]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4998 entries, 0 to 4997
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Make 4998 non-null object
1 Model 4998 non-null object
2 Buyer Gender 4998 non-null object
3 Buyer Age 4998 non-null int64
4 New Car 4998 non-null bool
5 Sale Price 4998 non-null float64
6 5-yr Depreciation 4998 non-null float64
7 Top Speed 4998 non-null float64
8 0-60 Time 4998 non-null float64
dtypes: bool(1), float64(4), int64(1), object(3)
memory usage: 317.4+ KB
     Model  Buyer Gender
0   Vitara             1
1    S2000             0
2       Z4             1
3   Tacoma             0
4  Festiva             0
0       1
1       0
2       1
3       0
4       0
       ..
4993    0
4994    1
4995    1
4996    0
4997    0
Name: Buyer Gender, Length: 4998, dtype: int64
      Buyer Age  Top Speed  5-yr Depreciation  Sale Price
0            51      200.9               0.13    54806.14
1            30      158.5               0.02    51826.30
2            54      149.5               0.24    82929.14
3            68      153.3               0.20    56928.66
4            70      122.0               0.18    77201.26
...         ...        ...                ...         ...
4993         51      191.7               0.11    67254.72
4994         38      243.1               0.24    68142.45
4995         23      224.8               0.12    57902.43
4996         62      201.6               0.24    57009.68
4997         49      129.9               0.07    71653.19
[4998 rows x 4 columns]
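The two printouts above are an integer-encoded `Buyer Gender` target and a four-column feature table. The sketch below shows one way such a pair could be built. The `gender_codes` mapping (Female as 1, Male as 0) is an assumption, since the output does not show which label received which code, and the gender labels in the three sample rows are hypothetical.

```python
import pandas as pd

# Three sample rows; numeric values copied from the printed table,
# gender labels are hypothetical.
sales = pd.DataFrame({
    "Buyer Gender": ["Female", "Male", "Female"],
    "Buyer Age": [51, 30, 54],
    "Top Speed": [200.9, 158.5, 149.5],
    "5-yr Depreciation": [0.13, 0.02, 0.24],
    "Sale Price": [54806.14, 51826.30, 82929.14],
})

# Assumed encoding; the notebook output does not reveal the real mapping.
gender_codes = {"Female": 1, "Male": 0}
y = sales["Buyer Gender"].map(gender_codes)

# Feature columns, in the order shown in the printed table.
X = sales[["Buyer Age", "Top Speed", "5-yr Depreciation", "Sale Price"]]
print(y.tolist(), X.shape)
```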
/shared-libs/python3.7/py/lib/python3.7/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function plot_confusion_matrix is deprecated; Function `plot_confusion_matrix` is deprecated in 1.0 and will be removed in 1.2. Use one of the class methods: ConfusionMatrixDisplay.from_predictions or ConfusionMatrixDisplay.from_estimator.
warnings.warn(msg, category=FutureWarning)
accuracy: 1.0
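The following sketch uses synthetic data, not the notebook's, to show how a decision tree's accuracy can be measured on a held-out split. An unpruned tree usually scores 1.0 on the rows it was fitted on, so the held-out score is the more informative number; the output above does not show which rows the notebook scored. The feature ranges and random labels are assumptions for illustration. The comment at the end names the plotting call recommended by the deprecation warning.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the four feature columns (assumed ranges).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.integers(20, 76, n),          # Buyer Age
    rng.uniform(120, 250, n),         # Top Speed
    rng.uniform(0.0, 0.3, n),         # 5-yr Depreciation
    rng.uniform(50_000, 100_000, n),  # Sale Price
])
y = rng.integers(0, 2, n)             # random 0/1 labels, unrelated to X

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Score on the training rows and on the held-out rows separately.
train_acc = accuracy_score(y_train, tree.predict(X_train))
y_pred = tree.predict(X_test)
test_acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print("train accuracy:", train_acc)
print("test accuracy:", test_acc)

# To plot instead of printing, the non-deprecated call is
# sklearn.metrics.ConfusionMatrixDisplay.from_predictions(y_test, y_pred).
```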
Make Model Nickname Car Gender Buyer Age \
0 Infiniti G Eliot Male 51
1 Chevrolet Suburban 2500 Bryna Male 59
2 Mitsubishi Precis Mack Female 32
3 Mercedes-Benz E-Class Elora Female 43
4 Plymouth Breeze Sigvard Male 34
Buzzword Country City Dealer Latitude \
0 multi-tasking Hungary Budapest 47.387906
1 framework Indonesia Liliba -10.170262
2 homogeneous China Shuangxing 41.773024
3 attitude-oriented France Grenoble 45.193486
4 Inverse Ukraine Khust 48.173463
Dealer Longitude Color New Car Purchase Date Sale Price Discount \
0 19.115039 Mauv VERDADERO 5/11/2015 70924.82 0.6790
1 123.642753 Crimson FALSO 2/04/2015 90407.26 0.1462
2 123.356112 Fuscia FALSO 5/06/2016 51744.01 0.1785
3 5.721898 Violet FALSO 29/05/2016 60654.60 0.2899
4 23.297248 Indigo FALSO 10/03/2009 68702.62 0.0721
Resell Price 5-yr Depreciation Top Speed 0-60 Time
0 13065.96 0.12 187.7 9.8
1 40998.62 0.16 197.9 2.5
2 17450.08 0.22 152.3 10.9
3 43694.28 0.06 179.0 5.6
4 19594.64 0.14 225.0 10.3
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5002 entries, 0 to 5001
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Make 5002 non-null object
1 Model 5002 non-null object
2 Nickname 5002 non-null object
3 Car Gender 5002 non-null object
4 Buyer Age 5002 non-null int64
5 Buzzword 5002 non-null object
6 Country 5002 non-null object
7 City 5002 non-null object
8 Dealer Latitude 5002 non-null float64
9 Dealer Longitude 5002 non-null float64
10 Color 5002 non-null object
11 New Car 5002 non-null object
12 Purchase Date 5002 non-null object
13 Sale Price 5002 non-null float64
14 Discount 5002 non-null float64
15 Resell Price 5002 non-null float64
16 5-yr Depreciation 5002 non-null float64
17 Top Speed 5002 non-null float64
18 0-60 Time 5002 non-null float64
dtypes: float64(8), int64(1), object(10)
memory usage: 742.6+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5002 entries, 0 to 5001
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Make 5002 non-null object
1 Model 5002 non-null object
2 Nickname 5002 non-null object
3 Car Gender 5002 non-null object
4 Buyer Age 5002 non-null int64
5 Buzzword 5002 non-null object
6 Country 5002 non-null object
7 City 5002 non-null object
8 Dealer Latitude 5002 non-null float64
9 Dealer Longitude 5002 non-null float64
10 Color 5002 non-null object
11 New Car 5002 non-null int64
12 Purchase Date 5002 non-null object
13 Sale Price 5002 non-null float64
14 Discount 5002 non-null float64
15 Resell Price 5002 non-null float64
16 5-yr Depreciation 5002 non-null float64
17 Top Speed 5002 non-null float64
18 0-60 Time 5002 non-null float64
dtypes: float64(8), int64(2), object(9)
memory usage: 742.6+ KB
None
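Between the two `info()` listings, `New Car` changes from `object` to `int64`. In this file the column holds the Spanish strings `VERDADERO` and `FALSO`. A minimal sketch of one way to do that conversion follows; the frame name `test_df` is hypothetical, and the notebook may have used a different method.

```python
import pandas as pd

# Hypothetical three-row sample of the string-valued column.
test_df = pd.DataFrame({"New Car": ["VERDADERO", "FALSO", "FALSO"]})

# Map the Spanish boolean strings to 1/0 and store them as integers.
test_df["New Car"] = (
    test_df["New Car"].map({"VERDADERO": 1, "FALSO": 0}).astype("int64")
)
print(test_df["New Car"].tolist())
```

Any value outside the two mapped strings becomes `NaN` under `map`, which makes the `astype("int64")` call fail loudly rather than pass silently.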
   Buyer Age  Top Speed
0         51      187.7
1         59      197.9
2         32      152.3
3         43      179.0
4         34      225.0
(5002, 4)
Decision tree classifier: [1 0 1 0 1 1 1 1 1 0]
Conclusion
The analysis of the business data found that female buyers contribute the most revenue to the dealership. Revenue by gender was: Women $210,040,800,000,000; Men $203,521,700,000,000.
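A revenue-by-gender comparison like this one can be computed by summing `Sale Price` within each `Buyer Gender` group. The sketch below uses three hypothetical rows with round prices, so its totals are illustrative only and unrelated to the figures quoted above.

```python
import pandas as pd

# Hypothetical rows; prices are round numbers chosen for readability.
sales = pd.DataFrame({
    "Buyer Gender": ["Female", "Male", "Female"],
    "Sale Price": [100.0, 150.0, 200.0],
})

# Total revenue per gender, largest first.
revenue = (
    sales.groupby("Buyer Gender")["Sale Price"].sum().sort_values(ascending=False)
)
print(revenue)
```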
A decision tree algorithm was then used to build and test a model, which reported an accuracy of 100%. Finally, predictions were obtained for a test dataset, reported as 100% correct, of which gender would buy a car given its price. The model predicted the following results:
Decision tree classifier: [1 0 1 0 1 1 1 1 1 0 1 1 0 0 1 0 1 1 0 1]