Exploring Dataset
import pandas as pd
Loading data from file and setting column names
df = pd.read_csv('imports-85.data')
columns = ['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration', 'num-of-doors', 'body-style', 'drive-wheels', 'engine-location', 'wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-type', 'num-of-cylinders', 'engine-size', 'fuel-system', 'bore', 'stroke', 'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg', 'highway-mpg', 'price']
df.columns = columns
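A note on the load step: the raw `imports-85.data` file appears to have no header row, and the `df.info()` output later in this notebook reports 204 entries. If the file does hold one more record than that, the default `read_csv` behaviour has used the first record as column labels before they were overwritten. The sketch below shows the effect on a small inline sample. The three-column sample and the names `raw`, `df_default` and `df_full` are assumptions made for illustration, not part of the original notebook.

```python
import io
import pandas as pd

# Hypothetical three-record sample in the same headerless, comma-separated layout
# (only the first three columns are kept for brevity).
raw = "3,?,alfa-romero\n1,?,alfa-romero\n2,164,audi\n"
columns = ['symboling', 'normalized-losses', 'make']

# Default behaviour: the first record is consumed as the header row.
df_default = pd.read_csv(io.StringIO(raw))
df_default.columns = columns

# header=None keeps every record as data; names= assigns the labels in the same call.
df_full = pd.read_csv(io.StringIO(raw), header=None, names=columns)

print(len(df_default), len(df_full))
```

With `header=None` and `names=columns`, the separate `df.columns = columns` assignment is no longer needed.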
Checking data types
df.dtypes
Statistical summary
df.describe()
       symboling (float64)  wheel-base (float64)
count  204                  204
mean   0.8235294118         98.80637255
std    1.23903478           5.994143988
min    -2                   86.6
25%    0                    94.5
50%    1                    97
75%    2                    102.4
max    3                    120.9
Full summary statistics
df.describe(include='all')
        symboling (float64)  normalized-losses (object)
count   204                  204
unique  NaN                  52
top     NaN                  ?
freq    NaN                  40
mean    0.8235294118         NaN
std     1.23903478           NaN
min     -2                   NaN
25%     0                    NaN
50%     1                    NaN
75%     2                    NaN
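The NaN pattern in the table above follows from how `describe` treats column types: `unique`, `top` and `freq` are only computed for non-numeric columns, while `mean`, `std` and the quantiles are only computed for numeric ones. A minimal sketch on a made-up four-row frame (the values and the names `sample`, `numeric_only` and `everything` are illustrative assumptions):

```python
import pandas as pd

# Small stand-in frame: one numeric column, one object column with '?' placeholders.
sample = pd.DataFrame({
    'symboling': [3, 1, 2, 2],
    'normalized-losses': ['?', '?', '164', '158'],
})

# Default describe() keeps only the numeric columns.
numeric_only = sample.describe()

# include='all' adds the object column and fills inapplicable statistics with NaN.
everything = sample.describe(include='all')

print(list(numeric_only.columns))
print(everything.loc[['unique', 'top', 'freq', 'mean']])
```

Here `top` for `normalized-losses` is the `?` placeholder, which mirrors what the full dataset shows.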
Concise summary
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 204 entries, 0 to 203
Data columns (total 26 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   symboling          204 non-null    int64
 1   normalized-losses  204 non-null    object
 2   make               204 non-null    object
 3   fuel-type          204 non-null    object
 4   aspiration         204 non-null    object
 5   num-of-doors       204 non-null    object
 6   body-style         204 non-null    object
 7   drive-wheels       204 non-null    object
 8   engine-location    204 non-null    object
 9   wheel-base         204 non-null    float64
 10  length             204 non-null    float64
 11  width              204 non-null    float64
 12  height             204 non-null    float64
 13  curb-weight        204 non-null    int64
 14  engine-type        204 non-null    object
 15  num-of-cylinders   204 non-null    object
 16  engine-size        204 non-null    int64
 17  fuel-system        204 non-null    object
 18  bore               204 non-null    object
 19  stroke             204 non-null    object
 20  compression-ratio  204 non-null    float64
 21  horsepower         204 non-null    object
 22  peak-rpm           204 non-null    object
 23  city-mpg           204 non-null    int64
 24  highway-mpg        204 non-null    int64
 25  price              204 non-null    object
dtypes: float64(5), int64(5), object(16)
memory usage: 41.6+ KB
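Several columns that look numeric (`normalized-losses`, `bore`, `stroke`, `horsepower`, `peak-rpm`, `price`) are reported as `object` with 204 non-null values. That is consistent with missing values being stored as the string `?`, which pandas does not treat as null. One possible way to handle this, sketched on a made-up two-column sample, is to coerce those columns with `pd.to_numeric`. The sample values and the names `sample`, `numeric_cols` and `cleaned` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample: numeric values stored as strings, with '?' marking missing entries.
sample = pd.DataFrame({
    'normalized-losses': ['?', '164', '158'],
    'price': ['13495', '?', '17710'],
})

numeric_cols = ['normalized-losses', 'price']

cleaned = sample.copy()
for col in numeric_cols:
    # errors='coerce' turns anything that cannot be parsed (here '?') into NaN.
    cleaned[col] = pd.to_numeric(cleaned[col], errors='coerce')

print(cleaned.dtypes)
print(cleaned.isna().sum())
```

Note that `errors='coerce'` silently converts every unparseable value to NaN, not only `?`, so it is worth checking the distinct raw values first.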
Viewing top 5 rows
df.head()
   symboling (int64)  normalized-losses (object)
0  3                  ?
1  1                  ?
2  2                  164
3  2                  164
4  2                  ?
Viewing bottom 5 rows
df.tail()
     symboling (int64)  normalized-losses (object)
199  -1                 95
200  -1                 95
201  -1                 95
202  -1                 95
203  -1                 95
View dataset
df
   symboling (int64)  normalized-losses (object)
0  3                  ?
1  1                  ?
2  2                  164
3  2                  164
4  2                  ?
5  1                  158
6  1                  ?
7  1                  158
8  0                  ?
9  2                  192
Exporting the dataset to a CSV file
df.to_csv('autos.csv')
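By default `to_csv` also writes the row index as an unnamed first column, which then reappears as `Unnamed: 0` when the file is read back. If that extra column is not wanted, passing `index=False` is one option. A small round-trip sketch, using a temporary directory and a hypothetical file name `autos_sample.csv` rather than the notebook's real output file:

```python
import os
import tempfile
import pandas as pd

# Made-up two-row sample standing in for the full dataset.
sample = pd.DataFrame({'symboling': [3, 1], 'make': ['alfa-romero', 'audi']})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'autos_sample.csv')  # hypothetical file name

    # Default export: the index is written as an unnamed leading column.
    sample.to_csv(path)
    with_index = pd.read_csv(path)

    # index=False writes only the data columns.
    sample.to_csv(path, index=False)
    without_index = pd.read_csv(path)

print(list(with_index.columns))
print(list(without_index.columns))
```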