Introduction
NASA Asteroids Classification
import pandas as pd
# Load the data set
df = pd.read_csv('nasa.csv')
df
Methodology
Pre-Processing
df.info()  # every column has 4687 non-null entries, so there are no missing values
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4687 entries, 0 to 4686
Data columns (total 40 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Neo Reference ID 4687 non-null int64
1 Name 4687 non-null int64
2 Absolute Magnitude 4687 non-null float64
3 Est Dia in KM(min) 4687 non-null float64
4 Est Dia in KM(max) 4687 non-null float64
5 Est Dia in M(min) 4687 non-null float64
6 Est Dia in M(max) 4687 non-null float64
7 Est Dia in Miles(min) 4687 non-null float64
8 Est Dia in Miles(max) 4687 non-null float64
9 Est Dia in Feet(min) 4687 non-null float64
10 Est Dia in Feet(max) 4687 non-null float64
11 Close Approach Date 4687 non-null object
12 Epoch Date Close Approach 4687 non-null int64
13 Relative Velocity km per sec 4687 non-null float64
14 Relative Velocity km per hr 4687 non-null float64
15 Miles per hour 4687 non-null float64
16 Miss Dist.(Astronomical) 4687 non-null float64
17 Miss Dist.(lunar) 4687 non-null float64
18 Miss Dist.(kilometers) 4687 non-null float64
19 Miss Dist.(miles) 4687 non-null float64
20 Orbiting Body 4687 non-null object
21 Orbit ID 4687 non-null int64
22 Orbit Determination Date 4687 non-null object
23 Orbit Uncertainity 4687 non-null int64
24 Minimum Orbit Intersection 4687 non-null float64
25 Jupiter Tisserand Invariant 4687 non-null float64
26 Epoch Osculation 4687 non-null float64
27 Eccentricity 4687 non-null float64
28 Semi Major Axis 4687 non-null float64
29 Inclination 4687 non-null float64
30 Asc Node Longitude 4687 non-null float64
31 Orbital Period 4687 non-null float64
32 Perihelion Distance 4687 non-null float64
33 Perihelion Arg 4687 non-null float64
34 Aphelion Dist 4687 non-null float64
35 Perihelion Time 4687 non-null float64
36 Mean Anomaly 4687 non-null float64
37 Mean Motion 4687 non-null float64
38 Equinox 4687 non-null object
39 Hazardous 4687 non-null bool
dtypes: bool(1), float64(30), int64(5), object(4)
memory usage: 1.4+ MB
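The `df.info()` output above shows 4687 non-null entries in every column. The same check can be done in code, which is easier to rerun than reading the table. This is a minimal sketch on a small made-up frame rather than `nasa.csv`; the `sample` frame and its values are hypothetical, and one missing value is included on purpose to show what a failing check looks like.

```python
import pandas as pd

# Hypothetical stand-in for the asteroid data; the real frame comes from nasa.csv.
sample = pd.DataFrame({
    "Absolute Magnitude": [1.0, 2.0, None],  # one value deliberately missing
    "Orbiting Body": ["Earth", "Earth", "Earth"],
    "Hazardous": [True, False, True],
})

# Count missing values per column; a column without nulls reports 0.
null_counts = sample.isna().sum()
print(null_counts)

# True only when no column contains a missing value.
has_no_nulls = not sample.isna().any().any()
print(has_no_nulls)
```

On the real `df`, the same two expressions should report zero nulls everywhere if the `df.info()` summary is accurate.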
df['Orbiting Body'].unique()
# can be dropped because every row has the same value
df['Orbit ID'].unique().shape
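The two checks above inspect one column at a time. A possible way to find every constant column in one pass is `DataFrame.nunique()`. This is a sketch on a hypothetical `sample` frame with made-up values, not the notebook's actual data.

```python
import pandas as pd

# Hypothetical stand-in for the asteroid data; the real frame comes from nasa.csv.
sample = pd.DataFrame({
    "Orbiting Body": ["Earth", "Earth", "Earth"],
    "Orbit ID": [17, 21, 17],
    "Hazardous": [True, False, True],
})

# nunique() counts distinct values per column; a count of 1 means the column is constant.
distinct_counts = sample.nunique()
constant_columns = distinct_counts[distinct_counts == 1].index.tolist()
print(constant_columns)
```

A constant column carries no information for classification, which is why 'Orbiting Body' is a candidate for removal.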
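# Columns that repeat a measurement in another unit (km, feet, kilometers)
remove_columns = [x for x in df if any(k in x.lower() for k in ('km', 'feet', 'kilometers'))]
# Also drop the constant 'Orbiting Body' column and the diameter estimates in meters
remove_columns = remove_columns + ['Orbiting Body', 'Est Dia in M(min)', 'Est Dia in M(max)']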
remove_columns
df.drop(columns=remove_columns).iloc[0]
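The keyword filter above matches on lowercased column names, so it helps to confirm which columns it catches and which it leaves. This is a minimal sketch on a one-row frame that reuses a few column names from the `df.info()` listing; the `sample` frame, its placeholder values, and the `unit_keywords` and `to_drop` names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical one-row frame; column names come from the df.info() listing,
# values are placeholders.
sample = pd.DataFrame({
    "Est Dia in KM(min)": [1.0],
    "Est Dia in Miles(min)": [1.0],
    "Est Dia in Feet(min)": [1.0],
    "Miss Dist.(kilometers)": [1.0],
    "Miss Dist.(miles)": [1.0],
    "Orbiting Body": ["Earth"],
})

# Same substring test as the notebook: match on the lowercased column name.
unit_keywords = ("km", "feet", "kilometers")
to_drop = [c for c in sample.columns if any(k in c.lower() for k in unit_keywords)]
to_drop.append("Orbiting Body")

# drop(columns=...) raises a KeyError by default if a listed column is missing.
reduced = sample.drop(columns=to_drop)
print(list(reduced.columns))
```

In this sketch only the mile-based columns survive, because 'km' does not occur in 'miles' and the other two keywords do not match either.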