Classification with Python
We load a dataset using the pandas library, apply several classification algorithms to it, and use accuracy evaluation metrics to find the best one for this specific dataset.
Let's first load the required libraries:
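The original import cell is not preserved in this text, so the list below is only an assumption about what a workflow like this typically needs: pandas and NumPy for data handling, and a few scikit-learn helpers for preprocessing, splitting and scoring.

```python
# Assumed imports; the original import cell is not part of this text.
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

print("pandas", pd.__version__, "| numpy", np.__version__)
```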
Downloading the data set
Loading the data from CSV file
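As a minimal sketch of this step, the snippet below reads CSV content into a DataFrame with `pd.read_csv`. The file itself is not available here, so an in-memory string stands in for it; the column names and the three rows are made up for illustration and are not the real data.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded CSV file; the column names
# and rows are invented for illustration only.
csv_text = """loan_status,Principal,effective_date,Gender
PAIDOFF,1000,9/8/2016,male
COLLECTION,800,9/10/2016,female
PAIDOFF,1000,9/11/2016,male
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
print(df.shape)
```

With a real file on disk, the `io.StringIO(...)` argument would simply be replaced by the path to the CSV.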
Convert to date-time object
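A small sketch of the conversion, assuming the dates arrive as month/day/year strings in a column called `effective_date` (both the column name and the date format are assumptions, not taken from the data):

```python
import pandas as pd

# Toy frame with date strings; column name and format are assumed.
df = pd.DataFrame({"effective_date": ["9/8/2016", "9/10/2016", "9/11/2016"]})

# Parse the strings into datetime64 values so date accessors become available.
df["effective_date"] = pd.to_datetime(df["effective_date"], format="%m/%d/%Y")

print(df.dtypes)
```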
Data visualization and pre-processing
260 people have paid off the loan on time, while 86 have gone into collection.
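Counts like these usually come from `value_counts` on the label column. The sketch below shows the call on a tiny made-up column (the name `loan_status` and its values are assumptions), and then works out the class balance from the two counts reported above.

```python
import pandas as pd

# Toy label column; the name and values are assumed for illustration.
df = pd.DataFrame({"loan_status": ["PAIDOFF", "PAIDOFF", "COLLECTION", "PAIDOFF"]})

# Number of rows per class, most frequent first.
counts = df["loan_status"].value_counts()
print(counts)

# Class balance implied by the counts reported in the text (260 vs 86).
paid_off, collection = 260, 86
share_paid_off = paid_off / (paid_off + collection)
print(f"Share paid off: {share_paid_off:.3f}")
```

Since about three quarters of the loans are paid off, a model that always predicts "paid off" would already be right about 75% of the time, which is a useful baseline when reading accuracy numbers later.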
Pre-processing: Feature selection/extraction
Let's check what day of the week people get the loan
We see that people who get the loan at the end of the week don't pay it off, so let's use feature binarization with a threshold at day 4, separating days less than 4 from the rest
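Here is a hedged sketch of the day-of-week extraction and the binarization. The column names `effective_date`, `dayofweek` and `weekend` are assumptions, and so is the choice to encode days below 4 as 0 and the rest as 1; the text only fixes the threshold. In pandas, `dt.dayofweek` numbers Monday as 0 and Sunday as 6, so day 4 is Friday.

```python
import pandas as pd

# Toy dates: Thursday, Friday, Sunday, Monday (made up for illustration).
df = pd.DataFrame(
    {"effective_date": pd.to_datetime(
        ["2016-09-08", "2016-09-09", "2016-09-11", "2016-09-12"]
    )}
)

# Monday=0 ... Sunday=6
df["dayofweek"] = df["effective_date"].dt.dayofweek

# Binarize: days below the threshold become 0, the rest become 1.
# The 0/1 direction is an assumption; only the threshold comes from the text.
threshold = 4
df["weekend"] = (df["dayofweek"] >= threshold).astype(int)

print(df)
```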
Convert Categorical features to numerical values
Let's look at gender
Let's convert male to 0 and female to 1:
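One way to do this mapping is `Series.map` with a dictionary. The column name `Gender` and the lowercase spellings `male` and `female` are assumptions about how the values are stored.

```python
import pandas as pd

# Toy column; name and spellings are assumed.
df = pd.DataFrame({"Gender": ["male", "female", "male"]})

# Map each category to its numeric code: male -> 0, female -> 1.
df["Gender"] = df["Gender"].map({"male": 0, "female": 1})

print(df)
```

Any value missing from the dictionary would become `NaN`, so it is worth checking for unexpected spellings before mapping.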
One Hot Encoding
How about education?
Feature before One Hot Encoding
Using the one hot encoding technique to convert categorical variables to binary variables and append them to the feature DataFrame
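A minimal sketch of that step with `pd.get_dummies` and `pd.concat`. The column names (`Principal`, `education`) and the education levels are made up for illustration; the real categories may differ.

```python
import pandas as pd

# Toy data; column names and category values are assumed.
df = pd.DataFrame(
    {
        "Principal": [1000, 800, 1000],
        "education": ["college", "High School or Below", "Bachelor"],
    }
)

# Start the feature frame with the columns that are already numeric.
features = df[["Principal"]]

# One binary column per education level (dtype=int gives 0/1 instead of booleans).
dummies = pd.get_dummies(df["education"], dtype=int)

# Append the binary columns to the feature frame.
features = pd.concat([features, dummies], axis=1)

print(features)
```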
Feature selection
Defining the feature set X
What are our labels?
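To make the split between features and labels concrete, here is a small sketch. It assumes the label lives in a column named `loan_status` and that every other column is a feature; the names and rows are hypothetical.

```python
import pandas as pd

# Toy frame; column names and values are assumed for illustration.
df = pd.DataFrame(
    {
        "Principal": [1000, 800, 1000],
        "age": [45, 33, 27],
        "Gender": [0, 1, 0],
        "loan_status": ["PAIDOFF", "COLLECTION", "PAIDOFF"],
    }
)

# X: the feature matrix (everything except the label column).
X = df.drop(columns=["loan_status"]).values

# y: the label vector we want to predict.
y = df["loan_status"].values

print(X.shape, y.shape)
```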
Normalize Data
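The text does not say which normalization is used. A common choice, assumed here, is standardization with scikit-learn's `StandardScaler`, which rescales each feature to zero mean and unit variance. The numbers below are made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix: two features on very different scales (made-up values).
X = np.array(
    [
        [1000.0, 45.0],
        [800.0, 33.0],
        [1000.0, 27.0],
        [300.0, 50.0],
    ]
)

# Fit the scaler and transform the data in one call.
X_scaled = StandardScaler().fit_transform(X)

print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```

When a separate test set is used, fitting the scaler on the training rows only and then applying it to the test rows avoids leaking test information into the model.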
Classification
Now I will use the training set to build an accurate model, then use the test set to report the accuracy of the model. The following algorithms will be used:
K Nearest Neighbor (KNN)
Decision Tree
Support Vector Machine
Logistic Regression
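The sketch below shows one way to compare these four algorithms by test-set accuracy with scikit-learn. It runs on synthetic data from `make_classification`, not on the loan data, and every hyperparameter (`n_neighbors=5`, `max_depth=4`, the RBF kernel, the 80/20 split, the random seeds) is an illustrative assumption rather than a tuned or original setting.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; NOT the loan dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=4)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# Hyperparameters are illustrative assumptions, not tuned values.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=4),
    "SVM": SVC(kernel="rbf"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each model on the training set and score it on the test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.3f}")

best = max(scores, key=scores.get)
print("Best by test accuracy:", best)
```

On a dataset this small, accuracy differences of a few points between models can come down to the particular split, so cross-validation would give a steadier comparison.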