Step 1
Importing the spreadsheet into a pandas data frame
import pandas as pd
df = pd.read_csv('mortgage-testing-data.csv')
df.head()
Step 2
Replacing the categorical columns with Boolean values
df['conforming'] = df['conforming_loan_limit'].replace(['NC', 'C'], [0, 1])
df['not_conforming'] = df['conforming_loan_limit'].replace(['C', 'NC'], [0, 1])
df = df[df['derived_sex'].isin(['Male', 'Female'])]
df['derived_sex_is_female'] = df['derived_sex'].replace(['Male', 'Female'], [0, 1])
df['derived_sex_is_male'] = df['derived_sex'].replace(['Female', 'Male'], [0, 1])
df['application_accepted'] = df['action_taken'].replace([3, 1], [0, 1])
df['debt_to_income_ratio'] = df['debt_to_income_ratio'].replace(['<20%', '20%-<30%', '30%-<36%', '50%-60%', '>60%'], [10, 25, 33, 55, 70])
df['debt_to_income_ratio'] = pd.to_numeric(df['debt_to_income_ratio'])
df = df.drop(['conforming_loan_limit', 'derived_sex', 'action_taken'], axis=1)
Step 3
Exporting the cleaned up dataset into a new cvs file
df.to_csv('class12_example.csv')