import pandas as pd
df = pd.read_csv( 'practice-project-dataset-1.csv' )

df = df[['interest_rate','property_value','state_code','tract_minority_population_percent','derived_race','derived_sex','applicant_age']]
df.info()

import numpy as np
df['interest_rate'] = df['interest_rate'].replace( 'Exempt', np.nan )

df['interest_rate'] = df['interest_rate'].astype( float )

df['property_value'] = df['property_value'].replace( 'Exempt', np.nan )
df['property_value'] = df['property_value'].astype( float )
df.info()

df['applicant_age'].value_counts()

df['derived_race'] = df['derived_race'].astype( 'category' )
df['derived_sex'] = df['derived_sex'].astype( 'category' )
df['applicant_age'] = df['applicant_age'].astype( 'category' )
df.info()

lower_prices = df[df['property_value'] < 500000]
high_minority = lower_prices[lower_prices['tract_minority_population_percent'] > 75]
low_minority = lower_prices[lower_prices['tract_minority_population_percent'] < 25]

lower_prices keeps only the records whose property value is below 500,000. These records are then split into a high-minority group and a low-minority group, depending on whether tract_minority_population_percent is greater than 75 or less than 25. Records with a value between 25 and 75 fall into neither group.
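As a rough illustration of how this filtering behaves, the sketch below applies the same three conditions to a small made-up table. The rows in toy and the toy_* names are hypothetical and are not taken from the dataset; only the column names and thresholds match the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from the dataset), using the same column names.
toy = pd.DataFrame({
    'property_value': [150000.0, 450000.0, 700000.0, np.nan, 300000.0],
    'tract_minority_population_percent': [80.0, 10.0, 90.0, 5.0, 50.0],
})

# A comparison with NaN evaluates to False, so the row with a missing
# property value is dropped along with the row above the price cutoff.
toy_lower = toy[toy['property_value'] < 500000]

# Same thresholds as the notebook: above 75 and below 25.
toy_high = toy_lower[toy_lower['tract_minority_population_percent'] > 75]
toy_low = toy_lower[toy_lower['tract_minority_population_percent'] < 25]

# The row at 50 percent passes the price filter but lands in neither group.
print(len(toy_lower), len(toy_high), len(toy_low))
```

One consequence worth keeping in mind is that the two groups together cover only part of lower_prices, so any comparison between them says nothing about tracts in the middle range.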

import matplotlib.pyplot as plt
plt.hist( [ high_minority['property_value'], low_minority['property_value'] ],
          bins=20, density=True )
plt.legend( [ 'High % minority', 'Low % minority' ] )
plt.title( 'Sample of 2018 Home Mortgage Applications' )
plt.xlabel( 'Property Value' )
plt.ylabel( 'Proportion' )
plt.show()

The property values of the high-minority and low-minority groups are plotted as histograms with 20 bins. Because density=True, each group's histogram is normalized to unit area, so the two groups can be compared even though they differ in size. The title and the x-axis and y-axis labels are then set, and the legend identifies which bars belong to which group.
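To make the effect of density=True concrete, here is a minimal sketch using np.histogram, which applies the same kind of normalization. The values array is hypothetical and only serves to show that the bar heights times the bin widths sum to 1.

```python
import numpy as np

# Hypothetical property values, for illustration only.
values = np.array([100000, 120000, 150000, 200000, 250000, 400000], dtype=float)

# With density=True the returned heights are densities, not counts.
heights, edges = np.histogram(values, bins=3, density=True)

# Multiplying each height by its bin width and summing gives the total area.
widths = np.diff(edges)
area = float(np.sum(heights * widths))
print(area)
```

The bar heights are therefore densities per unit of property value, which is why they can look very small when the x-axis spans hundreds of thousands.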

high_minority['property_value'].mean(), low_minority['property_value'].mean()

The mean property value is calculated for the high-minority group and for the low-minority group.

from scipy import stats
alpha = 0.05
statistic, pvalue = stats.ttest_ind( high_minority['property_value'],
                                     low_minority['property_value'],
                                     equal_var=False )
pvalue < alpha

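The test is a two-sample t-test that does not assume equal variances (equal_var=False), and its null hypothesis is that the two groups have the same mean property value. The resulting p-value is below the 5% significance level, so at that level we reject the null hypothesis: among properties valued under 500,000, the data give evidence that the mean property value differs between high-minority and low-minority tracts.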
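For readers who want to see what the unequal-variance (Welch) test computes, the following is a minimal sketch of the statistic and its approximate degrees of freedom in plain Python. The helper welch_t and the two samples are hypothetical and are not part of the notebook; the sketch does not compute a p-value, which stats.ttest_ind derives from the t distribution.

```python
import math
import statistics

def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    # Variance of each sample mean, using the sample variance (n - 1 denominator).
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    # Difference in means divided by its estimated standard error.
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof

# Hypothetical samples, for illustration only.
sample_a = [1, 2, 3, 4, 5]
sample_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(sample_a, sample_b)
print(round(t_stat, 4), round(dof, 4))
```

Because each group's variance is estimated separately, the degrees of freedom are generally not a whole number and are smaller than in the pooled-variance test when the variances differ.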