<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15120 entries, 0 to 15119
Columns: 101 entries, Unnamed: 0 to tract_median_age_of_housing_units
dtypes: float64(31), int64(43), object(27)
memory usage: 11.7+ MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15120 entries, 0 to 15119
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 interest_rate 10061 non-null object
1 property_value 12424 non-null object
2 state_code 14929 non-null object
3 tract_minority_population_percent 15120 non-null float64
4 derived_race 15120 non-null object
5 derived_sex 15120 non-null object
6 applicant_age 15120 non-null object
dtypes: float64(1), object(6)
memory usage: 827.0+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15120 entries, 0 to 15119
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 interest_rate 9660 non-null float64
1 property_value 12024 non-null float64
2 state_code 14929 non-null object
3 tract_minority_population_percent 15120 non-null float64
4 derived_race 15120 non-null object
5 derived_sex 15120 non-null object
6 applicant_age 15120 non-null object
dtypes: float64(3), object(4)
memory usage: 827.0+ KB
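The change from `object` to `float64` above, together with the drop in non-null counts (10061 to 9660 for `interest_rate`, 12424 to 12024 for `property_value`), is consistent with non-numeric strings being turned into missing values during conversion. The following is a minimal sketch of one way to do that, assuming `pd.to_numeric` with `errors="coerce"` was used; the toy rows and the `"Exempt"` placeholder are illustrative assumptions, not values taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the two object-typed columns (values are assumptions).
df = pd.DataFrame({
    "interest_rate": ["3.5", "Exempt", None, "4.125"],
    "property_value": ["255000", "Exempt", "485000", None],
})

# Convert to numbers; anything that cannot be parsed becomes NaN.
for col in ["interest_rate", "property_value"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Count the values that survived the conversion in each column.
non_null = df.notna().sum()
print(df.dtypes)
print(non_null)
```

Coercing instead of raising keeps every row in the frame, at the cost of silently converting unparseable entries to `NaN`, which is why the non-null counts shrink.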
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15120 entries, 0 to 15119
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 interest_rate 9660 non-null float64
1 property_value 12024 non-null float64
2 state_code 14929 non-null object
3 tract_minority_population_percent 15120 non-null float64
4 derived_race 15120 non-null category
5 derived_sex 15120 non-null category
6 applicant_age 15120 non-null category
dtypes: category(3), float64(3), object(1)
memory usage: 517.9+ KB
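The last summary shows `derived_race`, `derived_sex`, and `applicant_age` stored as `category`, with reported memory usage falling from 827.0+ KB to 517.9+ KB. A minimal sketch of such a conversion, assuming `Series.astype("category")` was used; the sample labels below are illustrative assumptions.

```python
import pandas as pd

# Toy stand-in for the three demographic columns (labels are assumptions).
df = pd.DataFrame({
    "derived_race": ["White", "Asian", "White", "Race Not Available"],
    "derived_sex": ["Male", "Female", "Joint", "Male"],
    "applicant_age": ["25-34", "35-44", "25-34", ">74"],
})

# Store each repeated label once and keep compact integer codes per row.
cat_cols = ["derived_race", "derived_sex", "applicant_age"]
for col in cat_cols:
    df[col] = df[col].astype("category")

print(df.dtypes)
```

Categorical storage pays off when a column holds a small set of labels repeated across many rows, as is the case for these three columns.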
Let's take a look at the properties valued below 500,000. Within this group of lower-valued properties, let's split the data into two new sets: one for areas with a high minority population and another for areas with a low minority population.
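The split described here can be sketched with boolean masks. The text does not state the cutoff that separates high from low minority areas, so the 50% value below (`MINORITY_CUTOFF`) is purely an assumption, as are the toy rows and the variable names.

```python
import pandas as pd

# Toy rows using the two relevant columns (values are assumptions).
df = pd.DataFrame({
    "property_value": [150000.0, 320000.0, 750000.0, 95000.0, None, 410000.0],
    "tract_minority_population_percent": [72.5, 12.0, 65.0, 8.3, 40.0, 55.1],
})

MINORITY_CUTOFF = 50.0  # assumed threshold; not given in the text

# Keep properties valued below 500,000 (NaN comparisons are False, so
# rows with a missing property value drop out here).
under_500k = df[df["property_value"] < 500_000]

# Split the remaining rows by tract minority share.
pct = under_500k["tract_minority_population_percent"]
high_minority = under_500k[pct >= MINORITY_CUTOFF]
low_minority = under_500k[pct < MINORITY_CUTOFF]
```

Assignments like these print nothing in a notebook, so a quick `len(high_minority)` or `.head()` is a useful extra check that the filters behave as intended.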
There is no output, which means the code ran without errors, so we can now plot a frequency histogram of the property values of mortgage applications in the high and low minority areas and compare them.
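Because the two groups may differ in size, comparing them by proportion per bin rather than raw count keeps the histograms on the same scale. The sketch below computes those proportions with `numpy.histogram`; the sample values, the ten 50,000-wide bins, and the helper name `bin_proportions` are assumptions for illustration.

```python
import numpy as np

# Toy property values for each group (assumptions, not real data).
high_values = np.array([120000, 180000, 210000, 450000], dtype=float)
low_values = np.array([150000, 260000, 310000, 340000, 480000], dtype=float)

# Ten equal-width bins covering 0 to 500,000.
bins = np.linspace(0, 500_000, 11)

def bin_proportions(values, bins):
    """Return the share of observations that falls in each bin."""
    counts, _ = np.histogram(values, bins=bins)
    return counts / counts.sum()

high_prop = bin_proportions(high_values, bins)
low_prop = bin_proportions(low_values, bins)
print(high_prop)
print(low_prop)
```

The same normalization can be passed to a plotting call by giving each observation a weight of `1 / len(values)`.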
The histogram above shows that the code executed successfully. For most property values, low % minority areas account for a similar proportion of applications as high % minority areas. However, high % minority areas tend to have slightly greater proportions at property values of about 200,000 or less, while the opposite tends to be true above 200,000, with the exception of the last two bins, which lie just below 500,000.
Now that we know our data has been coded as we desired, let's compute the mean of the property values for the high % minority areas and the low % minority areas.
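As a small sketch of this step, the means can be taken directly from each group's `property_value` column. The two Series below are toy stand-ins for the filtered groups, and their names are assumptions.

```python
import pandas as pd

# Toy property values for each group (assumptions, not real data).
high_minority_values = pd.Series([150000.0, 210000.0, 240000.0])
low_minority_values = pd.Series([180000.0, 260000.0, 310000.0])

# Series.mean() skips NaN values by default.
high_mean = high_minority_values.mean()
low_mean = low_minority_values.mean()
print(high_mean, low_mean)
```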
The output shows the mean for each group. The mean property value of high % minority areas is lower than that of low % minority areas, though not by much. Let's test whether the difference between these two means is statistically significant.
Since the output is True (assuming the inputs above were entered correctly), we have sufficient evidence to reject the null hypothesis that the two means are equal. In other words, based on our data, we can conclude that high % minority areas have a different mean property value than low % minority areas.
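The text does not say which test produced the True/False result. One common choice for comparing two group means is a two-sample t-test, sketched here with `scipy.stats.ttest_ind`. The Welch variant (`equal_var=False`), the 0.05 significance level, and the toy samples are all assumptions.

```python
import numpy as np
from scipy import stats

# Toy samples standing in for the two groups (assumptions, not real data).
high = np.array([200000, 210000, 220000, 230000, 240000], dtype=float)
low = np.array([300000, 310000, 320000, 330000, 340000], dtype=float)

ALPHA = 0.05  # assumed significance level

# Welch's t-test: does not assume the two groups share a variance.
t_stat, p_value = stats.ttest_ind(high, low, equal_var=False)

# True means the difference in means is significant at the ALPHA level.
reject_null = bool(p_value < ALPHA)
print(reject_null)
```

A True result supports rejecting the null hypothesis of equal means; it says nothing about how large the difference is, so it is worth reading alongside the two means themselves.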