BART and COVID-19 Data Analysis in Alameda County: Public transportation’s effects on wider community case rates
By: Azucena Castro and Alexandra Stassinopoulos
A surface-level comparison suggests that zip codes with BART stations had almost double the number of cases at the time this analysis was completed. However, those zip codes also have roughly double the population, and the case rates differ by only around 0.3.
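To make the point about raw counts versus rates concrete, here is a minimal sketch with made-up totals. The numbers, the `case_rate` helper, and the per-100-residents scale are all assumptions for illustration and are not taken from the analysis.

```python
def case_rate(cases, population, per=100):
    """Return cases per `per` residents."""
    return cases / population * per

# Hypothetical totals, NOT the study's data: the BART group has
# almost double the cases but also double the population.
with_bart = {"cases": 19_000, "population": 800_000}
without_bart = {"cases": 10_000, "population": 400_000}

rate_with = case_rate(**with_bart)
rate_without = case_rate(**without_bart)

print(f"cases ratio:      {with_bart['cases'] / without_bart['cases']:.2f}")
print(f"population ratio: {with_bart['population'] / without_bart['population']:.2f}")
print(f"rate difference:  {abs(rate_with - rate_without):.3f} per 100 residents")
```

Normalizing by population is what makes the two groups comparable; the raw case counts alone mostly reflect how many people live in each group of zip codes.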
Based on the p-values of the individual parameters in the linear regression, only Household Median Income and the percent of the population belonging to a community of color are statistically significant at the 5 percent level. It is interesting to note, however, that the p-value of the F-test on all of the parameters together (2.24e-05) indicates that the combination of all factors is highly statistically significant.
To determine whether the other variables' significance is affected by scale, we graphed each continuous independent variable against the dependent variable and compared it with a graph of the log of the same variable against the dependent variable.
The graphs show that population density, population, and median income would all benefit from being on a log scale. Because the values of these variables on the x-axis are so large, log-scaling them should clarify their relationships to Case Rates when the regression is re-run.
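As a numeric stand-in for the visual comparison, the following sketch checks how much more linear a relationship looks after a log transform. It swaps the plots for Pearson correlations, uses synthetic data, and the variable names are assumptions rather than the analysis's own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, heavily right-skewed predictor (hypothetical values).
median_income = rng.lognormal(mean=11.0, sigma=1.5, size=200)

# Synthetic outcome that is linear in log(income) plus noise.
case_rate = 20.0 - 1.2 * np.log(median_income) + rng.normal(0, 0.5, size=200)

# Linear association on the raw scale versus the log scale.
r_raw = np.corrcoef(median_income, case_rate)[0, 1]
r_log = np.corrcoef(np.log(median_income), case_rate)[0, 1]

print(f"correlation with raw income: {r_raw:.3f}")
print(f"correlation with log income: {r_log:.3f}")
```

A noticeably stronger correlation on the log scale is one sign that the transformed variable is the better fit for a linear regression, though a scatter plot remains the more informative check.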
The second multiple regression did not change which variables were statistically significant. To determine whether Median Household Income and Percent POC are the variables driving the significance of the overall F-statistic, we ran an F-test (a special case of the Wald test) to see whether the other variables were still jointly statistically significant once Median Household Income and Percent POC were excluded.
When we run the F-test excluding the scaled Household Median Income variable and the Percent POC variable, the statistical significance disappears: the p-value for the F-test is 0.1703721295637166, which is well above the traditional 5% cut-off.