Examining COVID-19 Vaccination Rates in California
How are race/ethnicity and income related to vaccination rates?
Association between race/ethnicity and COVID-19 vaccination rates
The different race/ethnicity categories in "race_table" are American Indian or Alaska Native, Asian, Black or African American, Latino, Multiracial, Native Hawaiian or Other Pacific Islander, Other Race, Unknown and White. Since the table is sorted according to the administration date of the vaccine, we decided to clean the table by only keeping the rows of the most recent administration date (11/12/2021).
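The filtering step described above can be sketched as follows. This is a minimal, library-free illustration rather than the notebook's actual code: the rows are made up, and the field names `demographic`, `administered_date`, and `pct_fully_vaccinated` are assumptions, not the real schema of "race_table". It selects the latest date explicitly instead of relying on the table's sort order.

```python
from datetime import datetime

# Made-up rows for illustration only; the field names are assumptions.
rows = [
    {"demographic": "Asian", "administered_date": "11/11/2021", "pct_fully_vaccinated": 79.8},
    {"demographic": "White", "administered_date": "11/11/2021", "pct_fully_vaccinated": 63.9},
    {"demographic": "Asian", "administered_date": "11/12/2021", "pct_fully_vaccinated": 80.0},
    {"demographic": "White", "administered_date": "11/12/2021", "pct_fully_vaccinated": 64.0},
]

def parse_date(text):
    """Parse a month/day/year string such as '11/12/2021'."""
    return datetime.strptime(text, "%m/%d/%Y")

def keep_latest(rows, date_key="administered_date"):
    """Return only the rows whose date equals the latest date present."""
    latest = max(parse_date(row[date_key]) for row in rows)
    return [row for row in rows if parse_date(row[date_key]) == latest]

latest_rows = keep_latest(rows)
for row in latest_rows:
    print(row["demographic"], row["pct_fully_vaccinated"])
```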
Looking at the table, we noticed that many values in the "Other Race" and "Unknown" categories are missing (NaN). Checking the percentages in the table that are over 100:
Since the "Native Hawaiian or Other Pacific Islander" demographic largely contains percentages that are over 100, we decided to fully remove it from the table, as well as removing "Other Race" and "Unknown."
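The check for impossible percentages and the removal of the three categories might look like the sketch below. The rows and the `pct` field are hypothetical values chosen for illustration; only the category names come from the text.

```python
import math

# Made-up values; "pct" is a hypothetical field name.
rows = [
    {"demographic": "Asian", "pct": 85.2},
    {"demographic": "Native Hawaiian or Other Pacific Islander", "pct": 112.4},
    {"demographic": "Other Race", "pct": math.nan},
    {"demographic": "Unknown", "pct": math.nan},
    {"demographic": "White", "pct": 64.0},
]

def over_100(rows, key="pct"):
    """Rows whose percentage exceeds 100.

    Any comparison with NaN is False, so missing values are not flagged.
    """
    return [row for row in rows if row[key] > 100]

# Categories the text says were removed from the table.
EXCLUDED = {
    "Native Hawaiian or Other Pacific Islander",
    "Other Race",
    "Unknown",
}

flagged = over_100(rows)
cleaned = [row for row in rows if row["demographic"] not in EXCLUDED]

print([row["demographic"] for row in flagged])
print([row["demographic"] for row in cleaned])
```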
Seeing the differences in fully vaccinated rates across the demographic groups, we decided to perform an A/B test to determine whether some groups were being vaccinated at significantly higher rates than others. Instead of analyzing the differences between every pair of groups, we split the demographics into "traditionally represented" and "traditionally underrepresented" racial groups, using sources about presence in higher education.
Our calculated p-value is 0.0011. Using a standard p-value cutoff of 0.05, we reject the null hypothesis that the two groups are vaccinated at the same rate. From our analysis, we conclude that traditionally underrepresented groups are being vaccinated at lower rates than other groups in California.
The above visualizations show that some groups with low rates of full vaccination have high rates of one-dose vaccination, such as the American Indian/Alaska Native and Multiracial demographics. This could be explained by these groups getting vaccinated in larger numbers more recently, or by members of these groups receiving only one dose and not returning for the second.
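One common way to run an A/B test of this kind is a permutation test on the difference in group means. The sketch below assumes that approach; the text does not state which test statistic was actually used. The two lists hold made-up vaccination percentages, and `permutation_p_value` is a hypothetical helper name.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation test.

    Null hypothesis: both groups come from the same distribution.
    Alternative: group_a has the higher mean.
    Returns the fraction of shuffles whose difference in means is at
    least as large as the observed difference.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

# Made-up percentages, for illustration only.
represented = [80.0, 82.0, 85.0, 78.0, 81.0, 84.0]
underrepresented = [60.0, 62.0, 65.0, 58.0, 61.0, 64.0]

p_value = permutation_p_value(represented, underrepresented)
print("p-value:", p_value)
```

A p-value below the chosen cutoff (0.05 in the text) leads to rejecting the null hypothesis.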
Linear Regression between COVID-19 vaccination rates and median income per county
The correlation coefficient is: 0.7889963375011697
The correlation coefficient is: 0.7431907440294258
The graphs above show that there is a strong, positive correlation between median income and the percentage of people who are fully vaccinated or have at least one dose in each county. Seeing that there is a correlation, we decided to create linear regression models that would be able to predict the percentage of vaccinated people given the median income of an area.
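As a sketch of the two quantities discussed here, the code below computes a Pearson correlation coefficient and fits a one-variable least-squares line using only the standard library. The income and vaccination figures are made up, and the model is fitted and scored on the same points; whether the notebook evaluated on a held-out split is not stated.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(x, slope, intercept):
    return slope * x + intercept

# Made-up county data: median income (thousands of dollars) and
# percentage fully vaccinated.
median_income = [45, 52, 60, 68, 75, 83, 90, 110]
pct_vaccinated = [48, 55, 53, 62, 66, 64, 72, 80]

r = pearson_r(median_income, pct_vaccinated)
slope, intercept = fit_line(median_income, pct_vaccinated)
print("correlation:", r)
print("predicted % at income 70:", predict(70, slope, intercept))
```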
The coefficient of determination is 0.7402790757964923
The mean squared error is 25.511459853252248
The coefficient of determination is 0.6936244616227263
The mean squared error is 28.336745087480566
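The coefficient of determination shows the proportion of the variance in the outcome that can be explained by the model. In order to reduce the mean squared error for the models, we decided to standardize the units by converting each measure to the number of standard deviations it is away from its mean. Note that this rescaling changes the units in which the mean squared error is expressed but not the quality of the fit, so the coefficient of determination is unchanged.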
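The effect of standardizing can be illustrated with a small sketch. It uses made-up data and an in-sample fit (assumptions, not the notebook's setup), and shows that converting both variables to standard units leaves the coefficient of determination unchanged while the mean squared error shrinks because it is now measured in standard-deviation units.

```python
from math import sqrt

def standardize(values):
    """Convert values to standard units: (value - mean) / SD."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def fit_and_score(xs, ys):
    """Fit a least-squares line and return (R^2, MSE) on the same data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot, ss_res / n

# Made-up data: median income (thousands of dollars), % fully vaccinated.
median_income = [45, 52, 60, 68, 75, 83, 90, 110]
pct_vaccinated = [48, 55, 53, 62, 66, 64, 72, 80]

r2_raw, mse_raw = fit_and_score(median_income, pct_vaccinated)
r2_std, mse_std = fit_and_score(
    standardize(median_income), standardize(pct_vaccinated)
)

print("raw units:      R^2 =", r2_raw, " MSE =", mse_raw)
print("standard units: R^2 =", r2_std, " MSE =", mse_std)
```

For an in-sample fit like this one, the standardized mean squared error equals 1 minus the coefficient of determination, so it carries the same information on a different scale.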
The coefficient of determination is 0.7402790757964932
The mean squared error is 0.17516638882496108
The coefficient of determination is 0.6936244616227262
The mean squared error is 0.17316482475617587