Load the vaccination data and convert the dates into the correct data type.
import pandas as pd

va = pd.read_csv('us_state_vaccinations.csv', parse_dates=['date'])
va.info()
After this step, the date column is stored as datetime64[ns] instead of as plain strings, which va.info() reports in its Dtype column.
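As a minimal sketch of what parse_dates changes, the snippet below reads the same small CSV twice from memory. The CSV text and the names raw and parsed are made up for illustration; only the column names follow the dataset used here.

```python
import io
import pandas as pd

# Hypothetical two-row extract with the same column names as the real file
csv_text = (
    "date,location,people_vaccinated\n"
    "2021-01-12,Massachusetts,100\n"
    "2021-01-13,Massachusetts,150\n"
)

# Without parse_dates the date column is read as text
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the same column becomes a datetime column
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])

print(raw['date'].dtype)
print(parsed['date'].dtype)
```

A datetime column can be compared, sorted and sliced by date, which the later steps rely on.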
How many people were vaccinated in Massachusetts from 1/12/2021 to 6/4/2021?
def func(state, column_name):
    # Select the rows for one state and return the chosen column as a
    # Series indexed by date. Nothing is kept between calls, so the
    # function can be reused safely for every state.
    rows = va[va['location'] == state]
    return pd.Series(data=rows[column_name].values, index=rows['date'].values)

ser = func('Massachusetts', 'people_vaccinated')
ser.index.name = 'Date'
ser = ser.to_frame(name='People_Vaccinated')
ser
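One way to turn the date-indexed table into a single number for a date range is sketched below. It assumes that people_vaccinated is a cumulative count, so the increase over a window is the last value minus the first. The frame demo and its numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical cumulative counts indexed by date
dates = pd.date_range('2021-01-10', periods=6, freq='D')
demo = pd.DataFrame(
    {'People_Vaccinated': [90, 100, 130, 170, 220, 260]},
    index=dates,
)
demo.index.name = 'Date'

# Label-based slicing on a DatetimeIndex includes both end dates
window = demo.loc['2021-01-12':'2021-01-14', 'People_Vaccinated']

# Increase in the cumulative count over the window
newly_vaccinated = window.iloc[-1] - window.iloc[0]
print(newly_vaccinated)
```

On the real data the same slice would use the dates from the question, provided both dates are covered by the index.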
How does the number of vaccinated people vary over time?
import matplotlib.pyplot as plt

ser.plot()
plt.xlabel('Date')
plt.ylabel('People Vaccinated')
plt.title('Date-People Vaccinated')
plt.show()
The line plot shows that the number of people vaccinated increases steadily in the first half of 2021.
import numpy as np
from scipy.optimize import curve_fit

def our_model(x, β0, β1, β2):
    # Logistic curve: β0 is the long-term maximum, β1 the rate,
    # β2 the time of maximum increase
    return β0 / (1 + np.exp(β1 * (-x + β2)))

try:
    sernew = ser.dropna()
    xs = np.arange(len(sernew['People_Vaccinated']))
    ys = sernew['People_Vaccinated'].values
    my_guessed_betas = [sernew['People_Vaccinated'].max(), 1, len(sernew['People_Vaccinated']) / 2]
    found_betas, covariance = curve_fit(our_model, xs, ys, p0=my_guessed_betas)
    β0, β1, β2 = found_betas
    print(β0, β1, β2)
except Exception:
    # curve_fit can fail, for example with a RuntimeError when it does not converge
    print('None')
After the model equation is defined, the Massachusetts series is passed to curve_fit to find the beta values that best fit that state's data. The try statement keeps the notebook running for states where curve_fit cannot find optimal beta values.
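To illustrate how curve_fit recovers the three parameters of a logistic curve, here is a small sketch on synthetic data. The parameter values, the noise level and the names logistic, true_params and fitted are assumptions chosen for the demonstration, not values from the vaccination data.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, b0, b1, b2):
    # Same functional form as the model above, with ASCII parameter names
    return b0 / (1 + np.exp(b1 * (b2 - x)))

# Hypothetical "true" parameters: ceiling, rate, midpoint (in days)
true_params = (4_000_000, 0.05, 70)

x = np.arange(140)
rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
y = logistic(x, *true_params) + rng.normal(0, 20_000, size=x.size)

# Initial guess: observed maximum, a small rate, and the middle of the range
p0 = [y.max(), 0.1, len(x) / 2]
fitted, _ = curve_fit(logistic, x, y, p0=p0)

print(fitted)
```

The fit depends on the initial guess. A guess that starts the ceiling near the observed maximum and the midpoint near the middle of the time range, as the notebook does, is a common choice for logistic curves.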
guess_model = lambda x: our_model(x, β0, β1, β2)

plt.plot(xs, ys)
many_xs = np.linspace(0, len(sernew['People_Vaccinated']))
plt.plot(many_xs, guess_model(many_xs))
plt.xlabel('Days since 1/12/2021')
# The values are plotted unscaled, so the label gives no unit multiplier
plt.ylabel('Number of people vaccinated')
plt.title('Number of people vaccinated in Massachusetts and logistic model')
plt.show()
The fitted model is plotted together with the original data for comparison.
beta = []
s = pd.read_csv('State.csv')
for i in s['US STATE']:
    w = func(i, 'people_vaccinated')
    w = w.to_frame(name='People_Vaccinated')
    wnew = w.dropna()
    try:
        x = np.arange(len(wnew['People_Vaccinated']))
        y = wnew['People_Vaccinated'].values
        my_guessed_betas = [wnew['People_Vaccinated'].max(), 1, len(wnew['People_Vaccinated']) / 2]
        found_betas, covariance = curve_fit(our_model, x, y, p0=my_guessed_betas)
        β0, β1, β2 = found_betas
        beta.append([i, β0, β1, β2])
    except Exception:
        # Keep the state with missing values so every row has four columns
        beta.append([i, np.nan, np.nan, np.nan])

df = pd.DataFrame(beta, columns=['State', 'maximum number of vaccinations that will arise long term', 'measure of the rate of vaccination', 'the time of maximum increase'])
df
The same fitting procedure is then applied to every state to obtain its beta values.
s = pd.read_csv('State.csv')
p = pd.read_csv('Presidency 2016.csv')

new = pd.merge(df, s, left_on='State', right_on='US STATE')
nnew = pd.merge(new, p, left_on='ABBREVIATION', right_on='State')
nnew
The dataframe of fitted beta values is merged with two more dataframes, the state abbreviations and the 2016 presidential election results, for later use.
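The two-step merge can be sketched on toy frames. The column names US STATE, ABBREVIATION, State and Donald Trump follow the code above. The row values, the short column name rate and the frame names are made up for illustration, and the sketch assumes the election file identifies states by abbreviation.

```python
import pandas as pd

# Hypothetical stand-ins for the three tables
betas = pd.DataFrame({'State': ['Massachusetts', 'Texas'], 'rate': [0.05, 0.04]})
names = pd.DataFrame({'US STATE': ['Massachusetts', 'Texas'], 'ABBREVIATION': ['MA', 'TX']})
votes = pd.DataFrame({'State': ['MA', 'TX'], 'Donald Trump': [0.33, 0.52]})

# Step 1: attach the abbreviation to each full state name
step1 = pd.merge(betas, names, left_on='State', right_on='US STATE')

# Step 2: join the election table on the abbreviation
step2 = pd.merge(step1, votes, left_on='ABBREVIATION', right_on='State')

print(list(step2.columns))
```

Both inputs to the second merge have a column called State, so pandas appends its default suffixes and the result contains State_x (full name) and State_y (abbreviation).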
del nnew['State_x']
del nnew['US STATE']
del nnew['ABBREVIATION']
del nnew['State_y']
nnew
The string columns are deleted because the correlation calculation and the heatmap only process numeric data.
import seaborn as sns

# Drop rows with missing values so they do not propagate into the matrix
correlation_coefficients = np.corrcoef(nnew.dropna(), rowvar=False)
sns.heatmap(correlation_coefficients, annot=True)
plt.show()
The correlation coefficients between the three beta values and each state's 2016 presidential election results are calculated and displayed as a heatmap.
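As a small illustration of what the correlation matrix contains, the sketch below computes it twice on a toy frame: once with np.corrcoef as above, and once with DataFrame.corr, which is a swapped-in alternative that keeps the column names. The frame demo and its values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric table: b rises with a, c falls with a
demo = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0],
    'b': [2.0, 4.0, 6.0, 8.0],
    'c': [4.0, 3.0, 2.0, 1.0],
})

# rowvar=False treats each column as one variable
via_numpy = np.corrcoef(demo.to_numpy(), rowvar=False)

# Same Pearson coefficients, returned as a labelled DataFrame
via_pandas = demo.corr()

print(via_pandas)
```

Passing the labelled matrix to sns.heatmap would put the column names on both axes, which makes the plot easier to read than integer positions.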
from scipy import stats

alpha = 0.05
statistic, pvalue = stats.ttest_ind(nnew['maximum number of vaccinations that will arise long term'], nnew['Donald Trump'], equal_var=False)
pvalue < alpha
Among all pairs of fitted beta values and 2016 presidential election results, the strongest correlation is between the long-term maximum number of vaccinations and the vote for Donald Trump. We then run a two-sample t-test with unequal variances (Welch's t-test) on these two columns. The resulting p-value is below the significance level of 0.05, so we reject the null hypothesis that the two columns have equal means.
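Because ttest_ind compares the means of two samples and ignores how their values are paired, it can be useful to see it next to a test that does use the pairing. The sketch below uses synthetic data and scipy.stats.pearsonr, which is a swapped-in technique and not part of the analysis above. All names and numbers are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, synthetic data only

# Hypothetical stand-ins for the two columns
vote_share = rng.uniform(0.3, 0.7, size=50)
ceiling = 5e6 * (1 - vote_share) + rng.normal(0, 2e5, size=50)

# Welch's t-test: are the two sample means equal?
_, p_means = stats.ttest_ind(ceiling, vote_share, equal_var=False)

# Pearson test: is there a linear association between the paired values?
r, p_corr = stats.pearsonr(ceiling, vote_share)

# Shuffling one column breaks the pairing but keeps its mean and variance
shuffled = rng.permutation(vote_share)
_, p_means_shuffled = stats.ttest_ind(ceiling, shuffled, equal_var=False)
r_shuffled, _ = stats.pearsonr(ceiling, shuffled)

print(p_means, p_means_shuffled)
print(r, r_shuffled)
```

In this sketch the t-test result is unchanged by the shuffle, while the correlation coefficient is not, which shows that the two tests answer different questions.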