Unlisted

COVID-19 Data Research 

by Charles GagnonJan 5, 2021
3 likes4 duplicates
Share
Twitter iconTwitter
Facebook iconFacebook
Email
Copy link
Save as PDF
    1. COVID-19 by the numbers
    2. Intro
    3. Step 1 - cleanup
    4. Step 2 - Organize
    5. Step 3 - Global Data Plots
    6. Step 4 - Top 10 Countries by "Total Confirmed Cases"
    7. Step 5 - Comparing Countries
    8. Step 6 - All indicators for the US
    9. Step 7 - Daily new cases
    10. Step 8 - GROWTH

COVID-19 by the numbers

Original idea for this Notebook came from https://towardsdatascience.com/analyzing-coronavirus-covid-19-data-using-pandas-and-plotly-2e34fe2c4edc. I made several modifications, changed contries and customized my own plots. It's been my go-to personal dashboard to track COVID progression. Depending on the news and current trends, I modify the countries I'm looking at.

The last chart, displaying new confirmed cases of COVID-19 in the past week vs. the total confirmed cases to date was heavily inspired by work done at https://aatishb.com/covidtrends/. When plotted in this way, exponential growth is represented as a straight line that slopes upwards. Notice that almost all countries follow a very similar path of exponential growth.

!python --version

Intro

First we must download the csvs from CCSE at John Hopkins. From this download, we get:

  • Dates (as columns)
  • Total Confirmed Cases
  • Province
  • Countries
  • Latitude and Longitude
## Import Libraries import numpy as np import pandas as pd import matplotlib.pyplot as plt from pandas.plotting import register_matplotlib_converters register_matplotlib_converters() #%matplotlib inline import plotly.express as px import plotly.graph_objects as go from plotly.subplots import make_subplots plt.rcParams['figure.figsize'] = [15, 5] from IPython import display from ipywidgets import interact, widgets ## Read Data for Cases, Deaths and Recoveries ConfirmedCases_raw=pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv') Deaths_raw=pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')

Step 1 - cleanup

We use the pandas melt function to reshape the dates in one column and we setup indices on dates and countries. The datsets are now cleaner and much easier to plot.

### Melt the dateframe into the right shape and set index def cleandata(df_raw): df_cleaned=df_raw.melt(id_vars=['Province/State','Country/Region','Lat','Long'],value_name='Cases',var_name='Date') df_cleaned=df_cleaned.set_index(['Country/Region','Province/State','Date']) return df_cleaned # Clean all datasets ConfirmedCases=cleandata(ConfirmedCases_raw) Deaths=cleandata(Deaths_raw) #Recoveries=cleandata(Recoveries_raw)

Step 2 - Organize

We create a consolidated dataset that combines the datasets for Cases, Deaths and Recoveries. We also created a function to get daily count from the cumulative data. We use the following code for the processing. Finaly, the Consolidated dataset has the index as country and date and each of the metric as a column.

All the "Recoveries" related content was removed since the update in the date structure.

### Get Countrywise Data def countrydata(df_cleaned,oldname,newname): df_country=df_cleaned.groupby(['Country/Region','Date'])['Cases'].sum().reset_index() df_country=df_country.set_index(['Country/Region','Date']) df_country.index=df_country.index.set_levels([df_country.index.levels[0], pd.to_datetime(df_country.index.levels[1])]) df_country=df_country.sort_values(['Country/Region','Date'],ascending=True) df_country=df_country.rename(columns={oldname:newname}) return df_country ConfirmedCasesCountry=countrydata(ConfirmedCases,'Cases','Total Confirmed Cases') DeathsCountry=countrydata(Deaths,'Cases','Total Deaths') #RecoveriesCountry=countrydata(Recoveries,'Cases','Total Recoveries') ### Get DailyData from Cumulative sum def dailydata(dfcountry,oldname,newname): dfcountrydaily=dfcountry.groupby(level=0).diff().fillna(0) dfcountrydaily=dfcountrydaily.rename(columns={oldname:newname}) return dfcountrydaily NewCasesCountry=dailydata(ConfirmedCasesCountry,'Total Confirmed Cases','Daily New Cases') NewDeathsCountry=dailydata(DeathsCountry,'Total Deaths','Daily New Deaths') #NewRecoveriesCountry=dailydata(RecoveriesCountry,'Total Recoveries','Daily New Recoveries') CountryConsolidated=pd.merge(ConfirmedCasesCountry,NewCasesCountry,how='left',left_index=True,right_index=True) CountryConsolidated=pd.merge(CountryConsolidated,NewDeathsCountry,how='left',left_index=True,right_index=True) CountryConsolidated=pd.merge(CountryConsolidated,DeathsCountry,how='left',left_index=True,right_index=True) #CountryConsolidated=pd.merge(CountryConsolidated,RecoveriesCountry,how='left',left_index=True,right_index=True) #CountryConsolidated=pd.merge(CountryConsolidated,NewRecoveriesCountry,how='left',left_index=True,right_index=True) #CountryConsolidated['Active Cases']=CountryConsolidated['Total Confirmed Cases']-CountryConsolidated['Total Deaths']-CountryConsolidated['Total Recoveries'] #CountryConsolidated['Share of Recoveries - Closed Cases']=np.round(CountryConsolidated['Total Recoveries']/(CountryConsolidated['Total Recoveries']+CountryConsolidated['Total Deaths']),2) CountryConsolidated['Death to Cases Ratio']=np.round(CountryConsolidated['Total Deaths']/CountryConsolidated['Total Confirmed Cases'],3)

Step 3 - Global Data Plots

We plot:

  • Confirmed Cases
  • Active Cases
  • Deaths
  • Recoveries
  • and the "Death to Case" ratio

All using global data combined.

## Get totals for all metrics GlobalTotals=CountryConsolidated.reset_index().groupby('Date').sum() #GlobalTotals['Share of Recoveries - Closed Cases']=np.round(GlobalTotals['Total Recoveries']/(GlobalTotals['Total Recoveries']+GlobalTotals['Total Deaths']),2) GlobalTotals['Death to Cases Ratio']=np.round(GlobalTotals['Total Deaths']/GlobalTotals['Total Confirmed Cases'],3) GlobalTotals.tail(2) # Create Plots that show Key Metrics For the Covid-19 chartcol='red' fig = make_subplots(rows=2, cols=2,shared_xaxes=True, specs=[[{}, {}],[{"colspan": 2}, None]], subplot_titles=('Total Confirmed Cases','Deaths','Death to Cases Ratio')) fig.add_trace(go.Scatter(x=GlobalTotals.index,y=GlobalTotals['Total Confirmed Cases'], mode='lines+markers', name='Confirmed Cases', line=dict(color=chartcol,width=2)), row=1,col=1) fig.add_trace(go.Scatter(x=GlobalTotals.index,y=GlobalTotals['Total Deaths'], mode='lines+markers', name='Deaths', line=dict(color=chartcol,width=2)), row=1,col=2) fig.add_trace(go.Scatter(x=GlobalTotals.index,y=GlobalTotals['Death to Cases Ratio'], mode='lines+markers', line=dict(color=chartcol,width=2)), row=2,col=1) fig.update_layout(showlegend=False)

Step 4 - Top 10 Countries by "Total Confirmed Cases"

From the global data, we sort the 10 worst countries in terms of number of cases.

TotalCasesCountry=CountryConsolidated.max(level=0)['Total Confirmed Cases'].reset_index().set_index('Country/Region') TotalCasesCountry=TotalCasesCountry.sort_values(by='Total Confirmed Cases',ascending=False) TotalCasesCountryexclChina=TotalCasesCountry[~TotalCasesCountry.index.isin(['Others'])] Top10countriesbycasesexclChina=TotalCasesCountryexclChina.head(10) TotalCasesCountrytop10=TotalCasesCountry.head(10) fig = go.Figure(go.Bar(x=Top10countriesbycasesexclChina.index, y=Top10countriesbycasesexclChina['Total Confirmed Cases'], text=Top10countriesbycasesexclChina['Total Confirmed Cases'], textposition='outside')) fig.update_layout(title_text='Top 10 Countries by Total Confirmed Cases') fig.update_yaxes(showticklabels=False) fig.show()

Step 5 - Comparing Countries

We compare data from 8 countries in terms of Total Confirmed Cases.

ChinaFirstCase=CountryConsolidated.loc['China']['Total Confirmed Cases'].reset_index().set_index('Date') SpainFirstCase=CountryConsolidated.loc['Spain']['Total Confirmed Cases'].reset_index().set_index('Date') ItalyFirstCase=CountryConsolidated.loc['Italy']['Total Confirmed Cases'].reset_index().set_index('Date') SKFirstCase=CountryConsolidated.loc['Korea, South']['Total Confirmed Cases'].reset_index().set_index('Date') IranFirstCase=CountryConsolidated.loc['Iran']['Total Confirmed Cases'].reset_index().set_index('Date') GermanyFirstCase=CountryConsolidated.loc['Germany']['Total Confirmed Cases'].reset_index().set_index('Date') SingaporeFirstCase=CountryConsolidated.loc['Singapore']['Total Confirmed Cases'].reset_index().set_index('Date') USFirstCase=CountryConsolidated.loc['US']['Total Confirmed Cases'].reset_index().set_index('Date') CanadaFirstCase=CountryConsolidated.loc['Canada']['Total Confirmed Cases'].reset_index().set_index('Date') ChileFirstCase=CountryConsolidated.loc['Chile']['Total Confirmed Cases'].reset_index().set_index('Date') BrazilFirstCase=CountryConsolidated.loc['Brazil']['Total Confirmed Cases'].reset_index().set_index('Date') IndiaFirstCase=CountryConsolidated.loc['India']['Total Confirmed Cases'].reset_index().set_index('Date') ChinaGrowth=ChinaFirstCase[ChinaFirstCase.ne(0)].dropna().reset_index() SpainGrowth=SpainFirstCase[SpainFirstCase.ne(0)].dropna().reset_index() ItalyGrowth=ItalyFirstCase[ItalyFirstCase.ne(0)].dropna().reset_index() SKGrowth=SKFirstCase[SKFirstCase.ne(0)].dropna().reset_index() IranGrowth=IranFirstCase[IranFirstCase.ne(0)].dropna().reset_index() GermanyGrowth=GermanyFirstCase[GermanyFirstCase.ne(0)].dropna().reset_index() SingaporeGrowth=SingaporeFirstCase[SingaporeFirstCase.ne(0)].dropna().reset_index() USGrowth=USFirstCase[USFirstCase.ne(0)].dropna().reset_index() CanadaGrowth=CanadaFirstCase[CanadaFirstCase.ne(0)].dropna().reset_index() ChileGrowth=ChileFirstCase[ChileFirstCase.ne(0)].dropna().reset_index() BrazilGrowth=BrazilFirstCase[BrazilFirstCase.ne(0)].dropna().reset_index() IndiaGrowth=IndiaFirstCase[IndiaFirstCase.ne(0)].dropna().reset_index()
fig = make_subplots(rows=4, cols=2,shared_xaxes=True, subplot_titles=('US','Canada','Spain','Italy','Chile','Brazil','China','India',)) fig.update_xaxes(title_text="Number of Days since Outbreak", row=4, col=1) fig.update_xaxes(title_text="Number of Days since Outbreak", row=4, col=2) fig.add_trace(go.Scatter(x=USGrowth.index,y=USGrowth['Total Confirmed Cases'], mode='lines+markers', name='Active Cases', line=dict(color=chartcol,width=2)), row=1,col=1) fig.add_trace(go.Scatter(x=CanadaGrowth.index,y=CanadaGrowth['Total Confirmed Cases'], mode='lines+markers', name='Active Cases', line=dict(color=chartcol,width=2)), row=1,col=2) fig.add_trace(go.Scatter(x=SpainGrowth.index,y=SpainGrowth['Total Confirmed Cases'], mode='lines+markers', name='Active Cases', line=dict(color=chartcol,width=2)), row=2,col=1) fig.add_trace(go.Scatter(x=ItalyGrowth.index,y=ItalyGrowth['Total Confirmed Cases'], mode='lines+markers', name='Active Cases', line=dict(color=chartcol,width=2)), row=2,col=2) fig.add_trace(go.Scatter(x=ChileGrowth.index,y=ChileGrowth['Total Confirmed Cases'], mode='lines+markers', name='Active Cases', line=dict(color=chartcol,width=2)), row=3,col=1) fig.add_trace(go.Scatter(x=BrazilGrowth.index,y=BrazilGrowth['Total Confirmed Cases'], mode='lines+markers', name='Active Cases', line=dict(color=chartcol,width=2)), row=3,col=2) fig.add_trace(go.Scatter(x=ChinaGrowth.index,y=ChinaGrowth['Total Confirmed Cases'], mode='lines+markers', name='Active Cases', line=dict(color=chartcol,width=2)), row=4,col=1) fig.add_trace(go.Scatter(x=IndiaGrowth.index,y=IndiaGrowth['Total Confirmed Cases'], mode='lines+markers', name='Active Cases', line=dict(color=chartcol,width=2)), row=4,col=2) fig.update_layout(showlegend=False)

Step 6 - All indicators for the US

We revisit our key indicators:

  • Confirmed Cases
  • Active Cases
  • Deaths
  • Recoveries
  • and the "Death to Case" ratio

but focusing our plots on the US only.

def plotcountry(Country): fig = make_subplots(rows=3, cols=2,shared_xaxes=True, specs=[[{}, {}],[{"colspan": 2}, None],[{"colspan": 2}, None]], subplot_titles=('Total Confirmed Cases','Deaths','Death to Cases Ratio', 'Daily New Cases')) fig.add_trace(go.Scatter(x=CountryConsolidated.loc[Country].index,y=CountryConsolidated.loc[Country,'Total Confirmed Cases'], mode='lines+markers', name='Confirmed Cases', line=dict(color=chartcol,width=2)), row=1,col=1) fig.add_trace(go.Scatter(x=CountryConsolidated.loc[Country].index,y=CountryConsolidated.loc[Country,'Total Deaths'], mode='lines+markers', name='Total Deaths', line=dict(color=chartcol,width=2)), row=1,col=2) fig.add_trace(go.Scatter(x=CountryConsolidated.loc[Country].index,y=CountryConsolidated.loc[Country,'Death to Cases Ratio'], mode='lines+markers', name='Death to Cases Ratio', line=dict(color=chartcol,width=2)), row=2,col=1) CountryNewCases = CountryConsolidated.loc[Country]['Daily New Cases'].reset_index().set_index('Date') fig.add_trace(go.Bar(x=CountryNewCases.index,y=CountryNewCases['Daily New Cases'], name='Daily New Cases', ), row=3,col=1) fig.update_layout(showlegend=False, title_text="Data for %s" % Country) return fig plotcountry('US')
plotcountry('Sweden')
plotcountry('United Kingdom')

Step 7 - Daily new cases

One metric important to track for the recovery IMO, is the number of new cases per day.

# A list with countries of interest countries = ['US','Canada', 'Spain', 'Italy', 'Korea, South', 'Sweden', 'Denmark', 'Chile', 'Iran', 'India', 'United Kingdom', 'Japan'] rowsTot = int(len(countries)/2) fig = make_subplots(rows=rowsTot, cols=2,shared_xaxes=True, subplot_titles=(countries)) fig.update_xaxes(title_text="Daily New Cases", row=rowsTot, col=1) fig.update_xaxes(title_text="Daily New Cases", row=rowsTot, col=2) # Where we start rowNum=1 colNum=1 for country in countries: CountryNewCases = CountryConsolidated.loc[country]['Daily New Cases'].reset_index().set_index('Date') fig.add_trace(go.Bar(x=CountryNewCases.index,y=CountryNewCases['Daily New Cases'], name='Daily New Cases', ), row=rowNum,col=colNum) # Col numbers flips between 1 and 2, Rows increase by one if colNum % 2 == 1: colNum += 1 else: colNum = 1 rowNum += 1 fig.update_layout(showlegend=False)