Explaining the notebook
Importing the libraries
Reading the data
Data Preparation
Variable choice
Date parsing
Handling Missing data
Variable Continent
New cases and deaths and vaccinations variables
In the following code we assume that early null values of new cases are zero: either COVID-19 had not yet appeared in that country, or it was undocumented, but since this was the early stage of the pandemic the numbers were low and therefore negligible.
As for vaccinations, the vaccine did not appear until later, and each country received it at a different time, so any null value in the number of vaccinations is set directly to zero.
We also assumed, from observation, that the number of new cases and the number of new vaccinations generally differ little between two consecutive days, a difference that is negligible on a larger scale.
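As a rough illustration of the assumptions above, here is a minimal sketch on toy data. The column names (`location`, `date`, `new_cases`, `new_vaccinations`) and the values are assumptions, and carrying the previous day's value forward is only one possible reading of the "small difference between consecutive days" assumption, not necessarily what the notebook's code does.

```python
import numpy as np
import pandas as pd

# Toy data; column names and values are assumptions made for illustration.
df = pd.DataFrame({
    "location": ["A"] * 5 + ["B"] * 4,
    "date": pd.to_datetime(
        ["2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04", "2020-02-05",
         "2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04"]),
    "new_cases": [np.nan, np.nan, 3.0, np.nan, 5.0, np.nan, 1.0, 2.0, np.nan],
    "new_vaccinations": [np.nan, np.nan, np.nan, 10.0, np.nan,
                         np.nan, np.nan, np.nan, np.nan],
})
df = df.sort_values(["location", "date"]).reset_index(drop=True)

# Vaccinations: every null value is treated as zero.
df["new_vaccinations"] = df["new_vaccinations"].fillna(0)

# New cases: reuse the previous day's value within each country (assumed rule),
# then set the remaining leading nulls (early dates) to zero.
df["new_cases"] = df.groupby("location")["new_cases"].ffill().fillna(0)

print(df)
```

Grouping by country before the forward fill keeps a value from one country from leaking into the early dates of the next.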
Total cases and deaths variables
In the following code we replace any null value in total_cases with the sum of that day's new cases and the previous day's total cases.
However, since we don't have the number of individuals who recovered that day, this is only an approximation, though not a distant one.
As with new cases, null values at early dates are treated as zero.
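The rule above can be sketched as a small helper. The function name `fill_totals` and the sample values are hypothetical; this is an illustration under those assumptions rather than the notebook's actual code.

```python
import numpy as np
import pandas as pd

def fill_totals(total, new):
    """Replace nulls in a cumulative series with previous total + new count."""
    filled = []
    previous = 0.0  # before the first record the total is assumed to be zero
    for t, n in zip(total, new):
        if pd.isna(t):
            # A missing new count is treated as zero here as well.
            t = previous + (0.0 if pd.isna(n) else n)
        filled.append(t)
        previous = t
    return filled

# Hypothetical values for one country, already sorted by date.
new_cases = [0.0, 2.0, 3.0, 4.0, 1.0]
total_cases = [np.nan, 2.0, np.nan, np.nan, 10.0]

filled = fill_totals(total_cases, new_cases)
print(filled)
```

An explicit loop is used because each filled value depends on the previously filled one. In a full dataset the helper would be applied to each country separately.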
For new deaths, null values are left at zero until the country's total number of cases reaches a threshold, which we set at 400 based on an empirical approximation.
Once a country passes that limit, we use the same approach as for the new cases variable.
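A minimal sketch of this threshold rule for a single country follows. The column names and values are assumptions, and the forward fill used past the threshold is an assumed stand-in for "the same approach as the new cases variable".

```python
import numpy as np
import pandas as pd

THRESHOLD = 400  # total-cases level mentioned in the text

# Hypothetical data for one country, sorted by date.
df = pd.DataFrame({
    "total_cases": [10.0, 150.0, 390.0, 450.0, 600.0, 800.0],
    "new_deaths": [np.nan, np.nan, 1.0, np.nan, 4.0, np.nan],
})

# Below the threshold, a null count of new deaths is treated as zero.
early = df["total_cases"] < THRESHOLD
df.loc[early & df["new_deaths"].isna(), "new_deaths"] = 0.0

# Past the threshold, reuse the previous day's value (assumed fill rule).
df["new_deaths"] = df["new_deaths"].ffill()

print(df)
```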
For the total deaths and total vaccinations variables we use the same approach as for the total cases variable.
Reorganizing the Data
From here on, instead of working with daily data, we work with monthly data in order to speed up and simplify the process.
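One way to move from daily to monthly data is sketched below on toy values. The column names are assumptions, and so is the aggregation choice: daily counts are summed, while cumulative totals keep their last value of the month.

```python
import pandas as pd

# Toy daily data for one country; names and values are assumptions.
daily = pd.DataFrame({
    "location": ["A"] * 4,
    "date": pd.to_datetime(["2020-03-30", "2020-03-31", "2020-04-01", "2020-04-02"]),
    "new_cases": [1.0, 2.0, 3.0, 4.0],
    "total_cases": [1.0, 3.0, 6.0, 10.0],
})

# Label each row with its calendar month.
daily["month"] = daily["date"].dt.to_period("M")

monthly = (
    daily.sort_values("date")
    .groupby(["location", "month"])
    .agg(
        new_cases=("new_cases", "sum"),       # daily counts add up over the month
        total_cases=("total_cases", "last"),  # cumulative total at month end
    )
    .reset_index()
)

print(monthly)
```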
We can conclude from the previous list that the countries don't have the same number of dates; in other words, many countries are not documented enough.
These are the countries that are not documented enough
Only 75 countries are documented enough to use, so we take only these 75 countries into consideration.
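A selection of this kind could be sketched by counting the months available per country and keeping those above a cutoff. The cutoff `MIN_MONTHS`, the column names and the data below are hypothetical, chosen only to keep the example small.

```python
import pandas as pd

# Toy monthly data; country "C" has a single month of records.
monthly = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B", "C"],
    "month": ["2020-02", "2020-03", "2020-04",
              "2020-02", "2020-03", "2020-04", "2020-04"],
})

MIN_MONTHS = 3  # hypothetical cutoff for this toy data

# Number of distinct months recorded for each country.
months_per_country = monthly.groupby("location")["month"].nunique()

# Keep only the countries that reach the cutoff.
documented = months_per_country[months_per_country >= MIN_MONTHS].index
kept = monthly[monthly["location"].isin(documented)]

print(months_per_country)
print(sorted(documented))
```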
From the previous results we can see that the countries with 18 months have one extra month, 01-2020. We have two main choices: either drop that first month for those 32 countries, or add it to the other 43 countries. In the latter case, every variable is set to zero for that first month, because it was the very beginning of the pandemic: it had not spread far in countries other than Italy, China and the USA, so the level of these variables in other countries was low if not zero.
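The second option (adding the missing first month with zeros) could look like the sketch below. The column names and values are assumptions. Note that this reindexing fills any missing month with zero, so it is only appropriate when the first month is the only one missing.

```python
import pandas as pd

# Toy monthly data: country "B" has no record for 2020-01.
monthly = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B"],
    "month": ["2020-01", "2020-02", "2020-03", "2020-02", "2020-03"],
    "total_cases": [1.0, 5.0, 9.0, 2.0, 4.0],
})

# Every (country, month) pair that should exist after alignment.
full_index = pd.MultiIndex.from_product(
    [sorted(monthly["location"].unique()), sorted(monthly["month"].unique())],
    names=["location", "month"],
)

# Missing pairs are created and filled with zero.
aligned = (
    monthly.set_index(["location", "month"])
    .reindex(full_index, fill_value=0.0)
    .reset_index()
)

print(aligned)
```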
We now build a dataframe containing this new monthly data.
And now we will apply these same modifications to the continental data.
We combine all these modifications in a function.
That's it for the data cleaning part; we now move on to the next part, data analysis.
ignore this part: I can't believe it took 4 days to do this entire part
Data Cleaning For Continental Data
Here we simply apply the same changes that we made to our data to the continents and World data.
Data Analysis
For all Countries
Irregular countries
Based on the previous graphs, we notice that the countries with the most variation during this pandemic are the US, India, the UK, Mexico and Brazil.
Thus, from now on we work with only these 5 countries: the US, India, the UK, Mexico and Brazil.
Here, irregular refers to the countries with the most variation, whose scale of variation differs from that of other countries.
In general, in all 5 of these countries the number of total cases keeps increasing every month, at different rates but at an alarming pace.
However, starting from the month of May those numbers started to decrease in all these countries, most likely due to the start of the vaccination process there.
In this graph, we compare the number of total cases per population, which reflects the real effect and damage that each country is suffering due to the pandemic.
Thus the countries suffering the most are the US first, then Brazil and the UK, which rank second and third interchangeably, followed by Mexico and India.
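For reference, the per-population measure used in this comparison can be sketched as a simple ratio. The `population` column, the placeholder countries and all values below are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Invented values: Y has more cases in absolute terms, X has more per inhabitant.
df = pd.DataFrame({
    "location": ["X", "Y"],
    "total_cases": [500.0, 2000.0],
    "population": [10_000, 1_000_000],
})

# Share of the population that has been recorded as a case.
df["total_cases_per_population"] = df["total_cases"] / df["population"]

# Rank countries from most to least affected on this measure.
ranking = (
    df.sort_values("total_cases_per_population", ascending=False)["location"].tolist()
)

print(df)
print(ranking)
```

The toy values show why the ranking by total cases and the ranking by cases per population can differ.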
We can notice in the previous graph that the number of total deaths follows almost the same pattern, although the numbers are not as high as those of total cases.
In the previous plot it is very obvious that COVID-19 had a huge effect on these countries, given the great increase in the number of deaths per cases since February 2020 but
In these countries, vaccination started around December, and the number of vaccinations is increasing at an almost exponential rate.
However, all of these countries can be described as irregular because their variation rates are much greater than those of almost all other countries, so they are not an accurate representation of the world's countries.
That's why we are now going to work with countries that are more representative of the rest of the world.
Regular Countries
For the regular countries we work with 5 countries: Morocco, Austria, Egypt, Ireland and Belgium.
In this graph we can notice that the number of total cases keeps increasing, but at 3 different paces.
For Belgium, the number of total cases increases relatively fast: not as fast as in the irregular countries, but still faster than in the other regular countries.
For Morocco and Austria, the rate of increase is moderate, so the spread is not very fast in these countries.
For Egypt and Ireland, we notice a slower increase, so the number of total cases in these countries is relatively low.
Here, the rapidly increasing total number of cases in Belgium translates into a similarly large effect on that country. Ireland, however, despite lower numbers and a slower increase, suffered much more than expected, as did Austria, which ranks second in terms of effect, followed by Morocco.
As expected, Egypt did not suffer as much because of its generally low number of total cases.
Total_cases_per_population vs Total_cases
The number of total cases is a more accurate measure to judge the evolution as well as the effect of covid in each country.
It seems that in most countries COVID-19 was at its peak between October 2020 and January 2021. (1)
In the previous chart, we can confirm that, of these countries, Belgium is the most affected by the pandemic. Oddly enough, it is followed by Egypt, which is supposedly the least affected of all these countries, then Morocco and Austria interchangeably, and finally Ireland, which is also odd. So what is the reason for this?
Let's take a look at the next graph: Total deaths per cases
In this graph Belgium is clearly first among these countries, then come Egypt and Ireland, followed by Morocco and Austria.
We can note from this graph that these countries, like most countries, were severely affected by the pandemic in its early stages but then managed to get things under control.
Now back to our previous question, which we answer in the next segment.
Total Cases Per Population VS Total Deaths Per Cases
These two variables both give us the effect of the pandemic on each country, but how do they differ, and how can they explain the previous oddities?
The Total Cases Per Population gives us the overall effect of COVID-19 on the country.
However, the Total Deaths Per Cases variable, when combined with Total Cases Per Population, gives us an indication of the level of healthcare in that country. For example, when the Total Cases Per Population is low but the Total Deaths Per Cases is high, we can conclude that the country's healthcare level is low, and the reverse is also true.
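To make the contrast concrete, here is a small sketch that computes both ratios side by side. The two placeholder countries and every value are invented for illustration, and the column names are assumptions.

```python
import pandas as pd

# Invented values: P has few cases but many deaths per case; Q is the reverse.
df = pd.DataFrame({
    "location": ["P", "Q"],
    "population": [1_000_000, 1_000_000],
    "total_cases": [1_000.0, 50_000.0],
    "total_deaths": [100.0, 500.0],
})

# Overall spread of the pandemic in the country.
df["total_cases_per_population"] = df["total_cases"] / df["population"]

# Share of recorded cases that ended in death.
df["total_deaths_per_cases"] = df["total_deaths"] / df["total_cases"]

print(df[["location", "total_cases_per_population", "total_deaths_per_cases"]])
```

In this toy case, P corresponds to the low-cases, high-deaths-per-cases situation described above, while Q shows the reverse.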
Conclusion
Thus, a good general estimate of the effect of COVID-19 is the Total Cases Per Population combined with the Healthcare Level, where the Healthcare Level is itself inferred from the combination of the Total Cases Per Population and Total Deaths Per Cases variables.
As we can see, vaccination in the regular countries started later than in the irregular countries, and the number of total vaccinations is also much lower.
Continental and World data
This chart represents the evolution of the number of total cases of COVID-19 around the globe and in each continent, and it agrees with the results of the previous cases.
Analysis Conclusion
Thus we can say that the continental data follow almost the same pattern that we found in the previous data.