Project Proposal
This project aims to study the time series of the Earth's temperature in order to address the issue of global warming. The Earth's temperature depends on many physical variables, each of which is itself a function of time. In this project we use the ARIMA (autoregressive integrated moving average) method to study the behavior of the Earth's temperature.
Libraries
After discovering that it is hard to distinguish the temperature difference between the heat plots of two periods, we decided to plot the temperature difference directly.
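The difference itself can be computed before plotting. The following is only a minimal sketch, not the notebook's code: the record layout, the region names, the period boundaries, and all temperature values are made up for illustration.

```python
from statistics import mean

# Hypothetical records: (region, year, average temperature). Values are made up.
records = [
    ("A", 1900, 10.0), ("A", 1901, 10.2),
    ("A", 2000, 11.0), ("A", 2001, 11.4),
    ("B", 1900, 20.0), ("B", 1901, 19.8),
    ("B", 2000, 20.3), ("B", 2001, 20.5),
]

def period_mean(records, start, end):
    """Mean temperature per region over the years start..end (inclusive)."""
    by_region = {}
    for region, year, temp in records:
        if start <= year <= end:
            by_region.setdefault(region, []).append(temp)
    return {region: mean(temps) for region, temps in by_region.items()}

def period_difference(records, early, late):
    """Late-period mean minus early-period mean, for regions present in both."""
    early_mean = period_mean(records, *early)
    late_mean = period_mean(records, *late)
    return {r: late_mean[r] - early_mean[r] for r in early_mean if r in late_mean}

diff = period_difference(records, early=(1900, 1901), late=(2000, 2001))
```

Plotting `diff` on a single map shows the change directly, instead of asking the reader to compare two color scales by eye.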
The yearly land average temperature shows high volatility before 1950. Since about 1950 there is a clear increasing trend, which corresponds to the trend of carbon dioxide in the previous figure. This figure provides reasonable evidence of global warming since the industrial revolution.
This figure shows that, as measurement technology improved, the uncertainty of the land average temperature decreased markedly.
This figure shows the seasonal pattern of the global average temperature. The highest average temperature occurs in July and the lowest in January.
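A seasonal profile like this can be obtained by averaging all observations that fall in the same calendar month. The sketch below assumes a gap-free monthly series starting in January. The series is synthetic, so its numbers say nothing about the real data.

```python
import math
from statistics import mean

def monthly_climatology(values, first_month=1):
    """Average all observations that fall in the same calendar month.

    values: monthly observations in chronological order, without gaps.
    first_month: calendar month (1-12) of values[0].
    """
    buckets = {m: [] for m in range(1, 13)}
    for i, v in enumerate(values):
        month = (first_month - 1 + i) % 12 + 1
        buckets[month].append(v)
    return {m: mean(vs) for m, vs in buckets.items() if vs}

# Synthetic three-year series: an annual cycle plus a small upward drift.
series = [8.5 - 6.0 * math.cos(2 * math.pi * i / 12) + 0.01 * i for i in range(36)]

clim = monthly_climatology(series)
warmest = max(clim, key=clim.get)   # calendar month with the highest mean
coldest = min(clim, key=clim.get)   # calendar month with the lowest mean
```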
The trend and seasonal components produced by seasonal_decompose support our earlier conclusions: 1. the global average temperature has an increasing trend; 2. the land temperature has a seasonal pattern.
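To make the decomposition less of a black box, here is a hand-rolled sketch of a classical additive decomposition: a centered moving average gives the trend, and the average detrended value at each position in the cycle gives the seasonal component. It is an illustration under assumptions (even period, at least two full periods of data, no missing values), not the library's implementation, and the test series is synthetic.

```python
import math

def centered_moving_average(x, period=12):
    """Trend estimate via a centered moving average (for an even period).

    The two end points of each window get half weight so that the window
    is centered on an observation. Positions without a full window are None.
    """
    half = period // 2
    trend = [None] * len(x)
    for t in range(half, len(x) - half):
        inner = x[t - half + 1 : t + half]            # period - 1 full-weight points
        total = 0.5 * x[t - half] + sum(inner) + 0.5 * x[t + half]
        trend[t] = total / period
    return trend

def additive_decompose(x, period=12):
    """Split x into trend + seasonal + residual (classical additive method)."""
    trend = centered_moving_average(x, period)
    # Average the detrended values at each position within the period.
    sums, counts = [0.0] * period, [0] * period
    for t, (xt, tr) in enumerate(zip(x, trend)):
        if tr is not None:
            sums[t % period] += xt - tr
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    offset = sum(raw) / period                        # make the seasonal part average to zero
    seasonal = [raw[t % period] - offset for t in range(len(x))]
    resid = [None if tr is None else xt - tr - s
             for xt, tr, s in zip(x, trend, seasonal)]
    return trend, seasonal, resid

# Synthetic four-year monthly series: linear trend plus a 12-month cycle.
x = [8.0 + 0.02 * t + 5.0 * math.sin(2 * math.pi * t / 12) for t in range(48)]
trend, seasonal, resid = additive_decompose(x, period=12)
```

On this synthetic input the trend component recovers the straight line and the seasonal component recovers the sine wave, which is the same reading we apply to the real decomposition plots.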
Drop NA values before training the machine learning model.
Before modeling, we first check whether the series is stationary. A stationary series has a mean, variance, and autocovariance that are constant over time. From the seasonal plot, we can observe that the temperature is roughly the same in each period.
The ACF plot shows the correlation between the series and a lagged copy of itself. The first value (lag 0) is always 1, the next is 0.74, and so on. Because the correlation decays slowly, the series is not stationary. For a stationary series, the correlation should drop quickly to around 0.
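The contrast between slow and fast decay can be reproduced with a dependency-free sketch of the sample autocorrelation. The series below is a simulated random walk, a textbook non-stationary series, and has nothing to do with the temperature data. Taking the first difference is shown as one common way to remove the slow decay, and it is an assumption that the notebook did the same.

```python
import random

def acf(x, nlags):
    """Sample autocorrelation of x for lags 0..nlags."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(nlags + 1)]

def difference(x):
    """First difference: x[t] - x[t-1]."""
    return [b - a for a, b in zip(x, x[1:])]

random.seed(0)
# Simulated random walk: each value is the previous one plus Gaussian noise.
walk = [0.0]
for _ in range(499):
    walk.append(walk[-1] + random.gauss(0, 1))

acf_walk = acf(walk, 10)               # stays high for many lags
acf_diff = acf(difference(walk), 10)   # close to zero after lag 0
```

The value at lag 0 is 1 by construction, which is why only the later lags carry information about stationarity.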
Here the correlation decays to 0 quickly, so the series is stationary.
We can see that there is no trend here, so the series is stationary. The data is good enough to move forward to modeling.
Find the best parameters p, d, q of the ARIMA model
Print out every combination of the parameters. Each model is scored by its AIC (Akaike information criterion), and we want the model with the lowest AIC.
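The AIC rewards goodness of fit and penalizes the number of parameters: AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The sketch below only illustrates the selection rule. The candidate orders, log-likelihoods, and parameter counts are made-up numbers, not results from this notebook, and libraries differ in how they count k.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical candidates: order -> (maximized log-likelihood, parameter count).
# These values are invented to show the trade-off, not taken from a real fit.
candidates = {
    (1, 1, 1): (-900.0, 3),
    (2, 1, 2): (-880.0, 5),
    (3, 1, 3): (-879.0, 7),
}

scores = {order: aic(ll, k) for order, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)   # order with the lowest AIC
```

In this invented example the largest model fits slightly better than the middle one, but not by enough to pay for its two extra parameters, so the middle model wins.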
ARIMA(5,1,5)(0,0,0)[0] intercept : AIC=1760.698, Time=11.88 sec
We find that the best parameters are (p, d, q) = (5, 1, 5).
The seasonal order of the selected model is (0,0,0), so it has no seasonal component and we can use a plain ARIMA model.
Train the Model
Make predictions from start to end
We need to keep the dates since we want to plot the results later.
Plot Result
We plot the predictions and the actual data in the same graph and can see that they are close.
Calculate the Error
The error is small, so the model predicts well.
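The error reported later in this notebook is the mean squared error (MSE). A minimal sketch of the metric follows. The function names are our own and the sample values are arbitrary. Taking the square root gives the RMSE, which is in the same units as the temperature: an MSE of about 0.157 corresponds to an RMSE of roughly 0.40.

```python
import math

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, expressed in the units of the data."""
    return math.sqrt(mean_squared_error(actual, predicted))

# Arbitrary sample values, only to exercise the functions.
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
rmse = root_mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```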
Predict for future date
Now we want to use the trained model to predict the future.
We use this model to predict the next 40 years.
The model predicts well up to 2017. Beyond 2017 the forecast horizon is too long and the model no longer works well. In future work, we will try to find out how to forecast over long periods.
Based on our current prediction, the average temperature will increase from 9.560 to 9.574 in 2017.
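One plausible reason for the weak long-horizon behavior, offered as an assumption rather than a diagnosis of this particular model: autoregressive forecasts feed their own predictions back in, so the influence of the last observed values shrinks geometrically with the horizon and the forecast settles toward a fixed level (or a constant drift for a differenced model). The sketch uses a plain AR(1) process with made-up parameters to show the effect.

```python
def ar1_forecast(last_value, mean, phi, steps):
    """Iterate the AR(1) forecast x[t+1] = mean + phi * (x[t] - mean)."""
    forecasts = []
    current = last_value
    for _ in range(steps):
        current = mean + phi * (current - mean)
        forecasts.append(current)
    return forecasts

# Made-up parameters: start one unit above the long-run mean, with phi = 0.8.
path = ar1_forecast(last_value=1.0, mean=0.0, phi=0.8, steps=40)
```

After h steps the forecast is `mean + phi**h * (last_value - mean)`, so with `phi = 0.8` almost nothing of the starting deviation is left after 40 steps.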
To better compare prediction accuracy, we employed another time series model, Prophet.
We build the Prophet model first and let Prophet do the same job that ARIMA did before.
The MSE of Prophet is 0.1141353608465227
We used Prophet to predict the temperature from January 2013 to December 2015 (the same task as for ARIMA). Comparing the predicted data with the true data, Prophet achieves an MSE of 0.1141353608465227. For a data set of this size, it is reasonable to consider this a good result.
Compared with the ARIMA result (MSE of 0.15716738194270644), Prophet is slightly more accurate.
We plotted the time series of the global average temperature for both the actual data and the Prophet predictions.
From the first figure, it can be observed that Prophet captures the changing trend of the global temperature well. There is no obvious gap between the predicted and the true values.
In the second figure, we additionally forecast the global land average temperature in 2016. The seasonal pattern of the temperature in 2016 is also captured well.
In this project, we made heatmaps of the global land average temperature. From them, we found an obvious increase in global temperature since the industrial revolution. We also made the time series plot and the trend plot of the land average temperature to support this point. After data processing, we employed the ARIMA and Prophet models to predict the temperature from 2013 to 2015. Compared with the true values, both models achieved a good result, with an MSE below 0.2. Prophet is slightly more accurate, but both results are reasonable and acceptable. Additionally, we used the two models to forecast the temperature in 2016. The results are encouraging because both models capture the seasonal pattern in 2016 well.
Furthermore, we employed ARIMA to forecast the global average temperature until 2055. ARIMA captured a basic increasing trend until about 2016, but after that it can no longer capture the changing trend, which means our model may not be well suited to long-term forecasting. This is an unresolved problem that needs further research.
Also, for further research, combining different machine learning models may help us achieve better prediction accuracy.