Background
- Stewie, a filthy rich casino mogul, hires us to analyze his step count and predict his final number of steps for the year 2020.
- If we can accurately predict his final step count, we will be paid $100,000.
- Stewie is a health nut and wears both a Garmin watch and Fitbit to count his daily steps.
Plan of Attack
- Define what we need.
- Access the data.
- Clean the data.
- Analyze and Visualize the data.
- Make future predictions using algorithmic models (Data Science).
Access and Clean Data
Import Modules.
Import each JSON file and combine everything into one DataFrame.
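Here is a minimal sketch of that loading step. The file names, the temporary folder, and the `restingHeartRate` field are made up so the example runs on its own, and it assumes each JSON file holds a list of per-day records. The real export may be laid out differently.

```python
import glob
import json
import os
import tempfile

import pandas as pd


def load_step_files(pattern):
    """Read every JSON file matching `pattern` and stack the records into one DataFrame."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            records = json.load(fh)  # assumption: a list of per-day dicts
        frames.append(pd.DataFrame(records))
    return pd.concat(frames, ignore_index=True)


# Hypothetical sample files, written to a temp folder purely for illustration.
tmp_dir = tempfile.mkdtemp()
sample_files = {
    "part_1.json": [
        {"calendarDate": "2020-01-01", "totalSteps": 8000, "restingHeartRate": 60},
        {"calendarDate": "2020-01-02", "totalSteps": 11000, "restingHeartRate": 59},
    ],
    "part_2.json": [
        {"calendarDate": "2020-01-03", "totalSteps": 12500, "restingHeartRate": 58},
        {"calendarDate": "2020-01-04", "totalSteps": 9500, "restingHeartRate": 61},
    ],
}
for name, records in sample_files.items():
    with open(os.path.join(tmp_dir, name), "w") as fh:
        json.dump(records, fh)

raw = load_step_files(os.path.join(tmp_dir, "*.json"))
print(raw.shape)
```

Passing `ignore_index=True` to `pd.concat` gives the combined frame a fresh 0..n-1 index instead of repeating each file's own row numbers.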
What does the data look like?
There are a lot of features, but we only care about totalSteps and calendarDate. Let's pull out those two fields, convert calendarDate to a proper date format, and set it as the index to allow time series indexing operations.
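One way this could look in pandas is sketched below. The rows are made up, and `restingHeartRate` stands in for the many other features we don't need.

```python
import pandas as pd

# Made-up rows, deliberately out of order, with one extra feature.
raw = pd.DataFrame({
    "calendarDate": ["2020-01-03", "2020-01-01", "2020-01-02"],
    "totalSteps": [12000, 8000, 10500],
    "restingHeartRate": [58, 60, 59],  # hypothetical extra column
})

# Keep only the two fields we care about.
steps = raw[["calendarDate", "totalSteps"]].copy()

# Convert the date strings to real timestamps and use them as a sorted index.
steps["calendarDate"] = pd.to_datetime(steps["calendarDate"])
steps = steps.set_index("calendarDate").sort_index()

# Date-based selection now works, e.g. all rows in January 2020.
january = steps.loc["2020-01"]
print(january)
```

Sorting the index after `set_index` keeps date-range slicing predictable.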
Data Exploration and Visualization
Let's explore the data visually.
Let's further smooth the data by aggregating it by week and by month.
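A possible way to do the weekly and monthly smoothing is `resample`, sketched here on randomly generated step counts. Taking the mean per period is an assumption; a sum or median would work the same way.

```python
import numpy as np
import pandas as pd

# Randomly generated daily steps for Jan-Mar 2020 (illustrative data only).
dates = pd.date_range("2020-01-01", "2020-03-31", freq="D")
rng = np.random.default_rng(0)
steps = pd.DataFrame(
    {"totalSteps": rng.integers(4000, 16000, size=len(dates))},
    index=dates,
)

# Average steps per calendar week (weeks end on Sunday by default).
weekly_mean = steps["totalSteps"].resample("W").mean()

# Average steps per calendar month, labelled by the first day of the month.
monthly_mean = steps["totalSteps"].resample("MS").mean()

print(weekly_mean.head())
print(monthly_mean)
```

The more days each bucket covers, the less the day-to-day noise shows, which is why the monthly line looks much smoother than the daily one.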
Variation of Data: Histograms and Box/Whisker Plots
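Before plotting, it can help to look at the numbers a histogram and a box/whisker plot are built from. This sketch uses randomly generated step counts, and the 1.5 x IQR whisker rule is the common convention, assumed here.

```python
import numpy as np
import pandas as pd

# Randomly generated daily step counts (illustrative data only).
rng = np.random.default_rng(1)
values = rng.normal(10000, 2500, size=200).round().clip(min=0)
daily = pd.Series(values, name="totalSteps")

# Histogram: how many days fall into each of 10 equal-width step bins.
counts, edges = np.histogram(daily, bins=10)

# Box plot: quartiles, interquartile range, and whisker limits.
q1, median, q3 = daily.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Days outside the whisker limits would be drawn as individual points.
outliers = daily[(daily < lower_limit) | (daily > upper_limit)]

print(counts)
print(q1, median, q3, len(outliers))
```

With matplotlib installed, `daily.plot.hist(bins=10)` and `daily.plot.box()` draw the corresponding charts.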
Are there differences between days of the week?
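One simple way to check is to group the daily totals by weekday name and compare the averages. The data below is made up so that weekends are clearly busier than weekdays.

```python
import pandas as pd

# Four made-up weeks starting on a Monday: 9,000 steps on weekdays, 13,000 on weekends.
dates = pd.date_range("2020-01-06", periods=28, freq="D")
values = [13000 if d.dayofweek >= 5 else 9000 for d in dates]
steps = pd.Series(values, index=dates, name="totalSteps")

# Average steps per weekday, in calendar order rather than alphabetical order.
order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
by_day = steps.groupby(steps.index.day_name()).mean().reindex(order)

print(by_day)
```

The `reindex(order)` call is only there for readability; without it, pandas sorts the weekday names alphabetically.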
Predicting the Future
How many Steps by the end of 2020?
Transform data to show a running sum by day, for the year 2020. Use this cumulative sum column to make a prediction using a simple linear regression. The image below shows the general concept - create a 'best fit' line based on data points in order to predict future values.
Let's manipulate the data to get a clean cumulative step total for the year 2020.
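The cumulative total and the straight-line prediction could be sketched as follows. The daily totals are a made-up constant 10,000 steps so the result is easy to check by hand. Using `np.polyfit` for the simple linear regression is an assumption; any least-squares line fit would do.

```python
import numpy as np
import pandas as pd

# Made-up daily totals covering late 2019 and the first quarter of 2020.
dates = pd.date_range("2019-12-25", "2020-03-31", freq="D")
steps = pd.DataFrame({"totalSteps": 10000}, index=dates)

# Keep only 2020 and build the running sum by day.
year_2020 = steps.loc["2020"].copy()
year_2020["cumulativeSteps"] = year_2020["totalSteps"].cumsum()
year_2020["dayOfYear"] = year_2020.index.dayofyear

# Fit a best-fit line: cumulativeSteps ~ slope * dayOfYear + intercept.
x = year_2020["dayOfYear"].to_numpy()
y = year_2020["cumulativeSteps"].to_numpy()
slope, intercept = np.polyfit(x, y, deg=1)

# Extrapolate to the last day of the year (2020 is a leap year, so day 366).
days_in_2020 = 366
predicted_total = slope * days_in_2020 + intercept

print(round(predicted_total))
```

Filtering to 2020 before calling `cumsum` matters: otherwise the December 2019 steps would leak into the running total and shift the whole line upward.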