Pattern Recognition Mini Project 1
Part 1: Data Analysis
- Delete bad values related to
Of what data type are the features?
Are there fields that are empty or 'NaN'?
Line 1520 has no values, so we delete it.
Unique values in each column (feature)
Some values have problems:
- Intermediate values in
- Weird value for
- Intermediate values in
- 1970 value in
How often is each app running?
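One way to answer this in pandas is sketched below. It assumes that each app column holds the fraction of a time slot during which the app was running (0 to 1), so the mean of a column is the share of time that app runs. The helper app_running_share and the columns app_a and app_b are hypothetical.

```python
import pandas as pd


def app_running_share(df, app_cols):
    """Share of time each app is running, most-used first.

    Assumption: every app column holds the fraction of a time slot during
    which the app ran (0 = not running, 1 = the whole slot), so the column
    mean is the fraction of total time the app was running.
    """
    return df[app_cols].mean().sort_values(ascending=False)


# Hypothetical toy data: two apps observed over four time slots.
toy = pd.DataFrame({"app_a": [1.0, 1.0, 0.5, 0.0], "app_b": [0.0, 0.0, 0.0, 0.2]})
share = app_running_share(toy, ["app_a", "app_b"])
print(share)
```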
Cleaning some columns
We can drop apps that run less than 1% of the time, because they are either part of the OS or give no information about battery usage.
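The 1% cut-off can be expressed as a column filter. This is a sketch under the same assumption as before (app columns hold per-slot running fractions in [0, 1]); drop_rare_apps and the toy column names are hypothetical.

```python
import pandas as pd


def drop_rare_apps(df, app_cols, threshold=0.01):
    """Drop app columns whose mean running share is below `threshold`.

    Assumption: app columns hold per-slot running fractions in [0, 1];
    threshold=0.01 corresponds to the 1% cut-off.
    Returns the cleaned frame and the list of dropped column names.
    """
    share = df[app_cols].mean()
    rare = share[share < threshold].index.tolist()
    return df.drop(columns=rare), rare


# Hypothetical toy data: os_service runs only 0.5% of the time.
toy = pd.DataFrame({
    "app_a": [1.0, 0.5, 0.0, 1.0],
    "os_service": [0.0, 0.0, 0.0, 0.02],
})
cleaned, dropped = drop_rare_apps(toy, ["app_a", "os_service"])
print(dropped)  # ['os_service']
```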
How often is the battery change positive / negative?
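A small sketch for counting the sign of the battery change, assuming the per-slot change in battery level is available as a numeric Series (the name change_sign_counts and the toy values are hypothetical). Missing values fall into none of the three buckets.

```python
import pandas as pd


def change_sign_counts(change):
    """Count slots with a negative, zero, or positive battery change.

    NaN values fail every comparison, so they are ignored.
    Returns absolute counts and their shares of the counted slots.
    """
    counts = pd.Series({
        "negative": int((change < 0).sum()),
        "zero": int((change == 0).sum()),
        "positive": int((change > 0).sum()),
    })
    return counts, counts / counts.sum()


# Hypothetical per-slot changes in battery level.
change = pd.Series([-2.0, -1.0, 0.0, 3.0, -0.5])
counts, shares = change_sign_counts(change)
print(counts)
```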
Total Application Usage in Each Time Slot
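A possible way to compute this, assuming each row is one time slot, app columns hold running fractions, and a hypothetical slot column identifies the slot of the day. Row totals give the usage within each observed slot; grouping then sums those totals per slot of the day.

```python
import pandas as pd


def total_usage_per_slot(df, app_cols, slot_col):
    """Total app running time (in slot fractions) for each slot of the day.

    Each row's app columns are summed first; rows sharing the same
    (hypothetical) slot identifier are then summed together.
    """
    row_total = df[app_cols].sum(axis=1)
    return row_total.groupby(df[slot_col]).sum()


toy = pd.DataFrame({
    "slot": [0, 0, 1, 1],          # hypothetical slot-of-day index
    "app_a": [1.0, 0.5, 0.0, 1.0],
    "app_b": [0.5, 0.0, 0.0, 0.0],
})
usage = total_usage_per_slot(toy, ["app_a", "app_b"], "slot")
print(usage)
```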
Predictions on Battery Status
battery_status = 0: There is no battery status or battery_plugged information for that time slot.
battery_status = 1: Never observed in the data. Maybe the battery is dead?
battery_status = 2: The battery is charging.
battery_status = 3: The battery is discharging.
battery_status = 4: The battery is full?
battery_status = 5: The battery is NOT charging. (bad USB cable? battery is already full?)
2 < battery_status < 3: The user is using the phone while charging? Or the user charged the phone for 15 minutes and then used it for 15 minutes.
3 < battery_status < 4: The user is using the phone while charging?
4 < battery_status < 5: Charging completed early in the time slot, so the phone is not charging during the remaining minutes of the same slot?
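The interpretations above can be encoded as a lookup, treating a non-integer value as a slot that mixes the two neighbouring states. This is a sketch that assumes fractional values come from averaging the status within a slot; STATUS_LABELS and describe_status are hypothetical names, and the labels simply restate our assumptions.

```python
import math

import pandas as pd

# Labels restating the assumed meaning of each integer battery_status.
STATUS_LABELS = {
    0: "no information",
    1: "never observed (dead battery?)",
    2: "charging",
    3: "discharging",
    4: "full",
    5: "not charging",
}


def describe_status(value):
    """Map a (possibly averaged) battery_status value to a label."""
    if float(value).is_integer():
        return STATUS_LABELS.get(int(value), "invalid")
    # Assumption: a fractional value mixes the two neighbouring states.
    low, high = math.floor(value), math.ceil(value)
    return f"mixed: {STATUS_LABELS.get(low, '?')} -> {STATUS_LABELS.get(high, '?')}"


statuses = pd.Series([3, 2.5, 4.2, 0])
labels = statuses.map(describe_status).tolist()
print(labels)
# ['discharging', 'mixed: charging -> discharging', 'mixed: full -> not charging', 'no information']
```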
Analyze The Time
The year value 1970 at line 2407 is clearly an outlier (most likely an unset clock defaulting to the Unix epoch), so we drop it.
Working with time spread across 3 different columns makes time calculations difficult, so we create a single time column for this task.
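A sketch of both steps, assuming the three time fields are a date string plus hour and minute columns; the column names date, hour and minute and the helper add_time_column are hypothetical. Instead of dropping one row by index, it drops every row whose combined timestamp falls in 1970.

```python
import pandas as pd


def add_time_column(df, date_col="date", hour_col="hour", minute_col="minute"):
    """Combine three (hypothetical) time fields into one datetime column `time`.

    Rows whose year is 1970 are treated as unset timestamps and dropped.
    The result is sorted chronologically.
    """
    time = (
        pd.to_datetime(df[date_col])
        + pd.to_timedelta(df[hour_col], unit="h")
        + pd.to_timedelta(df[minute_col], unit="m")
    )
    out = df.assign(time=time)
    out = out[out["time"].dt.year != 1970]
    return out.sort_values("time")


toy = pd.DataFrame({
    "date": ["2017-03-01", "1970-01-01", "2017-03-01"],
    "hour": [9, 0, 8],
    "minute": [30, 0, 0],
})
result = add_time_column(toy)
print(result["time"].tolist())
```

Keeping the original three columns in the result makes it easy to check the combined timestamps against them before dropping anything.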
Analyze The Battery
Look at the histogram of the average change in battery level in both cases (battery plugged in and not plugged in). Under normal conditions, an increase in battery level while the battery is not plugged in should be an error. But if battery_status = 0, then (according to our assumptions) we may simply not know whether the battery is plugged in or not.
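The rule above can be written as a boolean mask. This sketch assumes a hypothetical battery_change column with the per-slot change in battery level, and that battery_plugged is 0/1 or boolean; suspicious_increase is a hypothetical helper.

```python
import pandas as pd


def suspicious_increase(df, change_col="battery_change"):
    """Flag rows where the battery level rose although the phone was unplugged.

    Rows with battery_status == 0 are not flagged: under our assumptions the
    plugged state is unknown there, so a rise is not necessarily an error.
    """
    rose = df[change_col] > 0
    unplugged = ~df["battery_plugged"].astype(bool)
    status_known = df["battery_status"] != 0
    return rose & unplugged & status_known


toy = pd.DataFrame({
    "battery_change": [2.0, 3.0, -1.0, 1.0],
    "battery_plugged": [0, 1, 0, 0],
    "battery_status": [3, 2, 3, 0],
})
flags = suspicious_increase(toy)
print(toy[flags])
```

For the two histograms themselves, pandas' DataFrame.hist accepts column and by arguments, so the change column can be plotted split by battery_plugged.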
On the other hand, the battery_plugged feature introduces biased noise into battery_level, because it is difficult to measure an app's battery usage while the phone is charging.
Our model should predict battery usage between two time points. Predicting charging times would confuse the user, since users can deviate from their habits (e.g. going to a musical, a concert or a party after work). From this viewpoint, telling the user that the phone's battery will run out at 01:00, based on the assumption that they will charge the phone before leaving work, would be problematic.
According to our assumptions, battery_status values 2, 4 and 5 are charging-related states of the battery. In addition, if the battery is plugged in, the phone should be charging, so feeding these rows to our model would make it predict that some apps' behaviour does not consume battery.
Lastly, for the charging case: if the battery level goes up, the battery is actually charging, so we should clean those rows too.
After these drops, the battery_plugged columns no longer include significant information, so we can drop them.
The rows with a faulty battery status, or in which battery_plugged is true, are still useful for predicting the user profile. So we save them in another dataframe and use it to train our Markov chain model.
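The cleaning steps of this section can be combined into one split: rows kept for battery-usage modelling, and the remaining rows kept for the user-profile model. This is a sketch, not the notebook's actual code. The battery_change column, the helper split_usage_and_profile, and the choice to treat fractional battery_status values as faulty are all assumptions.

```python
import pandas as pd

# Charging-related battery_status values, per the assumptions above.
CHARGING_STATES = [2, 4, 5]


def split_usage_and_profile(df, change_col="battery_change"):
    """Split rows into a discharge-only frame and a user-profile frame.

    A row goes to the profile frame if the battery is plugged in, its status
    is charging-related, its status is fractional (assumed faulty/mixed),
    or the battery level went up. battery_plugged is dropped from the usage
    frame because it carries no information there after the split.
    """
    plugged = df["battery_plugged"].astype(bool)
    charging_state = df["battery_status"].isin(CHARGING_STATES)
    fractional_status = df["battery_status"] % 1 != 0  # assumption
    level_rose = df[change_col] > 0
    to_profile = plugged | charging_state | fractional_status | level_rose
    usage = df[~to_profile].drop(columns=["battery_plugged"])
    profile = df[to_profile].copy()
    return usage, profile


toy = pd.DataFrame({
    "battery_status": [3, 2, 3, 3.5, 0],
    "battery_plugged": [0, 1, 0, 0, 0],
    "battery_change": [-2.0, 4.0, 1.0, -1.0, -1.0],
})
usage, profile = split_usage_and_profile(toy)
print(len(usage), len(profile))
```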