HW: How to Assess Models
In this homework, we'll be looking at a dataset of the top 500 movies by production budget, i.e. the 500 most expensive films ever made, as listed on the film-data website The Numbers. The original Kaggle dataset can be found here.
Set-up
Below, we read the dataset into Pandas, then normalize only the numerical columns. Check here for the documentation for sklearn.preprocessing's normalize function.
0    1    2019-04-23
1    2    2011-05-20
2    3    2015-04-22
3    4    2015-12-16
4    5    2018-04-25
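The code cells themselves didn't survive extraction, so here's a hedged sketch of the normalization step. A tiny toy frame stands in for the real dataset (the column names are assumptions), and whether the notebook normalized rows or columns is also an assumption; `axis=0` scales each column.

```python
import pandas as pd
from sklearn.preprocessing import normalize

# Toy stand-in for the movies dataset; real column names may differ.
df = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C"],
    "ProductionBudget": [200_000_000, 150_000_000, 300_000_000],
    "WorldwideGross": [700_000_000, 400_000_000, 1_200_000_000],
})

# Select only the numerical columns, then scale them.
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = normalize(df[num_cols], axis=0)  # axis=0: each column gets unit L2 norm
print(df)
```

Note that `normalize` defaults to `axis=1` (scaling each row), which is rarely what you want for a feature table.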
Normalization + Splitting into train & test datasets
Split the dataframe df into training and test sets using train_test_split. If you forgot how, check out the documentation! Fill in the blank below.
414  415  2002-11-15
116  117  2017-07-20
470  471  2019-11-14
263  264  2005-07-29
146  147  2014-07-09
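The split step can be sketched like this, with a toy frame in place of `df`; the `test_size` and `random_state` values are assumptions, not necessarily what the homework used.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame standing in for df (real columns may differ).
df = pd.DataFrame({"budget": range(10), "gross": range(10, 20)})

# 80/20 split; random_state fixed only so the split is reproducible.
train, test = train_test_split(df, test_size=0.2, random_state=42)
print(len(train), len(test))  # 8 2
```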
Evaluation of a Regression Model
Here, we're going to train a regression model on the numerical columns of this dataset, to try to predict the worldwide gross earnings of each movie. From there, we'll apply the evaluation methods for regression models that we learned in lecture!
Below, we define the predictor and prediction columns in both the train and test datasets. X refers to the predictor dataset, and Y refers to the column we're trying to predict.
414  415  0.3641550787134771
116  117  0.14228890860322524
470  471  0.13272159458211324
263  264  0.5943802139205936
146  147  0.5957906382445246
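Splitting a frame into predictors (X) and target (Y) can be sketched as follows; the column names here are placeholders, since the real dataset uses its own labels.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed column names; substitute the dataset's real ones.
df = pd.DataFrame({
    "production_budget": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "domestic_gross":    [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
    "worldwide_gross":   [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
})
train, test = train_test_split(df, test_size=0.25, random_state=0)

# X = predictor columns, Y = the column we're trying to predict.
X_train = train.drop(columns="worldwide_gross")
Y_train = train["worldwide_gross"]
X_test = test.drop(columns="worldwide_gross")
Y_test = test["worldwide_gross"]
```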
Training a linear model
So we trained a model -- how can we visualize its performance on the test set?
Predict the Y values based on the train and test predictor sets (called X). Fill in the blanks below.
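A hedged sketch of that fit-and-predict step, using synthetic noise-free data in place of the movie features (so the fit is exact here, unlike on real data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data standing in for the movie features (shapes and names are assumptions).
rng = np.random.default_rng(0)
coef = np.array([2.0, 3.0])
X_train = rng.random((40, 2))
Y_train = X_train @ coef + 1.0
X_test = rng.random((10, 2))
Y_test = X_test @ coef + 1.0

# Fit the linear model on the training data.
model = LinearRegression()
model.fit(X_train, Y_train)

# The fill-in-the-blank step: predict Y for both predictor sets.
Y_train_pred = model.predict(X_train)
Y_test_pred = model.predict(X_test)
```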
Evaluation Metrics
Other than visualizing the performance on the test set, we can quantify it. As we explained in class, there are several metrics we could be looking at: Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, and R-squared. We'll focus on RMSE here.
Here's a function that calculates the RMSE for you:
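The function cell itself didn't survive extraction, but a standard RMSE helper of the kind described would look like this sketch:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```

For example, `rmse([0, 0], [3, 4])` is sqrt((9 + 16) / 2), roughly 3.54.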
Now use the function above to calculate the RMSE for the train and test sets. Fill in the blanks below. Use:
Y_train and Y_train_pred
Y_test and Y_test_pred
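As a self-contained sketch of that fill-in step (the helper is repeated here and the arrays are toy values, so the printed numbers won't match the real outputs below):

```python
import numpy as np

# Helper repeated so the snippet runs standalone.
def rmse(y_true, y_pred):
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Toy arrays in place of the real predictions.
Y_train, Y_train_pred = np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.9, 3.2])
Y_test, Y_test_pred = np.array([4.0, 5.0]), np.array([3.8, 5.1])

print("Training RMSE:", rmse(Y_train, Y_train_pred))
print("Test RMSE:", rmse(Y_test, Y_test_pred))
```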
Training RMSE: 0.8219626331593956
Test RMSE: 0.3947338347287329
Looks like our model achieved a lower RMSE on the test set than on the training set! That's great.
Evaluation of a Classification Model
Moving onto the application of error evaluation to a classification model. Here, we're going to train a classification model on this dataset, to try and predict the genre of each movie.
It looks like 42% of the movies in this dataset are action movies. Maybe you just watched Top Gun: Maverick and you're looking for another action movie. Let's see if we can predict whether a movie is in the action genre using this dataset.
We'll use logistic regression, a statistical model of the probability that an event takes place. Here, the event is the movie in question being in the action genre.
Training a logistic regression model
What genre are we predicting? Fill in the blanks below.
414  415  0.8863119988913408
116  117  0.9185581546153849
470  471  0.9109333545099008
263  264  0.7604144454905883
146  147  0.7241572989888352
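A hedged sketch of training such a classifier, with synthetic features and an assumed binary "is this an action movie?" label in place of the real genre column:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic features; the label is a stand-in for df["Genre"] == "Action".
rng = np.random.default_rng(1)
X = rng.random((100, 2))
y = (X[:, 0] + X[:, 1] > 1.0)

clf = LogisticRegression()
clf.fit(X, y)

pred = clf.predict(X)         # True for predicted action movies, False otherwise
proba = clf.predict_proba(X)  # per-class probabilities, like the values above
```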
We've gotten an array of predictions: True for action movies; False for non-action movies.
Evaluation Metrics
Accuracy is defined as the number of correct predictions divided by the total number of predictions.
Check whether the predictions from the X train/test sets match the original Y values.
Train accuracy: 0.5697
Test accuracy: 0.5929
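The accuracy computation above can be sketched with toy labels (so the value here won't match the real train/test accuracies):

```python
import numpy as np

# Toy labels and predictions; the real ones come from the fitted classifier.
Y_test = np.array([True, False, True, True, False])
Y_test_pred = np.array([True, False, False, True, True])

# Accuracy = correct predictions / total predictions.
test_accuracy = np.mean(Y_test == Y_test_pred)
print(test_accuracy)  # 0.6
```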
But accuracy isn't everything. Let's look at a confusion matrix instead. Here's the documentation for sklearn.metrics' confusion_matrix function.
Confusion matrix, without normalization
[[54 21]
[25 13]]
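A minimal sketch of producing such a matrix with sklearn's `confusion_matrix`, on toy labels rather than the real predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels; with boolean classes, False is row/column 0 and True is row/column 1.
y_true = np.array([False, False, True, True, True])
y_pred = np.array([False, True, True, False, True])

cm = confusion_matrix(y_true, y_pred)
print(cm)
# Row 0 (actual False): [TN, FP]; Row 1 (actual True): [FN, TP]
```

Rows are true classes and columns are predicted classes, which is how to read off false negatives below.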
From the confusion matrix above, what is the number of false negatives?
Your answer: 25 (row 1, column 0: movies that are actually action but predicted as not).
Both our accuracy and the confusion matrix indicate that our model is pretty bad at predicting whether a movie is in the action genre. The confusion matrix further shows that most of the errors are labels wrongly predicted as 'False' when they are actually 'True': movies that are actually action movies but are not predicted as such.