HW: How to Assess Models
In this homework, we'll be looking at a dataset of the top 500 movies by production budget -- i.e., the 500 most expensive films ever made, as listed on the film data website The Numbers. The original Kaggle dataset can be found here.
Set-up
Below, we read the dataset into pandas, then normalize only the numerical columns. See the documentation for the normalize function in sklearn.preprocessing.
0    1    2019-04-23
1    2    2011-05-20
2    3    2015-04-22
3    4    2015-12-16
4    5    2018-04-25
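The normalization step described in the set-up can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual cell: the dataframe, its column names, and its values are made up, and the choice of axis=0 is an assumption about what "normalize the numerical columns" means here.

```python
import pandas as pd
from sklearn.preprocessing import normalize

# Hypothetical stand-in for the movie dataframe (names and values are made up).
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "production_budget": [3.0, 0.0, 4.0],
    "worldwide_gross": [6.0, 8.0, 0.0],
})

# Select only the numerical columns; the text column is left untouched.
num_cols = df.select_dtypes(include="number").columns

# normalize() rescales each ROW by default (axis=1). Passing axis=0 rescales
# each COLUMN to unit L2 norm instead (assumed to be the intent here).
df[num_cols] = normalize(df[num_cols], axis=0)

print(df)
```

With axis=0, each numerical column is divided by its own Euclidean norm, so columns measured on very different scales end up comparable.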
Normalization + Splitting into train & test datasets
Split the dataframe df into training and test sets using train_test_split. If you forgot how, check out the documentation! Fill in the blank below.
323    324    2012-08-03
454    455    2015-12-02
304    305    2014-11-19
158    159    2016-10-24
337    338    2013-08-09
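For reference, a split with train_test_split might look like the sketch below. The toy dataframe, the 25% test fraction, and the random_state value are all assumptions chosen for illustration; the homework may use different settings.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy dataframe standing in for df (hypothetical columns and values).
df = pd.DataFrame({"budget": range(20), "gross": range(20, 40)})

# Hold out 25% of the rows for testing. random_state makes the shuffle
# reproducible; both values are illustrative assumptions.
train, test = train_test_split(df, test_size=0.25, random_state=42)

print(len(train), len(test))
print(train.head())
```

Because the rows are shuffled before splitting, the index of the training set is no longer in order, which matches the shuffled row labels in the output above.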
Evaluation of a Regression Model
Here, we're going to train a regression model on the numerical columns of this dataset, to try and predict the Worldwide Gross Earnings of each movie. From there, we'll use evaluation methods for regression models that we learnt in lecture!
Below, we define the predictor and target columns in both the train and test datasets. X refers to the predictor columns, and Y refers to the column we're trying to predict.
323    324    0.3688333976257265
454    455    0.5648108154352158
304    305    0.2822079057324572
158    159    0.6207122350990869
337    338    0.17841371083523078
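Separating predictors from the target could be sketched like this. The frames, the column names, and the values are hypothetical placeholders; substitute the actual numerical columns of the dataset.

```python
import pandas as pd

# Toy stand-ins for the train and test frames (hypothetical names and values).
train = pd.DataFrame({
    "production_budget": [0.9, 0.4, 0.7],
    "domestic_gross": [0.8, 0.2, 0.5],
    "worldwide_gross": [0.85, 0.3, 0.6],
})
test = pd.DataFrame({
    "production_budget": [0.6],
    "domestic_gross": [0.4],
    "worldwide_gross": [0.5],
})

predictor_cols = ["production_budget", "domestic_gross"]  # assumed predictors
target_col = "worldwide_gross"                            # assumed target

# X holds the predictor columns, Y the single column to predict.
X_train, Y_train = train[predictor_cols], train[target_col]
X_test, Y_test = test[predictor_cols], test[target_col]
```

Keeping the target column out of X matters: a model that sees the answer among its inputs will look far better than it really is.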
Training a linear model
So we trained a model -- how can we visualize its performance on the test set?
Predict the Y values based on the train and test predictor sets (called X). Fill in the blanks below.
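As a rough sketch of the fit-then-predict pattern, the example below trains scikit-learn's LinearRegression on synthetic data. The data and the "true" coefficients are made up, and only the variable names mirror the ones used in this homework.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data standing in for the movie predictors (purely illustrative).
rng = np.random.default_rng(0)
true_coef = np.array([2.0, -1.0])
X_train = rng.normal(size=(80, 2))
X_test = rng.normal(size=(20, 2))
Y_train = X_train @ true_coef + rng.normal(scale=0.1, size=80)
Y_test = X_test @ true_coef + rng.normal(scale=0.1, size=20)

# Fit on the training set only, then predict for both sets.
model = LinearRegression().fit(X_train, Y_train)
Y_train_pred = model.predict(X_train)
Y_test_pred = model.predict(X_test)

print(model.coef_)
```

The test set is never passed to fit, so the test predictions show how the model behaves on data it has not seen.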
Evaluation Metrics
Beyond visualizing performance on the test set, we can quantify it. As we explained in class, there are several metrics we could look at: Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared. The first three measure average prediction error, while R-squared measures the share of the variance in Y that the model explains. We'll focus on RMSE here.
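To make the differences between these metrics concrete, here is a small worked example with made-up values, computed by hand with NumPy rather than with any particular library helper.

```python
import numpy as np

# Made-up true values and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 2.0, 6.0])
y_pred = np.array([2.0, 5.0, 4.0, 5.0])

errors = y_true - y_pred                 # [1, 0, -2, 1]

mae = np.mean(np.abs(errors))            # average absolute error
mse = np.mean(errors ** 2)               # average squared error
rmse = np.sqrt(mse)                      # back in the units of Y

# R-squared: 1 minus (residual sum of squares / total sum of squares).
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mae, mse, rmse, r2)
```

Here the MAE is 1.0 and the MSE is 1.5: squaring weights the single error of 2 more heavily than the errors of 1, which is why MSE and RMSE are more sensitive to large misses.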
Here's a function that calculates the RMSE for you:
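One possible implementation is sketched below. The name rmse and its signature are assumptions for illustration, not necessarily the function provided with this homework.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Square the errors, average them, then take the square root.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Usage with made-up numbers: errors of 3 and 4 give sqrt((9 + 16) / 2).
print(rmse([0, 0], [3, 4]))
```

Calling it once with the training values and once with the test values gives the two numbers to compare.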
Now use the function above to calculate the RMSE for the train and test sets. Fill in the blanks below. Use:
Y_train and Y_train_pred
Y_test and Y_test_pred
Training RMSE: 0.7113987040280376
Test RMSE: 0.5983994387977111
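Looks like our model has a lower RMSE on the test set than on the train set! That's a good sign that the model isn't overfitting, although with a test set this small, part of the gap may simply reflect which movies happened to land in each split.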
Evaluation of a Classification Model
Now let's apply error evaluation to a classification model. Here, we're going to train a classification model on this dataset to try and predict the genre of each movie.
It looks like 42% of the movies in this dataset are Action movies. Maybe you just watched Top Gun: Maverick, and you're looking for another movie in the action genre. Let's see whether we can predict whether a movie is in the action genre using this dataset.
We'll use logistic regression, a statistical model that estimates the probability of an event taking place. Here, the event is that the movie in question is in the action genre.
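To illustrate how logistic regression turns predictors into a probability, here is a minimal sketch using only the standard library. The weights, bias, and feature values are invented, and the helper name predict_action_probability is hypothetical; a fitted model would learn the weights from the training data.

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_action_probability(features, weights, bias):
    """Hypothetical helper: weighted sum of features passed through a sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Invented weights and features, for illustration only.
p = predict_action_probability([0.5, 0.2], weights=[1.0, -2.0], bias=0.0)

# A common convention is to predict True when the probability is at least 0.5.
is_action = p >= 0.5
print(p, is_action)
```

The 0.5 threshold is a convention, not a requirement; moving it trades false positives against false negatives.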
Training a logistic regression model
What genre are we predicting? Fill in the blanks below.
323    324    0.8796433441428424
454    455    0.6237161006708474
304    305    0.9085926562504877
158    159    0.6703628316143388
337    338    0.8695476536480065
We've gotten an array of predictions: True for action movies; False for non-action movies.
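A sketch of how such a True/False prediction array can be produced with scikit-learn's LogisticRegression follows. The data is synthetic and the labelling rule is invented, so only the overall pattern carries over to the movie dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic predictors; the label is True when the first feature is positive.
# This stands in for an "is this an action movie?" column (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X[:, 0] > 0

clf = LogisticRegression().fit(X, y)

pred = clf.predict(X)               # boolean array, like the labels
proba = clf.predict_proba(X)[:, 1]  # estimated probability of the True class

print(pred[:5], proba[:5])
```

predict applies a 0.5 cut-off to the probabilities returned by predict_proba, so the two outputs always agree on which side of the threshold each row falls.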
Evaluation Metrics
Accuracy is defined as the number of correct predictions / the number of total predictions.
Check whether the predictions for the X train/test sets match the original Y values.
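As a small illustration of that definition, accuracy can be computed by comparing predictions with labels element by element. The arrays below are made up.

```python
import numpy as np

# Made-up labels and predictions, for illustration only.
y_true = np.array([True, False, True, True, False])
y_pred = np.array([True, False, False, True, False])

# True where the prediction matches the label; the mean of a boolean
# array is the fraction of True entries.
correct = y_pred == y_true
accuracy = correct.mean()

print(accuracy)
```

Note that averaging y_pred on its own gives the share of rows predicted True (0.4 here), which is a different quantity from accuracy (0.8 here).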
Train accuracy: 0.1262
Test accuracy: 0.1544
But accuracy isn't everything. Let's look at a confusion matrix instead. Here's the documentation for sklearn.metrics.confusion_matrix.
Confusion matrix, without normalization
[[70 13]
[56 10]]
From the confusion matrix above, what is the number of false negatives?
Your answer:
Both our accuracy and the confusion matrix suggest that our model is pretty bad at predicting whether a movie is in the action genre. The confusion matrix, however, shows that most of the errors are false negatives: movies that are actually action movies ('True') but are predicted as 'False'.
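The counts in the matrix above can be turned into summary metrics with a few lines of arithmetic. This sketch assumes scikit-learn's layout, where rows are true labels and columns are predicted labels, ordered [False, True]; if the matrix was built differently, the cell names change accordingly.

```python
# Confusion matrix copied from the output above.
# Assumed layout: rows = true labels, columns = predicted labels,
# both ordered [False, True].
matrix = [[70, 13],
          [56, 10]]

(tn, fp), (fn, tp) = matrix
total = tn + fp + fn + tp

accuracy = (tp + tn) / total   # share of all predictions that are correct
precision = tp / (tp + fp)     # of movies predicted action, share that are
recall = tp / (tp + fn)        # of actual action movies, share that were found

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Under this reading, recall is the weak point, since only 10 of the 66 actual action movies are found. The accuracy implied by the matrix is 80/149, or about 0.54, which differs from the test accuracy of 0.1544 printed earlier; that printed value matches 23/149, the share of movies predicted True, so it is worth checking how the two figures were computed.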