Uncovering the Best Star Wars Movie
In this project, we'll be analyzing Star Wars survey results to learn which Star Wars movie is the best of the bunch.
Approach
To come to a decision, we'll find:
- The best-ranked movie from the survey
- The most-viewed movie from the survey
Results
From our analysis, we've learned that the best-ranked and most-viewed movies were Star Wars: Episode V The Empire Strikes Back and Star Wars: Episode VI Return of the Jedi, respectively.
Data Overview
We'll be using a dataset of 1,187 Star Wars survey responses collected by FiveThirtyEight.
The dataset has 38 columns, including:
- RespondentID: An anonymized ID for the respondent (the person taking the survey)
- Gender: The respondent's gender
- Age: The respondent's age
- Household Income: The respondent's income
- Education: The respondent's education level
- Location (Census Region): The respondent's location
- Have you seen any of the 6 films in the Star Wars franchise?: Has a Yes or No response
- Do you consider yourself to be a fan of the Star Wars film franchise?: Has a Yes or No response
We can make the following observations:
- Responses to a single question are spread across multiple columns. For example, the answers to the "Which of the following Star Wars films have you seen? Please select all that apply." question continue in the columns Unnamed: 4, Unnamed: 5, Unnamed: 6, Unnamed: 7 and Unnamed: 8.
- There are a number of columns which don't help us find the best movie.
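The Unnamed: N names suggest the survey was loaded with pandas, which labels blank header cells by their column position. As a minimal sketch of how that layout arises, assuming pandas and a made-up two-row excerpt (the shortened question text and the answers are hypothetical, not taken from the dataset):

```python
import io

import pandas as pd

# Hypothetical excerpt: the question text sits in the first header cell of
# its group, and the remaining header cells of that group are blank.
raw_csv = io.StringIO(
    "RespondentID,Which films have you seen?,,\n"
    "1,Episode I,Episode II,\n"
    "2,Episode I,,Episode III\n"
)

sample = pd.read_csv(raw_csv)

# pandas fills each blank header cell with "Unnamed: <column position>".
print(list(sample.columns))
# ['RespondentID', 'Which films have you seen?', 'Unnamed: 2', 'Unnamed: 3']
```

This is why one checkbox question ends up as one named column followed by a run of Unnamed columns.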
Data Cleaning
Drop Columns
We'll drop the survey response columns which don't give us movie information:
- Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her. (and its related answer columns, Unnamed: 16 to Unnamed: 28)
- Which character shot first?
- Are you familiar with the Expanded Universe?
- Do you consider yourself to be a fan of the Expanded Universe?æ
- Do you consider yourself to be a fan of the Star Trek franchise?
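One way to drop these columns, sketched with pandas on a toy frame. The column names are copied from the list above; the toy data and the choice of errors="ignore" are assumptions made for illustration:

```python
import pandas as pd

character_question = (
    "Please state whether you view the following characters favorably, "
    "unfavorably, or are unfamiliar with him/her."
)
drop_cols = (
    [character_question]
    + [f"Unnamed: {i}" for i in range(16, 29)]  # Unnamed: 16 .. Unnamed: 28
    + [
        "Which character shot first?",
        "Are you familiar with the Expanded Universe?",
        "Do you consider yourself to be a fan of the Expanded Universe?æ",
        "Do you consider yourself to be a fan of the Star Trek franchise?",
    ]
)

# Toy frame (assumption): one column we keep plus every column slated for removal.
toy = pd.DataFrame({"RespondentID": [1, 2], **{c: ["x", "y"] for c in drop_cols}})

# errors="ignore" skips any listed name that is absent instead of raising.
trimmed = toy.drop(columns=drop_cols, errors="ignore")
print(list(trimmed.columns))  # ['RespondentID']
```

Building the Unnamed names with a range keeps the list short and makes the inclusive 16 to 28 span explicit.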
Convert Yes/No to Booleans
Let's map the values in the following columns to True/False instead of Yes/No:
- Have you seen any of the 6 films in the Star Wars franchise?
- Do you consider yourself to be a fan of the Star Wars film franchise?
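A minimal sketch of this mapping, assuming pandas and a toy frame with made-up answers:

```python
import pandas as pd

seen_any = "Have you seen any of the 6 films in the Star Wars franchise?"
is_fan = "Do you consider yourself to be a fan of the Star Wars film franchise?"

# Toy answers (assumption); None stands in for a skipped question.
toy = pd.DataFrame({seen_any: ["Yes", "No", "Yes"], is_fan: ["Yes", None, "No"]})

yes_no = {"Yes": True, "No": False}
for col in (seen_any, is_fan):
    # Values that are not keys of the mapping (such as missing answers) become NaN.
    toy[col] = toy[col].map(yes_no)

print(toy[seen_any].tolist())  # [True, False, True]
```

Skipped answers stay missing rather than being silently treated as No, which matters later when filtering on the fan column.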
Cleaning Checkbox Columns
Movie viewing
The next six columns represent a single checkbox question of the movies the respondent has seen:
- Which of the following Star Wars films have you seen? Please select all that apply.: Whether or not the respondent saw Star Wars: Episode I The Phantom Menace.
- Unnamed: 4: Whether or not the respondent saw Star Wars: Episode II Attack of the Clones.
- Unnamed: 5: Whether or not the respondent saw Star Wars: Episode III Revenge of the Sith.
- Unnamed: 6: Whether or not the respondent saw Star Wars: Episode IV A New Hope.
- Unnamed: 7: Whether or not the respondent saw Star Wars: Episode V The Empire Strikes Back.
- Unnamed: 8: Whether or not the respondent saw Star Wars: Episode VI Return of the Jedi.
For each of these columns, if the value in a cell is the name of the movie, it means the respondent saw that movie. If the value is NaN, we'll assume they didn't see the movie.
We'll rename the columns and map them to boolean values.
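The rename-and-map step could look like the sketch below. It assumes pandas; the new names seen_1 to seen_6 and the two toy respondents are hypothetical choices for illustration:

```python
import pandas as pd

first_col = "Which of the following Star Wars films have you seen? Please select all that apply."
seen_cols = [first_col] + [f"Unnamed: {i}" for i in range(4, 9)]

titles = [
    "Star Wars: Episode I The Phantom Menace",
    "Star Wars: Episode II Attack of the Clones",
    "Star Wars: Episode III Revenge of the Sith",
    "Star Wars: Episode IV A New Hope",
    "Star Wars: Episode V The Empire Strikes Back",
    "Star Wars: Episode VI Return of the Jedi",
]

# Toy respondents (assumption): the first ticked every box, the second only Episode V.
toy = pd.DataFrame(
    [titles, [None] * 4 + [titles[4], None]],
    columns=seen_cols,
)

# Hypothetical new names: seen_1 .. seen_6, one per episode.
renames = {col: f"seen_{n}" for n, col in enumerate(seen_cols, start=1)}
toy = toy.rename(columns=renames)

# A cell holding a title means "seen"; a missing cell means "not seen".
seen = list(renames.values())
toy[seen] = toy[seen].notna()

print(toy["seen_1"].tolist())  # [True, False]
```

Testing for a non-missing value avoids comparing against each title string, so small spelling differences in the raw cells would not matter.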
Movie ranking
The next six columns ask the respondent to rank the movies in order of most favourite (1) to least favourite (6). Similarly, the responses have been spread across columns:
- Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.: How much the respondent liked Star Wars: Episode I The Phantom Menace.
- Unnamed: 10: How much the respondent liked Star Wars: Episode II Attack of the Clones.
- Unnamed: 11: How much the respondent liked Star Wars: Episode III Revenge of the Sith.
- Unnamed: 12: How much the respondent liked Star Wars: Episode IV A New Hope.
- Unnamed: 13: How much the respondent liked Star Wars: Episode V The Empire Strikes Back.
- Unnamed: 14: How much the respondent liked Star Wars: Episode VI Return of the Jedi.
Here, we'll just need to convert each column to a numeric type and then rename them.
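A minimal sketch of the conversion, assuming pandas. The names ranking_1 to ranking_6 and the toy rows are hypothetical, and pd.to_numeric is used here as one of several ways to get a numeric column:

```python
import pandas as pd

first_col = (
    "Please rank the Star Wars films in order of preference with 1 being your "
    "favorite film in the franchise and 6 being your least favorite film."
)
rank_cols = [first_col] + [f"Unnamed: {i}" for i in range(10, 15)]

# Toy rows (assumption): rankings read in as text, and one respondent who skipped.
toy = pd.DataFrame(
    [["3", "2", "1", "4", "5", "6"], [None] * 6],
    columns=rank_cols,
)

# Hypothetical new names: ranking_1 .. ranking_6, one per episode.
renames = {col: f"ranking_{n}" for n, col in enumerate(rank_cols, start=1)}
toy = toy.rename(columns=renames)

for col in renames.values():
    # Text digits become floats; missing answers become NaN.
    toy[col] = pd.to_numeric(toy[col])

print(toy.dtypes.unique())
```

Floats are used rather than integers so that skipped rankings can be stored as NaN and ignored when averaging.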
Data Analysis
Before analysing, we'll restrict our data to the respondents who consider themselves fans of the franchise.
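A minimal sketch of that filter, assuming pandas and that the fan column has already been mapped to True/False. The toy respondents are made up:

```python
import pandas as pd

is_fan = "Do you consider yourself to be a fan of the Star Wars film franchise?"

# Toy frame (assumption): None marks a respondent who skipped the question.
toy = pd.DataFrame(
    {"RespondentID": [1, 2, 3, 4], is_fan: [True, False, None, True]}
)

# .eq(True) keeps only explicit fans; False and missing answers are both excluded.
fans = toy[toy[is_fan].eq(True)]

print(fans["RespondentID"].tolist())  # [1, 4]
```

Comparing against True explicitly, rather than indexing with the column directly, avoids errors when the column still contains missing values.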
Best-Ranked Movie
Recall that a lower average ranking (towards 1) is better than a higher average ranking (towards 6). This is the order of the movies from best to worst average ranking:
- Star Wars: Episode V The Empire Strikes Back
- Star Wars: Episode VI Return of the Jedi
- Star Wars: Episode IV A New Hope
- Star Wars: Episode I The Phantom Menace
- Star Wars: Episode II Attack of the Clones
- Star Wars: Episode III Revenge of the Sith
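An ordering like this can be computed by averaging each ranking column and sorting in ascending order. The sketch below assumes pandas and hypothetical column names ranking_1 to ranking_6; the three rows are toy values, not survey responses, so the resulting order is illustrative only:

```python
import pandas as pd

# Toy rankings (NOT survey data): each row is one respondent's 1-6 ordering.
fans = pd.DataFrame(
    {
        "ranking_1": [4, 5, 6],
        "ranking_2": [5, 6, 5],
        "ranking_3": [6, 4, 4],
        "ranking_4": [3, 3, 2],
        "ranking_5": [1, 1, 1],
        "ranking_6": [2, 2, 3],
    },
    dtype=float,
)

# Column-wise mean, then ascending sort: the lowest average is the best-ranked.
mean_rank = fans.mean().sort_values()

print(mean_rank.index[0])  # ranking_5
```

DataFrame.mean skips NaN by default, so respondents who left a movie unranked would not distort that movie's average.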
Most-Viewed Movie
This is the order of the movies by viewership:
- Star Wars: Episode V The Empire Strikes Back
- Star Wars: Episode VI Return of the Jedi
- Star Wars: Episode I The Phantom Menace
- Star Wars: Episode IV A New Hope
- Star Wars: Episode II Attack of the Clones
- Star Wars: Episode III Revenge of the Sith
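Viewership can be counted by summing each boolean column, since True counts as 1. A minimal sketch, assuming pandas and hypothetical column names seen_1 to seen_6; the five rows are toy values, not survey responses:

```python
import pandas as pd

# Toy viewing flags (NOT survey data): one row per respondent.
fans = pd.DataFrame(
    {
        "seen_1": [True, True, False, False, False],
        "seen_2": [True, False, False, False, False],
        "seen_3": [False, False, False, False, False],
        "seen_4": [True, True, True, False, False],
        "seen_5": [True, True, True, True, True],
        "seen_6": [True, True, True, True, False],
    }
)

# Summing booleans counts the True values; sort descending for most-viewed first.
views = fans.sum().sort_values(ascending=False)

print(views.index[0], views.iloc[0])  # seen_5 5
```
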
The best-ranked movies also tend to be the most frequently watched, with only Episode I and Episode IV trading places.
Results
From our analysis, we've learned that the best-ranked and most-viewed movies were Star Wars: Episode V The Empire Strikes Back and Star Wars: Episode VI Return of the Jedi, respectively.