Netflix Data Analysis
import pandas as pd
import numpy as np
df = pd.read_csv('/work/netflix_titles.csv', encoding='ISO-8859-1')
df.head()
df.shape
df.columns
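# Keep only the columns used in this analysis; .copy() avoids pandas
# SettingWithCopyWarning when new columns are assigned later
df = df[['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
         'release_year', 'rating', 'duration', 'listed_in', 'description']].copy()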
df.head()
# Parse date_added into a datetime column (format='mixed' requires pandas 2.0 or later)
df['date_added'] = pd.to_datetime(df['date_added'], format='mixed')
# Extract the year and create a new column
df['year_added'] = df['date_added'].dt.year
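A small sketch of what the two steps above do, using a hypothetical three-row Series in place of the real date_added column (the sample values are made up for illustration). It also shows that a missing date becomes NaT, so the extracted year is NaN and the year column comes out as float rather than int.

```python
import pandas as pd

# Hypothetical sample values standing in for df['date_added'];
# the last entry is missing on purpose.
sample_dates = pd.Series(['September 25, 2021', 'August 4, 2017', None])

# format='mixed' (pandas 2.0+) infers the format of each element separately.
parsed = pd.to_datetime(sample_dates, format='mixed')

# Missing dates become NaT, so the extracted year is NaN for that row.
years = parsed.dt.year
print(years)
```

If integer years are needed downstream, rows with a missing date would have to be dropped or filled first.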
Netflix appears to have more movies than TV shows
df.groupby('type')['title'].count()
Has that been trending in a particular direction?
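The cell that answered this question did not survive the export. One way it could be approached, as a sketch only: count titles per year_added and type, then pivot the types into columns. The `sample` frame below is hypothetical toy data so the snippet runs on its own; in the notebook the same groupby would be applied to df.

```python
import pandas as pd

# Hypothetical toy rows standing in for df (values are illustrative only).
sample = pd.DataFrame({
    'type': ['Movie', 'Movie', 'TV Show', 'Movie', 'TV Show', 'Movie'],
    'title': ['A', 'B', 'C', 'D', 'E', 'F'],
    'year_added': [2015, 2016, 2016, 2017, 2017, 2017],
})

# Count titles per (year_added, type), then turn the types into columns.
# fill_value=0 covers years in which one of the types had no additions.
added_per_year = (
    sample.groupby(['year_added', 'type'])['title']
    .count()
    .unstack(fill_value=0)
)
print(added_per_year)
```

Calling `added_per_year.plot()` on the result would give one line per type, which makes the direction of the trend easier to read than the raw table.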
More movies than TV shows are added each year, and the amount of new content rose sharply after 2015
What is the breakdown by rating?
We can see that TV-MA is the most common rating
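The code behind this observation is missing from the export. A possible sketch, assuming the counts come from the rating column and that genres live in listed_in as comma-separated strings; the `sample` frame is hypothetical toy data, not the real dataset.

```python
import pandas as pd

# Hypothetical toy rows standing in for df (values are illustrative only).
sample = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'rating': ['TV-MA', 'TV-MA', 'PG-13', 'TV-14'],
    'listed_in': ['Dramas, International Movies', 'Comedies',
                  'Dramas', 'TV Dramas, Crime TV Shows'],
})

# Number of titles per rating, most common first.
rating_counts = sample['rating'].value_counts()

# A title can carry several genres, so split the string into a list
# and give each genre its own row before counting.
genre_counts = (
    sample['listed_in']
    .str.split(', ')
    .explode()
    .value_counts()
)

print(rating_counts)
print(genre_counts)
```

Because of the explode step, the genre counts sum to more than the number of titles.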
What is the relationship between length of movie and rating?
# Take an explicit copy so the column assignments below do not act on a slice of df
movie_df = df[df['type'] == 'Movie'].copy()
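# Strip the ' min' suffix and convert to a number; missing or unparseable values become NaN
movie_df['duration'] = pd.to_numeric(
    movie_df['duration'].astype(str).str.replace(' min', '', regex=False),
    errors='coerce')
# Drop movies without a usable duration (dropna on the column alone would not remove the rows)
movie_df = movie_df.dropna(subset=['duration'])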
movie_df.head()
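The cell that compared duration across ratings is not in the export. One way to look at the relationship, as a sketch: group the cleaned movies by rating and summarise duration. `sample_movies` is hypothetical toy data with duration already numeric; in the notebook the same aggregation would run on movie_df.

```python
import pandas as pd

# Hypothetical toy rows standing in for the cleaned movie_df
# (durations are in minutes; values are illustrative only).
sample_movies = pd.DataFrame({
    'rating': ['PG', 'PG', 'TV-MA', 'TV-MA', 'R'],
    'duration': [90.0, 100.0, 110.0, 130.0, 120.0],
})

# Per-rating count, mean, and median duration, longest average first.
duration_by_rating = (
    sample_movies.groupby('rating')['duration']
    .agg(['count', 'mean', 'median'])
    .sort_values('mean', ascending=False)
)
print(duration_by_rating)
```

Keeping the count next to the mean helps flag ratings with only a handful of movies, where the average is not very informative.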