Netflix Data Analysis
import pandas as pd
import numpy as np
df = pd.read_csv('/work/netflix_titles.csv', encoding='ISO-8859-1')
df.head()
df.shape
df.columns
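# Keep only the columns used in the analysis.
# .copy() makes df an independent frame, so adding columns below does not raise SettingWithCopyWarning.
df = df[['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
         'release_year', 'rating', 'duration', 'listed_in', 'description']].copy()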
df.head()
# Parse date_added as a datetime (format='mixed' requires pandas 2.0 or later)
df['date_added'] = pd.to_datetime(df['date_added'], format='mixed')
# Extract the year and create a new column
df['year_added'] = df['date_added'].dt.year
Netflix appears to have more movies than TV shows
df.groupby('type')['title'].count()
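The raw counts can also be expressed as proportions, which makes the movie/TV split easier to read. The following is a minimal sketch; the helper name `type_share` and the sample frame are hypothetical and only illustrate the call.

```python
import pandas as pd


def type_share(frame: pd.DataFrame) -> pd.Series:
    """Return the fraction of titles belonging to each type."""
    # normalize=True converts counts into fractions that sum to 1
    return frame["type"].value_counts(normalize=True)


# Made-up sample for illustration only (not taken from the dataset)
sample = pd.DataFrame({"type": ["Movie", "Movie", "Movie", "TV Show"]})
share = type_share(sample)
print(share)
```

In the notebook this would be called as `type_share(df)`.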
Has that been trending in a particular direction?
Many more movies than TV shows are added each year, and the volume of added content rose sharply after 2015
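The table behind a trend chart like this could be built roughly as follows. This is a sketch under assumptions: it relies on the `year_added`, `type`, and `title` columns created earlier, and the helper name `titles_added_per_year` and the sample data are hypothetical.

```python
import pandas as pd


def titles_added_per_year(frame: pd.DataFrame) -> pd.DataFrame:
    """Count titles per year_added, with one column per content type."""
    counts = (
        frame.dropna(subset=["year_added"])      # rows without a parsed date are skipped
        .groupby(["year_added", "type"])["title"]
        .count()
        .unstack(fill_value=0)                   # types become columns, missing pairs become 0
    )
    counts.index = counts.index.astype(int)      # years as plain integers
    return counts


# Made-up sample for illustration only (not taken from the dataset)
sample = pd.DataFrame({
    "title": ["a", "b", "c", "d"],
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "year_added": [2015, 2016, 2016, 2016],
})
per_year = titles_added_per_year(sample)
print(per_year)
```

With matplotlib installed, `titles_added_per_year(df).plot()` would draw one line per type.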
What is the breakdown by rating?
We can see that TV-MA is the most common rating
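A count of titles per rating is enough to check this observation. The sketch below assumes the `rating` column kept earlier; the helper name `rating_counts` and the sample values are hypothetical.

```python
import pandas as pd


def rating_counts(frame: pd.DataFrame) -> pd.Series:
    """Return the number of titles per rating, most common first."""
    # value_counts ignores missing ratings by default
    return frame["rating"].value_counts()


# Made-up sample for illustration only (not taken from the dataset)
sample = pd.DataFrame({"rating": ["TV-MA", "TV-14", "TV-MA", "PG", None]})
counts = rating_counts(sample)
print(counts)
```

In the notebook, `rating_counts(df).idxmax()` would return the most common rating.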
What is the relationship between length of movie and rating?
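# Work on an independent copy so the assignments below do not touch df
movie_df = df[df['type'] == 'Movie'].copy()
# Strip the ' min' suffix and convert to a number; values that cannot be parsed become NaN
movie_df['duration'] = pd.to_numeric(
    movie_df['duration'].astype(str).str.replace(' min', '', regex=False), errors='coerce')
# Drop movies with no usable duration (dropna on a single column with inplace=True has no effect on the frame)
movie_df = movie_df.dropna(subset=['duration'])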
movie_df.head()
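One way to examine the relationship is to summarise movie length per rating. This is a minimal sketch, assuming `duration` is already numeric as prepared above; the helper name `duration_by_rating` and the sample values are hypothetical.

```python
import pandas as pd


def duration_by_rating(movies: pd.DataFrame) -> pd.DataFrame:
    """Summarise movie duration (minutes) for each rating."""
    return (
        movies.dropna(subset=["duration", "rating"])
        .groupby("rating")["duration"]
        .agg(["count", "mean", "median"])
        .sort_values("median", ascending=False)  # longest typical runtime first
    )


# Made-up sample for illustration only (not taken from the dataset)
sample = pd.DataFrame({
    "rating": ["PG", "PG", "TV-MA", "TV-MA", "TV-MA"],
    "duration": [90.0, 110.0, 80.0, 100.0, None],
})
summary = duration_by_rating(sample)
print(summary)
```

In the notebook this would be called as `duration_by_rating(movie_df)`. The median is included alongside the mean because a few very long or very short titles can skew the mean.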