Importing the data into a pandas data frame, then creating a series of all unique user IDs from it.
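A minimal sketch of this step. The file contents and the column names ('user_id', 'artist', 'title', 'play_count') are assumptions made for illustration, and an in-memory string stands in for the real data file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real data file; column names are assumptions.
csv_text = io.StringIO(
    "user_id,artist,title,play_count\n"
    "u1,Artist A,Song X,3\n"
    "u2,Artist B,Song Y,1\n"
    "u1,Artist B,Song Y,2\n"
    "u3,Artist A,Song Z,5\n"
)

# Read the CSV into a data frame (pass a file path here for a real file).
data = pd.read_csv(csv_text)

# Series of unique user IDs, in order of first appearance.
user_ids = pd.Series(data["user_id"].unique(), name="user_id")
```

Here 'unique()' returns an array, so wrapping it in 'pd.Series' gives us access to series methods such as 'sample' later on.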
Drawing a random sample of users because the dataset is large. A sample size of 1500 should still allow us to draw valuable conclusions from the data while letting the code run much faster. The 'random_state' argument is set so that rerunning the process below draws the same sample.
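A small sketch of how a fixed 'random_state' makes the draw repeatable. The user IDs, the sample size of 4, and the seed value are placeholders chosen for illustration, not the values used on the real data.

```python
import pandas as pd

# Hypothetical series of unique user IDs.
user_ids = pd.Series([f"u{i}" for i in range(10)], name="user_id")

# Draw 4 users without replacement; the seed value 1 is an arbitrary choice.
sample_a = user_ids.sample(n=4, random_state=1)
sample_b = user_ids.sample(n=4, random_state=1)

# With the same seed, both draws contain the same users in the same order.
same_sample = sample_a.equals(sample_b)
```

On the real data, 'n' would be 1500, and any fixed integer works as the seed as long as it stays the same between runs.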
In the code below, we first filter the data so that it only includes rows from the users randomly selected in the step above. We then remove all columns from the data frame except the three we are interested in.
Filtering the sample data down to the three columns of interest: user ID, artist of the song listened to, and title of the song.
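The row filter and column selection might look like the sketch below. The column names and the toy rows are assumptions; 'isin' keeps every row whose user ID appears in the sampled series.

```python
import pandas as pd

# Hypothetical listening data; column names are assumptions.
data = pd.DataFrame({
    "user_id": ["u1", "u2", "u1", "u3"],
    "artist": ["Artist A", "Artist B", "Artist B", "Artist A"],
    "title": ["Song X", "Song Y", "Song Y", "Song Z"],
    "play_count": [3, 1, 2, 5],
})

# Hypothetical result of the sampling step.
sampled_users = pd.Series(["u1", "u3"], name="user_id")

# Keep only rows belonging to the sampled users.
sample_data = data[data["user_id"].isin(sampled_users)]

# Keep only the three columns of interest.
sample_data = sample_data[["user_id", "artist", "title"]]
```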
Creating a new column in our sample data that combines the song title and artist columns into one.
Dropping the artist and title columns now that they have been merged into the song column.
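A sketch of the merge and drop, assuming both columns hold strings. The column names, the "title - artist" ordering, and the " - " separator are illustrative choices, not necessarily the ones used in the original code.

```python
import pandas as pd

# Hypothetical sample data; column names are assumptions.
sample_data = pd.DataFrame({
    "user_id": ["u1", "u1", "u3"],
    "artist": ["Artist A", "Artist B", "Artist A"],
    "title": ["Song X", "Song Y", "Song Z"],
})

# Combine title and artist into one string column (separator is an assumption).
sample_data["song"] = sample_data["title"] + " - " + sample_data["artist"]

# The original columns are now redundant, so drop them.
sample_data = sample_data.drop(columns=["artist", "title"])
```

Combining the two columns keeps songs that share a title but come from different artists distinct.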
Exporting our processed sample data frame to a new CSV file.
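A sketch of the export, with a hypothetical file name and a temporary directory standing in for the real output location. Passing 'index=False' leaves the row index out of the file, which is usually what we want when the index carries no meaning.

```python
import os
import tempfile

import pandas as pd

# Hypothetical processed sample data.
sample_data = pd.DataFrame({
    "user_id": ["u1", "u1", "u3"],
    "song": ["Song X - Artist A", "Song Y - Artist B", "Song Z - Artist A"],
})

# Hypothetical output path inside a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "sample_data.csv")

# Write the data frame without the row index.
sample_data.to_csv(out_path, index=False)

# Read the file back to check that the contents survived the round trip.
round_trip = pd.read_csv(out_path)
```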