Import pandas to read in the datasets and create a pandas Series.
Read in the jams TSV file.
The jams TSV file contains data on different users' favorite songs from 2011 to 2015.
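A minimal sketch of the import and read step. The real file is not available here, so a small in-memory string with made-up rows stands in for it; the columns jam_id and creation_date, the variable name jams, and the file name in the comment are assumptions for illustration only.

```python
import io

import pandas as pd

# Made-up rows standing in for the real jams TSV file (illustration only).
# With the real file this would be something like:
#     jams = pd.read_csv("jams.tsv", sep="\t")   # file name is an assumption
tsv_text = (
    "jam_id\tuser_id\tartist\ttitle\tcreation_date\n"
    "j1\tu1\tArtist A\tSong 1\t2011-09-01\n"
    "j2\tu2\tArtist B\tSong 2\t2012-03-15\n"
    "j3\tu1\tArtist C\tSong 3\t2013-07-04\n"
    "j4\tu3\tArtist D\tSong 4\t2015-01-20\n"
)

# sep="\t" tells pandas the fields are tab-separated rather than comma-separated.
jams = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(jams.shape)
print(jams.head())
```

Passing sep="\t" is the key detail: without it, pandas would treat each line as a single comma-separated field.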
Create a pandas Series that displays the full set of user_ids in the dataset.
The output of the Series shows every user_id in the TSV file.
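One way this step might look, sketched on a tiny made-up DataFrame. Taking the distinct user_ids with unique() is an assumption; the original may simply have used the user_id column as it is. The names jams and user_ids are hypothetical.

```python
import pandas as pd

# Tiny made-up stand-in for the jams data (illustration only).
jams = pd.DataFrame(
    {
        "user_id": ["u1", "u2", "u1", "u3", "u2"],
        "artist": ["Artist A", "Artist B", "Artist C", "Artist D", "Artist E"],
        "title": ["Song 1", "Song 2", "Song 3", "Song 4", "Song 5"],
    }
)

# Assumption: "the full set of user_ids" means each user listed once.
# unique() keeps the order in which the ids first appear.
user_ids = pd.Series(jams["user_id"].unique(), name="user_id")

print(user_ids)
print(len(user_ids), "distinct users")
```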
To save time, we will not use every row in the dataset, since that may take too long. Using the sample function, we can select a random subset of users to work with.
In the output, the out-of-order index values show that the subset is random.
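A short sketch of the sampling step with Series.sample. The sample size of 3 and the random_state value are assumptions chosen for illustration; random_state is only there so the same subset comes back on every run.

```python
import pandas as pd

# Made-up user ids standing in for the full Series (illustration only).
user_ids = pd.Series(["u1", "u2", "u3", "u4", "u5", "u6"], name="user_id")

# Draw 3 users at random without replacement.
# n=3 and random_state=0 are hypothetical choices, not values from the original.
sample_ids = user_ids.sample(n=3, random_state=0)

# The sampled Series keeps the original index labels, so they usually
# appear out of order, which is the sign of a random draw.
print(sample_ids)
```

Dropping random_state would give a different subset each time the cell runs.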
Now we can create a DataFrame that displays the columns relating to the user_ids in our random sample.
The output shows each user_id along with its related columns.
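The filtering step could be sketched with a boolean mask built by isin, which is one common approach and not necessarily the one used originally. The data is made up, and jams, sample_ids, and sample_jams are hypothetical names.

```python
import pandas as pd

# Made-up stand-in for the jams data (illustration only).
jams = pd.DataFrame(
    {
        "jam_id": ["j1", "j2", "j3", "j4", "j5"],
        "user_id": ["u1", "u2", "u1", "u3", "u2"],
        "artist": ["Artist A", "Artist B", "Artist C", "Artist D", "Artist E"],
        "title": ["Song 1", "Song 2", "Song 3", "Song 4", "Song 5"],
    }
)

# A sampled set of user ids, with an out-of-order index as sample() would give.
sample_ids = pd.Series(["u3", "u1"], index=[2, 0], name="user_id")

# Keep only the rows whose user_id appears in the random sample.
sample_jams = jams[jams["user_id"].isin(sample_ids)]

print(sample_jams)
```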
Now all we need are the user_id, artist, and title columns from the TSV file.
The output shows that a couple of columns have been dropped.
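A minimal sketch of keeping only the three needed columns by selecting them with a list of names. The extra columns jam_id and creation_date are assumed here purely so there is something to drop.

```python
import pandas as pd

# Made-up sample rows (illustration only); jam_id and creation_date are
# assumed extra columns that are no longer needed.
sample_jams = pd.DataFrame(
    {
        "jam_id": ["j1", "j3", "j4"],
        "user_id": ["u1", "u1", "u3"],
        "artist": ["Artist A", "Artist C", "Artist D"],
        "title": ["Song 1", "Song 3", "Song 4"],
        "creation_date": ["2011-09-01", "2013-07-04", "2015-01-20"],
    }
)

# Selecting with a list of column names returns a new DataFrame
# that contains only those columns, in the order given.
sample_jams = sample_jams[["user_id", "artist", "title"]]

print(sample_jams)
```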
Next, we create a new column called final_jam. This new column combines the song title and the artist for each row in our sample.
Now that our final_jam column contains both the song title and the artist, we do not need either of those columns anymore.
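The two steps above can be sketched as string concatenation followed by a drop. The " - " separator and the title-then-artist order are assumptions, since the notes do not say how the two values are joined.

```python
import pandas as pd

# Made-up sample rows (illustration only).
sample_jams = pd.DataFrame(
    {
        "user_id": ["u1", "u1", "u3"],
        "artist": ["Artist A", "Artist C", "Artist D"],
        "title": ["Song 1", "Song 3", "Song 4"],
    }
)

# Combine title and artist into one string per row.
# The " - " separator is a hypothetical choice.
sample_jams["final_jam"] = sample_jams["title"] + " - " + sample_jams["artist"]

# The separate title and artist columns are now redundant.
sample_jams = sample_jams.drop(columns=["title", "artist"])

print(sample_jams)
```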
Finally, we can export this DataFrame to a CSV file for analysis in class.
Read in the new CSV file just to make sure everything looks right. The index is set to False so that the rows in our sample are renumbered in order.
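A sketch of the export and the read-back check, assuming the index setting refers to index=False in to_csv. The file name sample_jams.csv is hypothetical, and a temporary directory is used so the example leaves no file behind.

```python
import os
import tempfile

import pandas as pd

# Made-up sample with an out-of-order index, as left over from sampling.
sample_jams = pd.DataFrame(
    {
        "user_id": ["u3", "u1", "u3"],
        "final_jam": ["Song 4 - Artist D", "Song 1 - Artist A", "Song 6 - Artist F"],
    },
    index=[3, 0, 5],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; any path would do.
    csv_path = os.path.join(tmp_dir, "sample_jams.csv")

    # index=False leaves the old, shuffled index out of the file.
    sample_jams.to_csv(csv_path, index=False)

    # Reading the file back assigns a fresh index starting at 0.
    check = pd.read_csv(csv_path)

print(check)
```

Without index=False, the old index would be written as an extra unnamed column and would show up again when the file is read back.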