Extracting Song Data From Spotify using a Web API
This is a tutorial on how audio data can be extracted using Spotify's Web API (Application Programming Interface), which is based on simple REST principles. The Spotify Web API endpoints return JSON metadata about music artists, albums, and tracks. Beyond that, you can retrieve in-depth statistics on songs, such as audio features describing the "feel" of a track in terms of its "liveness", "acousticness", and "energy". The API also provides access to user-related data such as playlists and the music a user saves in the Your Music library. In this tutorial, we will focus on extracting audio data on songs released by an artist.
JSON stands for JavaScript Object Notation, a data format specified and popularized by Douglas Crockford in the early 2000s.
Libraries
Two Python libraries (packages) are commonly used to extract audio data from Spotify: Spotipy (plamere/spotipy) and Tekore. Both wrap the HTTP requests, such as GET and POST, that are used to access data via Spotify's API host, api.spotify.com. We will use Spotipy for this tutorial.
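To make the "wrapper" idea concrete, here is a minimal sketch of the kind of raw request a library like Spotipy sends for a track search, using only Python's standard library. It assumes you already hold a valid OAuth access token for the Web API; the helper names `build_search_request`, `track_names`, and `search_track_names` are hypothetical and not part of Spotipy.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.spotify.com/v1"


def build_search_request(query, access_token, limit=10, search_type="track"):
    """Build (but do not send) a GET request for Spotify's /v1/search endpoint."""
    params = urlencode({"q": query, "type": search_type, "limit": limit})
    return Request(
        f"{API_BASE}/search?{params}",
        # The Web API expects the access token as a Bearer token.
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def track_names(response_body):
    """Parse the JSON body of a track search and return the track names."""
    data = json.loads(response_body)
    return [item["name"] for item in data["tracks"]["items"]]


def search_track_names(query, access_token, limit=10):
    """Send the request over the network and return the matching track names."""
    with urlopen(build_search_request(query, access_token, limit)) as resp:
        return track_names(resp.read())
```

Spotipy does this plumbing (plus token handling and error handling) for you, which is why the rest of the tutorial calls its methods instead of building URLs by hand.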
It is not advisable to reveal your Spotify client ID and client secret if you intend to publish your Python code or notebooks (here is a video explaining how to hide them). There are two ways to extract data from Spotify: with or without user authentication. The latter is used when your application does not need to authenticate users; however, you still need a client ID and secret to connect to Spotify's servers. To get started, head to the Spotify Developer Dashboard, create an app, and add your new ID and SECRET to your environment:
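A minimal sketch of that setup, assuming the app's credentials are stored in the `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` environment variables (the names Spotipy reads by default) and that you only need the client-credentials flow, which does not log in a user. The helper names `read_spotify_credentials` and `make_client` are hypothetical; `SpotifyClientCredentials` and `spotipy.Spotify` are Spotipy's own classes.

```python
import os


def read_spotify_credentials():
    """Return (client_id, client_secret) from the environment, or raise if either is missing."""
    client_id = os.environ.get("SPOTIPY_CLIENT_ID")
    client_secret = os.environ.get("SPOTIPY_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET first")
    return client_id, client_secret


def make_client():
    """Create a Spotipy client using the client-credentials (no user login) flow."""
    # Imported here so the credential helper above works even where spotipy isn't installed.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    client_id, client_secret = read_spotify_credentials()
    auth = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
    return spotipy.Spotify(auth_manager=auth)
```

With the variables exported in your shell (or loaded from an untracked `.env` file), `sp = make_client()` gives you a client, and the secrets never appear in a published notebook.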
Data Extraction
Let's begin with a basic search for an artist's name, which is passed as a string. We will search for the name "Sarkodie" and NOT "sarkodie".
The function takes two arguments, a string and an integer, which are passed to the search method on line three of the function. The fourth line is a list comprehension that prints an enumerated list of the song names in the returned JSON. The fifth line returns the total number of items available for the query.
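The function itself is not reproduced here, so the following is a minimal sketch matching that description: a query string and an integer limit, a `search` call, a numbered printout of song names, and the total as the return value. It assumes `sp` is an authenticated `spotipy.Spotify` client, passed in as an extra first parameter rather than read from a global; `search_songs` is a hypothetical name.

```python
def search_songs(sp, query, limit=10):
    """Print a numbered list of tracks matching `query` and return the total hit count.

    `sp` is assumed to be a spotipy.Spotify client (or any object with the same
    `search` method).
    """
    results = sp.search(q=query, limit=limit, type="track")
    tracks = results["tracks"]
    # A plain loop instead of a list comprehension: we only want the printing side effect.
    for i, item in enumerate(tracks["items"], start=1):
        print(i, item["name"])
    return tracks["total"]
```

For example, `total = search_songs(sp, "Sarkodie", 10)` prints up to ten song names and returns how many matches Spotify reports in total.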
This search returned some of Sarkodie's popular hits along with other songs he is featured on. Below is the structure of the returned JSON:
The root node is called tracks, and within tracks there are 7 child nodes: href, items, limit, next, offset, previous, and total. The href node contains the full API URL of the request. The items node holds the list of returned tracks; each track's album has an album type of album, single, or compilation. Limit is an integer: the number of items returned in each set (the maximum allowed per set is 50). If there are more results than the limit, the next node provides a URL for the subsequent set of results, and previous points back to the preceding set (both are null when there is no such set). Offset is an integer that sets the index of the first returned result; it is used to page through the next lists of songs. The total node gives the total number of search results that can be queried.
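The limit, offset, next and total fields are enough to page through a whole result set. A minimal sketch, assuming `sp` is an authenticated `spotipy.Spotify` client; `fetch_all_tracks` is a hypothetical helper that steps `offset` by hand (Spotipy's `Spotify.next` can follow the `next` URL instead).

```python
def fetch_all_tracks(sp, query, page_size=50):
    """Collect every track a search returns by stepping `offset` in `limit`-sized pages."""
    page_size = min(page_size, 50)  # 50 is the API's maximum limit per request
    collected = []
    offset = 0
    while True:
        tracks = sp.search(q=query, limit=page_size, offset=offset, type="track")["tracks"]
        collected.extend(tracks["items"])
        offset += page_size
        # Stop when there is no next page, nothing came back, or we have passed `total`.
        if tracks["next"] is None or not tracks["items"] or offset >= tracks["total"]:
            break
    return collected
```

A popular name can match a very large number of tracks, so in practice you may want to cap the number of pages you request.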
At this point, we want to extract song data for a particular artist, for which we need the artist's Spotify ID. There are three ways to retrieve this information:
Some tracks/albums may have more than one artist, and we want to be sure we are getting the Spotify ID of the right artist. There are two places in the JSON where we can retrieve an artist's Spotify ID: the spotify node (under external_urls) and the uri node. The spotify node holds the URL of the artist's Spotify profile page, similar to the link above in [1]. The uri, which stands for Uniform Resource Identifier, uniquely identifies every artist on Spotify and looks like this: "spotify:artist:spotify_id". The Spotify ID is the third token, where it says "spotify_id".
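Both places can be parsed with a few lines of standard-library Python. A minimal sketch, assuming the track items come from a track search like the one above (each item has an `artists` list whose entries carry `name` and `uri`); the helper names are hypothetical. Matching the exact artist name also picks the right ID when a track lists several artists.

```python
from urllib.parse import urlparse


def artist_id_from_uri(uri):
    """'spotify:artist:<id>' -> '<id>' (the third colon-separated token)."""
    parts = uri.split(":")
    if len(parts) != 3 or parts[:2] != ["spotify", "artist"]:
        raise ValueError(f"not a Spotify artist URI: {uri!r}")
    return parts[2]


def artist_id_from_url(url):
    """'https://open.spotify.com/artist/<id>?si=...' -> '<id>' (query string ignored)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if "artist" in segments:
        i = segments.index("artist")
        if i + 1 < len(segments):
            return segments[i + 1]
    raise ValueError(f"not a Spotify artist URL: {url!r}")


def find_artist_id(track_items, artist_name):
    """Return the ID of the artist named exactly `artist_name` in any track's artists list."""
    for track in track_items:
        for artist in track["artists"]:
            if artist["name"] == artist_name:
                return artist_id_from_uri(artist["uri"])
    return None
```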
Sarkodie's Spotify ID is "01DTVE3KmoPogPZaOvMqO8". Now let's get Sarkodie's tracks/albums on Spotify.
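A minimal sketch of listing an artist's releases with Spotipy's `artist_albums` and following pagination with `Spotify.next`, assuming `sp` is an authenticated `spotipy.Spotify` client. When no release groups are requested, Spotify returns all of them (albums, singles, compilations and appearances on other artists' releases), so the hypothetical `album_summary` helper keeps only the artist's own albums, using the `album_group` field with `album_type` as a fallback.

```python
def list_artist_releases(sp, artist_id):
    """Return one dict per release: name, id, group ('album', 'single', ...), total_tracks."""
    releases = []
    page = sp.artist_albums(artist_id, limit=50)  # 50 is the per-page maximum
    while page:
        for album in page["items"]:
            releases.append({
                "name": album["name"],
                "id": album["id"],
                # album_group describes the artist's relationship to the release
                # (album, single, compilation, appears_on); fall back to album_type.
                "group": album.get("album_group", album["album_type"]),
                "total_tracks": album["total_tracks"],
            })
        # Spotify.next follows the page's "next" URL; stop when there is none.
        page = sp.next(page) if page["next"] else None
    return releases


def album_summary(releases):
    """Count the artist's own full-length albums and the tracks on them."""
    albums = [r for r in releases if r["group"] == "album"]
    return len(albums), sum(r["total_tracks"] for r in albums)
```

Spotify sometimes lists the same album more than once (for example, different regional editions), so it is worth checking the names before trusting the counts.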
Sarkodie has 7 albums on Spotify with a grand total of 119 tracks. For this tutorial, we are going to extract the tracks from Sarkodie's latest album, JAMZ. Before we proceed, we need the album's ID, which is: 4N96XJi7wu1B0ACzCgPLLc
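With the album ID in hand, one way to build a per-track table is to combine three Spotipy calls: `album_tracks` for the track list, `tracks` for popularity (the simplified tracks from `album_tracks` do not include it), and `audio_features` for the metrics. A minimal sketch, assuming `sp` is an authenticated `spotipy.Spotify` client; `album_track_rows` and `FEATURE_KEYS` are hypothetical names, and the feature list is a selection of Spotify's audio-feature fields. Spotify has since restricted the audio-features endpoint for newly registered apps, so that call may fail for a newer app.

```python
FEATURE_KEYS = (
    "danceability", "energy", "loudness", "speechiness", "acousticness",
    "instrumentalness", "liveness", "valence", "tempo", "duration_ms", "mode",
)


def album_track_rows(sp, album_id):
    """Return one dict per track on the album with its name, popularity and audio features."""
    # 1. Collect the album's (simplified) track objects, following pagination.
    tracks = []
    page = sp.album_tracks(album_id, limit=50)
    while page:
        tracks.extend(page["items"])
        page = sp.next(page) if page["next"] else None
    ids = [t["id"] for t in tracks]

    # 2. Full track objects carry popularity; the endpoint takes at most 50 IDs per call.
    popularity = {}
    for start in range(0, len(ids), 50):
        for full in sp.tracks(ids[start:start + 50])["tracks"]:
            if full:
                popularity[full["id"]] = full["popularity"]

    # 3. Audio features take at most 100 IDs per call; entries can be None.
    features = {}
    for start in range(0, len(ids), 100):
        for feat in sp.audio_features(ids[start:start + 100]):
            if feat:
                features[feat["id"]] = feat

    # 4. Join everything on the track ID, skipping tracks without features.
    rows = []
    for t in tracks:
        feat = features.get(t["id"])
        if feat is None:
            continue
        row = {"name": t["name"], "id": t["id"], "popularity": popularity.get(t["id"])}
        row.update({k: feat[k] for k in FEATURE_KEYS})
        rows.append(row)
    return rows
```

`pandas.DataFrame(album_track_rows(sp, "4N96XJi7wu1B0ACzCgPLLc"))` then gives a DataFrame with one row per track, ready for the analysis that follows.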
Exploratory Data Analysis (EDA)
The following list explains the metrics that come with each track.
Visuals
Let's get some statistical insights into our data. In this analysis, we look for patterns and correlations in the data and visualize them. Plotting matters here because a DataFrame full of numbers is hard to read at a glance; graphs and plots give a clearer overview of how the tracks compare.
The most strongly correlated metrics are duration and instrumentalness, with a coefficient of 0.81. The song She Bad (feat. Oxlade) has the longest duration, about 5 minutes, and higher instrumentalness than the other songs on JAMZ.
The next highest correlation is between danceability and valence, at 0.69. Seven of the songs are above the mean danceability of 0.82, whereas 5 are above the mean valence. Cougar (feat. Lojay) is the most danceable song according to Spotify, followed by Country Side (feat. Black Sherif). In the valence category, Labadi (feat. King Promise) is the most positive-sounding song, followed by Cougar (feat. Lojay) and then Country Side (feat. Black Sherif).
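Counts like "songs above the mean danceability" and the most-danceable rankings can be reproduced from the per-track rows with the standard library. A minimal sketch, assuming each row is a dict with a `name` key plus numeric metric columns; `above_mean` and `top_n` are hypothetical helpers.

```python
from statistics import mean


def above_mean(rows, column):
    """Return the mean of `column` and the names of tracks strictly above it."""
    avg = mean(row[column] for row in rows)
    return avg, [row["name"] for row in rows if row[column] > avg]


def top_n(rows, column, n=3):
    """Names of the n tracks with the highest value in `column`, highest first."""
    ranked = sorted(rows, key=lambda r: r[column], reverse=True)
    return [row["name"] for row in ranked[:n]]
```

For example, `len(above_mean(rows, "valence")[1])` counts the tracks above the mean valence, and `top_n(rows, "danceability", 2)` lists the two most danceable tracks.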
Other pairs with high correlations are loudness and speechiness (0.66), liveness and tempo (0.63), and energy and tempo (0.62).
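The readings above (0.81, 0.69, 0.66, ...) are pairwise correlation coefficients. pandas' `DataFrame.corr()` computes Pearson correlation by default, which is presumably what a heat map like this one shows. The following standard-library sketch ranks every pair of columns the same way, assuming each row is a dict of numeric metrics; `pearson` and `top_correlations` are hypothetical names.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (None if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return None
    return cov / (sx * sy)


def top_correlations(rows, columns, k=5):
    """Rank every pair of columns by Pearson r, most positive first."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson([row[a] for row in rows], [row[b] for row in rows])
        if r is not None:
            pairs.append((a, b, r))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:k]
```

With only a dozen tracks on an album, a single unusual song can move these coefficients a lot, so they describe this album rather than music in general.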
Spotify has its own way of calculating a track's popularity. You might expect popularity to follow the total number of streams; however, Spotify bases the metric mostly on the total number of plays and on how recent those plays are, in other words on the song's recent streaming frequency. The most popular songs on JAMZ are She Bad (feat. Oxlade), Country Side (feat. Black Sherif) and Better Days (feat. BNXN fka Buju). In this visual, popularity correlates positively with mode, which is the modality (major or minor) of the song. Popularity also correlates with duration_ms and liveness.
With these findings I decided to create a formula to rate the songs:
new_popularity = danceability + energy + loudness + tempo
If you are at a party, you probably want to dance to loud, energetic music with a decent tempo, and that is the idea behind the formula above.
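The Dance Formula can be computed per track exactly as written. A minimal sketch, assuming each row is a dict holding the track's name and its danceability, energy, loudness and tempo. Danceability and energy lie between 0 and 1, loudness is in decibels (typically negative) and tempo is in beats per minute, so in the raw sum tempo dominates the score. The optional `normalize=True` path is not part of the original formula: it min-max scales each column to the 0..1 range before adding so that every term weighs the same. `dance_score`, `min_max` and `rank_by_dance_score` are hypothetical names.

```python
COLUMNS = ("danceability", "energy", "loudness", "tempo")


def dance_score(row):
    """new_popularity = danceability + energy + loudness + tempo, in raw units."""
    return row["danceability"] + row["energy"] + row["loudness"] + row["tempo"]


def min_max(values):
    """Scale values linearly to 0..1 (all zeros if the column is constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_by_dance_score(rows, normalize=False):
    """Return (name, score) pairs sorted from highest to lowest score."""
    if normalize:
        scaled = {c: min_max([r[c] for r in rows]) for c in COLUMNS}
        scores = [sum(scaled[c][i] for c in COLUMNS) for i in range(len(rows))]
    else:
        scores = [dance_score(r) for r in rows]
    names = (r["name"] for r in rows)
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
```

Comparing the two rankings shows how much of the raw score comes from tempo alone.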
In the heat map above the calculated popularity is positively correlated with danceability, energy, liveness, and tempo.
Voilà! By the calculated popularity, Country Side (feat. Black Sherif) ranks highest, followed by Labadi (feat. King Promise) and then Better Days (feat. BNXN fka Buju). Keep in mind that this ranking comes from my Dance Formula.
I hope you enjoyed this tutorial. If you have any questions or suggestions, follow me on Twitter (@kingKwabs) and send me a message. This analysis is as of November 12, 2022, and the data retrieved may not reflect the artist's current trends.