Network Data Exercise
For this exercise, we are going to be manipulating dataframes, so we will need to import the pandas
library
We will read in the shipping data excel file using the read_excel
method
The above code creates a dictionary of all of the different sheets from the excel file we just read in
We can index the dictionary, shipping_file
, to create a different dataframe from each excel sheet
The above code separates the four different dataframes.
Next, we want to create a new dataframe that changes the origin and destination columns in the shipping dataframe from the state abbreviations, to the full state name
The above code creates a dictionary of all state abbreviations and thir corresponding full names. We can now
create a new dataframe by making a copy of the shipping_df
and mapping our dictionary onto the origin and
destination columns
Here, we can see that our mapping worked. We now have a new shipping dataframe with full state names for origin and destination
We also change the adjacent
column of the distances_df
to be of type Boolean as it is a better
description of what the variable is saying. The code below converts the column to booleans
The output above shows that the adjacent column now contains true or false values.
We now want to add two columns to the shipping_df
, the distance between the origin state and the
destination state as well as whether these states are adjacent. To do this, we can create a
function that takes the origin and destination states as arguments, and searches for the values for
distance and adjacent in distance_df
for the row with these states. The code below creates these two functions,
getDistance()
and getAdjacent
Now that we have our functions, we can apply them the the shipping_df
to add a Distance
and Adjacent
value for each order (This takes a couple of minutes to run).
The output above shows that we have succesfully added the distance between the origin and destination
states, as well as whether these states are adjacent to shipping_df
.