Shopify Data Science Intern Challenge
Questions from Part 1
On Shopify, we have exactly 100 sneaker shops, and each of these shops sells only one model of shoe. We want to do some analysis of the average order value (AOV). When we look at orders data over a 30 day window, we naively calculate an AOV of $3145.13. Given that we know these shops are selling sneakers, a relatively affordable item, something seems wrong with our analysis.
Think about what could be going wrong with our calculation. Think about a better way to evaluate this data. What metric would you report for this dataset? What is its value?
Converting the created_at column to the datetime data type in order to query the data...
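A minimal sketch of this conversion step, assuming the orders are loaded into a pandas DataFrame. The rows below are hypothetical stand-ins for the challenge data, and `order_amount` is an assumed column name; only `created_at` and `shop_id` appear in this write-up.

```python
import pandas as pd

# Hypothetical rows standing in for the challenge's orders data.
orders = pd.DataFrame({
    "shop_id": [1, 2, 1],
    "order_amount": [158, 94, 158],  # assumed column name
    "created_at": [
        "2017-03-13 12:36:56",
        "2017-03-03 17:38:52",
        "2017-03-14 04:23:56",
    ],
})

# Parse the string timestamps into a datetime64 column.
orders["created_at"] = pd.to_datetime(orders["created_at"])

# With a datetime dtype, date-range filters behave as expected.
cutoff = pd.Timestamp("2017-03-14")
before_cutoff = orders[orders["created_at"] < cutoff]

print(orders["created_at"].dtype)
print(len(before_cutoff))
```

Without the conversion, the column holds plain strings, so comparisons against dates would be lexicographic rather than chronological.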
    shop_id  AOV Per Shop
0         1         158.0
1         2          94.0
2         3         148.0
3         4         128.0
4         5         142.0
..      ...           ...
95       96         153.0
96       97         162.0
97       98         133.0
98       99         195.0
99      100         111.0

[100 rows x 2 columns]
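One way a per-shop table of this shape could be produced is a groupby on `shop_id`. This is only a sketch: the data is hypothetical, `order_amount` is an assumed column name, and the original notebook may have computed the per-shop figure differently (for example, per item rather than per order).

```python
import pandas as pd

# Hypothetical orders; two shops with a handful of orders each.
orders = pd.DataFrame({
    "shop_id": [1, 1, 2, 2, 2],
    "order_amount": [158, 316, 94, 94, 188],  # assumed column name
})

# Average order value per shop: mean order amount within each shop.
per_shop = (
    orders.groupby("shop_id")["order_amount"]
    .mean()
    .round()
    .rename("AOV Per Shop")
    .reset_index()
)

print(per_shop)
```

Aggregating per shop first makes it easy to spot individual shops whose averages are far from the rest, which a single global mean hides.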
number of outliers: 2
max outlier value: 25725.0
min outlier value: 352.0
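The outlier rule used to produce these numbers is not stated here. As an illustration only, the sketch below applies the common 1.5 x IQR rule to a short, hypothetical series of per-shop values; the cut-offs and counts it yields are not meant to reproduce the output above.

```python
import pandas as pd

# Hypothetical per-shop AOV values with one extreme shop.
aov = pd.Series([94.0, 128.0, 142.0, 148.0, 153.0, 158.0, 162.0, 25725.0])

# Interquartile range and the conventional 1.5 * IQR fences (an assumed rule).
q1, q3 = aov.quantile(0.25), aov.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Split the series into flagged outliers and the remaining values.
is_outlier = (aov < lower) | (aov > upper)
outliers = aov[is_outlier]
cleaned = aov[~is_outlier]

print("number of outliers:", len(outliers))
print("max outlier value:", outliers.max())
print("min outlier value:", outliers.min())
```

The fences depend only on the quartiles, so a single extreme shop cannot drag them outward the way it drags the mean.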
0    153.0
dtype: float64
0    153.0
dtype: float64
407.99
Conclusion for Part 1
I would use the mode of the per-shop AOV as the metric, since it is far less sensitive to outliers than the mean. Alternatively, if we perform the above analysis and remove the outliers from the original dataset, we could report the mean of the outlier-free dataset instead. In this case, I would report a value of 153, which would be a good starting point for businesses considering how to improve and maximize revenue. I hope you enjoyed my analysis.
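To illustrate why a naive mean misleads here, the sketch below compares the mean, median, and mode on a small hypothetical series containing one extreme value. The numbers are made up for the example and are not taken from the challenge dataset.

```python
import pandas as pd

# Hypothetical per-shop values: mostly typical sneaker prices, one extreme shop.
values = pd.Series([153.0, 153.0, 142.0, 160.0, 25725.0])

mean_value = values.mean()            # pulled far upward by the extreme shop
median_value = values.median()        # middle value, robust to the extreme
mode_value = values.mode().iloc[0]    # most frequent value

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

Here the median and mode stay at a typical value while the mean lands in the thousands, which mirrors the gap between the naive AOV in the question and a plausible sneaker price.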
Questions from Part 2 SQL
For this question you’ll need to use SQL.
Follow this link to access the data set required for the challenge. Please use queries to answer the following questions. Paste your queries along with your final numerical answers below.
A. How many orders were shipped by Speedy Express in total?
B. What is the last name of the employee with the most orders?
C. What product was ordered the most by customers in Germany?
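The query shapes for these three questions can be sketched with Python's built-in `sqlite3` module. Everything below is an assumption made for illustration: the table and column names (`Orders`, `Shippers`, `Employees`, `Customers`, `Products`, `OrderDetails`) are a guess at a Northwind-style schema, and the rows are toy data, so the results are not the answers for the linked dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema and toy rows, for illustration only.
conn.executescript("""
CREATE TABLE Shippers (ShipperID INTEGER PRIMARY KEY, ShipperName TEXT);
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT);
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER,
                     EmployeeID INTEGER, ShipperID INTEGER);
CREATE TABLE OrderDetails (OrderID INTEGER, ProductID INTEGER, Quantity INTEGER);

INSERT INTO Shippers VALUES (1, 'Speedy Express'), (2, 'Other Shipper');
INSERT INTO Employees VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO Customers VALUES (1, 'Germany'), (2, 'France');
INSERT INTO Products VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO Orders VALUES (10, 1, 1, 1), (11, 1, 2, 1), (12, 2, 2, 2), (13, 1, 2, 2);
INSERT INTO OrderDetails VALUES (10, 1, 5), (11, 2, 30), (12, 1, 100), (13, 1, 10);
""")

# A. Count orders handled by a named shipper.
speedy_count = conn.execute("""
    SELECT COUNT(*)
    FROM Orders o
    JOIN Shippers s ON s.ShipperID = o.ShipperID
    WHERE s.ShipperName = 'Speedy Express'
""").fetchone()[0]

# B. Last name of the employee with the most orders.
top_employee = conn.execute("""
    SELECT e.LastName, COUNT(*) AS order_count
    FROM Orders o
    JOIN Employees e ON e.EmployeeID = o.EmployeeID
    GROUP BY e.EmployeeID, e.LastName
    ORDER BY order_count DESC
    LIMIT 1
""").fetchone()[0]

# C. Product with the largest total quantity ordered by customers in Germany.
#    "Ordered the most" is read here as total quantity, not number of orders.
top_product = conn.execute("""
    SELECT p.ProductName, SUM(od.Quantity) AS total_quantity
    FROM Orders o
    JOIN Customers c ON c.CustomerID = o.CustomerID
    JOIN OrderDetails od ON od.OrderID = o.OrderID
    JOIN Products p ON p.ProductID = od.ProductID
    WHERE c.Country = 'Germany'
    GROUP BY p.ProductID, p.ProductName
    ORDER BY total_quantity DESC
    LIMIT 1
""").fetchone()[0]

print(speedy_count, top_employee, top_product)
conn.close()
```

For question C, ranking by `SUM(od.Quantity)` and ranking by the number of orders containing the product can give different answers, so it is worth stating which reading is used alongside the final answer.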