MapReduce notebook: max transaction value by store
This notebook focuses on MapReduce, which processes large datasets. It is designed to find the maximum transaction value by store using sales data. For those who want to learn MapReduce from scratch, this notebook covers the basics. For more information, here is a detailed article.
1. Data exploration
In this section, the sales data are explored to gain insights into transaction values.
2. Finding maximum transaction per store
This code snippet shows a simplified MapReduce process to find the maximum transaction value for each store. The mapper function extracts the Store and Cost columns, while the reducer groups the data by Store and calculates the maximum Cost for each store. The results display the highest transaction values per store, showing the efficiency of MapReduce in data analysis.
3. Visualizations
3.1 Maximum transaction per store & total share of transaction cost by store
In the bar chart left below, the maximum transaction cost for each store is displayed. The second histogram shows how to the transaction costs are distributed.
3.2. Distribution of transaction costs
In the histogram below, you can see how transaction costs are distributed.