Adidas Sales Analysis - Project Two
Andrea Ruiz and Annelis Parra
December 3, 2025
1. Goals and Summary
How do personal income and total sales affect Adidas' operating profit?
The goal of this project is to determine how average personal income in each U.S. state and total Adidas sales in that state affect the company's operating profit. To explore this, we combined two separate datasets: one containing Adidas’ state-level sales and profits, and another containing quarterly personal income by state from the BEA.
After merging these datasets, we created a yearly average income value for each state, summarized Adidas’ total sales and operating profit by state, built regression models, and created visualizations to understand how sales and income relate to profit.
2. Data Provenance
Data Origins:
Dataset 1: Adidas Sales Data (Kaggle)
Dataset 2: Quarterly Personal Income by State (BEA)
Why It Fits Our Goals:
Dataset 1: This dataset provides Adidas’ total sales and operating profit by state. Since our project aims to understand what influences Adidas’ operating profit, we need detailed state-level performance data. This dataset gives us the dependent variable (operating profit) and an important predictor (total sales).
Dataset 2: This dataset provides quarterly personal income for each state, which we average into one yearly figure. Since income is our second predictor, this dataset helps us see whether wealthier states generate more operating profit for Adidas. It also covers the same year (2021), making it suitable for merging.
Suitable State:
Dataset 1: Suitable because it includes state-level observations, allowing us to aggregate and compare Adidas’ performance across all 50 states. The format is already clean, with correct data types, and merge-ready after grouping.
Dataset 2: Suitable because it also reports data by state, making it directly compatible with the Adidas dataset. Quarterly income values can be averaged to create one annual income measure for each state, which aligns with the yearly Adidas data. The only minor change we needed to make was to rename Alaska and Hawaii since they originally showed up as Alaska* and Hawaii*.
3. Data Merge
First, we loaded the Adidas sales dataset into a dataframe so we could view the state-level sales and operating profit data that would form the base of our merge.
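A minimal sketch of this loading step, assuming the Kaggle export is a CSV file. The inline sample rows, the retailer labels, and the column names (`Retailer`, `Invoice Date`, `Region`, `State`, `Total Sales`, `Operating Profit`) are illustrative assumptions, not the actual file contents. In practice the `io.StringIO` object would be replaced by the path to the downloaded file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the Kaggle CSV; real column names may differ.
sample_csv = io.StringIO(
    "Retailer,Invoice Date,Region,State,Total Sales,Operating Profit\n"
    "Retailer A,2021-01-01,Northeast,New York,6000,3000\n"
    "Retailer B,2021-01-02,South,Texas,5000,1500\n"
    "Retailer C,2021-01-03,West,California,4000,1400\n"
)

# Load the records and parse the invoice date as a datetime column.
adidas_df = pd.read_csv(sample_csv, parse_dates=["Invoice Date"])

print(adidas_df.shape)   # (rows, columns)
print(adidas_df.head())  # first few records for a visual check
```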
After loading this dataset, we can see all 9,637 sales records with details that include retailer, invoice date, region, state, and operating profit. This confirms the data loaded correctly and is ready to be cleaned and merged with the income dataset in the next step.
Next, we load the BEA income dataset and clean the state names so they match the formatting in the Adidas dataset. This ensures both datasets can be merged correctly by state.
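The name cleanup could look roughly like the sketch below. The small dataframe stands in for the BEA file, and both the quarter column names (`2021:Q1` and so on) and the income figures are placeholders chosen for illustration.

```python
import pandas as pd

# Hypothetical slice of the BEA table; column names and values are placeholders.
income_df = pd.DataFrame({
    "State": ["Alabama", "Alaska*", "Hawaii*"],
    "2021:Q1": [50000, 60000, 55000],
    "2021:Q2": [50500, 60500, 55500],
    "2021:Q3": [51000, 61000, 56000],
    "2021:Q4": [51500, 61500, 56500],
})

# Drop the footnote asterisk and any stray whitespace so names match the sales data.
income_df["State"] = (
    income_df["State"].str.replace("*", "", regex=False).str.strip()
)

print(income_df["State"].tolist())
```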
After loading and fixing Alaska and Hawaii's names, we can see that the income dataset displays each state along with its quarterly income values. The data is now clean and ready to be merged with the Adidas sales data.
To prepare the data for merging, we first grouped the Adidas sales dataset by state and calculated each state's total sales and total operating profit.
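As a sketch of the grouping step, assuming the sales columns are named `State`, `Total Sales`, and `Operating Profit` (the rows below are placeholders, and the renamed output columns `total_sales` and `operating_profit` are hypothetical names used for illustration):

```python
import pandas as pd

# Placeholder transaction-level records; several rows per state.
adidas_df = pd.DataFrame({
    "State": ["Texas", "Texas", "Ohio", "Ohio", "Ohio"],
    "Total Sales": [5000, 7000, 2000, 3000, 1000],
    "Operating Profit": [1500, 2100, 800, 900, 300],
})

# Collapse to one row per state by summing sales and profit.
state_totals = (
    adidas_df
    .groupby("State", as_index=False)[["Total Sales", "Operating Profit"]]
    .sum()
    .rename(columns={
        "Total Sales": "total_sales",
        "Operating Profit": "operating_profit",
    })
)

print(state_totals)
```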
After grouping, we now have a clean table of all 50 states with their combined total sales and operating profit values. This summarized dataset is ready to be merged with the income data.
Now that both datasets are cleaned and aligned, we merge the summarized Adidas sales data with the income data using the state column so each state has its sales, profit, and income values in one place.
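A minimal sketch of the merge, assuming both tables share a `State` column. The two small dataframes and their values are placeholders. The `validate="one_to_one"` argument is an optional safeguard that makes pandas raise an error if a state appears more than once on either side.

```python
import pandas as pd

# Placeholder state-level sales summary.
state_totals = pd.DataFrame({
    "State": ["Ohio", "Texas"],
    "total_sales": [6000, 12000],
    "operating_profit": [2000, 3600],
})

# Placeholder quarterly income table with assumed column names.
income_df = pd.DataFrame({
    "State": ["Ohio", "Texas"],
    "2021:Q1": [50000, 54000],
    "2021:Q2": [51000, 55000],
    "2021:Q3": [52000, 56000],
    "2021:Q4": [53000, 57000],
})

# Inner join on the state name; each state should match exactly one row per table.
merged_df = state_totals.merge(
    income_df, on="State", how="inner", validate="one_to_one"
)

print(merged_df.shape)
```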
After merging, we get one complete dataframe with all 50 states and their total sales, operating profit, and quarterly income values. This combined dataset is now ready for analysis and visualization.
We check the shape of the merged dataset to confirm that all 50 states and all expected columns were included after the merge.
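The output shows a (50, 7) dataframe: 50 rows, one per state, and 7 columns (state, total sales, operating profit, and the four quarterly income values). All states are present and the merge was successful.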
Next, we check for any missing values to make sure the dataset is clean before creating new variables or running any analysis.
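One common way to run this check is to count nulls per column, as in the sketch below. The dataframe is a placeholder with assumed column names.

```python
import pandas as pd

# Placeholder merged table with no gaps.
merged_df = pd.DataFrame({
    "State": ["Ohio", "Texas"],
    "total_sales": [6000, 12000],
    "operating_profit": [2000, 3600],
    "2021:Q1": [50000, 54000],
})

# Count missing values in each column, then in the whole table.
missing_per_column = merged_df.isna().sum()
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print("Total missing values:", total_missing)
```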
The results show zero missing values across all columns, confirming that the dataset is complete and ready for further processing.
We calculate the average personal income for each state by averaging all four quarters of 2021. This gives us one yearly income value per state for use in our regression model.
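The yearly average can be sketched as a row-wise mean over the four quarter columns. The quarter column names and income values below are placeholders; only `avg_income` is a name taken from this report.

```python
import pandas as pd

# Placeholder quarterly income values for two states.
merged_df = pd.DataFrame({
    "State": ["Ohio", "Texas"],
    "2021:Q1": [50000, 54000],
    "2021:Q2": [51000, 55000],
    "2021:Q3": [52000, 56000],
    "2021:Q4": [53000, 57000],
})

quarter_cols = ["2021:Q1", "2021:Q2", "2021:Q3", "2021:Q4"]

# axis=1 averages across the four columns within each row (one row per state).
merged_df["avg_income"] = merged_df[quarter_cols].mean(axis=1)

print(merged_df[["State", "avg_income"]])
```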
After computing the average, a new column, avg_income, is added to the dataframe, giving each state a single yearly average income value. This prepares the dataset for statistical analysis.
Next, we filter the dataset to identify which states have an average income of $80,000 or higher so we can quickly examine how high-income states compare.
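A filter like this is usually written as a boolean mask. The sketch below uses placeholder state labels and income values rather than the real figures.

```python
import pandas as pd

# Placeholder states and incomes; not the actual BEA values.
merged_df = pd.DataFrame({
    "State": ["State A", "State B", "State C"],
    "avg_income": [51500.0, 81000.0, 84000.0],
})

# Keep only rows whose average income is at least $80,000.
high_income = merged_df[merged_df["avg_income"] >= 80_000]

print(high_income)
```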
The filtered results show that only Connecticut and Massachusetts meet this requirement, giving us a quick look at their sales and operating profit before moving into the statistical analysis.
4. Statistical Analysis
Now that the dataset is prepared, we run an OLS regression to measure how average income and total sales affect Adidas's operating profit across all 50 states.
The regression results show the model's coefficients, significance levels, and overall fit. This output will help us interpret which factors have meaningful impacts on operating profit and how strong those relationships are.
Operating Profit = 353,900 - 4.9218 * Avg Income + 0.2564 * Total Sales
Both predictors in this model, average income and total sales, are statistically significant in explaining operating profit.
It is important to note footnote [2] at the bottom of the regression table. Since the Adidas data is synthetic, the results may not reflect real-world behavior. Given the data, this is how we would analyze the results and come to a conclusion if the numbers were accurate.
5. Data Visualization
Here, we create a scatterplot to visualize the relationship between total sales and operating income, with point color representing average income for each state.
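A scatterplot of this kind can be sketched with matplotlib as follows. The data, column names, and the `viridis` color map are placeholders and assumptions; the `Agg` backend line only lets the sketch run without a display and can be dropped inside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove when running in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder state-level values.
df = pd.DataFrame({
    "total_sales": [100_000, 250_000, 400_000, 600_000],
    "operating_profit": [30_000, 80_000, 120_000, 200_000],
    "avg_income": [50_000, 58_000, 66_000, 82_000],
})

fig, ax = plt.subplots(figsize=(8, 5))

# Color each point by the state's average income.
points = ax.scatter(
    df["total_sales"], df["operating_profit"], c=df["avg_income"], cmap="viridis"
)
fig.colorbar(points, ax=ax, label="Average income (USD)")

ax.set_xlabel("Total sales (USD)")
ax.set_ylabel("Operating profit (USD)")
ax.set_title("Operating profit vs. total sales by state")
```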
The scatterplot shows a strong positive relationship between total sales and operating income, with color shading helping us see how income levels vary across states.
Next, we gather all state names and build a dictionary that groups them into their correct U.S. regions. After running the loop, the dictionary confirms that all 50 states are assigned to a region. This sets us up to add a region column to the dataframe.
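The dictionary and the counting loop might look like the sketch below. Only a handful of states are listed, and their grouping is illustrative rather than the full 50-state mapping used in the project.

```python
# Illustrative subset of a region -> states dictionary (not the full mapping).
region_states = {
    "Northeast": ["New York", "Massachusetts"],
    "Midwest": ["Ohio", "Illinois"],
    "South": ["Texas"],
    "Southeast": ["Florida", "Georgia"],
    "West": ["California", "Oregon"],
}

# Walk every region and collect the states assigned to it.
assigned = []
for region, states in region_states.items():
    assigned.extend(states)

n_unique = len(set(assigned))
has_duplicates = len(assigned) != n_unique  # True if a state sits in two regions

print("Unique states assigned:", n_unique)
print("Any state listed twice:", has_duplicates)
```

With the full dictionary, the unique count should come out to 50 and no state should appear twice.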
Since there are 50 unique states in our dictionary, we can use it to create the new column 'region' in adidas_df.
This code creates a new region column by matching each state to its correct U.S. region using a predefined dictionary. It loops through every row, checks which region the state belongs to, and labels it accordingly. We want to group states by region and analyze operating profit at a regional level.
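As an alternative sketch, the same labeling can be done without an explicit row loop by inverting the dictionary and calling `Series.map`. This is a swapped-in technique, not the loop described above, and the dictionary and dataframe below are small placeholders.

```python
import pandas as pd

# Illustrative subset of the region -> states dictionary.
region_states = {
    "Midwest": ["Ohio", "Illinois"],
    "South": ["Texas"],
    "West": ["California"],
}

# Invert to state -> region so each state can be looked up directly.
state_to_region = {
    state: region
    for region, states in region_states.items()
    for state in states
}

# Placeholder state-level table.
adidas_df = pd.DataFrame({
    "State": ["Texas", "Ohio", "California", "Illinois"],
    "operating_profit": [3600, 2000, 4100, 1800],
})

# map() looks up each state; states missing from the dictionary would become NaN.
adidas_df["region"] = adidas_df["State"].map(state_to_region)

print(adidas_df)
```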
Now that each state has an assigned region, we can create a box plot to compare operating profit across the different U.S. regions and see which regions earn the most.
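A box plot by region can be sketched with plain matplotlib as below. The values and column names are placeholders, and the plotting library used for the original figure is not stated, so this is only one possible approach.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove when running in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder profits for two regions.
df = pd.DataFrame({
    "region": ["West", "West", "West", "South", "South", "South"],
    "operating_profit": [4100, 3900, 5200, 3600, 2800, 3100],
})

# One array of profits per region; groupby iterates in sorted region order.
labels = []
data = []
for name, group in df.groupby("region"):
    labels.append(name)
    data.append(group["operating_profit"].to_numpy())

fig, ax = plt.subplots(figsize=(8, 5))
box = ax.boxplot(data)

# Boxes are drawn at x positions 1..n, so label those positions by region.
ax.set_xticks(range(1, len(labels) + 1))
ax.set_xticklabels(labels)
ax.set_xlabel("Region")
ax.set_ylabel("Operating profit (USD)")
ax.set_title("Operating profit by region")
```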
This box plot shows how Adidas’ operating profit differs across U.S. regions. The Southeast has the highest profits overall, with several strong outliers indicating states that generate the most operating profit for Adidas. The South and West show moderate profit levels with more variation, while the Northeast and Midwest have the lowest and most consistent profit values.