Adidas Sales Analysis - Project Two

Andrea Ruiz and Annelis Parra

December 3, 2025

1. Goals and Summary

How do personal income and total sales affect Adidas' operating profit?

The goal of this project is to determine how average personal income in each U.S. state and total Adidas sales in that state affect Adidas' operating profit. To explore this, we combined two separate datasets: one containing Adidas' state-level sales and profits, and another containing quarterly personal income by state from the BEA.

After merging these datasets, we created a yearly average income value for each state, summarized Adidas’ total sales and operating profit by state, built regression models, and created visualizations to understand how sales and income relate to profit.

2. Data Provenance

Data Origins:

Dataset 1 — Adidas Sales Data (Kaggle)

Dataset 2 — U.S. Personal Income Data (Bureau of Economic Analysis)

Why It Fits Our Goals:

Dataset 1: This dataset provides Adidas’ total sales and operating profit by state. Since our project aims to understand what influences Adidas’ operating profit, we need detailed state-level performance data. This dataset gives us the dependent variable (operating profit) and an important predictor (total sales).

Dataset 2: This dataset provides the average personal income for each state. Since income is our second predictor, this dataset helps us see whether wealthier states generate more operating profit for Adidas. It also covers the same year (2021), making it suitable for merging.

Suitable State:

Dataset 1: Suitable because it includes state-level observations, allowing us to aggregate and compare Adidas’ performance across all 50 states. The format is already clean, with correct data types, and merge-ready after grouping.

Dataset 2: Suitable because it also reports data by state, making it directly compatible with the Adidas dataset. Quarterly income values can be averaged to create one annual income measure for each state, which aligns with the yearly Adidas data. The only minor change we needed to make was to rename Alaska and Hawaii, since they originally appeared as 'Alaska *' and 'Hawaii *'.

3. Data Merge

First, we loaded the Adidas sales dataset into a dataframe so we could view the state-level sales and operating profit data that would form the base of our merge.

import pandas as pd

adidas = pd.read_csv('Adidas US Sales.csv')
adidas
# adidas['region'].unique()

After loading this dataset, we can see all 9,637 sales records, with details that include retailer, invoice date, region, state, and operating profit. This confirms the data loaded correctly and is ready to be cleaned and merged with the income dataset in the next step.

Next, we load the BEA income dataset and clean the state names so they match the formatting in the Adidas dataset. This ensures both datasets can be merged correctly by state.

income = pd.read_csv('Table (1).csv', skiprows=3)

# Alaska and Hawaii weren't in the same format as adidas, so we changed the names to match
income.loc[income['GeoName'] == 'Alaska *', 'GeoName'] = 'Alaska'
income.loc[income['GeoName'] == 'Hawaii *', 'GeoName'] = 'Hawaii'
income

After loading and fixing Alaska and Hawaii's names, we can see that the income dataset displays each state along with its quarterly income values. The data is now clean and ready to be merged with the Adidas sales data.
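The two manual renames can also be generalized. The following is a minimal sketch, assuming footnote markers always appear as trailing asterisks (as in 'Alaska *'); the toy frame `income_demo` is hypothetical and only stands in for the BEA table.

```python
import pandas as pd

# Hypothetical stand-in for the BEA table; only the GeoName column matters here.
income_demo = pd.DataFrame({'GeoName': ['Alabama', 'Alaska *', 'Hawaii *', 'Texas']})

# Strip any trailing asterisks, plus the whitespace in front of them, from every name.
income_demo['GeoName'] = income_demo['GeoName'].str.replace(r'\s*\*+$', '', regex=True)

cleaned_names = income_demo['GeoName'].tolist()
print(cleaned_names)
```

This avoids listing each affected state by hand, at the cost of assuming no legitimate state name ends in an asterisk.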

To prepare the data for merging, we first grouped the Adidas sales dataset by state and calculated each state's total sales and total operating profit.

state = adidas.groupby('state')[['total_sales', 'operating_profit']].sum().reset_index()
state

After grouping, we now have a clean table of all 50 states with their combined total sales and operating profit values. This summarized dataset is ready to be merged with the income data.
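One way to sanity-check a grouping step like this is to confirm that the per-state totals add back up to the raw totals. The sketch below uses a small made-up frame (`sales_demo` is hypothetical, not the Adidas data) with the same column names.

```python
import pandas as pd

# Hypothetical raw records: several rows per state, as in the sales data.
sales_demo = pd.DataFrame({
    'state': ['Texas', 'Texas', 'Ohio', 'Ohio', 'Ohio'],
    'total_sales': [100.0, 250.0, 80.0, 20.0, 50.0],
    'operating_profit': [30.0, 90.0, 25.0, 5.0, 10.0],
})

value_cols = ['total_sales', 'operating_profit']

# Same aggregation as in the report: one row per state with summed values.
state_demo = sales_demo.groupby('state')[value_cols].sum().reset_index()

# The grouped totals should equal the raw totals (within floating-point tolerance).
difference = (state_demo[value_cols].sum() - sales_demo[value_cols].sum()).abs()
totals_match = bool((difference < 1e-9).all())
print(state_demo)
print(totals_match)
```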

Now that both datasets are cleaned and aligned, we merge the summarized Adidas sales data with the income data using the state column so each state has its sales, profit, and income values in one place.

adidas_df = pd.merge(state, income, how='inner', left_on='state', right_on='GeoName')
adidas_df = adidas_df.drop(columns=['GeoName'])
adidas_df = adidas_df.set_index('GeoFips')
adidas_df

After merging, we get one complete dataframe with all 50 states and their total sales, operating profit, and quarterly income values. This combined dataset is now ready for analysis and visualization.
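An inner merge silently drops any state whose name does not match on both sides. As a hedged illustration (the two small frames below are hypothetical), an outer merge with pandas' `indicator=True` option lists the rows that failed to match, and `validate='one_to_one'` raises an error if a state appears more than once on either side.

```python
import pandas as pd

# Hypothetical summaries; 'Alaska *' is deliberately left uncleaned to show a mismatch.
state_demo = pd.DataFrame({'state': ['Alaska', 'Hawaii', 'Texas'],
                           'total_sales': [10, 20, 30]})
income_demo = pd.DataFrame({'GeoName': ['Alaska *', 'Hawaii', 'Texas'],
                            '2021:Q1': [60000, 58000, 55000]})

# Outer merge keeps unmatched rows; the _merge column says which side each row came from.
check = pd.merge(state_demo, income_demo, how='outer',
                 left_on='state', right_on='GeoName',
                 validate='one_to_one', indicator=True)

# Rows that exist on only one side are the names that still need cleaning.
unmatched = check[check['_merge'] != 'both']
print(unmatched[['state', 'GeoName', '_merge']])
```

If `unmatched` is empty, the inner merge used in the report loses no states.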

We check the shape of the merged dataset to confirm that all 50 states and all expected columns were included after the merge.

adidas_df.shape

The output shows a (50, 7) dataframe: 50 rows, one per state, and 7 columns, meaning all states are present and the merge was successful.

Next, we check for any missing values to make sure the dataset is clean before creating new variables or running any analysis.

adidas_df.isnull().sum()

The results show zero missing values across all columns, confirming that the dataset is complete and ready for further processing.

We calculate the average personal income for each state by averaging all four quarters of 2021. This gives us one yearly income value per state for use in our regression model.

qt_avg = adidas_df[['2021:Q1', '2021:Q2', '2021:Q3', '2021:Q4']].mean(axis=1)
adidas_df['avg_income'] = qt_avg
adidas_df

After computing the average, a new column, avg_income, is added to the dataframe, giving each state a single yearly average income value. This prepares the dataset for statistical analysis.
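To make the row-wise average concrete, here is a small worked sketch. The frame `income_demo` and its numbers are made up; the quarter columns are selected by their '2021:Q' prefix rather than typed out, which is an optional variation on the code above.

```python
import pandas as pd

# Hypothetical quarterly income values for two states.
income_demo = pd.DataFrame({
    'state': ['A', 'B'],
    '2021:Q1': [50000, 80000],
    '2021:Q2': [51000, 81000],
    '2021:Q3': [52000, 82000],
    '2021:Q4': [53000, 83000],
})

# Pick up every 2021 quarter column by name prefix.
quarter_cols = [col for col in income_demo.columns if col.startswith('2021:Q')]

# axis=1 averages across the columns of each row, giving one value per state.
income_demo['avg_income'] = income_demo[quarter_cols].mean(axis=1)
print(income_demo[['state', 'avg_income']])
```

For state A this is (50000 + 51000 + 52000 + 53000) / 4 = 51500.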

Next, we filter the dataset to identify which states have an average income of $80,000 or higher so we can quickly examine how high-income states compare.

adi = adidas_df[adidas_df['avg_income'] >= 80000]
adi

The filtered results show that only Connecticut and Massachusetts meet this requirement, giving us a quick look at their sales and operating profit before moving into the statistical analysis.

4. Statistical Analysis

Now that the dataset is prepared, we run an OLS regression to measure how average income and total sales affect Adidas' operating profit across all 50 states.

import statsmodels.formula.api as smf

model = smf.ols(formula='operating_profit ~ avg_income + total_sales', data=adidas_df).fit()
print(model.summary())

The regression results show the model's coefficients, significance levels, and overall fit. This output will help us interpret which factors have meaningful impacts on operating profit and how strong those relationships are.

Operating Profit = 353,900 + (-4.9218 * Avg Income) + (0.2564 * Total Sales)

Both predictors in this model, average income and total sales, are significant in explaining operating profit.
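To show how the fitted equation is read, the sketch below plugs values into it. The coefficients are the rounded ones reported above; the function name and the example inputs (an average income of 60,000 and total sales of 10,000,000) are hypothetical choices for illustration only.

```python
# Rounded coefficients taken from the regression equation in this report.
INTERCEPT = 353_900.0
COEF_AVG_INCOME = -4.9218
COEF_TOTAL_SALES = 0.2564


def predict_operating_profit(avg_income: float, total_sales: float) -> float:
    """Return the operating profit predicted by the fitted linear equation."""
    return INTERCEPT + COEF_AVG_INCOME * avg_income + COEF_TOTAL_SALES * total_sales


# Hypothetical state: average income of 60,000 and total sales of 10,000,000.
example_profit = predict_operating_profit(avg_income=60_000, total_sales=10_000_000)
print(round(example_profit))
```

For these inputs the prediction is about 2.62 million. Holding income fixed, each additional dollar of sales is associated with roughly 0.26 dollars more operating profit, while each additional dollar of average income is associated with roughly 4.92 dollars less.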

It is important to note [2] at the bottom of the regression table. Because the Adidas data is synthetic, these results may not reflect Adidas' real performance. Given the data, this is how we would analyze it and reach a conclusion if the numbers were accurate.
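The report does not quote note [2]. Assuming it is the large-condition-number warning that statsmodels commonly prints when predictors sit on very different scales, the sketch below shows where such a number comes from. All values are randomly generated stand-ins, not the Adidas data, and the helper names are hypothetical.

```python
import numpy as np

# Randomly generated stand-ins on roughly the scales discussed in this report.
rng = np.random.default_rng(0)
n_states = 50
avg_income = rng.uniform(45_000, 85_000, n_states)
total_sales = rng.uniform(5e6, 6e7, n_states)


def design_matrix(*columns):
    """Stack an intercept column of ones with the given predictor columns."""
    return np.column_stack([np.ones(len(columns[0])), *columns])


def zscore(values):
    """Rescale a predictor to mean 0 and standard deviation 1."""
    return (values - values.mean()) / values.std()


# Raw predictors: an intercept of 1 next to values in the tens of millions.
cond_raw = np.linalg.cond(design_matrix(avg_income, total_sales))

# Standardized predictors: every column is on a comparable scale.
cond_scaled = np.linalg.cond(design_matrix(zscore(avg_income), zscore(total_sales)))

print(f'raw: {cond_raw:.3g}, standardized: {cond_scaled:.3g}')
```

Under this assumption, a large condition number can reflect the units of the predictors as much as genuine multicollinearity, so refitting on standardized predictors is one way to tell the two apart.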

5. Data Visualization

Here, we create a scatterplot to visualize the relationship between total sales and operating profit, with point color representing average income for each state.

import matplotlib.pyplot as plt

# Color each state's point by its average income.
plt.scatter(
    adidas_df['total_sales'],
    adidas_df['operating_profit'],
    c=adidas_df['avg_income'],
    cmap='BuPu'
)
plt.colorbar(label='Average Income')
plt.ticklabel_format(style='plain', axis='both')
plt.xticks(rotation=45)
plt.title('Total Sales and Operating Profit')
plt.xlabel('Total Sales')
plt.ylabel('Operating Profit')

The scatterplot shows a strong positive relationship between total sales and operating profit, with color shading helping us see how income levels vary across states.
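A visual impression like this can be backed with a number. The sketch below computes the Pearson correlation with pandas' `Series.corr` on a small made-up frame (`demo` is hypothetical); running the same call on the total_sales and operating_profit columns of the merged dataframe would give the figure for the real plot.

```python
import pandas as pd

# Hypothetical values with a nearly linear relationship.
demo = pd.DataFrame({
    'total_sales': [1.0e6, 2.0e6, 3.0e6, 4.0e6, 5.0e6],
    'operating_profit': [0.30e6, 0.55e6, 0.80e6, 1.10e6, 1.30e6],
})

# Pearson correlation: close to 1 means a strong positive linear relationship.
r = demo['total_sales'].corr(demo['operating_profit'])
print(round(r, 3))
```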

Next, we gather all state names and build a dictionary that groups them into their correct U.S. regions. After running the loop, the dictionary confirms that all 50 states are assigned to a region. This sets us up to add a region column to the dataframe.

state_list = adidas_df['state'].to_list()
# state_list

region_dict = {
    'Southeast': [], 'Northeast': [], 'West': [],
    'North': [], 'South': [], 'Midwest': []}

# Walk the raw sales records and record each state once under its region.
for i in range(len(adidas)):
    state = adidas.loc[i, 'state']
    region = adidas.loc[i, 'region']
    if state in state_list and state not in region_dict[region]:
        region_dict[region].append(state)
# region_dict

# Count the states across all regions.
total = 0
for region, states in region_dict.items():
    total += len(states)
total

Since there are 50 unique states in our dictionary, we can use it to create the new column 'region' in adidas_df.

We want to group states by region and analyze operating profit at a regional level. This code creates a new region column by matching each state to its U.S. region using the dictionary built above. It loops through every row, checks which region the state belongs to, and labels it accordingly.

adidas_df['region'] = None
for idx in adidas_df.index:
    state = adidas_df.loc[idx, 'state']
    for region, states in region_dict.items():
        if state in states:
            adidas_df.loc[idx, 'region'] = region
            break
# print(adidas_df[['state', 'region']])
adidas_df
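The same labeling can be done without explicit loops. This is an alternative sketch, not the method used in this report, and it assumes each state belongs to exactly one region in the raw data; both frames below are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical raw records (state repeated per sale) and per-state summary.
adidas_demo = pd.DataFrame({
    'state': ['Texas', 'Texas', 'Ohio', 'Maine'],
    'region': ['South', 'South', 'Midwest', 'Northeast'],
})
summary_demo = pd.DataFrame({
    'state': ['Maine', 'Ohio', 'Texas'],
    'operating_profit': [1.0, 2.0, 3.0],
})

# One row per state, turned into a state -> region lookup Series.
state_to_region = adidas_demo.drop_duplicates('state').set_index('state')['region']

# map() looks up each state's region; unmatched states would become NaN.
summary_demo['region'] = summary_demo['state'].map(state_to_region)
print(summary_demo)
```

Any state left as NaN after the `map` call would signal a name that appears in the summary but not in the raw records.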

Now that each state has an assigned region, we can create a box plot to compare operating profit across the U.S. regions and see which regions earn the most.

import seaborn as sns

plt.figure(figsize=(7, 6))
ax = sns.boxplot(data=adidas_df, x='region', y='operating_profit', palette='flare')
ax.set_yticks([500000, 1000000, 1500000, 2000000])
plt.ticklabel_format(style='plain', axis='y')
plt.title('Operating Profit by Region')
# Regions are on the x-axis and operating profit is on the y-axis.
plt.xlabel('Region')
plt.ylabel('Operating Profit')

This box plot shows how Adidas’ operating profit differs across U.S. regions. The Southeast has the highest profits overall, with several strong outliers indicating states that generate the most operating profit for Adidas. The South and West show moderate profit levels with more variation, while the Northeast and Midwest have the lowest and most consistent profit values.
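The comparisons read off the box plot can be checked against a per-region summary table. The sketch below uses a small made-up frame (`demo` is hypothetical); the same `groupby` and `agg` calls would apply to the region and operating_profit columns of the merged dataframe.

```python
import pandas as pd

# Hypothetical per-state profits in two regions.
demo = pd.DataFrame({
    'region': ['South', 'South', 'West', 'West', 'West'],
    'operating_profit': [100.0, 300.0, 50.0, 70.0, 90.0],
})

# Median, minimum, and maximum profit per region, highest median first.
region_summary = (
    demo.groupby('region')['operating_profit']
    .agg(['median', 'min', 'max'])
    .sort_values('median', ascending=False)
)
print(region_summary)
```

The median corresponds to the line inside each box, so this table puts numbers on the ranking the plot suggests.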

For files used and report, click here
