Exploratory Data Analysis in SQL
Main Takeaways at the Bottom
Getting an Idea of what is included in the data frame.
Customer Analysis
What are the countries to which our customers reside?
How many transactions in each country?
How many customers are there per country?
How many customers do we have in total?
Top 10 Customers with the most orders
What is the top 10 products for customer 17850?
Gross Sales to Each Customer (Top 10)
Product Analysis
Average Order Price? Average Unit Price? Average Order Quantity?
Gross Sales. Gross Sales by SKU
What are the top 10 selling products? In UK?
Average Price Per SKU?
Average Order Price
Average Unit Price
Average Order Quantity
The block above includes some records that I do not want to factor in, therefore I will exclude them in the next block to get a true sense of the highest average priced items.
Time Series Analysis
Firstly we need to clean the data to be in the correct format and we need to extract information from the InvoiceDate column.
Used python to add the day, month, year columns to our data frame. Will use SQL below to analyze.
Time EDA
Sales per Year
Gross Revenue per Year
Gross Revenue per Month over the two years.
Main Takeaways
- We sell to 38 countries - UK has the highest count of customers at 3,951 out of 4,373 total customers. - While Customer 17850 has ordered the most product by quantity, customer 14646 has contributed to the most amount of revenue at $279,489.02 out of a total of $9,747,747.93 in the 13 months observed. - Our average order price is $17.99, the average unit price is $4.61, and the average quantity ordered is 9.55 units. - For 2011, the best month by revenue was November at $1,461,756.25.