Data Analytics with Pandas 🐼
The pandas package is one of the most important tools for data science and analysis in Python today.
In this notebook, we analyze supermarket sales data from the branches of Company XYZ across the country.
Alright, let's start!
Step 1 - Loading the Dataset
['Lagos_Branch.csv', 'Port_Harcourt_Branch.csv', 'Abuja_Branch.csv']
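The three branch files listed above can be read and stacked into one DataFrame. What follows is a minimal sketch, not necessarily the code used in this notebook: `load_branches` is a hypothetical helper, the `*_Branch.csv` pattern is inferred from the file names, and the toy files written to a temporary folder contain made-up rows so the example runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd


def load_branches(folder):
    """Read every *_Branch.csv file in `folder` and stack them into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(folder, "*_Branch.csv")))
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index=True renumbers the rows 0..n-1 across the combined frame
    return pd.concat(frames, ignore_index=True)


# Toy stand-in files (made-up rows, not the real data)
demo_dir = tempfile.mkdtemp()
for city in ["Lagos", "Port_Harcourt", "Abuja"]:
    toy = pd.DataFrame({"City": [city, city], "Quantity": [1, 2]})
    toy.to_csv(os.path.join(demo_dir, f"{city}_Branch.csv"), index=False)

combined = load_branches(demo_dir)
print(combined.shape)
```

With the real files, `load_branches` would be pointed at the folder that holds the three CSVs.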
Step 2 - Data Exploration
Data exploration is an important pillar of data science and a critical step in every project, regardless of the domain or the type of data you are working with.
The analysis relies on four libraries: NumPy for numerical computation, pandas for the DataFrame object, and seaborn and matplotlib for visualization. The dataset was explored with two built-in DataFrame methods: .head() shows the first few observations, and .info() summarizes the columns, their non-null counts, and their data types.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 18 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 1000 non-null int64
1 Invoice ID 1000 non-null object
2 Branch 1000 non-null object
3 City 1000 non-null object
4 Customer type 1000 non-null object
5 Gender 1000 non-null object
6 Product line 1000 non-null object
7 Unit price 1000 non-null float64
8 Quantity 1000 non-null int64
9 Tax 5% 1000 non-null float64
10 Total 1000 non-null float64
11 Date 1000 non-null object
12 Time 1000 non-null object
13 Payment 1000 non-null object
14 cogs 1000 non-null float64
15 gross margin percentage 1000 non-null float64
16 gross income 1000 non-null float64
17 Rating 1000 non-null float64
dtypes: float64(7), int64(2), object(9)
memory usage: 140.8+ KB
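A summary like the one above comes from calling `.info()` on the combined DataFrame. The sketch below shows `.head()` and `.info()` on a toy frame; the rows are made up and only a few of the real column names are used. The `Unnamed: 0` column in the listing is typically what pandas produces when a CSV that was saved with its index is read back, so it may be a leftover index rather than real data.

```python
import io

import pandas as pd

# Toy rows using a few of the column names from the listing above (values are made up)
df = pd.DataFrame({
    "Invoice ID": ["X-001", "X-002", "X-003"],
    "Branch": ["A", "B", "C"],
    "Unit price": [10.0, 20.5, 7.25],
    "Quantity": [2, 1, 4],
})

first_rows = df.head(2)  # head() returns five rows when no argument is given
print(first_rows)

# info() prints to stdout by default; passing a buffer captures the text instead
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```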
Step 3 - Dealing with DateTime Features
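The `.info()` output in Step 2 shows `Date` and `Time` stored as `object`, so they need converting before any date arithmetic. A minimal sketch of that conversion follows. The format strings are assumptions, because the raw values are not shown in this notebook, and the derived column names `Day`, `Month`, `Year`, and `Hour` are illustrative.

```python
import pandas as pd

# Toy rows; month/day/year dates and HH:MM times are assumed formats
df = pd.DataFrame({
    "Date": ["1/5/2019", "3/8/2019", "2/20/2019"],
    "Time": ["13:08", "10:29", "20:33"],
})

# Convert the text columns to datetime64 values
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M")

# The .dt accessor exposes the individual components
df["Day"] = df["Date"].dt.day
df["Month"] = df["Date"].dt.month
df["Year"] = df["Date"].dt.year
df["Hour"] = df["Time"].dt.hour

print(df.dtypes)
print(df[["Day", "Month", "Year", "Hour"]])
```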
Step 4 - Unique Values in Columns
Total number of unique values in the Branch column: 3
This is for the Branch column
['A', 'C', 'B']
This is for the City column
['Lagos', 'Port Harcourt', 'Abuja']
This is for the Customer type column
['Member', 'Normal']
This is for the Gender column
['Female', 'Male']
This is for the Product line column
['Health and beauty', 'Home and lifestyle', 'Sports and travel', 'Electronic accessories', 'Food and beverages', 'Fashion accessories']
This is for the Payment column
['Epay', 'Card', 'Cash']
This is for the Branch column
Total number of unique values: 3
This is for the City column
Total number of unique values: 3
This is for the Customer type column
Total number of unique values: 2
This is for the Gender column
Total number of unique values: 2
This is for the Product line column
Total number of unique values: 6
This is for the Payment column
Total number of unique values: 3
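Listings like the ones above can be produced with `.unique()` and `.nunique()` in a loop over the categorical columns. The sketch below runs on toy rows (made up) with only two of those columns; the variable names are illustrative.

```python
import pandas as pd

# Toy rows (made up) with two of the categorical columns listed above
df = pd.DataFrame({
    "Branch": ["A", "C", "B", "A"],
    "Payment": ["Epay", "Card", "Cash", "Card"],
    "Quantity": [7, 5, 7, 8],
})

categorical_columns = ["Branch", "Payment"]

unique_values = {}
unique_counts = {}
for col in categorical_columns:
    # unique() keeps values in order of first appearance
    unique_values[col] = df[col].unique().tolist()
    # nunique() counts distinct values, ignoring missing ones by default
    unique_counts[col] = df[col].nunique()
    print(f"Unique values in the {col} column: {unique_values[col]}")
    print(f"Total number of unique values: {unique_counts[col]}")
```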
Step 5 - Aggregation with GroupBy
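Grouping splits the rows by the values of one column and then reduces each group with an aggregate such as a sum or a mean. A minimal sketch, assuming the aggregation is done per city; the rows are made up and the choice of `gross income` and `Quantity` as the aggregated columns is illustrative.

```python
import pandas as pd

# Toy rows (made up) using column names from the dataset
df = pd.DataFrame({
    "City": ["Lagos", "Abuja", "Lagos", "Port Harcourt", "Abuja"],
    "gross income": [100.0, 50.0, 25.0, 80.0, 30.0],
    "Quantity": [4, 2, 1, 3, 1],
})

city_groups = df.groupby("City")

# One aggregate applied to several columns
sum_by_city = city_groups[["gross income", "Quantity"]].sum()
mean_by_city = city_groups[["gross income", "Quantity"]].mean()

# agg() applies several aggregates to one column in a single call
income_summary = city_groups["gross income"].agg(["sum", "mean"])

print(sum_by_city)
print(mean_by_city)
print(income_summary)
```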
Step 6 - Data Visualization
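As one example of the kind of chart this step covers, the sketch below draws a bar chart of the number of sales per branch. It uses matplotlib alone for brevity, although the notebook also relies on seaborn. The rows are made up, and the choice of chart and axis labels is an assumption.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy rows (made up): one row per sale
df = pd.DataFrame({"Branch": ["A", "B", "A", "C", "A", "B"]})

# Count the sales per branch and order the bars alphabetically
counts = df["Branch"].value_counts().sort_index()

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(list(counts.index), counts.to_numpy())
ax.set_xlabel("Branch")
ax.set_ylabel("Number of sales")
ax.set_title("Sales per branch (toy data)")
fig.tight_layout()
```

In a notebook the figure renders inline after the cell runs; in a script, `fig.savefig(...)` or `plt.show()` would be needed to see it.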
Summary
In this tutorial, we learnt how to perform Exploratory Data Analysis (EDA) in Python, using packages such as pandas, NumPy, Matplotlib, and Seaborn to conduct univariate analysis, bivariate analysis, and data visualization.