Data Analytics with Pandas 🐼
The Pandas package is one of the most important tools for data science and analysis in Python today.
In this notebook, we will analyze supermarket data from branches across the country for Company XYZ.
Alright, let’s start!
Step 1 - Loading the Dataset
['Lagos_Branch.csv', 'Port_Harcourt_Branch.csv', 'Abuja_Branch.csv']
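The loading code itself is not shown here. One way to combine the three branch files into a single DataFrame is sketched below. The temporary folder, the sample rows and the reduced column set are hypothetical stand-ins so that the snippet runs anywhere; only the three file names come from the listing above.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical stand-in data: in the notebook the three branch files
# already exist on disk, so this setup block would not be needed.
data_dir = tempfile.mkdtemp()
sample = {
    "Lagos_Branch.csv": {"Branch": ["A", "A"], "City": ["Lagos", "Lagos"], "Total": [100.0, 250.5]},
    "Port_Harcourt_Branch.csv": {"Branch": ["C", "C"], "City": ["Port Harcourt", "Port Harcourt"], "Total": [80.0, 40.0]},
    "Abuja_Branch.csv": {"Branch": ["B", "B"], "City": ["Abuja", "Abuja"], "Total": [60.0, 75.0]},
}
for name, columns in sample.items():
    pd.DataFrame(columns).to_csv(os.path.join(data_dir, name), index=False)

# Find every branch file, read each one, and stack them row-wise.
csv_files = sorted(glob.glob(os.path.join(data_dir, "*_Branch.csv")))
frames = [pd.read_csv(path) for path in csv_files]
df = pd.concat(frames, ignore_index=True)  # ignore_index gives a fresh 0..n-1 index

print([os.path.basename(path) for path in csv_files])
print(df.shape)
```

Passing ignore_index=True avoids repeated index labels, since each file would otherwise contribute its own 0-based index.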
Step 2 - Data Exploration
Data exploration is an important pillar of data science: a critical step in every project, regardless of the domain or the type of data you are working with.
The dataset was explored with NumPy for numerical computations, pandas for building the DataFrame object, and seaborn and matplotlib for visualizations. The .head() method shows the first few observations of the DataFrame, and .info() summarizes its columns, non-null counts and dtypes.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 18 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   Unnamed: 0               1000 non-null   int64
 1   Invoice ID               1000 non-null   object
 2   Branch                   1000 non-null   object
 3   City                     1000 non-null   object
 4   Customer type            1000 non-null   object
 5   Gender                   1000 non-null   object
 6   Product line             1000 non-null   object
 7   Unit price               1000 non-null   float64
 8   Quantity                 1000 non-null   int64
 9   Tax 5%                   1000 non-null   float64
 10  Total                    1000 non-null   float64
 11  Date                     1000 non-null   object
 12  Time                     1000 non-null   object
 13  Payment                  1000 non-null   object
 14  cogs                     1000 non-null   float64
 15  gross margin percentage  1000 non-null   float64
 16  gross income             1000 non-null   float64
 17  Rating                   1000 non-null   float64
dtypes: float64(7), int64(2), object(9)
memory usage: 140.8+ KB
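A summary like the one above comes from calling .info() on the DataFrame. The sketch below shows both exploration calls on a small, made-up frame; the rows are hypothetical and only a few of the real column names are used.

```python
import pandas as pd

# Hypothetical miniature of the supermarket data, for illustration only.
df = pd.DataFrame({
    "Invoice ID": ["001", "002", "003", "004"],
    "Branch": ["A", "C", "B", "A"],
    "Unit price": [74.69, 15.28, 46.33, 58.22],
    "Quantity": [7, 5, 7, 8],
})

first_rows = df.head(3)  # first three observations (the default is five)
print(first_rows)

df.info()  # prints column names, non-null counts, dtypes and memory usage
```

Note that .info() prints its report directly and returns None, so there is nothing to assign from it.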
Step 3 - Dealing with DateTime Features
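The .info() summary lists Date and Time as object columns, so they are plain strings until converted. A minimal sketch of the conversion follows, assuming dates are written as month/day/year and times as 24-hour HH:MM. Both formats, the sample rows and the new column names (Day, Month, Year, Hour) are assumptions made for this example.

```python
import pandas as pd

# Hypothetical sample rows; the formats below are assumed, not confirmed.
df = pd.DataFrame({
    "Date": ["1/5/2019", "3/8/2019", "2/20/2019"],
    "Time": ["13:08", "10:29", "20:33"],
})

# Parse the strings into datetime64 columns.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M")

# Derive calendar and clock features through the .dt accessor.
df["Day"] = df["Date"].dt.day
df["Month"] = df["Date"].dt.month
df["Year"] = df["Date"].dt.year
df["Hour"] = df["Time"].dt.hour

print(df.dtypes)
```

Giving an explicit format makes the parsing unambiguous: without it, a value such as 3/8/2019 could be read as either 8 March or 3 August.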
Step 4 - Unique Values in Columns
Total Number of unique values in the Branch Column : 3
This is for the Branch column: ['A', 'C', 'B']
This is for the City column: ['Lagos', 'Port Harcourt', 'Abuja']
This is for the Customer type column: ['Member', 'Normal']
This is for the Gender column: ['Female', 'Male']
This is for the Product line column: ['Health and beauty', 'Home and lifestyle', 'Sports and travel', 'Electronic accessories', 'Food and beverages', 'Fashion accessories']
This is for the Payment column: ['Epay', 'Card', 'Cash']

This is for the Branch column. Total number of unique values: 3
This is for the City column. Total number of unique values: 3
This is for the Customer type column. Total number of unique values: 2
This is for the Gender column. Total number of unique values: 2
This is for the Product line column. Total number of unique values: 6
This is for the Payment column. Total number of unique values: 3
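Listings like these can be produced with .unique() and .nunique(). The sketch below loops over an explicit list of categorical columns; the sample rows are hypothetical and only two of the six categorical columns are included to keep it short.

```python
import pandas as pd

# Hypothetical sample rows for illustration.
df = pd.DataFrame({
    "Branch": ["A", "C", "B", "A"],
    "Payment": ["Epay", "Card", "Cash", "Card"],
    "Quantity": [7, 5, 7, 8],
})

categorical_columns = ["Branch", "Payment"]

# unique() keeps values in order of first appearance; nunique() counts them.
unique_values = {col: list(df[col].unique()) for col in categorical_columns}
unique_counts = {col: df[col].nunique() for col in categorical_columns}

for col in categorical_columns:
    print(f"{col}: {unique_values[col]} ({unique_counts[col]} unique values)")
```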
Step 5 - Aggregation with GroupBy
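The aggregation code is not shown for this step. As one possible illustration, the sketch below groups by City and computes the sum and mean of two numeric columns. The choice of grouping key, the aggregated columns and the sample figures are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample rows for illustration.
df = pd.DataFrame({
    "City": ["Lagos", "Lagos", "Abuja", "Abuja", "Port Harcourt"],
    "Total": [100.0, 200.0, 50.0, 150.0, 80.0],
    "gross income": [5.0, 10.0, 2.5, 7.5, 4.0],
})

# Split the rows by city, then apply sum and mean to each selected column.
city_summary = df.groupby("City")[["Total", "gross income"]].agg(["sum", "mean"])

print(city_summary)
```

The result has one row per city and a two-level column index, so a single figure is read with a tuple such as city_summary.loc["Lagos", ("Total", "sum")].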
Step 6 - Data Visualization
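The plots for this step are not shown in the text. As a minimal sketch of one univariate plot, the example below draws a bar chart of the number of transactions per branch with matplotlib. The sample rows, the chart type and the labels are assumptions, and the notebook's own figures may have used seaborn instead.

```python
import matplotlib

matplotlib.use("Agg")  # headless-safe backend; drop this line inside a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows for illustration.
df = pd.DataFrame({"Branch": ["A", "B", "C", "A", "C", "C"]})

# Count the rows per branch and order the bars alphabetically.
counts = df["Branch"].value_counts().sort_index()

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(list(counts.index), counts.values)
ax.set_xlabel("Branch")
ax.set_ylabel("Number of transactions")
ax.set_title("Transactions per branch")
fig.tight_layout()
```

In a notebook the figure renders inline at the end of the cell; in a script, call plt.show() or fig.savefig(...) to see it.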
In this tutorial, we have learnt how to perform Exploratory Data Analysis (EDA) in Python and how to use external Python packages such as Pandas, NumPy, Matplotlib and Seaborn to conduct univariate analysis, bivariate analysis and data visualization.