E-Commerce Analysis
The goal of this notebook is to provide an analytical perspective on e-commerce relationships in Brazil. To achieve this, we will begin by conducting an exploratory data analysis using graphical tools. This will enable us to create self-explanatory plots that facilitate a better understanding of the Brazilian online purchasing landscape. Finally, we will analyze customer reviews and implement Sentiment Analysis by utilizing Natural Language Processing tools to classify the text.
Throughout this notebook, we will explore the data in depth and build charts that explain key concepts and surface insights from it. We will conclude with a step-by-step guide to preparing text and classifying sentiment, using customer reviews from various online platforms.
Libraries
Reading the Data
For this task we have several data sources, each describing a specific topic related to e-commerce sales. The files are:
olist_customers_dataset.csv
olist_geolocation_dataset.csv
olist_orders_dataset.csv
olist_order_items_dataset.csv
olist_order_payments_dataset.csv
olist_order_reviews_dataset.csv
olist_products_dataset.csv
olist_sellers_dataset.csv
product_category_name_translation.csv
The relationships between these files are described in the documentation. Let's read the datasets and carry out an initial analysis of all of them; this step will help us make the right decisions in the exploratory data analysis that follows.
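As a minimal sketch, assuming all nine CSV files sit in a single local directory, the datasets can be read into a dictionary keyed by file name. The helper name `read_olist_datasets` and its `data_dir` argument are hypothetical, not taken from the original notebook:

```python
from pathlib import Path

import pandas as pd


def read_olist_datasets(data_dir: str) -> dict:
    """Read every CSV in data_dir into a dict keyed by the file stem."""
    datasets = {}
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        # e.g. "olist_orders_dataset.csv" -> key "olist_orders_dataset"
        datasets[csv_path.stem] = pd.read_csv(csv_path)
    return datasets
```

For example, `read_olist_datasets("data")["olist_orders_dataset"]` would return the orders table. Keying by the file stem keeps every table addressable by its original file name.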
An Overview of the Dataset
Before combining all the useful information into a single dataset, let's look at the shape of each dataset so that we can decide more confidently how to join them.
Now let's look at each dataset in more detail and summarize its content.
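One way to summarize the shape and missing values of every table at once is sketched below. It assumes the datasets are held in a dictionary of pandas DataFrames; the helper `dataset_overview` and its output column names are hypothetical:

```python
import pandas as pd


def dataset_overview(datasets: dict) -> pd.DataFrame:
    """Summarize rows, columns, missing values and dtypes of each DataFrame."""
    rows = []
    for name, df in datasets.items():
        null_mask = df.isnull()
        rows.append({
            "dataset": name,
            "n_rows": df.shape[0],
            "n_cols": df.shape[1],
            # total number of missing cells across all columns
            "null_values": int(null_mask.sum().sum()),
            # names of the columns that contain at least one missing value
            "null_columns": ", ".join(df.columns[null_mask.any().to_numpy()]),
            # distinct dtypes present in the table
            "dtypes": ", ".join(sorted(df.dtypes.astype(str).unique())),
        })
    return pd.DataFrame(rows).set_index("dataset")
```

Listing the columns with missing values next to the row count makes it easier to judge which keys are safe to join on.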
Exploratory Data Analysis
Now we will go through an exploratory data analysis to gain insights into e-commerce in Brazil. The aim is to divide this section into topics so that we can explore charts for each subject (orders, customers, products, items, and others).
Total Orders on E-Commerce
E-commerce is a growing trend globally. Let's dive into the orders dataset to see how this trend shows up in Brazil, at least within the period covered by the dataset.
Looking at the dataset columns, we can see orders with different statuses and several timestamp columns, such as purchase, approval, delivery and estimated delivery. First, let's look at the status of the orders in this dataset.
By the time this dataset was created, the vast majority of orders had been delivered; only 3% of all orders had another status.
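The share of each status can be computed with `value_counts(normalize=True)`. This sketch assumes the orders table has an `order_status` column, as in the Olist orders file; the helper name is hypothetical:

```python
import pandas as pd


def status_share(orders: pd.DataFrame) -> pd.Series:
    """Percentage of orders in each order_status, largest first."""
    # normalize=True returns fractions that sum to 1; scale them to percent
    return (orders["order_status"].value_counts(normalize=True) * 100).round(2)
```

`100 - status_share(orders)["delivered"]` then gives the share of orders with any other status.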
For the next plots, let's look at how e-commerce has actually evolved in terms of purchase orders. To do this, we need to extract some information from the order_purchase_timestamp column, following these steps:
1. Convert the timestamp columns to datetime.
2. Extract time attributes from these datetime columns (year, month, day, day of week and hour).
3. Evaluate the e-commerce scenario using these attributes.
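Steps 1 and 2 can be sketched with pandas as follows. The timestamp column names other than `order_purchase_timestamp` are assumed from the Olist orders file, and the derived column names (`purchase_year`, `purchase_year_month`, and so on) are hypothetical:

```python
import pandas as pd

# Assumed timestamp columns of the orders file; only those present are converted
TIMESTAMP_COLS = [
    "order_purchase_timestamp",
    "order_approved_at",
    "order_delivered_carrier_date",
    "order_delivered_customer_date",
    "order_estimated_delivery_date",
]


def add_time_attributes(orders: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with datetime columns and purchase time attributes."""
    df = orders.copy()
    # Step 1: convert every known timestamp column that exists to datetime
    for col in TIMESTAMP_COLS:
        if col in df.columns:
            df[col] = pd.to_datetime(df[col])
    # Step 2: extract time attributes from the purchase timestamp
    ts = df["order_purchase_timestamp"]
    df["purchase_year"] = ts.dt.year
    df["purchase_month"] = ts.dt.month
    df["purchase_day"] = ts.dt.day
    df["purchase_dayofweek"] = ts.dt.day_name()
    df["purchase_hour"] = ts.dt.hour
    # Year-month key such as "201803", convenient for monthly trend lines
    df["purchase_year_month"] = ts.dt.strftime("%Y%m")
    return df
```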
Now we can propose a complete analysis of the number of orders in Brazilian e-commerce over the period covered by the dataset. To do so, let's plot three graphs using a GridSpec to answer the following questions:
1. Is there a growing trend in Brazilian e-commerce?
2. On which days of the week do Brazilian customers tend to shop online?
3. At what time of day do Brazilian customers tend to buy (Dawn, Morning, Afternoon or Night)?
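Answering questions 2 and 3 requires ordering weekdays and bucketing hours into periods of the day. The sketch below assumes the cut-offs Dawn 0-5h, Morning 6-11h, Afternoon 12-17h and Night 18-23h; the notebook does not state its exact boundaries, so treat them and the helper names as illustrative:

```python
import pandas as pd

# Assumed hour bins: (-1, 5] Dawn, (5, 11] Morning, (11, 17] Afternoon, (17, 23] Night
HOUR_BINS = [-1, 5, 11, 17, 23]
DAY_PERIODS = ["Dawn", "Morning", "Afternoon", "Night"]
WEEKDAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def day_period(hours: pd.Series) -> pd.Series:
    """Map each hour of day (0-23) to a labelled period of the day."""
    return pd.cut(hours, bins=HOUR_BINS, labels=DAY_PERIODS)


def orders_per_weekday(day_names: pd.Series) -> pd.Series:
    """Count orders per weekday name in calendar order, filling gaps with 0."""
    return day_names.value_counts().reindex(WEEKDAY_ORDER, fill_value=0)
```

For example, with `timestamps` being the purchase timestamp column already converted to datetime, `day_period(timestamps.dt.hour).value_counts()` and `orders_per_weekday(timestamps.dt.day_name())` give the volumes to plot. Reindexing keeps the weekday bars in calendar order instead of frequency order.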
From the chart above, we can conclude:
Note: there is a sharp drop in orders between August 2018 and September 2018, possibly caused by noise in the data. For a fair comparison between 2017 and 2018, let's consider only orders placed between January and August in both years.
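The January-to-August filter can be applied directly to the purchase timestamp. This is a sketch assuming the `order_purchase_timestamp` column; the helper name is hypothetical, and it raises a `KeyError` if either year has no orders in the window:

```python
import pandas as pd


def jan_to_aug_growth(orders: pd.DataFrame) -> float:
    """Percentage change in order count, Jan-Aug 2018 vs Jan-Aug 2017."""
    ts = pd.to_datetime(orders["order_purchase_timestamp"])
    # Keep the same months (January to August) in both years
    in_window = ts.dt.month.between(1, 8) & ts.dt.year.isin([2017, 2018])
    counts = ts[in_window].dt.year.value_counts()
    return float((counts[2018] - counts[2017]) / counts[2017] * 100)
```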
E-commerce: a comparison between 2017 and 2018
E-Commerce Around Brazil
To prepare the data for an analysis of e-commerce across Brazilian states, we will take the following steps:
1. Merge the orders data with the order_items data;
2. Use an API (from the Brazilian government) to return the region of each customer_state;
3. Propose useful charts to answer business questions.
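A sketch of steps 1 and 2. Instead of calling the government API, it hard-codes Brazil's five IBGE macro-regions as a lookup table; this is a swapped-in technique, not the notebook's method. It also assumes the customer state lives in the customers file and is joined through `customer_id`; the helper name is hypothetical:

```python
import pandas as pd

# Hard-coded stand-in for the API lookup: state code -> IBGE macro-region
STATE_TO_REGION = {
    **dict.fromkeys(["AC", "AP", "AM", "PA", "RO", "RR", "TO"], "Norte"),
    **dict.fromkeys(["AL", "BA", "CE", "MA", "PB", "PE", "PI", "RN", "SE"], "Nordeste"),
    **dict.fromkeys(["DF", "GO", "MT", "MS"], "Centro-Oeste"),
    **dict.fromkeys(["ES", "MG", "RJ", "SP"], "Sudeste"),
    **dict.fromkeys(["PR", "RS", "SC"], "Sul"),
}


def build_orders_by_region(orders: pd.DataFrame,
                           customers: pd.DataFrame,
                           order_items: pd.DataFrame) -> pd.DataFrame:
    """Join orders, customer location and items; add the customer's region."""
    customer_cols = customers[["customer_id", "customer_state", "customer_city"]]
    df = orders.merge(customer_cols, on="customer_id", how="left")
    # Inner join: orders without any item row are dropped
    df = df.merge(order_items, on="order_id", how="inner")
    df["customer_region"] = df["customer_state"].map(STATE_TO_REGION)
    return df
```

The result has one row per order item, so later counts of orders should use distinct `order_id` values.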
Brazilian APIs and links for geolocation information:
An overview of customer orders by region, state and city
The map above already shows that the Southeast region of Brazil has the highest number of e-commerce orders.
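When the joined table has one row per order item, counting orders per region or state should count distinct `order_id` values rather than rows. A minimal sketch, with a hypothetical helper name and an assumed `customer_region` column:

```python
import pandas as pd


def orders_per_group(df: pd.DataFrame, column: str) -> pd.Series:
    """Number of distinct orders per value of `column`, largest first."""
    return df.groupby(column)["order_id"].nunique().sort_values(ascending=False)
```

Calling it with `"customer_region"`, `"customer_state"` or `"customer_city"` gives the ranking at each geographic level.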
E-Commerce Impact on Economy
Until now, we have only answered questions about the e-commerce scenario in terms of the number of orders received. We saw the volume of orders across months, days of the week, times of day and even customer states.
Now we will analyze the money moved by e-commerce by looking at order prices, freight values and other amounts.
To do this, let's first group our data so that we can look at its overall evolution.
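A sketch of the monthly grouping, assuming a table with one row per order item carrying `order_purchase_timestamp`, `price` and `freight_value` (as obtained by joining orders with order items); the helper name is hypothetical:

```python
import pandas as pd


def monthly_amounts(items: pd.DataFrame) -> pd.DataFrame:
    """Total item price and freight per purchase month (YYYYMM)."""
    year_month = (pd.to_datetime(items["order_purchase_timestamp"])
                  .dt.strftime("%Y%m")
                  .rename("year_month"))
    return items.groupby(year_month)[["price", "freight_value"]].sum()
```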
It's very interesting to see how some states have a high total amount sold but a low price per order. São Paulo (SP), for example, is the most valuable state for e-commerce (5,188,099 sold in total), but it is also where customers pay the least per order (110.00 per order).
Here we can see which customer states have the highest mean freight value. For example, customers in Roraima (RR), Paraíba (PB), Rondônia (RO) and Acre (AC) usually pay more for freight than customers anywhere else.
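The per-state figures can be reproduced with a named aggregation. Here "price per order" is assumed to mean total item price divided by the number of distinct orders, which may differ from the notebook's exact definition; the helper name is hypothetical:

```python
import pandas as pd


def state_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Total sold, distinct orders and price per order for each customer state."""
    summary = df.groupby("customer_state").agg(
        total_sold=("price", "sum"),
        n_orders=("order_id", "nunique"),
    )
    summary["price_per_order"] = summary["total_sold"] / summary["n_orders"]
    return summary.sort_values("total_sold", ascending=False)
```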
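Mean freight per state can be sketched the same way. Note that this averages over item rows; averaging per order would require summing `freight_value` per `order_id` first. The helper name is hypothetical:

```python
import pandas as pd


def mean_freight_by_state(df: pd.DataFrame) -> pd.Series:
    """Mean freight_value per customer state, most expensive first."""
    return (df.groupby("customer_state")["freight_value"]
              .mean()
              .sort_values(ascending=False))
```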
Payment Type Analysis
One of the datasets provided contains information about order payments. To see how payments influence e-commerce, we can build a mini-dashboard around two main concepts: payment type and number of payment installments. The idea is to present enough information to clarify how e-commerce buyers usually prefer to pay for their orders.
In fact, the line chart shows that credit card payments account for the majority of Brazilian e-commerce. Still, a slight decrease in this type of payment is visible since March 2018 (201803). On the other hand, debit card payments have shown a growing trend since May 2018 (201805), which is a good opportunity for investors to improve services for this type of payment.
The bar chart above shows how Brazilian customers prefer to pay for their orders: most of them pay in a single installment, and the number of payments split into 10 installments is also worth pointing out.
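The monthly trend per payment type can be built by attaching the purchase month to each payment and pivoting. This assumes the payments file's `order_id` and `payment_type` columns and the orders file's `order_purchase_timestamp`; the helper name is hypothetical:

```python
import pandas as pd


def payment_type_trend(payments: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    """Number of payments per purchase month (rows) and payment type (columns)."""
    df = payments.merge(orders[["order_id", "order_purchase_timestamp"]], on="order_id")
    df["year_month"] = pd.to_datetime(df["order_purchase_timestamp"]).dt.strftime("%Y%m")
    # Count payment rows in each (month, type) cell; empty cells become 0
    return df.pivot_table(index="year_month", columns="payment_type",
                          values="order_id", aggfunc="count", fill_value=0)
```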
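The installment distribution behind the bar chart is a count over `payment_installments` (column name assumed from the Olist payments file); the helper name is hypothetical:

```python
import pandas as pd


def installments_distribution(payments: pd.DataFrame) -> pd.DataFrame:
    """Number and percentage of payments per installment count."""
    counts = payments["payment_installments"].value_counts().sort_index()
    return pd.DataFrame({
        "n_payments": counts,
        "share_pct": (counts / counts.sum() * 100).round(2),
    })
```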
Natural Language Processing
Now that we have a better understanding of the data, we can start the Natural Language Processing step and analyze the comments left on e-commerce orders. The goal is to use these comments as input to a sentiment analysis model that captures customers' sentiment about buying online. Let's take a look at the reviews data.
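Review text needs normalizing before it can feed a sentiment model. The sketch below shows one common choice, not necessarily the notebook's: lowercase, strip Portuguese accents, replace punctuation with spaces and collapse whitespace. It assumes the comments are in a `review_comment_message` column; both helper names are hypothetical:

```python
import re
import unicodedata

import pandas as pd


def normalize_review(text: str) -> str:
    """Lowercase, remove accents and punctuation, collapse whitespace."""
    text = text.lower()
    # Decompose accented letters (e.g. "ó" -> "o" + combining mark), drop the marks
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace anything that is not a letter, digit or whitespace with a space
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace (including line breaks) into single spaces
    return re.sub(r"\s+", " ", text).strip()


def clean_comments(reviews: pd.DataFrame) -> pd.Series:
    """Normalized text of every non-empty review comment."""
    return reviews["review_comment_message"].dropna().map(normalize_review)
```

Stripping accents makes spellings such as "ótimo" and "otimo" match, at the cost of occasionally merging two distinct words.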