User Segmentation Analysis
This is a dataset of user transactions in a peer-to-peer money transfer service.
I'll develop a user segmentation method to group users for targeting through marketing and sales strategies.
Data Validation
First I'll make sure the data was read in correctly and is complete, with no null values.
There are 8 fields, and they seem to be read in the correct format.
Next I'll check for missing values.
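A minimal sketch of a missing-value check with pandas. The DataFrame is a toy stand-in for the real data, and every column name except `sender_amt` is an assumption made for illustration.

```python
import pandas as pd

# Toy stand-in for the transactions table. Only sender_amt is a column name
# taken from this write-up; sender_id and sender_currency are hypothetical.
df = pd.DataFrame({
    "sender_id": [1, 2, 3],
    "sender_currency": ["EUR", "GBP", "EUR"],
    "sender_amt": [100.0, 50.0, 75.0],
})

# Count nulls per column; a total of zero means no values are missing.
missing_per_column = df.isna().sum()
total_missing = int(missing_per_column.sum())
print(missing_per_column)
```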
Great, there are no missing values.
Next I'll check how many unique values are in each field.
We're working with a dataset of people sending money from 7 countries in 2 currencies, received in 3 currencies.
Exploratory Data Analysis
Next I'll do some exploratory analysis to better understand the data and see if there are any obvious trends.
The 2 numerical fields show there are 52,589 records. Since we're working with different currencies, the summary statistics aren't very valuable to us.
I need to check whether sender_amt is in the local currency or in a standardized currency such as USD. There are 2 sender currencies in this dataset: GBP and EUR.
The two distributions look similar and their ranges overlap significantly. However, the exchange rate is roughly 1.17 EUR to 1 GBP, so amounts in the two currencies would look much alike either way and the comparison doesn't settle the question. I'll assume the values are in 2 different currencies and create a new field that shows all values in Euro to keep it simple.
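The conversion could be sketched as follows, assuming a fixed rate of 1.17 EUR per GBP as quoted above. The column names `sender_currency` and `sender_amt_eur` are hypothetical, and a real analysis might prefer dated exchange rates over a single constant.

```python
import pandas as pd

# Assumed fixed rate from the text: about 1.17 EUR per 1 GBP.
GBP_TO_EUR = 1.17

# Toy rows; sender_currency is a hypothetical column name.
df = pd.DataFrame({
    "sender_currency": ["EUR", "GBP", "GBP"],
    "sender_amt": [100.0, 100.0, 50.0],
})

# Per-row multiplier: 1.0 for EUR amounts, GBP_TO_EUR for GBP amounts.
rate = df["sender_currency"].map({"EUR": 1.0, "GBP": GBP_TO_EUR})

# New field holding every amount expressed in Euro.
df["sender_amt_eur"] = df["sender_amt"] * rate
print(df)
```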
Next we'll look at the total number of transactions over time.
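One way to count transactions per day is to resample on the timestamp. This is a sketch on made-up timestamps; `created_at` is a hypothetical column name, not one confirmed by the dataset.

```python
import pandas as pd

# Made-up timestamps; created_at is a hypothetical column name.
df = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2023-01-01 09:00", "2023-01-01 17:30",
        "2023-01-02 08:15", "2023-01-04 12:00",
    ]),
})

# Number of transactions per calendar day. Days with no transactions
# appear with a count of 0, which keeps gaps visible in a line chart.
daily_counts = df.set_index("created_at").resample("D").size()
print(daily_counts)
```

Plotting `daily_counts` as a line chart then shows the trend and any spikes.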
We have data from 60 days, with a general upward trend in the number of transactions over time, suggesting the service is growing in popularity.
There are some seasonal patterns, which may require further investigation. There are also some spikes in demand, suggesting promotions or other circumstances that engage customers more.
Next let's look at top sender countries by amount sent.
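A sketch of ranking countries by amount sent and by number of transactions. The country codes and values are placeholders rather than real data, and `sender_country` and `sender_amt_eur` are assumed column names.

```python
import pandas as pd

# Placeholder rows; sender_country and sender_amt_eur are assumed names.
df = pd.DataFrame({
    "sender_country": ["AA", "BB", "AA", "CC"],
    "sender_amt_eur": [200.0, 150.0, 100.0, 50.0],
})

# Total amount sent per country, largest first.
top_by_amount = (
    df.groupby("sender_country")["sender_amt_eur"]
      .sum()
      .sort_values(ascending=False)
)

# Number of send transactions per country, largest first.
top_by_count = df["sender_country"].value_counts()

print(top_by_amount)
print(top_by_count)
```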
The top sender country by far is Italy, followed by Spain.
Top Send Countries
Finally, let's look at top sender countries by send transaction volume.
We see a similar trend as the send amount chart.
Top Receive Countries
Let's do the same analysis as above but for receive countries, starting with transaction volume.
Because we only have the recipient network, we have to derive the country name from it with some transformation.
Cameroon contributes 85% of total received transaction volume.
Customer Segmentation
Feature Selection
First we have to decide how we want to segment our customer base. What are the most important data points to segment these users?
Options include transaction volume, transaction frequency, or both, as well as country. We also need to decide whether to segment on sender or receiver attributes.
To determine the most important factors, we first have to define the goal of this segmentation.
Since these users are already customers, we don't have to segment for user acquisition, so I'll define the goal as:
a. optimize revenue, and
b. improve customer retention,
focusing on the sender's attributes, since senders pay the transaction costs and are our customers.
Assumptions:
a. higher send values are more valuable in terms of revenue for the company.
b. we can adapt the marketing campaign for each country just by changing the language, without considering other factors such as cultural differences.
Revenue Segmentation
First we'll calculate the total revenue generated by each sender, then we can segment them into 3 bins:
High-revenue (top third of revenue generated)
Medium-revenue (middle third of revenue generated)
Low-revenue (bottom third of revenue generated)
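The three bins above can be sketched with `pd.qcut`, under two assumptions: total amount sent stands in for revenue (per assumption a), and "thirds" means terciles of senders ranked by that total. The column names `sender_id` and `sender_amt_eur` are hypothetical.

```python
import pandas as pd

# Toy transactions; sender_id and sender_amt_eur are assumed column names.
tx = pd.DataFrame({
    "sender_id": [1, 1, 2, 3, 4, 5, 6],
    "sender_amt_eur": [50.0, 70.0, 20.0, 300.0, 10.0, 90.0, 500.0],
})

# Total sent per sender, used here as a proxy for revenue.
revenue = tx.groupby("sender_id")["sender_amt_eur"].sum().rename("total_sent_eur")

# Split senders into three equal-sized groups by total sent.
revenue_segment = pd.qcut(
    revenue,
    q=3,
    labels=["Low-revenue", "Medium-revenue", "High-revenue"],
)
print(revenue_segment)
```

Note that `pd.qcut` can fail with duplicate bin edges when many senders share the same total; ranking first or passing `duplicates="drop"` are possible workarounds.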
Retention Segmentation
Next we'll look at how customer retention can be grouped. We'll use the distribution of transaction counts for segmentation.
Since the majority of users have made only 1-2 transactions, the distribution is heavily skewed and wouldn't split into meaningful even groups. So we'll segment the users into the following fixed bins:
Rare: 1 transaction
Occasional: 2 transactions
Frequent: 3+ transactions
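These fixed bins map naturally onto `pd.cut` with explicit edges. A minimal sketch on toy data, where `sender_id` is an assumed column name:

```python
import numpy as np
import pandas as pd

# Toy transactions, one row per transaction; sender_id is an assumed name.
tx = pd.DataFrame({"sender_id": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]})

# Number of transactions per sender.
tx_count = tx.groupby("sender_id").size().rename("tx_count")

# Right-closed bins: (0, 1] Rare, (1, 2] Occasional, (2, inf) Frequent.
frequency_segment = pd.cut(
    tx_count,
    bins=[0, 1, 2, np.inf],
    labels=["Rare", "Occasional", "Frequent"],
)
print(frequency_segment)
```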
Combined Segmentation
Now that we have the 2 distributions, we can combine them.
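Combining the two labels could look like the sketch below: a cross-tabulation gives the number of senders in each combined segment, and a group-by gives the total amount sent per segment. The per-sender table and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical per-sender table with both segment labels already assigned.
senders = pd.DataFrame({
    "sender_id": [1, 2, 3, 4, 5],
    "revenue_segment": ["Low", "Low", "Medium", "High", "Low"],
    "frequency_segment": ["Rare", "Rare", "Occasional", "Frequent", "Frequent"],
    "total_sent_eur": [10.0, 20.0, 100.0, 400.0, 15.0],
})

# Number of senders in each revenue x frequency cell.
segment_sizes = pd.crosstab(senders["revenue_segment"], senders["frequency_segment"])

# Single combined label, useful for plotting and for per-segment totals.
senders["segment"] = senders["revenue_segment"] + " / " + senders["frequency_segment"]
segment_totals = senders.groupby("segment")["total_sent_eur"].sum()

print(segment_sizes)
print(segment_totals)
```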
Next we can visualize the combined segmentation.
As we see above, the largest segment is low-revenue, rarely transacting customers.
The segments can also be compared by total amount sent. The low-revenue, rare segment also sends the most money in aggregate, which is due to the number of users in it rather than to each person sending a lot of money.
With this data, we now know where to focus our attention.
Part 2: SQL Queries
Average Transaction Size
Monthly Growth in Gabon