User Segmentation Analysis
This is a dataset of user transactions in a peer-to-peer money transfer service.
I'll develop a user segmentation method to group users for targeting through marketing and sales strategies.
Data Validation
First I'll make sure the data is complete and doesn't contain any null values.
There are 8 fields, and they seem to be read in the correct format.
Next I'll check for missing values.
Great, there are no missing values.
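A minimal sketch of this kind of check, assuming the transactions are loaded into a pandas DataFrame. The toy table and every column name other than sender_amt are hypothetical stand-ins, not the real data.

```python
import pandas as pd

# Toy stand-in for the transactions table.
# Only sender_amt is a real field name; the others are hypothetical.
df = pd.DataFrame({
    "sender_id": [1, 2, 3],
    "sender_amt": [50.0, 120.0, 75.5],
    "sender_currency": ["EUR", "GBP", "EUR"],
})

# Count null values per column; all zeros means no missing data.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Single flag that is True if any column has at least one null.
has_missing = bool(missing_per_column.any())
```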
Next I'll check how many unique values are in each field.
We're working with a dataset of people in 7 countries sending money in 2 currencies, with the money received in 3 currencies.
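Counting distinct values per field can be sketched as below. The column names and toy values are hypothetical and only illustrate the call; they are not the real dataset.

```python
import pandas as pd

# Toy data with hypothetical column names.
df = pd.DataFrame({
    "sender_currency": ["EUR", "GBP", "EUR", "EUR"],
    "sender_country": ["Italy", "UK", "Spain", "Italy"],
    "receiver_currency": ["XAF", "XAF", "USD", "XAF"],
})

# Number of distinct values in each column.
unique_counts = df.nunique()
print(unique_counts)
```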
Exploratory Data Analysis
Next I'll do some exploratory analysis to better understand the data and see if there are any obvious trends.
The 2 numerical fields show there are 52,589 records. Since we're working with different currencies, the summary statistics aren't very valuable to us.
I need to validate whether sender_amt is in the local currency or in a standardized currency such as USD. There are 2 currencies in this dataset: GBP and EUR.
The values for the two currencies follow similar distributions and their ranges overlap significantly. That alone doesn't settle the question, because at roughly 1.17 EUR to 1 GBP the two currencies are close enough in value that local-currency amounts would look similar anyway. I'll assume the values are in 2 different currencies and create a new field that shows all values in Euro to keep it simple.
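A minimal sketch of the conversion, assuming a single fixed rate of 1.17 EUR per GBP (the figure quoted above) rather than daily rates. The column names sender_currency and sender_amt_eur are hypothetical.

```python
import pandas as pd

# Fixed conversion rates into EUR. Using one static rate is a simplification.
rates_to_eur = {"EUR": 1.0, "GBP": 1.17}

# Toy data; sender_currency is a hypothetical column name.
df = pd.DataFrame({
    "sender_amt": [100.0, 100.0],
    "sender_currency": ["EUR", "GBP"],
})

# Look up each row's rate and scale the amount.
# A currency missing from the mapping would produce NaN.
df["sender_amt_eur"] = df["sender_amt"] * df["sender_currency"].map(rates_to_eur)
print(df)
```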
Next we'll look at the total number of transactions over time.
We have 60 days of data, with a general upward trend in the number of transactions over time, suggesting the service is growing in popularity.
There are some seasonal patterns, which may require further investigation. There are also some spikes in demand, which may reflect promotions or other events that engage customers more.
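Daily transaction counts, plus a rolling mean to make the trend easier to see past weekly patterns, could be computed roughly as follows. The created_at column name and the timestamps are hypothetical, and the 7-day window is an assumption.

```python
import pandas as pd

# Toy timestamps; created_at is a hypothetical column name.
df = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2020-01-01 09:00", "2020-01-01 17:30", "2020-01-02 11:15",
        "2020-01-03 08:00", "2020-01-03 12:00", "2020-01-03 20:45",
    ]),
})

# Number of transactions per calendar day.
daily_counts = df.groupby(df["created_at"].dt.date).size()

# 7-day rolling mean; min_periods=1 keeps the first few days from being NaN.
rolling_mean = daily_counts.rolling(window=7, min_periods=1).mean()
print(daily_counts)
```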
Next let's look at top sender countries by amount sent.
The top sender country by far is Italy, followed by Spain.
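A ranking like this can be sketched with a groupby. The toy rows and the sender_country and sender_amt_eur column names are hypothetical; value_counts gives the matching ranking by number of transactions.

```python
import pandas as pd

# Toy data with hypothetical column names and values.
df = pd.DataFrame({
    "sender_country": ["Italy", "Spain", "Italy", "France"],
    "sender_amt_eur": [100.0, 80.0, 50.0, 20.0],
})

# Total amount sent per country, largest first.
amount_by_country = (
    df.groupby("sender_country")["sender_amt_eur"]
    .sum()
    .sort_values(ascending=False)
)

# Number of transactions per country, largest first.
volume_by_country = df["sender_country"].value_counts()
print(amount_by_country)
```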
Top Send Countries
Finally, let's look at top sender countries by send transaction volume.
We see a trend similar to the one in the send amount chart.
Top Receive Countries
Let's do the same analysis as above but for receive countries, starting with transaction volume.
Because the data only includes the recipient's network, we have to derive the country name from it.
Cameroon contributes 85% of total received transaction volume.
Customer Segmentation
Feature Selection
First we have to decide how we want to segment our customer base. What are the most important data points to segment these users?
Some options include by transaction volume or frequency (or both). Country? Do we want to segment based on sender or receiver attributes?
To determine the most important factors, we first have to define the goal of this segmentation.
Since these users are already customers we don't have to segment for user acquisition, so I'll define the goal as:
a. optimize for revenue, and
b. customer retention,
focusing on the sender's attributes, since they are paying the transaction costs and are our customers.
Assumptions:
a. higher send values are more valuable in terms of revenue for the company.
b. we can adjust the marketing campaign in each country by just adjusting language and don't have to consider any other factors such as cultural changes.
Revenue Segmentation
First we'll calculate the total revenue generated by each sender, then we can segment them into 3 bins:
High-revenue (top third of revenue generated)
Medium-revenue (middle third of revenue generated)
Low-revenue (bottom third of revenue generated)
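One way to sketch the three revenue bins is to split senders into equal-sized thirds by their total amount sent, which stands in for revenue under assumption (a). This reads "third" as a third of senders; splitting by share of cumulative revenue would be a different method. The sender_id and sender_amt_eur names are hypothetical.

```python
import pandas as pd

# Toy transactions; column names are hypothetical.
df = pd.DataFrame({
    "sender_id": [1, 1, 2, 3, 4, 5, 6],
    "sender_amt_eur": [10.0, 20.0, 50.0, 5.0, 200.0, 80.0, 15.0],
})

# Total sent per sender, used here as a proxy for revenue.
revenue = df.groupby("sender_id")["sender_amt_eur"].sum()

# Rank first so that ties cannot produce duplicate bin edges,
# then cut the ranks into three equal-sized groups.
revenue_segment = pd.qcut(
    revenue.rank(method="first"),
    q=3,
    labels=["Low", "Medium", "High"],
)
print(revenue_segment)
```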
Retention Segmentation
Next we'll look at how customers can be grouped by retention. We'll use the distribution of transaction counts for segmentation.
Since the majority of users have made 1-2 transactions, the distribution is heavily skewed and equal-sized bins wouldn't give a useful segmentation. So we'll segment the users into the following bins:
Rare: 1 transaction
Occasional: 2 transactions
Frequent: 3+ transactions
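The fixed bins above can be sketched with pd.cut on each sender's transaction count. The sender_id column name and toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy transactions, one row per transaction; sender_id is a hypothetical name.
df = pd.DataFrame({"sender_id": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4]})

# Number of transactions per sender.
txn_counts = df.groupby("sender_id").size()

# Right-inclusive bins: (0, 1] Rare, (1, 2] Occasional, (2, inf) Frequent.
frequency_segment = pd.cut(
    txn_counts,
    bins=[0, 1, 2, np.inf],
    labels=["Rare", "Occasional", "Frequent"],
)
print(frequency_segment)
```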
Combined Segmentation
Now that we have the 2 distributions, we can combine them.
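Combining the two labels could look like the sketch below, assuming each sender already carries a revenue label and a frequency label. The table, its values, and all column names are hypothetical.

```python
import pandas as pd

# Toy per-sender table with hypothetical column names and values.
segments = pd.DataFrame({
    "sender_id": [1, 2, 3, 4, 5],
    "revenue_segment": ["Low", "Low", "High", "Medium", "Low"],
    "frequency_segment": ["Rare", "Rare", "Frequent", "Occasional", "Occasional"],
    "total_sent_eur": [10.0, 15.0, 300.0, 60.0, 20.0],
})

# Single combined label per sender.
segments["segment"] = (
    segments["revenue_segment"] + " / " + segments["frequency_segment"]
)

# Number of senders in each revenue x frequency cell.
segment_sizes = pd.crosstab(
    segments["revenue_segment"], segments["frequency_segment"]
)

# Total amount sent by each combined segment.
segment_totals = segments.groupby("segment")["total_sent_eur"].sum()
print(segment_sizes)
```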
Next we can visualize the combined segmentation.
As we see above, the largest segment is low-revenue, rare transacting customers.
Here we can see the segments by total sender amount. The low-revenue, rare segment also sends the most money in total, which is due to the number of users in it, not because each person is sending a lot of money.
With this data, we now know where to focus our attention.