User Retention Charts
1. Read the data
select
sessions.user_id,
date_trunc('week',users.signed_up_at) as signed_up_at_week,
floor(extract('day' from session_started_at - signed_up_at)/7) as week -- The number of weeks that passed since the user signed up
from
sessions
left join users on sessions.user_id = users.user_id
2. Select the time range
We have data for the last 365 days, but we don't usually need to look so far back. Let's restrict our analysis to a specific time range.
StartDate
EndDate
import pandas as pd
# Round these time ranges to the nearest week
week_start = pd.to_datetime(StartDate).to_period('W').to_timestamp().tz_localize('UTC')
week_end = pd.to_datetime(EndDate).to_period('W').to_timestamp().tz_localize('UTC')
# Filter out any cohorts outside our time range
sessions_weekly = sessions_weekly.loc[
(sessions_weekly.signed_up_at_week >= week_start) & (sessions_weekly.signed_up_at_week <= week_end)
].copy()
df = sessions_weekly.copy()
df.info()
3. Calculate retention
retention = sessions_weekly.copy()
# Save the cohort size before we start calculating retention
cohort_size = retention.groupby(['signed_up_at_week']).user_id.nunique().reset_index(name='cohort_size')
# For each cohort-week, calculate the number of users who visited
retention = retention.groupby(['signed_up_at_week','week']).user_id.nunique().reset_index(name='n_users')
# Pivot and melt the table. This is a little trick that allows us to add rows during weeks where a cohort was not active.
retention = retention.pivot(index=['signed_up_at_week'],columns='week',values='n_users').fillna(0)
retention = retention.melt(value_name='n_users',ignore_index=False).reset_index()
# If part of the cohort is still not finished the week, then exclude that cohort-week from the data
# To do this we add 6 days to the sign up date. This gives us the last sign ups of that cohort.
retention = retention.loc[
~(retention.signed_up_at_week + pd.to_timedelta(retention.week + 1,'W') + pd.Timedelta(6,'D') >
pd.Timestamp.now(tz='UTC').floor('D'))
]
# Divide by the cohort size to get a percentage
retention = retention.merge(cohort_size,on=['signed_up_at_week'])
retention['prop'] = retention.n_users / retention.cohort_size
4. Visualize
# Rename the columns so they are more pretty
retention = retention.rename(columns = dict(
prop='Retention',
week='Week',
signed_up_at_week='Cohort',
n_users='Number of users',
cohort_size='Cohort size'
))
Let's start by plotting a time series of each cohort's retention rate. This is useful for identifying changes over time. For example, we can see that our cohorts in September are performing much better than the ones in May through August.
Sometimes it's helpful to look at the number of users, not just the retention rate. It's also useful to visualize retention as a matrix, to find any outliers.
import altair as alt
base=alt.Chart().encode(
x=alt.X('Week:N', title='Week', axis=alt.Axis(labelAngle=0)),
y=alt.Y('monthdate(Cohort):N', title='Cohort')
)
rect=base.mark_rect(tooltip=True).encode(
alt.Color('Retention', scale=alt.Scale(scheme='greenblue'))
)
text=base.mark_text(tooltip=True, color='white').encode(
alt.Text('Number of users', format=''),
)
(rect+text).properties(data=retention, width=800, height=500)