This article is interactive. You can change its parameters and see the results live. ⚡️
User Retention Charts
1. Read the data
Saved to variable
floor(extract('day'from session_started_at - signed_up_at)/7)as week -- The number of weeks that passed since the user signed upfrom
leftjoin users on sessions.user_id = users.user_id
2. Select the time range
We have data for the last 365 days, but we don't usually need to look so far back. Let's restrict our analysis to a specific time range.
import pandas as pd
# Round these time ranges to the nearest week
week_start = pd.to_datetime(StartDate).to_period('W').to_timestamp().tz_localize('UTC')
week_end = pd.to_datetime(EndDate).to_period('W').to_timestamp().tz_localize('UTC')# Filter out any cohorts outside our time range
sessions_weekly = sessions_weekly.loc[(sessions_weekly.signed_up_at_week >= week_start)&(sessions_weekly.signed_up_at_week <= week_end)].copy()
retention = sessions_weekly.copy()# Save the cohort size before we start calculating retention
cohort_size = retention.groupby(['signed_up_at_week']).user_id.nunique().reset_index(name='cohort_size')# For each cohort-week, calculate the number of users who visited
retention = retention.groupby(['signed_up_at_week','week']).user_id.nunique().reset_index(name='n_users')# Pivot and melt the table. This is a little trick that allows us to add rows during weeks where a cohort was not active.
retention = retention.pivot(index=['signed_up_at_week'],columns='week',values='n_users').fillna(0)
retention = retention.melt(value_name='n_users',ignore_index=False).reset_index()# If part of the cohort is still not finished the week, then exclude that cohort-week from the data# To do this we add 6 days to the sign up date. This gives us the last sign ups of that cohort.
retention = retention.loc[~(retention.signed_up_at_week + pd.to_timedelta(retention.week +1,'W')+ pd.Timedelta(6,'D')>
pd.Timestamp.now(tz='UTC').floor('D'))]# Divide by the cohort size to get a percentage
retention = retention.merge(cohort_size,on=['signed_up_at_week'])
retention['prop']= retention.n_users / retention.cohort_size
# Rename the columns so they are more pretty
retention = retention.rename(columns =dict(
n_users='Number of users',
Let's start by plotting a time series of each cohort's retention rate. This is useful for identifying changes over time. For example, we can see that our cohorts in September are performing much better than the ones in May through August.
Sometimes it's helpful to look at the number of users, not just the retention rate. It's also useful to visualize retention as a matrix, to find any outliers.
import altair as alt
x=alt.X('Week:N', title='Week', axis=alt.Axis(labelAngle=0)),
alt.Text('Number of users',format=''),)(rect+text).properties(data=retention, width=800, height=500)