
Customer churn ML playbook

By Katerina Hynkova

Updated on October 21, 2025

End-to-end machine learning workflow for predicting customer churn using behavioral, engagement, and demographic features. The analysis explores key churn indicators across sessions, NPS, tenure, and plan types, then builds interpretable models (Logistic Regression and Random Forest) achieving ~77% accuracy and 0.79 AUC. Includes feature importance analysis, partial dependence plots, and customer segmentation via PCA clustering.


Data patterns and signals

The exploratory analysis reveals clear behavioral differences between users who churn and those who stay.

Engagement metrics show the strongest separation. Churners concentrate below 6 sessions per week, while active users spread across higher frequencies. The sessions distribution is bimodal, suggesting distinct user archetypes. Days since last login correlates strongly with churn—users who've been inactive for extended periods are substantially more likely to leave.
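A quick way to see this separation is to compare medians of the engagement columns by churn status. The sketch below uses a toy frame with the same column names the playbook uses later (`sessions_per_week`, `days_since_last_login`); the values are illustrative, not the template's data.

```python
import pandas as pd

# Toy frame with illustrative values; column names match the playbook's features
df = pd.DataFrame({
    "churned": [1, 1, 1, 0, 0, 0, 0, 0],
    "sessions_per_week": [2, 4, 5, 8, 10, 7, 12, 9],
    "days_since_last_login": [30, 21, 14, 2, 1, 4, 0, 3],
})

# Median engagement by churn status: churners sit well below stayers on
# sessions and well above them on days since last login
summary = df.groupby("churned")[["sessions_per_week", "days_since_last_login"]].median()
print(summary)
```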

NPS distributions diverge noticeably. Active users cluster around positive scores (median ~30), while churners show a wider spread with concentration at lower values, including a visible tail of detractors. This aligns with the correlation heatmap, where NPS shows negative correlation with churn alongside sessions per week and tenure months.

Plan type and acquisition channel matter. The Sankey flow diagram illustrates that referral and organic channels tend to feed into Premium and VIP plans with better retention. Free plan users naturally show higher churn rates, though this may reflect both product-market fit and conversion funnel dynamics. Monthly spend increases monotonically from Free to VIP as expected, but the relationship with churn isn't linear—some high-spending users still leave.

Geographic and demographic variation exists but isn't dominant. The treemap of user composition by region shows that EMEA and Americas have different plan-type distributions, with some markets skewing toward Free/Basic tiers. However, country features appear lower in the permutation importance rankings, suggesting engagement patterns outweigh geography for prediction.

Model performance and insights

Two models were trained in scikit-learn behind a shared preprocessing pipeline: Logistic Regression as an interpretable baseline and Random Forest to capture non-linear interactions.

Both models perform similarly. Logistic Regression achieves 76% accuracy with F1 of 0.564 and AUC of 0.794. Random Forest reaches 77% accuracy with F1 of 0.531 and AUC of 0.789. The ROC curves nearly overlap, suggesting that linear relationships capture most of the signal. The precision-recall curves show the typical tradeoff—both models can be tuned depending on whether false positives or false negatives are more costly for the business.
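The two-model comparison can be sketched as follows. This uses synthetic data from `make_classification` as a stand-in for the churn table, so the exact scores will differ from the ~0.79 AUC reported above; the structure (scaled linear baseline vs. forest, compared on accuracy and ROC AUC) is the point.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn table (~30% churners)
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.7, 0.3], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

models = {
    # Scaling matters for the linear model; the forest is scale-invariant
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "rf": RandomForestClassifier(n_estimators=200, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_te, model.predict(X_te)),
        "auc": roc_auc_score(y_te, proba),
    }
    print(name, results[name])
```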

Feature importance highlights what drives predictions. Permutation importance identifies player_level, sessions_per_week, days_since_last_login, tenure_months, and NPS as top predictors. Country features (Germany, Italy, France) appear but with lower importance. Behavioral engagement consistently outranks demographics.
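Permutation importance measures how much a metric degrades when one feature's values are shuffled on held-out data. A minimal sketch, again on synthetic data with the playbook's feature names attached for readability:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Feature names taken from the playbook; the data itself is synthetic
feature_names = ["player_level", "sessions_per_week", "days_since_last_login",
                 "tenure_months", "nps"]
X, y = make_classification(n_samples=800, n_features=5, n_informative=4,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Shuffle each column in turn and measure the drop in held-out AUC
result = permutation_importance(rf, X_te, y_te, n_repeats=10,
                                scoring="roc_auc", random_state=0)
ranking = sorted(zip(feature_names, result.importances_mean),
                 key=lambda t: -t[1])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because the shuffling happens on the test split, the ranking reflects what the model actually relies on for generalization, not just what it saw during training.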

Partial dependence plots quantify relationships. For tenure_months, churn probability drops sharply in the first 10 months then plateaus. Sessions per week shows a steady negative relationship—more sessions mean lower churn risk. Days since last login has a strong positive effect that accelerates after ~20 days of inactivity.

The confusion matrix shows room for threshold tuning. The Random Forest correctly identifies 128 of 139 non-churners (92%) but only catches 35 of 61 churners (57%). Depending on intervention costs, adjusting the classification threshold could shift this balance toward higher recall at the expense of precision.
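The threshold adjustment itself is a one-liner on the predicted probabilities: lowering the cutoff flags more users, which can only raise recall (at a cost to precision). A sketch on synthetic data, with a hypothetical cutoff of 0.3:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.7, 0.3], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

rf = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)
proba = rf.predict_proba(X_te)[:, 1]

recalls = {}
for threshold in (0.5, 0.3):  # default cutoff vs. a recall-oriented one
    pred = (proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    recalls[threshold] = tp / (tp + fn)
    print(f"threshold={threshold}: flagged={pred.sum()}, "
          f"recall={recalls[threshold]:.2f}")
```

If a retention offer is cheap relative to a lost customer, the lower threshold is usually the better operating point; the confusion matrix at each cutoff makes that tradeoff explicit.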

Segmentation via PCA and KMeans reveals three broad user clusters. When overlaid with churn labels, certain segments show higher churn concentrations, suggesting that retention strategies could be segment-specific rather than one-size-fits-all. The clustering provides a foundation for targeted experiments.
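The segmentation step can be sketched as: standardize, project to two principal components, cluster, then overlay churn rates per cluster. Synthetic data again stands in for the real feature table, and the three-cluster choice mirrors the playbook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn feature table
X, y = make_classification(n_samples=500, n_features=8, random_state=2)

# Standardize, then project to 2 components for clustering and plotting
coords = make_pipeline(StandardScaler(), PCA(n_components=2)).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=2).fit_predict(coords)

# Churn rate per cluster: segments with elevated rates are retention targets
rates = {c: float(y[labels == c].mean()) for c in range(3)}
for cluster, rate in rates.items():
    print(f"cluster {cluster}: churn rate {rate:.2f}")
```

Clustering in PCA space keeps the segments visualizable; clustering on the full standardized feature set is an equally valid variant if interpretability of the axes matters less.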
