
Collaborative Jupyter notebook: a checklist for data teams

By Eric Wendt

Updated on April 10, 2023

Searching for a collaborative Jupyter notebook? Start by understanding which boxes need to be checked for your data team.


Spend enough time on an online message board built for data professionals and you’ll spot it: someone on the hunt for a collaborative Jupyter notebook.

They’re struggling to share their notebooks with team members. They can’t collaborate with teammates in real time. And giving and receiving feedback is a chore.

So they send out an SOS to the community and the suggestions come flooding in: various hacks, Jupyter notebook extensions, and web apps. But these workarounds are often stopgap measures, and they only move the needle so far.

To solve the underlying problem, it’s helpful to understand what a collaborative Jupyter notebook actually looks like. Here’s a checklist of what you need to help your data team work better together.


Item #1: shareability

Sharing a notebook with team members should be as simple as sending a link. Unfortunately, that’s not the case with Jupyter. Instead, it’s usually a multi-step process.

You’re either downloading your notebooks as PDFs and sending them to a teammate (stripping out all interactivity and reproducibility) or asking them to open .ipynb files on their local machine, which requires significant setup (database connections, environment configuration, etc.). There’s also the nbconvert-to-HTML dance, but it’s slow, and the exported file, outputs and all, still has to be hosted or passed around, which leaves much to be desired in terms of security. And while JupyterHub sounds nice in theory, it means provisioning and managing your own servers.

There are options that make it easier to quickly view notebooks (e.g., GitHub, nbviewer, Binder), but the notebooks themselves are either static documents or only exist in isolated environments that prevent meaningful collaboration.

This was an issue the team at Webflow struggled with. Team members tried sharing individual GitHub repositories to collaborate, but the friction of that workflow had the opposite effect: it discouraged collaboration.

“No one actively collaborated on their code,” said Webflow’s Senior Manager of Data Science & Analytics Allie Russell.

That’s when the team started using a cloud-based data notebook.

“There’s nothing that filled the space in our stack that allowed us to do an analysis and share it in a repeatable way until we adopted it,” Russell said.

Shareability is also crucial for working with business stakeholders. Non-technical team members may struggle to navigate a standard Jupyter notebook, which is why so many data professionals end up wasting time porting screenshots from their notebooks into other documents.

The data team at Gusto wanted to avoid this, so it opted for a data notebook that makes it easy to publish notebooks as shareable articles, dashboards, and apps.

“Because we partner with many people across the company — we might have 200 people viewing our analysis — the ability to publish notebooks into an article using Deepnote has been a game changer,” said Gusto’s Head of People Analytics & Insights Scott Jacobsen. “Now all of our important stakeholders can work and interact with the article.”

As Jacobsen put it, Gusto is no longer “copying and pasting screenshots from external sources.”


Item #2: reproducibility

Team members should be able to jump right into your notebook and reproduce it with zero friction.

But with Jupyter notebooks, everything your teammates need to replicate your work — datasets, environments, files, etc. — is stuck on your local machine.

Teammates have to find the right credentials, hope they have access to the same tables, connect to the same data sources, install the very same Python and system libraries, and so on. And that process must be followed for each and every team member who needs to take a peek at your notebook.

Long story short, teams end up spending more time setting up than collaborating.

Cloud-based data notebooks allow team members to share integrations and environments, eliminating the hassle of repeatedly connecting to databases and managing library differences.

Said Webflow’s Allie Russell: “Currently, I have a team member on leave. Deepnote allows me to look into her work without having to understand her environment or running into a bunch of errors, and that’s pretty powerful.”


Item #3: interactivity

Team members should be able to share the same environment simultaneously, whether to code together or simply give each other feedback.

But that’s not possible with a Jupyter notebook. Teammates must take turns editing code, saving the new version of the notebook, sharing it, opening it back up, and so on, ad nauseam. Meanwhile, any conversations teammates want to have about the analysis have to happen on other channels.

It’s even worse when you’re working with non-data folks. Without a way to easily bring business stakeholders into the exploratory analysis phase, organizations end up waiting until a project is finished to share it for feedback — opening the door to mistakes, missed opportunities, and being forced to start back at square one. We’ve all suffered the indignity of polishing something up, sharing it, and then getting a barrage of change requests (that final output quickly stops feeling so final).

Cloud-based notebooks, on the other hand, allow multiple team members to work in the same environment at the same time for real-time collaboration, or to leave comments for each other within the notebook for asynchronous teamwork. That’s the kind of collaborative layer the team at Slido was after when it went looking for an alternative to the status quo.

“Since metrics require a lot of input from subject matter experts, data consumers, and business stakeholders to define and align on definitions, we needed a collaborative layer where we could get immediate feedback,” said Slido’s Head of Analytics Engineering Michal Koláček.

Webflow’s Allie Russell agrees.

“Most other tools are made for the final version of a product — they skip past intermediate steps where feedback and buy-in are critical,” Russell said. “To be able to bring people along with the data work, especially remotely, is hugely valuable.”


Item #4: discoverability

Data collaboration isn’t a one-time event; it’s a continuous process. Teams should be able to save and organize notebooks into searchable repositories, not as static documents, but as fully executable notebooks.

But Jupyter notebooks make organization and discoverability difficult. You can store them in a GitHub repository, but there they’re static documents: great for version control, not so much for organizing interrelated projects.

The team at Gusto faced a similar problem. Team members had their work spread across multiple tools — from SQL editors to Tableau. Visibility into each other’s work suffered, and valuable time was wasted looking for relevant analyses.

But that’s no longer the case now that Gusto is using a cloud-based data notebook. Both technical and non-technical team members can easily find the work of their teammates and, based on the permissions they’ve been granted, access notebooks as viewers, contributors, editors, and more.

“It brings the team together in a central platform to collaborate on code,” Gusto’s Scott Jacobsen said.

This way, teammates can keep track of each other’s projects and build on each other’s work over time instead of trying to reinvent the wheel.

“We can all see each other’s code and it makes writing new analyses far easier since we’re never starting from scratch,” Jacobsen said.


A collaborative Jupyter notebook isn’t a Jupyter notebook at all

When you understand which features and functionalities are necessary for a collaborative data notebook, it becomes clear that Jupyter isn’t the best option.

Offline notebooks aren’t designed to be collaborative, and there’s no comprehensive fix for that. What data teams need is a notebook that moves them from the endless back-and-forth of locally hosted notebooks to the seamless collaboration of online notebooks.

Cloud-based notebooks make it easy to share your work with others, reproduce it, work on it together at the same time, and organize it for team members and stakeholders to access when they need it.

The answer to how to collaborate with a Jupyter notebook is surprisingly straightforward: Use a notebook that’s built to be collaborative instead.

See what a collaborative Jupyter notebook looks like with Deepnote

Get started for free to see how you can explore, collaborate on, and share data with ease.

Eric Wendt

Senior Content Editor @ Deepnote

© Deepnote