
The past, present & future of notebooks

By Jakub Jurovych

Updated on November 2, 2022

Data notebooks have come a long way since their introduction. Here's how we got here, where the market stands today, and predictions for the future.


Since Mathematica was released over 30 years ago, notebooks have evolved into the go-to tool for data scientists across academia and the workplace.

In the next few years, we’re going to see a big shift in how we use notebooks for day-to-day work and the types of problems we’ll be able to solve. Let’s have a look at what’s to come.

Larry Tesler on Xerox Alto, the first computer with a graphical user interface, courtesy of Xerox PARC

Exploratory data programming

To understand notebooks, we first have to understand exploratory programming.

Most of us are probably familiar with software engineering best practices, which have been studied in detail for decades. There's a massive support ecosystem out there (think agile software development, Git versioning, continuous integration, infrastructure providers, etc.) to help us take a rough prototype all the way to production.

But while this ecosystem is well adapted for building applications, it doesn't translate well into data science and analytics.

Historically, data teams have had to rely on tools from the software engineering world and, as a result, have adopted the same mental models, workflows, and principles.

But the goal of a data professional isn't always to ship a working product. Instead, data professionals spend most of their time uncovering insights. Rather than a deployed app, their goal is to surface and understand an underlying problem.

That insight is the output of exploratory analysis, and it comes in multiple forms. Sometimes it's a chart. Sometimes it's a well-documented presentation. Sometimes it's simply an analyst's improved understanding of a domain.

Data teams write code to uncover insights. And we call this exploratory programming.

Notebooks 1.0: The birth of the notebook

When it comes to exploratory programming, notebooks happened to be the perfect tool.

Unlike traditional code editors, notebooks allow users to run queries, write code, visualize data sets, and document their thought process in prose, all in one medium. The format itself significantly lowers the barrier to entry to exploratory programming. Beginners can jump into a notebook and start writing documents and building visualizations. And the format scales as users introduce more complex models in code.

Mathematica 1.0, 1988

Introduced in 1988 with Wolfram Mathematica, the first notebooks (Mathematica, Mathcad, Maple) were very much focused on pure mathematics. They had to be: data science as we know it today didn't exist back then. And for the most part, notebooks remained in the academic world of mathematics, statistics, and physics.

Mathematica was a niche product aimed at a technical crowd (which it remains to this day). The first notebooks were so specialized that they never became popular with a wider audience. Even today, many people are not familiar with notebooks. And that's because, by the time notebooks emerged, a different idea had already gained far more traction.

The era of the spreadsheet

Starting with the first versions of VisiCalc in 1979, spreadsheets became the killer app that popularized personal computers for business use.

VisiCalc spreadsheet on an Apple II

In a story too long to tell here, Excel emerged as the market leader, beating out VisiCalc, Lotus 1-2-3, and others. Excel's tabular view, along with declarative programming and reactive recomputing, provided an elegant model for working with data. It was so simple it could be picked up by virtually anyone.

Seemingly overnight, spreadsheets took center stage and notebooks were relegated to a niche tool known only to a small number of people in even smaller circles.

While nearly anyone could access and grok a spreadsheet, it took a Ph.D. in mathematics or statistics to even get exposure to a notebook interface.

Until Jupyter, that is.

Notebooks 2.0

On December 21, 2011, IPython 0.12 was released. Along with assorted features and bug fixes, the main highlight of this release was an all-new browser-based interactive notebook.

It was the beginning of the Jupyter notebook.

IPython 0.12

The environment retained all the features of the familiar console-based IPython while providing a cell-based execution workflow. It supported code alongside any element a modern browser could display, so you could create interactive computational documents mixing explanatory text, computation results, figures, videos, and more.
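
To make that concrete, here is the kind of cell a notebook user might run: the code, its computed result, and a chart all live in one document, with outputs rendering inline below the cell. This is only an illustrative sketch; the file name and column names are hypothetical.

    # One notebook cell: load data, compute, and plot. The printed result
    # and the chart render inline below the cell. "sales.csv" and its
    # columns are hypothetical, for illustration only.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("sales.csv")                   # tabular data
    summary = df.groupby("region")["revenue"].sum() # aggregate by region
    print(summary)                                  # result appears under the cell

    summary.plot(kind="bar", title="Revenue by region")
    plt.show()                                      # figure displays inline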

The timing was perfect. More than 20 years after the emergence of the first notebooks, there was suddenly a market that needed to analyze large amounts of unstructured data, which was not possible within the confines of a spreadsheet.

A new generation of notebooks in the form of Jupyter started to take off. And today, like with spreadsheets some 30 years ago, you'd have a hard time finding a data science team not utilizing the notebook format in some shape or form.

So why are notebooks back?

Usability vs. power

When it comes to the adoption of a new technology or tool, there is often friction between user-friendliness and power.

This rule holds true in data science and analytics. On one side of the market, we have spreadsheets, which are so easy to use that Excel users don't even realize they're writing software. And on the other side, we have programming languages like Python, which have a much higher ceiling of what you can do with them but also have a much higher barrier to entry.

Spreadsheets have been the winning computing paradigm for exploratory programming over the past few decades. However, the very properties that made spreadsheets popular, reactivity and declarativeness, are also limiting factors when projects need to scale. The mix of code and data in the same document provides an initial productivity boost but comes at the cost of performance issues, lack of data provenance, and poor versioning.

That cost is the spreadsheet's low ceiling: there's a hard limit to what you can do in one.

Some of us know this from hitting Excel's hard limits (such as the infamous cap of 1,048,576 rows) and its limitations around collaboration (you know what I'm talking about if you've ever tried explaining a slightly complicated Excel model to someone). It's no surprise that Excel users prefer to work alone; the cost of inviting someone else is often too high. And as data sets grow, more and more users are hitting the ceiling.
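
To illustrate the difference in ceilings, here's a minimal sketch of how a few lines of code can stream through a file far larger than any spreadsheet allows, aggregating chunk by chunk so memory use stays bounded. The file name and column are hypothetical.

    # Stream a CSV that would blow past a spreadsheet's row limit,
    # summing one column in chunks so memory use stays bounded.
    # "events.csv" and the "amount" column are hypothetical.
    import pandas as pd

    chunks = pd.read_csv("events.csv", chunksize=1_000_000)
    total = sum(chunk["amount"].sum() for chunk in chunks)
    print(f"Total across all rows: {total:,.2f}")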

When it comes to exploratory programming, we once had just two options:

  1. Explore data in spreadsheets
  2. Jump to the other side of the spectrum and use traditional programming languages with IDEs

The problem here is the gap between Excel and Python — we were missing a tool with the accessibility and simplicity of a spreadsheet that can also accommodate the scale of large data sets and the power of code.

Enter a new era of notebooks.

Notebooks 3.0

Modern notebooks are here to bridge the wide gap between spreadsheets and IDEs.

Notebooks are already powerful (pretty much as powerful as any programming environment) and are quickly turning from a tool for people with Ph.D.s into a tool accessible to anyone. And while spreadsheets are gaining power too (Excel is now Turing complete thanks to its LAMBDA function), notebooks are becoming just as user-friendly as spreadsheets while already being more capable.

Looking back at how tooling evolves as new industries emerge, an interesting pattern stands out: tools are initially shaped by a few power users during a market's formation. As the market matures, features for the masses take center stage, lowering barriers to entry and leading to widespread adoption that redefines the market.

That doesn't exactly mean that old tools die. On the contrary, adoption oftentimes keeps growing. But not as fast as the new generation of tooling that makes the field accessible to newcomers.

To paint a picture, let’s take a look at the story of Figma.

photoshop-and-figma.jpg

The story of a design file

There's no better place to see the trend of collaboration taking over the world than the field of graphic design.

Much like data science, graphic design started as a specialized field. It took serious skill and dedication to become a great designer. And if a non-designer needed something visual, they'd have to go and ask the designers for help. It was also expensive for companies to purchase licenses for users who would barely touch a full-featured design tool.

But design couldn't be contained within a single design department.

A new generation of tools like Figma and Canva has democratized access to design. Everyone from designers to engineers to product managers can collaborate and contribute to the latest product mockups or comment on the new branding.

And while it's true that Photoshop is not going away anytime soon (its adoption may even keep growing slowly), a new generation of tools is putting design into the hands of everyone across an organization at a far faster clip.

Data & design

There's a lot to learn from the evolution of the design industry, as we see a similar shift occurring in the data space.

Much like design, data work can't be constrained within a single team. An entire organization benefits from access to data and insights.

While the data science and analytics tools we use now will not entirely go away, over time they will specialize to suit the needs of a few power users, much as Photoshop did. And we'll see a crop of new tools appear that radically expand access to data by lowering the barrier to entry.

Rather than becoming more beginner-friendly, tools like Jupyter will continue to serve an ever-narrower audience of demanding power users. In this sense, it's unlikely that the current generation of notebooks will ever serve a much broader, less technical user base.

It's the next generation of notebooks that will reach a much broader audience, taking notebooks out of the hands of a few specialized professionals and putting them into everyone's.

Market predictions

As the data science and analytics market matures and the workforce expands, we'll start to see a real sea change.

With a lower barrier to entry, notebooks as a medium for analysis will transform from an isolated tool to a collaborative medium. Notebooks will transition from a niche format available to a small subset of experts to a communication tool for the whole organization.

Notebook interfaces will shift from being optimized for power users to being friendly enough for anyone to use. Over time, notebooks will become as easy to work with as Notion or Google Docs. Anyone able to read or modify a spreadsheet will be able to read and modify a notebook.

As more teams need remote access to insights, notebooks will transition from running locally to running in the cloud. Just as Figma made sharing and accessing design files as simple as opening a URL in a browser, notebooks will do the same for data projects.

We'll see a shift from notebooks being isolated within a data science team to becoming a shared library of knowledge across an organization. Access to data will become more ubiquitous, with integrations to the data warehouse and other data tools set up in minutes and shared with anyone who has the right permissions.

Such a setup will also be more secure than a siloed approach: no leaking of user data and PII through screenshots or emails. Management of passwords and secrets will be centralized, and workflows will be secure and compliant by default.


Designing the data science & analytics platform of the future

In the best data-driven organizations, data teams are not centralized — experts are dispersed throughout the organization. Data scientists often don't even have the title of data scientist. They're domain experts, enabled to get the insights they need from a well-organized data platform.

In such an organization, collaboration doesn't happen just within data teams. The whole organization is a data team. This is not the standard today, but it's just a matter of time until data work becomes fully collaborative. Notebooks will be the perfect medium for that.

Deepnote: a notebook for everyone

We’re building a data notebook for the future at Deepnote. Our goal is to make data science and analytics more accessible by focusing on three core pain points:

1. Collaboration

Data science and analytics are collaborative by nature, so we made collaboration a first-class citizen in Deepnote. We envisioned a collaborative space where sharing the work of data teams is as simple as sending a link, analysis is done in real time with multiplayer mode if needed, and everything is organized and hosted in a single place.

2. Connectivity

We built Deepnote to play nicely with the modern data stack because meaningful insights don’t come from fussing with integrations. Since its launch, we’ve added dozens of native integrations to Deepnote and will continue to make access to data a non-issue.

3. Productivity

We launched Deepnote with the ability to write clean code, define dependencies, and create reproducible notebooks. Deepnote gives data teams superpowers.

We’re building a new standard in data tooling — a notebook that brings teams together to explore, collaborate on, and share data from start to finish. We hope you'll join us.

Jakub Jurovych

CEO @ Deepnote

Follow Jakub on LinkedIn

