By Deepnote team on November 2, 2022
The past, present, and future of notebooks
Since Mathematica was released over 30 years ago, notebooks have evolved into the go-to tool for data scientists across academia and the workplace. In the next few years, we’re going to see a big shift in how we use notebooks for day-to-day work and the types of problems we’ll be able to solve. Let’s have a look at what’s to come...
Larry Tesler on the Xerox Alto, the first computer with a graphical user interface, courtesy of Xerox PARC
Exploratory data programming
To understand notebooks, we first have to understand exploratory programming.
Most of us are probably familiar with software engineering best practices, which have been studied in detail for decades. And there's a massive support ecosystem out there (think agile software development, git versioning, continuous integration, infrastructure providers) to help us take a prototype from mockup all the way to production.
But while this ecosystem is well adapted for building applications, it doesn't translate well into data science and analytics.
Historically, data scientists have had to rely on tools from the software engineering world and, as a result, have adopted the same mental models, the same workflows, and the same principles.
But the goal of a data scientist isn't always to ship a working product. Instead, data scientists spend most of their time on uncovering insights. Rather than a deployed app, their goal is to uncover and understand an underlying problem.
This insight is the output of an exploratory analysis and comes in multiple forms. Sometimes it's a chart. Sometimes it's a well-documented presentation. Sometimes it's simply an analyst's improved understanding of a domain.
Data scientists write code to uncover insights. And we call this exploratory programming.
Notebooks 1.0: The birth of the notebook
When it comes to exploratory programming, notebooks happened to be the perfect tool.
Unlike traditional code editors, notebooks allow users to run queries, write code, visualize datasets, and document thought processes as a proposal — all in one medium. The format itself significantly lowers the barrier to entry to exploratory programming. Beginners can jump into a notebook and start writing documents and building visualizations. And the format scales as the users begin introducing more complex models by writing code.
Mathematica 1.0, 1988
Introduced in 1988 with Wolfram's Mathematica, the first notebooks (Mathematica, MathCad, Maple) were very much focused on pure mathematics. They had to be. Data science as we know it today didn't exist back then. And for the most part, notebooks remained in the academic world of mathematics, statistics, and physics.
Mathematica was a niche product aimed at a technical crowd (which it remains to this day). The first notebooks were so specialized that they never became popular among a wider audience. Even today, many people are not familiar with notebooks. And that's because around the same time that notebooks emerged, a different idea started to gain much more traction.
The era of spreadsheets
Beginning with the first version of VisiCalc in 1979, spreadsheets became the killer app that popularized personal computers for business applications.
VisiCalc spreadsheet on an Apple II
In a story too long to tell here, Excel emerged as the market leader beating out VisiCalc, Lotus 1-2-3, and others. Excel's tabular view, along with declarative programming and reactive recomputing, provided an elegant model for working with data. It was so simple, it could be picked up by virtually anyone.
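To make "declarative programming and reactive recomputing" concrete, here is a minimal sketch in Python. It is a toy model, not how Excel is implemented: the `Sheet` class and its `set` and `get` methods are hypothetical names invented for this illustration, and formulas are simply recomputed every time they are read rather than tracked through a dependency graph.

```python
class Sheet:
    """Toy spreadsheet (hypothetical): cells hold constants or formulas over other cells."""

    def __init__(self):
        # Maps a cell name to either a constant or a formula (a function of the sheet).
        self._cells = {}

    def set(self, name, value_or_formula):
        self._cells[name] = value_or_formula

    def get(self, name):
        cell = self._cells[name]
        # Formulas are re-evaluated on every read, so a result always
        # reflects the current values of the cells it refers to.
        return cell(self) if callable(cell) else cell


sheet = Sheet()
sheet.set("A1", 10)
sheet.set("A2", 32)
# Declarative: we state what A3 is (like "=A1+A2"), not when to compute it.
sheet.set("A3", lambda s: s.get("A1") + s.get("A2"))

total_before = sheet.get("A3")
sheet.set("A1", 100)           # change an input...
total_after = sheet.get("A3")  # ...and the dependent cell follows automatically
```

The user never asks for A3 to be recalculated; changing A1 is enough. That is the property that makes a spreadsheet feel effortless.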
Seemingly overnight, spreadsheets took center stage and notebooks were relegated to a niche tool known only to a small number of people in even smaller circles.
While nearly anyone could access and grok a spreadsheet, it took a Ph.D. in mathematics or statistics to even get exposure to a notebook interface.
Until Jupyter that is…
On December 21, 2011, IPython 0.12 was released. Along with some features and bug fixes, the main highlight of this release was an all-new browser-based interactive notebook.
It was the beginning of the Jupyter notebook.
The environment retained all the features of the familiar console-based IPython while providing a cell-based execution workflow. It supported both code and any element a modern browser could display. You could create interactive computational documents with explanatory text, results of computations, figures, videos, etc.
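A cell-based execution workflow can be sketched in a few lines of Python. This is only an illustration of the idea, assuming that cells are plain source strings run in order against one shared namespace. The `run_cells` helper is a hypothetical name, and the real IPython notebook runs code in a separate kernel process rather than calling `exec` directly.

```python
import contextlib
import io


def run_cells(cells):
    """Run source strings one at a time in a shared namespace (toy model of a notebook)."""
    namespace = {}  # state persists here between cells, as it does in a notebook session
    outputs = []
    for source in cells:
        buffer = io.StringIO()
        # Capture whatever the cell prints so it can be shown under that cell.
        with contextlib.redirect_stdout(buffer):
            exec(source, namespace)
        outputs.append(buffer.getvalue())
    return outputs


cells = [
    "values = [1, 2, 3]",   # cell 1 defines data
    "total = sum(values)",  # cell 2 builds on cell 1
    "print(total)",         # cell 3 displays a result
]
outputs = run_cells(cells)
```

Each cell sees the variables defined by the cells run before it, and each cell's output is stored next to the cell that produced it. Those two properties are what let a notebook interleave code, results, and explanation in one document.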
The timing was perfect. More than 20 years after the emergence of the first notebooks, there was suddenly a market that needed to analyze large amounts of unstructured data, which was not possible within the confines of a spreadsheet.
A new generation of notebooks in the form of Jupyter started to take off. And today, like with spreadsheets some 30 years ago, you'd have a hard time finding a data science team not utilizing the notebook format in some shape or form.
So why are notebooks back?
Usability vs. power
When it comes to the adoption of a new technology or tool, there is often friction between user-friendliness and power.
This rule holds true in data science & analytics. On one side of the market, we have spreadsheets, which are so easy to use that Excel users don't even realize they're writing software. And on the other side, we have programming languages like Python, which have a much higher ceiling of what you can do with them but also have a much higher barrier to entry.
Spreadsheets have been the winning computing paradigm for exploratory programming over the past few decades. However, the same properties — reactivity and declarative-ness — that made spreadsheets popular are also a limiting factor when projects need to scale. The mix of code and data in the same document provides an initial productivity boost but comes at the cost of performance issues, lack of data provenance, and poor versioning.
That cost is known as the low ceiling. There's a limit to what you can do in spreadsheets.
Some of us know this from hitting Excel's hard limits, such as the infamous 1M row limit, and soft limits around collaboration. (You know what I'm talking about if you've ever tried explaining a slightly more complicated Excel model to someone.) It's no surprise that Excel users prefer to work alone. The cost of inviting someone else is often too high. And, as data sets grow, more and more users are starting to hit the ceiling.
When it comes to exploratory programming, we once had just two options: 1) explore data in spreadsheets or 2) jump to the other side of the spectrum and use traditional programming languages with IDEs.
The problem here is the gap between Excel and Python — we were missing a tool with the accessibility and simplicity of a spreadsheet that can also accommodate the scale of large data sets and the power of code.
Enter a new era of notebooks.
Modern notebooks are here to bridge the wide gap between spreadsheets and IDEs.
Notebooks are already powerful (pretty much as powerful as any programming environment) and are quickly turning from a tool for people with PhDs into a tool accessible to anyone. While it’s true that spreadsheets are also becoming more powerful (e.g., Excel is now Turing complete), notebooks are closing the usability gap from the other side, becoming as user-friendly as spreadsheets while keeping their greater power.
Looking back at how tooling evolves as new industries emerge, an interesting pattern stands out: tools are initially shaped by a few power users at the early stage of a market’s formation. As the market matures, features for the masses take center stage, lowering barriers to entry and leading to the widespread adoption and redefinition of a new market.
That doesn't mean that old tools die. On the contrary, their adoption often keeps growing, just not as fast as that of the new generation of tooling that makes the field accessible to newcomers.
To paint a picture, let’s take a look at the story of Figma...
The story of a design file
There's no better place to see the trend of collaboration taking over the world than the field of graphic design.
Much like data science, graphic design started as a specialized field. It took serious skill and dedication to become a great designer. And if a non-designer needed something visual, they'd have to go and ask the designers for help. It was also expensive for corporations to purchase a license for users who would barely use a fully-featured design tool.
But design couldn't be contained within a single design department.
A new generation of tools like Figma and Canva has democratized access to design. Everyone from designers to engineers to product managers can collaborate and contribute to the latest product mockups or comment on the new branding.
And while it’s true that Photoshop is not going away anytime soon (adoption might even slowly grow), a new generation of tools (e.g., Figma) is putting design into the hands of everyone across an organization at a much faster clip.
Data science & design
There's a lot to learn from the evolution of the design industry as we see a similar shift occurring in data science.
Much like design, data science can't be constrained within a single data science team. An entire organization benefits from access to data and insights.
While the data science and analytics tools we use now will not entirely go away, over time they will specialize to suit the needs of a few power users, much as Photoshop did. And we'll see a crop of new tools appear that radically expand access to data by lowering the barrier to entry.
Rather than becoming more beginner-friendly, tools like Jupyter will continue to serve an ever-narrower audience of high-demand power users. In this sense, it's unlikely that the current generation of notebooks will serve a much broader, less-technical user base.
It's the next generation of notebooks that will be accessible to a much broader audience and bring notebooks from just a few specialized professionals into everyone's hands.
As the data science and analytics market matures and the workforce expands, we'll start to see a real sea change.
With a lower barrier to entry, notebooks as a medium for data science will transform from an isolated tool to a collaborative medium. Notebooks will transition from a niche format available to a small subset of experts to a communication tool for the whole organization.
The user interface of notebooks will shift from being optimized for power users to a friendly user interface that anyone can use. Over time, notebooks will become as easy to work with as Notion or Google Sheets. Those able to read or modify a spreadsheet will be able to read and modify a notebook.
As more teams use notebooks with requirements for remote access to insights, we’ll see the transition from notebooks running locally to running in the cloud. Just like Figma made sharing and access to design files as simple as opening a URL in a browser, notebooks will do the same for data projects.
We'll see a shift from notebooks being isolated within a data science team to becoming a shared library of knowledge within an organization. Access to data will become more ubiquitous as integrations to the data warehouse and data tools will be set up in minutes and shared with anyone who has the right permissions.
Such a setup will also be more secure than a siloed approach, with no leaking of user data and PII through screenshots or emails. Management of passwords and secrets will be centralized, and workflows will be secure and compliant by default.
Designing a data science and analytics platform of the future
In the best data-driven organizations, data teams are not centralized, and data scientists are dispersed throughout the organization. Data scientists often don't even have the title of data scientist. They are domain experts, enabled to get the insights they need from a well-organized data platform.
In such an organization, collaboration doesn't happen just within data teams. The whole organization is a data team. This is not the standard today, but it's just a matter of time until data science becomes fully collaborative. Notebooks will be the perfect medium for that.
Deepnote: a notebook for everyone
We’re building a data notebook for the future at Deepnote. Our goal is to make data science and analytics more accessible by focusing on three core pain points:
- Collaboration: Data science and analytics are collaborative by nature, so we made collaboration a first-class citizen in Deepnote. We envisioned a collaborative space where sharing the work of data teams is as simple as sending a link, analysis is done in real time with multiplayer mode if needed, and everything is organized and hosted in a single place.
- Connectivity: We built Deepnote to play nicely with the modern data stack because meaningful insights don’t come from fussing with integrations. Since launch we’ve added dozens of native integrations and will continue to make accessibility to data a non-issue.
- Productivity: We launched Deepnote with the ability to write clean code, define dependencies, and create reproducible notebooks. Deepnote gives data teams superpowers.
We’re building a new standard in data tooling — a notebook that brings teams together to explore, analyze and present data from start to finish. And we hope you join us.