– by Deepnote team on January 6, 2022

How Not To Draw An Owl

Chances are you've experienced the initial excitement of learning something new followed quickly by a stark realization — you are not familiar with the prerequisites. You've actually got a mountain to climb! This article uses Great Expectations to highlight how Deepnote naturally lends itself to effective, context-based learning.

I've been thinking a lot lately about how to effectively learn new skills and technologies. I was recently studying data testing with Great Expectations. They have solid documentation, a human-readable CLI, automatically generated and narrated notebooks, and so much more. Data teams could hardly expect a better foundation upon which to learn how to test their data.

This experience did, however, remind me that as tools become more composable, onboarding can get harder, because tools don't exist in a vacuum. For example, if tool B depends on setting up tools A and C, we quickly get to a "how to draw an owl" situation (the meme in which step one is to draw two circles and step two is to draw the rest of the owl), hence the memetic title.

Deepnote Puts The Pieces Together

Since Deepnote is designed to bring tools, teams, and workflows together, it has become clear to me that, in the context of learning, we can be much more than a compendium for demonstrating scientific tools. Instead, we can promote learning by allowing scientists to observe tools in their natural habitat, plugged neatly into their associated technologies. In other words, Deepnote embodies context-based learning.


Great Expectations

By way of example, let's take my recent foray into learning Great Expectations. For those who don't know, Great Expectations is the leading tool for validating, documenting, and profiling your data. Great Expectations brings the software development discipline of automated testing to data science teams.
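
To make "automated testing for data" concrete, here is a minimal plain-Python sketch of the idea behind an expectation: a declarative check on a column that reports a result instead of crashing. The helper `expect_values_in_set`, the `ExpectationResult` class, and the sample rows are all hypothetical and are not the Great Expectations API; the library's own expectations are richer and return far more detail.

```python
from dataclasses import dataclass, field


@dataclass
class ExpectationResult:
    """Outcome of one check: overall success plus the offending values."""
    success: bool
    unexpected_values: list = field(default_factory=list)


def expect_values_in_set(rows, column, allowed):
    """Check that every row's value for `column` is in `allowed`.

    Hypothetical helper for illustration only; not a Great Expectations call.
    """
    allowed = set(allowed)
    unexpected = [row[column] for row in rows if row[column] not in allowed]
    return ExpectationResult(success=not unexpected, unexpected_values=unexpected)


# Made-up sample data: one row carries a status the pipeline does not expect.
orders = [
    {"order_id": 1, "status": "shipped"},
    {"order_id": 2, "status": "pending"},
    {"order_id": 3, "status": "lost"},
]

result = expect_values_in_set(orders, "status", {"shipped", "pending"})
print(result.success, result.unexpected_values)
```

The point of returning a result object rather than raising is that the same outcome can feed a report, a dashboard, or a pipeline gate.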


(Image source here)

Their docs clearly state what they are and what they're not; however, to truly grok their value, data scientists will have to interact with "what they're not" sooner or later, and therein lies the rub. Great Expectations naturally shines when observed within a larger software ecosystem, as do many other tools (e.g., dbt, Airflow, Git). Let's take a look at how Deepnote puts the pieces together for you when learning Great Expectations.

Clean House Clean Mind

The getting started tutorial for Great Expectations is very well done, with human-readable CLI commands and automatically created and narrated notebooks. There is a fair bit of context switching though, from notebook to notebook, to CLI, to docs, and back. My thinking was that newcomers would learn more effectively if context switching were minimized, since it is cognitively costly (similar to multitasking). Minimizing it frees up more "brain power" for internalizing new concepts and tool-specific parlance.


(Image adapted from here)

An all-in-one-place demonstration of the basics makes learning Great Expectations "cheaper" for the mind, and Deepnote provides exactly this: it spins up a complete, runnable workflow that does not require the terminal, environment setup or installation, or multiple notebooks. Everything related to the learning experience is presented in the same place. No context switching needed.

The First Across The Pipeline

The very first item in what Great Expectations doesn't do relates to pipeline execution. They are not a pipeline execution framework. Makes perfect sense. The only problem with this is that we end up at the "how to draw an owl" problem again. Data testing of any real value will have to end up in a pipeline at some point. While Deepnote is not going to set up Airflow for you, it does provide a GUI for scheduling your notebooks. Scheduling puts your data testing into a production-level pipeline without any additional learning or peripheral setup.
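
As a rough illustration of what "ending up in a pipeline" can mean for a scheduled notebook, the sketch below shows a final validation gate: it runs a set of checks and raises if any fail. The function `run_validation_gate` and the sample checks are hypothetical, and it is an assumption here that the scheduler reports an uncaught exception as a failed run.

```python
def run_validation_gate(checks):
    """Run named checks and raise if any fail, so a scheduled run surfaces the failure.

    `checks` maps a label to a zero-argument callable returning True or False.
    """
    failed = [name for name, check in checks.items() if not check()]
    if failed:
        raise RuntimeError("Data validation failed: " + ", ".join(failed))
    return len(checks)


# Made-up batch of data and checks, standing in for a real validation step.
ages = [34, 27, 41]
checks = {
    "ages_not_empty": lambda: len(ages) > 0,
    "ages_non_negative": lambda: all(age >= 0 for age in ages),
}

passed = run_validation_gate(checks)
print(f"{passed} checks passed")
```

In a real project the callables would wrap actual validation runs; the gate itself stays the same.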

Be A Good Host

One of the best features of Great Expectations is its data docs. Every time you validate your data against your expectations, Great Expectations builds a human-readable documentation site. These HTML docs describe your validation results and more. They are a continuously updated data quality report. In the image below, you can see a page from the data docs showing a failed set validation. The data docs are amazing!


(Image source here)

Unfortunately, we're back at the owl drawing issue again: Now we have to host these docs on the web so that our team can access them. There are likely plenty of data experts who don't want to deal with hosting sites at all. Now, Deepnote is not designed to host your personal website; however, we do allow incoming connections from the web to your cloud machine. This means that Great Expectations learners can spin up the data docs, and even share them publicly, without having to draw the whole owl, so to speak.
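
For readers curious what serving such a site involves, here is a minimal sketch that uses only Python's standard library to serve a folder of static HTML from a notebook. The helper names are hypothetical, and both the default port and whichever folder you pass in are assumptions; check where your project writes its data docs and which port your environment exposes.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_docs_server(directory, port=8080):
    """Build an HTTP server that serves the static files under `directory`."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    # Bind to all interfaces so connections from outside the machine can reach it.
    return ThreadingHTTPServer(("0.0.0.0", port), handler)


def serve_in_background(server):
    """Run the server on a daemon thread so the notebook stays usable."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

Calling `serve_in_background(make_docs_server(docs_dir))`, where `docs_dir` is the folder holding the generated `index.html`, starts the site without blocking the notebook; `server.shutdown()` stops it. This is a convenience for learning and sharing, not a hardened production host.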

In Summary

There is a proliferation of tools that are capable of being integrated with other tools. On one hand, this is helpful, but it also comes at a cost. Learners are often smacked with a list of prerequisites so long and complex that observing tools in the wild, let alone adopting them, is too heavy a burden to bear.

When it came to learning Great Expectations, I couldn't help but reflect on the snowballing effect of learning new technologies in general. Deepnote plays a significant role in learning: it provides an instantly spun-up, "terraformed" world where related tools can be seen truly living together.

See for yourself — click the link to the notebook to observe Great Expectations in the wild and enjoy learning.
