Great Expectations

Great Expectations (GE) is a tool for testing and documenting data. Newcomers often hit a few hurdles when onboarding to GE, such as switching between multiple notebooks, working in the terminal, and hosting documentation. This guide walks you past those pain points so that you can bring the software development discipline of automated testing to your data science team.

How to set it up

All default Python environments in Deepnote come with Pandas preinstalled (learn more about all preinstalled packages here). Install GE with !pip install great_expectations, then initialize it with !great_expectations --yes --v3-api init. Both commands can also be run in a terminal within your Deepnote project (without the leading !, of course).
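After running the install step, it can be useful to confirm which version of GE ended up in the environment, since the API has changed between releases. The snippet below is a minimal sketch that uses only the Python standard library; the helper name installed_version is hypothetical and not part of GE or Deepnote.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Look up the distribution installed by the pip command above
ge_version = installed_version("great_expectations")

if ge_version is None:
    print("great_expectations is not installed; run the pip install step first")
else:
    print(f"great_expectations {ge_version} is installed")
```

If the package is reported as missing, rerun the install command in the same environment as the notebook.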

How to use it

Once initialized, you can start using Great Expectations within your notebooks. In the example below, three Expectations (tests) are defined on the fictitious df_pass Pandas DataFrame. In simple terms,

the skill column cannot contain null values
the runner column must contain unique values
the total_time column must have values between 70 and 100

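# import pandas and great_expectations
import pandas as pd
import great_expectations as ge

# build a small example DataFrame (illustrative values, not from a real dataset)
df_pass = pd.DataFrame({
    'runner': ['Ann', 'Ben', 'Cal'],
    'skill': ['beginner', 'advanced', 'expert'],
    'total_time': [95, 82, 71],
})

# wrap the Pandas DataFrame so that it exposes the expect_* methods
df_pass = ge.from_pandas(df_pass)

# define Expectations
df_pass.expect_column_values_to_not_be_null('skill')
df_pass.expect_column_values_to_be_unique('runner')
df_pass.expect_column_values_to_be_between('total_time', 70, 100)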
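To make the meaning of the three Expectations concrete, here is a rough sketch of the same checks written in plain Pandas. It is not how GE implements them, and the sample values are assumptions that stand in for the fictitious df_pass DataFrame. The range check is inclusive on both ends.

```python
import pandas as pd

# Illustrative data only: a stand-in for the fictitious df_pass DataFrame
df_pass = pd.DataFrame(
    {
        "runner": ["Ann", "Ben", "Cal"],
        "skill": ["beginner", "advanced", "expert"],
        "total_time": [95, 82, 71],
    }
)

# Each entry mirrors one of the three Expectations described above
checks = {
    "skill has no null values": bool(df_pass["skill"].notna().all()),
    "runner values are unique": bool(df_pass["runner"].is_unique),
    "total_time is between 70 and 100": bool(
        df_pass["total_time"].between(70, 100).all()
    ),
}

for name, passed in checks.items():
    print(f"{name}: {'pass' if passed else 'fail'}")
```

Hand-written checks like these return only a pass or fail flag, whereas GE also records the Expectations so they can be reused as tests and documentation.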

Next steps

Jump right into Deepnote and take a look at this thorough walkthrough of Great Expectations in Deepnote. To save yourself some setup work, click the View source button and then Duplicate in the top-right corner to start exploring on your own!