Behind Deepnote AI Copilot

Recently, Deepnote became the first data notebook with an integrated AI Copilot. It offers blazingly fast code suggestions as you type, letting you write code more quickly and with fewer errors. Let’s explore how Deepnote AI Copilot works behind the scenes.

Why AI Copilot?

Writing code takes time. You often have to write many repetitive code patterns, and writing the same stuff over and over isn't exciting. You also have to remember the correct syntax. Sure, that's not an issue for operations you use regularly, but those may not be enough when you're trying to solve the hardest problems effectively.

This becomes even more apparent in data notebooks, where you're frequently dealing with external libraries and tools. Data teams doing exploratory analysis aren't necessarily familiar with the exact structure of their data before they start. This leads to a situation where it's easy to make typos, mistakes, or even critical errors.

That’s why we decided to build Deepnote AI Copilot. It’s an integrated AI service that displays predictive inline suggestions (or, as Anton from Ramp likes to call them, ghost text) as you type. We partnered with our friends at Codeium to provide cutting-edge code suggestions for both Python and SQL, making the most of Deepnote’s multi-language environment.

Building context

Large language models produce better results when given appropriate context. The same applies to the language model powering Deepnote's AI Copilot. To build a comprehensive context, we've constructed a context module made up of several key components, each contributing in a different way:

Notebook contents
Naturally, the main contextual component is the notebook's contents, as they represent the current state of your work. These contents are likely to include your initial steps as well as any updates you've made, giving us a clear picture of your intentions and potential next steps.
We can take advantage of the notebook's unique characteristics, like its content being divided into individual cells (or as we like to call them, blocks). This lets us select the most relevant blocks based on your current position in the notebook or the programming language you're working in.
Notebook metadata
In some cases, your notebook's contents just aren't enough, like when you're starting with an empty notebook. Passing in notebook metadata as context hints to the language model what your primary goal is and how best to get you started. This is enhanced by Deepnote's organizational structure, where you can group your notebooks into projects and categorize those projects into folders, creating an even better context. Giving the project and notebook descriptive names steers Copilot toward relevant completions right from the start.
Runtime variables
References to variable names are already part of your notebook's contents, but that doesn't give Copilot the full picture. That's why we supplement this by pulling data from Deepnote's variable explorer, which holds references to all variables created during your machine's runtime. Each variable is passed along with its Python type and additional metadata, like element length in the case of an array.
This comes in handy when working with DataFrames, as Copilot knows their column names and data types. Because DataFrames are first-class citizens used to display the outputs of your blocks, you can reference them right away in the rest of your notebook. For example, Copilot can pick up a column name that you mention in a text block.
Additional context
Leveraging existing work across your workspace helps us give you an even better context. We search your workspace for blocks that might be relevant, scoring them based on how often you use them and other factors—like if they're in a regularly scheduled notebook or part of a published app.
Having this unique context is particularly useful, not only when you're starting with an empty notebook but also when you're working in SQL blocks. Imagine you need to query a database. Deepnote AI Copilot knows that you are using PostgreSQL and often join the users table with the orders table. So when you begin typing SELECT, Copilot predicts you might be interested in a query like SELECT first_name, last_name FROM users JOIN orders ON users.id = orders.user_id. It understands your SQL habits, the SQL dialect as well as your preferred column-naming conventions.
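To make the scoring idea more concrete, here is a minimal sketch in Python. The WorkspaceBlock fields, the weights, and the function names are illustrative assumptions, not Deepnote's actual implementation; the post only says that usage frequency, scheduling, and publishing are among the signals.

```python
from dataclasses import dataclass


@dataclass
class WorkspaceBlock:
    source: str
    use_count: int               # hypothetical "how often you use it" signal
    in_scheduled_notebook: bool  # block lives in a regularly scheduled notebook
    in_published_app: bool       # block is part of a published app


def score(block: WorkspaceBlock) -> float:
    """Combine the signals into one number. The weights are made up."""
    total = float(block.use_count)
    if block.in_scheduled_notebook:
        total += 5.0
    if block.in_published_app:
        total += 5.0
    return total


def top_blocks(blocks, limit=3):
    """Return the highest-scoring blocks, best first."""
    return sorted(blocks, key=score, reverse=True)[:limit]
```

In this sketch a block that is both scheduled and published outranks a block that was merely used a few times, which matches the intuition that production-grade code is a better hint than a one-off experiment.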

AI Copilot's context updates with each keystroke and is tailored to the block you're currently editing. The block's position in the notebook, its type, and additional metadata are key inputs to the context module and determine what information is passed as context. While providing more context generally improves the quality of the model's outputs, too much can be distracting and hurt output quality. So the context module aims not just to select relevant data, but also to weed out irrelevant information.
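One way to picture this selection step is a small Python sketch that keeps only blocks in the same language as the block being edited, prefers the ones closest to it, and stops adding blocks once a size budget is reached. The Block type, the same-language rule, and the max_chars budget are assumptions made for illustration; the real context module weighs more signals than this.

```python
from dataclasses import dataclass


@dataclass
class Block:
    language: str  # e.g. "python" or "sql"
    source: str


def select_context(blocks, current_index, max_chars=2000):
    """Pick nearby same-language blocks that fit into a character budget."""
    current = blocks[current_index]
    # Rank candidates by distance from the block being edited.
    candidates = sorted(
        (abs(i - current_index), i)
        for i, block in enumerate(blocks)
        if i != current_index and block.language == current.language
    )
    chosen, used = [], 0
    for _, i in candidates:
        size = len(blocks[i].source)
        if used + size > max_chars:
            continue  # too big for the remaining budget, try a smaller block
        chosen.append(i)
        used += size
    # Return the selected blocks in notebook order.
    return [blocks[i] for i in sorted(chosen)]
```

The budget is what keeps the context from growing without bound: a closer block is always considered first, but a block that would overflow the budget is skipped rather than truncated.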

Generating completions

With the right context, we can finally generate some code completions. But context actually isn't the main input for the model. The primary inputs are the content of the block you're editing and where your cursor is in the notebook. For both, we need to calculate position offsets from the start of the notebook and the block, respectively. Getting these offsets right ensures not only relevant code completions for what you're currently typing, but also the accurate placement of generated completions relative to your cursor.
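The offset calculation can be sketched in a few lines of Python. This assumes zero-based line and column positions, offsets counted in characters, and a notebook document built by joining block sources with a blank line; the function names and the separator are hypothetical.

```python
def offset_in_block(source: str, line: int, column: int) -> int:
    """Character offset of a zero-based (line, column) cursor inside one block."""
    lines = source.split("\n")
    # Every full line before the cursor contributes its length plus one newline.
    return sum(len(text) + 1 for text in lines[:line]) + column


def offset_in_notebook(sources, block_index, block_offset, separator="\n\n"):
    """Offset of the same cursor measured from the start of the notebook."""
    preceding = sum(len(s) + len(separator) for s in sources[:block_index])
    return preceding + block_offset
```

For example, with the cursor at line 1, column 1 of the block "ab\ncd", offset_in_block returns 4, which is the index of the character "d".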

Copilot suggesting fill-in-the-middle completions

This is especially important for inline fill-in-the-middle completions. Besides completions at the start of an empty line or at the end of a line, Deepnote AI Copilot can generate completions for positions that are sandwiched between existing code, which becomes extremely useful when making edits.
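As a rough illustration of what fill-in-the-middle means, the cursor offset splits the block into a prefix and a suffix, and the completion is placed between them. The helper names below are hypothetical, and how the prefix and suffix are actually sent to the model is not covered here.

```python
def split_at_cursor(source: str, offset: int):
    """Split a block into the text before and after the cursor."""
    return source[:offset], source[offset:]


def apply_completion(source: str, offset: int, completion: str) -> str:
    """Insert a completion between the existing prefix and suffix."""
    prefix, suffix = split_at_cursor(source, offset)
    return prefix + completion + suffix


source = "df.groupby().sum()"
cursor = len("df.groupby(")  # cursor sits between the parentheses
result = apply_completion(source, cursor, '"country"')
```

Here result is the original line with "country" filled in between the parentheses, while the trailing .sum() call stays untouched.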

No matter the position of the resulting code completion, all completions are generated via a request to Codeium with an encoded payload of the model's inputs. The response we get is decoded similarly. Code completions are first normalized, and then turned into objects containing the completion's text and range. This lets us display them right where your cursor is. Deepnote uses the Monaco editor, which also powers VS Code, allowing us to use Monaco's native inline completions API for showing the generated completions.
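The normalization step might look something like the following Python sketch. The exact rules (unifying line endings, trimming trailing whitespace, dropping empty results and duplicates) are assumptions, as are the type names; the range uses one-based line and column numbers, as Monaco does, and is empty because the text is inserted at the cursor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    start_line: int
    start_column: int
    end_line: int
    end_column: int


@dataclass(frozen=True)
class InlineCompletion:
    text: str
    range: Range


def normalize(text: str) -> str:
    """Unify line endings and trim trailing whitespace (assumed rules)."""
    return text.replace("\r\n", "\n").rstrip()


def to_inline_completions(raw_texts, line, column):
    """Turn raw completion strings into text-plus-range objects at the cursor."""
    cursor = Range(line, column, line, column)
    items, seen = [], set()
    for raw in raw_texts:
        text = normalize(raw)
        if text and text not in seen:  # skip empty results and duplicates
            seen.add(text)
            items.append(InlineCompletion(text, cursor))
    return items
```

In the editor itself, objects of this shape are what an inline completions provider would hand back to Monaco.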

Completion post-processing

But if we were to show the generated code completions right away, we'd miss important steps such as filtering out irrelevant ones or keeping your current Deepnote workflows intact.

Large language models can make stuff up sometimes, and Deepnote Copilot is no different. While we haven't seen it invent a new pandas filtering method yet, it can spit out completions that don't really fit. For instance, the if __name__ == "__main__" completion doesn't make sense in a Jupyter notebook environment like Deepnote. We catch and remove such irrelevant completions. Similarly, we skip over suggestions that are just a single character long (after removing whitespace). We find these one-letter completions are more of a distraction than a help, so we leave that for you to fill in manually 😉.
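The two filters described above could be sketched like this in Python. The regular expression and the function names are illustrative assumptions rather than Deepnote's actual rules.

```python
import re

# Matches the script entry-point guard, which has no use inside a notebook.
MAIN_GUARD = re.compile(r"""if\s+__name__\s*==\s*["']__main__["']""")


def is_relevant(completion: str) -> bool:
    """Reject completions that are too short or that make no sense in a notebook."""
    without_whitespace = "".join(completion.split())
    if len(without_whitespace) <= 1:
        return False  # empty or a single character once whitespace is removed
    if MAIN_GUARD.search(completion):
        return False
    return True


def filter_completions(completions):
    """Keep only the completions worth showing."""
    return [c for c in completions if is_relevant(c)]
```

Calling filter_completions on a list that contains a main guard, a lone closing parenthesis, and df.head() would keep only df.head().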

Another feature that speeds up your coding is autocomplete, also known as IntelliSense. This tool offers method and variable name suggestions through a scrollable widget, and it's particularly useful for writing SQL statements since it knows your database schema. However, some early Copilot users told us that the two features can clash, each vying to suggest your next move. To resolve this, we've tweaked Copilot to step back when autocomplete is the better fit—for example, right after you type a period in SQL blocks.

Autocomplete taking priority over AI Copilot
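The rule from the example above, stepping back right after a period in SQL blocks, can be expressed as a small predicate. This is a simplified sketch with hypothetical names; the real logic covers more cases than this one.

```python
def should_show_copilot(block_type: str, text_before_cursor: str) -> bool:
    """Return False when autocomplete is assumed to be the better fit."""
    if block_type == "sql" and text_before_cursor.endswith("."):
        # After "table." the schema-aware autocomplete knows the column names.
        return False
    return True
```

With this predicate, typing "SELECT users." in a SQL block hands control to autocomplete, while typing "df." in a Python block still shows a Copilot suggestion.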

Tips and tricks

Here are some tips to get the best out of AI Copilot, even if you're up against some unique challenges:

First off, make sure your Deepnote project and notebook have descriptive names. This will help you get relevant code suggestions right from the get-go.
Use text blocks to outline what you're aiming to do. This gives better context and, in turn, better code completions.
When you load tabular data like .csv files, you expose the dataset's schema, which helps generate completions with the right column names and data types.
Don't like the completion you see? No worries, just cycle through other options using the keyboard shortcuts (Option/Alt+[ and Option/Alt+]).

Summary

User feedback on Deepnote AI Copilot has been overwhelmingly positive. People love how spot-on the suggestions are and how they just fit right into their coding flow. Today, Copilot saves our users from typing over 300,000 characters each week, about as many as the number of 'life hacks' one can find on the internet that nobody actually uses.

We're always tweaking things behind the scenes to make all our Deepnote AI features, including Deepnote AI Copilot, even better. We believe that with AI at its heart, Deepnote will revolutionize how data teams solve the hardest problems, making coding faster, smarter, and a lot more fun. If you haven’t given it a go yet, sign up and take Copilot for a spin!
