Deepnote data apps provide a convenient way to turn a notebook into a shareable, interactive dashboard that requires no knowledge of notebooks or Python to use. Apps act as a presentation layer on top of a notebook; each app visitor gets their own Python kernel, which is not shared with other visitors.
Notebooks have inherent issues with reproducibility and state: users can execute blocks in arbitrary orders, producing different outcomes (we will address this problem with an exciting update next week!). In apps, we aim for all consumers to consistently see correct data.
Originally, we addressed this by never reusing notebook state and always rerunning the entire app from top to bottom on every input change. The kernel that executed the app was killed right after execution, so no state (that is, the values of variables) was preserved for subsequent executions. This ensured correctness but unfortunately made execution slow.
We were not happy that apps felt slow. To address this, we set out to implement two changes:
- Detect dependencies between blocks and avoid running blocks that are not affected by changed inputs.
- Keep app kernels alive after execution to preserve the session's state, which the first change requires.
Introducing block dependencies
The obvious solution to speed things up is to avoid running unnecessary blocks. This requires detecting dependencies between blocks.
In many cases, this is straightforward - for example, a visualization block depending on a dataframe, or a text input block defining a Python variable used by other blocks.
However, with code blocks containing arbitrary user code, the problem is more complicated. These blocks can define new variables, or reassign and mutate existing ones, which can affect other blocks.
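Here is a simplified illustration of the problem: a single variable is redefined mid-notebook, so the middle block both depends on the block above it and affects every block below it (the file name and columns are made up for illustration):

```python
# Block 1: defines df
import pandas as pd
df = pd.read_csv("books.csv")

# Block 2: reads df AND reassigns it, so it depends on Block 1
# and affects every later block that uses df.
# max_rating is defined by an input block elsewhere in the notebook.
df = df[df["rating"] <= max_rating]

# Block 3: depends on Block 2's version of df
df.describe()
```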
Creating a Directed Acyclic Graph (DAG)
Every code block, including code segments within SQL blocks using Jinja templates, is parsed into an Abstract Syntax Tree (AST). We traverse this AST to determine which variables each block uses and which it defines. Combining these lists with each block's position in the notebook, we construct a Directed Acyclic Graph (DAG) of blocks.
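A minimal sketch of this analysis using Python's built-in ast module follows; the real implementation has to handle many more cases (imports, attribute access, function scopes, comprehensions, and so on):

```python
import ast

def analyze_block(source: str) -> tuple[set[str], set[str]]:
    """Return the sets of variable names a block uses and defines."""
    used, defined = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                used.add(node.id)
            else:  # ast.Store or ast.Del
                defined.add(node.id)
    return used, defined

analyze_block("df = df[df['rating'] <= max_rating]")
# -> ({'df', 'max_rating'}, {'df'})
```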
When an app or user requests to execute a block (and, implicitly, all dependent blocks), the DAG is traversed from the requested block downward, and every block encountered during the traversal is executed. For instance, if a user changes the max_rating filter, causing execution of that input block, the subsequent blocks that filter and visualize the dataframe will also execute.
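Building on the analyze_block sketch above, the graph construction and traversal could look roughly like this (it deliberately ignores subtleties such as a later block shadowing a variable before using it):

```python
def build_edges(blocks):
    """Map each block index to the later blocks that read what it defines."""
    analyzed = [analyze_block(b) for b in blocks]
    edges = {i: set() for i in range(len(blocks))}
    for i, (_, defined) in enumerate(analyzed):
        # Only look downward (j > i): this is what keeps the graph acyclic
        # even when many blocks reassign the same variable.
        for j in range(i + 1, len(blocks)):
            used_j, _ = analyzed[j]
            if defined & used_j:
                edges[i].add(j)
    return edges

def blocks_to_execute(edges, start):
    """Collect the requested block plus everything reachable below it."""
    to_run, stack = set(), [start]
    while stack:
        i = stack.pop()
        if i not in to_run:
            to_run.add(i)
            stack.extend(edges[i])
    return sorted(to_run)  # run in notebook order
```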
The order of blocks in the notebook is taken into account deliberately. This ensures that executing a block will never result in executing a block above it. We made this decision for two primary reasons:
- Many data scientists are accustomed to mutating the same variable throughout a notebook. Ignoring block order could lead to cyclic dependencies.
- Execution from top to bottom is the natural order of operations in notebooks.
App kernel lifecycle changes and idle status
As mentioned before, until now app kernels were killed right after execution. This helped save resources, since every app visitor gets their own kernel (we call it a session), which runs on the underlying project's machine.
If an app receives many visitors, many kernels compete for the same machine's resources. This isn't necessarily a significant problem, because the machine's RAM and CPU can easily be increased in the Environment section. However, what about users who come to the app, execute it once, leave it open for a few days, and never return? Keeping a kernel running for them would waste a lot of resources (by preventing the machine from being shut down for inactivity), resources that our customers pay for.
The obvious solution is to kill those kernels after a period of inactivity. If a user decides to re-execute an app the next day, it will be run from top to bottom again, which is a good tradeoff. But what's the ideal time after which the kernel should be killed to free up machine resources?
Let's plot a histogram of the time differences between subsequent app executions for the same user and app, based on a sample of 2,000 app executions.
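The inter-execution deltas can be computed roughly like this in pandas (the table schema and column names here are illustrative, not our actual logging format):

```python
import pandas as pd

# One row per app execution; the schema is illustrative
executions = pd.read_csv("app_executions.csv", parse_dates=["executed_at"])

deltas = (
    executions
    .sort_values("executed_at")
    .groupby(["user_id", "app_id"])["executed_at"]
    .diff()        # time since the same user's previous execution of the app
    .dropna()      # the first execution per (user, app) has no delta
    .dt.total_seconds()
)

print(deltas.quantile([0.5, 0.9]))  # median and 90th percentile, in seconds
deltas.plot.hist(bins=50)           # the histogram discussed below
```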
From the graph, it's clear that most re-executions happen shortly after the previous one. In fact, the median is 23 seconds and the 90th percentile is 7 minutes. Ten minutes of inactivity therefore seems like a good threshold after which to consider an app kernel idle and kill it - and that's the threshold we chose.
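Conceptually, the watchdog is just a timer that resets on every execution. Here is a minimal sketch (KernelSession, kernel.run, and kernel.shutdown are hypothetical names, not our actual internals):

```python
import threading

IDLE_TIMEOUT_SECONDS = 10 * 60  # the threshold derived from the histogram

class KernelSession:
    """Kills the underlying kernel after 10 minutes without an execution."""

    def __init__(self, kernel):
        self.kernel = kernel  # assumed to expose run() and shutdown()
        self._timer = None
        self._reset_timer()

    def _reset_timer(self):
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(IDLE_TIMEOUT_SECONDS, self.kernel.shutdown)
        self._timer.daemon = True
        self._timer.start()

    def execute(self, code):
        self._reset_timer()  # any execution counts as activity
        return self.kernel.run(code)
```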
Evaluation
Over the past two months, whenever an app was rerun with different inputs, we logged anonymised information about the estimated execution time with and without block dependencies enabled. The estimated execution time is the sum of the last recorded durations of the blocks about to be executed - this information is already tracked and shown to users in the UI.
The graph shows the "% of execution time saved" compared to the baseline (100%). For instance, if an entire app previously took 8 seconds to update and now updates in 5 seconds, that represents a 37.5% time saving.
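In code, the estimate and the saving boil down to a couple of sums; the block names and durations below are invented to reproduce the 8 s to 5 s example:

```python
def estimated_execution_time(block_ids, last_durations):
    """Sum each block's last recorded duration, in seconds."""
    return sum(last_durations[b] for b in block_ids)

# Invented durations that reproduce the example above
last_durations = {"input": 0.1, "filter": 2.4, "viz": 2.5, "load": 3.0}

full_rerun = estimated_execution_time(last_durations.keys(), last_durations)       # 8.0 s
with_deps = estimated_execution_time(["input", "filter", "viz"], last_durations)   # 5.0 s
saving = 1 - with_deps / full_rerun  # 0.375, i.e. a 37.5% time saving
```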
As you can see, the time saved has hovered around 40-50% for a few months now.
Conclusion
We are very happy that detecting block dependencies helped our customers speed up their existing apps by roughly 40%.
We are also bringing block dependencies and reactive execution to notebooks, but more about that will be shared another time 🙂