Scheduling Jupyter notebooks is a common task among data teams. Why? Because if data goes stale, yesterday’s insights can quickly become tomorrow’s missed opportunities.
The data that powers modern businesses is always evolving. And if you’re using a notebook to analyze it, it’s up to you to keep it fresh.
But scheduling a Jupyter notebook run is more trouble than it needs to be. Let’s look at how to make the process easier.
Scheduled notebooks are supposed to simplify collaboration
When we talk about scheduling Jupyter notebooks, what we’re really talking about is data collaboration.
If you’re scheduling a notebook run, more often than not it’s because you’re keeping a business stakeholder updated. Your notebook is a deliverable: a way to explore data to answer a question or identify an issue.
And your goal is to deliver without wasting a ton of time and resources (i.e., manually rerunning your notebook every time a refresh is needed). By automating notebook runs at specific intervals, you can keep your business partners happy and informed without all the heavy lifting.
Easy enough, right? Well, not if you do it the hard way.
The hard way to schedule Jupyter notebooks
There are multiple options for scheduling a Jupyter notebook, and they all generally follow the same playbook. Take one of the most popular methods: cron jobs.
With this command-line job scheduler, you can execute scripts and applications at predetermined times. Need to run a notebook every day? Cron can help you get the job done with just a line of syntax.
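To make that "line of syntax" concrete, here is a minimal sketch that builds a crontab entry for re-executing a notebook every day at 06:00 with `jupyter nbconvert`. The `build_cron_line` helper and both file paths are hypothetical placeholders, not part of any library.

```python
import shlex


def build_cron_line(notebook_path, log_path, hour=6, minute=0):
    """Return a crontab entry that re-executes a notebook once a day.

    Hypothetical helper for illustration; the paths are placeholders.
    """
    # nbconvert runs every cell and writes the results back into the same file.
    command = (
        "jupyter nbconvert --to notebook --execute --inplace "
        f"{shlex.quote(notebook_path)}"
    )
    # Cron fields: minute, hour, day of month, month, day of week.
    schedule = f"{minute} {hour} * * *"
    # Append stdout and stderr to a log so failed runs leave a trace.
    return f"{schedule} {command} >> {shlex.quote(log_path)} 2>&1"


print(build_cron_line("/home/me/reports/sales.ipynb", "/home/me/reports/cron.log"))
```

You would paste the printed line into `crontab -e`. Cron runs jobs with a minimal environment, so in practice you will probably need the full path to the `jupyter` executable as well.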
But only if your computer is on. Cron doesn’t catch up on runs it missed, so your machine has to be powered on and awake at the scheduled time. Of course, you could always invest in a remote server that’s always on, but that means paying for and managing it.
Hey, maybe your company has deep pockets and you’ve got free time. In any case, you’ll also need to think about error handling. Not only do you need to set up a cron job on your computer, you’ll also need to figure out how to handle any failed runs (commence automated email writing).
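What might that error handling look like? One possible shape, sketched below, is a small wrapper that runs the notebook command and calls a notifier when the run exits with an error. The `run_with_alert` function is a made-up name, and `print` stands in for whatever alerting you would actually wire up (email, chat, and so on).

```python
import subprocess
import sys


def run_with_alert(command, notify):
    """Run a command; call notify(message) if it exits with a non-zero status.

    Returns True on success, False on failure.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Include the tail of stderr so the alert says what went wrong.
        notify(f"Notebook run failed (exit {result.returncode}):\n{result.stderr[-2000:]}")
        return False
    return True


# Stand-in for the real notebook command: a child process that fails on purpose.
failing = [sys.executable, "-c", "import sys; sys.exit('database unreachable')"]
ok = run_with_alert(failing, notify=print)
print("succeeded" if ok else "failed")
```

In a real setup, cron would call this wrapper instead of the notebook command directly, and you would own that alerting code on top of everything else.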
Next you have to ask yourself what format you’re sharing your notebook in. If it’s for a business stakeholder, odds are they don’t have Jupyter installed. So now you’ll also need to convert the notebook to a PDF after every run (or maybe an HTML file if you want people to view it in a browser).
That means each time you run a scheduled notebook, you’re creating a new file. Where will those live? A shared drive somewhere? It won’t be too long before people get confused about which file they’re supposed to be looking at.
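Here is a rough sketch of how those files pile up, assuming you export to HTML with `jupyter nbconvert` and stamp each file with the run date. The `html_export_command` helper, the notebook name, and the `shared/reports` folder are all hypothetical.

```python
from datetime import date
from pathlib import Path


def html_export_command(notebook, output_dir, run_date):
    """Build an nbconvert command that writes a dated HTML copy of a notebook."""
    # sales.ipynb run on 2024-03-01 becomes sales-2024-03-01
    # (nbconvert normally appends the .html extension itself).
    output_name = f"{Path(notebook).stem}-{run_date.isoformat()}"
    return [
        "jupyter", "nbconvert",
        "--to", "html",
        "--execute",  # rerun all cells before exporting
        "--output", output_name,
        "--output-dir", str(output_dir),
        str(notebook),
    ]


command = html_export_command("sales.ipynb", "shared/reports", date(2024, 3, 1))
print(" ".join(command))
```

Run that daily and the shared folder gains a new file every day, which is exactly how stakeholders end up opening last week’s report by mistake.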
The overhead just keeps going up. What should be quite simple — scheduling a notebook — suddenly becomes more complex and time-consuming.
The easy way to schedule Jupyter notebooks
It shouldn’t be difficult to schedule a notebook. And with modern, cloud-based data notebooks, it’s not.
Since your notebook is already in the cloud — fully accessible and shareable — no backend workaround or cron syntax is required. Just point and click. Your cloud resources are fully managed, so there’s no overhead for you. You simply create your analysis the way you want (maybe you even publish it as an app) and schedule it to rerun with the push of a button.
Notebooks can be scheduled to run on an hourly, daily, or weekly basis, and you can automatically replace your currently published notebook with the new one after every successful run.
Let’s say you’ve published your notebook as a dashboard for a business stakeholder. They want to make sure they’re always looking at the most up-to-date information. Once your cloud-based notebook is scheduled, you can set it and forget it. If the notebook fails for any reason (e.g., your friendly database administrator added a null value somewhere), you’ll automatically get notified.
No managing command-line utilities or server resources. No worrying about error handling. Just the freshest data, automatically delivered to you and your business partners. And when it’s time to distribute, the newly run notebook can be shared with a link and opened in a browser. No muss, no fuss.
If you’re scheduling a notebook, there’s a good chance you’re using it as an asset for someone else. And nothing leaves a bad taste in the mouth quite like stale data.
Next time you’re looking to schedule a notebook run, try the path of least resistance: a cloud-based notebook that makes collaboration a breeze.
Simplify scheduling Jupyter notebooks with Deepnote
Get started for free to see how easy it is to schedule and collaborate on data notebooks.