Python Friday #161: Organise Data and Code in Jupyter Notebooks

Jupyter Notebooks offer us great flexibility in how we work with code and documentation. Unfortunately, not every approach gives us a maintainable notebook. Let us look at ways to organise our code and the data we need.

This post is part of my journey to learn Python. You can find the other parts of this series here. You can find the code for this post in my PythonFriday repository on GitHub.

 

Small code fragments go in the notebook…

It is absolutely no problem to use a notebook for the plumbing code and put together a few method calls. That is what the code cells are for and what gives us the flexibility we enjoy. Even if you need to define a function, you can do it directly in the notebook.

We define a function in one cell and use it in another.
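As a minimal sketch of that workflow, the two cells could look like this. The function to_fahrenheit() is a hypothetical example and not taken from the original notebook:

```python
# Cell 1: define a small helper directly in the notebook
# (hypothetical example function).
def to_fahrenheit(celsius):
    """Convert a temperature from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


# Cell 2: use the helper in a later cell.
print(to_fahrenheit(20))  # 68.0
```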

 

… big chunks of code go in a separate file

For bigger code chunks (like classes) or when we want to reuse our functions, we should extract the code into regular Python files. That way we can put our code under version control as we usually do and import only the functionality that we need in a specific notebook.

I put that (extracted) code in a code folder next to my notebook:

The folder code can sit next to your notebooks

We can extend the Python path in our notebook with our code folder using this snippet:
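One way to do this is sketched below. It assumes that the notebook is started from the directory that contains the code folder; the variable name code_dir is only illustrative:

```python
import sys
from pathlib import Path

# Assumption: the "code" folder sits next to the notebook, and the
# notebook's working directory is the folder the notebook lives in.
code_dir = Path.cwd() / "code"

# Add the folder only once, so re-running the cell does not
# create duplicate entries in sys.path.
if str(code_dir) not in sys.path:
    sys.path.append(str(code_dir))
```

Running this in the first cell keeps the path handling in one place, and every later cell can import from the code folder.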

The longer function fizz_buzz() goes into the separate file code/fizzbuzz.py:
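The file could contain something like the following. This is a sketch under the assumption that fizz_buzz() takes a single integer and returns a string; the version in the repository may differ:

```python
# code/fizzbuzz.py (illustrative content, signature is an assumption)

def fizz_buzz(number):
    """Return 'Fizz', 'Buzz', 'FizzBuzz' or the number as a string."""
    if number % 15 == 0:
        return "FizzBuzz"
    if number % 3 == 0:
        return "Fizz"
    if number % 5 == 0:
        return "Buzz"
    return str(number)
```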

In the notebook we can import our function as we would do it in any other Python script:

From here on the notebook works as if the function were written directly inside the notebook:
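The whole round trip can be sketched in one runnable piece. To keep the example self-contained, it creates a temporary stand-in for the code folder and writes a small fizzbuzz.py into it; in a real project both already exist and only the last few lines are needed. The body of fizz_buzz() is an assumption:

```python
import sys
import tempfile
from pathlib import Path

# Stand-in for the "code" folder so this sketch runs anywhere.
code_dir = Path(tempfile.mkdtemp()) / "code"
code_dir.mkdir()
(code_dir / "fizzbuzz.py").write_text(
    "def fizz_buzz(number):\n"
    "    if number % 15 == 0:\n"
    "        return 'FizzBuzz'\n"
    "    if number % 3 == 0:\n"
    "        return 'Fizz'\n"
    "    if number % 5 == 0:\n"
    "        return 'Buzz'\n"
    "    return str(number)\n"
)

# Make the folder importable, then import the function as usual.
sys.path.append(str(code_dir))
from fizzbuzz import fizz_buzz

# The notebook can now call the function like any local one.
results = [fizz_buzz(i) for i in range(1, 6)]
print(results)  # ['1', '2', 'Fizz', '4', 'Buzz']
```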

Everything is neatly cleaned up and we can focus on the work we want to do in the notebook.

That way our notebooks get smaller, and we can focus on the important parts that we want to share.

 

Put data outside of your notebooks

Data often changes and we need to update it frequently. It would be great if we could use the newest data source without rewriting all our notebooks. To achieve that, it is good practice to put the data into a data or datasets folder next to our notebook. As with the code, we can reference the data once at the beginning of our notebook with a statement like this one:
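A minimal sketch of such a statement, assuming a data folder next to the notebook. The names DATA_DIR and sales.csv are hypothetical and only serve as an illustration:

```python
from pathlib import Path

# Assumption: all datasets live in a "data" folder next to the notebook.
DATA_DIR = Path.cwd() / "data"

# Hypothetical file name; later cells only refer to this variable,
# so swapping the dataset means changing a single line.
sales_file = DATA_DIR / "sales.csv"
```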

We get the flexibility to update and reuse the data, while the notebook works as expected:

After referencing the data we can access it as if it were stored directly in the notebook.
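The following sketch shows the idea end to end with the standard library only. It writes a tiny CSV file into a temporary stand-in for the data folder so that it runs anywhere; the file name, the columns and the helper load_rows() are all hypothetical:

```python
import csv
import tempfile
from pathlib import Path

# Stand-in for the "data" folder with a small, made-up dataset.
data_dir = Path(tempfile.mkdtemp()) / "data"
data_dir.mkdir()
(data_dir / "temperatures.csv").write_text(
    "city,temp_c\nBern,21\nZurich,23\n"
)


def load_rows(path):
    """Read a CSV file into a list of dictionaries, one per row."""
    with open(path, newline="") as csv_file:
        return list(csv.DictReader(csv_file))


# The rest of the notebook works with the loaded rows only.
rows = load_rows(data_dir / "temperatures.csv")
average = sum(float(row["temp_c"]) for row in rows) / len(rows)
print(average)  # 22.0
```

When the file in the data folder is replaced with a newer version, re-running the notebook picks up the new values without any change to the cells.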

 

Next

So far, I have had good results following the guidelines described in this post. If you know of a better way to organise code and data, please post a comment. I am always happy to learn about simpler ways to write code. Next week we look at some tips and tricks to work effectively with JupyterLab.
