Jupyter Notebooks offer us great flexibility in how we work with code and documentation. Unfortunately, not every approach gives us a maintainable notebook. Let us look at ways to organise our code and the data we need.
This post is part of my journey to learn Python. You can find the other parts of this series here. You can find the code for this post in my PythonFriday repository on GitHub.
Small code fragments go in the notebook…
It is absolutely no problem to use a notebook for the plumbing code and put together a few method calls. That is what the code cells are for and what gives us the flexibility we enjoy. Even if you need to define a function, you can do it directly in the notebook.
… big chunks of code go in a separate file
For bigger code chunks (like classes) or when we want to reuse our functions, we should extract the code into regular Python files. That way we can put our code under version control as we usually do and import only the functionality that we need in a specific notebook.
I put the extracted code in a folder named code next to my notebook.
We can extend the Python path in our notebook with our code folder using this snippet:
    import os
    import sys

    code_dir = os.path.abspath(os.path.join(os.getcwd(), 'code'))
    sys.path.insert(0, code_dir)
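Notebook cells tend to be run more than once, and each run of the snippet above inserts the same folder into sys.path again. The following is a minimal sketch of a guarded variant. The helper name add_code_dir is my own invention for illustration and not part of the original code; it only adds the folder when it is not on the path yet.

```python
import os
import sys


def add_code_dir(folder="code"):
    """Put <current working directory>/<folder> on sys.path, but only once."""
    code_dir = os.path.abspath(os.path.join(os.getcwd(), folder))
    if code_dir not in sys.path:
        # Insert at the front so our own modules are found first.
        sys.path.insert(0, code_dir)
    return code_dir


code_dir = add_code_dir()
add_code_dir()  # simulates re-running the cell; the path is not added again
```

Whether the extra check is worth it depends on how often you re-run the setup cell. Duplicate entries are harmless, they only clutter sys.path.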
The longer function fizz_buzz() goes into the separate file code/fizzbuzz.py:
    def fizz_buzz(value):
        if value % 3 == 0 and value % 5 == 0:
            return "FizzBuzz"
        elif value % 3 == 0:
            return "Fizz"
        elif value % 5 == 0:
            return "Buzz"
        else:
            return value
In the notebook we can import our function as we would do it in any other Python script:
    from fizzbuzz import fizz_buzz
From here on the notebook works as if the function were written directly inside the notebook.
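As a small illustration, a notebook cell could call the function like this. To keep the sketch runnable on its own, I repeat the function here as a stand-in for the import; in the real notebook the definition comes from code/fizzbuzz.py.

```python
# Stand-in for `from fizzbuzz import fizz_buzz`, repeated so this cell
# runs without the code folder being present.
def fizz_buzz(value):
    if value % 3 == 0 and value % 5 == 0:
        return "FizzBuzz"
    elif value % 3 == 0:
        return "Fizz"
    elif value % 5 == 0:
        return "Buzz"
    else:
        return value


# Call the function for the numbers 1 to 15, as we would in any cell.
results = [fizz_buzz(number) for number in range(1, 16)]
print(results)
```

The list mixes integers and strings because fizz_buzz() returns the value itself when it is divisible by neither 3 nor 5.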
That way our notebooks get smaller, and we can focus on the important parts that we want to share.
Put data outside of your notebooks
Data often changes and we need to update it frequently. It would be great if we could use the newest data source without rewriting all our notebooks. To achieve that, it is good practice to put the data into a data or datasets folder next to our notebook. As with the code, we can reference the data once at the beginning of our notebook with a statement like this one:
    import os

    data_file = os.path.abspath(os.path.join(os.getcwd(), 'data', 'project_size.csv'))
We get the flexibility to update and reuse the data, while the notebook works as expected.
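Once the path lives in a single variable, loading the data is one call that every later cell can share. Here is a minimal sketch using the csv module from the standard library. The helper load_rows and the column names are assumptions of mine, since this post does not show the content of project_size.csv, and the demo writes a throwaway file so that it runs anywhere.

```python
import csv
import os
import tempfile


def load_rows(path):
    """Read a CSV file into a list of dictionaries, one per data row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Throwaway demo file; the columns "project" and "lines" are invented
# and not taken from the real project_size.csv.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "project_size.csv")
with open(demo_file, "w", newline="", encoding="utf-8") as handle:
    handle.write("project,lines\nalpha,1200\nbeta,350\n")

rows = load_rows(demo_file)
print(rows)
```

In the notebook we would pass data_file instead of demo_file. Note that csv.DictReader returns every value as a string, so numeric columns need a conversion before we calculate with them.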
Next
So far, I have had good results following the guidelines described in this post. If you know of a better way to organise code and data, please post a comment. I am always happy to learn about simpler ways to write code. Next week we look at some tips and tricks to work effectively with JupyterLab.