Python Friday #18: Working With the File System

If you do anything with automation, you may need to work with the file system and move files and folders around. This post shows you how to use the different built-in modules and functions of Python to get the job done.

This post is part of my journey to learn Python. You can find the other parts of this series here.

 

Setup

For the examples of this post I use a file system structure like this one:

The glob, os and shutil modules were Python’s workhorses when it came to interacting with the filesystem. They grew over the years and got the job done. Since Python 3.4 we can use the pathlib module for object-oriented access to filesystem paths (see PEP 428).

I prefer the object-oriented approach and focus mainly on pathlib in this post. If you want to know more about the other options, you should read Working With Files in Python by Vuyisile Ndlovu at Real Python. Be aware that pathlib does not offer everything you may need, so knowing a little about the other options can quickly become helpful.

 

List the content of a directory

Before pathlib, we had to use the os module and its listdir() function to list the content of a folder:
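
A minimal sketch of the os.listdir() approach. It builds a throw-away folder with tempfile so it runs anywhere; the file and folder names are hypothetical and not the structure used in the post.

```python
import os
import tempfile

# Throw-away folder with hypothetical names, so the sketch runs anywhere
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "docs"))
    for name in ("a.txt", "b.csv"):
        with open(os.path.join(root, name), "w") as f:
            f.write("demo")

    # listdir() returns plain strings with the entry names only (no path),
    # in arbitrary order, so we sort them for a stable result
    entries = sorted(os.listdir(root))
    print(entries)  # ['a.txt', 'b.csv', 'docs']

    # to do anything with an entry, we have to rebuild its full path by hand
    first_path = os.path.join(root, entries[0])
    print(os.path.isfile(first_path))  # True
```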

With the pathlib module, we can use the glob() method on the Path class:
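
A comparable sketch with Path.glob(), again on a hypothetical throw-away folder. The pattern "*" matches every entry directly inside the folder; a pattern like "*.txt" narrows the result down.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "a.txt").write_text("demo")
    (root / "b.csv").write_text("demo")

    # glob("*") yields a Path object for every entry directly inside root
    entries = sorted(root.glob("*"))
    names = [p.name for p in entries]
    print(names)  # ['a.txt', 'b.csv', 'docs']

    # a pattern narrows the result down, e.g. to text files
    txt_names = [p.name for p in root.glob("*.txt")]
    print(txt_names)  # ['a.txt']
```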

The two approaches are similar; they mainly differ in the values we get back. While listdir() gives us strings, glob() returns Path objects.

On a Path object we can call various helpful methods to get more information than just the name:
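
A sketch of a few of those helpers (is_dir(), is_file(), suffix, stem, parent and stat()) on a hypothetical folder with one file and one sub-folder. This is a selection, not the full list of what Path offers.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "report.txt").write_text("hello")  # 5 bytes of content

    info = {}
    for path in sorted(root.glob("*")):
        info[path.name] = {
            "is_dir": path.is_dir(),
            "is_file": path.is_file(),
            "suffix": path.suffix,        # "" for names without an extension
            "stem": path.stem,            # name without the suffix
            "in_root": path.parent == root,
            # stat() gives access to size, timestamps and more
            "size": path.stat().st_size if path.is_file() else None,
        }
        print(path.name, info[path.name])
```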

 

List the content of a directory and its sub-directories

When I work with the file system, I most often need to work with the whole file system tree including its sub-directories. To get all the files and sub-directories you need to write your own recursive function when you work with os.listdir(). With pathlib you just switch from the glob() method to rglob():
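
A minimal sketch of rglob(), assuming a small hypothetical tree with one nested sub-folder. Like glob(), it accepts a pattern, but it searches all levels below the starting folder.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs" / "drafts").mkdir(parents=True)
    (root / "a.txt").write_text("demo")
    (root / "docs" / "b.txt").write_text("demo")
    (root / "docs" / "drafts" / "c.md").write_text("demo")

    # rglob("*") walks root recursively and yields files and folders alike
    everything = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
    print(everything)
    # ['a.txt', 'docs', 'docs/b.txt', 'docs/drafts', 'docs/drafts/c.md']

    # a pattern restricts the recursive search, e.g. to text files only
    txt_files = sorted(p.name for p in root.rglob("*.txt"))
    print(txt_files)  # ['a.txt', 'b.txt']
```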

 

Creating directories

We can create a folder by creating a Path object and calling its mkdir() method. If we also want to create all missing parent folders, we can set the parents argument to True:
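
A sketch of mkdir() with and without parents=True. It also uses exist_ok=True, a parameter the text above does not mention, which keeps mkdir() from raising FileExistsError when the folder is already there. The folder names are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # a single new folder: its parent (root) already exists
    (root / "logs").mkdir()

    # without parents=True, a missing parent folder is an error
    failed = False
    try:
        (root / "missing" / "child").mkdir()
    except FileNotFoundError:
        failed = True

    # parents=True creates every missing level at once;
    # exist_ok=True makes a repeated call a no-op instead of an error
    nested = root / "data" / "2020" / "raw"
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)

    created = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
    print(failed, created)
```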

If you prefer to build the path from its individual parts, you can use the joinpath() method, which inserts the folder separator of the current operating system for you:
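
A small sketch of joinpath() next to the / operator, which pathlib also supports for joining. The pure path classes PurePosixPath and PureWindowsPath show which separator each platform gets, without touching the disk; the folder names are hypothetical.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# joinpath() and the / operator build the same path
a = Path("data").joinpath("2020", "raw")
b = Path("data") / "2020" / "raw"
print(a == b)  # True

# the separator comes from the path flavour, not from how the parts were joined
posix = str(PurePosixPath("data").joinpath("2020", "raw"))
windows = str(PureWindowsPath("data").joinpath("2020", "raw"))
print(posix)    # data/2020/raw
print(windows)  # data\2020\raw
```

A Path created this way can be passed straight to mkdir(parents=True).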

 

Moving files and directories around

With the rename() method on the Path object we can move a file to another location on the file system:
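
A minimal sketch of moving (and renaming) a file with rename(), using hypothetical names in a throw-away folder.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "archive").mkdir()
    source = root / "report.txt"
    source.write_text("hello")

    # rename() moves the file to the new location and can change its name
    target = root / "archive" / "report-2020.txt"
    source.rename(target)

    print(source.exists(), target.exists())  # False True
    content = target.read_text()
    source_gone = not source.exists()
```

On Windows, rename() raises FileExistsError if the target already exists; Path.replace() overwrites an existing target instead.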

The same rename() method works with folders and moves them together with their content (files and sub-folders):
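
A sketch of renaming a whole folder, assuming a hypothetical project folder with one sub-folder and one file.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    old = root / "project"
    (old / "src").mkdir(parents=True)
    (old / "src" / "main.py").write_text("print('hi')")

    # moves the folder together with all its files and sub-folders
    new = root / "project-old"
    old.rename(new)

    old_exists = old.exists()
    moved = sorted(p.relative_to(new).as_posix() for p in new.rglob("*"))
    print(old_exists, moved)  # False ['src', 'src/main.py']
```

rename() can fail with an OSError when source and target are on different file systems; shutil.move() handles that case by copying and then deleting the source.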

 

Copy a file

There is currently no method on the Path object to copy a file. You could read the content and write it to a new file, but that is rather cumbersome. A much simpler approach is to use the copy2() function from the shutil module. There is a copy() function as well, but while copy2() tries to preserve the file metadata (such as the modification time), copy() only copies the content and the permission bits.
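
A sketch that makes the difference between copy() and copy2() visible. It uses plain string paths and sets an old modification time on a hypothetical source file with os.utime(), then compares the timestamps of both copies.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    source = os.path.join(root, "report.txt")
    with open(source, "w") as f:
        f.write("hello")
    # give the source an old modification time (September 2001)
    os.utime(source, (1_000_000_000, 1_000_000_000))

    # both functions return the path of the new file
    plain = shutil.copy(source, os.path.join(root, "copy.txt"))
    full = shutil.copy2(source, os.path.join(root, "copy2.txt"))

    src_mtime = os.stat(source).st_mtime
    copy_kept_mtime = os.stat(plain).st_mtime == src_mtime
    copy2_kept_mtime = os.stat(full).st_mtime == src_mtime
    print(copy_kept_mtime)   # False: copy() gives the file a new timestamp
    print(copy2_kept_mtime)  # True: copy2() preserves it
```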

If you use Python 3.6 or newer, you can pass Path objects from pathlib directly to the copy2() function:
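
A minimal sketch of copy2() with Path arguments. When the destination is an existing folder, the file is copied into it under its original name; the names here are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "report.txt"
    source.write_text("hello")
    (root / "backup").mkdir()

    # Path objects are accepted directly; the return value is the new file's path
    target = Path(shutil.copy2(source, root / "backup"))

    copied_name = target.name
    copied_text = target.read_text()
    source_still_there = source.exists()
    print(copied_name, copied_text)  # report.txt hello
```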

 

Copy a directory tree

The same problem we face when copying a file occurs when we want to copy a directory. The Path object has no method to copy directories, but shutil offers the copytree() function:
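
A sketch of copytree() on a hypothetical project folder. By default copytree() creates the destination itself and raises an error if it already exists; since Python 3.8 the dirs_exist_ok=True argument allows copying into an existing folder.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "project"
    (src / "src").mkdir(parents=True)
    (src / "src" / "main.py").write_text("print('hi')")
    (src / "README.md").write_text("demo")

    # copytree() creates dst and copies every file (with copy2 by default)
    dst = root / "project-backup"
    shutil.copytree(src, dst)

    copied = sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*"))
    print(copied)  # ['README.md', 'src', 'src/main.py']
```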

 

Deleting files and directories

We can delete files using the unlink() method on the Path object:
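
A minimal sketch of unlink() on a hypothetical file, including what happens when the file is already gone.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    file = Path(tmp) / "old.log"
    file.write_text("demo")

    file.unlink()  # deletes the file
    still_exists = file.exists()

    # a second unlink() raises FileNotFoundError;
    # on Python 3.8+, unlink(missing_ok=True) ignores a missing file instead
    raised = False
    try:
        file.unlink()
    except FileNotFoundError:
        raised = True
    print(still_exists, raised)  # False True
```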

For empty directories we need to use the rmdir() method:
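
A short sketch of rmdir() on an empty, hypothetical folder.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp) / "empty"
    empty.mkdir()

    empty.rmdir()  # succeeds because the folder has no content
    removed = not empty.exists()
    print(removed)  # True
```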

If the directory is not empty, we get an OSError exception. This safety mechanism is often not what we want, even though in hindsight it would have saved us a lot of time whenever we removed the wrong folder. To remove the whole subtree with all its folders and files, we can use the rmtree() function from shutil:
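
A sketch of both behaviours on a hypothetical build folder: rmdir() refuses to delete it, rmtree() removes it with everything below it.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tree = Path(tmp) / "build"
    (tree / "lib" / "cache").mkdir(parents=True)
    (tree / "lib" / "module.py").write_text("demo")

    # rmdir() refuses, because the folder is not empty
    refused = False
    try:
        tree.rmdir()
    except OSError:
        refused = True

    # rmtree() deletes the folder and everything below it; there is no undo,
    # so double-check the path before calling it
    shutil.rmtree(tree)
    gone = not tree.exists()
    print(refused, gone)  # True True
```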

 

Conclusion

Working in an object-oriented way with the file system using the pathlib module is a lot simpler than only having strings to represent files and directories. Not everything we may need is in pathlib, but when something is missing, the shutil module most likely offers this functionality.
