If you do anything with automation, you may need to work with the file system and move files and folders around. This post shows you how to use the different built-in modules and functions of Python to get the job done.
This post is part of my journey to learn Python. You can find the other parts of this series here.
Setup
For the examples of this post I use a file system structure like this one:
D:\Python>tree /F
…
D:.
│   file1.txt
│
├───folderA
│       file2.txt
│       file3.txt
│
└───folderB
    │   file4.txt
    │
    └───folderC
            file5.txt
The glob, os and shutil modules have long been Python’s workhorses for interacting with the file system. They grew over the years and get the work done. Since Python 3.4 we can also use the pathlib module for object-oriented access to file system paths (see PEP 428).
I prefer the object-oriented approach and focus mainly on pathlib for this post. If you want to know more about the other options, you should read Working With Files in Python by Vuyisile Ndlovu at Real Python. Be aware that pathlib does not offer everything you may need, so knowing a little about the other options can quickly become helpful.
List the content of a directory
Traditionally, we list the content of a folder with the listdir() function of the os module:
import os
files = os.listdir(r'D:\Python')
for file in files:
    print(file)

file1.txt
folderA
folderB
With the pathlib module, we can use the glob() method on the Path class:
import pathlib
files = pathlib.Path(r"D:\Python").glob("*")
for file in files:
    print(file)

D:\Python\file1.txt
D:\Python\folderA
D:\Python\folderB
Those two approaches are similar; they mainly differ in the values we get back. While listdir() gives us strings, glob() returns Path objects.
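Because we get Path objects back, each entry can be asked directly whether it is a file or a directory. The following is a minimal sketch of that idea using iterdir(), which yields the direct children of a folder. The helper name split_entries is made up for this illustration, and the example builds a throwaway folder with tempfile so it does not depend on D:\Python.

```python
import pathlib
import tempfile


def split_entries(folder):
    """Return (file_names, dir_names) for the direct children of folder.

    Hypothetical helper for this post, not part of pathlib.
    """
    files, dirs = [], []
    for entry in pathlib.Path(folder).iterdir():
        # Each entry is a Path object, so we can ask it what it is.
        if entry.is_dir():
            dirs.append(entry.name)
        elif entry.is_file():
            files.append(entry.name)
    return sorted(files), sorted(dirs)


# Build a small throwaway tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "file1.txt").write_text("hello")
    (root / "folderA").mkdir()
    (root / "folderB").mkdir()
    files, dirs = split_entries(root)
    print(files)
    print(dirs)
```

With os.listdir() we would have to join every name with the folder and call os.path.isdir() on the result to get the same information.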
On a Path object we can call various helpful methods to get back more information than just the name:
import pathlib
file1 = pathlib.Path(r"D:\Python\file1.txt")
print(f"The parent folder: {file1.parent}")
print(f"The file name: {file1.name}")
print(f"The file name without suffix: {file1.stem}")
print(f"The file suffix: {file1.suffix}")
print(f"The absolute path: {file1.absolute()}")
print(f"Is it a file? {file1.is_file()}")
print(f"Is it a directory? {file1.is_dir()}")

The parent folder: D:\Python
The file name: file1.txt
The file name without suffix: file1
The file suffix: .txt
The absolute path: D:\Python\file1.txt
Is it a file? True
Is it a directory? False
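The same attributes can also be used to build new paths without touching the file system. The sketch below uses PureWindowsPath, which understands Windows-style paths on any operating system, so it should behave the same on Linux or macOS. The names report.txt and .csv are only examples and not part of the folder structure from the setup.

```python
from pathlib import PureWindowsPath

# A pure path only manipulates the text of the path; nothing is read from disk.
file1 = PureWindowsPath(r"D:\Python\file1.txt")

# Split the path into its components (drive, folders, file name).
print(file1.parts)

# Derive sibling paths by swapping the name or only the suffix.
renamed = file1.with_name("report.txt")
as_csv = file1.with_suffix(".csv")
print(renamed)
print(as_csv)

# Attributes can be chained: the name of the parent folder.
print(file1.parent.name)
```

Both with_name() and with_suffix() return new objects and leave the original path unchanged.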
List the content of a directory and its sub-directories
When I work with the file system, I most often need to work with the whole file system tree including its sub-directories. To get all the files and sub-directories you need to write your own recursive function when you work with os.listdir(). With pathlib you just switch from the glob() method to rglob():
import pathlib
files = pathlib.Path.cwd().rglob("*")
for file in files:
    print(file)

D:\Python\file1.txt
D:\Python\folderA
D:\Python\folderB
D:\Python\folderA\file2.txt
D:\Python\folderA\file3.txt
D:\Python\folderB\file4.txt
D:\Python\folderB\folderC
D:\Python\folderB\folderC\file5.txt
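The pattern passed to rglob() does not have to be "*". As a rough sketch, the first function below collects only the .txt files of a tree. For comparison, the second function does the same with os.walk(), which is the os module's alternative to writing the recursion around os.listdir() by hand. Both function names and the file notes.md are invented for this example.

```python
import os
import pathlib
import tempfile


def find_with_rglob(root, pattern="*.txt"):
    """Return matching files below root as sorted, root-relative paths."""
    root = pathlib.Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob(pattern)
        if path.is_file()
    )


def find_with_walk(root, suffix=".txt"):
    """Same result as find_with_rglob, built on os.walk() instead."""
    root = pathlib.Path(root)
    found = []
    for current, _dirs, names in os.walk(root):
        for name in names:
            if name.endswith(suffix):
                full = pathlib.Path(current) / name
                found.append(full.relative_to(root).as_posix())
    return sorted(found)


# Recreate a part of the example tree in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    (base / "folderA").mkdir()
    (base / "folderB" / "folderC").mkdir(parents=True)
    (base / "file1.txt").write_text("1")
    (base / "folderA" / "file2.txt").write_text("2")
    (base / "folderB" / "folderC" / "file5.txt").write_text("5")
    (base / "folderB" / "notes.md").write_text("not a text file")
    via_rglob = find_with_rglob(base)
    via_walk = find_with_walk(base)
    print(via_rglob)
```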
Creating directories
We can create folders by creating a Path object and calling its mkdir() method. If we also want to create all the missing parent folders along the way, we can set the parents argument to True:
import pathlib
deep = pathlib.Path(r"D:\Python\deep\onemore\evenmore")
deep.mkdir(parents=True)
If you prefer to build the path from its parts and let pathlib insert the environment-specific folder separator, you can use the joinpath() method:
import pathlib
parent = pathlib.Path(r"D:\Python")
deep = parent.joinpath("deep2", "onemore", "evenmore")
deep.mkdir(parents=True)
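Running one of these snippets a second time raises a FileExistsError, because mkdir() complains when the folder is already there. A minimal sketch of a repeatable variant is shown below. It assumes that an already existing folder is fine for your use case and passes exist_ok=True. The helper name ensure_dir is hypothetical.

```python
import pathlib
import tempfile


def ensure_dir(base, *parts):
    """Create base/parts[0]/parts[1]/... if needed and return the Path.

    Hypothetical helper: safe to call repeatedly thanks to exist_ok=True.
    """
    target = pathlib.Path(base).joinpath(*parts)
    target.mkdir(parents=True, exist_ok=True)
    return target


with tempfile.TemporaryDirectory() as tmp:
    first = ensure_dir(tmp, "deep", "onemore", "evenmore")
    # The second call finds the folder in place and does nothing.
    second = ensure_dir(tmp, "deep", "onemore", "evenmore")
    created = first.is_dir()
    same_path = first == second
    print(created, same_path)
```

The / operator is an alternative to joinpath(): pathlib.Path(base) / "deep" / "onemore" builds the same kind of path.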
Moving files and directories around
With the rename() method on the Path object we can move a file to another location on the file system:
import pathlib
old = pathlib.Path(r"D:\Python\file1.txt")
new = old.rename(r"D:\Python\FolderA\file1_moved.txt")
print(new)

D:\Python\FolderA\file1_moved.txt
The same rename() method works with folders and moves them together with their content (files and sub-folders):
import pathlib
old = pathlib.Path(r"D:\Python\folderB")
new = old.rename(r"D:\Python\FolderA\movedB")
print(new)

D:\Python\FolderA\movedB
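Two details of rename() are worth knowing. If the target already exists, Windows raises a FileExistsError, while Unix silently replaces an existing file. A move to a different drive or file system may also fail. The following sketch is one possible way around both, assuming you never want to overwrite anything. It checks the target first and then uses shutil.move(), which falls back to copy and delete when a plain rename is not possible. The helper name safe_move and the archive folder are made up for this example.

```python
import pathlib
import shutil
import tempfile


def safe_move(source, target):
    """Move source to target, but never overwrite an existing target.

    Hypothetical helper; missing parent folders of target are created.
    """
    source = pathlib.Path(source)
    target = pathlib.Path(target)
    if target.exists():
        raise FileExistsError(f"Refusing to overwrite {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    # shutil.move returns the destination it actually used.
    return pathlib.Path(shutil.move(str(source), str(target)))


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    old = root / "file1.txt"
    old.write_text("content")
    new = safe_move(old, root / "archive" / "file1_moved.txt")
    moved_ok = new.is_file() and not old.exists()
    print(moved_ok)
```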
Copy a file
There is currently no method on the Path object to copy a file. You can read the content and write it to the copy, but that is rather cumbersome. A much simpler approach is to use the shutil module and the copy2() function. There is a copy() function as well, but copy2() also tries to preserve file metadata such as the modification time, while copy() only keeps the content and the permission bits.
import shutil
original = r"D:\Python\folderA\file2.txt"
copy = r"D:\Python\file2copy.txt"
new = shutil.copy2(original, copy)
print(new)

D:\Python\file2copy.txt
If you use Python 3.6 or newer, then you can use the Path objects from pathlib with the copy2() function:
import pathlib
import shutil
original = pathlib.Path(r"D:\Python\folderA\file2.txt")
copy = pathlib.Path(r"D:\Python\file2copy2.txt")
new = shutil.copy2(original, copy)
print(new)

D:\Python\file2copy2.txt
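To see the difference between copy() and copy2(), we can give a file an old modification time and compare the two copies. This is only a small experiment in a temporary folder. The timestamp value and the file names are arbitrary choices for the illustration.

```python
import os
import pathlib
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    original = root / "file2.txt"
    original.write_text("some content")

    # Pretend the file was last modified in September 2001.
    old_timestamp = 1_000_000_000
    os.utime(original, (old_timestamp, old_timestamp))

    plain = pathlib.Path(shutil.copy(original, root / "copy.txt"))
    full = pathlib.Path(shutil.copy2(original, root / "copy2.txt"))

    # copy() stamps the new file with the current time,
    # copy2() carries the old modification time over.
    plain_mtime = plain.stat().st_mtime
    full_mtime = full.stat().st_mtime
    same_content = plain.read_text() == full.read_text() == "some content"
    print(f"copy():  {plain_mtime}")
    print(f"copy2(): {full_mtime}")
```

Both copies have identical content; only the metadata differs.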
Copy a directory tree
The same problem we face when creating a copy of a file occurs when we want to copy a directory. The Path object does not have a method to copy directories, but shutil does: the copytree() function.
import shutil
original = r"D:\Python\folderA"
copy = r"D:\Python\folderCopy"
new = shutil.copytree(original, copy)
print(new)

D:\Python\folderCopy
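By default copytree() raises a FileExistsError when the target folder already exists, and it copies everything it finds. The sketch below shows two optional arguments that relax this: ignore together with shutil.ignore_patterns() to skip files, and dirs_exist_ok=True (available since Python 3.8) to copy into an existing folder. The helper name copy_tree_filtered and the file scratch.tmp are invented for this example.

```python
import pathlib
import shutil
import tempfile


def copy_tree_filtered(source, target, *patterns):
    """Copy source to target, skipping names that match the glob patterns.

    Hypothetical helper; requires Python 3.8+ because of dirs_exist_ok.
    """
    result = shutil.copytree(
        source,
        target,
        ignore=shutil.ignore_patterns(*patterns),
        dirs_exist_ok=True,
    )
    return pathlib.Path(result)


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    folder_a = root / "folderA"
    folder_a.mkdir()
    (folder_a / "file2.txt").write_text("2")
    (folder_a / "file3.txt").write_text("3")
    (folder_a / "scratch.tmp").write_text("temporary")

    copied = copy_tree_filtered(folder_a, root / "folderCopy", "*.tmp")
    # A second run into the existing folder works as well.
    copy_tree_filtered(folder_a, root / "folderCopy", "*.tmp")
    copied_names = sorted(p.name for p in copied.iterdir())
    print(copied_names)
```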
Deleting files and directories
We can delete files using the unlink() method on the Path object:
import pathlib
file = pathlib.Path(r"D:\Python\folderA\file2.txt")
file.unlink()
For empty directories we need to use the rmdir() method:
import pathlib
folder = pathlib.Path(r"D:\Python\folderCopy")
folder.rmdir()
If the directory is not empty, we get an OSError exception. This safety mechanism is often not what we want (and in hindsight would have saved us a lot of time when we removed the wrong folder). To remove the whole subtree with all its folders and files, we can use the rmtree() function from shutil:
import shutil
shutil.rmtree(r"D:\Python\folderCopy")
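Since rmtree() deletes without asking, a small guard can restore some of the safety that rmdir() gave us. The following is only a sketch of one possible guard, not a complete protection: it deletes a file or a folder tree only if the path lies inside a folder we explicitly allow. The helper name remove_inside and its parameters are hypothetical.

```python
import pathlib
import shutil
import tempfile


def remove_inside(path, allowed_root):
    """Delete a file or folder tree, but only below allowed_root.

    Hypothetical helper. Returns True if something was deleted and
    False if the path did not exist.
    """
    root = pathlib.Path(allowed_root).resolve()
    target = pathlib.Path(path).resolve()
    try:
        relative = target.relative_to(root)
    except ValueError:
        raise ValueError(f"{target} is outside {root}") from None
    if not relative.parts:
        raise ValueError("Refusing to delete the allowed root itself")
    if target.is_dir():
        shutil.rmtree(target)
    elif target.exists():
        target.unlink()
    else:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "folderCopy").mkdir()
    (root / "folderCopy" / "file2.txt").write_text("2")
    deleted = remove_inside(root / "folderCopy", root)
    gone = not (root / "folderCopy").exists()
    print(deleted, gone)
```

Calling remove_inside() with a path outside the allowed folder raises a ValueError instead of deleting anything.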
Conclusion
Working in an object-oriented way with the file system using the pathlib module is a lot simpler than only having strings to represent files and directories. Not everything we may need is in pathlib, but when something is missing, the shutil module most likely offers this functionality.