Whenever I need to find values that are part of one list but not another, I like to work with set operations. I find them more elegant than looping through the lists. Let’s look at what Python offers to solve this problem.
This post is part of my journey to learn Python. You can find the other parts of this series here. You can find the code for this post in my PythonFriday repository on GitHub.
NumPy?
NumPy is the fundamental package for scientific computing with Python and a great addition to pandas. As with pandas, NumPy offers a large set of features that are described in the official documentation.
We can install NumPy with pip:
pip install numpy
Preparation
With set operations we can get parts of two lists (or arrays) without iterating through them. This gives a more elegant solution and may reveal the goal of your code more clearly. For this post we need the two lists a and b:
import numpy as np

a = [1, 2, 3, 4]
b = [3, 4, 5, 6]
I find it helpful to have a graphical representation of the two “sets” we work on. The numbers 1-4 are part of list a, while the numbers 3-6 are part of list b. The numbers 3 and 4 are in both lists:
Set difference: Elements in the first but not the second list
We can use the set difference when we want the elements in list a that are not part of list b. In NumPy this function is called np.setdiff1d():
in_a_not_in_b = np.setdiff1d(a, b)
print(f"in a but not in b: {in_a_not_in_b}")
This gives us the numbers 1 and 2:
in a but not in b: [1 2]
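For comparison, here is a minimal sketch that puts a loop-style list comprehension next to the set operation. The names loop_result and numpy_result are only made up for this illustration:

```python
import numpy as np

a = [1, 2, 3, 4]
b = [3, 4, 5, 6]

# Loop-style version: keep every element of a that does not occur in b
loop_result = [x for x in a if x not in b]

# Set-operation version: returns a NumPy array instead of a list
numpy_result = np.setdiff1d(a, b)

print(loop_result)   # [1, 2]
print(numpy_result)  # [1 2]
```

Both versions give the same values for these two lists. The NumPy version states the goal directly, but it returns a NumPy array and not a Python list.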
If we want to know what elements are in b but not in a, we need to switch the two lists:
in_b_not_in_a = np.setdiff1d(b, a)
print(f"in b but not in a: {in_b_not_in_a}")
This gives us the numbers 5 and 6:
in b but not in a: [5 6]
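One detail that is easy to miss: np.setdiff1d() returns the unique values in sorted order, so duplicates and the original order of the first list are not preserved. The following sketch uses two hypothetical lists c and d that are not part of the example above:

```python
import numpy as np

# Hypothetical input with duplicates and no particular order
c = [4, 2, 2, 1, 3, 1]
d = [3, 4]

# The result contains each remaining value once, sorted ascending
in_c_not_in_d = np.setdiff1d(c, d)
print(f"in c but not in d: {in_c_not_in_d}")  # in c but not in d: [1 2]
```

If you need to keep duplicates or the original order, a set operation is probably not the right tool.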
Intersection: Get the elements that are in both lists
If we want the elements that are in both lists, we can use the function np.intersect1d():
intersect = np.intersect1d(a, b)
print(f"both in a AND b: {intersect}")
This gives us the numbers 3 and 4:
both in a AND b: [3 4]
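If we also want to know where the common elements sit in the original lists, np.intersect1d() accepts the parameter return_indices. This is a small sketch, assuming NumPy 1.15 or newer; the variable names common, idx_a and idx_b are my own choice:

```python
import numpy as np

a = [1, 2, 3, 4]
b = [3, 4, 5, 6]

# return_indices=True additionally returns the positions of the
# first occurrence of each common value in a and in b
common, idx_a, idx_b = np.intersect1d(a, b, return_indices=True)

print(f"common values: {common}")   # common values: [3 4]
print(f"positions in a: {idx_a}")   # positions in a: [2 3]
print(f"positions in b: {idx_b}")   # positions in b: [0 1]
```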
Union: Get a set of all elements
With the function np.union1d() we get a sorted array of all elements from both lists, with each element appearing only once:
union = np.union1d(a, b)
print(f"everything from a and b: {union}")
This gives us the numbers 1 through 6:
everything from a and b: [1 2 3 4 5 6]
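np.union1d() takes exactly two inputs. When there are more than two lists, one possible approach is to chain the calls with functools.reduce(). The third list c in this sketch is hypothetical and only there for illustration:

```python
from functools import reduce

import numpy as np

a = [1, 2, 3, 4]
b = [3, 4, 5, 6]
c = [6, 7, 1]  # hypothetical third list

# reduce() applies np.union1d() pairwise: union1d(union1d(a, b), c)
union_all = reduce(np.union1d, (a, b, c))
print(f"everything from a, b and c: {union_all}")  # everything from a, b and c: [1 2 3 4 5 6 7]
```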
Conclusion
NumPy is a powerful library and the set operations are only a tiny bit of all the things it offers. For certain problems I like the set operations a lot and it is nice that Python offers support for them.