Last week we looked at how to import data into pandas. With this problem solved, we can now move on to the more interesting task of converting our data into a pivot table.
This post is part of my journey to learn Python. You can find the other parts of this series here. You find the code for this post in my PythonFriday repository on GitHub.
Reminder: The sample data
The data I work with looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
month;project;loc 2021.01;A;100 2021.01;B;1000 2021.01;C;1100 2021.02;A;200 2021.02;B;2100 2021.02;C;1100 2021.03;A;500 2021.03;B;2100 2021.03;C;3100 2021.04;A;1000 2021.04;B;5000 2021.04;C;4100 |
The pivot() method
On the data frame is the pivot() method that will transform our data as we need it. I want to use the ‘project’ column as the index (top to down), the ‘month’ as columns (left to right) and the ‘loc’ column as its values:
1 2 3 4 |
import pandas as pd pivot = df.pivot(index='project', columns='month', values='loc') print(pivot) |
If we run this code, we get this pivot table:
1 2 3 4 5 |
month 2021.01 2021.02 2021.03 2021.04 project A 100 200 500 1000 B 1000 2100 2100 5000 C 1100 1100 3100 4100 |
We can now see how the projects increased in lines of code over time.
The pivot_table() method
We often not only need a pivot table but some calculations over the rows and columns. For this requirement we can use the pivot_table() method in pandas and create additional columns and rows for grand totals around our data frame (those extra columns are called margin).
The first part of the method call is the same as to pivot(), the parameters after that use the sum function to create a grand total over the pivot table:
1 2 3 4 |
pt = df.pivot_table(index='project', columns='month', values='loc', aggfunc = 'sum', fill_value = 'N/A', margins = True, margins_name='Total') print(pt) |
This code adds the additional columns we often need in a pivot table:
1 2 3 4 5 6 |
month 2021.01 2021.02 2021.03 2021.04 Total project A 100 200 500 1000 1800 B 1000 2100 2100 5000 10200 C 1100 1100 3100 4100 9400 Total 2200 3400 5700 10100 21400 |
We can use other aggregate functions like min or max if we want to find the extreme values in every column and row.
Conclusion
Pandas allows us to create a pivot table with a single command. It looks like magic but when you use it with your own data you quickly get the expected result, especially when you know how to create pivot tables in Excel. Try it!
2 thoughts on “Python Friday #104: Creating a Pivot Table in Pandas”