Until now I use Excel to make a Pivot out of my data stored in CSV files. While this works, I need a lot of manual steps that are cumbersome and time-consuming. Pandas offers an automatable way to do this task, but before I can create the pivot, I need to import my data. Let us look how we can do this part with pandas.
This post is part of my journey to learn Python. You can find the other parts of this series here. You find the code for this post in my PythonFriday repository on GitHub.
Pandas?
Pandas is a data analysis library for Python. It offers an enormous functionality and I only use a tiny bit for my tasks. If you want to know more about pandas, you should check the official documentation.
You can install pandas with this command:
1 |
pip install pandas |
My sample data
The data I want to import looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
month;project;loc 2021.01;A;100 2021.01;B;1000 2021.01;C;1100 2021.02;A;200 2021.02;B;2100 2021.02;C;1100 2021.03;A;500 2021.03;B;2100 2021.03;C;3100 2021.04;A;1000 2021.04;B;5000 2021.04;C;4100 |
Reading CSV files
We can use the read_csv() method from pandas to create a data frame (df). The data frame is one of the basic objects in pandas on which you do all the work. If you use something else than a “,” (comma) to separate your data, you must set the delimiter argument – otherwise pandas will put all your data into one single column:
1 2 3 4 |
import pandas as pd df = pd.read_csv('input.csv', delimiter=';') print(df) |
If you run the code from above, you should get this output:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
month project loc 0 2021.01 A 100 1 2021.01 B 1000 2 2021.01 C 1100 3 2021.02 A 200 4 2021.02 B 2100 5 2021.02 C 1100 6 2021.03 A 500 7 2021.03 B 2100 8 2021.03 C 3100 9 2021.04 A 1000 10 2021.04 B 5000 11 2021.04 C 4100 |
Writing CSV files
Writing your data frame to a CSV file is similar to reading them. We must specify a file name, while everything else (like the delimiter) in the call to to_csv() is optional. For most exports you want to specify index=False to prevent the index column (that every data frame has) in your output file:
1 |
df.to_csv('output_converted.csv', index=False) |
If we run this code our output file uses “,” (comma) as a delimiter:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
month,project,loc 2021.01,A,100 2021.01,B,1000 2021.01,C,1100 2021.02,A,200 2021.02,B,2100 2021.02,C,1100 2021.03,A,500 2021.03,B,2100 2021.03,C,3100 2021.04,A,1000 2021.04,B,5000 2021.04,C,4100 |
Next
We now know how to get the data into pandas and back into a CSV file. Next week we turn our data into a pivot table.
1 thought on “Python Friday #103: Working With CSV Files in Pandas”