While working on my upcoming blog post on filtering data in Pandas, I noticed a little gap in my knowledge: How can we create a DataFrame without the help of a CSV file? Let us find out what options we have.
This post is part of my journey to learn Python. You can find the other parts of this series here. The code for this post is in my PythonFriday repository on GitHub.
Turn a dictionary into a DataFrame
One straightforward way to create a DataFrame is to use a dictionary. Each key of the dictionary becomes the name of a column, while its values (typically a list) are placed on separate rows below that column:
    import pandas as pd

    d = {'A': [1, 2, 3], 'B': [4, 5, 6]}
    df = pd.DataFrame(data=d)
    df
This gives us a DataFrame that is two columns wide and three rows long.
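To make the result concrete, here is a minimal sketch that builds the same DataFrame and prints it together with its shape. The last part is my own addition and not part of the example above: it assumes you want to use single values instead of lists, in which case Pandas needs an explicit index.

```python
import pandas as pd

d = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data=d)

print(df)
#    A  B
# 0  1  4
# 1  2  5
# 2  3  6

print(df.shape)  # (3, 2): 3 rows, 2 columns

# Only single values? Then Pandas needs an index to know how many rows to create.
single = pd.DataFrame({'A': 1, 'B': 4}, index=[0])
print(single.shape)  # (1, 2)
```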
Turn a list of lists into a DataFrame
We can create a list containing other lists and turn that into a DataFrame:
    rows = []
    rows.append(['A', 11, 20])
    rows.append(['B', 16, 28])
    rows.append(['C', 14, 32])

    column_names=['Option','Min','Max']
    df = pd.DataFrame(rows, columns=column_names)
    df
We are free to name the columns as we like, and the items stay in the order in which we put them into the list(s).
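As a small sketch of what the column names do: if we leave out the columns argument, Pandas falls back to numbered columns. The variable names unnamed and named are only there for illustration.

```python
import pandas as pd

rows = [
    ['A', 11, 20],
    ['B', 16, 28],
    ['C', 14, 32],
]

# Without column names, Pandas numbers the columns 0, 1, 2.
unnamed = pd.DataFrame(rows)
print(list(unnamed.columns))  # [0, 1, 2]

# With column names, each inner list still becomes one row, in order.
named = pd.DataFrame(rows, columns=['Option', 'Min', 'Max'])
print(list(named.columns))    # ['Option', 'Min', 'Max']
print(named['Min'].tolist())  # [11, 16, 14]
```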
Turn a NumPy ndarray into a DataFrame
If we process data and are already familiar with NumPy, we can turn an ndarray into a DataFrame:
    import numpy as np

    data = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
    df = pd.DataFrame(data, columns=['x', 'y', 'z'])
    df
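This takes the three values we have in each tuple and places them horizontally, as one row each, into our DataFrame.
Be aware that this gives you a different placement of the values than a dictionary would: the ndarray fills the DataFrame row by row, while the dictionary fills it column by column.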
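The difference in placement is easiest to see side by side. This is a minimal sketch that feeds the same numbers once as an ndarray and once as a dictionary. The names from_array, from_dict and transposed are my own and only there for illustration.

```python
import numpy as np
import pandas as pd

data = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)])

# ndarray: every tuple becomes a row.
from_array = pd.DataFrame(data, columns=['x', 'y', 'z'])
print(from_array['x'].tolist())  # [1, 4, 7]

# Dictionary: every list becomes a column.
from_dict = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6], 'z': [7, 8, 9]})
print(from_dict['x'].tolist())   # [1, 2, 3]

# Transposing the ndarray gives the same placement as the dictionary.
transposed = pd.DataFrame(data.T, columns=['x', 'y', 'z'])
print(transposed['x'].tolist())  # [1, 2, 3]
```

If the data arrives as an ndarray but should end up in columns, transposing it with .T before creating the DataFrame is one way to get there.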
Turn a CSV string into a DataFrame
If we have more data or really like CSV, we can wrap our CSV-formatted values in a StringIO object, an in-memory text buffer that behaves like a file. We can then pass that object to the read_csv() function without the need to save our values into a file:
    from io import StringIO

    data = StringIO("""
    A,B,C
    1,2,3
    4,5,6
    7,8,9
    """)

    df = pd.read_csv(data)
    df
This code allows us to keep doing what we already know and turn CSV into a DataFrame.
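Here is a short sketch to check what read_csv() makes of the string. It relies on the default behaviour of read_csv(): blank lines are skipped and the first line with content is used as the header.

```python
from io import StringIO

import pandas as pd

data = StringIO("""
A,B,C
1,2,3
4,5,6
7,8,9
""")

# The blank line after the opening quotes is skipped, A,B,C becomes the header.
df = pd.read_csv(data)

print(list(df.columns))  # ['A', 'B', 'C']
print(df.shape)          # (3, 3)
print(df['A'].tolist())  # [1, 4, 7]
```

Keep in mind that the StringIO object is read like a file: once read_csv() has consumed it, you need to call data.seek(0) before you can read it a second time.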
Next
These four ways allow us to create a DataFrame in Pandas without an additional CSV file. With this new knowledge, we can experiment next week with the various options Pandas offers to filter data in a DataFrame.