We are back on our journey to data visualisation. Matplotlib offers a lot of features, but combining multiple plots into one graphic is painful. With Pandas we get an abstraction on top of Matplotlib that works on the whole data frame. Let us explore the plotting capabilities we get in Pandas.
This post is part of my journey to learn Python. You can find the other parts of this series here. You can find the code for this post in my PythonFriday repository on GitHub.
Preparations
To show the power of Pandas for plotting, we need a data frame with data. We can use this list of mean daily maximum temperatures (in Celsius) for the cities Bern, Oslo, and Rome:
    import pandas as pd
    import numpy as np

    months = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
              'August', 'September', 'October', 'November', 'December']

    temp_high_average = {
        "Bern": [3.4, 5.2, 10.3, 14.5, 18.6, 22.5, 24.6, 24.2, 19.4, 14.0, 7.7, 3.8],
        "Oslo": [-0.4, 0.5, 4.4, 10.1, 16.5, 20.0, 22.3, 20.9, 15.7, 9.4, 3.9, 0.0],
        "Rome": [11.1, 12.6, 15.2, 18.8, 23.4, 27.6, 30.4, 29.8, 26.3, 21.5, 16.1, 12.6]
    }

    df = pd.DataFrame(data=temp_high_average, index=months)
This gives us a data frame that looks like this one:
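Before we plot anything, a quick check can confirm that the months ended up as the index and the cities as columns. This is only a small sketch that repeats the data from above so that it runs on its own:

```python
import pandas as pd

months = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']

temp_high_average = {
    "Bern": [3.4, 5.2, 10.3, 14.5, 18.6, 22.5, 24.6, 24.2, 19.4, 14.0, 7.7, 3.8],
    "Oslo": [-0.4, 0.5, 4.4, 10.1, 16.5, 20.0, 22.3, 20.9, 15.7, 9.4, 3.9, 0.0],
    "Rome": [11.1, 12.6, 15.2, 18.8, 23.4, 27.6, 30.4, 29.8, 26.3, 21.5, 16.1, 12.6]
}

df = pd.DataFrame(data=temp_high_average, index=months)

print(df.shape)        # (12, 3): 12 months as rows, 3 cities as columns
print(df.head(3))      # the first three months for all cities
print(df.loc["July"])  # a single row, selected by its index label
```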
Line plot / Line chart
If we call the plot() method on the data frame without any options, it will create a line plot:
    df.plot()
Without any additional work we get all three cities in the same plot:
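If you run the code as a script instead of in a notebook, the plot will not show up on its own. The following is a minimal sketch under a few assumptions of mine: the Agg backend (no display needed), the axis label, and the file name temperatures.png in the temp folder are not part of the original example. It relies on plot() returning a Matplotlib Axes object that we can label and save:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: no display available, so we render to a file
import pandas as pd

months = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']

df = pd.DataFrame({
    "Bern": [3.4, 5.2, 10.3, 14.5, 18.6, 22.5, 24.6, 24.2, 19.4, 14.0, 7.7, 3.8],
    "Oslo": [-0.4, 0.5, 4.4, 10.1, 16.5, 20.0, 22.3, 20.9, 15.7, 9.4, 3.9, 0.0],
    "Rome": [11.1, 12.6, 15.2, 18.8, 23.4, 27.6, 30.4, 29.8, 26.3, 21.5, 16.1, 12.6]
}, index=months)

ax = df.plot()                      # one line per column
ax.set_ylabel("Temperature (°C)")   # label chosen for this example

# Hypothetical file name; any writable path works.
out_file = os.path.join(tempfile.gettempdir(), "temperatures.png")
ax.figure.savefig(out_file)
```

In an interactive session you would skip the backend line and call plt.show() from matplotlib.pyplot instead of saving the figure.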
If we only want to plot specific columns, we can create a list with their names and then filter the data frame:
    selector = ["Bern", "Oslo"]
    df[selector].plot(kind="line")
We now only get the cities Bern and Oslo in our plot:
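As far as I can tell, filtering the data frame first is not the only way. The sketch below compares it with passing the column names through the y parameter of plot(). The shortened data set (only three months) and the variable names are my own choice to keep the example small:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render without a display
import pandas as pd

# Shortened data set: the first three months from the table above.
df = pd.DataFrame({
    "Bern": [3.4, 5.2, 10.3],
    "Oslo": [-0.4, 0.5, 4.4],
    "Rome": [11.1, 12.6, 15.2]
}, index=["January", "February", "March"])

selector = ["Bern", "Oslo"]

ax_filtered = df[selector].plot(kind="line")  # filter first, then plot
ax_y = df.plot(y=selector)                    # let plot() pick the columns

# Both plots contain one line for each selected city.
print([line.get_label() for line in ax_filtered.get_lines()])
print([line.get_label() for line in ax_y.get_lines()])
```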
Bar chart
To get a bar chart of our data frame, we can call df.plot(kind="bar") or the dedicated method df.plot.bar(). Both produce the same result:
    df.plot.bar()
This gives us a bar chart with the temperatures for each month for all three cities:
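To convince ourselves that both spellings really draw the same chart, we can count the bars on the returned Axes objects. This is a small sketch and not part of the original example; the Agg backend and the variable names are my assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render without a display
import pandas as pd

months = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']

df = pd.DataFrame({
    "Bern": [3.4, 5.2, 10.3, 14.5, 18.6, 22.5, 24.6, 24.2, 19.4, 14.0, 7.7, 3.8],
    "Oslo": [-0.4, 0.5, 4.4, 10.1, 16.5, 20.0, 22.3, 20.9, 15.7, 9.4, 3.9, 0.0],
    "Rome": [11.1, 12.6, 15.2, 18.8, 23.4, 27.6, 30.4, 29.8, 26.3, 21.5, 16.1, 12.6]
}, index=months)

ax_kind = df.plot(kind="bar")   # generic method with the kind parameter
ax_method = df.plot.bar()       # dedicated method

# Every bar is a rectangle patch: 12 months x 3 cities = 36 bars.
print(len(ax_kind.patches), len(ax_method.patches))
```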
If we want a horizontal bar chart, we can use either df.plot(kind="barh") or df.plot.barh():
    df.plot.barh()
This turns the bar chart on the side:
There is only one annoying detail with this chart: read from top to bottom, the months start with December. If you want the list to run from January to December instead, you can invert the Y axis:
    ax = df.plot.barh()
    ax.invert_yaxis()
This fixes the order of the months to something more familiar:
Pie chart
For the pie chart we need a data frame with the categories we want in our plot:
    data = {'Animals': {'Cats': 65, 'Dogs': 30}}
    df = pd.DataFrame(data=data)
This gives us a data frame with the two categories “Cats” and “Dogs”:
We can now either select a single column and plot the pie chart with df.plot(kind='pie'), or call df.plot.pie() on the data frame and pick the column through the y parameter:
    df['Animals'].plot(kind='pie')
    # df.plot.pie(y='Animals')
Both approaches give us the same pie chart:
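A pie chart shows the share of each category in the total, so it can help to calculate these shares ourselves and print them on the slices. The sketch below assumes that the autopct parameter is passed on to Matplotlib's pie() function; the format string and the shares variable are my additions:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render without a display
import pandas as pd

data = {'Animals': {'Cats': 65, 'Dogs': 30}}
df = pd.DataFrame(data=data)

# Share of each category in percent of the total (65 + 30 = 95).
shares = df['Animals'] / df['Animals'].sum() * 100
print(shares.round(1))

# autopct writes the percentage on each slice.
ax = df.plot.pie(y='Animals', autopct='%1.1f%%')
```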
Scatter plot
For the scatter plot we need a data frame with (at least) two columns:
    data = {"a": [1, 2, 3, 4, 3, 2, 2.5],
            "b": [0, 6, 1, 2, 5, 4, 2.5]}
    df = pd.DataFrame(data=data)
This gives us a data frame like this:
We need to tell the df.plot.scatter() method which column holds the X values and which column holds the Y values for the dots:
    df.plot.scatter(x='a', y='b')
This gives us a scatter plot with the 7 points of our data frame:
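If we keep the returned Axes object, we can check what ended up in the plot. This is a small sketch of mine; the marker size s=60 and the Agg backend are assumptions for this example and not part of the original code:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render without a display
import pandas as pd

data = {"a": [1, 2, 3, 4, 3, 2, 2.5],
        "b": [0, 6, 1, 2, 5, 4, 2.5]}
df = pd.DataFrame(data=data)

# s sets the marker size; the column names become the axis labels.
ax = df.plot.scatter(x='a', y='b', s=60)

points = ax.collections[0].get_offsets()  # the (x, y) pairs that were drawn
print(len(points))                        # 7
print(ax.get_xlabel(), ax.get_ylabel())   # a b
```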
Next
We can use the plot() method and specify the kind of plot we want or use a dedicated method to visualise our data frame. Next week we look at the slightly different options we have for histograms and box plots.