Different kinds of data call for different kinds of charts. In this post we explore the most commonly used chart types in Matplotlib.
This post is part of my journey to learn Python. You can find the other parts of this series here. You can find the code for this post in my PythonFriday repository on GitHub.
Line plot / Line chart
The line chart is the diagram we used in the previous posts. We can create it with this code:
```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [0, 6, 1, 2]

fig, ax = plt.subplots()
ax.plot(x, y)  # connect the (x, y) points with a line
plt.show()     # display the figure when run as a script
```
This gives us the familiar-looking plot:
Bar chart
If we have categorical values, we can use this code to create a bar chart:
```python
import matplotlib.pyplot as plt

month = ['January', 'February', 'March', 'April', 'May', 'June']
temp_average = [3.4, 5.2, 10.3, 14.5, 18.6, 22.5]

fig, ax = plt.subplots()
ax.bar(month, temp_average)  # one vertical bar per category
plt.show()
```
This gives us a bar chart with vertical bars:
If we switch from ax.bar() to ax.barh(), we change the orientation from vertical to horizontal:
```python
import matplotlib.pyplot as plt

month = ['January', 'February', 'March', 'April', 'May', 'June']
temp_average = [3.4, 5.2, 10.3, 14.5, 18.6, 22.5]

fig, ax = plt.subplots()
ax.barh(month, temp_average)  # one horizontal bar per category
plt.show()
```
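To compare both orientations directly, the two calls can share one figure. This is only a sketch that reuses the data from above; the names `ax_vertical` and `ax_horizontal` and the figure size are my own choices for illustration.

```python
import matplotlib.pyplot as plt

month = ['January', 'February', 'March', 'April', 'May', 'June']
temp_average = [3.4, 5.2, 10.3, 14.5, 18.6, 22.5]

# One figure with two axes next to each other (names are illustrative)
fig, (ax_vertical, ax_horizontal) = plt.subplots(1, 2, figsize=(10, 4))

# bar() maps each value to the height of a bar
vertical_bars = ax_vertical.bar(month, temp_average)

# barh() maps each value to the width of a bar
horizontal_bars = ax_horizontal.barh(month, temp_average)

fig.tight_layout()  # adjust the spacing so the labels fit inside the figure
```

Both calls take the same arguments in the same order, so switching the orientation needs no change to the data.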
Pie chart
We can create a pie chart with this code:
```python
import matplotlib.pyplot as plt

labels = ['Cats', 'Dogs']
sizes = [65, 30]

fig, ax = plt.subplots()
ax.pie(sizes, labels=labels)  # one wedge per value
plt.show()
```
The result is a circle with proportional coloured segments that correspond to our values:
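Our two values add up to 95, not 100, so it helps to see what "proportional" means here. The following sketch computes the share of each value by hand and uses the autopct parameter of ax.pie() to print the percentages into the wedges. The variable names `total` and `shares` are my own.

```python
import matplotlib.pyplot as plt

labels = ['Cats', 'Dogs']
sizes = [65, 30]

# The share each wedge gets: value / sum of all values
total = sum(sizes)
shares = [size / total for size in sizes]

fig, ax = plt.subplots()
# autopct formats the percentage of each wedge with one decimal place
ax.pie(sizes, labels=labels, autopct='%1.1f%%')

for label, share in zip(labels, shares):
    print(f"{label}: {share:.1%}")
```

With the default settings the values do not need to add up to 100, because the wedges are sized by each value's share of the total.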
Scatter plot
The scatter plot draws a marker for each data point we have:
```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 3, 2, 2.5]
y = [0, 6, 1, 2, 5, 4, 2.5]

fig, ax = plt.subplots()
ax.scatter(x, y)  # one marker per (x, y) pair
plt.show()
```
Instead of lines or bars we get dots representing our data:
In the documentation for Matplotlib you can find a more interesting example of the possibilities you have with scatter plots. We can set the size and the colour of each marker to produce something like this:
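A minimal sketch of that idea, using our own points instead of the data from the documentation: the s parameter sets the marker size and the c parameter supplies a number per point that the colormap turns into a colour. The `sizes` and `values` lists are made up for this illustration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 3, 2, 2.5]
y = [0, 6, 1, 2, 5, 4, 2.5]

# Hypothetical extra values per point, made up for this sketch
sizes = [20, 60, 100, 140, 180, 220, 260]  # marker area in points squared
values = [1, 2, 3, 4, 5, 6, 7]             # mapped to colours by the colormap

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=sizes, c=values, cmap='viridis', alpha=0.7)
fig.colorbar(points, ax=ax, label='value')  # legend for the colour scale
```

This lets a single scatter plot show up to four dimensions of our data: x, y, size and colour.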
Histograms
A histogram is a helpful way to see how the data is distributed:
```python
import matplotlib.pyplot as plt

data = [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 6, 6]

fig, ax = plt.subplots()
ax.hist(data, bins=6)  # split the value range into 6 equal bins
plt.show()
```
This shows us that the numbers 3 and 5 are the most frequent in our data set:
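If we want the numbers behind the bars, we can use the return value of ax.hist(). The sketch below unpacks it into the counts per bin, the bin edges and the drawn bars; the names `counts`, `edges` and `bars` are my own.

```python
import matplotlib.pyplot as plt

data = [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 6, 6]

fig, ax = plt.subplots()
# hist() returns the count per bin, the bin edges and the drawn bars
counts, edges, bars = ax.hist(data, bins=6)

# Print the range and the number of values for each bin
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.2f} to {right:.2f}: {int(count)} values")
```

The six bins split the range from 1 to 6 into equal parts, so each of our whole numbers lands in its own bin. The counts are 2, 3, 6, 2, 5 and 2, which matches what we see in the plot.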
Box plot
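The box plot is another helpful way to summarise the distribution of data:
```python
import matplotlib.pyplot as plt

data = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6]

fig, ax = plt.subplots()
ax.boxplot(data)  # median, quartiles and whiskers for one data set
plt.show()
```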
This gives us a plot that summarises many values while it still shows us the variability of our data:
What does the plot tell us? The line in the middle of the box is the median, and the box spans from the first to the third quartile, so it holds the middle half of the values. The whiskers (the lines above and below the box) represent the maximal and minimal values in our example. By default Matplotlib limits the whiskers to 1.5 times the interquartile range (the height of the box) beyond the box and draws any points further out as individual outliers. Without outliers, the distance from the median to each edge of the box and from the box to each whisker covers about a quarter of the values:
In our example we can see that we have more small values than large ones. For more details on box plots and how to interpret them, you should read this article.
If we want a horizontal box plot, we need to set vert to False (vert is short for vertical):
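To check the numbers behind the box, we can ask Matplotlib for the statistics without drawing anything. This sketch uses the boxplot_stats() helper from matplotlib.cbook with its default settings, which as far as I can tell are the same statistics ax.boxplot() draws. The variable name `stats` is my own.

```python
from matplotlib import cbook

data = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6]

# boxplot_stats() returns one dictionary of statistics per data set
stats = cbook.boxplot_stats(data)[0]

print("lower whisker: ", stats['whislo'])
print("first quartile:", stats['q1'])
print("median:        ", stats['med'])
print("third quartile:", stats['q3'])
print("upper whisker: ", stats['whishi'])
print("outliers:      ", list(stats['fliers']))
```

For our data the box goes from 1.5 to 4 with the median at 3, and the whiskers end at 1 and 6. The median sits closer to the upper edge of the box and the lower whisker is much shorter than the upper one, which fits the observation that the small values are packed more tightly than the large ones.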
```python
import matplotlib.pyplot as plt

data = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6]

fig, ax = plt.subplots()
ax.boxplot(data, vert=False)  # draw the box plot horizontally
plt.show()
```
Next
With these different types of diagrams, we can illustrate most of the data we work with. If that is not enough, there are plenty more visualisations to choose from in the documentation. Next week we explore ways to customise how our plots get drawn.