Creating graphs with numerical and categorical data is something we have become comfortable with over the last few months. But how can we visualize a text to spot the common words and get a hint of the topic? Let us figure out how we can tackle such a challenge.
This post is part of my journey to learn Python. You can find the code for this post in my PythonFriday repository on GitHub.
Install wordcloud
The word_cloud project works on top of NumPy, Pillow, and Matplotlib and allows us to create word clouds. We can install it with this command:

    pip install wordcloud
Create a word cloud
For our first steps we turn the Zen of Python into a word cloud. We need our text and import Matplotlib and wordcloud to transform the text into a plot:

    import matplotlib.pyplot as plt
    # get_single_color_func is only needed for the customised cloud further below
    from wordcloud import WordCloud, get_single_color_func

    text = """
    The Zen of Python, by Tim Peters

    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than *right* now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let's do more of those!
    """

    # Build the word cloud with the default settings
    wordcloud = WordCloud().generate(text)

    # Show the word cloud as an image without axes
    plt.figure()
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()

This gives us a word cloud like this one:
Customise the word cloud
We can set various options for our word cloud. The most useful one in my opinion is the size, which we can set with the parameters width and height. With max_words we can limit or increase the number of words that are part of the word cloud:

    # Use one base colour for all words (a colour name or a hex value works)
    color_func1 = get_single_color_func('deepskyblue')
    # color_func2 = get_single_color_func('#00b4d2')

    wordcloud = WordCloud(width=800, height=800,
                          background_color="white",
                          max_words=100,
                          color_func=color_func1).generate(text)

    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()

This gives us a word cloud that shows at most the 100 most frequent words in a blueish colour on a white background:
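To get a feeling for what drives the size of the words, we can count them ourselves. The following is only a minimal sketch with the standard library. It is not how wordcloud processes the text internally (the library also removes stop words, for example), and the names `top_words` and `sample` are made up for this illustration:

```python
import re
from collections import Counter


def top_words(text, limit=5):
    """Return the `limit` most frequent words of `text` in lower case."""
    # Simplification: every run of letters and apostrophes counts as a word.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(limit)


sample = """
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
"""

# Print the three most frequent words with their counts
for word, count in top_words(sample, limit=3):
    print(f"{word}: {count}")
```

The more often a word shows up in such a count, the bigger it tends to appear in the word cloud.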
Use a shape for the word cloud
We can use an image with a high contrast and turn that into a shape for our word cloud. For this post we use this triangle:
The heavy lifting for the mask is done by NumPy: we load the image into an array and pass that array as the mask parameter to our word cloud:

    import os
    import numpy as np
    from PIL import Image

    # Load the image and turn it into a NumPy array
    logo_path = os.path.abspath(os.path.join(os.getcwd(), 'images', 'Triangle.png'))
    mask = np.array(Image.open(logo_path))

    # White areas of the mask stay empty; with a mask the size comes
    # from the image and width and height are ignored
    wordcloud = WordCloud(width=800, height=800,
                          background_color="white",
                          max_words=100,
                          mask=mask).generate(text)

    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()

This gives us a word cloud in the shape of a triangle:
The created graphic might not be that spectacular, but we could find more elaborate shapes and better fitting texts to create small works of art, like the parrot on the GitHub page.
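If no suitable image is at hand, a simple shape can also be built directly with NumPy. This is a sketch under the assumption that a plain triangle is good enough; the helper `triangle_mask` and its size are made up for this example and are not part of wordcloud:

```python
import numpy as np


def triangle_mask(size=400):
    """Build a square mask with a drawable triangle on a white background."""
    # Row and column indices as arrays that broadcast against each other
    rows, cols = np.ogrid[:size, :size]
    # A pixel is inside the triangle when its distance from the vertical
    # centre line is at most half of its row index (tip at the top)
    inside = np.abs(cols - size // 2) <= rows // 2
    # wordcloud leaves pure white (255) empty and draws on everything else,
    # so the triangle gets 0 and the surrounding area 255
    return np.where(inside, 0, 255).astype(np.uint8)


mask = triangle_mask(400)
print(mask.shape, mask.dtype)
```

The resulting array can be passed as `mask=mask` to `WordCloud` in the same way as the array we loaded from Triangle.png.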
Works on the command line
If you just want a word cloud without writing any code, you can use wordcloud_cli.exe and customize your image with its parameters:

    wordcloud_cli.exe --text 10surprises.txt --imagefile 10surprises.png --width 800 --max_words 50
This takes the text of my post 10 Unpleasant Surprises When Migrating From .Net 4.8 to .Net 6 and turns it into this image:
Next
With this foray into text visualisation, we have found an interesting approach to capture the essence of a written text. If this post caught your interest, I highly recommend exploring the Gallery of Examples in the documentation.
Next week we continue with a more traditional approach to data visualization and add an interactive touch to our plots.