To create a word cloud in Python, we will use a library called wordcloud.

It can be installed with pip:

pip install wordcloud

A word cloud is a visual representation of text data in which words with a higher weight are drawn larger. It is also known as:

  • tag cloud
  • wordle
  • weighted list

Step 1: Create word cloud from text

For the first example, we will create a word cloud from a string.

If explicit word weights don't matter, we can use the method wc.generate(text), which counts the words in the text itself:

from wordcloud import WordCloud
import matplotlib.pyplot as plt

wc = WordCloud(background_color="white", random_state=None)

text = " Thank You Gracias Merci Danke obrigado спасибо Xie xie благодаря dhanyavaad"

wc.generate(text)
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()

[Image: word cloud generated from a text string]

Step 2: Create word cloud by frequency

If the word weights matter, we can use the method wc.generate_from_frequencies(freq).

This method requires a dictionary that maps each word to its frequency:

freq = {
    "Thank You": 100,
    "Merci": 99,
    "Danke": 98,
}

Below is a full example of generating a word cloud from word frequencies:

from wordcloud import WordCloud
import matplotlib.pyplot as plt

wc = WordCloud(background_color="white")
freq = {
    "Thank You": 100,
    "Merci": 99,
    "Danke": 98,
    "obrigado": 97,
    "спасибо": 96,
    "Xie xie": 95,
    "Gracias": 94,
    "благодаря": 93,
    "dhanyavaad": 92,
    "dziękuję": 91,
    "Thank you": 90,
    "Dank u": 89
}

wc.generate_from_frequencies(freq)

plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()

The result is:

[Image: word cloud generated from word frequencies]
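
When the frequencies are not known in advance, they can be counted from raw text before being passed to wc.generate_from_frequencies(freq). The following is only a minimal sketch using the standard library: the helper name count_words and the sample string are hypothetical, and unlike wc.generate(text) this sketch does not remove stop words.

```python
import re
from collections import Counter


def count_words(text):
    """Return a {word: count} dictionary for the words found in text."""
    # \w+ matches runs of letters and digits, including non-ASCII letters
    words = re.findall(r"\w+", text.lower())
    return dict(Counter(words))


# Hypothetical sample text, used only for illustration
sample = "Merci Danke Merci Gracias merci"
freq = count_words(sample)
print(freq)
```

The resulting dictionary has the same shape as the freq dictionary in the example above, so it could be passed directly to wc.generate_from_frequencies(freq).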

Step 3: Create word cloud from shape

Finally, let's create a word cloud masked with a shape.

We are going to generate a word cloud from ways to say "Thank you!" in 100 different languages. The code is below:

from wordcloud import WordCloud
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np

# Load the shape image as a NumPy array to use it as a mask
mask = np.array(Image.open('rect821.png'))

wc = WordCloud(
    background_color="white",
    max_words=2000,
    mask=mask,
    contour_width=3,
    contour_color='steelblue',
)
freq = {
    "Thank You": 100,
    "Merci": 99,
    "Danke": 98,
    "obrigado": 97,
    "спасибо": 96,
    "благодаря": 95,
    "o ṣeun": 1,
    "Ngiyabonga": 0
}

wc.generate_from_frequencies(freq)
plt.figure(figsize=(20, 10))
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()

Below you can find the result of the code above:

A few things to keep in mind when working with a mask:

  • The image used for a mask should have a white background. Pure white areas are masked out, so any other background color will break the shape.
  • Free shapes can be taken from Pixabay; just edit them and add a white background.
  • When a mask is given, the output image takes the size of the mask, so the width and height passed to WordCloud() are ignored.
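
A mask does not have to come from an image file. As a rough sketch, assuming the convention that pure white (255) pixels are excluded from drawing, a circular mask can be built directly with NumPy. The size, center and radius values here are arbitrary choices for illustration.

```python
import numpy as np

size = 300          # width and height of the mask in pixels (arbitrary)
center = size // 2  # circle center on both axes
radius = 130        # circle radius in pixels (arbitrary)

# Open coordinate grids that broadcast to a size x size array
y, x = np.ogrid[:size, :size]

# True for pixels outside the circle, False for pixels inside it
outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2

# 255 (white) marks masked-out pixels, 0 marks the area available for words
mask = (255 * outside).astype(np.uint8)
```

Such an array could then be passed as WordCloud(mask=mask) in place of the array loaded from rect821.png.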