Top 200 @Mentions and #Hashtags used in tweets from 2017 Altmetric.com dump as Word Cloud using Python

As a follow-up to the Twitter profile description word cloud, I’ve created a hashtag word cloud from the 19.2 million hashtags used in the tweets collected by Altmetric.com.

Top 200 Hashtags used in tweets collected by Altmetric.com
Top 200 @Mentions used in tweets collected by Altmetric.com

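The Python code is very similar to the profile description word cloud code. The one change is that the ‘collocations’ option of the WordCloud constructor has to be turned off. With it on (the default), WordCloud also counts two-word phrases (bigrams), which makes no sense for a list of independent hashtags.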
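To see why two-word phrases are a problem here, the rough illustration below counts neighbouring tokens in a comma-joined list of tags. It is not WordCloud's internal algorithm, only a sketch of the idea, and the tag values are made up for the example. Any "phrase" found this way reflects nothing more than the order of rows in the file.

```python
from collections import Counter


def adjacent_pairs(tokens):
    """Count each pair of neighbouring tokens, in order of appearance."""
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical hashtags, in the order they might appear in the CSV file.
tags = ["science", "openaccess", "science", "openaccess", "health"]
text = ",".join(tags)

tokens = text.split(",")
unigrams = Counter(tokens)       # what we actually want to plot
pairs = adjacent_pairs(tokens)   # accidental "phrases" from row order

print(unigrams.most_common(2))
print(pairs.most_common(2))
```

Each hashtag is an independent token, so only the single-word counts carry meaning.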

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Feb 10 16:19:56 2019

@author: tdbowman
"""
import io
import csv
import numpy as np
from wordcloud import WordCloud
from os import path
from PIL import Image


# current directory
currdir = path.dirname(__file__)

# from https://github.com/nikhilkumarsingh/wordcloud-example/blob/master/mywc.py
def create_wordcloud(text):

    # use cloud.png as mask for word cloud
    mask = np.array(Image.open(path.join(currdir, "cloud.png")))
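    # create wordcloud object; collocations=False keeps WordCloud from
    # counting two-word phrases (bigrams) as words.
    # Note: width and height are ignored when a mask is supplied, because
    # the output image takes the shape of the mask.
    wc = WordCloud(collocations=False,
                   background_color="white",
                   max_words=200,
                   mask=mask,
                   width=1334,
                   height=945)
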
    # generate wordcloud
    wc.generate(text)
    # save wordcloud
    wc.to_file(path.join(currdir, "wc_hashtags.png"))
    
if __name__ == "__main__":

    # Read the first column of the CSV and join the values into one string
    with io.open('hashtags.csv', 'r', encoding='utf-8') as f:
        # strip NUL bytes, which the csv module cannot parse
        reader = csv.reader(x.replace('\0', '') for x in f)
        # skip empty rows, which would otherwise raise an IndexError
        text = ','.join(row[0] for row in reader if row)

    # generate wordcloud
    create_wordcloud(text)
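
An alternative to joining everything into one big string is to count the hashtags first and hand the counts to WordCloud directly. The sketch below is not the code used for the images above. It assumes one hashtag per row in the first CSV column and lowercases the tags (Twitter hashtags are case-insensitive); the helper names `count_first_column` and `cloud_from_counts` are made up for this example. `generate_from_frequencies` is the WordCloud method that accepts a word-to-count mapping, so no tokenizing or collocation handling is involved.

```python
import csv
from collections import Counter


def count_first_column(lines):
    """Count the values in the first CSV column, skipping empty rows."""
    reader = csv.reader(line.replace("\0", "") for line in lines)
    return Counter(
        row[0].strip().lower()
        for row in reader
        if row and row[0].strip()
    )


def cloud_from_counts(counts, out_path, max_words=200):
    """Draw a word cloud from precomputed counts (needs wordcloud installed)."""
    # Imported here so the counting step works without the wordcloud package.
    from wordcloud import WordCloud

    wc = WordCloud(background_color="white", max_words=max_words)
    wc.generate_from_frequencies(dict(counts))
    wc.to_file(out_path)


# Small made-up sample standing in for the lines of hashtags.csv.
sample = ["OpenAccess\n", "science\n", "\n", "Science\n", "openaccess\n", "science\n"]
counts = count_first_column(sample)
print(counts.most_common(2))
```

With the real file, something like `cloud_from_counts(count_first_column(f), "wc_hashtags.png")` inside the `with` block would do the same job while keeping memory use down to one entry per distinct hashtag.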

Twitter Profile Descriptions Word Cloud from 2017 Altmetric.com Dump using Python 3

Just a quick post to show a word cloud I created as a demo for my Adv Programming students using the Python WordCloud module. I was surprised to find that “Founder” was the most frequent word in the profile descriptions. What does this tell us?

I first had to write a PHP script to grab, from the Twitter Search API, all the tweet objects that were captured by Altmetric.com. I then stored the unique author profiles in a separate table from the tweets, which gave me approximately 3.4 million unique Twitter users who tweeted about science as captured by Altmetric.com. Of these 3.4 million users, approximately 2.8 million had at least some text in their Twitter profile description field.

To pull the descriptions out of my MySQL table of author profiles, I used the following query. It replaces carriage returns, line feeds, and tabs with spaces and skips profiles whose description is empty.

SELECT REPLACE(REPLACE(REPLACE(TRIM(`description`), '\r', ' '), '\n', ' '), '\t', ' ') 
FROM `profiles` 
WHERE TRIM(`description`)!='';
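
If the descriptions are already in Python rather than MySQL, the same cleanup can be sketched as below. This mirrors the query under the assumption that each description arrives as a string or `None`; the function names are made up for the example. As in the SQL, only spaces are trimmed from the ends, and each carriage return, line feed, or tab becomes a single space.

```python
# Map each character the query replaces onto a plain space.
_WHITESPACE_TO_SPACE = str.maketrans({"\r": " ", "\n": " ", "\t": " "})


def clean_description(description):
    """Trim surrounding spaces, then replace CR, LF and tab with spaces."""
    return description.strip(" ").translate(_WHITESPACE_TO_SPACE)


def non_empty_descriptions(descriptions):
    """Yield cleaned descriptions, skipping NULL (None) and blank values."""
    for description in descriptions:
        if description is not None and description.strip(" ") != "":
            yield clean_description(description)


# Made-up rows standing in for the `description` column.
rows = ["  ", None, "Founder\nand CEO", "", "  PhD student\t"]
print(list(non_empty_descriptions(rows)))
```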

Next is the Python 3 code I used to create the word cloud from the 2.8 million user descriptions. Note that I added a few extra terms to the STOPWORDS list: I ran the script several times and each time found more noise terms that I wanted to keep out of the final version.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Feb 10 16:19:56 2019
@author: tdbowman
"""
import io
import csv
import numpy as np
from wordcloud import WordCloud, STOPWORDS
from os import path
from PIL import Image


# current directory
currdir = path.dirname(__file__)

# from https://github.com/nikhilkumarsingh/wordcloud-example/blob/master/mywc.py
def create_wordcloud(text):

    # use cloud.png as mask for word cloud
    mask = np.array(Image.open(path.join(currdir, "cloud.png")))
    # default stopwords plus extra noise terms found over several runs
    stop_words = set(STOPWORDS) | {
        "https", "co", "RT", "del", "http",
        "tweet", "tweets", "twitter", "en", "el", "us", "et",
        "lo", "will", "ex", "de", "la", "rts"}

    # create wordcloud object
    wc = WordCloud(background_color="white",
                   max_words=200,
                   mask=mask,
                   stopwords=stop_words)

    # generate wordcloud
    wc.generate(text)
    # save wordcloud
    wc.to_file(path.join(currdir, "wc.png"))
    
if __name__ == "__main__":

    # Read the first column of the CSV and join the values into one string
    with io.open('all_descriptions.csv', 'r', encoding='utf-8') as f:
        # strip NUL bytes, which the csv module cannot parse
        reader = csv.reader(x.replace('\0', '') for x in f)
        # skip empty rows, which would otherwise raise an IndexError
        text = ','.join(row[0] for row in reader if row)

    # generate wordcloud
    create_wordcloud(text)
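
Rerunning the whole script to spot unwanted terms is slow with 2.8 million descriptions. A quicker way to find stopword candidates, sketched here as a suggestion and not as what I actually did, is to print the most frequent words before drawing anything. The function name `top_words` and the simple regular expression are choices made for this example.

```python
import re
from collections import Counter


def top_words(descriptions, n=20):
    """Return the n most frequent lowercase words of two or more characters."""
    counts = Counter()
    for description in descriptions:
        # \w[\w']+ matches a word character followed by at least one more
        # word character or apostrophe, so single letters are skipped.
        counts.update(re.findall(r"\w[\w']+", description.lower()))
    return counts.most_common(n)


# Made-up descriptions standing in for rows of all_descriptions.csv.
sample = ["Founder of https://t.co/abc", "RT founder and CEO"]
print(top_words(sample, n=5))
```

Terms such as "https" and "co" show up in this list right away, which makes it easy to decide what belongs in the stopword set.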

The code could use some cleanup and the image could be higher resolution, but it’s a good example for the students of how to use Python to create a word cloud.