Twitter Profile Descriptions Word Cloud from 2017 Dump using Python 3

Just a quick post to share a word cloud I created as a demo for my Advanced Programming students using the Python WordCloud module. I was surprised to find that “Founder” was the most frequent word in the profile descriptions. What does this tell us?

I first wrote a PHP script to retrieve, via the Twitter Search API, all of the tweet objects captured in the 2017 dump. I then stored the unique author profiles in a separate table from the tweets, which gave me approximately 3.4 million unique Twitter users who tweeted about science. Of these 3.4 million, approximately 2.8 million had a non-empty profile description.
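
The deduplication step can be sketched in Python. This is a minimal sketch, not the original PHP script. It assumes each tweet is a dictionary shaped like a Twitter API v1.1 tweet object, with the author stored under user and carrying id_str and description fields. The helper names unique_profiles and with_description are hypothetical.

```python
def unique_profiles(tweets):
    """Keep one profile per author, keyed on the user's id_str.

    Later tweets overwrite earlier ones, so each author keeps the
    last version of their profile that was seen.
    """
    profiles = {}
    for tweet in tweets:
        user = tweet["user"]
        profiles[user["id_str"]] = user
    return profiles


def with_description(profiles):
    """Return only the profiles whose description has non-whitespace text."""
    return [p for p in profiles.values() if (p.get("description") or "").strip()]


if __name__ == "__main__":
    sample = [
        {"user": {"id_str": "1", "description": "Founder of a startup"}},
        {"user": {"id_str": "2", "description": "   "}},
        {"user": {"id_str": "1", "description": "Founder, scientist"}},
        {"user": {"id_str": "3", "description": None}},
    ]
    profiles = unique_profiles(sample)
    print(len(profiles))                    # 3 unique authors
    print(len(with_description(profiles)))  # 1 with a non-empty description
```

Keying a dictionary on the author ID plays the same role as a unique key on the profiles table: repeated authors collapse to a single row.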

To export the descriptions from my MySQL table of author profiles, I used the following query, which replaces carriage returns, newlines, and tabs with spaces and skips empty descriptions.

SELECT REPLACE(REPLACE(REPLACE(TRIM(`description`), '\r', ' '), '\n', ' '), '\t', ' ') 
FROM `profiles` 
WHERE TRIM(`description`)!='';
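
The same cleanup can be sketched in Python, for example to check what the query will produce for a few sample rows. This is a minimal sketch that mirrors the query step by step. The function name clean_description is hypothetical. One detail worth knowing: when called with no arguments, MySQL's TRIM() removes only leading and trailing spaces, not tabs or line breaks, so the sketch uses strip(" ") to match.

```python
def clean_description(description):
    """Python mirror of the SQL query (hypothetical helper).

    Returns the cleaned description, or None for rows that the
    WHERE TRIM(`description`) != '' clause would exclude.
    """
    if description is None:  # NULL never satisfies the WHERE clause
        return None
    # MySQL TRIM() with no arguments strips only spaces, so mirror that exactly
    trimmed = description.strip(" ")
    if trimmed == "":
        return None
    # Same replacements as the nested REPLACE() calls: \r, then \n, then \t
    for char in ("\r", "\n", "\t"):
        trimmed = trimmed.replace(char, " ")
    return trimmed


if __name__ == "__main__":
    print(repr(clean_description("  PhD student\tin\r\nbiology ")))  # 'PhD student in  biology'
    print(clean_description("   "))                                  # None
    print(repr(clean_description("\n")))                             # ' '
```

The last call shows a side effect of that TRIM() behavior: a description consisting only of a line break passes the WHERE clause and is exported as a single space. Trimming again after the replacements would filter such rows out.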

Next is the actual Python 3 code I used to create the word cloud from the 2.8 million user descriptions. You’ll notice I added a few extra terms to the STOPWORDS list. I ran the script multiple times, found these terms, and wanted them removed from the final version.
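
One way to spot such terms between runs is to count word frequencies yourself and inspect the top of the list. The sketch below approximates the WordCloud module's counting rather than reproducing it. It uses a simple regular expression for words of two or more characters, lowercases everything, and skips stopwords and bare numbers. The function top_terms and its parameters are hypothetical. In practice, the stopwords argument could be the same list used in the script.

```python
import re
from collections import Counter

# Simple word pattern (an approximation, not WordCloud's exact tokenizer):
# a word character followed by one or more word characters or apostrophes.
WORD_RE = re.compile(r"\w[\w']+")


def top_terms(text, stopwords, n=20):
    """Return the n most frequent lowercase words in text, ignoring stopwords."""
    stop = {w.lower() for w in stopwords}
    words = (w.lower() for w in WORD_RE.findall(text))
    counts = Counter(w for w in words if w not in stop and not w.isdigit())
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "Founder & CEO. RT lover, founder of things; tweets about science 2017"
    print(top_terms(sample, stopwords={"rt", "tweets", "of", "about"}, n=3))
    # [('founder', 2), ('ceo', 1), ('lover', 1)]
```

Any noise word that keeps appearing near the top of this list is a candidate for the extra stopwords.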

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Feb 10 16:19:56 2019
@author: tdbowman
"""
import csv
from os import path

import numpy as np
from PIL import Image
from wordcloud import WordCloud, STOPWORDS

# directory containing this script (cloud.png and all_descriptions.csv are read from here)
currdir = path.dirname(path.abspath(__file__))

# from
def create_wordcloud(text):

    # use cloud.png as mask for word cloud
    mask = np.array(Image.open(path.join(currdir, "cloud.png")))
    # create set of stopwords: extra noise terms plus the module's defaults
    stop_words = set(["https", "co", "RT", "del", "http",
                      "tweet", "tweets", "twitter", "en", "el", "us", "et",
                      "lo", "will", "ex", "de", "la", "rts"]) | set(STOPWORDS)
    # create wordcloud object
    wc = WordCloud(background_color="white", mask=mask, stopwords=stop_words)
    # generate wordcloud
    wc.generate(text)
    # save wordcloud
    wc.to_file(path.join(currdir, "wc.png"))

if __name__ == "__main__":

    # Grab text from file and join all descriptions into one string
    with open(path.join(currdir, "all_descriptions.csv"), "r",
              encoding="utf-8", newline="") as f:
        # strip NUL bytes, which the csv module cannot parse
        reader = csv.reader(line.replace("\0", "") for line in f)
        text = " ".join(row[0] for row in reader if row)
    # generate wordcloud
    create_wordcloud(text)

It could use some cleanup, and the image could be higher resolution, but it’s a good example for the students of how to use Python to create a word cloud.

Lessons from 2:AM

Last week I was lucky enough to attend the 2:AM Conference in Amsterdam. The conference focused on altmetrics, a type of metric typically calculated from scholarly communication events captured in online contexts (e.g., events in Twitter, Mendeley, Wikipedia, etc.). For some time I have been critical of the term “altmetrics” because I had taken it to mean “alternative to citations,” but after this conference I am less confident in that position. Altmetrics is an umbrella term that those of us who research these things use to describe our work. It is also a buzzword that others use to talk about scholarly communication in online contexts, a term the media has picked up, and a term that organizations, libraries, universities, and companies currently use to promote scientific work. It has even come to represent the potential for measuring impact outside of the academic machine (that is, impact other than scientific impact). While it has been criticised many times for being the wrong term, I am not sure there is a more appropriate one… and that is fine. Suggested alternatives include social media metrics (Haustein, Larivière, Thelwall, Amyot, & Peters, 2014), complimetrics (complementary metrics) (Adie, 2014), influmetrics (influence metrics) (Cronin & Weaver, 1995; Rousseau & Ye, 2013), and, more traditionally, webometrics (Almind & Ingwersen, 1997), to name just a few. None of these seems any better, and none seems to possess that certain something that “alt”metrics possesses.

I dabble in linguistics, and I believe that words are vital to our ability to understand and discuss the same phenomenon (especially in science), which is why I was so adamant that “altmetrics” was the wrong term. But then I took a closer look at the altmetrics manifesto (whose 5th anniversary was celebrated at the conference) and reevaluated my position in light of my accumulated knowledge of the field and what I learned at this conference. I came to the realization that altmetrics is fine when you think of it as an “alternative means of measuring scholarly communication.”

The conference venue was great: we were housed at the Amsterdam Science Park, a sprawling complex on the eastern side of Amsterdam. There were quite a few attendees, and the presentations and workshop were informative and thought-provoking. Many of the primary data providers, publishing companies, metrics providers, and others in this field sent representatives, including Jason Priem, Euan Adie, William Gunn, Greg Gordon, Martin Fenner, and Geoff Bilder. In addition, the four authors of the altmetrics manifesto (Jason Priem, Dario Taraborelli, Paul Groth, and Cameron Neylon) attended to celebrate its 5th anniversary. I was able to speak with both Jason and Cameron; they were engaging, down-to-earth people who are great scholars and are excited by the future of scholarly communication (I wasn’t able to speak with Paul or Dario at such length).

What I gleaned from 2:AM was that an ongoing discussion is taking place, from multiple perspectives, about the ability of altmetrics to measure impact, the types of impact scholarly communication might have, and the importance of trust when considering the reasons behind altmetric events. In addition, I am looking forward to being part of a group (formed at the “theories” breakout session of the conference) that will write a white paper describing and defining common terms used in altmetric research, so that others outside our community can understand and contribute to the ongoing work in the field. I also learned that many in the field had read our book chapter (arXiv:1502.05701) on applying citation and social theories to the understanding of altmetric events; they were very supportive of this first attempt at developing a framework for understanding altmetric events. Yet we all know that much more work needs to be done, and hopefully this white paper will be a nice step in that direction.

What I also learned from listening to Jason Priem, Dario Taraborelli, Paul Groth, and Cameron Neylon was that our group has somewhat overlooked an important component of the manifesto, which describes altmetrics as a new type of “filter” for scholarly research and communication:

No one can read everything. We rely on filters to make sense of the scholarly literature, but the narrow, traditional filters are being swamped. However, the growth of new, online scholarly tools allows us to make new filters; these altmetrics reflect the broad, rapid impact of scholarship in this burgeoning ecosystem. We call for more tools and research based on altmetrics. (Priem, Taraborelli, Groth, & Neylon, 2010, para. 1)

This is an important aspect that I, too, simply took for granted, and something I need to reflect on as my understanding of these phenomena continues to grow and change.



Priem, J., Taraborelli, D., Groth, P., & Neylon, C. (2010, October 26). Altmetrics: A manifesto. Retrieved from




The current state of scholarly communication is in flux as various avenues for consuming and disseminating ideas, discussions, and research continue to be developed and adopted by scholars. These contexts offer different affordances (Gibson, 1977), or possibilities for action provided by the platform, and different types of networks that a scholar can view and interact with. Affordances found in these environments typically include the ability to share information in a particular way (e.g., tweets, Facebook posts, blog posts, comments, links or media, hashtags), consume information, create a profile (public, private, or mixed), and connect with other users of the platform. Various types of networks are represented across these platforms, including blogs (externally facing networks), Facebook and Twitter (social networks), and Wikipedia (interconnected networks). A scholar can present herself on these online platforms along a continuum ranging from personal to professional.

Problems can arise from interacting in these online contexts because the information is disseminated to a vast, unknown audience; it is archivable and searchable; and it can be copied and removed from the context in which it was originally published (boyd, 2006). This can damage a scholar’s reputation and lead to shame, punishment, or dismissal, as recent examples show. In one example, a scholar who had been offered a tenure-track position at the University of Illinois, Urbana-Champaign, had the offer rescinded after the university board deemed several of the individual’s tweets anti-Semitic (Jaschik, 2014). In another example, a professor from the University of New Mexico was put on probation and given counselling after tweeting an offensive remark about Ph.D. applicants (Ingeno, 2013). Other examples of these types of infractions have come from Facebook and from blogging.

Before the rise of these massive online networks, scholars already found it difficult to manage the boundaries between their personal and professional lives. Online contexts in which a person can interact with vast audiences exacerbate the situation, because scholars are often already maintaining a tenuous balance between their personal and professional identities through the time they spend mentoring and teaching students in and out of the classroom. The boundaries between personal and professional are changing: interactions outside the classroom that were once considered personal have now been thrust into the spotlight, partly because of the new networks in which scholars interact. The relationship between these changing boundaries of self-presentation, on the one hand, and the size of the network and the proximity of its nodes, on the other, has not been adequately discussed.

Goffman (1959) described self-presentation and impression management as acting out a particular role for an audience and maintaining that role over time. These acts rely on various aspects, including social norms, rules, and context, to be effective. One could read Goffman as having considered the network and its significance to people in their day-to-day lives, since he later noted that “[w]hen seen up close, the individual, bringing together in various ways all the connections that he has in life, becomes a blur” (Goffman, 1961, p. 127). He knew that boundary maintenance was a crucial component of self-presentation and impression management, as he divided the act of self-presentation into three regions: the front stage, the backstage, and the outside region. What he did not directly address was the actual size of the network and the influence it would have on the boundaries between these regions.

A graphical representation of Goffman’s Self-Presentation framework

Related to this, Mehra, Kilduff, and Brass (2001, p. 131) noted that a large network “can enable the individual to access numerous others for information and other resources,” but warned that “[p]eople who interact with numerous others in organizations run the risk of running short of time and other resources.” In addition to the time and resources needed to maintain large networks, scholars risk further blurring the boundaries between their personal and professional selves. I want to further investigate the relationship between networks, self-presentation and impression management, and the blurring of the personal and the professional.



boyd, d. (2006). Friends, Friendsters, and MySpace Top 8: Writing community into being on social network sites. First Monday, 11(12), 1–15. Retrieved from

Gibson, J. J. (1977). The Theory of Affordances. In R. Shaw & J. Bransford (Eds.), Perceiving, Acting, and Knowing: Toward an Ecological Psychology (pp. 127–143). Hillsdale, NJ: Lawrence Erlbaum.

Goffman, E. (1959). The Presentation of Self in Everyday Life. New York: Anchor.

Goffman, E. (1961). Encounters: Two studies in the sociology of interaction. Indianapolis: The Bobbs-Merrill Company, Inc.

Ingeno, L. (2013, June 14). Outrage over professor’s Twitter post on obese students. Inside Higher Ed. Retrieved from

Jaschik, S. (2014, August). Out of a job. Inside Higher Ed. Retrieved from

Mehra, A., Kilduff, M., & Brass, D. J. (2001). The social networks of high and low self-monitors: Implications for workplace performance. Administrative Science Quarterly, 46(1), 121–146.

The ecosystem of science

I’ve been thinking a lot about what Science really means to me and what the philosophers of science have said about the system of science. I love Newton’s famous notion about “standing on the shoulders of giants,” but I don’t necessarily see it in that way… especially in my line of research investigating altmetrics and scholarly communication.

It’s a blustery evening in Finland, and I am watching the trees bend and shed leaves in the strong breeze while thinking about this. It seems to me that the system of science resembles an ecosystem in which we try to make our lives meaningful and to shed light on our surroundings. We do, of course, use the work of others to view things through their eyes, but I don’t see myself standing on their shoulders and reaching for the stars. Instead, I see myself as a small sapling, struggling for nourishment in a vast forest. At the same time, I view those before me, especially those marvelous minds from whom I borrow, as large trees that shade me from the sun and break the harsh winds blowing over me. I see the trees of Goffman and Gibson, of Heidegger and Kant, and on and on, in my part of the forest. These solid, long-standing trees protect me and nourish me, allowing me to grow and to become a tree myself.

As scholarly communication and science have changed, so too has the ecosystem. We are no longer simply aspiring to be the trees that provide the root system of science; we are also trying to spread and have an impact outside our forests. I feel like we are now flowering trees, making pollen that can be carried away to the farthest fields in the hope of having an impact on our surroundings. We have evolved to use the technologies that have become part of our world to attract the attention of others, so that they can carry our pollen away. A large part of this new technology and ecosystem is the internet, specifically social media and other online sources of information. Social media users are the bees we need to spread our pollen, our information, beyond our isolated forests. What the bees are doing with this information, we don’t yet know. But what we do know is that they can spread it faster and farther than ever before.

Through my work I hope we can figure out where our information is being spread and what kinds of impact we are having on society.

It. Is. Done.

I have finally finished my Ph.D. Yay. I graduated from the School of Informatics and Computing, Indiana University, Bloomington, at the end of July 2015.

After seven years of contemplating social structures, norms, behaviors, communication, and the ways in which people use the affordances of social media, I successfully defended my thesis in front of four of my peers and a handful of students in May 2015. I then made the required minor revisions and formatting changes and submitted the final version of the document to the graduate school at the beginning of July 2015.

It has been a long, rewarding journey and I am happy that I completed it. I have been able to travel around the world, move to two countries, and meet some extraordinary scholars, travelers, and neighbors. It’s been quite an adventure, one which I hope continues as I progress in my career as an academic. Thank you to everyone for the support and love throughout this process.

I’m now in Finland working with great scholars and looking to improve my abilities as a scholar, researcher, teacher, and coworker.


Scholarly Communication, ‘Altmetrics’, and Social Theory

In a recent book chapter (currently under review), my colleagues and I discuss applying citation theories and social theories to popular media and social media metrics (so-called altmetrics) collected by Plum Analytics and similar sites. These metrics are being used by organizations such as libraries, publishers, universities, and others to measure scholarly impact. It is an interesting area of research because it helps us understand how scholarly work is being consumed and disseminated in social media (and thus, presumably, to an audience outside of the academy).

I come to this research having dabbled in many different areas of study, beginning with neuropsychology (as an undergraduate); human-computer interaction, information architecture, and web design (as a master’s student); and finally social informatics (at the beginning of my Ph.D.), digital humanities (in the middle of my Ph.D.), and scholarly communication and sociology (thesis work). I believe this indirect path allows me to consider research questions from different perspectives and to apply various theoretical and methodological lenses to the same problem (as is the case for many Information Science graduates). It has also allowed me to contribute to the data collection side of this work, as I’ve written several programs that have assisted in collecting and storing huge amounts of data (hundreds of millions of tweets, publication records, etc.) on scholarly (and other) activities. These experiences have allowed me to contribute to the book chapter mentioned above and to several articles and presentations, and they continue to allow me to contribute to our understanding of scholarly communication in social and popular media venues.

I’m looking forward to finalizing my thesis and to continuing to examine these social and scholarly communication issues in my current research position at UdeM and in a permanent faculty position with future colleagues.