Here we compare word clouds based on Twitter profile descriptions from the 2016 Altmetric.com data dump with those from the 2017 Altmetric.com data dump.
A follow up to the Twitter profile description word cloud… I’ve created a hashtag word cloud from the 19.2 million hashtags used in the tweets collected by Altmetric.com
The Python code is VERY similar to the profile description word cloud code, however we have to turn off the ‘collocations’ option in the WordCloud module options to make it work as we expect.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Feb 10 16:19:56 2019
@author: tdbowman
"""
import io
import csv
import numpy as np
from wordcloud import WordCloud, STOPWORDS
from os import path
from PIL import Image

# current directory
currdir = path.dirname(__file__)

# from https://github.com/nikhilkumarsingh/wordcloud-example/blob/master/mywc.py
def create_wordcloud(text):
    # use cloud.png as mask for word cloud
    mask = np.array(Image.open(path.join(currdir, "cloud.png")))
    # create set of stopwords
    #stop_words = list(STOPWORDS)
    # create wordcloud object (collocations off so hashtags are counted one at a time)
    wc = WordCloud(collocations=False, background_color="white", max_words=200, mask=mask, width=1334, height=945)
    # generate wordcloud
    wc.generate(text)
    # save wordcloud
    wc.to_file(path.join(currdir, "wc_hashtags.png"))

if __name__ == "__main__":
    # Grab text from file and join every field into one string
    your_list = []
    with io.open('hashtags.csv', 'r', encoding='utf-8') as f:
        # strip NUL bytes, which csv.reader cannot handle
        reader = csv.reader(x.replace('\0', '') for x in f)
        # csv.reader yields rows (lists), so flatten the fields before joining
        your_list = ','.join(field for row in reader for field in row)
    # generate wordcloud
    create_wordcloud(your_list)
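To get a feel for why the collocations setting matters for hashtags, here is a minimal sketch using only the standard library. It does not reproduce what the WordCloud module does internally; it simply contrasts counting single tokens with counting adjacent pairs (bigrams), which is roughly what leaving collocations on adds to the mix. The sample hashtags and the function names are made up for illustration.

```python
from collections import Counter


def unigram_counts(tokens):
    """Count each hashtag on its own."""
    return Counter(tokens)


def bigram_counts(tokens):
    """Count adjacent pairs of hashtags, in the order they appear."""
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical sample, not taken from the Altmetric.com data
hashtags = "science openaccess science altmetrics openaccess science".split()

singles = unigram_counts(hashtags)
pairs = bigram_counts(hashtags)

print(singles.most_common(3))
print(pairs.most_common(3))
```

With a stream of hashtags, the pairs are mostly accidents of ordering rather than meaningful phrases, which is why counting each hashtag individually gives the result I expected.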
Just a quick post to display a word cloud I created as a demo for my Adv Programming students using the Python WordCloud module. I was surprised to find that “Founder” was the top occurring word in the profile descriptions. What does this tell us?
I first had to write a PHP script to grab all the tweet objects from Twitter Search API that were captured by Altmetric.com. I then stored all the unique author profiles in a separate table from the tweets, which gave me approximately 3.4 million unique Twitter users who tweeted about science as captured by Altmetric.com. Of these 3.4 million, there were approximately 2.8 million users who had some characters in their Twitter profile description field.
To return results from the description found in my MySQL table of author profiles, I used the following query because I wanted to remove hard returns and tabs from the profile descriptions.
SELECT REPLACE(REPLACE(REPLACE(TRIM(`description`), '\r', ' '), '\n', ' '), '\t', ' ') FROM `profiles` WHERE TRIM(`description`)!='';
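If you would rather do this cleanup in Python than in SQL, the same idea can be sketched as follows. This is only an approximation of the query above: the function names are hypothetical, and Python's `strip()` removes all leading and trailing whitespace, whereas MySQL's `TRIM()` removes only spaces by default, so the two can differ on edge cases.

```python
def clean_description(description):
    """Trim the text, then replace carriage returns, newlines, and tabs with spaces."""
    cleaned = description.strip()
    for ch in ("\r", "\n", "\t"):
        cleaned = cleaned.replace(ch, " ")
    return cleaned


def non_empty_descriptions(descriptions):
    """Keep only descriptions that contain something besides whitespace."""
    return [clean_description(d) for d in descriptions if d and d.strip()]


# Hypothetical sample rows, including empty and missing descriptions
rows = ["  Founder & CEO\r\nof things\t", "", "   ", None, "PhD student"]
cleaned_rows = non_empty_descriptions(rows)
print(cleaned_rows)
```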
Next, we have the actual Python 3 code I used to create the WordCloud from the 2.8 million user descriptions. You’ll note I added a few extra terms to the STOPWORDS list because I ran this multiple times and found these terms that I wanted to remove from the final version.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Feb 10 16:19:56 2019
@author: tdbowman
"""
import io
import csv
import numpy as np
from wordcloud import WordCloud, STOPWORDS
from os import path
from PIL import Image

# current directory
currdir = path.dirname(__file__)

# from https://github.com/nikhilkumarsingh/wordcloud-example/blob/master/mywc.py
def create_wordcloud(text):
    # use cloud.png as mask for word cloud
    mask = np.array(Image.open(path.join(currdir, "cloud.png")))
    # create set of stopwords: my extra terms plus the module defaults
    stop_words = ["https", "co", "RT", "del", "http", "tweet", "tweets", "twitter", "en", "el", "us", "et", "lo", "will", "ex", "de", "la", "rts"] + list(STOPWORDS)
    # create wordcloud object
    wc = WordCloud(background_color="white", max_words=200, mask=mask, stopwords=stop_words)
    # generate wordcloud
    wc.generate(text)
    # save wordcloud
    wc.to_file(path.join(currdir, "wc.png"))

if __name__ == "__main__":
    # Grab text from file and join every field into one string
    your_list = []
    with io.open('all_descriptions.csv', 'r', encoding='utf-8') as f:
        # strip NUL bytes, which csv.reader cannot handle
        reader = csv.reader(x.replace('\0', '') for x in f)
        # csv.reader yields rows (lists), so flatten the fields before joining
        your_list = ','.join(field for row in reader for field in row)
    # generate wordcloud
    create_wordcloud(your_list)
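Finding the extra stop words was a matter of running the script, looking at what dominated, and running it again. A quick way to do that kind of inspection without rendering an image each time is to list the most frequent terms directly. The sketch below is a rough stand-in under my own assumptions, not what the WordCloud module does: the `top_terms` name, the simple regular expression, the case-insensitive matching, and the sample text are all invented for illustration.

```python
import re
from collections import Counter


def top_terms(text, stop_words, n=10):
    """Return the n most frequent lowercase words that are not stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    stop = {w.lower() for w in stop_words}
    return Counter(w for w in words if w not in stop).most_common(n)


# Hypothetical sample text standing in for the profile descriptions
sample = "Founder of a startup. RT https co founder and CEO. Scientist, founder."
result = top_terms(sample, ["https", "co", "RT", "of", "a", "and"], n=2)
print(result)
```

Any term that shows up near the top and is not meaningful (URL fragments, "RT", articles from other languages) is a candidate for the stop word list.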
It could use some cleanup and the image could be higher resolution, but it’s a good example for the students of how to use Python to create a word cloud.
I’m currently reading The Whale and the Reactor by Langdon Winner (1986). It was recommended to me by a colleague’s husband when discussing philosophy and IT. I’m into the 3rd chapter and it’s interesting; I would recommend it (at least, thus far). My interest in the philosophy of IT stems from my early areas of focus including HCI/UX, social informatics, and digital humanities. While I’m not performing research in these areas, the readings still stay with me and I find myself trying to utilize terminology and methodologies from articles I read some time ago as I examine social media metrics (altmetrics), scholarly communication, and the academic reward system.
During my Ph.D. years, I had the great opportunity to take a course at Indiana University Bloomington where another doctoral student and I performed close readings of Martin Heidegger, Gilles Deleuze and Félix Guattari, and other prominent scholars discussing philosophy and technology under the supervision of Dr. Ron Day and Dr. Hamid Ekbia. It was a great time and I learned a lot about myself and the way I approach the philosophy of IT. While I haven’t had the opportunity to pursue this interest in the philosophy of IT, I have continued to shuffle the ideas around in my head and apply what I learned from the course and readings to the way in which I think about scholarly communication, altmetrics, and the academic reward system.
Our Reading List:
- I. September: Martin Heidegger — Being, the world, and techne:
- II. October: Cognition, the world and technology:
- Andy Clark: Natural-Born Cyborgs: Mind, Technologies, and the Future of Human Intelligence (Ch. 1, 2, 6)
- Paul Dourish: Where the Action Is: The Foundations of Embodied Interaction (Ch. 4)
- Lucy Suchman: Human-Machine Reconfigurations (Ch. 4, 5, 6)
- William Clancey: Situated Cognition: On Human Knowledge and Computer Representation (Ch. 1, 12)
- Phil Agre: Computing and Human Experience (Ch. 1)
- Ed Hutchins: How Does a Cockpit Remember? (a paper)
- Ekbia: Artificial Dreams (Ch. 1, Epilogue)
- III. November: Assemblages and Devices
- Félix Guattari:
- “On Machines”
- “Machinic Heterogenesis”
- Gilles Deleuze and Guattari:
- Sections from Anti-Oedipus: part 1 (all chapters); part 4 (chapters 2,3, & 4)
- A Thousand Plateaus: chapters 1, 2, 3 (difficult), 4, 10, 11, and conclusion.
- IV. Early December:
- Gilles Deleuze, Difference and Repetition (chapter IV “Ideas and the Synthesis of Difference”)
- Gilles Deleuze and Guattari: What is Philosophy? Chapter 7 and Conclusion
- Watson, Sean. “The Neurobiology of Sorcery: Deleuze and Guattari’s Brain.” Body and Society, 4(23) (1998).
Social media metrics (or altmetrics) is a relatively new area of study examining the distribution of scholarly articles within primarily social media contexts (which I’ve discussed here before). Of interest are both metrics relating to how often an article is shared in these environments and who/why/how the agents using the online platforms disseminate and consume this information. We, as social media metrics scholars, first used traditional bibliometrics measures to examine counts of social media acts and tried to determine if these counts correlated to an increase in citations of said articles. More recently, social media metrics scholars have begun to utilize theories and methods from other disciplines, including psychology, sociology, and linguistics, to examine these acts. I was very excited to be part of a book chapter (https://arxiv.org/abs/1502.05701) that discussed applying theories from other domains to the study of altmetrics.
When examining the larger picture of the academic reward system, social media metrics could be considered a fourth leg of the stool. We have traditionally considered authorship, citations, and acknowledgements as part of the academic reward system. Yet, the ability to track social media acts relating to scholarly documents has introduced a new means of capturing the consumption and dissemination of these documents. While we do not claim that these acts equate to authorship, citations, or acknowledgments, these activities do represent some form of engagement with scholarly work.
This change in a reward system after many years of relative consistency has brought about much discussion. Part of the driving force behind adding social media metrics to the academic reward system is the notion of “societal impact.” These days, many funding agencies, universities, government entities, and (some) tenure committees are asking scholars to provide some evidence of how their research has had impact outside of academia. One way in which scholars can provide evidence of societal impact is to utilize altmetric-related counts. But, this notion of societal impact is highly contested and there are many different definitions of “societal impact” available.
What I’m now trying to work out is a way to discuss these changes in scholarly communication and the academic reward system using what authors have discussed in the philosophy of IT literature. I believe that insights and vocabulary from that literature can allow us to think about the academic reward system from a new perspective and critically discuss the impact social media and technology have had on the actors and acts performed within the academic system. Going back to The Whale and the Reactor, Winner seems to focus (so far) on technology and power, which I have a feeling will provide useful insight into how I think about social media metrics (altmetrics) and the academic reward system. It brings to mind Latour’s Actor Network Theory and the agency of technology in a system.
More to come soon…
Last week I was lucky enough to attend the 2:AM Conference in Amsterdam. The conference was focused on altmetrics–a type of metric that is typically calculated based on scholarly communication events captured in online contexts (e.g., events in Twitter, Mendeley, Wikipedia, etc). For some time I’ve been critical of the term “altmetrics” because I had taken it to mean “alternative to citations,” but after this conference I’m not so confident in my previous position. Altmetrics is an umbrella term that we use to help describe the type of research we are doing (at least those of us that research these things), it is a buzzword that others use to talk about scholarly communication in online contexts, it is a term that the media has used, it is currently used in organizations, libraries, universities, and companies to promote scientific work, and it has become a term that somehow represents the potential for measuring impact outside of the academic machine (other than scientific impact). While it has been criticised many times in the past for being the wrong term, I am not sure there is a more appropriate term… and that is fine. We have had suggestions including social media metrics (Haustein, Larivière, Thelwall, Amyot, & Peters, 2014), complimetrics (complimentary metrics) (Adie, 2014), influmetrics (influence metrics) (Cronin & Weaver, 1995; Rousseau & Ye, 2013), and more traditionally webometrics (Almind & Ingwersen, 1997), to name just a few, but these do not seem to be any better and also do not seem to possess that something that “alt”metrics seems to possess.
I dabble in linguistics and I believe that words are of vital importance to our ability to understand and discuss the same phenomenon (especially in science), which is why I was so adamant that “altmetrics” was the wrong term to be using. But then I took another look at the altmetrics manifesto (the 5th anniversary of this important object was celebrated at the conference) and reevaluated my own position based on my accumulated knowledge in the field, what I learned at this conference, and a closer inspection of the manifesto to come to the realization that altmetrics is fine when you think of it as an “alternative means of measuring scholarly communication.”
The conference venue was great as we were housed at the Amsterdam Science Park, a sprawling complex on the eastern side of Amsterdam. There were quite a few attendees and the presentations and workshop were informative and thought-provoking. Many of the primary data providers, publishing companies, metrics providers, and others in this field sent representatives including Jason Priem (impactstory.org), Euan Adie (altmetric.com), William Gunn (mendeley.com), Greg Gordon (ssrn.com), Martin Fenner (niso.org), and Geoff Bilder (crossref.org). In addition, the four authors of the altmetrics manifesto were in attendance to celebrate its 5th anniversary– Jason Priem, Dario Taraborelli, Paul Groth, and Cameron Neylon. I was able to speak with both Jason and Cameron and they were engaging, down to earth people who are great scholars and excited by the future of scholarly communication (I wasn’t able to speak with Paul or Dario at such length).
What I gleaned from 2:AM was that there was an ongoing discussion from multiple perspectives taking place regarding the ability of altmetrics to measure impact, the types of impact there might be for scholarly communication, and the importance of trust when considering the reasons behind altmetric events. In addition, I am looking forward to being part of a group (formed at the “theories” conference breakout session) that will write a white paper describing and defining common terms used in altmetric research for the purpose of allowing others outside of our community to understand and contribute to the ongoing work in the field. I also learned that many in the field had read our book chapter (arXiv:1502.05701) on applying citation and social theories to the understanding of altmetric events–they were very supportive of our efforts to put forth this first attempt at developing a framework for understanding altmetric events. Yet we all know that much more work needs to be done and hopefully this white paper will be a nice step in that direction.
What I also learned from listening to Jason Priem, Dario Taraborelli, Paul Groth, and Cameron Neylon was that our group has somewhat ignored an important component of the manifesto, which talks about altmetric events as a type of “filter” for scholarly research and communication:
No one can read everything. We rely on filters to make sense of the scholarly literature, but the narrow, traditional filters are being swamped. However, the growth of new, online scholarly tools allows us to make new filters; these altmetrics reflect the broad, rapid impact of scholarship in this burgeoning ecosystem. We call for more tools and research based on altmetrics. (Priem, Taraborelli, Groth & Neylon, para 1, 2010)
This is an important aspect that I too simply took for granted and something I need to reflect on as my understanding of these phenomena continues to grow and change.
Priem, J., Taraborelli, D., Groth, P., & Neylon, C. (2010). Altmetrics: A manifesto. October 26, 2010. Retrieved from http://altmetrics.org/manifesto
The current state of scholarly communication is in flux as various avenues for the consumption and dissemination of ideas, discussions, and research continue to be developed and then adopted by scholars. These contexts offer different affordances (Gibson, 1977), or possibilities for action provided by the platform, and offer different types of networks with which a scholar can view and interact. Affordances found in these environments typically include the ability to share information in a particular way (e.g. tweet, Facebook post, blog, comment, post links or media, hashtags, etc.), consume information, create a profile (public, private, or mixed), and connect with other users of the platform. There are various types of networks represented across these platforms, including blogs (external facing networks), Facebook and Twitter (social networks), and Wikipedia (interconnected networks). A scholar can present herself on these online platforms along a continuum ranging from personal to professional.
Problems can arise from interacting within these online contexts as the information is disseminated to a vast unknown audience, it is archivable, it is searchable, and it can be copied and removed from the context in which it was originally published (boyd, 2006). This can prove damaging to the reputation of a scholar and can lead to shame, punishment, or dismissal as seen from recent examples. In one example, a scholar who had been offered a tenure-track position at the University of Illinois, Urbana-Champaign, had this same offer rescinded after several tweets made by the individual were deemed anti-semitic in nature by the university board (Jaschik, 2014). In another example, a professor from the University of New Mexico was put on probation and given counselling after tweeting an offensive remark about Ph.D. applicants (Ingeno, 2013). There have been other examples of these types of infractions from Facebook and from blogging.
Before the rise of these massive online networks, scholars already found it difficult to manage the boundaries between their personal and professional lives. The introduction of online contexts in which a person can interact with vast audiences exacerbates the situation for scholars as they (often) already are maintaining a tenuous balance between their personal and professional identities from their time spent mentoring and teaching students in and out of the classroom. The boundaries between personal and professional are changing; what was considered personal interactions outside the classroom now have been thrust into the spotlight partially because of the new networks in which scholars interact. This relationship between the changing personal and professional boundaries of self-presentation and the size of the network and proximity of the nodes has not been adequately discussed.
Goffman (1959) discussed the acts of self-presentation and impression management in his social research as acting out a particular role for an audience and maintaining that role across time. These acts rely on various aspects including social norms, rules, and context to be effective. You could interpret Goffman’s writing in a way that suggests he considered the network and its significance to people in their day-to-day lives, as he (Goffman, 1961, p. 127) noted later that “[w]hen seen up close, the individual, bringing together in various ways all the connections that he has in life, becomes a blur.” He knew that boundary maintenance was a crucial component of self-presentation and impression management, as he divided the act of self-presentation into three different regions: front stage, back stage, and the outside region. What he did not directly speak to was the actual size of the network and the influence this would have on the boundaries between these regions.
Related to this, Mehra, Kilduff, and Brass (2001, p. 131) argued that a large network “can enable the individual to access numerous others for information and other resources,” but warned that “[p]eople who interact with numerous others in organizations run the risk of running short of time and other resources.” In addition to the time and resources used to maintain large networks, scholars run the risk of further blurring the boundaries between their personal and professional selves. I want to further investigate this relationship between networks, self-presentation and impression management, and the blurring between personal and professional.
boyd, d. (2006). Friends, Friendsters, and MySpace Top 8: Writing Community Into Being on Social Network Sites. First Monday, 11(12), 1–15. Retrieved from http://www.firstmonday.org/issues/issue11_12/boyd/index.html
Gibson, J. J. (1977). The Theory of Affordances. In R. Shaw & J. Bransford (Eds.), Perceiving, Acting, and Knowing: Toward an Ecological Psychology (pp. 127–143). Hillsdale, NJ: Lawrence Erlbaum.
Goffman, E. (1959). The Presentation of Self in Everyday Life. New York: Anchor.
Goffman, E. (1961). Encounters: Two studies in the sociology of interaction. Indianapolis: The Bobbs-Merrill Company, Inc.
Ingeno, L. (2013, June 4). Outrage over professor’s Twitter post on obese students. Inside Higher Ed. Retrieved from https://www.insidehighered.com/news/2013/06/04/outrage-over-professors-twitter-post-obese-students
Jaschik, S. (2014, August). Out of a job. Inside Higher Ed. Retrieved from https://www.insidehighered.com/news/2014/08/06/u-illinois-apparently-revokes-job-offer-controversial-scholar
Mehra, A., Kilduff, M., & Brass, D. J. (2001). The social networks of high and low self-monitors: Implications for workplace performance. Administrative Science Quarterly, 46(1), 121–146.
I’ve been thinking a lot about what Science really means to me and what the philosophers of science have said about the system of science. I love Newton’s famous notion about “standing on the shoulders of giants,” but I don’t necessarily see it in that way… especially in my line of research investigating altmetrics and scholarly communication.
It’s a blustery evening in Finland and I am watching the trees bend and shed leaves in the strong breeze while thinking about this. It seems to me that the system of science resembles an ecosystem in which we try to make our lives meaningful and to shed light on our surroundings. We do, of course, use the work of others to view things through their eyes, but I don’t see myself standing on their shoulders and reaching for the stars. Instead I see myself as a small sapling, struggling for nourishment in a vast forest. At the same time, I view those before me, especially those marvelous minds from which I borrow, as large trees that shade me from the sun and break the harsh winds blowing over me. I see the trees of Goffman and Gibson, of Heidegger and Kant, and on and on, in my part of the forest. These solid, long standing trees protect me and nourish me, allowing me to grow and to become a tree myself.
As scholarly communication and science have changed, so too has the ecosystem. We no longer simply aspire to be the trees that provide the root system of science; we are also trying to spread and have an impact outside our forests. I feel like we are now flowering trees, making pollen that can be carried away to the farthest fields with hopes of having an impact on our surroundings. We have evolved to make use of the technologies that have become a part of our world, to attract the attention of others so that they can carry our pollen away. A large part of this new technology and ecosystem is the internet, specifically social media and other online sources of information. Social media users are the bees that we need to spread our pollen, our information, outside of our isolated forests. What the bees are doing with this information, we don’t yet know. But what we do know is that they can spread it faster and farther than ever before.
Through my work I hope we can figure out where our information is being spread and what kinds of impact we are having on society.
I have finally finished my Ph.D. Yay. I graduated from the School of Informatics and Computing, Indiana University, Bloomington at the end of July, 2015.
After seven years of contemplating social structures, norms, behaviors, communication, and the ways in which people use the affordances of social media, I was able to successfully defend my thesis in front of four of my peers and a handful of students in May, 2015 and make the required minor revisions and formatting changes to submit the final version of the document to the graduate school at the beginning of July, 2015.
It has been a long, rewarding journey and I am happy that I completed it. I have been able to travel around the world, move to two countries, and meet some extraordinary scholars, travelers, and neighbors. It’s been quite an adventure, one which I hope continues as I progress in my career as an academic. Thank you to everyone for the support and love throughout this process.
I’m now in Finland working with great scholars and looking to improve my abilities as a scholar, researcher, teacher, and coworker.