Everyday altmetrics

Show me the numbers

A while ago I was contacted by an academic from one of the Schools of Aalto University. They wanted a quick look at how their publications from last year had performed altmetrics-wise.

We do not yet have any commercial solution that would show this. My snapshots from 2014, 2015, 2016-2017, and 2018 do not update themselves. At some point I planned to build a database-driven application, but there’s only so much time.

What we do have, though, is our CRIS, which is as up to date as this type of service can be. The code that displays the Altmetric badge is embedded in every page of the portal, so whenever there is something to show, the iconic doughnut is rendered. However, this is a grassroots view of one article only. What we need is to go one or more levels up in the organization tree.

To build a report that bundles research output by its home of origin (department, research group, or centre of excellence; “managed by” in CRIS parlance), I first need a boilerplate CRIS report on publications: School, department, research group, DOI, title, etc. Run it, open the result in Excel, and save it as HTML. The goal here is to get the Altmetric badge to appear (or not) on every relevant row of the HTML table, next to a DOI. For this I need to add two things: a link to Altmetric’s JavaScript file, and a placeholder div element with some attributes. How to do this?

There are obviously many options, from command-line tools to programming languages. Here I show some solutions with sed, awk, XSLT, JavaScript, and Python. Thanks to the solid programming skills of our new roommate, the solution we actually delivered (in 15 minutes) was the Python one. Here is an example of what the result looked like. A quick and dirty solution, but it does the job.
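For the Python route, here is a minimal sketch of the idea, not the script we actually delivered. It assumes the Excel-exported HTML has one DOI per table cell as plain text, and it assumes the Altmetric embed script URL shown below; check Altmetric’s badge documentation for the current script URL and supported data- attributes. The function name and the sample row are illustrative only.

```python
import html
import re

# Assumption: Altmetric's badge embed script URL; verify against Altmetric's badge docs.
ALTMETRIC_JS = "https://d1bxh8uas1mnw7.cloudfront.net/assets/embed.js"

# A table cell whose entire text is a DOI, e.g. <td class="xl65">10.1000/xyz123</td>
DOI_CELL = re.compile(r"(<td[^>]*>\s*)(10\.\d{4,9}/[^<\s]+)(\s*</td>)", re.IGNORECASE)


def add_altmetric_badges(page: str) -> str:
    """Insert a badge placeholder after every DOI cell's text and load the script once."""

    def with_badge(match: re.Match) -> str:
        doi = html.unescape(match.group(2))  # cell text may contain entities like &amp;
        placeholder = (
            "<div class='altmetric-embed' data-badge-type='donut' "
            f"data-doi='{html.escape(doi, quote=True)}'></div>"
        )
        return match.group(1) + match.group(2) + placeholder + match.group(3)

    page = DOI_CELL.sub(with_badge, page)
    script = f"<script type='text/javascript' src='{ALTMETRIC_JS}'></script>"
    head_end = re.compile(r"</head>", re.IGNORECASE)
    if head_end.search(page):
        # Load the script inside <head> when the page has one.
        return head_end.sub(lambda m: script + m.group(0), page, count=1)
    return script + page  # no <head>: prepend the script


if __name__ == "__main__":
    sample = "<html><head></head><body><table><tr><td>Dept X</td><td>10.1000/xyz123</td></tr></table></body></html>"
    print(add_altmetric_badges(sample))
```

Excel may save HTML in a legacy Windows encoding, so read the file with the encoding its meta charset declares before passing the text in.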

Forget the numbers, show me the tweets!

As useful as the metrics themselves can be, the real thing lies in the human action. Who said what about the article? In which manner? What sort of interests do these Whos have? What we need is a way to represent data from Twitter, the social media giant that has also become important in communicating about science. Altmetric kindly shows the latest tweets for free, but to see all of them you’d need a license.

While I’m at it, I’d like to mention that a good and timely read on the What, Where, How, When and Who of academic Twitter is the Altmetric blog mini-series by guest authors Stefanie Haustein, Germana Barata, Rémi Toupin and Juan Pablo Alperin.

So, let’s put our focus on Aalto University publications since 2017.

With one of our standard CRIS reports I get a list of all DOIs. FYI, roughly 80% of all Aalto University publications since 2017 have a DOI, which is not bad.

With this list of DOIs, I then turn to the rich source of CrossRef Event Data. The work is very easy thanks to the crevents R client by the awesome people at rOpenSci. In no time (read: a weekend-ish) I had the data ready, including the status IDs of the tweets. Feeding those to the lookup_statuses function of rtweet, I get back a whopping amount of information on each tweet. The trickiest task here (for me) was to understand the data model of retweets. Anyway, from then on it was fairly easy to build a standard Shiny web application on top of the tweets, where the user can drill down into the organization. Thanks to the advanced features of DT, rows can be sorted and filtered interactively by column.
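I did this step in R with crevents and rtweet. As an illustration in Python, here is a small sketch of the step in between: pulling tweet status IDs out of Event Data records and grouping them by DOI, ready for a status lookup such as rtweet’s lookup_statuses. It assumes each event is a dict with source_id, subj_id and obj_id fields, with Twitter subjects shaped like twitter://status?id=... and objects as https://doi.org/... URLs; check these assumptions against the events you actually receive. The function name and sample values are made up.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qs, urlparse

DOI_URL_PREFIX = re.compile(r"^https?://(dx\.)?doi\.org/", re.IGNORECASE)


def status_ids_by_doi(events):
    """Group Twitter status IDs by DOI (lower-cased), de-duplicated and sorted numerically."""
    found = defaultdict(set)
    for event in events:
        if event.get("source_id") != "twitter":
            continue  # skip Wikipedia, Newsfeed and other sources
        # Assumed subject format: twitter://status?id=<status id>
        status = parse_qs(urlparse(event.get("subj_id", "")).query).get("id")
        if not status:
            continue
        doi = DOI_URL_PREFIX.sub("", event.get("obj_id", "")).lower()
        found[doi].add(status[0])
    return {doi: sorted(ids, key=int) for doi, ids in found.items()}


events = [
    {"source_id": "twitter", "subj_id": "twitter://status?id=900", "obj_id": "https://doi.org/10.1000/ABC"},
    {"source_id": "twitter", "subj_id": "twitter://status?id=1000", "obj_id": "https://doi.org/10.1000/abc"},
    {"source_id": "twitter", "subj_id": "twitter://status?id=900", "obj_id": "https://doi.org/10.1000/abc"},
    {"source_id": "wikipedia", "subj_id": "https://en.wikipedia.org/wiki/X", "obj_id": "https://doi.org/10.1000/abc"},
]
print(status_ids_by_doi(events))  # {'10.1000/abc': ['900', '1000']}
```

Lower-casing the DOI merges events that refer to the same article with different capitalization, since DOIs are case-insensitive.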

Note that I left out those articles for which CrossRef did not return any tweet info.

To add a little metrics sugar, I present the following for each selected unit: the most tweeted article, the median number of tweets, the article with the longest tweet life span so far, and the median life span.
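A minimal sketch of how these four figures could be computed for one unit, assuming the unit’s tweets are available as (DOI, timestamp) pairs. Span follows the definition used in the app: days between an article’s first and latest tweet, so a single tweet gives 0. The function and key names are illustrative, not those of the actual app.

```python
from datetime import datetime
from statistics import median


def unit_summary(tweets):
    """Summarise one unit's tweets, given as (doi, created_at) pairs; needs at least one tweet."""
    times = {}
    for doi, created_at in tweets:
        times.setdefault(doi, []).append(created_at)
    counts = {doi: len(ts) for doi, ts in times.items()}
    # Span in days between the first and the latest tweet; 0 for a single tweet.
    spans = {doi: (max(ts) - min(ts)).total_seconds() / 86400 for doi, ts in times.items()}
    return {
        "most_tweeted": max(counts, key=counts.get),
        "median_tweets": median(counts.values()),
        "longest_span": max(spans, key=spans.get),
        "median_span_days": median(spans.values()),
    }


sample = [
    ("10.1000/a", datetime(2018, 1, 1)),
    ("10.1000/a", datetime(2018, 1, 2)),
    ("10.1000/a", datetime(2018, 1, 3)),
    ("10.1000/b", datetime(2018, 2, 1)),
]
print(unit_summary(sample))
# {'most_tweeted': '10.1000/a', 'median_tweets': 2.0, 'longest_span': '10.1000/a', 'median_span_days': 1.0}
```

Medians rather than means keep a single viral article from dominating the unit’s figures.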

Some ideas for poking around:

  • sort by Time span, or adjust the slider in the filter. Span shows the time difference in days between the first and the last/latest tweet about that article. Note that an article with a single tweet shows a span of 0.
  • Description is the About text of the Twitter account (screen name). Try, e.g., different occupations or substrings of them, like journalist, professor, dr., or teacher; hashtags such as #health or #brain; or emojis like ⚽, 🚁 or 🇨🇷
  • Location could potentially be of interest, but tweeters need to opt in to location sharing, which BTW is a good thing
  • you can use multiple filters at the same time. For example, you might ask yourself: “Which professors (or accounts claiming to be one) whose tweets span over a month have the most followers?”

Sometimes the search box serves better than filters. For example, to find everything about energy, whether the word appears in articles, tweets, screen names, or descriptions, use search.
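The difference between column filters and the search box can be sketched in a few lines of Python: column filters must all hold at once (AND), while a global search keeps a row if the term appears in any searched column (OR). This is an approximation of the behaviour described above, not DT’s actual implementation, and the row keys (title, text, screen_name, description, span_days, followers_count) are hypothetical.

```python
def professors_with_long_spans(rows, min_days=30):
    """Column filters combined with AND, then sorted by followers (most first)."""
    hits = [
        row for row in rows
        if "professor" in row["description"].casefold() and row["span_days"] > min_days
    ]
    return sorted(hits, key=lambda row: row["followers_count"], reverse=True)


def search(rows, term, columns=("title", "text", "screen_name", "description")):
    """Global search: keep a row if the term occurs (case-insensitively) in any column."""
    needle = term.casefold()
    return [row for row in rows if any(needle in str(row.get(col, "")).casefold() for col in columns)]


rows = [
    {"title": "Energy storage", "text": "", "screen_name": "ann",
     "description": "Professor of physics", "span_days": 45, "followers_count": 120},
    {"title": "Brain imaging", "text": "New energy!", "screen_name": "bob",
     "description": "Assistant professor", "span_days": 90, "followers_count": 900},
    {"title": "Brain imaging", "text": "", "screen_name": "cy",
     "description": "Journalist", "span_days": 60, "followers_count": 5000},
]
print([r["screen_name"] for r in professors_with_long_spans(rows)])  # ['bob', 'ann']
print([r["screen_name"] for r in search(rows, "ENERGY")])  # ['ann', 'bob']
```

The professor question from the list above becomes two AND-ed conditions plus a sort, while the energy search needs no knowledge of which column holds the word.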

CrossRef Event Data is a most welcome service! Besides Twitter, other interesting data sources include, e.g., Wikipedia and Newsfeed.

Note that CrossRef and Altmetric can return different results. I haven’t done any thorough comparison, but one particular article caught my attention. The Altmetric badge shows a lot of tweeters for it, yet CrossRef finds only a few. It turned out that preprints (arXiv in this case) are not that well covered by CrossRef.

Half of all tweets are retweets, and the average document [in Altmetric] has a tweet span of 81 days. My small sample follows these patterns quite well: retweets 63%, median tweet life span 81.8 days.
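Counting retweets depends on the retweet data model mentioned earlier. A minimal sketch, assuming tweets in the Twitter API v1.1 JSON shape, where a retweet embeds the original tweet in a retweeted_status field; the helper names and sample IDs are hypothetical.

```python
def is_retweet(tweet):
    """In Twitter API v1.1 JSON, a retweet embeds the original tweet as retweeted_status."""
    return "retweeted_status" in tweet


def original_id(tweet):
    """ID of the tweet that was actually written: the original for retweets, else the tweet itself."""
    return tweet["retweeted_status"]["id_str"] if is_retweet(tweet) else tweet["id_str"]


def retweet_share(tweets):
    """Percentage of tweets that are retweets (0.0 for an empty list)."""
    if not tweets:
        return 0.0
    return 100 * sum(is_retweet(t) for t in tweets) / len(tweets)


tweets = [
    {"id_str": "1"},
    {"id_str": "2", "retweeted_status": {"id_str": "1"}},
    {"id_str": "3", "retweeted_status": {"id_str": "1"}},
    {"id_str": "4"},
]
print(retweet_share(tweets))  # 50.0
print(sorted({original_id(t) for t in tweets}))  # ['1', '4']
```

A retweet has its own status ID, so collapsing retweets to original_id is what separates “how many tweets” from “how many distinct things were said”.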

R code is available at GitHub.

Posted by Tuija Sonkkila

About Tuija Sonkkila

Data Curator at Aalto University. When out of office, in the (rain)forest with binoculars and a travel zoom.