Last week, I had the opportunity to attend the two-day 18th Nordic Workshop on Bibliometrics and Research Policy 2013 at KTH Royal Institute of Technology in Stockholm. Tweets from the workshop are collected as a short Storify story.
Together with my colleagues Irma and Eva, I gave a 20-minute presentation on altmetrics; see our slides at SlideShare. If you are interested in the technical bits and pieces, here is the R code for collecting the data and building an interactive web app with the Shiny web framework.
Some remarks on the app, the procedure, its limitations etc.:
- EDIT 19.12.2013: ImpactStory widgets will cease to work in mid-January 2014, so I’ll edit the code accordingly. ImpactStory is concentrating on profiles.
- the total number of Aalto University journal articles published in 2007-2012, indexed by Thomson Reuters Web of Science and having a DOI, is around 8000
- with these DOIs (plus publication year and number of cites) at hand, I first queried the Altmetric API. The result was a bunch of raw counts for roughly 5% of the publications. Note that the whole concept of altmetrics is just a few years old, and the services are even younger, so low percentages are really no surprise
- from all the metrics returned, I selected a handful and merged them with the WoS data by DOI
- at first, I planned to make a similar query, with the whole set, against the ImpactStory API as well. I did try. At 18%, something went wrong (bad data) and the query stopped. After a couple of retries I gave up, and decided to continue with only those publications for which Altmetric had found data. This much smaller query went through just fine. Again, I selected some metrics from the ImpactStory result and merged them with the previous data
- as a third service, I also queried the PLoS ALM API. Selection. Merge.
- empty cells mean that the score is not available; in other words, the API has returned an NA value. For those of you who have seen an earlier version of this app, note that here NAs are not replaced by zeros as before
- because the Altmetric data acted as a kind of starting point for my application, I chose their site as the landing page for the articles. The link text is a shortened form of their URL
- most of the data dates back to late October. The only notable exceptions are ImpactStory badges that are rendered as the last column of the HTML table
- there are two types of numeric data: raw counts, like pageviews, saves, or tweets, and the Altmetric score. So, even though you can choose a stacked version of the interactive barchart (below right), it doesn’t mean you always should
- the idea behind this whole exercise was to lay out a smorgasbord of various metrics, an interactive web sandbox for looking at the numbers, badges, and scores, and maybe make some preliminary observations. Like Scott Chamberlain has recently shown (PDF), there are differences and inconsistencies in the counts provided by different altmetrics aggregators, and I was interested to see how these might surface within our publications. The providers do not deny this fact; they gather metrics independently, sometimes via different sources, as in the case of tweets. The master data, as it were, is somewhat of a moving target
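To give an idea of the collection and merge steps above, here is a rough R sketch. The example DOIs, the `wos` data frame, and the selected metric columns are illustrative assumptions, not my actual data:

```r
library(rAltmetric)  # client for the Altmetric API
library(plyr)

# Hypothetical slice of the WoS data: DOI, publication year, citation count
wos <- data.frame(doi   = c("10.1038/480426a", "10.1126/science.1157784"),
                  year  = c(2011, 2008),
                  cites = c(12, 34),
                  stringsAsFactors = FALSE)

# Query the Altmetric API per DOI; DOIs unknown to Altmetric throw an error,
# which we turn into NULL and drop before binding the results together
raw <- llply(wos$doi, function(d)
  tryCatch(altmetrics(doi = d), error = function(e) NULL))
scores <- ldply(raw[!sapply(raw, is.null)], altmetric_data)

# Keep a handful of metrics and merge by DOI; all.x = TRUE keeps the articles
# Altmetric knows nothing about, so their metric cells stay NA
merged <- merge(wos, scores[, c("doi", "score", "cited_by_tweeters_count")],
                by = "doi", all.x = TRUE)
```

The `all.x = TRUE` left join is what produces the empty (NA) cells mentioned above, instead of silently dropping articles without altmetrics.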
There were a few things in the app that I pointed at during my talk at KTH.
If you choose Mendeley as the Top10 metric and compare the counts with WoS citations, you’ll notice that the figures are roughly on the same level. Of course, the sample here is just 10 articles. Still, proper studies have shown that there is a moderate correlation between Mendeley reader counts and Web of Science citations. Based on this, we might predict that the newest article in my data, published in 2012 and thus a newcomer from the WoS perspective – citations tend to accumulate slowly – will get more citations in the future.
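As a toy illustration of that kind of relationship – with made-up numbers, not my actual data – a rank correlation is easy to compute in R:

```r
# Ten hypothetical articles: Mendeley reader counts vs. WoS citations
d <- data.frame(mendeley = c(120, 45, 80, 10, 5, 60, 33, 90, 2, 15),
                cites    = c(95, 30, 70, 8, 2, 50, 40, 100, 1, 12))

# Spearman's rank correlation suits skewed count data like this better
# than Pearson's, since it only looks at the ordering of the values
rho <- cor(d$mendeley, d$cites, method = "spearman")
rho
```

With these invented numbers the ranking of articles by readers and by citations is nearly identical, so rho comes out close to 1; real data is of course much noisier.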
Choose Topsy tweets as the Top10 metric, and select the Altmetric score to be shown at the same time. Note that you need to tick the Top10 metrics checkbox too; otherwise it will be rendered as a column of NAs (a shortcoming of my app). The tweet count is provided by Topsy via ImpactStory, as indicated by the acronym. The first article has gained quite a bit of attention. The ImpactStory badges show that, so far, compared to a random set of articles indexed by Web of Science and published in the same year, it is highly cited and saved by scholars (blue) and highly discussed by the public (green).
The Altmetric score of this article is 182. If you chose the Altmetric score as the Top10 metric, you’d notice that this is in fact the highest value in my whole sample. What is the article about? Have a look at it. I think you’ll immediately understand why it has been popular. But wait, is this the whole story? Altmetric explains that their score reflects both the quantity and the quality of attention. Scholarly attention weighs more than public attention, provided the distinction can be made, which is not always the case of course. So, we don’t know the algorithm behind the score. But if we are to trust Altmetric – and why wouldn’t we – this article is indeed ranked relatively high among the scholarly community. The ImpactStory badges are not telling a different story either.
182 is just a number, and even in the Altmetric database it isn’t a particularly big one. This blog post tells about the most popular paper on the Internet, according to Altmetric.
During my KTH talk, there were still a few buggy values in the web app table. I had also found out that some services were no longer collected by the providers, or were outright history, like Postgenomic. For those of you interested, the now-deprecated version is available here.
The most obvious sign of bugs was that the Twitter counts returned by the Altmetric API were oddly low. So were the counts in the column Any type of posts. The problem turned out to be in my version of the rAltmetric R package. When I compared the version I had installed from the nearest CRAN mirror – CRAN is a network of servers through which R packages are traditionally distributed – with the version in the GitHub development repo, I saw that the GitHub version had some changes in the functions that parse the data. When I installed that version with
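(assuming the devtools package and the rOpenSci repo name, the install goes roughly like this)

```r
# Install the development version of rAltmetric straight from GitHub;
# the repo path "ropensci/rAltmetric" is an assumption on my part
library(devtools)
install_github("ropensci/rAltmetric")
```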
and ran the code again, the problem was solved, and the scores are now correct. Fine! Anyway, although I’ve been delivering bad data – sorry about that – this has been a useful altmetrics lesson with R.
As you’ll notice, the Altmetric score was still 200 in my old table. Although, as I explained above, there were some buggy values in the table, to the best of my knowledge this score has been OK all along. So why is it now just 182? We cannot say for sure. There are at least two possible explanations. Firstly, it may be that one of the scores is or was faulty in the Altmetric database. Exceptions do occur, sometimes data can be missing, etc. Note that the fact that the score value now equals the number of tweeters may be just a coincidence. But it may also be the case that the quality of attention, as it were, has dropped. Unfortunately, I no longer have my earlier data sets available, which is a shame – only the final values.
While at it, I had a look at the timestamps that Altmetric returns. There are three: published_on, added_on and last_updated. The R code below transforms these from UNIX time to something we humans can understand.
> as.POSIXct(metric_data$published_on, origin="1970-01-01")
 "2012-01-01 EET"
> as.POSIXct(metric_data$added_on, origin="1970-01-01")
 "2012-11-07 14:34:54 EET"
> as.POSIXct(metric_data$last_updated, origin="1970-01-01")
 "2013-10-22 23:06:43 EEST"
It seems that published_on is always 2012-01-01, so it does not refer to the publication of the scientific output but is rather a kind of default value. Added_on is the date when metrics for the DOI were collected by Altmetric for the first time. The preprint of the article in question – the one with the highest Altmetric score – was submitted to arXiv on 2012-01-27. Scrolling the tweets on the Altmetric landing page, where the tweets are in non-chronological order, I notice that there are tweets from January 2012 already. At Topsy you can search tweets by the title of the article. If you choose All time, you can sort the result by oldest first, and see that the first tweet was sent on 2012-01-29. No surprise that Altmetric didn’t collect that tweet right away; they were not in business yet. Finally, the last_updated timestamp tells us that some metrics were modified on 22 October.
Let’s add to the web app table all the metrics that Altmetric shows on their page: Twitter (via Altmetric, which returns the number of tweeters, not tweets), Facebook, Science news outlets, Blog posts, Google+, Mendeley and CiteULike.
All other scores are almost exactly the same as they were earlier except that the Twitter and blog counts of late October are missing. The tweet count via Topsy is the same though. The Altmetric API returns some stats about how the score of the DOI has developed over time, but all the changes there are positive. So, although I cannot prove anything, perhaps we can make an educated guess and say that changes in the number of blog posts or tweets by general public (or a bot) may have reduced the Altmetric score from 200 to 182.
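For the record, picking those extra metrics from an rAltmetric response can be sketched like this. The field names follow the Altmetric API naming conventions, but take them as assumptions rather than a verified list:

```r
library(rAltmetric)

# One article as an example; the DOI here is a placeholder
m <- altmetric_data(altmetrics(doi = "10.1126/science.1157784"))

# Assumed Altmetric field names: tweeters, Facebook walls, news outlets,
# blog posts, Google+, plus Mendeley and CiteULike reader counts
wanted <- c("cited_by_tweeters_count", "cited_by_fbwalls_count",
            "cited_by_msm_count", "cited_by_feeds_count",
            "cited_by_gplus_count", "mendeley", "citeulike")

# intersect() guards against fields that are absent for this particular DOI
extra <- m[, intersect(wanted, names(m)), drop = FALSE]
```

Note again that the Twitter field counts unique tweeters, not tweets, which is why it will not match the Topsy figure in the table.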