How can the most useful background literature be found?

Introduction

Knowing and following what others have written about (and around) your own research topic is a basic requirement for any academic project. But how should this be done? Some things are obvious: the task involves at least 1) searching for possible texts to read; 2) scanning them to filter the good ones out from the mass of less relevant ones; 3) reading the most promising ones; and 4) building syntheses of this mass of texts.

The list is rather simple, but it is clear that a lot of complexity underlies these steps. Some of the questions are:

1. What tools should be used for finding texts?
2. What types of texts are out there?
3. What indicators reveal which texts are more credible than others?
4. How can these texts be downloaded?
5. How should the article collection be maintained?
6. How can a synthesis be generated out of a collection of texts?

This blog post focuses only on questions 1–3 and a bit on question 4. I have also answered question 4 separately in a different post. The others may be looked at later.

1. What tools should be used for finding texts?

At least the following ways exist for finding literature:

1. Standard Google search helps you find a very eclectic mix of texts of varying quality and with varying underlying intentions. Very few of the texts found this way are academic papers. Instead they can be, for example, memos written by think tanks and lobbyist groups, governmental bodies’ reports, press releases, and essays written by students. Occasionally, academic papers can also be found with standard Google search, but what one finds is very unpredictable.

While what one finds using this method may be useful, it is usually best to regard content found this way as “data” rather than as research knowledge.

2. Google Scholar. This is the best tool for exploratory literature search: for those situations where you want to find out “what is out there”. Using Google Scholar is like using standard search, but the results are different: they are from academic sources, such as journals, conferences, and books. The search results also contain additional information that helps you decide which papers are worth investigating in more detail.

Because Google Scholar is a great tool, its use is covered in more detail below.

3. Snowballing. Academic papers always contain a section called References that lists all the other research that has been cited. By reading the paper and finding out what earlier works are cited, and what the writers say about them, it is possible to get to the sources of knowledge. This helps you find the “must-reads” of the research topic. The problem is that snowballing works only backwards in time: it does not help you find the most recent research.

4. Content alerts. It is possible to ask journals and conferences to send you an email every time they publish a new issue. This is a good way of staying up to date on the most recent research. The problem is that this brings a lot of email to your mailbox, and not every issue contains articles that you would be interested in. Google Scholar, however, lets the user create keyword-based alerts: it sends you an email every time it finds new research that matches the given keywords. You can turn on this feature by clicking the Create alert button on the left side of the screen in Google Scholar (see the image below).

For PhD students and researchers who need to keep themselves up to date about a research area over a long period of time, this is an essential feature to use.

5. Databases. EBSCOhost, ProQuest, ABI/INFORM, IEEE Xplore, Scopus, ACM Digital Library and other databases are great for systematic literature reviews when you know exactly what keywords to use and what journals and conferences to include in your search. But because each database only covers certain journals and conferences, and usually disregards books completely, they are not optimal for exploratory search. For that, Google Scholar is much better, and nicer to use too.
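
For readers who like to see a procedure spelled out, the snowballing approach in item 3 above can be sketched as a simple traversal of reference lists. This is only an illustration under assumptions: the paper identifiers and the `REFERENCES` table are made up, and in practice the reference lists are read by hand from each paper.

```python
from collections import deque

# Hypothetical reference lists: each paper maps to the earlier works it cites.
REFERENCES = {
    "paper_2020": ["paper_2015", "paper_2012"],
    "paper_2015": ["paper_2012", "paper_2008"],
    "paper_2012": ["paper_2008"],
    "paper_2008": [],
}


def snowball(start, references, max_depth=2):
    """Collect works reachable from `start` by following reference lists.

    `max_depth` limits how many "generations" of references are followed.
    The result only ever contains works cited by the starting paper or by
    its sources, which is why snowballing cannot surface newer research.
    """
    found = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        paper, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow references beyond the depth limit
        for cited in references.get(paper, []):
            if cited not in seen:
                seen.add(cited)
                found.append(cited)
                queue.append((cited, depth + 1))
    return found


print(snowball("paper_2020", REFERENCES))
```

Works that turn up in several reference lists (here, the hypothetical `paper_2012` and `paper_2008`) are good candidates for the “must-reads” of the topic.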

2. What types of texts are out there?

Given that I recommended Google Scholar as the primary tool for searching for texts, I will from now on focus on its use more than on the others.

Academic peer-reviewed papers

Above, the main difference I mentioned between standard Google search and Google Scholar search was that the latter finds only “academic papers” instead of just any texts or search hits. It is therefore important to define what is unique about academic papers.

The main characteristic of an academic paper is that it is “peer reviewed”. This means that the paper has undergone a particular editorial process (the “review process”) before being published in a journal or at a conference. In this process, it has been examined by a jury of researchers, and the authors have had to improve the paper until it met the necessary quality requirements. Without exception, the process includes at least one cycle of improvements: the authors have first sent (“submitted”) their paper for review, the members of the jury have evaluated it by writing statements about it, and the authors have been asked to improve the paper. Alternatively, the authors have received a “reject”, meaning that the process is terminated and they have to find a different journal or conference that may be willing to publish the paper, and start the process again with that outlet. If, however, the paper was considered promising enough, the second cycle starts when the improved version is received from the authors. The reviewers evaluate whether the changes are sufficient and provide further comments. In conferences, one cycle is common; in journals, at least two cycles are the norm. A rejection is possible at every stage.

Usually the review process is “double-blind”: the authors do not know who will read their paper, and the members of the jury (i.e., the “reviewers”) do not know whose paper they are reading. The communication between the authors and the reviewers is handled by an editor, who is a senior researcher in the field and is responsible for maintaining a high standard in this process. The blindness increases the neutrality of the process: even famous academics’ papers can be rejected, and the reviewers do not need to face authors who are furious about the rejection of their paper. Most papers are rejected; good conferences typically reject 70–75% of the submissions, for example. Top journals reject a larger percentage than that.

The heaviness of the whole process makes paper publishing a slow business. Publishing a paper in a good journal often takes at least 2 years, with 3–4 cycles of improvement. But the process ensures much better quality of content, compared to papers that have not been reviewed. This sets academic papers apart from the other materials that standard Google search can offer.

Books and book chapters

Books are another common type of academic text. They come in two kinds: full books that have been written by the same group of people from beginning to end, and edited collections where different chapters have been written by different authors. Edited collections have editors who have gathered the texts together, and the chapters have usually gone through at least some form of peer review during their preparation.

Other sources of academic-like texts

There are also semi-academic papers: ones that have been written by researchers, but which have not undergone the review process. These include research institutions’ “white papers” and technical reports, as well as texts that accompany presentations given in research seminars and workshops. These texts are usually published only on a website, instead of in a journal or in conference proceedings.

There are also papers that have been submitted for review, but which have also been saved in a public repository such as arXiv, bioRxiv, CiteSeer or SSRN. Although doing so breaks the blind review policy, in some fields of science this is an accepted and widely used practice. One of the reasons for this practice is the competition within the scientific field: researchers compete to be the first to make a certain finding. They do not want to wait the 2 years of the review process before they can report the finding. They may also fear that an anonymous reviewer steals their idea, replicates the study, and publishes it as their own. Public archiving protects authors from that.

Finally, papers are also available from ResearchGate and Academia.edu. These are self-archiving repositories where researchers sometimes upload copies of their published works, or where they just publish their research, thereby bypassing the review process. The quality of the content in ResearchGate and Academia.edu varies wildly, so the status of each paper needs to be verified: has the paper been published somewhere, or has it only been uploaded here?

3. What indicators reveal what texts are more credible than others?

So far it may seem that just using Google Scholar ensures that every text has the required quality and can be used as a good piece of literature. The truth is not that simple: there are conferences and journals with different levels of quality. Some papers, even if they are peer-reviewed, are of low quality. Using just any source that one finds may lead 1) in misleading directions and 2) to an unnecessary amount of work.

There are three simple indicators for finding out which paper is more worthwhile to read than others:

The exact topic of the paper

This is the simple one: it is better to read papers whose titles and abstracts have a good fit with the information that one is looking for. Google Scholar presents the titles of the papers very clearly. In addition, the abstract of the paper can be inspected by clicking on the title. This either shows a popup window or takes the user to the publisher’s website.

The number of citations

Good papers usually end up being cited more often by other researchers. The number of citations is the total count of all the other papers that cite a given paper. In the following screenshot, for example, Google Scholar shows that the 4th search result has been cited 2325 times by other researchers, while the other papers have been cited far fewer times. This suggests that Dorst and Cross’s paper published in Design Studies is probably more appreciated by researchers than the other papers, when it comes to “framing in design process” as the topic.

Screenshot: Example of a search result in Google Scholar.

Citation count is a good indicator for choosing which papers are “must-reads” and give the most relevant information.

The quality of the journal or conference

Although the citation count is a good indicator, it works poorly for evaluating the importance of very recent research. Recent publications have not had a chance to accumulate citations yet, and therefore seem less relevant. In addition, sometimes there are no highly cited publications at all, because the research area that the user is interested in is very specific and not much researched.

In those cases the user should look at the quality of the journal or conference that has published the research. For most journals, it is possible to find out the impact factor. It is a value computed from the number of citations that the papers in the journal gather on average over time. Clarivate Analytics’ JCR (Journal Citation Reports) is the most often used impact factor service. It was earlier provided by Thomson Reuters. JCR is not publicly accessible: one needs to access it through a university library. Aalto University users can click here to access JCR.
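
As a rough illustration of the arithmetic, the widely used two-year impact factor for a given year divides the citations received that year by papers from the two preceding years by the number of citable papers the journal published in those two years. The sketch below assumes this two-year definition; the function name and the numbers are made up for the example and do not describe any real journal.

```python
def impact_factor(citations_to_recent_papers, recent_paper_count):
    """Two-year impact factor sketch.

    citations_to_recent_papers: citations received this year by papers
        that the journal published in the two preceding years.
    recent_paper_count: number of citable papers the journal published
        in those two preceding years.
    """
    if recent_paper_count <= 0:
        raise ValueError("the journal must have published at least one paper")
    return citations_to_recent_papers / recent_paper_count


# Hypothetical journal: 120 papers over two years, cited 300 times this year.
print(impact_factor(300, 120))
```

An impact factor of 2.5 would thus mean that a recent paper in the journal is cited two and a half times a year on average.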

The impact factors of journals range from 0 to several dozen. For example, at the top, the impact factors for New England Journal of Medicine, Lancet, Nature and Science are currently 75, 60, 43 and 42, respectively. The problem with impact factors is that in other fields the best journals may have much lower impact factors. In HCI, Human-Computer Interaction has the highest impact factor, which is currently 4.2. In design research, Design Studies is the leading journal, and its impact factor is 2.8. Design Issues – another good one – is not listed at all, surprisingly. These differences do not mean that design or HCI journals are of poorer quality than natural science journals – fields cannot be compared based on their journals’ impact factors. Many things affect the impact factor, including the publication volume in the field, peer competition, the status of conferences or books as reputable publishing outlets, and the concentration of the field around only a handful of journals.

All this just means that impact factors are meaningful only if one already knows what the range of values is in a given field. In addition, Clarivate Analytics’ JCR does not provide impact factors for conferences, which makes it much less relevant to HCI.

A better approach, at least in Finland, is to use Finland’s own academic ranking system called “JUFO” – short for “Julkaisufoorumi”. It ranks every journal and conference using 4 levels:

3 = the top journal/conference in its own field
2 = a really good journal or conference
1 = other journals and conferences that have a peer review process
0 = journals and conferences that are known to exist but which cannot prove that they follow sufficient academic review standards

Generally speaking, any paper published in a journal or conference of level 2 or 3 has content that can be taken seriously. Many level 1 outlets are also really good, but there the quality varies a lot. Level 0 conferences and journals should not be used as references. JUFO can be accessed here.

Summary

When you search for literature, you can use the following process:

Try different kinds of search terms in Google Scholar. Often you do not manage to find the best search terms on the first attempt.
When you seem to be getting promising results, look at the titles and the citation counts: they tell which papers are 1) the best matches for your interests and 2) the most valued by other researchers.
If all citation counts are low, look at the outlets: which conferences and journals have published these works? Prioritise ones whose JUFO ratings are 2 or 3. Consider also ones that have a JUFO 1 rating.
Download every promising paper to your computer.
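
The screening logic in the steps above can also be written down as a small sketch. Everything in it is an assumption made for illustration: the `Paper` record, the example entries, and the threshold of 100 citations for “well cited” are invented, and the JUFO levels would in practice be looked up by hand.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    citations: int
    jufo_level: int  # 0-3, looked up manually from JUFO


def shortlist(papers, keywords, min_citations=100):
    """Return the papers worth downloading, best candidates first."""
    # Keep only papers whose title matches at least one search keyword.
    matching = [
        p for p in papers
        if any(k.lower() in p.title.lower() for k in keywords)
    ]
    # Prefer papers that other researchers clearly value.
    well_cited = [p for p in matching if p.citations >= min_citations]
    if well_cited:
        return sorted(well_cited, key=lambda p: p.citations, reverse=True)
    # If all citation counts are low, fall back to the quality of the outlet:
    # JUFO 3 and 2 first, then JUFO 1; level 0 outlets are left out.
    by_outlet = [p for p in matching if p.jufo_level >= 1]
    return sorted(
        by_outlet, key=lambda p: (p.jufo_level, p.citations), reverse=True
    )


# Invented example entries, not real search results.
EXAMPLE_PAPERS = [
    Paper("Framing in design teams", 850, 2),
    Paper("Framing practices of novice designers", 12, 3),
    Paper("Soil chemistry basics", 4000, 3),
    Paper("A note on framing", 3, 0),
]

for paper in shortlist(EXAMPLE_PAPERS, ["framing"]):
    print(paper.title, paper.citations)
```

A fixed citation threshold is a crude stand-in for judgment: what counts as “highly cited” depends on the field and on the age of the paper, so the value should be adjusted per search.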

4. How can these texts be downloaded?

The last step above involves a challenge that is covered here only briefly: the vast majority of academic papers are not freely available. Instead, they are available from publishers who sell them to universities for a subscription fee. To see, download and read them, one needs to use a university’s authenticated Internet connection.

I have written about this separately too, but my quick advice is to 1) look for links that Google Scholar marks with [PDF] – those are freely accessible papers; 2) tunnel your internet traffic through a university’s VPN service. That makes the publishers open most of the doors for you. Then you can find links with “sfx” in them – they point to content that your university subscribes to; 3) use your university’s article search service: copy the paper title and paste it into the university’s search engine. If the paper can be accessed, you get a link where you can download the paper. Here is the link to Aalto University’s search interface.

See how the same search result page as above changes when I use a VPN:

Screenshot of Google Scholar's results when using a VPN connection

Also read my other blog post to find out how to access and download articles when they are not openly accessible.
