Bangor et al. (2008) An Empirical Evaluation of the System Usability Scale


Bangor et al. provide a good description of the System Usability Scale (SUS). Their article is based on almost ten years' worth of SUS data (more than 200 SUS studies). SUS is an interesting tool: it provides a reference score for participants' subjective view of a product's or service's usability. SUS consists of 10 statements that are analyzed as a whole. The scores for individual statements are not necessarily meaningful, yet the score of the whole SUS questionnaire reflects the participants' view of the product's usability. According to Bangor et al., SUS is not biased against certain types of user interfaces or against either gender.
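The way the ten statements combine into a single score is easy to sketch. The snippet below implements the standard published SUS scoring (odd items contribute response minus 1, even items contribute 5 minus response, and the 0–40 sum is scaled by 2.5); note that Bangor et al. use a slightly modified questionnaire, so treat this as an illustration of the general scoring scheme rather than their exact procedure:

```python
def sus_score(responses):
    """Compute a SUS score (0-100) from ten 1-5 Likert responses.

    Standard SUS scoring: odd-numbered items score (response - 1),
    even-numbered items score (5 - response); the sum of these
    contributions (0-40) is multiplied by 2.5.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expected ten responses on a 1-5 scale")
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))  # i == 0 is item 1 (odd)
    return total * 2.5

# Strong agreement on odd items and strong disagreement on even items
# is the most positive possible response pattern:
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
```

Because odd and even items alternate polarity, a respondent answering 3 ("neutral") to everything lands exactly at 50.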

Bangor et al. identify six major usages for SUS:

  1. Providing a point estimate measure of usability and customer satisfaction
  2. Comparing different tasks within the same interface
  3. Comparing iterative versions of the same system
  4. Comparing competing implementations of a system
  5. Competitive assessment of comparable user interfaces
  6. Comparing different interface technologies

Bangor, Aaron, Kortum, Philip T. and Miller, James T. (2008) 'An Empirical Evaluation of the System Usability Scale', International Journal of Human-Computer Interaction, 24(6), 574–594.

Posted by Petri

This entry was posted in Journal article.

3 Responses to Bangor et al. (2008) An Empirical Evaluation of the System Usability Scale

  1. Pingback: UI Re-evolution - User Interface Inspection Service Results Exchange Meeting | Michael Hudson

  2. GGHF says:

    We used SUS in our IDE project to get some kind of quantitative data identifying the effects of the improvements we made to the system. I think the most interesting part of the Bangor et al. article was the meaning of different scores. Even though the improvement was less than 20%, with the framework provided by this study we were able to give meaning to the improved score. This also helped us communicate that meaning to the customer. Even with the arguable payback and correctness of SUS, our experience with it was positive as a whole. At the least it provides some direction even when you are unable to run tests with other measurable parameters, such as error rate.


  3. Joakim says:

    Bangor, Kortum and Miller have written an article on the System Usability Scale (SUS) that is well worth reading for anyone using or intending to use SUS in usability evaluations. SUS measures perceived usability by asking users to rate ten statements about the usability of a product on a scale of 1 to 5, generating a combined score of 0–100. Using their slightly modified version of SUS, the authors analyze data from more than 2300 surveys in over 200 studies. They notice that SUS scores on a per-study basis are mostly limited to the range of 50–100, with only 6% of scores below 50. The study means for the first, second, third and fourth quartiles are 62, 71, 79 and 94, respectively.
    The authors also analyze the means and standard deviations of scores for the individual SUS statements (adjusted so that a higher score always means a more positive rating). They notice that some statements, especially the statements written in negative form, tend to get lower scores. This suggests the possibility of some sort of bias. However, all ten statements are very highly correlated, and factor analysis generates only one significant factor. This indicates that it is useless to look at the scores of SUS statements individually, even though stakeholders often request that.
    Bangor et al. analyze how SUS scores are affected by user interface type. The mean SUS scores in their data are highest for desktop GUIs (75), lower for interactive voice response systems (74) and customer premise equipment (72), still lower for web sites and applications (68) and cell phone equipment (67), and lowest for combined web and voice response interfaces (60). They also notice a significant but not very strong negative correlation of SUS scores with age (i.e. older people tend to give lower scores), but no correlation with gender.
    Finally, the authors discuss the issue of assigning adjective labels to SUS scores so that they are easier for stakeholders to comprehend. Based on SUS surveys where participants rated the overall user-friendliness with one of seven labels (in addition to the standard SUS statements), as well as the quartile scores presented earlier, Bangor et al. suggest a scale where scores below 50 are unacceptable, scores above 70 are acceptable, and scores between 50 and 70 are marginal, which means that the need for improvements should be judged against the usability goals of the product.
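    The acceptability bands described above reduce to a small lookup. The helper below is a sketch using the cut-offs quoted in this summary (below 50, 50–70, above 70); the paper attaches finer-grained adjective labels within these ranges, which are not reproduced here:

    ```python
    def acceptability(score):
        """Map a 0-100 SUS score to the coarse acceptability bands
        suggested by Bangor et al. (cut-offs as quoted in the summary
        above; the paper subdivides these bands further)."""
        if score < 50:
            return "not acceptable"
        if score <= 70:
            return "marginal"
        return "acceptable"

    print(acceptability(68))  # "marginal"
    ```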
    Even though designers and developers seldom benefit much from a single metric, representing usability as a single number that can be compared among products or over time is often critically important for managers and other stakeholders who don't care about, or have time for, the subtleties of usability. In a recent project I participated in, the product manager especially thanked us for our SUS and success rate metrics, and mentioned that she would try to use these to influence upper management. Although there are several good alternatives to SUS, it is useful for the sake of comparison to have a single widely used metric, and based on the article by Bangor et al., SUS appears to be suitable for this purpose.
