An Actionable And Reliable Usability Questionnaire With Only 7 Items: Inuit

By Maximilian Speicher & Johanna Jagow

Inuit (short for “Interface Usability Instrument”) is a new questionnaire you can use to assess the usability of your user interface. It has been designed to be more diagnostic than existing usability instruments like, e.g., SUS, and for use with machine learning, all the while asking fewer questions than other questionnaires. This article explores how and why Inuit has been developed and why we can be sure that it actually measures usability, and reliably so.

A lot of contemporary usability evaluation relies on easily measurable and readily available metrics like conversion rates, task success rates, and time on task, even though it’s questionable how well these are suited for reliably capturing a concept as complex as usability in its entirety.

The same holds for user experience. When an instrument is used to measure usability, e.g., in controlled user studies or via live intercepts, it’s often the simple Single Ease Question (SEQ), which is generally not a bad choice but has its limits.

Note: For more information on usability evaluation, you can check the article “Current Practice in Measuring Usability: Challenges to Usability Studies and Research” by Kasper Hornbæk and “Growth Marketing Considered Harmful” by Maximilian Speicher.

Ultimately, when you intend to precisely and reliably measure the usability of a digital product, there’s no way around a scientifically well-founded instrument or, in everyday terms, a “questionnaire.” The most famous one is probably SUS, the System Usability Scale, but there are also some others around. Two examples are UMUX, the Usability Metric for User Experience, and SUMI, the Software Usability Measurement Inventory.

To join this party, in this article, we introduce Inuit (the Interface Usability Instrument), a new usability questionnaire. We will share how and why it was developed and how it’s different from the questionnaires mentioned above.

To immediately cut to the chase: With a scale from 1 (“completely disagree”) to 5 (“completely agree”), Inuit looks as follows. The parts in square brackets can be adapted to your specific interface, e.g., products in an online shop, articles on a news website, or results in a search engine.

Q1: I found [the information] I was looking for.
Q2: I could easily understand [the provided information].
Q3: I was confused using [the interface].
Q4: I was distracted by elements of [the interface].
Q5: Typography and layout added to readability.
Q6: There was too much information presented in too little space.
Q7: [My desired information] was easily reachable.

The Inuit metric (a score between 0 and 100, analogous to SUS) can then be calculated as follows:

(Q1 + Q2 + Q5 + Q7 – Q3 – Q4 – Q6 + 11) × 100/28

Why 11 and 28?

We have seven items rated on a scale from 1 to 5, but for some (Q1, Q2, Q5, Q7), 5 is the best rating, and for some (Q3, Q4, Q6), 1 is the best rating. Hence, we need to subtract the latter from 6 when we add up everything: Q1 + Q2 + Q5 + Q7 + (6 – Q3) + (6 – Q4) + (6 – Q6) = Q1 + Q2 + Q5 + Q7 – Q3 – Q4 – Q6 + 18. This gives us an overall score between 7 and 35.

Now, we want to normalize this to a score between 0 and 100. For this, we first subtract 7 for a score between 0 and 28: Q1 + Q2 + Q5 + Q7 – Q3 – Q4 – Q6 + 18 – 7 = Q1 + Q2 + Q5 + Q7 – Q3 – Q4 – Q6 + 11. Finally, for a score between 0 and 100, we need to divide everything by 28 and multiply by 100: (Q1 + Q2 + Q5 + Q7 – Q3 – Q4 – Q6 + 11) × 100/28.
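
If you prefer the arithmetic as code, here is a minimal sketch in Python (the function name and structure are our own illustration, not an official implementation):

    def inuit_score(q1, q2, q3, q4, q5, q6, q7):
        """Compute the Inuit score (0-100) from the seven item ratings (1-5).

        Q1, Q2, Q5, Q7 are positively worded (5 = best); Q3, Q4, Q6 are
        negatively worded (1 = best) and are therefore subtracted.
        """
        raw = q1 + q2 + q5 + q7 - q3 - q4 - q6 + 11  # raw score, 0..28
        return raw * 100 / 28                        # normalized to 0..100

    print(inuit_score(3, 3, 3, 3, 3, 3, 3))  # a neutral 3 everywhere yields 50.0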

You might have noticed that compared to, e.g., SUS with 10, Inuit consists of only 7 questions. Apart from that, it has two more advantages:

1. Inuit has been designed to provide training data for machine-learning models that can then automatically predict usability from user interactions or web analytics data.
2. Its items (i.e., the questions) are diagnostic, at least to a certain degree. This means you see what’s wrong with your interface simply by looking at the results from the questionnaire. Have a bad rating for readability (Q5)? You should make the text in your interface more readable.

Now, at this point, you can either accept all this and simply get going with Inuit to measure the usability of your digital product (we’d be delighted). Or, if you’re interested in the details, you’re very welcome to keep reading (we’d be even more delighted).

“So, Why Did You Develop Yet Another Usability Questionnaire?”

You probably already guessed that Inuit wasn’t developed just for fun or because there aren’t enough questionnaires around. But to answer this, we have to reach back a bit.

In 2014, Max was a Ph.D. student busy working on his dissertation. The goal of it all was to find a way to determine the usability of an interface automatically from users’ interactions, such as what they do with the mouse cursor and how they scroll, rather than making participants in a user study fill out pages and pages of questions. Additionally, the cherry on top should be to also automatically propose optimizations for the interface (e.g., if user interactions suggest the interface is not readable, make the text larger).

To be able to achieve this, however, it was first necessary to determine if and how well certain interactions (mouse cursor movements, mouse cursor speed, scrolling behavior, and so on) predict the usability — or rather its individual aspects — of an interface. This meant collecting training data through users’ interactions with an interface and their usability assessments of that interface. Then, one could investigate how well (combinations of) tracked interactions predict (aspects of) usability using regression and/or machine-learning models. So far, so good, as far as the theory is concerned.
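
To make the idea concrete, here is a minimal sketch of such a prediction model using scikit-learn. The feature set, the numbers, and the model choice are our own illustration, not the study’s actual pipeline:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical training data: one row per study participant.
    # Features: mean cursor speed (px/s), total scroll distance (px),
    # and number of cursor direction changes.
    X = np.array([
        [312.0, 5400.0, 41],
        [198.0, 2100.0, 17],
        [405.0, 8900.0, 63],
        [250.0, 3300.0, 25],
    ])
    # Targets: the same participants' readability ratings (Inuit Q5, 1-5).
    y = np.array([2, 4, 1, 3])

    model = LinearRegression().fit(X, y)
    print(model.predict(np.array([[280.0, 4000.0, 30]])))  # predicted Q5 rating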

In practice, one important decision that would have huge implications for the project was how to collect the usability assessments mentioned above when gathering the training data. Since usability is a latent variable, meaning it can’t be observed directly, a proper instrument (i.e., a questionnaire) is necessary to assess it. And the most famous one is undeniably the System Usability Scale (SUS). It should’ve been an obvious choice, shouldn’t it?

A closer look showed that, while SUS would be perfectly well suited to train statistical models to infer usability from interactions, it simply wasn’t the perfect fit. This was the case mainly for two reasons:


First, many questions contained in SUS (“I think that I would like to use this system frequently,” “I found the various functions in this system were well integrated,” and “I felt very confident using the system,” among others) describe the effects of good or bad usability — users feel confident because the system is well usable and so on. But they don’t describe the aspects of usability that cause them, e.g., bad understandability. This makes it difficult to know what should be done to make it better. What exactly should we change to make users feel more confident? The questions are not diagnostic or “actionable” and require further qualitative research to uncover the causes of bad ratings. It’s the same for UMUX and SUMI.

Second, with just 10 items, SUS is already a very small questionnaire. However, the fewer items, the less friction and the more motivated users are to actually answer. So, is ten really the minimum, or would a proper questionnaire with fewer items be possible?

With these considerations in mind, Max went on and ultimately developed Inuit, the instrument presented in the introduction. He ended up with seven items that were better suited for the needs of his Ph.D. project and more actionable than those of SUS.

“How Do You Know This Actually Measures Usability?”


Inuit was developed in a two-step process. The first step was a review of established guidelines and checklists with more than 250 rules for good usability, which were filtered based on the requirements above and resulted in a first draft for the new usability instrument. This draft was then discussed and refined in expert interviews with nine usability professionals.

The final draft of Inuit, with the seven factors informativeness (Q1), understandability (Q2), confusion (Q3), distraction (Q4), readability (Q5), information density (Q6), and reachability (Q7), was evaluated using a confirmatory factor analysis (CFA).


CFA is a method for assessing construct validity, which means it “is used to test whether measures of a construct are consistent with a researcher’s understanding of the nature of that construct” or “to test whether the data fit a hypothesized measurement model.”

— Wikipedia

Put very simply, by using a CFA, we can check how well a theory matches the practice. In our case, the “construct” or “hypothesized measurement model” (theory) was Inuit, and the data (practice) came from a user study with 81 participants in which four news websites were evaluated using an Inuit questionnaire.

In a CFA, there are various metrics that show how well a construct fits the data. Two well-established ones are CFI, the comparative fit index, and RMSEA, the root mean square error of approximation — both range from 0 to 1.

For CFI, 0.95 or higher is “accepted as an indicator of good fit” (Wikipedia). Inuit’s value was 0.971. For RMSEA, “values less than 0.05 are good, values between 0.05 and 0.08 are acceptable” (Kim et al.). Inuit’s value was 0.063. This means our theory matches the practice, or Inuit’s questions do indeed measure usability.
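
For readers who want to try this on their own data: a one-factor CFA like the one described can be run in a few lines with the semopy package in Python. This is only a sketch under assumptions — synthetic stand-in ratings, and semopy as our tool of choice (the original analysis may have used different software):

    import numpy as np
    import pandas as pd
    import semopy

    # Synthetic stand-in data: one latent "usability" trait driving all
    # seven item ratings (200 simulated respondents, ratings clipped to 1-5).
    rng = np.random.default_rng(0)
    latent = rng.normal(size=200)
    df = pd.DataFrame({
        f"q{i}": np.clip(np.round(3 + latent + rng.normal(scale=0.8, size=200)), 1, 5)
        for i in range(1, 8)
    })

    # Measurement model: all seven items load on a single usability factor.
    model = semopy.Model("usability =~ q1 + q2 + q3 + q4 + q5 + q6 + q7")
    model.fit(df)
    print(semopy.calc_stats(model)[["CFI", "RMSEA"]])  # compare to the thresholds above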

Case Study #1


Inuit was first put into practice in 2014 at Unister GmbH, which at that time ran travel search engines like fluege.de and reisen.de, and was developing an entirely new semantic search engine. The results page of this search engine, named BlueKiwi, was evaluated in a user study with 81 participants using Inuit.

In this first study, the overall score averaged across all participants was 59.9. Ratings were especially bad for informativeness (Q1), information density (Q6), and reachability (Q7). Based on these results, BlueKiwi’s search results page was redesigned.

Among other things, the number of advertisements was reduced (better reachability), search results were displayed more concisely (better informativeness), and everything was more clearly aligned and separated (better information density). See the figure below for the full list of changes.

Adjustments made to the search results page based on the Inuit results. (Image source: www.researchgate.net)

After the redesign, we ran another study, in which the overall Inuit score increased to 67.5 (+11%), with improvements in every single one of the seven items.



“Why Wait 9 Years To Write This Article?”

There were various factors at play. One was what’s called the research–practice gap. It’s often difficult for academic work to gain traction outside the academic community. One reason for this is that work that is part of a Ph.D. project is often a little neglected after it has served its purpose — being published in a research paper, included in a thesis, and presented at a Ph.D. defense — which is pretty much exactly what happened to Inuit.

Case Study #2

Another factor, however, was that we wanted to put the instrument into practice in a real-world industry setting over a longer period of time first, and we got the chance to do that only relatively recently.

We conducted a longitudinal study over a period of almost two years, running quarterly benchmarks of multiple e-commerce websites using both SUS and Inuit, with a total of 6,368 users. The results of these benchmarks were included in the dashboard of product KPIs and regularly shared with the team of six product managers. After roughly two years of conducting and sharing benchmarks, we interviewed the product managers about their use of the data, challenges, wishes, and potential for improvement.

What a high-level analysis showed was that all of the product managers, in one way or another, described Inuit as more intuitive to understand, less abstract, and more actionable compared to SUS when looking at both instruments as a whole.

They found most of Inuit’s items more specific and easier to interpret and, therefore, more relevant from a product manager’s perspective. SUS, in contrast, was described as, e.g., “good for [the] overall score” and the bird’s eye view. Virtually all product managers, however, wished for even more specific insights into where exactly on the website usability problems occur. One suggested building an optimal instrument by combining certain items from both SUS and Inuit.

As part of the analysis, we computed Cronbach’s α for Inuit (based on 3,190 answers) as well as SUS (based on 3,178 answers).


Cronbach’s α is a statistical measure for the internal consistency of an instrument, which can be interpreted as “the extent to which all of the items of a test measure the same latent variable [i.e., usability].”

— Wikipedia

Values of 0.7 or above are generally deemed acceptable. Inuit reached a value of 0.7; SUS a value of 0.8.
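
The formula behind Cronbach’s α is simple enough to compute by hand: α = k/(k − 1) × (1 − Σ item variances / variance of total scores). Here is a minimal sketch in Python (our own illustration, with randomly generated ratings standing in for real answers):

    import numpy as np

    def cronbach_alpha(items):
        """items: 2-D array, one row per respondent, one column per item.

        Negatively worded items (Q3, Q4, Q6 in Inuit) should be reverse-coded
        (6 - rating) beforehand so that all columns point the same way.
        """
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1).sum()
        total_variance = items.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_variances / total_variance)

    rng = np.random.default_rng(42)
    answers = rng.integers(1, 6, size=(100, 7))  # 100 hypothetical respondents;
    print(cronbach_alpha(answers))               # independent answers, so α will be near 0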

To top things off, Inuit and SUS showed a considerable (Pearson’s r = 0.53) and highly significant (p < 0.001) correlation when looking at overall scores aggregated over the different e-commerce websites and tasks the study participants had to complete.

In layman’s terms: When the SUS score goes up, the Inuit score goes up; when the SUS score goes down, the Inuit score goes down. Both questionnaires measure the same thing (with a very, very rough approximation of Inuit = 0.6 × SUS + 17).
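
If you want to derive such a correlation and linear approximation from your own benchmark data, NumPy does the job in a few lines (the scores below are made up for illustration):

    import numpy as np

    sus = np.array([55.0, 62.5, 70.0, 48.0, 81.0])    # hypothetical SUS scores
    inuit = np.array([50.0, 55.0, 60.5, 45.0, 66.0])  # hypothetical Inuit scores

    r = np.corrcoef(sus, inuit)[0, 1]             # Pearson's r
    slope, intercept = np.polyfit(sus, inuit, 1)  # least-squares line
    print(f"r = {r:.2f}, Inuit = {slope:.2f} x SUS + {intercept:.1f}")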

Since these first results were so encouraging, we decided to write this general, more practice-oriented overview article about Inuit now. A deeper analysis of our big dataset, however, is yet to be conducted, and our current plan is to report findings in much more detail separately.

“Why Do You Think Inuit Is Better Than SUS?”

We don’t think so (or that it’s better than any scientifically founded usability instrument, for that matter). There are many ways to measure the same latent variable, in this case, usability. Both questionnaires, SUS and Inuit, have proven that they can measure the usability of an interface. Still, they were developed in different contexts and with different goals and requirements in mind.

So, to address the question of when it’s better to use which, as true researchers, we have to say “it depends” (annoying, isn’t it?).


SUS, which has been around since the 1990s, is probably the most popular and well-established usability instrument. It’s been studied and validated over and over, which Inuit, of course, can’t compete with yet and still has a long way to go. If the goal is to compare scores at a high level and even tap into public benchmark numbers for orientation, SUS would be preferable.

However, by design, Inuit has two advantages over SUS:

1. Inuit has only seven items and is still a “complete” usability instrument. 30% fewer questions can be a major factor when it comes to motivating users to fill out a questionnaire. Assuming that a big part of remote online studies is done quickly in passing and with short attention spans, designing efficient studies that generate reliable output and minimize effects like participant fatigue can be a major challenge for researchers.
2. Inuit’s items have been specifically designed to be more actionable for practitioners and lend themselves better to manual analysis and inferring potential interface optimizations. As we’ve learned in our second case study, talking to actual product managers revealed that, for them, the results of a usability assessment should always be as specific as possible. Comparing the items of both, Inuit points to more concrete areas to improve than SUS, which was perceived as rather vague.



“Where Can I Use Inuit?”

Generally, in any scenario that involves an interface and a task — either defined by you or the user themselves. In the studies mentioned and described above, we could demonstrate that Inuit works well in controlled as well as natural-use settings and with news websites, search engines, and e-commerce shops.

Now, of course, we can’t evaluate Inuit with every possible kind of interface, and that is part of the reason for this article. Inuit has been around and publicly available since 2014, and we have no idea if and how it has been used by other researchers. If you have used it, please let us know — we’d be thrilled to hear about your experience and results.

The questions presented at the beginning of the article are relatively focused on finding information because that’s where Inuit historically comes from and because most of the things users do involve finding information of some kind. (Please keep in mind that information doesn’t have to be text. On the contrary, most information is non-textual.) But those questions can be adapted as long as they still reflect the underlying aspects of usability, which are informativeness, understandability, confusion, distraction, readability, information density, and reachability.

Say, for instance, you want to evaluate a module from an e-learning course, e.g., in the form of an annotated video with a subsequent quiz. To accommodate the task at hand, Q1 could be rephrased to “I had all the information necessary to complete the module” and Q7 to “All the information necessary to complete the module was easily reachable.”

Conclusion

There are plenty of usability questionnaires out there, and we have added a new one to the pool — Inuit. Why? Because sometimes, you find yourself in a situation where none of the existing questionnaires is the perfect fit. Inuit has been designed to be more diagnostic than existing usability instruments like, e.g., SUS, and for use with machine learning, all the while asking fewer questions than other questionnaires. So, if any of this seems relevant to your use cases or context of work, why not give it a try?

From a scientific and statistical point of view, in a confirmatory factor analysis (CFA), Inuit has demonstrated that its questions do indeed measure usability. On top of that, it’s internally consistent and correlates well with SUS, based on data from a large-scale, longitudinal user study.

Note: If you want to dive deeper into the science behind Inuit, e.g., how exactly the items/questions were chosen, you can read the corresponding research paper “Inuit: The Interface Usability Instrument,” which was presented at the 2015 HCI International Conference. If you want to learn more about how Inuit can be used to train machine-learning models, read “Ensuring Web Interface Quality through Usability-Based Split Testing.” And finally, if you want to see how Inuit can be used as the basis for a tool that automatically proposes optimizations for an interface, you can refer to “S.O.S.: Does Your Search Engine Results Page (SERP) Need Help?”, which was presented at the 2015 ACM Conference on Human Factors in Computing Systems.


Copyright © 2023 by Maximilian Speicher & Johanna Jagow. Originally published by Smashing Magazine.