
A. Measuring the Quality of Interaction

A.1. Replies

Thus, you certainly have to ask what a subject expects from an application, e.g. should it be interesting, helpful, entertaining, sympathetic, powerful, secure, foolproof, instructive, open source, have a reliable future, etc.?

A possible procedure:

1. Show all the functionality without performing it (or omit this step).

2. Let the subjects describe what they could do; they should propose what they expect from such a system.

3. Let the subjects act.

4. Ask the subjects how their plan corresponds to the realisation.

The quality is then not determined by the agreement between steps 2 and 4, but by something different, more in relation to the questions above. Is there potential for getting new ideas?

Does the subject learn something? Did he know something from somewhere else? Can the application be used somewhere else? Is it fun to be irritated? Is it awkward to have fun?

Is it patronising? Is it fun to trick the system, or does that break something? Or, more generally, is the misuse of functionality interesting?

Participant 5

“Now you want to know whether and how users cope with the thing,”

If you have “real” users, assemble a task and let them (inexperienced users) solve it.

I.e., sticking to the example of the Photoshop plugin:

• open an image

• add a layer

• apply plugin xyz to it (I don’t have a clue about graphics software)

Then, ask questions like

• How easy were the individual steps for you, and why?

• Was it difficult to work out the meaning of the adjustments from the GUI?

• etc.

A different approach: instead of evaluating the questions afterwards, you could sit next to the users and ask them to think aloud, i.e. as soon as something is unclear or they find something good or bad, they should say so; you do not respond, but only note it down.

“so to speak, how well this kind of interaction works.”

With the right questions (see above) this is also possible.

Participant 6

I would test in two different ways. First, I would vary the interaction structure. Second, I would vary the tasks to be solved. Ask people which interaction they prefer for the individual tasks.

But first of all, just look at people interacting with the system and learn something in the sense of grounded theory.
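
A minimal sketch of how such per-task preferences could be tallied; the interaction variants, task names, and votes below are purely illustrative, not data from the study:

```python
from collections import Counter

# Hypothetical raw answers: for each task, every participant names the
# interaction variant ("A" or "B") they preferred for that task.
preferences = {
    "open image":   ["A", "A", "B", "A", "B"],
    "add layer":    ["B", "B", "B", "A", "B"],
    "apply plugin": ["A", "B", "A", "A", "A"],
}

for task, votes in preferences.items():
    counts = Counter(votes)
    favourite, n = counts.most_common(1)[0]
    print(f"{task:13s}: {dict(counts)} -> most preferred: {favourite} ({n}/{len(votes)})")
```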

Participant 7

Give a group of participants a defined set of tasks and observe how they perform. Ideally, one has two GUIs or similar, which allows comparing the performance and the questionnaire analysis. Otherwise, one has to find a way to evaluate the participants’ approaches and to see whether there are parts where it fails more often than necessary.

Participant 8

I would give the user a task, e.g. “Create a new picture of size 800×600 px”. Then measure the number of clicks and the time needed. In addition, the user may give a rating from 1 to 6 regarding the menu navigation.
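
A minimal sketch of how such click/time measurements and the 1 to 6 rating could be recorded and summarised; the user names and numbers are invented for illustration, not measured values:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class TaskRun:
    """One user's attempt at a task, e.g. creating a new 800x600 px picture."""
    user: str
    clicks: int     # number of clicks counted by an observer or a logger
    seconds: float  # time from task start to completion
    rating: int     # user's rating of the menu navigation, 1 (best) to 6 (worst)

# Hypothetical observations; real values would come from the test sessions.
runs = [
    TaskRun("u1", clicks=7,  seconds=42.0, rating=2),
    TaskRun("u2", clicks=12, seconds=75.5, rating=4),
    TaskRun("u3", clicks=6,  seconds=38.2, rating=2),
]

print("mean clicks:", mean(r.clicks for r in runs))
print("mean time  :", round(mean(r.seconds for r in runs), 1), "s")
print("mean rating:", round(mean(r.rating for r in runs), 1))
```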

Participant 9

Generally, I see it like Joel Spolsky in “User Interface Design for Programmers”:

The next step is to test your theories. Build a model or prototype of your user interface and give some people tasks to accomplish. The model can be extremely simple: sometimes it’s enough to draw a sloppy picture of the user interface on a piece of paper and walk around the office asking people how they would accomplish x with the “program” you drew.

As they work through the tasks, ask them what they think is happening. Your goal is to figure out what they expect. If the task is to “insert a picture,” and you see that they are trying to drag the picture into your program, you’ll realize that you had better support drag and drop. If they go to the Insert menu, you’ll realize that you had better have a Picture choice in the Insert menu. If they go to the Font toolbar and replace the word “Times New Roman” with the words “Insert Picture”, you’ve found one of those old relics who hasn’t been introduced to GUIs yet and is expecting a command-line interface.

How many users do you need to test your interface on? The scientific approach seems like it would be “the more, the better.” If testing on five users is good, testing on twenty users is better! But that approach is flat-out wrong. Almost everybody who does usability testing for a living agrees that five or six users is all you need. After that, you start seeing the same results again and again, and any additional users are just a waste of time. The reason being that you don’t particularly care about the exact numerical statistics of failure. You simply want to discover what “most people” think.

You don’t need a formal usability lab, and you don’t really need to bring in users “off the street”–you can do “fifty-cent usability tests” where you simply grab the next person you see and ask them to try a quick usability test. Make sure you don’t spill the beans and tell them how to do things. Ask them to think out loud and interview them using open questions to try to discover their mental model.
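
A brief numerical aside on the “five or six users” claim: the commonly cited problem-discovery model of Nielsen and Landauer estimates the share of usability problems found with n test users as 1 - (1 - λ)^n, with λ ≈ 0.31 per user taken from their literature. A minimal sketch (the λ value is that literature estimate, not something measured here):

```python
def problems_found(n: int, lam: float = 0.31) -> float:
    """Expected share of usability problems discovered with n test users
    under the Nielsen/Landauer problem-discovery model."""
    return 1 - (1 - lam) ** n

for n in (1, 3, 5, 10, 20):
    print(f"{n:2d} users -> about {problems_found(n):.0%} of the problems found")
# Around five users already uncover roughly 84% of the problems, which is
# why additional users mostly reproduce findings that are already known.
```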

If you like it a bit more formal, I can think of the following methods with users:

Benchmark tests: the user gets a task and is evaluated for efficiency/performance and asked about comfort/transparency. There are different phases where investigators may intervene; this needs some time to prepare and is rather formal. In addition, users tend to feel like guinea pigs.

Thinking Aloud: the user says what he currently wants to do and what he thinks when he sees the GUI. Fast and cheap, and the output is excellent, but it highly depends on the users: some talk too much, others do not say a word without being explicitly asked.

Constructive Interaction: like thinking aloud, but with two participants. Also very interesting, but it leads to unusable findings if there is a bad user pair (e.g. one participant is intimidated by the other, or one of them always asks the investigator).

Even without users, it is possible to test GUIs with certain heuristics (and/or with use cases).

The Information Center for Social Science has an interesting report on evaluating software:

Marcus Hegner, Methoden zur Evaluation von Software, IZ-Arbeitsbericht Nr. 29, May 2003.

Participant 10

1. User experience: let people play with it and ask them; let them rate criteria on a 1-10 scale:

• how convenient is it to use?

• how fast can you handle things with it?

• is it fun to use?

• possibly ask for an open comparison or the like

2. Comparison with alternative approaches: assign a task to be done with one’s own software (A) or with an alternative one (B) and measure the time.

3. Then, ask again how convenient/efficient the users found the work with A and B, respectively.

4. Which measured criteria correlate how well with the answers? (Is ‘experienced’ efficiency really fast, etc.? A small sketch follows below.)
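
A minimal sketch of the correlation check in the last item, assuming one has, for the same users, both the measured completion times and their 1 to 10 efficiency ratings; all values here are invented:

```python
from statistics import correlation  # Pearson's r, available since Python 3.10

# Hypothetical paired data per user: measured completion time with software A
# and the same user's perceived-efficiency rating (1 = poor, 10 = excellent).
times_a   = [55.0, 72.0, 40.0, 90.0, 63.0]
ratings_a = [7, 5, 9, 3, 6]

# A strongly negative coefficient means that users who were actually faster
# also *felt* more efficient, i.e. the answers track the measured criterion.
print("time vs. perceived efficiency:", round(correlation(times_a, ratings_a), 2))
```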

Participant 11

There are various factors:

Learnability: how fast does the user learn the interaction, how does the reaction time change over the time of usage, with what frequency do errors occur (wrong usage, difference between intended and obtained usage) (-> error rate, reaction rate).

Cognitive Load: how much cognitive control does the interaction need (to be measured by performance dips in parallel secondary tasks)? Related to the potential to become automated.

Memory Intensity: how much memory is needed to perform an action (RAM). E.g. Emacs requires learning many shortcuts before one can work with it in an intuitive way.

The more memory is consumed, the slower the learning curve, though perhaps with better performance...

Flow-Experience: interactions that cause a flow of action such that the agent dissolves in the process; to be captured by flow questionnaires after Reichberg, at spontaneous interruptions of the interactants.

Stress test: the more exhausting an interaction is, the more stress it will cause; this can be captured by physiological measurements, from cutaneous electrical resistance to pulse rate, etc., or by the impact of psychological distractors.

Tiredness: as above, another aspect of the strain caused by an interaction.

Fun/Motivation: motivating interactions should receive a subjectively more positive rating from the users; possibly to be measured by mapping associations (e.g. inkblots as good/bad animals).

Latency/Dead-times (connected with Flow): if there are dead times in interaction processes (e.g. pauses between action and reaction), these may subjectively break the flow; this will affect flow and motivation (estimate: the more latency, the more unpleasant the quality).

Modality allocations / Naturalness: interactions in real contexts always address a mixture of various modalities in a harmonious balance; a concentration on only one modality that diverges strongly from “natural examples” could indicate bad interaction quality.

[...]

Most of these aspects can be operationalised into comparison experiments between two interactions (a small sketch follows below). For an absolute interaction-quality scale, I cannot see an all-too-easy definition.
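
A minimal sketch of such a comparison experiment, assuming one records a per-subject score (e.g. completion time in seconds) under both interaction variants and compares the paired differences; the data and the use of a simple paired t statistic are illustrative assumptions, not part of the reply above:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical within-subject measurements: each subject performed the same
# task once with interaction A and once with interaction B (order balanced).
seconds_a = [48.0, 61.0, 55.0, 70.0, 52.0, 66.0]
seconds_b = [44.0, 50.0, 49.0, 63.0, 47.0, 58.0]

diffs = [a - b for a, b in zip(seconds_a, seconds_b)]
d_mean, d_sd = mean(diffs), stdev(diffs)
# Paired t statistic; compare against a t table with len(diffs) - 1 degrees
# of freedom to judge whether B is reliably faster than A.
t = d_mean / (d_sd / sqrt(len(diffs)))
print(f"mean difference A-B: {d_mean:.1f} s, paired t = {t:.2f}")
```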

Participant 12

Qualitative evaluation (though I haven’t got a clue about it), or something like:

1. reaction test

2. let them work 8 hours with the tool

3. reaction test

Thesis: the higher the cognitive load, the more tired the subjects.
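
A minimal sketch of how the before/after reaction-test comparison could be evaluated, simply contrasting each subject's mean reaction time from step 1 with step 3; all values are invented for illustration:

```python
from statistics import mean

# Hypothetical mean reaction times in milliseconds per subject, measured
# before (step 1) and after (step 3) the eight hours of work with the tool.
before = {"s1": 310, "s2": 295, "s3": 330, "s4": 305}
after  = {"s1": 355, "s2": 340, "s3": 360, "s4": 350}

slowdown = {s: after[s] - before[s] for s in before}
print("per-subject slowdown (ms):", slowdown)
print("mean slowdown (ms):", mean(slowdown.values()))
# Under the thesis above, a tool with higher cognitive load should produce
# a larger mean slowdown than a less demanding one.
```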

A.2. Generated Categories

scenario>workshop

scenario>task

method>qualitative

method>ethnomethodology

method>ethnomethodologic evaluation

method>grounded theory

method>User experience

method>let describe

method>let act

method>Play and Ask

method>ask for correspondence between action and imagination

method>ask for anticipations regarding the application

method>quantitative

method>thinking aloud

method>constructive interaction

method>test for cognitive load with a stress test