6 Preliminary Scenarios

This chapter describes the preliminary scenarios that were not yet implemented based on the suggested approach, but with the previously used dialog system. Nevertheless, they provided plenty of use cases that assisted in gaining knowledge of the robotics domain and of its specific challenges. Of course, the scenarios described in this chapter, and their underlying concepts, were not realized by myself alone, but in collaboration with colleagues: The Home-Tour scenario was developed in collaboration with Marc Hanheide, Frederic Siepmann, Elin Topp and Torsten Spexard, and the Curious Robot and the CeBit scenario were developed in collaboration with Christof Elbrechter, Robert Haschke, Ingo Lütkebohle and Lars Schillingmann.

Although the scenarios and the hardware platforms they are running on are very different, three overarching themes can be identified: First, all scenarios deal with learning through interaction. Second, the scenarios have in common that they rely on a mixed-initiative dialog strategy. In particular, two facets of the robot's task initiative have been explored: how it facilitates learning, and how it facilitates the interaction as such. Third, from a technical point of view, they all rely on the Task State Protocol for communication between the dialog system and the back-end.

6.1 The Home-Tour: Jointly Building Up a Model of the Environment

Figure 6.1: A scene from the Home-Tour scenario: The robot is taking the initiative in order to verify its (incorrect) hypothesis, which leads the human to correct it. This comic was created by Florian Lier.

Earlier iterations of the Home-Tour relied a lot on the human's initiative in a rather command-style fashion [LKF+04, LHW+05]. In particular, the human needed to show the robot around, explicitly demonstrating rooms and objects one after another. Thus, the human fully controlled what the robot learned. As the scenario targets inexperienced users who have only very limited knowledge about the system and its internal model, the model they intuitively teach on their own initiative may be sparse and possibly erroneous.

Consequently, the main idea of the new iteration of the Home-Tour is to facilitate learning by mixed-initiative interaction. In particular, it focuses on the question of how the learning process can be improved by equipping the robot with abilities that enable active learning, so that the robot is capable of actively provoking a situation that provides new information instead of passively waiting until such a situation presents itself. This obviously has the potential to speed up and optimize the learning process.

Accordingly, on the one hand, the user can still take the active role and teach the robot rooms explicitly, or check the learning process by asking monitoring questions. The more crucial aspect, however, is that the robot is able to take initiative on its own and to advance the learning process by verifying existing information and resolving uncertainty. This makes learning an interactive process in the course of which information can continuously be refined. A typical scene from the Home-Tour in which an incorrect hypothesis is corrected is shown in figure 6.1. Table 6.1 lists example dialogs for each of the mentioned use cases.


Initiative: Human | Situation: Room teaching
    H: This is the kitchen!
    R: Kitchen. I will have a look at it.
    R: (looking)
    R: OK!

Initiative: Human | Situation: Test questions
    H: Where are we?
    R: We are in the kitchen.

Initiative: Robot | Situation: Verifying hypothesis
    R: We just entered the hallway, right?
    a) Human agrees:
        H: Yes.
        R: OK!
    b) Human corrects:
        H: No, this is the living room.
        R: Living room. OK!
    c) Human disagrees without providing a correction:
        H: No!
        R: What room is it?
        H: The living room.
        R: Living room. OK.

Initiative: Robot | Situation: Resolving uncertainty
    R: Are we still in the living room?
    a) Human agrees:
        H: Yes.
        R: OK.
    b) Human corrects:
        H: No, this is the kitchen.
        R: Kitchen. OK!
    c) Human disagrees without providing a correction:
        H: No!
        R: What room is it?
        H: The kitchen.
        R: Kitchen. OK.

Table 6.1: Example dialogs in the Home-Tour.
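The robot-initiative use cases in table 6.1 can be summarized as a simple decision rule: verify the hypothesis when it changes, resolve uncertainty when confidence drops. The following sketch illustrates this logic; all names, the confidence threshold, and the question phrasings are illustrative assumptions, not the actual BIRON implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RoomHypothesis:
    label: Optional[str]   # e.g. "kitchen"; None if the room is unknown
    confidence: float      # 0.0 .. 1.0, as published by the mapping component

def robot_initiative(prev: RoomHypothesis, curr: RoomHypothesis,
                     threshold: float = 0.5) -> Optional[str]:
    """Return a clarification question, or None if no initiative is needed."""
    if curr.label is None:
        return "What room is it?"
    if prev.label is not None and curr.label != prev.label:
        # hypothesis changed: verify the assumed room transition
        return f"We just entered the {curr.label}, right?"
    if curr.confidence < threshold:
        # hypothesis became uncertain: ask for confirmation
        return f"Are we still in the {curr.label}?"
    return None

print(robot_initiative(RoomHypothesis("kitchen", 0.9),
                       RoomHypothesis("hallway", 0.8)))
# -> We just entered the hallway, right?
```

The human-initiative cases (room teaching, test questions) are handled by the dialog system's regular interpretation of utterances and are not part of this rule.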

6.1.2 System Overview

The platform used in this scenario is the mobile robot BIRON (Bielefeld Robot Companion) shown in figure 6.2. BIRON is equipped with several sensors that allow perception of the current situation as a basis for interaction. In particular, it uses a pan-tilt camera for perception of the human interaction partner, stereo microphones and speakers for speech interaction, and a laser scanner that provides data for computing a representation of its environment.

The two major software components of the system are the dialog system – the grounding-based Sunshine Dialog described in section 5.1 – and a component for Human Augmented Mapping (HAM), which has been developed by Elin Topp [Top08] at KTH and integrated with the BIRON system in the EU project Cogniron. The HAM component maintains a representation of the robot's spatial environment that integrates a robotic map with human concepts that are communicated via the dialog system, such as a room¹. The representations are calculated based on laser range data gathered during a 360° exploration turn. The HAM system continuously publishes hypotheses about the room the robot currently is in. The dialog system reacts to the hypotheses, resulting in the robot taking initiative. In case of human initiative, it is the dialog system that creates a room hypothesis, and the HAM system reacts to that either by creating a new concept (which involves an exploration turn), or by correcting an existing concept.

Figure 6.2: The BIRON platform, with pan-tilt camera (Sony EVI), touchscreen, stereo microphones, gesture camera, speaker, laser scanner, and two laptops (2.0 GHz Core2Duo) mounted on the backside.

Figure 6.3: Spurious detections in the Home-Tour scenario (from [Top08]).

The coordination between the dialog system and the HAM component relies on a first version of the Task State Protocol, with the admissible state sequence initiated, accepted/rejected, completed/failed. Its initial purpose was to give the user feedback before the robot starts its slow-going exploration turn ("I will have a look at it"), triggered by the accepted state, because it had been observed in previous user studies that some users were irritated by the robot turning away. The usage of the task states, however, differs in two major aspects from their later usage.
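The admissible state sequence of this first protocol version can be sketched as a small state machine. The transition table below follows the sequence named above; the API itself is illustrative, not the actual BIRON middleware interface.

```python
# Admissible transitions of the first Task State Protocol version:
# initiated -> accepted/rejected -> completed/failed.
ADMISSIBLE = {
    "initiated": {"accepted", "rejected"},
    "accepted":  {"completed", "failed"},
    "rejected":  set(),   # terminal
    "completed": set(),   # terminal
    "failed":    set(),   # terminal
}

class Task:
    def __init__(self):
        self.state = "initiated"

    def transition(self, new_state: str) -> None:
        if new_state not in ADMISSIBLE[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

# Example: the room learning task from the Home-Tour.
task = Task()
task.transition("accepted")   # arbitration grants motor control;
                              # the dialog says "I will have a look at it."
task.transition("completed")  # HAM has finished the exploration turn
print(task.state)             # -> completed
```

Note that any component observing the task can react to these transitions, which is exactly what enables the verbal feedback on the accepted state.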

First, tasks were often accepted and completed by different components. For example, when the human demonstrates a room, a new room learning task is initiated by the dialog system based on the human's utterance. The task is then accepted by the arbitration component, indicating that it allows the HAM system to take control of the robot's driving motors. This triggers HAM to start the exploration by turning the robot around. At the same time, the dialog system is triggered to give verbal feedback about the processing state ("I will have a look at it."). As soon as the exploration is finished, it is the HAM system that sets the state completed. In consequence, the hardware arbitration is reset, the robot memorizes the new representation and the dialog system gives another verbal confirmation.

¹ Technically, the HAM system distinguishes between regions and locations. Regions can be roughly defined as rooms, whereas locations can be defined as larger objects that are contained in the regions. The described integration of dialog system and HAM applies only to region learning.

In contrast, the later usage of the Task State Protocol, in particular Lütkebohle's toolkit implementation [LPP+11], promoted a rather client-server based view, where a task is initiated by the client, can be observed by an arbitrary number of components which may react to state changes, but is executed by only one server. This is because knowing the components that are entitled to modify a specific task facilitates the detection of, and recovery from, race conditions. Following this convention, the triadic interaction between HAM, dialog system and arbitration would rather be split up into several subtasks.
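One possible decomposition of the triadic room learning task under this convention could look as follows. The subtask and component names are purely illustrative assumptions; the point is only that each subtask has exactly one executing server, so the legitimate modifier of each task state is unambiguous.

```python
# Hypothetical split of the triadic room learning interaction into
# client-server subtasks, each executed by exactly one server component.
def split_room_learning():
    """Return (client, server, task_name) triples replacing the single task."""
    return [
        ("dialog", "arbitration",   "grant-motor-control"),
        ("dialog", "HAM",           "explore-room"),
        ("dialog", "speech-output", "verbal-feedback"),
    ]

def executing_component(task_name: str) -> str:
    """Race-condition checks need the single component entitled to modify a task."""
    servers = {name: server for _, server, name in split_room_learning()}
    return servers[task_name]

print(executing_component("explore-room"))  # -> HAM
```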

The second difference pertains to the task state transitions. The HAM component continuously updates the room hypothesis, which is tracked by the dialog system. If the current hypothesis changes, or becomes uncertain, the dialog system addresses the human for clarification. This requires the dialog system to maintain a history of hypotheses, and to use it as a basis for decision making. With the explicit update and update accepted/failed transitions, which were added later, the dialog system would be able to register directly on hypothesis changes, without maintaining their history.
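With the later update transitions, hypothesis tracking becomes a matter of observing a running task instead of polling and keeping a history. The following sketch assumes a callback-style observer API; this is not the actual toolkit interface, only an illustration of the pattern.

```python
from typing import Callable

class ObservableTask:
    """A running (accepted) task whose executing component may publish updates."""
    def __init__(self):
        self.state = "accepted"
        self.payload = None
        self._observers: list[Callable[[str, object], None]] = []

    def observe(self, callback: Callable[[str, object], None]) -> None:
        self._observers.append(callback)

    def update(self, payload) -> None:
        # an update transition is only admissible while the task is running
        assert self.state == "accepted"
        self.payload = payload
        for cb in self._observers:
            cb("update", payload)

# The dialog system registers directly on hypothesis changes,
# without maintaining a history of its own:
task = ObservableTask()
task.observe(lambda event, room: print(f"dialog: new hypothesis '{room}'"))
task.update("hallway")   # prints: dialog: new hypothesis 'hallway'
```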

6.1.3 Evaluation: Analyzing a Test Run

The system has not been evaluated in a real user study, but a test run has been analyzed as a proof of concept of the integration approach and the mixed-initiative interaction strategy. The evaluation focuses primarily on the performance of the HAM component.

The data analysis was carried out and published by Elin Topp [Top08], but it also allows conclusions to be drawn regarding the interaction strategy.

The test run was conducted in an office environment at Bielefeld University that is comparable to part of an apartment, consisting of two adjacent rooms and a hallway. The rooms were labeled "living room", "kitchen" and "hallway", respectively. The kitchen can be reached both from the hallway and from the living room. This makes it possible to do laps through the rooms, so that the system's behavior can also be investigated when a known room is re-entered. Figure 6.3 shows the layout of the environment in which the test run was conducted.

The test run starts in the living room (the lower one in figure 6.3); then human and robot pass from the kitchen to the hallway, enter the living room again, and finally go from the living room to the kitchen a second time, where the example tour is finished. The human starts the interaction by labeling the living room. Subsequently, when leaving the living room, the robot takes over the initiative and requests the new room label ("We just left the living room, right?"). The remaining interaction is driven mainly by the robot's clarification questions.

Overall, the robot asked 12 times for confirmation of a hypothesized transition. The respective locations are marked gray in figure 6.3. Five of these questions actually occurred in a situation where human and robot had just left a room (4, 5, 7, 9, 12). Four more can be explained by human and robot being close to door passages (1, 2, 6, 11). The locations of the remaining three clarification questions (3, 8, 10) are not plausible and are due to spurious hypotheses. In the case of questions 3 and 10, the robot's hypothesis had become uncertain and needed to be confirmed by the human (e.g. "Are we still in the kitchen?"), whereas in the case of question 8, the robot had an incorrect hypothesis and assumed that they had left the living room ("We just left the living room, right?").

In general, this test run shows the system's ability to build up a model of the environment through interaction with the human, and to continuously update and correct its internal representation. More specifically, from an interaction point of view, it demonstrates that a more accurate model can be acquired when the robot attempts to fill information gaps on its own initiative. In particular, the robot's clarification requests at locations 3, 8 and 10 show that its internal model had become unstable and even incorrect, and that it could actually be improved through the robot's capability to take the initiative and actively contribute to the learning process.