
3.4 Advanced HRI Based on the Task State Protocol

[Sequence diagram with lifelines Speech Recognition, Robot Initiative, Dialog, Arm Control, and Text-To-Speech. An alt frame distinguishes [human initiative]: 1: receive (Grasp the apple), from [robot initiative]: 2: InteractionGoal="Grasp". In both cases: 3: Grasp initiated; 4: Grasp accepted; 5: Say (I begin to grasp the apple); 6: Grasp completed; 7: Say (Alright).]

Figure 3.9: Mixed task initiative: Grasping can be initiated either on the human’s or on the robot’s initiative.

3.4.1 Use Case 1: Realizing Mixed Task Initiative

The first example illustrates a grasping action that can be proposed either by the human or by the robot. The diagram shown in figure 3.9 distinguishes between these two alternatives.

In the first case, the grasp command is given by the human, whereas in the second case, the robot grasps on its own initiative. Having received the command, either from the speech recognition in the case of human initiative (1) or from an initiative planning component in the case of robot initiative (2), the dialog initiates the appropriate task (3). The arm control accepts (4) and completes (6) it, which is verbalized by the dialog system (5 and 7).

The example serves to illustrate two aspects. First, the motor control server is agnostic about how the task came about, i.e. whether it was initiated on the human’s or on the robot’s initiative. The further process flow is the same. Second, the decisions about the robot’s actions (in this case, its grasping action) are not taken by the dialog system, as is the case with dialog systems for traditional domains (cf. the discussion in chapter 4).

Instead, the proposals for actions come from the back-end. Thus, robot task initiative can be realized as a reaction to real-world events (e.g. that the apple was newly detected in the scene), rather than following a prestructured dialog flow.

The diagram slightly simplifies the component interaction. In contrast to what is depicted, the current interaction goal Grasp was modeled not as a simple event message, but rather as a task request which is either accepted or rejected by the dialog system, depending on whether the current dialog situation allows for an interjection from the robot or not.
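To make the first of these aspects concrete, consider the following minimal Python sketch. It is purely illustrative; names such as Task, TaskState and ArmControlServer are hypothetical and do not reflect the actual system's API. It shows a motor control server that handles a Grasp task identically, regardless of whether the speech recognition or an initiative planning component initiated it:

    from dataclasses import dataclass, field
    from enum import Enum, auto

    class TaskState(Enum):
        INITIATED = auto()
        ACCEPTED = auto()
        COMPLETED = auto()

    @dataclass
    class Task:
        name: str                                 # e.g. "Grasp"
        spec: dict = field(default_factory=dict)  # e.g. {"target": "apple"}
        state: TaskState = TaskState.INITIATED

    class ArmControlServer:
        """Reacts to task requests without inspecting who initiated them."""
        def on_task(self, task: Task) -> None:
            task.state = TaskState.ACCEPTED       # "Grasp accepted" (4)
            print("grasping the", task.spec["target"])
            task.state = TaskState.COMPLETED      # "Grasp completed" (6)

    # Both initiative paths result in the very same task request:
    server = ArmControlServer()
    server.on_task(Task("Grasp", {"target": "apple"}))  # human command (1)
    server.on_task(Task("Grasp", {"target": "apple"}))  # InteractionGoal="Grasp" (2)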


[Sequence diagram with lifelines Speech Recognition, Dialog, Arm Control, and Text-To-Speech. 1: Receive (Grasp the apple); 2: Grasp initiated, target="apple"; 3: Grasp accepted; 4: Say (I begin to grasp the apple); 5: Receive (Not apple, but lemon); 6: Grasp update, target="lemon"; 7: Grasp update_rejected, target="apple"; 8: Say (I can not change the target any more); 9: Receive (Stop); 10: Grasp cancel; 11: Grasp cancel_accepted; 12: Say (OK, I stop).]

Figure 3.10: Integration of action execution and interaction: Grasping with user corrections.

3.4.2 Use Case 2: Integrating Action Execution and Interaction

The second example addresses the interaction between the dialog system and the motor control. Figure 3.10 describes an interaction sequence in which the human gives the order to grasp the apple (1), whereupon the dialog initiates the appropriate task (2). The arm control accepts (3) and begins execution, which is announced by the dialog system (4). During grasping, the human attempts to correct the target of grasping (5). The dialog system forwards the requested correction to the arm control by modifying the task specification and setting the task state to update (6). However, the arm control is not capable of updating the operation and rejects the update (7), causing a notification to be generated by the dialog system (8). The human then gives the order to stop execution (9).

Again, the dialog system forwards the request to the arm control via the cancel state (10).

The arm control accepts canceling (11), and the dialog system generates the appropriate verbalization (12).
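The following sketch outlines this exchange under assumed names and a simplified state set; it illustrates the mechanism only and is not the actual implementation:

    from enum import Enum, auto

    class TaskState(Enum):
        UPDATED = auto()
        UPDATE_REJECTED = auto()
        CANCEL_ACCEPTED = auto()

    class ArmControl:
        def __init__(self) -> None:
            self.spec = {"target": "apple"}
            self.committed = True  # assume the grasp can no longer be retargeted

        def on_update(self, new_spec: dict) -> TaskState:
            if self.committed:
                # "Grasp update_rejected, target='apple'" (7): the original
                # task specification stays in force.
                return TaskState.UPDATE_REJECTED
            self.spec.update(new_spec)
            return TaskState.UPDATED

        def on_cancel(self) -> TaskState:
            # stop the ongoing motion, then confirm the cancellation (11)
            return TaskState.CANCEL_ACCEPTED

    arm = ArmControl()
    print(arm.on_update({"target": "lemon"}))  # update (6) -> update_rejected (7)
    print(arm.on_cancel())                     # cancel (10) -> cancel_accepted (11)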

This example emphasizes the tight integration of interaction and action execution. With the update and cancel events, the Task State Protocol provides mechanisms to repeatedly modify ongoing tasks.

[Sequence diagram with lifelines Speech Recognition, Dialog, Environment Representation, Arm Control, and Text-To-Speech. 1: Receive (Grasp the apple); 2: Grasp initiated; 3: Grasp accepted; 4: Say (I begin to grasp the apple); 5: Receive (What objects do you see?); 6: ObjectInfo initiated; 7: ObjectInfo accepted; 8: ObjectInfo completed; 9: Say (One apple and two bananas); 10: Grasp completed; 11: Say (Alright, I finished grasping).]

Figure 3.11: Multitasking: User requests information during grasping.

Conversely, the resulting state updates (such as cancel_accepted) cause event notifications that enable the dialog system to generate feedback on the internal system state. This also includes the case where a state update cannot be realized, which is indicated by event notifications such as update_rejected.
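A dialog-side counterpart might look as follows; the mapping table and function names are illustrative assumptions, showing only how state notifications can be turned into verbal feedback:

    VERBALIZATIONS = {
        ("Grasp", "accepted"):        "I begin to grasp the {target}.",
        ("Grasp", "update_rejected"): "I can not change the target any more.",
        ("Grasp", "cancel_accepted"): "OK, I stop.",
        ("Grasp", "completed"):       "Alright.",
    }

    def say(utterance: str) -> None:
        print("TTS:", utterance)  # stand-in for the text-to-speech component

    def on_state_update(task: str, state: str, spec: dict) -> None:
        template = VERBALIZATIONS.get((task, state))
        if template is not None:
            say(template.format(**spec))  # feedback on the internal state

    on_state_update("Grasp", "update_rejected", {"target": "apple"})  # (7) -> (8)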

3.4.3 Use Case 3: Multitasking

In the third example, two tasks are executed in parallel. During an ongoing grasping action (1-4), the human requests information about the objects in the scene (5). This causes a task for the environment representation component to be initiated (6). The environment representation accepts the task (7), adds the requested information to the task specification, and completes it (8). The information is verbalized by the dialog system (9). Next, the grasp task is completed and acknowledged as well (10, 11).

The example demonstrates how asynchronous event delivery enables multitasking. Having initiated the task for grasping, the dialog system remains responsive to new commands. New tasks can be initiated even though the completion of running tasks is still pending. From an interaction point of view, this makes it possible to manage multiple open topics at a time within a conversation, and to switch between them.
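The following sketch illustrates this behavior with plain Python threads and an event queue, an assumed stand-in for the actual middleware's asynchronous event delivery:

    import queue
    import threading
    import time

    events: queue.Queue = queue.Queue()

    def grasp_task() -> None:
        events.put("Grasp accepted")    # (3)
        time.sleep(1.0)                 # long-running motor action
        events.put("Grasp completed")   # (10)

    def object_info_task() -> None:
        events.put("ObjectInfo accepted")                              # (7)
        events.put("ObjectInfo completed: one apple and two bananas")  # (8)

    threading.Thread(target=grasp_task).start()        # "Grasp the apple" (1)
    threading.Thread(target=object_info_task).start()  # "What objects do you see?" (5)

    # The dialog consumes events as they arrive: the ObjectInfo task finishes
    # while completion of the grasp task is still pending.
    for _ in range(4):
        print("dialog received:", events.get())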


[Sequence diagram with lifelines Robot Initiative, Speech Recognition, Dialog, Object Recognition, and Text-To-Speech. 1: LabelQuery initiated; 2: LabelQuery accepted; 3: Say (What is that?); 4: Receive (This is a melon); 5: LabelQuery intermediate_result, label="melon"; 6: Say (A melon, is that correct?); 7: Receive (No, it's a lemon); 8: LabelQuery intermediate_result, label="lemon"; 9: Say (A lemon, is that correct?); 10: Receive (Yes, that's correct); 11: LearnObject initiated; 12: LearnObject accepted; 13: Say (I am going to learn the lemon); 14: LearnObject completed; 15: Say (I have learned the lemon); 16: LabelQuery completed.]

Figure 3.12: Interactive learning: Label learning with user corrections.


3.4.4 Use Case 4: Enabling Interactive Learning

The final example shows how the Task State Protocol can be applied to realize interactive learning. More specifically, the robot asks the human for an object label that is unknown to it. First, an initiative planning component requests a LabelQuery task from the dialog system (1). The dialog system accepts the task (2) and, accordingly, verbalizes the question about the object label (3). As the human answers the question (4), the dialog system already publishes the received information (5) before asking the human for final confirmation (6), which gives the human the opportunity to correct the misunderstood label (7). Again, the dialog system already publishes the label (8) before asking for confirmation (9). This time, the human confirms the label (10), whereupon the dialog system initiates a LearnObject task for the object recognizer (11). The LearnObject task is accepted (12) and completed (14). Both state updates are verbalized by the dialog system (13, 15). On notification that the LearnObject task has been completed, the dialog system also completes the LabelQuery task (16).

Based on this example, several aspects can be discussed. First, the Task State Protocol allows the dialog system to gather information from the human and to transfer it to the responsible system components. With the corrected state, the information can be revised and corrected repeatedly. Moreover, with the intermediate_result transition, the information may be submitted incrementally, or preliminary information may already be published while it is still under negotiation. Second, as in the previous example, two interleaving tasks (LabelQuery and LearnObject) are executed. However, unlike before, the tasks are logically related: the dialog system cannot complete the LabelQuery task until the LearnObject task has been completed. Third, note that in the present example, the dialog system acts both as task server (for the LabelQuery task) and as task client (for the LearnObject task) at the same time, which assigns it a coordinating role within the overall system.
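The dual server/client role could be sketched as follows (hypothetical names; the callbacks stand in for the actual event mechanism):

    from typing import Callable

    class LabelQueryServer:
        def __init__(self, publish: Callable[[dict], None],
                     learn: Callable[[str], None]) -> None:
            self.publish = publish  # emits intermediate_result notifications
            self.learn = learn      # client side: initiates a LearnObject task

        def on_answer(self, label: str, confirmed: bool) -> None:
            # Publish the label while it is still under negotiation, so that
            # other components could already prepare for it.
            self.publish({"label": label})  # intermediate_result (5), (8)
            if confirmed:
                self.learn(label)           # LearnObject initiated (11)
                # LabelQuery completes (16) only once LearnObject completes.

    server = LabelQueryServer(
        publish=lambda spec: print("intermediate_result:", spec),
        learn=lambda label: print("LearnObject initiated:", label),
    )
    server.on_answer("melon", confirmed=False)  # (4), (5)
    server.on_answer("lemon", confirmed=False)  # (7), (8): corrected label
    server.on_answer("lemon", confirmed=True)   # (10), (11): confirmed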

In the Curious Robot system, there was no component that actually reacted to the unconfirmed label. However, one could easily think of cases in which the preliminary publication of unconfirmed information would be useful. For example, long-running actions such as manipulation or navigation tasks could already be prepared even though the precise goal specification is still preliminary.
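Such a consumer of preliminary information might be sketched as follows; GraspPlanner is a hypothetical component, introduced here only to illustrate the idea:

    from typing import Optional

    class GraspPlanner:
        def __init__(self) -> None:
            self.prepared_for: Optional[str] = None

        def on_intermediate_result(self, spec: dict) -> None:
            # Pre-plan for the yet unconfirmed target; cheap to redo if
            # the label is corrected later on.
            self.prepared_for = spec["label"]
            print("pre-planning grasp for a", self.prepared_for)

        def on_completed(self, spec: dict) -> None:
            print("executing prepared grasp for the", spec["label"])

    planner = GraspPlanner()
    planner.on_intermediate_result({"label": "melon"})  # preliminary label
    planner.on_intermediate_result({"label": "lemon"})  # corrected label
    planner.on_completed({"label": "lemon"})            # confirmed: execute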

The previous sections have taken an external view on the proposed approach to dialog modeling. In contrast, the following sections provide insights into its internal functioning.

Section 3.5 establishes the link between the internal dialog model and the process of dialog design, and it introduces the twofold function of the proposed Interaction Patterns, which serve both as internal dialog model and as application programming interface (API) for dialog designers. Section 3.6 reviews foundational work that has influenced the concept of Interaction Patterns, both from the field of descriptive dialog modeling and from the field of dialog system API design. Section 3.7 defines the Interaction Patterns and their features. Their function as internal dialog model is discussed in section 3.8, and their function as dialog system API is discussed in section 3.9. Section 3.10 gives an overview of the existing Interaction Patterns and their development over time.