
Figure 3.9.: Scene from two viewpoints A and B: (a) previous scene model, (b) current viewpoint, (c) merged scene. Object 1 is known to be movable; object 4 is added to the scene while changing perspective.

was simply not visible from older views, but appeared in a subsequent view. This algorithm thus allows the system to gain a much more informative ASM without observing any change in the scene from the current viewpoint.

Figure 3.9 summarizes a few of the properties just described. In viewpoint A the static background for object 1 is known (solid black), so it is marked as movable (dashed) (3.9a). In viewpoint B, before applying the merging algorithm, everything is assumed to be background (solid black) (3.9b). This includes the new object 4, which was added in between observing the scene from the two viewpoints. After merging (3.9c), object 1 can be marked as movable because the transformed model provides the required background data and all premises apply. However, although the background is known for object 2, it cannot be marked as movable because the previous front is known to be static and the back is ignored, as it does not fulfill the occlusion premise. Object 3 is assumed to be background as well because of the premises and the missing background. Object 4 can be marked as movable although the correct background is not known, according to Premise 1 (Field of View): instead of the correct background structure, the border of the viewport belonging to viewpoint A is marked as static.
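To make the case distinction of Figure 3.9 easier to follow, the sketch below condenses it into a single decision function. This is only an illustration: the predicate names (background_known, front_known_static, occlusion_premise_holds, left_previous_fov) are hypothetical stand-ins for the premises defined earlier, not part of the actual implementation.

```python
from dataclasses import dataclass

@dataclass
class ObjectEvidence:
    """Evidence about one object, from the current and the transformed previous model."""
    background_known: bool         # static structure behind the object is available
    front_known_static: bool       # the object's front was previously classified as static
    occlusion_premise_holds: bool  # the occluded region fulfills the occlusion premise
    left_previous_fov: bool        # background region lies outside the previous viewport

def classify_after_merge(ev: ObjectEvidence) -> str:
    """Condensed decision logic behind the four cases of Figure 3.9 (illustrative)."""
    if ev.left_previous_fov:
        # Premise 1 (Field of View): the border of the previous viewport
        # substitutes for the unknown static background (object 4).
        return "movable"
    if ev.front_known_static:
        # A front known to be static blocks reclassification, even if the
        # background is known (object 2).
        return "static"
    if ev.background_known and ev.occlusion_premise_holds:
        # The transformed previous model supplies the background (object 1).
        return "movable"
    # Missing background or violated premises (object 3).
    return "background"
```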

3.3.4. Applications Exploiting the Model's Potential

The scene model described so far does not include an elaborate event history management. Yet, it is possible to implement powerful applications that allow substantial statements about the scene.

Keyframes as Memory References

One of the most powerful tools for making statements about scene changes is the comparison of snapshots of the scene models at different points in time. In order to detect changes that have semantic relevance, it is important to compare the correct pairs of situations. For example, if the system has the goal to find the car keys that were last seen by the home owner the day before, it needs to compare memory references from the previous day with more recent data – either from the recent past or from the present through renewed inspection. For this it is crucial to store reasonable snapshots of the scene and scene model as persistent references.

This is realized through the concept of keyframes. A keyframe is described through the raw depth image snapshot at a specific point in time and the corresponding static background model. My suggestion is to keep keyframes at the beginning and end of an observation (here, observation means the duration of recording an ASM at one specific location). A more sophisticated strategy would be to detect additional keyframes in the observation marking a completed movement sequence in the scene. This could easily be implemented by using the optical flow on the movable objects layer of the ASM.
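As an illustration, a keyframe record and the movement-completion check could look roughly like the following sketch. The names and the threshold are assumptions; in particular, a real implementation would compute dense optical flow on the movable objects layer, which is simplified here to a mean depth change.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Keyframe:
    """Persistent scene reference: raw depth snapshot plus static background model."""
    timestamp: float
    depth_image: np.ndarray       # raw depth snapshot (H x W) at keyframe time
    background_model: np.ndarray  # static background layer of the ASM at that time

def movement_completed(prev_movable_depth: np.ndarray,
                       curr_movable_depth: np.ndarray,
                       flow_threshold: float = 0.01) -> bool:
    """Heuristic end-of-movement test on the movable objects layer.

    Stand-in for an optical-flow check: if the movable layer barely
    changed between two frames, the current movement has completed and
    an additional keyframe can be stored.
    """
    mask = np.isfinite(prev_movable_depth) & np.isfinite(curr_movable_depth)
    if not mask.any():
        return False
    mean_change = np.abs(curr_movable_depth[mask] - prev_movable_depth[mask]).mean()
    return mean_change < flow_threshold
```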

By saving these keyframes the system has a reference to the scene layout at that point in time. It can even apply a more recent version of the scene model to the raw depth image to find movable objects that were not detected at the time the keyframe was generated.
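A sketch of that retrospective check, assuming aligned depth images and a hypothetical depth tolerance: pixels lying clearly in front of the newer background model were movable at keyframe time.

```python
import numpy as np

def retro_detect_movable(keyframe_depth: np.ndarray,
                         newer_background: np.ndarray,
                         tolerance: float = 0.02) -> np.ndarray:
    """Re-evaluate an old depth snapshot against a newer background model.

    Returns a boolean mask of pixels that lie clearly in front of the
    (meanwhile better known) static background, i.e. objects that were
    movable at keyframe time but undetected back then.
    """
    valid = np.isfinite(keyframe_depth) & np.isfinite(newer_background)
    return valid & (keyframe_depth < newer_background - tolerance)
```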

Certainly this strategy needs some kind of forgetting mechanism so that the number of saved keyframes and models does not grow unboundedly over time.

For the scene models this can be done quite intuitively. When a new scene model is established from a very similar viewpoint as an older model, the older model can be forgotten, because its information should be merged into the new one. Models from a significantly different view on the same scene, however, should be maintained in order to resolve occlusion issues in future analyses. Selecting candidates for forgetting from the keyframes mentioned above is not as intuitive. The goal is to only forget those keyframes that are not relevant for future reference. A naive approach to this problem would be to apply an age filter, but the better solution would be to somehow extract the semantic relevance of these specific configurations through a high-level component.
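The viewpoint-based forgetting rule for scene models can be expressed as a simple pruning pass. This is a minimal sketch; the pose representation and both thresholds are illustrative assumptions.

```python
import numpy as np

def prune_scene_models(models: list, new_model,
                       pos_thresh: float = 0.2,
                       ang_thresh: float = 0.35) -> list:
    """Forget older scene models recorded from nearly the same viewpoint.

    Each model is assumed to carry .position (3-vector) and .view_dir
    (unit 3-vector). A model is dropped when the new viewpoint is close
    in both position and viewing direction, because its information has
    been merged into the new model; models from clearly different views
    are kept to resolve occlusions in future analyses.
    """
    kept = []
    for m in models:
        close_pos = np.linalg.norm(m.position - new_model.position) < pos_thresh
        angle = np.arccos(np.clip(np.dot(m.view_dir, new_model.view_dir), -1.0, 1.0))
        if not (close_pos and angle < ang_thresh):
            kept.append(m)
    kept.append(new_model)
    return kept
```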

Layered Action Models

When continuing the idea of taking snapshots of completed movements in order to reference reasonable states of the scene model, one may also establish a layered action model. This means that every detected keyframe at the end of a movement is interpreted as a completed action. The detection of these keyframes must be done by another high-level component on top of the ASM (e.g. using optical flow). A completed action triggers the generation of a new action layer representing the next manipulation of the scene.

This new layer takes the keyframe as a basis for creating a new scene model. This way, the new action layer contains only changes that have been made since the last keyframe, while the older layers still track the accumulated changes. Together with the persistence strategy from the previous section this can even be done retrospectively.
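A layer stack along these lines could be sketched as follows; the class and method names are hypothetical. Every completed action closes the current layer with its keyframe and opens a new one, while updates are propagated to all layers so that older layers accumulate changes.

```python
from dataclasses import dataclass
from typing import List

import numpy as np

@dataclass
class ActionLayer:
    """Scene-model layer covering the changes of one manipulation action."""
    base_depth: np.ndarray   # keyframe depth snapshot the layer starts from
    changes: np.ndarray      # change mask accumulated since that keyframe

class LayeredActionModel:
    """Stack of action layers, one opened per completed action."""

    def __init__(self, initial_depth: np.ndarray):
        self.layers: List[ActionLayer] = [self._new_layer(initial_depth)]

    @staticmethod
    def _new_layer(depth: np.ndarray) -> ActionLayer:
        return ActionLayer(depth, np.zeros(depth.shape, dtype=bool))

    def update(self, movable_mask: np.ndarray) -> None:
        # Older layers keep tracking accumulated changes; the topmost
        # layer only sees what happened since the last completed action.
        for layer in self.layers:
            layer.changes |= movable_mask

    def complete_action(self, keyframe_depth: np.ndarray) -> None:
        # An end-of-movement keyframe closes the current action and
        # becomes the basis of the next action layer.
        self.layers.append(self._new_layer(keyframe_depth))
```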

A layered action model like this allows the system to overcome the shortcomings of the raw ASM in terms of segmentation (see Sections 3.2.2, 3.2.3). As described above, the naive ASM algorithm cannot distinguish two movable objects that are located close together. By using this layered action model, a distinction of the objects is possible as long as they were manipulated independently.

Further, the extracted actions can be used to analyze scene changes on a trajectory level. Since these actions should optimally represent the change to only one object, the start and end configuration of the now unique movable object allow a rough approximation of the trajectory without using a classic visual tracking algorithm. Among other applications, this is useful for learning trajectories of objects that have a defined but limited movement space, like doors or drawers. This ability is used in the case example described in the section below.
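For instance, a rough trajectory estimate between the start and end configuration of one action's unique movable object might look like this sketch (linear interpolation as the crudest case; for doors or drawers an arc constrained by the hinge or rail would be the natural refinement):

```python
import numpy as np

def approximate_trajectory(start_centroid: np.ndarray,
                           end_centroid: np.ndarray,
                           steps: int = 10) -> np.ndarray:
    """Crude trajectory estimate from one action's start/end configuration.

    Linearly interpolates between the object centroids; no frame-by-frame
    visual tracking is needed because each action layer contains exactly
    one moved object.
    """
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * start_centroid + t * end_centroid
```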

Case Example: Movement Strategies for a Mobile Robot

The navigation system described in Meyer zu Borgsen et al. (2014) utilizes the ASM component developed for this thesis. The goal of the system is to clear obstacles in navigation tasks through cooperation with a human. For this it uses the functional roles provided by the ASM component. In a first phase, the robot observes the human manipulating objects in the environment, specifically opening or closing doors. From these observations the robot infers the movement space of the doors.

Figure 3.10.: Example visualizations of the navigation system incorporating the ASM for movement strategies on a mobile robot: (a) a movable object is detected blocking the path; (b) the object was moved out of the way. The images illustrate the perceived point clouds including the movable parts (blue), the planned path (green), and expected collisions (red). The lower right corners illustrate the robot's position on the occupancy grid.

In a second phase, the robot can utilize this information in a navigation scenario in which the planned path is blocked by one of the doors. Through the knowledge of their movable nature, the robot triggers a behavior that tries to clear the path (Figure 3.10a). The robot asks a nearby person to open the door. Using the knowledge of the door's trajectory, it positions itself in a way that the door can be opened by the human. As soon as the analysis component utilizing the ASM provides the information that the movable object has moved out of the way, the navigation task is continued (Figure 3.10b).
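The described behavior can be summarized as a small state machine; the states and predicates below are hypothetical names for illustration, not those of the actual system.

```python
from enum import Enum, auto

class ClearingState(Enum):
    NAVIGATE = auto()            # follow the planned path
    REQUEST_HELP = auto()        # position for the human and ask for help
    WAIT_FOR_CLEARANCE = auto()  # monitor the door via the ASM analysis

def step(state: ClearingState,
         blocked_by_movable: bool,
         path_clear: bool) -> ClearingState:
    """One tick of the cooperative clearing behavior (illustrative sketch)."""
    if state is ClearingState.NAVIGATE and blocked_by_movable:
        # A known movable object (a door) blocks the path: trigger the
        # clearing behavior and ask a nearby person to open it.
        return ClearingState.REQUEST_HELP
    if state is ClearingState.REQUEST_HELP:
        return ClearingState.WAIT_FOR_CLEARANCE
    if state is ClearingState.WAIT_FOR_CLEARANCE and path_clear:
        # The ASM analysis reports the object moved out of the way.
        return ClearingState.NAVIGATE
    return state
```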
