

3.1. The Geometric Foundation

In order to find a valuable strategy for representing spatial knowledge in an artificial system it might be useful to understand the spatial representations in the human mind. Using this information as an inspiration, I will formulate a proposal for an enclosing spatial representation system and analyze ways for populating it with useful spatial knowledge.

Spatial Representation in Human Cognition

While allocentric representations play a major role in navigation and spatial memory of humans (Burgess et al., 2004), egocentric representations have a special role in self-motion as well, as found by Hartley et al. (2004). They argue that allocentric representations in human spatial memory could not be built up or acted upon without interaction with egocentric systems. Wang and Simons (1999) show that visual tasks are significantly impaired if self-motion is disturbed, e.g. by moving a human subject in a wheelchair to induce visual change. They suggest that

“the representation [. . . ] of viewpoint changes is not environment-centered. The representation must be viewer-centered and the difference between observer and display movements results from a difference in the nature of the transformation. Apparently, view-dependent layout representations are transformed or updated using extra-retinal information to account for observer movements.”

The encoding of this spatial information in the human mind has been studied by Mou et al. (2006). They show that in persistent encoding both egocentric and allocentric representations play an important role. Allocentric representations relate objects to visual landmarks, while the egocentric subsystem computes and represents self-to-object relations, which are also used for locomotion, especially when the allocentric information is inaccurate.

Other findings in human cognition research suggest that humans rely on egocentric representations of obstacles in their immediate vicinity when interacting with them. Wang and Spelke (2000) argue that

“human navigation [. . . ] depends on the active transformation of a representation of the positions of targets relative to the self.”

They conducted several pointing experiments with human subjects, testing different conditions in which the subjects either remained oriented or were disoriented. From these experiments they conclude that the distance and direction of target objects in intermediate-sized environments are represented in an egocentric way and are updated during locomotion. However, in addition there seems to be an enduring allocentric representation of the environment geometry which is used for (re-)orientation.
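
This egocentric updating can be made concrete with a small computational sketch: given the observer's translation and rotation, a stored self-to-target vector is re-expressed in the new egocentric frame. The following Python fragment is only an illustration of this principle, not a model from the cited studies; the planar frame convention (x forward, y to the left) and the function name are assumed for the example.

```python
import numpy as np

def update_egocentric_target(target_xy, translation_xy, rotation_rad):
    """Re-express a self-to-target vector after the observer has moved.

    target_xy      -- target position in the old egocentric frame (metres)
    translation_xy -- observer translation, expressed in the old frame (metres)
    rotation_rad   -- observer rotation (counter-clockwise, radians)
    """
    # Remove the observer translation ...
    shifted = np.asarray(target_xy, dtype=float) - np.asarray(translation_xy, dtype=float)
    # ... and rotate by the inverse of the observer rotation.
    c, s = np.cos(-rotation_rad), np.sin(-rotation_rad)
    return np.array([[c, -s], [s, c]]) @ shifted

# Example: a target 2 m straight ahead; the observer advances 1 m and turns 90° left.
print(update_egocentric_target([2.0, 0.0], [1.0, 0.0], np.pi / 2))
# -> approximately [0, -1]: the target now lies 1 m to the observer's right.
```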

Research on human spatial working memory also suggests the existence of allocentric and egocentric systems that collaborate in spatial tasks, regardless of whether these tasks are based on visual or auditory perception (Stark, 1996; Roskos-Ewoldsen et al., 1998; Hartley et al., 2004; Lehnert and Zimmer, 2006). Further, it is widely accepted in brain research that several regions in the brain take on different representations for the execution of complex tasks. While the hippocampus provides an allocentric spatial map of the environment (O’Keefe and Dostrovsky, 1971; Squire, 1992), the parietal and prefrontal cortex are presumed to process egocentric spatial information (Stein, 1989; Colby and Goldberg, 1999; Lee and Kesner, 2003).

Enclosing Spatial Representations

These findings at least suggest that a heterogeneous representation of the geometric structures surrounding an agent, and of their properties, might be a valuable approach for robotic systems as well. As seen in Chapter 2, this also makes sense from a functional and application-oriented point of view.

Approaches along these lines are already being used in existing systems, but the conjunction of the different representations is often not modeled explicitly, or not at all. Also, egocentric representations are mostly of a short-term nature, so that they exist only in the moments in which they are of benefit; they are not preserved for later use.

When we go back to the PR2 system presented in Hornung et al. (2012) (the complete system is described in more detail in Chitta et al. (2012)), it becomes clear that the actual grasping task is executed in an egocentric way, but is completely decoupled from the navigation system, which works allocentrically. The system creates an egocentric representation of the tabletop scenario in front of it only when it starts executing the grasping task. The only exchange between the two representations mentioned is the transfer of recognized object models from the egocentric representation to an environment-centered semantic representation of known entities, which is based on the navigation map. However, as already discussed, all navigational tasks are purely allocentric, regardless of whether it is a long-distance planning task or a short-distance 3D collision avoidance task.
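
The exchange mentioned above, transferring an egocentrically detected object into the environment-centered representation, essentially amounts to applying the robot's pose in the map to the detection. The following Python sketch illustrates this for a simplified planar pose; the function name and the two-dimensional setting are assumptions for illustration and are not taken from the cited systems.

```python
import math

def ego_to_allo(object_xy_ego, robot_x, robot_y, robot_yaw):
    """Transform an object detected in the robot's egocentric frame into the
    allocentric (map) frame, given the planar robot pose in that map."""
    ox, oy = object_xy_ego
    c, s = math.cos(robot_yaw), math.sin(robot_yaw)
    # Rotate the detection into the map orientation, then translate by the robot position.
    return (robot_x + c * ox - s * oy,
            robot_y + s * ox + c * oy)

# Example: the robot stands at (3, 1) facing along +y (yaw 90°); an object
# detected 0.5 m straight ahead ends up at map coordinates (3.0, 1.5).
print(ego_to_allo((0.5, 0.0), 3.0, 1.0, math.pi / 2))
```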

In Ziegler (2010) an allocentric semantic map representation has been explored, which has been used for navigation and attention tasks in a unified hierarchical representation, also containing global object locations that have been detected egocentrically. Regardless of the representation paradigms used in the different approaches, it could be shown that robotic systems can benefit strongly from semantic knowledge about certain areas or structures in the environment. This is particularly true for the object searching task described in the work mentioned above, but one can think of many other applications as well (clean-up tasks, inference in planning tasks, grounding in referential communication).

In this chapter I argue for a closer integration of allocentric and egocentric representations in an enclosing structure. This approach will be developed further in the following sections (specifically Section 3.3).

Populating the Spatial Representation

In order to generate knowledge about the environment to facilitate these integrated robotic strategies, much work has been done in segmenting objects from a scene based on classical tracking or classification approaches.

Many of those applications expect a detailed model (e.g. CAD) of the object to be segmented (Albrecht and Wiemann, 2011). Algorithms pursuing this approach make use of decomposition (Gelfand and Guibas, 2004) of the scene, rely on local hierarchical features combined with a matching algorithm (Steder et al., 2009), or combine 3D perception for detection and 2D vision for recognition (Pangercic et al., 2011). However, research has also been done on segmenting structures with more instance-independent, but category-specific properties (Sturm et al., 2010). This approach can assign certain movement properties to structures in the environment purely by observation. The authors thereby utilize a fixed set of template models for observed tracks in order to classify fairly specific movement models, which requires pre-learned knowledge of the world.
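
To make the idea of fitting template models to observed tracks concrete, the following sketch classifies a planar track as prismatic (line-like) or revolute (circle-like) by comparing fit residuals. This is a strongly simplified illustration of the general principle rather than the actual method of Sturm et al. (2010); all function names and the two-dimensional setting are assumptions.

```python
import numpy as np

def line_residual(track):
    """RMS distance of the track points from their best-fit line (prismatic model)."""
    pts = np.asarray(track, dtype=float)
    centred = pts - pts.mean(axis=0)
    # The smallest singular value measures the spread orthogonal to the best-fit line.
    return np.linalg.svd(centred, compute_uv=False)[-1] / np.sqrt(len(pts))

def circle_residual(track):
    """RMS distance of the track points from a best-fit circle (revolute model)."""
    pts = np.asarray(track, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    # Algebraic (Kasa) circle fit: solve for the centre (a, b) and radius r.
    A = np.column_stack([2 * x, 2 * y, np.ones(len(pts))])
    (a, b, c), *_ = np.linalg.lstsq(A, x**2 + y**2, rcond=None)
    r = np.sqrt(c + a**2 + b**2)
    return np.sqrt(np.mean((np.hypot(x - a, y - b) - r) ** 2))

def classify_track(track):
    """Pick the template model with the smaller residual."""
    return "prismatic" if line_residual(track) < circle_residual(track) else "revolute"

# A handle swinging on a 0.8 m arm produces a circular track:
arc = [(0.8 * np.cos(t), 0.8 * np.sin(t)) for t in np.linspace(0.0, 1.2, 20)]
print(classify_track(arc))  # -> "revolute"
```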

In contrast, there are other approaches targeting a bottom-up segmentation of the scene, rather than a semantic interpretation of the objects.

Campbell et al. (2010) suggest a model-free approach for segmentation and 3D model creation through observation of multiple frames. Their approach expects the object to stand still while the camera is actively moved around the object. This might be relevant for certain scenarios in which the robot proactively explores its environment, but in other scenarios it may be feasible to employ passive observation to distinguish, for example, background from foreground structures.

Traditional background subtraction algorithms identify moving objects by detecting changes in the scene, under the assumption that the static background does not change over time. For example, Sheikh et al. (2009) describe a sophisticated background subtraction algorithm that analyses trajectories of salient features over time and can therefore be applied on freely moving systems such as robots. However, these approaches only detect moving objects, whereas for many scenarios movable objects, as described by Sanders et al. (2002), are of greater interest.
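
The static-background assumption can be illustrated with a minimal running-average background model in which pixels deviating strongly from the accumulated background are marked as moving foreground. The following Python sketch captures only this classical idea and is considerably simpler than the trajectory-based method of Sheikh et al. (2009); the class name, parameters, and synthetic frames are assumptions for illustration.

```python
import numpy as np

class RunningAverageBackground:
    """Minimal background-subtraction sketch: the background is modelled as an
    exponentially weighted running average of past frames, and pixels that
    deviate strongly from it are marked as moving foreground."""

    def __init__(self, alpha=0.05, threshold=25.0):
        self.alpha = alpha          # adaptation rate of the background model
        self.threshold = threshold  # intensity difference counted as motion
        self.background = None

    def apply(self, frame):
        frame = frame.astype(np.float32)
        if self.background is None:
            self.background = frame.copy()
        # Pixels differing strongly from the background model are foreground.
        mask = np.abs(frame - self.background) > self.threshold
        # Slowly blend the current frame into the background model.
        self.background = (1 - self.alpha) * self.background + self.alpha * frame
        return mask

# Usage with synthetic grayscale frames: a bright square moves across a dark scene.
subtractor = RunningAverageBackground()
for t in range(5):
    frame = np.zeros((64, 64), dtype=np.uint8)
    frame[20:30, 10 * t:10 * t + 10] = 200
    moving = subtractor.apply(frame)  # boolean mask of pixels flagged as moving
```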