6. Acoustic Packaging as a Basis for Feedback on the iCub Robot 69

6.1.3. The Color Saliency Based Tracking Module

The main task of the color saliency based tracking module is to provide features of salient moving regions in the visual input of the system. The module provides trajectories, including color properties of the moving regions, to the acoustic packaging system. The approach rests on two assumptions: during action demonstrations the objects are typically moved, and the objects used are uniformly colored. For the implementation, the same framework by Lömker et al. (2006) as described in Section 4.3.4 is used. The module performs several processing steps, starting with the input image and ending with trajectory estimation, which are described in the following.

Detecting Changing Regions

First, temporal changes in the visual input are detected using an approach based on motion history images (Davis and Bobick, 1997). Motion history images are reused, since they are already calculated for measuring the amount of motion within the initial acoustic packaging system (see Section 4.3.4). Here, the difference is that motion history images will be used to locate and separate changes in the visual input instead of only measuring the amount of change. An example of a motion history image is displayed in Figure 6.1a.
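As a rough illustration of the motion history update, the following minimal sketch resets a pixel to the full history value when motion is detected and lets it decay by one step otherwise. The use of simple frame differencing as the motion detector, the function name `update_mhi`, and the parameters `t_max` and `diff_threshold` are assumptions made for this example and are not taken from the module itself.

```python
def update_mhi(mhi, prev_frame, frame, t_max=5, diff_threshold=20):
    """Return an updated motion history image.

    All images are lists of rows of grayscale values. Frame differencing
    is an assumed motion detector; the real module may use another one.
    """
    updated = []
    for mhi_row, prev_row, cur_row in zip(mhi, prev_frame, frame):
        new_row = []
        for history, before, now in zip(mhi_row, prev_row, cur_row):
            if abs(now - before) > diff_threshold:
                new_row.append(t_max)                # fresh motion: full history
            else:
                new_row.append(max(0, history - 1))  # no motion: decay by one
        updated.append(new_row)
    return updated


frame0 = [[0, 0, 0], [0, 0, 0]]
frame1 = [[0, 200, 0], [0, 0, 0]]
mhi = [[0] * 3 for _ in range(2)]
mhi = update_mhi(mhi, frame0, frame1)  # pixel (row 0, col 1) changed
mhi = update_mhi(mhi, frame1, frame1)  # nothing changed, history decays
```

Pixels with a value greater than zero mark locations that changed within the last `t_max` frames, which is what allows changes to be located and separated rather than only counted.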

Figure 6.1: Snapshots of processing steps within the color saliency tracking module. (a) Motion history image. (b) Labeled motion history image (each label is highlighted in a different color for discernibility). (c) Original image masked by the motion history image. (d) Projection of masked pixels into the YUV color space. (e) Original image with overlaid trajectory and top ranked salient region.

Masking

Once changing regions are available, the corresponding parts of the input image need to be selected. To this end, the motion history image and the input image are combined, leaving only those pixels that match areas with non-decayed history. Since the history extends over a certain time t_max, an input image delayed by t_max/2 is used (see Figure 6.1c). This approach ensures that the actually moving parts are in the center of the history image, which allows for better segmentation of homogeneous areas in the input.
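The delayed masking can be sketched as follows. This is a minimal example that assumes "non-decayed history" means a history value greater than zero and that the delay is counted in frames; the class name `DelayedMasker` and the use of `None` for masked-out pixels are choices made for illustration only.

```python
from collections import deque


class DelayedMasker:
    """Mask an input frame delayed by t_max / 2 with the current history."""

    def __init__(self, t_max):
        self.delay = t_max // 2
        # Holds the current frame plus the `delay` frames before it.
        self.buffer = deque(maxlen=self.delay + 1)

    def mask(self, frame, mhi):
        self.buffer.append(frame)
        # Oldest buffered frame; exactly `delay` frames old once the buffer
        # is full, younger during the first few calls.
        delayed = self.buffer[0]
        return [
            [pixel if history > 0 else None
             for pixel, history in zip(frame_row, mhi_row)]
            for frame_row, mhi_row in zip(delayed, mhi)
        ]


masker = DelayedMasker(t_max=4)       # delay of two frames
mhi = [[0, 3]]                        # only the right pixel has history
masker.mask([[10, 11]], mhi)
masker.mask([[20, 21]], mhi)
masker.mask([[30, 31]], mhi)
masked = masker.mask([[40, 41]], mhi)  # uses the frame from two calls ago
```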

Labeling

The previous step selects changing parts in the input image but does not identify which pixels belong to the same region and which belong to spatially unconnected regions. Therefore, a labeling algorithm is used to identify connected regions (Soille, 2002). Additionally, this method allows noise to be suppressed by rejecting regions that do not contain a sufficient number of pixels. An example with labels highlighted in different colors is displayed in Figure 6.1b.
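A simple way to illustrate the labeling step is a flood fill over a binary mask. The sketch below uses 4-connectivity and a hypothetical `min_pixels` threshold; the algorithm referenced in the text (Soille, 2002) may differ in connectivity and implementation, so this only demonstrates the general idea of connected region labeling with size-based noise rejection.

```python
from collections import deque


def label_regions(mask, min_pixels=2):
    """Label 4-connected regions of a binary mask; 0 means background.

    Regions with fewer than `min_pixels` pixels are rejected as noise.
    """
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    next_label = 1
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or labels[y][x] != 0:
                continue
            # Flood fill starting from this unlabeled foreground pixel.
            labels[y][x] = next_label
            region = [(y, x)]
            queue = deque(region)
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and labels[ny][nx] == 0):
                        labels[ny][nx] = next_label
                        region.append((ny, nx))
                        queue.append((ny, nx))
            if len(region) < min_pixels:
                for ry, rx in region:
                    labels[ry][rx] = -1  # mark as visited but rejected
            else:
                next_label += 1
    # Rejected pixels are reported as background.
    return [[max(value, 0) for value in row] for row in labels]


mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 1],
]
labels = label_regions(mask, min_pixels=2)
```

In this example the isolated pixel in the second row is discarded, while the two larger regions receive the labels 1 and 2.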

Projection into a UV Color Histogram

In this step, the result of the previous masking operation is projected into a UV color histogram. The YUV color space was chosen since it roughly approximates human color perception. The idea is to prepare for efficient color clustering. Using a histogram has the advantage that the distance function of the clustering algorithm only needs to be calculated once for each pair of histogram bins processed. Otherwise, every pair of pixels would need to be considered separately. In addition to U and V, the region label provided by the labeling step is used as a third key, which separates colors by spatially different regions. Furthermore, each histogram bin contains not only the number of pixels accumulated but also their coordinates, which allows for backprojection. In summary, all changing regions are represented within a three-dimensional histogram using the region label as well as U and V as indices. Figure 6.1d shows a UV representation of the histogram where darker areas represent bins containing more pixels.
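The three-dimensional histogram can be pictured as a mapping from (region label, U bin, V bin) to the list of pixel coordinates that fell into that bin. The following sketch assumes integer U and V values and a hypothetical `bin_size`; the function name and data layout are illustrative and not taken from the original implementation.

```python
from collections import defaultdict


def build_uv_histogram(uv_image, labels, bin_size=16):
    """Map (label, u_bin, v_bin) to the coordinates of contributing pixels.

    `uv_image` holds (u, v) tuples per pixel; `labels` holds the region
    label per pixel, where 0 marks pixels removed by masking or labeling.
    """
    histogram = defaultdict(list)
    for y, (uv_row, label_row) in enumerate(zip(uv_image, labels)):
        for x, ((u, v), label) in enumerate(zip(uv_row, label_row)):
            if label == 0:
                continue
            key = (label, u // bin_size, v // bin_size)
            histogram[key].append((x, y))  # kept for backprojection
    return dict(histogram)


uv_image = [
    [(100, 200), (101, 201)],
    [(10, 20), (100, 200)],
]
labels = [
    [1, 1],
    [0, 2],
]
histogram = build_uv_histogram(uv_image, labels)
```

The pixel count of a bin is simply the length of its coordinate list. Note that the same color in two different regions ends up in two separate bins, since the label is part of the key.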

Clustering

The histogram bins are now clustered using the k-means algorithm (MacQueen, 1967). Clustering is performed separately for each region label. The Euclidean distance in the UV space is used to compare color values. Figure 6.1d shows the UV space where cluster means are displayed as white circles. The black square shows the centroid of all clusters, which is required for the next processing step. After the histogram bins are clustered by comparing their colors, a merging step is performed in which similar clusters are combined into a single cluster. In contrast to the previous clustering method, here not only color information but also each cluster's mean position in the input image is exploited. The mean position is determined based on the pixel coordinates associated with the histogram bins (backprojection). In this step, the region label is no longer maintained, since spatial distance is part of the comparison. The idea behind this step is to merge neighboring clusters with similar colors, but to avoid clustering colors from separately moving regions.
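To illustrate why clustering histogram bins is cheaper than clustering pixels, the sketch below runs k-means on bin centers and weights each bin by its pixel count. It covers only the color clustering for a single region label, not the subsequent merging step. The deterministic initialization from the first `k` bins, the fixed iteration count, and all names are simplifying assumptions for this example.

```python
def kmeans_bins(bins, k, iterations=10):
    """Cluster histogram bins in UV space with count-weighted k-means.

    `bins` is a list of ((u, v), pixel_count) entries. Returns the cluster
    means and the cluster index assigned to each bin.
    """
    means = [center for center, _ in bins[:k]]  # assumed simple initialization
    assignment = [0] * len(bins)
    for _ in range(iterations):
        # Assignment step: nearest mean by squared Euclidean distance.
        for i, ((u, v), _) in enumerate(bins):
            assignment[i] = min(
                range(len(means)),
                key=lambda c: (u - means[c][0]) ** 2 + (v - means[c][1]) ** 2,
            )
        # Update step: mean of member bins, weighted by pixel count.
        for c in range(len(means)):
            members = [entry for entry, a in zip(bins, assignment) if a == c]
            total = sum(count for _, count in members)
            if total > 0:
                means[c] = (
                    sum(u * n for (u, _), n in members) / total,
                    sum(v * n for (_, v), n in members) / total,
                )
    return means, assignment


bins = [((0.0, 0.0), 1), ((10.0, 10.0), 1), ((0.0, 2.0), 3), ((10.0, 12.0), 1)]
means, assignment = kmeans_bins(bins, k=2)
```

Each distance is evaluated once per bin and cluster mean, regardless of how many pixels the bin contains.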

Ranking

The ranking algorithm has a key role in the color saliency module, since it determines each cluster's level of saliency. For this purpose, all color clusters are sorted according to their distance to the centroid of all clusters. The cluster with the largest distance to the global centroid is the most salient cluster. Figure 6.1d displays the most salient cluster as a green circle. In this processing step, two aspects concerning infant color vision are realized (see Section 6.1.1). First, the position of the global centroid, and thus the ranking, depends on the surrounding colors, since it averages over the area that is uncovered by the motion history image. Second, chromatic colors are preferred, since clusters with a large distance to the centroid are likely to lie in the outer areas of the UV plane.
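The ranking can be sketched in a few lines. The example assumes that the global centroid is the unweighted mean of the cluster means; the original module may weight clusters, for instance by pixel count, so the function below is an illustration rather than a reconstruction.

```python
import math


def rank_clusters(cluster_means):
    """Sort cluster means in UV space by distance to their centroid.

    Returns the clusters from most to least salient, plus the centroid.
    The unweighted centroid is an assumption made for this sketch.
    """
    n = len(cluster_means)
    centroid = (
        sum(u for u, _ in cluster_means) / n,
        sum(v for _, v in cluster_means) / n,
    )
    ranked = sorted(
        cluster_means,
        key=lambda mean: math.dist(mean, centroid),
        reverse=True,  # largest distance first
    )
    return ranked, centroid


# Two near-gray clusters and one strongly chromatic cluster.
cluster_means = [(128.0, 128.0), (130.0, 126.0), (200.0, 60.0)]
ranked, centroid = rank_clusters(cluster_means)
```

Because the centroid is computed from the clusters currently present, the same color can rank differently depending on the surrounding colors.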

Heuristic Filtering

Several heuristic filtering steps are applied to the ranked clusters. The first filter removes clusters that fall below a saliency threshold, that is, clusters with an insufficient distance to the centroid. Typically, this is the case when only a hand or parts of a body are moving. Furthermore, each cluster's density is checked, which is implemented as the ratio of the pixel count to the area the pixels are distributed over. By rejecting clusters with low density, noise is suppressed. Very large clusters are removed by checking the standard deviation of the pixels belonging to that cluster. The most important step in this process is removing clusters that correspond to uncovered background. In particular, background that contains chromatic colors would lead to clusters which do not correspond to an object. To overcome this problem, seeded region growing is used to test each cluster. The idea is that if region growing would expand a cluster by more than a threshold factor, it is likely that this cluster belongs to uncovered background. However, this method is limited to uniformly colored backgrounds. Depending on the lighting conditions, highlighted areas with skin color such as hands are detected as salient, especially if no object is moved. Thus, a skin color detection algorithm implemented by Ingo Lütkebohle based on the method by Peer et al. (2003) was adopted to reject these clusters.
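Two of the filters described above, the saliency threshold and the density check, can be sketched as follows. The example assumes that the "area the pixels are distributed over" is the bounding box of the cluster; the dictionary layout and the threshold names `min_saliency` and `min_density` are hypothetical. The region growing and skin color filters are not covered here.

```python
def bounding_box_density(pixels):
    """Ratio of pixel count to bounding box area (assumed area measure)."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    area = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    return len(pixels) / area


def filter_clusters(clusters, min_saliency, min_density):
    """Keep clusters that are salient enough and densely populated."""
    return [
        cluster for cluster in clusters
        if cluster["saliency"] >= min_saliency
        and bounding_box_density(cluster["pixels"]) >= min_density
    ]


compact = {"saliency": 50.0, "pixels": [(0, 0), (1, 0), (0, 1), (1, 1)]}
sparse = {"saliency": 50.0, "pixels": [(0, 0), (9, 9)]}  # scattered noise
dull = {"saliency": 5.0, "pixels": [(0, 0), (1, 0)]}     # e.g. near-gray
kept = filter_clusters([compact, sparse, dull],
                       min_saliency=20.0, min_density=0.5)
```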

Trajectory Accumulation

The surviving clusters from the previous filtering steps are tracked over time to build a trajectory. The tracking algorithm first compares all new clusters to the existing trajectory hypotheses. If, according to a distance function, no existing trajectory matches the cluster, a new trajectory is initialized. The distance function uses both the distance in color space and the spatial distance, which allows clusters to be tracked even if they move quickly. If a newly initialized trajectory is shorter than a certain time frame or converges to another trajectory during this time frame, the trajectory is rejected. A trajectory is considered complete if no new clusters are added within a certain time frame. The trajectories are ranked according to their mean saliency level. The output of the color saliency module is the most salient trajectory with a sufficient minimal length.
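The matching of new clusters to trajectory hypotheses can be illustrated with a combined distance. The sketch below assumes a weighted sum of color and spatial distance with a low spatial weight, so that a fast-moving cluster of the same color still matches; the weights, the threshold `max_distance`, and the data layout are hypothetical. The rejection of short or converging trajectories and the completion timeout are omitted.

```python
import math


def combined_distance(a, b, color_weight=1.0, spatial_weight=0.1):
    """Weighted sum of UV distance and image distance (assumed form)."""
    return (color_weight * math.dist(a["uv"], b["uv"])
            + spatial_weight * math.dist(a["pos"], b["pos"]))


def assign_clusters(trajectories, clusters, max_distance=20.0):
    """Append each cluster to the closest trajectory or start a new one."""
    existing = list(trajectories)  # only match against earlier hypotheses
    for cluster in clusters:
        best = min(
            existing,
            key=lambda t: combined_distance(t[-1], cluster),
            default=None,
        )
        if (best is not None
                and combined_distance(best[-1], cluster) <= max_distance):
            best.append(cluster)
        else:
            trajectories.append([cluster])
    return trajectories


trajectories = []
assign_clusters(trajectories, [{"uv": (200, 60), "pos": (10, 10)}])
# Same color, moved 50 pixels: still matches the first trajectory.
assign_clusters(trajectories, [{"uv": (202, 61), "pos": (60, 10)}])
# Nearby position but very different color: starts a new trajectory.
assign_clusters(trajectories, [{"uv": (90, 180), "pos": (62, 10)}])
```

Calling `assign_clusters` once per frame mirrors the incremental character of the processing, in which the hypotheses are updated with every new set of clusters.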

An example tracking result is depicted in Figure 6.1e. The trajectory accumulation and all previous steps work incrementally. For each frame, the current trajectory hypotheses are updated and the most salient trajectory is made available to the other modules in the acoustic packaging system by inserting it into the Active Memory (see Section 4.3.2).