
4. Edge-Based Tracking Methods

4.9. Adapting the Visual Appearance

Even if the appearance of an edge is not given in advance, it can still be useful to take the visual properties of the edge into account when tracking lines or contours.

However, the appearance of an edge changes if an object is viewed from different viewing directions and at different distances. Figure 4.6 demonstrates the necessity of handling multiple appearances of an edge: from different viewing positions, or when the object itself rotates, the perpendicular lines of pixel intensities at the same control point look very different.

Tsin et al. [110] presented an online learning approach of intensity patterns to establish correct correspondences between points on the model edges and edge pixels in an image.

They borrowed some ideas from keypoint recognition techniques with randomized trees [60]. In their system, small segments of intensity patterns are used as descriptors. These descriptors are used to train a randomized forest, which then finds the correct edge pixels on a search line by classification.

In [118] we modeled the multiple appearances of an edge with a mixture of Gaussians representing the pixel intensities along the search line of an edge control point. Considering the visual edge properties of only the last frame does not lead to good detection results, since the appearance of an edge can change when the camera or the object is moved. A possible remedy is to apply a temporal low-pass filter to the edge pixel intensities, so that their appearance becomes more stable over time. Unfortunately, the edge of an object does not necessarily look the same in every frame of an image sequence. The viewing perspective, the lighting conditions and the background are just some of the factors that can cause the appearance of an object edge to change considerably. It is therefore necessary to describe the properties of an edge with a multi-appearance model.

Condensation [46] is an effective but costly method for maintaining a probability density over time. As the number of control points to be tracked is rather high, the condensation algorithm becomes computationally very expensive and is therefore not suitable for real-time tracking. A compromise between accuracy and complexity is to use a mixture of Gaussians with a fixed number of distributions.

We used the ideas of Stauffer and Grimson [104], who used a Gaussian mixture model to represent a background model with multiple appearances. We did not apply this method to a background image, but to every intensity pattern of an edge control point. Several hypotheses of an edge are maintained by this filter, and the most probable one is always used for detection.

An edge's visual property is represented by a multidimensional Gaussian distribution. In our implementation we simply used the pixel values of the correlation window as the variables of the Gaussian. The dimension of the distribution is therefore equal to the size of the correlation window.


Figure 4.6.: Necessity of using multiple visual appearances. In all images the extracted line of pixel intensities looks different.

A Gaussian distribution consists of a mean µ, a variance σ2 and a weight ω. The weight ω measures what portion of the data from previous frames is accounted for by this Gaussian. To avoid a costly matrix inversion, the variances are calculated separately for every dimension of the Gaussian distribution; the edge property values are therefore treated independently of each other.
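Such a per-dimension (diagonal) representation can be sketched as follows; the class name, initial variance and initial weight are illustrative choices, not values from the thesis:

```python
import numpy as np

# One appearance hypothesis for an edge control point: a Gaussian over the
# pixel intensities of the correlation window, with one variance per
# dimension (diagonal covariance), so no matrix inversion is ever needed.
class AppearanceGaussian:
    def __init__(self, pattern, init_variance=400.0, init_weight=0.05):
        self.mu = np.asarray(pattern, dtype=float)        # mean intensity pattern
        self.var = np.full(self.mu.shape, init_variance)  # per-pixel variance
        self.weight = init_weight                         # share of past data explained

window = [120.0, 128.0, 90.0, 85.0, 83.0]  # intensities along the search line
g = AppearanceGaussian(window)
print(g.mu.size)  # the dimension equals the correlation-window size: prints 5
```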

The probability of observing the current edge feature xt at time t is

P(xt) = Σ_{i=1}^{m} ωi,t · G(xt, µi,t, σi,t), (4.9)

where G is the Gaussian probability density function.
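A minimal sketch of Eq. (4.9) under the independence assumption above: G becomes a product of 1-D Gaussian densities, one per pixel of the correlation window (function names are illustrative):

```python
import numpy as np

def gaussian_density(x, mu, var):
    # Diagonal Gaussian: product of one-dimensional densities per pixel.
    return np.prod(np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var))

def mixture_probability(x, weights, means, variances):
    # P(x_t) = sum_i  w_{i,t} * G(x_t, mu_{i,t}, sigma_{i,t})   (Eq. 4.9)
    return sum(w * gaussian_density(x, mu, var)
               for w, mu, var in zip(weights, means, variances))

# Two hypotheses; the observation matches the first one exactly.
x = np.array([100.0, 110.0])
means = [np.array([100.0, 110.0]), np.array([30.0, 40.0])]
variances = [np.array([25.0, 25.0]), np.array([25.0, 25.0])]
weights = [0.7, 0.3]
p = mixture_probability(x, weights, means, variances)
```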

To update the mixture model with a current value, every measured edge property xt is checked against the existing m Gaussian distributions. A match is defined as a measurement that lies within 3 standard deviations of a distribution. If none of the distributions matches the current measurement, the mean of the least probable distribution is replaced by the current measurement; the variance σ2 of this distribution is initialized with a high value, and the weight ω is set to a low one. The Gaussian with the highest ratio ω/σ is interpreted as the most probable distribution.
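The match test and the replacement rule can be sketched as below; the ranking by ω/σ follows Stauffer and Grimson, while the function names and the initialization constants are illustrative assumptions:

```python
import numpy as np

def find_match(x, means, variances):
    # Return the index of the first distribution whose mean lies within
    # 3 standard deviations of x in every dimension, or None if no match.
    for i, (mu, var) in enumerate(zip(means, variances)):
        if np.all(np.abs(x - mu) <= 3.0 * np.sqrt(var)):
            return i
    return None

def replace_least_probable(x, weights, means, variances,
                           init_var=400.0, init_weight=0.05):
    # Overwrite the distribution with the lowest weight/sigma ratio.
    scores = [w / np.sqrt(np.mean(v)) for w, v in zip(weights, variances)]
    i = int(np.argmin(scores))
    means[i] = np.asarray(x, dtype=float)
    variances[i] = np.full(means[i].shape, init_var)  # high initial variance
    weights[i] = init_weight                          # low initial weight
    return i

means = [np.array([0.0, 0.0]), np.array([100.0, 100.0])]
variances = [np.array([4.0, 4.0]), np.array([4.0, 4.0])]
weights = [0.9, 0.1]
matched = find_match(np.array([3.0, 3.0]), means, variances)       # matches index 0
unmatched = find_match(np.array([50.0, 50.0]), means, variances)   # no match -> None
```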

The weights of the m distributions at time t are computed by

ωi,t = (1−α)ωi,t−1 + αMi,t, (4.10)

where α is the learning rate and Mi,t is 1 for the matched distribution and 0 otherwise.

The parameters for the matched distribution are updated as follows:

µt = (1−β)µt−1 + βxt (4.11)

σ2t = (1−β)σ2t−1 + β(xt−µt)2, (4.12)
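The update step, Eq. (4.10) applied to all weights and Eqs. (4.11) and (4.12) to the matched distribution only, can be sketched as follows (β = c·α as in the text; the function name and the choice c = 2 are illustrative):

```python
import numpy as np

def update_mixture(x, weights, means, variances, matched, alpha, c=2.0):
    beta = c * alpha  # second learning rate, beta = c * alpha
    for i in range(len(weights)):
        M = 1.0 if i == matched else 0.0
        weights[i] = (1.0 - alpha) * weights[i] + alpha * M        # Eq. (4.10)
    means[matched] = (1.0 - beta) * means[matched] + beta * x      # Eq. (4.11)
    variances[matched] = ((1.0 - beta) * variances[matched]
                          + beta * (x - means[matched]) ** 2)      # Eq. (4.12)

weights = [0.5, 0.5]
means = [np.array([0.0]), np.array([10.0])]
variances = [np.array([1.0]), np.array([1.0])]
update_mixture(np.array([5.0]), weights, means, variances, matched=0, alpha=0.1)
```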


Algorithm 1 Adaptive line tracking algorithm

1: for all lines l of the 3D line model do
2:   determine the number n of control points depending on the length of the projected line
3:   carry out the visibility test for the control points and store the visible points in the set Vl
4:   for all points Mi in Vl do
5:     project the 3D point Mi into the image plane
6:     search for gradient maxima along the edge normal and consider every maximum larger than a certain threshold as a hypothesis for an edge point
7:     for all possible edge points do
8:       calculate the similarity with the most probable distribution of the adapted edge properties
9:     end for
10:   end for
11: end for
12: for all points of V do
13:   take only the hypothesis with the highest similarity as a match and calculate the camera pose by non-linear minimization
14: end for
15: for all points of V do
16:   append the g most significant gradient maxima to the list of possible hypotheses
17:   apply the minimization again with the estimated pose of the previous step as initial guess
18: end for
19: for all points of V do
20:   update the appearance model which is most similar to the current control point
21: end for

where β = c·α is a second learning rate depending on the learning rate α. The learning rate α is controlled by the accuracy of the pose estimation: good estimates set the learning rate to a higher value, whereas unconfident ones result in an update step with a smaller learning rate.
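The thesis does not give a formula for this control, but one purely illustrative mapping from the pose residual of the last estimation step to a learning rate could look like this:

```python
import math

def learning_rate_from_residual(residual, alpha_min=0.01, alpha_max=0.1, scale=2.0):
    # Illustrative assumption: confident poses (small residual) yield a
    # larger alpha, so the appearance model adapts faster; uncertain poses
    # yield a smaller alpha, so bad frames corrupt the model less.
    return alpha_min + (alpha_max - alpha_min) * math.exp(-residual / scale)
```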

A significant advantage of this method is that a dramatic visual change of an object edge in the image, e.g. when viewing the object from a totally different direction, does not destroy the existing state of appearance. The original edge property remains in the mixture of Gaussians. When the camera returns to an earlier position, the previous visual properties of the edge still exist with the same µ and σ2 but with a lower ω, and will quickly become the most probable distribution again. Another benefit is that occlusions do not disturb the adapted edge properties too much, since different-looking edges are likely to be assigned to different distributions.

To clarify the adaptive line tracking algorithm with multiple hypotheses, the high-level pseudocode for processing one frame of an image sequence is given in Algorithm 1.