Particle Filtering for Visual Tracking

4.2 Assembly Localization

4.2.1 Particle Filtering for Visual Tracking

Several approaches to particle filtering have influenced this thesis. To give each of them appropriate credit, they are outlined in the following. Note that all of them were proposed for tracking objects in image sequences. The end of this section discusses the important respects in which that task differs from general pose estimation for assembly inspection.


A widely recognized paper was published by Blake & Isard [IB98a], who were among the first authors to use particle filtering for computer vision tasks. In their work, they propose the CONDENSATION algorithm and apply it successfully to the problem of visual tracking. For example, they track the contour model of a person walking in front of other persons, the contour model of a dancing girl’s head, and the model of a human hand. The first two examples require the determination of 6 DOF, whereas the hand model exhibits 12 DOF. Note that the hand was moved over a very cluttered desk. However, in all cases the object motion was restricted to affine transformations, which simplifies the tracking task considerably. Furthermore, a strong motion model was available. It was learned prior to CONDENSATION tracking in a bootstrap procedure that employed conventional Kalman filtering on video footage with little or no clutter.

The work of Blake and Isard received much attention in the computer vision community because it became apparent that, in the context of object tracking, particle filtering offers advantages over conventional techniques like Kalman filtering. A major reason for this is the finding that object tracking frequently involves the approximation of non-Gaussian and multi-modal pdfs, based on observation pdfs that are also non-Gaussian and multi-modal. Deutscher et al. [DBNB99] illustrate this problem in the context of tracking human motion. Such pdfs violate the fundamental assumption of Kalman filters and extended Kalman filters that the respective pdfs are Gaussian. In contrast, particle filtering does not impose any restrictions on the approximated pdfs.
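The resample, predict, and measure cycle that underlies CONDENSATION-style filtering can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [IB98a]: the state is one-dimensional, the motion model is a plain random walk, and `bimodal_likelihood` is a hypothetical observation density chosen only to show that a particle set can carry two modes at once, which a single Gaussian cannot.

```python
import math
import random

def condensation_step(particles, weights, likelihood, rng, diffusion=0.1):
    """One resample / predict / measure cycle for a 1D state."""
    n = len(particles)
    # Resample: draw n particles with probability proportional to their weights.
    resampled = rng.choices(particles, weights=weights, k=n)
    # Predict: random-walk motion model (an assumption; a learned model could go here).
    predicted = [x + rng.gauss(0.0, diffusion) for x in resampled]
    # Measure: weight each predicted particle by the observation likelihood.
    raw = [likelihood(x) for x in predicted]
    total = sum(raw)
    return predicted, [w / total for w in raw]

def bimodal_likelihood(x):
    # Hypothetical observation density with two modes, at -2 and +2.
    return (math.exp(-0.5 * ((x + 2.0) / 0.3) ** 2)
            + math.exp(-0.5 * ((x - 2.0) / 0.3) ** 2))

rng = random.Random(0)
particles = [rng.uniform(-4.0, 4.0) for _ in range(2000)]
weights = [1.0 / len(particles)] * len(particles)
for _ in range(5):
    particles, weights = condensation_step(particles, weights, bimodal_likelihood, rng)

# Probability mass that the weighted particle set assigns to each half of the axis.
left_mass = sum(w for x, w in zip(particles, weights) if x < 0.0)
right_mass = sum(w for x, w in zip(particles, weights) if x >= 0.0)
```

After a few cycles, both `left_mass` and `right_mass` remain substantial, so the particle set keeps both hypotheses alive instead of collapsing them into one mean and covariance.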

A large body of literature has sprung from the original proposal of Blake and Isard. For instance, Fritsch [Fri03] extends it by incorporating symbolic context knowledge in order to recognize manipulative gestures in an office domain and an assembly construction domain. Nevertheless, it is important to note that CONDENSATION particle filtering becomes computationally intractable once the state space has more than roughly 10 to 15 dimensions. The basic problem is that a suitable approximation of pdfs requires the number of particles to increase exponentially with the dimension of the state space. Consequently, standard particle filtering is computationally intractable for the pose localization of multi-part assemblies, as their state space easily exceeds this critical number of dimensions.
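The exponential growth can be made concrete with a rough count. The sketch below assumes, purely for illustration, that every state dimension must be covered at a fixed resolution of `samples_per_dim` particles. Real particle sets are not regular grids, so the figures only indicate the order of magnitude.

```python
def grid_particle_count(samples_per_dim, dims):
    """Particles needed to cover every dimension at a fixed resolution."""
    return samples_per_dim ** dims

# Ten samples per dimension for state spaces of growing dimension.
for dims in (2, 6, 12):
    print(dims, grid_particle_count(10, dims))
```

Under this assumption, a 6 DOF state already calls for a million particles, and a 12 DOF state for a million times more.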

Chang & Ansari [CA03, CA05] and Schmidt et al. [SKF06] recently proposed kernel particle filtering to alleviate the above-mentioned problem. By interpreting particles as state space positions around which kernels can be shaped, they combine particle filtering with kernel density estimation. This approach offers the advantage that positions between samples can be interpolated via kernel density estimation. The kernel representation thus makes it possible to approximate a pdf with rather sparsely distributed particles. Furthermore, the authors note that quite frequently one is not interested in approximating a whole target pdf but rather needs to find its modes. They consequently apply a local mode finding approach, namely the mean shift algorithm. An instructive tutorial that demonstrates the application of this standard technique in the domain of image segmentation can be found in [CM02]. Applying mean shift iterations to sets of particles yields a compact representation of the modes of a high-dimensional pdf. With this technique, Schmidt et al. manage to track the articulated 3D model of a human torso and arm with 10 DOF in real time on a standard PC. In this thesis, we also follow a kernel particle filtering approach. Several extensions and modifications are contributed that improve the measurement accuracy and precision of the respective assembly pose localization.
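The idea of finding modes from sparse particles can be sketched with a one-dimensional mean shift iteration over a weighted Gaussian kernel density estimate. This is a minimal sketch and not the procedure of [CA05] or [SKF06]: the Gaussian kernel, the fixed `bandwidth`, the stopping tolerance, and the particle values are all assumptions made for illustration.

```python
import math

def mean_shift_mode(start, particles, weights, bandwidth, tol=1e-6, max_iter=200):
    """Climb from `start` to a nearby mode of the weighted Gaussian KDE (1D)."""
    x = start
    for _ in range(max_iter):
        # Weighted kernel response of every particle at the current position.
        k = [w * math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
             for p, w in zip(particles, weights)]
        total = sum(k)
        if total == 0.0:
            break  # no particle within numerical reach of x
        # Mean shift update: move to the kernel-weighted mean of the particles.
        new_x = sum(ki * p for ki, p in zip(k, particles)) / total
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x

# Sparse, hypothetical particle set with two clusters.
particles = [-2.2, -2.0, -1.9, 1.8, 2.0, 2.1, 2.3]
weights = [1.0 / len(particles)] * len(particles)
mode_left = mean_shift_mode(-1.5, particles, weights, bandwidth=0.5)
mode_right = mean_shift_mode(1.5, particles, weights, bandwidth=0.5)
```

Starting points on either side of the origin converge to different cluster centres, so a handful of particles suffices to report both modes, even though the same particles would give only a crude histogram of the pdf.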

The density estimation that is inherent to kernel particle filtering demands the specification of bandwidth parameters. Their number grows linearly with the dimension of the sample space. However, Chang & Ansari [CA05] suggest that one bandwidth parameter is sufficient if the sample space undergoes a variance normalization. This thesis takes up the idea of Chang & Ansari and extends it with an automatic bandwidth selection scheme. The latter is similar to an approach that was proposed by Comaniciu et al. [CRM01] in the context of mean shift image segmentation. The resulting KPF still depends on one bandwidth parameter but behaves more stably.
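Why a variance normalization lets a single bandwidth serve all dimensions can be illustrated as follows. The sketch assumes a simple per-dimension scaling to zero mean and unit variance; whether [CA05] normalizes in exactly this way is not stated here, and the sample values and units are hypothetical.

```python
import math

def variance_normalize(samples):
    """Scale every dimension of the sample set to zero mean and unit variance."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((s[d] - means[d]) ** 2 for s in samples) / n
        stds.append(math.sqrt(var) if var > 0.0 else 1.0)  # guard constant dimensions
    normalized = [[(s[d] - means[d]) / stds[d] for d in range(dims)] for s in samples]
    return normalized, means, stds

# Hypothetical pose samples: (translation in mm, rotation in rad).
samples = [[100.0, 0.10], [120.0, 0.12], [80.0, 0.08], [110.0, 0.11], [90.0, 0.09]]
normalized, means, stds = variance_normalize(samples)
```

Before normalization, the translation and rotation components differ in scale by three orders of magnitude, so no single kernel width could fit both. Afterwards, both dimensions have unit spread and one bandwidth applies to each. The stored `means` and `stds` allow detected modes to be mapped back to the original units.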

Deutscher et al. [DBR00] published an alternative idea for reducing the number of particles needed for pdf approximation. The authors generate particle sets in a layered fashion. Each layer contains a small number of particles that are sparsely distributed in the state space. By manipulating the function that assigns weights to particles, Deutscher and colleagues manage to iteratively migrate the particles to the modes of the target pdf. Because their approach uses ideas from simulated annealing procedures, they name it annealed particle filtering. This thesis incorporates a new KPF extension that is related to the idea of weighting function manipulation in the course of mode detection.
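The effect of manipulating the weighting function can be sketched with a layered run in which the weights are raised to an increasing exponent. This is a simplified illustration in the spirit of annealed particle filtering, not the algorithm of [DBR00]: the exponent schedule `betas`, the shrinking diffusion, and the hypothetical `log_likelihood` are assumptions.

```python
import math
import random

def annealed_layers(particles, log_likelihood, betas, rng, diffusion=0.5, shrink=0.5):
    """Run one annealing pass; `betas` should increase from smooth to sharp."""
    for beta in betas:
        # Small beta flattens the weighting function, large beta sharpens it.
        raw = [math.exp(beta * log_likelihood(x)) for x in particles]
        # Resample according to the manipulated weights, then diffuse.
        resampled = rng.choices(particles, weights=raw, k=len(particles))
        particles = [x + rng.gauss(0.0, diffusion) for x in resampled]
        diffusion *= shrink  # narrow the search from layer to layer
    return particles

def log_likelihood(x):
    # Hypothetical weighting function (log domain) with a narrow peak at x = 3.
    return -0.5 * ((x - 3.0) / 0.2) ** 2

rng = random.Random(1)
particles = [rng.uniform(-10.0, 10.0) for _ in range(100)]
final = annealed_layers(particles, log_likelihood, betas=[0.02, 0.1, 0.5, 1.0], rng=rng)
mean_final = sum(final) / len(final)
```

With the unmodified weighting function, a peak of width 0.2 would rarely be hit by 100 particles spread over an interval of length 20. The early, flattened layers give distant particles a usable gradient toward the peak, and the later layers refine the estimate.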

Gavrila & Davis [GD96] proposed a multi-view approach for the 3D model-based tracking of humans. They use a search space decomposition strategy in order to reduce the complexity of the tracking task. It proceeds by first determining the position of head and torso. Afterwards, the model parts representing arms and legs are fitted to the image independently of each other. The advantage of this approach is that it divides the state space into three subspaces within which the subsequent search is computationally tractable. However, the proposed partitioning of the search space is quite ad hoc, as the authors do not state a general decomposition strategy. In this thesis, the proposed KPF employs a heuristic that dynamically partitions the state space into subspaces of constant dimensionality. This strategy is fundamental for obtaining a KPF that can perform pose estimation for assemblies that are composed of multiple parts.

In summary, all of the approaches mentioned have advanced the field of visual tracking. Their capability to localize even articulated objects is appealing. However, it must be noted that the presented tracking approaches rely on two key assumptions that cannot be made in the more general case of pose estimation for assembly inspection. First, all of them depend on a full pose initialization that is given in advance. Second, based on this initialization, the approaches determine the object pose in subsequent images by exploiting a tracking assumption. The latter assumes that an object cannot move far in the short time that elapses between the recording of two successive image frames. The tracking assumption helps to exclude large image and pose space regions from further consideration and thus provides a powerful search constraint. Furthermore, in some of the proposed tracking scenarios, motion models are learned in bootstrap processes, which also guide the tracking.

In a case like assembly pose localization, where all of the above-mentioned assumptions and constraints are missing, none of the proposed tracking approaches would suffice on its own. Therefore, the most important contribution of this thesis is to combine and extend the ideas presented above in order to obtain a new kernel particle filter that performs the task of assembly localization.

From the document: Automated visual inspection of assemblies from monocular images (pages 72-75)