Extraction of Object Parts - Training the Hierarchical Model

5.3 Training the Hierarchical Model

5.3.4 Extraction of Object Parts

This results in n^c additional inequality constraints. However, in the case of n_i ≤ 1 the constraint for component i can be omitted since it is already represented in (5.12).

The final constraints ensure that each physical instance is assigned to at most one component. This can be formalized by restricting the sum of all x_i,k that are associated with the arcs ending in the same physical instance to be less than or equal to 1. Let n^mat_j be the number of matches that are assigned to the physical instance j and let x_i,k(j, l) be the variable x_i,k that represents the l-th match that is assigned to the physical instance j. Then the constraint can be formalized as (cf. (5.10)):

$$
\sum_{l=1}^{n^{\mathrm{mat}}_j} x_{i,k}(j,l) \le 1 , \quad \forall j = 1, \dots, n^{\mathrm{phys}} . \tag{5.14}
$$

Thus, the resulting linear programming problem is described by the objective function (5.11), which must be maximized subject to the constraints described by (5.12)–(5.14). Several efficient standard algorithms for linear programming are available in the literature. One of the most popular representatives is the simplex method (Press et al. 1992, Bronstein et al. 2001). Although its theoretical worst-case runtime complexity has been proven to be exponential, it shows polynomial time complexity on average for practical problems. Nevertheless, several “true” polynomial-time algorithms have been developed, e.g., (Karmarkar 1984). Since the description of one of these algorithms would go beyond the scope of this dissertation, the reader is referred to the literature. Finally, the result of the linear programming provides a value for each x_i,k that is either 0 or 1. In the present example, all x_i,k are returned as 0 except x₁,₁, x₂,₁, and x₃,₃, which are returned as 1, as one would expect (see Figure 5.16(b)).
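To illustrate how the constraints interact, the following minimal Python sketch selects matches for a tiny, hypothetical configuration. It does not implement the simplex method. Instead, it enumerates all 0/1 assignments, which is only feasible for a handful of matches. The objective used here (a weighted count of the selected matches) and the per-component constraint are assumptions that stand in for (5.11)–(5.13); only the per-instance constraint follows (5.14) directly. The function name `select_matches` and the example data are hypothetical.

```python
from itertools import product


def select_matches(matches, weights):
    """Pick a 0/1 value for every match by exhaustive search.

    matches: list of (component, physical_instance) pairs, one per variable x_i,k.
    weights: assumed objective coefficients, one per match (stand-in for (5.11)).
    """
    best_value, best_x = float("-inf"), ()
    for x in product((0, 1), repeat=len(matches)):
        per_component, per_instance = {}, {}
        for x_val, (component, instance) in zip(x, matches):
            per_component[component] = per_component.get(component, 0) + x_val
            per_instance[instance] = per_instance.get(instance, 0) + x_val
        # Assumed constraint: at most one selected match per component.
        if any(total > 1 for total in per_component.values()):
            continue
        # Constraint (5.14): at most one component per physical instance.
        if any(total > 1 for total in per_instance.values()):
            continue
        value = sum(w * x_val for w, x_val in zip(weights, x))
        if value > best_value:
            best_value, best_x = value, x
    return best_x


# Hypothetical example: component 1 matches instances "A" and "B",
# component 2 matches "B", and component 3 matches "C".
matches = [(1, "A"), (1, "B"), (2, "B"), (3, "C")]
selection = select_matches(matches, weights=[1.0, 1.0, 1.0, 1.0])
print(selection)
```

In this example the ambiguous match of component 1 to instance "B" is rejected because instance "B" is needed by component 2, which has no alternative.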

It should be noted that the algorithm is able to handle missing components by choosing the constraints in the proposed way. However, it requires that at most one instance of the compound object is present in each example image since otherwise the algorithm would pick out the best component matches from different instances.

Returning to the original example, the ambiguities are solved for each example image individually according to the above described method. Hence, a unique pose for each component in each example image is obtained.

The final result for all example images is shown in Figure 5.17. The unique poses are stored within the n^e × n^c component pose matrix.

By solving the ambiguities during the training of the hierarchical model the correspondence problem that would arise during the online phase when searching the object parts independently from each other is already implicitly solved within the hierarchical model. Thus, one can say that the correspondence problem is shifted from the online to the offline phase with the considerable advantage that a real-time recognition of compound objects is made possible.

(a) Model image (b) Example images

Figure 5.17: Ambiguities solved by linear programming. The model image and the reference component configuration are shown in (a). Similar components are distinguished by different line widths for visualization purposes. For each example image the result of the linear programming assigns an unambiguous pose to each component (b).

accuracy information for the pose parameters, the accuracy must be specified empirically (e.g., by applying tests with various objects of different size and shape). The reference position and orientation of the components in the model image are assumed to be error-free. Starting with this information, the probability that the two components belong to the same object part can be computed.

At first, the parameters α and t of the rigid transformation that transforms M_i₁ into E_i₁ are computed (cf. (5.1) and (5.2) in Section 5.3.3.1). By applying the transformation to the pose M_i₂, the projected pose E'_i₂ of component i₂ in the example image is obtained (cf. (5.3) and (5.4) in Section 5.3.3.1). The assumption that both components belong to the same object part would require that components i₁ and i₂ have moved identically with respect to the model image, and hence E'_i₂ = E_i₂. In general, this requirement is not fulfilled, even for components of the same object part, because of the limited accuracy of the object recognition method. One method to get a kind of probability value for the current pair of components is to compute a distance measure between E'_i₂ and E_i₂. The drawback of this method is that it is hard to decide up to which distance the components can be treated as belonging to the same part. A better result can be obtained by computing a real probability value p_i₁,i₂ ∈ [0,1]. This is achieved by applying methods of hypothesis testing. Stating the hypothesis E'_i₂ = E_i₂ requires

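\begin{align}
x^{e\prime}_{i_2} - x^{e}_{i_2} &= 0 \tag{5.15} \\
y^{e\prime}_{i_2} - y^{e}_{i_2} &= 0 \tag{5.16} \\
\varphi^{e\prime}_{i_2} - \varphi^{e}_{i_2} &= 0 . \tag{5.17}
\end{align}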

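The hypothesis can be rewritten in matrix form Hx = w, where

$$
H = \begin{pmatrix}
1 & 0 & 0 & -1 & 0 & 0 \\
0 & 1 & 0 & 0 & -1 & 0 \\
0 & 0 & 1 & 0 & 0 & -1
\end{pmatrix} , \quad
x = \left( x^{e\prime}_{i_2}, y^{e\prime}_{i_2}, \varphi^{e\prime}_{i_2}, x^{e}_{i_2}, y^{e}_{i_2}, \varphi^{e}_{i_2} \right)^{\top} , \quad
w = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} . \tag{5.18}
$$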

For further processing, the covariance matrix K^e'_i₂ of the projected pose E'_i₂ is needed. It can be obtained by applying the law of error propagation to the covariance matrix K^e_i₁ with respect to equations (5.1)–(5.4): K^e'_i₂ = A K^e_i₁ A^T. Here, A is the 3×3 Jacobian matrix, which contains the partial derivatives of the pose parameters in E'_i₂ with respect to the pose parameters in E_i₁. After some simplifications, A reads as follows:





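$$
A = \begin{pmatrix}
1 & 0 & \Delta x_{1,2} \sin \Delta\varphi^{m,e} + \Delta y_{1,2} \cos \Delta\varphi^{m,e} \\
0 & 1 & -\Delta x_{1,2} \cos \Delta\varphi^{m,e} + \Delta y_{1,2} \sin \Delta\varphi^{m,e} \\
0 & 0 & 1
\end{pmatrix} , \tag{5.19}
$$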

with Δx₁,₂ = x^m_i₂ − x^m_i₁, Δy₁,₂ = y^m_i₂ − y^m_i₁, and Δϕ^m,e = ϕ^e_i₁ − ϕ^m_i₁. Now, the associated covariance matrix of x can be composed:

$$
K_{xx} = \begin{pmatrix}
K^{e\prime}_{i_2} & 0 \\
0 & K^{e}_{i_2}
\end{pmatrix} , \tag{5.20}
$$

where 0 represents a 3×3 zero matrix.
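The error propagation step can be sketched in a few lines of Python. The sketch builds the Jacobian exactly as stated in (5.19) and evaluates the product A K A^T with plain nested lists. The function names (`jacobian_a`, `propagate_covariance`) and the numerical values in the example are hypothetical; angles are assumed to be given in radians.

```python
import math


def jacobian_a(dx12, dy12, dphi):
    """Jacobian A as stated in (5.19); dphi is the rotation angle in radians."""
    s, c = math.sin(dphi), math.cos(dphi)
    return [
        [1.0, 0.0, dx12 * s + dy12 * c],
        [0.0, 1.0, -dx12 * c + dy12 * s],
        [0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix given as nested lists."""
    return [list(row) for row in zip(*a)]


def propagate_covariance(k_e1, dx12, dy12, dphi):
    """Covariance of the projected pose: A * K * A^T (law of error propagation)."""
    a = jacobian_a(dx12, dy12, dphi)
    return matmul(matmul(a, k_e1), transpose(a))


# Hypothetical accuracies: 1 pixel^2 in x and y, 0.01 rad^2 in orientation.
k_e1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.01]]
k_projected = propagate_covariance(k_e1, dx12=10.0, dy12=0.0, dphi=0.0)
for row in k_projected:
    print(row)
```

The example shows the expected lever-arm effect: the orientation uncertainty of component i₁ inflates the position variance of the projected pose in proportion to the squared distance between the two components.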

With the results of the previous calculations, all information required to perform the actual hypothesis test is available. At first, a test value T is computed (Koch 1987):

$$
T = \frac{1}{r} (Hx)^{\top} \left( H K_{xx} H^{\top} \right)^{-1} Hx , \tag{5.21}
$$

where r = 3 denotes the number of equations in the hypothesis (5.15)–(5.17). The test value T ∼ F_m,n has a (Fisher) F-distribution with the parameters m and n denoting the degrees of freedom. The parameter m corresponds to the number of equations r and the parameter n to the redundancy involved in computing the accuracies of the pose parameters. If a value for n is not available, e.g., because the accuracy information has been obtained by empirical tests instead of a preceding parameter adjustment, it is assumed that the accuracies have been determined by using an infinitely large set of samples. Consequently, n → ∞ and the F-distribution degenerates to the χ²-distribution with T · r ∼ χ²_r. Hence, the probability p_i₁,i₂ that the two components belong to the same object part can be written as

$$
p_{i_1,i_2} = 1 - \int_{-\infty}^{T} F_{r,n}(t)\,dt \;\overset{n \to \infty}{=}\; 1 - \int_{-\infty}^{T \cdot r} \chi^2_r(t)\,dt , \tag{5.22}
$$

with F_r,n(t) and χ²_r(t) representing the probability density functions of the respective distributions. For practical considerations, the evaluation of (5.22) can be reduced to the calculation of the associated incomplete gamma function (Press et al. 1992).
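For the limiting case n → ∞, the evaluation of (5.21) and (5.22) can be sketched as follows. Two simplifications are used. First, because H consists of an identity block and a negated identity block and K_xx is block diagonal, Hx is the difference between the projected and the measured pose, and H K_xx H^T is the sum of the two 3×3 covariance matrices. Second, for r = 3 the χ² tail probability has a closed form in terms of the complementary error function, which is used here in place of a general incomplete gamma routine. All function names are hypothetical, and the orientation difference is assumed to be small enough that no angle wrap-around occurs.

```python
import math


def solve3(m, b):
    """Solve the 3x3 system m * z = b by Gaussian elimination with pivoting."""
    a = [list(row) + [bi] for row, bi in zip(m, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            raise ValueError("covariance sum is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    z = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        z[r] = (a[r][3] - sum(a[r][c] * z[c] for c in range(r + 1, 3))) / a[r][r]
    return z


def chi2_tail_3dof(q):
    """Tail probability 1 - CDF of the chi-square distribution with 3 degrees of freedom."""
    if q == 0.0:
        return 1.0
    return math.erfc(math.sqrt(q / 2.0)) + math.sqrt(2.0 * q / math.pi) * math.exp(-q / 2.0)


def same_part_probability(projected, measured, k_projected, k_measured):
    """Directed probability (5.22) for n -> infinity and r = 3.

    projected, measured: poses (x, y, phi) of component i2 in the example image.
    k_projected, k_measured: their 3x3 covariance matrices.
    """
    hx = [p - m for p, m in zip(projected, measured)]            # H x
    hkh = [[k_projected[i][j] + k_measured[i][j] for j in range(3)]
           for i in range(3)]                                     # H K_xx H^T
    z = solve3(hkh, hx)
    t_times_r = sum(d * zi for d, zi in zip(hx, z))               # T * r
    return chi2_tail_3dof(t_times_r)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(same_part_probability((5.0, 5.0, 0.1), (5.0, 5.0, 0.1), identity, identity))
print(same_part_probability((5.0, 5.0, 0.1), (8.0, 9.0, 0.1), identity, identity))
```

Identical poses yield a probability of 1, and the probability decreases monotonically as the discrepancy between the projected and the measured pose grows relative to the stated accuracies.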

The probability matrix that is obtained by repeating the computations for each directed pair of components is not symmetric, i.e., p_i₁,i₂ ≠ p_i₂,i₁. This observation, which is non-intuitive at first glance, becomes evident when examining the transformation described by (5.1)–(5.4) more closely. The small example in Figure 5.18 facilitates the discussion. Assume that the poses of the two components shown in Figure 5.18(a) are determined in the example image shown in Figure 5.18(b). In the first step, the poses of component 1 in the model image and the example image, respectively, are used to compute the rigid transformation parameters (i.e., i₁ = 1). In the second step, component 2 is projected into the example image using the calculated transformation (i.e., i₂ = 2). The projected pose of component 2 only differs in orientation from its true pose (see Figure 5.18(c)). A different observation can be made if component 2 is used to compute the transformation parameters and component 1 is projected accordingly (i.e., i₁ = 2, i₂ = 1). The projected pose of component 1 not only differs in orientation but additionally differs in position from its true pose. Hence, in the second case the associated directed probability value is significantly lower. In order to obtain a symmetric undirected probability measure, the minimum of both corresponding directed probabilities is taken since a rigid object requires that both directed probability values are high simultaneously. Finally, because a high probability value is required for components of the same object part in all images, either the minimum value or a more robust quantile value over all example images is computed. Consequently, another demand on the example images can be derived. Assume that the minimum probability value is decisive. If in all example images two object parts accidentally move in the same manner, then the algorithm will mistakenly assume that the two parts can be combined into one rigid part. Therefore, different object parts must show a relative movement in at least one example image in order to be detected as two separate object parts.
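The reduction from directed probabilities to one symmetric matrix can be sketched as follows. The data layout (`directed[e][i][j]` for example image e), the function name, and the simple order-statistic rule used for the quantile are assumptions made for illustration. The diagonal is set to 1 on the assumption that a component is always rigid with respect to itself.

```python
def undirected_probabilities(directed, quantile=0.0):
    """Combine directed probabilities into one symmetric matrix.

    directed[e][i][j]: directed probability for the pair (i, j) in example image e.
    quantile=0.0 takes the minimum over all example images; a larger value
    selects a higher order statistic as a more robust alternative.
    """
    n = len(directed[0])
    result = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Minimum of both directions within each example image.
            per_image = sorted(min(p[i][j], p[j][i]) for p in directed)
            index = int(quantile * (len(per_image) - 1))
            result[i][j] = result[j][i] = per_image[index]
    return result


# Hypothetical directed probabilities for two components in two example images.
directed = [
    [[1.0, 0.9], [0.6, 1.0]],
    [[1.0, 0.8], [0.7, 1.0]],
]
print(undirected_probabilities(directed))
```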

(a) Model image (b) Example image (c) i₁ = 1 (d) i₁ = 2

Figure 5.18: Non-symmetry of relative movement. The poses of the two components that are extracted from the model image (a) are uniquely determined in the example image (b). In (c) component 2 is projected according to the pose of component 1. The true and projected pose only differ in orientation. However, when projecting component 1 according to the pose of component 2, additionally a translation difference occurs (d).

[Figure 5.19. Probability scale: 0.0 to 1.0. Components (1)–(18): Hat, Face, Left arm, Outer square, Right arm, Inner square, "O", "b", "j", "e", "c", "t", Left hand, Right hand, Left leg, Right leg, Left foot, Right foot.]

(a) Probability matrix (b) Object parts

Figure 5.19: The symmetric probability matrix contains information about the probability that a pair of components belongs to the same rigid object part (a). After clustering the matrix, ten rigid object parts are obtained (b). It should be noted that the numbers in (a) representing the components and the numbers in (b) representing the obtained object parts must not be confused.

In Figure 5.19(a) the finally obtained symmetric probability matrix is shown after computing the minimum over all example images. One can see that the pairwise probability for belonging to the same object part is high for the hat and the face as well as for the components forming the upper body. In contrast, the remaining probabilities are approximately zero.

In the second step the components are partitioned into groups, or clusters, such that the probability between components in the same cluster is high and the probability between components in different clusters is small. The clusters finally represent the rigid object parts. In the following, the computed probability matrix is also referred to as a similarity matrix, where the similarity expresses the rigidity between components. Many different clustering algorithms for similarity matrices, or dissimilarity matrices, respectively, have become available. A comprehensive overview is given in (Jain et al. 1999). To find the rigid object parts, a hierarchical agglomerative clustering algorithm is applied to the similarity matrix (actually, it is sufficient to take the upper triangular matrix into account):

1. Initialize n^c clusters, each containing one component.

2. Find the maximum entry in the similarity matrix. If the maximum similarity is less than a threshold p^min, return the clusters and stop the calculation.

3. Merge the associated pair of clusters into one cluster.

4. Update the similarity matrix to reflect the merge operation.

5. If at least two clusters are left, go to step 2, else return the clusters.

The update of the similarity matrix stated in step 4 means recalculating the similarity values between the new cluster and the remaining clusters using a certain linkage metric. The linkage metric characterizes the similarity between a pair of clusters. The most popular methods use either the single-link, the average-link, or the complete-link algorithm (Berkhin 2002). In the single-link algorithm, the similarity between two clusters is the maximum of the similarities between all pairs of components drawn from the two clusters. Accordingly, the complete-link algorithm takes the minimum similarity and the average-link algorithm the average similarity. On the one hand, the complete-link algorithm is too stringent, and hence often fails to link components belonging to the same object part. On the other hand, the single-link algorithm suffers from a chaining effect: in spite of a very low probability value, the two associated components may be merged into the same cluster if the low probability is bridged by other components. In contrast, the average-link algorithm has proven to be a suitable compromise, and hence is applied to cluster the components. Finally, the number of returned clusters specifies the number of rigid object parts n^p within the compound object. The result is shown in Figure 5.19(b), where the probability matrix of Figure 5.19(a) has been clustered using a threshold p^min of 0.5 for the probability (cf. step 2 of the clustering algorithm). The 18 components have been clustered into 10 object parts. Now the hat and the face form one object part. Furthermore, the components of the upper body belong to the same object part.
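The five steps of the clustering algorithm with the average-link metric can be sketched in Python as follows. For brevity, the sketch does not maintain an updated similarity matrix. It recomputes the average-link similarity between clusters from the original component similarities in every iteration, which yields the same values but is less efficient. The function name and the example matrix are hypothetical.

```python
def cluster_components(similarity, p_min):
    """Average-link agglomerative clustering of a symmetric similarity matrix.

    similarity[i][j]: probability that components i and j belong to the same part.
    Returns a list of clusters, each a list of component indices.
    """
    # Step 1: one cluster per component.
    clusters = [[i] for i in range(len(similarity))]

    def average_link(a, b):
        # Average similarity over all component pairs drawn from the two clusters.
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    # Step 5: continue as long as at least two clusters are left.
    while len(clusters) >= 2:
        # Step 2: most similar pair of clusters.
        best, (a, b) = max(
            (average_link(clusters[a], clusters[b]), (a, b))
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best < p_min:
            break
        # Steps 3 and 4: merge; similarities are recomputed in the next pass.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical similarity matrix for four components.
similarity = [
    [1.00, 0.90, 0.05, 0.05],
    [0.90, 1.00, 0.05, 0.05],
    [0.05, 0.05, 1.00, 0.80],
    [0.05, 0.05, 0.80, 1.00],
]
print(cluster_components(similarity, p_min=0.5))
```

With the threshold of 0.5 used in the text, components 0 and 1 are merged into one part and components 2 and 3 into another, while the low similarity between the two groups keeps them separate.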

Finally, for the object parts that consist of exactly one component the pose of the component is adopted by the object part in each example image. For those object parts that consist of more than one component the poses must be explicitly determined. Simply taking the average pose over all involved components within a cluster would introduce errors. To avoid these errors, a new rigid model is created for the corresponding object parts from the model image and used to search the object parts in all example images. This can be realized in a similar manner as for the components. However, the search can be focused on a very restricted parameter space since an approximate pose is known. Therefore, the computational effort is negligible and no ambiguities must be solved.

After this step, for each rigid object part the pose parameters in each image are available and stored within the n^e × n^p part pose matrix for further analysis.

Some concluding remarks concerning the image rectification should be mentioned. In order to ensure a correct computation of the probability values, it is necessary that the model image and all example images are free of distortions. Therefore, it is essential that all images are rectified with the approach presented in Chapter 3 in order to eliminate radial and projective distortions. Otherwise the distortions would lead to pseudo-movements between components that belong to the same rigid object part, and hence would result in small probability values.

Consequently, the search images in the subsequent online phase must also be rectified because the extraction of the relations, which will be described in the following section, is also based on the (rectified) example images.

Nevertheless, in some very time-critical practical cases it may be desirable to refrain from image rectification. Therefore, if no significant projective distortions are present, one can compensate for existing radial distortions using one of two possibilities. The first possibility is to appropriately reduce the threshold p^min for the probability value. The second possibility is to appropriately increase the standard deviations of the pose parameters, which are used in the hypothesis test. Consequently, small relative movements between object parts can no longer be distinguished from effects caused by the radial distortions. However, if no small movements must be expected and detected, this is a suitable possibility in practice to avoid the rectification and to further speed up the online phase.

From: Hierarchical Real-Time Recognition of Compound Objects in Images (pages 112–117)