
4.2 Modified Generalized Hough Transform (MGHT)

4.2.3 Increasing the Efficiency

4.2.3.1 Multi-Resolution Model

To reduce the size of the accumulator array and to speed up the online phase, the original GHT is embedded in a coarse-to-fine framework using image pyramids, as described in Section 4.1.3. This coarse-to-fine approach affects both the offline phase and the online phase. In the offline phase, it leads to the generation of a multi-resolution model. The construction of this multi-resolution model is described below.

First, an image pyramid of the model image is generated. Let $I_l^m$ be the model image at pyramid level $l$, $l = 0, \dots, n^l - 1$, where $n^l$ denotes the number of pyramid levels involved. $I_0^m$ represents the model image $I^m$ at the original resolution. For increasing values of $l$, the resolution, and hence the image dimensions, are successively halved. To obtain the image $I_l^m$ at pyramid level $l$, the image $I_{l-1}^m$ is smoothed using a mean filter of size 2×2 in order to satisfy the Nyquist sampling theorem and is then sub-sampled using a sampling interval of 2 pixels, as described in Section 4.1.3.
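The pyramid construction just described can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `next_pyramid_level` and `build_pyramid` are hypothetical, the image is a 2-D gray-value array, and an odd trailing row or column is simply dropped before averaging (the text does not specify the border handling).

```python
import numpy as np

def next_pyramid_level(image):
    """Compute level l from level l-1: 2x2 mean filter, then sub-sampling by 2."""
    h, w = image.shape
    if h < 2 or w < 2:
        raise ValueError("image is too small to be halved again")
    h2, w2 = h // 2, w // 2
    # Assumption: an odd trailing row/column is discarded.
    img = np.asarray(image, dtype=np.float64)[:2 * h2, :2 * w2]
    # Averaging each disjoint 2x2 block equals a 2x2 mean filter
    # followed by sampling every second pixel.
    return img.reshape(h2, 2, w2, 2).mean(axis=(1, 3))

def build_pyramid(image, num_levels):
    """Return [I_0, I_1, ..., I_{num_levels-1}], with I_0 the original image."""
    levels = [np.asarray(image, dtype=np.float64)]
    for _ in range(1, num_levels):
        levels.append(next_pyramid_level(levels[-1]))
    return levels

if __name__ == "__main__":
    pyramid = build_pyramid(np.arange(16).reshape(4, 4), 3)
    print([level.shape for level in pyramid])
```

Each level has half the width and height of the level below it, so the number of pixels, and with it the work per level, drops by a factor of four per level.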

When determining the optimum value for $n^l$, two conflicting objectives must be balanced. On the one hand, the number of pyramid levels should be chosen as high as possible to obtain a high potential for speeding up the recognition process. On the other hand, the object on the top pyramid level, which has the lowest resolution, must still be recognizable. That is, the object must still exhibit significant characteristics that keep it distinguishable from other objects in the image. Formally, the number of pyramid levels must be maximized under the constraint that the object characteristics are preserved. To avoid burdening the user with an additional input parameter and to ensure a high degree of automation, a method that automatically computes $n^l$ is introduced in the following.

Obviously, a meaningful description of the object is impossible if the number of model edge pixels on the current level falls below a certain threshold. This represents the first criterion that must be fulfilled. Several practical tests have shown that pyramid levels containing fewer than ten model points can be discarded.

A minimum number of edge pixels is a necessary but by no means sufficient requirement. Therefore, a more sophisticated approach must be applied when evaluating whether the object characteristics are preserved.

The principle of this second criterion is illustrated in Figure 4.7.

First, an image pyramid is computed on the model image using the maximum number of levels, i.e., the top pyramid level is only one pixel wide in at least one dimension. Figure 4.7(a) shows the first four pyramid levels of the example model image. On all pyramid levels, edges are segmented (see Figure 4.7(b)). For each pyramid level that fulfills the criterion of a minimum number of model edge pixels, the edges are scaled back to the original resolution and a distance transform (Jähne 2002) is computed on the scaled edges using the chamfer distance (cf. Section 4.1.1.2) for high accuracy. Figure 4.7(c) shows the scaled edges and the associated distance transforms, where brighter gray values represent higher distances. Finally, the model edges at the original resolution are superimposed on the distance-transformed image, and the mean distance of the original model edges to the (scaled) edges at the current level is calculated by summing up the underlying gray values (see Figure 4.7(d)). Hence, this is a measure of how much the model edges are deformed by the smoothing that comes with the image pyramid.

Figure 4.7: The number of pyramid levels is computed automatically by measuring the deformations of the model edges on the respective pyramid levels (D(0) = 0%, D(1) = 1%, D(2) = 3%, D(3) = 9%). Panels: (a) pyramid of model image; (b) segmented model edges on each pyramid level; (c) scaled edges and associated distance transforms; (d) model edges at original resolution superimposed on the distance transform.

The average distance is normalized by dividing it by the size of the object. This takes into account that small objects are already less distinctive and therefore tolerate only small deformations, whereas larger objects can cope with larger deformations without losing their distinctive characteristics. The object size is represented by the radius $r_0$ of a circle that has the same area as the ellipse that has the same moments as the model edges at the original resolution. Thus, the normalized average distance is a measure of deformation $D(l)$ that describes how much the original shape is degenerated on pyramid level $l$. $D(l)$ is computed as:

$$D(l) = \frac{\sum_{i=1}^{n^m} \min_{j=1,\dots,n_l^m} \left\| p_i^m - \Lambda(p_{j,l}^m, l) \right\|}{3\, n^m r_0} , \tag{4.17}$$

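where $p_i^m$, $i = 1, \dots, n^m$, are the model edge points at the original resolution, $p_{j,l}^m$, $j = 1, \dots, n_l^m$, are the model edge points at pyramid level $l$, $\|\cdot\|$ is the chamfer distance, and $\Lambda(p_{j,l}^m, l)$ is the scaling of the model edge points from the current level $l$ back to the original resolution: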

$$\Lambda(x, l) = 2^l x . \tag{4.18}$$
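As a rough illustration of (4.17) and (4.18), the following sketch evaluates the deformation measure directly on point lists instead of on a distance-transformed image. Several things here are assumptions rather than part of the text: the function names are hypothetical, the chamfer distance is taken to be the common 3-4 variant (a horizontal or vertical step costs 3, a diagonal step costs 4, which matches the division by three), and the moment-equivalent ellipse is interpreted as a filled ellipse, so that its semi-axes are twice the square roots of the eigenvalues of the second-order central moment matrix.

```python
import numpy as np

def equivalent_radius(points):
    """r_0: radius of the circle with the same area as the moment-equivalent ellipse."""
    pts = np.asarray(points, dtype=np.float64)
    centered = pts - pts.mean(axis=0)
    cov = centered.T @ centered / len(pts)        # second-order central moments
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    # Assumption: filled ellipse, i.e. variances a^2/4 and b^2/4 along its axes.
    a, b = 2.0 * np.sqrt(eigvals)
    return float(np.sqrt(a * b))                  # pi * r_0^2 = pi * a * b

def chamfer34(deltas):
    """3-4 chamfer length of displacement vectors (assumed chamfer variant)."""
    d = np.abs(deltas)
    # 4 * min + 3 * (max - min) simplifies to 3 * max + min.
    return 3.0 * d.max(axis=-1) + d.min(axis=-1)

def deformation(model_points, level_points, level, r0):
    """D(l) as in (4.17), computed by brute-force nearest-neighbor search."""
    p = np.asarray(model_points, dtype=np.float64)                   # (n_m, 2)
    q = (2 ** level) * np.asarray(level_points, dtype=np.float64)    # Lambda(x, l)
    dist = chamfer34(p[:, None, :] - q[None, :, :])                  # (n_m, n_l)
    return float(dist.min(axis=1).sum() / (3.0 * len(p) * r0))
```

The brute-force search needs memory proportional to the product of both point counts; the distance transform described in the text computes the same nearest-edge distances once per level and is the better choice for large models.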

The division by three in (4.17) again compensates for the unit length of the chamfer distance. Finally, $D(l)$ must not exceed a certain threshold for all involved pyramid levels. This threshold is generic and independent of the object and can be determined empirically. Several experiments involving different types of objects of various sizes have shown that $D(l)$ should not exceed 10% in order to avoid strong deformations.

Figure 4.8: The number of pyramid levels is calculated for four examples. From the model image (a), edges are extracted (b). The edges of the highest accepted pyramid level that fulfills the criterion $D(l) < 10\%$ (c) and the edges of the lowest non-accepted level (d) are shown. Deformation values per example (original edges, highest accepted level, lowest non-accepted level):

Row 1: D(0) = 0.0%, D(5) = 7.6%, D(6) = 16.1%
Row 2: D(0) = 0.0%, D(5) = 5.0%, D(6) = 14.8%
Row 3: D(0) = 0.0%, D(3) = 4.6%, D(4) = 10.4%
Row 4: D(0) = 0.0%, D(3) = 8.3%, D(4) = 23.4%

To illustrate these results, four practical examples are presented in Figure 4.8. Figure 4.8(c) shows the edges of the top pyramid level that has just been accepted by the algorithm. In most cases the result of the automatic determination of the number of pyramid levels is very intuitive, except perhaps for the example in the third row, where one might choose the fourth instead of the third level as the top pyramid level. However, the deformation measure $D(4)$ of 10.4% indicates a narrow decision.
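The two criteria can be combined into a small selection routine. The sketch below is only one plausible reading of the procedure: the function name and parameters are hypothetical, the inputs are assumed to be the per-level edge point counts and deformation values starting at level 0, and the search is assumed to stop at the first level that fails either criterion, since the measure must hold for all involved levels.

```python
def select_num_levels(edge_counts, deformations, min_points=10, max_deformation=0.10):
    """Return n^l, the number of pyramid levels accepted by both criteria.

    edge_counts[l]  : number of model edge points on level l
    deformations[l] : deformation measure D(l) as a fraction (0.10 means 10%)
    """
    num_levels = 0
    for count, d in zip(edge_counts, deformations):
        # First criterion: levels with fewer than ten model points are discarded.
        # Second criterion: D(l) < 10%, as in the caption of Figure 4.8.
        if count < min_points or d >= max_deformation:
            break
        num_levels += 1
    return num_levels

if __name__ == "__main__":
    # Hypothetical edge counts; the D values are made up for illustration.
    print(select_num_levels([500, 240, 110, 50, 20], [0.0, 0.01, 0.03, 0.09, 0.20]))
```

Whether a level with exactly 10% deformation is accepted is not settled by the text (the prose says "should not exceed 10%", the caption uses a strict inequality); the sketch follows the caption.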

After the pyramid of the model image has been derived, the generation of the multi-resolution model can be started. While generating the model, one has to distinguish between the top level $I_{n^l-1}^m$ and the lower levels.

In the online phase, an image pyramid is also derived for the search image, using the same number of pyramid levels as for the model image. A breadth-first search is then applied: the recognition process starts at the top pyramid level of the search image without any prior information about the transformation parameters $o^s$ and $\varphi^s$. Therefore, the conventional GHT is applied to the top pyramid level. As the top-level strategy, all cells in $A$ that are local maxima and exceed a certain threshold are stored as match candidates and used to initialize approximate values on the next lower level. Thus, the coarse values on the top level are successively refined by tracking the match candidates down through the pyramid to the highest resolution of the original search image.
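The top-level strategy, selecting all accumulator cells that are local maxima and exceed a threshold, might look as follows. This is a sketch under assumptions: the function name `local_maxima` is hypothetical, the accumulator is an N-dimensional NumPy array, a cell counts as a local maximum if no direct neighbor (including diagonal neighbors) has a larger value, and cells outside the array are treated as minus infinity. In particular, the orientation axis is not treated as periodic here.

```python
import itertools
import numpy as np

def local_maxima(accumulator, threshold):
    """Return index tuples of cells that exceed the threshold and are local maxima."""
    acc = np.asarray(accumulator, dtype=np.float64)
    # Pad with -inf so that border cells can be compared like interior cells.
    padded = np.pad(acc, 1, mode="constant", constant_values=-np.inf)
    is_max = acc > threshold
    for offset in itertools.product((-1, 0, 1), repeat=acc.ndim):
        if not any(offset):
            continue  # skip the cell itself
        neighbor = tuple(slice(1 + o, 1 + o + n) for o, n in zip(offset, acc.shape))
        # ">=" keeps every cell of a plateau; ties are not resolved here.
        is_max &= acc >= padded[neighbor]
    return [tuple(int(i) for i in idx) for idx in np.argwhere(is_max)]

if __name__ == "__main__":
    A = np.zeros((5, 5))
    A[1, 1], A[3, 3], A[3, 4] = 5, 7, 2
    print(local_maxima(A, threshold=4))
```

Keeping every sufficiently strong local maximum, rather than only the global one, is what allows several object instances to be found in the same search image.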

Using the breadth-first strategy, all match candidates are refined at the current level before the candidates are tracked to the next lower level. The breadth-first strategy is preferable for various reasons, most notably because a heuristic for a best-first strategy is hard to define, and because depth-first search results in higher recognition times if all matches should be found (Steger 2002).

Unfortunately, the GHT in its conventional form is not very well suited for the use of image pyramids because the prior information cannot be used in a straightforward way, as is the case when using alignment methods, for example. This is why the R-table is built in its conventional form, as described in Section 4.2.2, only for the top level. On the lower levels $I_{n^l-2}^m$ to $I_0^m$, a modified strategy is employed to efficiently take advantage of the prior information, i.e., the approximate transformation parameters obtained from the next higher level. As the lower-level strategy, during the refinement only the respective best match within a local neighborhood of the approximate transformation parameters in parameter space is tracked further (see the description of the descent policy in Section 4.1.3). Combined with the top-level strategy, this facilitates finding all matches in the image while keeping the computational effort low. The problems of using image pyramids within the GHT are discussed in the following section, together with the proposed solutions.
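The interplay of the breadth-first strategy and the lower-level strategy can be sketched as a loop over pyramid levels. Everything specific in this sketch is an assumption made for illustration: `track_candidates` and its parameters are hypothetical, a pose is represented as `(x, y, phi)` with `phi` in degrees, the position is doubled when moving to the next lower level (the inverse of the sub-sampling), and the callback `score(level, pose)` stands in for whatever match measure the restricted voting on that level provides.

```python
def track_candidates(candidates, top_level, score,
                     pos_radius=1, angle_offsets=(-1.0, 0.0, 1.0)):
    """Refine top-level match candidates down to level 0, breadth-first.

    candidates : list of (x, y, phi) poses found on the top pyramid level
    score      : hypothetical callback, score(level, pose) -> float (higher is better)
    """
    current = list(candidates)
    offsets = range(-pos_radius, pos_radius + 1)
    for level in range(top_level - 1, -1, -1):
        refined = []
        # Breadth-first: every candidate is refined on this level
        # before any candidate is tracked to the next lower level.
        for x, y, phi in current:
            cx, cy = 2 * x, 2 * y  # approximate position on the finer level
            neighborhood = [(cx + dx, cy + dy, phi + dphi)
                            for dx in offsets
                            for dy in offsets
                            for dphi in angle_offsets]
            # Lower-level strategy: only the best match in the
            # local neighborhood is tracked further.
            refined.append(max(neighborhood, key=lambda pose: score(level, pose)))
        current = refined
    return current
```

A usage example with a synthetic score that peaks at the pose (10, 6, 30) on level 0:

```python
def toy_score(level, pose):
    x, y, phi = pose
    return -(abs(x - 10 / 2 ** level) + abs(y - 6 / 2 ** level) + abs(phi - 30.0))

print(track_candidates([(2, 1, 29.0)], top_level=2, score=toy_score))
```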

In document: Hierarchical Real-Time Recognition of Compound Objects in Images (pages 56-59)