
[Flowchart of Figure 5.2: the ROI in the model image is initially decomposed into n_c components; for each component a rigid model is generated and recognized in the n_e example images, yielding the component pose matrix (n_e × n_c); the rigid object parts are extracted by merging components, their rigid models are generated and recognized in the example images, yielding the part pose matrix (n_e × n_p); finally, the relations between the parts are extracted into the relation matrix (n_p × n_p).]

Figure 5.2: Training the hierarchical model (first stage of the offline phase)

movements, while different parts move with respect to each other. Assume that there is no prior knowledge about the compound object. Consequently, if one focuses on the model image and disregards the example images, one is unable to determine the rigid object parts, since no information about the movement is available. However, it is possible to decompose the object into small components. In this example, the following components are perceptible: hat, face, two arms, two hands, two legs, two feet, the square margin of the upper body, and six components, one for each letter printed on the upper body. This means that in this example the decomposition is done on the basis of image regions that exhibit a homogeneous gray value. When extending the field of view to the example images, a human tries to match the corresponding components of the model image in the example images.

Figure 5.3: Input data of the artificial example. A rectangular ROI defines the compound object in the model image (a). Six example images show the movements of the respective object parts (b).

Finally, the components that do not move with respect to each other in all example images are unconsciously and immediately merged into rigid object parts by the human brain.

With this knowledge it is possible to model the extraction of rigid object parts as shown in Figure 5.2. At first, the domain of the model image defined by the ROI is decomposed into small components. The resulting n_c components are described by n_c ROIs that refer to the model image. For each component a rigid model is generated using an arbitrary suitable object recognition approach. The pose of each component is then determined by the recognition approach in each example image and stored in the component pose matrix of size n_e × n_c. From the component pose matrix, the components that do not show any relative movement in any of the example images are determined. The rigid object parts can then be extracted by merging the ROIs of the respective components.
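The merging step can be sketched as follows. This is a minimal illustration, not the method used in this work: it assumes 2D poses given as (x, y, θ) tuples, an exact tolerance threshold, and a simple union-find to group components whose relative pose stays constant over all example images.

```python
import math

def relative_pose(p, q):
    """Pose of q expressed in the coordinate frame of p.
    Poses are (x, y, theta) tuples with theta in radians."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    c, s = math.cos(-p[2]), math.sin(-p[2])
    return (c * dx - s * dy, s * dx + c * dy, q[2] - p[2])

def merge_rigid_components(pose_matrix, tol=1e-3):
    """pose_matrix[e][i] = pose of component i in example image e
    (the component pose matrix of size n_e x n_c).  Components whose
    relative pose is constant over all images are merged into one
    rigid part via union-find; returns the index groups."""
    n_c = len(pose_matrix[0])
    parent = list(range(n_c))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(n_c):
        for j in range(i + 1, n_c):
            rels = [relative_pose(row[i], row[j]) for row in pose_matrix]
            ref = rels[0]
            # No relative movement in any example image -> same rigid part.
            if all(max(abs(r[0] - ref[0]), abs(r[1] - ref[1]),
                       abs(r[2] - ref[2])) < tol for r in rels):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n_c):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

In a real system the tolerance must account for the localization accuracy of the recognition approach, so a strict threshold as used here is an oversimplification.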

Also, for the resulting n_p (≤ n_c) object parts, rigid models are generated and used to determine the pose of each part in each example image, analogously to the components. This results in the part pose matrix of size n_e × n_p. Finally, the relations between the parts can be extracted by analyzing the part pose matrix. The relations are stored in the square relation matrix of size n_p × n_p, i.e., for an arbitrary pair of object parts (p, q) the relative movement of part q with respect to part p is stored in row p and column q. The relation matrix together with the ROIs of the extracted model parts represents the output data of the training process.
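A sketch of how the relation matrix can be filled from the part pose matrix is given below. It assumes, for illustration only, that each relation (p, q) is summarized by the axis-aligned bounds of the movement of q relative to p over all example images; the actual representation of the relations is discussed later in this work.

```python
import math

def relative_pose(p, q):
    """Pose of part q in the coordinate frame of part p;
    poses are (x, y, theta) tuples with theta in radians."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    c, s = math.cos(-p[2]), math.sin(-p[2])
    return (c * dx - s * dy, s * dx + c * dy, q[2] - p[2])

def relation_matrix(part_poses):
    """part_poses[e][p] = pose of part p in example image e (the part
    pose matrix of size n_e x n_p).  Entry (p, q) of the returned
    n_p x n_p matrix stores the bounds of the movement of q relative
    to p over all example images; the diagonal stays empty."""
    n_p = len(part_poses[0])
    rel = [[None] * n_p for _ in range(n_p)]
    for p in range(n_p):
        for q in range(n_p):
            if p == q:
                continue
            xs, ys, ts = zip(*(relative_pose(row[p], row[q])
                               for row in part_poses))
            rel[p][q] = {'x': (min(xs), max(xs)),
                         'y': (min(ys), max(ys)),
                         'theta': (min(ts), max(ts))}
    return rel
```

Note that the matrix is not symmetric: the movement of q relative to p is expressed in the frame of p, and vice versa.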

As an example, Figure 5.4 shows the relations of the left arm and the upper body, respectively, to all other object parts, i.e., the two corresponding rows of the relation matrix are visualized. Hence, the relative movements of the object parts with respect to the left arm and the upper body are displayed. For visualization purposes, these movements are projected back into the model image. The object parts are symbolized by their reference points.

The relative positions of the parts' reference points are symbolized by enclosing rectangles, and the relative orientations by circle sectors. A relative orientation of 0° is located at the "3" of a clock's dial, and the center of the dial is visualized at the mean position of the respective part. For example, Figure 5.4(a) shows that the relative movement of the left hand with respect to the left arm is smaller than the relative movements of the other parts. Furthermore, the relative movements with respect to the upper body, displayed in Figure 5.4(b), are on average smaller than the movements with respect to the left arm.

In the second stage of the offline phase, the information trained in the first stage is used to create the hierarchical model. The process is illustrated in the flowchart of Figure 5.5 in a generalized form. The model image and the output data of the training are passed as input data to the process. Because the orientation ranges of the object parts in the search image need not coincide with the orientation ranges used during training, rigid models that cover the desired orientation range are generated again. In order to find an appropriate root part, the rigid models of all parts are analyzed using certain criteria, which will be introduced later in this work. Based on the root part and the relations, an optimum hierarchical search strategy can be found by minimizing the search effort of the online phase. Here, it is assumed that in the online phase the extent of the relative part movements is less than or equal to the extent of the relative part movements represented in the example images. If this assumption fails, the automatically derived relations must be extended manually by appropriate tolerance values. The relation between parts p and q then represents the search effort that must be spent to search part q relative to part p under the assumption that the pose of part p is already known. For example, if the poses of the left arm and of the upper body in the search image are known, it would be more efficient to search the left hand relative to the left arm instead of searching it relative to the upper body (see Figure 5.4). Finally, the hierarchical model comprises the rigid models of all object parts, the relation matrix, and the optimum hierarchical search strategy.

Figure 5.4: Relations visualized for the left arm (a) and the upper body (b) in the model image. The object parts are symbolized by their reference points (small circles). The relations are visualized as rectangles (relative positions) and circle sectors (relative orientations).
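One simple way to grow a search tree that keeps the overall search effort small is a Prim-style greedy construction, sketched below under the assumption that the relation (p, q) has already been condensed into a scalar cost (e.g., the area of the position range times the orientation range). This is only an illustrative heuristic; the actual strategy derivation is introduced later in this work.

```python
def search_tree(root, effort):
    """effort[p][q]: cost of searching part q relative to an already
    found part p.  Starting from the root, repeatedly attach the part
    that is cheapest to search relative to any part already in the
    tree (Prim-style greedy).  Returns a parent map describing the
    hierarchical search strategy."""
    n = len(effort)
    in_tree = {root}
    parent = {root: None}
    while len(in_tree) < n:
        # Cheapest edge from the tree to a part not yet in the tree.
        _, p, q = min((effort[p][q], p, q)
                      for p in in_tree
                      for q in range(n) if q not in in_tree)
        in_tree.add(q)
        parent[q] = p
    return parent
```

With the costs of the artificial example, this greedy rule would, e.g., attach the left hand to the left arm rather than to the upper body, because the corresponding relation is tighter.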

Figure 5.6 visualizes the resulting hierarchical model for the example case. It uses the head as the root part.

Assuming a minimum search effort in the future search image, it searches the upper body relative to the head, the two arms and the two legs relative to the upper body, and the hands and the feet relative to the arms and the legs, respectively. To assess the overall search effort, the relations between the parts that are adjacent in the search tree, i.e., connected by an edge in the tree, are visualized. Thus, during the online phase the reference points of the respective object parts only need to be searched within the small rectangular regions and within the orientation ranges visualized by the circle sectors.

Finally, the process of object recognition is displayed in the flowchart of Figure 5.7. Analogously to the online phase of rigid object recognition, the search image and the model — which now is a hierarchical model — are passed as input data to the algorithm. At first, the rigid model of the root part is selected from the hierarchical model and used to determine the pose of the root part in the search image. Since no prior knowledge about the pose is available, the rigid object recognition approach must search the root part by scanning the full parameter space of positions and orientations.

Once the root part is found, the remaining n_p − 1 parts can be searched within a restricted relative search space.

Thus, for each part q the predecessor part p in the search tree is selected. Assume, for example, that a depth-first search is applied to the search tree presented in Figure 5.8(a). After the pose of the head is determined, the next part to search (i = 2) would be the upper body, i.e., q = "Upper Body". The associated predecessor part in the search tree is the head, and hence p = "Head". The parameter space that must be scanned by the recognition

[Flowchart of Figure 5.5: from the model image and the ROIs of the n_p parts, rigid models of all parts are generated; a root part is selected, and from the rigid models and the relation matrix (n_p × n_p) an optimum hierarchical search strategy is found; the output is the hierarchical model.]

Figure 5.5: Creating the hierarchical model (second stage of the offline phase)

[Figure 5.6(a): the hierarchical model; (b): the search tree with the head as root, the upper body below the head, the left/right arms and left/right legs below the upper body, and the left/right hands and left/right feet below the respective arm and leg.]

Figure 5.6: The hierarchical model comprises the rigid models of all object parts, the relations between the parts, and the hierarchical search strategy (a), which is represented by a hierarchical search tree (b).

approach to search part q is defined by the pose of part p and the relation between the parts p and q. The rigid model of part q is selected from the hierarchical model and used to determine the pose of part q within the restricted parameter space. The whole process is repeated for each part, finally yielding the poses of all object parts. An example search image of size 512×512 and the corresponding found object instance are displayed in Figure 5.8(b). It should be noted that it is not necessary that the absolute orientation of the object in the search image is covered by the example images, since only relative movements between object parts are trained.
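The online loop can be sketched as follows. All structure and function names here are illustrative, not part of the actual implementation: `model` is assumed to provide the search order (root first), a predecessor map, the rigid models, and the relations, and `recognize` stands in for an arbitrary rigid object recognition call restricted to a given parameter space.

```python
def hierarchical_search(image, model, recognize):
    """Sketch of the online phase.  `model` is a dict with keys
    'order' (search order, root first), 'parent' (predecessor in the
    search tree), 'rigid' (rigid models), and 'relation' (the relation
    matrix entries); `recognize(image, rigid_model, space)` returns
    the pose of a part found within the given parameter space."""
    poses = {}
    order = model['order']
    root = order[0]
    # Root part: no prior knowledge, scan the full parameter space.
    poses[root] = recognize(image, model['rigid'][root], 'full')
    for q in order[1:]:
        p = model['parent'][q]   # predecessor in the search tree
        # Parameter space restricted by the pose of p and relation (p, q).
        space = ('restricted', poses[p], model['relation'][(p, q)])
        poses[q] = recognize(image, model['rigid'][q], space)
    return poses
```

The restriction of the parameter space is what makes the hierarchical search fast: only the root part pays the cost of a full scan, while every other part is searched in a small region and orientation range.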

To give an impression of the advantage when using the proposed hierarchical model, the recognition time for this example was 20 ms on a 2 GHz Pentium 4. In contrast, the brute-force method that would search all parts

[Flowchart of Figure 5.7: the rigid model of the root part is selected from the hierarchical model, and the root part is recognized within the full parameter space; then, for i = 2 to n_p, the i-th part q in the search hierarchy and its predecessor p are selected, the restricted parameter space for q is calculated from the pose of p and the relation (p, q), and part q is recognized within this restricted space; the output is the poses of all n_p parts.]

Figure 5.7: Object recognition (online phase)

in the entire search space independently of each other would take 310 ms (using the SBM in both cases). The second obvious advantage should also be pointed out here: because of the inherently determined correspondence provided by the hierarchical model, the returned match of the compound object implicitly constitutes a topologically sound representation. In contrast, when searching the parts independently, it is not immediately possible to distinguish between the matches of the left and the right arm, for example. Furthermore, if several object instances are present in the image, it is hard to assign a match of a certain object part to the correct instance of the compound object.

Although the basic idea of the approach seems very simple, several difficulties arise that are not obvious at first glance. They will be discussed in the following sections together with a detailed explanation of the previously introduced steps.

[Figure 5.8(a): the search tree with the depth-first search order numbered 1 to 10, starting at the head; (b): the search image with the found object instance.]

Figure 5.8: A depth-first search is applied to the search tree. In (a) the search order is indicated by numbers. In (b) an example search image and the corresponding found object instance is displayed. The poses of the individual object parts are visualized by superimposing the edges of the parts at the returned pose in white.