3.3.3. Application of Syntactic Context

whereas analyzing its construction process yielded:

\[
\mathrm{BOLT}_o \prec_{c_o} \mathrm{RING} \prec_{c_o} \mathrm{BAR} \prec_{c_o} \{\mathrm{BOLT}_b \prec_{c_b} \mathrm{CUBE}\}.
\]

where $\prec_{c_b}$ and $\prec_{c_o}$ denote the ordering relations due to the blue and the orange bolt, respectively. Note that action detection yielded relations between elementary objects and a set of objects which represents the assembly depicted on the left of Fig. 3.21. This in fact is reasonable because the assembly was taken during the construction process and thus has to appear in one of the hands. It is obvious from this example that simple set comparisons will indicate whether tiny objects are missing from the result of visual assembly detection but are contained in the structure generated by action detection. They can thus easily be inserted into the syntactic structure.
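The comparison itself is straightforward. The following is a minimal sketch in Python; the part names and the helper function are illustrative, not taken from the system described here:

```python
# Minimal sketch: compare the object set found by visual assembly detection
# with the object set implied by action detection. Names are illustrative.

def missing_parts(visual_parts: set, action_parts: set) -> set:
    """Objects that action detection observed but vision failed to find."""
    return action_parts - visual_parts

# Hypothetical data for an example like Fig. 3.21: vision missed the ring.
visual = {"BOLT_o", "BAR", "BOLT_b", "CUBE"}
action = {"BOLT_o", "RING", "BAR", "BOLT_b", "CUBE"}

print(missing_parts(visual, action))  # {'RING'} -> insert into the structure
```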

[Figure 3.22 block diagram; recoverable labels: color regions, grouped contours, classified focus points, classified image regions, cue integration for elementary object recognition, object hypotheses, object descriptions, syntactic assembly detection, degree of assemblage.]

Figure 3.22.: Detail of a computer vision architecture that combines several processing units into a robust and reliable recognition system for baufix® objects.


Figure 3.23.: 3.23(a) Assembly and results from elementary object recognition where the lower bar could not be recognized. 3.23(b) Results from syntactic assembly detection. 3.23(c) Hypothesis generated during the parsing process.

The first image in Fig. 3.23 shows an assembly and the corresponding results from elementary object recognition. Note that there are some shadowy regions which were not classifiable and thus are labeled unknown. Apart from these, there is another unknown region which in fact represents a bar but could not be recognized as such.

The second image shows the assembly detection result. Syntactic cluster analysis started at the upper bolt, continued with the adjacent bar and then found the cube.

Consequently, an ASSEMBLY was instantiated. Cluster parsing restarted at the lower bolt, which was instantiated as the BOLT-PART of another assembly.

Figure 3.24.: Assembly with detection result and angular range where the parser would expect a bolt from contextual conditions.

At this stage of the analysis, there were two regions directly adjacent to the bolt, which left two alternatives for continuing the parsing. But since one of the regions is represented by a black polygon, it was correctly assumed to depict shadow and was hence neglected. The remaining region, however, does not have an object label either and thus could not be instantiated as an assembly component.

Nevertheless, in situations like this, labels for unknown cluster elements can be guessed from the current state of the analysis. One only has to consider what kind of part was instantiated last and which parts would accord with the syntactic assembly model if they were instantiated next. If, as in the example, a BOLT-PART was instantiated last, one would expect to find a miscellaneous object or a nut subsequently rather than another bolt. Such expectations can be generated for any partial assembly structure, i.e. for any case where some but not all parts of an assembly are instantiated.

Therefore, it is generally possible to restrict the type of unknown cluster objects and to provide plausible cues for elementary object recognition.
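How such expectations might be tabulated is sketched below; the next-part table is a plausible simplification of the assembly grammar, not the actual semantic net used by the system:

```python
# Sketch of grammar-based expectations: given the part instantiated last,
# list the part types the syntactic assembly model would accept next.
# The table is a simplified stand-in for the actual assembly grammar.

EXPECTED_NEXT = {
    "BOLT-PART": {"MISC-PART", "NUT-PART"},  # after a bolt: misc objects or a nut
    "MISC-PART": {"MISC-PART", "NUT-PART"},  # more misc parts or the closing nut
    "NUT-PART": set(),                       # a nut completes the connection
}

def expected_types(last_part: str) -> set:
    """Part types an unknown cluster element may represent next."""
    return EXPECTED_NEXT.get(last_part, set())

print(expected_types("BOLT-PART"))  # {'MISC-PART', 'NUT-PART'}
```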

In the example, the unknown region is of wooden color and has a certain spatial extent. This narrowed the choice of possible labels even further, and the region was hypothesized to represent a bar (see Fig. 3.23(c)). However, hypotheses are not integrated into structural descriptions since they may still be wrong. Instead, they are transferred back to the module for elementary object recognition which, in the next processing cycle, integrates them with cues from lower-level modules.
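Continuing the sketch, simple region attributes could narrow the grammar-based expectation to a concrete label before it is fed back as a cue; the attribute test below is hypothetical:

```python
# Hypothetical continuation: narrow grammar-based expectations using simple
# region attributes, then hand the surviving label back to elementary
# object recognition as a cue for the next processing cycle.

def narrow_by_attributes(candidates: set, color: str, elongated: bool) -> set:
    """Keep only labels compatible with the region's appearance."""
    hypotheses = set()
    for label in candidates:
        if label == "MISC-PART" and color == "wooden" and elongated:
            hypotheses.add("BAR")  # wooden and elongated -> probably a bar
        elif label == "NUT-PART" and color != "wooden":
            hypotheses.add("NUT")
    return hypotheses

cues = narrow_by_attributes({"MISC-PART", "NUT-PART"}, "wooden", True)
print(cues)  # {'BAR'} -- returned as a cue, not written into the structure
```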

Inference from syntactic context not only allows generating hypotheses for unknown cluster objects but also enables predicting objects which cannot be seen at all.

Figure 3.24 shows an assembly we already know from the introduction. Here, the image also contains a polygon surrounding the area corresponding to the most comprehensive assembly structure that could be found. Note that the rhomb-nut, the felly, and the tyre are not part of this area. They were not instantiated as assembly components because the bolt they are attached to is not visible.

Figure 3.25.: Two competing recognition results (upper row: object recognition; lower row: assembly detection; in the alternative on the right, the red cube was wrongly classified as a felly). Assembly detection yielded a degree of assemblage of 1.0 for the left and of 0.3125 for the right alternative.

However, during the analysis the rhomb-nut was chosen as a starting point for parsing¹³. In that parsing attempt, the linear arrangement of the nut, the adjacent bar, the felly, and the tyre hinted that these elements are very likely to form an assembly if only there was a bolt. The black triangle indicates the admissible angular range in the image where the semantic net would expect and accept the bolt.

Expectations like this can actually be used to coordinate perceptions in cooperative man-machine interaction. If the assembly detection module is integrated into a system for multi-modal cooperation in assembly (cf. [7]) and has to process a scene like the one exemplified here, it may verbalize its expectations. Or it may request the user to slightly move the assembly in question so that a complete structure can be generated.

¹³Usually, bolts are preferred as starting points for parsing. But if there are unexamined cluster objects and none of them is a bolt, other obligatory parts are chosen instead. Being a nut, the rhomb-nut met this criterion.
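The geometric side of such an expectation can be pictured as a simple angular test; the axis and tolerance values below are made up for illustration:

```python
# Sketch: a hypothesized bolt is accepted only if its direction lies within
# an admissible angular range about the axis of the linear arrangement
# (the black triangle in Fig. 3.24). Axis and tolerance are invented here.

def within_expected_range(axis_deg: float, candidate_deg: float,
                          tolerance_deg: float = 20.0) -> bool:
    """True if the candidate deviates at most tolerance_deg from the axis."""
    diff = (candidate_deg - axis_deg + 180.0) % 360.0 - 180.0  # signed difference
    return abs(diff) <= tolerance_deg

print(within_expected_range(axis_deg=90.0, candidate_deg=100.0))  # True
print(within_expected_range(axis_deg=90.0, candidate_deg=150.0))  # False
```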

Assessing Competing Object Recognition Results

Figure 3.25 exemplifies the second type of interaction between elementary object recognition and assembly detection. The upper row shows two competing object recognition results which are due to inconsistent cues from lower-level vision. On the right, the red cube was wrongly classified as a felly, a mistake which occurs frequently since features of cubes and fellies computed from region-segmented images are in fact very similar. However, if there are competing interpretations of regions contained in a cluster, syntactic assembly detection can provide evidence as to which alternative might be correct.

As we see in the lower row of Fig. 3.25, assembly detection yielded a structure including all cluster elements for the alternative on the left. The recognition error in the right alternative, in contrast, caused the detection of two disjoint assemblies and furthermore prevented some objects of the cluster from being associated with an assembly. Under the assumption that a cluster of labeled regions very likely depicts a single assembly, phenomena like this indicate that there are misclassified regions.

However, a cluster need not depict an assembly at all. Hence, a binary decision between right and wrong should be avoided; instead, a measure of the likelihood of a labeling scheme is preferable. Therefore, we introduce the degree of assemblage of an object cluster.

Definition: Let $C$ be a cluster of objects and let $|C|$ be the number of its components. Let $a(C) \leq |C|$ denote the number of objects in $C$ which are part of an assembly and let $A(C)$ be the number of disjoint assemblies found in $C$. Then the degree of assemblage $\alpha(C)$ of the object cluster $C$ is given by the quotient:

\[
\alpha(C) = \frac{a(C)/|C|}{A(C)}
\]

Thus, α sets the number of disjoint assemblies detected in a cluster against the number of objects found to be part of an assembly and the total number of cluster elements. If there are few distinct assemblies and the ratio of assembled objects to cluster elements is high, i.e. $a(C) \approx |C|$, then α will be high and the input data for assembly detection is likely to be correct. If the number of disjoint assemblies found in a cluster is high and only few of its objects are assembled, i.e. $a(C) \ll |C|$, then the degree of assemblage will be low and the results from elementary object recognition do not seem to be reliable. In our example, $|C_l| = |C_r| = 8$ for both alternatives. For the (correct) left alternative we have $a(C_l) = 8$ and $A(C_l) = 1$, while assembly detection based on the (faulty) right result yields $a(C_r) = 5$ and $A(C_r) = 2$. Thus $\alpha(C_l) = \frac{8/8}{1} = 1$ and $\alpha(C_r) = \frac{5/8}{2} = \frac{5}{16} = 0.3125$, which hints that the result depicted on the upper left of Fig. 3.25 should be preferred.
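The definition translates directly into code; the following reproduces the two values from Fig. 3.25 (only the function name is invented):

```python
from fractions import Fraction

# Degree of assemblage as defined above: alpha(C) = (a(C)/|C|) / A(C).

def degree_of_assemblage(num_objects: int, num_assembled: int,
                         num_assemblies: int) -> Fraction:
    """alpha(C) for a cluster of |C| objects, a(C) assembled, A(C) assemblies."""
    if num_assemblies == 0:
        return Fraction(0)  # no assembly was found at all
    return Fraction(num_assembled, num_objects) / num_assemblies

# The two alternatives of Fig. 3.25:
print(degree_of_assemblage(8, 8, 1))         # 1      -> correct labeling
print(float(degree_of_assemblage(8, 5, 2)))  # 0.3125 -> faulty labeling
```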

Especially in cases of severe perspective occlusion, the recognition of elementary objects gains from this scheme of module interaction. On a test set of 26 images of assemblies where many components were only partially visible, the recognition accuracy could be improved by 10% [112]. As we will see in Chapter 6, this of course affects the performance in assembly detection. For now, we can conclude that the integration of lower-level image processing and higher-level analysis of symbolic information suggests an avenue towards more robust and reliable recognition in computer vision.