

4.3. Matching Mating Feature Graphs

4.3.2. Assembly Reconstruction

Our approach provides a context-free grammatical model of the general component structure of composite objects. As one would expect from a grammatical approach to pattern recognition, this model allows detecting a huge variety of structures based on a small set of productions and does not require knowledge of individual assemblies. Therefore, though the modeling effort is low, the number of structures that can be treated is nearly unrestricted.

Syntactic analysis according to the assembly grammar can be performed by a semantic network. This allows registering how the objects of a cluster are situated relative to each other and thus enables determining mating relations among assembly parts from image analysis. This information can be translated into a graph-based representation that captures the assembly topology in detail. That is, comprehensive topological models of individual assemblies can be derived automatically and do not have to be provided by an expert.

After a mating feature graph has been derived from a plan, it can be matched against a database of previously calculated graphs. If no match is possible, the graph is assumed to represent a yet unknown assembly and is inserted into the database of models. This, in fact, realizes a learning vision system that autonomously extends its knowledge base.
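The following Python sketch illustrates such a matching-and-insertion loop. The function signature, the distance threshold, and subgraph_distance itself are hypothetical stand-ins for the error-tolerant graph matching employed for recognition:

def recognize_or_learn(new_graph, database, subgraph_distance, threshold=0.0):
    """Match a mating feature graph against known models; insert it
    as a new model if no sufficiently close match is found."""
    best_model, best_dist = None, float("inf")
    for model in database:
        dist = subgraph_distance(new_graph, model)
        if dist < best_dist:
            best_model, best_dist = model, dist
    if best_model is not None and best_dist <= threshold:
        return best_model              # assembly is already known
    database.append(new_graph)         # autonomously extend the knowledge base
    return new_graph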

As of now, there is no mechanism to automatically assign symbolic information to the mating feature graphs contained in a knowledge base. If a modeled assembly depicts a real-world object like a tailplane-fin or a locomotive, a corresponding label must be annotated to the graphs manually. But as the SFB 360 is dedicated to cooperative multi-modal man-machine dialog and interaction, solutions to this problem are foreseeable. Just recently, there has been considerable progress in integrated speech and vision processing [133] as well as in system architectures for intelligent man-machine interaction [7]. Concluding from the present achievements, we thus presume that it will soon be possible to automatically ground and learn the meaning of a mating feature graph during a sequence of cooperative assembly tasks.

Figure 4.8.: An example of two geometrically different assemblies of the same topology: (a) and (b) show the two assemblies, (c) the mating feature graph both correspond to.

The assemblies in Figs. 4.8(a) and 4.8(b) obviously consist of exactly the same parts and show the same attachments. Thus, both correspond to the mating feature graph shown in Fig. 4.8(c), and assembly recognition based on matching feature graphs will not be able to distinguish them. A distinction would require knowledge of the three-dimensional assembly structure, which is not available from the techniques we discussed so far. Moreover, geometric information like this will generally not be obtainable from a single image but will at least demand stereo vision.

If a scene is observed by a pair of cameras whose optical parameters and mutual location and orientation are known or can be estimated, the spatial position of entities visible in both image signals can be determined (cf. [34]). Mathematically, depth estimation is not too demanding a task: calculating the spatial position of an entity is not difficult; the problem is to provide the data the calculation relies on. In its simplest form, this data is given as a set of pairs of image coordinates $(\mathbf{x}, \mathbf{x}')$ where $\mathbf{x}$ is the coordinate of a pixel in one image, $\mathbf{x}'$ is the coordinate of a pixel in the other image, and both depict the same aspect of the scene. Thus, if the content of two images of a scene is to be reconstructed three-dimensionally, the signals must be searched for corresponding features. And this has proven to be a challenging problem.

Stereo Correspondences from Graph Matching

Usually, correspondences are searched for in the raw or slightly preprocessed image data, and a large variety of methods to do so is known (cf. [34, 120]). Our situation, however, is different, for we are dealing with symbolic information. The data we use for assembly recognition are labeled image regions depicting elementary objects of the baufix® construction kit. As those regions are characterized by a single color and a description of their boundaries instead of a collection of individually colored pixels, no backward inference to the original signal is possible and the usual methods to find correspondences cannot be applied.

But in Chapter 3.3.1 we explained how to compute the image coordinates of the mating features of objects comprised in an assembly. In this chapter, we introduced the idea of modeling assemblies by means of graphs representing relations among mating features and of matching those graphs to accomplish recognition. Thus, there is an obvious approach to finding pairs of corresponding image coordinates from symbolic information: we just have to extend the vertex labels of mating feature graphs such that they not only characterize a feature's type and the type and color of the object it belongs to but also its image coordinates. Instead of the vertex labeling function µ presented on page 78, we will henceforth consider a function µ′ with

$\mu' : L_V \rightarrow T \times O \times C \times \mathbb{N}^2.$
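Rendered as a data structure, the extended vertex label is just the old triple of feature type, object type, and color, augmented by a pixel coordinate. The following Python dataclass is an illustrative sketch, not the actual representation used in the implementation:

from dataclasses import dataclass

@dataclass(frozen=True)
class VertexLabel:
    feature_type: str        # element of T, e.g. "HOLE" or "HEAD"
    object_type: str         # element of O, e.g. "BAR" or "BOLT"
    color: str               # element of C, e.g. "RED"
    coord: tuple[int, int]   # pixel coordinate (x, y), the new component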

This extension of course requires integrating feature coordinates into the high-level assembly plans from which mating feature graphs are derived. And we should verify whether this violates the relation to semantics theory we found earlier. Fortunately, the former is not a problem and the latter is not the case. Feature coordinates are available from image analysis by means of an Ernest network and can be introduced into the plan generating grammar such that it remains context free. We simply introduce a new variable and production

coord = “(” num “,” num “)”

and replace productions like, for instance,

holes = ( “()” | “(BOLT” num “)” ) {holes}

by productions that additionally derive the new variable:

holes = ( “()” | “(BOLT” num “)” ) coord {holes}.

As the necessary modification of the semantics function is straightforward, we will not discuss it further.
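To illustrate, a holes expression derived with the modified production might then read as follows, where the concrete numbers are hypothetical pixel coordinates:

(BOLT 1)(132,87)()(147,85)()(162,84)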

Thus, given a graph representation where vertices also describe the image coordinates of a feature, graphs derived from stereo images as shown in Fig. 4.9 can be matched in order to obtain pairs of corresponding coordinates. However, we must bear in mind that corresponding features stem from different images, so their image coordinates will hardly match.

Figure 4.9.: Stereo images of an assembly with the calculated positions of mating features cast into them.


Figure 4.10.: Two possible isomorphisms between the mating feature graphs derived from the images in Fig. 4.9. In both cases the subgraph distance vanishes.

If not cared for, this will cause the GUB to perform numerous label substitutions while searching for a subgraph isomorphism. Remembering that the costs of label substitutions are given by

$C_S = \sum_{i=1}^{n} w_i \, d(l_i^1, l_i^2),$

we can filter out coordinates from the computation of isomorphisms by simply setting their weights to 0.0.
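A minimal sketch of this weighted cost in Python, assuming vertex labels are 4-tuples as above and using a simple 0/1 distance between label components for illustration:

def substitution_cost(label1, label2, weights):
    """Weighted cost of substituting vertex label label1 by label2.
    Each label is a tuple (feature_type, object_type, color, coord);
    weights holds one weight per label component. Assigning weight 0.0
    to the coordinate component excludes image coordinates from the
    computation of isomorphisms."""
    return sum(w * (0.0 if a == b else 1.0)
               for w, a, b in zip(weights, label1, label2))

# Coordinates do not contribute to the matching costs:
weights = (1.0, 1.0, 1.0, 0.0)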

Finding the Right Match

Another issue to care about is visualized in Fig. 4.10: if an assembly is axially symmetric in its parts and attachments, it will have a symmetric mating feature graph Γ. As a consequence, there will be several automorphisms, i.e. isomorphisms from Γ to itself, which do not require any edit operations. Thus, they do not incur costs and the GUB will not be able to rank them.

From a topological point of view, the assembly shown in Fig. 4.9 is axially symmetric, and its mating feature graph has two automorphisms, which are shown in Fig. 4.10. If we were dealing with assembly recognition as presented in the previous section, this would not matter, because detecting any match of low subgraph distance would accomplish the task. Here, however, we are interested in 3D reconstruction based on corresponding image coordinates. And since the automorphism in Fig. 4.10(b) will produce pairs of incongruent coordinates, only the one in Fig. 4.10(a) is acceptable.

Of course, it would be desirable if there were a mechanism to automatically decide which of several sets of matched coordinates should be used for depth estimation.

And indeed, assuming two pinhole cameras in a conventional stereo geometry, where the optical axes are parallel and at the same height and the focal lengths are identical, a simple heuristic can support the decision. Given two corresponding pixels $\mathbf{x}^l = (x_l, y_l)$ and $\mathbf{x}^r = (x_r, y_r)$ in the left and right image, respectively, their disparity is defined as

$d = x_r - x_l.$

With $f$ denoting the focal length of the cameras and $b$ the horizontal distance between their optical centers, the depth $z_w$ of the real-world point $\mathbf{x}_w$ depicted by $\mathbf{x}^l$ and $\mathbf{x}^r$ results from the identity

$d = \frac{f}{z_w}\, b \quad \Leftrightarrow \quad z_w = \frac{f}{d}\, b.$
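In code, this depth estimate is a one-liner; the following sketch assumes the idealized parallel-camera geometry described above:

def depth_from_disparity(x_left, x_right, focal_length, baseline):
    """Depth z_w = (f / d) * b for a pair of corresponding pixels in an
    ideal parallel stereo geometry; x_left and x_right are the
    x-coordinates of the pixels, d = x_right - x_left their disparity."""
    disparity = x_right - x_left
    if disparity == 0:
        raise ValueError("zero disparity: point at infinity")
    return focal_length * baseline / disparity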

Now, if two points $\mathbf{x}_w$ and $\mathbf{x}'_w$ have approximately the same depth $z_w \approx z'_w$, the corresponding disparities will be approximately equal, too. This justifies the following approach. As the spatial extension of baufix® assemblies is rather small, all their mating features will have approximately the same distance to a pair of stereo cameras. Moreover, as indicated in Fig. 4.10(a), stereo imagery basically means a constant horizontal translation of coordinates. Therefore, given a set of $N$ corresponding feature coordinates $(\mathbf{x}^l_i, \mathbf{x}^r_i)$, the disparities, or more generally the distances $d_i = \| \mathbf{x}^l_i - \mathbf{x}^r_i \|$, should be more or less the same. From Fig. 4.10(b), in contrast, we can deduce that pairs of incongruent coordinates will accord with a reflection at a center of symmetry.

Therefore, the distance between two wrongly associated pixels will depend on their distance to that center. Hence, if we compute the mean distance $\mu = \frac{1}{N} \sum_{i=1}^{N} d_i$ among a set of corresponding coordinates, the standard deviation

$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (d_i - \mu)^2}$

should be small for the acceptable automorphism of a mating feature graph and higher for an unacceptable one.
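A compact Python rendering of this selection heuristic; the representation of a match as a list of coordinate pairs is an assumption for illustration:

from math import hypot
from statistics import pstdev

def match_scatter(pairs):
    """Standard deviation of the distances d_i = ||x_l_i - x_r_i|| over
    all matched coordinate pairs ((xl, yl), (xr, yr)) of one match."""
    distances = [hypot(xl - xr, yl - yr) for (xl, yl), (xr, yr) in pairs]
    return pstdev(distances)

def best_match(candidate_matches):
    """Among several automorphisms, pick the one whose distances scatter
    least, i.e. the one most consistent with a constant translation."""
    return min(candidate_matches, key=match_scatter)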

Figure 4.11.: Virtual reconstruction of the assembly in Fig. 4.9.

Of course, there are no ideal pinhole cameras in our experiments. Yet practical experience has proven the suitability of this heuristic: for the automorphism in Fig. 4.10(a), for instance, a standard deviation of σ = 16.75 was computed, while the one in Fig. 4.10(b) yielded σ = 159.56.

Reconstruction Accuracy

It must be mentioned that the pixel coordinates of mating features used for 3D reconstruction are of limited accuracy. As explained in Chapter 3.3.1, they are coarsely estimated during syntactic analysis. In Fig. 4.9, for instance, we can see that most of the calculated positions of the bar holes do not coincide with the centers of the holes. Their degree of accuracy is sufficient to determine details of object attachments, but it certainly will never yield precise spatial positions of mating features. The coordinates resulting from depth estimation must rather be considered rough estimates of the actual spatial locations of mating features.

Nevertheless, these rough estimates are sufficiently accurate to create virtual prototypes of visually analyzed assemblies. Figure 4.11 displays the 3D model that was derived from the images in Fig. 4.9. It was created using a CAD tool for virtual construction which was developed within the SFB 360 [64].