Combining Assembly and Action Detection - Spinoffs from Syntactic Assembly Detection

3. Syntactic Assembly Modeling

3.3. Spinoffs from Syntactic Assembly Detection

3.3.2. Combining Assembly and Action Detection

with a flag that indicates whether the corresponding bolt is inserted from above (a) or below (b). These flags are defined intrinsically: each instance of a bar has an attribute to register the temporal order in which the bar’s mating features were considered during cluster analysis. The bolt in the first of these holes is assumed to be above the bar.

Other bolts associated with that bar are above as well if they are on the same side of its principal axis. Those on the other side are said to be below. The order of the slots of bars and cubes is also defined intrinsically. Holes of a cube, for example, are listed in a clockwise manner: the hole that was considered first during analysis will always appear as the leftmost slot, the one intrinsically to its left is represented by the second slot, its vis-a-vis corresponds to the third slot, and the hole on its right appears last in the list.

Further details on this labeling scheme can be found in [13].

Plans like this characterize how the elementary assembly objects are connected to each other. Syntactic component structures augmented with information about mating feature relations thus yield topologically unique descriptions of an assembly¹². High-level plans derived from vision may therefore be translated into appropriate natural language or manipulator instructions in subsequent steps of processing.

Pick X

Preconditions: hand: empty ∧ X on table ∧ X → disappears
Effects: hand: X ∧ ¬(X on table)

Connect X Y

Preconditions: hand₁: X (BOLT-PART) ∧ hand₂: Y (MISC-PART) ∨ (NUT-PART)
Effects: hand₁: XY ∧ hand₂: empty

Place X

Preconditions: hand: X ∧ X → appears
Effects: hand: empty ∧ X on table

Figure 3.20.: A simple set of rules to infer assembly actions from the observation of appearing or disappearing objects in the assembly cell. The rules assume two-handed assembly; the notation hand₁ and hand₂ indicates that there are two distinct hands but does not denote a specific hand or manipulator.

the constructor, mating operations cannot be observed directly. The approach to action detection therefore has to be indirect and simple. It reuses results from the object recognition module and infers assembly actions from symbolic information: analyzing whether and which objects appeared in or disappeared from the scene makes it possible to deduce what has been constructed. The major advantage of this indirect approach is its ability to detect operations independently of whether they are carried out by a human or a robot.

The basic assumption behind all inferences made by our algorithm is that complex objects are constructed from a sequence of two-handed assembly tasks. Two-handed assembly characterizes a mode of construction where there are two hands or manipulators, each of which can handle one part at a time. According to this assumption, two hands are modeled, each of which may hold at most a single object or a single (partial) assembly. Given these models, a collection of rules is used to hypothesize actions from changing contents of the scene. On this level of abstraction, rules to infer Pick, Place, and Connect actions can be defined; they are shown in Fig. 3.20.

Adopting the notation from planning literature [92], each rule itemizes a couple of conditions or events and an assembly operation that can be concluded from them. If the scene changes and the requirements of a certain rule are met, the corresponding action is hypothesized and the models are updated accordingly. While Pick and Place actions are directly observable, Connect operations can only be deduced from temporal context:

if two objects have been taken, there is no hand left to take another one. Thus if another object disappears, the ones in the hands must have been connected (which of course requires that they provide appropriate mating features) so that a hand was free for

Figure 3.21.: Images depicting different stages of a construction process. Note that due to perspective occlusion the purple ring was not recognized in the right image. This caused the assembly detection procedure to yield an erroneous structure.

taking. A minor drawback of indirect inferences like this is that they require a certain level of cooperation from subjects acting in the scene. The following restrictions should be observed during construction in order to avoid confusion:

• Each hand must only hold a single object or a single (partial) assembly.

• Objects must only be put down within the field of view of the system.

• New objects must only be introduced to the scene while no identical objects are in the hands of the constructor.
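The rules of Fig. 3.20 and the temporal-context argument given above can be illustrated with a minimal sketch. It is not the implementation used in the system; the names `TwoHandModel`, `Held`, `disappears`, and `appears` are hypothetical. The sketch further assumes that the object recognition module supplies a label and a part category for every object that appears or disappears, and that a partial assembly built on a bolt still counts as a BOLT-PART.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Part categories named in Fig. 3.20.
BOLT, MISC, NUT = "BOLT-PART", "MISC-PART", "NUT-PART"


@dataclass(frozen=True)
class Held:
    """Content of one hand: a single part or a single (partial) assembly."""
    label: str
    kind: str


@dataclass
class TwoHandModel:
    """Hypothetical two-hand model that hypothesizes actions from scene changes."""
    hands: List[Optional[Held]] = field(default_factory=lambda: [None, None])
    actions: List[str] = field(default_factory=list)

    def _free_hand(self) -> Optional[int]:
        for i, held in enumerate(self.hands):
            if held is None:
                return i
        return None

    def _connect(self) -> None:
        # Connect rule: one hand holds a BOLT-PART, the other a MISC- or NUT-PART.
        first, second = self.hands
        if first.kind != BOLT:
            first, second = second, first
        if first.kind != BOLT or second.kind not in (MISC, NUT):
            raise ValueError("held objects provide no appropriate mating features")
        self.actions.append(f"Connect {first.label} {second.label}")
        # Assumption: the result remains a BOLT-PART that accepts further parts.
        self.hands = [Held(f"{first.label}-{second.label}", BOLT), None]

    def disappears(self, label: str, kind: str) -> None:
        """An object vanished from the table, so it must have been picked."""
        if self._free_hand() is None:
            # Both hands were full: the held objects must have been connected.
            self._connect()
        self.actions.append(f"Pick {label}")
        self.hands[self._free_hand()] = Held(label, kind)

    def appears(self, label: str) -> None:
        """An object showed up on the table; if a hand held it, it was placed."""
        for i, held in enumerate(self.hands):
            if held is not None and held.label == label:
                self.actions.append(f"Place {label}")
                self.hands[i] = None
                return


model = TwoHandModel()
model.disappears("BOLT", BOLT)
model.disappears("RING", MISC)
model.disappears("BAR", MISC)   # no free hand: a Connect is hypothesized first
print(model.actions)
```

The third disappearance is the case where a Connect can only be deduced indirectly: both hand models are occupied, so the sketch first merges their contents and only then records the Pick.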

Generally, after a sequence of assembly tasks was detected, one of the hand models will contain a set of connected objects. But from Fig. 3.20 we see that Connect operations require one of the hands to hold a BOLT-PART. Observing a series of Pick and Connect operations thus means to observe sequential attachments of parts to a bolt. Therefore, the set of interconnected objects must be partially ordered. And since its structure was derived from symbolic information that usually does not suffer from perspective occlusion, it provides a reliable description of the assembly whose construction was observed. However, details of mating relations cannot be extracted from symbolically detected Pick and Connect events.
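Why sequential attachments to bolts yield a partial rather than a total order can be made concrete with a small sketch. The helper `order_pairs` and the flat list representation of a bolt's attachment sequence are assumptions made for illustration only; the part names follow the example of Fig. 3.21.

```python
from itertools import combinations


def order_pairs(chain):
    """All pairs (x, y) such that x was attached before y on the same bolt."""
    # combinations() keeps the input order, so x always precedes y in the chain.
    return set(combinations(chain, 2))


# Hypothetical attachment sequences, one per bolt, as observed from Connect events.
orange = ["BOLT_o", "RING", "BAR", "CUBE"]
blue = ["BOLT_b", "CUBE"]

relation = order_pairs(orange) | order_pairs(blue)

print(("RING", "BAR") in relation)     # ordered: both sit on the orange bolt
print(("RING", "BOLT_b") in relation)  # not ordered: different bolts
```

Parts attached to the same bolt are comparable, while parts that only occur on different bolts are not related in either direction.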

As mentioned above, syntactic assembly detection yields connection details, but perspective occlusion may cause structures to be incomplete. Fusing structures from action and assembly detection can cope with this problem and may provide better results than obtainable from the individual algorithms. In order to integrate both approaches we consider the partially ordered sets they yield. For example, after the assembly depicted on the right of Fig. 3.21 was put into the scene, the purple ring was not recognized and syntactic cluster analysis resulted in:

BOLT_b ⊑_cb CUBE and BOLT_o ⊑_co BAR ⊑_co CUBE

whereas analyzing its construction process yielded:

BOLT_o ⊑_co RING ⊑_co BAR ⊑_co {BOLT_b ⊑_cb CUBE}.

where ⊑_cb and ⊑_co denote the ordering relations due to the blue and the orange bolt, respectively. Note that action detection yielded relations between elementary objects and a set of objects which represents the assembly depicted on the left of Fig. 3.21. This is in fact reasonable because the assembly was taken during the construction process and thus has to appear in one of the hands. It is obvious from this example that simple set comparisons will indicate whether small objects are missing in the result from visual assembly detection but are contained in the structure generated by action detection.

They thus can easily be inserted into the syntactic structure.
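The set comparison and the subsequent insertion can be sketched as follows. This is an illustration under simplifying assumptions, not the procedure used in the system: the function `fuse` is hypothetical, each structure is reduced to the flat sequence of parts on the orange bolt, and the sub-assembly taken during construction is represented by the part that is mated with that bolt.

```python
def fuse(visual, action):
    """Insert parts known only from action detection into the visual sequence.

    Both arguments are lists of part names ordered along one bolt.
    Returns the fused sequence and the list of parts that were missing.
    """
    seen = set(visual)
    missing = [part for part in action if part not in seen]
    fused = list(visual)
    for part in missing:
        # Place the part right after its nearest predecessor from the
        # action sequence that is already present in the fused sequence.
        position = 0
        for previous in reversed(action[:action.index(part)]):
            if previous in fused:
                position = fused.index(previous) + 1
                break
        fused.insert(position, part)
    return fused, missing


visual = ["BOLT_o", "BAR", "CUBE"]           # ring lost to occlusion
action = ["BOLT_o", "RING", "BAR", "CUBE"]   # order observed during construction

fused, missing = fuse(visual, action)
print(missing)
print(fused)
```

The visual sequence serves as the base so that any mating details attached to its entries are kept; only the position of a missing part is taken from the action sequence.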

In the document A Structural Framework for Assembly Modeling and Recognition (pages 66-69)