
2.2 The visual primitives

2.2.4 Association of visual modalities

Because the interest points are extracted on edges, the symbolic descriptor is designed to describe a local edge in the image. The local phase and orientation of the edge are provided by the monogenic signal and interpolated at the interest point's sub–pixel location. The colour is sampled on both sides of the line, as explained in section 2.2.4. The local optical flow is also sampled at this location — see section 2.2.4.

4 Note that for the quality of the process it is important that all positions are computed with sub–pixel accuracy already at this stage.

5 Note that the criterion 'local maxima' that is applicable to i2D structures cannot be applied here, since edge-like structures form a ridge in the local amplitude surface (see Fig. 2.5).

Figure 2.6: Extraction of redundant primitives due to the slant in the amplitude surface. (a) two interest points are correctly extracted; (b) because of the mild decay of the amplitude curve, the edge provokes the extraction of a distant, erroneous interest point (the amplitude of the response at this point is still above the given threshold t).

The resulting symbolic descriptor, called a 2D–primitive in this thesis, is described in section 2.2.4.

Colour

In order to represent the colour structure of the edge accurately, the colour information held by a 2D–primitive is made of several components. We have seen that, depending on the phase, a 2D–primitive may express a step–edge or a line–like structure. Consequently, the colour information is defined relative to the phase: if π/4 ≤ |ω| < 3π/4 (indicating an edge between two surfaces), the colour is sampled on the left and right sides of the central line, c = (c_l, c_r). Otherwise, the phase indicates a line and the colour is sampled not only on the left and right sides but also in the middle, encoding the colour of the line itself, c = (c_l, c_m, c_r).
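A minimal sketch of this phase-dependent colour encoding is given below; the function name and the way the colour samples are passed in are illustrative assumptions, not the implementation used in this work.

```python
import math

def colour_descriptor(omega, c_left, c_middle, c_right):
    """Assemble the colour part of a 2D-primitive from samples taken
    around the central line, depending on the local phase omega.

    omega    : local phase in [-pi, +pi[
    c_left   : colour sampled on the left side of the line
    c_middle : colour sampled on the line itself
    c_right  : colour sampled on the right side of the line
    """
    if math.pi / 4 <= abs(omega) < 3 * math.pi / 4:
        # Step-edge between two surfaces: only the two sides are meaningful.
        return (c_left, c_right)
    # Line-like structure: also keep the colour of the line itself.
    return (c_left, c_middle, c_right)
```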

The RGB colour space has the advantage of being readily available from most image formats. Yet, it involves a non–intuitive representation of colour: a fully saturated red has coordinates R = (1, 0, 0), but a fully saturated yellow Y = (1, 1, 0). For this reason, we will use the HSI colour space for encoding the colour modality and for computing distances in this colour space.

Admittedly, the HSI colour space is not perceptually uniform (see (Sangwine and Horne, 1998)), unlike more sophisticated spaces such as Munsell (also called HVC). On the other hand, the conversion from RGB to Munsell is non-trivial, requiring either correspondence tables (hence losing accuracy) or heavy conversion operations. For this reason we content ourselves with the HSI colour space in this work (note also that the performance of the colour modality in the algorithms described is consistently good).
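For reference, a common RGB-to-HSI conversion can be sketched as follows. This is the Gonzalez–Woods style formulation; the exact variant used in this work is not restated here, so the formula below is an illustrative assumption.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert normalised RGB values (each in [0, 1]) to HSI.

    Returns (h, s, i) with the hue h in radians in [0, 2*pi),
    the saturation s in [0, 1] and the intensity i in [0, 1].
    One common formulation; variants differ mainly in the hue definition.
    """
    i = (r + g + b) / 3.0
    min_c = min(r, g, b)
    # Saturation: 0 for grey/black pixels, up to 1 for pure colours.
    s = 0.0 if i == 0 else 1.0 - min_c / i
    # Hue: angle in the chromatic plane; undefined for grey pixels.
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        h = 0.0  # grey pixel: hue is arbitrary
    else:
        theta = math.acos(max(-1.0, min(1.0, num / den)))
        h = theta if b <= g else 2 * math.pi - theta
    return h, s, i
```

Any distance computed on such values has to treat the hue component as a circular quantity.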

Optical flow

The projection of the 3D motion of the scene onto the image pixels is called the Motion Field. From a sequence of images it is possible to estimate the apparent motion of brightness patterns in the image; this is called the Optical Flow. There is a fundamental difference between the two: for example, a sphere with a smooth surface rotating around its own axis under constant illumination has a motion field describing this rotation, yet no apparent motion is described by the optical flow (see (Horn, 1986)). It is generally agreed that the optical flow is the best approximation of the motion field that is in general attainable from the raw image data.

Kalkan et al. (2005) compared the performance of optic flow algorithms depending on the intrinsic dimensionality, i.e., the effect of the aperture problem and the quality at low-contrast structures. It appears that different optic flow algorithms are optimal in different contexts. In our system, we primarily use the algorithm proposed by Nagel and Enkelmann (1986), because it gives stable estimates of the normal flow at i1D structures.

In the following we will write the local optic flow vector f = (f_u, f_v)^T.
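Since the flow is estimated on the pixel grid while the interest points are localised with sub-pixel accuracy, the flow has to be interpolated at the primitive's position. The following is a minimal bilinear-interpolation sketch; the dense flow array and its layout are assumptions made here for illustration (in our system the dense flow itself comes from the Nagel–Enkelmann estimator).

```python
import numpy as np

def sample_flow(flow, x, y):
    """Bilinearly interpolate a dense flow field at a sub-pixel position.

    flow : array of shape (H, W, 2) holding (f_u, f_v) per pixel
    x, y : sub-pixel image coordinates (column, row)
    Returns the interpolated flow vector (f_u, f_v).
    """
    h, w, _ = flow.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    # Interpolate along the rows, then between the two rows.
    top = (1 - ax) * flow[y0, x0] + ax * flow[y0, x1]
    bottom = (1 - ax) * flow[y1, x0] + ax * flow[y1, x1]
    return (1 - ay) * top + ay * bottom
```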

The primitive descriptor

At each interest point a primitive is extracted, containing the aforementioned multi–modal description of the surrounding image patch.

This primitive is fully described by the vector:

π = (x, θ, ω, c, f, λ)^T        (2.4)

Figure 2.7: Illustration of the symbolic representation of a primitive for an i1D interpretation, for (a) an edge primitive: a bright-to-dark step–edge (phase with |ω| ≈ π/2), and (b) a line primitive: a bright line on a dark background (phase ω close to 0 or π). 1) represents the orientation of the primitive, 2) the phase, 3) the colour and 4) the optic flow.

where x contains the sub–pixel localisation of the feature; θ the orientation (in the range [0, π[); ω the phase (in the range [−π, +π[); c the colour (as defined above); f the optic flow; and λ the size of the image area the feature describes (therefore we set λ = d_k).
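To make the structure of the descriptor in Eq. (2.4) concrete, it can be sketched as a simple record; the field names and types below are illustrative assumptions, not the data structures used in this work.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Primitive2D:
    """Multi-modal descriptor pi = (x, theta, omega, c, f, lambda)^T."""
    x: Tuple[float, float]        # sub-pixel position in the image
    theta: float                  # orientation angle; [0, pi) for the a priori interpretation
    omega: float                  # local phase, in [-pi, pi)
    colour: Tuple[Tuple[float, float, float], ...]  # (c_l, c_r) or (c_l, c_m, c_r), HSI triplets
    flow: Tuple[float, float]     # local optic flow (f_u, f_v)
    size: float                   # size lambda of the described image area
```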

Those local image descriptors are hereafter called 2D–primitives. The set of 2D–primitives extracted from an image is called the Image Representation I. The result of this processing can be seen in Fig. 2.8. For a detailed description of the 2D–primitive extraction process we refer to (Krüger et al., 2007). Fig. 2.8 shows the primitives extracted, with an origin variance µ > 0.3 and a line variance ν < 0.3, for the three scales considered in this work, namely for peak frequencies of 0.110 (Fig. 2.8b), 0.055 (Fig. 2.8c), and 0.027 (Fig. 2.8d). Different scales highlight different structures in the scene. Furthermore, a lower peak frequency (i.e., a coarser scale) removes image noise and generates fewer spurious primitives, at the cost of neglecting smaller image structures — see (Lindeberg, 1998a; Elder and Zucker, 1998) for a discussion of the effect of scale in edge detection.

Orientation ambiguity and primitive switching

We explained earlier that the monogenic signal computation provides us with an estimate of the local orientation. Assuming that a contour is present at this location, this orientation value is an estimate of the local tangent to this contour. Hence, a 2D–primitive's orientation θ is bound to the interval [0, π[.

For the phase and colour modalities to be defined unambiguously, both sides of the contour need to be identified. For this reason we arbitrarily assign a direction vector to this orientation, defined in Eq. (2.5) below.


Figure 2.8: (a) one image of an object; (b, c, d) i1D primitives extracted, with origin variance µ > 0.3 and line variance ν < 0.3, for peak frequencies of (b) 0.110, (c) 0.055, and (d) 0.027.

Figure 2.9: Illustration of the orientation ambiguity when interpreting 2D–primitives. Because 2D–primitives describe local edges, only their orientation is well defined: the actual direction is meaningless. Hence we need to choose an orientation convention, shown in (a), where θ is bound to [0, π[, with 0 encoding a vertical edge and π/2 a horizontal one. (b) and (c) show two different, yet equivalent, descriptions of the same edge. According to our convention, only (b) is valid, ensuring the uniqueness of an edge's encoding.

The direction vector t of a 2D–primitive is defined directly from its orientation θ as the following vector:

t = (sin θ, −cos θ)^T        (2.5)

Thus we can identify the two sides of the contour as 'left' and 'right' areas relative to this vector. As illustrated in Fig. 2.9, one image patch can have two primitive interpretations:

1. a direction of θ with the dark colour on the right side, called the a priori interpretation π — see Fig. 2.9(b).

2. a direction of θ + π where the dark colour is on the left side, called the alternative interpretation π̄ — see Fig. 2.9(c).

Note that the a priori orientation is indeed within [0, π[ — see Fig. 2.9(a) — whereas the alternative interpretation's is within [π, 2π[.
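A small sketch of this construction is given below. Only Eq. (2.5) itself is given by the text; the sampling offset and the choice of which perpendicular is labelled 'left' are illustrative assumptions.

```python
import math

def direction_vector(theta):
    """Direction vector t of Eq. (2.5) for an orientation angle theta."""
    return (math.sin(theta), -math.cos(theta))

def side_sample_points(x, theta, offset=1.0):
    """Positions at which the 'left' and 'right' colours could be sampled.

    x      : (x, y) sub-pixel position of the primitive
    offset : sampling distance from the central line, in pixels
    Which perpendicular counts as 'left' is a convention assumed here.
    """
    tu, tv = direction_vector(theta)
    nu, nv = -tv, tu  # t rotated by 90 degrees
    left = (x[0] + offset * nu, x[1] + offset * nv)
    right = (x[0] - offset * nu, x[1] - offset * nv)
    return left, right
```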


Because the phase and colour properties are defined relative to this assumed direction, their values also differ depending on the interpretation; in particular, the left and right colour samples are exchanged. The other properties of the primitive remain the same for the two possible interpretations of the orientation. At the time of the primitive extraction, the a priori interpretation is assumed. Later processing may require the use of the alternative interpretation: we call the operation creating π̄ from π the switching of the primitive:

S : π → π̄        (2.7)

This operation is required to solve the ambiguity intrinsic to the orientation definition, during grouping, stereo and reconstruction — see sections 3.2.2 and 4.2.1.
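An illustrative sketch of the switching operation, using the Primitive2D record sketched earlier, is given below. The rotation of the direction by π and the exchange of the left and right colour samples follow from the two interpretations above; the sign flip of the phase is an assumption made here for illustration, since the exact transformation of the phase is not restated in this excerpt.

```python
import math

def switch(p: Primitive2D) -> Primitive2D:
    """Create the alternative interpretation pi-bar from a primitive pi."""
    return Primitive2D(
        x=p.x,                                       # position is unaffected
        theta=(p.theta + math.pi) % (2 * math.pi),   # direction rotated by pi, now in [pi, 2*pi)
        omega=-p.omega,                              # assumed: phase sign flips with the direction
        colour=tuple(reversed(p.colour)),            # left/right samples are exchanged
        flow=p.flow,                                 # flow is unaffected
        size=p.size,
    )
```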