
4.2 Modified Generalized Hough Transform (MGHT)

4.2.1 Generalized Hough Transform

4.2.1.1 Principle

The conventional Hough transform (HT) is a standard method to efficiently detect analytical curves (e.g., lines, circles, ellipses) in images. Although the generalized Hough transform (GHT) is built on the idea of the HT, the two are independent methods.

Therefore, in this dissertation only the idea of the HT and its relation to the GHT are introduced. For further details, the interested reader should refer to (Hough 1962) or to standard textbooks like (Ballard and Brown 1982, Jähne 2002).

To introduce the HT, the problem of detecting straight lines in images is considered. A straight line can be described by the points $(x, y)^\top$ that fulfill

$$r = x \cdot \cos\phi + y \cdot \sin\phi . \qquad (4.13)$$

Thus, the line is represented by its distance to the origin $r$ and its orientation $\phi$ (see Figure 4.1(a)). Figure 4.1(b) shows an image after edge segmentation in which lines should be detected. Each edge pixel in the $(x, y)$ image space describes a sinusoidal curve in the $(r, \phi)$ parameter space, which is also referred to as Hough space in the literature. This can be seen from (4.13) by treating $x$ and $y$ as fixed and letting $r$ and $\phi$ vary. Thus, the corresponding sinusoidal curve in the parameter space represents all lines in image space that pass through the same image point $(x, y)^\top$. In Figure 4.1(c) the sinusoidal curves of the two example edge pixels shown in Figure 4.1(b) are displayed. The sinusoidal curves of all image points that lie on the same line intersect at the same point in the parameter space.

Figure 4.1: Example that illustrates the detection of straight lines using the HT. Lines are parameterized using a polar coordinate representation (a). The edge pixels of the segmented image (b) are transformed into the Hough space (c). The corresponding cells in an accumulator array are incremented and the maximum is extracted (d).

This relation between image space and parameter space is exploited in the HT. For this, the parameter space is divided into rectangular cells and represented by an accumulator array. In the first stage, each edge pixel is transformed into the Hough space and the corresponding cells are incremented. The second stage is an exhaustive search for maxima in the accumulator array. The maxima represent the parameters of the straight lines in the image. Figure 4.1(d) shows the resulting accumulator array, where higher values of the cells are visualized by brighter gray values.
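The two stages of the HT for straight lines can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation used in this work: the function name `hough_lines`, the sampling of $\phi$ over $[0^\circ, 180^\circ)$ with a signed distance $r$, and the cell sizes are all chosen for the example.

```python
import numpy as np


def hough_lines(edge_points, r_max, n_phi=180, n_r=121):
    """Vote in an (r, phi) accumulator for a list of (x, y) edge points.

    Assumption: phi is sampled over [0, 180) degrees and r is signed,
    covering [-r_max, r_max]; all edge points must satisfy |r| <= r_max.
    """
    phis = np.deg2rad(np.arange(n_phi) * 180.0 / n_phi)
    r_step = 2.0 * r_max / (n_r - 1)
    acc = np.zeros((n_r, n_phi), dtype=np.int32)
    cols = np.arange(n_phi)
    for x, y in edge_points:
        # Stage 1: one edge pixel traces the sinusoid of equation (4.13).
        r = x * np.cos(phis) + y * np.sin(phis)
        rows = np.rint((r + r_max) / r_step).astype(int)
        acc[rows, cols] += 1  # one vote per phi column
    return acc, phis, r_step


# Hypothetical edge pixels on the horizontal line y = 5 (phi = 90 deg, r = 5).
points = [(x, 5) for x in range(20)]
acc, phis, r_step = hough_lines(points, r_max=30.0)

# Stage 2: exhaustive search for the maximum of the accumulator array.
row, col = np.unravel_index(np.argmax(acc), acc.shape)
r_found = row * r_step - 30.0
phi_found = np.rad2deg(phis[col])
print(r_found, phi_found)
```

Every edge pixel casts one vote per $\phi$ column here, so the cost of the voting stage grows with the number of edge pixels times the number of orientation cells.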

The maximum of the accumulator array represents the found straight line. The extension of this technique to detect curves other than straight lines is straightforward. By using the gradient direction as additional information, fewer cells need to be incremented, which results in faster computations (Ballard and Brown 1982). For example, when detecting straight lines, the gradient direction at an edge pixel determines $\phi$, so that points in image space can be transformed to points, rather than to sinusoidal curves, in parameter space. The main advantages of the HT are that it is relatively unaffected by gaps in the curves and by noise (Ballard and Brown 1982).
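The saving obtained from the gradient direction can be illustrated with a short Python sketch. It assumes that the gradient at an edge pixel is perpendicular to the line, so that the gradient direction modulo $180^\circ$ gives $\phi$ directly; the function name `hough_lines_with_gradient` and the accumulator layout are hypothetical choices made for this example.

```python
import numpy as np


def hough_lines_with_gradient(edge_points, gradient_dirs_deg, r_max,
                              n_phi=180, n_r=121):
    """Cast a single vote per edge pixel using its gradient direction.

    Assumption: the gradient is normal to the line, so phi equals the
    gradient direction folded into [0, 180) degrees, with signed r.
    """
    phi_step = 180.0 / n_phi
    r_step = 2.0 * r_max / (n_r - 1)
    acc = np.zeros((n_r, n_phi), dtype=np.int32)
    for (x, y), g in zip(edge_points, gradient_dirs_deg):
        phi = g % 180.0
        phi_rad = np.deg2rad(phi)
        r = x * np.cos(phi_rad) + y * np.sin(phi_rad)  # equation (4.13)
        col = int(phi // phi_step)
        row = int(np.rint((r + r_max) / r_step))
        acc[row, col] += 1  # a point in parameter space, not a curve
    return acc


# Hypothetical edge pixels on y = 5 with gradients pointing up or down.
points = [(x, 5) for x in range(20)]
dirs = [90.0 if x % 2 == 0 else 270.0 for x in range(20)]
acc_g = hough_lines_with_gradient(points, dirs, r_max=30.0)
print(acc_g.sum(), acc_g.max())
```

In this sketch the total number of votes equals the number of edge pixels, whereas voting along the full sinusoid would cast one vote per orientation cell for every pixel.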

In contrast to the HT, the GHT is not restricted to analytical curves. The parameters that describe an analytical curve in the HT correspond to the pose parameters of the object in the GHT. In the offline phase of the GHT, the R-table is built from a model image that shows the object to be recognized. Thus, in the special case of the GHT the R-table represents the model that will later be used to recognize the object in the search image during the online phase.

In the following, a simple example will illustrate the principle of the GHT. In the example, the transformation class is restricted to translations to simplify the explanations. In Figure 4.2(a) a model image is given, in which the object is defined by a ROI, which in this example is the inner part of the black rectangle. In a preprocessing step of the offline phase, edge filtering is performed on the model image, resulting in the edge magnitude and gradient direction. By applying a threshold on the edge magnitude, the model edges (pixels or points) are obtained, which are shown in Figure 4.2(b). Additionally, the gradient direction is visualized by using different gray values. This is all the information that is needed to compute the R-table.

(a) Model image and ROI (b) Model edges (c) Search image (d) Search edges

Figure 4.2: In the offline phase, edges are extracted from a model image (a) and their gradient directions are computed (b). For visualization purposes, the gradient direction is encoded with different gray values. In the online phase, edges and the corresponding gradient directions are also computed (d) for the search image (c).

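(a) Model edges (b) R-table

R-table of panel (b):

k      $\Theta_k$                            $r_i$
0      $[0, \Delta\theta[$                   $\{ r_i \mid \theta_i^m \in \Theta_0 \}$
1      $[\Delta\theta, 2\Delta\theta[$       $\{ r_i \mid \theta_i^m \in \Theta_1 \}$
2      $[2\Delta\theta, 3\Delta\theta[$      $\{ r_i \mid \theta_i^m \in \Theta_2 \}$
...    ...                                   ...

Figure 4.3: Information about the model edges (a) is used to build the R-table (b).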

(a) Four selected example model edge points (b) Example R-table with $\Delta\theta = 60^\circ$

R-table of panel (b):

k      $\Theta_k$                   $r_i$
0      $[0^\circ, 60^\circ[$
1      $[60^\circ, 120^\circ[$      $r_2, r_3$
2      $[120^\circ, 180^\circ[$
3      $[180^\circ, 240^\circ[$     $r_4$
4      $[240^\circ, 300^\circ[$     $r_1$
5      $[300^\circ, 360^\circ[$

Figure 4.4: Example that illustrates the offline phase of the GHT. The gradient directions at the four selected example model edge points are $\theta^m_1 = 280^\circ$, $\theta^m_2 = 65^\circ$, $\theta^m_3 = 65^\circ$, $\theta^m_4 = 200^\circ$. The associated displacement vectors $r_i$ are recorded in the appropriate row of the R-table according to their gradient direction.

The principle is illustrated in Figure 4.3. First, an arbitrary reference point $o^m = (o^m_x, o^m_y)^\top$ is chosen. In general, $o^m$ is selected to be the centroid of all model edge points, i.e., $o^m = \frac{1}{n^m} \sum_{i=1}^{n^m} p^m_i$. For each model edge point the displacement vector

$$r_i = o^m - p^m_i , \quad \forall i = 1, \dots, n^m \qquad (4.14)$$

is calculated. The displacement vectors are then stored in the R-table as a function of $\theta^m_i$, where $\theta^m_i$ denotes the gradient direction at the model edge point $p^m_i$. For this purpose, the range of possible gradient directions must be quantized using quantization intervals of size $\Delta\theta$. In general, the gradients occur in arbitrary directions, and hence the range of possible gradient directions corresponds to the interval $[0^\circ, 360^\circ[$. Each row $k$ of the R-table is then assigned one quantization interval $\Theta_k$, $k = 0, \dots, n^\theta - 1$, leading to an overall number of $n^\theta = 2\pi / \Delta\theta$ rows (i.e., $360^\circ / \Delta\theta$ if $\Delta\theta$ is given in degrees).

Finally, the displacement vector $r_i$ of the model edge point $p^m_i$ is recorded in the row of the R-table that contains the associated gradient direction $\theta^m_i$.

The example R-table in Figure 4.4 is built by using four selected model edge points. Here, the quantization interval for the gradient directions was set to $\Delta\theta = 60^\circ$. Since the gradient directions at the two points $p^m_2$ and $p^m_3$ are identical, both associated displacement vectors are recorded in the same row within the R-table.
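The construction of the R-table can be sketched in Python as follows. The gradient directions are those of Figure 4.4, but the point coordinates are hypothetical, since the figure gives no numeric positions; the function name `build_r_table` and the list-of-lists layout are likewise assumptions made for this illustration.

```python
import numpy as np


def build_r_table(points, gradient_dirs_deg, delta_theta_deg):
    """Offline phase: store r_i = o^m - p_i^m in the row of its direction."""
    points = np.asarray(points, dtype=float)
    reference = points.mean(axis=0)  # o^m chosen as the centroid
    n_rows = int(round(360.0 / delta_theta_deg))
    r_table = [[] for _ in range(n_rows)]
    for p, theta in zip(points, gradient_dirs_deg):
        # Row k covers the half-open interval [k * dtheta, (k + 1) * dtheta[.
        k = min(int((theta % 360.0) // delta_theta_deg), n_rows - 1)
        r_table[k].append(reference - p)  # equation (4.14)
    return reference, r_table


# Hypothetical coordinates for p_1 ... p_4; directions as in Figure 4.4.
model_points = [(4.0, 0.0), (0.0, 1.0), (0.0, 3.0), (4.0, 4.0)]
model_dirs = [280.0, 65.0, 65.0, 200.0]
reference, r_table = build_r_table(model_points, model_dirs, 60.0)

for k, row in enumerate(r_table):
    print(k, [tuple(r) for r in row])
```

With these inputs, rows 1, 3, and 4 receive two, one, and one displacement vectors, respectively, which mirrors the occupancy of the example R-table in Figure 4.4(b).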

For the online phase a two-dimensional accumulator array $A$ is set up over the domain of translations, representing the sampled parameter space or Hough space of $T$. Figure 4.5 shows the principle of an accumulator array for the example search image of Figure 4.2. This is similar to the HT; however, each cell of this array now corresponds to a certain range of positions of the reference point $o^s = (o^s_x, o^s_y)^\top$ in the search image. In general, the size of the cells is adapted to the pixel grid, i.e., each cell represents one pixel. The accumulator array is initialized by setting the values of all cells to 0. For each edge pixel $p^s_j$ in the search image the row $k$ in the R-table that corresponds to the gradient direction $\theta^s_j$ is selected. Each displacement vector that is recorded within the selected row represents the position of one reference point candidate $\breve{o} = (\breve{o}_x, \breve{o}_y)^\top$ relative to $p^s_j$ in the search image.

Formally, the displacement vectors are added to $p^s_j$ in order to obtain the reference point candidates:

$$\breve{o}_{i,j} = p^s_j + r_i , \quad \forall j = 1, \dots, n^s , \ \forall \{ i \mid \theta^m_i \in \Theta_k \} , \ k \mid (\theta^s_j \in \Theta_k) . \qquad (4.15)$$

Finally, each cell in the accumulator array $A$ that is hit by one reference point candidate receives a vote, i.e., its value is incremented by one. After the voting process, each cell in $A$ has a value that describes the likelihood that the reference point is located in this cell. Thus, local maxima in $A$ that exceed a certain threshold represent found object instances in the search image.

Figure 4.5 shows the principle of the online phase by means of seven selected search edge points. In Figure 4.5(a) the cells of the accumulator array are overlaid on the search image of Figure 4.2. For illustration purposes, in this case one cell covers several pixels. For example, the gradient direction $\theta^s_1$ at point $p^s_1$ is $65^\circ$. In the R-table of Figure 4.4(b) the respective gradient interval is $\Theta_1 = [60^\circ, 120^\circ[$, and hence $k = 1$. The two displacement vectors $r_2$ and $r_3$ that are recorded in row $k = 1$ are added to the point position $p^s_1$, and the two obtained reference point candidates are used to increment the two corresponding cells.

Figure 4.5(b) shows the final accumulator array after the voting process, where the number of votes is entered in each cell. The cell with the maximum number of votes represents the position of the reference point, and hence the found object instance in the search image.
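The voting process of the online phase, restricted to translations, can be sketched in Python as follows. All numeric values are hypothetical: the R-table holds made-up displacement vectors in the rows occupied in Figure 4.4(b), and the search edge points are a model with reference point (2, 2) shifted by (10, 7). The function name `ght_vote` and the dictionary layout of the R-table are assumptions made for this illustration.

```python
import numpy as np


def ght_vote(search_points, search_dirs_deg, r_table, delta_theta_deg, shape):
    """Online phase: accumulate votes for reference point candidates."""
    acc = np.zeros(shape, dtype=np.int32)  # one cell per pixel, acc[y, x]
    for (px, py), theta in zip(search_points, search_dirs_deg):
        k = int((theta % 360.0) // delta_theta_deg)  # row with theta in Theta_k
        for rx, ry in r_table.get(k, []):
            # Equation (4.15): candidate = search edge point + displacement.
            ox, oy = int(round(px + rx)), int(round(py + ry))
            if 0 <= ox < shape[1] and 0 <= oy < shape[0]:
                acc[oy, ox] += 1
    return acc


# Hypothetical R-table with delta_theta = 60 deg: row index -> displacements.
r_table = {1: [(2, 1), (2, -1)], 3: [(-2, -2)], 4: [(-2, 2)]}

# Hypothetical search edge points and their gradient directions.
search_points = [(14, 7), (10, 8), (10, 10), (14, 11)]
search_dirs = [280.0, 65.0, 65.0, 200.0]

acc = ght_vote(search_points, search_dirs, r_table, 60.0, shape=(20, 20))
oy, ox = np.unravel_index(np.argmax(acc), acc.shape)
print("reference point candidate with most votes:", (ox, oy), "votes:", acc[oy, ox])
```

Each of the two points with direction $65^\circ$ votes twice because row 1 holds two displacement vectors; only one of the two candidates per point coincides with the true reference point, while the other produces an isolated single vote.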

(a) Accumulator array and voting process (b) Final votes (empty cells have zero votes)

Figure 4.5: Example that illustrates the online phase of the GHT. The gradient directions at seven selected example search edge points are $\theta^s_1 = 65^\circ$, $\theta^s_2 = 65^\circ$, $\theta^s_3 = 200^\circ$, $\theta^s_4 = 280^\circ$, $\theta^s_5 = 65^\circ$, $\theta^s_6 = 200^\circ$, and $\theta^s_7 = 280^\circ$ (a). According to the gradient direction, the displacement vectors in the respective row of the R-table are added to the point positions, and the corresponding cells receive a vote (b).

From the document: Hierarchical Real-Time Recognition of Compound Objects in Images (pages 49-52)