
6. Detection and Tracking of Point Features 61

6.5. Template-Based Tracking

Shi and Tomasi [100] extend the method of Lucas and Kanade to affine image transformations. A square patch is extracted around a feature point when it is first observed and is then used as a reference template. They also address the problem of detecting feature points that can be tracked stably under the affine transformation model. In regions where image structure exists in only one direction, the full optical flow cannot be estimated. This is known as the aperture problem. Shi and Tomasi argue that image structure has to be present in all image directions and therefore use an interest point detector as described in Section 6.2, where the smallest eigenvalue of the structure matrix has to be significantly large. This tracking method, in which a template patch is tracked under an affine transformation model, is also called the Kanade-Lucas-Tomasi (KLT) tracker in the literature.
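The eigenvalue criterion and the aperture problem can be illustrated with a minimal sketch. The function name `min_eigenvalue`, the 9×9 synthetic patches and the use of central differences for the image gradient are assumptions made for illustration and are not taken from [100].

```python
import numpy as np

def min_eigenvalue(patch):
    """Smallest eigenvalue of the 2x2 structure matrix summed over a patch."""
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    gy, gx = np.gradient(patch.astype(float))
    S = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return np.linalg.eigvalsh(S)[0]

# Vertical step edge: intensity varies along x only (aperture problem).
edge = np.zeros((9, 9))
edge[:, 5:] = 1.0

# Corner: intensity varies along both x and y.
corner = np.zeros((9, 9))
corner[5:, 5:] = 1.0

edge_score = min_eigenvalue(edge)
corner_score = min_eigenvalue(corner)
```

For the edge patch the smallest eigenvalue is numerically zero, so motion along the edge cannot be recovered, whereas the corner patch yields a clearly positive value and would pass a threshold test.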

A more general algorithm for tracking a template in a gray value image is presented by Hager and Belhumeur [35]. A geometric warping function g(x;p) is used to represent more general warps than simple translations. They also switch the roles of the template and the image for efficiency reasons and call this method the inverse additive approach.

The efficiency benefit results from shifting computation from the tracking step to the pre-processing step. If I(x,0) = T(x) is the reference template and g(x;p) the warp function of an image point x with the parameter vector p, the error to be minimized for the forward additive approach is written as

$$\sum_{x} \left[ I(g(x;p)) - T(x) \right]^2. \tag{6.6}$$

To optimize this expression, the parameter increment ∆p is iteratively estimated by minimizing

$$\sum_{x} \left[ I(g(x;p+\Delta p)) - T(x) \right]^2, \tag{6.7}$$

where after every iteration the parameter vector is updated by p ← p + ∆p. The algorithm stops if ∆p is insignificantly small or a maximum number of iterations has been reached.

The inverse additive algorithm simply switches the role of the image I and the template T.

A compositional approach of Shum and Szeliski [101] iteratively estimates an incremental warp g(x;∆p) by minimizing

$$\sum_{x} \left[ I(g(g(x;\Delta p);p)) - T(x) \right]^2, \tag{6.8}$$

with respect to ∆p. The warp function is updated with g(x;p) ← g(x;p) ∘ g(x;∆p). Baker and Matthews [5] introduce another method, the inverse compositional approach, for tracking an image template. They approximately minimize

$$\sum_{x} \left[ T(g(x;\Delta p)) - I(g(x;p)) \right]^2, \tag{6.9}$$

with respect to ∆p. After every iteration the warp function is updated by g(x;p) ← g(x;p) ∘ g(x;∆p)⁻¹.

Baker and Matthews show that this inverse compositional method is capable of handling a more general set of warps, especially homographies, whereas the inverse additive approach of Hager and Belhumeur can only be used with affine warps. The inverse compositional approach is the most general and efficient method and is therefore the most commonly used nowadays.

To solve the minimization problem of equation (6.9), a first-order Taylor expansion is performed, which results in

$$\sum_{x} \left[ T(g(x;0)) + \nabla T \frac{\partial g}{\partial p} \Delta p - I(g(x;p)) \right]^2. \tag{6.10}$$

Solving for ∆p yields

$$\Delta p = H^{-1} \sum_{x} \left[ \nabla T \frac{\partial g}{\partial p} \right]^T \left[ I(g(x;p)) - T(x) \right], \tag{6.11}$$

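where the Hessian matrix H is computed by

$$H = \sum_{x} \left[ \nabla T \frac{\partial g}{\partial p} \right]^T \left[ \nabla T \frac{\partial g}{\partial p} \right]. \tag{6.12}$$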

The Jacobian ∂g/∂p is evaluated at (x;0). The efficiency of the inverse compositional algorithm comes from the fact that the Hessian matrix H does not depend on the parameter vector p. It is therefore constant during the iterations, and the matrix H⁻¹ can be pre-computed. However, if the intensity values of the template T or the size of the area are changed during the minimization, the Hessian must be re-computed. The template must therefore always be located fully inside the image and cannot be changed, so that the efficiency benefits of the algorithm pay off.
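The following minimal sketch illustrates the inverse compositional iteration of equations (6.9) to (6.12) for the simplest case, a pure translation g(x;p) = x + p, where the Jacobian ∂g/∂p is the identity and composing with the inverse incremental warp reduces to p ← p − ∆p. The synthetic image function, the 21×21 template grid, the ground-truth shift and the finite-difference gradient are all assumptions chosen so that the example is self-contained; it is not the implementation described in this work.

```python
import numpy as np

def image(x, y):
    """Smooth synthetic image I that can be sampled at sub-pixel positions."""
    return (np.exp(-((x - 12.0) ** 2 + (y - 10.0) ** 2) / 30.0)
            + 0.5 * np.exp(-((x - 6.0) ** 2 + (y - 14.0) ** 2) / 20.0))

# Template grid and hypothetical ground-truth translation.
ys, xs = np.mgrid[0:21, 0:21].astype(float)
p_true = np.array([0.8, -0.5])
T = image(xs + p_true[0], ys + p_true[1])        # T(x) = I(g(x; p_true))

# Pre-processing step: template gradient and inverse Hessian, computed once.
Ty, Tx = np.gradient(T)                           # derivatives along y and x
sd = np.stack([Tx.ravel(), Ty.ravel()], axis=1)   # rows: grad(T) * dg/dp, dg/dp = identity
H_inv = np.linalg.inv(sd.T @ sd)                  # H as in equation (6.12)

# Tracking step: only the image is resampled in every iteration.
p = np.zeros(2)
for _ in range(50):
    err = image(xs + p[0], ys + p[1]) - T         # I(g(x;p)) - T(x)
    dp = H_inv @ (sd.T @ err.ravel())             # equation (6.11)
    p = p - dp                                    # g(x;p) composed with g(x;dp)^-1
    if np.linalg.norm(dp) < 1e-8:
        break
```

Note that `sd` and `H_inv` never change inside the loop; in a forward additive scheme the gradient of the warped image, and with it the Hessian, would have to be recomputed in every iteration.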

More detailed derivations for different warp functions are given in Appendix A. Another very detailed explanation and comparison of these different template-based tracking methods can be found in [6].

6.5.1. Illumination Compensation

If planar patch features are tracked over a long lifetime, severe illumination changes can occur. This happens because a patch is viewed from a different viewing direction and the lighting changes, or simply because a camera controller automatically adjusts the shutter or the gain of the camera. Therefore, in most real-life scenarios it is indispensable to account for changes of illumination.

Tommasini et al. [109] use a photometric normalization with a zero mean SSD residual computation to make the tracking more robust against lighting changes. They also add a robust rejection rule to detect tracking failures.
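As an illustration of why a photometric normalization helps, the sketch below compares a plain SSD with an SSD computed on patches normalized to zero mean and unit standard deviation. This particular normalization and the names `plain_ssd` and `normalized_ssd` are assumptions; the exact residual used in [109] may differ in detail.

```python
import numpy as np

def plain_ssd(a, b):
    """Sum of squared intensity differences."""
    return float(np.sum((a - b) ** 2))

def normalized_ssd(a, b):
    """SSD after subtracting the mean and dividing by the standard deviation.

    Assumes both patches are non-constant, so the standard deviation is non-zero.
    """
    a_n = (a - a.mean()) / a.std()
    b_n = (b - b.mean()) / b.std()
    return float(np.sum((a_n - b_n) ** 2))

rng = np.random.default_rng(0)
patch = rng.random((7, 7))
relit = 1.7 * patch + 0.3          # same patch under a gain and offset change
other = rng.random((7, 7))         # unrelated patch

ssd_relit_plain = plain_ssd(patch, relit)
ssd_relit_norm = normalized_ssd(patch, relit)
ssd_other_norm = normalized_ssd(patch, other)
```

The normalized residual of the relit patch is numerically zero although its plain SSD is large, while the unrelated patch still produces a large normalized residual.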

Another method for illumination compensation described by Hager and Belhumeur [35] uses a set of illumination basis images, which are all captured under different illumination. An additional parameter per basis image is used to describe the contribution of every basis image to the current image.

Zhu et al. [123] achieve lighting invariance by minimizing the differences of normalized gradient images instead of intensity discrepancies.

A precise photometric model for a transformed template patch is presented by Jin et al. [49]. They extend the Shi-Tomasi tracker by two additional illumination parameters. With the contrast compensation factor λ and the brightness correction δ they minimize the following error function with respect to ∆p:

$$\sum_{x} \left[ \lambda I(g(x;p+\Delta p)) + \delta - T(x) \right]^2 \tag{6.13}$$

However, this forward additive formulation is inefficient, since the Hessian matrix H needs to be recomputed in every frame.

Zinßer et al. [124] combine the benefits of the more efficient inverse compositional approach with the additional illumination parameters presented in [49]. This leads to the minimization of the error function given by

$$\sum_{x} \left[ \tilde{\lambda} T(g(x;\Delta p)) + \tilde{\delta} - (\lambda I(g(x;p)) + \delta) \right]^2, \tag{6.14}$$

where λ̃ and δ̃ are the incremental changes of the illumination parameters. The values λ and δ are still those parameters for which T(x) = λI(g(x;p)) + δ holds.

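Again the first-order Taylor expansion around the identity warp g(x;0) is used to approximate this error function:

$$\sum_{x} \left[ \tilde{\lambda} T(g(x;0)) + \tilde{\delta} + \tilde{\lambda} \nabla T \frac{\partial g}{\partial p} \Delta p - (\lambda I(g(x;p)) + \delta) \right]^2. \tag{6.15}$$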

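This equation can be rewritten in matrix form as

$$\sum_{x} \left[ h(x)^T q - (\lambda I(g(x;p)) + \delta) \right]^2 \tag{6.16}$$

with

$$h(x) = \begin{pmatrix} \nabla T(x) \frac{\partial g}{\partial p} & T(x) & 1 \end{pmatrix}^T \tag{6.17}$$

$$q = \begin{pmatrix} \tilde{\lambda} \Delta p^T & \tilde{\lambda} & \tilde{\delta} \end{pmatrix}^T \tag{6.18}$$

The size of the vectors h(x) and q is 2 elements larger than the number of parameters in the vector p.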

Finally by solving the least squares problem of equation (6.16) the parameter update vector can be computed by

$$q = \left( \sum_{x} h(x) h(x)^T \right)^{-1} \sum_{x} \left[ h(x) \left( \lambda I(g(x;p)) + \delta \right) \right]. \tag{6.19}$$

Since the vector h(x) is independent of all parameters (p, λ, δ) and of the current image I, the inverse of the matrix composed of the dyadic products of h(x) can be precomputed for every feature and then re-used in every frame.

After every iteration the contrast parameter λ is updated with λ ← λ/λ̃, and the brightness parameter δ is updated with δ ← (δ − δ̃)/λ̃. An example of an affine transformation model as presented in [124] can be found in Appendix A.
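The illumination-compensated iteration of equations (6.16) to (6.19) can be sketched for a pure translation warp as follows. The synthetic image, the gain and offset values, the fixed iteration count and all variable names are assumptions made for illustration. Recovering ∆p by dividing the first entries of q by λ̃ follows from the definition of q in (6.18), and for a translation the inverse compositional warp update reduces to p ← p − ∆p.

```python
import numpy as np

def base(x, y):
    """Smooth synthetic scene that can be sampled at sub-pixel positions."""
    return (np.exp(-((x - 12.0) ** 2 + (y - 10.0) ** 2) / 30.0)
            + 0.5 * np.exp(-((x - 6.0) ** 2 + (y - 14.0) ** 2) / 20.0))

gain, offset = 1.4, 0.2            # hypothetical illumination change

def image(x, y):
    """Current image I: the scene under changed contrast and brightness."""
    return gain * base(x, y) + offset

ys, xs = np.mgrid[0:21, 0:21].astype(float)
p_true = np.array([0.8, -0.5])
T = base(xs + p_true[0], ys + p_true[1])          # reference template

# Pre-processing: h(x) = (grad(T) dg/dp, T(x), 1)^T with dg/dp = identity.
Ty, Tx = np.gradient(T)
h = np.stack([Tx.ravel(), Ty.ravel(), T.ravel(), np.ones(T.size)], axis=1)
M_inv = np.linalg.inv(h.T @ h)                    # inverse of sum of h h^T

p, lam, delta = np.zeros(2), 1.0, 0.0
for _ in range(30):
    target = lam * image(xs + p[0], ys + p[1]) + delta
    q = M_inv @ (h.T @ target.ravel())            # equation (6.19)
    lam_inc, delta_inc = q[2], q[3]               # incremental contrast and brightness
    dp = q[:2] / lam_inc                          # first entries of q hold lam_inc * dp
    p = p - dp                                    # inverse compositional update
    lam = lam / lam_inc                           # contrast update
    delta = (delta - delta_inc) / lam_inc         # brightness update
```

In this setup the template satisfies T(x) = λI(g(x;p)) + δ for λ = 1/gain and δ = −offset/gain, so the iteration should approach these values together with the true translation.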

6.5.2. Drift Prevention

Tracking only the translation of a feature point is very fast, since the image border handling is easy and the computation of the inverse of the 2×2 matrix H is simple. But as the estimation of the displacement vector can never be absolutely accurate, tracking only the translation of a point will produce feature drift. It is therefore necessary to take the intensity values of the initial patch into account when tracking the feature, and to monitor whether the feature point has kept its visual appearance. A template-based illumination-invariant method as described in the previous section prevents the accumulation of drift, since the reference template is never changed. However, there are also some disadvantages if only a reference template is used. For a successful alignment a complex model like the affine illumination-invariant algorithm of Jin et al. [49] is necessary. Because of the high dimensionality of the parameter space, such methods can easily run into convergence problems. If the initial parameter values are not close enough to the solution, the Newton-Raphson iterations do not converge, especially if brightness and contrast are estimated as well. It is also difficult to determine the borders of a warped patch if it is not entirely inside the image.


Zinßer et al. [124] use a two-stage approach to solve this problem. Pure translation from frame to frame is estimated first; then this translation vector and the affine and illumination parameters of the previous frame are used to initialize the minimization of the discrepancy between the initial patch and the current frame. This method has a much higher rate of tracking successes, since the feature position is already almost correct when the reference template is aligned. In our implementation we achieved good results with 5 levels of a Gaussian pyramid for tracking the translation from frame to frame and only one pyramid level for tracking the initial patch with the affine illumination-invariant method.
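The coarse-to-fine translation stage relies on a Gaussian pyramid. The sketch below shows one possible way to build the 5 levels mentioned above; the 5-tap binomial kernel, the edge padding, the subsampling factor of 2 and the function names are assumptions, since the text does not specify how the pyramid is constructed.

```python
import numpy as np

def blur(img):
    """Separable 5-tap binomial smoothing with replicated image borders."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    padded = np.pad(img, 2, mode="edge")
    # Filter along rows, then along columns; 'valid' restores the input size.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def build_pyramid(img, levels=5):
    """Return [full resolution, 1/2, 1/4, ...] with `levels` entries."""
    pyramid = [img.astype(float)]
    for _ in range(levels - 1):
        pyramid.append(blur(pyramid[-1])[::2, ::2])   # smooth, then subsample by 2
    return pyramid

pyr = build_pyramid(np.full((64, 64), 3.0), levels=5)
shapes = [level.shape for level in pyr]
```

With a subsampling factor of 2, a translation estimated at pyramid level k corresponds to 2^k times that translation at full resolution, so the estimate from a coarse level is doubled before it initializes the next finer level.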

From the document: Efficient Line and Patch Feature Characterization and Management for Real-time Camera Tracking (pages 76-80)