
iterative image alignment method as the one presented in Section 6.5.

In order to compute $H$ from the image homography $H_I$, the scale factor $\lambda$ has to be determined. The image homography $H_I$ can be transformed into the camera coordinate system by

\[ H_L = K^{-1} H_I K. \tag{7.16} \]

If the relation between $H_L$ and the motion parameters of the camera $(R, t)$ and the structure parameters $(n, d)$ is written as

\[ H_L = \lambda H = \lambda \left( \Delta R + \frac{\Delta t\, n^T}{d} \right), \tag{7.17} \]

then the scale factor $\lambda$ can be computed by

\[ |\lambda| = \sigma_2(H_L), \tag{7.18} \]

where $\sigma_2(H_L)$ is the second largest singular value of $H_L$. The proof can be found in [67], p. 135.
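As an illustration, the following numpy sketch implements equations (7.16)–(7.18); the function name and interface are illustrative and not taken from the original system:

```python
import numpy as np

def scale_factor_from_homography(H_I, K):
    """Illustrative sketch of equations (7.16)-(7.18)."""
    # (7.16): transform the image homography into the camera coordinate system.
    H_L = np.linalg.inv(K) @ H_I @ K
    # (7.18): |lambda| equals the second largest singular value of H_L
    # (proof in [67], p. 135); numpy returns singular values in descending order.
    sigma = np.linalg.svd(H_L, compute_uv=False)
    return H_L, sigma[1]   # H_L and |lambda|
```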

By assuming that the normal vector $\tilde{n}_{c_0} = n_{c_0}/d$ is non-unit, the value $d$ of equation (7.17) can be neglected, and the normal vector must satisfy the following equation:

\[ \left( \frac{1}{\lambda} H_L - \Delta R \right) = \Delta t\, \tilde{n}_{c_0}^T. \tag{7.19} \]

Transposing and multiplying this equation with $\Delta t$ results in the least squares solution for $\tilde{n}_{c_0}$:

\[ \tilde{n}_{c_0} = \frac{1}{\Delta t^T \Delta t} \left( \frac{1}{\lambda} H_L - \Delta R \right)^T \Delta t. \tag{7.20} \]

Then the unit normal $n_{c_0}$ and the scale factor $d$ can be determined by normalizing $\tilde{n}_{c_0}$, with $d = 1/\|\tilde{n}_{c_0}\|$. Since this equation computes the normal vector $n_{c_0}$ in the coordinate system of the first camera, it must be transformed into the world coordinate system by

\[ n_w = R_0^{-1} n_{c_0}. \tag{7.21} \]

Now the vector $n_w$ can be regarded as a measurement of the surface normal, which will be used for a robust estimation of the surface orientation.
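A compact numpy sketch of this estimation step, assuming $R_0$ is a rotation matrix so that $R_0^{-1} = R_0^T$ (all names illustrative):

```python
import numpy as np

def estimate_surface_normal(H_L, lam, delta_R, delta_t, R_0):
    """Illustrative sketch of equations (7.19)-(7.21); lam is the
    scale factor obtained from (7.18)."""
    # Left-hand side of (7.19): (1/lambda) H_L - Delta R.
    A = H_L / lam - delta_R
    # (7.20): least squares solution for the scaled normal n_tilde = n / d.
    n_tilde = (A.T @ delta_t) / (delta_t @ delta_t)
    # Normalizing n_tilde yields the unit normal and the scale factor d.
    norm = np.linalg.norm(n_tilde)
    n_c0 = n_tilde / norm
    d = 1.0 / norm
    # (7.21): transform into the world coordinate system (R_0^{-1} = R_0^T).
    n_w = R_0.T @ n_c0
    return n_w, d
```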

7.4. Feature Prediction

If an image feature is occluded or cannot be tracked because of reflections, these feature points need to be predicted so that the local search of the template-based tracking can be successful. Without a proper prediction, the starting point for the iterative image alignment


would be further away from the solution, and the chances of convergence decrease with every frame in which the tracking of this feature fails. Moreover, feature points which move out of the camera image and are no longer visible in the current frame should not be discarded. If the camera moves back and an already reconstructed feature point becomes visible again, this point needs to be predicted so that the feature tracking step can be carried out correctly.

The most straightforward method for feature prediction is to use the camera pose of the last frame, assuming that the pose estimation was successful. A more sophisticated approach is to use a Kalman filter with a kinematic motion model. With such a filter the current camera pose can be extrapolated for a proximate camera frame, which results in a more precise prediction. If other input devices, such as an inertial sensor, are integrated into the tracking system, they can also be used to provide a more accurate prediction of the camera pose.
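As a rough illustration of such a kinematic prediction (a simplified constant-velocity stand-in, not the Kalman filter itself), the last frame-to-frame motion can simply be applied once more. The sketch below assumes numpy and poses of the form $x_c = R\,x_w + t$; all names are illustrative:

```python
import numpy as np

def extrapolate_pose(R_prev, t_prev, R_curr, t_curr):
    """Constant-velocity pose extrapolation; a simplified stand-in for a
    Kalman filter with a kinematic motion model."""
    # Relative motion between the last two frames.
    delta_R = R_curr @ R_prev.T
    delta_t = t_curr - delta_R @ t_prev
    # Apply the same motion once more to predict the next pose.
    R_pred = delta_R @ R_curr
    t_pred = delta_R @ t_curr + delta_t
    return R_pred, t_pred
```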

7.4.1. Image Position Prediction

If the approximate camera pose is known, it can be used to predict all feature points for which the tracking step failed in the last frame. Predicting the position of a template is fairly simple: if $M$ is the reconstructed 3D point, the image position of this feature can be estimated by projecting the point $M$ into the image with the current camera parameters. With the intrinsic camera matrix $K$ the homogeneous image position $\tilde{m}_0$ can be computed with

\[ \tilde{m}_0 = K(RM + t), \tag{7.22} \]

where the camera rotation and translation used for the prediction are given by $R$ and $t$.
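In code, the prediction of equation (7.22) is a single projection; the following minimal numpy sketch (names illustrative) also dehomogenizes the result to pixel coordinates:

```python
import numpy as np

def predict_image_position(M, K, R, t):
    """Illustrative sketch of equation (7.22)."""
    m_h = K @ (R @ M + t)      # homogeneous image position
    return m_h[:2] / m_h[2]    # dehomogenize to pixel coordinates
```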

7.4.2. Warp Prediction

If a warp shall be predicted in the image, the 3D geometry of the template needs to be available. A simple approximate method for predicting an affine warp is to use the unit vectors of the affine transformation. When a feature is detected for the first time, it is assumed that the initial captured patch is observed from a direction orthogonal to the patch plane, and the 2D unit vectors of the affine transformation are un-projected with the 3D position of the regarded feature point. This yields two vectors which approximately describe the orientation of the template in 3D space. If offline models are used, e.g. a reference image of a poster, these 3D unit vectors of the affine transformation can be determined exactly.

For a prediction of an affine warp with a given camera pose, the 3D unit vectors can simply be projected into the image, and the affine warp can be derived from the projected unit vectors.
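A possible realization of this approximation, assuming numpy; the 3D unit vectors `u3d` and `v3d` anchored at the feature point `M` are hypothetical inputs obtained as described above:

```python
import numpy as np

def predict_affine_warp(M, u3d, v3d, K, R, t):
    """Illustrative sketch: derive an affine warp from projected 3D unit
    vectors of the template (all names are assumptions)."""
    def project(X):
        x = K @ (R @ X + t)
        return x[:2] / x[2]

    p0 = project(M)                 # projected feature position
    a1 = project(M + u3d) - p0      # image of the first template axis
    a2 = project(M + v3d) - p0      # image of the second template axis
    A = np.column_stack([a1, a2])   # 2x2 linear part of the warp x -> A x + p0
    return A, p0
```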

For a more accurate tracking of templates in camera images with a wide field of view, the warp function can be modeled with a homography. A homography correctly models the transformation of a plane under a perspective camera projection. In equation (7.12) it can be seen that the surface normal of a plane is related to the homography mapping of a template between two images. If the surface normal of a template is reconstructed or given from a 3D model, this normal vector $n_w$ can be used to predict the homography.

Together with the 3D position $M_w$, the relative camera rotation $\Delta R$ and the relative camera translation $\Delta t$, the homography of an image patch can be calculated.

Since the equation
\[ d = n_{c_0}^T M_{c_0} = n_{c_0}^T (R_0 X_p + t_0) \tag{7.23} \]

must be satisfied for any point $X_p$ on the plane $P$, the predicted homography $\tilde{H}$ can be calculated by

\[ \tilde{H} = \Delta R + \frac{\Delta t\, n_{c_0}^T}{n_{c_0}^T (R_0 M_w + t_0)}. \tag{7.24} \]

Since we are interested in a prediction of the warp function in the image coordinate system, the homography $\tilde{H}$ has to be projected into image space. For this homography, which transforms a point from the image of the first camera into the image of the second camera, we get

\[ \tilde{H}_I \sim K \tilde{H} K^{-1}. \tag{7.25} \]

The homography $\tilde{H}_I$ describes the transformation of a template from the initial camera image to the current camera image.

If a feature is initialized at the 2D image position $p = (p_x, p_y)^T$, the homography which represents the current warp of the template is initialized with

\[ H_0 = \begin{pmatrix} 1 & 0 & p_x \\ 0 & 1 & p_y \\ 0 & 0 & 1 \end{pmatrix}. \tag{7.26} \]

To obtain the homography in the image of the current camera pose, $\tilde{H}_I$ has to be transformed by the initial homography $H_0$. Finally, the prediction of the homography in the current camera image can be computed by

\[ H = \tilde{H}_I H_0. \tag{7.27} \]

The undefined scale factor of $H$ can be eliminated by normalization with the last element of $H$. With the warp parameters of $H$ the iterative template alignment with a projective warp function is initialized.
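The complete prediction chain of equations (7.23)–(7.27) can be sketched as follows, again assuming numpy and illustrative names:

```python
import numpy as np

def predict_template_homography(n_c0, M_w, R_0, t_0, delta_R, delta_t, K, p):
    """Illustrative sketch of equations (7.23)-(7.27)."""
    # (7.23): plane distance d from the normal and the 3D feature position.
    d = n_c0 @ (R_0 @ M_w + t_0)
    # (7.24): predicted homography in camera coordinates.
    H_tilde = delta_R + np.outer(delta_t, n_c0) / d
    # (7.25): project the homography into image space.
    H_tilde_I = K @ H_tilde @ np.linalg.inv(K)
    # (7.26): initial homography at the feature position p = (p_x, p_y).
    H_0 = np.array([[1.0, 0.0, p[0]],
                    [0.0, 1.0, p[1]],
                    [0.0, 0.0, 1.0]])
    # (7.27): homography in the current camera image, normalized with the
    # last element to remove the undefined scale factor.
    H = H_tilde_I @ H_0
    return H / H[2, 2]
```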

7.4.3. Prediction of the Illumination Parameters

Since the illumination can change significantly if the feature has not been observed for a long time, the illumination parameters also need to be predicted. This is very important for the convergence behavior, because if the illumination parameters are too far away from the solution, the whole iteration process is likely to diverge.


If $\mu_t$ and $\mu_0$ are the mean intensity values of the current and the initial patch, and if $\sigma_t$ and $\sigma_0$ are their respective standard deviations, then the contrast $\lambda$ and the brightness $\delta$ can be predicted by

\[ \lambda = \frac{\sigma_t}{\sigma_0}, \tag{7.28} \]
\[ \delta = \mu_t - \frac{\sigma_t}{\sigma_0}\, \mu_0. \tag{7.29} \]

The predicted illumination parameters $\lambda$ and $\delta$ describe the illumination correction of a patch extracted from the current image, such that $T(x) = \lambda I(g(x; p)) + \delta$ holds.
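The prediction of equations (7.28) and (7.29) reduces to simple patch statistics; a minimal numpy sketch, with grayscale patches given as arrays (names illustrative):

```python
import numpy as np

def predict_illumination(patch_t, patch_0):
    """Illustrative sketch of equations (7.28)-(7.29)."""
    lam = patch_t.std() / patch_0.std()            # (7.28): contrast
    delta = patch_t.mean() - lam * patch_0.mean()  # (7.29): brightness
    return lam, delta
```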