Evaluation - Marker-Based Reconstruction and Alignment

4.6. Evaluation

In order to point out the effects of the constrained BA, we ran the marker SLAM in two different modes for the evaluations:

• Rigid transform mode (Sec. 4.6.1): In this mode we reversed the order of execution. The bundle adjustment is carried out first, without imposing constraints. The rigid registration is executed afterwards, every time new results from the bundle adjustment are available, i.e. on every key-frame.

• Constrained BA mode (Sec. 4.6.2): This is the default mode, in which the constraints are internalized into the BA. Rigid registration and parameter projection must be done before initiating the constrained BA iterations for the first time, but only once during the entire SLAM run, namely at the point when enough reference markers have been reconstructed. Once feasibility of the constraints is attained, the BA will not violate it again later, because the parameters evolve on their constraint manifolds.

4 We note that the adapters used for line markers (see Fig. 4.3) in fact only allow them to be attached to the real model in two possible ways (vertical placement is also possible). So we could also assume the rotation to be fully known.

5 Undamped Gauss-Newton (as opposed to Levenberg-Marquardt) is possible because the gauge is fully defined once a rigid transformation has been successfully estimated from the given constraints. Thus, the approximated Hessian, $J^T J$, has full rank in general, and no regularization is necessary before inversion or Cholesky decomposition (see Sec. 3.4).
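The statement that parameters "evolve on their constraint manifolds" can be illustrated with a toy problem. The sketch below is not the constrained BA of this chapter: it assumes a single 3D point restricted to a known line and uses direct 3D observations instead of image reprojections. The names `fit_point_on_line`, `a`, `d` and `t` are hypothetical. Because the point is parameterized by the scalar `t` only, every undamped Gauss-Newton update stays feasible by construction, and the 1x1 approximated Hessian has full rank, so a plain Cholesky solve suffices.

```python
import numpy as np

def fit_point_on_line(a, d, observations, iterations=5):
    """Estimate p = a + t*d (a point restricted to a line) from noisy 3D observations.

    Toy stand-in for a constrained parameter: only t is optimized,
    so p can never leave the line.
    """
    a = np.asarray(a, dtype=float)
    d = np.asarray(d, dtype=float)
    d = d / np.linalg.norm(d)                       # unit direction of the line
    obs = np.asarray(observations, dtype=float)     # shape (N, 3)
    t = 0.0
    for _ in range(iterations):
        p = a + t * d
        r = (p - obs).ravel()                       # stacked residuals, shape (3N,)
        J = np.tile(d, len(obs))[:, None]           # d r / d t, shape (3N, 1)
        H = J.T @ J                                 # approximated Hessian J^T J
        g = J.T @ r                                 # gradient
        L = np.linalg.cholesky(H)                   # full rank, so no damping term
        delta = -np.linalg.solve(L.T, np.linalg.solve(L, g))
        t += delta[0]
    return a + t * d, t
```

For example, with `a = [0, 0, 0]`, `d = [1, 0, 0]` and two observations scattered around `x = 2`, the estimate lies exactly on the x-axis regardless of the noise in the other coordinates.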

[Figure 4.4: three plot panels (a), (b), (c); axis tick values omitted]

Figure 4.4.: Three different evaluation runs in 'rigid transform mode', where the edge markers are placed at varying locations. The estimated camera pose is colored from cyan to red (rigid transform mode) and the ground-truth pose in brown. Camera frustums are visualized every 40th frame. The reconstructed marker positions are depicted for two time instants: at the beginning of the sequence (cyan) and at the end (red). The orange trapezoid represents the edges of the table used for evaluation. Reconstruction and registration in the 'rigid transform mode' gradually improve as more frames and marker measurements become available.

For our evaluations we mounted the camera⁶ onto the tip of a FARO® measurement arm. The measurement arm is capable of providing a highly accurate full pose of the tip (translation and rotation) at a high rate. However, the raw output, $E_{F_s \to F_d}$, is expressed in the local coordinate systems of the measurement device, $F_s$ and $F_d$, where $F_s$ denotes the static coordinate system (base) and $F_d$ is the coordinate system of the dynamically updated tip pose. So, in order to obtain ground-truth data for the camera pose in the coordinate system of the virtual model $V$, we needed to perform two calibrations beforehand: a hand-eye calibration between the camera and the measurement arm, $E_{F_d \to C}$, and a scene calibration from measurement arm coordinates to the coordinate system of the virtual model, $E_{F_s \to V}$.⁷ During runtime, the ground-truth pose is given by the concatenation of these relative transformations: $E_{V \to C} = E_{F_d \to C} \cdot E_{F_s \to F_d} \cdot E_{F_s \to V}^{-1}$.

6 We used an IDS UEye monochromatic camera with global shutter and a resolution of 1280×1024 pixels.
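The concatenation of the three relative transformations can be sketched as follows, assuming that every transformation is stored as a 4x4 homogeneous matrix mapping points from the first to the second coordinate system in its subscript. The helper names `make_pose`, `invert_pose` and `ground_truth_pose` are hypothetical and only serve this illustration.

```python
import numpy as np

def make_pose(R, t):
    """Build a 4x4 homogeneous rigid transformation from rotation R and translation t."""
    E = np.eye(4)
    E[:3, :3] = R
    E[:3, 3] = t
    return E

def invert_pose(E):
    """Invert a rigid transformation in closed form: (R, t) -> (R^T, -R^T t)."""
    R, t = E[:3, :3], E[:3, 3]
    E_inv = np.eye(4)
    E_inv[:3, :3] = R.T
    E_inv[:3, 3] = -R.T @ t
    return E_inv

def ground_truth_pose(E_Fd_C, E_Fs_Fd, E_Fs_V):
    """E_{V->C} = E_{Fd->C} * E_{Fs->Fd} * E_{Fs->V}^{-1}.

    Read right to left: model V -> arm base F_s -> arm tip F_d -> camera C.
    """
    return E_Fd_C @ E_Fs_Fd @ invert_pose(E_Fs_V)
```

Only `E_Fs_Fd` changes per frame; the two calibration results `E_Fd_C` and `E_Fs_V` are constant, so their product with the per-frame arm pose is all that is needed at runtime.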

We recorded three different image sequences and ground-truth poses for evaluation, whereby we moved the camera in half-circles around the table. In each evaluation run we changed the positions of the edge markers.

In the first sequence, they were roughly centered between the edge endpoints, while in the second and third sequence they were shifted in counterclockwise and clockwise direction, respectively. The plane markers were left unaltered, because we used them for further evaluations (see below). In Fig. 4.4, for the 'rigid transform mode', the ground truth and the estimated camera path are shown from the moment the markers were initially reconstructed and registered. Note how the registration improves slowly but steadily as more marker measurements become available and as the reconstruction by the unconstrained bundle adjustment becomes more accurate.

4.6.1. Results for Unconstrained Bundle Adjustment with Rigid Registration

We will first summarize our evaluations for the rigid transform mode. The quantitative results for the three evaluation runs are shown in Fig. 4.5. After each (unconstrained) BA the registration was computed as a similarity transformation, including a global scale as a free parameter, using the rendezvous points and the lines and planes, i.e. we used the homogeneous version of Eq. 2.10 (Chap. 2, p. 20). We visualized the results of all three rotation solvers of Chap. 2, i.e. UPnP [KLS14], DLS/gDLS [HR11, SFHT14] and our own solver, but they yield almost identical results. It can be observed that after 200 frames (which corresponds approximately to a 90° turn) an error of less than 20 mm (translation) and 1° (rotation) is achieved in all three sequences. The final error after 440 frames is around 10 mm (translation) and 0.5° (rotation).
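This section does not spell out how the translation and rotation errors are computed, so the following is only one common way to obtain such numbers, given as a sketch under stated assumptions: both poses are 4x4 world-to-camera matrices, the translation error is the Euclidean distance between the two camera centers (in the unit of the translation, e.g. mm), and the rotation error is the angle of the relative rotation in degrees. The function name `pose_errors` is hypothetical.

```python
import numpy as np

def pose_errors(E_est, E_gt):
    """Return (translation error, rotation error in degrees) between two camera poses.

    Assumes world-to-camera matrices x_C = R x_W + t, so the camera center is -R^T t.
    """
    R_est, t_est = E_est[:3, :3], E_est[:3, 3]
    R_gt, t_gt = E_gt[:3, :3], E_gt[:3, 3]

    # Distance between the camera centers in world coordinates.
    c_est = -R_est.T @ t_est
    c_gt = -R_gt.T @ t_gt
    trans_err = np.linalg.norm(c_est - c_gt)

    # Angle of the relative rotation, from trace(R) = 1 + 2 cos(angle).
    R_rel = R_est @ R_gt.T
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    rot_err_deg = np.degrees(np.arccos(cos_angle))
    return trans_err, rot_err_deg
```

The clipping guards against values marginally outside [-1, 1] caused by rounding, which would otherwise make `arccos` return NaN for nearly identical rotations.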

A large portion of the remaining error is due to the fact that the final pose of the camera for each image is computed with a maximum of 28 2D-3D correspondences (four corners of the seven markers). So, even if the markers were perfectly reconstructed and registered, the detected marker corners in the image would still be noisy and would therefore prevent the pose estimates from becoming more accurate. Moreover, we could not find any means of perfectly synchronizing the camera images with the measurement arm.⁸ Therefore, in Fig. 4.6 we also evaluated how well the three plane markers are reconstructed and registered. We compared their estimated center points with the center points of the real markers measured with the measurement arm. The final reconstruction and registration error is well below 2 mm in all three cases. Comparing the final coordinates of the reconstructed and registered marker centers across the three runs (see Table 4.1), which excludes any remaining errors from the measurement arm calibration, we see that these values are reproduced with sub-millimeter accuracy.

We also considered the case of performing the registration with a fixed and known scale as described in Sec. 2.3.4 (Chap. 2.3.4, p. 22). This is possible because we assume that the size of the real markers is known. This knowledge implicitly transfers the scale to the whole marker ensemble reconstruction. Intuitively, this should produce more accurate results, since there is one parameter fewer to estimate. However, depending on the marker algorithm and parameters used, the opposite may be true. As shown in Fig. 4.7, the registration with fixed scale results in a clear misalignment that persists until the end of each sequence.

7 We performed both calibrations in two separate stages. For hand-eye calibration we observed a checker board pattern in a 400-frame sequence from various viewpoints. Then we simultaneously optimized $E_{F_d \to C}$ and $E_{F_s \to B}$, so that the re-projection error in the image between the measured and transformed checker board corners, $x_C = E_{F_d \to C} \cdot E_{F_s \to F_d} \cdot E_{F_s \to B}^{-1} \cdot x_B$, became minimal. $B$ denotes the coordinate system of the checker board. $E_{F_s \to B}$ was only used as an intermediate slack variable and discarded after hand-eye calibration. For scene calibration we measured the four corner points of the table with the measurement arm several times and averaged the results. Then we computed the relative transformation $E_{F_s \to V}$ between the averaged points $x_{F_s}$ and $x_V$ with the algorithm of Umeyama [Ume91].
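A point-to-point alignment in the spirit of Umeyama [Ume91] can be sketched as below. This is a minimal illustration, not the code used for the evaluation: the function name `umeyama` and the `with_scale` switch are assumptions, points are stored as rows, and only point correspondences are handled (no lines or planes). With `with_scale=False` the scale is held at 1, which loosely mirrors the distinction between free and fixed scale discussed for the marker registration.

```python
import numpy as np

def umeyama(src, dst, with_scale=True):
    """Least-squares fit of dst_i ~ s * R @ src_i + t over corresponding 3D points.

    Returns (s, R, t). If with_scale is False, s is fixed to 1.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)

    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt

    var_src = (src_c ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_src if with_scale else 1.0
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

The determinant check also covers coplanar input such as four table corners, where the cross-covariance has rank two and the sign of the third singular vector pair is otherwise arbitrary.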

8 Synchronization is only achieved insofar as we query a pose from the measurement arm immediately before triggering a new image capture, assuming that the transmission of the pixel data is the dominant bottleneck and causes the largest latency. Effects of synchronization errors are kept small by moving the camera very slowly.

[Plot: Camera Rotation Error for Seq. 1; x-axis: number of frames; y-axis: rotation error [deg]]