
4. Marker-Based Reconstruction and Alignment 67

4.6. Evaluation

In order to point out the effects of the constrained BA, we ran the marker SLAM in two different modes for the evaluations:

• Rigid transform mode (Sec. 4.6.1): In this mode we reversed the order of execution. The bundle adjustment is carried out first, without imposing constraints. The rigid registration is executed afterwards, every time new results from the bundle adjustment are available, i.e. on every key-frame.

• Constrained BA mode (Sec. 4.6.2): This is the default mode, in which the constraints are internalized into the BA. Rigid registration and parameter projection must be done before the constrained BA iterations are initiated for the first time, but only once during the entire SLAM run, namely at the point when enough reference markers have been reconstructed. Once feasibility of the constraints is attained, it will not be violated by the BA later, because the parameters evolve on their constraint manifolds.

⁴ We note that the adapters used for line markers (see Fig. 4.3) in fact only allow them to be attached to the real model in two possible ways (vertical placement is also possible). So, we could also assume the rotation to be fully known.

⁵ Undamped Gauss-Newton (as opposed to Levenberg-Marquardt) is possible because the gauge is fully defined if a rigid transformation could be successfully estimated from the given constraints. Thus, the approximated Hessian, $J^T J$, has full rank in general, and no regularization is necessary before inversion or Cholesky decomposition (see Sec. 3.4).
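The remark in footnote 5 on undamped Gauss-Newton can be illustrated with a minimal sketch. It assumes a generic Jacobian `J` and residual vector `r`; the function name `gauss_newton_step` and the toy linear problem are illustrative only and are not the bundle adjustment described here. When $J^T J$ has full rank, the normal equations can be solved through a Cholesky factorization without adding a damping term.

```python
import numpy as np

def gauss_newton_step(J, r):
    """Undamped Gauss-Newton update: solve (J^T J) delta = -J^T r."""
    H = J.T @ J                     # approximated Hessian
    g = J.T @ r                     # gradient of 0.5 * ||r||^2
    L = np.linalg.cholesky(H)       # requires H to be positive definite (full rank)
    y = np.linalg.solve(L, -g)      # solve L y = -g
    return np.linalg.solve(L.T, y)  # solve L^T delta = y

# Toy problem (assumption): linear residual r(theta) = A @ theta - b.
# For a linear residual, a single undamped step reaches the least-squares solution.
rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))
b = rng.normal(size=10)
theta0 = np.zeros(3)
theta1 = theta0 + gauss_newton_step(A, A @ theta0 - b)
```

If the gauge were not fixed, $J^T J$ would be rank deficient and the factorization would not be applicable without regularization, which is the situation Levenberg-Marquardt damping addresses.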

[Figure 4.4: panels (a), (b), (c)]

Figure 4.4.: Three different evaluation runs in 'rigid transform mode', where the edge markers are placed at varying locations. The estimated camera pose is colored from cyan to red (rigid transform mode) and the ground-truth pose in brown. Camera frustums are visualized every 40th frame. The reconstructed marker positions are depicted for two time instants: at the beginning of the sequence (cyan) and at the end (red). The orange trapezoid represents the edges of the table used for evaluation. Reconstruction and registration in the 'rigid transform mode' gradually improve as more frames and marker measurements become available.

For our evaluations we mounted the camera⁶ onto the tip of a FARO® measurement arm. The measurement arm is capable of providing a highly accurate full pose of the tip (including translation and rotation) at a high frequency. However, the raw output ($E_{F_s \to F_d}$) is given in the local coordinates of the FARO® measurement device, $F_s$ and $F_d$, where $F_s$ denotes the static coordinate system (base) and $F_d$ is the coordinate system of the dynamically updated tip pose. So, in order to obtain ground-truth data for the camera pose in the coordinate system of the virtual model $V$, we needed to perform two calibrations beforehand: a hand-eye calibration between the camera and the measurement arm, $E_{F_d \to C}$, and a scene calibration from measurement arm coordinates to the coordinate system of the virtual model, $E_{F_s \to V}$.⁷ During runtime, the ground-truth pose is given by the concatenation of these relative transformations: $E_{V \to C} = E_{F_d \to C} \cdot E_{F_s \to F_d} \cdot E_{F_s \to V}^{-1}$.

⁶ We used an IDS UEye monochrome camera with global shutter and a resolution of 1280×1024 pixels.
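The concatenation of relative transformations for the ground-truth pose can be sketched as follows, assuming that every transformation is stored as a 4×4 homogeneous matrix. The helper names (`make_pose`, `invert_pose`, `ground_truth_pose`, `rot_z`) and all numeric values are made up for illustration and are not taken from the experiments.

```python
import numpy as np

def make_pose(R, t):
    """Build a 4x4 homogeneous rigid transformation from rotation R and translation t."""
    E = np.eye(4)
    E[:3, :3] = R
    E[:3, 3] = t
    return E

def invert_pose(E):
    """Invert a rigid transformation in closed form: (R, t) -> (R^T, -R^T t)."""
    R, t = E[:3, :3], E[:3, 3]
    return make_pose(R.T, -R.T @ t)

def ground_truth_pose(E_Fd_C, E_Fs_Fd, E_Fs_V):
    """E_{V->C} = E_{Fd->C} * E_{Fs->Fd} * E_{Fs->V}^{-1}."""
    return E_Fd_C @ E_Fs_Fd @ invert_pose(E_Fs_V)

def rot_z(deg):
    """Rotation about the z-axis by an angle given in degrees."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical calibration results and arm reading (translations in mm).
E_Fd_C = make_pose(rot_z(10.0), [5.0, -2.0, 30.0])         # hand-eye calibration
E_Fs_Fd = make_pose(rot_z(-35.0), [400.0, 150.0, 600.0])   # raw arm output
E_Fs_V = make_pose(rot_z(90.0), [-800.0, 200.0, 0.0])      # scene calibration

E_V_C = ground_truth_pose(E_Fd_C, E_Fs_Fd, E_Fs_V)

# A point given in model coordinates V, mapped into camera coordinates C.
x_V = np.array([100.0, 50.0, 0.0, 1.0])
x_C = E_V_C @ x_V
```

Only $E_{F_s \to F_d}$ changes per frame in this scheme; the two calibration results stay fixed, so the inverse of the scene calibration can be computed once and reused.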

We recorded three different image sequences with ground-truth poses for evaluation, moving the camera in half-circles around the table. In each evaluation run we changed the positions of the edge markers. In the first sequence, they were roughly centered between the edge endpoints, while in the second and third sequences they were shifted in the counterclockwise and clockwise directions, respectively. The plane markers were left unaltered, because we used them for further evaluations (see below). In Fig. 4.4, for the 'rigid transform mode', the ground truth and the estimated camera path are shown from the moment the markers were initially reconstructed and registered. Note how the registration improves slowly but steadily as more marker measurements become available and the reconstruction by the unconstrained bundle adjustment becomes more accurate.

4.6.1. Results for Unconstrained Bundle Adjustment with Rigid Registration

We will first summarize our evaluations for the rigid transform mode. The quantitative results corresponding to the three evaluation runs are shown in Fig. 4.5. After each (unconstrained) BA, the registration was computed as a similarity transformation, including a global scale as a free parameter, using the rendezvous points and the lines and planes, i.e. we used the homogeneous version of Eq. 2.10 (Chap. 2, p. 20). We visualized the results of all three rotation solvers of Chap. 2, i.e. UPnP [KLS14], DLS/gDLS [HR11, SFHT14] and our own solver, but they yield almost identical results. It can be observed that after 200 frames (which corresponds approximately to a 90° turn), an error of less than 20 mm (translation) and 1° (rotation) is achieved in all three sequences. The final error after 440 frames is around 10 mm (translation) and 0.5° (rotation).
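How such rotation and translation errors can be computed from an estimated and a ground-truth pose is sketched below. The text does not spell out the exact error definition, so the geodesic rotation angle and the Euclidean distance used here are assumptions, and the function names are hypothetical.

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Angle in degrees of the relative rotation R_est^T R_gt."""
    R_rel = R_est.T @ R_gt
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))

def translation_error_mm(t_est, t_gt):
    """Euclidean distance between two positions given in millimeters."""
    return float(np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt)))

def rot_z(deg):
    """Rotation about the z-axis by an angle given in degrees."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Illustrative values only: a 0.5 degree rotation offset and a 5 mm position offset.
rot_err = rotation_error_deg(rot_z(0.5), np.eye(3))
trans_err = translation_error_mm([3.0, 4.0, 0.0], [0.0, 0.0, 0.0])
```

The clipping guards against arguments slightly outside [-1, 1] caused by rounding, which would otherwise make `arccos` return NaN for nearly identical rotations.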

A large portion of the remaining error is due to the fact that the final camera pose for each image is computed from a maximum of 28 2D-3D correspondences (four corners of each of the seven markers). So, even if the markers were perfectly reconstructed and registered, the detected marker corners in the image would still be noisy and would therefore prevent the pose estimates from becoming more accurate. Moreover, we could not find any means of perfectly synchronizing the camera images with the measurement arm.⁸ Therefore, in Fig. 4.6 we also evaluated how well the three plane markers are reconstructed and registered. We compared their estimated center points with the center points of the real markers measured with the measurement arm. The final reconstruction and registration error is well below 2 mm in all three cases. Comparing the final coordinates of the reconstructed and registered marker centers across the runs (see Table 4.1), which excludes any remaining errors from the measurement arm calibration, we see that these values are reproduced with sub-millimeter accuracy.
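The reproducibility statement can be checked directly against the values of Table 4.1. The following sketch computes, for every marker coordinate, the spread (maximum minus minimum) over the three sequences; the dictionary layout and variable names are illustrative, while the numbers are copied from the table.

```python
# Final marker center coordinates in mm, copied from Table 4.1 (Seq. 1, 2, 3).
table_4_1 = {
    "marker 1": {
        "x": (316.597, 316.545, 316.099),
        "y": (-136.905, -136.826, -136.440),
        "z": (0.912, 0.853, 0.967),
    },
    "marker 2": {
        "x": (-275.979, -275.932, -276.095),
        "y": (-159.139, -159.336, -159.369),
        "z": (0.946, 0.956, 0.883),
    },
    "marker 3": {
        "x": (-21.649, -21.795, -22.313),
        "y": (64.970, 64.913, 64.966),
        "z": (1.315, 1.464, 1.312),
    },
}

def spread(values):
    """Difference between the largest and smallest value of one coordinate."""
    return max(values) - min(values)

spreads = {
    (marker, axis): spread(values)
    for marker, axes in table_4_1.items()
    for axis, values in axes.items()
}
worst = max(spreads, key=spreads.get)
print(worst, round(spreads[worst], 3))
```

The largest spread is about 0.66 mm (x-coordinate of marker 3), so every coordinate is reproduced to within one millimeter across the three runs.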

We also considered the case of performing the registration with a fixed and known scale as described in Sec. 2.3.4 (Chap. 2.3.4, p. 22). This is possible because we assume that the size of the real markers is known. This knowledge implicitly transfers the scale to the whole marker ensemble reconstruction. Intuitively, this should produce more accurate results, since there is one parameter fewer to estimate. However, depending on the marker algorithm and parameters used, the opposite may be true. As shown in Fig. 4.7, the registration with fixed scale results in a clear misalignment that persists until the end of each sequence.
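The effect of fixing the scale can be reproduced with a small point-based sketch. It uses the closed-form alignment of Umeyama as a stand-in for the registration; the solvers evaluated here additionally use lines and planes, so this is an illustration under simplifying assumptions, not the method of Chap. 2. The point set, rotation and translation are made up, and the scale bias of 1.0573 is borrowed from the caption of Fig. 4.7 purely as a plausible magnitude.

```python
import numpy as np

def umeyama(src, dst, with_scale=True):
    """Least-squares fit of dst ~ s * R @ src + t for (N, 3) point arrays."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                 # cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                         # avoid a reflection
    R = U @ S @ Vt
    var_s = (xs ** 2).sum() / len(src)
    s = float(np.trace(np.diag(D) @ S) / var_s) if with_scale else 1.0
    t = mu_d - s * R @ mu_s
    return s, R, t

# Synthetic reconstruction (src) and model points (dst) with a scale bias.
rng = np.random.default_rng(1)
src = rng.uniform(-400.0, 400.0, size=(7, 3))   # mm, table-sized workspace
a = np.radians(30.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
dst = 1.0573 * src @ R_true.T + np.array([10.0, -20.0, 5.0])

def rms_residual(s, R, t):
    """Root-mean-square distance between dst and the transformed src points."""
    diff = dst - (s * src @ R.T + t)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

s_free, R_free, t_free = umeyama(src, dst, with_scale=True)
rms_free = rms_residual(s_free, R_free, t_free)
rms_fixed = rms_residual(*umeyama(src, dst, with_scale=False))
```

With the scale left free, the bias is absorbed by `s_free` and the residual vanishes up to rounding. With the scale fixed to one, a residual proportional to the distance of each point from the centroid remains, which corresponds to a misalignment that does not disappear with more measurements.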

⁷ We performed both calibrations in two separate stages. For the hand-eye calibration we observed a checkerboard pattern in a 400-frame sequence from various viewpoints. Then we simultaneously optimized $E_{F_d \to C}$ and $E_{F_s \to B}$, so that the re-projection error in the image between the measured and transformed checkerboard corners, $x_C = E_{F_d \to C} \cdot E_{F_s \to F_d} \cdot E_{F_s \to B}^{-1} \cdot x_B$, became minimal. $B$ denotes the coordinate system of the checkerboard. $E_{F_s \to B}$ was only used as an intermediate slack variable and discarded after the hand-eye calibration. For the scene calibration we measured the four corner points of the table with the measurement arm several times and averaged the results. Then we computed the relative transformation $E_{F_s \to V}$ between the averaged points $x_{F_s}$ and $x_V$ with the algorithm of Umeyama [Ume91].

⁸ Synchronization is only achieved insofar as we query a pose from the measurement arm immediately before triggering a new image capture, assuming that the transmission of the pixel data is the dominant bottleneck and causes the largest latency. The effects of synchronization errors are kept small by moving the camera very slowly.

[Figure 4.5, panels (a)-(c): camera rotation error [deg] and camera translation error [mm] plotted over the number of frames for Seq. 1, 2 and 3. Curves: gDLS (OC), UPnP (OC) with scale, own solver (OC).]

Figure 4.5.: Absolute tracking error of the camera with respect to the ground-truth data coming from the FaroArm. The columns (a)-(c) belong to the camera paths in Fig. 4.4, respectively. Spatial registration is performed independently with all three algorithms of Chap. 2, namely UPnP [KLS14], DLS/gDLS [HR11, SFHT14] and our own solver.

[Figure 4.6: reconstruction accuracy, error [mm] plotted over the number of frames for Seq. 1, 2 and 3. Curves: marker 1, marker 2, marker 3.]

Figure 4.6.: Evolution of the absolute error in millimeters of the center points of the three reconstructed plane markers. The ground-truth positions in model coordinates are obtained by measuring their locations with the FaroArm. 'marker 3' denotes the center plane marker; 'marker 1' and 'marker 2' are at the bottom-left and top-left of the table, see Fig. 4.4, bottom row.

The reason for this effect is rooted in the marker detection algorithm used, in which the black marker squares are detected by binarizing the image with a fixed threshold. This leads to larger marker detections under darker lighting conditions and vice versa (see Fig. 4.2(b)). The size of the detected markers also influences the global scale of the reconstruction and must be compensated for during registration. The newest versions of the Aruco library are capable of performing adaptive thresholding, which should alleviate this problem. We did not evaluate the behaviour in this case. However, fixed thresholding is also a common strategy in many other marker libraries, so it is worth pointing out this problem. We have shown that by leaving the scale as a free parameter, accurate registrations can still be attained despite this defect.

Table 4.1.: Final coordinates in mm of the three plane markers for the three evaluation runs.

          Marker 1                        Marker 2                        Marker 3
Seq.      1         2         3          1         2         3          1        2        3
x       316.597   316.545   316.099   −275.979  −275.932  −276.095   −21.649  −21.795  −22.313
y      −136.905  −136.826  −136.440   −159.139  −159.336  −159.369    64.970   64.913   64.966
z         0.912     0.853     0.967      0.946     0.956     0.883     1.315    1.464    1.312

[Figure 4.7: panels (a), (b), (c)]

Figure 4.7.: Visualization of the registration results after 200 frames in each of the three evaluation sequences. The orange outlines represent the case when the registration is computed assuming a fixed scale (using GAPS or UPnP), whereas green and white correspond to variable scale and the ground truth, respectively. The estimated scale factors for the three sequences are 1.0573, 1.0619, and 1.0604.
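The dependence of the detected marker size on the lighting under fixed thresholding can be illustrated with a one-dimensional sketch. It assumes a scanline across a black square whose edges are blurred into a linear ramp, and it models the lighting as a simple multiplicative gain; the function names, intensity levels and threshold are hypothetical and not taken from the marker library.

```python
def scanline(gain, width=100, center=50, black=20.0, white=200.0):
    """Intensity profile across a blurred black square under an illumination gain."""
    profile = []
    for x in range(width):
        d = abs(x - center)
        # Blend weight: 0 inside the square, 1 on the white border, linear ramp between.
        w = min(max((d - 5) / 10.0, 0.0), 1.0)
        profile.append(gain * (black + (white - black) * w))
    return profile

def detected_width(gain, threshold=100.0):
    """Number of pixels classified as black by a fixed threshold."""
    return sum(1 for value in scanline(gain) if value < threshold)

# Darker (gain < 1), nominal, and brighter (gain > 1) lighting conditions.
widths = {gain: detected_width(gain) for gain in (0.7, 1.0, 1.3)}
```

In this model the detected square grows as the image gets darker, because more of the blurred edge falls below the fixed threshold. A threshold chosen relative to the local intensity range would remove this dependence, which is the motivation for adaptive thresholding.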

4.6.2. Results for the Constrained Bundle Adjustment Case

We will now present the results for the 'constrained BA mode'. The BA with internalized constraints is executed on every key-frame once the rigid registration, transformation (Sec. 3.4), and parameter projection (Sec. 3.5) have been accomplished. Because the lighting conditions were observed to affect the scale with which the markers are detected, we parameterize each marker with an additional scaling parameter, so that the individual reconstructions comprise a similarity transformation, $S_{L_n \to V}$, instead of only a Euclidean one.
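A per-marker similarity transformation of this kind can be sketched as a 4×4 matrix whose upper-left block is a scaled rotation. The helper names, the corner layout in local marker coordinates and all numeric values below are assumptions for illustration; the actual parameterization follows Chap. 3.

```python
import numpy as np

def similarity_matrix(scale, R, t):
    """Build a 4x4 similarity transformation with upper-left block scale * R."""
    S = np.eye(4)
    S[:3, :3] = scale * R
    S[:3, 3] = t
    return S

def marker_corners_local(edge_length):
    """Four corners of a square marker in its local frame (z = 0 plane)."""
    h = edge_length / 2.0
    return np.array([[-h, -h, 0.0], [h, -h, 0.0], [h, h, 0.0], [-h, h, 0.0]])

def transform_points(S, points):
    """Apply a 4x4 similarity transformation to an (N, 3) array of points."""
    return points @ S[:3, :3].T + S[:3, 3]

# Hypothetical marker n: nominal edge length 80 mm, detected about 6 % too large.
a = np.radians(45.0)
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a), np.cos(a), 0.0],
              [0.0, 0.0, 1.0]])
S_Ln_V = similarity_matrix(1.06, R, [300.0, -100.0, 1.0])

corners_V = transform_points(S_Ln_V, marker_corners_local(80.0))
edge_V = float(np.linalg.norm(corners_V[1] - corners_V[0]))
```

The extra scale parameter lets each marker absorb its own detection bias, whereas a Euclidean transformation would force the nominal edge length onto the reconstruction.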

The blue plot in Fig. 4.8 depicts the error of the registered camera pose with regard to the measurement arm ground truth. For comparison, we also added the error of the 'rigid transform mode' (same as in Fig. 4.5) in green. As can be seen, the 'constrained BA mode' attains its final error value right from the beginning, i.e. directly after the initialization phase, once enough translation is present to resolve the ambiguities of the marker poses. No further 100-200 frames are needed to collect additional measurements and increase the accuracy. These observations are confirmed by the reconstruction accuracy visualized in Fig. 4.9 and by the reproducibility plots (Fig. 4.10).

[Figure 4.8, panels (a)-(c): camera rotation error [deg] and camera translation error [mm] plotted over the frame number for Seq. 1, 2 and 3. Curves: Rigid Transform (GAPS), Constrained BA.]

Figure 4.8.: Absolute tracking error of the marker-based SLAM with constrained BA (blue) compared to rigid registration only (green, cf. Fig. 4.5). Again, the columns (a)-(c) correspond to the camera paths in Fig. 4.4.

[Figure 4.9: reconstruction accuracy, error [mm] plotted over the frame number for Seq. 1, 2 and 3. Curves: marker 1-3 (rigid), marker 1-3 (CBA).]

Figure 4.9.: Comparison of the constrained bundle adjustment (solid) and the unconstrained version with rigid registration (dotted line plots) for the absolute reconstruction error of the three reconstructed plane markers. The error of the center points is given in millimeters and denotes the distance to the ground-truth positions in model coordinates as measured with the FaroArm. Again, 'marker 3' denotes the center plane marker; 'marker 1' and 'marker 2' are at the bottom-left and top-left of the table, see Fig. 4.4, bottom row.

We conclude this evaluation by making some final remarks on the scale computation. In the ’rigid transform mode’ we have estimated a single scaling parameter for all markers. This modeling assumption represents a sufficiently precise solution for the given experimental setup, where all markers are exposed to approximately the