
The ego-motion method presented in this dissertation can be applied to different application areas, since it makes no assumptions about the type of motion expected. The main requirement is that the optical flow and stereo algorithms deliver a sufficiently high number of point correspondences. As already shown in the previous chapter, even the condition of at least 50% uncontaminated data is not required, since the method works relatively well even when more than half of the data are outliers.

This section shows ego-motion results for a sequence taken as the vehicle drove on an unpaved roadway in a forest. The road is quite uneven, and the vehicle motion differs from that expected on a road. In order to cope with the rough motion of the vehicle, the frame rate was set to an average of 24 frames/s. Figure 8.8 shows some snapshots of the sequence. Observe that the top-right image shows a rotation of about 9° around the optical axis, which is not normally expected on regular roads; in the sequences Curves and Ring the maximal rotation around the optical axis was about 2.5°. The sequence is composed of 800 stereo images, and the vehicle drives at slow speed over a distance of approximately 120 meters.

The reconstruction of the scene in this section is achieved by simply plotting the 3D points observed throughout the sequence. Since every observed 3D point is measured and estimated in a camera coordinate system (i.e., the camera position is always at the origin), directly plotting the 3D points will not regenerate the structure of the scene (unless the camera is static). Instead, the clouds of points obtained at each frame are translated and rotated according to the corresponding position and orientation of the camera (i.e., its ego-motion), which yields a continuous reconstruction of the scene.
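A minimal sketch of this accumulation step, assuming the per-frame ego-motion is available as a rotation matrix and a translation vector (all names are illustrative, and the sign conventions may differ from those of the actual implementation):

```python
import numpy as np

def compose_pose(R_wc, t_wc, R_step, t_step):
    """Update the accumulated camera pose (R_wc, t_wc), which maps camera
    coordinates to world coordinates, with the incremental ego-motion
    (R_step, t_step) estimated between two consecutive frames."""
    return R_wc @ R_step, R_wc @ t_step + t_wc

def cloud_to_world(points_cam, R_wc, t_wc):
    """Transform an (N, 3) array of 3D points, measured in the current
    camera coordinate system, into the common world coordinate system."""
    return points_cam @ R_wc.T + t_wc
```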

The points to plot are not those directly obtained from stereo. Instead, the Kalman-filtered 3D positions of the points are the optimal candidates for plotting. A point is selected for plotting (together with its grey value) if its estimated velocity is below a threshold; this way, the plotting of points with large error terms is avoided. This basic information might be used by more efficient reconstruction methods (e.g., [KESFK05], [JL03], [GS04], [Mor02]), all of which are out of the scope of this dissertation.
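A minimal sketch of this selection rule, assuming the Kalman-filtered positions, velocity estimates, and grey values are available as NumPy arrays (the threshold value is a placeholder, not the one used in the experiments):

```python
import numpy as np

def select_stable_points(points, velocities, grey_values, v_max=0.1):
    """Keep only those Kalman-filtered 3D points whose estimated velocity
    magnitude is below v_max (in m/s; placeholder value).  Points with
    large velocity estimates typically carry large error terms."""
    speeds = np.linalg.norm(velocities, axis=1)
    keep = speeds < v_max
    return points[keep], grey_values[keep]
```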

Snapshots of the reconstruction are shown in Figure 8.9. The maximal number of tracks was 3000, and ego-motion was computed with a maximal multi-step integration level of 15. A total of approximately 1.7×10⁶ 3D points with corresponding grey values were collected throughout the sequence. The path traveled by the camera is shown in red.

8.4 Indoor Environment

The main difference between outdoor and indoor environments lies in the mean and variance of the distance to the observed objects. In indoor environments both quantities are considerably smaller than in outdoor environments. For stereo applications this means, mainly, that the noise affecting the 3D positions will be smaller. On the other hand, images of indoor environments might contain more and larger untextured areas, such as walls, and therefore less information. As long as the images contain enough texture to allow the computation of optical flow and stereo, the method presented in this dissertation can be used to compute the ego-motion of a freely moving camera in indoor environments, as this section shows.
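As a rough illustration of this distance dependence, the usual first-order stereo triangulation model gives σ_Z ≈ Z² σ_d / (b f), so the depth uncertainty grows quadratically with the distance Z. The sketch below evaluates this model for a typical indoor and outdoor distance; the disparity noise, the distances, and the pixel pitch are assumed values, not measurements from this dissertation, and the baseline and focal length are those of the camera configuration used for the Office sequence below:

```python
def depth_noise(Z, baseline_m, focal_px, sigma_d_px=0.3):
    """First-order stereo error model sigma_Z = Z**2 * sigma_d / (b * f),
    with Z and the baseline in meters and the focal length and disparity
    noise in pixels (sigma_d is an assumed value)."""
    return Z ** 2 * sigma_d_px / (baseline_m * focal_px)

# Baseline 0.12 m and focal length 0.4 cm; with an assumed pixel pitch
# of 10 micrometers this corresponds to a focal length of 400 pixels.
for Z in (2.0, 20.0):  # typical indoor vs. outdoor object distance
    print(f"Z = {Z:4.1f} m -> sigma_Z = {depth_noise(Z, 0.12, 400.0):.3f} m")
```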


Figure 8.9: Scene reconstruction for the sequence Forest.


In order to obtain some ground-truth information about the motion of the camera, a special sequence was constructed. A normal sequence of 468 images was recorded in an office while a person was moving. A second sequence was then generated by reversing the playback order of the first one. Finally, the latter was appended to the former to obtain a complete sequence of 936 images. The start and end positions of the camera for the whole sequence are thus exactly the same, and therefore the complete integration of motion must be zero at the end of the sequence.
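A minimal sketch of the resulting closure check, assuming the per-frame ego-motion estimates are available as rotation matrices and translation vectors (names are illustrative): the accumulated pose after all 936 frames should be the identity, and the returned residuals measure the accumulated drift.

```python
import numpy as np

def closure_error(steps):
    """Integrate the per-frame ego-motion (R_k, t_k) over the whole
    sequence.  For the forward-then-reversed Office sequence the
    accumulated pose should be the identity; the returned rotation
    angle (deg) and translation norm (m) measure the remaining drift."""
    R, t = np.eye(3), np.zeros(3)
    for R_k, t_k in steps:
        t = R @ t_k + t
        R = R @ R_k
    angle_deg = np.degrees(np.arccos(np.clip((np.trace(R) - 1.0) / 2.0,
                                             -1.0, 1.0)))
    return angle_deg, np.linalg.norm(t)
```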


Figure 8.10: Snapshots of the sequence Office.

The camera configuration has a baseline of 12 cm and a focal length of 0.4 cm, and the images have standard VGA resolution. A maximal integration depth of 80 frames was used for this sequence; such large integration levels are reached only when a feature can be tracked over that many frames.

Figure 8.10 shows some snapshots of the sequence Office. A person holds the camera and moves it to different positions and orientations throughout the sequence.

The sequence shows a person who stands still, partially occluded, at the beginning of the sequence (top three images of Figure 8.10). The person then moves to the left, then to the right, once again to the left, and finally towards the camera. At this point (last image of Figure 8.10) the sequence is repeated in backward order.

In the snapshots, some velocity vectors can be seen on the points corresponding to the moving person. The color encoding is the same as in Figure 8.3, but here red corresponds to 1.75 m/s. The different colors of the vectors are mainly due to the different velocities of the body parts. Since the scene contains many repetitive structures (e.g., the folders in the filing cabinet in the background and the blinds on the left), some velocity vectors of static objects are incorrectly estimated, but without major consequences.

Observe that as a consequence of the special construction of the sequence, the results for the total camera motion over time must be symmetric. Figure 8.11 shows the estimation of the total rotation and translation as a function of image number.

Observe the high symmetry in the motion estimation for all motion parameters. A consequence of this symmetry is that the motion estimation at the end of the sequence should be the same as at the beginning of the sequence, i.e. zero. Any deviation from zero reflects the error accumulated over the sequence.


(a) Total estimated translation [m] vs. frame number, for X, Y, and Z.

(b) Total estimated rotation [deg] vs. frame number, for pitch, yaw, and roll.

Figure 8.11: Results for the sequence Office.