
2.2 Implementation of the Coverage of Multiple Cameras

2.2.1 Basic Implementation

The k-reliable coverage of several cameras is an intersection or union of all camera coverages. Thus, the visibility analysis requires fusing several polyhedral regions. Set operations on polyhedra are known to be a non-robust computation: when two polygons touch tangentially or intersect in only a single edge, numerical errors can lead to topological inconsistencies, see [157].

In order to cope with set operations, the most commonly used data structure for the coverage is the following: The surveillance area is discretized into an orthogonal grid composed of small cubes of the room, called voxels. The coverage of one camera is then the collection of the voxels that are covered. The data structure is also called a binary occupancy grid, referring to occupied regions in robotic applications [152]. In order to derive the fused coverage of the whole camera network, a set operation such as the union or intersection is applied over all camera-specific occupancy grids. After the set operation, the quantized volume of the coverage is obtained by adding up the volumes of the covered voxels.
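
To make the data structure concrete, the following is a minimal sketch of a binary occupancy grid per camera and of the fused coverage; the grid resolution, the voxel edge length and the covered_by_camera predicate are hypothetical placeholders, not the implementation discussed here.

```python
import numpy as np

# Assumed resolution and voxel size; both are illustrative, not taken from the text.
NX, NY, NZ = 64, 64, 32
VOXEL_EDGE = 0.1                      # edge length of one voxel cube
VOXEL_VOLUME = VOXEL_EDGE ** 3

def camera_grid(covered_by_camera):
    """Binary occupancy grid of one camera.

    covered_by_camera(center) is a placeholder predicate that applies the
    voxel check to the voxel cube's center point.
    """
    grid = np.zeros((NX, NY, NZ), dtype=bool)
    for i in range(NX):
        for j in range(NY):
            for k in range(NZ):
                center = ((i + 0.5) * VOXEL_EDGE,
                          (j + 0.5) * VOXEL_EDGE,
                          (k + 0.5) * VOXEL_EDGE)
                grid[i, j, k] = covered_by_camera(center)
    return grid

def fused_coverage(grids, mode="union"):
    """Fuse the camera-specific occupancy grids by a set operation."""
    op = np.logical_or if mode == "union" else np.logical_and
    fused = grids[0].copy()
    for g in grids[1:]:
        fused = op(fused, g)
    return fused

def quantized_volume(grid):
    """Quantized volume: number of covered voxels times the volume of one voxel."""
    return np.count_nonzero(grid) * VOXEL_VOLUME
```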

Therefore, we assume that the environment of the camera network is a collection of solid, static and dynamic objects. These solid objects are given by a boundary representation, i.e. their boundary is made up of polygons called faces. Let the environment include fs, fd ∈ N static and dynamic faces, respectively, with a constant number of faces f = fs + fd in each frame during the surveillance time. Let the dynamic objects move on a trajectory which is discretized into t − 1 time intervals, or rather t ∈ N time steps. Let the occupancy grid of each camera have v ∈ N voxels. Let each voxel be treated as a point in space; in particular, a voxel check is defined as an operation applied to the voxel cube's center in the surveillance area. One could as well use the voxel's vertices or edges, but to present the following procedure the center suffices. Let there be N ∈ N cameras in the network. Also, let O denote an upper bound on the number of calculation steps within the simulation.
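
To keep these quantities in one place, the following is a small, purely illustrative data model; the restriction to triangular faces and the per-time-step lists of dynamic faces are assumptions made only for this sketch.

```python
from __future__ import annotations
from dataclasses import dataclass
import numpy as np

@dataclass
class Face:
    """One triangular boundary face of a solid object."""
    vertices: np.ndarray   # shape (3, 3): three 3D vertices
    is_static: bool        # True for one of the fs static faces, False for a dynamic one

@dataclass
class Scene:
    """Boundary representation of the environment over t discrete time steps."""
    static_faces: list[Face]                  # checked once for all time steps
    dynamic_faces_per_step: list[list[Face]]  # one list of fd faces per time step

    @property
    def time_steps(self) -> int:
        return len(self.dynamic_faces_per_step)
```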

Consider the occupancy grid of a single camera's coverage. The question is how to mark each voxel with the right sensor label in S in an efficient manner. The voxel check for out of range is merely a localization of the voxel according to the boundary planes of the camera frustum. Additionally, one variant of marking the voxels of the detectable coverage, i.e. the voxels which are not out of range, is an inverse ray tracing: The half line starting at the focal point of a camera and passing through a point in the image plane is called a ray. Usually, the value of the image at such a point is determined by intersecting the ray with the faces of the environment. Conversely, here the sensor label of the voxel and not the value of the pixel is determined. This is accomplished by intersecting the segment between the camera position and the voxel with all the faces of the environment. In case of a non-empty intersection, the voxel is occluded.
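
As a minimal sketch of this inverse ray tracing for a single voxel, the following uses the standard Möller-Trumbore test restricted to the segment between camera position and voxel center; the inside_frustum predicate is a hypothetical placeholder for the localization against the frustum's boundary planes.

```python
import numpy as np

EPS = 1e-9

def segment_hits_triangle(p0, p1, tri):
    """Möller-Trumbore test restricted to the segment from p0 to p1.

    Returns True if the segment (camera position to voxel center) pierces
    the triangular face tri, given as a 3x3 array of vertices.
    """
    p0, p1, tri = np.asarray(p0, float), np.asarray(p1, float), np.asarray(tri, float)
    d = p1 - p0
    e1, e2 = tri[1] - tri[0], tri[2] - tri[0]
    h = np.cross(d, e2)
    a = np.dot(e1, h)
    if abs(a) < EPS:                      # segment parallel to the face
        return False
    s = p0 - tri[0]
    u = np.dot(s, h) / a
    if u < 0.0 or u > 1.0:
        return False
    q = np.cross(s, e1)
    v = np.dot(d, q) / a
    if v < 0.0 or u + v > 1.0:
        return False
    t = np.dot(e2, q) / a
    return EPS < t < 1.0 - EPS            # hit strictly between camera and voxel

def voxel_label(camera_pos, voxel_center, faces, inside_frustum):
    """Label one voxel as 'out of range', 'occluded' or 'detectable'."""
    if not inside_frustum(voxel_center):
        return "out of range"
    for tri in faces:                     # all static and dynamic faces of the time step
        if segment_hits_triangle(camera_pos, voxel_center, tri):
            return "occluded"
    return "detectable"
```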

This test needs to be done for each camera, each voxel and each face in each time step, so the complexity of the simulation has an asymptotic behavior of O(N · v · t · f). If dynamic and static faces are stored in separate data structures, this can be reduced to the following upper bound, since static faces only have to be checked once over all time steps:

O(N · v · (fs + t · fd))
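
For illustration only, with hypothetical values N = 4 cameras, v = 10^6 voxels, t = 100 time steps and fs = 900 static plus fd = 100 dynamic faces (f = 1000), the naive bound amounts to 4 · 10^6 · 100 · 1000 = 4 · 10^11 intersection tests, whereas the separated bound gives 4 · 10^6 · (900 + 100 · 100) ≈ 4.4 · 10^10, i.e. roughly an order of magnitude fewer tests.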

We will now give a visibility analysis that synthesizes camera images, which is cheaper and can be used to derive the identical coverage as well as the detectable coverage. It is illustrated in Figure 2.8.

Figure 2.8: Illustration of a single function call λ(Cx(k,S)) with the method of image synthesis.

Each camera synthesizes one image (in the case of the detectable coverage) or several images (in the case of the identical coverage). These images are converted to silhouette images in 2D (in the case of the identical coverage, Section 2.1.2) and projected to each camera's coverage in 3D. The fused coverage Cx(k,S) is measured by its volume. The calculations corresponding to each sensor (shown in gray in Figure 2.8) are independent of the other sensors' parameters.

Instead of tracing the ray between the camera position and each voxel, the idea is to produce a depth image of the faces: The depth image is discretized into a number of p ∈ N little squares called pixels. The only difference to color images is that these images do not store a color value in each pixel; instead, each pixel contains the distance between the camera position and the nearest face. The following procedure distinguishes whether or not a voxel is detectable: Project the voxel into the image plane and thereby determine the pixel it should be seen in, as by ρ(π(·)) in Definition 2.1.7. If there is no such pixel, the voxel is outside the viewing frustum and thus out of range. Otherwise, compare the distance that is stored in this pixel to the distance between the voxel and the camera. If the latter is larger, then the voxel is occluded.
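
A minimal sketch of this pixel-based check; project_to_pixel stands in for ρ(π(·)) of Definition 2.1.7 and is, like the given depth image, a hypothetical input here.

```python
import numpy as np

def depth_image_voxel_check(camera_pos, voxel_center, depth_image, project_to_pixel):
    """Classify a voxel from a synthesized depth image instead of tracing a ray.

    depth_image[row, col] stores the distance from the camera to the nearest
    face seen in that pixel; project_to_pixel maps a 3D point to integer pixel
    coordinates, or None if the point falls outside the image plane.
    """
    pixel = project_to_pixel(voxel_center)
    if pixel is None:
        return "out of range"             # the voxel lies outside the viewing frustum
    row, col = pixel
    voxel_distance = np.linalg.norm(
        np.asarray(voxel_center, float) - np.asarray(camera_pos, float))
    if voxel_distance > depth_image[row, col]:
        return "occluded"                 # some face lies between camera and voxel
    return "detectable"
```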

To derive the identical coverage, we produce more than one depth image. This is how the labels identical and changed are obtained: The first image is a background depth image showing only the static faces. An additional depth image per time step, including all the faces (static and dynamic), is produced.
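
As a sketch of how such depth images could be produced, the following naive per-pixel ray casting matches the O(p · (fs + t · f)) bound of Table 2.2; a real implementation would more likely rasterize the faces with a z-buffer. The helpers ray_through_pixel and ray_face_distance are hypothetical.

```python
import numpy as np

def synthesize_depth_image(camera_pos, faces, rows, cols, ray_through_pixel, ray_face_distance):
    """Naive depth image synthesis: one ray per pixel, tested against every face.

    ray_through_pixel(row, col) returns the direction of the ray through a pixel;
    ray_face_distance(origin, direction, face) returns the hit distance of that
    ray with one face, or numpy.inf on a miss. Both helpers are placeholders.
    """
    depth = np.full((rows, cols), np.inf)
    for r in range(rows):
        for c in range(cols):
            direction = ray_through_pixel(r, c)
            for face in faces:
                depth[r, c] = min(depth[r, c], ray_face_distance(camera_pos, direction, face))
    return depth

# Use for the identical coverage (sketch): one background image from the static faces,
# then one image per time step from all static and dynamic faces.
# background = synthesize_depth_image(cam, static_faces, rows, cols, ray_px, ray_face)
# frame_t = synthesize_depth_image(cam, static_faces + dynamic_faces[t], rows, cols, ray_px, ray_face)
```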

A background subtraction on the depth images yields a silhouette image per time step, as described in Section 2.1.2. If a voxel is detectable and is projected onto a background pixel, then the voxel is labeled identical, otherwise changed.
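
A brief sketch of this subtraction and labeling step; the subtraction threshold is an assumed value.

```python
import numpy as np

def silhouette_image(background_depth, frame_depth, threshold=1e-3):
    """Background subtraction on depth images: nonzero pixels mark dynamic faces."""
    return np.abs(frame_depth - background_depth) > threshold

def label_detectable_voxel(pixel, silhouette):
    """Refine a detectable voxel: background pixel -> 'identical', otherwise 'changed'."""
    row, col = pixel
    return "changed" if silhouette[row, col] else "identical"
```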

An upper bound for the complexity of this approach is

O(N · (p · (fs + t · f) + t · (p + v))),

as can be seen in detail in Table 2.2. The latter method is cheaper when a large number of voxels, i.e. a denser occupancy grid, is used: for a sufficiently large number of voxels, the inverse ray tracing is more expensive than the pixel-based visibility analysis,

N · (p · (fs + t · f) + t · (p + v) − v · (fs + t · fd)) ≤ 0  ⇔  p · (fs + t · f + t) / (fs + t · fd − t) ≤ v.
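
With the same hypothetical values as above and, say, p = 10^5 pixels per image, the right-hand side evaluates to 10^5 · (900 + 100 · 1000 + 100) / (900 + 100 · 100 − 100) ≈ 9.4 · 10^5, so image synthesis would already pay off for occupancy grids of roughly a million voxels or more.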

Thus, the coverage of a sensor can be constructed more cheaply by synthesizing images than by inverse ray tracing. The fused coverage can now be derived by intersecting and uniting these camera coverages.

Description | Complexity
Faces to depth image: straightforward approach to produce, for each camera, one reference depth image with p pixels including the fs static faces, and one depth image including all f faces per time step. | O(N · p · (fs + t · f))
BG subtraction: a silhouette image is produced by subtracting each depth image from the reference depth image. | O(t · N · p)
Deriving the coverage: the following is done for each voxel in each time step: | O(t · v) · ...
  Project the voxel into the image plane of each camera; the corresponding pixel can be derived by scaling and rounding. | · O(N)
  If no pixel containing the projected voxel exists in the image plane, the voxel is out of range; otherwise the distance between the camera and the voxel is evaluated. | · O(N)
  If this distance is larger than the depth value of the reference image, the voxel is occluded. | · O(N)
  If the voxel is neither out of range nor occluded, it is labeled as follows: if the pixel of the silhouette image is zero, the voxel is identical, otherwise changed. | · O(N)

Table 2.2: Upper bounds of the calculation steps when deriving the coverage of a single camera. An occupancy grid is used as the data structure for the coverage; it is filled by synthesizing images of the environment.

The construction of the k-reliable coverage has been discussed in this section. Instead of an inverse ray tracing method, we will synthesize depth images and use an occupancy grid as the data structure. This method is improved in the upcoming sections.