Evaluation on the changedetection.net benchmark

5. Robust Subspace Tracking on the Grassmannian

5.3. The pROST algorithm

5.3.3. Evaluation on the changedetection.net benchmark

One of the main difficulties in comparing different background subtraction methods has been the lack of an accepted benchmark. Various data sets exist (e.g. [58] and [78]) that provide video sequences and a few manually segmented test images. However, the lack of pixel-level ground truth for whole video sequences has led to rather subjective evaluation, as criticized in [13]. The authors of [13] avoid the cumbersome task of hand-segmenting video sequences by providing an artificially rendered scene, which allows a very detailed and precise segmentation. But although the animation is claimed to be close to photo-realistic, the overall visual impression and statistics are fundamentally different from real video.

In order to establish a benchmark on real-world video sequences, the changedetection.net data set [37] has been introduced. The data set consists of six categories of videos and provides ground truth for each frame. Categories vary from strictly static (baseline) over dynamic backgrounds to shaking cameras (jitter), scenes with particular objects changing positions (intermittent object motion) and sequences of thermal images. The ground truth contains information about background and foreground objects as well as their boundaries and shadows (specifically evaluated in the shadows category).

For the evaluation of pROST on the changedetection.net benchmark, one overall set of parameters needs to be chosen. This compromise inevitably leads to suboptimal results, as some scenarios require different parameter settings than others. What follows is a brief discussion of the parameter settings and their influence, while a much more detailed evaluation can be found in [72].

Subspace dimension

The admissible dimension k of the subspace defines the inner dimensions of the optimization variables and thereby the size of the search space for the optimization problem. As this defines the computational complexity, k should be chosen as small as possible, while still offering sufficient degrees of freedom for modeling complex backgrounds. Empirical results show that k can be chosen very small if the background is static, while a value of about 10 to 15 is required for complex dynamic backgrounds.
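To make the link between k and the size of the search space concrete, recall that a k-dimensional subspace of R^m has k(m − k) degrees of freedom, which is the dimension of the Grassmannian Gr(k, m). The following is a minimal sketch; the frame size of 320 × 240 pixels and the function name are assumptions chosen for illustration and are not taken from the evaluation.

```python
def grassmannian_dimension(m: int, k: int) -> int:
    """Degrees of freedom of a k-dimensional subspace of R^m: dim Gr(k, m) = k * (m - k)."""
    if not 0 <= k <= m:
        raise ValueError("k must satisfy 0 <= k <= m")
    return k * (m - k)


# Hypothetical frame size (assumption): 320 x 240 grayscale pixels,
# vectorized into a single column of length m.
m = 320 * 240

for k in (1, 5, 10, 15):
    print(f"k = {k:2d}: {grassmannian_dimension(m, k)} degrees of freedom")
```

Since k is much smaller than m here, the search space grows almost linearly in k, which is why a small k is preferable whenever the background permits it.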

Initial and online step sizes

The choice of the step size defines how fast the subspace tracking algorithm adjusts to changes in the background model. Large step sizes come with the advantage of fast adaptation to changes in the background and allow learning high-dimensional backgrounds within a few frames. But fast adaptation also increases the risk of foreground objects leaking into the background, as foreground objects are never ideally sparse in space and time. They often appear in the same position over the course of several frames, which in combination with large step sizes leads to the aforementioned issues like foreground leaking and ghost images. The temporal threshold for the transition from foreground to background (i.e. how long an element has to be present in the scene before it blends into the background) highly depends on human perception, which is why the step size needs to be hand-tweaked. But even manual selection is difficult if a single setting must fit static backgrounds that are constant over time as well as scenes with dynamic backgrounds or foreground-background transitions. For the initialization phase, t_init = 5×10⁻³ is selected and the online step size is chosen to be t_online = 10⁻⁴. This allows the algorithm to learn a background rather quickly in the beginning and leads to a reasonable trade-off between background adaptation speed and leakage of slowly moving foreground objects.
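The two-phase choice of step sizes can be sketched as a simple schedule. The values of t_init and t_online are those stated above; the length of the initialization phase is not given in this section, so `n_init_frames` is a hypothetical parameter, as is the function name.

```python
T_INIT = 5e-3    # step size during the initialization phase (value from the text)
T_ONLINE = 1e-4  # step size during online operation (value from the text)


def step_size(frame_index: int, n_init_frames: int) -> float:
    """Return the step size to use for the frame with the given zero-based index.

    n_init_frames is an assumption: the number of frames treated as the
    initialization phase is not specified in this section.
    """
    if frame_index < 0 or n_init_frames < 0:
        raise ValueError("frame_index and n_init_frames must be non-negative")
    return T_INIT if frame_index < n_init_frames else T_ONLINE


# Example: with a hypothetical initialization phase of 100 frames,
# frames 0..99 use the large step size, all later frames the small one.
for i in (0, 99, 100, 5000):
    print(i, step_size(i, n_init_frames=100))
```

With these values the initialization step is 50 times larger than the online step, which reflects the intent of learning the background quickly at first and adapting conservatively afterwards.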

Foreground weighting parameter

The pixel weighting in Equation (5.6) adds a second time scale to the subspace tracking algorithm. While the overall progress in learning and adjusting a background model is controlled by the step size t, the foreground weighting parameter ω offers additional control on how fast foreground objects are incorporated into an existing background model. The pixel weighting has a large effect on the algorithm’s capability of dealing with highly dynamic complex backgrounds. An empirical value of ω = 5×10⁻⁵ allows learning such backgrounds from input sequences that are heavily corrupted with foreground objects, while still being able to incorporate such foreground objects into a background model if they are persistent over an extended period of time.

Detection threshold

As the classification between foreground and background is a binary decision, the optimum value for the detection threshold of foreground objects can be determined so as to maximize the overall F-score across the categories. For pROST the value τ = 0.15 has been selected, which corresponds to about 40 intensity levels for 8-bit unsigned integer input. Again, the optimum threshold depends on the statistics of the video sequence, especially on the intensity difference between foreground and background objects.
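The threshold selection can be illustrated with a small sketch that sweeps candidate thresholds and keeps the one with the highest F-score. It assumes that intensities are scaled to [0, 1] and that a pixel is labeled foreground when the absolute residual exceeds τ; the toy residuals, the candidate list and all function names are hypothetical, and the sketch scores a single toy sequence rather than all benchmark categories.

```python
def f_score(predicted, truth):
    """F-score (harmonic mean of precision and recall) for binary labels."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def detect(residuals, tau):
    """Label a pixel as foreground if its absolute residual exceeds tau (assumed rule)."""
    return [abs(r) > tau for r in residuals]


def best_threshold(residuals, truth, candidates):
    """Return the candidate threshold that maximizes the F-score."""
    return max(candidates, key=lambda tau: f_score(detect(residuals, tau), truth))


# Toy data (hypothetical): residuals on a [0, 1] intensity scale and their labels.
residuals = [0.02, 0.05, 0.12, 0.20, 0.40, 0.60]
truth = [False, False, False, True, True, True]
tau = best_threshold(residuals, truth, candidates=[0.05, 0.10, 0.15, 0.30])

# Convert the threshold to 8-bit intensity levels.
print(tau, tau * 255)
```

For τ = 0.15 the conversion gives 0.15 × 255 = 38.25, which is the "about 40 intensity levels" quoted above.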

Cost function parameters

In contrast to the ℓ₁ norm, the freedom of choice for the parameters p and µ in the smoothed ℓp-norm cost function offers additional control over the required robustness against outliers in the data. As shown in the experimental evaluation of Chapter 4 and Section 5.2, a lower value for p leads to increased robustness against outliers, but it also slows down the convergence of the subspace tracking algorithm. This is why a moderately low value of p = 0.25 has been chosen. The value for the smoothing parameter µ is set to µ = 0.01 following the heuristic (4.7) and according to the choice for p, the threshold τ and the intensity range after scaling.
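The effect of p on outlier robustness can be illustrated numerically. The cost function itself is not restated in this section, so the sketch below assumes a commonly used smoothed ℓp form, the sum over all entries of (x² + µ)^(p/2); this form, the function names and the sample residual values are assumptions made for illustration only.

```python
def smoothed_lp(residuals, p=0.25, mu=0.01):
    """Assumed smoothed lp cost: sum_i (x_i**2 + mu) ** (p / 2), with 0 < p <= 1 and mu > 0."""
    if not 0 < p <= 1 or mu <= 0:
        raise ValueError("expected 0 < p <= 1 and mu > 0")
    return sum((x * x + mu) ** (p / 2) for x in residuals)


def l1(residuals):
    """Plain l1 cost for comparison."""
    return sum(abs(x) for x in residuals)


def outlier_ratio(cost, small=0.1, large=1.0):
    """How much more a large residual costs than a small one under the given cost."""
    return cost([large]) / cost([small])


# Compare the penalty growth from a small residual (0.1) to a large one (1.0).
print("l1:              ", outlier_ratio(l1))
print("smoothed, p=1.00:", outlier_ratio(lambda r: smoothed_lp(r, p=1.0)))
print("smoothed, p=0.25:", outlier_ratio(lambda r: smoothed_lp(r, p=0.25)))
```

Under this assumed form, a smaller p makes the cost grow more slowly for large residuals, so outliers contribute relatively less to the total cost. This is consistent with the increased robustness described above.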

In document: Robust Structured and Unstructured Low-Rank Approximation on the Grassmannian (pages 108-111)