Discussion of Related and Arising Problems

4. Reconstruction of Destructed Documents

4.2. Strip Shredded Text Documents

4.2.9. Discussion of Related and Arising Problems

Based on the results presented in the preceding sections, reconstructing strip shredded text documents consisting of only a few pages can be done quite effectively, at least from the algorithmic point of view. However, some issues deserve a more detailed discussion.

Issues Related to the Error Estimation Function

The most crucial part of the methods presented above is the error estimation function.

Although many (preliminary) tests with different definitions of error estimation functions revealed that the two used within this thesis are the most promising ones, it is very easy to find examples of documents for which the error estimation fails completely.

Nevertheless, as discussed previously, no generally “perfect” error estimation function exists, since in some cases even humans are not able to decide which of two possible strip alignments is the correct one. In many situations, however, human intuition is reliable, and users can provide valuable information so that the reconstruction process can be performed optimally. Therefore, any automatic document recovery system must ultimately rely on the input of users indicating whether or not the reconstructed pages are sound.

Another point of criticism that might be raised regarding the methods presented in this chapter is that all test instances were generated automatically, i.e., the cutting process in particular was not performed with real shredders but virtually. As shown by Ukovich et al. [132], the extraction of features from “real” shredded documents performs equally well as from virtually shredded documents. Based on this observation, we performed some experiments using a standard shredder as found in our office as well as a flatbed scanner. For an exemplary scan of two matching shreds see Fig. 4.14. In addition, the reconstruction of virtually shredded documents was performed to test the applicability of the proposed approaches and methods.

Figure 4.14.: A scan of shreds, panels (a) to (d), indicating that the amount of information lost along the edges is minimal.

Clearly, any real-world reconstruction system requires an automatic device for scanning, extracting and rotating strips, which can subsequently be evaluated using an error estimation function that accounts for the noise induced during the cutting and scanning process. For example, the error estimation function could be extended such that edge information is extracted not from the pixels located directly at the strip’s edges but from pixels located two or three layers away from the edge, cf. [10].
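The idea of reading edge information a few pixel layers inside the strip can be sketched as follows. This is only a minimal illustration under stated assumptions: strips are binary pixel grids (1 for black, 0 for white), the error is a plain count of differing pixels rather than the function of Eq. (4.12), and the names `edge_column`, `edge_mismatch` and `inset` are hypothetical.

```python
from typing import List

Strip = List[List[int]]  # rows of pixels, 1 = black, 0 = white (assumed encoding)


def edge_column(strip: Strip, side: str, inset: int = 0) -> List[int]:
    """Return the pixel column lying `inset` layers inside the given edge."""
    width = len(strip[0])
    if not 0 <= inset < width:
        raise ValueError("inset must lie within the strip width")
    col = inset if side == "left" else width - 1 - inset
    return [row[col] for row in strip]


def edge_mismatch(left: Strip, right: Strip, inset: int = 0) -> int:
    """Count rows in which the right edge of `left` and the left edge
    of `right` differ, both read `inset` layers away from the cut."""
    a = edge_column(left, "right", inset)
    b = edge_column(right, "left", inset)
    return sum(1 for x, y in zip(a, b) if x != y)
```

With `inset=0` the outermost, possibly frayed pixel columns are compared; with `inset=2` or `inset=3` the comparison moves to columns that are less likely to be damaged by the cutting process, at the price of comparing pixels that are not directly adjacent in the original page.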

However, it turned out that for the reliability of the error estimation function the resolution used for scanning the images has much more impact than the “perfectness” of the cuts. At higher resolutions, the number of pixels, and therefore the amount of information along the edge, is obviously higher. Let us remark that, on the one hand, the appropriate resolution can be chosen by the user of the automatic reconstruction system, since in most cases the data acquisition process will be part of such a system. On the other hand, tests showed that a “standard” resolution of 150 dpi is adequate in most cases.

To further enhance the results obtained by our methods, we additionally tried a set of more “advanced” error estimation functions that try to incorporate the complexity of the pattern shown along the edges. For example, we defined a relative error estimation, which is simply the normalization of the absolute value obtained by Eq. (4.12) to values in the interval [0, 1]. Additionally, we tried methods for counting larger blocks of black pixels and matching them to blocks of black pixels on the corresponding edge of the second shred. A third approach tried to compute an indicator of how good a matching between two shreds is, based on the number of (other) shreds having a similar absolute error estimation value with respect to Eq. (4.12). However, it turned out that all of these approaches improved the results for some instances but simultaneously worsened them for others. On average, error estimation function c2, see Eq. (4.12), yielded the best results.
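Two of these variants can be illustrated with a short sketch. It rests on assumptions not stated in the text: the normalization divides by an assumed worst-case error `max_error`, and “similar” is interpreted as lying within a fixed `tolerance` of the chosen candidate's absolute error. Both parameters and both function names are hypothetical.

```python
from typing import Sequence


def relative_error(abs_error: float, max_error: float) -> float:
    """Scale an absolute error into [0, 1] using an assumed worst-case value."""
    if max_error <= 0:
        return 0.0
    return min(abs_error / max_error, 1.0)


def ambiguity(errors: Sequence[float], chosen: int, tolerance: float) -> int:
    """Count the other candidate shreds whose absolute error lies within
    `tolerance` of the chosen candidate; a high count marks an unreliable match."""
    ref = errors[chosen]
    return sum(
        1
        for i, e in enumerate(errors)
        if i != chosen and abs(e - ref) <= tolerance
    )
```

A match with a low absolute error but a high ambiguity count is one for which several other shreds would fit almost equally well, which is the kind of situation in which user input is most valuable.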

Obviously, any recovery system to be used for real-world documents during forensic investigations requires an error estimation function that respects information gathered from the strips using pattern recognition and/or image processing methods, such as line spacings, top and bottom margins, text color, background color and many others. The time needed for computing any such error estimation function will rise, but at the same time the robustness of the method should increase too.

Finally, it was mentioned at the beginning of this section that we assume the back face of the strips to be blank. It is, however, relatively easy to extend the presented error estimation functions such that the information on the back face is taken into account too. Nevertheless, one should keep in mind that, besides the decision on the orientation of the strips, it would also be necessary to decide which of the two faces is the front of the strip. Therefore, the search space is enlarged and one expects the running times to increase too. At the same time, this extension should positively affect the correctness of the error estimation function since the information included is doubled.
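To give a rough feeling for this growth, the following sketch counts raw configurations. It assumes that each strip can independently be upright or upside down (two states) and, once the back face carries information, can additionally show either face (four states). Counting all orderings times all per-strip states is an illustrative simplification that ignores symmetries; the function names are hypothetical.

```python
from itertools import permutations, product
from math import factorial


def count_configurations(n_strips: int, states_per_strip: int) -> int:
    """Raw number of strip orderings times per-strip states (no symmetry reduction)."""
    return factorial(n_strips) * states_per_strip ** n_strips


def enumerate_configurations(n_strips: int, states):
    """Yield every (order, state assignment) pair; feasible only for tiny instances."""
    for order in permutations(range(n_strips)):
        for assignment in product(states, repeat=n_strips):
            yield order, assignment


ORIENTATIONS = ("upright", "upside_down")
# Orientation combined with the choice of the visible face (assumed model).
FACES_AND_ORIENTATIONS = tuple(product(("front", "back"), ORIENTATIONS))
```

Under these assumptions, three strips give `count_configurations(3, 2)` = 48 configurations with a blank back face and `count_configurations(3, 4)` = 384 with a printed one, i.e., a factor of 2 per strip.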

Multilevel Refinement Strategy

While the methods proposed so far mainly focus on the relative alignment of strips to each other, i.e., the neighborhood relations, it is also conceivable to compute absolute position information for individual strips. This would be of high interest especially for shredders with varying strip widths, or when some of the knives are blunt or even missing, so that the properties of certain strips differ significantly from those of other shreds, e.g., sharp versus frayed cuts or strips that are twice as broad.

Another extension that is interesting from the algorithmic point of view would be the application of so-called multilevel refinement strategies [136, 137]. The basic idea of such a heuristic is to iteratively solve a given problem instance on different levels of abstraction, where representations on higher levels normally correspond to instances that are easier to solve, e.g., due to smaller instance size. Using the concepts of coarsening and refinement, the different entities of the instance can be transformed into each other such that solutions on a higher level can be “extended” to solutions on a lower level and vice versa. This process can be iterated in both directions until no further improvements can be achieved. For a survey on multilevel refinement strategies, including successful applications to several combinatorial optimization problems such as graph partitioning and the traveling salesman problem, we refer to [136, 137].

In our case, the coarsening step would consist of building blocks or meta-strips, each of which consists of two or more matched strips. It is therefore easy to generate instances of smaller size, i.e., with fewer strips, which can then be solved using the methods presented above.

During refinement, the meta-strips are then loosened such that previously fixed matches between strips can be separated and the strips moved independently of each other. While the building of meta-strips can be done based on heuristics using the error estimation function, this step can also be performed based on human input using a user-guided search as proposed in Sec. 4.2.7.
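A coarsening and refinement step of this kind might be sketched as follows. The sketch is one possible heuristic and not the procedure of the thesis: it assumes that pairwise errors are given as a dictionary `cost[(i, j)]` for placing strip `j` directly to the right of strip `i`, and it greedily fixes the cheapest pairs below an assumed `threshold`. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Cost = Dict[Tuple[int, int], float]  # cost[(i, j)]: error of j placed right of i


def coarsen(n: int, cost: Cost, threshold: float) -> List[List[int]]:
    """Greedily chain strips with pairwise error <= threshold into meta-strips."""
    right_of: Dict[int, int] = {}  # strip -> its fixed right neighbor
    left_of: Dict[int, int] = {}   # strip -> its fixed left neighbor
    candidates = sorted(
        (c, i, j) for (i, j), c in cost.items() if i != j and c <= threshold
    )
    for _, i, j in candidates:
        if i in right_of or j in left_of:
            continue  # one of the two edges is already fixed
        end = j
        while end in right_of:  # walk to the right end of j's chain
            end = right_of[end]
        if end == i:
            continue  # linking i -> j would close a cycle
        right_of[i] = j
        left_of[j] = i
    metas = []
    for s in range(n):
        if s in left_of:
            continue  # not the leftmost strip of its chain
        chain = [s]
        while chain[-1] in right_of:
            chain.append(right_of[chain[-1]])
        metas.append(chain)
    return metas


def refine(meta_order: List[List[int]]) -> List[int]:
    """Loosen an ordering of meta-strips into an ordering of single strips."""
    return [s for meta in meta_order for s in meta]
```

On the coarse level, each meta-strip would be treated as a single strip whose left and right edges are those of its outermost members. After `refine`, the individual strips are free again, so a subsequent improvement step can separate and move them independently.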

Figure 4.15.: Two different possible cutting patterns, panels (a) and (b).

From the document: Hybrid Optimization Methods for Warehouse Logistics and the Reconstruction of Destroyed Paper Documents (pages 119-122)