4. Reconstruction of Destructed Documents

4.2. Strip Shredded Text Documents

4.2.2. Formulation as Combinatorial Optimization Problem

For the reconstruction of strip shredded text documents (RSSTD) we are given a finite set of rectangularly shaped and (almost) equally sized paper snippets, so-called strips, which have been produced by shredding one or more sheets of paper. In this thesis the widths of the strips are not investigated further since they carry no information that our approaches could exploit. Furthermore, the heights of all strips are assumed to be the same. If this is not the case, a preprocessing step using clustering methods as proposed in [134] can be performed, generating (smaller) subproblems, each consisting of a subset of strips of equal height, which are then solved independently of each other.

Although many office printers are capable of duplex printing nowadays, most documents are still blank on the back face, especially in offices, one of the main application areas of shredders. Motivated by this observation and for simplicity, the presented model only regards the front face of the scanned strips. However, an extension to handle two-sided documents is possible in a straightforward way.

The methods presented in the following focus solely on the information held on the borders of the strips, cf. Sec. 4.2.4. Therefore we neglect all strips with completely blank faces as well as strips with blank borders but non-empty inner regions. Besides reducing the number of shreds to be regarded during the reconstruction process, this blank strip elimination removes the symmetries implied by arbitrarily swapping blank strips, i.e., the search space is significantly reduced. To model the fact that paper documents in general have white margins at their left and right boundaries, an additional virtual strip is added to the input of any instance. This finally results in a finite set $S = \{1, \dots, n\}$ of (almost) equally sized, rectangular shreds forming the output of a shredding process of one or more pages of paper documents. Whereas shreds $1$ to $n-1$ are non-empty, we require that the virtual shred $n$ is blank. The basic idea of this modeling is that the first, i.e., the leftmost, (non-blank) shred is placed right next to $n$ while the last, i.e., the rightmost, shred is placed left of $n$, yielding a cycle.

The virtual shred therefore acts as a connector between the first and the last shred, marking the start and the end, i.e., the left and right edge, of a page.
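The blank strip elimination and the added virtual strip can be sketched as follows. This is a minimal illustration and not the thesis implementation: the pixel encoding (rows of 0/1 values, 1 = ink), the one-pixel border width, and the names `has_blank_borders` and `prepare_instance` are assumptions made for the example.

```python
from typing import List

# Assumed encoding: a strip is a list of pixel rows, 1 = ink, 0 = blank.
Strip = List[List[int]]


def has_blank_borders(strip: Strip) -> bool:
    """True if neither the leftmost nor the rightmost pixel column holds ink."""
    return all(row[0] == 0 and row[-1] == 0 for row in strip)


def prepare_instance(strips: List[Strip], height: int) -> List[Strip]:
    """Drop strips whose borders are both blank, then append the virtual strip.

    Completely blank strips are a special case of blank borders, so a single
    test covers both kinds of strips neglected by the model.
    """
    kept = [s for s in strips if not has_blank_borders(s)]
    virtual = [[0] for _ in range(height)]  # blank strip, one pixel wide
    return kept + [virtual]                 # the virtual strip becomes strip n
```

A strip with ink only in its interior is dropped just like a completely blank one, because neither contributes border information.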

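A solution $x = \langle \pi, o \rangle$ to RSSTD consists of a permutation $\pi : S \setminus \{n\} \to \{1, \dots, n-1\}$ of the elements in set $S$ as well as a vector $o = \langle o_1, \dots, o_n \rangle \in \{0,1\}^n = O^n$ which assigns an orientation to each strip $i \in S$:

$$o_i = \begin{cases} 0 & \text{if strip } i \text{ is to be placed in its original orientation,} \\ 1 & \text{if strip } i \text{ is rotated by } 180^\circ. \end{cases} \tag{4.6}$$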

In the following, $\pi_i$, with $i \in S \setminus \{n\}$, denotes the position of strip $i$ according to $\pi$, and $\pi_n = n$. Additionally, by $s_k$ we denote the strip placed at position $k$, with $1 \le k \le n$, i.e., $\pi_i = k \Leftrightarrow s_k = i$, with $i \in S$ and $1 \le k \le n$. Possibly empty (sub-)sequences of strips will be denoted by $\sigma = \langle s_k, \dots, s_{k'} \rangle$, with $1 \le k < k' \le n$. Please note that the orientation of strip $n$ has no effect since $n$ is blank.
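The position notation and the strip-at-position notation are inverse views of the same permutation. The following sketch converts one into the other; the helper name `sequence_from_positions` and the dictionary representation are assumptions chosen for illustration, and strips keep the 1-based numbering of the text.

```python
from typing import Dict, List


def sequence_from_positions(pi: Dict[int, int], n: int) -> List[int]:
    """Return [s_1, ..., s_n] for positions pi[i] of the strips i = 1..n-1.

    The virtual strip n is fixed at position n, as in the model above.
    """
    expected = list(range(1, n))
    if sorted(pi) != expected or sorted(pi.values()) != expected:
        raise ValueError("pi must be a permutation of 1..n-1")
    full = dict(pi)
    full[n] = n                      # position of the virtual strip
    seq = [0] * n
    for strip, pos in full.items():
        seq[pos - 1] = strip         # s_pos = strip (list is 0-based)
    return seq
```

For example, with $n = 4$ and positions $\pi_1 = 2$, $\pi_2 = 3$, $\pi_3 = 1$, strip 3 is leftmost, followed by strips 1 and 2, and the virtual strip 4 closes the sequence.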

In the following we make use of a cost function $c(i, j, \omega) \ge 0$, to be defined later in detail. It provides an approximate measure for the error possibly made when two strips $i$ and $j$ appear side by side and are oriented according to $\omega = (o_i, o_j) \in O^2$ in the reconstructed document. This implies that if two shreds fit perfectly, the corresponding value of $c(i, j, \omega)$ will be low, whereas matching two rather different borders yields a relatively high value of the error estimation function.

The overall objective of RSSTD is to find a solution, i.e., a permutation and a corresponding orientation vector, such that the following total costs, i.e., the estimated error made during reconstruction, are minimized:
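The displayed objective (4.7) is not reproduced in this copy. A form consistent with the cyclic model and the notation introduced above, stated here as an assumption rather than as the original formula, would be

$$\mathrm{obj}(x) = \sum_{k=1}^{n-1} c\bigl(s_k, s_{k+1}, (o_{s_k}, o_{s_{k+1}})\bigr) + c\bigl(s_n, s_1, (o_{s_n}, o_{s_1})\bigr),$$

where $s_n = n$ is the virtual strip, so the last term closes the cycle from the right page margin back to the leftmost strip. The name $\mathrm{obj}$ is introduced only for illustration.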

One crucial task in solving RSSTD as stated above is a proper definition of the cost function $c(i, j, \omega)$. A detailed discussion on this topic is given in Sec. 4.2.4. In any case, an error estimation function used for RSSTD must have the property that $c(i, j, \omega) = c(j, i, \bar{\omega})$, with $i, j \in S$, $\omega = (o_i, o_j) \in O^2$ and $\bar{\omega} = (o_j^C, o_i^C)$, where $o_i^C = 1 - o_i$. This means that rotating two strips and then swapping their positions must lead to the same error estimation.
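The required property can be checked mechanically for any candidate cost function. The sketch below uses a toy cost, a Hamming distance between facing border columns, purely as a stand-in: it is not the cost function of Sec. 4.2.4, and the names `edges`, `toy_cost` and `satisfies_rotation_symmetry` as well as the tuple encoding of borders are assumptions for illustration.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

Border = Tuple[int, ...]                 # one border column, top to bottom
Cost = Callable[[int, int, Tuple[int, int]], float]


def edges(strip: Tuple[Border, Border], orientation: int) -> Tuple[Border, Border]:
    """Return (left, right) borders; a 180 degree turn swaps and reverses them."""
    left, right = strip
    if orientation == 0:
        return left, right
    return right[::-1], left[::-1]


def toy_cost(strips: Dict[int, Tuple[Border, Border]],
             i: int, j: int, omega: Tuple[int, int]) -> int:
    """Hamming distance between the right border of i and the left border of j."""
    _, right_i = edges(strips[i], omega[0])
    left_j, _ = edges(strips[j], omega[1])
    return sum(a != b for a, b in zip(right_i, left_j))


def satisfies_rotation_symmetry(ids: Iterable[int], c: Cost) -> bool:
    """Check c(i, j, (o_i, o_j)) == c(j, i, (1 - o_j, 1 - o_i)) for all inputs."""
    ids = list(ids)
    for i, j in product(ids, repeat=2):
        for o_i, o_j in product((0, 1), repeat=2):
            if c(i, j, (o_i, o_j)) != c(j, i, (1 - o_j, 1 - o_i)):
                return False
    return True
```

Any cost that compares only the two facing borders pixel by pixel passes this check, since turning both strips and swapping them brings the same two borders together again, merely upside down.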

4.2.3. Complexity Results

For this section we will define the decision variant of RSSTD (DRSSTD) as follows:

Given an RSSTD instance and an integer $\beta$, the instance is answered with yes, i.e., is solved, if there is a permutation and an orientation vector such that the arising costs are less than or equal to $\beta$.

We will now show that DRSSTD is $\mathcal{NP}$-complete by reducing the decision variant of the (symmetric) traveling salesman problem (TSP) to DRSSTD. In addition, it is obvious that DRSSTD is in $\mathcal{NP}$ since an instance can be solved in non-deterministic polynomial time by guessing a permutation and orientation vector and then verifying them in polynomial time.
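The verification step of this membership argument can be sketched as a single pass over the guessed solution. The sketch assumes that the total cost is the sum of the costs of cyclically adjacent strips, with the virtual strip at position $n$; the function name `verify_certificate` and the data layout are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Cost = Callable[[int, int, Tuple[int, int]], float]


def verify_certificate(seq: Sequence[int], o: Dict[int, int],
                       c: Cost, beta: float) -> bool:
    """Check a guessed solution in O(n log n) time plus n cost evaluations.

    seq = [s_1, ..., s_n] lists the strips by position, o maps strip -> 0/1.
    Assumption: the total cost sums c over cyclically adjacent strips.
    """
    n = len(seq)
    if n == 0:
        return False
    # Every strip 1..n must appear exactly once, the virtual strip n last.
    if sorted(seq) != list(range(1, n + 1)) or seq[-1] != n:
        return False
    if any(o.get(i) not in (0, 1) for i in seq):
        return False
    total = 0
    for k in range(n):
        i, j = seq[k], seq[(k + 1) % n]   # wraps around from s_n to s_1
        total += c(i, j, (o[i], o[j]))
    return total <= beta
```

The sort and the loop are both polynomial in $n$, so the check is polynomial whenever a single cost evaluation is.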

We define an instance of the decision variant of TSP (also denoted by DTSP) as follows (see also [43]): Given is a set $C$ of $n$ cities and distances $d(c_i, c_j) \in \mathbb{Z}^+$ for each pair of cities $c_i, c_j \in C$. Furthermore, we are given a positive integer constant $B$. An instance of this decision problem is answered with yes, i.e., is solved, if there is a tour of all cities in $C$ having length $B$ or less, i.e., a permutation

$$c_{\pi(1)}, c_{\pi(2)}, \dots, c_{\pi(n)}$$

of $C$ is searched, with

$$\left( \sum_{i=1}^{n-1} d(c_{\pi(i)}, c_{\pi(i+1)}) \right) + d(c_{\pi(n)}, c_{\pi(1)}) \le B \tag{4.8}$$

and $c_{\pi(i)}$ denoting the $i$-th city along the computed tour.

Such an instance of DTSP can now easily be transformed into a DRSSTD instance in polynomial time using the following algorithm:

1. For each city $c_i \in C$ introduce a strip $i \in S$.

2. Without loss of generality assume that city $c_n$ corresponds to strip $n$.

3. Further, let the value of the error estimation function $c(i, j, \omega)$ be set to $d(c_i, c_j)$, for all $\omega \in O^2$, $i, j \in S$, $c_i, c_j \in C$.

4. Set $\beta = B$.
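The four steps above can be sketched in a few lines, together with brute-force deciders that allow the equivalence of the two answers to be checked on tiny instances. The sketch assumes a total cost that sums the costs of cyclically adjacent strips with the virtual strip at position $n$; all function names are hypothetical, and the deciders are exponential and meant for illustration only.

```python
from itertools import permutations, product
from typing import Callable, List, Tuple

Cost = Callable[[int, int, Tuple[int, int]], int]


def dtsp_to_drsstd(d: List[List[int]], B: int) -> Tuple[int, Cost, int]:
    """Steps 1-4: city with matrix index i-1 becomes strip i, beta = B."""
    n = len(d)

    def c(i: int, j: int, omega: Tuple[int, int]) -> int:
        return d[i - 1][j - 1]       # the orientation omega is ignored

    return n, c, B


def drsstd_yes(n: int, c: Cost, beta: int) -> bool:
    """Brute force over all permutations and orientations (tiny n only)."""
    for head in permutations(range(1, n)):
        seq = head + (n,)            # virtual strip n at position n
        for o in product((0, 1), repeat=n):
            total = sum(
                c(seq[k], seq[(k + 1) % n],
                  (o[seq[k] - 1], o[seq[(k + 1) % n] - 1]))
                for k in range(n)
            )
            if total <= beta:
                return True
    return False


def dtsp_yes(d: List[List[int]], B: int) -> bool:
    """Brute force over all tours starting at city 0 (tiny n only)."""
    n = len(d)
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        if sum(d[tour[k]][tour[(k + 1) % n]] for k in range(n)) <= B:
            return True
    return False
```

Fixing strip $n$ at the last position loses no tours, because a cyclic tour can always be rotated so that city $c_n$ comes last.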

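Creating the strips has an effort of $O(n)$, i.e., linear in the number of cities of the DTSP instance; if the cost values are tabulated explicitly instead of being looked up in $d$, $O(n^2)$ entries are written, which is still polynomial. Since $d$ is symmetric and the costs do not depend on $\omega$, the symmetry property required of the cost function in Sec. 4.2.2 holds. Obviously, any solution to a DRSSTD instance generated in this way can be transformed into a solution to the original DTSP instance. Furthermore, the DRSSTD instance is answered with yes if and only if the corresponding DTSP instance is answered with yes. Therefore, any algorithm for DRSSTD also solves DTSP. Together with the membership in $\mathcal{NP}$ shown above, this implies that DRSSTD is $\mathcal{NP}$-complete. Based on the above transformation, any TSP instance can be transformed into an RSSTD instance with the same optimal value, which implies that RSSTD cannot be approximated within a constant factor unless $\mathcal{P} = \mathcal{NP}$, cf. [122].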