
4. Reconstruction of Destructed Documents 65

4.3. Cross Cut Shredded Text Documents

4.3.4. Ant Colony Optimization Based Approach

In nature, ants are guided along paths between food sources and their nest by pheromone trails laid by other ants. Most computer systems inspired by this behavior additionally incorporate locally available knowledge into the solution construction process. For our ant colony optimization (ACO) approach, two pheromone matrices τ^r and τ^b exist, where the values τ^r_ij and τ^b_ij correspond to the amount of pheromone laid for placing shred j right next to shred i and for placing shred i on top of shred j, respectively.

Both matrices are initialized in two steps. In the first step, five solutions Π1, …, Π5 are computed with the construction heuristics presented in Sec. 4.3.2, i.e., GMH, PMH, RBH, MPH and PBH. Based on the best solution obtained in this first step, an initial value τ0 is computed by

\[ \tau_0 = \frac{m}{\min_{i=1,\dots,5} c(\Pi_i)}, \tag{4.37} \]

where m denotes the number of ants used within the ACO. Subsequently, all entries of both pheromone matrices, for i, j ∈ S, are set to τ0. In the second step, a regular pheromone update (see the subsection Pheromone Update below) is performed using the initial solutions Π1 to Π5.
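The first initialization step can be sketched in a few lines of Python. This is a minimal illustration of Eq. (4.37) only: the function names and the dictionary-based layout of the two pheromone matrices are assumptions made for this sketch, not the original implementation.

```python
def initial_pheromone(num_ants, heuristic_costs):
    """Return tau_0 = m / min_i c(Pi_i), following Eq. (4.37).

    heuristic_costs holds the objective values c(Pi_1), ..., c(Pi_5)
    of the five construction heuristic solutions.
    """
    best_cost = min(heuristic_costs)
    if best_cost <= 0:
        # Assumption of this sketch: costs are strictly positive.
        raise ValueError("best cost must be positive")
    return num_ants / best_cost


def init_pheromone_matrices(shreds, tau0):
    """Set every entry of both matrices (right / bottom neighbor) to tau_0."""
    right = {(i, j): tau0 for i in shreds for j in shreds}
    bottom = dict(right)  # independent copy with identical start values
    return right, bottom


tau0 = initial_pheromone(18, [120.0, 90.0, 150.0, 200.0, 95.0])
tau_right, tau_bottom = init_pheromone_matrices(range(4), tau0)
```

With m = 18 ants and a best heuristic cost of 90, every entry starts at 18/90 = 0.2; the cost values are made up for illustration.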

Table 4.8.: Results obtained by VNS and ACO. The mean percentage gaps over 20 runs and standard deviations are presented for two independent test sets of VNS initialized using PBH and MPH as well as the mean gaps (over 20 runs) and standard deviations of 4 different ACO variants incorporating RPBH, RGMH, RRBH and all three of them, respectively. Values in columns p correspond to the results of Wilcoxon rank sum tests using a 5% error level.

Solution Construction

New candidate solutions are constructed within the ACO by one of the following alternative methods, which are based on the construction heuristics GMH, RBH, and PBH presented in Sec. 4.3.2. Each candidate solution created in this way is then locally improved by applying a restricted version of the VND presented above (see Sec. 4.3.3), using only the neighborhood structures N1 to N3 and a CPU-time limit of 500 ms.

Randomized Greedy Matching Heuristic Analogously to GMH, the randomized greedy matching heuristic (RGMH) greedily matches shreds such that finally one long sequence of snippets is produced, which is then split into multiple rows. But instead of always fixing the best-matching pair of shreds in each iteration, we now perform this selection probabilistically, depending on the pheromone values and the cost function c(i, j). The probability pij of a match for pair (i, j), with i, j ∈ S′, where S′ denotes the set of shreds not yet matched, is equal to

pij =

As is common within an ACO, the parameters α and β control the influence of the pheromones versus the influence of the heuristic information.
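To illustrate how α and β typically enter such a selection, the following sketch uses the standard random-proportional rule of ACO, with the heuristic desirability taken as 1/c(i, j). This form is an assumption: it is the textbook rule, not necessarily the exact expression used for RGMH, and all names are hypothetical.

```python
import random


def match_probabilities(candidates, tau, cost, alpha=1.0, beta=5.0):
    """Textbook ACO rule: p_ij proportional to tau_ij^alpha * (1/c(i,j))^beta.

    candidates: iterable of pairs (i, j) of not yet matched shreds.
    tau, cost:  dictionaries mapping pairs to pheromone and cost values.
    """
    weights = {}
    for pair in candidates:
        # Assumed heuristic desirability; guarded against zero cost.
        eta = 1.0 / max(cost[pair], 1e-9)
        weights[pair] = (tau[pair] ** alpha) * (eta ** beta)
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def sample_pair(probabilities, rng=random):
    """Draw one pair according to the given probabilities (roulette wheel)."""
    pairs = list(probabilities)
    weights = [probabilities[p] for p in pairs]
    return rng.choices(pairs, weights=weights, k=1)[0]
```

The defaults α = 1 and β = 5 are the values reported for the experiments; with a large β, low-cost pairs dominate the selection unless their pheromone values are very small.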

Randomized Row Building Heuristic The randomized version of RBH, called randomized row building heuristic (RRBH), tries to reconstruct a set of rows based on the following probability distribution: the next matching shred is not selected solely on the basis of the values c(i, j), with i being the last placed shred and j being any shred currently not placed, but using the probability values pij:

pij =

Again, the set S′ is defined as the set of all shreds not used within the current intermediate solution.

Randomized Prim Based Heuristic The randomized Prim based heuristic (RPBH) is the non-deterministic variant of PBH. The decision at which position the next (randomly chosen) shred is placed is based on the following definition of the probability values pip for placing shred i ∈ S′ at position p, with S′ being the set of shreds not yet used:

\[ p_{ip} = \frac{\delta(i, p)}{\sum_{p' \in D'^2} \sum_{k \in S'} \delta(k, p')}, \qquad \forall i \in S',\ p \in D'^2 \tag{4.40} \]

The function δ(i, p), with i ∈ S′ and p ∈ D′², computes the error additionally introduced when placing shred i at position p. The value of δ(i, p) is equal to zero if p is either already occupied by another shred k ∈ S \ S′ or all neighboring positions of p are free, i.e., no shred k ∈ S \ S′ is positioned on them (see also Fig. 4.18). Analogously to PBH, all shreds are shifted one position to the right or to the bottom if the next shred is to be assigned to a position outside of D².
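Eq. (4.40) normalizes the values of δ over all unused shreds and all candidate positions. The sketch below shows only this normalization; the callable delta is a hypothetical stand-in for the error function, whose actual computation is not reproduced here, and the toy values are invented.

```python
def placement_probabilities(delta, unused_shreds, positions):
    """Normalize delta(i, p) over all unused shreds and positions, Eq. (4.40).

    delta:         callable (shred, position) -> non-negative value;
                   hypothetical stand-in for the error function.
    unused_shreds: shreds not yet placed.
    positions:     candidate positions.
    """
    total = sum(delta(k, q) for q in positions for k in unused_shreds)
    if total == 0:
        return {}  # no admissible placement carries any weight
    return {(i, p): delta(i, p) / total
            for i in unused_shreds for p in positions}


# Toy values: position (1, 0) is inadmissible for shred 'a' (delta = 0).
toy = {('a', (0, 1)): 2.0, ('a', (1, 0)): 0.0,
       ('b', (0, 1)): 1.0, ('b', (1, 0)): 1.0}
probs = placement_probabilities(lambda i, p: toy[(i, p)],
                                ['a', 'b'], [(0, 1), (1, 0)])
```

Placements with δ(i, p) = 0, i.e., occupied positions or positions without any placed neighbor, receive probability zero and are therefore never selected.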

Pheromone Update

The pheromone update is done according to the following expressions, where we assume that Πk, with 1 ≤ k ≤ m, refers to the solution obtained by ant k during the last iteration of the ACO and Π′ represents the best solution found so far:

τij = (1−ρ)·τij + …

The idea behind these definitions is that placing two shreds next to each other should be emphasized when the cost of this placement is low.
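A minimal sketch of such an update follows. It takes the evaporation term (1−ρ)·τij from the expression above and, purely as an illustrative assumption, deposits 1/c(Π) on every adjacent pair used by a solution Π, so that cheap placements are reinforced most. The deposit rule, the default value of ρ and all names are hypothetical.

```python
def update_pheromones(tau, solutions, rho=0.1):
    """Evaporate all entries, then reinforce the pairs used by the solutions.

    tau:       dictionary mapping shred pairs (i, j) to pheromone values.
    solutions: list of (adjacent_pairs, cost) tuples, e.g. one per ant plus
               the best solution found so far.
    rho:       evaporation rate in (0, 1); the default is an assumption.
    """
    # Evaporation: tau_ij <- (1 - rho) * tau_ij
    for pair in tau:
        tau[pair] *= (1.0 - rho)

    # Deposit (ASSUMED form): the lower the cost, the larger the deposit.
    for adjacent_pairs, cost in solutions:
        deposit = 1.0 / cost
        for pair in adjacent_pairs:
            tau[pair] += deposit
    return tau


tau = {(0, 1): 1.0, (1, 0): 1.0}
update_pheromones(tau, [([(0, 1)], 2.0)], rho=0.5)
```

In this toy call the unused pair (1, 0) decays to 0.5, while the used pair (0, 1) decays to 0.5 and then receives a deposit of 1/2.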

4.3.5. Experimental Results

Results obtained using the VNS based approach are presented in Tab. 4.8 together with results obtained using the ACO approach discussed in Sec. 4.3.4. The first four columns of this table correspond to the first four columns of Tab. 4.7. The next two columns, labeled VNS-PBH and VNS-MPH, correspond to the experiments performed with VNS initialized using PBH and MPH, respectively.

Again, the values represent average gaps to the original document over 20 runs with respect to the objective function. They were obtained on the same hardware as the results presented in the previous section. The column labeled p for VNS-PBH lists the outcome of Wilcoxon rank sum tests comparing the two VNS variants with each other: < indicates that VNS-PBH performed significantly better at an error level of 5% on the corresponding instance, > indicates that VNS-MPH performed significantly better, and any other entry implies that no statement can be made for the corresponding instance. For each instance, the better of the two average gaps is printed in bold.

Although we performed tests initializing VNS with all previously presented construction heuristics, only the variants using PBH and MPH turned out to be successful. The others performed noticeably worse, and we therefore omit their detailed results here. However, it can be observed that the initial solutions could be significantly improved by the VNS approach. Nevertheless, it is not easy to decide which of these two variants performs better. Interestingly, the performance of both variants seems to depend strongly on the underlying page: whereas VNS-MPH is clearly better for page p01, VNS-PBH outperforms VNS-MPH on page p02. Unfortunately, it is not possible to select the appropriate variant in advance, since the document is not yet reconstructed at that point.

In addition, Tab. 4.8 shows the test results obtained by the ACO for instances based on pages p01 to p05, alongside the results obtained using the VNS approach.

The results obtained for the different ACO settings are presented in the columns labeled ACO, ACO-RPBH, ACO-RRBH and ACO-RGMH. The concrete settings were chosen as follows: for each variant of the ACO, we set the number of ants to m = 18. The construction heuristics used by the ants were RPBH, RRBH and RGMH, respectively. The fourth setting, corresponding to the column labeled ACO, was chosen such that six of the 18 ants used RPBH, another six RRBH and the last six RGMH. The value of parameter m was chosen based on preliminary tests, which also revealed that fixing α and β to 1 and 5, respectively, is reasonable for our ACO variants. The values presented in Tab. 4.8 are again mean percentage gaps over 20 runs, and the conclusions of selected Wilcoxon tests are given in the columns labeled p, where VNS-MPH, ACO, ACO-RPBH and ACO-RGMH were each compared to ACO-RRBH. The corresponding p columns again indicate whether the first (<) or the second heuristic (>) yielded statistically better results at an error level of 5%. If neither case occurs, this is marked by a separate sign in the corresponding field. In addition, the best mean value of the four ACO variants is emphasized.

For the ACO variants a clear conclusion can be drawn: ACO-RRBH performs best on the considered test instances. Therefore, we decided to compare VNS-MPH with ACO-RRBH and observed that the results obtained by the latter were significantly better on 28 instances. When comparing VNS and ACO in general, the two VNS variants achieved the best mean results on only 11 instances, whereas the ACO variants reached the best mean value 35 times (29 times this value was provided by ACO-RRBH).

Taking a closer look at the values in Tab. 4.8, it can be seen that for instance p01 with 9×9 shreds ACO-RRBH could always reconstruct the original document page. For some runs, the percentage gap is even negative, which can be explained by the fact that no error estimation function can guarantee that the original document is evaluated best; see also Sec. 4.2.4 and Sec. 4.2.9 for a discussion of this topic.
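How a negative gap arises can be seen from a small computation. The sketch assumes that the percentage gap is the relative difference between the objective value of a reconstruction and that of the original document; the text states only that gaps are measured against the original with respect to the objective function, so this formula and the numbers used are illustrative assumptions.

```python
from statistics import mean, stdev


def percentage_gap(solution_cost, original_cost):
    """Relative deviation from the original document's cost, in percent.

    Assumed definition; requires original_cost > 0.
    """
    return 100.0 * (solution_cost - original_cost) / original_cost


# Hypothetical objective values of three runs; the original scores 100.
original = 100.0
run_costs = [110.0, 95.0, 100.0]
gaps = [percentage_gap(c, original) for c in run_costs]

mean_gap = mean(gaps)    # mean gap over the runs
gap_stdev = stdev(gaps)  # sample standard deviation over the runs
```

The second run yields a gap of -5%: the reconstruction is rated better than the original by the cost function, although it cannot be a more faithful reconstruction.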

Regarding running times, we can summarize that the construction heuristics performed within hundreds of milliseconds. The VNS approaches needed between one and 100 seconds computation time until termination, and the computation times for ACO lie between approximately 100 seconds and 800 seconds. It can be concluded that although the results obtained by ACO are better in most cases, the computation times needed are significantly higher.

In general, further improvements are necessary to address large practical instances, especially those involving multiple pages. However, considering the complexity of the problem, the results achieved on small and medium sized instances are remarkable. Especially for pages containing mainly text, large parts of the documents could be reconstructed.