Ant Colony Optimization Based Approach - Cross Cut Shredded Text Documents

4. Reconstruction of Destructed Documents

4.3. Cross Cut Shredded Text Documents

4.3.4. Ant Colony Optimization Based Approach

In nature, ants are guided along paths between food sources and their nest by pheromone trails laid by other ants. Most computer systems inspired by this behavior additionally incorporate locally available knowledge into the solution construction process. For our ant colony optimization (ACO) approach, two pheromone matrices τ exist, one per adjacency direction: an entry τ_ij of the first matrix corresponds to the amount of pheromone laid for placing shred j right next to shred i, and an entry τ_ij of the second matrix to the amount laid for placing shred i on top of shred j.

Both matrices are initialized in two steps. In the first step, five solutions Π₁, …, Π₅ are computed with the construction heuristics presented in Sec. 4.3.2, i.e., GMH, PMH, RBH, MPH and PBH. Based on the best solution obtained in this first step, an initial value τ⁰ is computed by

$$\tau^0 = \frac{m}{\min_{i=1,\dots,5} c(\Pi_i)}, \tag{4.37}$$

where m denotes the number of ants used within the ACO. Subsequently, all values τ_ij of both matrices, with i, j ∈ S, are set to τ⁰. In the second step, a regular pheromone update (see the section on the pheromone update below) is performed using the initial solutions Π₁ to Π₅.
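The first initialization step can be sketched in a few lines of Python. This is only an illustration of Eq. (4.37): the function names, the matrix names `tau_right` and `tau_top`, and the example costs are hypothetical and do not come from the original implementation.

```python
def initial_pheromone(num_ants, heuristic_costs):
    """tau^0 = m / min_i c(Pi_i), following Eq. (4.37)."""
    best = min(heuristic_costs)
    if best <= 0:
        raise ValueError("costs must be positive for tau^0 to be defined")
    return num_ants / best


def init_matrices(num_shreds, tau0):
    """Create the two pheromone matrices with every entry set to tau^0."""
    tau_right = [[tau0] * num_shreds for _ in range(num_shreds)]
    tau_top = [[tau0] * num_shreds for _ in range(num_shreds)]
    return tau_right, tau_top


# Hypothetical costs of the five construction-heuristic solutions.
costs = [420.0, 360.0, 515.0, 390.0, 450.0]
tau0 = initial_pheromone(num_ants=18, heuristic_costs=costs)
tau_right, tau_top = init_matrices(num_shreds=4, tau0=tau0)
```

The second initialization step, the regular pheromone update with the five initial solutions, is not part of this sketch.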

Table 4.8: Results obtained by VNS and ACO. The mean percentage gaps over 20 runs and standard deviations are presented for two independent test sets of VNS initialized using PBH and MPH, as well as the mean gaps (over 20 runs) and standard deviations of four different ACO variants incorporating RPBH, RGMH, RRBH and all three of them, respectively. Values in columns p correspond to the results of Wilcoxon rank sum tests using a 5% error level.

Solution Construction

New candidate solutions are constructed within the ACO by one of the following alternative methods, which are based on the construction heuristics GMH, RBH, and PBH presented in Sec. 4.3.2. Each candidate solution created in this way is then locally improved by applying a restricted version of the VND presented above (see Sec. 4.3.3), using only neighborhood structures N1 to N3 with a CPU-time limit of 500 ms.

Randomized Greedy Matching Heuristic. Analogously to GMH, the randomized greedy matching heuristic (RGMH) greedily matches shreds such that finally one long sequence of snippets is produced, which is then split into multiple rows. But instead of always fixing the pair of shreds that matches best within each iteration, we now perform this selection probabilistically, depending on the pheromone values and the cost function c(i, j). The probability p_ij of a match for pair (i, j), with i, j ∈ S⁰, where S⁰ denotes the set of shreds not yet matched, is equal to

p_ij = …

Within an ACO, the parameters α and β control the influence of pheromones versus the influence of heuristic information.
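The probabilistic pair selection can be illustrated with a short Python sketch. Since the exact expression for p_ij is not reproduced here, the sketch assumes the random-proportional rule commonly used in ACO, i.e., a weight τ_ij^α · (1/c(i, j))^β normalized over all candidate pairs; this rule, the function names and the toy data are assumptions and may differ from the original formula.

```python
import random


def pair_probabilities(unmatched, tau, cost, alpha=1.0, beta=5.0):
    """Return {(i, j): probability} over ordered pairs of unmatched shreds.

    ASSUMPTION: weight = tau[i][j]**alpha * (1 / cost[i][j])**beta, the usual
    ACO random-proportional rule; the source's exact formula may differ.
    """
    weights = {}
    for i in unmatched:
        for j in unmatched:
            if i != j:
                eta = 1.0 / (cost[i][j] + 1e-9)  # small offset avoids division by zero
                weights[(i, j)] = (tau[i][j] ** alpha) * (eta ** beta)
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def sample_pair(probabilities, rng):
    """Roulette-wheel selection of one pair according to its probability."""
    pairs = list(probabilities)
    return rng.choices(pairs, weights=[probabilities[p] for p in pairs], k=1)[0]


# Hypothetical 3-shred instance: uniform pheromone, asymmetric costs.
tau = [[0.05] * 3 for _ in range(3)]
cost = [[0, 1, 4], [2, 0, 1], [4, 2, 0]]
probs = pair_probabilities({0, 1, 2}, tau, cost)
chosen = sample_pair(probs, random.Random(0))
```

With uniform pheromone values, cheap pairs such as (0, 1) receive a much higher probability than expensive ones such as (0, 2); a larger β sharpens this preference, a larger α gives more weight to the pheromone values.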

Randomized Row Building Heuristic. The randomized version of RBH, called randomized row building heuristic (RRBH), tries to reconstruct a set of rows based on the following probability distribution. That is, the next matching shred is not selected by relying solely on the values of c(i, j), with i being the last placed shred and j being any shred currently not placed, but by using the probability values p_ij:

pij =

Again, set S⁰ is defined as the set of all shreds not used within the current intermediate solution.
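A row-wise construction of this kind might look as follows in Python. The sketch is hedged in two ways: the selection weights again assume the common ACO rule τ_ij^α · (1/c(i, j))^β, and a row is simply closed after a fixed number of shreds, which is a simplification introduced here. All names and data are hypothetical.

```python
import random


def next_shred(last, unplaced, tau, cost, alpha, beta, rng):
    """Sample the right neighbour of `last` among the unplaced shreds.

    ASSUMPTION: weights follow tau**alpha * (1/cost)**beta, as in common ACO.
    """
    candidates = sorted(unplaced)
    weights = [
        (tau[last][j] ** alpha) * ((1.0 / (cost[last][j] + 1e-9)) ** beta)
        for j in candidates
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]


def build_row(start, unplaced, row_length, tau, cost, alpha=1.0, beta=5.0, rng=None):
    """Build one row from left to right; `unplaced` is updated in place."""
    if rng is None:
        rng = random.Random()
    row = [start]
    unplaced.discard(start)
    while len(row) < row_length and unplaced:
        j = next_shred(row[-1], unplaced, tau, cost, alpha, beta, rng)
        row.append(j)
        unplaced.discard(j)
    return row


# Hypothetical 4-shred instance in which the chain 0 -> 1 -> 2 -> 3 is cheapest.
tau = [[0.05] * 4 for _ in range(4)]
cost = [[0, 1, 9, 9], [9, 0, 1, 9], [9, 9, 0, 1], [1, 9, 9, 0]]
remaining = {0, 1, 2, 3}
row = build_row(0, remaining, row_length=3, tau=tau, cost=cost, rng=random.Random(1))
```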

Randomized Prim Based Heuristic. The randomized Prim based heuristic (RPBH) is the non-deterministic variant of PBH. The decision at which position the next (randomly chosen) shred is placed is based on the following definition of probability values pⁱ_p for placing shred i ∈ S⁰ at position p, with set S⁰ being the set of shreds not yet used:

$$p^i_p = \frac{\delta(i, p)}{\sum_{\substack{p^0 \in D02 \\ k \in S^0}} \delta(k, p^0)}, \quad \forall i \in S^0,\ p \in D02 \tag{4.40}$$

Function δ(i, p), with i ∈ S⁰, p ∈ D02, computes the additionally introduced error when placing shred i at position p. The value of δ(i, p) is equal to zero if p is either already used by another shred k ∈ S \ S⁰ or all neighbor positions of p are free, i.e., no shred k ∈ S \ S⁰ is positioned on them (see also Fig. 4.18). Analogously to PBH, all shreds are shifted one position to the right or to the bottom if the next shred should be assigned to any position outside of D².
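The normalization in Eq. (4.40) can be made concrete with a small Python sketch. The function δ is passed in as a callable; the toy δ values, the shred labels and the positions are hypothetical and serve only to show how the probabilities are obtained.

```python
def placement_probabilities(delta, shreds, positions):
    """p^i_p = delta(i, p) / sum over all (k, p') of delta(k, p'), as in Eq. (4.40)."""
    total = sum(delta(k, q) for k in shreds for q in positions)
    if total == 0:
        raise ValueError("no admissible placement: all delta values are zero")
    return {(i, p): delta(i, p) / total for i in shreds for p in positions}


# Hypothetical delta values for two unused shreds and three candidate positions.
# Position (0, 0) is occupied and (5, 5) has no occupied neighbour, so delta is 0 there.
toy_delta = {
    ("a", (0, 0)): 0.0, ("a", (0, 1)): 3.0, ("a", (5, 5)): 0.0,
    ("b", (0, 0)): 0.0, ("b", (0, 1)): 1.0, ("b", (5, 5)): 0.0,
}
probs = placement_probabilities(
    lambda i, p: toy_delta[(i, p)],
    shreds=["a", "b"],
    positions=[(0, 0), (0, 1), (5, 5)],
)
```

In this toy case only position (0, 1) is admissible, so the whole probability mass is split between the two shreds at that position in proportion to their δ values.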

Pheromone Update

The pheromone update is done according to the following expressions, where we assume that index k, with 1 ≤ k ≤ m, refers to the solution obtained by ant k during the last iteration of the ACO and Π0 represents the best solution found so far:

τ_ij = (1 − ρ) · τ_ij + …

The idea behind these definitions is that placing two shreds next to each other should be emphasized when the cost of this placement is low.
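An update of this general shape, evaporation by the factor (1 − ρ) followed by a reinforcement of the adjacencies used in the ants' solutions and in the best solution found so far, might be sketched as follows. The deposit term is an assumption: the sketch adds 1 / (1 + c(i, j)) per used adjacency, which merely mirrors the stated idea that cheap placements should be emphasized. The value of ρ and all names are hypothetical as well.

```python
def update_pheromone(tau, solutions, cost, rho=0.1):
    """Evaporate all entries, then reinforce the adjacencies used in `solutions`.

    Each solution is a list of (i, j) pairs meaning "shred j is placed right
    next to shred i". ASSUMPTION: the deposit is 1 / (1 + cost[i][j]); the
    deposit term of the original update rule may differ.
    """
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1.0 - rho)  # evaporation term (1 - rho) * tau_ij
    for pairs in solutions:
        for i, j in pairs:
            tau[i][j] += 1.0 / (1.0 + cost[i][j])  # cheaper placement, larger deposit
    return tau


# Hypothetical data: two ant solutions plus the best solution found so far.
tau = [[1.0] * 3 for _ in range(3)]
cost = [[0, 1, 4], [2, 0, 1], [4, 2, 0]]
ant_solutions = [[(0, 1), (1, 2)], [(0, 2), (2, 1)]]
best_so_far = [(0, 1), (1, 2)]
update_pheromone(tau, ant_solutions + [best_so_far], cost, rho=0.1)
```

Only the matrix for horizontal adjacency is shown; the matrix for vertical adjacency would be updated in the same way with the corresponding pairs and costs.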

4.3.5. Experimental Results

Results obtained using the VNS based approach are presented in Tab. 4.8 together with results obtained using an ACO approach to be discussed in detail in the next section. The first four columns of this table correspond to the first four columns of Tab. 4.7. The next two columns labeled with VNS-PBH and VNS-MPH correspond to the experiments performed with VNS initialized using PBH and MPH, respectively.

Again, the values represent average gaps to the original document over 20 runs with respect to the objective function. They were obtained on the same hardware as the results presented in the previous section. The column labeled p for VNS-PBH lists the outcome of Wilcoxon rank sum tests comparing the two VNS variants with each other, i.e., < indicates that VNS-PBH performed significantly better with an error level of 5% on the corresponding instance, while > indicates that VNS-MPH performed better; ≈ implies that no statement can be given for the corresponding instance. For each instance the better of the two average gaps is printed bold.
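How the symbols <, > and ≈ can be derived from two samples of gaps is sketched below. The sketch implements a two-sided Wilcoxon rank sum test with the normal approximation and without tie correction, using only the Python standard library; whether the original tests were two-sided, and all names and sample values, are assumptions.

```python
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, no tie correction).

    Returns (z, p_value); z < 0 means the values in `x` tend to be smaller.
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    start = 0
    while start < len(pooled):
        end = start
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        avg = (start + end) / 2 + 1  # average 1-based rank of a group of ties
        for idx in range(start, end + 1):
            ranks[idx] = avg
        start = end + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    mean = n1 * (n1 + n2 + 1) / 2
    std = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (w - mean) / std
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def compare(x, y, level=0.05):
    """Return '<' if x is significantly smaller, '>' if larger, else '≈'."""
    z, p = rank_sum_test(x, y)
    if p >= level:
        return "≈"
    return "<" if z < 0 else ">"


# Hypothetical percentage gaps of two heuristics on one instance.
gaps_a = [1.0 + 0.1 * k for k in range(10)]
gaps_b = [3.0 + 0.1 * k for k in range(10)]
symbol = compare(gaps_a, gaps_b)
```

Since smaller gaps are better, `<` here means that the first heuristic performed significantly better at the chosen error level.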

Although we performed tests initializing VNS with all previously presented construction heuristics, it turned out that only the variants using PBH and MPH were successful. The others performed noticeably worse, and we therefore omit their detailed results here. However, it can be observed that the initial solutions could be significantly improved by the VNS approach. Nevertheless, it is not easy to decide which of these two variants performs better. Interestingly, the performance of both variants seems to depend strongly on the underlying page: whereas VNS-MPH is clearly better for page p01, VNS-PBH outperforms VNS-MPH on page p02. Unfortunately, it is not possible to select the appropriate variant in advance based on the (then reconstructed) document.

In addition, Tab. 4.8 shows the test results obtained by the ACO for instances based on pages p01 to p05, together with the results obtained using the VNS approach.

The results obtained for different ACO settings are presented in the columns labeled with ACO, ACO-RPBH, ACO-RRBH and ACO-RGMH. The concrete settings were chosen as follows: For each variant of ACO we set the number of ants to m = 18. The construction heuristics used by the ants were RPBH, RRBH and RGMH, respectively. The fourth setting, corresponding to the column labeled with ACO, was chosen such that six of the 18 ants used RPBH, another six RRBH and the last six RGMH. The value of parameter m was chosen based on preliminary tests, which also revealed that fixing α and β to 1 and 5, respectively, is reasonable for our ACO variants. The values presented in Tab. 4.8 are again mean percentage gaps over 20 runs, and the conclusions of selected Wilcoxon tests are given in columns labeled with p, where VNS-MPH, ACO, ACO-RPBH and ACO-RGMH were each compared to ACO-RRBH. The corresponding p columns indicate again whether the first (<) or the second heuristic (>) yielded statistically better results at an error level of 5%. If neither of these two cases occurs, a ≈ sign is printed in the corresponding field. In addition, the best mean value among the four ACO variants is emphasized.

For the ACO variants a clear conclusion can be drawn: ACO-RRBH performs best on the considered test instances. Therefore, we decided to compare VNS-MPH with ACO-RRBH and observed that the results obtained by the latter were significantly better on 28 instances. When comparing VNS and ACO in general, the two VNS variants achieved the best mean result on only 11 instances, whereas the ACO variants reached the best mean value 35 times (29 times this value was provided by ACO-RRBH).

Taking a closer look at the values in Tab. 4.8, it can be seen that for instance p01 with 9×9 shreds ACO-RRBH could always reconstruct the original document page. For some runs, the percentage gap is even negative, which can be explained by the fact that no error estimation function can guarantee that the original document is evaluated best; see also Sec. 4.2.4 and Sec. 4.2.9 for a discussion related to this topic.
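A small numerical illustration shows how a negative gap arises. The sketch assumes that the percentage gap is the relative deviation of a solution's objective value from that of the original document; this definition and the cost values are assumptions made for illustration only.

```python
def percentage_gap(solution_cost, original_cost):
    """Relative deviation (in percent) of a solution's cost from the original's cost.

    ASSUMPTION: gap = 100 * (c(solution) - c(original)) / c(original).
    """
    return 100.0 * (solution_cost - original_cost) / original_cost


# Hypothetical costs: a perfect reconstruction, a worse solution, and a
# solution that the cost function rates better than the original page.
gaps = [percentage_gap(c, original_cost=200.0) for c in (200.0, 230.0, 196.0)]
```

Under this assumption, the third value is negative because the cost function assigns the found arrangement a lower cost than the original document, even though it is not the correct reconstruction.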

Regarding running times, the construction heuristics finished within hundreds of milliseconds. The VNS approaches needed between one and 100 seconds of computation time until termination, and the computation times for ACO lie between approximately 100 and 800 seconds. It can be concluded that although the results obtained by ACO are better in most cases, the computation times needed are significantly higher.

In general, further improvements are necessary to address large practical instances, especially those involving multiple pages. However, considering the complexity of the problem, the results achieved on small and medium sized instances are remarkable. Especially for those pages containing mainly text, large parts of the documents could be reconstructed.

From the document: Hybrid Optimization Methods for Warehouse Logistics and the Reconstruction of Destroyed Paper Documents (pages 133-138)