
In the first part of this thesis, an algorithm for constructing the labels is developed. In the second part, the k-MLSA problem is solved with exact and heuristic methods. For the exact approach, the problem is modeled as a mixed integer program (MIP) and solved using a branch-and-cut-and-price (BCP) method. Extending the branch-and-cut (BC) approach towards a BCP algorithm was motivated by the large number of labels in our problem. The BCP algorithm starts with a small subset of labels and adds new labels on demand in the pricing step. The BCP method is superior to BC and branch-and-price (BP) because it can solve much larger instances.

The second chapter describes the compression approach in detail and introduces the datasets that were used for the tests. The third chapter deals with the construction of candidate template arcs $T^c$. After a short introduction to the structure of the problem, two different approaches for the generation of template arcs are presented. In the first approach, the set of all template arcs $T^c$ is generated in advance, using an algorithm called non-dominated interval search (NIS), which is explained in detail.

In the second approach, template arcs are generated on demand to be used within the BCP framework presented in Chapter 5. The construction of template arcs on demand is done using a variant of NIS called dynamic non-dominated interval search tree (DNIST). As DNIST is invoked very frequently in the pricing step, it has to be very efficient. It is also explained in detail. In the final part of the chapter the results are presented and discussed.

The fourth chapter presents a memetic algorithm (MA) for the solution of the k-MLSA problem. Besides giving a short general introduction to genetic algorithms and known techniques for solving the related MLST problem, the newly developed memetic algorithm is described in detail, and the results are presented and compared with a GRASP approach. In the memetic algorithm, feasible arborescences are searched for very frequently using depth-first searches (DFS).

A technique to reduce the number of DFS calls is also introduced in Chapter 4. The fifth chapter presents a BCP approach to solving the k-MLSA problem. It gives a short general introduction to BCP and presents the ILP model as well as the cutting plane separation with cycle elimination cuts and directed connection inequalities. The directed connection cuts are separated by computing the maximum flow and can be improved by back-cuts and creep flow. After presenting the pricing model, the different variants of BCP algorithms are described. Furthermore, pre-processing strategies with the MA as an initial heuristic and the computation of a lower bound for a reduced version of the problem are discussed. Also in this chapter, different BCP variants are compared based on computational experiments. Chapter 6 discusses some implementation details such as third-party libraries, frameworks and solvers as well as parameter settings and the software design. In Chapter 7, results and possible further developments are discussed.

2. Formal Description of the Problem

This chapter describes the compression model as well as the structure of the input dataset. We give only a short illustration of the compression model, since a more detailed description is already given in [18].

2.1 Compression Model

Our input data set consists of $n$ minutiae. Each minutia is interpreted as a $d$-dimensional point, i.e. as a vector $v = (v_1, \ldots, v_d)$ from a discrete domain $D = \{0, \ldots, \tilde{v}_1 - 1\} \times \ldots \times \{0, \ldots, \tilde{v}_d - 1\}$, $D \subseteq \mathbb{N}^d$. The compression can be lossy or lossless. In the first case, a subset of $k$ of the $n$ minutiae is selected, whilst in the second case all minutiae are used, i.e. $k = n$. Our compression approach is based on the following two ideas:

1. Select $k$ points from the $n$ minutiae and connect them by a $k$-node arborescence: For this purpose we start with a complete directed graph $G = (V, A)$ with $A = \{(u, v) \mid u, v \in V, u \neq v\}$. Each node $v \in V$ of this graph corresponds to one of the $n$ minutiae. In this graph a $k$-node arborescence is constructed, so that each arc in the arborescence represents the geometric position of its end point relative to its starting point [18].

2. Use a codebook: We use a small codebook of template arcs, which are selected from the set of candidate template arcs $T^c$ (see Chapter 4), and encode each arc relative to the most similar template arc. The idea behind this strategy is that the arcs should require less space when they are encoded by their difference to their corresponding template arc. The difference of each arc to its template arc is expressed by a so-called correction vector $\tilde{\delta} \in D'$ from a prespecified, small domain $D' \subseteq D$ with $D' = \{0, \ldots, \tilde{\delta}_1 - 1\} \times \ldots \times \{0, \ldots, \tilde{\delta}_d - 1\}$ [18].

The objective of our optimization is to minimize the number of candidate template arcs by solving the resulting k-MLSA problem, while the size of the correction vector domain is prespecified. Note that other objectives are also possible and could be the subject of further research. For example, the size of the correction vector could be minimized while the number of candidate template arcs stays the same. Alternatively, in a multi-objective optimization, the size of the correction vector as well as the number of template arcs (i.e. the size of the codebook) could be minimized. The result of our optimization is a minimum labeled spanning arborescence, and a solution consists of:

1. The codebook that contains the candidate template arcs $T^c$.

2. A rooted, outgoing tree $G_T = (V_T, A_T)$ with $V_T \subseteq V$ and $A_T \subseteq A$ connecting precisely $|V_T| = k$ nodes. Each arc $(i, j)$ of this tree is associated with a candidate template arc index $\kappa_{i,j} \in \{1, \ldots, m\}$ and a correction vector $\delta_{i,j} \in D'$ [18].

Starting at the root of the tree, we can derive every point from the relation between the source node $v_i$ and the target node $v_j$ of each arc. For each arc the relation

\[
v_j = (v_i + t_{\kappa_{i,j}} + \delta_{i,j}) \bmod \tilde{v}, \quad \forall (i, j) \in A_T, \tag{2.1.1}
\]

holds, i.e. each point $v_j$ can be derived from $v_i$ by adding the template arc with index $\kappa_{i,j}$ and a small correction vector $\delta_{i,j}$ [18]. We use a modulo calculation so that we do not have to deal with negative values and do not have to consider domain boundaries explicitly. For the modulo calculation the domain border $\tilde{v}$ is defined as $\tilde{v}_l = \max_{i=1,\ldots,n} v_i^l$, $l = 1, \ldots, d$, where $v_i^l$ denotes coordinate $l$ of point $i$. From this specific domain border, which is the maximum within one input instance, we have to distinguish the overall domain $\tilde{\xi}$ with $\tilde{\xi}_l = \max_I \tilde{v}_l^I$, $l = 1, \ldots, d$, which is the maximum over all input instances $I$.
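The decoding step described by Equation (2.1.1) can be sketched as follows. This is a minimal illustration, not the implementation used in the thesis: the function name `decode_points`, the dictionary layout of the arcs and the 0-based template indices (the text uses indices 1 to m) are assumptions made for this example.

```python
from collections import deque


def decode_points(root, root_point, arcs, templates, border):
    """Reconstruct all points of an arborescence from its root.

    arcs      -- hypothetical layout: {(i, j): (kappa, delta)}, where kappa is a
                 0-based index into `templates` and delta a correction vector
    templates -- list of template arcs (tuples of length d)
    border    -- domain border, one modulus per dimension
    """
    # Group arcs by source node so the tree can be walked from the root.
    children = {}
    for (i, j) in arcs:
        children.setdefault(i, []).append(j)

    points = {root: tuple(root_point)}
    queue = deque([root])
    while queue:
        i = queue.popleft()
        for j in children.get(i, []):
            kappa, delta = arcs[(i, j)]
            t = templates[kappa]
            # Equation (2.1.1), applied per dimension.
            points[j] = tuple(
                (p + tl + dl) % b
                for p, tl, dl, b in zip(points[i], t, delta, border)
            )
            queue.append(j)
    return points
```

For example, with `border = (10, 10)`, one template arc `(3, 8)` and the root at `(5, 5)`, the arc `(0, 1)` with correction vector `(1, 0)` yields the point `(9, 3)`; the second coordinate wraps around the domain border.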

The compressed information we have to store consists of the codebook, the correction vector $\tilde{\delta}$ and the tree. We encode the structure of the tree as a bit string by traversing the arborescence depth first and storing 1 if an arc is traversed in forward direction and 0 if it is traversed backwards.
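The bit-string encoding of the tree structure can be sketched as follows. This is an illustrative sketch only: the function names and the child-list representation are assumptions, and the decoder numbers the nodes in depth-first preorder, which the text does not prescribe.

```python
def encode_tree(root, children):
    """Encode an arborescence as a bit string via depth-first traversal.

    children -- hypothetical layout: {node: [child, ...]}
    Writes '1' when an arc is traversed forwards and '0' when it is
    traversed backwards, so k nodes produce 2 * (k - 1) bits.
    """
    bits = []

    def visit(node):
        for child in children.get(node, []):
            bits.append("1")   # descend along the arc (node, child)
            visit(child)
            bits.append("0")   # return along the same arc

    visit(root)
    return "".join(bits)


def decode_tree(bits):
    """Rebuild the arc list; nodes are numbered in DFS preorder, root = 0."""
    arcs = []
    stack = [0]
    next_id = 1
    for bit in bits:
        if bit == "1":
            arcs.append((stack[-1], next_id))
            stack.append(next_id)
            next_id += 1
        else:
            stack.pop()
    return arcs
```

The recursive traversal keeps the sketch short; for very deep arborescences an explicit stack would avoid Python's recursion limit.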

Equation (2.1.2) shows the complete compressed information at the binary level.

Data of constant size are designated by CD; this variable contains, for example, the offset values for each dimension. For the number of nodes in the arborescence $k$ and the number of template arcs $m$ we reserve $2 \times 7$ bits. Not all dimensions of the vector are necessarily represented by a template arc and a correction vector. To exclude these dimensions we define the following function:

\[
\chi_l =
\begin{cases}
1 & \text{if dimension } l \text{ is considered by the compression method,} \\
0 & \text{otherwise.}
\end{cases}
\tag{2.1.3}
\]

The third term of Equation (2.1.2) contains the bits reserved for the root node and the respective domain borders $\tilde{v}_l$. Term four contains the bits reserved for the bit string that stores the tree structure.

Term five contains the bits that hold the template arcs, and term six contains the bits that hold the arcs.

To represent each arc we need the index of the template arc. The space required to store each of these indices is $\operatorname{ld} m$ bits, i.e. the binary logarithm of $m$. Furthermore, we have to store the correction vector and the remaining dimensions, which are not encoded by the correction vector. Note that we round up the whole fifth term of Equation (2.1.2), which saves bits compared to rounding up each term individually. To demonstrate the encoding we quote the complete encoding example from [18]. This example shows the encoding for two dimensions with $\tilde{\delta} = (5, 5)^T$, $k = 9$ and the following input:

The offsets are 200 for the first and second coordinate and the domain borders are $\tilde{v} = (503, 477, 294)^T$.

[Figure 1: encoding example. Each entry of the arc block has the fields idx, δ1, δ2 and remaining dimension.]

Figure 1: This figure shows a concrete encoding example. The first block basically contains the information needed to process the following blocks. It is followed by the list of the template arcs. This can be compared to a dictionary or codebook of traditional compression methods. The block at the bottom contains the actual tree information, i.e. a list of arcs encoded by an index to one of the template arcs, the respective correction vectors, and finally the values of the dimensions which are not considered for compression. The black dots indicate that the size of the respective (sub)blocks is not known in advance, because it depends on output values of the compression algorithm (like the number of template arcs $m$). Caption and image source [18].
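The remark that rounding up a whole term saves bits compared to rounding up each part individually can be illustrated with a small calculation. The sketch below is hypothetical: it assumes, purely for illustration, that each arc costs $\operatorname{ld} m$ bits for the template index plus $\operatorname{ld} \tilde{\delta}_l$ bits per correction vector component, and it ignores the remaining dimensions. It is not a reconstruction of Equation (2.1.2).

```python
import math


def bits_rounded_once(num_arcs, m, delta_size):
    """Sum the fractional bit costs of all arcs and round up once at the end."""
    per_arc = math.log2(m) + sum(math.log2(d) for d in delta_size)
    return math.ceil(num_arcs * per_arc)


def bits_rounded_per_field(num_arcs, m, delta_size):
    """Round every field of every arc up to a whole number of bits."""
    per_arc = math.ceil(math.log2(m)) + sum(
        math.ceil(math.log2(d)) for d in delta_size
    )
    return num_arcs * per_arc
```

With 8 arcs, an assumed codebook of m = 5 template arcs and a correction vector domain of size (5, 5), rounding once gives 56 bits, whereas rounding each field gives 72 bits.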

2.1.1 The Set of Candidate Template Arcs $T^c$

The set of candidate template arcs $T^c$ is now described in more detail, since its construction will play an important role in this thesis. Each arc $(i, j) \in A$ represents the geometric information of a $d$-dimensional vector $a$, where $a_l$ denotes the coordinate of dimension $l$ of that vector and $l = 1, \ldots, d$.

Let $A' \subseteq A$, $A' \neq \emptyset$, be some subset of arcs from $A$. A set $A'$ is dominated by a set $A''$ if $A' \subset A''$. We want to determine all different non-dominated subsets of $A$ that can be represented together by a common template arc $t \in D$. A template arc $t$ can represent all arcs within the domain of the predefined correction vector $D' \subseteq D$ (see Figure 2). Recall that we use a modulo structure instead of subtraction, so each arc can be represented by adding $t$ to the respective correction vector $\tilde{\delta}$. Thus, each $t$ can represent the following subspace $D(t) \subseteq D$:

\[
D(t) = \{t_1, \ldots, (t_1 + \tilde{\delta}_1 - 1) \bmod \tilde{v}_1\} \times \ldots \times \{t_d, \ldots, (t_d + \tilde{\delta}_d - 1) \bmod \tilde{v}_d\}. \tag{2.1.4}
\]

The set $A' \subset A$ may be representable by several different template arcs $t$, one of which is chosen as the so-called standard template arc $\tau(A')$ by selecting the smallest coordinate $a_l$ from $E \subset A'$ in each dimension $l$, where $E$ comprises only arcs in $A'$ that are reachable without crossing the domain border.
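The membership test implied by Equation (2.1.4) and the choice of the standard template arc can be sketched as follows. The function names are assumptions, and `standard_template_no_wrap` deliberately covers only the simple case in which no arc has to be reached across the domain border; it is not the NIS algorithm described in the thesis.

```python
def represents(t, a, delta_size, border):
    """Return True if arc a lies in D(t) as defined in Equation (2.1.4).

    In every dimension the cyclic offset (a_l - t_l) mod v_l must be smaller
    than the correction vector domain size delta_l.
    """
    return all(
        (al - tl) % b < dl
        for tl, al, dl, b in zip(t, a, delta_size, border)
    )


def standard_template_no_wrap(arcs, delta_size, border):
    """Standard template arc for a set of arcs, ignoring border crossings.

    Takes the smallest coordinate in each dimension and returns it if it
    represents every arc of the set; otherwise returns None.
    """
    candidate = tuple(min(column) for column in zip(*arcs))
    if all(represents(candidate, a, delta_size, border) for a in arcs):
        return candidate
    return None
```

For instance, with `border = (10, 10)` and `delta_size = (5, 5)`, the template arc `(8, 2)` represents the arc `(1, 4)` because the first coordinate wraps around the border, while the arcs `(3, 4)` and `(5, 2)` share the standard template arc `(3, 2)`.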

Figure 2: Each point within the light green areas anchored at $a_1$ and $a_2$ can represent $a_1$ and $a_2$, respectively. Each point within the dark green area can represent both $a_1$ and $a_2$ together. We designate the point with the greatest dimension values within the dark green area as $\tau(a_1, a_2)$ (the red dot in the upper right corner of the dark green area). That point is also the one whose dimension values coincide with the smallest dimension values from the set of the points it represents ($a_1$ and $a_2$ in this case). This figure does not show the special case of crossing the domain border.

Figure 2 shows how the position of the standard template arc is chosen. Although all template arcs in the dark green area can represent both arcs, the standard template arc is set in the upper right corner. A standard template arc $\tau(A')$ is dominated by another standard template arc $\tau(A'')$ if $A' \subset A''$. The number of these non-dominated template arcs can become very high. An upper bound on the number of non-dominated template arcs $|T^c|$ is given by $O(|A|^d)$, as illustrated in [18]. Figure 2.1.1 demonstrates the construction of this upper bound. Bold dots represent the vector set $A$, small dots the non-dominated standard template arcs $T^c$. Obviously, $|T^c| = (|A|/4 + 1)^2 = \Theta(|A|^2)$ [18].