

4.1.1 GAVEO - Global Graph Alignment Via Evolutionary Optimization

4.1.1.3 Evolutionary operators

As GAVEO uses a problem-specific representation, standard evolutionary operators known from the field of evolutionary optimization do not apply. Hence, these operators have to be adapted for GAVEO. In the following, a concrete description of the operators is given.

Recombination

The recombination operator constructs a new individual from ρ parent individuals drawn at random from the current population via a uniform distribution. To select the submatrices to be combined, ρ − 1 random numbers ri with i = 1, . . . , ρ − 1 are generated such that 1 ≤ r1 < r2 < . . . < rρ−1 < m, specifying the rows to be taken from the individual parents. To obtain the new offspring, the rows {ri−1 + 1, . . . , ri} from the i-th parent individual are selected (where r0 = 0 and rρ = m by definition) and combined.

As the indices within a row are not ordered, simply concatenating the rows as they are is not reasonable: it would disrupt the reference frame of the row and thus violate the idea behind the recombination step, which is to combine (hopefully) favorable assignments from the parent individuals in the offspring.

Therefore, in merging step i, the ordering of the ri-th row is used as a pivot row, that is, as a reference to preserve assignments already present in the parents. The submatrices derived from the parent individuals as specified by the rows {ri−1 + 1, . . . , ri} are combined columnwise, by joining each subcolumn with the subcolumn of the next individual that has the same node index entry in the pivot row ri. In case the entry in row ri is a gap, which can occur multiple times, the first unused occurrence of a gap is chosen from the next individual, marking this column as “used”.

This procedure is illustrated in Fig. 4.2 for the case ρ = 3. Three individuals I1, I2, and I3 and two rows designated by the integers r1 and r2 with 1 ≤ ri ≤ m are chosen at random. All individuals are split horizontally at the rows r1 and r2. The resulting blocks are then merged into a new offspring individual. To preserve the ordering of the parent individuals, columns are rearranged according to the reference rows r1 and r2, respectively, whose indices serve as pivot elements. In the illustration, the first red subcolumn in I1 is transferred to the offspring, the index of the pivot row r1 being 2. The column is then expanded by searching for the occurrence of the index 2 in the pivot row r1 of the next individual and transferring the associated subcolumn (red) to the offspring individual. This procedure is repeated for all individuals and columns.
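The merging procedure described above can be sketched as follows. This is an illustrative implementation, not the original GAVEO code: alignments are assumed to be stored as m × k list-of-lists matrices with "-" as the gap symbol, and the cut rows r1 < . . . < rρ−1 are exposed as a parameter for clarity.

```python
import random

GAP = "-"  # gap placeholder; the symbol itself is an assumption

def recombine(parents, cuts=None, rng=random):
    """Pivot-row recombination of rho parent alignments (m x k matrices).

    Rows {r_(i-1)+1, ..., r_i} are taken from the i-th parent; before a
    block is appended, its columns are reordered so that the entries of
    its pivot row r_i match the column order already fixed in the
    offspring, preserving the parents' node assignments.
    """
    rho = len(parents)
    m, k = len(parents[0]), len(parents[0][0])
    if cuts is None:  # 1 <= r1 < ... < r_(rho-1) < m, drawn uniformly
        cuts = sorted(rng.sample(range(1, m), rho - 1))
    bounds = [0] + list(cuts) + [m]  # r0 = 0, r_rho = m

    offspring = [row[:] for row in parents[0][:bounds[1]]]
    for i in range(1, rho):
        pivot = bounds[i] - 1        # index of pivot row r_i
        ref = offspring[pivot]       # pivot row as ordered in the offspring
        nxt = parents[i]
        used = [False] * k
        order = []                   # column of nxt matching each ref entry
        for entry in ref:
            for c in range(k):
                if not used[c] and nxt[pivot][c] == entry:
                    used[c] = True   # gaps: first unused occurrence wins
                    order.append(c)
                    break
        for r in range(bounds[i], bounds[i + 1]):
            offspring.append([nxt[r][c] for c in order])
    return offspring
```

Every offspring column then carries entries that shared a column with the same pivot-row index in their respective parents, which is exactly the assignment-preservation property motivated above.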

[Figure 4.2: alignment matrices of the parent individuals I1, I2, and I3, split at the pivot rows r1 and r2, and the resulting recombined offspring]

Figure 4.2: Recombination of ρ = 3 individuals. r1 and r2 designate the pivot rows (green), where the parent individuals are split. The red subcolumns are combined in a new offspring individual, preserving the assignment of nodes from the parent individuals.

4.1 Global graph comparison

Mutation

The performance of an EA largely depends on the mutation operator, which is guided by two opposing principles (Beyer and Schwefel, 2002). On the one hand, it needs to allow for minimal changes so that the exploration of the search space is fine-grained enough to reach every point, that is, every possible alignment. Moreover, if a near-optimal solution has already been reached, this ensures that a single mutation step will not deviate much from this solution. On the other hand, larger steps are necessary to avoid premature stagnation and to allow for a more rapid exploration of the search space. This trade-off is controlled by the mutation strength.

In GAVEO, the mutation operator is realized in a relatively simple way: a single row r is selected at random, and two randomly chosen entries of this row are swapped. The mutation strength is regulated by performing this mutation step repeatedly, with the number of repetitions corresponding to the mutation strength. Fig. 4.3 illustrates the mutation operator.
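A minimal sketch of this swap mutation, assuming the same list-of-lists matrix representation as above (this is an illustration, not the original implementation):

```python
import random

def mutate(individual, strength, rng=random):
    """Swap mutation: `strength` times, pick a random row and swap two
    randomly chosen entries within it. Returns a mutated copy."""
    offspring = [row[:] for row in individual]
    m, k = len(offspring), len(offspring[0])
    for _ in range(strength):
        r = rng.randrange(m)             # row to mutate
        i, j = rng.sample(range(k), 2)   # two distinct positions
        offspring[r][i], offspring[r][j] = offspring[r][j], offspring[r][i]
    return offspring
```

Since each step only permutes entries within one row, every row remains a valid assignment of that graph's nodes and gaps, whatever the mutation strength.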

[Figure 4.3: alignment matrix of an individual before and after three entry swaps]

Figure 4.3: Mutation of an individual with a mutation strength of 3.

This way, an adjustable mutation operator is created that allows for the determination of the most successful mutation strength: the mutation strength is specified as a strategy component that can be adjusted on the fly using a self-adaptation mechanism (Beyer and Schwefel, 2002). This is necessary, since the optimal mutation strength is not known in advance.
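One common realization of such a self-adaptation step, sketched here under the assumption of a log-normal update of an integer-valued mutation strength (the learning rate tau = 0.3 and the rounding scheme are illustrative choices, not taken from GAVEO):

```python
import math
import random

def self_adapt_strength(strength, tau=0.3, rng=random):
    """Log-normal self-adaptation of an integer mutation strength.
    The strategy parameter is perturbed multiplicatively before use, so
    strengths that produced fit offspring are inherited with them."""
    s = strength * math.exp(tau * rng.gauss(0.0, 1.0))
    return max(1, round(s))  # keep the strength a positive integer
```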

Adaptation of Alignment Length

The adaptation of the alignment length, as described above, occurs at randomly chosen intervals. For a given individual, the algorithm checks with probability pcheck whether an adjustment of the alignment length is necessary. To this end, the presence of a gap column is checked. Three cases can occur:

1. Exactly one gap column is present. The alignment length does not have to be adjusted, as there are still placeholders available for every row.

2. Gap columns have accumulated, indicating that matching nodes is more favorable than introducing gaps; the surplus gap columns are therefore obsolete. The number of gap columns is reduced to one.

3. No gap column is present, indicating that gaps have been introduced in the alignment to improve alignment quality. To restore the reservoir of placeholders, a new gap column is inserted.
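The three cases can be sketched as follows, again assuming the list-of-lists matrix representation with "-" as the gap symbol; the parameter name p_check mirrors pcheck from the text:

```python
import random

GAP = "-"  # gap placeholder; the symbol itself is an assumption

def adapt_length(individual, p_check=0.1, rng=random):
    """With probability p_check, adjust the alignment length based on
    the number of all-gap columns (the three cases from the text)."""
    if rng.random() >= p_check:
        return individual
    k = len(individual[0])
    gap_cols = [c for c in range(k)
                if all(row[c] == GAP for row in individual)]
    if len(gap_cols) == 1:                        # case 1: reservoir intact
        return individual
    if len(gap_cols) > 1:                         # case 2: drop surplus gap columns
        drop = set(gap_cols[1:])
        return [[row[c] for c in range(k) if c not in drop]
                for row in individual]
    return [row + [GAP] for row in individual]    # case 3: add a fresh gap column
```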

Selection

As a selection operator, the deterministic plus-selection was chosen, thus realizing a (µ + λ) scheme known from evolution strategies (Beyer and Schwefel, 2002). In a (µ + λ) strategy, the µ individuals of the parental generation are used to create λ additional offspring individuals. During the selection process, all individuals are evaluated according to the fitness function (4.1). The population of the next generation is then created by selecting the best µ individuals according to their fitness.
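The plus-selection itself reduces to a few lines; the list-based pool and the assumption that the fitness function is to be maximized are illustrative:

```python
def plus_selection(parents, offspring, fitness, mu):
    """(mu + lambda) selection: parents and offspring compete in one
    pool; the mu fittest individuals form the next generation."""
    pool = list(parents) + list(offspring)
    pool.sort(key=fitness, reverse=True)  # assumes fitness is maximized
    return pool[:mu]
```

Because the parents stay in the pool, the best solution found so far can never be lost between generations, which is the elitism property argued for below.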

For the current problem, this is arguably the most promising strategy, given that the search space of the multiple graph alignment problem is extremely large, with a size of O((k!)^(m−1)) (k denoting the length of the alignment, which is not known a priori, and m the number of graphs). Thus, it is advisable to consider the parent individuals as well as the offspring, to ensure that the currently best solution is never lost. A (µ, λ) strategy, for example, which utilizes the comma-selection (Beyer and Schwefel, 2002) and only considers the offspring, would discard the parent generation regardless of its fitness values.
