

4.1.1 GAVEO - Global Graph Alignment Via Evolutionary Optimization

4.1.1.3 Evolutionary operators

As GAVEO uses a problem-specific representation, standard evolutionary operators known from the field of evolutionary optimization do not apply. Hence, these operators have to be adapted for GAVEO. In the following, a concrete description of the operators is given.

Recombination

The recombination operator constructs a new individual from ρ parent individuals drawn at random from the current population via a uniform distribution. To select the submatrices to be combined, ρ − 1 random numbers ri with i = 1, . . . , ρ − 1 are generated such that 1 ≤ r1 < r2 < . . . < rρ−1 < m, specifying the rows to be taken from the individual parents. To obtain the new offspring, the rows {ri−1 + 1, . . . , ri} from the i-th parent individual are selected (where r0 = 0 and rρ = m by definition) and combined.

As the indices within a row are not ordered, simply concatenating the rows as they are is not reasonable: it would disrupt the reference frame of the row and thus violate the idea behind the recombination step, which is to combine (hopefully) favorable assignments from the parent individuals in the offspring.

Therefore, in merging step i, the ordering of the ri-th row is used as a pivot row, that is, as a reference to preserve assignments already present in the parents. The submatrices derived from the parent individuals as specified by the rows {ri−1 + 1, . . . , ri} are combined columnwise, by joining each subcolumn with the subcolumn of the next individual that has the same node index entry in the pivot row ri. In case the entry in row ri is a gap, which can occur multiple times, the first unused occurrence of a gap is chosen from the next individual, marking this column as “used”.

This procedure is illustrated in Fig. 4.2 for the case ρ = 3. Three individuals I1, I2, and I3 and two rows designated by the integers r1 and r2 with 1 ≤ ri ≤ m are chosen at random. All individuals are split horizontally at the rows r1 and r2. The resulting blocks are then merged into a new offspring individual. To preserve the ordering of the parent individuals, columns are rearranged according to the reference rows r1 and r2, respectively, whose indices serve as pivot elements. In the illustration, the first red subcolumn in I1 is transferred to the offspring, the index of the pivot row r1 being 2. The column is then expanded by searching for the occurrence of the index 2 in the pivot row r1 of the next individual and transferring the associated subcolumn (red) to the offspring individual. This procedure is repeated for all individuals and columns.
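The merging procedure described above can be sketched as follows. This is an illustrative implementation, not the original GAVEO code: alignments are assumed to be stored as m × k list-of-lists matrices with "-" as the gap symbol, and the cut rows r1 < . . . < rρ−1 are exposed as a parameter for clarity.

```python
import random

GAP = "-"  # gap placeholder; the symbol itself is an assumption

def recombine(parents, cuts=None, rng=random):
    """Pivot-row recombination of rho parent alignments (m x k matrices).

    Rows {r_(i-1)+1, ..., r_i} are taken from the i-th parent; before a
    block is appended, its columns are reordered so that the entries of
    its pivot row r_i match the column order already fixed in the
    offspring, preserving the parents' node assignments.
    """
    rho = len(parents)
    m, k = len(parents[0]), len(parents[0][0])
    if cuts is None:  # 1 <= r1 < ... < r_(rho-1) < m, drawn uniformly
        cuts = sorted(rng.sample(range(1, m), rho - 1))
    bounds = [0] + list(cuts) + [m]  # r0 = 0, r_rho = m

    offspring = [row[:] for row in parents[0][:bounds[1]]]
    for i in range(1, rho):
        pivot = bounds[i] - 1        # index of pivot row r_i
        ref = offspring[pivot]       # pivot row as ordered in the offspring
        nxt = parents[i]
        used = [False] * k
        order = []                   # column of nxt matching each ref entry
        for entry in ref:
            for c in range(k):
                if not used[c] and nxt[pivot][c] == entry:
                    used[c] = True   # gaps: first unused occurrence wins
                    order.append(c)
                    break
        for r in range(bounds[i], bounds[i + 1]):
            offspring.append([nxt[r][c] for c in order])
    return offspring
```

Every offspring column then carries entries that shared a column with the same pivot-row index in their respective parents, which is exactly the assignment-preservation property motivated above.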

[Figure 4.2: alignment matrices of the parent individuals I1, I2, and I3, split at the pivot rows r1 and r2, and the resulting recombined offspring]

Figure 4.2: Recombination of ρ = 3 individuals. r1 and r2 designate the pivot rows (green), where the parent individuals are split. The red subcolumns are combined in a new offspring individual, preserving the assignment of nodes from the parent individuals.

4.1 Global graph comparison

Mutation

The performance of an EA largely depends on the mutation operator, which is guided by two opposing principles (Beyer and Schwefel, 2002). On the one hand, it needs to allow for minimal changes so that the exploration of the search space is fine-grained enough to reach every point, that is, every possible alignment. Moreover, if a near-optimal solution has already been reached, this ensures that a single mutation step will not deviate much from this solution. On the other hand, larger steps are necessary to avoid premature stagnation and to allow for a more rapid exploration of the search space. This trade-off is controlled by the mutation strength.

In GAVEO, the mutation operator is realized in a relatively simple way: a single row r is selected at random, and two randomly chosen entries of this row are swapped. The mutation strength is regulated by performing this mutation step repeatedly, with the number of repetitions corresponding to the mutation strength. Fig. 4.3 illustrates the mutation operator.
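A minimal sketch of this swap mutation, assuming the same list-of-lists matrix representation as above (this is an illustration, not the original implementation):

```python
import random

def mutate(individual, strength, rng=random):
    """Swap mutation: `strength` times, pick a random row and swap two
    randomly chosen entries within it. Returns a mutated copy."""
    offspring = [row[:] for row in individual]
    m, k = len(offspring), len(offspring[0])
    for _ in range(strength):
        r = rng.randrange(m)             # row to mutate
        i, j = rng.sample(range(k), 2)   # two distinct positions
        offspring[r][i], offspring[r][j] = offspring[r][j], offspring[r][i]
    return offspring
```

Since each step only permutes entries within one row, every row remains a valid assignment of that graph's nodes and gaps, whatever the mutation strength.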

[Figure 4.3: alignment matrix of an individual before and after three entry swaps]

Figure 4.3: Mutation of an individual with a mutation strength of 3.

This way, an adjustable mutation operator is created that allows for the determination of the most successful mutation strength: the mutation strength is specified as a strategy component that can be adjusted on the fly using a self-adaptation mechanism (Beyer and Schwefel, 2002). This is necessary, since the optimal mutation strength is not known in advance.
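One common realization of such a self-adaptation step, sketched here under the assumption of a log-normal update of an integer-valued mutation strength (the learning rate tau = 0.3 and the rounding scheme are illustrative choices, not taken from GAVEO):

```python
import math
import random

def self_adapt_strength(strength, tau=0.3, rng=random):
    """Log-normal self-adaptation of an integer mutation strength.
    The strategy parameter is perturbed multiplicatively before use, so
    strengths that produced fit offspring are inherited with them."""
    s = strength * math.exp(tau * rng.gauss(0.0, 1.0))
    return max(1, round(s))  # keep the strength a positive integer
```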

Adaptation of Alignment Length

The adaptation of the alignment length, as described above, occurs at randomly chosen intervals. For a given individual, the algorithm checks with probability pcheck whether an adjustment of the alignment length is necessary. To this end, the presence of a gap column is checked. Three cases can occur:

1. Exactly one gap column is present. The alignment length does not have to be adjusted, as there are still placeholders available for every row.

2. Gap columns have accumulated, indicating that matching nodes is more favorable than introducing gaps; the surplus gap columns are therefore obsolete. The number of gap columns is reduced to one.

3. No gap column is present, indicating that gaps have been introduced in the alignment to improve alignment quality. To restore the reservoir of placeholders, a new gap column is inserted.
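The three cases can be sketched as follows, again assuming the list-of-lists matrix representation with "-" as the gap symbol; the parameter name p_check mirrors pcheck from the text:

```python
import random

GAP = "-"  # gap placeholder; the symbol itself is an assumption

def adapt_length(individual, p_check=0.1, rng=random):
    """With probability p_check, adjust the alignment length based on
    the number of all-gap columns (the three cases from the text)."""
    if rng.random() >= p_check:
        return individual
    k = len(individual[0])
    gap_cols = [c for c in range(k)
                if all(row[c] == GAP for row in individual)]
    if len(gap_cols) == 1:                        # case 1: reservoir intact
        return individual
    if len(gap_cols) > 1:                         # case 2: drop surplus gap columns
        drop = set(gap_cols[1:])
        return [[row[c] for c in range(k) if c not in drop]
                for row in individual]
    return [row + [GAP] for row in individual]    # case 3: add a fresh gap column
```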

Selection

As a selection operator, the deterministic plus-selection was chosen, thus realizing a (µ + λ) scheme known from evolution strategies (Beyer and Schwefel, 2002). In a (µ + λ) strategy, the µ individuals of the parental generation are used to create λ additional offspring individuals. During the selection process, all individuals are evaluated according to the fitness function (4.1). The population of the next generation is then created by selecting the best µ individuals according to their fitness.
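The plus-selection itself reduces to a few lines; the list-based pool and the assumption that the fitness function is to be maximized are illustrative:

```python
def plus_selection(parents, offspring, fitness, mu):
    """(mu + lambda) selection: parents and offspring compete in one
    pool; the mu fittest individuals form the next generation."""
    pool = list(parents) + list(offspring)
    pool.sort(key=fitness, reverse=True)  # assumes fitness is maximized
    return pool[:mu]
```

Because the parents stay in the pool, the best solution found so far can never be lost between generations, which is the elitism property argued for below.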

For the current problem, this is arguably the most promising strategy, given that the search space of the multiple graph alignment problem is extremely large, with a size of O((k!)^(m−1)) (k denoting the length of the alignment, which is not known a priori, and m the number of graphs). Thus, it is advisable to consider the parent individuals as well as the offspring, to ensure that the currently best solution is never lost. A (µ, λ) strategy, for example, which utilizes the comma-selection (Beyer and Schwefel, 2002) and only considers the offspring, would discard the parent generation regardless of its fitness values.
