
$$\vec{v}_i = \omega\,\vec{v}_i + \varphi_1 \cdot \mathrm{rand}() \cdot (\vec{b}_i - \vec{x}_i) + \varphi_2 \cdot \mathrm{rand}() \cdot (\vec{g} - \vec{x}_i) \qquad (3.1)$$

The new inertia factor ω decays the memory of the particle's previous direction each iteration. By setting ω = 0.7298 and φ1 = φ2 = 1.49618 the algorithm becomes identical to the Canonical PSO.

In this paper the Canonical PSO is implemented using the update rule with inertia, as this variant is widely used and the author considers its parameters more intuitive than the variant using the constriction coefficient.
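A minimal sketch of this update rule with inertia is given below, assuming a NumPy-based implementation. The function name pso_step, the toy sphere objective, the swarm size and the search bounds are illustrative assumptions and not taken from the thesis; only the parameter values ω = 0.7298 and φ1 = φ2 = 1.49618 follow the text above.

```python
# Minimal sketch of the PSO update with inertia (Equation 3.1).
# pbest holds each particle's best-known position b_i, gbest the swarm's
# best-known position g; rand() is drawn per particle and per dimension.
import numpy as np

def pso_step(x, v, pbest, gbest, omega=0.7298, phi1=1.49618, phi2=1.49618):
    r1 = np.random.rand(*x.shape)
    r2 = np.random.rand(*x.shape)
    v = omega * v + phi1 * r1 * (pbest - x) + phi2 * r2 * (gbest - x)
    return x + v, v

# Illustrative usage on a toy sphere function (not from the thesis).
f = lambda pop: np.sum(pop ** 2, axis=1)
x = np.random.uniform(-5.0, 5.0, size=(20, 2))
v = np.zeros_like(x)
pbest = x.copy()
gbest = pbest[np.argmin(f(pbest))]
for _ in range(100):
    x, v = pso_step(x, v, pbest, gbest)
    improved = f(x) < f(pbest)
    pbest[improved] = x[improved]
    gbest = pbest[np.argmin(f(pbest))]
```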

3.3 Advances in Evolution Strategies

As mentioned in Section 2.3, managing the strategy parameters as part of the chromosome is a common method for automatically controlling the search behaviour of the ES. Its drawback is its limited scalability. In the following, a different approach for adapting the strategy parameters, the so-called Covariance Matrix Adaptation, is introduced. It is expected that this parameter adaptation scheme leads to faster convergence of the algorithm, requiring fewer evaluations of the costly objective function.

Covariance Matrix Adaptation (CMA)

In [20] Hansen and Ostermeier introduced the Covariance Matrix Adaptation (CMA) strategy for adapting the strategy parameters of an ES. Instead of determining the strategy parameters of the ES indirectly by exposing them to selection, CMA computes them directly from the previously selected mutation steps. The underlying idea of CMA is that mutation steps that were successful in the previous generation will likely be successful in the next generation as well. The strategy parameters are therefore adapted such that previously successful mutation steps become more likely to be produced again.

To explain the concept of the update scheme a special version of the ES has to be introduced, namely the (µ_w, λ)-ES. This ES variant is similar to the (µ, λ)-ES with the exception that a weighted recombination is used. Weighted recombination computes a weighted-average chromosome x_w[j+1] from the individuals x_i[j] of the current population of size N and a set of associated weights w_i, 1 ≤ i ≤ N, as given in Equation 3.2.

$$\vec{x}_w[j+1] = \sum_{i=1}^{N} w_i\,\vec{x}_i[j], \qquad \sum_{i=1}^{N} w_i = 1 \qquad (3.2)$$

The computed weighted-average chromosome then acts as the center for the mutation step in iteration j+1. The newly generated individuals are seeded around this center according to the updated normal distribution of iteration j+1.
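The weighted recombination step itself is small; the following sketch shows it under the assumption of NumPy arrays. The concrete weights w_i are not specified here, so uniform weights (the arithmetic mean) are used as an illustrative default.

```python
# Minimal sketch of weighted recombination (Equation 3.2): the new
# distribution center x_w[j+1] is a weighted average of the current
# population.  The concrete weights are an assumption; uniform weights
# (i.e. the arithmetic mean) are used as the illustrative default.
import numpy as np

def weighted_recombination(population, weights=None):
    population = np.asarray(population)                # shape (N, dim)
    if weights is None:
        weights = np.full(len(population), 1.0 / len(population))
    weights = np.asarray(weights) / np.sum(weights)    # enforce sum(w_i) = 1
    return weights @ population                        # x_w[j+1] = sum_i w_i * x_i[j]
```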

In the following, the general idea of the update is explained, but not the exact algorithm, which can be found in [20]; it is rather lengthy to describe without a proper introduction of its basic concepts. The algorithm for the CMA-ES variant that is actually used in the experiments is given in Section 4.2.

1. Let C[j] be the covariance matrix of the used normal distribution with zero mean in iteration j. Then C[j] can be described by its eigenvalues and eigenvectors. Let B[j] be a matrix with the normalized eigenvectors of C[j] as columns, and let D[j] be a diagonal matrix containing the square roots of the corresponding eigenvalues.

Further, each individual i can be expressed as a tuple (x_w[j], z_i[j]), where x_w[j] is the distribution center of iteration j and z_i[j] ∼ N(0, I) is a standard normally distributed vector. The actual position x_i[j] of individual i can then be expressed as below.

$$\vec{x}_i[j] = \vec{x}_w[j] + \sigma\,\underbrace{B[j]\,D[j]\,\vec{z}_i[j]}_{\sim\,\mathcal{N}(0,\,C[j])} \qquad (3.3)$$

2. Calculate the weighted average z_w of the normal distribution realizations z_i[j] of the µ best individuals in iteration j. In this paper the implementation uses the arithmetic mean instead of a weighted average.

3. Calculate C[j+1] as below, where c_cov is a constant weighting the influence of the previous iteration's normal distribution.

$$C[j+1] = (1 - c_{\mathrm{cov}})\,C[j] + c_{\mathrm{cov}}\,\bigl(B[j]\,D[j]\,\vec{z}_w\bigr)\bigl(B[j]\,D[j]\,\vec{z}_w\bigr)^{T} \qquad (3.4)$$

The first component (1 − c_cov) C[j] can be understood as a contraction of the axes of highest variance of C[j], i.e. its eigenvectors and eigenvalues. The second component rotates and stretches C[j+1] towards x_w[j+1]. A minimal code sketch of these three steps follows below.
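The three steps above can be summarized in the following sketch, assuming NumPy. The values of λ, µ, σ and c_cov are illustrative, the arithmetic mean is used as described in step 2, and the evolution-path and step-size mechanisms introduced next are omitted.

```python
# Minimal sketch of one generation of the basic CMA update (steps 1-3):
# sample offspring via the eigendecomposition of C[j] (Equation 3.3),
# average the z-vectors of the mu best offspring (step 2, arithmetic mean),
# and apply the rank-one covariance update of Equation 3.4.
import numpy as np

def cma_generation(f, x_w, C, sigma=0.5, lam=10, mu=5, c_cov=0.1):
    dim = len(x_w)
    eigvals, B = np.linalg.eigh(C)                  # columns of B: eigenvectors of C[j]
    D = np.diag(np.sqrt(np.maximum(eigvals, 0.0)))  # D: square roots of eigenvalues
    z = np.random.randn(lam, dim)                   # z_i ~ N(0, I)
    y = z @ (B @ D).T                               # B D z_i ~ N(0, C[j])
    x = x_w + sigma * y                             # Equation 3.3
    order = np.argsort([f(xi) for xi in x])         # best (lowest cost) first
    z_w = z[order[:mu]].mean(axis=0)                # step 2: arithmetic mean
    bdz_w = B @ D @ z_w
    C_new = (1 - c_cov) * C + c_cov * np.outer(bdz_w, bdz_w)  # Equation 3.4
    x_w_new = x_w + sigma * bdz_w                   # new distribution center
    return x_w_new, C_new

# Illustrative usage on a toy sphere function:
# x_w, C = np.zeros(2), np.eye(2)
# for _ in range(50):
#     x_w, C = cma_generation(lambda x: np.sum(x ** 2), x_w, C)
```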

The idea of using the best individuals' mutation steps for adapting the covariance matrix of the mutation distribution is extended further by using so-called evolution paths. An evolution path accumulates the information of successful mutation realizations over several generations. The accumulated path p_c[j+1] thereby contains directional information from several generations, with c_c controlling the memorization time of the directional information. It can be seen that c_c acts as an inverse inertia factor. Further, a second evolution path p_σ is used for controlling the adaptation speed of the mutation's step size σ independently of the adaptation speed of the mutation's direction. The update for σ can be found in [20, p. 18].
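A minimal sketch of the evolution-path idea is shown below, under the assumption of a simple exponential-decay accumulation; the exact normalization constants used in [20] are omitted.

```python
# Minimal sketch of evolution-path accumulation: an exponentially decaying
# memory of the selected mutation directions B D z_w.  A small c_c means a
# long memory (c_c acts as an inverse inertia factor); the normalization
# constant used in [20] is omitted for brevity.
def update_evolution_path(p_c, bdz_w, c_c=0.2):
    return (1.0 - c_c) * p_c + c_c * bdz_w
```

The accumulated path p_c then takes the place of the single-generation step B[j] D[j] z_w in the rank-one update, as seen in Equation 3.6 below.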

The experiments in [20] showed that CMA-ES is a promising metaheuristic outperforming ES with mutative control of strategy parameters and other ES variants on a set of test functions.

Active Covariance Matrix Adaptation

Although CMA-ES showed good performance compared to other ES techniques, it is noted in [20] that the adaptation speed of the covariance matrix is somewhat slow: CMA-ES is expected to require several hundred generations to fully adapt its mutation distribution. For the problem at hand this is deemed infeasible due to the particularly high cost of function evaluations.

Therefore a variant of the CMA-ES is implemented for this thesis. This variant³, called Active Covariance Matrix Adaptation ES (ACMA-ES), was proposed by Jastrebski and Arnold in [24] and targets the particular problem of slow adaptation of the covariance matrix. Where CMA-ES uses only the information of successful mutation steps, Active CMA-ES additionally uses information about unsuccessful mutation steps. Therefore Equation 3.4 is modified to Equation 3.6 by using the evolution path improvement and by adding a new term that incorporates information about the current generation's successful and unsuccessful mutations.

$$C[j+1] = (1 - c_{\mathrm{cov}})\,C[j] + c_{\mathrm{cov}}\,\vec{p}_c\,\vec{p}_c^{\,T} + \beta\,Z[j+1] \qquad (3.6)$$

In Equation 3.7, π denotes the permutation ordering the individuals by decreasing fitness values. Z is thereby a scaled and rotated covariance matrix which actively reduces the variance in the direction of the µ worst individuals of the current generation. The factor β controls the influence of Z on the covariance matrix update.
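Since Equation 3.7 is not reproduced here, the following sketch only assumes the form described in the prose: Z[j+1] contrasts the outer products of the µ best mutation steps with those of the µ worst, so that variance is actively reduced along the unsuccessful directions. The values of c_cov and β are illustrative.

```python
# Minimal sketch of the active covariance matrix update (Equation 3.6),
# assuming Z[j+1] is built from the outer products of the mu best mutation
# steps minus those of the mu worst; c_cov and beta are illustrative values.
import numpy as np

def active_cma_update(C, p_c, y_sorted, mu, c_cov=0.1, beta=0.05):
    """y_sorted: mutation steps B D z_i, ordered from best to worst fitness."""
    best, worst = y_sorted[:mu], y_sorted[-mu:]
    Z = (sum(np.outer(y, y) for y in best)
         - sum(np.outer(y, y) for y in worst)) / mu
    return (1.0 - c_cov) * C + c_cov * np.outer(p_c, p_c) + beta * Z
```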

The experiments in [24] show that ACMA-ES reaches predefined stop values on a set of test functions significantly faster than CMA-ES and the Hybrid-CMA-ES variant. Consequently, the author considers ACMA-ES an ES variant particularly suited for optimizing functions with high evaluation costs.

³ More specifically, the algorithm is a variant of the Hybrid-CMA-ES described in [19], which was not accessible to the author.
