Metaheuristics and Optimization

Optimization deals with minimizing or maximizing some objective function (also called a fitness function). This thesis deals with continuous parameter optimization problems, with objective functions $f$ of the form $f : \mathbb{R}^d \to \mathbb{R}$. All optimization problems in this thesis are treated as maximization problems, as stated in Definition 1; see Section 4.2 for details.

[Figure 2.6: A plot of the New European Driving Cycle.]

Definition 1. A continuous parameter maximization problem for some objective function $f : X \to \mathbb{R}$, with $X \subseteq \mathbb{R}^n$, is defined as follows. Find some global optimum $\vec{x} \in X$ such that

$\forall \vec{s} \in X : f(\vec{x}) \geq f(\vec{s})$ holds.
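For a concrete (illustrative, not taken from the thesis) instance of Definition 1: the negated sphere function $f(\vec{x}) = -\lVert \vec{x} \rVert^2$ on $X = \mathbb{R}^n$ satisfies the condition with $\vec{x} = \vec{0}$, as the following Python snippet spells out.

def sphere_max(x):
    # Hypothetical example objective: negated sphere function. Its unique
    # global optimum is the zero vector, where it attains the maximum 0.
    return -sum(xi * xi for xi in x)

assert sphere_max([0.0, 0.0]) >= sphere_max([1.0, -2.0])  # 0 >= -5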

As mentioned in Chapter 1, the optimization process uses a black-box simulation system to implement its objective function. Consequently, mathematical techniques for solving systems of equations, as well as gradient descent methods, are not applicable. The author therefore relies on metaheuristics, which are able to operate on black-box objective functions.

The main characteristic of any heuristic is that the method is not guaranteed to find a global optimum of its search space. Hence, its performance on a specific problem can only be evaluated empirically. Heuristics are usually problem-specific, i.e. a heuristic designed for one problem may be useless for another. Metaheuristics provide more abstract concepts than conventional heuristics, as they define frameworks which in turn use problem-specific heuristics to create new solutions or to modify existing ones. A benefit of using metaheuristics is the possibility to analyse their properties in a generic way, in order to understand characteristics of the metaheuristic itself, e.g. the schema theorem for genetic algorithms or the influence of metaheuristic-specific parameters on the search behaviour. The author therefore relies on metaheuristics which have been (empirically) proven effective on other optimization problems. The basics of the metaheuristics used are explained below.

Genetic Algorithms (GA)

Genetic algorithms are a popular family of metaheuristics. Their general idea was introduced by J. H. Holland in 1975 [22]; further information can be found in [4]. Genetic algorithms are inspired by the concepts of evolution in biology.

In biology, evolution is the process by which specific species/individuals came to fill their current biological niche. According to evolution, existing species/individuals exist because their ancestors possessed specific attributes that allowed them to perform better, in a reproductive sense, than others in their current environment; this principle is also called natural selection. Further, it is assumed that the descendants share at least some of these attributes with their ancestors, making them a similarly good “fit” to their current niche.

Genetic algorithms attempt to adapt these principles to the field of optimization. They belong to the class of population-based heuristics, as they deal with a set of possible solutions at a time. Each solution is treated as an individual in a population and is encoded in a “chromosome”.

For each chromosome a fitness value can be calculated by means of the objective/fitness function of the optimization problem. After an initial population is created, a genetic algorithm typically performs the following steps until some halting condition is fulfilled, e.g. a maximum number of iterations or convergence of the population; a minimal code sketch of the resulting loop is given after the list.

1. Selection — In the selection step, individuals are selected which are allowed to pass parts of their chromosome to the next generation. For genetic algorithms to work, it is important that the selection is based on the fitness values of the individuals. Often-used selection methods are roulette wheel selection, where each individual is selected with a probability proportional to its fitness, and tournament selection, where for each individual to be selected two or more are randomly chosen and the best of these is then selected.

2. Recombination — The individuals selected in the previous step are then recombined into new individuals. This step depends strongly on the encoding used for the chromosomes. Usually the attributes (genes) of two individuals are intermingled to produce a new individual.

3. Mutation — Commonly the recombination step reduces the variance in the attributes of the new population. This is due to using the information of a small set of selected ancestors to create a larger set of descendants, thereby losing the information from the individuals not selected for recombination. As this behaviour would lead to fast convergence, i.e. all individuals becoming identical, a small percentage of each descendant’s genes is mutated to introduce more variance into the new population. Again, this step depends on the encoding scheme used for the chromosomes.
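As a minimal illustration of how the three steps interact (the binary one-max objective, tournament selection, one-point crossover, and all parameter values below are assumptions of this sketch, not choices made in this thesis), one generation loop in Python could look as follows.

import random

def tournament(pop, fitness, k=2):
    # Tournament selection: pick k random individuals, keep the fittest.
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    # One-point crossover: splice the two parents at a random cut point.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(chrom, rate=0.01):
    # Bit-flip mutation: flip each gene with a small probability.
    return [1 - g if random.random() < rate else g for g in chrom]

def ga(fitness, n_genes=20, pop_size=30, generations=100):
    pop = [[random.randint(0, 1) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(generations):  # halting condition: fixed iteration budget
        pop = [mutate(crossover(tournament(pop, fitness),
                                tournament(pop, fitness)))
               for _ in range(pop_size)]
    return max(pop, key=fitness)

# Toy usage: maximize the number of ones in the chromosome ("one-max").
print(ga(fitness=sum))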

GAs usually target combinatorial optimization problems rather than continuous optimization problems, as some of the typically applied encoding/recombination/mutation concepts do not apply well to continuous variables. Although binary encodings like Gray coding can be used for implementing continuous variables, a different encoding is used in this thesis, as described in Section 4.2.
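To make the Gray-coding remark concrete, a small illustrative sketch in Python (the value range and bit width are arbitrary assumptions): consecutive quantization levels differ in exactly one bit under Gray coding, so a single bit-flip mutation often corresponds to a small step in value space, which plain binary encoding does not guarantee.

def gray_encode(n):
    # Binary-reflected Gray code of a non-negative integer.
    return n ^ (n >> 1)

def gray_decode(g):
    # Invert the Gray code by cascading XOR over all shifted copies.
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def to_gray(value, lo=-5.0, hi=5.0, bits=16):
    # Quantize a continuous value in [lo, hi] onto 2**bits levels, then encode.
    level = round((value - lo) / (hi - lo) * (2**bits - 1))
    return gray_encode(level)

def from_gray(g, lo=-5.0, hi=5.0, bits=16):
    return lo + gray_decode(g) / (2**bits - 1) * (hi - lo)

print(round(from_gray(to_gray(1.25)), 3))  # recovers approximately 1.25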

Nevertheless, due to modifications to the search space applied in this thesis (see Section 4.2), it is possible to apply GAs to discrete variables, at least for some phases of the optimization process.

Evolution Strategies (ES)

Evolution strategies (ES) were introduced by Schwefel in [42] and are similar to genetic algorithms, with a few but nevertheless important differences. Whereas genetic algorithms are primarily used in combinatorial optimization, evolution strategies have been developed with continuous optimization in mind. An overview can be found in [5] and [6].

In ES, like in genetic algorithms, solutions are encoded as chromosomes. Unlike in genetic algorithms, however, a solution is encoded as a real-valued vector of the solution’s parameters. Further, the main operation in evolution strategies is not recombination but mutation.

Mutation is usually implemented as the addition of a normally distributed vector to an individual’s chromosome. The recombination step is often omitted.

For selection, a dedicated notation has been introduced. It distinguishes between two types of evolution strategies.

1. (µ+λ)-ES have a population size of µ and produce λ descendants per generation. From this pool of µ+λ individuals the best µ individuals are selected. This scheme introduces a concept called elitism to the ES, where an individual may survive several generations instead of one.

2. (µ, λ)-ES have a population size of µ and produce λ descendants per generation. The best µ descendants then form the new population.
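The two schemes differ only in the selection pool, which the following Python sketch makes explicit (the fixed mutation step size sigma and the uniform sampling of parents are simplifying assumptions, not the setup used in this thesis).

import random

def es_step(pop, fitness, mu, lam, sigma=0.1, plus=True):
    # One ES generation: Gaussian mutation of randomly chosen parents,
    # then (mu+lambda) or (mu,lambda) selection of the mu best.
    offspring = [[x + random.gauss(0.0, sigma) for x in random.choice(pop)]
                 for _ in range(lam)]
    pool = pop + offspring if plus else offspring  # elitism only in "+" scheme
    return sorted(pool, key=fitness, reverse=True)[:mu]

# Toy usage: a (5+20)-ES maximizing the negated sphere function.
f = lambda x: -sum(xi * xi for xi in x)
pop = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(5)]
for _ in range(200):
    pop = es_step(pop, f, mu=5, lam=20)
print(f(pop[0]))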

A commonly applied modification to mutation in ES is to use a separate normal distribution with zero mean for each individual. The variances and covariances of these normal distributions, the so-called strategy parameters, are then included in the chromosomes of the individuals and are thus also subject to selection. This technique works well as long as the number of dimensions is low; for higher dimension counts the algorithm begins to suffer from the “curse of dimensionality”. Alternative approaches for controlling the ES strategy parameters are described in Section 3.3.
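A sketch of this self-adaptation, under the common simplification of uncorrelated per-dimension variances (the log-normal update and the learning rate tau of roughly 1/sqrt(2n) follow standard ES practice and are assumptions of this illustration, not the thesis’s implementation):

import math
import random

def mutate_self_adaptive(x, sigmas, tau=None):
    # Self-adaptation: mutate the strategy parameters log-normally first,
    # then use the new step sizes to perturb the object variables. Both
    # parts are inherited together, so selection acts on the sigmas too.
    tau = tau if tau is not None else 1.0 / math.sqrt(2.0 * len(x))
    new_sigmas = [s * math.exp(tau * random.gauss(0.0, 1.0)) for s in sigmas]
    new_x = [xi + si * random.gauss(0.0, 1.0)
             for xi, si in zip(x, new_sigmas)]
    return new_x, new_sigmas

# One mutation of a 3-dimensional individual with per-dimension step sizes.
print(mutate_self_adaptive([1.0, -2.0, 0.5], [0.1, 0.1, 0.1]))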

Particle Swarm Optimization (PSO)

Like GA and ES, Particle Swarm Optimization (PSO) is another metaheuristic mimicking nature, though it is not inspired by evolutionary concepts. PSO was developed by Eberhart and Kennedy in 1995 [27].

PSO tries to emulate the food search behaviour of birds, which have a tendency to cluster around rich food sources. Similar to ES, PSO is typically used for continuous optimization, although discrete variants like Binary-PSO exist. In PSO each solution is represented as a particle, a real-valued vector of the solution parameters, representing the position of the particle in the search space. Further, each particle possesses a velocity in the search space. Each particle moves through the search space, evaluating the solution at its current position and orienting itself towards its own best solution and the globally known best solution. The exact steps executed in each iteration are described below.

1. For each particle $i$, evaluate the objective function at its position $\vec{x}_i$.

2. For each particle, update its own best solution $\vec{b}_i$ and the globally known best solution $\vec{g}$.

3. For each particle, update its current velocity $\vec{v}_i$:

$\vec{v}_i = \vec{v}_i + \phi_1 \cdot \mathrm{rand}[0,1] \cdot (\vec{b}_i - \vec{x}_i) + \phi_2 \cdot \mathrm{rand}[0,1] \cdot (\vec{g} - \vec{x}_i), \qquad \phi_1 = \phi_2 = 2$ (2.1)

where $\mathrm{rand}[0,1]$ is a function providing uniform random values in the range $[0,1]$.

4. For each particle, update its current position $\vec{x}_i$:

$\vec{x}_i = \vec{x}_i + \vec{v}_i$ (2.2)

There exist several variations of this update scheme, like canonical particle swarm optimization (see Section 3.2). As for GA and ES, the steps above are executed until some halting condition is fulfilled.
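Putting equations (2.1) and (2.2) together, a minimal Python sketch of the loop could read as follows (swarm size, iteration budget, and initialization range are illustrative assumptions; this basic variant deliberately omits the inertia weight and constriction factor of later schemes, so it may oscillate on some problems).

import random

def pso(f, dim=3, n_particles=20, iters=200, lo=-5.0, hi=5.0,
        phi1=2.0, phi2=2.0):
    xs = [[random.uniform(lo, hi) for _ in range(dim)]
          for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    bs = [x[:] for x in xs]   # personal best positions b_i
    g = max(bs, key=f)[:]     # globally known best position g
    for _ in range(iters):
        for i, x in enumerate(xs):
            if f(x) > f(bs[i]):        # steps 1 and 2: evaluate, update bests
                bs[i] = x[:]
                if f(x) > f(g):
                    g = x[:]
            r1, r2 = random.random(), random.random()
            vs[i] = [v + phi1 * r1 * (b - xi) + phi2 * r2 * (gj - xi)
                     for v, b, xi, gj in zip(vs[i], bs[i], x, g)]  # eq. (2.1)
            xs[i] = [xi + v for xi, v in zip(x, vs[i])]            # eq. (2.2)
    return g

# Toy usage: maximize the negated sphere function.
print(pso(lambda x: -sum(xi * xi for xi in x)))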
