
Global Optimization using Simulated Annealing

In the document Causal Variational Principles (pages 50-54)

We now solve the action principle on the sphere numerically. Using spherical coordinates, each vector in S² can be written as

v(ϑ, ϕ) := (sin(ϑ) cos(ϕ), sin(ϑ) sin(ϕ), cos(ϑ))ᵀ,   (5.3.15)

where ϑ ∈ [0, π] and ϕ ∈ [0, 2π). In the numerical approach, we allow both angles to take arbitrary values, losing uniqueness but gaining an unconstrained minimization problem on R²ᵐ. According to (5.1.2) and the symmetry of the Lagrangians, instead of minimizing S we can restrict to minimizing

Ŝ = (1/m²) ∑_{i<j} L(xᵢ, xⱼ).

The need for a global optimization routine can be illustrated by considering the dependence of the target function on the variables. This is done in Figure 5.3 by taking the spherical code X20 that solves the Tammes problem and plotting, for different values of τ, the action S considered as a function of only ϑ₁, where x₁ = v(ϑ₁, ϕ₁). Many local minima appear and reveal that attempts using a local minimization routine may not yield satisfying results. The plots in Figure 5.3 also show that the structural behavior of the action changes. In the case τ = 1, the target function is smooth and there exists only one global minimum for each variable. In


this case, the matrices Fx are rank-one matrices, and this observation justifies the use of a local routine to solve the problems in Chapter 3. If τ > 1, the target function becomes non-smooth with many local minima. For τ close to one, there exist only a few local minima, which are not very pronounced, so that even a local minimization routine may find satisfying solutions. For τ ≫ 1, the situation again becomes simpler, as there is only one pronounced minimal value. In the remaining interval, the target function is non-smooth with many points of discontinuity and many local minima.

In order to solve the minimization problem, we need a global optimization method which does not require differentiability and which allows steps toward higher function values in order to escape a local minimum and reach a branch leading to the global minimum. A common routine is the method of simulated annealing (see [6] and the references therein), a probabilistic metaheuristic algorithm modeled on annealing in metallurgy. Heating a material gives the atoms the freedom to move and distribute randomly. When the material is slowly cooled down again, the atoms arrange themselves in a ground state of minimal energy. In the process, the atoms escape a locally minimal energy state by briefly accepting a higher energy state. The simulated annealing algorithm adopts this process to find the global minimum of a function f : D ⊂ Rⁿ → R. The basic steps of the algorithm, starting at the vector x ∈ D, are

a) local change: choose a vector y close to x,

b) selection: if f(y) ≤ f(x), then x ← y; else x ← y with probability e^(−(f(y)−f(x))/T).

A vector is accepted despite a higher function value with a probability determined by the temperature T. To reach a minimum, the temperature decreases and converges to 0; thus a vector with a higher function value is accepted with ever smaller probability as the process goes on. The local change of the vector is realized by a mapping U : D → D, which maps x ∈ D to a vector y close to x. For example, one can perturb one randomly chosen entry of x, thus y_k = x_k + r for random numbers k, r, and y_i = x_i otherwise. To decrease the temperature during the run, one needs a cooling schedule C : R⁺ → R⁺, a monotonically decreasing function with lim_{n→∞} Cⁿ(T) = 0; we use the geometric cooling scheme C(T) = aT for a ∈ (0, 1).
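The perturbation map U and the geometric cooling schedule described above can be sketched as follows (a minimal Python sketch; the names perturb and cool and the step size 0.1 are illustrative choices, not part of the text):

```python
import random

def perturb(x, step=0.1):
    """Local change U: perturb one randomly chosen entry of x,
    i.e. y_k = x_k + r for a random index k and random number r."""
    y = list(x)
    k = random.randrange(len(y))
    y[k] += random.uniform(-step, step)
    return y

def cool(T, a=0.8):
    """Geometric cooling schedule C(T) = a*T with a in (0, 1),
    so that C^n(T) -> 0 as n -> infinity."""
    return a * T
```

Iterating cool drives the temperature to zero, while perturb changes at most one coordinate per call.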

The process stops if either the temperature becomes too small or a loop runs too often without changing the solution vector, which thus stays in a local minimum. The algorithm uses the following parameters:


c counts the number of passes in which the solution vector stays unchanged. If a vector with a lower function value is found, the value of c is reset to 0. If c > c_max, the algorithm stops.

i counts the number of iterations done at one fixed temperature. If i gets too large, the temperature is decreased and i is reset to 1.

s counts how often the solution vector changes at a fixed temperature. If s gets too large, the temperature is decreased and s is reset to 1.

rand is a random number in [0, 1] and determines whether the new vector is accepted despite a higher function value. This number is regenerated in each step.

τ: the new vector y is accepted despite a higher function value if this value differs from the old function value by less than τ.

This leads to the following algorithm.

Algorithm 5.2 (simulated annealing).
Start: x ∈ D, i = 1, c = 0, s = 1, T > 0
while c < c_max and T > T_min do
    i ← i + 1
    if i > i_max or s > s_max then T = C(T), i = 1, s = 1 end if
    y = U(x)
    if f(x) − f(y) > τ then x ← y, s ← s + 1, c ← 0
    else if rand < exp((f(x) − f(y))/T) then x ← y, c ← 0
    else c ← c + 1
    end if
end while

The choice of the initial and the stopping temperature has to be done carefully.

The initial temperature determines the acceptance of vectors yielding a higher function value. If the initial temperature is too low, the process falls into a local minimum, but if it is chosen too high, all vectors are accepted. A low stopping temperature yields a lower final value but causes longer CPU time.

We use the general simulated annealing algorithm from [30]. For a discussion of the algorithm, a value of τ is adequate for which all distinct points can be separated spacelike, so that we already know the minimal action Ŝ = 0. We take m = 20 and τ = 2.5 > τ₂₀. Starting with a random spherical code X with Ŝ(X) = 6.167855


Figure 5.4: The development of the action Ŝ in the simulated annealing algorithm.

and using the structural parameters InitTemp = 1, MaxConsRej = 1000, StopTemp = 1.0e−16, MaxSuccess = 20, CoolSched: T = 0.8T, MaxTries = 300, the algorithm stops after 15.902203 seconds with Ŝ = 0.241036. The progress of the function is shown in Figure 5.4.

This result is not deeply satisfying. Moreover, since simulated annealing yields only randomly good results, the same routine with the same starting vector will yield different results, which may even be higher. To counteract this probabilistic behavior, it is promising to repeat the algorithm while adjusting the parameters.

Since the most crucial parameter is the temperature, it seems reasonable to start with a high temperature and scale it down in each step, slowly freezing the system into the global minimum, see Appendix C.

Algorithm 5.3 (annealing loop).
Start: x ∈ D, T = f(x), b ∈ (0, 1)
while T > T_min do
    y = anneal(f, x) with initial temperature T
    T ← bT
    if f(x) > f(y) then x ← y end if
end while
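The annealing loop can be sketched as a thin wrapper around any annealing routine; here anneal(f, x, T) stands for one run of simulated annealing started at x with initial temperature T (a hypothetical signature, standing in for the anneal routine of [30]):

```python
def annealing_loop(f, x, anneal, b=0.5, T_min=1e-6):
    """Sketch of Algorithm 5.3: repeat annealing with geometrically
    decreasing initial temperature, keeping the better vector."""
    T = f(x)                  # initial temperature from the start value
    while T > T_min:
        y = anneal(f, x, T)   # one annealing run at temperature T
        T = b * T
        if f(x) > f(y):       # keep the improvement
            x = y
    return x
```

Because the current vector is only replaced when the new run improves it, the loop refines the minimum while each individual run remains free to climb out of local minima.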

Starting this loop, the vectors are free to overcome a local minimum, while repeating the annealing refines the minimum. The development of one loop can be seen in Figure 5.5 (left), where the algorithm stopped after 62.797773 seconds with Ŝ = 0 up to computational accuracy. In practice, this procedure does not always succeed, see Figure 5.5 (right), and thus may be repeated with higher parameters and other starting vectors.

We are interested in the global minimizer of the variational principle for different values of τ. Since the function D depends smoothly on τ, the solution obtained for a certain value of τ contains information which can be used to solve the slightly different problem for a lower or higher value of τ. Thus we proceed as follows, see Appendix C: We apply the loop of simulated annealing for the function at τ = 1, choosing as starting vector the known solution of the Tammes problem Xm. For a stepsize h, we increase the parameter τ in each step via τ ← τ + h (we will choose


