
3.2.4 Block Coordinate Ascent for Non-differentiable, Non-separable Functions

Due to the non-differentiability and non-separability of the objective (1.2), a sequential or parallel BCA as in Iteration 3.2.1 or 3.2.2 is not reasonable. The non-separability prevents the algorithm from finding good positions in one step, and due to the non-differentiability the algorithm may not even converge to a local optimum or saddle point. Thus, we first need to smooth the objective function to ensure convergence.
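To make the failure mode concrete, the following toy sketch runs exact coordinate-wise maximization on f(x, y) = min(x, y), a roof-top-like function that is non-differentiable along x = y and non-separable, ascending toward its maximum at (10, 10). The concrete function is an assumed stand-in for illustration, not the objective (1.2).

import numpy as np

# Assumed toy objective: non-differentiable along x = y, non-separable,
# ascending along the diagonal toward its maximum at (10, 10).
f = lambda x, y: min(x, y)

grid = np.linspace(-10.0, 10.0, 2001)  # exhaustive search grid per coordinate
x, y = 0.0, 0.0
for _ in range(10):
    # Exact maximization over one coordinate while the other one is fixed.
    x = grid[np.argmax([f(g, y) for g in grid])]
    y = grid[np.argmax([f(x, g) for g in grid])]

print(x, y, f(x, y))  # stays at (0.0, 0.0) although f(10, 10) = 10

The iterate is stuck on the ridge x = y: improving x alone or y alone never increases min(x, y), which is precisely the interplay of non-separability and non-differentiability described above.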

One possibility is to use a Radial Basis Function solver as depicted in Section 3.1. Two ideas seem promising: utilizing an RBF solver in order to optimize the subspace maximizations of the BCA Algorithm 3.2.2, or utilizing a BCA in order to optimize the candidate search of the RBF solver. We would like to be able to distribute the computation of the costly objective function to several cores. Therefore, it is tempting to use an RBF solver to solve the subspace maximizations of the iteration, as shown in Algorithm 2 from Line 7 to 15.

Algorithm 2 A block coordinate ascent method using an RBF solver in order to solve the subspace maximizations

Require: As in Algorithm 1

1:  Update RBF as in Algorithm 1, Lines 1 and 2
2:
3:  while BCA termination condition do
4:      for all subspaces V_m, m = 1, ..., M do
5:
6:          while RBF-solver termination condition do
7:              for all β in the search pattern ⟨β_1, ..., β_L⟩ do
8:                  Δ ← max_{x ∈ V_m} min_{1 ≤ k ≤ K} ‖x − s_k‖
9:                  Maximize the surrogate f̄(x)
10:                 subject to
11:                     ‖x − s_k‖ ≥ β Δ,  k = 1, ..., K
12:                     x ∈ V_m
13:
14:                 Costly function evaluation and updates as in Algorithm 1, from Line 11 to 18
15:             end for
16:         end while
17:
18:     end for
19: end while
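For concreteness, the candidate search of Lines 8 to 12, restricted to one subspace V_m, could be sketched in Python as follows. The max-min distance Δ is approximated over random trial points, and surrogate, S (the matrix of sampled sites s_k), and all helper names are illustrative assumptions, not the actual implementation.

import numpy as np

def candidate_in_subspace(surrogate, S, x, idx, lo, hi, beta,
                          n_trials=2000, rng=None):
    """Sketch of Algorithm 2, Lines 8-12, for one subspace V_m.

    surrogate : callable, the RBF response surface f_bar
    S         : (K, d) array of the previously sampled sites s_1, ..., s_K
    x         : current iterate; only the coordinates in idx (spanning V_m) vary
    lo, hi    : per-coordinate lower/upper bounds of the box domain
    """
    rng = rng or np.random.default_rng(0)
    # Trial points that differ from x only on the subspace coordinates.
    trials = np.tile(x, (n_trials, 1))
    trials[:, idx] = rng.uniform(lo[idx], hi[idx], (n_trials, len(idx)))

    # Distance of every trial point to its nearest sampled site s_k.
    dists = np.linalg.norm(trials[:, None, :] - S[None, :, :], axis=2).min(axis=1)
    delta = dists.max()  # Line 8: max-min distance, approximated on the trials

    # Lines 9-12: maximize the surrogate subject to ||x - s_k|| >= beta * delta.
    feasible = trials[dists >= beta * delta]
    if len(feasible) == 0:
        return None  # no trial point satisfies the distance constraint
    return feasible[np.argmax([surrogate(t) for t in feasible])]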

For this method, two types of response surfaces were envisaged: a separate RBF for each subspace, or a single global RBF. Unfortunately, the constraints of the candidate search of the RBF solver (Lines 8, 9 of Algorithm 1) ensure neither that the procedure converges nor that the radial basis function converges as a response surface model. The problem with Algorithm 2 is that a new candidate is chosen only from the current subspace, not from the global domain. Hence, neither the convergence of the method nor that of the RBF is guaranteed, no matter whether a global RBF or separate subspace RBFs are used.

What happens if we use Algorithm 2 with a global RBF is illustrated in Figure 3.8. As the objective function (left upper corner) the roof top function known from Figure 3.6 is chosen. The RBF at the moment of termination of the procedure is illustrated as a colored hypersurface from blue to yellow, where yellow indicates the largest function value. The intermediate solutions (red crosses) correspond to candidates of the RBF. At the moment of termination the RBF features a maximum at (5,0) (at the blue star) in the interior of the domain, whereas the global maximum of the objective function lies at the boundary, at (10,10).


Figure 3.8: Illustration of Algorithm 2. The objective function (left upper corner, Figure 3.6) is non-differentiable and linearly ascending from (-10,-10) to (10,10). The RBF at the moment of termination is shown as a colored surface, where yellow corresponds to the largest function value. The intermediate solutions (red crosses) correspond to candidates of the RBF and were chosen along the subspaces. The maximum of the RBF does not converge to the global maximum of the objective function.

The issues can be resolved by exchanging the order of the BCA iteration and the RBF-solver iteration: instead of utilizing the RBF solver to solve the subspace maximization in the BCA, the BCA will then be used for the candidate search in the RBF-solver (BCA-R).


Figure 3.9: Illustration of three update methods in Algorithm 3: BCAIR (left) iterates the BCA on an Invariant RBF until a subspace maximum (green dot) is found; this point and the initial candidate are updated to the RBF. BCAUR (right) Updates the RBF at the beginning and after each subspace maximization (green/blue dots); the blue updates can be Distributed (BCADR).

Three instances of the method BCA-R are discussed in this section, as illustrated in Figure 3.9: a version that iterates the BCA on an Invariant response surface model until a stationary point is found, which is then updated to the RBF (BCAIR); a version that Updates the response surface model after each subspace maximization (additional updates shaded in gray, BCAUR); and a Distributed version of the latter (BCADR).

The methods BCAIR and BCAUR are depicted in Algorithm 3. The lines shaded in gray represent the modifications between BCAIR and BCAUR, owing to the following circumstance: the BCAIR exclusively calls the response surface model (not the costly objective) when changing the subspace. Two costly function evaluations are appended, before and after the BCA iteration, which yields the following disadvantages.

Firstly, it is very rare that two successive costly function evaluations lie in a common subspace. These subspace optimizations correspond to the placement of one camera in Example 3.2.4. Therefore, the method for decreasing the complexity of the costly objective function from Section 3.2.2 cannot be utilized. Secondly, computing the BCA iteration in parallel is no longer necessary, since the function evaluations inside the iteration are very cheap.
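The difference in update timing behind these observations can be summarized in the following structural sketch. It is only a schematic reading of Figure 3.9, with rbf.update, argmax_subspace, and f as hypothetical placeholders rather than the actual interfaces of Algorithm 3.

def bcair_step(f, rbf, x, subspaces, argmax_subspace):
    rbf.update(x, f(x))                 # costly evaluation before the BCA
    for m in subspaces:                 # BCA on the Invariant surrogate:
        x = argmax_subspace(rbf, x, m)  # cheap response-surface calls only
    rbf.update(x, f(x))                 # costly evaluation after the BCA
    return x

def bcaur_step(f, rbf, x, subspaces, argmax_subspace):
    rbf.update(x, f(x))
    for m in subspaces:
        x = argmax_subspace(rbf, x, m)
        rbf.update(x, f(x))             # costly Update after EACH subspace
    return x                            # maximization (gray lines of Algorithm 3)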

Algorithm 3 BCAIR, BCAUR, and BCADR when distributing the costly function evaluations in Line 18

1:  Update RBF as in Algorithm 1, Lines 1 and 2
2:
⋮
8:  Costly function evaluation and updates as in Algorithm 1, from Line 11 to 18
9:
⋮
18:         Costly function evaluation and updates as in Algorithm 1, from Line 11 to 18
19:     end for
20:     end while
21:     Costly function evaluation and updates as in Algorithm 1, from Line 11 to 18
22: end for
23: end while

For a distribution of the costly function evaluations, let us make the following alterations to Algorithm 3, shaded in gray: the costly function evaluations of Line 18 can be parallelized. The sequentially executed function evaluations of Lines 8 and 18 can be calculated by the cheap function evaluations of Section 3.2.2. Changing the calculations of Line 12 transforms the inner iteration into a second RBF-solver on only one subspace. These alterations allow us to utilize the parallel method with the cheap function evaluations of Section 3.2.2.

The following theorem shows that Algorithm 3 converges with all three types of updates.

Theorem 3.2.12

Let D be bounded. The BCA-R in Algorithm 3 converges to the global maximum objective value of any bounded, continuous objective function f on D. Additionally, it is an anytime algorithm.

Proof. For the BCAUR/BCADR, the candidates constructed until Line 8 constitute a sequence of iterates that is densifying in D as in Lemma 3.1.15. With the same reasoning as in Theorem 3.1.16, the proof can be concluded.

In this subsection, two globally convergent optimization methods have been developed, the BCAIR and BCAUR of Algorithm 3. The latter procedure can be computed in parallel; this version is called BCADR. The computation in parallel threads can be achieved by using a global RBF as a response surface model for the first update in Line 8. For the updates of Line 18 in each parallel thread (or each subspace, for that matter), an exact copy of this model needs to be used. The same model is also used for the maximization of Lines 13 to 16. After distributing the iteration into subspace maximizations, the subspace models need to be merged by updating the newly found sample pairs into the global one. The update of Line 21 again affects the global model.
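A minimal sketch of this copy-and-merge scheme, assuming hypothetical interfaces global_rbf.update and optimize_subspace and a costly objective f, could look as follows: each thread works on an exact copy of the global model, and the newly found sample pairs are merged back afterwards.

import copy
from concurrent.futures import ThreadPoolExecutor

def bcadr_parallel_updates(f, global_rbf, x, subspaces, optimize_subspace,
                           n_threads=4):
    def work(m):
        local_rbf = copy.deepcopy(global_rbf)     # exact copy per thread
        x_m = optimize_subspace(local_rbf, x, m)  # maximization of Lines 13-16
        return x_m, f(x_m)                        # costly call of Line 18

    # The costly evaluations of Line 18 run in parallel threads.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(work, subspaces))

    # Merge: update the newly found sample pairs into the global model.
    for x_m, f_m in results:
        global_rbf.update(x_m, f_m)
    return results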

The distributed version has the following advantage:

Lemma 3.2.13

Consider the BCAUR with M ∈ ℕ being the number of subspaces and I₀ being the number of steps of the inner iteration (Lines 10–20). Let T ∈ ℕ be the number of parallel threads of the BCADR with the same number of subspaces and iteration steps.

Then the total number of sequential function calls of one outer iteration step (Lines 3–23) is

• I₀ · M + 1 in case of the BCAUR and

• I₀ · ⌈M/T⌉ + 2 in case of the BCADR.

Proof. Let us suppose that the number of parallel threads T is large enough; I will comment on this later in the proof. The costly objective function calls (Lines 8 and 18) in the non-distributed version of Algorithm 3 subdivide into two groups: the first call in Line 8 needs to be done before the other calls of the same RBF-solver iteration step. The subsequent calls in Line 18 are calls in M subspaces. The calls within each subspace need to be done sequentially, while all the subspaces can be computed in parallel. When the BCA iteration is limited to I₀ iteration steps for a problem with M subspaces, the number of all calls in one RBF-solver iteration step is I₀ · M + 1.

When distributing the algorithm, the call in Line 21 additionally has to be done after the parallelization. A number of M calls can be computed in parallel, so I₀ + 1 + 1 calls need to be done sequentially. Now, consider that the number of parallel threads T is limited. If M ≤ T, the maximum number of parallel computable function calls stays the same; this is what is considered "large enough". If this is not the case, then, out of the former M parallel function calls, ⌈M/T⌉ calls need to be done sequentially per inner iteration step. The minimum total number of sequential costly function calls is therefore I₀ · ⌈M/T⌉ + 2.
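As an illustrative plausibility check of these counts, with assumed parameters that are not taken from the thesis: for M = 12 subspaces, I₀ = 5 inner steps, and T = 4 threads, the BCADR reduces the sequential costly calls per outer iteration step from 61 to 17.

from math import ceil

def seq_calls_bcaur(I0, M):
    return I0 * M + 1            # Lemma 3.2.13, BCAUR case

def seq_calls_bcadr(I0, M, T):
    return I0 * ceil(M / T) + 2  # Lemma 3.2.13, BCADR case

print(seq_calls_bcaur(5, 12))     # 61 sequential costly calls
print(seq_calls_bcadr(5, 12, 4))  # 17 sequential costly calls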

The advantage of the BCADR compared to the BCAUR is the decreased number of sequentially computed costly function calls, as shown in the last lemma. Furthermore, both of them, as well as the BCAIR, can be equipped with symmetric prior information as in 3.1.14. In the following sections, we will explain the experimental setup that is needed to investigate whether this advantage helps the BCA-R versions to play in a better league than the plain RBF-solver.