Hence, we have the following reduced Lagrange function:

L(T, u, v) := f(T) − ∫_{D} λ∇T(x)·∇v(x) dx − ∫_{D} ν(x)T(x)v(x) dx + ∫_{D} Q_{rf}(x)v(x) dx   (3.19)

with the Lagrange multiplier v ∈ H_{0}^{1}(D).^{8} Differentiating the reduced
Lagrange function (3.19) twice yields the following Hessian (cf. Sect. 3.1.2):

H(T, u, v) =
⎛ D_{TT}L  D_{Tu}L  D_{Tv}L ⎞
⎜ D_{uT}L  D_{uu}L  D_{uv}L ⎟
⎝ D_{vT}L  D_{vu}L  D_{vv}L ⎠
=
⎛ ∂^{2}f/∂T^{2}   0                        λΔ − ν           ⎞
⎜ 0               v ∂^{2}Q_{rf}/∂u^{2}     ∂Q_{rf}/∂u       ⎟
⎝ λΔ − ν          ∂Q_{rf}/∂u               0                ⎠ ,   (3.20)

where the derivatives of the heat source Q_{rf} with respect to u are evaluated
numerically with the help of central differences. Unfortunately, the heat
source Q_{rf} has a steep slope next to the boundary of the probe, so that the
components of the derivative

∂Q_{rf}/∂u = ( ∂Q_{rf}/∂p_{1}, ∂Q_{rf}/∂p_{2}, ∂Q_{rf}/∂p_{3}, ∂Q_{rf}/∂a_{1}, ∂Q_{rf}/∂a_{2}, ∂Q_{rf}/∂a_{3} )   (3.21)

with

∂Q_{rf}/∂p_{1} = ( Q_{rf, p_{1}+ε} − Q_{rf, p_{1}−ε} ) / (2ε) = ( Q_{rf}(x−ε) − Q_{rf}(x+ε) ) / (2ε) , . . .   (3.22)

have very large amplitudes and even diverge to ±∞ next to the probe's boundary (see Fig. 3.8).
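The blow-up of the central difference quotient near a steep slope can be illustrated with a minimal, self-contained sketch; the sharply rising profile below is a toy stand-in for Q_{rf}, not the actual heat source:

```python
import numpy as np

def central_difference(f, x, eps):
    """Central difference quotient (f(x + eps) - f(x - eps)) / (2 eps)."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

# Toy stand-in for the steep heat-source profile near the probe boundary:
# a jump of height 1, smoothed over a width of about 1e-4, located at x = 0.
def steep(x, width=1e-4):
    return 1.0 / (1.0 + np.exp(-x / width))

print(central_difference(steep, 0.0, 1e-3))  # near the jump: on the order of 1/(2*eps)
print(central_difference(steep, 0.5, 1e-3))  # away from the jump: essentially zero
```

Whenever the smoothing width is much smaller than ε, the quotient approaches (jump height)/(2ε), so shrinking ε toward the slope width only makes the value larger.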

Consequently, the entries ∂Q_{rf}/∂u within the Hessian contain very large
absolute values, so that values which are approximately (but not exactly) zero,
and which are multiplied with these entries of the Hessian H, become very large
(compared to zero). More precisely, when using a Lagrange–Newton method (see
Sect. 3.1.2), in each step of the optimization one would have to solve the
Newton system

H(T, u, v) (δT, δu, δv)^{t} = −r_{n} ,   (3.23)

where r_{n} = (L_{T}, L_{u}, L_{v})^{t} is the residual of the n-th iteration step, and δT, δu,
and δv are the current updates of T, u and v, respectively, which have to be
calculated here. (Note that the zero block on the diagonal of the Hessian H (cf. (3.20)) easily

8 Note that within the definition of the Lagrange function (3.19), where we use the weak form of
the heat equation adjusted to homogeneous boundary conditions by setting T := T − T_{body}
(see Sect. 2.4.2), the objective functional f(T) formally changes to f(T + T_{body}).

Figure 3.8: Light gray: slope of the heat source next to the boundary of the
probe for p = (p_{1}+ε, p_{2}, p_{3}) and p = (p_{1}−ε, p_{2}, p_{3}),
respectively. Dark gray: central difference quotient
(Q_{rf}(x−ε) − Q_{rf}(x+ε))/(2ε) for approximating the derivative of Q_{rf}
with respect to the x-component p_{1} of the probe's position.

can be eliminated by a permutation of rows or by multiplying the Newton
system (3.23) with the Hessian H, such that on the left-hand side H turns into
H^{2}, which is positive definite.) Now, when T, u, and v are nearly optimal (i.e.
(δT, δu, δv) ≈ 0 and r_{n} ≈ 0), the large absolute values of ∂Q_{rf}/∂u within the Hessian
H will cause the product on the right-hand side of the Newton system (3.23) to
be far away from zero, which leads to numerical problems in solving the Newton
system. For example, a Jacobi method for solving the Newton system will
not converge (which can also be seen from the fact that the Hessian obviously is
not diagonally dominant). Using a relaxation method would not yield better
results either, since the relaxation parameter would have to be chosen so small for
convergence that the optimization would never finish. Therefore, a
Lagrange–Newton (SQP) method is not a good choice for solving our optimality system.
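The divergence of the Jacobi iteration on a matrix that, like the Hessian (3.20), combines dominant off-diagonal entries with a (near-)zero diagonal block can be reproduced with a small toy example; the 3×3 matrix below is invented for illustration and is not the actual Hessian:

```python
import numpy as np

# Toy 3x3 stand-in for the Hessian (3.20): a large off-diagonal entry (playing
# the role of dQ_rf/du) together with a near-zero diagonal block makes the
# matrix far from diagonally dominant.  The numbers are invented for illustration.
H = np.array([[  2.0,  0.0, 100.0],
              [  0.0,  1.0,  50.0],
              [100.0, 50.0,  1e-8]])
b = np.array([1.0, 1.0, 1.0])

def jacobi(A, rhs, x0, n_iter=10):
    """Plain Jacobi iteration x_{k+1} = D^{-1} (rhs - (A - D) x_k)."""
    D = np.diag(A)              # diagonal entries
    R = A - np.diag(D)          # off-diagonal part
    x = x0.copy()
    for _ in range(n_iter):
        x = (rhs - R @ x) / D
    return x

x = jacobi(H, b, np.zeros(3))
print(np.linalg.norm(x))        # grows without bound: Jacobi diverges here
```

The Jacobi iteration matrix −D^{−1}(A − D) has a spectral radius far above 1 for such a matrix, so the iterates explode instead of converging.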

Instead, as already mentioned above, we use a gradient descent method, which is less
sensitive to the large absolute values of ∂Q_{rf}/∂u. (Here, no system like
the Lagrange–Newton system (3.23) has to be solved, and the large values of ∂Q_{rf}/∂u
within the descent direction are compensated by the composition with the adjoint
v and by the choice of the step size s; see the following explanation.)

Before starting with a detailed description of the applied gradient descent method,
some remarks on the domain of the probe's positioning (p, d) have to be made: since
the orientation d of the probe lies on the sphere S^{2}, one would have to compute a
gradient of the objective function on the sphere. This would involve some difficulties,
in particular because there is no basis of the tangent space of S^{2} at d that depends
continuously on d. For this reason, the domain U = D × S^{2} for the probe's position
p and orientation d is replaced by the open set

Ũ = D × (R^{3} \ {0}) ⊃ U .   (3.24)

Moreover, let the projection P : Ũ → U be defined by

P : (p, d) ↦ (p, d/|d|) .   (3.25)

Then one can set

Q(u) = Q(p, d) := (Q ∘ P)(p, d) = Q(p, d/|d|) ,   (3.26)

i.e. Q does not depend on the length of d.
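As a small sketch (assuming positions and orientations are stored as plain 3-vectors), the projection (3.25) can be implemented as:

```python
import numpy as np

def project(p, d):
    r"""Projection P : (p, d) -> (p, d/|d|) from U~ = D x (R^3 \ {0}) onto U = D x S^2."""
    d = np.asarray(d, dtype=float)
    norm = np.linalg.norm(d)
    if norm == 0.0:
        raise ValueError("orientation d must be nonzero")
    return np.asarray(p, dtype=float), d / norm

p, d = project([0.01, 0.02, 0.03], [0.0, 3.0, 4.0])
print(d)  # unit-length orientation; the position p is left unchanged
```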

Now the particular ingredients of the gradient descent method used here for solving the optimality system are the following; they will be discussed in more detail below:

• Initial value. First, an arbitrary probe positioning u^{0} ∈ U is chosen as an
initial guess for the optimization.

• Descent direction. Then, in each iteration step n ≥ 0, the descent direction
w^{n} ∈ R^{6} is calculated from the current iterate u^{n} as an approximation of
−∇_{u}f(u^{n}).

• Step size. Next, the step size s^{n} > 0 is determined such that the resulting
new iterate u^{n+1} = P(u^{n} + s^{n}w^{n}) is admissible, i.e. fulfills u^{n+1} ∈ U, and
reduces the value of the objective function: f(u^{n+1}) < f(u^{n}). By using the
projection P, it is ensured that the new orientation lies on the sphere.

• Stopping criterion. The iteration continues until the difference |u^{n+1} − u^{n}|
falls below a given threshold θ.

### 3.4.1 Descent Direction

For the calculation of the descent direction w, we have to consider the reduced
Lagrange function L(T, u, v) in (3.19). Then minimizing the objective function f is
equivalent to finding solutions of D_{T}L(T, u, v)[γ] = 0 and D_{u}L(T, u, v)[γ] = 0 for all
test functions γ.

In the following, we exemplarily consider the objective function f of (3.13). The analogous calculations for the other objective functions (3.7)–(3.10) are straightforward.

Then the variation D_{T}L(T, u, v)[γ] is given by

D_{T}L(T, u, v)[γ] = −α ( ∫_{D_{t}} exp(−αT(y)) dy )^{−1} ∫_{D_{t}} exp(−αT(x)) γ(x) dx
                     − ∫_{D} λ∇v(x)·∇γ(x) dx − ∫_{D} ν(x)v(x)γ(x) dx   (3.27)

for all test functions γ ∈ H^{1}(D). This variation leads us to the so-called adjoint
equation. More precisely, one can now calculate the Lagrange multiplier v (the
so-called adjoint state) as the solution of the equation D_{T}L(T, u, v) = 0 in D, which is

−λΔv(x) + ν(x)v(x) = −α ( ∫_{D_{t}} exp(−αT(y)) dy )^{−1} χ_{D_{t}}(x) exp(−αT(x))   (3.28)

for all x ∈ D, with the boundary condition v = 0 on ∂D. Here, χ_{D_{t}}(x) is an
indicator function which is 1 for x ∈ D_{t} and 0 elsewhere. Furthermore, the variation
D_{u}L(T, u, v)[γ] is given by

D_{u}L(T, u, v)[γ] = ∫_{D} ∂Q_{rf}(u)/∂u (x) · γ v(x) dx   (3.29)

for all γ ∈ D × S^{2}. Now one can calculate the gradient D_{u}L(T, u, v)[w] in the direction
of w. Since we search for a steepest descent, this direction is

w(x) := − ∂Q_{rf}(u)/∂u (x) v(x) ,   (3.30)

where v is given by the adjoint equation (3.28) from above. Obviously, for this direction the variation (3.29) attains its minimal value

D_{u}L(T, u, v)[w] = −‖w‖^{2}_{L^{2}(D)} .   (3.31)
Here, the calculation of the adjoint v is performed by a conjugate gradient (CG)
method on a finite element grid, analogous to the calculation of the temperature T
(cf. Sect. 2.4.2). Further (as already described before; cf. (3.22)), the derivative
∂Q_{rf}(u)/∂u of the heat source with respect to the probe location is calculated with
the help of central differences. Note that in (3.22) the variable ε must be at least as
large as the corresponding size of one grid cell, i.e. for the derivative of Q_{rf} with
respect to the probe position p = (p_{1}, p_{2}, p_{3}), the variable ε has to be at least as large
as the voxel size of the grid in the corresponding direction, and for the derivative of
Q_{rf} with respect to the probe direction d = (d_{1}, d_{2}, d_{3}), the variable ε has to be at
least as large as the voxel size of the grid in the corresponding direction multiplied
by 2/L_{active}, where L_{active} is the length of the probe's active zone. Moreover, in
the implementation the descent direction for the probe's orientation is additionally
multiplied by a weighting factor, which is needed to balance the weight of the
optimization of the probe's orientation against the weight of the optimization of the
probe's position (due to the fact that typically, changes in the probe's
orientation have a smaller effect on the optimization than changes in the probe's
position).^{9}

9 The scaling of the descent direction for the probe's orientation corresponds to an adaptation between the units of the probe's position (which is given in meters) and the probe's orientation (which is dimensionless).
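Putting the pieces of this section together, the following sketch approximates the descent direction (3.30) by central differences, with ε tied to the voxel size (rescaled by 2/L_{active} for the orientation components) and a weighting factor on the orientation part. `heat_source`, the adjoint `v`, and the value `weight_d = 0.1` are hypothetical placeholders for the actual finite element quantities:

```python
import numpy as np

def descent_direction(u, v, heat_source, voxel, l_active, weight_d=0.1):
    """Sketch of the descent direction w = -(dQ_rf/du) v from (3.30).

    `heat_source(u)` is a hypothetical callable returning Q_rf on the grid for
    a probe positioning u in R^6, `v` is the adjoint state on the same grid,
    `voxel` holds the grid spacings, and weight_d = 0.1 is an arbitrary
    illustrative value for the orientation weighting factor.
    """
    u = np.asarray(u, dtype=float)
    w = np.zeros(6)
    for i in range(6):
        # eps must span at least one grid cell; for the orientation components
        # the voxel size is rescaled by 2 / L_active (cf. the text above).
        eps = voxel[i % 3] if i < 3 else voxel[i % 3] * 2.0 / l_active
        e = np.zeros(6)
        e[i] = eps
        dQ = (heat_source(u + e) - heat_source(u - e)) / (2.0 * eps)  # central difference
        w[i] = -np.sum(dQ * v) * np.prod(voxel)  # voxel-sum quadrature of -(dQ/du_i) v
    w[3:] *= weight_d  # additional weighting of the orientation components
    return w
```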

### 3.4.2 Step Size

To determine the optimal step size s in each step of the iteration, the equation

s = argmin_{s > 0} f( P(u + s w) )   (3.32)

has to be evaluated, where the projection P defined in (3.25) ensures that the orientation lies on the sphere. For the identification of the optimal step size, a bisection rule similar to Armijo's rule (cf. [34]) can be applied:

First, an initial guess is chosen which is adapted to the magnitude of the descent
direction. Then, if necessary, the step size is decreased until the new iterate is
admissible and reduces the objective function value. The choice of the initial guess
for the step size differs between the first iteration step and the following ones. For
the first iteration step n = 0, s^{0} is chosen such that

s^{0}|w^{0}| = (1/2) diam(D) , i.e. s^{0} = diam(D) / (2|w^{0}|) ,

where diam(D) is the diameter of the computational domain D, and |·| is the ordinary 2-norm. In the following iteration steps n > 0, we start with a step size that fulfills

|w^{n}| s^{n} = 2|w^{n−1}| s^{n−1} , i.e. s^{n} = ( 2|w^{n−1}| / |w^{n}| ) s^{n−1} .

After having chosen an initial value for the step size s^{n}, this value is bisected until
the new iterate u^{n+1} = P(u^{n} + s^{n}w^{n}) fulfills u^{n+1} ∈ U and f(u^{n+1}) < f(u^{n}). If these
conditions are not met after a certain number of bisections, the step size s^{n} is
set to zero and the algorithm stops.
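The bisection rule just described can be sketched as follows, with `f`, `project`, and `admissible` as hypothetical problem-specific callables:

```python
import numpy as np

def bisect_step_size(u, w, s_init, f, project, admissible, m_max=10):
    """Bisection rule of Sect. 3.4.2 (sketch): halve s until P(u + s w) is
    admissible and strictly decreases f.  `f`, `project`, and `admissible`
    are hypothetical problem-specific callables; returns (0.0, u) if no
    admissible step is found within m_max bisections."""
    fu = f(u)
    s = s_init
    for _ in range(m_max):
        u_new = project(u + s * w)
        if admissible(u_new) and f(u_new) < fu:
            return s, u_new
        s /= 2.0              # bisect the step size
    return 0.0, u             # step size set to zero: the algorithm stops
```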

Note that the search for the optimal step size can be accelerated by using a scheme
which does not only use bisections, i.e. the step size s^{n} can also be divided by, e.g., 2^{i}
(i = 2, . . . , 9) after a fixed number of bisections. Moreover, after having found a
suitable step size, it should be tested whether a larger step size for only the position or
only the orientation is admissible and still leads to a better value of the objective
function.

### 3.4.3 Stopping Criterion

It is popular to stop the iteration when the norm of the gradient of the objective function f falls below a given threshold, i.e. when the shape of the objective function becomes flat. Unfortunately, this criterion cannot be used with the type of discretization presented here (cf. Sects. 2.4.2 and 3.3.1).

In Sect. 2.3 we have seen that the position p and orientation d of the probe enter into the system of PDEs as Dirichlet boundary conditions (2.13)–(2.17). Since, as described in Sect. 2.4.2, the probe is resolved on a Cartesian grid, we obtain a piecewise constant

discretization of the probe’s position and orientation. This induces a piecewise constant approximation of the objective functionf, so that the gradient off actually should be zero almost everywhere, but in fact does not vanish, since it is evaluated numerically with help of central diﬀerences including a variableε >0 depending on the grid size (cf. Eq. (3.22) and p. 42, bottom). Hence, the evaluation of the norm of the gradient off is not a robust stopping criterion any more. In fact the descent direction will not converge to zero, but only decrease up to a certain positive value which depends on the voxel size of the underlying grid and is not easily predictable.

Therefore, the iteration is repeated until the norm of the difference |u^{n+1} − u^{n}|
between the new and old iterate falls below a suitably chosen threshold θ. In the
experiments shown in Sects. 3.5 and 3.8, the stopping criterion θ is split into different
components θ = (θ_{i})_{i}. This allows prescribing different accuracies for the probe's
position p and the orientation d. The algorithm stops if both criteria are met, i.e.

|p^{n+1} − p^{n}| < θ_{1} and |d^{n+1} − d^{n}| < θ_{2} .   (3.33)
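The split criterion (3.33) amounts to a few lines, assuming u = (p, d) is stored as a vector in R^{6} with p = u[:3] and d = u[3:]:

```python
import numpy as np

def stopping_criterion(u_new, u_old, theta_1, theta_2):
    """Split stopping criterion (3.33): separate thresholds for the probe
    position p = u[:3] and the orientation d = u[3:]."""
    dp = np.linalg.norm(u_new[:3] - u_old[:3])
    dd = np.linalg.norm(u_new[3:] - u_old[3:])
    return bool(dp < theta_1 and dd < theta_2)
```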

Another reason why the norm of the gradient of the objective function f would
not be a good choice for a stopping criterion might be the large absolute
values of the derivative of the heat source Q_{rf} with respect to the probe location u.
More precisely, the large absolute values of ∂Q_{rf}/∂u possibly lead to an effect where
on some grid nodes the corresponding values of the gradient of f are very large, and
only occasionally, by chance, become very small.

### 3.4.4 Optimization Algorithm

To summarize the last sections, the basic algorithm that is used for the optimization of the probe placement will now be presented in pseudo-code (see Alg. 3.1).

Note that the computation of the descent direction in line 4 includes the
calculation of the adjoint v (3.28) with the help of finite elements, as well as the
application of central differences for the calculation of the derivative ∂Q_{rf}/∂u,
which involves multiple evaluations of the electrostatic equation (2.13).

Furthermore, for each test in the while-condition (see Alg. 3.1, line 12), an evaluation of the complete system of PDEs (2.13)–(2.17) and of the objective function f is needed.

Finally, note that the determination of the step size s (lines 10–19) in practical applications is accelerated by dividing s by, e.g., powers of two after a fixed number of bisections (cf. Sect. 3.4.2), and that the stopping criterion (line 20) in practice is split into a condition for the probe position and a separate condition for the probe orientation (cf. Sect. 3.4.3).

Algorithm 3.1 Basic algorithm for the optimization of the probe placement

1: Choose u^{0}    ▷ Initialize start positioning
2: n ← 0
3: repeat
4:   w^{n} ← −∇_{u}f(u^{n})
   ▷ Initialize step size
5:   if n = 0 then
6:     s^{0} = (2|w^{0}|)^{−1} diam(D)
7:   else
8:     s^{n} = 2|w^{n−1}| (|w^{n}|)^{−1} s^{n−1}
9:   end if
   ▷ Determine step size
10:  m ← 0    ▷ Reset counter
11:  u^{n+1} ← P(u^{n} + s^{n}w^{n})
12:  while f(u^{n+1}) > f(u^{n}) or u^{n+1} ∉ U do
13:    m ← m + 1    ▷ Increase counter
14:    if m = m_{max} then
15:      STOP.
16:    end if
17:    s^{n} ← s^{n}/2    ▷ Bisect step size
18:    u^{n+1} ← P(u^{n} + s^{n}w^{n})
19:  end while
   ▷ Check stopping criterion
20: until |u^{n+1} − u^{n}| ≤ θ
21: ū ← u^{n+1}    ▷ Save optimal probe positioning
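For illustration, Algorithm 3.1 translates almost line by line into the following Python sketch. All problem-specific callables (`f`, `descent_direction`, `project`, `admissible`) are hypothetical placeholders, and the strict decrease f(u^{n+1}) < f(u^{n}) required in Sect. 3.4.2 is enforced in the while-condition:

```python
import numpy as np

def optimize_probe_placement(u0, f, descent_direction, project, admissible,
                             theta=1e-3, m_max=20, diam_D=0.1):
    """Sketch of Algorithm 3.1; u = (p, d) is stored as a vector in R^6.

    `f`, `descent_direction`, `project`, and `admissible` are hypothetical
    problem-specific callables; theta, m_max, and diam_D are example values.
    """
    u = project(np.asarray(u0, dtype=float))    # line 1: start positioning
    s_prev = w_norm_prev = None
    while True:
        w = descent_direction(u)                # line 4: w^n ~ -grad_u f(u^n)
        w_norm = np.linalg.norm(w)
        if w_norm == 0.0:                       # degenerate guard (not in Alg. 3.1)
            return u
        if s_prev is None:                      # lines 5-9: initialize step size
            s = diam_D / (2.0 * w_norm)
        else:
            s = 2.0 * w_norm_prev / w_norm * s_prev
        fu = f(u)
        m = 0                                   # line 10: reset counter
        u_new = project(u + s * w)              # line 11
        while f(u_new) >= fu or not admissible(u_new):  # line 12 (strict decrease)
            m += 1                              # line 13
            if m == m_max:                      # lines 14-16: STOP
                return u
            s /= 2.0                            # line 17: bisect step size
            u_new = project(u + s * w)          # line 18
        s_prev, w_norm_prev = s, w_norm
        if np.linalg.norm(u_new - u) <= theta:  # line 20: stopping criterion
            return u_new                        # line 21: optimal positioning
        u = u_new
```

On a simple smooth test objective (e.g. f(u) = |u|² with descent direction −2u and trivial projection), the iteration drives u toward the minimizer, which gives a cheap sanity check of the control flow before plugging in the PDE-based quantities.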