
3.2 Projected (Quasi-)Newton method

3.2.3 Termination criterion

As recommended in Kelley [Kel99], we use a termination criterion based on relative and absolute reductions in the measure of stationarity $\|u - u(1)\|_U$, where $u(1)$ denotes the projection $u(\varsigma) = P_{U_{\mathrm{ad}}}(u + \varsigma d)$ for $\varsigma = 1$. For a given initial residual $r = \|u - u(1)\|_U$ and relative and absolute tolerances $\tau_r > 0$ and $\tau_a > 0$, the termination criterion is defined as

$$\|u - u(1)\|_U \le \tau_a + \tau_r\, r. \qquad (3.19)$$

Since no general settings for the tolerances $\tau_a$ and $\tau_r$ can be specified a priori, we vary them for our numerical tests in the range of $1.0 \times 10^{-2}$ to $1.0 \times 10^{-6}$.

In addition, the (relative) deviation $\varepsilon_{\hat J,k} \in \mathbb{R}_+$ of the reduced cost functional $\hat J$ from the previous to the current iteration,

$$\varepsilon_{\hat J,k} := \frac{|\hat J(u_k) - \hat J(u_{k-1})|}{|\hat J(u_{k-1})|}\,,$$

might give some meaningful information about the decay of the costs, which can be used for an additional criterion: the algorithm stops as soon as the deviation $\varepsilon_{\hat J,k}$ becomes too small and no further decline can be expected. The termination criterion is then defined by

$$\varepsilon_{\hat J,k} \le \tau_{\hat J} \qquad (3.20)$$

for a given lower tolerance $\tau_{\hat J} > 0$.

Finally, as a kind of “standard” criterion, we also employ a maximum number of iterations, after which the optimization is stopped and the current iterate is returned as a candidate for an optimal solution. Of course, this criterion is exclusively meant as a last resort to prevent the algorithm from getting stuck in an infinite loop for numerical reasons. It is therefore recommended to first adjust problem-related criteria such as (3.19) and (3.20) properly (compare, for example, Kelley [Kel99], Nocedal and Wright [NW06], or Ulbrich and Ulbrich [UU12]).
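The three stopping tests above can be combined into a single check. The following Python sketch is not taken from the source; all names (`check_termination`, the default tolerances) are illustrative assumptions, and the cost-decay test presumes $\hat J(u_{k-1}) \ne 0$.

```python
def check_termination(stationarity, r0, J_curr, J_prev, k,
                      tau_a=1e-4, tau_r=1e-4, tau_J=1e-8, k_max=100):
    """Return the name of the first stopping criterion that fires, else None.

    stationarity : current value of ||u - u(1)||_U
    r0           : initial residual r = ||u - u(1)||_U at the first iterate
    J_curr/J_prev: reduced cost at the current / previous iterate
    """
    # (3.19): relative/absolute reduction of the stationarity measure
    if stationarity <= tau_a + tau_r * r0:
        return "stationarity"
    # (3.20): relative decay of the reduced cost has stalled
    eps_J = abs(J_curr - J_prev) / abs(J_prev)
    if eps_J <= tau_J:
        return "cost decay"
    # last-resort safeguard against an infinite loop
    if k >= k_max:
        return "max iterations"
    return None
```

In an actual projected (quasi-)Newton loop this check would be evaluated once per iteration, with `tau_a` and `tau_r` varied over the range quoted above.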

4 A-posteriori error analysis

In this chapter, we present the main concept of the error analysis that will be applied in this work.

We suppose that $\tilde u$ is some arbitrary (suboptimal) control, as it might be obtained by a numerical optimization procedure or as solution to some reduced-order optimization model. The goal is a reliable estimation of the difference

$$\|\bar u - \tilde u\|_U$$

in an appropriate norm $\|\cdot\|_U$, without knowing the optimal solution $\bar u$, of course. The idea that we draw on here was used in the context of error estimates for optimal control of ordinary differential equations by Malanowski et al. [MBM97] and extended to elliptic optimal control problems in Arada et al. [ACT02] and Casas and Tröltzsch [CT02]. Let us already mention that, in the case of proper orthogonal decomposition (POD) as model order reduction technique, no a-priori estimate is available, so that the concept of a-posteriori estimation introduced in the following is of special interest. In this context, a-posteriori error analysis for linear-quadratic optimal control problems was examined in Tröltzsch and Volkwein [TV09] and extended to a nonlinear case in Kammann et al. [KTV13]. We will refer here mainly to the latter publication.

A fundamental assumption is that such a solution $\bar u$ exists in a neighborhood of $\tilde u$. Moreover, $\tilde u$ should be sufficiently close to $\bar u$. The approach itself is based on a fairly standard perturbation method involving second-order information for the (unknown) locally optimal control. Especially the latter fact makes this approach, in view of the underlying nonlinear problem, more elaborate compared to the linear-quadratic case (see Tröltzsch and Volkwein [TV09]).

All quantities arising in the next sections have already been introduced in the previous chapters and can be drawn on directly for computation. For this reason, we will give a general account of a-posteriori error analysis for the class of optimal control problems governed by semilinear parabolic equations, as it can be found in Kammann et al. [KTV13].

4.1 The perturbation method for nonconvex functionals

In this section we present the concept of the perturbation method. To this end, we consider the following general form of a nonconvex but smooth optimization problem:

$$\min \hat J(u) := \frac{1}{2}\|G(u) - y_H\|_H^2 + \frac{\kappa}{2}\|u\|_{L^2(D)}^2 \quad \text{subject to } u \in C, \qquad (4.1)$$

with real Hilbert space $H$, a measurable and bounded set $D \subset \mathbb{R}^m$, a nonempty, convex, closed and bounded set $C \subset L^2(D)$, a fixed real number $\kappa \ge 0$ and a fixed element $y_H \in H$. Again, we assume that for all $u \in C$ the control-to-state operator $G: L^\infty(D) \to H$ is twice continuously Fréchet differentiable with first- and second-order derivatives $G'(u): L^\infty(D) \to H$ and $G''(u): L^\infty(D) \times L^\infty(D) \to H$ continuously extendable to $L^2(D)$ and $L^2(D) \times L^2(D)$, respectively; compare (1.38) and (1.39).

Hence the operators $G'(u)$ and $G''(u)$ can also be applied to increments $v$, $v_1$ and $v_2$ in $L^2(D)$, and we can view $G'(u)$ as a continuous linear operator from $L^2(D)$ to $H$ with adjoint operator $G'(u)^*$ mapping continuously from $H$ to $L^2(D)$.

The first derivative $\hat J'(u)$ is given by

$$\begin{aligned}
\hat J'(u)v &= \big(G(u) - y_H,\, G'(u)v\big)_H + (\kappa u, v)_{L^2(D)} \\
&= \big(G'(u)^*(G(u) - y_H) + \kappa u,\, v\big)_{L^2(D)} \\
&= (p_u + \kappa u,\, v)_{L^2(D)}
\end{aligned}$$

with the $L^2(D)$-function $p_u$ denoting the adjoint state associated with $u$, $p_u := G'(u)^*(G(u) - y_H)$.

For the second derivative $\hat J''(u)$ we consider the expression for $\hat J'$ with fixed increment $v := v_1 \in L^\infty(D)$ and differentiate again in direction $v_2 \in L^\infty(D)$. By the chain and product rules we find

$$\hat J''(u)[v_1, v_2] = \big(G'(u)v_2,\, G'(u)v_1\big)_H + \big(G(u) - y_H,\, G''(u)[v_2, v_1]\big)_H + (\kappa v_2, v_1)_{L^2(D)}.$$

By our assumptions on $G$, the second derivative $\hat J''(u)$ can also be continuously extended to a bilinear form on $L^2(D) \times L^2(D)$, and it holds

$$|\hat J''(u)[v_1, v_2]| \le c\,\|v_1\|_{L^2(D)}\,\|v_2\|_{L^2(D)} \quad \text{for all } u \in C \text{ and } v_1, v_2 \in L^\infty(D).$$

If now $\bar u \in C$ is a locally optimal solution to the nonlinear problem (4.1) in the sense of $L^\infty(D)$, then there is some radius $\rho > 0$ such that $\bar u \in L^\infty(D)$ satisfies

$$\hat J(u) \ge \hat J(\bar u) \quad \text{for all } u \in C \text{ with } \|u - \bar u\|_{L^\infty(D)} \le \rho\,.$$

Together with the variational inequality from Corollary 1.4.7,

$$\hat J'(\bar u)(u - \bar u) \ge 0 \quad \text{for all } u \in C\,,$$

we obtain the following proposition.

Proposition 4.1.1. If $\bar u \in C$ is a locally optimal solution of (4.1) in the sense of $L^\infty(D)$, then it obeys the variational inequality

$$\int_D \Big( \big(G'(\bar u)^*(G(\bar u) - y_H)\big)(x) + \kappa \bar u(x) \Big)\big(u(x) - \bar u(x)\big)\, \mathrm{d}x \;\ge\; 0 \quad \text{for all } u \in C. \qquad (4.2)$$

On the other hand, let us consider a function $\tilde u \in C$ that need not be optimal for the nonlinear problem (4.1). If $\tilde u \ne \bar u$ holds, then the (suboptimal) control $\tilde u$ does not satisfy the optimality condition (4.2). However, this can be compensated by introducing a so-called perturbation function $\zeta \in L^2(D)$, such that the perturbed variational inequality

$$\int_D \Big( \big(G'(\tilde u)^*(G(\tilde u) - y_H)\big)(x) + \kappa \tilde u(x) + \zeta(x) \Big)\big(u(x) - \tilde u(x)\big)\, \mathrm{d}x \;\ge\; 0 \quad \text{for all } u \in C \qquad (4.3)$$


is fulfilled. Consequently, $\tilde u$ satisfies the optimality condition of the perturbed optimization problem

$$\min_{u \in C} \hat J_\zeta(u) := \frac{1}{2}\|G(u) - y_H\|_H^2 + \frac{\kappa}{2}\|u\|_{L^2(D)}^2 + \int_D \zeta(x)\, u(x)\, \mathrm{d}x\,. \qquad (4.4)$$

Obviously, the smaller the perturbation function $\zeta$, the closer $\tilde u$ is to the optimal solution $\bar u$ of the original problem (4.1).
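For illustration, suppose the admissible set consists of pointwise box constraints, $C = \{u \in L^2(D) : u_a \le u \le u_b \text{ a.e.}\}$; this special case is an assumption of the following sketch, not taken from the text above. A minimal-norm discrete perturbation $\zeta$ can then be chosen pointwise from the gradient $g = p_{\tilde u} + \kappa\tilde u$, so that a discrete analogue of (4.3) holds:

```python
import numpy as np

def perturbation_zeta(g, u, ua, ub, tol=1e-12):
    """Minimal-norm pointwise perturbation zeta such that
    (g + zeta)(v - u) >= 0 for all ua <= v <= ub (discrete analogue of (4.3)).

    g : discretized gradient p_u + kappa*u at the suboptimal control u
    """
    zeta = np.zeros_like(g)
    at_lower = u <= ua + tol           # active lower bound: need g + zeta >= 0
    at_upper = u >= ub - tol           # active upper bound: need g + zeta <= 0
    inactive = ~(at_lower | at_upper)  # inactive set: need g + zeta = 0
    zeta[at_lower] = np.maximum(0.0, -g[at_lower])
    zeta[at_upper] = np.minimum(0.0, -g[at_upper])
    zeta[inactive] = -g[inactive]
    return zeta
```

On the inactive set, $\zeta$ cancels the gradient entirely, while on the active sets it only compensates a sign violation; this keeps the grid approximation of $\|\zeta\|_{L^2(D)}$, which enters the bound (4.7), as small as possible.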

To quantify the distance $\|\tilde u - \bar u\|$, we additionally need some second-order information on $\bar u$, namely the coercivity constant $\delta \in \mathbb{R}$ of $\hat J''(\bar u)$, which makes the situation more elaborate compared to the linear-quadratic approach (see Kammann et al. [KTV13]). Assume that there exists some $\delta > 0$ such that the coercivity condition

$$\hat J''(\bar u)[v, v] \ge \delta\, \|v\|_{L^2(D)}^2 \quad \text{for all } v \in L^2(D) \qquad (4.5)$$

is satisfied. Then for any $0 < \tilde\delta < \delta$ there exists a radius $r(\tilde\delta) > 0$ such that for all $u$ with $\|u - \bar u\|_{L^\infty(D)} < r(\tilde\delta)$ it holds

$$\hat J''(u)[v, v] \ge \tilde\delta\, \|v\|_{L^2(D)}^2 \quad \text{for all } v \in L^2(D), \qquad (4.6)$$

i.e., the coercivity condition also holds in a neighborhood of $\bar u$. Let us emphasize here that this is a serious theoretical obstacle that can hardly be rigorously overcome, since we can only assume that the method of determining the (suboptimal) control $\tilde u$ was sufficiently precise to guarantee $\|\tilde u - \bar u\| < r$. If $\tilde u$ belongs to this neighborhood, we are able to estimate the distance as follows:

Theorem 4.1.2. Let $\bar u$ be locally optimal for (4.1) and assume that $\bar u$ satisfies the second-order condition (4.5). If $\tilde u \in C$ is given such that $\|\tilde u - \bar u\|_{L^\infty(D)} < r(\tilde\delta)$, then it holds

$$\|\tilde u - \bar u\|_{L^2(D)} \le \frac{1}{\tilde\delta}\, \|\zeta\|_{L^2(D)}\,, \qquad (4.7)$$

where $\zeta$ is chosen such that the perturbed variational inequality (4.3) is fulfilled.

Proof. By (4.3), $\tilde u$ satisfies the first-order necessary optimality conditions for the perturbed optimization problem (4.4),

$$\min_{u \in C} \hat J_\zeta(u) = \hat J(u) + (\zeta, u)_{L^2(D)}\,.$$

We insert $\bar u$ in the variational inequality for $\tilde u$ and vice versa, obtaining

$$\big(\hat J'(\tilde u) + \zeta,\, \bar u - \tilde u\big)_{L^2(D)} \ge 0\,, \qquad \big(\hat J'(\bar u),\, \tilde u - \bar u\big)_{L^2(D)} \ge 0\,.$$

Now we add both inequalities and get

$$\big(\hat J'(\tilde u) - \hat J'(\bar u),\, \bar u - \tilde u\big)_{L^2(D)} + \big(\zeta,\, \bar u - \tilde u\big)_{L^2(D)} \ge 0\,.$$

The mean value theorem implies

$$-\hat J''(\xi)[\bar u - \tilde u,\, \bar u - \tilde u] + (\zeta,\, \bar u - \tilde u)_{L^2(D)} \ge 0$$

with some $\xi \in \{v \in L^2(D) \mid v = s\bar u + (1 - s)\tilde u \text{ with } s \in (0,1)\}$. Now we apply (4.6) and the Cauchy-Schwarz inequality to deduce

$$\tilde\delta\, \|\tilde u - \bar u\|_{L^2(D)}^2 \le \|\zeta\|_{L^2(D)}\, \|\tilde u - \bar u\|_{L^2(D)}\,.$$

From this inequality, the assertion of the theorem follows.

Remark 4.1.3. In Kammann et al. [KTV13, Remark 3.3] the authors suggest selecting $\tilde\delta := \frac{\delta}{2}$ and setting the radius $r := r(\frac{\delta}{2})$, which might be a too pessimistic choice. Since in applications the main interest lies in the order of the error, the factor $\frac{1}{2}$ is not that important, and also $\tilde\delta := \delta$ is used, even though this might be a slightly too optimistic choice.
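Once $\zeta$ and (an estimate of) the coercivity constant $\delta$ are available, evaluating the bound (4.7) is a one-line computation. The sketch below is illustrative, not from the source: the quadrature-weight vector approximating the $L^2(D)$ norm is an assumption, and the flag switches between the choices $\tilde\delta = \delta/2$ and $\tilde\delta = \delta$ discussed in the remark.

```python
import numpy as np

def apost_error_bound(zeta, weights, delta, pessimistic=True):
    """A-posteriori bound (4.7): ||u_tilde - u_bar||_{L2} <= ||zeta||_{L2} / delta_tilde.

    zeta    : grid values of the perturbation function
    weights : quadrature weights approximating the L2(D) inner product
    delta   : (estimated) coercivity constant of J''(u_bar)
    """
    zeta_norm = np.sqrt(np.sum(weights * zeta**2))
    # delta/2 per [KTV13, Remark 3.3] (pessimistic) vs. delta (optimistic)
    delta_tilde = delta / 2.0 if pessimistic else delta
    return zeta_norm / delta_tilde
```

Since only the order of magnitude of the error matters in practice, the two variants differ merely by the constant factor 2.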