
3.2 Projected (Quasi-)Newton method

3.2.3 Termination criterion

As recommended in Kelley [Kel99], we use a termination criterion based on relative and absolute reductions in the measure of stationarity $\|u - u(1)\|_U$, where $u(1)$ denotes the projection $u(\varsigma) = P_{U_{\mathrm{ad}}}(u + \varsigma d)$ for $\varsigma = 1$. For a given initial residual $r = \|u - u(1)\|_U$ and relative and absolute tolerances $\tau_r > 0$ and $\tau_a > 0$, the termination criterion is defined as

$$\|u - u(1)\|_U \le \tau_a + \tau_r\, r. \qquad (3.19)$$

Since no general settings for the tolerances $\tau_a$ and $\tau_r$ can be specified a priori, we vary them for our numerical tests in the range of $1.0 \times 10^{-2}$ to $1.0 \times 10^{-6}$.

In addition, the (relative) deviation $\varepsilon_{\hat J,k} \in \mathbb{R}_+$ of the reduced cost functional $\hat J$ from the previous to the current iteration,

$$\varepsilon_{\hat J,k} := \frac{|\hat J(u_k) - \hat J(u_{k-1})|}{|\hat J(u_{k-1})|}\,,$$

might give some meaningful information about the decay of the costs, which can be used for an additional criterion: the algorithm stops as soon as the deviation $\varepsilon_{\hat J,k}$ becomes too small and no further decline can be expected. The termination criterion is then defined by

$$\varepsilon_{\hat J,k} \le \tau_{\hat J} \qquad (3.20)$$

for a given lower tolerance $\tau_{\hat J} > 0$.

Finally, as a kind of “standard” criterion, we also employ a maximum number of iterations, after which the optimization is stopped and the current iterate is returned as a candidate for an optimal solution. Of course, this criterion is exclusively meant as a last resort to prevent the algorithm from getting stuck in an infinite loop for numerical reasons. It is therefore recommended to first adjust problem-related criteria such as (3.19) and (3.20) properly (compare, for example, Kelley [Kel99], Nocedal and Wright [NW06], or Ulbrich and Ulbrich [UU12]).
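The three stopping tests above can be combined into a single check. The following Python sketch is not taken from the source; all names (`check_termination`, the default tolerances) are illustrative assumptions, and the cost-decay test presumes $\hat J(u_{k-1}) \ne 0$.

```python
def check_termination(stationarity, r0, J_curr, J_prev, k,
                      tau_a=1e-4, tau_r=1e-4, tau_J=1e-8, k_max=100):
    """Return the name of the first stopping criterion that fires, else None.

    stationarity : current value of ||u - u(1)||_U
    r0           : initial residual r = ||u - u(1)||_U at the first iterate
    J_curr/J_prev: reduced cost at the current / previous iterate
    """
    # (3.19): relative/absolute reduction of the stationarity measure
    if stationarity <= tau_a + tau_r * r0:
        return "stationarity"
    # (3.20): relative decay of the reduced cost has stalled
    eps_J = abs(J_curr - J_prev) / abs(J_prev)
    if eps_J <= tau_J:
        return "cost decay"
    # last-resort safeguard against an infinite loop
    if k >= k_max:
        return "max iterations"
    return None
```

In an actual projected (quasi-)Newton loop this check would be evaluated once per iteration, with `tau_a` and `tau_r` varied over the range quoted above.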

4 A-posteriori error analysis

In this chapter, we present the main concept of the error analysis that will be applied in this work.

We suppose that $\tilde u$ is some arbitrary (suboptimal) control, as it might be obtained by a numerical optimization procedure or as solution to some reduced-order optimization model. The goal is a reliable estimation of the difference

$$\|\bar u - \tilde u\|_U$$

in an appropriate norm $\|\cdot\|_U$, without knowing the optimal solution $\bar u$, of course. The idea that we draw on here was used in the context of error estimates for optimal control of ordinary differential equations by Malanowski et al. [MBM97] and extended to elliptic optimal control problems in Arada et al. [ACT02] and Casas and Tröltzsch [CT02]. Let us already mention that, in the case of proper orthogonal decomposition (POD) as model order reduction technique, no a-priori estimate is available, so that the concept of a-posteriori estimation introduced in the following is of special interest. In this context, a-posteriori error analysis for linear-quadratic optimal control problems was examined in Tröltzsch and Volkwein [TV09] and extended to a nonlinear case in Kammann et al. [KTV13]. We will refer here mainly to the latter publication.

A fundamental assumption is that such a solution $\bar u$ exists in a neighborhood of $\tilde u$. Moreover, $\tilde u$ should be sufficiently close to $\bar u$. The approach itself is based on a fairly standard perturbation method involving second-order information for the (unknown) locally optimal control. Especially the latter fact makes this approach, in view of the underlying nonlinear problem, more elaborate compared to the linear-quadratic case (see Tröltzsch and Volkwein [TV09]).

All quantities arising in the next sections have already been introduced in the previous chapters and can be drawn on directly for computation. For this reason, we will give a general account of a-posteriori error analysis for the class of optimal control problems governed by semilinear parabolic equations, as it can be found in Kammann et al. [KTV13].

4.1 The perturbation method for nonconvex functionals

In this section we present the concept of the perturbation method. To this end, we consider the following general form of a nonconvex but smooth optimization problem:

$$\min \hat J(u) := \frac{1}{2}\|G(u) - y_H\|_H^2 + \frac{\kappa}{2}\|u\|_{L^2(D)}^2 \quad \text{subject to } u \in C, \qquad (4.1)$$

with real Hilbert space $H$, a measurable and bounded set $D \subset \mathbb{R}^m$, a nonempty, convex, closed and bounded set $C \subset L^2(D)$, a fixed real number $\kappa \ge 0$ and a fixed element $y_H \in H$. Again, we assume that for all $u \in C$ the control-to-state operator $G: L^\infty(D) \to H$ is twice continuously Fréchet differentiable with first- and second-order derivatives $G'(u): L^\infty(D) \to H$ and $G''(u): L^\infty(D) \times L^\infty(D) \to H$ continuously extendable to $L^2(D)$ and $L^2(D) \times L^2(D)$, respectively; compare (1.38) and (1.39).

Hence the operators $G'(u)$ and $G''(u)$ can also be applied to increments $v$, $v_1$ and $v_2$ in $L^2(D)$, and we can view $G'(u)$ as a continuous linear operator from $L^2(D)$ to $H$ with adjoint operator $G'(u)^*$ mapping continuously from $H$ to $L^2(D)$.

The first derivative $\hat J'(u)$ is given by

$$\begin{aligned}
\hat J'(u)v &= \big(G(u) - y_H,\, G'(u)v\big)_H + (\kappa u, v)_{L^2(D)} \\
&= \big(G'(u)^*(G(u) - y_H) + \kappa u,\, v\big)_{L^2(D)} \\
&= (p_u + \kappa u,\, v)_{L^2(D)}
\end{aligned}$$

with the $L^2(D)$-function $p_u$ denoting the adjoint state associated with $u$, $p_u := G'(u)^*(G(u) - y_H)$.

For the second derivative $\hat J''(u)$ we consider the expression for $\hat J'$ with fixed increment $v := v_1 \in L^\infty(D)$ and differentiate again in direction $v_2 \in L^\infty(D)$. By the chain and product rules we find

$$\hat J''(u)[v_1, v_2] = \big(G'(u)v_2,\, G'(u)v_1\big)_H + \big(G(u) - y_H,\, G''(u)[v_2, v_1]\big)_H + (\kappa v_2, v_1)_{L^2(D)}.$$

By our assumptions on $G$, the second derivative $\hat J''(u)$ can also be continuously extended to a bilinear form on $L^2(D) \times L^2(D)$, and it holds

$$|\hat J''(u)[v_1, v_2]| \le c\,\|v_1\|_{L^2(D)}\,\|v_2\|_{L^2(D)} \quad \text{for all } u \in C \text{ and } v_1, v_2 \in L^\infty(D).$$

If now $\bar u \in C$ is a locally optimal solution to the nonlinear problem (4.1) in the sense of $L^\infty(D)$, then there is some radius $\rho > 0$ such that $\bar u \in L^\infty(D)$ satisfies

$$\hat J(u) \ge \hat J(\bar u) \quad \text{for all } u \in C \text{ with } \|u - \bar u\|_{L^\infty(D)} \le \rho\,.$$

Together with the variational inequality from Corollary 1.4.7,

$$\hat J'(\bar u)(u - \bar u) \ge 0 \quad \text{for all } u \in C\,,$$

we obtain the following proposition.

Proposition 4.1.1. If $\bar u \in C$ is a locally optimal solution of (4.1) in the sense of $L^\infty(D)$, then it obeys the variational inequality

$$\int_D \Big( \big(G'(\bar u)^*(G(\bar u) - y_H)\big)(x) + \kappa \bar u(x) \Big)\big(u(x) - \bar u(x)\big)\, \mathrm{d}x \;\ge\; 0 \quad \text{for all } u \in C. \qquad (4.2)$$

On the other hand, let us consider a function $\tilde u \in C$ that need not be optimal for the nonlinear problem (4.1). If $\tilde u \ne \bar u$ holds, then the (suboptimal) control $\tilde u$ does not satisfy the optimality condition (4.2). However, this can be compensated by introducing a so-called perturbation function $\zeta \in L^2(D)$, such that the perturbed variational inequality

$$\int_D \Big( \big(G'(\tilde u)^*(G(\tilde u) - y_H)\big)(x) + \kappa \tilde u(x) + \zeta(x) \Big)\big(u(x) - \tilde u(x)\big)\, \mathrm{d}x \;\ge\; 0 \quad \text{for all } u \in C \qquad (4.3)$$


is fulfilled. Consequently, $\tilde u$ satisfies the optimality condition of the perturbed optimization problem

$$\min_{u \in C} \hat J_\zeta(u) := \frac{1}{2}\|G(u) - y_H\|_H^2 + \frac{\kappa}{2}\|u\|_{L^2(D)}^2 + \int_D \zeta(x)\, u(x)\, \mathrm{d}x\,. \qquad (4.4)$$

Obviously, the smaller the perturbation function $\zeta$, the closer $\tilde u$ is to the optimal solution $\bar u$ of the original problem (4.1).
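For illustration, suppose the admissible set consists of pointwise box constraints, $C = \{u \in L^2(D) : u_a \le u \le u_b \text{ a.e.}\}$; this special case is an assumption of the following sketch, not taken from the text above. A minimal-norm discrete perturbation $\zeta$ can then be chosen pointwise from the gradient $g = p_{\tilde u} + \kappa\tilde u$, so that a discrete analogue of (4.3) holds:

```python
import numpy as np

def perturbation_zeta(g, u, ua, ub, tol=1e-12):
    """Minimal-norm pointwise perturbation zeta such that
    (g + zeta)(v - u) >= 0 for all ua <= v <= ub (discrete analogue of (4.3)).

    g : discretized gradient p_u + kappa*u at the suboptimal control u
    """
    zeta = np.zeros_like(g)
    at_lower = u <= ua + tol           # active lower bound: need g + zeta >= 0
    at_upper = u >= ub - tol           # active upper bound: need g + zeta <= 0
    inactive = ~(at_lower | at_upper)  # inactive set: need g + zeta = 0
    zeta[at_lower] = np.maximum(0.0, -g[at_lower])
    zeta[at_upper] = np.minimum(0.0, -g[at_upper])
    zeta[inactive] = -g[inactive]
    return zeta
```

On the inactive set, $\zeta$ cancels the gradient entirely, while on the active sets it only compensates a sign violation; this keeps the grid approximation of $\|\zeta\|_{L^2(D)}$, which enters the bound (4.7), as small as possible.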

To quantify the distance $\|\tilde u - \bar u\|$, we additionally need some second-order information on $\bar u$, namely the coercivity constant $\delta \in \mathbb{R}$ of $\hat J''(\bar u)$, which makes the situation more elaborate compared to the linear-quadratic approach (see Kammann et al. [KTV13]). Assume that there exists some $\delta > 0$ such that the coercivity condition

$$\hat J''(\bar u)[v, v] \ge \delta\, \|v\|_{L^2(D)}^2 \quad \text{for all } v \in L^2(D) \qquad (4.5)$$

is satisfied. Then for any $0 < \tilde\delta < \delta$ there exists a radius $r(\tilde\delta) > 0$ such that for all $u$ with $\|u - \bar u\|_{L^\infty(D)} < r(\tilde\delta)$ it holds

$$\hat J''(u)[v, v] \ge \tilde\delta\, \|v\|_{L^2(D)}^2 \quad \text{for all } v \in L^2(D), \qquad (4.6)$$

i.e., the coercivity condition also holds in a neighborhood of $\bar u$. Let us emphasize here that this is a serious theoretical obstacle that can hardly be rigorously overcome, since we can only assume that the method of determining the (suboptimal) control $\tilde u$ was sufficiently precise to guarantee $\|\tilde u - \bar u\| < r$. If $\tilde u$ belongs to this neighborhood, we are able to estimate the distance as follows:

Theorem 4.1.2. Let $\bar u$ be locally optimal for (4.1) and assume that $\bar u$ satisfies the second-order condition (4.5). If $\tilde u \in C$ is given such that $\|\tilde u - \bar u\|_{L^\infty(D)} < r(\tilde\delta)$, then it holds

$$\|\tilde u - \bar u\|_{L^2(D)} \le \frac{1}{\tilde\delta}\, \|\zeta\|_{L^2(D)}\,, \qquad (4.7)$$

where $\zeta$ is chosen such that the perturbed variational inequality (4.3) is fulfilled.

Proof. By (4.3), $\tilde u$ satisfies the first-order necessary optimality conditions for the perturbed optimization problem (4.4),

$$\min_{u \in C} \hat J_\zeta(u) = \hat J(u) + (\zeta, u)_{L^2(D)}\,.$$

We insert $\bar u$ in the variational inequality for $\tilde u$ and vice versa, obtaining

$$\big(\hat J'(\tilde u) + \zeta,\, \bar u - \tilde u\big)_{L^2(D)} \ge 0\,, \qquad \big(\hat J'(\bar u),\, \tilde u - \bar u\big)_{L^2(D)} \ge 0\,.$$

Now we add both inequalities and get

$$\big(\hat J'(\tilde u) - \hat J'(\bar u),\, \bar u - \tilde u\big)_{L^2(D)} + \big(\zeta,\, \bar u - \tilde u\big)_{L^2(D)} \ge 0\,.$$

The mean value theorem implies

$$-\hat J''(\xi)[\bar u - \tilde u,\, \bar u - \tilde u] + (\zeta,\, \bar u - \tilde u)_{L^2(D)} \ge 0$$

with some $\xi \in \{v \in L^2(D) \mid v = s\bar u + (1 - s)\tilde u \text{ with } s \in (0,1)\}$. Now we apply (4.6) and the Cauchy-Schwarz inequality to deduce

$$\tilde\delta\, \|\tilde u - \bar u\|_{L^2(D)}^2 \le \|\zeta\|_{L^2(D)}\, \|\tilde u - \bar u\|_{L^2(D)}\,.$$

From this inequality, the assertion of the theorem follows.

Remark 4.1.3. In Kammann et al. [KTV13, Remark 3.3] the authors suggest selecting $\tilde\delta := \frac{\delta}{2}$ and setting the radius $r := r(\frac{\delta}{2})$, which might be a too pessimistic choice. Since in applications the main interest lies in the order of the error, the factor $\frac{1}{2}$ is not that important, and also $\tilde\delta := \delta$ is used, even though this might be a slightly too optimistic choice.
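Once $\zeta$ and (an estimate of) the coercivity constant $\delta$ are available, evaluating the bound (4.7) is a one-line computation. The sketch below is illustrative, not from the source: the quadrature-weight vector approximating the $L^2(D)$ norm is an assumption, and the flag switches between the choices $\tilde\delta = \delta/2$ and $\tilde\delta = \delta$ discussed in the remark.

```python
import numpy as np

def apost_error_bound(zeta, weights, delta, pessimistic=True):
    """A-posteriori bound (4.7): ||u_tilde - u_bar||_{L2} <= ||zeta||_{L2} / delta_tilde.

    zeta    : grid values of the perturbation function
    weights : quadrature weights approximating the L2(D) inner product
    delta   : (estimated) coercivity constant of J''(u_bar)
    """
    zeta_norm = np.sqrt(np.sum(weights * zeta**2))
    # delta/2 per [KTV13, Remark 3.3] (pessimistic) vs. delta (optimistic)
    delta_tilde = delta / 2.0 if pessimistic else delta
    return zeta_norm / delta_tilde
```

Since only the order of magnitude of the error matters in practice, the two variants differ merely by the constant factor 2.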