
Universität Konstanz

Greedy Sampling using Nonlinear Optimization

Karsten Urban, Stefan Volkwein, Oliver Zeeb

Konstanzer Schriften in Mathematik Nr. 308, November 2012

ISSN 1430-3558

© Fachbereich Mathematik und Statistik, Universität Konstanz

Fach D 197, 78457 Konstanz, Germany

Konstanzer Online-Publikations-System (KOPS) URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-208051


Greedy Sampling using Nonlinear Optimization

Karsten Urban, Stefan Volkwein, and Oliver Zeeb

Abstract. We consider the reduced basis generation in the offline stage. As an alternative to standard Greedy-training methods based upon a-posteriori error estimates on a training subset of the parameter set, we consider a nonlinear optimization combined with a Greedy method. We define an optimization problem for selecting a new parameter value on a given reduced space. This new parameter is then used – in a Greedy fashion – to determine the corresponding snapshot and to update the reduced basis. We show the well-posedness of this nonlinear optimization problem and derive first- and second-order optimality conditions. Numerical comparisons with the standard Greedy-training method are shown.

Key words: Reduced basis method, Greedy algorithm, nonlinear optimization, a-posteriori error

Karsten Urban
Universität Ulm, Institute for Numerical Mathematics, Helmholtzstraße 20, D-89069 Ulm, Germany, e-mail: Karsten.Urban@uni-ulm.de

Stefan Volkwein
University of Constance, Department of Mathematics and Statistics, Universitätsstraße 10, D-78457 Konstanz, Germany, e-mail: Stefan.Volkwein@uni-konstanz.de

Oliver Zeeb
Universität Ulm, Institute for Numerical Mathematics, Helmholtzstraße 22, D-89069 Ulm, Germany, e-mail: Oliver.Zeeb@uni-ulm.de

1 Introduction

Reduced Basis Methods (RBM) are nowadays a well-known tool to solve parametric partial differential equations (PPDEs) in cases where the PPDE has to be solved for various values of the parameters (the so-called multi-query context, e.g. in optimization) or when the solution for different parameter values has to be computed extremely efficiently (the real-time context), see e.g. [12]. A key ingredient is an offline-online decomposition. In the offline stage, detailed and thus expensive simulations (sometimes called truth) are computed for a moderate number of parameters $\mu_1,\dots,\mu_N$. The arising solutions $u(\mu_i)$, $i=1,\dots,N$, of the PPDE (sometimes called snapshots) are stored and used to form a low-dimensional linear space spanned by the reduced basis. In the online stage, an approximation $u_N(\mu)$ for a new parameter $\mu\neq\mu_i$ is determined as the Galerkin projection onto the reduced space $V_N=\operatorname{span}\{u(\mu_i):i=1,\dots,N\}$. A whole variety of results for all sorts of problems has been published in recent years, so that even a halfway complete review including a reference list is far beyond the scope of this paper.

The topic of this paper is the generation of the reduced basis in the offline stage, namely the selection of $\mu_1,\dots,\mu_N$ above. It is nowadays basically standard to use a Greedy method, see e.g. [9]. The starting point is an a-posteriori error estimator $\Delta_N(\mu)$ for the quantity of interest on a current reduced space $V_N$. Such an estimator can often be constructed in such a way that its evaluation for a given parameter $\mu$ is highly efficient (in particular, independent of the size of the truth system). A training set $\Xi_{\mathrm{train}}$ is defined and the error estimator $\Delta_N(\mu)$ is maximized over $\Xi_{\mathrm{train}}$. The arising maximizer $\mu_{N+1}$ is used to compute the next snapshot $u(\mu_{N+1})$ in order to form the reduced space $V_{N+1}$ of the next higher dimension. We refer to this approach as Greedy-training.
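The following sketch makes this Greedy-training loop concrete (Python; `truth_solve`, `error_estimator` and the starting parameter `mu_0` are hypothetical stand-ins for the truth solver, the estimator $\Delta_N(\mu)$ and a chosen initial parameter):

```python
import numpy as np

def greedy_training(Xi_train, mu_0, N_max, tol, truth_solve, error_estimator):
    """Standard Greedy-training: grow the reduced basis by repeatedly
    maximizing the a-posteriori error estimator over the training set."""
    snapshots = [truth_solve(mu_0)]                  # first snapshot u(mu_0)
    for N in range(1, N_max):
        # the estimator is cheap to evaluate, so scanning Xi_train is feasible
        deltas = np.array([error_estimator(mu, snapshots) for mu in Xi_train])
        k = int(np.argmax(deltas))
        if deltas[k] <= tol:                         # accurate everywhere: stop
            break
        snapshots.append(truth_solve(Xi_train[k]))   # Greedy snapshot update
    return snapshots
```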

Even though this approach obviously has the advantage of being efficiently realizable, it may also suffer from the following fact: the training set $\Xi_{\mathrm{train}}$ needs to be defined. This may be a delicate task, since $\Xi_{\mathrm{train}}$ should be small for efficiency reasons and at the same time sufficiently large in order to represent the whole parameter range as well as possible. The performance of the RBM crucially depends on the choice of $\Xi_{\mathrm{train}}$.

This is the starting point of the present paper. Instead of maximizing the error estimator $\Delta_N(\mu)$ over $\Xi_{\mathrm{train}}$, we develop a nonlinear optimization problem with respect to $\mu$ on $V_N$ based upon the residual of the primal (and possibly the dual) problem. We show the well-posedness of this optimization problem and derive first-order optimality conditions. The optimization problem is solved numerically by a gradient-type method. This method suffers from the fact that we can only determine local but not global solutions. To overcome this problem we combine the optimization strategy with a Greedy training on a coarse training set $\Xi_{\mathrm{train}}$; a sketch of the combined loop is given below.
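A sketch of this combination, under the assumption that `optimize_J_hat` is a hypothetical gradient-type local solver for the minimization problem developed below and that the remaining helpers are as in the previous sketch (the coarse set only supplies starting points):

```python
import numpy as np

def optimization_greedy(Xi_coarse, mu_0, N_max, tol,
                        truth_solve, error_estimator, optimize_J_hat):
    """Greedy loop in which each new parameter is a local minimizer of the
    residual-based cost J_hat, started from the best point of a coarse
    training set to reduce the risk of ending in a poor local solution."""
    snapshots = [truth_solve(mu_0)]
    for N in range(1, N_max):
        scores = [error_estimator(mu, snapshots) for mu in Xi_coarse]
        mu_start = Xi_coarse[int(np.argmax(scores))]   # coarse Greedy start
        mu_new, J_val = optimize_J_hat(mu_start, snapshots)
        if -J_val <= tol:        # -J_hat measures the squared residual size
            break
        snapshots.append(truth_solve(mu_new))
    return snapshots
```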

Let us refer to the recent work [2, 3, 4, 8], where adaptive strategies for the Greedy-training are suggested to overcome the problems with high-dimensional parameter spaces. In the context of the method of proper orthogonal decomposition (POD), nonlinear optimization is utilized in [7] to determine optimal snapshot locations in order to control the number of snapshots and minimize the error in the POD reduced-order model.

The remainder of the paper is organized as follows. In Section 2, we review the basic ingredients of the RBM and develop the nonlinear optimization problem (which, in fact, is a minimization problem). We also prove the existence of a solution (Theorem 2.1). Section 3 is devoted to the derivation of first-order optimality conditions (Theorem 3.1), while second-order conditions are discussed in Section 4.


Finally, in Section 5 we report on numerical experiments in which we compare the optimization method with the known Greedy-training approach.

2 Problem formulation

In this section we introduce our minimization problem and discuss the existence of optimal solutions.

2.1 The exact variational problem

Let $\mathcal D\subset\mathbb R^P$ be a given nonempty, closed, bounded and convex parameter domain and $V$ a separable Hilbert space. For given $\ell\in V'$ ($V'$ denotes the space of all bounded linear functionals on $V$ with norm $\|\cdot\|_{V'}$ and scalar product $(\cdot\,,\cdot)_{V'}$), the goal is to find the scalar output
\[
  s(\mu) := \langle \ell, u(\mu)\rangle_{V',V}, \qquad \mu\in\mathcal D, \tag{1a}
\]
where $u(\mu)\in V$ satisfies the variational problem ($f\in V'$ given)
\[
  a(u(\mu),\varphi;\mu) = \langle f,\varphi\rangle_{V',V} \quad\text{for all }\varphi\in V. \tag{1b}
\]
In (1a), we denote by $\langle\cdot\,,\cdot\rangle_{V',V}$ the dual pairing of the spaces $V'$ and $V$. Furthermore, in (1b) the parameter-dependent bilinear form $a(\cdot\,,\cdot;\mu):V\times V\to\mathbb R$ is assumed to have the affine form
\[
  a(\varphi,\psi;\mu) = \sum_{q=1}^{Q} \vartheta_q(\mu)\, a_q(\varphi,\psi) \quad\text{for }\varphi,\psi\in V\text{ and }\mu\in\mathcal D
\]
with (twice) continuously differentiable coefficient functions $\vartheta_q:\mathcal D\to\mathbb R$ and with parameter-independent bounded bilinear forms $a_q:V\times V\to\mathbb R$, $1\le q\le Q$. Moreover, we assume that the parameter-dependent bilinear form $a$ is uniformly bounded and coercive, i.e., there exist constants $\alpha_0>0$ and $\gamma>0$ such that
\[
  \alpha(\mu) := \inf_{\varphi\in V\setminus\{0\}} \frac{a(\varphi,\varphi;\mu)}{\|\varphi\|_V^2} \ge \alpha_0 > 0 \quad\text{for all }\mu\in\mathcal D, \tag{2a}
\]
\[
  |a(\varphi,\phi;\mu)| \le \gamma\,\|\varphi\|_V\|\phi\|_V \quad\text{for all }\varphi,\phi\in V\text{ and }\mu\in\mathcal D. \tag{2b}
\]
Since the bilinear forms $a_q$ are bounded, we assume that
\[
  |a_q(\varphi,\phi)| \le \gamma\,\|\varphi\|_V\|\phi\|_V \quad\text{for all }\varphi,\phi\in V\text{ and for }1\le q\le Q. \tag{3}
\]
Notice that (2a) implies
\[
  a(\varphi,\varphi;\mu) \ge \alpha_0\,\|\varphi\|_V^2 \quad\text{for all }\varphi\in V\text{ and for all }\mu\in\mathcal D. \tag{4}
\]
Let us mention that we suppose that both $f$ and $\ell$ do not depend on $\mu$ only to simplify the presentation. From (2a) it follows by standard arguments that (1b) has a unique solution $u(\mu)\in V$ for any $\mu\in\mathcal D$.

Due to (1a) we require the following dual problem: for given $\mu\in\mathcal D$ find $z(\mu)\in V$ solving
\[
  a(\varphi,z(\mu);\mu) = -\langle \ell,\varphi\rangle_{V',V} \quad\text{for all }\varphi\in V. \tag{5}
\]
Since the bilinear form $a(\cdot\,,\cdot;\mu)$ is bounded and uniformly coercive, the dual problem (5) possesses a unique solution $z(\mu)\in V$ for any $\mu\in\mathcal D$.

2.2 The truth approximation

Next we introduce a so-called truth approximation for (1). For that purpose let $V^{\mathcal N}=\operatorname{span}\{\varphi_1,\dots,\varphi_{\mathcal N}\}\subset V$ be a finite-dimensional subspace with linearly independent functions $\varphi_i$. The subspace $V^{\mathcal N}$ is endowed with the topology of $V$. We think of $\mathcal N\gg 1$ being 'large'. Then, for any $\mu\in\mathcal D$ we consider the 'truth' output
\[
  s^{\mathcal N}(\mu) := \langle \ell, u^{\mathcal N}(\mu)\rangle_{V',V}, \tag{6a}
\]
where $u^{\mathcal N}(\mu)\in V^{\mathcal N}$ satisfies the variational equation
\[
  a(u^{\mathcal N}(\mu),\varphi_i;\mu) = \langle f,\varphi_i\rangle_{V',V} \quad\text{for }1\le i\le\mathcal N. \tag{6b}
\]
We define the discrete coercivity constant
\[
  \alpha^{\mathcal N}(\mu) := \inf_{\varphi^{\mathcal N}\in V^{\mathcal N}\setminus\{0\}} \frac{a(\varphi^{\mathcal N},\varphi^{\mathcal N};\mu)}{\|\varphi^{\mathcal N}\|_V^2}, \qquad \mu\in\mathcal D.
\]
Using $V^{\mathcal N}\subset V$ and (2a) we find
\[
  \alpha^{\mathcal N}(\mu) \ge \inf_{\varphi\in V\setminus\{0\}} \frac{a(\varphi,\varphi;\mu)}{\|\varphi\|_V^2} \ge \alpha_0 \quad\text{for all }\mu\in\mathcal D.
\]
Thus, (6b) has a unique solution $u^{\mathcal N}(\mu)\in V^{\mathcal N}$ for every $\mu\in\mathcal D$.
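Thanks to the affine form of $a$, the truth system matrix can be assembled as a $\mu$-weighted sum of precomputed matrices. A minimal sketch, assuming the parameter-independent sparse matrices `A_q` (with entries $a_q(\varphi_j,\varphi_i)$) and the load vector `f_vec` (with entries $\langle f,\varphi_i\rangle_{V',V}$) have been assembled once:

```python
import scipy.sparse.linalg as spla

def truth_solve(mu, theta, A_q, f_vec):
    """Solve the truth Galerkin system (6b):
    assemble A(mu) = sum_q theta_q(mu) A_q and solve A(mu) u = f."""
    coeffs = theta(mu)                        # (theta_1(mu), ..., theta_Q(mu))
    A_mu = coeffs[0] * A_q[0]
    for c, A in zip(coeffs[1:], A_q[1:]):
        A_mu = A_mu + c * A                   # affine assembly, no re-integration
    return spla.spsolve(A_mu.tocsc(), f_vec)  # truth snapshot u^N(mu)
```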

2.3 The reduced-order modelling

Let us introduce a reduced-order scheme for (6). For chosen linearly independent elements $\{\psi_i\}_{i=1}^{N_{\mathrm{pr}}}$ in $V^{\mathcal N}$ we define $V_N^{\mathrm{pr}} := \operatorname{span}\{\psi_1,\dots,\psi_{N_{\mathrm{pr}}}\}$. Analogously, for linearly independent $\{\phi_i\}_{i=1}^{N_{\mathrm{du}}}$ in $V^{\mathcal N}$ we set $\tilde V_N^{\mathrm{du}} := \operatorname{span}\{\phi_1,\dots,\phi_{N_{\mathrm{du}}}\}$. We have $\max(N_{\mathrm{pr}},N_{\mathrm{du}})\le\mathcal N$. In the context of reduced-order modeling, $\max(N_{\mathrm{pr}},N_{\mathrm{du}})$ is much smaller than $\mathcal N$.

For any $\mu\in\mathcal D$ we consider the scalar output
\[
  \langle \ell, u_N(\mu)\rangle_{V',V}, \tag{7a}
\]
where $u_N(\mu)\in V_N^{\mathrm{pr}}$ satisfies the variational equation
\[
  a(u_N(\mu),\psi_i;\mu) = \langle f,\psi_i\rangle_{V',V} \quad\text{for }1\le i\le N_{\mathrm{pr}}. \tag{7b}
\]
For notational convenience, we just write $u_N$ instead of $u_{N_{\mathrm{pr}}}$ (also for other quantities), since there should be no misunderstanding. We collect some more or less known facts for later reference.

Lemma 2.1. Suppose that the bilinear form $a(\cdot\,,\cdot;\mu)$ satisfies (2). Further, $f\in V'$ holds. Then, there exists a unique solution $u_N(\mu)\in V_N^{\mathrm{pr}}$ to (7b) for every $\mu\in\mathcal D$ with
\[
  \|u_N(\mu)\|_V \le \frac{\|f\|_{V'}}{\alpha_0} \quad\text{for all }\mu\in\mathcal D. \tag{8}
\]

Proof. By assumption, the bilinear form $a(\cdot\,,\cdot;\mu)$ is bounded for every $\mu\in\mathcal D$. Since $V_N^{\mathrm{pr}}\subset V$, the form $a(\cdot\,,\cdot;\mu)$ is also uniformly coercive on $V_N^{\mathrm{pr}}$. Thus, it follows from the Lax-Milgram theorem that (7b) possesses a unique solution $u_N\in V_N^{\mathrm{pr}}$ for every $\mu\in\mathcal D$. Utilizing (4), (7b) and the uniform coercivity, we obtain
\[
  \|u_N(\mu)\|_V^2 \le \frac{a(u_N(\mu),u_N(\mu);\mu)}{\alpha_0} = \frac{\langle f,u_N(\mu)\rangle_{V',V}}{\alpha_0} \le \frac{\|f\|_{V'}}{\alpha_0}\,\|u_N(\mu)\|_V,
\]
which gives (8).

Remark 2.1. 1) Due to Lemma 2.1 we can define the primal (non-linear) solution operator $S_N^{\mathrm{pr}}:\mathcal D\to V_N^{\mathrm{pr}}$, where $u_N(\mu)=S_N^{\mathrm{pr}}(\mu)$ denotes the unique solution to (7b).

2) Let us consider a specific case. Suppose that the bilinear form is given by $a(\cdot\,,\cdot;\mu)=\vartheta_1(\mu)\,a_1(\cdot\,,\cdot)$ (i.e., $Q=1$) and $\vartheta_1(\mu)\neq 0$ holds for all $\mu\in\mathcal D$. Let $u_N^1=u_N(\mu_1)$ be a solution to (7b) for given $\mu_1\in\mathcal D$. Then, the function $u_N^2=\vartheta_1(\mu_1)u_N^1/\vartheta_1(\mu_2)\in V_N$ solves (7b) for $\mu_2\in\mathcal D$. In fact, we have
\[
  a(u_N^2,\psi_i;\mu_2) = \vartheta_1(\mu_2)\,a_1(u_N^2,\psi_i) = \vartheta_1(\mu_1)\,a_1(u_N^1,\psi_i) = a(u_N^1,\psi_i;\mu_1) = \langle f,\psi_i\rangle_{V',V} \quad\text{for }1\le i\le N_{\mathrm{pr}}.
\]
Consequently, solutions to different parameter values are linearly dependent. ♦

For given $\mu\in\mathcal D$ the associated dual variable $z_N(\mu)$ solves the dual problem [1], namely
\[
  a(\phi_i,z_N(\mu);\mu) = -\langle \ell,\phi_i\rangle_{V',V}, \quad 1\le i\le N_{\mathrm{du}}. \tag{9}
\]


Remark 2.2. 1) If the bilinear form satisfies (2) and $\ell\in V'$ holds, it follows by similar arguments as in the proof of Lemma 2.1 that (9) admits a unique solution $z_N(\mu)\in\tilde V_N^{\mathrm{du}}$ satisfying
\[
  \|z_N(\mu)\|_V \le \frac{\|\ell\|_{V'}}{\alpha_0} \quad\text{for all }\mu\in\mathcal D. \tag{10}
\]
2) We define the dual (non-linear) solution operator $S_N^{\mathrm{du}}:\mathcal D\to\tilde V_N^{\mathrm{du}}$, where $z_N=S_N^{\mathrm{du}}(\mu)$ is the unique solution to (9). ♦

Next we define the residuals $r_N^{\mathrm{pr}}(\cdot\,;\mu),\,r_N^{\mathrm{du}}(\cdot\,;\mu)\in (V^{\mathcal N})'$ by
\[
  r_N^{\mathrm{pr}}(\varphi^{\mathcal N};\mu) := \langle f,\varphi^{\mathcal N}\rangle_{V',V} - a(u_N(\mu),\varphi^{\mathcal N};\mu) \quad\text{for }\varphi^{\mathcal N}\in V^{\mathcal N}\text{ and }\mu\in\mathcal D,
\]
\[
  r_N^{\mathrm{du}}(\varphi^{\mathcal N};\mu) := \langle \ell,\varphi^{\mathcal N}\rangle_{V',V} + a(\varphi^{\mathcal N},z_N(\mu);\mu) \quad\text{for }\varphi^{\mathcal N}\in V^{\mathcal N}\text{ and }\mu\in\mathcal D.
\]
It has turned out that the primal-dual output defined as
\[
  s_N(\mu) := \langle \ell, u_N(\mu)\rangle_{V',V} - r_N^{\mathrm{pr}}(z_N(\mu);\mu)
\]
gives rise to favorable output error estimates, which take the form (see [12], for instance)
\[
  |s^{\mathcal N}(\mu) - s_N(\mu)| \le \Delta_N^s(\mu) = \frac{\|r_N^{\mathrm{pr}}(\cdot\,;\mu)\|_{(V^{\mathcal N})'}}{\alpha_0^{1/2}} \cdot \frac{\|r_N^{\mathrm{du}}(\cdot\,;\mu)\|_{(V^{\mathcal N})'}}{\alpha_0^{1/2}}. \tag{11}
\]
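A sketch of evaluating the bound (11): the dual norms are computed via Riesz representers with respect to the $V$ inner product, whose Gram matrix on the truth space is denoted `X_mat` here (a hypothetical name; in practice the norms are evaluated with the offline-online decomposition of Remark 2.3):

```python
import numpy as np
import scipy.sparse.linalg as spla

def dual_norm(r_vec, X_mat):
    """||r||_{(V^N)'} = sqrt(r^T X^{-1} r): solve for the Riesz representer
    rho = X^{-1} r and take (rho, rho)_V^{1/2} = (r^T rho)^{1/2}."""
    rho = spla.spsolve(X_mat.tocsc(), r_vec)
    return float(np.sqrt(r_vec @ rho))

def output_error_estimate(r_pr_vec, r_du_vec, X_mat, alpha_0):
    """Delta_N^s(mu) from (11): product of the primal and dual residual
    dual norms, divided by the coercivity lower bound alpha_0."""
    return dual_norm(r_pr_vec, X_mat) * dual_norm(r_du_vec, X_mat) / alpha_0
```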

Remark 2.3. 1) From
\[
  u_N(\mu) = \sum_{j=1}^{N_{\mathrm{pr}}} u_{N,j}(\mu)\,\psi_j \quad\text{and}\quad z_N(\mu) = \sum_{j=1}^{N_{\mathrm{du}}} z_{N,j}(\mu)\,\phi_j
\]
we infer that
\[
  r_N^{\mathrm{pr}}(\varphi_i;\mu) = \langle f,\varphi_i\rangle_{V',V} - \sum_{j=1}^{N_{\mathrm{pr}}} u_{N,j}(\mu)\,a(\psi_j,\varphi_i;\mu) = \langle f,\varphi_i\rangle_{V',V} - \sum_{j=1}^{N_{\mathrm{pr}}} u_{N,j}(\mu) \sum_{q=1}^{Q} \vartheta_q(\mu)\,a_q(\psi_j,\varphi_i),
\]
\[
  r_N^{\mathrm{du}}(\varphi_i;\mu) = \langle \ell,\varphi_i\rangle_{V',V} + a(\varphi_i,z_N(\mu);\mu) = \langle \ell,\varphi_i\rangle_{V',V} + \sum_{j=1}^{N_{\mathrm{du}}} z_{N,j}(\mu) \sum_{q=1}^{Q} \vartheta_q(\mu)\,a_q(\varphi_i,\phi_j)
\]
for $1\le i\le\mathcal N$. These representations of the residuals are utilized to realize an efficient offline-online decomposition for the reduced-order approach, see e.g. [9, 12].


2) Suppose that the bilinear form is given by $a(\cdot\,,\cdot;\mu)=\vartheta_1(\mu)\,a_1(\cdot\,,\cdot)$ (i.e., $Q=1$) and $\vartheta_1(\mu)\neq 0$ holds for all $\mu\in\mathcal D$. Then, solutions to different parameter values are linearly dependent; see Remark 2.1-2). Let $\mu_1,\mu_2\in\mathcal D$ be chosen arbitrarily. By $u_N^i$, $i=1,2$, we denote the solutions to (7b) for parameter $\mu=\mu_i$. From $u_N^2=\vartheta_1(\mu_1)u_N^1/\vartheta_1(\mu_2)$ we infer that
\[
  V' \ni a(u_N^2,\cdot\,;\mu_2) - f = \frac{\vartheta_1(\mu_1)}{\vartheta_1(\mu_2)}\,a(u_N^1,\cdot\,;\mu_2) - f = a(u_N^1,\cdot\,;\mu_1) - f.
\]
Hence, the norm $\|a(u_N(\mu),\cdot\,;\mu)-f\|_{(V^{\mathcal N})'}$ is constant for all $\mu\in\mathcal D$, where $u_N(\mu)$ denotes the solution to (7b) for the parameter $\mu$. Analogously, we can prove that the norm $\|a(\cdot\,,z_N(\mu);\mu)+\ell\|_{(V^{\mathcal N})'}$ is constant for all $\mu\in\mathcal D$, where $z_N(\mu)$ denotes the solution to (9) for the parameter $\mu$. ♦

2.4 The minimization problem

Let $N := (N_{\mathrm{pr}},N_{\mathrm{du}})$, $Y_N := V_N^{\mathrm{pr}}\times\tilde V_N^{\mathrm{du}}$, $X_N = Y_N\times\mathbb R^P$ and $X_N^{\mathrm{ad}} = Y_N\times\mathcal D$. We endow $X_N$ with the natural product topology. In the Greedy algorithm a new reduced-basis solution $u_N(\bar\mu)$ associated with a certain parameter value $\bar\mu$ is added to the already computed set of ansatz functions provided an a-posteriori error measure $\Delta_N^s(\bar\mu)$ in (11) is maximal. The idea here is to avoid the Greedy method and to determine $\bar\mu$ as the solution of a minimization problem. Thus, we introduce the cost functional $J:X_N\to\mathbb R$ for $x_N=(u_N,z_N,\mu)\in X_N$ by
\[
  J(x_N) = -\frac12\,\Big( \|f - a(u_N,\cdot\,;\mu)\|_{(V^{\mathcal N})'}^2 + \|\ell + a(\cdot\,,z_N;\mu)\|_{(V^{\mathcal N})'}^2 \Big).
\]

If $J(x_N(\mu)) \ge -\varepsilon\,\alpha_0$ holds true for $x_N(\mu) := (u_N(\mu),z_N(\mu),\mu)$, we infer by using Young's inequality and (11) that
\[
  |s^{\mathcal N}(\mu) - s_N(\mu)| \le \frac{\|r_N^{\mathrm{pr}}(\cdot\,;\mu)\|_{(V^{\mathcal N})'}^2 + \|r_N^{\mathrm{du}}(\cdot\,;\mu)\|_{(V^{\mathcal N})'}^2}{2\alpha_0} = \frac{-J(x_N(\mu))}{\alpha_0} \le \varepsilon.
\]

Now we consider the following optimization problem:
\[
  \min_{x_N\in X_N^{\mathrm{ad}}} J(x_N) \quad\text{subject to (s.t.)}\quad x_N=(y_N,\mu),\; y_N=S_N(\mu), \tag{P}
\]
where we have set $S_N=(S_N^{\mathrm{pr}},S_N^{\mathrm{du}}):\mathcal D\to Y_N$, i.e., $y_N=S_N(\mu)$ means that $y_N=(u_N(\mu),z_N(\mu))$. Introducing the reduced cost functional
\[
  \hat J(\mu) := J(S_N(\mu),\mu) \quad\text{for }\mu\in\mathcal D,
\]
we can express (P) equivalently in the reduced form
\[
  \min_{\mu\in\mathcal D} \hat J(\mu). \tag{$\hat{\mathrm P}$}
\]
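Evaluating $\hat J$ therefore only requires the two reduced solves and the two residual dual norms. A sketch (hypothetical helpers: `solve_reduced` returns $(u_N(\mu),z_N(\mu))$, and `primal_res_norm`, `dual_res_norm` are the residual dual norms, e.g. realized offline-online as above):

```python
def J_hat(mu, solve_reduced, primal_res_norm, dual_res_norm):
    """Reduced cost J_hat(mu) = J(S_N(mu), mu)
    = -(||r_pr(.;mu)||^2 + ||r_du(.;mu)||^2) / 2.
    Minimizing J_hat over D drives mu to where the current
    reduced space approximates worst."""
    u_N, z_N = solve_reduced(mu)     # y_N = S_N(mu)
    return -0.5 * (primal_res_norm(mu, u_N)**2 + dual_res_norm(mu, z_N)**2)
```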

If ($\hat{\mathrm P}$) has a local solution $\bar\mu\in\mathcal D$, then $\bar x_N := (\bar y_N,\bar\mu)$ is a local solution to (P), where we set $\bar y_N=(\bar u_N,\bar z_N) := S_N(\bar\mu)$. We now give a general existence result.

Theorem 2.1. Suppose that the bilinear form $a(\cdot\,,\cdot;\mu)$ satisfies (2). Further, $f$ and $\ell$ belong to $V'$. Then, there exists at least one optimal solution $\bar x_N=(\bar y_N,\bar\mu)$, $\bar y_N=(\bar u_N,\bar z_N)\in Y_N$, to (P).

Proof. Since $\mathcal D$ is assumed to be nonempty and $S_N:\mathcal D\to Y_N$ is well-defined, the set of admissible solutions
\[
  \mathcal F(\mathrm P) = \big\{\, x_N=(y_N,\mu)\in X_N^{\mathrm{ad}} \;\big|\; y_N=S_N(\mu) \,\big\}
\]
is nonempty. Let $\{x_N^{(n)}\}_{n\in\mathbb N}\subset\mathcal F(\mathrm P)$, $x_N^{(n)}=(y_N^{(n)},\mu^{(n)})$ and $y_N^{(n)}=(u_N^{(n)},z_N^{(n)})$, be a minimizing sequence for $J$:
\[
  \inf_{x_N\in\mathcal F(\mathrm P)} J(x_N) = \lim_{n\to\infty} J\big(x_N^{(n)}\big).
\]
Since $\mathcal D$ is bounded and the a-priori bounds (8), (10) hold, $\inf_{x_N\in\mathcal F(\mathrm P)}J(x_N)$ is bounded from below. Moreover, from $\mu^{(n)}\in\mathcal D\subset\mathbb R^P$ for every $n$ we infer that there exist a subsequence $\{\mu^{(n_k)}\}_{k\in\mathbb N}$ in $\mathcal D$ and an element $\bar\mu\in\mathcal D$ so that
\[
  \lim_{k\to\infty} \mu^{(n_k)} = \bar\mu \quad\text{in }\mathbb R^P.
\]
It follows from the a-priori estimates (8) and (10) that the sequence $\{(u_N^{(n)},z_N^{(n)})\}_{n\in\mathbb N}$ is bounded in $Y_N$. Consequently, there exist a subsequence $\{y_N^{(n_k)}\}_{k\in\mathbb N}$ and a pair $\bar y_N=(\bar u_N,\bar z_N)\in Y_N$ such that
\[
  u_N^{(n_k)} \rightharpoonup \bar u_N \;\text{ for }k\to\infty\text{ in }V_N^{\mathrm{pr}} \quad\text{and}\quad z_N^{(n_k)} \rightharpoonup \bar z_N \;\text{ for }k\to\infty\text{ in }\tilde V_N^{\mathrm{du}}. \tag{12}
\]
Next we prove that $\bar y_N=S_N(\bar\mu)$ holds. For $1\le i\le N_{\mathrm{pr}}$ we have
\[
\begin{aligned}
  \langle f,\psi_i\rangle_{V',V} - a(\bar u_N,\psi_i;\bar\mu)
  &= a\big(u_N^{(n_k)},\psi_i;\mu^{(n_k)}\big) - a(\bar u_N,\psi_i;\bar\mu)\\
  &= a\big(u_N^{(n_k)},\psi_i;\mu^{(n_k)}\big) - a\big(u_N^{(n_k)},\psi_i;\bar\mu\big) + a\big(u_N^{(n_k)}-\bar u_N,\psi_i;\bar\mu\big)\\
  &= \sum_{q=1}^{Q} \big(\vartheta_q(\mu^{(n_k)})-\vartheta_q(\bar\mu)\big)\,a_q\big(u_N^{(n_k)},\psi_i\big) + a\big(u_N^{(n_k)}-\bar u_N,\psi_i;\bar\mu\big).
\end{aligned}
\]
Let us define the functionals $F_i\in V'\subset (V^{\mathcal N})'$ by $\langle F_i,\varphi\rangle_{V',V} := a(\varphi,\psi_i;\bar\mu)$ for $\varphi\in V$ and $1\le i\le N_{\mathrm{pr}}$. From (12) we infer that
\[
  a\big(u_N^{(n_k)}-\bar u_N,\psi_i;\bar\mu\big) = F_i\big(u_N^{(n_k)}-\bar u_N\big) \to 0 \quad\text{for }k\to\infty\text{ and }1\le i\le N_{\mathrm{pr}}.
\]
Moreover, $\|u_N^{(n_k)}\|_V$ is uniformly bounded and the $\vartheta_q$'s are continuous. Thus,
\[
  \sum_{q=1}^{Q} \big(\vartheta_q(\mu^{(n_k)})-\vartheta_q(\bar\mu)\big)\,a_q\big(u_N^{(n_k)},\psi_i\big) \to 0 \quad\text{for }k\to\infty\text{ and }1\le i\le N_{\mathrm{pr}}.
\]
Consequently, $\bar u_N=S_N^{\mathrm{pr}}(\bar\mu)$ holds. Analogously, we find that $\bar z_N=S_N^{\mathrm{du}}(\bar\mu)$ holds true. Thus, $\bar x_N=(\bar y_N,\bar\mu)\in\mathcal F(\mathrm P)$ is satisfied. Next, we show that $\bar x_N$ is a minimizer for $J$. Note that with the above arguments
\[
  \big\|a\big(u_N^{(n_k)},\cdot\,;\bar\mu\big) - a\big(u_N^{(n_k)},\cdot\,;\mu^{(n_k)}\big)\big\|_{(V^{\mathcal N})'} \le \sum_{q=1}^{Q} \big|\vartheta_q(\bar\mu)-\vartheta_q(\mu^{(n_k)})\big|\,\big\|a_q\big(u_N^{(n_k)},\cdot\big)\big\|_{(V^{\mathcal N})'} \xrightarrow{\,k\to\infty\,} 0.
\]
This and (12) imply
\[
\begin{aligned}
  \lim_{k\to\infty} \big\|f - a\big(u_N^{(n_k)},\cdot\,;\mu^{(n_k)}\big)\big\|_{(V^{\mathcal N})'}
  &= \lim_{k\to\infty} \big\|f - a\big(u_N^{(n_k)},\cdot\,;\bar\mu\big)\big\|_{(V^{\mathcal N})'} + \lim_{k\to\infty} \big\|a\big(u_N^{(n_k)},\cdot\,;\bar\mu\big) - a\big(u_N^{(n_k)},\cdot\,;\mu^{(n_k)}\big)\big\|_{(V^{\mathcal N})'}\\
  &= \|f - a(\bar u_N,\cdot\,;\bar\mu)\|_{(V^{\mathcal N})'}.
\end{aligned}
\]
Analogously, $\lim_{k\to\infty}\|\ell + a(\cdot\,,z_N^{(n_k)};\mu^{(n_k)})\|_{(V^{\mathcal N})'} = \|\ell + a(\cdot\,,\bar z_N;\bar\mu)\|_{(V^{\mathcal N})'}$ and therefore
\[
  \inf_{x_N\in\mathcal F(\mathrm P)} J(x_N) = \lim_{k\to\infty} J\big(x_N^{(n_k)}\big) = J(\bar x_N),
\]
i.e., $\bar x_N$ is a solution to (P). ♦

Before we continue, let us collect some notation that will be needed in the sequel. Let $\bar x_N=(\bar y_N,\bar\mu)$, $\bar y_N=(\bar u_N,\bar z_N)$, be an optimal solution to (P) according to Theorem 2.1. Then, define the corresponding (optimal) primal and dual residuals as
\[
  \bar r_N^{\mathrm{pr}}(\varphi^{\mathcal N}) := \langle f,\varphi^{\mathcal N}\rangle_{V',V} - a(\bar u_N,\varphi^{\mathcal N};\bar\mu) \quad\text{for }\varphi^{\mathcal N}\in V^{\mathcal N},
\]
\[
  \bar r_N^{\mathrm{du}}(\varphi^{\mathcal N}) := \langle \ell,\varphi^{\mathcal N}\rangle_{V',V} + a(\varphi^{\mathcal N},\bar z_N;\bar\mu) \quad\text{for }\varphi^{\mathcal N}\in V^{\mathcal N}.
\]
We define the corresponding Riesz representations $\bar\rho_N^{\mathrm{pr}},\bar\rho_N^{\mathrm{du}}\in V^{\mathcal N}$, i.e.,
\[
  (\bar\rho_N^{\mathrm{pr}},\varphi^{\mathcal N})_V = \bar r_N^{\mathrm{pr}}(\varphi^{\mathcal N}) = \langle f,\varphi^{\mathcal N}\rangle_{V',V} - a(\bar u_N,\varphi^{\mathcal N};\bar\mu) \quad\text{for all }\varphi^{\mathcal N}\in V^{\mathcal N},
\]
\[
  (\bar\rho_N^{\mathrm{du}},\varphi^{\mathcal N})_V = \bar r_N^{\mathrm{du}}(\varphi^{\mathcal N}) = \langle \ell,\varphi^{\mathcal N}\rangle_{V',V} + a(\varphi^{\mathcal N},\bar z_N;\bar\mu) \quad\text{for all }\varphi^{\mathcal N}\in V^{\mathcal N}.
\]
This in particular implies that
\[
  (g,\bar r_N^{\mathrm{pr}})_{(V^{\mathcal N})'} = \langle g,\bar\rho_N^{\mathrm{pr}}\rangle_{(V^{\mathcal N})',V^{\mathcal N}} \quad\text{for all }g\in (V^{\mathcal N})',
\]
which will be used later. It is worth mentioning that in general $\bar\rho_N^{\mathrm{pr}}\notin V_N^{\mathrm{pr}}$ and $\bar\rho_N^{\mathrm{du}}\notin\tilde V_N^{\mathrm{du}}$.


3 First-order necessary optimality conditions

First we write the equality constraints in (P) in a compact form. For that purpose we introduce the nonlinear mapping $e=(e_1,e_2):X_N\to Y_N'$ by
\[
  \langle e(x_N),\lambda_N\rangle_{Y_N',Y_N} = \langle e_1(x_N),\lambda_N^1\rangle_{(V_N^{\mathrm{pr}})',V_N^{\mathrm{pr}}} + \langle e_2(x_N),\lambda_N^2\rangle_{(\tilde V_N^{\mathrm{du}})',\tilde V_N^{\mathrm{du}}}
\]
for $x_N=(u_N,z_N,\mu)\in X_N^{\mathrm{ad}}$ and $\lambda_N=(\lambda_N^1,\lambda_N^2)\in Y_N$. Here, we identify the dual $Y_N'$ with $(V_N^{\mathrm{pr}})'\times(\tilde V_N^{\mathrm{du}})'$ and we put
\[
  \langle e_1(x_N),\lambda_N^1\rangle_{(V_N^{\mathrm{pr}})',V_N^{\mathrm{pr}}} = \langle f,\lambda_N^1\rangle_{(V_N^{\mathrm{pr}})',V_N^{\mathrm{pr}}} - a(u_N,\lambda_N^1;\mu),
\]
\[
  \langle e_2(x_N),\lambda_N^2\rangle_{(\tilde V_N^{\mathrm{du}})',\tilde V_N^{\mathrm{du}}} = \langle \ell,\lambda_N^2\rangle_{(\tilde V_N^{\mathrm{du}})',\tilde V_N^{\mathrm{du}}} + a(\lambda_N^2,z_N;\mu).
\]
Using (2b) we infer that
\[
\begin{aligned}
  \|e(x_N)\|_{Y_N'} &= \sup_{\|\lambda_N\|_{Y_N}=1} \langle e(x_N),\lambda_N\rangle_{Y_N',Y_N}\\
  &\le \sup_{\|\lambda_N^1\|_V=1} \langle e_1(x_N),\lambda_N^1\rangle_{(V_N^{\mathrm{pr}})',V_N^{\mathrm{pr}}} + \sup_{\|\lambda_N^2\|_V=1} \langle e_2(x_N),\lambda_N^2\rangle_{(\tilde V_N^{\mathrm{du}})',\tilde V_N^{\mathrm{du}}}\\
  &\le C_e\,\big(1 + \|u_N\|_V + \|z_N\|_V\big)
\end{aligned}
\]
with $C_e = \max(\|f\|_{V'} + \|\ell\|_{V'},\,\gamma)$.

To derive first-order optimality conditions for (P) we have to ensure that the mapping $e$ is continuously (Fréchet) differentiable and satisfies a standard constraint qualification; see, e.g., [5, 13].

Proposition 3.1. Suppose that the bilinear form $a(\cdot\,,\cdot;\mu)$ satisfies (2). Further, $f,\ell\in V'$ holds and the functions $\vartheta_q$ are continuously differentiable for $1\le q\le Q$. Then, the mapping $e$ is continuously (Fréchet) differentiable and its (Fréchet) derivative at $x_N=(y_N,\mu)\in X_N^{\mathrm{ad}}$, $y_N=(u_N,z_N)$, is given by
\[
  \langle e'(x_N)x_N^\delta,\lambda_N\rangle_{Y_N',Y_N} = \langle e_1'(x_N)x_N^\delta,\lambda_N^1\rangle_{(V_N^{\mathrm{pr}})',V_N^{\mathrm{pr}}} + \langle e_2'(x_N)x_N^\delta,\lambda_N^2\rangle_{(\tilde V_N^{\mathrm{du}})',\tilde V_N^{\mathrm{du}}}
\]
for any direction $x_N^\delta=(u_N^\delta,z_N^\delta,\mu^\delta)\in X_N$ and for $\lambda_N=(\lambda_N^1,\lambda_N^2)\in Y_N$, where
\[
  \langle e_1'(x_N)x_N^\delta,\lambda_N^1\rangle_{(V_N^{\mathrm{pr}})',V_N^{\mathrm{pr}}} = -a(u_N^\delta,\lambda_N^1;\mu) - \sum_{q=1}^{Q} a_q(u_N,\lambda_N^1)\,\nabla\vartheta_q(\mu)^\top\mu^\delta,
\]
\[
  \langle e_2'(x_N)x_N^\delta,\lambda_N^2\rangle_{(\tilde V_N^{\mathrm{du}})',\tilde V_N^{\mathrm{du}}} = a(\lambda_N^2,z_N^\delta;\mu) + \sum_{q=1}^{Q} a_q(\lambda_N^2,z_N)\,\nabla\vartheta_q(\mu)^\top\mu^\delta
\]
with $\nabla\vartheta_q(\mu) = (\vartheta_{\mu_1}^q(\mu),\dots,\vartheta_{\mu_P}^q(\mu))^\top\in\mathbb R^P$ and $\vartheta_{\mu_i}^q = \partial\vartheta_q/\partial\mu_i$. Furthermore, the (Fréchet) derivative $e'(x_N):X_N\to Y_N'$ is a surjective operator for every $x_N\in X_N^{\mathrm{ad}}$.


Proof. It follows by standard arguments that $e$ is (Fréchet) differentiable for every $x_N\in X_N^{\mathrm{ad}}$. Therefore, we only prove that the linear operator $e'(x_N)$ is onto. Let $F_N=(F_N^1,F_N^2)\in Y_N'$ be chosen arbitrarily. Then, $e'(x_N)$ is surjective if there exists an element $x_N^\delta=(u_N^\delta,z_N^\delta,\mu^\delta)\in X_N$ satisfying
\[
  e'(x_N)x_N^\delta = F_N \quad\text{in }Y_N'. \tag{13}
\]
Equation (13) is equivalent with
\[
  e_1'(x_N)x_N^\delta = F_N^1 \;\text{ in }(V_N^{\mathrm{pr}})' \quad\text{and}\quad e_2'(x_N)x_N^\delta = F_N^2 \;\text{ in }(\tilde V_N^{\mathrm{du}})'. \tag{14}
\]
Choosing $\mu^\delta=0$ we obtain from (14) that
\[
  a(u_N^\delta,\lambda_N^1;\mu) = -\langle F_N^1,\lambda_N^1\rangle_{(V_N^{\mathrm{pr}})',V_N^{\mathrm{pr}}} \quad\text{for all }\lambda_N^1\in V_N^{\mathrm{pr}},
\]
\[
  a(\lambda_N^2,z_N^\delta;\mu) = \langle F_N^2,\lambda_N^2\rangle_{(\tilde V_N^{\mathrm{du}})',\tilde V_N^{\mathrm{du}}} \quad\text{for all }\lambda_N^2\in\tilde V_N^{\mathrm{du}}. \tag{15}
\]
Since the bilinear form $a(\cdot\,,\cdot;\mu)$ is bounded and coercive, there exists a unique pair $y_N^\delta=(u_N^\delta,z_N^\delta)\in Y_N$ solving (15). Summarizing, $x_N^\delta=(y_N^\delta,0)$ solves (13), which implies that $e'(x_N)$ is surjective.

Next let us introduce the Lagrange functional $L:X_N\times Y_N\to\mathbb R$ for $x_N=(u_N,z_N,\mu)\in X_N$ and $\lambda_N=(\lambda_N^1,\lambda_N^2)\in Y_N$ as
\[
\begin{aligned}
  L(x_N,\lambda_N) &= J(x_N) + \langle e(x_N),\lambda_N\rangle_{Y_N',Y_N}\\
  &= -\frac12\,\Big( \|f-a(u_N,\cdot\,;\mu)\|_{(V^{\mathcal N})'}^2 + \|a(\cdot\,,z_N;\mu)+\ell\|_{(V^{\mathcal N})'}^2 \Big)\\
  &\quad + \langle (f,\ell),\lambda_N\rangle_{Y_N',Y_N} - a(u_N,\lambda_N^1;\mu) + a(\lambda_N^2,z_N;\mu).
\end{aligned}
\]

We infer from Proposition 3.1 that first-order necessary optimality conditions are given as follows [5, 13]: let $\bar x_N=(\bar y_N,\bar\mu)\in X_N^{\mathrm{ad}}$, $\bar y_N=(\bar u_N,\bar z_N)\in Y_N$, be a local solution to (P). Then, there exists a Lagrange multiplier $\bar\lambda_N=(\bar\lambda_N^1,\bar\lambda_N^2)\in Y_N$ solving the following system:
\[
  L_{u_N}(\bar x_N,\bar\lambda_N)u_N^\delta = 0 \quad\text{for all }u_N^\delta\in V_N^{\mathrm{pr}}, \tag{16a}
\]
\[
  L_{z_N}(\bar x_N,\bar\lambda_N)z_N^\delta = 0 \quad\text{for all }z_N^\delta\in\tilde V_N^{\mathrm{du}}, \tag{16b}
\]
\[
  L_\mu(\bar x_N,\bar\lambda_N)(\mu^\delta-\bar\mu) \ge 0 \quad\text{for all }\mu^\delta\in\mathcal D, \tag{16c}
\]
where, for instance, $L_{u_N}$ denotes the (Fréchet) derivative of the Lagrangian with respect to the argument $u_N$. First we study (16a). For $u_N^\delta\in V_N^{\mathrm{pr}}$ we find
\[
  L_{u_N}(\bar x_N,\bar\lambda_N)u_N^\delta = \big(f-a(\bar u_N,\cdot\,;\bar\mu),\,a(u_N^\delta,\cdot\,;\bar\mu)\big)_{(V^{\mathcal N})'} - a(u_N^\delta,\bar\lambda_N^1;\bar\mu).
\]

Using the Riesz representation $\bar\rho_N^{\mathrm{pr}}\in V^{\mathcal N}$ of $\bar r_N^{\mathrm{pr}}\in (V^{\mathcal N})'$, we get
\[
  L_{u_N}(\bar x_N,\bar\lambda_N)u_N^\delta = \big(\bar r_N^{\mathrm{pr}},a(u_N^\delta,\cdot\,;\bar\mu)\big)_{(V^{\mathcal N})'} - a(u_N^\delta,\bar\lambda_N^1;\bar\mu) = a(u_N^\delta,\bar\rho_N^{\mathrm{pr}};\bar\mu) - a(u_N^\delta,\bar\lambda_N^1;\bar\mu) = a(u_N^\delta,\bar\rho_N^{\mathrm{pr}}-\bar\lambda_N^1;\bar\mu). \tag{17}
\]
From (16a) and (17) we infer the first adjoint equation:
\[
  a(u_N^\delta,\bar\lambda_N^1;\bar\mu) = a(u_N^\delta,\bar\rho_N^{\mathrm{pr}};\bar\mu) \quad\text{for all }u_N^\delta\in V_N^{\mathrm{pr}}. \tag{18}
\]
Remark 3.4. Since in general $\bar\rho_N^{\mathrm{pr}}\notin V_N^{\mathrm{pr}}$ holds, we obtain in general $\bar\lambda_N^1\neq\bar\rho_N^{\mathrm{pr}}$. Rather, $\bar\lambda_N^1\in V_N^{\mathrm{pr}}$ is the $a$-orthogonal projection of $\bar\rho_N^{\mathrm{pr}}\in V$ onto $V_N^{\mathrm{pr}}$. ♦

Further, we have
\[
  L_{z_N}(\bar x_N,\bar\lambda_N)z_N^\delta = -\big(\ell+a(\cdot\,,\bar z_N;\bar\mu),\,a(\cdot\,,z_N^\delta;\bar\mu)\big)_{(V^{\mathcal N})'} + a(\bar\lambda_N^2,z_N^\delta;\bar\mu) \tag{19}
\]
for any direction $z_N^\delta\in\tilde V_N^{\mathrm{du}}$. Using the Riesz representation $\bar\rho_N^{\mathrm{du}}\in V^{\mathcal N}$ of $\bar r_N^{\mathrm{du}}\in (V^{\mathcal N})'$ and combining (16b) and (19), we get
\[
  L_{z_N}(\bar x_N,\bar\lambda_N)z_N^\delta = a(\bar\lambda_N^2-\bar\rho_N^{\mathrm{du}},z_N^\delta;\bar\mu) = 0 \quad\text{for all }z_N^\delta\in\tilde V_N^{\mathrm{du}},
\]
which gives the second adjoint equation
\[
  a(\bar\lambda_N^2,z_N^\delta;\bar\mu) = a(\bar\rho_N^{\mathrm{du}},z_N^\delta;\bar\mu) \quad\text{for all }z_N^\delta\in\tilde V_N^{\mathrm{du}}. \tag{20}
\]
Remark 3.5. Analogous to Remark 3.4 we infer that $\bar\lambda_N^2$ is the $a$-orthogonal projection of $\bar\rho_N^{\mathrm{du}}$ onto $\tilde V_N^{\mathrm{du}}$. ♦

Next we consider (16c). Using the Riesz representations $\bar\rho_N^{\mathrm{pr}},\bar\rho_N^{\mathrm{du}}\in V^{\mathcal N}$ of $\bar r_N^{\mathrm{pr}},\bar r_N^{\mathrm{du}}\in (V^{\mathcal N})'$, respectively, it follows that
\[
\begin{aligned}
  L_\mu(\bar x_N,\bar\lambda_N)\mu^\delta
  &= \sum_{q=1}^{Q} \nabla\vartheta_q(\bar\mu)^\top\mu^\delta\,\big(\bar r_N^{\mathrm{pr}},a_q(\bar u_N,\cdot)\big)_{(V^{\mathcal N})'} + \sum_{q=1}^{Q} \nabla\vartheta_q(\bar\mu)^\top\mu^\delta\,\Big( -\big(\bar r_N^{\mathrm{du}},a_q(\cdot\,,\bar z_N)\big)_{(V^{\mathcal N})'} + a_q(\bar\lambda_N^2,\bar z_N) - a_q(\bar u_N,\bar\lambda_N^1) \Big)\\
  &= \sum_{q=1}^{Q} \Big( a_q(\bar u_N,\bar\rho_N^{\mathrm{pr}}-\bar\lambda_N^1) + a_q(\bar\lambda_N^2-\bar\rho_N^{\mathrm{du}},\bar z_N) \Big)\,\nabla\vartheta_q(\bar\mu)^\top\mu^\delta \tag{21}
\end{aligned}
\]

for any direction $\mu^\delta\in\mathbb R^P$. We define the Jacobi matrix
\[
  D\vartheta(\bar\mu) = \begin{pmatrix} \nabla\vartheta_1(\bar\mu)^\top\\ \vdots\\ \nabla\vartheta_Q(\bar\mu)^\top \end{pmatrix} \in\mathbb R^{Q\times P}
\]
with $\nabla\vartheta_q(\mu) = (\vartheta_{\mu_1}^q(\mu),\dots,\vartheta_{\mu_P}^q(\mu))^\top\in\mathbb R^P$ and $\vartheta_{\mu_i}^q = \partial\vartheta_q/\partial\mu_i$. Further, we set $\bar\xi = \bar\xi(\bar x_N,\bar\lambda_N) = (\bar\xi_1,\dots,\bar\xi_Q)^\top\in\mathbb R^Q$ with
\[
  \bar\xi_q = a_q(\bar u_N,\bar\rho_N^{\mathrm{pr}}-\bar\lambda_N^1) + a_q(\bar\lambda_N^2-\bar\rho_N^{\mathrm{du}},\bar z_N) \quad\text{for }1\le q\le Q.
\]

Then, we derive from (16c) and (21)
\[
  \big(D\vartheta(\bar\mu)^\top\bar\xi\big)^\top(\mu^\delta-\bar\mu) \ge 0 \quad\text{for all }\mu^\delta\in\mathcal D. \tag{22}
\]
Summarizing, we have proved the following result.

Theorem 3.1. Suppose that the bilinear form $a(\cdot\,,\cdot;\mu)$ satisfies (2). Further, $f,\ell\in V'$ holds and the functions $\vartheta_q$ are continuously differentiable for $1\le q\le Q$. Let $\bar x_N=(\bar y_N,\bar\mu)\in X_N^{\mathrm{ad}}$, $\bar y_N=(\bar u_N,\bar z_N)\in Y_N$, be a local solution to (P). Then, there exists a unique associated Lagrange multiplier pair $\bar\lambda_N=(\bar\lambda_N^1,\bar\lambda_N^2)\in Y_N$ satisfying, together with $\bar x_N$, the first-order necessary optimality conditions (18), (20) and (22).

The gradient $\nabla\hat J$ of the reduced cost functional $\hat J$ at a point $\mu\in\mathcal D$ is given by the formula [5, 13]
\[
  \nabla\hat J(\mu) = D\vartheta(\mu)^\top\xi \in\mathbb R^P, \tag{23}
\]
where the components of the vector $\xi\in\mathbb R^Q$ are
\[
  \xi_q = a_q(u_N,\rho_N^{\mathrm{pr}}-\lambda_N^1) + a_q(\lambda_N^2-\rho_N^{\mathrm{du}},z_N) \quad\text{for }1\le q\le Q,
\]
$(u_N,z_N)=S_N(\mu)$ holds and $\lambda_N=(\lambda_N^1,\lambda_N^2)\in Y_N$ solves the dual system
\[
  a(u_N^\delta,\lambda_N^1;\mu) = a(u_N^\delta,\rho_N^{\mathrm{pr}};\mu) \;\text{ for all }u_N^\delta\in V_N^{\mathrm{pr}}, \qquad
  a(\lambda_N^2,z_N^\delta;\mu) = a(\rho_N^{\mathrm{du}},z_N^\delta;\mu) \;\text{ for all }z_N^\delta\in\tilde V_N^{\mathrm{du}}.
\]
Here, $\rho_N^{\mathrm{pr}},\rho_N^{\mathrm{du}}\in V^{\mathcal N}$ are the Riesz representants of the residuals $r_N^{\mathrm{pr}}(\cdot\,;\mu),\,r_N^{\mathrm{du}}(\cdot\,;\mu)\in (V^{\mathcal N})'$, respectively.
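With (23) at hand, the gradient-type method mentioned in the introduction can be sketched as a projected gradient iteration over a box-shaped parameter set (as assumed in Section 4); the fixed step size and the plain stopping test are simplifications of the globalized solver actually used:

```python
import numpy as np

def projected_gradient(mu_0, grad_J_hat, mu_a, mu_b,
                       step=1.0, tol=1e-8, max_iter=200):
    """Minimize J_hat over D = prod_i [mu_a[i], mu_b[i]]: step along the
    negative gradient (23) and project back onto the box by clipping."""
    mu = np.clip(np.asarray(mu_0, dtype=float), mu_a, mu_b)
    for _ in range(max_iter):
        mu_next = np.clip(mu - step * grad_J_hat(mu), mu_a, mu_b)
        if np.linalg.norm(mu_next - mu) <= tol:  # fixed point of the projection
            mu = mu_next
            break
        mu = mu_next
    return mu
```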

Remark 3.6. Suppose that the bilinear form is given by $a(\cdot\,,\cdot;\mu)=\vartheta_1(\mu)\,a_1(\cdot\,,\cdot)$ (i.e., $Q=1$) and $\vartheta_1(\mu)\neq 0$ holds for all $\mu\in\mathcal D$. Then, solutions to different parameter values are linearly dependent; see Remark 2.1-2) and Remark 2.3-2). Moreover, it follows from $\vartheta_1(\mu)\neq 0$, (18) and (20) that
\[
  a_1(u_N^\delta,\bar\lambda_N^1) = a_1(u_N^\delta,\bar\rho_N^{\mathrm{pr}}) \;\text{ for all }u_N^\delta\in V_N^{\mathrm{pr}}, \qquad
  a_1(\bar\lambda_N^2,z_N^\delta) = a_1(\bar\rho_N^{\mathrm{du}},z_N^\delta) \;\text{ for all }z_N^\delta\in\tilde V_N^{\mathrm{du}}.
\]
In particular, $a_1(\bar u_N,\bar\rho_N^{\mathrm{pr}}-\bar\lambda_N^1) = a_1(\bar\lambda_N^2-\bar\rho_N^{\mathrm{du}},\bar z_N) = 0$ holds true, which gives $\xi_1=0$. Therefore, $\nabla\hat J(\mu)=0$ is satisfied. This coincides with the observation in Remark 2.3-2) that the mappings
\[
  \mu\mapsto \|a(S_N^{\mathrm{pr}}(\mu),\cdot\,;\mu)-f\|_{(V^{\mathcal N})'} \quad\text{and}\quad \mu\mapsto \|a(\cdot\,,S_N^{\mathrm{du}}(\mu);\mu)+\ell\|_{(V^{\mathcal N})'}
\]
are constant. ♦


4 Second-order derivatives

To solve (P) in our numerical experiments we apply a globalized sequential quadratic programming (SQP) method, which makes use of second-order derivatives of the Lagrange functional; see [10], for example. For that reason we address second-order optimality conditions in this section. We restrict ourselves to simple bounds, i.e., we assume that the bounded and convex parameter set $\mathcal D$ is given by
\[
  \mathcal D = [\mu_{a,1},\mu_{b,1}]\times\dots\times[\mu_{a,P},\mu_{b,P}] \subset\mathbb R^P
\]
with lower and upper bounds $\mu_{a,i}\le\mu_{b,i}$, $1\le i\le P$. Let $\bar x_N=(\bar y_N,\bar\mu)\in X_N^{\mathrm{ad}}$, $\bar y_N=(\bar u_N,\bar z_N)\in Y_N$, be a solution to the first-order necessary optimality conditions for (P); see Theorem 3.1. Moreover, the pair $\bar\lambda_N=(\bar\lambda_N^1,\bar\lambda_N^2)\in Y_N$ denotes the associated unique Lagrange multiplier. We suppose that the functions $\vartheta_q$ are twice continuously differentiable. For $u_N^\delta,\tilde u_N^\delta\in V_N^{\mathrm{pr}}$ we deduce

\[
  L_{u_Nu_N}(\bar x_N,\bar\lambda_N)(u_N^\delta,\tilde u_N^\delta) = -\big(a(\tilde u_N^\delta,\cdot\,;\bar\mu),\,a(u_N^\delta,\cdot\,;\bar\mu)\big)_{(V^{\mathcal N})'}. \tag{24}
\]
Analogously, we find for $z_N^\delta,\tilde z_N^\delta\in\tilde V_N^{\mathrm{du}}$
\[
  L_{z_Nz_N}(\bar x_N,\bar\lambda_N)(z_N^\delta,\tilde z_N^\delta) = -\big(a(\cdot\,,\tilde z_N^\delta;\bar\mu),\,a(\cdot\,,z_N^\delta;\bar\mu)\big)_{(V^{\mathcal N})'}. \tag{25}
\]
Further, it follows that
\[
  L_{u_Nz_N}(\bar x_N,\bar\lambda_N)(u_N^\delta,z_N^\delta) = L_{z_Nu_N}(\bar x_N,\bar\lambda_N)(z_N^\delta,u_N^\delta) \tag{26}
\]
for $u_N^\delta\in V_N^{\mathrm{pr}}$ and $z_N^\delta\in\tilde V_N^{\mathrm{du}}$. Using $\bar r_N^{\mathrm{pr}} = f-a(\bar u_N,\cdot\,;\bar\mu)\in V'$ and the Riesz representant $\bar\rho_N^{\mathrm{pr}}\in V^{\mathcal N}$ of $\bar r_N^{\mathrm{pr}}$, we observe that
\[
  L_{\mu u_N}(\bar x_N,\bar\lambda_N)(u_N^\delta,\mu^\delta) = L_{u_N\mu}(\bar x_N,\bar\lambda_N)(u_N^\delta,\mu^\delta) = \sum_{q=1}^{Q} \nabla\vartheta_q(\bar\mu)^\top\mu^\delta\,\Big( a_q(u_N^\delta,\bar\rho_N^{\mathrm{pr}}-\bar\lambda_N^1) - \big(a_q(\bar u_N,\cdot),\,a(u_N^\delta,\cdot\,;\bar\mu)\big)_{(V^{\mathcal N})'} \Big)
\]
for $u_N^\delta\in V_N^{\mathrm{pr}}$ and $\mu^\delta\in\mathbb R^P$. Let $\bar\zeta_N^{\mathrm{pr},q}\in V^{\mathcal N}$, $1\le q\le Q$, denote the Riesz representants of $a_q(\bar u_N,\cdot)\in (V^{\mathcal N})'$, i.e.,
\[
  (\bar\zeta_N^{\mathrm{pr},q},\varphi^{\mathcal N})_V = a_q(\bar u_N,\varphi^{\mathcal N}) \quad\text{for all }\varphi^{\mathcal N}\in V^{\mathcal N}.
\]
Then, we derive that
\[
  L_{u_N\mu}(\bar x_N,\bar\lambda_N)(u_N^\delta,\mu^\delta) = \sum_{q=1}^{Q} \nabla\vartheta_q(\bar\mu)^\top\mu^\delta\,\Big( a_q(u_N^\delta,\bar\rho_N^{\mathrm{pr}}-\bar\lambda_N^1) - a(u_N^\delta,\bar\zeta_N^{\mathrm{pr},q};\bar\mu) \Big). \tag{27}
\]
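For completeness, a sketch of solving ($\hat{\mathrm P}$) over the box $\mathcal D$ with an SQP-type solver. SciPy's SLSQP stands in here for the globalized SQP method of [10] used in the experiments (an illustrative substitute, not the authors' implementation); `J_hat` and `grad_J_hat` are callables $\mu\mapsto\hat J(\mu)$ and $\mu\mapsto\nabla\hat J(\mu)$, e.g. closures over the reduced model:

```python
import numpy as np
from scipy.optimize import minimize

def solve_P_hat(J_hat, grad_J_hat, mu_0, mu_a, mu_b):
    """Minimize the reduced cost over D = prod_i [mu_a[i], mu_b[i]] with the
    gradient (23) supplied analytically; curvature information is built up
    internally by the solver."""
    bounds = list(zip(mu_a, mu_b))          # simple bounds mu_a <= mu <= mu_b
    res = minimize(J_hat, x0=np.asarray(mu_0, dtype=float),
                   jac=grad_J_hat, method="SLSQP", bounds=bounds)
    return res.x, res.fun                   # local minimizer and J_hat value
```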
