An extension of the projected gradient method to a Banach space setting with application in structural topology optimization

Universität Regensburg Mathematik

Luise Blank and Christoph Rupprecht

Preprint Nr. 04/2015


Abstract

For the minimization of a nonlinear cost functional j under convex constraints the relaxed projected gradient process

\[ \phi_{k+1} = \phi_k + \alpha_k \bigl( P_H(\phi_k - \lambda_k \nabla_H j(\phi_k)) - \phi_k \bigr) \]

as formulated e.g. in [12] is a well-known method. The analysis is classically performed in a Hilbert space H. We generalize this method to functionals j which are differentiable in a Banach space. Thus it is possible to perform e.g. an L² gradient method if j is only differentiable in L^∞. We show global convergence using Armijo backtracking in α_k and allow the inner product and the scaling λ_k to change in every iteration. As an application we present a structural topology optimization problem based on a phase field model, where the reduced cost functional j is differentiable in H¹∩L^∞. The presented numerical results using the H¹ inner product and a pointwise chosen metric including second order information show the expected mesh independence in the iteration numbers. The latter yields an additional, drastic decrease in iteration numbers as well as in computation time. Moreover, we present numerical results using a BFGS update of the H¹ inner product for further optimization problems based on phase field models.

Key words: projected gradient method, variable metric method, convex constraints, shape and topology optimization, phase field approach.

AMS subject classification: 49M05, 49M15, 65K, 74P05, 90C.

1 Introduction

Let j be a functional on a Hilbert space H with inner product (., .)_H and induced norm ∥.∥_H, and let Φ_ad ⊆ H be a non-empty, convex and closed subset. We consider the optimization problem

\[ \min j(\phi) \quad \text{subject to } \phi \in \Phi_{ad}. \tag{1} \]

If j is Fréchet differentiable with respect to ∥.∥_H, the classical projected gradient method, introduced in Hilbert space in [18] and [23], can be applied. It moves in the direction of the negative H-gradient −∇_H j ∈ H, which is characterized by the equality (∇_H j(ϕ), η)_H = ⟨j′(ϕ), η⟩_{H*,H} for all η ∈ H, and orthogonally projects the result back onto Φ_ad to stay feasible, i.e.

\[ \phi_{k+1} = P_H(\phi_k - \lambda_k \nabla_H j(\phi_k)). \tag{2} \]

To obtain global convergence, λ_k has to be chosen according to some step length rule, which results in a gradient path method, or one can perform a line search along the descent direction v_k = P_H(ϕ_k − λ_k ∇_H j(ϕ_k)) − ϕ_k. A typical application is H = L²(Ω), see e.g. [21].

In this paper we consider the case that j is differentiable with respect to a norm which is not induced by an inner product. Hence no H-gradient ∇_H j exists. However, in Section 2 we reformulate the method such that it is well defined under weaker conditions. We show global convergence when Armijo backtracking is applied along v_k and allow the inner product and the scaling λ_k to change in every iteration. We call this generalization the 'variable metric projection' type (VMPT) method. In Section 3 we study the applicability of the method to a structural topology optimization problem, namely the mean compliance minimization in linear elasticity based on a phase field model. There the reduced cost functional is differentiable only in H¹∩L^∞. In the last section we show numerical results for this mean compliance problem. As expected, choosing the H¹ metric leads to mesh independent iteration numbers, in contrast to the L² metric. We also present the choice of a variable metric using second order information and the choice of a BFGS update of the H¹ metric. This reduces the iteration numbers to less than a hundredth. Moreover, we give additional numerical examples for the successful application of the VMPT method. These include a compliant mechanism problem, drag minimization of the Stokes flow and an inverse problem.

2 Variable metric projection type (VMPT) method

2.1 Generalization of the projected gradient method

The orthogonal projection P_H(ϕ_k − λ_k ∇_H j(ϕ_k)) employed in (2) is the unique solution of

\[ \min_{y \in \Phi_{ad}} \tfrac12 \bigl\| (\phi_k - \lambda_k \nabla_H j(\phi_k)) - y \bigr\|_H^2, \]

which is equivalent to the problem

\[ \min_{y \in \Phi_{ad}} \tfrac12 \| y - \phi_k \|_H^2 + \lambda_k\, Dj(\phi_k, y - \phi_k), \tag{3} \]

since (∇_H j(ϕ_k), y − ϕ_k)_H = j′(ϕ_k)(y − ϕ_k) = Dj(ϕ_k, y − ϕ_k), where the last expression denotes the directional derivative of j at ϕ_k in direction y − ϕ_k. Indeed, expanding the square in the first problem gives the cost of (3) plus the constant (λ_k²/2)∥∇_H j(ϕ_k)∥²_H, which does not depend on y. If e.g. Dj(ϕ_k, y) is linear and continuous with respect to y ∈ H, the cost functional of (3) is strictly convex, continuous and coercive in H, and hence (3) has a unique solution ϕ̄_k [10]. In the formulation (3) the existence of the gradient ∇_H j is not required. Even Gâteaux differentiability can be omitted.

In the following we formulate an extension of the projected gradient method where P_H(ϕ_k − λ_k ∇_H j(ϕ_k)) is replaced by the solution ϕ̄_k of (3).
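The equivalence between the projection in (2) and problem (3) can be checked numerically in finite dimensions. The following is a minimal sketch, assuming H = ℝⁿ with the Euclidean inner product and a box [lo, hi]ⁿ as admissible set; the helper names `project_box` and `cost3` and all data are illustrative and not taken from the paper.

```python
import random

def project_box(z, lo, hi):
    """Euclidean projection of z onto the box [lo, hi]^n (assumed admissible set)."""
    return [min(hi, max(lo, zi)) for zi in z]

def cost3(y, phi, grad, lam):
    """Cost of problem (3): 0.5*||y - phi||^2 + lam * <grad, y - phi>."""
    return sum(0.5 * (yi - pi) ** 2 + lam * gi * (yi - pi)
               for yi, pi, gi in zip(y, phi, grad))

random.seed(0)
n, lo, hi, lam = 5, 0.0, 1.0, 0.7
phi = [random.uniform(lo, hi) for _ in range(n)]   # feasible iterate
grad = [random.gauss(0.0, 1.0) for _ in range(n)]  # stand-in for the gradient of j

# Projected gradient step as in (2)
y_proj = project_box([p - lam * g for p, g in zip(phi, grad)], lo, hi)

# Compare its cost in (3) with the cost of randomly sampled feasible points.
best = cost3(y_proj, phi, grad, lam)
gap = min(cost3([random.uniform(lo, hi) for _ in range(n)], phi, grad, lam)
          for _ in range(2000)) - best
print(gap >= -1e-12)  # no sampled feasible point beats the projected step
```

Random sampling only gives evidence, not a proof; the point of the sketch is that the projected step is evaluated in the cost of (3) without ever forming the squared distance to ϕ_k − λ_k∇j(ϕ_k).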

First we drop the requirement of a gradient as mentioned above. We assume that the admissible set Φ_ad is a subset of an intersection of Banach spaces X∩D, where X and D have certain properties (see (A1)), which are e.g. fulfilled for X = H¹(Ω) or X = L²(Ω) and D = L^∞(Ω). Furthermore, assume that j is continuously Fréchet differentiable on Φ_ad with respect to the norm ∥.∥_{X∩D} := ∥.∥_X + ∥.∥_D. The Fréchet derivative of j at ϕ is denoted by j′(ϕ) ∈ (X∩D)* and we write ⟨., .⟩ for the dual pairing in the space X∩D. Moreover, we use C as a positive universal constant throughout the paper.

Secondly, we also allow the norm ∥.∥_H in (3) to change in every iteration. Therefore, we consider a sequence {a_k}_{k≥0} of symmetric positive definite bilinear forms inducing norms ∥.∥_{a_k} on X∩D. This approach falls into the class of variable metric methods and includes the choice of Newton and Quasi-Newton based search directions (see for example [2, 13] and [19] for the unconstrained case). In [2] these methods are called scaled gradient projection methods and, in the case of a_k = j′′(ϕ_k), also constrained Newton's method. In finite dimensions a_k is given by a_k(p, v) := pᵀB_k v, where B_k can be the Hessian at ϕ_k or an approximation of it.

Hence, in each step of the VMPT method the projection type subproblem

\[ \min_{y \in \Phi_{ad}} \tfrac12 \| y - \phi_k \|_{a_k}^2 + \lambda_k \langle j'(\phi_k), y - \phi_k \rangle \tag{4} \]

with some scaling parameter λ_k > 0 has to be solved. Problem (4) is formally equivalent to the projection P_{a_k}(ϕ_k − λ_k ∇_{a_k} j(ϕ_k)). However, j is not necessarily differentiable with respect to ∥.∥_{a_k}, and X∩D endowed with a_k(., .) is only a pre-Hilbert space. Hence ∇_{a_k} j(ϕ_k) does not need to exist. For globalization of the method we perform a line search based on the widely used Armijo backtracking, which results in Algorithm 2.1. In the next section it is shown that the algorithm is well defined under certain assumptions and in particular that a unique solution ϕ̄_k of (4) exists, together with the proof of convergence. We also denote the solution of (4) by P_k(ϕ_k) due to the connection to a projection.


Algorithm 2.1 (VMPT method).

1: Choose 0 < β < 1, 0 < σ < 1 and ϕ₀ ∈ Φ_ad.
2: k := 0
3: while k ≤ k_max do
4:   Choose λ_k and a_k.
5:   Calculate the minimum ϕ̄_k = P_k(ϕ_k) of the subproblem (4).
6:   Set the search direction v_k := ϕ̄_k − ϕ_k.
7:   if ∥v_k∥_X ≤ tol then
8:     return
9:   end if
10:  Determine the step length α_k := β^{m_k} with minimal m_k ∈ ℕ₀ such that j(ϕ_k + α_k v_k) ≤ j(ϕ_k) + α_k σ ⟨j′(ϕ_k), v_k⟩.
11:  Update ϕ_{k+1} := ϕ_k + α_k v_k.
12:  k := k + 1
13: end while

The stopping criterion ∥v_k∥_X ≤ tol is motivated by two facts: ϕ_k is a stationary point of j if and only if v_k = 0 (cf. Corollary 2.6), and v_k → 0 in X (cf. Theorem 2.2).
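To make the steps of Algorithm 2.1 concrete, the following is a minimal finite dimensional sketch. It assumes Φ_ad = [lo, hi]ⁿ and a diagonal metric a_k(p, v) = Σ_i b_i p_i v_i, for which subproblem (4) decouples and is solved by clipping a scaled gradient step. The function name `vmpt`, the max norm in the stopping test and the quadratic test problem are illustrative choices, not the authors' implementation.

```python
def vmpt(j, grad, phi0, lo, hi, metric_diag, lam=1.0,
         beta=0.5, sigma=1e-4, tol=1e-10, k_max=500):
    """Sketch of Algorithm 2.1 for a box [lo, hi]^n and a diagonal metric
    a_k(p, v) = sum_i b_i p_i v_i with b = metric_diag(phi_k) > 0."""
    phi = list(phi0)
    for k in range(k_max + 1):
        g, b = grad(phi), metric_diag(phi)
        # Step 5: (4) is separable here, so each component is a clipped scaled step.
        phi_bar = [min(hi, max(lo, p - lam * gi / bi))
                   for p, gi, bi in zip(phi, g, b)]
        v = [pb - p for pb, p in zip(phi_bar, phi)]        # step 6
        if max(abs(vi) for vi in v) <= tol:                # steps 7-9 (max norm as stand-in)
            return phi, k
        slope = sum(gi * vi for gi, vi in zip(g, v))       # <j'(phi_k), v_k>, negative
        alpha, j0 = 1.0, j(phi)
        # Step 10: Armijo backtracking along v_k; no further subproblem solves needed.
        while j([p + alpha * vi for p, vi in zip(phi, v)]) > j0 + alpha * sigma * slope:
            alpha *= beta
        phi = [p + alpha * vi for p, vi in zip(phi, v)]    # step 11
    return phi, k_max

# Illustrative test problem (not from the paper): separable quadratic on [0, 1]^3.
c, t = [1.0, 10.0, 100.0], [1.5, -0.3, 0.4]
j = lambda phi: 0.5 * sum(ci * (p - ti) ** 2 for ci, p, ti in zip(c, phi, t))
grad = lambda phi: [ci * (p - ti) for ci, p, ti in zip(c, phi, t)]

# Metric built from the (diagonal) Hessian, i.e. second order information.
phi_opt, iters = vmpt(j, grad, [0.5, 0.5, 0.5], 0.0, 1.0, metric_diag=lambda phi: c)
print(phi_opt, iters)
```

Passing `metric_diag=lambda phi: [1.0] * 3` instead gives the plain Euclidean projected gradient direction; comparing the two iteration counts on such a badly scaled problem illustrates why a variable metric can pay off.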

We would like to mention that this algorithm is not a line search along the gradient path, which is widely used (e.g. in [2, 14, 15, 17, 18, 19, 20, 21, 25]) and which requires solving a projection type subproblem like (2) in each line search iteration. This can be unwanted if calculating the projection is expensive compared to the evaluation of j. To avoid this we perform a line search along the descent direction v_k, which is suggested e.g. in finite dimensions or in Hilbert spaces in [2, 19, 24] and is also used in [13]. To include the idea of the gradient path approach, we embed the possibility to vary the scaling factor {λ_k}_{k≥0} for the formal gradient in (4) in each iteration. The parameter λ_k can be put into a_k by dividing the cost in (4) by λ_k. However, we treat it as a separate parameter since this reflects the case where a_k is fixed for all iterations. Note that under the assumptions used in this paper a line search along the gradient path is not possible, since not even the existence of a positive step length can be shown, cf. Remark 2.8.

Moreover, there is a clear connection to sequential quadratic programming, considering that P_k(ϕ_k) is the solution of the quadratic approximation of min_{ϕ∈Φ_ad} j(ϕ), which (with λ_k absorbed into a_k) reads

\[ \min_{y \in \Phi_{ad}} j(\phi_k) + \langle j'(\phi_k), y - \phi_k \rangle + \tfrac12 a_k(y - \phi_k, y - \phi_k). \]

However, the global convergence result is analysed by means of projected gradient theory.


2.2 Global convergence result

We perform the analysis of the method with respect to two norms in the spaces X and D, which we assume to have the following properties:

(A1) X is a reflexive Banach space. D is isometrically isomorphic to B*, where B is a separable Banach space. Moreover, for any sequence {ϕ_i} in X∩D with ϕ_i → ϕ weakly in X and ϕ_i → ϕ̃ weakly-* in D, it holds ϕ = ϕ̃.

We identify D and B* and say that a sequence converges weakly-* in D if it converges weakly-* in B*. The separability of B is used to get weak-* sequential compactness in D. We would like to mention that the results also hold if D is a reflexive Banach space, in particular if D is a Hilbert space. In this case weak-* convergence has to be replaced by weak convergence throughout the paper. However, in the application we are interested in D = L^∞(Ω).

In the case of the Sobolev space X = W^{k,p}(Ω) and D = L^q(Ω), where Ω ⊆ ℝ^d is a bounded domain, k ≥ 0, 1 < p < ∞ and 1 < q ≤ ∞, the above assumption is fulfilled.

In addition to the above conditions on X and D let the following assumptions hold for the problem (1):

(A2) Φ_ad ⊆X∩D is convex, closed in X and non-empty.

(A3) Φ_ad is bounded in D.

(A4) j(ϕ) ≥ −C > −∞ for some C > 0 and all ϕ ∈ Φ_ad.

(A5) j is continuously differentiable in a neighbourhood of Φ_ad ⊆ X∩D.

(A6) For each ϕ ∈ Φ_ad and for each sequence {ϕ_i} ⊆ X∩D with ϕ_i → 0 weakly in X and weakly-* in D it holds ⟨j′(ϕ), ϕ_i⟩ → 0 as i → ∞.

Moreover, we require for the parameters a_k and λ_k of the algorithm that:

(A7) {a_k} is a sequence of symmetric positive definite bilinear forms on X∩D.

(A8) There exists c₁ > 0 such that c₁∥p∥²_X ≤ ∥p∥²_{a_k} for all p ∈ X∩D and k ∈ ℕ₀.

(A9) For all k ∈ ℕ₀ there exists c₂(k) such that ∥p∥²_{a_k} ≤ c₂(k)∥p∥²_{X∩D} for all p ∈ X∩D.

(A10) For all k ∈ ℕ₀, p ∈ Φ_ad and for each sequence {y_i} ⊆ Φ_ad for which there exists some y ∈ X∩D with y_i → y weakly in X and weakly-* in D, it holds a_k(p, y_i) → a_k(p, y) as i → ∞.

(A11) For each subsequence {ϕ_{k_i}}_i of the iterates given by Algorithm 2.1 converging in X∩D, the corresponding subsequence {a_{k_i}}_i has the property that a_{k_i}(p_i, y_i) → 0 for any sequences {p_i}, {y_i} ⊆ X∩D with p_i → 0 strongly in X and weakly-* in D and {y_i} converging in X∩D.

(A12) It holds 0 < λ_min ≤ λ_k ≤ λ_max for all k ∈ ℕ₀.


(A1)-(A12) are assumed throughout this paper if not mentioned otherwise.

Assumption (A11) reflects the possibility of a point based choice of a_k, e.g. dependent on the Hessian D²j(ϕ_k) or on an approximation of the Hessian. Note that (A9)-(A11) are weaker than the assumption ∥p∥²_{a_k} ≤ c₂∥p∥²_X. In (21) an example of a_k is given which only fulfills these weaker assumptions. Also (A8) is weaker than c₁∥u∥²_{X∩D} ≤ ∥u∥²_{a_k}. The main result of the paper is the following, which is proved in Section 2.3.

Theorem 2.2. Let {ϕ_k} ⊆ Φ_ad be the sequence generated by the VMPT method (Algorithm 2.1) with tol = 0 and let the assumptions (A1)-(A12) hold. Then:

1. lim_{k→∞} j(ϕ_k) exists.

2. Every accumulation point of {ϕ_k} in X∩D is a stationary point of j.

3. For all subsequences with ϕ_{k_i} → ϕ in X∩D where ϕ is stationary, the subsequence {v_{k_i}}_i converges strongly in X to zero.

4. If additionally j ∈ C^{1,γ}(Φ_ad) with respect to ∥.∥_{X∩D} for some 0 < γ ≤ 1, then the whole sequence {v_k}_k converges to zero in X.

In the classical Hilbert space setting, i.e. D = X = H for some Hilbert space H, the assumption (A3) can be dropped. Also assumption (A6) is trivial because of (A5).

Moreover, assumptions (A7)-(A11) are fulfilled for the choice a_k(p, v) = (p, A_k v)_H, where A_k ∈ L(H) is a self-adjoint linear operator with m∥p∥²_H ≤ (p, A_k p)_H ≤ M∥p∥²_H and M ≥ m > 0 independent of k. This is e.g. assumed in the local convergence theory in [15, 17] and in finite dimensions for global convergence in [2, 24]. For the special choice a_k(p, v) = (p, v)_H, global convergence is shown in [19], and for the case of a line search along the gradient path in [14]. Result 4. of Theorem 2.2 is shown in [20] in the case of a line search along the gradient path under the same assumption j ∈ C^{1,γ}. Thus the presented method is a generalization of the classical method in Hilbert space.

We would also like to mention the following:

Remark 2.3. If there exists C > 0 such that ∥p∥_D ≤ C∥p∥_X for all p ∈ X∩D, assumption (A3) can be omitted.

If X is a Hilbert space, the choice a_k(u, v) = (u, v)_H fulfills all assumptions (A7)-(A11).

2.3 Analysis and proof of the convergence result of the VMPT method

We first show the existence and uniqueness of ϕ̄_k = P_k(ϕ_k) based on the direct method in the calculus of variations, using the following lemma and assumptions (A2), (A3) and (A5)-(A10). Note that the standard proof cannot be applied, since a_k is indeed X-coercive, but a_k and ⟨j′(ϕ_k), ⋅⟩ are not X-continuous. Another difficulty is that X∩D is not necessarily reflexive.

Lemma 2.4. Let {p_k} ⊆ Φ_ad with p_k → p weakly in X for some p ∈ Φ_ad. Then p_k → p weakly-* in D.

Proof. Since Φ_ad is bounded in D and the closed unit ball of D is weakly-* sequentially compact due to the separability of B, we can extract from any subsequence of {p_k} ⊆ Φ_ad another subsequence {p_{k_i}} with p_{k_i} → p̃ weakly-* in D for some p̃ ∈ D. Due to the required unique limit in X and D we have p̃ = p. Since for any subsequence we find a subsequence converging to the same p, the whole sequence converges to p.

Theorem 2.5. For any k ∈ ℕ₀ and ϕ ∈ Φ_ad, the problem

\[ \min_{y \in \Phi_{ad}} \tfrac12 \| y - \phi \|_{a_k}^2 + \lambda_k \langle j'(\phi), y - \phi \rangle \tag{5} \]

admits a unique solution ϕ̄ := P_k(ϕ), which is given by the unique solution of the variational inequality

\[ a_k(\bar\phi - \phi, \eta - \bar\phi) + \lambda_k \langle j'(\phi), \eta - \bar\phi \rangle \ge 0 \quad \forall \eta \in \Phi_{ad}. \tag{6} \]

Proof. Let k ∈ ℕ₀ and ϕ ∈ Φ_ad be arbitrary. Problem (5) is equivalent to

\[ \min_{y \in \Phi_{ad}} g_k(y) := \tfrac12 a_k(y, y) + \langle b_k, y \rangle, \tag{7} \]

where ⟨b_k, y⟩ := λ_k⟨j′(ϕ), y⟩ − a_k(ϕ, y) and b_k ∈ (X∩D)* due to (A5) and (A9). By (A3) and (A8) we get for any y ∈ Φ_ad with some generic C > 0

\[ g_k(y) \ge \frac{c_1}{2} \| y \|_X^2 - \| b_k \|_{(X \cap D)^*} \bigl( \| y \|_X + \underbrace{\| y \|_D}_{\le C} \bigr) \ge -C. \tag{8} \]

Thus g_k is X-coercive and bounded from below on Φ_ad. Hence we can choose an infimizing sequence {ϕ_i} ⊆ Φ_ad such that g_k(ϕ_i) → inf_{y∈Φ_ad} g_k(y) as i → ∞. From the estimate (8) we conclude that {ϕ_i}_i is bounded in X. Therefore, we can extract a subsequence (still denoted by ϕ_i) which converges weakly in X to some ϕ̄ ∈ X. Since Φ_ad is convex and closed in X, it is also weakly closed in X and thus ϕ̄ ∈ Φ_ad. By Lemma 2.4 we also get ϕ_i → ϕ̄ weakly-* in D. Finally we show g_k(ϕ̄) = inf_{y∈Φ_ad} g_k(y). Using (A6), (A8) and (A10) one can show that liminf_i a_k(ϕ_i, ϕ_i) ≥ a_k(ϕ̄, ϕ̄) and lim_i ⟨b_k, ϕ_i⟩ = ⟨b_k, ϕ̄⟩, thus liminf_i g_k(ϕ_i) ≥ g_k(ϕ̄). We conclude

\[ \inf_{y \in \Phi_{ad}} g_k(y) \le g_k(\bar\phi) \le \liminf_{i} g_k(\phi_i) = \inf_{y \in \Phi_{ad}} g_k(y), \]

which shows the existence of a minimizer of (7). Using (A8), the uniqueness follows from strict convexity of g_k.

Due to (A5) and (A9), g_k is differentiable in X∩D, where its directional derivative at ϕ̄ in direction η − ϕ̄ for arbitrary η ∈ Φ_ad is given by

\[ \langle g_k'(\bar\phi), \eta - \bar\phi \rangle = a_k(\bar\phi - \phi, \eta - \bar\phi) + \lambda_k \langle j'(\phi), \eta - \bar\phi \rangle. \]

Since the problem (5) is convex, it is equivalent to the first order optimality condition, which is given by the variational inequality (6), see [25].

We see that ϕ ∈ Φ_ad is a stationary point of j, i.e. ⟨j′(ϕ), η − ϕ⟩ ≥ 0 ∀η ∈ Φ_ad, if and only if ϕ̄ = ϕ is the solution of (6), i.e. the fixed point equation ϕ = P_k(ϕ) is fulfilled. This leads to the classical view of the method as a fixed point iteration ϕ_{k+1} = P_k(ϕ_k) in the case that P_k is independent of k and α_k = 1 is chosen.

Corollary 2.6. If there exists some k ∈ ℕ₀ with P_k(ϕ) = ϕ, then ϕ is a stationary point of j. On the other hand, if ϕ ∈ Φ_ad is a stationary point of j, then the fixed point equation P_k(ϕ) = ϕ holds for all k ∈ ℕ₀. In particular, an iterate ϕ_k of the algorithm is a stationary point of j if and only if v_k = P_k(ϕ_k) − ϕ_k = 0.
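The variational inequality (6) and the fixed point characterization of Corollary 2.6 can be illustrated in finite dimensions. The sketch below assumes a box [lo, hi]ⁿ as admissible set and a diagonal metric a(p, v) = Σ_i b_i p_i v_i, so that (5) has a closed form solution; the names `solve_subproblem` and `vi_residual` and the random data are hypothetical.

```python
import random

def solve_subproblem(phi, g, b, lam, lo, hi):
    """Solution of (5) for a box and the diagonal metric a(p, v) = sum_i b_i p_i v_i."""
    return [min(hi, max(lo, p - lam * gi / bi)) for p, gi, bi in zip(phi, g, b)]

def vi_residual(phi, phi_bar, eta, g, b, lam):
    """Left-hand side of (6): a(phi_bar - phi, eta - phi_bar) + lam * <g, eta - phi_bar>."""
    return sum(bi * (pb - p) * (e - pb) + lam * gi * (e - pb)
               for p, pb, e, gi, bi in zip(phi, phi_bar, eta, g, b))

random.seed(1)
n, lo, hi, lam = 4, 0.0, 1.0, 0.5
b = [random.uniform(0.5, 2.0) for _ in range(n)]   # metric weights, all positive
phi = [random.uniform(lo, hi) for _ in range(n)]
g = [random.gauss(0.0, 1.0) for _ in range(n)]     # stand-in for j'(phi)
phi_bar = solve_subproblem(phi, g, b, lam, lo, hi)

# (6) must hold for every feasible eta; here it is tested on random samples.
worst = min(vi_residual(phi, phi_bar,
                        [random.uniform(lo, hi) for _ in range(n)], g, b, lam)
            for _ in range(2000))

# Corollary 2.6: phi_s = lo with g_s >= 0 satisfies <g_s, eta - phi_s> >= 0 on the box,
# so it is stationary and must be a fixed point of the subproblem map.
phi_s, g_s = [lo] * n, [abs(gi) for gi in g]
is_fixed_point = solve_subproblem(phi_s, g_s, b, lam, lo, hi) == phi_s
print(worst >= -1e-12, is_fixed_point)
```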

The variational inequality (6) tested with η = ϕ ∈ Φ_ad, together with (A8) and (A12), yields that P_k(ϕ) − ϕ is a descent direction for j:

Lemma 2.7. Let k ∈ ℕ₀, ϕ ∈ Φ_ad and v := P_k(ϕ) − ϕ. Then it holds

\[ \langle j'(\phi), v \rangle \le - \frac{c_1}{\lambda_{\max}} \| v \|_X^2. \tag{9} \]

Note that (9) does not hold in the X∩D-norm.

Due to ⟨j′(ϕ), v⟩ < 0 for v ≠ 0, the step length selection by the Armijo rule (see step 10 in Algorithm 2.1) is well defined, which can be shown as in [2].
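The descent estimate (9) can be sanity-checked on random finite dimensional data. The sketch assumes a box as admissible set, the Euclidean norm in place of ∥.∥_X, and a diagonal metric with weights b_i, so that c₁ = min_i b_i is a valid constant in (A8); these modelling choices and all variable names are illustrative.

```python
import random

random.seed(2)
n, lo, hi = 6, -1.0, 1.0
lam_min, lam_max = 0.1, 2.0
violations = 0

for _ in range(500):
    lam = random.uniform(lam_min, lam_max)            # (A12)
    b = [random.uniform(0.5, 3.0) for _ in range(n)]  # diagonal metric weights
    c1 = min(b)                                       # a(p, p) >= c1 * |p|^2, cf. (A8)
    phi = [random.uniform(lo, hi) for _ in range(n)]  # feasible point
    g = [random.gauss(0.0, 1.0) for _ in range(n)]    # stand-in for j'(phi)

    # v = P_k(phi) - phi with the closed-form subproblem solution for this setting
    v = [min(hi, max(lo, p - lam * gi / bi)) - p for p, gi, bi in zip(phi, g, b)]

    lhs = sum(gi * vi for gi, vi in zip(g, v))        # <j'(phi), v>
    rhs = -c1 / lam_max * sum(vi * vi for vi in v)    # right-hand side of (9)
    if lhs > rhs + 1e-12:
        violations += 1

print(violations)
```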

Remark 2.8. For the existence of a step length and for the global convergence proof we exploit that the path α ↦ ϕ_k + αv_k is continuous in X∩D. Thus, the mapping α ↦ j(ϕ_k + αv_k) is also continuous. On the other hand, this does not hold for the gradient path. Backtracking along the gradient path or projection arc means that α_k is set to 1, whereas λ_k = β^{m_k} is chosen with m_k ∈ ℕ₀ minimal such that the Armijo condition

\[ j(\phi_k(\lambda_k)) \le j(\phi_k) + \sigma \langle j'(\phi_k), \phi_k(\lambda_k) - \phi_k \rangle \]

is satisfied, see for instance [21]. By the notation ϕ_k(λ_k) we emphasize that the solution of the subproblem (4) depends on λ_k. However, with the above assumptions it cannot be shown that such a λ_k exists. The reason is that due to (A8) the gradient path λ ↦ ϕ_k(λ) is continuous with respect to the X-norm, whereas j is due to (A5) only differentiable with respect to the X∩D-norm. Thus, j along the gradient path, i.e. the mapping λ ↦ j(ϕ_k(λ)), may be discontinuous.
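The two backtracking strategies contrasted in this remark differ in how often the subproblem is solved. The sketch below counts subproblem solves for one iteration of each strategy in a finite dimensional setting, where both are well defined; the continuity issue described in the remark is an infinite dimensional effect and is not reproduced here. The box constraint, Euclidean metric, function names and test problem are assumptions for illustration.

```python
def clip_step(phi, g, lam, lo, hi):
    """Subproblem solution for a box and the Euclidean metric."""
    return [min(hi, max(lo, p - lam * gi)) for p, gi in zip(phi, g)]

def backtrack_direction(j, phi, g, lam, lo, hi, beta=0.5, sigma=1e-4):
    """Algorithm 2.1 style: one subproblem solve, then shrink alpha along v."""
    phi_bar, solves = clip_step(phi, g, lam, lo, hi), 1
    v = [pb - p for pb, p in zip(phi_bar, phi)]
    slope = sum(gi * vi for gi, vi in zip(g, v))
    alpha, j0 = 1.0, j(phi)
    while j([p + alpha * vi for p, vi in zip(phi, v)]) > j0 + alpha * sigma * slope:
        alpha *= beta
    return [p + alpha * vi for p, vi in zip(phi, v)], solves

def backtrack_path(j, phi, g, lo, hi, beta=0.5, sigma=1e-4):
    """Projection arc: alpha = 1, shrink lam and re-solve the subproblem per trial."""
    lam, solves, j0 = 1.0, 0, j(phi)
    while True:
        trial = clip_step(phi, g, lam, lo, hi)
        solves += 1
        slope = sum(gi * (ti - p) for gi, ti, p in zip(g, trial, phi))
        if j(trial) <= j0 + sigma * slope:
            return trial, solves
        lam *= beta

# Illustrative badly scaled quadratic on [0, 1]^3 (not from the paper).
c, t = [1.0, 10.0, 100.0], [1.5, -0.3, 0.4]
j = lambda phi: 0.5 * sum(ci * (p - ti) ** 2 for ci, p, ti in zip(c, phi, t))
phi0 = [0.5, 0.5, 0.5]
g0 = [ci * (p - ti) for ci, p, ti in zip(c, phi0, t)]

phi_dir, dir_solves = backtrack_direction(j, phi0, g0, 1.0, 0.0, 1.0)
phi_arc, path_solves = backtrack_path(j, phi0, g0, 0.0, 1.0)
print(dir_solves, path_solves)
```

Here a subproblem solve is a cheap clipping operation; in the phase field application it is itself an optimization problem, which is why the number of solves per iteration matters.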


To prove statement 2. of Theorem 2.2 we use, as in [2] for finite dimensions, that v_k is gradient related. This is weaker than the common angle condition. Therefore we need the following two lemmata:

Lemma 2.9. For {ϕ_k}_k ⊆ Φ_ad with ϕ_k → ϕ in X∩D and {p_k}_k ⊆ X∩D with p_k → p weakly in X and weakly-* in D for some ϕ, p ∈ X∩D it holds ⟨j′(ϕ_k), p_k⟩ → ⟨j′(ϕ), p⟩.

Proof. We use (A5) and (A6) and obtain

\[ \begin{aligned} | \langle j'(\phi_k), p_k \rangle - \langle j'(\phi), p \rangle | &\le | \langle j'(\phi_k) - j'(\phi), p_k \rangle | + | \langle j'(\phi), p_k - p \rangle | \\ &\le \underbrace{\| j'(\phi_k) - j'(\phi) \|_{(X \cap D)^*}}_{\to 0} \, \underbrace{\| p_k \|_{X \cap D}}_{\le C} + \underbrace{| \langle j'(\phi), p_k - p \rangle |}_{\to 0} \to 0. \end{aligned} \]

The preceding lemma is also needed in the proof of Theorem 2.2.

Lemma 2.10. Let {ϕ_i}_i ⊆ Φ_ad be a sequence with ϕ_i → ϕ in X∩D for some ϕ ∈ X∩D. Then there exists C > 0 such that ∥P_k(ϕ_i)∥_{X∩D} ≤ C for all i, k ∈ ℕ₀.

Proof. Lemma 2.7 yields together with (A3) and (A5) the estimate

\[ \begin{aligned} \frac{c_1}{\lambda_{\max}} \| P_k(\phi_i) - \phi_i \|_X^2 &\le - \langle j'(\phi_i), P_k(\phi_i) - \phi_i \rangle \\ &\le \| j'(\phi_i) \|_{(X \cap D)^*} \bigl( \| P_k(\phi_i) - \phi_i \|_X + \| P_k(\phi_i) - \phi_i \|_D \bigr) \\ &\le C \bigl( \| P_k(\phi_i) - \phi_i \|_X + 1 \bigr), \end{aligned} \]

thus ∥P_k(ϕ_i) − ϕ_i∥_X ≤ C and hence ∥P_k(ϕ_i)∥_X ≤ C. Due to (A3) we finally get ∥P_k(ϕ_i)∥_{X∩D} ≤ C independent of i and k.

Lemma 2.11. Let {ϕ_k} be the sequence generated by Algorithm 2.1. Then {v_k}_k is gradient related, i.e. for any subsequence {ϕ_{k_i}}_i which converges in X∩D to a nonstationary point ϕ ∈ Φ_ad of j, the corresponding subsequence of search directions {v_{k_i}}_i is bounded in X∩D and limsup_i ⟨j′(ϕ_{k_i}), v_{k_i}⟩ < 0 is satisfied. Moreover, it holds liminf_i ∥v_{k_i}∥_X > 0.

Proof. Let ϕ_{k_i} → ϕ in X∩D, where ϕ is nonstationary. Lemma 2.10 provides that {v_{k_i}}_i is bounded in X∩D. With (9), the statement limsup_i ⟨j′(ϕ_{k_i}), v_{k_i}⟩ < 0 follows from liminf_i ∥v_{k_i}∥_X = C > 0, which we show by contradiction.

Assume liminf_i ∥v_{k_i}∥_X = 0, thus there is a subsequence, again denoted by {v_{k_i}}_i, such that v_{k_i} → 0 in X. Using (6) for ϕ̄_k := P_k(ϕ_k), the positive definiteness of a_k and (A12), it follows for all η ∈ Φ_ad

\[ \langle j'(\phi_k), \eta - \bar\phi_k \rangle \ge \frac{1}{\lambda_k} \bigl( a_k(v_k, v_k) + a_k(v_k, \bar\phi_k - v_k - \eta) \bigr) \ge - \frac{1}{\lambda_{\min}} | a_k(v_k, \bar\phi_k - v_k - \eta) |. \tag{10} \]

Moreover, ϕ̄_{k_i} = v_{k_i} + ϕ_{k_i} → ϕ in X and also weakly-* in D according to Lemma 2.4. From Lemma 2.9 we get ⟨j′(ϕ_{k_i}), η − ϕ̄_{k_i}⟩ → ⟨j′(ϕ), η − ϕ⟩. From (A11) we get a_{k_i}(ϕ̄_{k_i} − ϕ_{k_i}, ϕ_{k_i} − η) → 0, and we derive from (10) that

\[ \langle j'(\phi), \eta - \phi \rangle \ge 0 \quad \forall \eta \in \Phi_{ad}, \]

which shows that ϕ is stationary, which is a contradiction.

Proof of Theorem 2.2.

Because of Corollary 2.6 we can assume v_k ≠ 0 and α_k > 0 for all k.

1.) From the Armijo rule and since v_k is a descent direction we get

\[ j(\phi_{k+1}) - j(\phi_k) \le \alpha_k \sigma \langle j'(\phi_k), v_k \rangle < 0, \tag{11} \]

thus j(ϕ_k) is monotonically decreasing. Since j is bounded from below we get convergence j(ϕ_k) → j* for some j* ∈ ℝ, which proves 1.

2.) The proof is by contradiction, similar to [2] in finite dimensions. Let ϕ be an accumulation point, with a convergent subsequence ϕ_{k_i} → ϕ in X∩D. The continuity of j on Φ_ad then yields j* = j(ϕ), and (11) leads to α_k⟨j′(ϕ_k), v_k⟩ → 0. Assuming now that ϕ is nonstationary we have |⟨j′(ϕ_{k_i}), v_{k_i}⟩| ≥ C > 0, since {v_k} is gradient related by Lemma 2.11, and thus α_{k_i} → 0. So there exists some ī ∈ ℕ such that α_{k_i}/β ≤ 1 for all i ≥ ī, and thus α_{k_i}/β does not fulfill the Armijo rule due to the minimality of m_k. Applying the mean value theorem to the left hand side, we have for some nonnegative α̃_{k_i} ≤ α_{k_i}/β and all i ≥ ī that

\[ \frac{\alpha_{k_i}}{\beta} \langle j'(\phi_{k_i} + \tilde\alpha_{k_i} v_{k_i}), v_{k_i} \rangle = j\Bigl(\phi_{k_i} + \frac{\alpha_{k_i}}{\beta} v_{k_i}\Bigr) - j(\phi_{k_i}) > \frac{\alpha_{k_i}}{\beta} \sigma \langle j'(\phi_{k_i}), v_{k_i} \rangle \tag{12} \]

holds. Since, by Lemma 2.11, {v_{k_i}}_i is bounded in X∩D and α̃_{k_i} → 0, we have that ϕ_{k_i} + α̃_{k_i} v_{k_i} → ϕ in X∩D. Also ϕ̄_{k_i} = ϕ_{k_i} + v_{k_i} is uniformly bounded in X∩D and thus there exists a subsequence, again denoted by {ϕ̄_{k_i}}, which converges to some y ∈ Φ_ad weakly in X and weakly-* in D. Hence we have that v_{k_i} = ϕ̄_{k_i} − ϕ_{k_i} → v̄ := y − ϕ weakly in X and weakly-* in D. According to Lemma 2.9 we can take the limit of both sides of the inequality (12), which leads to ⟨j′(ϕ), v̄⟩ ≥ σ⟨j′(ϕ), v̄⟩, and σ < 1 yields ⟨j′(ϕ), v̄⟩ ≥ 0. This contradicts ⟨j′(ϕ), v̄⟩ = limsup_i ⟨j′(ϕ_{k_i}), v_{k_i}⟩ < 0, which is a consequence of Lemma 2.11.

3.) By proving that out of any subsequence of ⟨j′(ϕ_{k_i}), v_{k_i}⟩ we can extract another subsequence which converges to 0, we can conclude that ⟨j′(ϕ_{k_i}), v_{k_i}⟩ → 0, which yields ∥v_{k_i}∥_X → 0 by (9). With Lemma 2.10, we get by the same arguments as in 2. that v_{k_i} → y − ϕ weakly in X and weakly-* in D for a subsequence and for some y ∈ Φ_ad, thus ⟨j′(ϕ_{k_i}), v_{k_i}⟩ → ⟨j′(ϕ), y − ϕ⟩ due to Lemma 2.9. Since the v_{k_i} are descent directions for j at ϕ_{k_i} and ϕ is stationary, we have ⟨j′(ϕ), y − ϕ⟩ = 0.


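4.) As in 3.) we prove by a subsequence argument that ⟨j′(ϕ_k), v_k⟩ → 0. For an arbitrary subsequence, which we also denote by index k, (11) yields α_k⟨j′(ϕ_k), v_k⟩ → 0. If α_k ≥ c > 0 for all k, the assertion follows immediately. Otherwise there exists a subsequence (again denoted by index k) such that β ≥ α_k → 0, and thus the step length α_k/β does not fulfill the Armijo condition. Since j′ is Hölder continuous with exponent γ and modulus L we obtain

\[ \begin{aligned} \sigma \frac{\alpha_k}{\beta} \langle j'(\phi_k), v_k \rangle &< j\Bigl(\phi_k + \frac{\alpha_k}{\beta} v_k\Bigr) - j(\phi_k) = \int_0^1 \frac{d}{dt} j\Bigl(\phi_k + t \frac{\alpha_k}{\beta} v_k\Bigr) \, dt \\ &\le \frac{\alpha_k}{\beta} \langle j'(\phi_k), v_k \rangle + \frac{L}{1+\gamma} \Bigl(\frac{\alpha_k}{\beta}\Bigr)^{1+\gamma} \| v_k \|_{X \cap D}^{1+\gamma}. \end{aligned} \]

It holds ∥v_k∥_D ≤ C due to (A3), and employing (9) we obtain

\[ 0 < (\sigma - 1) \langle j'(\phi_k), v_k \rangle < C \frac{L}{1+\gamma} \Bigl(\frac{\alpha_k}{\beta}\Bigr)^{\gamma} \bigl( \| v_k \|_X^{1+\gamma} + 1 \bigr) \le C \alpha_k^{\gamma} \bigl( | \langle j'(\phi_k), v_k \rangle |^{\frac{1+\gamma}{2}} + 1 \bigr). \]

We get x_k := |⟨j′(ϕ_k), v_k⟩| → 0. Otherwise there exists a subsequence, still denoted by {x_k}, with x_k → c̄ > 0. Rearranging the last inequality gives

\[ 1 < C \alpha_k^{\gamma} \bigl( x_k^{\frac{-1+\gamma}{2}} + x_k^{-1} \bigr) \to 0, \]

which is a contradiction.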

Remark 2.12. Statements 1. and 2. of Theorem 2.2 require only that ϕ̄_k ∈ Φ_ad is chosen such that the search directions v_k = ϕ̄_k − ϕ_k are gradient related descent directions, as can be seen in the proof above. Hence ϕ̄_k does not have to be P_k(ϕ_k) in Algorithm 2.1. In this case assumption (A3) is also not required.

3 An application in structural topology optimization based on a phase field model

In this section we give an example of an optimization problem described in [4], which is not differentiable in a Hilbert space, so the classical projected gradient method cannot be applied, but for which the assumptions of the VMPT method are fulfilled.

We consider the problem of distributing N materials, each with different elastic properties and fixed volume fraction, within a design domain Ω ⊆ ℝ^d, d ∈ ℕ, such that the mean compliance ∫_{Γ_g} g⋅u is minimal under the external force g acting on Γ_g ⊆ ∂Ω. The displacement field u : Ω → ℝ^d is given as the solution of the equations of linear elasticity (14). To obtain a well posed problem a perimeter penalization is typically used. Using phase fields in topology optimization was introduced by Bourdin and Chambolle [8]. Here, the N materials are described by a vector valued phase field ϕ : Ω → ℝ^N with ϕ ≥ 0 and ∑_i ϕ_i = 1, which is able to handle topological changes implicitly. The i-th material is characterized by {ϕ_i = 1}, and the different materials are separated by a thin interface whose thickness is controlled by the phase field parameter ε > 0. In the phase field setting the perimeter is approximated by the Ginzburg-Landau energy. In [5] it is shown that the given problem for N = 2 converges as ε → 0 in the sense of Γ-convergence. For further details about the model we refer the reader to [4]. With E(ϕ) := ∫_Ω {(ε/2)|∇ϕ|² + (1/ε)ψ₀(ϕ)}, the resulting optimal control problem reads

\[ \min \tilde J(\phi, u) := \int_{\Gamma_g} g \cdot u + \gamma E(\phi) \tag{13} \]

\[ \phi \in H^1(\Omega)^N, \quad u \in H_D^1 := \{ \xi \in H^1(\Omega)^d \mid \xi|_{\Gamma_D} = 0 \} \]

subject to

\[ \int_\Omega C(\phi) E(u) : E(\xi) = \int_{\Gamma_g} g \cdot \xi \quad \forall \xi \in H_D^1, \tag{14} \]

\[ \fint_\Omega \phi = m, \quad \phi \ge 0, \quad \sum_{i=1}^N \phi^i \equiv 1, \tag{15} \]

where γ > 0 is a weighting factor, ⨏_Ω ϕ := |Ω|⁻¹ ∫_Ω ϕ, ψ₀ : ℝ^N → ℝ is the smooth part of the potential forcing the values of ϕ to the standard basis e_i ∈ ℝ^N, and A : B := ∑_{i,j=1}^d A_ij B_ij for A, B ∈ ℝ^{d×d}. The materials are fixed on the Dirichlet domain Γ_D ⊆ ∂Ω. The tensor valued mapping C : ℝ^N → ℝ^{d×d} ⊗ (ℝ^{d×d})* is a suitable interpolation of the stiffness tensors C(e_i) of the different materials, and E(u) := ½(∇u + ∇uᵀ) is the linearized strain tensor. The prescribed volume fraction of the i-th material is given by m_i. For examples of the functions ψ₀ and C we refer to [3, 4]. Existence of a minimizer of the problem (13) as well as the unique solvability of the state equation (14) is shown in [4] under the following assumptions, which we also impose in this paper.

(AP) Ω ⊆ ℝ^d is a bounded Lipschitz domain; Γ_D, Γ_g ⊆ ∂Ω with Γ_D ∩ Γ_g = ∅ and H^{d−1}(Γ_D) > 0. Moreover, g ∈ L²(Γ_g)^d and ψ₀ ∈ C^{1,1}(ℝ^N) as well as m ≥ 0, ∑_{i=1}^N m_i = 1. For the stiffness tensor we assume C = (C_ijkl)_{i,j,k,l=1}^d with C_ijkl ∈ C^{1,1}(ℝ^N) and C_ijkl = C_jikl = C_klij, and that there exist a₀, a₁, C > 0 such that a₀|A|² ≤ C(ϕ)A : A ≤ a₁|A|² as well as |C′(ϕ)| ≤ C holds for all symmetric matrices A ∈ ℝ^{d×d} and for all ϕ ∈ ℝ^N.

The state u can be eliminated using the control-to-state operator S, resulting in the reduced cost functional j̃(ϕ) := J̃(ϕ, S(ϕ)). In [4] it is also shown that j̃ : H¹(Ω)^N ∩ L^∞(Ω)^N → ℝ is everywhere Fréchet differentiable with derivative

\[ \tilde j'(\phi) v = \gamma \int_\Omega \Bigl\{ \varepsilon \nabla\phi : \nabla v + \frac{1}{\varepsilon} \psi_0'(\phi) v \Bigr\} - \int_\Omega C'(\phi) v \, E(u) : E(u) \tag{16} \]

for all ϕ, v ∈ H¹(Ω)^N ∩ L^∞(Ω)^N, where u = S(ϕ) and S : L^∞(Ω)^N → H¹(Ω)^d is Fréchet differentiable. By the techniques in [4] one can also show that S′ is continuous.

In [4, 6] the problem is solved numerically by a pseudo time stepping method with fixed time step, which results from an L²-gradient flow approach. An H⁻¹-gradient flow approach is also considered in [6]. The drawbacks of these methods are that no convergence results to a stationary point exist, and hence also no appropriate stopping criteria are known. In addition, the methods are typically very slow, i.e. many time steps are needed until the changes in the solution ϕ or in j are small.

Here we apply the VMPT method, which does not have these drawbacks and which can additionally incorporate second order information.

Since H¹(Ω)^N ∩ L^∞(Ω)^N is not a Hilbert space, the classical projected gradient method cannot be applied. In the following we show that problem (13) fulfills the assumptions of the VMPT method. Amongst others we use the inner product a_k(f, g) = ∫_Ω ∇f : ∇g. To guarantee positive definiteness of this a_k we first have to translate the problem by a constant to gain ∫_Ω ϕ = 0, which allows us to apply a Poincaré inequality. Therefore we perform a change of coordinates in the form ϕ̃ = ϕ − m and get the following problem for the transformed coordinates, which we again denote by ϕ:

\[ \min j(\phi) := \int_{\Gamma_g} g \cdot S(\phi + m) + \gamma E(\phi + m) \tag{17} \]

\[ \phi \in \Phi_{ad} := \Bigl\{ \phi \in H^1(\Omega)^N \Bigm| \fint_\Omega \phi = 0, \ \phi \ge -m, \ \sum_{i=1}^N \phi^i \equiv 0 \Bigr\}. \]

On the transformed problem (17) we apply the VMPT method in the spaces

\[ X := \Bigl\{ \phi \in H^1(\Omega)^N \Bigm| \fint_\Omega \phi = 0 \Bigr\}, \qquad D := L^\infty(\Omega)^N. \]

The space X of mean value free functions becomes a Hilbert space with the inner product (f, g)_X := (∇f, ∇g)_{L²}, and ∥.∥_X is equivalent to the H¹-norm [1].
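The effect of the change of coordinates ϕ̃ = ϕ − m on the constraints can be seen on a small discrete example. The sketch assumes nodal values with uniform weights, so that the mean value is a plain average; the sample data for N = 2 materials at four nodes are made up for illustration.

```python
# Illustrative nodal values of a phase field with N = 2 materials at 4 nodes;
# every row is nonnegative and sums to 1, as required in (15).
phi = [[0.2, 0.8], [0.6, 0.4], [1.0, 0.0], [0.2, 0.8]]
n_nodes, N = len(phi), len(phi[0])

# m: mean value of each component (uniform nodal weights assumed)
m = [sum(row[i] for row in phi) / n_nodes for i in range(N)]

# Change of coordinates phi_tilde = phi - m
shifted = [[row[i] - m[i] for i in range(N)] for row in phi]

# Constraints of the transformed admissible set in (17)
mean_shifted = [sum(row[i] for row in shifted) / n_nodes for i in range(N)]
row_sums = [sum(row) for row in shifted]
lower_ok = all(row[i] >= -m[i] - 1e-12 for row in shifted for i in range(N))

print(m, mean_shifted, row_sums, lower_ok)
```

The shifted field has zero mean and its components sum to zero pointwise, which is what makes the gradient inner product positive definite on the transformed space.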

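Theorem 3.1. The reduced cost functional j : X∩D → ℝ is continuously Fréchet differentiable and j′ is Lipschitz continuous on Φ_ad.

Proof. The Fréchet differentiability of j on X∩D is shown in [4]. Let η, ϕ_i ∈ X∩D and u_i = S(ϕ_i), i = 1, 2. Then with (16), ψ₀ ∈ C^{1,1}(ℝ^N), C_ijkl ∈ C^{1,1}(ℝ^N) and |C′(ϕ)| ≤ C ∀ϕ ∈ ℝ^N we get

\[ \begin{aligned} | (j'(\phi_1) - j'(\phi_2)) \eta | &\le \gamma\varepsilon \| \phi_1 - \phi_2 \|_{H^1} \| \eta \|_{H^1} + \frac{C\gamma}{\varepsilon} \| \phi_1 - \phi_2 \|_{L^2} \| \eta \|_{L^2} \\ &\quad + \Bigl| \int_\Omega (C'(m + \phi_1) - C'(m + \phi_2))(\eta) E(u_1) : E(u_1) \Bigr| \\ &\quad + \Bigl| \int_\Omega C'(m + \phi_2)(\eta) E(u_1 - u_2) : E(u_1) \Bigr| \\ &\quad + \Bigl| \int_\Omega C'(m + \phi_2)(\eta) E(u_2) : E(u_1 - u_2) \Bigr| \\ &\le C \| \phi_1 - \phi_2 \|_{H^1} \| \eta \|_{H^1} + \| (C'(m + \phi_1) - C'(m + \phi_2)) \eta \|_{L^\infty} \| u_1 \|_{H^1}^2 \\ &\quad + C \| \eta \|_{L^\infty} \| u_1 - u_2 \|_{H^1} ( \| u_1 \|_{H^1} + \| u_2 \|_{H^1} ) \\ &\le C \| \eta \|_{H^1 \cap L^\infty} \bigl\{ \| \phi_1 - \phi_2 \|_{H^1} + \| \phi_1 - \phi_2 \|_{L^\infty} \| u_1 \|_{H^1}^2 \\ &\qquad + \| u_1 - u_2 \|_{H^1} ( \| u_1 \|_{H^1} + \| u_2 \|_{H^1} ) \bigr\}. \end{aligned} \tag{18} \]

To show the continuity of j′, let ϕ_n, ϕ ∈ X∩D for n ∈ ℕ with ϕ_n → ϕ in X∩D. Using (18) yields

\[ \| j'(\phi_n) - j'(\phi) \|_{(H^1 \cap L^\infty)^*} \le C \bigl( \| \phi_n - \phi \|_{H^1 \cap L^\infty} (1 + \| u_n \|_{H^1}^2) + \| u_n - u \|_{H^1} ( \| u_n \|_{H^1} + \| u \|_{H^1} ) \bigr), \]

where u_n = S(ϕ_n) and u = S(ϕ). From the continuity of S we get that ∥u_n∥_{H¹} is bounded and that ∥u_n − u∥_{H¹} → 0 as n → ∞. This implies ∥j′(ϕ_n) − j′(ϕ)∥_{(H¹∩L^∞)*} → 0 and thus j ∈ C¹(X∩D).

For the Lipschitz continuity of j′ we employ estimate (18) with ϕ_i ∈ Φ_ad, i = 1, 2. Since Φ_ad is bounded in L^∞, we get that S is Lipschitz continuous on Φ_ad and that ∥S(ϕ)∥_{H¹} ≤ C, independent of ϕ ∈ Φ_ad, see [4]. This yields

\[ \| j'(\phi_1) - j'(\phi_2) \|_{(H^1 \cap L^\infty)^*} \le C \| \phi_1 - \phi_2 \|_{H^1 \cap L^\infty}, \]

which proves the Lipschitz continuity of j′ on Φ_ad.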

Corollary 3.2. The spaces X and D, together with j and Φ_ad given in (17), fulfill the assumptions (A1)-(A6) of the VMPT method.

Proof. Given the choices for X and D, (A1) is fulfilled. For ϕ ∈ Φ_ad we have

\[ -1 \le -m \le \phi \le 1 - m \le 1 \]

almost everywhere in Ω. Thus (A3) holds and Φ_ad ⊆ X∩D. Moreover, 0 ∈ Φ_ad, Φ_ad is convex, and since Φ_ad is closed in L²(Ω)^N, it is also closed in X ↪ L²(Ω)^N. Thus (A2) holds.

Assumption (A4) is shown in [4] and Theorem 3.1 provides (A5).

Given

\[ \langle j'(\phi), \phi_i \rangle = \int_\Omega \Bigl\{ \gamma\varepsilon \nabla\phi : \nabla\phi_i + \Bigl( \frac{\gamma}{\varepsilon} \nabla\psi_0(\phi + m) - \nabla C(\phi + m) E(u) : E(u) \Bigr) \cdot \phi_i \Bigr\}, \]

the first term converges to 0 if ϕ_i → 0 weakly in H¹. With (AP) and u ∈ H_D¹ we have that (γ/ε)∇ψ₀(ϕ + m) − ∇C(ϕ + m)E(u) : E(u) ∈ L¹(Ω)^N. Hence the remaining term converges to 0 if ϕ_i → 0 weakly-* in L^∞, which proves that (A6) is fulfilled.

Possible choices of the inner product a_k for the VMPT method are the inner product on X, i.e.

\[ a_k(p, y) = (p, y)_X = \int_\Omega \nabla p : \nabla y, \tag{19} \]

and the scaled version a_k(p, y) = γε(p, y)_X. Both fulfill the assumptions (A7)-(A11). We also give an example of a pointwise choice of an inner product, which includes second order information. Since this choice is not continuous in X, it is not obvious that it fulfills the assumptions. To motivate the choice of this inner product we look at the second order derivative of j, which is formally given by

\[ \begin{aligned} j''(\phi_k)[p, y] = \int_\Omega \Bigl\{ &\gamma\varepsilon \nabla p : \nabla y - 2 \bigl( C'(m + \phi_k)(y) E(S'(\phi_k) p) : E(u_k) \bigr) \\ &+ \frac{\gamma}{\varepsilon} \nabla^2 \psi_0(m + \phi_k) p \cdot y - C''(m + \phi_k)[p, y] E(u_k) : E(u_k) \Bigr\}. \end{aligned} \]

In [4] it is shown that z_p := S′(ϕ_k)p ∈ H_D¹ is the unique weak solution of the linearized state equation

\[ \int_\Omega C(m + \phi_k) E(z_p) : E(\eta) = - \int_\Omega C'(m + \phi_k) p \, E(u_k) : E(\eta) \quad \forall \eta \in H_D^1 \tag{20} \]

and that ∥z_p∥_{H¹} ≤ C∥p∥_{L^∞} holds. Since the first two terms in j′′ define an inner product (see proof of Theorem 3.3), we use

\[ a_k(p, y) = \gamma\varepsilon (p, y)_X - 2 \int_\Omega C'(m + \phi_k)(y) E(z_p) : E(u_k) \tag{21} \]

as an approximation of j′′(ϕ_k). Testing equation (20) for z_y = S′(ϕ_k)y with z_p we can equivalently write

\[ a_k(p, y) = \gamma\varepsilon (p, y)_X + 2 \int_\Omega C(m + \phi_k) E(z_p) : E(z_y). \tag{22} \]

We would like to mention that the C²-regularity of j is not necessary for this definition of a_k.

Theorem 3.3. The bilinear form a_k given in (21) fulfills the assumptions (A7)-(A11).

Proof. Due to (AP) and (22) we have

\[ a_k(p, p) \ge \gamma\varepsilon \| p \|_X^2. \]

Thus, (A7) and (A8) are fulfilled. Furthermore, (A9) holds due to

\[ a_k(p, y) \le \gamma\varepsilon \| p \|_{H^1} \| y \|_{H^1} + C \| z_p \|_{H^1} \| z_y \|_{H^1} \le \gamma\varepsilon \| p \|_{H^1} \| y \|_{H^1} + C \| p \|_{L^\infty} \| y \|_{L^\infty} \le C \| p \|_{X \cap D} \| y \|_{X \cap D}. \]

(A10) is proved as in Corollary 3.2.

Finally we prove (A11). For y_k → 0 and p_k → p in X we have (y_k, p_k)_X → 0 for k → ∞. With ϕ_k → ϕ, p_k → p in D = L^∞(Ω)^N and S : L^∞(Ω)^N → H¹(Ω)^d continuously Fréchet differentiable, we have u_k = S(ϕ_k) → S(ϕ) =: u in H_D¹ and z_{p_k} = S′(ϕ_k)p_k → S′(ϕ)p =: z_p in H_D¹. In particular, the sequences are bounded in the corresponding norms, including ∥y_k∥_{L^∞} ≤ C if y_k → y weakly-* in L^∞. Using the Lipschitz continuity and boundedness of C′ and ∇C(m + ϕ)E(z_p) : E(u) ∈ L¹(Ω)^N we have

\[ \begin{aligned} \Bigl| \int_\Omega C'(m + \phi_k) y_k E(z_{p_k}) : E(u_k) \Bigr| &\le \Bigl| \int_\Omega (C'(m + \phi_k) - C'(m + \phi)) y_k E(z_{p_k}) : E(u_k) \Bigr| \\ &\quad + \Bigl| \int_\Omega C'(m + \phi) y_k E(z_{p_k} - z_p) : E(u_k) \Bigr| \\ &\quad + \Bigl| \int_\Omega C'(m + \phi) y_k E(z_p) : E(u_k - u) \Bigr| + \Bigl| \int_\Omega C'(m + \phi) y_k E(z_p) : E(u) \Bigr| \\ &\le L \| \phi_k - \phi \|_{L^\infty} \| y_k \|_{L^\infty} \| z_{p_k} \|_{H^1} \| u_k \|_{H^1} \\ &\quad + \| C'(m + \phi) \|_{L^\infty} \| y_k \|_{L^\infty} \| z_{p_k} - z_p \|_{H^1} \| u_k \|_{H^1} \\ &\quad + \| C'(m + \phi) \|_{L^\infty} \| y_k \|_{L^\infty} \| z_p \|_{H^1} \| u_k - u \|_{H^1} \\ &\quad + \Bigl| \int_\Omega ( \nabla C(m + \phi) E(z_p) : E(u) ) \cdot y_k \Bigr| \to 0, \end{aligned} \]

which gives (A11).

Hence with 0 < λ_min ≤ λ_k ≤ λ_max, all assumptions of Theorem 2.2 are fulfilled and we get global convergence in the space H¹(Ω)^N ∩ L^∞(Ω)^N.

4 Numerical results

We discretize the structural topology optimization problem (13)-(15) using standard piecewise linear finite elements for the control ϕ and the state variable u. The projection type subproblem (4) is solved by a primal dual active set (PDAS) method similar to the method described in [7]. Many numerical examples for this problem can be found in [3, 5], e.g. for cantilever beams with up to three materials in two or three space dimensions and for an optimal material distribution within an airfoil.

In [3] the choice of the potential ψ as an obstacle potential and the choice of the tensor interpolation C are discussed. Also the inner products (., .)_X and γε(., .)_X for fixed scaling parameter λ_k = 1 are compared, where both give rise to a mesh independent method and the latter leads to a large speed up. Note that the choice of (., .)_X with λ_k = (γε)⁻¹ leads to the same iterates as choosing γε(., .)_X and λ_k = 1. Furthermore, it is discussed in [3] that the choice of γε(., .)_X can be motivated using j′′(ϕ) or by the fact that for the minimizers {ϕ_ε}_{ε>0} the Ginzburg-Landau energy converges to the perimeter as ε → 0 and hence γε∥ϕ_ε∥²_X ≈ const independent of ε ≪ 1. However, since this holds only for the iterates ϕ_k when the phases are separated and the interfaces are present with thickness proportional to ε, we suggest adapting λ_k accordingly. As updating strategy for λ_k the following method is applied: Start with λ₀ = 0.005(γε)⁻¹; then, if α_{k−1} = 1, set λ̃_k = λ_{k−1}/0.75, else λ̃_k = 0.75λ_{k−1}, and λ_k = max{λ_min, min{λ_max, λ̃_k}}. The last adjustment yields that