
3.1.2 Fast local convergence

In view of condition (3.3), a natural choice of search directions is given by

\[
  d_k = -\nabla \hat{J}(u_k), \tag{3.6}
\]

since $\hat{J}$ decreases most rapidly in the direction of the negative gradient. Methods of this kind are referred to as the steepest descent method or, due to the choice of the search direction, also as the gradient method.
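For illustration, a minimal sketch of this method in a finite-dimensional (discretized) setting could read as follows; the Python function, its parameters and the Armijo-type backtracking are our own illustrative choices and not part of the text.

\begin{verbatim}
import numpy as np

def steepest_descent(J, grad, u0, alpha=1e-4, beta=0.5,
                     tol=1e-8, max_iter=500):
    """Steepest descent: d_k = -grad J(u_k), step size by backtracking
    until an Armijo-type sufficient decrease condition (cf. (3.4)) holds."""
    u = np.asarray(u0, dtype=float)
    for _ in range(max_iter):
        g = grad(u)
        if np.linalg.norm(g) < tol:        # first-order optimality reached
            break
        d = -g                             # direction of steepest descent (3.6)
        s = 1.0
        while J(u + s * d) > J(u) + alpha * s * np.dot(g, d):
            s *= beta                      # backtrack to a feasible step length
        u = u + s * d
    return u
\end{verbatim}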

Although these directions are inexpensive to compute, since only first-order derivatives have to be evaluated, the method of steepest descent is usually very inefficient and lacks fast (local) convergence (cp. Kelley [Kel99]). For this reason, as indicated, another special class of search directions is of central importance, namely those obtained by solving the (generalized) Newton equation

\[
  M_k d_k = -\nabla \hat{J}(u_k), \tag{3.7}
\]

where $\{M_k\}_{k\in\mathbb{N}} \subset \mathcal{L}(U,U)$ is a sequence of positive operators, i.e., they satisfy
\[
  c\,\|v\|_U^2 \;\le\; \langle M_k v, v\rangle_U \;\le\; C\,\|v\|_U^2 \qquad \text{for all } v\in U \text{ and all } k=1,2,\dots,
\]

for some constants $0 < c < C$ independent of $k$. We refer to methods that use (3.7) for the direction computation as Newton-type methods. Just as for the directions of steepest descent (3.6), condition (3.3) is fulfilled for this class of directions (cp. De los Reyes [DlR15]).

Depending on the information provided by $\{M_k\}_{k\in\mathbb{N}}$, it is possible to attain high(er) rates of convergence, at least locally. Obviously, for $M_k = I_U$, the identity in $U$, we recover the direction of steepest descent (3.6). We now introduce two common methods that improve local convergence by endowing $M_k$ with (approximate) second-order derivative information.

The (classical) Newton method

Setting $M_k = \nabla^2\hat{J}(u_k)$, the Riesz representation of the second derivative $\hat{J}''$ evaluated at $u_k$, we obtain from (3.7) the (classical) Newton equation

\[
  \nabla^2\hat{J}(u_k)\, d_k = -\nabla \hat{J}(u_k). \tag{3.8}
\]
Provided $u$ is sufficiently close to a (local) solution $\bar{u}$ of our minimization problem, Algorithm 2 then turns into the so-called (classical) Newton method, where no step size strategy is needed (i.e., $\varsigma \equiv 1$) and the full Newton step $d_k$ is always accepted.
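As a minimal sketch, the resulting iteration could be implemented as follows in a finite-dimensional (discretized) setting; grad and hess are assumed to return the gradient vector and the Hessian matrix of $\hat{J}$, and no step size control is performed.

\begin{verbatim}
import numpy as np

def newton_method(grad, hess, u0, tol=1e-10, max_iter=50):
    """Classical Newton method: solve (3.8) and accept the full step."""
    u = np.asarray(u0, dtype=float)
    for _ in range(max_iter):
        g = grad(u)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(u), -g)   # Newton equation (3.8)
        u = u + d                          # full Newton step, i.e. step size 1
    return u
\end{verbatim}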

Proposition 3.1.2. Let $\bar{u} \in U$ be a local optimal solution to problem (3.1) and let $\hat{J}$ be twice continuously differentiable. Let $\nabla^2\hat{J}$ be Lipschitz continuous in a neighborhood of $\bar{u}$ and

\[
  \langle \nabla^2\hat{J}(\bar{u})\, v, v\rangle_U \ge C\,\|v\|_U^2 \qquad \text{for all } v \in U,
\]

for some constant $C > 0$. Then there exists a constant $\rho > 0$ such that for every initial guess $u \in U$ with $\|u - \bar{u}\|_U < \rho$ the following holds:

(i) The Newton iterates
\[
  u_{k+1} = u_k - \nabla^2\hat{J}(u_k)^{-1}\, \nabla\hat{J}(u_k)
\]
converge to $\bar{u}$.

(ii) There exists a constant $\bar{C} > 0$ such that

\[
  \|u_{k+1} - \bar{u}\|_U \le \bar{C}\, \|u_k - \bar{u}\|_U^2.
\]

For a proof we refer to De los Reyes [DlR15, Theorem 4.3]. In particular, the last result implies that the (classical) Newton method converges locally with quadratic rate.
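The quadratic rate can also be observed numerically. The following small experiment, an illustrative example of our own choosing, applies the Newton iteration to the strictly convex function $\hat{J}(u) = e^u - u$ with minimizer $\bar{u} = 0$ and $\hat{J}''(0) = 1 > 0$; the printed errors are roughly squared in each step.

\begin{verbatim}
import numpy as np

grad = lambda u: np.exp(u) - 1.0   # J(u) = exp(u) - u, minimizer u_bar = 0
hess = lambda u: np.exp(u)         # J''(u) = exp(u) > 0

u = 1.0                            # initial guess inside the convergence radius
for k in range(6):
    print(f"k = {k}:  |u_k - u_bar| = {abs(u):.3e}")
    u = u - grad(u) / hess(u)      # scalar Newton step (3.8)
\end{verbatim}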

Remark 3.1.3. An inexact solution of the (generalized) Newton equation (3.7) can be interpreted as the exact solution of the same system with $M_k$ replaced by a perturbed operator $\widetilde{M}_k$. As long as the perturbation is sufficiently small, convergence is not affected if the system is solved inexactly with a suitably controlled accuracy of the solution (cp. Hinze et al. [HPUU09]).

In the case of the (classical) Newton equation (3.8), so-called Krylov methods such as the minimal residual method (MINRES) or the conjugate gradient method (CG) are generally used for this purpose. We then speak of inexact Newton methods.
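A single inexact Newton step could be sketched as follows, assuming a discretized setting in which only Hessian-vector products hess_vec(v) are available and the Hessian is positive definite at the current iterate; the helper name and the truncation by a maximal number of CG iterations are illustrative choices.

\begin{verbatim}
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

def inexact_newton_direction(hess_vec, g, maxiter=50):
    """Approximately solve hess(u_k) d = -grad(u_k) by (truncated) CG.
    Stopping CG early corresponds to the perturbed system of Remark 3.1.3."""
    n = g.shape[0]
    H = LinearOperator((n, n), matvec=hess_vec)
    d, info = cg(H, -g, maxiter=maxiter)   # info == 0 signals convergence
    return d
\end{verbatim}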

To ensure global convergence to a (local) minimizer $\bar{u}$, an essential prerequisite is the positivity of $\nabla^2\hat{J}(u_k)$ in a neighborhood of $\bar{u}$, since otherwise a direction of ascent could be computed and the iteration could converge to a (local) maximizer. To avoid this unintended case, we introduce a check for negative curvature, given by

\[
  \langle d_k, \nabla\hat{J}(u_k)\rangle_U < 0,
\]

which can be considered as a (generalized) angle test. If no negative curvature occurs, i.e., if this condition holds, the Newton direction computed from (3.8) is a direction of descent and therefore feasible.

For the design of an efficient globally convergent method we proceed as follows: In each iteration we generate a trial direction by a locally fast convergent method, e.g., the Newton method. If the generated direction satisfies a (generalized) angle test, the direction is feasible and will be accepted.

Otherwise we reject the direction and choose another (feasible) search direction, e.g., the direction of steepest descent. In the latter case, of course, we have to ensure a feasible step length by employing a sufficient decrease condition such as the Armijo rule (3.4).
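A possible realization of this globalization strategy is sketched below; the finite-dimensional setting, the tolerances and the backtracking parameters are illustrative choices.

\begin{verbatim}
import numpy as np

def globalized_newton(J, grad, hess, u0, alpha=1e-4, beta=0.5,
                      tol=1e-8, max_iter=200):
    """Trial Newton direction with (generalized) angle test and fallback to
    steepest descent; step sizes by Armijo-type backtracking (cf. (3.4))."""
    u = np.asarray(u0, dtype=float)
    for _ in range(max_iter):
        g = grad(u)
        if np.linalg.norm(g) < tol:
            break
        try:
            d = np.linalg.solve(hess(u), -g)   # trial direction from (3.8)
        except np.linalg.LinAlgError:
            d = -g                             # Hessian singular: fall back
        if np.dot(g, d) >= 0:                  # angle test failed: no descent
            d = -g                             # steepest descent direction (3.6)
        s = 1.0
        while J(u + s * d) > J(u) + alpha * s * np.dot(g, d):
            s *= beta                          # ensure sufficient decrease
        u = u + s * d
    return u
\end{verbatim}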

Quasi-Newton methods

Another approach based on solving (3.7) is given by the so-called Quasi-Newton methods. They provide fast local convergence by considering operators $M_k$ that successively approximate second-order derivative information. At the same time, they preserve positivity and therefore lead to a globally convergent method. A common representative of this class is the BFGS method, named after its inventors C. G. Broyden, R. Fletcher, D. Goldfarb and D. F. Shanno. Therein the second derivative is approximated by an operator $B_{k+1} \in \mathcal{L}(U, U)$ that fulfills the secant equation

\[
  B_{k+1}(u_{k+1} - u_k) = \nabla\hat{J}(u_{k+1}) - \nabla\hat{J}(u_k).
\]

After a direction $d_k$ has been computed in iteration $k$ by solving (3.7) with $M_k = B_k$, the (rank-2) update formula is given by

\[
  B_{k+1} = B_k - \frac{B_k s_k \otimes B_k s_k}{\langle B_k s_k, s_k\rangle_U} + \frac{z_k \otimes z_k}{\langle z_k, s_k\rangle_U}\,,
\]

where $s_k := u_{k+1} - u_k$ and $z_k := \nabla\hat{J}(u_{k+1}) - \nabla\hat{J}(u_k)$, and the operator $w \otimes z \in \mathcal{L}(U, U)$ is defined for $w, z \in U$ by

\[
  (w \otimes z)(v) = \langle z, v\rangle_U\, w\,.
\]

Similar to the steepest descent method, computing the BFGS update requires only the evaluation of first-order derivatives.
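In a discretized setting, where $B_k$ is represented by a matrix and the outer product $w \otimes z$ by the rank-one matrix $w z^{\top}$, the update could be sketched as follows; the skipping rule anticipates the curvature condition (3.9) discussed below.

\begin{verbatim}
import numpy as np

def bfgs_update(B, s, z):
    """Rank-2 BFGS update  B+ = B - (Bs)(Bs)^T/<Bs,s> + zz^T/<z,s>
    with s = u_{k+1} - u_k and z = grad J(u_{k+1}) - grad J(u_k)."""
    if np.dot(z, s) <= 0:            # curvature condition (3.9) violated:
        return B                     # skip the update to preserve positivity
    Bs = B @ s
    return (B - np.outer(Bs, Bs) / np.dot(Bs, s)
              + np.outer(z, z) / np.dot(z, s))

# In iteration k the direction is obtained from (3.7) with M_k = B_k, e.g.
#   d = np.linalg.solve(B, -grad(u))
\end{verbatim}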

Proposition 3.1.4. Let $\bar{u} \in U$ be a local optimal solution to problem (3.1) and let $\hat{J}$ be twice continuously differentiable. Let $\nabla^2\hat{J}$ be Lipschitz continuous in a neighborhood of $\bar{u}$, with bounded inverse. Let $B \in \mathcal{L}(U, U)$ be an initial positive operator. If $B - \nabla^2\hat{J}(\bar{u})$ is compact, and $\|u - \bar{u}\|_U < \rho$ and $\|B - \nabla^2\hat{J}(\bar{u})\|_{\mathcal{L}(U,U)} < \varepsilon$ for sufficiently small $\rho, \varepsilon > 0$, then the BFGS iterates

\[
  u_{k+1} = u_k - B_k^{-1}\, \nabla\hat{J}(u_k)
\]
converge q-superlinearly to $\bar{u}$.

For a proof we refer to De los Reyes [DlR15].

Analogously to the (classical) Newton method, we obtain global convergence by requiring in each iteration $k$ a curvature condition, here given by

\[
  \langle z_k, s_k\rangle_U > 0, \tag{3.9}
\]

in combination with a sufficient decrease condition providing feasible step lengths. The compactness condition in Proposition 3.1.4 is necessary for the superlinear convergence rate (cp. Kelley and Sachs [KS91]), while the bounded inverse condition can be replaced by a convexity condition on the second derivative, see De los Reyes [DlR15].

Remark 3.1.5. To assure the presupposed positivity of the operators $B_k$, the updates are performed only as long as the curvature condition (3.9) is fulfilled. Otherwise the update is skipped or the approximation is reinitialized accordingly. In particular, the reinitialization may cause the loss of too much information and slow down convergence. To overcome this difficulty, let us mention here the damped BFGS updating as presented in Nocedal and Wright [NW06, Procedure 18.2] for a finite-dimensional setting in $\mathbb{R}^n$. It consistently assures the positive definiteness of the subsequently computed approximations $B_{k+1} \in \mathbb{R}^{n \times n}$. But even though the damped BFGS updating often works well, it may behave poorly on difficult problems. Furthermore, it fails to address the underlying problem that the Hessian being approximated may actually not be positive definite at the current iterate. Let us also point to the existence of a modification of the BFGS formula that allows a direct computation of an inverse Quasi-Newton approximation $\widetilde{B}_k :\approx (\nabla^2\hat{J}(u_k))^{-1}$ of the Hessian (cf., e.g., Nocedal and Wright [NW06]). In this case, a direction $d_k$ can be computed directly by applying the inverse approximation to the negative gradient

\[
  d_k = -\widetilde{B}_k\, \nabla\hat{J}(u_k).
\]
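In a discretized setting this variant could be sketched as follows, using the standard inverse BFGS update formula (cf. Nocedal and Wright [NW06]) with $\rho_k = 1/\langle z_k, s_k\rangle_U$; as above, the update is only applied if the curvature condition (3.9) holds.

\begin{verbatim}
import numpy as np

def inverse_bfgs_update(H, s, z):
    """Inverse BFGS update of H ~ (hess J(u_k))^{-1}:
    H+ = (I - rho s z^T) H (I - rho z s^T) + rho s s^T,  rho = 1/<z, s>."""
    if np.dot(z, s) <= 0:            # curvature condition (3.9) violated
        return H
    rho = 1.0 / np.dot(z, s)
    V = np.eye(len(s)) - rho * np.outer(s, z)
    return V @ H @ V.T + rho * np.outer(s, s)

# The direction is then obtained without solving a linear system:
#   d = -H @ grad(u)
\end{verbatim}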