A Global Convergence Result for a Newton-like Iteration

There is no global convergence result if one solely employs the natural level function concept to determine step sizes in a damped Newton iteration. The same is true for the concepts of the projected natural level function and the approximate projected natural level function, which are introduced and discussed in this work. The reason is that the descent of every step is measured in a different metric, so that cycles in the iterates cannot be excluded.

However, if one employs the general level function (2.10), i.e.,

$$T(x|A) := \tfrac{1}{2}\|AF(x)\|_2^2, \qquad A \in \mathbb{R}^{n\times n}, \tag{5.1}$$

for a fixed nonsingular choice of $A$, global convergence results for a damped Newton iteration are available, cf. Theorem 3.13 in [11].

In this section we provide a global convergence result of the type presented in the aforementioned Theorem 3.13, however for an approximate damped Newton iteration

$$x_{l+1} = x_l + \lambda_l\,\delta x_l, \qquad \delta x_l = -B_l^{-1}F_l, \qquad F_l := F(x_l),$$

where the matrices $B_l$ are to be considered as approximations to the respective Jacobians $J_l := F'(x_l)$. The basic techniques used in the proof of our global convergence statement are adopted from the proof of the above-mentioned Theorem 3.13. However, since we consider an approximate damped Newton iteration, some extra care is necessary. For the proof we will define a polynomial model $q_l(\lambda)$ as an estimate for the relative change of $T(x|A)$ at $x_l$ in the direction of $\delta x_l$. By means of this model the step sizes $\lambda_l$ will be determined. For the development of our polynomial model it is crucial that the approximate correction $\delta x_l$ fulfills

$$\frac{d}{d\lambda}T(x_l+\lambda\delta x_l|A)\Big|_{\lambda=0} = \operatorname{grad}T(x_l|A)\,\delta x_l = (AF_l)^T A J_l\,\delta x_l = -\|AF_l\|_2^2. \tag{5.2}$$

This can be achieved by simply scaling a given direction $\delta x_l$, provided $\delta x_l$ and $\operatorname{grad}T(x_l|A)^T$ are not perpendicular, cf. the definition of the descent approximation from (4.6) in the context of the APNLF. However, as we will see, we can also use a generalization of the descent update from Section 4.3 to obtain a correction $\delta x_l$ which fulfills (5.2). We will combine the generalized descent update with a generalization of the purifying techniques from Section 4.2 to provide a sufficiently good approximation quality of the matrices $B_l$.
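The scaling of a given direction can be sketched in a few lines. The following Python fragment is a minimal illustration and not part of the original derivation; the function names `matvec`, `dot` and `scale_to_descent` as well as the sample data are assumptions chosen for this example. Given a trial direction $d$ that is not perpendicular to $\operatorname{grad}T(x|A)^T$, it returns $\delta x = s\,d$ with $s = -\|AF\|_2^2 / \bigl((AF)^T A J d\bigr)$, so that (5.2) holds.

```python
# Minimal sketch (hypothetical names): rescale a trial direction d so that the
# directional derivative of T(x|A) = 0.5*||A F(x)||^2 equals -||A F(x)||^2, cf. (5.2).

def matvec(M, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def scale_to_descent(A, J, F, d):
    AF = matvec(A, F)
    slope = dot(AF, matvec(A, matvec(J, d)))  # grad T(x|A) d = (AF)^T A J d
    if slope == 0.0:
        raise ValueError("d is perpendicular to grad T(x|A)^T; scaling is impossible")
    s = -dot(AF, AF) / slope
    return [s * di for di in d]

# Assumed sample data (not from the text).
A = [[1.0, 0.0], [0.0, 2.0]]
J = [[2.0, 1.0], [1.0, 3.0]]
F = [1.0, -2.0]
d = [-1.0, 2.0]                               # trial direction, here -F

dx = scale_to_descent(A, J, F, d)
slope_dx = dot(matvec(A, F), matvec(A, matvec(J, dx)))
norm_AF_sq = dot(matvec(A, F), matvec(A, F))
```

Note that the factor $s$ may be negative, in which case the trial direction is reversed.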

As a first preparation for our global convergence statement we will develop a polynomial model $p(\lambda)$ which serves as a basis for $q_l(\lambda)$. We introduce this second model $p(\lambda)$ also because it may be exploited to provide the basics for a broader range of step size control algorithms. For example, we will see that it may be used for a step size control in the context of an approximate natural level function (without projection), see Remark 5.3.

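The model $p(\lambda)$ depends on a rather general nonlinearity bound: For nonsingular $W, U \in \mathbb{R}^{n\times n}$ let

$$2\cdot\bigl\|W\bigl(F(y)-F(x)-F'(x)(y-x)\bigr)\bigr\|_2 \le \omega\,\|U(y-x)\|_2^2 \qquad \forall x, y \in \mathcal{D}. \tag{5.3}$$

As a further generalization $W$ and $U$ may depend on $x$. If this is the case we assume that $W$ and $U$ are continuous in $x$ and that $W(x)$, $U(x)$ are nonsingular for each $x \in \mathcal{D}$.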

This bound is innately neither affine covariant nor affine contravariant. Which concept is favored depends on the choice of $W$ and $U$. For example, $W = W(x) = F'(x)^{-1}$ and $U = I$ leads to an affine covariant bound, whereas $W = I$ and $U = U(x) = F'(x)$ leads to an affine contravariant one. This is on purpose to ensure compatibility of the polynomial $p(\lambda)$ with both concepts. Furthermore, one may consider the above bound only for an appropriate subset $\tilde{\mathcal{D}}$ of $\mathcal{D}$, which leads to a locally defined bound. This way an adaptive choice of $W$ and $U$ is possible, see again Remark 5.3.

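The polynomial model $p(\lambda)$ reads as follows.

Theorem 5.1 (Polynomial model) Suppose that $F$ fulfills Assumption 2.1 and let $x \in \mathcal{D}$ such that $F(x) \neq 0$. Abbreviate $J := F'(x)$. Assume that the nonlinearity bound (5.3) holds. For given nonsingular $A \in \mathbb{R}^{n\times n}$ consider the general level function (5.1). Let $B \in \mathbb{R}^{n\times n}$ be nonsingular such that $\delta x := -B^{-1}F(x)$ fulfills

$$\frac{d}{d\lambda}T(x+\lambda\delta x|A)\Big|_{\lambda=0} = -\|AF(x)\|_2^2. \tag{5.4}$$

Define

$$\eta := \frac{\|A(B-J)\delta x\|_2}{\|AF(x)\|_2}, \qquad \tilde h := \omega\cdot\|AW^{-1}\|_2\cdot\frac{\|U\delta x\|_2^2}{\|AF(x)\|_2}$$

and

$$\Lambda := \{\lambda \in (0,1] \mid x+\lambda\delta x \in \mathcal{D}\}. \tag{5.5}$$

Then for $\lambda \in \Lambda$ one has

$$T(x+\lambda\delta x|A) \le \Bigl[\bigl(1-\lambda+\tfrac{1}{2}\tilde h\lambda^2\bigr)^2 + \eta^2\lambda^2 + \tilde h\eta\lambda^3\Bigr]\,T(x|A) =: p(\lambda)\,T(x|A). \tag{5.6}$$

For the polynomial $p$ it holds that

I) $\frac{d}{d\lambda}p(\lambda)\big|_{\lambda=0} = -2$,

II) $p$ is strictly convex on $[0,1]$ and has a unique minimizer $\tilde\lambda$ in $[0,1]$ with $0 < \tilde\lambda \le \min\bigl(1, 1/\tilde h\bigr)$, where $1/\tilde h := \infty$ if $\omega = 0$.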

Proof. Consider $\lambda \in \Lambda$. For ease of writing we use

$\varphi(\lambda) := A($ [...]

All three statements are exploited in the following without any explicit reference.

$\|AF(x+\lambda\delta x)\|_2^2 = ($ [...]

Combining these results and rearranging the terms, the sum $a + b_1^T b_2$ reads as follows:

$a + b_1^T b_2 = (1-\lambda)($ [...]

Recall that [...]

So with an additional adding and subtracting of $\lambda\,\varphi(\lambda)^T$ [...]

By the nonlinearity bound (5.3) and the definition of $\tilde h$ it holds that

$$\|\varphi(\lambda)\|_2 \le \tfrac{1}{2}\tilde h\lambda^2\,\|AF(x)\|_2.$$

Thus, by means of this estimate, the relation (5.7), and the Cauchy-Schwarz inequality we obtain [...] According to the definition of $\eta$ and $p(\lambda)$ and by means of some minor rearranging we can write the above relation as [...]

[...] we derive that $p$ is strictly convex on $[0,1]$ and that $\frac{d}{d\lambda}p$ is strictly increasing on $[0,1]$. Together with [...]

The quantity $\eta$ expresses the influence of the deviation from the Newton correction because

$$\eta = 0 \iff \delta x = -F'(x)^{-1}F(x).$$

Evaluating $\eta$ requires the direct tangent evaluation $F'(x)\delta x$, which can easily be done via the forward mode of Automatic Differentiation. Note that for $\eta = 0$, $W = W(x) = F'(x)^{-1}$ and $U = I$ the above polynomial simplifies to the square of the polynomial from Theorem 2.10 if, additionally, the Lipschitz condition (2.23) is considered instead of the above nonlinearity bound (5.3) for these specific choices of $W$ and $U$, i.e. the nonlinearity bound (3.3), and if $A$ is chosen as $A = F'(x)^{-1}$. An analogous simplification holds in the context of affine contravariance for the choices $W = I$, $U(x) = F'(x)$ and $A = I$. To avoid introducing new notation just for the purposes of comparison we omit details here; the interested reader is referred to Theorem 3.7 in [11]. In summary, the polynomial model $p$ generalizes existing polynomial models from the literature in the context of both affine invariance concepts.

Remark 5.2 The minimizer $\tilde\lambda$ of $p$ in $[0,1]$ can be stated explicitly. We omit the formula here since it is lengthy and does not provide any readily identifiable further insight. $\square$

Remark 5.3 If we only consider a subset $\tilde{\mathcal{D}}$ of $\mathcal{D}$ in (5.3), e.g., for given $x \in \mathcal{D}$ let $y \in \mathcal{D}$ with $y - x = \lambda\delta x$, $\lambda \in [0,1]$, we obtain a local nonlinearity bound. If these bounds are employed iteratively one may also choose $W$ and $U$ adaptively. Consider for example the context of an approximate natural level function where $\tilde B_l$ is some nonsingular approximation of $F'(x_l)$ and $B_l$ a second nonsingular approximation of $F'(x_l)$ which additionally fulfills (5.2) for the choice $A = \tilde B_l$. Choosing $W_l = \tilde B_l$ at $x_l$ and $U_l \equiv I$ makes meaningful theoretical quantities available which can also be estimated in a reasonable way, cf. the bound (4.152) and its estimate (4.157). $\square$
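Although the closed form of $\tilde\lambda$ is omitted in Remark 5.2, the minimizer is cheap to compute numerically: by Theorem 5.1 the polynomial $p$ is strictly convex on $[0,1]$ with $p'(0) = -2$, so its derivative is strictly increasing and a bisection on $p'$ suffices. The following Python sketch illustrates this under these assumptions; the function names `p`, `dp`, `minimizer` and the tolerance are choices made for this example, not notation from the text.

```python
# Minimal sketch (hypothetical names): minimizer of the polynomial model
# p(lam) = (1 - lam + 0.5*h*lam^2)^2 + eta^2*lam^2 + h*eta*lam^3 on [0, 1].

def p(lam, h, eta):
    return (1.0 - lam + 0.5 * h * lam**2) ** 2 + eta**2 * lam**2 + h * eta * lam**3

def dp(lam, h, eta):
    """Derivative of p with respect to lam."""
    return (2.0 * (1.0 - lam + 0.5 * h * lam**2) * (h * lam - 1.0)
            + 2.0 * eta**2 * lam + 3.0 * h * eta * lam**2)

def minimizer(h, eta, tol=1e-12):
    # dp is strictly increasing on [0, 1] and dp(0) = -2 < 0.
    if dp(1.0, h, eta) <= 0.0:
        return 1.0                      # p is non-increasing on [0, 1]
    lo, hi = 0.0, 1.0
    while hi - lo > tol:                # bisection on the sign of dp
        mid = 0.5 * (lo + hi)
        if dp(mid, h, eta) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam_newton = minimizer(2.0, 0.0)        # eta = 0: exact Newton correction
lam_linear = minimizer(0.0, 1.0)        # h = 0: vanishing nonlinearity bound
```

For $\eta = 0$ and $\tilde h \ge 1$ the routine reproduces $\tilde\lambda = 1/\tilde h$, and for $\tilde h = 0$ it reproduces $\tilde\lambda = 1/(1+\eta^2)$; both values follow directly from setting the derivative of $p$ to zero.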

For the upcoming global convergence result we must ensure that the deviation of $\delta x_l$ from the Newton correction at $x_l$ is uniformly bounded. To meet this condition, we will employ an adaptation of the purifying techniques from Section 4.2. Recall that we also have to meet the requirement (5.2). Therefore, we will use an adaptation of the descent update from Section 4.3. Let $x \in \mathcal{D}$ such that $F(x) \neq 0$ and $F'(x)$ is nonsingular. With the abbreviations $F := F(x)$ and $J := F'(x)$ and with $\delta x_k$ determined in analogy to $\delta x_k$ in (4.52), the updates we are going to use look as follows:

• descent:

$$B_{k+1} = B_k - \frac{F\,(AF)^T A\,(B_k - J)}{\|AF\|_2^2} \tag{5.8a}$$

• Newton-philic:

$$B_{k+1} = B_k - \frac{(B_k - J)\delta x_k\,\bigl(A(B_k - J)\delta x_k\bigr)^T A\,(B_k - J)}{\|A(B_k - J)\delta x_k\|_2^2} \tag{5.8b}$$

Remark 5.4 The above updates are closely related to Schlenkrich's residual update

$$S_{k+1} = S_k - \frac{F F^T (S_k - J)}{\|F\|_2^2}$$

and transposed tangent Broyden update

$$S_{k+1} = S_k - \frac{(S_k - J)s_k\,\bigl((S_k - J)s_k\bigr)^T (S_k - J)}{\|(S_k - J)s_k\|_2^2}, \qquad s_k \neq 0 \ \text{ s.t. } \ S_k s_k = \begin{cases} -F & \text{if } S_k \text{ is nonsingular,} \\ 0 & \text{if } S_k \text{ is singular,} \end{cases}$$

respectively: If Schlenkrich's updates are applied to $G(x) := AF(x)$ and generate approximations $S_{l,k}$, then the relation $AB_{l,k} = S_{l,k}$ holds if $AB_{l,0} = S_{l,0}$. Hence, regarding global convergence one may also refer to the results in [28]. However, the step size control exploited in [28] is not based on a polynomial such as $p(\lambda)$ from Theorem 5.1. Thus, we provide an alternative approach.

If $B_{k+1}$ is constructed via the descent update then

$$(AF)^T A B_{k+1} = (AF)^T A J \tag{5.9}$$

and hence (5.4) is true for nonsingular $B_{k+1}$, since then $(AF)^T A J\,\delta x = (AF)^T A B_{k+1}\,\delta x = -\|AF\|_2^2$ for $\delta x = -B_{k+1}^{-1}F$. Due to heredity, all subsequent approximations constructed via the Newton-philic update also fulfill (5.9). If such a matrix is nonsingular then (5.4) is true too. Note that $\operatorname{grad}T(x|A)$ is given via $(AF)^T A J$ and hence does not depend on any of the approximations $B_k$. Therefore, we do not consider adapted versions of the duophilic and the gradientphilic update, (4.49) and (4.51), respectively. The basic purifying process at an iterate $x_l$ is as follows:

Algorithm 5.1 (Purifying process w.r.t. $T(x|A)$ at $x_l \in \mathcal{D}$ with $F'(x_l)$ nonsingular)

 1: given: $B_{l,0} \in \mathbb{R}^{n\times n}$, $F_l := F(x_l) \neq 0$, $J_l := F'(x_l)$ nonsingular
 2: set $k = 0$
 3: while ($B_{l,k}$ singular) || ($B_{l,k}$ nonsingular && $B_{l,k}^{-1}F_l \neq J_l^{-1}F_l$) do
 4:   if $k = 0$ then
 5:     construct $B_{l,k+1}$ via the descent update (5.8a)
 6:   else
 7:     determine $\delta x_{l,k} \neq 0$ according to $B_{l,k}\,\delta x_{l,k} = -F_l$ if $B_{l,k}$ is nonsingular, and $B_{l,k}\,\delta x_{l,k} = 0$ if $B_{l,k}$ is singular
 8:     construct $B_{l,k+1}$ via the Newton-philic update (5.8b)
 9:   end if
10:   set $k = k + 1$
11: end while
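For illustration, Algorithm 5.1 can be traced on a small example. The following plain-Python sketch is restricted to $n = 2$ so that determinants, Cramer's rule and null vectors can be written out explicitly. All function names, the tolerances, the iteration cap `max_steps` and the sample data are assumptions made for this illustration and do not stem from the text; in particular, testing singularity via a determinant tolerance is a simplification.

```python
# Sketch of Algorithm 5.1 for n = 2 (plain Python, illustrative names only).

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def vecmat(v, M):
    # Row vector v^T M.
    return [v[0] * M[0][j] + v[1] * M[1][j] for j in range(2)]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sub(M, N):
    return [[M[i][j] - N[i][j] for j in range(2)] for i in range(2)]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def det(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def solve(M, b):
    # Cramer's rule; M is assumed nonsingular.
    d = det(M)
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / d,
            (M[0][0] * b[1] - M[1][0] * b[0]) / d]

def null_vector(M):
    # Nonzero v with M v = 0 for an (exactly) singular 2x2 matrix M.
    for a, b in M:
        if a != 0.0 or b != 0.0:
            return [b, -a]
    return [1.0, 0.0]

def rank_one_update(B, u, r, denom):
    # Returns B - u r^T / denom.
    return [[B[i][j] - u[i] * r[j] / denom for j in range(2)] for i in range(2)]

def purify(B, J, A, F, sing_tol=1e-12, tol=1e-9, max_steps=10):
    AF = matvec(A, F)
    minus_F = [-F[0], -F[1]]
    newton = solve(J, minus_F)                     # Newton correction -J^{-1} F
    k = 0
    while k < max_steps:
        if abs(det(B)) > sing_tol:                 # B treated as nonsingular
            dx = solve(B, minus_F)
            if max(abs(dx[0] - newton[0]), abs(dx[1] - newton[1])) <= tol:
                break                              # B yields the Newton correction
        else:
            dx = null_vector(B)                    # B dx = 0, dx != 0
        E = sub(B, J)
        AE = matmul(A, E)
        if k == 0:
            B = rank_one_update(B, F, vecmat(AF, AE), dot(AF, AF))   # (5.8a)
        else:
            v = matvec(E, dx)
            w = matvec(A, v)
            B = rank_one_update(B, v, vecmat(w, AE), dot(w, w))      # (5.8b)
        k += 1
    return B, k

# Assumed sample data (not from the text).
A = [[1.0, 0.0], [0.0, 2.0]]
J = [[2.0, 1.0], [1.0, 3.0]]
F = [1.0, -2.0]
B0 = [[1.0, 0.0], [0.0, 1.0]]

B_final, steps = purify(B0, J, A, F)
dx_final = solve(B_final, [-F[0], -F[1]])
lhs = vecmat(matvec(A, F), matmul(A, B_final))     # (AF)^T A B
rhs = vecmat(matvec(A, F), matmul(A, J))           # (AF)^T A J, cf. (5.9)
```

For this sample data the loop performs the descent update once and the Newton-philic update once; afterwards $B$ reproduces the Newton correction and satisfies (5.9) up to rounding.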

The adapted Newton-philic update (5.8b) is a specific instance of the basic purifying update, cf. (4.39). Hence, by Proposition 4.16 the above algorithm terminates after a finite number of steps providing the Newton correction. Also, every constructed approximation $B_{l,k}$ fulfills an indexed version of (5.9), i.e.,

$$(AF_l)^T A B_{l,k} = (AF_l)^T A J_l. \tag{5.10}$$

So there will be an index $\bar k_l$ such that $B_{l,\bar k_l}$ is nonsingular and of sufficient approximation quality. This vague statement will be made precise in the proof of Theorem 5.5. We use $B_{l,\bar k_l}$ to compute the actual correction $\delta x_l$, i.e.,

$$B_l := B_{l,\bar k_l}, \qquad \delta x_l = -B_l^{-1}F_l. \tag{5.11}$$

Since the quantities $W$ and $U$ from (5.3) may depend on $x$ and hence on an iterate $x_l$, we use the notation $W_{(l)}$ and $U_{(l)}$ to indicate this possible dependency. Also, we introduce indices in the definition of $\eta$, i.e.,

$$\eta_l := \frac{\|A(B_l - J_l)\delta x_l\|_2}{\|AF_l\|_2}. \tag{5.12}$$

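Verify that

$$\|U_{(l)}(B_l^{-1} - J_l^{-1})F_l\|_2 + \|U_{(l)}J_l^{-1}F_l\|_2 \ge \|U_{(l)}\delta x_l\|_2.$$

Also, it holds that

$$\|U_{(l)}J_l^{-1}F_l\|_2 \le \|U_{(l)}J_l^{-1}A^{-1}\|_2\,\|AF_l\|_2.$$

Hence,

$$\hat h_l := \omega\cdot\|AW_{(l)}^{-1}\|_2\cdot\Bigl[\frac{\|U_{(l)}(B_l^{-1} - J_l^{-1})F_l\|_2}{\|AF_l\|_2} + \|U_{(l)}J_l^{-1}A^{-1}\|_2\Bigr]\cdot\Bigl[\|U_{(l)}(B_l^{-1} - J_l^{-1})F_l\|_2 + \|U_{(l)}J_l^{-1}F_l\|_2\Bigr] \ge \omega\cdot\|AW_{(l)}^{-1}\|_2\cdot\frac{\|U_{(l)}\delta x_l\|_2^2}{\|AF_l\|_2} =: \tilde h_l. \tag{5.13}$$

So, introducing indices in the definition of $p(\lambda)$, we substitute $\hat h_l$ for $\tilde h_l$ to obtain the polynomial model $q_l(\lambda)$ on which the following result is based.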

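Theorem 5.5 (Global convergence) Suppose $F$ fulfills Assumption 2.1 and additionally let the Jacobian $F'(x)$ be nonsingular for all $x \in \mathcal{D}$. Assume that the nonlinearity bound (5.3) holds. Consider the general level function (5.1) for nonsingular $A \in \mathbb{R}^{n\times n}$ and let $\mathcal{G}(x|A)$ be the level set of the general level function at $x$, i.e., $\mathcal{G}(x|A) := \{z \in \mathcal{D} \mid T(z|A) \le T(x|A)\}$. For some $x_0 \in \mathcal{D}$ let $\mathcal{D}_0$ denote the path-connected component of $\mathcal{G}(x_0|A)$ which contains $x_0$. Assume that $\mathcal{D}_0$ is compact. Let a nonsingular $B_{0,0} \in \mathbb{R}^{n\times n}$ and nonnegative constants $\zeta_1$, $\zeta_2$ and $\zeta_3$ be given. Consider the iteration

$$x_{l+1} = x_l + \lambda_l\,\delta x_l, \qquad \delta x_l = -B_l^{-1}F_l, \qquad F_l := F(x_l),$$

with $\lambda_l$ being the unique minimizer in $[0,1]$ of

$$q_l(\lambda) := \bigl(1-\lambda+\tfrac{1}{2}\hat h_l\lambda^2\bigr)^2 + \eta_l^2\lambda^2 + \hat h_l\eta_l\lambda^3.$$

The matrix $B_l$ is given as in (5.11). The quantities $\eta_l$ and $\hat h_l$ are defined as in (5.12) and (5.13).

Then for each $l$ with $F_l \neq 0$ there exists an index $\bar k_l \le n$ such that

I) $B_l$ is nonsingular, hence $\delta x_l$ is well defined,

II) it holds that

$$\|(B_l^{-1} - J_l^{-1})F_l\|_2 \le \zeta_1, \qquad \frac{\|(B_l^{-1} - J_l^{-1})F_l\|_2}{\|AF_l\|_2} \le \zeta_2, \qquad \eta_l \le \zeta_3,$$

where $J_l := F'(x_l)$,

III) $\lambda_l$ is well defined and $\lambda_l \ge \varepsilon > 0$ with $\varepsilon$ independent of $l$,

IV) there is a constant $0 \le C_\varepsilon < 1$ independent of $l$ such that

$$T(x_l+\lambda_l\delta x_l|A) \le q_l(\lambda_l)\,T(x_l|A) \le C_\varepsilon\,T(x_l|A) < T(x_l|A).$$

This implies that the sequence $x_l$ converges to some $x^*$ with $F(x^*) = 0$.

The constant $\varepsilon$ depends on $\mathcal{D}_0$ and $\zeta_1$, $\zeta_2$ and $\zeta_3$.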

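Proof. We define for $x_l \in \mathcal{D}$ the set $\mathcal{D}_l$ in accordance with $\mathcal{D}_0$.

The proof is by induction. Assume that $F_0 \neq 0$. Since $J_0$ is nonsingular and by means of Proposition 4.16 and Algorithm 5.1 there is an index $0 \le \bar k_0 \le n$ such that $B_{0,\bar k_0} =: B_0$ is nonsingular and (5.10) holds. Hence, $\delta x_0 = -B_0^{-1}F_0$ fulfills (5.2) for $l = 0$. Since the nonlinearity bound (5.3) is also assumed to hold, we can exploit the results of Theorem 5.1. Introducing indices in (5.5) and (5.6) we obtain

$$T(x_0+\lambda\delta x_0|A) \le p_0(\lambda)\,T(x_0|A) \qquad \forall\lambda \in \Lambda_0.$$

Recall from (5.13) that $\hat h_0 \ge \tilde h_0$. Hence, substituting $\hat h_0$ for $\tilde h_0$ in $p_0(\lambda)$ yields

$$p_0(\lambda) \le q_0(\lambda) \qquad \forall\lambda \in \Lambda_0$$

and therefore

$$T(x_0+\lambda\delta x_0|A) \le q_0(\lambda)\,T(x_0|A) \qquad \forall\lambda \in \Lambda_0.$$

Also, we may argue as in the proof of Theorem 5.1 that $\frac{d}{d\lambda}q_0(\lambda)\big|_{\lambda=0} = -2$ and that $q_0$ is strictly convex on $[0,1]$ and has a unique minimizer $\lambda_0 > 0$ in $[0,1]$. We show by contradiction that $x_0+\lambda\delta x_0 \in \mathcal{D}_0$ for all $0 \le \lambda \le \lambda_0$. For this, assume it is not the case. Since $\mathcal{D}$ is open and nonempty and due to the compactness of $\mathcal{D}_0$, the set $Y_0 := \{0 < \lambda \le \lambda_0 \mid x_0+\lambda\delta x_0 \in \mathcal{D}\setminus\mathcal{D}_0\}$ is nonempty. By assumption $\mathcal{D}$ is also convex. This means that for any $\lambda \in Y_0$ we have $x_0+s\delta x_0 \in \mathcal{D}$, $0 \le s \le \lambda$. And for any such $s > 0$ we obtain by the above stated properties of $q_0$ the estimate $q_0(s) < q_0(0) = 1$. Consequently,

$$T(x_0+s\delta x_0|A) \le q_0(s)\,T(x_0|A) < T(x_0|A), \qquad 0 < s \le \lambda, \quad \lambda \in Y_0.$$

But from this it follows that $x_0+\lambda\delta x_0 \in \mathcal{D}_0$ for all $\lambda \in Y_0$, which contradicts the definition of $Y_0$. Thus, we can conclude that $x_0+\lambda\delta x_0 \in \mathcal{D}_0$ for all $0 \le \lambda \le \lambda_0$, which means that

$$T(x_0+\lambda\delta x_0|A) < T(x_0|A) \qquad \text{for all } 0 < \lambda \le \lambda_0.$$

Hence, in particular

$$T(x_1|A) < T(x_0|A)$$

and therefore

$$\mathcal{D}_1 \subset \mathcal{D}_0.$$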

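We turn to the question whether there is an $\varepsilon > 0$ such that $\lambda_0 \ge \varepsilon$. Recall that $W$ and $U$ in the nonlinearity bound (5.3) may depend continuously on $x$. If this is not the case then $W(x) \equiv W \in \mathbb{R}^{n\times n}$ and $U(x) \equiv U \in \mathbb{R}^{n\times n}$, respectively. In either case, the compactness of $\mathcal{D}_0$ yields for some positive constants $C_1$, $C_2$ and $C_3$,

$$\max_{x\in\mathcal{D}_0}\|U(x)F'(x)^{-1}F(x)\|_2 \le C_1, \qquad \max_{x\in\mathcal{D}_0}\|AW(x)^{-1}\|_2 \le C_2, \qquad \max_{x\in\mathcal{D}_0}\|U(x)F'(x)^{-1}A^{-1}\|_2 \le C_3.$$

W.l.o.g. we can also assume that $\bar k_0$ is chosen such that for the given nonnegative constants $\zeta_1$, $\zeta_2$ and $\zeta_3$ the bounds

$$\|(B_0^{-1} - J_0^{-1})F_0\|_2 \le \zeta_1, \qquad \frac{\|(B_0^{-1} - J_0^{-1})F_0\|_2}{\|AF_0\|_2} \le \zeta_2, \qquad \eta_0 \le \zeta_3$$

hold. Hence, by the definition of $\hat h_l$ in (5.13),

$$\hat h_l \le \omega C_2(\zeta_2 + C_3)(\zeta_1 + C_1) =: C_4$$

and for

$$q(\lambda) := \bigl(1-\lambda+\tfrac{1}{2}C_4\lambda^2\bigr)^2 + \zeta_3^2\lambda^2 + C_4\zeta_3\lambda^3$$

we obtain

$$q_0(\lambda) \le q(\lambda) \qquad \forall\lambda \in \Lambda_0.$$

Since $q$ is of the same structure as $p_0$ and $q_0$, the properties stated in paragraphs I) and II) of Theorem 5.1 are true for $q$ as well. We denote the unique minimizer by $\varepsilon$ and define the nonnegative constant $C_\varepsilon$ via

$$C_\varepsilon := q(\varepsilon) < 1.$$

Exploiting the properties stated in paragraphs I) and II) of Theorem 5.1 for $q_0$ and $q$, some elementary considerations yield

$$\lambda_0 \ge \varepsilon > 0.$$

Also,

$$q_0(\lambda_0) \le C_\varepsilon \qquad \text{and hence} \qquad T(x_1|A) \le C_\varepsilon\,T(x_0|A).$$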
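The role of the uniform bound $q$ can be illustrated numerically. The Python sketch below uses hypothetical values for $C_4$, $\zeta_3$, $\hat h_0$ and $\eta_0$ (with $\hat h_0 \le C_4$ and $\eta_0 \le \zeta_3$) and a simple ternary search for the minimizer of a strictly convex function on $[0,1]$; the function names and the choice of search method are assumptions of this example. The check covers only the chain $q_0(\lambda_0) \le q_0(\varepsilon) \le q(\varepsilon) = C_\varepsilon < 1$, which follows from $\lambda_0$ minimizing $q_0$ on $[0,1]$ and from $q_0 \le q$.

```python
# Minimal sketch (hypothetical names and data): contraction factor C_eps
# obtained from the uniform bound q, compared with one step's model q_0.

def q(lam, h, eta):
    return (1.0 - lam + 0.5 * h * lam**2) ** 2 + (eta * lam) ** 2 + h * eta * lam**3

def argmin_unit_interval(f, iters=200):
    """Ternary search; valid because f is strictly convex on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2                     # minimizer lies in [lo, m2]
        else:
            lo = m1                     # minimizer lies in [m1, hi]
    return 0.5 * (lo + hi)

C4, zeta3 = 2.0, 0.5                    # assumed uniform bounds
h0, eta0 = 1.2, 0.3                     # assumed values at x_0, within the bounds

eps = argmin_unit_interval(lambda lam: q(lam, C4, zeta3))
C_eps = q(eps, C4, zeta3)               # contraction factor, independent of l
lam0 = argmin_unit_interval(lambda lam: q(lam, h0, eta0))
q0_at_lam0 = q(lam0, h0, eta0)
q0_at_eps = q(eps, h0, eta0)
```

With such a factor $C_\varepsilon < 1$ available at every iterate, the level function decreases at least geometrically along the iteration.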

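Assuming for $l \ge 1$ that $x_l \in \mathcal{D}$ with $F_l \neq 0$ and $\mathcal{D}_l \subset \mathcal{D}_0$ holds, we argue in the same manner as before to prove that

• there is an index $\bar k_l$ such that $\delta x_l$ is well defined,

• $x_l+\lambda_l\delta x_l = x_{l+1} \in \mathcal{D}_l$ and $\mathcal{D}_{l+1} \subset \mathcal{D}_l$,

• $\lambda_l \ge \varepsilon$,

• $T(x_{l+1}|A) \le C_\varepsilon\,T(x_l|A)$.

With the assumption $\mathcal{D}_l \subset \mathcal{D}_0$ the second of the above relations implies $\mathcal{D}_{l+1} \subset \mathcal{D}_0$. This yields either convergence in a finite number of steps or

$$\lim_{l\to\infty} x_l = x^*,$$

completing the proof.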

In document: Approximate and Projected Natural Level Functions for Newton-type Iterations (pages 159-169)