
A Global Convergence Result for a Newton-like Iteration

No global convergence result is available if one solely employs the natural level function concept to determine step sizes in a damped Newton iteration. The same is true for the concepts of the projected natural level function and the approximate projected natural level function, which are introduced and discussed in this work. The reason is that the descent of each step is measured in a different metric, so cycles in the iterates cannot be excluded.

However, employing the general level function (2.10), i.e.,

$$T(x|A) := \tfrac{1}{2}\,\|AF(x)\|_2^2, \qquad A \in \mathbb{R}^{n\times n}, \tag{5.1}$$

for a fixed nonsingular choice of $A$, global convergence results for a damped Newton iteration are available, cf. Theorem 3.13 in [11].

In this section we provide a global convergence result of the type presented in the aforementioned Theorem 3.13, but for an approximate damped Newton iteration

$$x_{l+1} = x_l + \lambda_l\,\delta x_l, \qquad \delta x_l = -B_l^{-1}F_l, \qquad F_l := F(x_l),$$

where the matrices $B_l$ are to be considered as approximations to the respective Jacobians $J_l := F'(x_l)$. The basic techniques used in the proof of our global convergence statement are adopted from the proof of the above-mentioned Theorem 3.13. However, since we consider an approximate damped Newton iteration, some extra care is necessary. For the proof we will define a polynomial model $q_l(\lambda)$ as an estimate for the relative change of $T(x|A)$ at $x_l$ in the direction of $\delta x_l$. By means of this model the step sizes $\lambda_l$ will be determined. For the development of our polynomial model it is crucial that the approximate correction $\delta x_l$ fulfills

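$$\frac{d}{d\lambda}\,T(x_l+\lambda\,\delta x_l|A)\Big|_{\lambda=0} = \operatorname{grad} T(x_l|A)\,\delta x_l = \bigl(AF_l\bigr)^T A J_l\,\delta x_l = -\|AF_l\|_2^2. \tag{5.2}$$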

This can be achieved by simple scaling of some given direction $\delta x_l$ provided $\delta x_l$ and $\operatorname{grad} T(x_l|A)^T$ are not perpendicular, cf. the definition of the descent approximation from (4.6) in the context of the APNLF. However, as we will see, we can also use a generalization of the descent update from Section 4.3 to obtain a correction $\delta x_l$ which fulfills (5.2). We will combine the generalized descent update with a generalization of the purifying techniques from Section 4.2 to provide a sufficiently good approximation quality of the matrices $B_l$.

As a first preparation for our global convergence statement we will develop a polynomial model $p(\lambda)$ which serves as a basis for $q_l(\lambda)$. We introduce this second model $p(\lambda)$ also because it may be exploited to provide the basics for a broader range of step size control algorithms. For example, we will see that it may be used for a step size control in the context of an approximate natural level function (without projection), see Remark 5.3.

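The model $p(\lambda)$ depends on a rather general nonlinearity bound:

For nonsingular $W, U \in \mathbb{R}^{n\times n}$ let

$$2\cdot\bigl\|W\bigl(F(y)-F(x)-F'(x)(y-x)\bigr)\bigr\|_2 \le \omega\,\|U(y-x)\|_2^2 \qquad \forall\, x, y \in \mathcal{D}. \tag{5.3}$$

As a further generalization $W$ and $U$ may depend on $x$. If this is the case we assume that $W$ and $U$ are continuous in $x$ and that $W(x)$, $U(x)$ are nonsingular for each $x \in \mathcal{D}$.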

This bound is by itself neither affine covariant nor affine contravariant. Which concept is favored depends on the choice of $W$ and $U$. For example, $W = W(x) = F'(x)^{-1}$ and $U = I$ leads to an affine covariant bound, whereas $W = I$ and $U = U(x) = F'(x)$ leads to an affine contravariant one. This is on purpose to ensure compatibility of the polynomial $p(\lambda)$ with both concepts. Furthermore, one may consider the above bound only for an appropriate subset $\tilde{\mathcal{D}}$ of $\mathcal{D}$, which leads to a locally defined bound. This way an adaptive choice of $W$ and $U$ is possible, see again Remark 5.3.

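The polynomial model $p(\lambda)$ reads as follows.

Theorem 5.1 (Polynomial model) Suppose that $F$ fulfills Assumption 2.1 and let $x \in \mathcal{D}$ such that $F(x) \neq 0$. Abbreviate $J := F'(x)$. Assume that the nonlinearity bound (5.3) holds. For given nonsingular $A \in \mathbb{R}^{n\times n}$ consider the general level function (5.1). Let $B \in \mathbb{R}^{n\times n}$ be nonsingular such that $\delta x := -B^{-1}F(x)$ fulfills

$$\frac{d}{d\lambda}\,T(x+\lambda\,\delta x|A)\Big|_{\lambda=0} = -\|AF(x)\|_2^2. \tag{5.4}$$

Define

$$\eta := \frac{\|A(B-J)\,\delta x\|_2}{\|AF(x)\|_2}, \qquad \tilde h := \omega\cdot\|AW^{-1}\|_2\cdot\frac{\|U\,\delta x\|_2^2}{\|AF(x)\|_2}$$

and

$$\Lambda := \{\lambda \in (0,1] \mid x+\lambda\,\delta x \in \mathcal{D}\}. \tag{5.5}$$

Then for $\lambda \in \Lambda$ one has

$$T(x+\lambda\,\delta x|A) \le \Bigl[\bigl(1-\lambda+\tfrac{1}{2}\tilde h\lambda^2\bigr)^2 + \eta^2\lambda^2 + \tilde h\,\eta\,\lambda^3\Bigr]\,T(x|A) =: p(\lambda)\,T(x|A). \tag{5.6}$$

For the polynomial $p$ it holds that

I) $\frac{d}{d\lambda}p(\lambda)\big|_{\lambda=0} = -2$,

II) $p$ is strictly convex on $[0,1]$ and has a unique minimizer $\tilde\lambda$ in $[0,1]$ with $0 < \tilde\lambda \le \min\bigl(1, 1/\tilde h\bigr)$, where $1/\tilde h := \infty$ if $\omega = 0$.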

Proof. Consider $\lambda \in \Lambda$. For ease of writing we use φ(λ) := A(

All three statements are exploited in the following without any explicit reference.

$\|AF(x+\lambda\,\delta x)\|_2^2$

Combining these results and rearranging the terms, the sum $a + b_1^T b_2$ reads as follows: a + b_1^T b_2 = (1 − λ)(

Recall that

So with an additional adding and subtracting of λ(φ(λ))^T

By the nonlinearity bound (5.3) and the definition of $\tilde h$ it holds that $\|\varphi(\lambda)\|_2 \le \tfrac{1}{2}\tilde h\lambda^2\,\|AF(x)\|_2$.

Thus, by means of this estimate, the relation (5.7), and the Cauchy-Schwarz inequality we obtain

According to the definition of $\eta$ and $p(\lambda)$ and by means of some minor rearranging we can write the above relation as

we derive that $p$ is strictly convex on $[0,1]$ and $\frac{d}{d\lambda}p$ is strictly increasing on $[0,1]$. Together with
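The estimate (5.6) can be retraced in a few lines. The following is only a sketch in our own notation, under the assumption that $\varphi(\lambda) := A\bigl(F(x+\lambda\delta x) - F(x) - \lambda J\delta x\bigr)$, which is the remainder term that the bound $\|\varphi(\lambda)\|_2 \le \tfrac12\tilde h\lambda^2\|AF(x)\|_2$ suggests; the intermediate quantities of the original proof may be organized differently. It uses $B\,\delta x = -F(x)$, the requirement (5.4), the Cauchy-Schwarz inequality and $1-\lambda \ge 0$ for $\lambda \in \Lambda$.

```latex
\begin{align*}
AF(x+\lambda\delta x)
  &= (1-\lambda)\,AF(x) - \lambda\,A(B-J)\delta x + \varphi(\lambda)
  && \text{(since } B\delta x = -F(x)\text{)},\\
\bigl(AF(x)\bigr)^T A(B-J)\delta x
  &= -\|AF(x)\|_2^2 - \bigl(AF(x)\bigr)^T AJ\delta x = 0
  && \text{(by (5.4))},\\
\|AF(x+\lambda\delta x)\|_2^2
  &= (1-\lambda)^2\|AF(x)\|_2^2 + \lambda^2\|A(B-J)\delta x\|_2^2 + \|\varphi(\lambda)\|_2^2\\
  &\quad + 2(1-\lambda)\bigl(AF(x)\bigr)^T\varphi(\lambda)
         - 2\lambda\bigl(A(B-J)\delta x\bigr)^T\varphi(\lambda)\\
  &\le \Bigl[(1-\lambda)^2 + \eta^2\lambda^2 + \tfrac14\tilde h^2\lambda^4
         + (1-\lambda)\tilde h\lambda^2 + \tilde h\,\eta\,\lambda^3\Bigr]\|AF(x)\|_2^2\\
  &= \Bigl[\bigl(1-\lambda+\tfrac12\tilde h\lambda^2\bigr)^2 + \eta^2\lambda^2
         + \tilde h\,\eta\,\lambda^3\Bigr]\|AF(x)\|_2^2
   = p(\lambda)\,\|AF(x)\|_2^2 .
\end{align*}
```

Multiplying by $\tfrac12$ gives (5.6), because $T(\cdot|A) = \tfrac12\|AF(\cdot)\|_2^2$.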

The quantity $\eta$ expresses the influence of the deviation from the Newton correction because

$$\eta = 0 \iff \delta x = -F'(x)^{-1}F(x).$$

For an evaluation of $\eta$ the direct tangent evaluation $F'(x)\,\delta x$ is required. This can easily be done via the forward mode of Automatic Differentiation. Note that for $\eta = 0$, $W = W(x) = F'(x)^{-1}$ and $U = I$ the above polynomial simplifies to the square of the polynomial from Theorem 2.10 if additionally the Lipschitz condition (2.23) is considered instead of the above nonlinearity bound (5.3) for these specific choices of $W$ and $U$, i.e. the nonlinearity bound (3.3), and if $A$ is chosen as $A = F'(x)^{-1}$. An analogous simplification is true in the context of affine contravariance for the choices $W = I$, $U(x) = F'(x)$ and $A = I$. To avoid the introduction of new notation just for the purposes of comparison we omit details here. The interested reader is referred to Theorem 3.7 in [11]. Summarizing, the polynomial model $p$ provides a generalization of existing polynomial models from the literature in the context of both affine invariance concepts.
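The tangent evaluation needed for $\eta$ can be illustrated with a minimal forward-mode sketch based on dual numbers. Since $B\,\delta x = -F(x)$, one has $A(B-J)\,\delta x = -A\bigl(F(x) + F'(x)\,\delta x\bigr)$, so $\eta$ can be computed from $F(x)$, $\delta x$ and one tangent $F'(x)\,\delta x$ without forming $B$ or $J$. The class `Dual`, the helper names and the test system `F_example` are assumptions made for illustration only; they are not taken from the text or from any particular AD tool.

```python
import math


class Dual:
    """Number val + dot*eps with eps**2 = 0; dot carries a directional derivative."""

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    def __add__(self, other):
        other = lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __sub__(self, other):
        other = lift(other)
        return Dual(self.val - other.val, self.dot - other.dot)

    def __rsub__(self, other):
        return lift(other) - self

    def __mul__(self, other):
        other = lift(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

    __rmul__ = __mul__


def lift(value):
    """Promote a plain number to a Dual with zero tangent."""
    return value if isinstance(value, Dual) else Dual(float(value))


def tangent(F, x, v):
    """Return F'(x) v from a single forward sweep (no Jacobian is formed)."""
    out = F([Dual(xi, vi) for xi, vi in zip(x, v)])
    return [lift(component).dot for component in out]


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


def norm2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in v))


def eta(F, x, dx, A):
    """eta = ||A (F(x) + F'(x) dx)||_2 / ||A F(x)||_2, valid when B dx = -F(x)."""
    Fx = F(x)
    Jdx = tangent(F, x, dx)
    residual = [f + t for f, t in zip(Fx, Jdx)]
    return norm2(matvec(A, residual)) / norm2(matvec(A, Fx))


def F_example(x):
    """Hypothetical 2x2 test system, chosen only for illustration."""
    return [x[0] * x[0] + x[1] - 3.0, x[0] * x[1] - 2.0]
```

For the exact Newton correction the value of `eta` vanishes up to rounding, in line with the equivalence stated above; for any other direction generated by some $B$ with $B\,\delta x = -F(x)$ it is positive.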

Remark 5.2 The minimizer $\tilde\lambda$ of $p$ in $[0,1]$ can be stated explicitly. We omit the formula here since it is lengthy and does not provide any readily identifiable further insight. □

Remark 5.3 If we only consider a subset $\tilde{\mathcal{D}}$ of $\mathcal{D}$ in (5.3), e.g., for given $x \in \mathcal{D}$ let $y \in \mathcal{D}$ with $y - x = \lambda\,\delta x$, $\lambda \in [0,1]$, we obtain a local nonlinearity bound. If these bounds are employed iteratively one may also choose $W$ and $U$ adaptively. Consider for example the context of an approximate natural level function where $\tilde B_l$ is some nonsingular approximation for $F'(x_l)$ and $B_l$ a second nonsingular approximation for $F'(x_l)$ which additionally fulfills (5.2) for the choice $A = \tilde B_l$. Choosing $W_l = \tilde B_l$ at $x_l$ and $U_l \equiv I$ makes meaningful theoretical quantities available which can also be estimated in a reasonable way, cf. the bound (4.152) and its estimate (4.157). □
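Although the explicit formula for the minimizer is omitted, the minimizer is easy to obtain numerically. The following sketch evaluates the polynomial from (5.6) for given values of $\tilde h$ and $\eta$ and locates its minimizer on $[0,1]$ by bisection on the derivative, which is strictly increasing because $p$ is strictly convex. The function names and the tolerance are assumptions for illustration; bisection is our choice here, not a method prescribed by the text.

```python
def p(lam, h, eta):
    """Polynomial model p(lambda) from (5.6) for given h-tilde and eta."""
    return (1.0 - lam + 0.5 * h * lam**2) ** 2 + eta**2 * lam**2 + h * eta * lam**3


def dp(lam, h, eta):
    """Derivative of p with respect to lambda."""
    inner = 1.0 - lam + 0.5 * h * lam**2
    return (2.0 * inner * (h * lam - 1.0)
            + 2.0 * eta**2 * lam
            + 3.0 * h * eta * lam**2)


def minimizer(h, eta, tol=1e-12):
    """Minimizer of p on [0, 1], found by bisection on the increasing derivative."""
    if dp(1.0, h, eta) <= 0.0:
        return 1.0  # p is still non-increasing at lambda = 1
    lo, hi = 0.0, 1.0  # dp(0) = -2 < 0 < dp(1)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dp(mid, h, eta) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For $\eta = 0$ the derivative factorizes and the sketch reproduces $\tilde\lambda = \min(1, 1/\tilde h)$; for $\eta > 0$ the minimizer moves to the left of that value, consistent with statement II) of Theorem 5.1.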

For the upcoming global convergence result we must ensure that the deviation of $\delta x_l$ from the Newton correction at $x_l$ is uniformly bounded. To meet this condition, we will employ an adaptation of the purifying techniques from Section 4.2. Recall that we also have to meet the requirement (5.2). Therefore, we will use an adaptation of the descent update from Section 4.3. Let $x \in \mathcal{D}$ such that $F(x) \neq 0$ and $F'(x)$ is nonsingular. With the abbreviations $F := F(x)$ and $J := F'(x)$ and with $\delta x_k$ determined in analogy to $\delta x_k$ in (4.52), the updates we are going to use look as follows:

• descent:

$$B_{k+1} = B_k - \frac{F\,(AF)^T A\,(B_k - J)}{\|AF\|_2^2} \tag{5.8a}$$

• Newton-philic:

$$B_{k+1} = B_k - \frac{(B_k - J)\,\delta x_k\,\bigl(A(B_k - J)\,\delta x_k\bigr)^T A\,(B_k - J)}{\|A(B_k - J)\,\delta x_k\|_2^2} \tag{5.8b}$$

Remark 5.4 The above updates are closely related to Schlenkrich's residual update

$$S_{k+1} = S_k - \frac{F F^T (S_k - J)}{\|F\|_2^2}$$

and transposed tangent Broyden update

$$S_{k+1} = S_k - \frac{(S_k - J)s_k\,\bigl((S_k - J)s_k\bigr)^T (S_k - J)}{\|(S_k - J)s_k\|_2^2}, \qquad s_k \neq 0 \ \text{ s.t. } \ S_k s_k = \begin{cases} -F & \text{if } S_k \text{ is nonsingular,} \\ 0 & \text{if } S_k \text{ is singular,} \end{cases}$$

respectively: Applying Schlenkrich's updates to $G(x) := AF(x)$ and thereby generating approximations $S_{l,k}$, the relation $AB_{l,k} = S_{l,k}$ holds if $AB_{l,0} = S_{l,0}$. Hence, regarding global convergence one may also refer to the results in [28]. However, the step size control exploited in [28] is not based on a polynomial such as $p(\lambda)$ from Theorem 5.1. Thus, we provide an alternative approach. □
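The two updates (5.8a) and (5.8b) are rank-one corrections and can be sketched in a few lines. The sketch below uses plain nested lists and a hypothetical $2\times 2$ example (the matrices `A`, `J`, `B0` and the vector `F` are made-up data, and the helper names are ours). It lets one check numerically that the descent update yields $(AF)^T A B_{k+1} = (AF)^T A J$, that a subsequent Newton-philic update preserves this relation, and that it additionally enforces $B_{k+1}\,\delta x_k = J\,\delta x_k$.

```python
def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


def vecmat(v, M):
    """Row vector times matrix, v^T M."""
    return [sum(c * row[j] for c, row in zip(v, M)) for j in range(len(M[0]))]


def matmul(M, N):
    """Matrix product M N, built row by row."""
    return [vecmat(row, N) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def difference(B, J):
    """Entrywise difference B - J."""
    return [[b - j for b, j in zip(rb, rj)] for rb, rj in zip(B, J)]


def rank_one_downdate(B, u, w, scale):
    """Return B - u w^T / scale."""
    return [[B[i][j] - u[i] * w[j] / scale for j in range(len(w))]
            for i in range(len(u))]


def descent_update(B, J, F, A):
    """Update (5.8a): B - F (AF)^T A (B - J) / ||AF||^2."""
    AF = matvec(A, F)
    w = vecmat(AF, matmul(A, difference(B, J)))  # (AF)^T A (B - J)
    return rank_one_downdate(B, F, w, dot(AF, AF))


def newton_philic_update(B, J, dx, A):
    """Update (5.8b) for a direction dx with (B - J) dx != 0."""
    E = difference(B, J)
    u = matvec(E, dx)                            # (B - J) dx
    Au = matvec(A, u)
    w = vecmat(Au, matmul(A, E))                 # (A (B - J) dx)^T A (B - J)
    return rank_one_downdate(B, u, w, dot(Au, Au))


def solve2(M, rhs):
    """Solve a nonsingular 2x2 system by Cramer's rule (illustration only)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det,
            (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det]


# Hypothetical data, not taken from the text.
A = [[2.0, 0.0], [1.0, 1.0]]
J = [[4.0, 1.0], [1.0, 2.0]]
F = [2.0, 0.0]
B0 = [[1.0, 0.0], [0.0, 1.0]]

B1 = descent_update(B0, J, F, A)
dx = solve2(B1, [-F[0], -F[1]])                  # B1 dx = -F
B2 = newton_philic_update(B1, J, dx, A)
```

In an actual implementation the Jacobian would not be formed: the products $J\,\delta x_k$ and $(AF)^T A J$ are one tangent and one adjoint evaluation, respectively. The explicit `J` above only keeps the sketch short.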

If $B_{k+1}$ is constructed via the descent update then

$$(AF)^T A B_{k+1} = (AF)^T A J \tag{5.9}$$

and hence for nonsingular $B_{k+1}$ (5.4) is true. Due to heredity, all following approximations constructed via the Newton-philic update also fulfill (5.9). If such a matrix is nonsingular then (5.4) is true too. Note that $\operatorname{grad} T(x|A)$ is given via $(AF)^T A J$ and hence does not depend on any of the approximations $B_k$. Therefore, we do not consider adapted versions of the duophilic and the gradientphilic update, (4.49) and (4.51), respectively. The basic purifying process at an iterate $x_l$ is as follows:

Algorithm 5.1 (Purifying process w.r.t. $T(x|A)$ at $x_l \in \mathcal{D}$ with $F'(x_l)$ nonsingular)

given: $B_{l,0} \in \mathbb{R}^{n\times n}$, $F_l := F(x_l) \neq 0$, $J_l := F'(x_l)$ nonsingular
set $k = 0$
while ($B_{l,k}$ singular) || ($B_{l,k}$ nonsingular && $B_{l,k}^{-1}F_l \neq J_l^{-1}F_l$) do
    if $k = 0$ then
        construct $B_{l,k+1}$ via the descent update (5.8a)
    else
        determine $\delta x_{l,k} \neq 0$ according to $B_{l,k}\,\delta x_{l,k} = -F_l$ if $B_{l,k}$ is nonsingular, and $B_{l,k}\,\delta x_{l,k} = 0$ if $B_{l,k}$ is singular
        construct $B_{l,k+1}$ via the Newton-philic update (5.8b)
    end if
    set $k = k + 1$
end while

The adapted Newton-philic update (5.8b) is a specific instance of the basic purifying update, cf. (4.39). Hence, by Proposition 4.16 the above algorithm terminates after a finite number of steps, providing the Newton correction. Also, every constructed approximation $B_{l,k}$ fulfills an indexed version of (5.9), i.e.,

$$\bigl(AF_l\bigr)^T A B_{l,k} = (AF_l)^T A J_l. \tag{5.10}$$

So there will be an index $\bar k_l$ such that $B_{l,\bar k_l}$ is nonsingular and of sufficient approximation quality. This vague statement will be made precise in the proof of Theorem 5.5. We use $B_{l,\bar k_l}$ to compute the actual correction $\delta x_l$, i.e.,

$$B_l := B_{l,\bar k_l}, \qquad \delta x_l = -B_l^{-1}F_l. \tag{5.11}$$

Since the quantities $W$ and $U$ from (5.3) may depend on $x$ and hence on an iterate $x_l$, we use the notation $W^{(l)}$ and $U^{(l)}$ to indicate that possible dependency. Also, we introduce indices in the definition of $\eta$, i.e.,

$$\eta_l := \frac{\|A(B_l - J_l)\,\delta x_l\|_2}{\|AF_l\|_2}. \tag{5.12}$$

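Verify that

$$\|U^{(l)}(B_l^{-1} - J_l^{-1})F_l\|_2 + \|U^{(l)}J_l^{-1}F_l\|_2 \ge \|U^{(l)}\delta x_l\|_2.$$

Also, it holds that

$$\|U^{(l)}J_l^{-1}F_l\|_2 \le \|U^{(l)}J_l^{-1}A^{-1}\|_2\,\|AF_l\|_2.$$

Hence,

$$\begin{aligned}
\hat h_l &:= \omega\cdot\bigl\|A\,(W^{(l)})^{-1}\bigr\|_2\cdot\left[\frac{\|U^{(l)}(B_l^{-1} - J_l^{-1})F_l\|_2}{\|AF_l\|_2} + \|U^{(l)}J_l^{-1}A^{-1}\|_2\right]\cdot\Bigl[\|U^{(l)}(B_l^{-1} - J_l^{-1})F_l\|_2 + \|U^{(l)}J_l^{-1}F_l\|_2\Bigr] \\
&\ge \omega\cdot\bigl\|A\,(W^{(l)})^{-1}\bigr\|_2\cdot\frac{\|U^{(l)}\delta x_l\|_2^2}{\|AF_l\|_2} =: \tilde h_l.
\end{aligned} \tag{5.13}$$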
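The two auxiliary inequalities and the resulting estimate (5.13) follow from the triangle inequality and submultiplicativity. A short sketch of the reasoning, using only the definitions above, reads:

```latex
\begin{align*}
\delta x_l &= -B_l^{-1}F_l = -(B_l^{-1}-J_l^{-1})F_l - J_l^{-1}F_l
 \;\Longrightarrow\;
 \|U^{(l)}\delta x_l\|_2 \le \|U^{(l)}(B_l^{-1}-J_l^{-1})F_l\|_2 + \|U^{(l)}J_l^{-1}F_l\|_2,\\
\|U^{(l)}J_l^{-1}F_l\|_2 &= \|U^{(l)}J_l^{-1}A^{-1}\,AF_l\|_2
 \le \|U^{(l)}J_l^{-1}A^{-1}\|_2\,\|AF_l\|_2,\\
\frac{\|U^{(l)}\delta x_l\|_2^2}{\|AF_l\|_2}
 &= \frac{\|U^{(l)}\delta x_l\|_2}{\|AF_l\|_2}\cdot\|U^{(l)}\delta x_l\|_2\\
 &\le \left[\frac{\|U^{(l)}(B_l^{-1}-J_l^{-1})F_l\|_2}{\|AF_l\|_2} + \|U^{(l)}J_l^{-1}A^{-1}\|_2\right]
      \cdot\Bigl[\|U^{(l)}(B_l^{-1}-J_l^{-1})F_l\|_2 + \|U^{(l)}J_l^{-1}F_l\|_2\Bigr].
\end{align*}
```

Multiplying the last line by $\omega\cdot\|A\,(W^{(l)})^{-1}\|_2$ gives $\hat h_l \ge \tilde h_l$.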

So, introducing indices in the definition of $p(\lambda)$, we substitute $\hat h_l$ for $\tilde h_l$ to obtain the polynomial model $q_l(\lambda)$ on which the following result is based.

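Theorem 5.5 (Global convergence) Suppose $F$ fulfills Assumption 2.1 and additionally let the Jacobian $F'(x)$ be nonsingular for all $x \in \mathcal{D}$. Assume that the nonlinearity bound (5.3) holds. Consider the general level function (5.1) for nonsingular $A \in \mathbb{R}^{n\times n}$ and let $\mathcal{G}(x|A)$ be the level set of the general level function at $x$, i.e., $\mathcal{G}(x|A) := \{z \in \mathcal{D} \mid T(z|A) \le T(x|A)\}$. For some $x_0 \in \mathcal{D}$ let $\mathcal{D}_0$ denote the path-connected component of $\mathcal{G}(x_0|A)$ which contains $x_0$. Assume that $\mathcal{D}_0$ is compact. Let $B_{0,0} \in \mathbb{R}^{n\times n}$ nonsingular and nonnegative constants $\zeta_1$, $\zeta_2$ and $\zeta_3$ be given. Consider the iteration

$$x_{l+1} = x_l + \lambda_l\,\delta x_l, \qquad \delta x_l = -B_l^{-1}F_l, \qquad F_l := F(x_l),$$

with $\lambda_l$ being the unique minimizer in $[0,1]$ of

$$q_l(\lambda) := \bigl(1-\lambda+\tfrac{1}{2}\hat h_l\lambda^2\bigr)^2 + \eta_l^2\lambda^2 + \hat h_l\,\eta_l\,\lambda^3.$$

The matrix $B_l$ is given as in (5.11). The quantities $\eta_l$ and $\hat h_l$ are defined as in (5.12) and (5.13).

Then for each $l$ with $F_l \neq 0$ there exists an index $\bar k_l \le n$ such that

I) $B_l$ is nonsingular, hence, $\delta x_l$ is well defined,

II) it holds that

$$\bigl\|\bigl(B_l^{-1} - J_l^{-1}\bigr)F_l\bigr\|_2 \le \zeta_1, \qquad \frac{\bigl\|\bigl(B_l^{-1} - J_l^{-1}\bigr)F_l\bigr\|_2}{\|AF_l\|_2} \le \zeta_2, \qquad \eta_l \le \zeta_3,$$

where $J_l := F'(x_l)$,

III) $\lambda_l$ is well defined and $\lambda_l \ge \varepsilon > 0$ with $\varepsilon$ independent of $l$,

IV) there is a constant $0 \le C_\varepsilon < 1$ independent of $l$ such that

$$T(x_l + \lambda_l\,\delta x_l|A) \le q_l(\lambda_l)\,T(x_l|A) \le C_\varepsilon\,T(x_l|A) < T(x_l|A).$$

This implies that the sequence $x_l$ converges to some $x^*$ with $F(x^*) = 0$.

The constant $\varepsilon$ depends on $\mathcal{D}_0$ and $\zeta_1$, $\zeta_2$ and $\zeta_3$.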

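Proof. We define for $x_l \in \mathcal{D}$ the set $\mathcal{D}_l$ in accordance with $\mathcal{D}_0$.

The proof is by induction. Assume that $F_0 \neq 0$. Since $J_0$ is nonsingular and by means of Proposition 4.16 and Algorithm 5.1 there is an index $0 \le \bar k_0 \le n$ such that $B_{0,\bar k_0} =: B_0$ is nonsingular and (5.10) holds. Hence, $\delta x_0 = -B_0^{-1}F_0$ fulfills (5.2) for $l = 0$. Since also the nonlinearity bound (5.3) is assumed to hold, we can exploit the results of Theorem 5.1. Introducing indices in (5.5) and (5.6) we obtain

$$T(x_0 + \lambda\,\delta x_0|A) \le p_0(\lambda)\,T(x_0|A) \qquad \forall\,\lambda \in \Lambda_0.$$

Recall from (5.13) that $\hat h_0 \ge \tilde h_0$. Hence, substituting $\hat h_0$ for $\tilde h_0$ in $p_0(\lambda)$ yields

$$p_0(\lambda) \le q_0(\lambda) \qquad \forall\,\lambda \in \Lambda_0$$

and therefore

$$T(x_0 + \lambda\,\delta x_0|A) \le q_0(\lambda)\,T(x_0|A) \qquad \forall\,\lambda \in \Lambda_0.$$

Also, we may argue as in the proof of Theorem 5.1 that $\frac{d}{d\lambda}q_0(\lambda)\big|_{\lambda=0} = -2$ and that $q_0$ is strictly convex on $[0,1]$ and has a unique minimizer $\lambda_0 > 0$ in $[0,1]$. We show by contradiction that $x_0 + \lambda\,\delta x_0 \in \mathcal{D}_0$ for all $0 \le \lambda \le \lambda_0$. For this, assume it is not the case. Since $\mathcal{D}$ is open and nonempty and due to the compactness of $\mathcal{D}_0$ the set $Y_0 := \{0 < \lambda \le \lambda_0 \mid x_0 + \lambda\,\delta x_0 \in \mathcal{D} \setminus \mathcal{D}_0\}$ is nonempty. By assumption $\mathcal{D}$ is also convex. This means that for any $\lambda \in Y_0$ we have $x_0 + s\,\delta x_0 \in \mathcal{D}$, $0 \le s \le \lambda$. And for any such $s > 0$ we obtain by the above stated properties of $q_0$ the estimate $q_0(s) < q_0(0) = 1$. Consequently,

$$T(x_0 + s\,\delta x_0|A) \le q_0(s)\,T(x_0|A) < T(x_0|A), \qquad 0 < s \le \lambda, \quad \lambda \in Y_0.$$

But from this it follows that $x_0 + \lambda\,\delta x_0 \in \mathcal{D}_0$ for all $\lambda \in Y_0$, which contradicts the definition of $Y_0$. Thus, we can conclude that $x_0 + \lambda\,\delta x_0 \in \mathcal{D}_0$ for all $0 \le \lambda \le \lambda_0$, which means that

$$T(x_0 + \lambda\,\delta x_0|A) < T(x_0|A) \qquad \text{for all } 0 < \lambda \le \lambda_0.$$

Hence, in particular

$$T(x_1|A) < T(x_0|A)$$

and therefore

$$\mathcal{D}_1 \subset \mathcal{D}_0.$$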

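We turn to the question whether there is an $\varepsilon > 0$ such that $\lambda_0 \ge \varepsilon$. Recall that $W$ and $U$ in the nonlinearity bound (5.3) may depend continuously on $x$. If this is not the case then $W(x) \equiv W \in \mathbb{R}^{n\times n}$ and $U(x) \equiv U \in \mathbb{R}^{n\times n}$, respectively. In either case, the compactness of $\mathcal{D}_0$ yields for some positive constants $C_1$, $C_2$ and $C_3$,

$$\max_{x\in\mathcal{D}_0}\|U(x)F'(x)^{-1}F(x)\|_2 \le C_1, \qquad \max_{x\in\mathcal{D}_0}\|A\,W(x)^{-1}\|_2 \le C_2, \qquad \max_{x\in\mathcal{D}_0}\|U(x)F'(x)^{-1}A^{-1}\|_2 \le C_3.$$

W.l.o.g. we can also assume that $\bar k_0$ is chosen such that for the given nonnegative constants $\zeta_1$, $\zeta_2$ and $\zeta_3$ the bounds

$$\bigl\|\bigl(B_0^{-1} - J_0^{-1}\bigr)F_0\bigr\|_2 \le \zeta_1, \qquad \frac{\bigl\|\bigl(B_0^{-1} - J_0^{-1}\bigr)F_0\bigr\|_2}{\|AF_0\|_2} \le \zeta_2, \qquad \eta_0 \le \zeta_3$$

hold. Hence, by the definition of $\hat h_l$ in (5.13),

$$\hat h_0 \le \omega\,C_2\,(\zeta_2 + C_3)(\zeta_1 + C_1) =: C_4$$

and for

$$q(\lambda) := \bigl(1-\lambda+\tfrac{1}{2}C_4\lambda^2\bigr)^2 + \zeta_3^2\lambda^2 + C_4\,\zeta_3\,\lambda^3$$

we obtain

$$q_0(\lambda) \le q(\lambda) \qquad \forall\,\lambda \in \Lambda_0.$$

Since $q$ is of the same structure as $p_0$ and $q_0$, the properties stated in paragraphs I) and II) of Theorem 5.1 are true for $q$ as well. We denote the unique minimizer by $\varepsilon$ and define the nonnegative constant $C_\varepsilon$ via

$$C_\varepsilon := q(\varepsilon) < 1.$$

Exploiting the properties stated in paragraphs I) and II) of Theorem 5.1 for $q_0$ and $q$, some elementary considerations yield

$$\lambda_0 \ge \varepsilon > 0.$$

Also,

$$q_0(\lambda_0) \le C_\varepsilon \qquad \text{and hence} \qquad T(x_1|A) \le C_\varepsilon\,T(x_0|A).$$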

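Assuming for $l \ge 1$ that $x_l \in \mathcal{D}$ with $F_l \neq 0$ and $\mathcal{D}_l \subset \mathcal{D}_0$ holds, we argue in the same manner as before to prove that

• there is an index $\bar k_l$ such that $\delta x_l$ is well defined,

• $x_l + \lambda_l\,\delta x_l = x_{l+1} \in \mathcal{D}_l$ and $\mathcal{D}_{l+1} \subset \mathcal{D}_l$,

• $\lambda_l \ge \varepsilon$,

• $T(x_{l+1}|A) \le C_\varepsilon\,T(x_l|A)$.

With the assumption $\mathcal{D}_l \subset \mathcal{D}_0$ the second of the above relations implies $\mathcal{D}_{l+1} \subset \mathcal{D}_0$. This yields either convergence in a finite number of steps or

$$\lim_{l\to\infty} x_l = x^*,$$

completing the proof.

∎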
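To see the step size rule of Theorem 5.5 at work, consider a minimal scalar sketch. All concrete choices here are assumptions made for illustration and are not taken from the text: $F(x) = \arctan(x)$, $A = W = U = 1$, and the exact Jacobian $B_l = J_l$, i.e. the purifying process is assumed to have delivered the Newton correction. Then $\eta_l = 0$, the bound (5.3) holds with $\omega = \max|F''| = 3\sqrt{3}/8$ by Taylor's theorem, (5.13) reduces to $\hat h_l = \omega\,|\delta x_l|/|J_l|$, and the minimizer of $q_l(\lambda) = (1-\lambda+\tfrac12\hat h_l\lambda^2)^2$ on $[0,1]$ is $\lambda_l = \min(1, 1/\hat h_l)$.

```python
import math

# Assumed nonlinearity constant: omega = max |F''(x)| for F = atan,
# attained at x = 1/sqrt(3), so that (5.3) holds with W = U = 1.
OMEGA = 3.0 * math.sqrt(3.0) / 8.0


def damped_newton_scalar(x0, tol=1e-12, max_iter=200):
    """Damped Newton iteration for F(x) = atan(x) with step sizes from q_l.

    Returns the final iterate and the history of residuals |F(x_l)|.
    """
    x = x0
    history = [abs(math.atan(x))]
    for _ in range(max_iter):
        F = math.atan(x)
        if abs(F) <= tol:
            break
        J = 1.0 / (1.0 + x * x)       # exact Jacobian, so eta_l = 0
        dx = -F / J                   # Newton correction
        h = OMEGA * abs(dx) / abs(J)  # h-hat_l from (5.13) for B_l = J_l
        lam = min(1.0, 1.0 / h)       # minimizer of q_l on [0, 1] when eta_l = 0
        x += lam * dx
        history.append(abs(math.atan(x)))
    return x, history


x_final, residuals = damped_newton_scalar(3.0)
```

The starting point $x_0 = 3$ lies outside the region in which the undamped Newton iteration for $\arctan$ converges, whereas the damped iteration reduces the residual in every step: far from the solution the steps are short because $\hat h_l$ is large, and close to the solution $\hat h_l \le 1$ leads to full Newton steps.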