
Conjugate Gradient Method


Conjugate Directions

Steepest Descent often finds itself taking steps in the same direction as earlier steps. The idea is now to find a suitable set of N orthogonal search directions d_0, d_1, ..., d_{N-1} (they will be "A-orthogonal"). Together they form a basis of R^N, so that the solution vector x can be expanded as

x = x_0 + \sum_{i=0}^{N-1} \lambda_i d_i .   (2.18)

The goal is to take exactly one step in each search direction, with the right coefficient \lambda_i. After N steps we will be done. Accomplishing this with low computational effort requires a special set of directions and a new scalar product that redefines "orthogonality".

For the first step, we set x_1 = x_0 + \lambda_0 d_0. At step n+1 we choose a new point at the line minimum

x_{n+1} = x_n + \lambda_n d_n ,   (2.19)

and so on, until x_N = x after N steps.

Let us collect a few simple relations which follow directly from (2.18) and (2.19). It is useful to introduce the deviation ("error vector") from the exact solution x (which contains the steps yet to be done)

e_{n+1} := x_{n+1} - x                              (2.20a)
         = e_n + \lambda_n d_n                      (2.20b)
         = -\sum_{i=n+1}^{N-1} \lambda_i d_i .      (2.20c)

Similar relations hold for the residuals r_n, namely

r_{n+1} := -\nabla f(x_{n+1}) = b - A x_{n+1}              (2.21a)
         = \underbrace{b - A x}_{=0} - A e_{n+1}           (2.21b)
         = A \sum_{i=n+1}^{N-1} \lambda_i d_i ,            (2.21c)

and also the recursion

r_{n+1} = r_n - \lambda_n A d_n .                          (2.21d)

Figure 2.3: Sketch of the expansion of the solution x in directions d_i (with x_0 = 0). This figure is a projection from a higher-dimensional space; therefore there are more than two "orthogonal" directions d_i.

The expansion is sketched in Fig. 2.3.

Let us now see what would happen if we demanded that the directions d_n obey the usual 90-degree orthogonality d_i^T d_j = 0 for i \neq j. We want the search direction d_n to be orthogonal to all other search directions, including all future directions making up the error vector e_{n+1}:

0 \overset{?}{=} d_n^T e_{n+1} \overset{(2.20b)}{=} d_n^T e_n + \lambda_n d_n^T d_n .

This would imply

\lambda_n \overset{?}{=} -\frac{d_n^T e_n}{d_n^T d_n} .

However, this equation is not useful, because this \lambda_n cannot be calculated without knowing e_n; but if we knew e_n, then the problem would already be solved. This choice of \lambda_n would also mean that we completely ignore the line minima during the search (e.g. when we try to reach the solution in Fig. 2.1 with two vectors that are at 90 degrees to each other).

Line minimum

The successful strategy is instead to still follow each search direction d_i until the line minimum is reached. Thus

0 \overset{!}{=} \frac{d}{d\lambda} f(x_{n+1}) = \nabla f(x_{n+1})^T \frac{d}{d\lambda} x_{n+1}
  = -r_{n+1}^T d_n                                            (2.22)
  \overset{(2.21d)}{=} -d_n^T r_n + \lambda_n d_n^T A d_n ,

and we get

\lambda_n = \frac{d_n^T r_n}{d_n^T A d_n} .                   (2.23)

This equation for the line minimum is valid for any set of search directions.

It contains the current search direction d_n and the current residual r_n, which are both known.
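As a concrete illustration, a single exact line search can be coded directly from (2.23). The following NumPy sketch is our own (function and variable names are not part of the text); it assumes a symmetric positive definite A:

import numpy as np

def line_minimum_step(A, b, x, d):
    """One exact line search along direction d for f(x) = 1/2 x^T A x - b^T x.

    Returns x_{n+1} = x_n + lambda_n d_n with lambda_n from Eq. (2.23).
    """
    r = b - A @ x                   # current residual r_n = -grad f(x_n)
    lam = (d @ r) / (d @ (A @ d))   # lambda_n = d_n^T r_n / (d_n^T A d_n)
    return x + lam * d, lam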

Orthogonality

Eq. (2.22) tells us again that at the line minimum, the current search direction is orthogonal to the residual. We rewrite this equation:

0 = -d_n^T r_{n+1} \overset{(2.21c)}{=} -d_n^T A \sum_{i=n+1}^{N-1} \lambda_i d_i \overset{(2.20c)}{=} d_n^T A e_{n+1} .   (2.24)

Our goal is that all search directions are "orthogonal". As argued above, this means that d_n should be orthogonal to e_{n+1}. This is consistent with (2.24) if we use a new scalar product

(u, v)_A := u^T A v   (2.25)

to define orthogonality.

We now demand that all search directions are mutually "A-orthogonal",

d_i^T A d_j = 0   (i \neq j) .   (2.26)

We will construct such a set of "conjugate" directions d_i. Since u^T A v is a scalar product, these vectors form an orthogonal basis of R^N, in which the solution vector x can be expanded as in (2.18).

We are therefore also guaranteed that the solution will take at most N steps (up to effects of rounding errors).
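To make the new scalar product concrete, the short NumPy sketch below (helper names are our own) implements (u, v)_A from (2.25) and checks a set of directions for mutual A-orthogonality in the sense of (2.26):

import numpy as np

def a_inner(u, v, A):
    """Scalar product (u, v)_A = u^T A v of Eq. (2.25)."""
    return u @ (A @ v)

def is_a_orthogonal(D, A, tol=1e-10):
    """Check d_i^T A d_j = 0 for i != j (Eq. 2.26); the d_i are the columns of D."""
    G = D.T @ A @ D                      # Gram matrix in the A scalar product
    off_diag = G - np.diag(np.diag(G))
    return np.max(np.abs(off_diag)) < tol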

From (2.21c) and (2.26) we can deduce

d_m^T r_n = d_m^T A \sum_{i=n}^{N-1} \lambda_i d_i = 0   for m < n ,   (2.27)

meaning that the residual r_n is orthogonal in the usual sense (90 degrees) to all old search directions.

Gram-Schmidt Conjugation

We still need a set of N A-orthogonal search directions {d_i}. There is a simple (but inefficient) way to generate them iteratively: the conjugate Gram-Schmidt process.

Let {u_i} with i = 0, ..., N-1 be a set of N linearly independent vectors, for instance unit vectors in the coordinate directions. Suppose that the search directions d_k for k < i are already mutually A-orthogonal. To construct the next direction d_i, take u_i and subtract out all components that are not A-orthogonal to the previous d-vectors, as is demonstrated in Fig. 2.4. Thus, we set d_0 = u_0 and for i > 0 we choose

d_i = u_i + \sum_{k=0}^{i-1} \beta_{ik} d_k ,   (2.28)

Figure 2.4: Gram-Schmidt conjugation of two vectors. Begin with two linearly independent vectors u_0 and u_1. Set d_0 = u_0. The vector u_1 is composed of two components: u^* which is A-orthogonal (in this figure: 90 degrees, in general: A-orthogonal) to d_0, and u^+ which is parallel to d_0. We subtract u^+, so that only the A-orthogonal portion remains: d_1 = u^* = u_1 - u^+ =: u_1 + \beta_{10} d_0.

with the \beta_{ik} defined for k < i. To find their values we impose the A-orthogonality of the new direction d_i with respect to the previous ones:

0 \overset{i>j}{=} d_i^T A d_j = u_i^T A d_j + \sum_{k=0}^{i-1} \beta_{ik} d_k^T A d_j
  = u_i^T A d_j + \beta_{ij} d_j^T A d_j ,   (2.29)

\beta_{ij} = -\frac{u_i^T A d_j}{d_j^T A d_j} .   (2.30)

Note that in the first line, the sum reduces to the term k = j because of the mutual A-orthogonality of the previous search vectors d_k, k < i.

Equation (2.30) provides the necessary coefficients for (2.28). However, there is a difficulty in using this method, namely that all the old search vectors must be kept and processed to construct the new search vector.
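For illustration, a direct (and deliberately naive) transcription of (2.28)-(2.30) into NumPy might look as follows. It is only a sketch under the assumption that the columns u_i are linearly independent, and it stores all previous directions, which is exactly the drawback just mentioned:

import numpy as np

def conjugate_gram_schmidt(U, A):
    """A-orthogonalize the columns of U via Eqs. (2.28)-(2.30).

    U: (N, N) array with linearly independent columns u_i.
    Returns D with columns satisfying d_i^T A d_j = 0 for i != j.
    """
    N = U.shape[1]
    D = np.zeros_like(U, dtype=float)
    D[:, 0] = U[:, 0]                       # d_0 = u_0
    for i in range(1, N):
        d = U[:, i].astype(float).copy()
        for k in range(i):
            # beta_{ik} = - u_i^T A d_k / (d_k^T A d_k), Eq. (2.30)
            beta = -(U[:, i] @ (A @ D[:, k])) / (D[:, k] @ (A @ D[:, k]))
            d += beta * D[:, k]             # subtraction step of Eq. (2.28)
        D[:, i] = d
    return D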

Construction from Gradients

This method is an efficient version of Gram-Schmidt. We will not need to keep the old search vectors. The new ansatz here is to choose a specific set of directions u_i, namely

u_i := r_i .   (2.31)

We first use the fact that the residual is orthogonal to all previous search directions, Eq. (2.27). Together with (2.28) we get, for i < j and for as yet general u_i:

0 \overset{i<j}{=} d_i^T r_j \overset{(2.28)}{=} u_i^T r_j + \underbrace{\sum_{k=0}^{i-1} \beta_{ik} d_k^T r_j}_{=0 \ \text{by} \ (2.27)} ,

hence

0 = u_i^T r_j .   (2.32a)

In the same way one gets

d_i^T r_i = u_i^T r_i .   (2.32b)

With our particular choice u_i = r_i, (2.32a) becomes

r_i^T r_j = 0 ,   i \neq j .   (2.33)

We see that all residual vectors r_i will actually be orthogonal to each other.

We can now compute the Gram-Schmidt coefficients (2.30). The recursion (2.21d) implies

r_i^T r_{j+1} = r_i^T r_j - \lambda_j \, r_i^T A d_j .

The last term contains the numerator of (2.30). Because of the orthogonality (2.33), it simplifies to

r_i^T A d_j = \begin{cases} \frac{1}{\lambda_i}\, r_i^T r_i , & i = j \\ -\frac{1}{\lambda_{i-1}}\, r_i^T r_i , & i = j+1 \\ 0 , & \text{otherwise.} \end{cases}

Thus we obtain all the coefficients needed in (2.28):

\beta_{ij} = \begin{cases} \frac{1}{\lambda_{i-1}}\, \frac{r_i^T r_i}{d_{i-1}^T A d_{i-1}} , & i = j+1 \\ 0 , & i > j+1 . \end{cases}

Most of the \beta_{ij} terms have now become zero. It is no longer necessary to store old search vectors to ensure A-orthogonality. We now denote, for simplification, \beta_i := \beta_{i,i-1} and plug in \lambda_{i-1} from (2.23), using (2.32b), to get the final form

\beta_i = \frac{r_i^T r_i}{r_{i-1}^T r_{i-1}} .

Putting all our results together we obtain the Conjugate Gradient algorithm, which is presented in symbolic form as Algorithm 4.

Comments

Note that the first step of the Conjugate Gradient method is the same as for Steepest Descent. Convergence is again estimated by the norm of r_n.

The computationally most expensive part of the algorithm is one matrix-vector multiplication A d_n per step. It can be implemented efficiently, especially for sparse matrices. All other parts are vector-vector multiplications.
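As a usage sketch (our own example, not part of the text), a sparse matrix stored for instance in SciPy's CSR format performs the product A d_n in a number of operations proportional to the number of nonzero entries:

import numpy as np
import scipy.sparse as sp

N = 100000
# 1D Laplacian (tridiagonal): a standard sparse, symmetric, positive definite matrix.
main = 2.0 * np.ones(N)
off = -1.0 * np.ones(N - 1)
A = sp.diags([off, main, off], offsets=[-1, 0, 1], format="csr")

d = np.ones(N)
a = A @ d        # A d_n touches only the ~3N stored nonzero entries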

Algorithm 4: Conjugate Gradient method for quadratic functions

  Choose a suitable initial vector x_0
  Set d_0 := r_0 = -\nabla f(x_0) = b - A x_0
  for n = 0 to N do
      a_n = A d_n
      \lambda_n = (r_n^T r_n) / (d_n^T a_n)
      x_{n+1} = x_n + \lambda_n d_n
      r_{n+1} = r_n - \lambda_n a_n
      \beta_{n+1} = (r_{n+1}^T r_{n+1}) / (r_n^T r_n)
      d_{n+1} = r_{n+1} + \beta_{n+1} d_n
      if converged then EXIT
  end for

Because of rounding errors, orthogonality can get lost during the iteration. It can therefore be advantageous to occasionally calculate the residual directly as r_{n+1} = b - A x_{n+1}, which involves a second matrix-vector multiplication.
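As a concrete sketch, Algorithm 4 translates almost line by line into NumPy. The function below and its parameter names are our own; the tolerance and the interval for recomputing the residual directly are arbitrary illustrative choices:

import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, recompute_every=50):
    """Solve A x = b for symmetric positive definite A (cf. Algorithm 4)."""
    x = x0.astype(float).copy()
    r = b - A @ x                     # r_0 = -grad f(x_0)
    d = r.copy()                      # d_0 = r_0
    rr = r @ r
    for n in range(len(b)):
        a = A @ d                     # the one matrix-vector product per step
        lam = rr / (d @ a)            # lambda_n = (r_n^T r_n)/(d_n^T A d_n)
        x += lam * d                  # x_{n+1} = x_n + lambda_n d_n
        if (n + 1) % recompute_every == 0:
            r = b - A @ x             # direct residual to fight rounding errors
        else:
            r = r - lam * a           # r_{n+1} = r_n - lambda_n A d_n
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:     # convergence estimated by the norm of r
            break
        beta = rr_new / rr            # beta_{n+1} = (r_{n+1}^T r_{n+1})/(r_n^T r_n)
        d = r + beta * d              # d_{n+1} = r_{n+1} + beta_{n+1} d_n
        rr = rr_new
    return x

For a symmetric positive definite A (dense or sparse), a call such as conjugate_gradient(A, b, np.zeros_like(b)) then reproduces the iteration described above.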

In each step, the Conjugate Gradient algorithm invokes one multiplication with A. The solution is thus in fact constructed in the space spanned by the vectors {r_0, A r_0, A^2 r_0, A^3 r_0, ...}, which is called a Krylov space. There are many other related methods also acting in Krylov space. One of the most important ones is the closely related Lanczos algorithm for computing low-lying eigenvalues of a big (sparse) matrix. It was developed before CG. Other methods exist for nonsymmetric matrices.

The numerical stability and convergence of CG is again governed by the condition number of the matrix A. It can be improved greatly by a class of transformations called Preconditioning. Instead of solving A x = b, one solves the equivalent equation M^{-1} A x = M^{-1} b. The ideal case would be the exact solution M^{-1} = A^{-1}, giving the identity matrix M^{-1} A with condition number unity. Much simpler transformations can already improve convergence greatly, e.g. just taking M to be the diagonal of A.
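As a minimal sketch of the diagonal choice mentioned last (Jacobi scaling), the example below is our own construction. The text writes the preconditioned system as M^{-1} A x = M^{-1} b; here the diagonal is instead applied symmetrically, D^{-1/2} A D^{-1/2} with D = diag(A), which has the same effect on the condition number but keeps the matrix symmetric so that CG remains applicable:

import numpy as np

rng = np.random.default_rng(1)
n = 200

# Symmetric positive definite test matrix with badly scaled rows and columns.
B = rng.standard_normal((n, n))
B = B @ B.T + n * np.eye(n)            # well-conditioned SPD core
s = 10.0 ** rng.uniform(-3, 3, n)      # widely varying scales
A = (B * s).T * s                      # A = diag(s) B diag(s), still SPD

# Jacobi preconditioning with the diagonal of A, applied symmetrically.
d = np.sqrt(np.diag(A))
A_pre = A / np.outer(d, d)             # D^{-1/2} A D^{-1/2}

# For this badly scaled example the condition number drops by many orders of magnitude.
print(np.linalg.cond(A), np.linalg.cond(A_pre))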
