NOT FOR QUOTATION WITHOUT THE PERMISSION OF THE AUTHORS
Derivative-free Gauss-Newton-like Algorithm for Parameter Estimation
Sergei Scherbov and Viktor Golubkov
November 1986
WP-86-63

Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute or of its National Member Organizations.
INTERNATIONAL INSTITUTE FOR APPLIED SYSTEMS ANALYSIS A-2361 Laxenburg, Austria
Foreword
The Population Program at IIASA deals with the analysis of the consequences of demographic changes. To estimate these consequences one needs to analyze data from various sources, to develop models, and to identify their parameters. This paper by Dr. Scherbov and Dr. Golubkov develops the idea of parameter estimation using a derivative-free nonlinear least-squares algorithm. The algorithm was found efficient and convenient for many applications.
Anatoli Yashin Deputy Leader Population Program
Derivative-free Gauss-Newton-like Algorithm for Parameter Estimation
Sergei Scherbov and Viktor Golubkov
Institute for Systems Studies (VNIISI)
Prospect 60 Let Octyabria 9, 117312 Moscow, USSR
An important problem, the estimation of parameters of different models from statistical data about the modeled process, can very often be reduced to a least-squares problem. In this paper a derivative-free nonlinear least-squares algorithm is presented which was found to be very efficient in terms of the number of response function evaluations.
Let $\hat y_i$ ($i = 1,\ldots,N$) be the components of an observed data vector $\hat y = (\hat y_1, \hat y_2, \ldots, \hat y_N)^T$, and let $\phi_i(q)$ ($i = 1,\ldots,N$) be the components of a vector-valued response function $\phi(q) = (\phi_1(q), \phi_2(q), \ldots, \phi_N(q))^T$. Let $q = (q_1, q_2, \ldots, q_n)^T$ be the vector of estimated parameters. Then, according to the Generalized Least Squares Method (GLSM), the estimate $\hat c = (\hat c_1, \hat c_2, \ldots, \hat c_n)^T$ could be found as a solution to the following nonlinear programming problem:

$$\min_q F(q) = \bigl\langle \hat y - \phi(q),\; P\,(\hat y - \phi(q)) \bigr\rangle, \qquad (1)$$

where $P$ is a given symmetric, positive semi-definite square matrix of weights, and $\langle a, b\rangle$ denotes the scalar product.
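As an illustration of the objective (1), the following minimal Python sketch evaluates $F(q)$ for a given response function and weight matrix. The exponential response model, the synthetic data and all names in the sketch are illustrative assumptions and are not taken from the algorithm described here.

```python
import numpy as np

def gls_objective(q, y_obs, phi, P):
    """GLSM objective (1): F(q) = <y - phi(q), P (y - phi(q))>."""
    r = y_obs - phi(q)           # residual vector y_hat - phi(q)
    return float(r @ P @ r)      # scalar product <r, P r>

# Illustrative response model phi(q) = q1 * exp(-q2 * t) on a small grid.
t = np.linspace(0.0, 1.0, 5)
phi = lambda q: q[0] * np.exp(-q[1] * t)
y_obs = phi(np.array([2.0, 3.0]))    # synthetic "observations"
P = np.eye(t.size)                   # identity weights: ordinary least squares
print(gls_objective(np.array([1.5, 2.5]), y_obs, phi, P))
```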
Gauss-Newton-like algorithms are known to be the most efficient iterative methods for solving problem (1). According to those algorithms, the linear approximation of $\phi(q)$ about the current value of the parameter vector $q$ is calculated at each iteration, and a linear least-squares problem is solved to obtain a new value of the parameter vector $q$.
In the presented algorithm, as in many other derivative-free Gauss-Newton-like algorithms [1,2,3,4], the linear approximation of $\phi(q)$ is evaluated from $n+1$ values of $\phi(q)$ calculated at the previous iterations in order to estimate the new parameter vector and to pass the updated set of parameters to the next iteration. Gauss-Newton-like algorithms differ from each other generally in the linear approximation of $\phi(q)$ and in the use of the information obtained at the previous iterations. Let us dwell upon the most important features that distinguish the presented algorithm from others of the same type.

Let us assume that on the $k$-th iteration we have $q_1^k, q_2^k, \ldots, q_{n+1}^k$, estimates of the solution computed at the previous iterations, and a corresponding set $\phi_1^k, \phi_2^k, \ldots, \phi_{n+1}^k$ which is necessary for the linear approximation of $\phi(q)$, where $\phi_i^k = \phi(q_i^k)$. The upper index shows the iteration number. Let us assume that on the first iteration $F_{n+1}^0 = \min_{1 \le i \le n+1} F_i^0$, where $F_i^0 = F(q_i^0)$.
The new estimate of the parameters $q$ on the $k$-th iteration, $q_{n+1}^{k+1}$, will be computed with the help of an $n \times n$ matrix $\Delta Q^k$ of parameter increments and a vector $\Delta q^k$, both of which are evaluated on the $k$-th iteration.

The linear approximation $\psi(q)$ of the vector $\phi(q)$ could be written in the form

$$\psi(q) = \phi^k_{n+1} + \Delta\Phi^k \,(\Delta Q^k)^{-1}\,\bigl(q - q^k_{n+1}\bigr), \qquad (3)$$

whose matrix $\Delta\Phi^k(\Delta Q^k)^{-1}$ is derived from the $n+1$ previously calculated values $q_i^k$, $\phi_i^k$ ($i = 1,\ldots,n+1$) so as to satisfy the conditions

$$\psi(q_i^k) = \phi_i^k, \quad i = 1,\ldots,n+1, \qquad (3')$$

the columns of $\Delta Q^k$ and $\Delta\Phi^k$ being

$$\Delta q_i^k = q_i^k - q_{n+1}^k, \quad i = 1,\ldots,n, \qquad (3'')$$

$$\Delta\phi_i^k = \phi_i^k - \phi_{n+1}^k, \quad i = 1,\ldots,n. \qquad (3''')$$

Supposing nonsingularity of the matrix $\Delta Q^k$, we obtain the approximation (3) directly from (3')-(3'''). The closer the points $q_i^k$ are to each other, the higher is the accuracy of approximation (3). The vector $\Delta q^k$ is a solution of the linear least-squares problem obtained by replacing $\phi(q)$ with $\psi(q)$ in (1), and is calculated from

$$\Delta q^k = \Delta Q^k \bigl[(\Delta\Phi^k)^T P\,\Delta\Phi^k\bigr]^{-1} (\Delta\Phi^k)^T P\,\bigl(\hat y - \phi^k_{n+1}\bigr).$$
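To make the construction concrete, the sketch below assembles $\Delta Q^k$ and $\Delta\Phi^k$ from the $n+1$ stored points, forms the secant approximation $\Delta\Phi^k(\Delta Q^k)^{-1}$ of the Jacobian, and solves the linearized weighted least-squares problem for $\Delta q^k$. The array layout and the function name are our own choices for the sketch, not the authors' implementation.

```python
import numpy as np

def gauss_newton_step(Q, Phi, y_obs, P):
    """Derivative-free Gauss-Newton step from n+1 stored points.

    Q   : (n, n+1) array; column i holds q_i, the last column holds q_{n+1}
    Phi : (N, n+1) array; column i holds phi(q_i)
    Returns dq, the solution of the linearized weighted least-squares problem.
    """
    q_base, phi_base = Q[:, -1], Phi[:, -1]
    dQ = Q[:, :-1] - q_base[:, None]        # Delta Q: parameter increments
    dPhi = Phi[:, :-1] - phi_base[:, None]  # Delta Phi: response increments
    J = dPhi @ np.linalg.inv(dQ)            # secant approximation of the Jacobian
    r = y_obs - phi_base                    # current residual
    # Normal equations of the linearized problem: (J^T P J) dq = J^T P r
    return np.linalg.solve(J.T @ P @ J, J.T @ P @ r)
```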
The new, $(k+1)$-th estimate $q_{n+1}^{k+1}$ is

$$q_{n+1}^{k+1} = q_{n+1}^k + h_k\,\Delta q^k,$$

where $h_k$ is a step length along the direction $\Delta q^k$, obtained from the one-dimensional function minimization

$$\Phi(h) = F\bigl(q_{n+1}^k + h\,\Delta q^k\bigr) \to \min_h,$$

or under the condition that $F(q_{n+1}^{k+1}) < F(q_{n+1}^k)$. In the present version of the algorithm the one-dimensional minimization was based on a second-order approximation of $\Phi(h)$ and on the well-known fact that in regular cases the value $h_k$ which minimizes $\Phi(h)$ approaches a value close to 1 as the Gauss-Newton algorithm converges.
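A second-order line search of this kind can be sketched as follows. The sample points, the convexity test and the fallback rule are our own assumptions; the paper only states that a quadratic model of $\Phi(h)$ is used and that $h_k$ tends to 1 near convergence.

```python
import numpy as np

def quadratic_line_search(F, q, dq, h_trial=1.0):
    """Approximate minimization of Phi(h) = F(q + h*dq) by a parabolic model."""
    hs = np.array([0.0, 0.5 * h_trial, h_trial])   # trial step lengths
    fs = np.array([F(q + h * dq) for h in hs])     # sampled objective values
    a, b, c = np.polyfit(hs, fs, 2)                # parabola a*h**2 + b*h + c
    h_star = -b / (2.0 * a) if a > 0 else hs[np.argmin(fs)]
    # Accept h_star only if it really decreases F; otherwise keep the best sample.
    if F(q + h_star * dq) >= fs[0]:
        h_star = hs[np.argmin(fs)]
    return h_star
```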
The $(k+1)$-th iteration begins with the calculation of $\phi(q_{n+1}^{k+1})$, and then the $\Delta\Phi^{k+1}$ and $\Delta Q^{k+1}$ matrices are calculated. Before describing these matrix calculations we should mention that during the iterative process the nonsingularity of $\overline{\Delta Q}^k$ must be controlled, where $\overline{\Delta Q}^k$ is the matrix whose columns are the normalized increments $\Delta q_i^k / \lVert \Delta q_i^k \rVert$ and $\lVert \Delta q_i^k \rVert$ is the standard Euclidean norm of the vector $\Delta q_i^k$. That is because ill-conditioning of $\overline{\Delta Q}^k$ leads to ill-conditioning of $\Delta Q^k$ and $\Delta\Phi^k$, and consequently of $\Delta q^k$, which in turn leads to a worsening of the algorithm's convergence. Let $\varepsilon_d$ be the minimal feasible value of $\lvert \det \overline{\Delta Q}^k \rvert$ for which convergence is still normal. Then $\Delta Q^k$ ($k \ge 1$) should be constructed in such a way that

$$\bigl|\det \overline{\Delta Q}^{\,k+1}\bigr| \ge \varepsilon_d \quad \text{when } k \ge 1; \qquad 0 < \varepsilon_d. \qquad (4)$$
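The quantity controlled in (4) is easy to compute; a short sketch is given below, with an illustrative (not prescribed) value of $\varepsilon_d$.

```python
import numpy as np

def normalized_determinant(dQ):
    """|det| of Delta Q after scaling every column to unit Euclidean norm."""
    norms = np.linalg.norm(dQ, axis=0)     # ||Delta q_i|| for each column
    return abs(np.linalg.det(dQ / norms))  # determinant of the normalized matrix

def satisfies_condition_4(dQ, eps_d=1e-3):
    """Condition (4): |det(normalized Delta Q)| >= eps_d (eps_d value illustrative)."""
    return normalized_determinant(dQ) >= eps_d
```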
Let us assume that (4) is true. Then the building of $\Delta Q^{k+1}$ starts with changing in $\Delta Q^k$ the column whose number is $L_k$, replacing it by $h_k\,\Delta q^k$, where $L_k$ is defined from

$$L_k = \operatorname*{Argmax}_{1 \le i \le n} \lvert s_i^k \rvert. \qquad (5)$$

The values $s_i^k$ ($i = 1,\ldots,n$) are the scalar coefficients in the representation of $\Delta q^k$ in the columns of $\Delta Q^k$, which may be obtained from (3'), (3'') and (3'''). Such a choice of $L_k$ provides the best conditioning of $\overline{\Delta Q}^{\,k+1}$. This can easily be derived from the expression relating $\det \overline{\Delta Q}^{\,k+1}$ to $\det \overline{\Delta Q}^{\,k}$ and the coefficients $s_i^k$; from this expression and from (5) it follows that the column of $\overline{\Delta Q}^{\,k}$ whose replacement provides the best conditioning of $\overline{\Delta Q}^{\,k+1}$ is precisely the one substituted by $h_k\,\Delta q^k$. If $\lvert \det \overline{\Delta Q}^{\,k+1} \rvert \ge \varepsilon_d$, then $\Delta\Phi^{k+1}$ is constructed by changing the single column with number $L_k$ according to

$$\Delta\phi_{L_k}^{k+1} = \phi\bigl(q_{n+1}^{k+1}\bigr) - \phi\bigl(q_{n+1}^{k}\bigr). \qquad (5')$$
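The bookkeeping behind (5) and (5') can be sketched as follows: expand the new step in the columns of $\Delta Q^k$, select the column index, and replace the matching columns of $\Delta Q$ and $\Delta\Phi$. The helper names are ours, and the selection by the largest coefficient reflects our reading of (5).

```python
import numpy as np

def replace_column(dQ, dPhi, dq_new, dphi_new):
    """Replace one column of Delta Q (and of Delta Phi) by the new increment.

    dq_new   : h_k * dq, the accepted parameter increment
    dphi_new : the corresponding response increment, cf. (5')
    """
    s = np.linalg.solve(dQ, dq_new)  # coefficients s_i of dq_new in the columns of Delta Q
    L = int(np.argmax(np.abs(s)))    # column chosen as in (5)
    dQ_new, dPhi_new = dQ.copy(), dPhi.copy()
    dQ_new[:, L] = dq_new
    dPhi_new[:, L] = dphi_new
    return dQ_new, dPhi_new, L
```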
One of the distinctions of the presented algorithm from other variants of derivative-free Gauss-Newton-like algorithms is the choice of the column in $\Delta Q^k$ that will be substituted by $q_{n+1}^{k+1} - q_{n+1}^{k}$ after the $k$-th iteration. In those algorithms, the column that was calculated before the others (on the iteration with the smallest number), and hence usually corresponds to parameter values which differ most from their current values, is the one substituted by $q_{n+1}^{k+1} - q_{n+1}^{k}$, or, which is the same, by $h_k\,\Delta q^k$. Thus the vectors of parameters which contribute to the computation of $\Delta Q^k$ are all the time "pulled up" to their latest values. It means that the columns of $\Delta Q^k$ are generally calculated with parameters which are close to their current values.

In the algorithms mentioned above, all the columns of $\Delta Q$ will be renewed during $n$ iterations and the correctness of approximation (3) will be maintained. For the objective functions with a long valley which are widespread in practice, the level surfaces in parameter space are badly stretched along some directions. It is a well-known fact that descent directions for these functions are nearly tangent to the level surface, and therefore they are very close to the direction in which the level surface is stretched. Thus, as the iteration number $k$ increases, it can occur that the columns of $\overline{\Delta Q}^k$ and $\Delta\Phi^k$, and the descent direction $\Delta q^k$, are calculated incorrectly due to computational errors.

In the presented algorithm the worsening of the conditioning of $\overline{\Delta Q}^k$ occurs essentially more rarely than in many other algorithms of the same type. This was confirmed by numerical experiments. That is because in the algorithm presented here it is not the columns of $\overline{\Delta Q}^k$ that were calculated earliest which are substituted, as in [2,3,4], but rather those whose replacement provides the maximum linear independence of the columns in $\overline{\Delta Q}^{\,k+1}$. This is illustrated by Figures 1 and 2. In Figure 1, the substitution of columns in $\overline{\Delta Q}^k$ is shown as it is usually done in most Gauss-Newton-like algorithms; in Figure 2, the same is shown for the algorithm presented here.

Figure 1. Substitution of columns in $\overline{\Delta Q}^k$ as done in most Gauss-Newton-like algorithms.

Figure 2. Substitution of columns in $\overline{\Delta Q}^k$ in the algorithm presented here.

During the functioning of the presented algorithm it can happen that some of the columns of $\Delta Q^k$ are not substituted for a long period. That means that these columns were calculated with parameters that are far from their current values. In this case the linear approximation (3) may not be valid, and the descent direction $\Delta q^k$ will be calculated incorrectly even though condition (4) is satisfied.
To get rid of this, the algorithm operates in such a way that during a given number of iterations all the columns of $\Delta Q^k$, and hence of $\Delta\Phi^k$, are renewed. This is achieved in the following way.

Let us assume we are at the $k$-th iteration and $\Delta q^k$ has already been calculated.
Let us denote by $n_i^k$ ($i = 1,\ldots,n$) the number of substitutions of the $i$-th column of $\Delta Q^k$ during $k$ iterations, and by $I_k$ the set of numbers of the columns whose substitution counts lag behind the most frequently renewed one,

$$I_k = \{\, i : n_*^k - n_i^k \ge N_0 \,\}, \qquad n_*^k = \max_{1 \le i \le n} n_i^k, \quad k = 0, 1, 2, \ldots$$

Here $N_0$ is a control parameter of the algorithm which accepts integer values and satisfies the condition $N_0 \ge 1$ ($N_0 = 0$ corresponds to the case without "pulling up" the columns, because in this case $I_k \equiv I_0$). Then the column with number $l_k$ that will be substituted by $h_k\,\Delta q^k$ is found, instead of from (5) and (5'), from

$$l_k = \operatorname*{Argmax}_{i \in I_k} \lvert s_i^k \rvert. \qquad (6)$$
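A sketch of this "pulling up" bookkeeping is given below. Since the definition of $I_k$ is only partially legible in the source, the lag-based rule used here (a column qualifies once its substitution count lags the maximum by at least $N_0$) and the tie to the coefficients $s_i^k$ are our assumptions.

```python
import numpy as np

def choose_pulled_up_column(counts, s, N0):
    """Pick the column to renew according to the 'pulling up' rule.

    counts : counts[i] = n_i^k, number of times column i has been replaced
    s      : coefficients s_i^k of the new step in the columns of Delta Q^k
    N0     : integer control parameter (assumed lag threshold)
    Returns a column index, or None if no column lags (fall back to rule (5)).
    """
    counts = np.asarray(counts)
    lag = counts.max() - counts          # how far each column lags behind
    I_k = np.flatnonzero(lag >= N0)      # candidate "stale" columns
    if I_k.size == 0:
        return None
    return int(I_k[np.argmax(np.abs(np.asarray(s)[I_k]))])
```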
If, after the substitution of the $l_k$-th column, $\lvert \det \overline{\Delta Q}^{\,k+1} \rvert \ge \varepsilon_d$, then we pass to the $(k+1)$-th iteration as already described. Otherwise an orthogonalization procedure is invoked (this is also an important feature of the algorithm). According to this procedure, each column $\Delta q_i^{k+1}$, $i \in I_k \setminus \{l_k\}$, of $\Delta Q^{k+1}$ is tested for substitution by a vector $\Delta\tilde q_i^{\,k+1}$ which is an orthogonal complement to the subspace spanned by the other columns of $\Delta Q^{k+1}$, and for each case the determinant of the thus-modified matrix is calculated. The vector $\Delta\tilde q_i^{\,k+1}$ is calculated by

$$\Delta\tilde q_i^{\,k+1} = \begin{cases} \alpha\,\lVert \Delta q_i^{k+1}\rVert\,\tilde b_i^{\,k+1}, & \text{when } \lVert \Delta q_i^{k+1}\rVert \ge r_j,\\ \beta\, r_j\,\tilde b_i^{\,k+1}, & \text{when } \lVert \Delta q_i^{k+1}\rVert < r_j, \end{cases}$$

where $\alpha$ and $\beta$ are positive control parameters of the algorithm; $r_j$ ($j = 1,\ldots,n$) is a vector of precisions for the parameter estimation (the iterative process stops when simultaneously $\lvert q_{n+1,j}^{k+1} - q_{n+1,j}^{k}\rvert \le r_j$ and $\lvert \Delta q_j^k \rvert \le r_j$, $j = 1,\ldots,n$); $\tilde b_i^{\,k+1}$ is a unit vector which is orthogonal to the other columns $\Delta q_r^{k+1}$ ($r \ne i$, $r = 1,\ldots,n$); and $j$ is the vector's component number.

If (4) is true, then the substitution of the corresponding column by its orthogonal column is performed and the iteration process continues. If, during the examination of all the columns $\Delta q_i^{k+1}$, $i \in I_k \setminus \{l_k\}$, condition (4) could not be satisfied, the substitution that provides the maximum value of $\lvert \det \overline{\Delta Q}^{\,k+1} \rvert$ is made. Then the procedure is repeated, but now those columns of $\Delta Q^{k+1}$ which have not yet been substituted are replaced by orthogonal columns. If all $\Delta q_i^{k+1}$, $i \in I_k \setminus \{l_k\}$, are substituted but condition (4) is still not satisfied, then it is allowed to examine for substitution the columns whose numbers were not initially included in $I_k$, besides the column with number $l_k$. Condition (4) will be satisfied after not more than $n-1$ substitutions of columns of $\Delta Q^{k+1}$ by orthogonal ones, since then all columns in $\Delta Q^{k+1}$ are orthogonal. During orthogonalization, after the substitution of the $i$-th column in $\Delta Q^{k+1}$ by the corresponding orthogonal one $\Delta\tilde q_i^{\,k+1}$, the value $\phi(q_{n+1}^{k+1} + \Delta\tilde q_i^{\,k+1})$ is calculated, the $i$-th column of $\Delta\Phi^{k+1}$ is substituted by $\phi(q_{n+1}^{k+1} + \Delta\tilde q_i^{\,k+1}) - \phi(q_{n+1}^{k+1})$, and $n_i^{k+1} = n_i^k + 1$. It should be mentioned that while testing (4) during orthogonalization, after substituting a column by the corresponding orthogonal one, the determinant $\det(\overline{\Delta Q}^{\,k+1})$ is calculated recursively. This reduces the computational effort.
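One way to realize the orthogonal replacement column is through a complete QR factorization of the remaining columns, as sketched below; the `scale` argument stands in for the length prescribed by the $\alpha$/$\beta$ rule above. This is a possible realization under our assumptions, not the authors' code.

```python
import numpy as np

def orthogonal_replacement(dQ, i, scale):
    """Build a replacement for column i of Delta Q orthogonal to the other columns."""
    others = np.delete(dQ, i, axis=1)                 # columns that stay fixed, shape (n, n-1)
    Qfull, _ = np.linalg.qr(others, mode="complete")  # orthonormal basis of R^n
    b = Qfull[:, others.shape[1]]                     # unit vector orthogonal to all of them
    return scale * b
```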
The $n+1$ initial values of the parameters $q$ required by the algorithm are generated from the starting value $q_{n+1}^0$. For $i = 1,\ldots,n$, $q_i^0$ is computed from $q_{n+1}^0$ by changing its $i$-th component by a non-zero step $h_i$. The corresponding set of $\phi_i^0$ ($i = 1,\ldots,n+1$) is also calculated. The convergence of the algorithm presented here was proved analytically.
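The generation of the $n+1$ initial points can be written down directly from this description; the array layout below is our own choice.

```python
import numpy as np

def initial_points(q_start, steps):
    """Return an (n, n+1) array of starting parameter vectors.

    The last column is the user-supplied starting point q_{n+1}^0; the i-th
    column is obtained from it by a non-zero step steps[i] in the i-th component.
    """
    q_start = np.asarray(q_start, dtype=float)
    n = q_start.size
    Q = np.tile(q_start[:, None], (1, n + 1))
    for i in range(n):
        Q[i, i] += steps[i]      # perturb the i-th component by h_i
    return Q
```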
A comparative numerical analysis of this algorithm with several well-known and efficient derivative-free algorithms for minimizing a sum of squares was carried out. The total number of function evaluations needed to find the minimum with a given precision was considered as the efficiency criterion. This criterion is computer-independent and characterizes the real computational expenditure in cases when evaluating $\phi(q)$ demands much more effort than the algorithm itself. In this paper we present some results of the comparison of the discussed algorithm with two derivative-free Gauss-Newton-like algorithms. The first is Powell's method for least-squares problems; its code, VA02A, was taken from the Harwell Subroutine Library (HSL). The second method is a compromise between three different algorithms for minimizing a sum of squares, namely Newton-Raphson, Steepest Descent and Marquardt; its code is VA05A, also from HSL. The last two have been compared by many authors with various other algorithms and were found to be very efficient. Comparison of the discussed algorithm with VA02A and VA05A was performed on standard test problems found in the literature. Four test problems were considered.
1. $F(q) = 100\,(q_1^2 - q_2)^2 + (1 - q_1)^2$; $\hat c = (1,1)^T$, $F(\hat c) = 0$. Here $\hat c$ is the vector of parameters at which the minimum occurs.
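Test function 1 is the classical Rosenbrock function; in residual (least-squares) form, as it would be supplied to the algorithms compared here, it reads:

```python
import numpy as np

def rosenbrock_residuals(q):
    """Residuals r with F(q) = 100*(q1**2 - q2)**2 + (1 - q1)**2 = r1**2 + r2**2."""
    return np.array([10.0 * (q[0] ** 2 - q[1]), 1.0 - q[0]])

def rosenbrock(q):
    r = rosenbrock_residuals(q)
    return float(r @ r)

print(rosenbrock(np.array([1.0, 1.0])))   # 0.0 at the minimizer (1, 1)
```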
Minimization of functions 1-4 begins from different starting values. The obtained results are presented in Table 1. The algorithm described in this paper is denoted MMGN. Here NFE is the number of function evaluations; log(F) is the logarithm of the convergence error, $\log(F) = \log F(q^{k_l})$, where $k_l$ is the last iteration number. The symbol * means that the algorithm failed to find the solution.

From Table 1 it can be seen that in all cases except one our algorithm needs fewer function evaluations than the other two to converge with the same precision. Only in one case, for function 1 and starting vector (-1.2, 1), did algorithm VA05A perform fewer function evaluations than MMGN. But, for example, for function 3 VA05A failed once, and VA02A failed every time, while MMGN succeeded. If we compare all the presented results, then VA05A and VA02A spend, on average, 2.2 times more function evaluations than MMGN to find the solution. (The cases where VA05A and VA02A failed are not taken into account.)

Table 1.
Method    NFE  log(F)    NFE  log(F)    NFE  log(F)    NFE  log(F)

Function 1
MMGN
VA05A
VA02A

Function 2
MMGN
VA05A
VA02A

Function 3    (0,1)         (-1,1)        (0,-1)        (1,0)
MMGN       35   -14      73    -0     119   -14      72    -0
VA05A     123   -10     143   -11       *     *     244    -0
VA02A       *     *       *     *       *     *       *     *

Function 4    (10,10,10,-10)    (10,10,10,10)
MMGN       25   -15      35   -15
VA05A      57   -15      57   -15
VA02A      63   -15      63   -15
Thus MMGN was found to be more efficient than the algorithms from the HSL library. This gives reason to expect that it will also be more efficient than many of the algorithms that were compared with the latter two and were found less efficient. Besides, as follows from [6], the MMGN algorithm was in many cases more efficient for demographic data fitting than the algorithms from the IMSL library. In conclusion it should be noted that the presented algorithm has been widely used for data processing and for parameter estimation in demographic, socio-economic, ecological and other models. Extensive practical experience with this algorithm has proved its reliability and efficiency.

REFERENCES
[1] Gelovani, V., V. Golubkov, and S. Scherbov (1979) Parameter Estimation by a Derivative-free Analog of the Gauss-Newton Method, Vol. 8 (in Russian). Moscow: VNIISI.
[2] Ralston, M. and R. Jennrich (1978) Dud, a derivative-free algorithm for nonlinear least squares. Technometrics, Vol. 20, No. 1.
[3] Peckham, G. (1970) A new method for minimizing a sum of squares without calculating gradients. Computer Journal, No. 7.
[4] Vereskov, A., N. Levin, and V. Fedorov (1980) Regularized derivative-free least squares algorithm. Issues of Cybernetics, No. 71 (in Russian).
[5] Hopper, M.J. (Ed.) (1978) Harwell Subroutine Library. A Report. AERE-R-9185.
[6] Rogers, A. and F. Planck (1983) A General Program for Estimating Parametrized Model Schedules of Fertility, Mortality, Migration, and Marital and Labor Force Status Transitions. Working Paper WP-83-102. Laxenburg, Austria: International Institute for Applied Systems Analysis.