
ADAPTIVE NONMONOTONIC METHODS WITH AVERAGING OF SUBGRADIENTS

N.D. Chepurnoj

July 1987 WP-87-62

Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute or of its National Member Organizations.

INTERNATIONAL INSTITUTE FOR APPLIED SYSTEMS ANALYSIS A-2361 Laxenburg, Austria


FOREWORD

The numerical methods of nondifferentiable optimization are used for solving decision analysis problems in economics, engineering, environment and agriculture. This paper is devoted to adaptive nonmonotonic methods with averaging of subgradients. A unified approach is suggested for the construction of new deterministic subgradient methods, their stochastic finite-difference analogs, and a posteriori estimates of the accuracy of the solution.

Alexander B. Kurzhanski
Chairman
System and Decision Sciences Program


CONTENTS

1 Overview of Results in Nonmonotonic Subgradient Methods

2 Subgradient Methods with Program-Adaptive Step-Size Regulation

3 Methods with Averaging of Subgradients and Program-Adaptive Successive Step-Size Regulation

4 Stochastic Finite-Difference Analogs to Adaptive Nonmonotonic Methods with Averaging of Subgradients

5 A Posteriori Estimates of Accuracy of Solution to Adaptive Subgradient Methods and Their Stochastic Finite-Difference Analogs

References


ADAPTIVE NONMONOTONIC METHODS WITH AVERAGING OF SUBGRADIENTS

N.D. Chepurnoj

1. OVERVIEW OF RESULTS IN NONMONOTONIC SUBGRADIENT METHODS

Among the existing numerical methods for the solution of nondifferentiable optimization problems, the nonmonotonic subgradient methods hold an important position.

The pioneering work by N.Z. Shor [26] gave impetus to their explosive progress. In 1962, he suggested an iterative process for the minimization of a convex piecewise-linear function, named afterwards the generalized gradient descent (GGD):

$$x^{s+1} = x^s - r_s g^s , \qquad (1.1)$$

where $g^s \in \partial f(x^s)$, $\partial f(x^s)$ being the set of subgradients of the function $f(x)$ at the point $x^s$, and $r_s \ge 0$ is a step size.

For differentiable functions this method agrees very closely with the well-known gradient method. The fundamental difference between them is that the motion direction $(-g^s)$ in (1.1) is, as a rule, not a descent direction.
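For illustration, a minimal Python sketch of iteration (1.1); the test function, the subgradient oracle, and the program step-size $r_s = r_0/(s+1)$ are illustrative choices, not prescriptions of the paper:

    import numpy as np

    def ggd(f, subgrad, x0, r0=1.0, iters=500):
        # Generalized gradient descent (1.1): x^{s+1} = x^s - r_s g^s.
        x = np.asarray(x0, dtype=float)
        best = x.copy()
        for s in range(iters):
            g = subgrad(x)            # any element of the subdifferential at x
            r = r0 / (s + 1)          # program step-size: r_s -> 0, sum r_s = inf
            x = x - r * g             # in general not a descent step, so we
            if f(x) < f(best):        # track the record point explicitly
                best = x.copy()
        return best

    # Example: f(x) = |x_1| + |x_2|, a convex piecewise-linear function;
    # np.sign(x) is a valid subgradient of it (0 at a kink).
    f = lambda x: np.abs(x).sum()
    x_best = ggd(f, np.sign, np.array([3.0, -2.0]))

The nonmonotone behavior of $f(x^s)$ along the trajectory is exactly why the record point, rather than the last iterate, is returned.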

At the first attempts to substantiate theoretically the convergence of procedures of the type (1.1), researchers immediately faced two difficulties. For one thing, the objective function lacked the property of differentiability. For another, method (1.1) was not monotonic. These combined features rendered impractical the use of the known convergence theorems for gradient procedures.

New theoretical approaches therefore became a must.

One more "misfortune" came on the neck of the others: numerical computations demonstrated that GGD has a low convergence rate.

Initially great hopes were pinned on the step-size selection strategy as a way towards overcoming the crisis.


By the early 1970s the difficulties caused by the formal substantiation of the convergence of nonmonotonic subgradient procedures had been mastered, and different approaches to the step-size regulation had been offered [6, 7, 8, 19, 20, 26]. However, the computations continued to prove the poor convergence of GGD in practice.

It can be said that the first stage in GGD evolution was over in 1976.

Thereupon the numerical methods of nondifferentiable optimization developed in three directions: methods with space dilation, monotonic methods, and adaptive nonmonotonic methods.

Let us dwell on each of these approaches.

In an effort to enhance the GGD efficiency, N.Z. Shor elaborated methods where the operation of space dilation in the direction of a subgradient, or of a difference between two successive subgradients, was employed. Literally the next few years were prolific for papers [27, 28, 29] investigating the space dilation operation in nondifferentiable function minimization problems. A high rate of convergence of the suggested methods was corroborated theoretically.

Computational practice attested convincingly to the advantageousness of applying the algorithms with space dilation, especially the r-algorithm [29], as an alternative to GGD, provided the dimension of the space does not exceed 200 to 300.

However, if the dimension is ample, first, a considerable amount of computation is spent on the transformation of the space dilation matrix; second, some extra capacity of computer memory is required.

The monotonic methods became another essential direction.

Even though the first papers on the monotonic methods appeared back in 1968 (V.F. Dem'janov [30]), their progress reached its peak in the early 70's. Two classes of these algorithms should be distinguished here: the ε-steepest descent [5, 30] and the ε-subgradient algorithms [31-34]. We shall not examine them in detail but note that the monotonic methods offered a higher rate of convergence as against GGD. Just as with the methods using space dilation, vast dimensions of the problems to be solved still remained the Achilles' heel of the monotonic algorithms.

Thus, the nonmonotonic subgradient methods have come into particular importance in the solution of large-scale nondifferentiable optimization problems.

The nonmonotonic procedures have another important object of application apart from the large-scale problems, i.e., the problems in which the subgradient cannot be precisely computed at a point. The latter encompass problems of identification, learning, and pattern recognition [1, 21]. The minimized function is there a mathematical expectation whose distribution law is unknown. Errors in subgradient calculation may stem from computation errors and many other real processes.

Ju.M. Ermol'ev and Z.V. Nekrylova [9] were the first to investigate such procedures. Stochastic programming problems have increasingly drawn attention to the nonmonotonic subgradient methods.

However, as pointed out earlier, GGD, widely used, resistant to errors in subgradient computations, and saving memory capacity, still had a poor rate of convergence. Of great importance therefore was the construction of nonmonotonic methods that, on the one hand, retain all the advantages of GGD and, on the other, possess a high rate of convergence.

It has been this requirement that has led to the elaboration of the adaptive nonmonotonic procedures.

An analysis revealed that the Markov nature of GGD is the chief cause of its slow convergence. It is quite obvious that the use of the most intimate knowledge of the progress of the computations is indispensable to the selection of the direction and the regulation of the step-size.

Several ideas provided the basis for the development of adaptive nonmonotonic methods.

The major concept of all techniques for selecting the direction and regulating the step-size was the use of information about the fulfillment of the necessary extremum conditions for the function.

Its implementation is found in the methods with averaging of the subgradients.

In the most general case, by the operation of averaging is meant a procedure of "taking" the convex hull of an arbitrary finite number of vectors.

The operation of averaging in the numerical methods was first applied by Ja.Z. Tsypkin [22] and Ju.M. Ermol'ev [11].

The paper by A.M. Gupal and L.G. Bazhenov [3], also dealing with the use of the operation of averaging of stochastic estimates of the generalized gradients, appeared in 1972.

However, all the above papers considered the program regulation of the step-size, i.e., a sequence $\{r_s\}$ independent of the computations was selected such that

$$r_s \ge 0 , \quad r_s \to 0 , \quad \sum_{s=0}^{\infty} r_s = \infty .$$


The next natural stage in the evolution of this concept was the construction of an adaptive step-size regulation using the operation of averaging of the preceding subgradients.

In 1974, E.A. Nurminskij and A.A. Zhelikovskij [18] suggested a successive program-adaptive regulation of the step-size for the quasi-gradient method of minimization of weakly convex functions.

The crux of this regulation consists in the following.

Let an iterative sequence be constructed according to the rule

$$z^{s+1} = z^s - r_0 g^s ,$$

where $g^s \in \partial f(z^s)$ is a quasi-gradient of the function $f(z)$ at the point $z^s$, and $r_0$ is a constant step-size.

Assume that there exist $\bar z \in E^n$ and numerical parameters $\varepsilon > 0$, $\delta > 0$ such that $\|z^s - \bar z\| \le \delta$ for any $s = 0, 1, 2, \ldots$. Let us suppose also that a convex combination of subgradients

$$e^{s_0} \in \operatorname{conv} \{ g^i : i \le s_0 \}$$

exists such that $\|e^{s_0}\| \le \varepsilon$. Then the point $\bar z$ is sufficiently close to the set $X^* = \operatorname{argmin} f(z)$ according to the necessary extremum conditions. In the given case the step-size has to be reduced and the procedure repeated with the new step-size value $r_1$, starting at the obtained point $z^{s_0}$. The numerical realization of the described algorithm requires a specific rule for constructing the vectors $e^s$. In [18] the vector $e^s$ is constructed by the rule

$$e^s = \operatorname{Proj}\,( 0 \mid \operatorname{conv} \{ g^k : k' \le k \le s \} ) ,$$

that is, all quasi-gradients are included into the convex hull starting from the most recent instant $k'$ of the step-size change.

Numerical computations bore out the expediency of such a regulation. However, a grave disadvantage was inherent in it: the great laboriousness of each iteration. Considering that the approach as a whole holds promise, averaging schemes had to be developed for efficient use when selecting the direction and regulating the step-size.
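The test behind this rule can be sketched directly. Computing $\operatorname{Proj}(0 \mid \operatorname{conv}\{g^k\})$ exactly requires a quadratic program over the simplex, so the sketch below approximates the minimum-norm point of the hull by Frank-Wolfe iterations — a standard substitute, not the procedure of [18] itself; all names are illustrative:

    import numpy as np

    def min_norm_hull(G, iters=200):
        # Approximate the minimum-norm point of conv{g^0,...,g^m}
        # by Frank-Wolfe on min_w ||G^T w||^2 over the simplex.
        G = np.asarray(G, dtype=float)     # rows: stored subgradients
        w = np.full(len(G), 1.0 / len(G))  # start from the plain average
        for t in range(iters):
            e = G.T @ w                    # current point of the hull
            i = np.argmin(G @ e)           # vertex minimizing the linear model
            step = 2.0 / (t + 2)
            w = (1 - step) * w
            w[i] += step
        return G.T @ w

    # Reduce the step-size when the hull of the subgradients stored since
    # the last step change nearly contains 0 (necessary extremum condition).
    def step_should_change(stored_subgrads, eps):
        return np.linalg.norm(min_norm_hull(stored_subgrads)) <= eps

The laboriousness mentioned above is visible here: the test stores and repeatedly processes all subgradients since the last step change, which is exactly what the averaging schemes below avoid.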

This paper treats such averaging schemes. They serve as a foundation for new nonmonotonic subgradient methods, for the description of stochastic finite-difference analogs, and for a posteriori estimates of solution accuracy. Prior to discussing the results, let us make some general assumptions. Presume that the minimization problem for the function $f(z)$ is being solved on the entire space:

$$\min \{ f(z) : z \in E^n \} , \qquad (*)$$


where $E^n$ is an n-dimensional Euclidean space. The function $f(z)$ will be everywhere thought of as a convex eigenfunction (a proper convex function), dom $f = E^n$, with the sets $\{ z : f(z) \le C \}$ bounded for any constant $C$. The set of solutions of the problem (*) will be taken to be the set

$$X^* = \{ z \in E^n : f(z) = \min_{x \in E^n} f(x) \} .$$

2. SUBGRADIENT METHODS WITH PROGRAM-ADAPTIVE STEP-SIZE REGULATION

The concept of adaptive successive step-size regulation has already been set forth. In [23] a way of determining the instant of the step-size variation was suggested. Central to it was the simplest scheme of averaging of the preceding subgradients. This method is easy to implement and effects a saving in computer memory capacity. Compared to the program regulation, the adaptive regulation improves the convergence of the subgradient methods.

Description of Algorithm 1

Let $z^0$ be an arbitrary initial point, $b > 0$ be a constant, $\{\varepsilon_k\}$, $\{r_k\}$ be number sequences such that $\varepsilon_k > 0$, $\varepsilon_k \to 0$, $r_k > 0$, $r_k \to 0$. Put $s = 0$, $j = 0$, $k = 0$, $e^0 = g^0 \in \partial f(z^0)$.

Step 1. Construct $z^{s+1} = z^s - r_k g^s$, $g^s \in \partial f(z^s)$.

Step 2. If $f(z^{s+1}) > f(z^0) + b$, then select $z^{s+1} \in \{ z : f(z) \le f(z^0) \}$ and go to Step 5.

Step 3. Define $e^{s+1} = e^s + (s - j + 2)^{-1} (g^{s+1} - e^s)$.

Step 4. If $\|e^{s+1}\| > \varepsilon_k$, then $s = s + 1$ and go to Step 1.

Step 5. Set $k = k + 1$, $j = s + 1$, $s = s + 1$ and go to Step 1.
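A compact Python sketch of Algorithm 1 under the stated assumptions. The geometric decay $r_k = q^k r_0$, $\varepsilon_k = q^k \varepsilon_0$, the restart of the average after a step change, and the use of the record point as the admissible choice of $z^{s+1} \in \{z : f(z) \le f(z^0)\}$ are all illustrative choices:

    import numpy as np

    def algorithm1(f, subgrad, z0, b=1.0, r0=1.0, eps0=1.0, q=0.5, iters=2000):
        z = np.asarray(z0, dtype=float)
        f0 = f(z)
        best = z.copy()
        r, eps = r0, eps0
        e = subgrad(z)                       # e^0 = g^0
        j = 0                                # instant of the last step change
        for s in range(iters):
            g = subgrad(z)
            z_new = z - r * g                # Step 1
            if f(z_new) > f0 + b:            # Step 2: left the level set
                z = best.copy()              # one admissible reset point
                r, eps, j = q * r, q * eps, s + 1   # Step 5
                e = subgrad(z)               # restart the average
                continue
            z = z_new
            e = e + (g - e) / (s - j + 2)    # Step 3: average since instant j
            if f(z) < f(best):
                best = z.copy()
            if np.linalg.norm(e) <= eps:     # Step 4 fails ->
                r, eps, j = q * r, q * eps, s + 1   # Step 5: change the step
                e = subgrad(z)
        return best

The averaged vector $e^s$ costs one vector update per iteration and a single stored vector, in contrast to the convex-hull projection discussed in Section 1.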


THEOREM 1.1. Assume that the problem (*) is solved by algorithm 1. Then all limit points of the sequence $\{z^s\}$ belong to $X^*$.

PROOF. Denote the instants of step-size variations by $s_m$. Let us prove that the step-size $r_k$ varies an infinite number of times. Suppose it is not so, i.e., the step-size does not vary starting from an instant $s_*$ and is equal to $r_*$. Then the points $z^s$ for $s \ge s_*$ belong to the set $\{ z : f(z) \le f(z^0) + b \}$ and are related by $z^{s+1} = z^s - r_* g^s$. Considering that the step-size does not vary, $\|e^s\| > \varepsilon_* > 0$ for $s \ge s_*$. Passing to the limit as $s \to \infty$ in the inequality

$$\| z^{s+1} - z^{s_*} \| = r_* (s - s_* + 1) \, \| e^{s+1} \| > r_* (s - s_* + 1) \, \varepsilon_*$$

we obtain a contradiction with the boundedness of the set $\{ z : f(z) \le f(z^0) + b \}$.

The further proof of Theorem 1.1 amounts to checking the general conditions of algorithm convergence derived by E.A. Nurminskij [17].

NURMINSKIJ THEOREM. Let the sequence $\{z^s\}$ and the set of solutions $X^*$ be such that the following conditions are satisfied:

D1. For any subsequence $\{z^{s_k}\}$ such that $z^{s_k} \to z'$, $\lim_{k \to \infty} \| z^{s_k + 1} - z^{s_k} \| = 0$.

D2. There exists a closed bounded set $S$ such that the sequence $\{z^s\}$ falls outside $S$ only a finite number of times.

D3. For any subsequence $\{z^{n_k}\}$ such that $z^{n_k} \to z' \notin X^*$ there exists $\varepsilon_0 > 0$ such that for all $0 < \varepsilon \le \varepsilon_0$ and any $k$

$$m_k = \inf \{ m > n_k : \| z^m - z^{n_k} \| > \varepsilon \} < \infty .$$

D4. A continuous function $W(z)$ exists such that for an arbitrary subsequence $\{z^{n_k}\}$ such that $z^{n_k} \to z' \notin X^*$, and for the subsequence $\{z^{m_k}\}$ corresponding to it by condition D3, for arbitrary $0 < \varepsilon \le \varepsilon_0$

$$\overline{\lim_{k \to \infty}} \, W(z^{m_k}) < \underline{\lim_{k \to \infty}} \, W(z^{n_k}) .$$

D5. The function $W(z)$ of condition D4 assumes no more than a countable number of values on the set $X^*$.

Then all limit points of the sequence $\{z^s\}$ belong to $X^*$.

Select the function $f(z)$ as the function $W(z)$. Conditions D1, D5 are satisfied in view of the algorithm structure and the earlier assumptions.

The rest of the conditions will be verified by the following scheme. We will prove that conditions D3, D4 hold for points that are inner points of the set

$$S = \{ z : f(z) \le f(z^0) + b \} .$$

It is therewith obvious that

$$\max_{z \in S} W(z) < \inf_{z \notin S} W(z) .$$

Then the sequence $\{z^s\}$ falls outside the set $S$ only a finite number of times. Consequently, condition D2 is satisfied, and this automatically entails the validity of D3 and D4.

So, let a subsequence $\{z^{n_p}\}$ exist such that $z^{n_p} \to z' \notin X^*$. Assume at this stage of the proof that $z' \in \operatorname{int} S$. We will prove that there exists $\varepsilon_0 > 0$ such that for all $0 < \varepsilon \le \varepsilon_0$ and an arbitrary $p$

$$m_p = \inf \{ m > n_p : \| z^m - z^{n_p} \| > \varepsilon \} < \infty . \qquad (2.1)$$

Now suppose condition (2.1) is not satisfied, that is, for any $\varepsilon > 0$ there exists $n_p$ such that $\| z^s - z^{n_p} \| \le \varepsilon$ for all $s > n_p$.


We have $\| z^s - z' \| \le 2\varepsilon$ for sufficiently large $n_p$ and $s > n_p$. By the supposition $0 \notin \partial f(z')$. By virtue of the closedness, convexity and upper semicontinuity of the many-valued mapping $\partial f(x)$, there exists $\varepsilon > 0$ such that $0 \notin \operatorname{conv} G_{4\varepsilon}(z')$, where $\operatorname{conv}\{\cdot\}$ is a convex hull and $G_{4\varepsilon}(z')$ is the set

$$G_{4\varepsilon}(z') = \{ g : g \in \partial f(x) , \ \| x - z' \| \le 4\varepsilon \} .$$

It is easily seen that $\varepsilon > 0$ can always be selected in such a way that $U_{4\varepsilon}(z') \subset \operatorname{int} S$, where $U_{4\varepsilon}(z') = \{ x : \| z' - x \| \le 4\varepsilon \}$.

Let $\gamma = \min \| \bar g \|$, $\bar g \in \operatorname{conv} G_{4\varepsilon}(z')$. Obviously $\gamma > 0$. As $\varepsilon_k \to 0$, there exists an integer $K(\gamma)$ such that for $k \ge K(\gamma)$ we have $\varepsilon_k \le \gamma / 2$. Put $n_p \ge K(\gamma)$. Then it is readily seen that for $s \ge n_p$ the step-size $r_k$ can vary no more than once within the set $U_{4\varepsilon}(z')$. Examine the sequence $\{z^s\}$ separately on the intervals $n_p \le s < s_p^*$, where

$$s_p^* = \min \{ s_m : s_m \ge n_p \} .$$

When $n_p \le s < s_p^*$ the points $z^s$ are related as follows:

$$z^{s+1} = z^s - r_l g^s ,$$

where the index $l$ is reconstructed with respect to $s_p^*$.

Let us consider the scalar products

$$c_s = ( z^{n_p} - z^s , \, g^s ) = r_l (s - n_p) ( \bar z^{s-1} , g^s ) , \quad s > n_p ,$$

where $\bar z^{s-1}$ denotes the average of the subgradients $g^{n_p}, \ldots, g^{s-1}$.


Since $\bar z^s \in \operatorname{conv} G_{4\varepsilon}(z')$ for $s \ge n_p$, it is possible to prove that an index $N_1$ exists such that $( \bar z^{N_1} , g^{N_1 + 1} ) \ge \gamma_1$, $\gamma_1 = \gamma^2 / 2$. Thus, $c_{N_1 + 1} \ge r_l (N_1 + 1 - n_p) \gamma_1$.

We next consider the scalar products

$$d_s = ( z^{N_1 + 1} - z^s , \, g^s ) = r_l (s - N_1 - 1) ( \bar z^{s-1} , g^s ) , \quad s \ge N_1 + 1 .$$

An index $N_2$ exists such that $( \bar z^{N_2} , g^{N_2 + 1} ) \ge \gamma_1$ and $d_{N_2 + 1} \ge r_l (N_2 - N_1) \gamma_1$. Then in a similar way we can prove the existence of indices $N_t$ ($t \ge 3$) with the same property. It is easy to prove that $N_{t+1} - N_t \le N < \infty$, $t = 1, 2, \ldots$.

Let $N_{t_0}$ be the maximal of the indices $N_t$ that does not exceed $s_p^*$. Since $s_p^* - N_{t_0} \le N$, with $p \to \infty$ the last term on the right-hand side of the inequality approaches zero. We finally obtain

$$f(z^{s_p^*}) - f(z^{n_p}) \le - r_l (s_p^* - n_p) \gamma_1 + \varepsilon_p' , \qquad (2.2)$$

where $\varepsilon_p' \to 0$ as $p \to \infty$.

It is not difficult to notice that the reasoning which underlies the derivation of inequality (2.2) may also be repeated without changes for the intervals $s_p^* \le s \le m$ to get

$$f(z^m) - f(z^{s_p^*}) \le - r_{l+1} (m - s_p^*) \gamma_1 + \varepsilon_p'' . \qquad (2.3)$$

Adding (2.2) to (2.3) we obtain

$$f(z^m) - f(z^{n_p}) \le - \gamma_1 \left[ \, r_l (s_p^* - n_p) + r_{l+1} (m - s_p^*) \, \right] + \varepsilon_p' + \varepsilon_p'' . \qquad (2.4)$$

Passing to the limit as $m \to \infty$ in inequality (2.4) we are led to a contradiction with the boundedness of the continuous function $f$ on the closed bounded set $U_{4\varepsilon}(z')$. Consequently, condition (2.1) is proved.

Let

$$m_p = \inf \{ m > n_p : \| z^m - z^{n_p} \| > \varepsilon \} .$$

By construction $z^{m_p} \notin U_\varepsilon (z^{n_p})$, but for sufficiently large $p$ we have $z^{m_p} \in U_{4\varepsilon}(z')$. All the reasoning involved in the derivation of inequality (2.4) remains valid for the instant $m_p$, that is, we have

$$f(z^{m_p}) - f(z^{n_p}) \le - \gamma_1 \left[ \, r_l (s_p^* - n_p) + r_{l+1} (m_p - s_p^*) \, \right] + \varepsilon_p' + \varepsilon_p'' .$$

Passing to the limit as $p \to \infty$ we get

$$\overline{\lim_{p \to \infty}} \, W(z^{m_p}) < \underline{\lim_{p \to \infty}} \, W(z^{n_p}) .$$

The further proof of this theorem follows from the Nurminskij theorem.

To fix more precisely the instant when the iteration process gets into the neighborhood of the solution, we can employ the following modification of algorithm 1, provided the computer capacity allows.


Let $z^0$ be an arbitrary initial point, $d > 0$ be a constant, $\{\varepsilon_k\}$, $\{r_k\}$ be number sequences such that $\varepsilon_k > 0$, $\varepsilon_k \to 0$, $r_k > 0$, $r_k \to 0$; let $k_1, k_2, \ldots, k_m$ be positive bounded integer constants.

Put $s = 0$, $j = 0$, $k = 0$, $e^0 = g^0 \in \partial f(z^0)$.

Step 1. Construct $z^{s+1} = z^s - r_k g^s$, $g^s \in \partial f(z^s)$.

Step 2. If $f(z^{s+1}) > f(z^0) + d$, then select $z^{s+1} \in \{ z : f(z) \le f(z^0) \}$ and go to Step 5.

Step 3. Define

$$e_0^{s+1} = \frac{s - j + 1}{s - j + 2} \, e_0^s + \frac{1}{s - j + 2} \, g^{s+1} , \qquad e_i^{s+1} = P_i ( g^{s - k_i + 1}, \ldots, g^{s+1} ) , \quad i = 1, \ldots, m .$$

Each of the notations $P_i(\cdot, \ldots, \cdot)$ designates an arbitrary convex combination of a finite number of the indicated preceding subgradients. Find

$$\rho_{s+1} = \min_{0 \le i \le m} \| e_i^{s+1} \| .$$

Step 4. If $\rho_{s+1} > \varepsilon_k$, then $s = s + 1$ and go to Step 1.

Step 5. Set $k = k + 1$, $j = s + 1$, $s = s + 1$, $e^s = g^s$ and go to Step 1.

THEOREM 2.1. Suppose that the problem (*) is solved by the modified algorithm 1. Then all limit points of the sequence $\{z^s\}$ belong to $X^*$.


3. METHODS WITH AVERAGING OF SUBGRADIENTS AND PROGRAM-ADAPTIVE SUCCESSIVE STEP-SIZE REGULATION

Successive Step-Size Regulation

As noted in a number of works [2, 3, 12, 16], it is expedient to average the subgradients calculated at the previous iterations so that the subgradient methods become more regular. For instance, when "ravine"-type functions are minimized, the averaged direction points the way along the bottom of the "ravine".

It will be demonstrated in Section 5 that the operation of averaging enables the improvement of the a posteriori estimates of the solution accuracy along with the upgrading of the regularity of the described methods.

Methods with averaging of subgradients and successive program-adaptive regulation of the step-size are set forth in this section.

The results obtained here stem from [24].

Description of Algorithm 2

Let $z^0$ be an arbitrary initial approximation; $\beta > 0$ be a constant; $\{\varepsilon_k\}$, $\{r_k\}$ be number sequences such that $\varepsilon_k > 0$, $\varepsilon_k \to 0$, $r_k > 0$, $r_k \to 0$.

Put $s = 0$, $j = 0$, $k = 0$, $e^0 = v^0 = g^0 \in \partial f(z^0)$.

Step 1. Construct $z^{s+1} = z^s - r_k v^s$.

Step 2. If $f(z^{s+1}) > f(z^0) + \beta$, then go to Step 7.

Step 3. Define $v^{s+1}$ according to the schemes a) or b).

Step 4. Construct $e^{s+1} = e^s + (s - j + 2)^{-1} ( v^{s+1} - e^s )$.

Step 5. If $\| e^{s+1} \| > \varepsilon_k$, then $s = s + 1$ and go to Step 1.

Step 6. Set $k = k + 1$, $j = s + 1$, $s = s + 1$, $e^s = v^s$ and go to Step 1.

Step 7. Select $z^{s+1} \in \{ z : f(z) \le f(z^0) \}$, $s = s + 1$, $j = s$, $k = k + 1$ and go to Step 1.


In the construction of the direction $v^s$ the following schemes of subgradient averaging are dealt with.

a) The "moving" average. Let $K + 1$ be an integer. Then

$$v^s = \sum_{i = \max(0, \, s - K)}^{s} \lambda_{i,s} \, g^i , \qquad \lambda_{i,s} \ge 0 , \quad \sum_i \lambda_{i,s} = 1 ,$$

where $g^i \in \partial f(z^i)$.

b) The "weighted" average. Let $M + 1$ be an integer. Then

$$v^s = g^s + \lambda_s ( v^{s-1} - g^s ) ,$$

where $0 \le \lambda_s \le 1$ for $s \not\equiv 0 \pmod{M}$ and $0 \le \lambda_s \le \bar\lambda < 1$ for $s \equiv 0 \pmod{M}$.
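Both schemes admit a direct implementation. A sketch assuming uniform weights $\lambda_{i,s} = 1/(K+1)$ for scheme a) and a constant $\lambda_s = \lambda$ for scheme b) — particular admissible choices, not prescribed by the paper:

    from collections import deque
    import numpy as np

    class MovingAverage:
        # Scheme a): v^s = convex combination of the last K+1 subgradients
        # (here: their plain average).
        def __init__(self, K):
            self.buf = deque(maxlen=K + 1)
        def update(self, g):
            self.buf.append(np.asarray(g, dtype=float))
            return sum(self.buf) / len(self.buf)

    class WeightedAverage:
        # Scheme b): v^s = g^s + lambda * (v^{s-1} - g^s), 0 <= lambda <= 1.
        def __init__(self, lam=0.9):
            self.lam, self.v = lam, None
        def update(self, g):
            g = np.asarray(g, dtype=float)
            self.v = g if self.v is None else g + self.lam * (self.v - g)
            return self.v

Scheme a) needs $K+1$ stored vectors; scheme b) needs only one, which matters in the large-scale setting emphasized in Section 1.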

THEOREM 3.1. Assume that the problem (*) is solved by algorithm 2. Then all limit points of the sequence $\{z^s\}$ belong to the set $X^*$.

4. STOCHASTIC FINITE-DIFFERENCE ANALOGS TO ADAPTIVE NONMONOTONIC METHODS WITH AVERAGING OF SUBGRADIENTS

It should be emphasized that the practical value of the subgradient-type methods essentially depends upon the existence of their finite-difference analogs.

The finite-difference methods are of great importance primarily in situations when subgradient computation programs are unavailable. This generally occurs in the solution of large-scale problems. The construction of finite-difference methods in nonsmooth optimization originated two approaches: the deterministic and the stochastic ones. Each of them has its own advantages and disadvantages. The stochastic approach is favored here.

One of the advantages of the introduced averaging operation is the fact that the construction of stochastic analogs to subgradient methods presents no special problems.

The offered methods are close to those with smoothing [4] which, in their turn, are closest to the schemes of stochastic quasi-gradient methods [12]. Research into the stochastic quasi-gradient methods with successive step-size regulation is quite a new and underdeveloped field. Ju.M. Ermol'ev spurred the first investigations in this direction. His and Ju.M. Kaniovskij's results [13] are undoubtedly of theoretical interest. However, the implementation of the methods described in [14] creates complications, as there is no rule to regulate the variations in the step-size.

Let us first dwell on functions $f(x, i)$ of the form

$$f(x, i) = \frac{1}{(2\alpha_i)^n} \int_{x_1 - \alpha_i}^{x_1 + \alpha_i} \cdots \int_{x_n - \alpha_i}^{x_n + \alpha_i} f(y_1, \ldots, y_n) \, dy_1 \cdots dy_n ,$$

where $\alpha_i > 0$.

Properties of the functions $f(x, i)$ have been studied by A.M. Gupal [4], proceeding from the assumption that $f(z)$ satisfies the local Lipschitz condition.

THEOREM 4.1. If $f(z)$ is a convex eigenfunction, dom $f = E^n$, then $f(z, i)$ is also a convex eigenfunction, dom $f(z, i) = E^n$, for any $\alpha_i > 0$.

THEOREM 4.2. The sequence of functions $f(z, i)$ uniformly converges to $f(z)$ as $\alpha_i \to 0$ in any bounded domain $X$.
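Theorem 4.2 can be observed numerically: $f(x, i)$ is the mean of $f$ over the cube of half-width $\alpha_i$ centered at $x$, which a Monte Carlo average approximates. The sample size and the test function below are arbitrary illustrative choices:

    import numpy as np

    def smoothed_f(f, x, alpha, samples=20000, rng=np.random.default_rng(0)):
        # Monte Carlo estimate of f(x, i): mean of f over the cube
        # [x_k - alpha, x_k + alpha], k = 1..n.
        x = np.asarray(x, dtype=float)
        y = x + rng.uniform(-alpha, alpha, size=(samples, x.size))
        return np.mean([f(row) for row in y])

    # f(x) = ||x||_1 is nonsmooth at 0; the smoothed values tend to f(0) = 0
    # as alpha -> 0, illustrating the uniform convergence of Theorem 4.2.
    f = lambda x: np.abs(x).sum()
    for alpha in (1.0, 0.1, 0.01):
        print(alpha, smoothed_f(f, np.zeros(2), alpha))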

Now we shall go on to the description of stochastic finite-difference analogs to the algorithms with successive program-adaptive regulation of the step-size and with averaging of the direction.

Description of Algorithm 3

Let $z^0$ be an arbitrary initial approximation, $b > 0$ be a constant, $\{\varepsilon_i\}$, $\{t_i\}$, $\{\alpha_i\}$, $\{p_i\}$ be number sequences.

Put $s = 0$, $i = 0$, $j = 0$.

Step 1. Compute

$$\xi_k^s = \frac{ f( \tilde z_1^s, \ldots, \tilde z_{k-1}^s, z_k^s + \alpha_i, \tilde z_{k+1}^s, \ldots, \tilde z_n^s ) - f( \tilde z_1^s, \ldots, \tilde z_{k-1}^s, z_k^s - \alpha_i, \tilde z_{k+1}^s, \ldots, \tilde z_n^s ) }{ 2 \alpha_i } , \quad k = 1, \ldots, n ,$$

where $\tilde z_k^s$, $k = 1, \ldots, n$, are independent random values distributed uniformly on the intervals $[ z_k^s - \alpha_i , \, z_k^s + \alpha_i ]$, $\alpha_i > 0$.

Step 2. Construct $e^s$ in compliance with the schemes a) and b), where the subgradients are replaced by their stochastic estimates $\xi^s$.

Step 3. Find $z^{s+1} = z^s - t_i e^s$.

Step 4. If $f(z^{s+1}) > f(z^0) + b$, then go to Step 9.

Step 5. Define $\bar z^{s+1} = \bar z^s + (s - j + 1)^{-1} ( e^s - \bar z^s )$.

Step 6. If $s - j < p_i$, then $s = s + 1$ and go to Step 1.

Step 7. If $\| \bar z^{s+1} \| > \varepsilon_i$, then $s = s + 1$ and go to Step 1.

Step 8. Put $i = i + 1$, $j = s + 1$, $s = s + 1$ and go to Step 1.

Step 9. Select $z^{s+1} \in \{ z : f(z) \le f(z^0) \}$, $j = s + 1$, $i = i + 1$, $s = s + 1$ and go to Step 1.
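Step 1 is a randomized central difference for the smoothed function. A minimal sketch under one admissible sampling variant — a single random point $\tilde z$ shared by all coordinates; the names are illustrative:

    import numpy as np

    def stochastic_fd_estimate(f, z, alpha, rng=np.random.default_rng()):
        # Stochastic finite-difference estimate xi^s: component k is a
        # central difference in coordinate k with half-width alpha, the
        # remaining coordinates replaced by uniform samples from
        # [z_m - alpha, z_m + alpha].
        z = np.asarray(z, dtype=float)
        n = z.size
        xi = np.empty(n)
        u = rng.uniform(z - alpha, z + alpha)   # random point ~ tilde z
        for k in range(n):
            plus, minus = u.copy(), u.copy()
            plus[k], minus[k] = z[k] + alpha, z[k] - alpha
            xi[k] = (f(plus) - f(minus)) / (2.0 * alpha)
        return xi

Only function values are required here, which is the point of the finite-difference analogs: no subgradient computation program is needed.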

THEOREM 4.3. Let the problem (*) be solved by algorithm 3 and the number sequences $\{\varepsilon_i\}$, $\{t_i\}$, $\{\alpha_i\}$, $\{p_i\}$ satisfy the conditions

$$\varepsilon_i > 0 , \ \varepsilon_i \to 0 ; \quad t_i > 0 , \ t_i \to 0 ; \quad \alpha_i > 0 , \ \alpha_i \to 0 ; \quad p_i \to \infty .$$

Then for almost all $\omega$ the sequence $f(z^s(\omega))$ converges and all limit points of the sequence $\{z^s(\omega)\}$ belong to the set of solutions $X^*$.

Theorem 4.3 is proved in detail in [25].

5. A POSTERIORI ESTIMATES OF ACCURACY OF SOLUTION TO ADAPTIVE SUBGRADIENT METHODS AND THEIR STOCHASTIC FINITE-DIFFERENCE ANALOGS

In the numerical solution of extremum problems of nondifferentiable optimization, strong emphasis is placed on the check of the accuracy of the obtained solution. Given the solution accuracy estimates, first, a very efficient rule for stopping the algorithm can be formulated; second, the obtained estimates can form the basis for justified conclusions with respect to the strategy of selection of the algorithm parameters.


Using a rather simple procedure, a posteriori estimates of the solution accuracy for the introduced adaptive algorithms are constructed here. The estimates provide a means for strictly evaluating the efficiency of the use of the averaging operation.

Thus, assume that the convex function minimization problem (*) is being solved. Suppose the set $X^*$ contains only one point $x^*$.

To solve the problem (*), consider algorithm 1. The spin-off from the proof of theorem 1.1 is the proof that the sequence $\{x^s\}$ falls outside the set $\{ x : f(x) \le f(x^0) + b \}$ a finite number of times only. Therefore, $\bar s \ge 0$ exists such that for $s \ge \bar s$ the points $x^s$ remain in this set. Then the step-size will vary only if the condition $\| e^{s+1} \| \le \varepsilon_k$ is satisfied.

Without loss of generality we will assume that the first instant of the change from the step $r_0$ to $r_1$ occurred just because the condition $\| e^{s_0 + 1} \| \le \varepsilon_0$ is satisfied.

From the convexity of the function $f(x)$ it is inferred that

$$f(x^i) - f(x^*) \le ( g^i , x^i - x^* ) , \quad i = 0, 1, \ldots, s_0 . \qquad (5.1)-(5.3)$$

Summation of the inequalities (5.1), (5.2), ..., (5.3) yields

$$\min_{0 \le i \le s_0} f(x^i) - f(x^*) \le \frac{1}{s_0 + 1} \sum_{i=0}^{s_0} ( g^i , x^i - x^0 ) + ( e^{s_0 + 1} , x^0 - x^* ) .$$


Denote the expression $(s_0 + 1)^{-1} \sum_{i=0}^{s_0} ( g^i , x^i - x^0 )$ by $\Delta_{s_0}$.

We have the obvious inequalities

$$\min_{0 \le i \le s_0} f(x^i) - f(x^*) \le \Delta_{s_0} + \| e^{s_0 + 1} \| \, \| x^0 - x^* \| \le \Delta_{s_0} + \varepsilon_0 \| x^0 - x^* \| ,$$

where for $s_0 \le s \le s_1$ the points $x^s$ are related by $x^{s+1} = x^s - r_1 g^s$. For these values of $s$ it is possible to derive in the same way that

$$f( x_*^{s_0 + 1} ) - f(x^*) \le \Delta_{s_1} + \varepsilon_1 \| x^{s_0 + 1} - x^* \| , \qquad x_*^{s_0 + 1} \in \{ x : f(x) \le \min [ f(x^0), \ldots, f(x^{s_1}) ] \} .$$

Thus, for $s_k + 1 \le s \le s_{k+1}$ we have

$$f( x_*^{s_k + 1} ) - f(x^*) \le \Delta_{s_{k+1}} + \varepsilon_{k+1} \| x^{s_k + 1} - x^* \| , \qquad (5.4)$$

where

$$x_*^{s_k + 1} \in \{ x : f(x) \le \min [ f(x^{s_k + 1}), \ldots, f(x^{s_{k+1}}) ] \} , \qquad \Delta_{s_{k+1}} = ( s_{k+1} - s_k )^{-1} \sum_{i = s_k + 1}^{s_{k+1}} ( g^i , x^i - x^{s_k + 1} ) .$$


It is easily proved that $\Delta_{s_k} \to 0$.
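The quantities entering estimate (5.4) are observable during the run, except for the distance to the solution; a sketch assuming a known bound $D \ge \| x^0 - x^* \|$ (problem-specific knowledge, not supplied by the algorithm):

    import numpy as np

    def aposteriori_bound(xs, gs, eps, D):
        # A posteriori accuracy estimate at a step-change instant:
        #   min_i f(x^i) - f(x*) <= Delta + eps * D,
        # where Delta = (1/(s0+1)) * sum_i (g^i, x^i - x^0) and eps bounds
        # the norm of the averaged subgradient e^{s0+1} at that instant.
        x0 = xs[0]
        delta = np.mean([np.dot(g, x - x0) for x, g in zip(xs, gs)])
        return delta + eps * D

Since the step changes exactly when the averaged-subgradient norm drops below $\varepsilon_k$, the bound can be evaluated at every such instant and used as a stopping rule.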

THEOREM 5.1. Assume that the problem (*) is solved by algorithm 1. Then the inequalities (5.4) hold for those instants $s_k$ at which the step-size varies because the condition $\| e^{s_k} \| \le \varepsilon_k$ is satisfied.

REMARK. It follows from theorem 5.1 that the same estimate holds both for the subsequence of "records" $\{ x_*^{s_k} \}$ and for the subsequence $\{ \bar x^{s_k} \}$ itself.

Let the problem (*) be solved by algorithm 2, where the operation of averaging of the preceding subgradients is used. Denote the instants of changes in the step-size by $s_i$, $i = 0, 1, 2, \ldots$. Suppose the first instant of the change from $r_0$ to $r_1$ takes place because the inequality $\| e^{s_0} \| \le \varepsilon_0$ holds. Examine the scheme of averaging by the "moving" average. We have

$$f(x^i) - f(x^*) \le ( g^i , x^i - x^* ) , \quad i = 0, 1, \ldots, s_0 .$$

Designate the expression $\sum_{i=0}^{s_0} \lambda_{i, s_0} x^i$ by $\bar x^{s_0}$. Then

$$\sum_{i=0}^{s_0} \lambda_{i, s_0} \left[ f(x^i) - f(x^*) \right] \le \sum_{i=0}^{s_0} \lambda_{i, s_0} ( g^i , x^i - x^0 ) + ( v^{s_0} , x^0 - x^* ) .$$

Whence for $s_0 \le K$ we have

$$\min_{0 \le i \le s_0} f(x^i) - f(x^*) \le \sum_{i=0}^{s_0} \lambda_{i, s_0} ( g^i , x^i - x^0 ) + \varepsilon_0 \| x^0 - x^* \| .$$


For $s_0 > K$ we shall have

$$\min_{s_0 - K \le i \le s_0} f(x^i) - f(x^*) \le \sum_{i = s_0 - K}^{s_0} \lambda_{i, s_0} ( g^i , x^i - x^0 ) + \varepsilon_0 \| x^0 - x^* \| .$$

Thus, the estimate is governed by a weighted sum of the scalar products $( g^i , x^i - x^0 )$ rather than by their plain average. From this formula the following recommendation can be offered with respect to the selection of the parameters $\lambda_{i,s}$: choose the weights so as to

$$\min \sum_{i} \lambda_{i,s} ( g^i , x^i - x^0 ) , \qquad \sum_{i} \lambda_{i,s} = 1 , \quad \lambda_{i,s} \ge 0 .$$

The subgradient averaging thereby allows improving the a posteriori estimates of the solution accuracy. This may substantiate formally that it is of advantage to introduce and study the operation of subgradient averaging.

For an arbitrary instant $s_l$ of step-size variation, $s_l > K$, we can easily obtain the estimate

$$\min_{s_l - K \le i \le s_l} f(x^i) - f(x^*) \le \sum_{i = s_l - K}^{s_l} \lambda_{i, s_l} ( g^i , x^i - x^0 ) + \varepsilon_l \| x^0 - x^* \| . \qquad (5.9)$$

THEOREM 5.2. Let the problem (*) be solved by algorithm 2 with the use of averaging scheme a). Then for the instants $s_l$ for which $\| e^{s_l} \| \le \varepsilon_l$, inequality (5.9) holds.

The scheme of averaging by the "weighted" average b) is treated in a similar way.

The a posteriori estimates of the solution accuracy attained for the adaptive subgradient methods can be extended to their stochastic finite-difference analogs with a minimum of alterations. The way of getting them is illustrated with algorithm 3. We will use the notations introduced in Section 4. When proving theorem 4.3 it is possible to demonstrate that the step-size $t_i$ varies an infinite number of times.

As algorithm 3 converges with probability one, for almost all $\omega$ it is possible to indicate $\bar s(\omega)$ such that with $s \ge \bar s(\omega)$ the points $z^s(\omega)$ do not leave the set $\{ z : f(z) \le f(z^0) + b \}$.

Therefore, with $s \ge \bar s(\omega)$ the step-size $t_i$ varies because the condition

$$\| \bar z^{s_i} \| \le \varepsilon_i$$

holds, where $s_i \ge p_i + j$, $\bar z^{s_i} = \bar z^{s_i - 1} + ( s_i - j )^{-1} ( e^{s_i} - \bar z^{s_i - 1} )$, the sequences $\{t_i\}$ and $\{p_i\}$ comply with the properties formulated in theorem 4.3, and $j$ is reconstructed by $s_i$.

Consider the event relating $s_i$ to the instant $s_l$ of the step-size change that precedes it. There exists a constant $0 < C < \infty$ such that with probability greater than $1 - C\delta_i$ the corresponding deviation bound can be stated. Then for the instant $s_i$ the analogous a posteriori inequality holds with the same probability.


Theorem 5.3 is readily formulated and proved: assume that the problem (*) is solved by algorithm 3; then for almost all $\omega$ it is possible to isolate a subsequence of points $\{ x^{s_i}(\omega) \}$ for which, with probability greater than $1 - C\delta_i$, the inequalities of the type (5.9) hold with $f(x^*)$ replaced by

$$f_{i-1}^* = \min_{x \in E^n} f(x, i-1) , \qquad x_{i-1}^* \in \operatorname*{Argmin}_{x \in E^n} f(x, i-1) .$$

REFERENCES

1. Ajzerman, M.A., E.M. Braverman and L.I. Rozonoer: Potential Functions Method in Machine Learning Theory. M.: Nauka, 1970, p. 384.
2. Glushkova, O.V. and A.M. Gupal: About Nonmonotonic Methods of Nonsmooth Function Minimization with Averaging of Subgradients. Kibernetika, 1980, No. 6, pp. 128-129.
3. Gupal, A.M. and L.G. Bazhenov: Stochastic Analog to Conjugate Gradient Method. Kibernetika, 1972, No. 1, pp. 125-126.
4. Gupal, A.M.: Stochastic Methods of Solution of Nonsmooth Extremum Problems. Kiev: Naukova dumka, 1979, p. 152.
5. Dem'janov, V.F. and V.N. Malozemov: Introduction to Minimax. M.: Nauka, 1972, p. 368.
6. Eremin, I.I.: The Relaxation Method of Solving Systems of Inequalities with Convex Functions on the Left Side. Dokl. AN SSSR, 1965, Vol. 160, No. 5, pp. 994-996.
7. Ermol'ev, Ju.M.: Methods of Solution of Nonlinear Extremum Problems. Kibernetika, 1966, No. 4, pp. 1-17.
8. Ermol'ev, Ju.M. and N.Z. Shor: On Minimization of Nondifferentiable Functions. Kibernetika, 1967, No. 1, pp. 101-102.
9. Ermol'ev, Ju.M. and Z.V. Nekrylova: Some Methods of Stochastic Optimization. Kibernetika, 1966, No. 6, pp. 96-98.
10. Ermol'ev, Ju.M.: On the Method of Generalized Stochastic Gradients and Stochastic Quasi-Fejer Sequences. Kibernetika, 1969, No. 2, pp. 73-83.
11. Ermol'ev, Ju.M.: On One General Problem of Stochastic Programming. Kibernetika, 1971, No. 3, pp. 47-50.
12. Ermol'ev, Ju.M.: Stochastic Programming Methods. M.: Nauka, 1976, p. 240.
13. Ermol'ev, Ju.M. and Ju.M. Kaniovskij: Asymptotic Properties of Some Stochastic Programming Methods with Constant Step-Size. Zhurn. Vych. Mat. i Mat. Fiziki, 1979, Vol. 19, No. 2, pp. 356-366.
14. Kaniovskij, Ju.M., P.S. Knopov and Z.V. Nekrylova: Limit Theorems for Stochastic Programming. Kiev: Naukova dumka, 1980, p. 156.
15. Loève, M.: Probability Theory. M.: Izd-vo inostr. lit., 1967, p. 720.
16. Norkin, V.N.: Method of Nondifferentiable Function Minimization with Averaging of Generalized Gradients. Kibernetika, 1980, No. 6, pp. 86-89, 102.
17. Nurminskij, E.A.: Convergence Conditions for Nonlinear Programming Algorithms. Kibernetika, 1973, No. 1, pp. 122-125.
18. Nurminskij, E.A. and A.A. Zhelikovskij: Investigation of One Regulation of Step in Quasi-Gradient Method for Minimizing Weakly Convex Functions. Kibernetika, 1974, No. 6, pp. 101-105.
19. Poljak, B.T.: Generalized Method of Solving Extremum Problems. Dokl. AN SSSR, 1967, Vol. 174, No. 1, pp. 33-36.
20. Poljak, B.T.: Minimization of Nonsmooth Functionals. Zhurn. vychisl. mat. i mat. fiziki, 1969, Vol. 9, No. 3, pp. 509-521.
21. Tsypkin, Ja.Z.: Adaptation and Learning in Automatic Systems. M.: Nauka, 1968.
22. Tsypkin, Ja.Z.: Generalized Learning Algorithms. Avtomatika i telemekhanika, 1970, No. 1, pp. 97-103.
23. Chepurnoj, N.D.: One Successive Step-Size Regulation for Quasi-Gradient Method of Weakly Convex Function Minimization. In: Issledovanie Operacij i ASU. Kiev: Vyshcha shkola, 1981, No. 19, pp. 13-15.
24. Chepurnoj, N.D.: Averaged Quasi-Gradient Method with Successive Step-Size Regulation to Minimize Weakly Convex Functions. Kibernetika, 1981, No. 6, pp. 131-132.
25. Chepurnoj, N.D.: One Successive Step-Size Regulation in Stochastic Method of Nonsmooth Function Minimization. Kibernetika, 1982, No. 4, pp. 127-129.
26. Shor, N.Z.: Application of Gradient Descent Method for Solution of Network Transportation Problem. In: Materialy nauchnogo seminara po prikladnym voprosam kibernetiki i issledovanija operacij. Nauchnyj sovet po kibernetike IK AN USSR, Kiev, 1962, vypusk 1, pp. 9-17.
27. Shor, N.Z.: Investigation of Space Dilation Operation in Convex Function Minimization Problems. Kibernetika, 1970, No. 1, pp. 6-12.
28. Shor, N.Z. and N.G. Zhurbenko: Minimization Method Using Space Dilation in the Direction of the Difference of Two Successive Gradients. Kibernetika, 1971, No. 3, pp. 51-59.
29. Shor, N.Z.: Nondifferentiable Function Minimization Methods and Their Applications. Kiev: Nauk. dumka, 1979, p. 200.
30. Demjanov, V.F.: Algorithms for Some Minimax Problems. Journal of Computer and Systems Science, 1968, 2, No. 4, pp. 342-380.
31. Lemarechal, C.: An Algorithm for Minimizing Convex Functions. In: Information Processing '74 /ed. Rosenfeld/, 1974, North-Holland, Amsterdam, pp. 552-556.
32. Lemarechal, C.: Nondifferentiable Optimization: Subgradient and Epsilon Subgradient Methods. Lecture Notes in Economics and Mathematical Systems /ed. Oettli W./, 1975, 117, Springer, Berlin, pp. 191-199.
33. Bertsekas, D.P. and S.K. Mitter: A Descent Numerical Method for Optimization Problems with Nondifferentiable Cost Functions. SIAM Journal on Control, 1973, 11, No. 4, pp. 637-652.
34. Wolfe, P.: A Method of Conjugate Subgradients for Minimizing Nondifferentiable Functions. In: Nondifferentiable Optimization /eds. Balinski M.L., Wolfe P./, Mathematical Programming Study 3, 1975, North-Holland, Amsterdam, pp. 145-173.
