NOT FOR QUOTATION WITHOUT PERMISSION
OF THE AUTHOR

OPTIMIZATION OF FUNCTIONALS WHICH DEPEND ON DISTRIBUTION FUNCTIONS: 1. NONLINEAR FUNCTIONAL AND LINEAR CONSTRAINTS

A. Gaivoronski

November 1983 WP-83-114
Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute or of its National Member Organizations.

INTERNATIONAL INSTITUTE FOR APPLIED SYSTEMS ANALYSIS, 2361 Laxenburg, Austria
PREFACE
The development of optimization techniques for solving complex decision problems under uncertainty is currently a major topic of research in the System and Decision Sciences Area at IIASA, and in the Adaptation and Optimization group in particular. This paper deals with methods for the solution of problems in which the objective function depends on probability distributions. Such problems are common in reliability theory and various other branches of operations research, but methods for dealing with them have only recently begun to emerge.

ANDRZEJ WIERZBICKI
Chairman
System and Decision Sciences
ABSTRACT
The main purpose of this paper is to discuss numerical optimization procedures for problems in which both the objective function and the constraints depend on distribution functions. The objective function is assumed to be nonlinear and to have directional derivatives, while the constraints are taken as linear. The proposed algorithm involves linearization of the objective function at the current point and solution of an auxiliary linear subproblem. This last problem is solved using duality relations and cutting-plane techniques.
OPTIMIZATION OF FUNCTIONALS WHICH DEPEND ON DISTRIBUTION FUNCTIONS: 1. NONLINEAR FUNCTIONAL AND LINEAR CONSTRAINTS
A. Gaivoronski
1. INTRODUCTION
The purpose of this paper is to present numerical methods for solving optimization problems in which both the objective function and the constraints depend on distribution functions. Such problems occur in stochastic programming [1,2], reliability theory and various branches of operations research (surveyed in [3]), and robust statistics [4], among others.
If both the objective function and the constraints are linear with respect to the distribution functions then the problem can be stated as follows:

min ∫_X f0(x) dH(x)    (1)

subject to

∫_X fi(x) dH(x) ≤ 0 ,  i = 1,...,m    (2)

H ∈ G0 ,    (3)

where G0 denotes the set of all distribution functions concentrated on X, and X is a set in Euclidean space R^n.
Some specific cases of problem (1)-(3) arise in the theory of Markovian moments and can be solved analytically [5]. However, the success of analytical methods is very limited even in the linear case (1)-(3) and therefore numerical algorithms are needed. Such methods began to appear in the middle of the last decade.
One idea is to approximate set X by a sequence of finite subsets X_1, X_2, ..., X_s, ... . This sequence of sets has the property that

max_{x∈X} min_{y∈X_s} ||x − y|| → 0

as s tends to infinity, i.e., the greatest distance between points in set X and set X_s tends to zero. Let G_s be the set of all distribution functions which correspond to probabilistic measures concentrated in points from X_s:

G_s = { (x_i^s , p_i) , i = 1,...,n_s :  Σ_{i=1}^{n_s} p_i = 1 ,  p_i ≥ 0 } .

If we include one more constraint,

H ∈ G_s ,    (4)

problem (1)-(4) becomes a finite-dimensional linear programming problem; this raises the possibility of approximating the original problem (1)-(3) by a sequence of finite-dimensional problems (1)-(4). This idea was explored in [3,6], where it was also used to solve nonlinear and minimax problems involving distribution functions. However, this approach can be used only for sets X of small dimension (in fact not exceeding three) due to the high dimensionality of the associated linear programming problems.
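As an illustration of this discretization (not part of the original text), the sketch below approximates X = [0,1] by a grid and solves the resulting problem (1)-(4) for a single constraint, m = 1. The choice f0(x) = x, f1(x) = 0.5 − x is a hypothetical toy instance; since an optimal measure can be taken to have at most m+1 = 2 support points (see Theorem 1 below), the LP is solved here by enumerating one- and two-point supports instead of calling an LP solver.

```python
import numpy as np

# Toy instance (hypothetical, for illustration only):
# minimize E_H[f0(x)] subject to E_H[f1(x)] <= 0 on X = [0, 1],
# i.e. minimize the mean subject to mean >= 0.5.
f0 = lambda x: x
f1 = lambda x: 0.5 - x

def solve_discretized(xs, f0, f1):
    """Solve min sum p_j f0(x_j) s.t. sum p_j f1(x_j) <= 0, sum p_j = 1, p >= 0,
    by enumerating supports of at most two grid points."""
    c, g = f0(xs), f1(xs)
    best_val, best = np.inf, None
    n = len(xs)
    for i in range(n):
        # one-point candidates: feasible iff g_i <= 0
        if g[i] <= 0 and c[i] < best_val:
            best_val, best = c[i], {xs[i]: 1.0}
        for j in range(i + 1, n):
            # two-point candidate: make the constraint active,
            # p*g_i + (1-p)*g_j = 0  =>  p = g_j / (g_j - g_i)
            if g[i] == g[j]:
                continue
            p = g[j] / (g[j] - g[i])
            if 0.0 <= p <= 1.0:
                val = p * c[i] + (1 - p) * c[j]
                if val < best_val:
                    best_val, best = val, {xs[i]: p, xs[j]: 1 - p}
    return best_val, best

xs = np.linspace(0.0, 1.0, 101)      # finite subset X_s of X
val, measure = solve_discretized(xs, f0, f1)
```

For this instance the optimal value is 0.5 (a point mass at x = 0.5, or any constraint-active two-point mixture with mean 0.5). The quadratic enumeration is exactly the dimensionality problem noted above: it is only workable for very small m.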
Another possible way of solving (1)-(3) is based on the duality relations between problem (1)-(3) and some finite-dimensional minimax problem. First proposed in [1], this idea was taken further in [2,7], where methods of the cutting-plane type are given.
The purpose of this paper is to use this last approach to develop solution techniques for nonlinear problems of the kind:

min Ψ(H)    (5)

subject to

H ∈ G ,    (6)

where G is defined in the following way:

G = { H ∈ G0 : ∫_X fi(x) dH(x) ≤ 0 , i = 1,...,m } .    (7)
We propose an analogue of the linearization (Frank-Wolfe) method in which we are required to solve subproblems of type (1)-(3). It appears that it is not necessary to solve these subproblems precisely: duality relations make it possible to utilize rough solutions of (1)-(3) so that only a limited number of calculations are needed at each iteration.
2. CHARACTERIZATION OF OPTIMAL DISTRIBUTIONS

We shall use the same letter, say H, to denote both the distribution function and the underlying probabilistic measure, where this will not cause confusion. For a given probabilistic measure H we shall denote by B+ the collection of all subsets of X with positive measure H, and by dom H the support set of H, i.e., the smallest closed set of full measure H.
Let us first introduce the class of functionals Ψ(H) considered. What we actually need is some analogue of directional differentiability. Suppose that

Ψ(H* + α(H − H*)) = Ψ(H*) + α ∫_X f0(x,H*) d(H(x) − H*(x)) + r(α,H*,H) ,    (8)

where

r(α,H*,H) = o(α) .

In what follows we assume that functions f0(x,H), fi(x) are such that expressions (7) and (8) are meaningful.
The following simple conditions are necessary, and in the convex case also sufficient, for distribution H* to be a solution of problem (5)-(7):

Lemma 1

If Ψ(H*) ≤ Ψ(H) for some H* ∈ G and all H ∈ G then

∫_X f0(x,H*) dH*(x) = inf_{H∈G} ∫_X f0(x,H*) dH(x) .
Proof

The proof is of the traditional type for necessary conditions. Note that

z* ≡ inf_{H∈G} ∫_X f0(x,H*) dH(x) ≤ ∫_X f0(x,H*) dH*(x)

is always true. Suppose, on the contrary, that there exists a γ > 0 such that

z* ≤ ∫_X f0(x,H*) dH*(x) − 2γ .

Then there exists an H ∈ G such that

∫_X f0(x,H*) dH(x) − ∫_X f0(x,H*) dH*(x) ≤ −γ .

Consider now distributions Hα:

Hα = αH + (1−α)H* .

According to (8),

Ψ(Hα) = Ψ(H*) + α ∫_X f0(x,H*) d(H(x) − H*(x)) + r(α,H*,H) ,

and for small α we obtain Ψ(Hα) < Ψ(H*), which contradicts the optimality of H*. This completes the proof.

Remark. If, additionally, Ψ(H) is convex, i.e.,

Ψ(αH1 + (1−α)H2) ≤ αΨ(H1) + (1−α)Ψ(H2) ,  0 ≤ α ≤ 1 ,

then this lemma also gives sufficient conditions for a global minimum.
Lemma 1 implies that to check the necessary conditions for problem (5)-(7) requires solution of a linear problem of the form (1)-(3) with objective integrand f0(x,H*). The solution of problem (1)-(3) can be characterized through the duality relations summarized in the following theorem, which was proved in [7].

Theorem 1

Suppose that the following assumptions are satisfied:

1. Set X is compact and functions fi(x), i = 0,...,m are continuous on X.

2. co Z ∩ { z : z < 0 } ≠ ∅ , where

Z = { z ∈ R^m : z_i = fi(x) , i = 1,...,m , x ∈ X } .

Then

1. A solution of problem (1)-(3) exists and the optimal value of ∫_X f0(x) dH(x) is equal to the optimal value of the following minimax problem:

max_{u∈U+} φ(u) ,  φ(u) = min_{x∈X} [ f0(x) + Σ_{i=1}^m u_i fi(x) ] ,

where U+ = { u : u ∈ R^m , u ≥ 0 } .

2. For any solution H* of problem (1)-(3) there exists a u* ∈ U+ such that φ(u*) = max_{u∈U+} φ(u) and dom H* ⊂ X*(u*), where

X*(u*) = Arg min_{x∈X} [ f0(x) + Σ_{i=1}^m u_i* fi(x) ] .

3. There exists a finite set X_t = { x^1 , ... , x^t }, t ≤ m+1, and a solution H* of problem (1)-(3) such that dom H* = X_t, i.e., the probabilistic measure H* can be expressed as a collection of t ≤ m+1 pairs { (x^1,p^1) , ... , (x^t,p^t) }. The probabilities p^1, ..., p^t are solutions of the following linear programming problem:

min_p Σ_{j=1}^t p^j f0(x^j)  subject to  Σ_{j=1}^t p^j fi(x^j) ≤ 0 , i = 1,...,m ;  Σ_{j=1}^t p^j = 1 ;  p^j ≥ 0 .

Combining Theorem 1 and Lemma 1 we obtain the following result:
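The duality relation in part 1 of Theorem 1 can be checked numerically on a small example (an editorial illustration, not from the original text). The sketch below uses the hypothetical toy instance f0(x) = x, f1(x) = 0.5 − x on X = [0,1], for which the primal optimum is 0.5, and evaluates the dual function φ(u) on grids.

```python
import numpy as np

# Toy instance (hypothetical): X = [0, 1], f0(x) = x, f1(x) = 0.5 - x.
xs = np.linspace(0.0, 1.0, 1001)
f0, f1 = xs, 0.5 - xs

def phi(u):
    # phi(u) = min_x [ f0(x) + u * f1(x) ], the dual function of (1)-(3)
    return float(np.min(f0 + u * f1))

us = np.linspace(0.0, 3.0, 3001)     # grid over U+ (truncated for the search)
dual_vals = np.array([phi(u) for u in us])
dual_opt = float(dual_vals.max())    # should match the primal optimum 0.5
u_star = float(us[dual_vals.argmax()])
```

Here φ(u) = min{0.5u, 1 − 0.5u} on the grid, its maximum 0.5 is attained at u* = 1, and at u* the Lagrangian is constant in x, so the support condition dom H* ⊂ X*(u*) of part 2 is trivially satisfied.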
Theorem 2

Suppose that Ψ(H*) ≤ Ψ(H) for all H ∈ G and the following assumptions are satisfied:

1. Set X is compact and functions f0(x,H*), fi(x) , i = 1,...,m are continuous on X.

2. co Z ∩ { z : z < 0 } ≠ ∅ , where Z is defined as in Theorem 1.

Then

1. We have

∫_X f0(x,H*) dH*(x) = max_{u∈U+} φ(u) ,

where

φ(u) = min_{x∈X} ( f0(x,H*) + Σ_{i=1}^m u_i fi(x) ) .

2. There exists a u* ∈ U+ with φ(u*) = max_{u∈U+} φ(u) such that dom H* ⊂ X*(u*), where

X*(u*) = Arg min_{x∈X} ( f0(x,H*) + Σ_{i=1}^m u_i* fi(x) ) .

3. NUMERICAL METHOD FOR SOLVING PROBLEMS WITH LINEAR CONSTRAINTS

It is now possible to construct a method which finds points satisfying the necessary conditions of Lemma 1 or, in the convex case, global minima. This method is of the linearization type.

Algorithm 1

1. Begin with an initial distribution H^1.

2. Suppose we have an approximate solution H^s before starting iteration number s. Then at the s-th iteration we do the following:

(i) Find a distribution H̄^s ∈ G such that

∫_X f0(x,H^s) dH̄^s(x) ≤ z_s + ε_s ,    (9)

where

z_s = inf_{H∈G} ∫_X f0(x,H^s) dH(x)    (10)

and ε_s > 0 is the accuracy with which problem (9) is solved. It is not necessary to know the value of ε_s, only that ε_s → 0.

(ii) Check whether functional Ψ(H) decreases in the direction H̄^s − H^s. If not, return to step (i) and solve problem (9) with higher accuracy. Otherwise go to step (iii).

(iii) Choose a stepsize ρ_s : 0 < ρ_s ≤ 1 and calculate a new approximation to the optimal solution:

H^{s+1} = (1 − ρ_s) H^s + ρ_s H̄^s .

Then go to step (i).
Remark. The stepsize can be chosen according to a number of different rules:

(a) ρ_s → 0 ,  Σ_{s=0}^∞ ρ_s = ∞    (11)

(b) ρ_s = arg min_{α≥0} Ψ(H^s + α(H̄^s − H^s))    (12)

(c) Take a sequence α_s with α_s → 0 , Σ_{s=0}^∞ α_s = ∞ , and

ρ_s = min { arg min_{α≥0} Ψ(H^s + α(H̄^s − H^s)) , α_s } .    (13)
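Over a fixed grid, Algorithm 1 acts on the vector of atom weights, and with rule (b)/(12) it reads as the classical Frank-Wolfe scheme. The sketch below is an editorial illustration on a hypothetical unconstrained instance: G is taken as all distributions on a grid in [0,1] and Ψ(H) = (∫x dH − 0.3)², for which expansion (8) gives f0(x,H) = 2(μ_H − 0.3)x with μ_H the mean of H.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 11)       # grid approximation of X = [0, 1]
target = 0.3                          # hypothetical constant in Psi

def psi(p):                           # Psi(H) = (mean(H) - target)^2
    return (p @ xs - target) ** 2

p = np.zeros_like(xs); p[-1] = 1.0    # initial distribution H^1 = point mass at x = 1
for s in range(20):
    mu = p @ xs
    grad = 2.0 * (mu - target) * xs   # f0(x, H^s) from expansion (8)
    j = int(np.argmin(grad))          # subproblem (9): point mass at the minimizer
    xbar = xs[j]
    if abs(mu - xbar) < 1e-15:
        break
    # exact line search, rule (12): closed form for this quadratic Psi
    alpha = min(1.0, max(0.0, (mu - target) / (mu - xbar)))
    p *= (1.0 - alpha)                # H^{s+1} = (1 - alpha) H^s + alpha * delta_xbar
    p[j] += alpha

mu_final, psi_final = p @ xs, psi(p)
```

With the exact line search the mean reaches the target essentially in one step; rules (a) and (c) would instead drive ρ_s → 0 and converge more slowly.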
We now introduce a topology on set G. We shall use the weak (star) convergence topology, which by definition is the weakest topology in the space of all probability measures such that the map

H → ∫_X g(x) dH(x)

is continuous whenever g(x) is bounded and continuous. Note that in this topology the set of all measures with compact support X is compact. In the discussions that follow we shall use this topology when speaking about the continuity of certain functionals with respect to probabilistic measures.

Let ν(H1,H2) denote some distance between distributions H1 and H2 which induces the weak (star) topology on the set of all distributions; for example, the Levy-Prokhorov distance would do.
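Weak (star) convergence can be observed numerically (an editorial aside): for discrete measures H_n concentrated on finer and finer grids of [0,1], the integrals ∫g dH_n approach ∫g dH for bounded continuous g. The check below uses g = cos, uniform weights at grid midpoints, and Lebesgue measure on [0,1] as the limit H; all choices are illustrative.

```python
import math

def integral_under_Hn(g, n):
    # H_n: uniform weights 1/n at the n midpoints (i + 0.5)/n of [0, 1]
    return sum(g((i + 0.5) / n) for i in range(n)) / n

g = math.cos
exact = math.sin(1.0)                 # integral of cos over [0, 1] under Lebesgue H
errs = [abs(integral_under_Hn(g, n) - exact) for n in (10, 100, 1000)]
```

The errors shrink as the grids refine, which is exactly the mode of convergence in which the continuity of the functionals below is understood.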
We shall now prove the convergence of the algorithm given above.

Theorem 3

Suppose that the following statements are true:

1. X ⊂ R^n is a compact set.

2. Functional Ψ(H) satisfies (8), where r(α,H*,H)/α → 0 uniformly over H* ∈ G , H ∈ G.

3. Function f0(x,H) is continuous with respect to x ∈ X and satisfies the Lipschitz condition with respect to H ∈ G:

| f0(x,H1) − f0(x,H2) | ≤ K ν(H1,H2)

for H1 ∈ G , H2 ∈ G , x ∈ X. Functions fi(x) , i = 1,...,m are continuous with respect to x ∈ X.

4. ε_s → 0.

5. Stepsize ρ_s is chosen according to one of (11)-(13).

Then

lim_{s→∞} inf_{H∈G} ∫_X f0(x,H^s) d(H(x) − H^s(x)) = 0

and for all limit distributions H* of sequence H^s

inf_{H∈G} ∫_X f0(x,H*) d(H(x) − H*(x)) = 0 .
Proof

1. Note that under assumption 1 of the theorem set G is compact in the weak (star) topology, and by assumption 3 function f0(x,H) is bounded on X × G. This together with (8) implies the continuity of functional Ψ(H) on G. Therefore Ψ(H) has a minimum on G.

2. The argument given in the proof of Lemma 1 leads to the following inequality:

Ψ(H^{s+1}) ≤ Ψ(H^s) − ρ_s ( y_s − ε_s ) + r_1(ρ_s) ,    (14)

where r_1(ρ_s)/ρ_s → 0 as ρ_s → 0 , ε_s → 0 due to assumption 4 of the theorem, and

y_s = − inf_{H∈G} ∫_X f0(x,H^s) d(H(x) − H^s(x)) ≥ 0 .

The remaining part of the proof depends on the way in which the stepsize ρ_s is chosen.
2(a) Suppose that ρ_s is chosen according to (12). Define

σ(δ) = sup { ρ : r_1(ρ) ≤ δρ } ,  δ > 0 .

From assumption 2 of the theorem we know that σ(δ) > 0 if δ > 0. Taking δ = y_s/2 if y_s > ε_s and evaluating (14) at the trial step σ(δ) (the line-search step (12) gives a value of Ψ(H^{s+1}) no larger than this), we get:

Ψ(H^{s+1}) ≤ Ψ(H^s) − σ(y_s/2) ( y_s/2 − ε_s ) .    (15)

Since Ψ is bounded below, inequality (15) immediately gives max { y_s − ε_s , 0 } → 0, which implies that y_s → 0 because ε_s → 0.

2(b) Now let ρ_s be chosen according to (11).
Suppose that there exists an s̄ such that for s > s̄

y_s − ε_s − r_1(ρ_s)/ρ_s ≥ a ,

where a > 0. Now from (14) we get:

Ψ(H^{s+1}) ≤ Ψ(H^s) − a ρ_s .    (16)

Summing (16) from s̄ to k we obtain

Ψ(H^{k+1}) ≤ Ψ(H^{s̄}) − a Σ_{i=s̄}^k ρ_i ,

which contradicts Ψ(H) > −∞ because Σ_{i=0}^∞ ρ_i = ∞. Therefore there is a subsequence n_k such that

max { 0 , y_{n_k} − ε_{n_k} − r_1(ρ_{n_k})/ρ_{n_k} } → 0 .
Now suppose that there is a subsequence m_k such that

y_{m_k} ≥ a > 0 .

We may assume without loss of generality that m_k < n_k < m_{k+1}. Let us take a sequence l_k such that

y_i − ε_i − r_1(ρ_i)/ρ_i ≥ a  for m_k ≤ i < l_k ,  y_{l_k+1} − ε_{l_k+1} − r_1(ρ_{l_k+1})/ρ_{l_k+1} < a .

We shall now estimate the value of functional Ψ(H) on elements of sequence H^{l_k}. From (14) we can show that

Ψ(H^{l_k}) ≤ Ψ(H^{m_k}) − a Σ_{i=m_k}^{l_k−1} ρ_i .    (17)

To proceed with this estimate further it is necessary to estimate Σ_{i=m_k}^{l_k−1} ρ_i, which can be done as follows:

| y_{l_k} − y_{m_k} | ≥ a − ε̃_{l_k} ,    (18)

where ε̃_{l_k} → 0 as k → ∞. Now let us estimate the term being summed. From the definition of the algorithm we have

ν(H^i , H^{i+1}) ≤ C ρ_i ,    (19)

where C is some positive constant. Assumption 3 of the theorem together with (19) yields the following estimate:

| f0(x,H^{i+1}) − f0(x,H^i) | ≤ K ν(H^i , H^{i+1}) ≤ C_1 ρ_i ,

leading to

| y_{i+1} − y_i | ≤ K_1 ρ_i .    (20)

Here we also used the fact that the function f0(x,H) is bounded on set X × G. Combining (18) and (20) we get:

Σ_{i=m_k}^{l_k−1} ρ_i ≥ ( a − ε̃_{l_k} ) / K_1 .    (21)

Substituting (21) into (17) yields

Ψ(H^{l_k}) ≤ Ψ(H^{m_k}) − a ( a − ε̃_{l_k} ) / K_1 ,

which contradicts inf_{H∈G} Ψ(H) > −∞ since ε̃_{l_k} → 0. This again gives y_s → 0.

2(c) Proof for the case when the stepsize is chosen according to (13) is similar to 2(a) and 2(b).

This completes the proof.
Remark. Assumption 3 of the theorem can be easily stated without introducing the notions of weak (star) topology and Levy-Prokhorov distance. One possible way is to assume the following:

| f0(x,H1) − f0(x,H2) | ≤ | ∫_X λ(x,H1,H2) d(H1(x) − H2(x)) | ,

where | λ(x,H1,H2) | ≤ K < ∞ for some positive K and H1 ∈ G , H2 ∈ G , x ∈ X, and

| f0(x,H) | ≤ C < ∞

for some positive C and H ∈ G , x ∈ X.
In order to obtain a practical method from the general framework described in this section, we have to specify ways of performing step 2(i). This is the purpose of the next section.
4. SOLVING THE LINEAR SUBPROBLEM USING CUTTING-PLANE TECHNIQUES

We shall now consider a method for solving linear subproblem (9) which reduces step 2(i) of Algorithm 1 to the solution of one finite-dimensional linear programming problem. This method uses some of the same ideas as generalized linear programming [8] and cutting-plane algorithms [9], and has much in common with the method proposed in [7] for solution of linear problem (1)-(3). The method is based on the duality relations for problem (1)-(3), which were studied in [7].
Let us assume that the assumptions of Theorems 1 and 3 are fulfilled. Then, according to Theorem 1,

z_s = max_{u∈U+} φ^s(u) ,

where

φ^s(u) = min_{x∈X} [ f0(x,H^s) + Σ_{i=1}^m u_i fi(x) ] .

Suppose that distribution H^s is fixed. Then it is possible to solve the problem

max_{u∈U+} φ^s(u)    (22)

with the help of the following cutting-plane method.
Algorithm 2

1. First select m+1 points x^1 , x^2 , ... , x^{m+1} and set ν = m+1. These points are used to approximate function φ^s(u) by the function

φ^s(u,0) = min_{1≤j≤ν} ( f0(x^j,H^s) + Σ_{i=1}^m u_i fi(x^j) ) .

The initial approximation u^0 to the solution of problem (22) maximizes the function φ^s(u,0):

u^0 = arg max_{u∈U+} φ^s(u,0) ,

so that we have to solve a linear programming problem.
2. Suppose that before beginning iteration number k we have ν points x^1 , x^2 , ... , x^ν and the current estimate of the maximum u^{k−1}. Then iteration number k involves the following stages:

(i) Take ν = ν+1.

(ii) Find

x^ν = arg min_{x∈X} ( f0(x,H^s) + Σ_{i=1}^m u_i^{k−1} fi(x) ) .    (23)

(iii) Calculate the next approximation to the optimal solution:

u^k = arg max_{u∈U+} φ^s(u,k) ,    (24)

where φ^s(u,k) is the current approximation of function φ^s(u):

φ^s(u,k) = min_{1≤j≤ν} ( f0(x^j,H^s) + Σ_{i=1}^m u_i fi(x^j) ) .
It should be realized that this is only a general framework for solution: much has already been done to avoid increasing the number of points x^j stored and to implement approximate solutions of problem (23) (for details see [7], [10]). The advantage of this method is that it becomes possible to obtain approximate solutions of the initial problem (9) during the solution of problem (22). These approximations are discrete distributions containing no more than m+1 points with positive probabilities:

H̄^s = { (x^1,p̄_1) , ... , (x^ν,p̄_ν) } ,

where the p̄_i are nonzero solutions of the following linear programming problem:

min_p Σ_{i=1}^ν p_i f0(x^i,H^s)  subject to  Σ_{i=1}^ν p_i f_j(x^i) ≤ 0 , j = 1,...,m ;  Σ_{i=1}^ν p_i = 1 ;  p_i ≥ 0 ,

and the x^i are the corresponding points. Note that the above problem is actually dual to the linear program equivalent to (24), and therefore both problems can be solved simultaneously.
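A compact sketch of Algorithm 2 follows (an editorial illustration with the same hypothetical toy data as before: f0(x,H^s) = x, f1(x) = 0.5 − x on X = [0,1], m = 1). With a single dual variable, step (iii) is carried out here by maximizing the piecewise-linear model φ^s(u,k) over a grid of u values rather than by the LP of the paper, which would be used for larger m.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 1001)   # grid over X = [0, 1]
us = np.linspace(0.0, 3.0, 3001)   # truncated grid over U+ (m = 1)
f0, f1 = xs, 0.5 - xs              # f0(., H^s) and f1 for the toy instance

def phi_true(u):
    # phi^s(u): exact minimum of the Lagrangian over the X grid
    return float(np.min(f0 + u * f1))

pts = [0, len(xs) - 1]             # indices of the initial m + 1 = 2 cut points
u_cur = 0.0
for k in range(10):
    # step (iii): maximize the cutting-plane model phi^s(u, k) over the u grid
    model = np.min([f0[j] + us * f1[j] for j in pts], axis=0)
    u_cur = float(us[np.argmax(model)])
    # step (ii): the Lagrangian minimizer at u_cur supplies the next cut
    pts.append(int(np.argmin(f0 + u_cur * f1)))

final_val = phi_true(u_cur)        # approaches the dual optimum (0.5 here)
```

The model value never falls below the true dual function, so the method closes the gap from above while φ^s(u_cur) approaches the optimum from below.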
What we actually need while implementing Algorithm 1 is not a precise solution of problem (9) at each step, but rather to track its changing extremum value. The approximate solutions of (9) may be very rough for the first few iterations, gradually increasing in accuracy. It appears that Algorithm 2 can be used to follow the extremum value by tracking the changing optimal solution of dual problem (22). It is only necessary to make one iteration of Algorithm 2 for each iteration of Algorithm 1.
In what follows we shall simplify the notation, writing f0(x,H^s) = f_s^0(x). We now want an algorithm which allows us to follow the optimal solution of problem (22) as the current distribution H^s changes.
Algorithm 2a

1. First select m+1 points x^1 , x^2 , ... , x^{m+1} and set ν = m+1. The initial approximation u^0 to the solution of problem (22) maximizes the function φ^0(u,0):

u^0 = arg max_{u∈U+} φ^0(u,0) ,

φ^0(u,0) = min_{1≤j≤ν} ( f_0^0(x^j) + Σ_{i=1}^m u_i fi(x^j) ) .

2. Suppose that before beginning iteration number s we have ν points x^1 , x^2 , ... , x^ν and the current estimate of the maximum u^{s−1}. Then iteration number s involves the following stages:

(i) Take ν = ν+1.

(ii) Find

x^ν = arg min_{x∈X} ( f_s^0(x) + Σ_{i=1}^m u_i^{s−1} fi(x) ) .

(iii) Calculate the next approximation to the optimal solution:

u^s = arg max_{u∈U+} φ^s(u,s) .
The following theorem proves the convergence of this method.

Theorem 4

Assume that:

1. Set X is compact and functions fi(x) , i = 1,...,m are continuous.

2. Functions f_s^0(x) are continuous for x ∈ X uniformly in s and

max_{x∈X} | f_{s+1}^0(x) − f_s^0(x) | → 0

as s → ∞.

3. There exists a γ > 0 such that for any u ∈ U+ , ||u|| = 1 there exists an i ∈ {1,...,m+1} for which

Σ_{j=1}^m u_j f_j(x^i) ≤ −γ .

Then

max_{u∈U+} φ^s(u) − φ^s(u^s) → 0 .

Proof
1. We shall first prove that sequence u^s is bounded. Take any point ū ∈ U+ and estimate the value of φ^s(u,s) at this point. Select i ≤ m+1 such that

Σ_{j=1}^m ū_j f_j(x^i) ≤ −γ || ū || ,

which from assumption 3 of the theorem will always be possible. Then

φ^s(ū,s) ≤ f_s^0(x^i) − γ || ū || .    (25)

The uniform continuity of functions f_s^0(x) implies the existence of a constant C such that | f_s^0(x) | ≤ C. Combining this with (25) yields

φ^s(ū,s) ≤ C − γ || ū ||

and

φ^s(0,s) ≥ −C .

These two inequalities lead to

φ^s(0,s) − φ^s(ū,s) ≥ γ || ū || − 2C ,

which implies that the norm of any point u* which maximizes φ^s(u,s) is bounded, i.e.,

|| u* || ≤ 2C / γ ,

where constants C and γ do not depend on s. This proves that sequence u^s is bounded, because u^s maximizes φ^s(u,s).
2. Now suppose, arguing by contradiction, that the theorem is not true and that there exists an a > 0 and a sequence s_k such that

max_{u∈U+} φ^{s_k}(u) − φ^{s_k}(u^{s_k}) > a .    (26)

From the boundedness of sequence u^s and assumptions 1 and 2 we may assume without loss of generality that

u^{s_k} → u* ,  u^{s_k+1} → u* ,  φ^{s_k}(u*) → φ* ,    (27)

where ν_k = s_k + 1 + m + 1 denotes the number of points stored at the start of iteration s_k + 1. We shall now estimate the difference

max_{u∈U+} φ^{s_k+1}(u) − φ^{s_k+1}(u^{s_k+1}) .    (28)

We have

| φ^s(u) − φ^s(v) | ≤ K || u − v ||

for some K > 0 from assumption 1. We also know that φ^{s_k}(u*) → φ*, which together with assumptions 1 and 2 implies that

φ^{s_k+1}(u^{s_k}) → φ*  and  φ^{s_k}(u^{s_k}) → φ* .

Hence

φ^{s_k+1}(u^{s_k}) ≥ φ^{s_k}(u^{s_k}) − c_1(k) ,

where c_1(k) → 0 as k → ∞. We have assumed that max_{x∈X} | f_{s+1}^0(x) − f_s^0(x) | → 0 as s → ∞, and this gives

φ^{s_k+1}(u^{s_k}, s_k) ≥ φ^{s_k}(u^{s_k}, s_k) − c_2(k) ,    (29)

where c_2(k) → 0 as k → ∞. But from the algorithm the new point x^{ν_k} minimizes the current Lagrangian, so that

f_{s_k+1}^0(x^{ν_k}) + Σ_{i=1}^m u_i^{s_k} fi(x^{ν_k}) ≥ min_{1≤j≤ν_k−1} [ f_{s_k+1}^0(x^j) + Σ_{i=1}^m u_i^{s_k} fi(x^j) ]

≥ min_{1≤j≤ν_k−1} [ f_{s_k+1}^0(x^j) + Σ_{i=1}^m u_i^{s_k+1} fi(x^j) ] − K || u^{s_k} − u^{s_k+1} ||

≥ max_{u∈U+} min_{1≤j≤ν_k−1} [ f_{s_k+1}^0(x^j) + Σ_{i=1}^m u_i fi(x^j) ] − K || u^{s_k} − u^{s_k+1} ||

≥ max_{u∈U+} φ^{s_k+1}(u) − K || u^{s_k} − u^{s_k+1} || .    (30)

Combining (26)-(30) gives

max_{u∈U+} φ^{s_k+1}(u) − φ^{s_k+1}(u^{s_k+1}) ≤ c_1(k) + c_2(k) + 2K || u^{s_k} − u^{s_k+1} || ≤ c_5(k) ,

where c_5(k) → 0 as k → ∞. This contradicts the initial assumption (26), since the sequence s_k may be chosen so that (26) also holds at s_k + 1, and thus completes the proof.
We shall now give an algorithm based on the results obtained in Sections 3 and 4. It is assumed that the conditions of Theorem 3 are met.

Algorithm 1a

1. We begin by choosing an initial distribution H^0 which satisfies assumption 3 of Theorem 4. Let x^1 , x^2 , ... , x^{m+1} be the m+1 points which form the initial distribution H^0. Consider the linear programming problem (31)-(33) posed in the dual variables u, together with its dual (34)-(36) posed in the probabilities p. Assumption 3 of Theorem 4 is satisfied if and only if problem (31)-(33) has a solution ū_1 , ū_2 , ... , ū_{m+1} such that ū_{m+1} < 0. If this is the case, the solution is the same as that of the dual problem (34)-(36), in which the optimal value of p_{m+2} is less than 0. The distribution H̄, where

H̄ = { (x^1,p̄_1) , (x^2,p̄_2) , ... , (x^{m+1},p̄_{m+1}) }

and p̄ is the optimal solution of (34)-(36), has the property

∫_X fi(x) dH̄(x) < 0 , i = 1,...,m ,    (37)

and therefore condition 2 of Theorem 1 is satisfied. The converse is also true (see [7]). Thus if condition 2 of Theorem 1 is fulfilled, it is possible to find m+1 points x^1 , ... , x^{m+1} such that problem (34)-(36) has a solution p̄_1 , ... , p̄_{m+2} with p̄_{m+2} < 0. This guarantees that condition 3 of Theorem 4 is fulfilled. These points, together with the probabilities p̄_i, now form the desired initial distribution H^0, which is a solution of the following problem:

min μ    (38)

∫_X f_j(x) dH(x) ≤ μ ,  j = 1,...,m ,    (39)

over distributions H concentrated on X.    (40)

This problem is of the form (1)-(3) and may be solved using Algorithm 2 (see [7]). We do not need to solve (38)-(40) exactly: we can stop when the current solution satisfies (37). This will occur after a finite number of iterations of Algorithm 2 if condition 2 of Theorem 1 is satisfied. The initial step of Algorithm 1a therefore involves the following stages:
(i) Take ν_0 = m+1, where ν_s denotes the number of points in distribution H^s.

(ii) Obtain the initial distribution H^1 by applying Algorithm 2 to problem (38)-(40). This algorithm will produce a sequence of distributions H̄^s which after substitution in problem (38)-(40) gives corresponding values μ_s. Take as H^1 the first H̄^s with μ_s < 0.

(iii) Take the initial point u^1 ∈ U+ for solution of the dual problem.

2. Suppose that before beginning iteration number s we have the current approximation to the optimal solution

H^s = { (x^1,p_1^s) , ... , (x^{ν_s},p_{ν_s}^s) }

and point u^s.
Iteration number s then involves the following operations, where steps (i)-(iii) correspond to steps (i)-(iii) of Algorithm 2a and step (i) of Algorithm 1, and steps (iv)-(v) correspond to steps (ii)-(iii) of Algorithm 1:

(i) Take ν_{s+1} = ν_s + 1.

(ii) Find a new point x^{ν_s+1}, where

x^{ν_s+1} = arg min_{x∈X} [ f0(x,H^s) + Σ_{i=1}^m u_i^s fi(x) ] .    (41)
(iii) Solve the linear programming problem (42)-(44) together with its dual. This will give us the next approximation to the solution of the dual problem, u^{s+1}, and also vector

p̄^{s+1} = ( p̄_1^{s+1} , ... , p̄_{ν_s+1}^{s+1} ) ,

which will have no more than m+1 nonzero elements, say ( p̄_{k_1}^{s+1} , ... , p̄_{k_{m+1}}^{s+1} ).

(iv) Take the family of distributions

H^{s+1}(α) = { (x^1,p_1^{s+1}(α)) , ... , (x^{ν_s+1},p_{ν_s+1}^{s+1}(α)) } ,

where

p_l^{s+1}(α) = (1 − α) p_l^s  if l ≠ k_j for j = 1,...,m+1 ;
p_l^{s+1}(α) = (1 − α) p_l^s + α p̄_l^{s+1}  otherwise.

Then find

ᾱ_s = arg min_{0≤α≤1} Ψ(H^{s+1}(α))

and take ρ_s = min { ᾱ_s , β_s } , where β_s is a sequence with β_s → 0 , Σ_s β_s = ∞ (cf. (13)).

(v) Take

H^{s+1} = H^{s+1}(ρ_s)

and go to step 2(i).
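Step (iv) is a convex mixture in weight space: every current weight is damped by (1 − α), and the atoms named by the dual LP receive an extra α-share. A minimal sketch of this update (the numbers below are hypothetical, chosen only to exercise the formula):

```python
import numpy as np

def mix(p_s, p_bar, alpha):
    """Weights of H^{s+1}(alpha): p(alpha) = (1 - alpha) p_s + alpha p_bar,
    where p_bar is the LP solution with at most m+1 nonzero entries."""
    return (1.0 - alpha) * np.asarray(p_s) + alpha * np.asarray(p_bar)

# current 5-point distribution and an LP answer supported on m+1 = 2 atoms
p_s   = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
p_bar = np.array([0.0, 0.7, 0.0, 0.3, 0.0])

p_new = mix(p_s, p_bar, alpha=0.4)
```

Since both inputs are probability vectors, the mixture stays a probability vector for any α in [0,1]; the stepsize ρ_s = min{ᾱ_s, β_s} is then obtained by line search over this one-parameter family.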
Remark. It is not necessary to solve nonlinear programming problem (41) with great accuracy. All we need is a point x^{ν_s+1} such that

lim_{s→∞} { [ f0(x^{ν_s+1},H^s) + Σ_{i=1}^m u_i^s fi(x^{ν_s+1}) ] − min_{x∈X} [ f0(x,H^s) + Σ_{i=1}^m u_i^s fi(x) ] } = 0 .

It is also possible to avoid increases in the dimension of linear programming problem (42)-(44) by considering only points x^j which satisfy some additional inequality (see [7]).
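For the initialization phase, a crude but often sufficient shortcut to (38)-(40) (an editorial aside, not part of the original algorithm) is to search among point masses: for H = δ_x the left-hand side of (39) is just max_j f_j(x), so any x with max_j f_j(x) < 0 yields a strictly feasible H^0 satisfying (37). The constraint functions below are hypothetical.

```python
import numpy as np

# hypothetical constraints on X = [0, 1]: E[x] <= 0.8 and E[x] >= 0.1
fs = [lambda x: x - 0.8, lambda x: 0.1 - x]

xs = np.linspace(0.0, 1.0, 1001)
worst = np.max([f(xs) for f in fs], axis=0)   # max_j f_j(x) at each grid point
i0 = int(np.argmin(worst))
x0, val0 = float(xs[i0]), float(worst[i0])    # point mass at x0; value of (39)
```

Here the minimizer sits where the two constraints balance (x = 0.45, value −0.35), so δ_{0.45} is a valid strictly feasible starting distribution; when no single atom works, the full problem (38)-(40) must be solved as described above.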
Remark. Algorithm 2a adds one additional point to the current approximation of the optimal solution H^s at each iteration, which may not be convenient if we have restrictions on the amount of memory available for storing the distribution. In this case measures should be taken to avoid this expansion, perhaps at the expense of accuracy. Some possible ways of achieving this are discussed below.

1. Suppose we want to find the best possible approximation, in no more than N points, to the optimal solution of (5)-(7). (It is assumed that some additional memory is available for storing N further points.) We then proceed as follows:

(i) Run Algorithm 2a until the current distribution H^s contains 2N points. Arrange these points in order of decreasing probabilities:

p_{i_1}^s ≥ p_{i_2}^s ≥ ... ≥ p_{i_{2N}}^s .

(ii) Start Algorithm 2a again from the distribution formed by the N points of largest probability, with the probabilities renormalized to sum to one.

(iii) Continue this process as long as the new 2N-point distribution has a better value of Ψ(H) than the previous one.

2. Suppose that we want to find an approximation to the optimal solution using at most N points.

(i) Run Algorithm 2a until the current distribution contains N points. Let p̄ = max_{1≤i≤N} p_i^s. Divide the set { 1 , ... , N } into two subsets:

I_1 = { i : p_i^s ≥ κ p̄ } ,  I_2 = { i : p_i^s < κ p̄ } ,

where κ > 0 should be chosen previously.

(ii) Start Algorithm 2a again from the distribution formed by the points with indices in I_1, with the probabilities renormalized to sum to one.

(iii) Continue this process as long as the value of Ψ(H) in consecutive N-point distributions improves and set I_2 is not empty.

3. Another possibility is to use approximation techniques to fit discrete distributions by continuous ones. For example, splines can be used when the dimensions are small or when the distributions H^s have independent components. This approach needs further study.
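The first memory-saving rule above amounts to keeping the N most probable atoms and renormalizing. A minimal sketch (the helper name and data are hypothetical):

```python
def prune(points, probs, N):
    """Keep the N atoms of largest probability and renormalize to a distribution."""
    ranked = sorted(zip(probs, points), reverse=True)[:N]
    total = sum(p for p, _ in ranked)
    return [x for _, x in ranked], [p / total for p, _ in ranked]

# prune a 4-point distribution down to its 2 most probable atoms
pts, prb = prune([0.1, 0.4, 0.7, 0.9], [0.05, 0.50, 0.30, 0.15], N=2)
```

Renormalization keeps the result a probability distribution, but the discarded mass is exactly the accuracy sacrificed, which is why the scheme restarts Algorithm 2a afterwards and monitors Ψ(H).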
The convergence of Algorithm 1a follows directly from Theorems 3 and 4. The next result may be derived using the remark following Theorem 3.

Theorem 5

Let the following conditions be satisfied:

1. X is a compact set.

2. Functions fi(x) are continuous for x ∈ X , i = 1,...,m.

3. Ψ(H) is convex and satisfies (8) and

| f0(x,H1) − f0(x,H2) | ≤ | ∫_X λ(x,H1,H2) d(H1(x) − H2(x)) |

for some summable λ(x,H1,H2) such that | λ(x,H1,H2) | ≤ K < ∞ for any H1 , H2 ∈ G.

4. There exists an H̄(x) such that

∫_X fi(x) dH̄(x) < 0 ,  i = 1,...,m .

Then

lim_{s→∞} [ Ψ(H^s) − min_{H∈G} Ψ(H) ] = 0 .

It is interesting to compare this algorithm with the methods for solving stochastic problems with recourse recently proposed by Wets [11]. Although applied to quite different problems, they both use generalized linear programming techniques and on each iteration require solution of one linear programming problem and one nonlinear optimization problem.
More complex problems with nonlinear constraints can be treated in the same way. If the definition of set G of constraints is changed to the following:

G = { H : Ψ_i(H) ≤ 0 , i = 1,...,m } ,

and functionals Ψ_i(H) have directional derivatives of form (8), then analogues of Lemma 1 and Theorem 2 will hold. In this case we can construct a feasible-direction type algorithm, which inherits all of the important characteristics of Algorithm 1. The same ideas can be used to solve minimax problems which depend on distribution functions. This, together with the results of some numerical experiments, will provide the subject of a subsequent paper.
REFERENCES

1. Yu. Ermoliev, "Method for stochastic programming in randomized strategies," Kibernetika 1 (1970).

2. Yu. Ermoliev and C. Nedeva, "Stochastic optimization problems with partially known distribution functions," CP-82-60, International Institute for Applied Systems Analysis, Laxenburg, Austria (1982).

3. A. Golodnikov, "Optimization problems with distribution functions," Dissertation, V. Glushkov Institute of Cybernetics, Ukrainian Academy of Sciences, Kiev (1979).

4. P.J. Huber, Robust Statistics, John Wiley and Sons (1981).

5. M.G. Krein, "The ideas of P.L. Tchebysheff and A.A. Markov in the theory of limiting values of integrals and their further developments," Amer. Math. Soc. Translations, Series 2, 12 (1951), pp. 1-122.

6. A. Golodnikov, A. Gaivoronski and Le Ding Fung, "A method for solving limit extremum problems applied to the minimization of functionals," Kibernetika 1 (1980).

7. Yu. Ermoliev, A. Gaivoronski and C. Nedeva, "Stochastic optimization problems with incomplete information on distribution functions," WP-83-xx, International Institute for Applied Systems Analysis, Laxenburg, Austria (1983).

8. G.B. Dantzig, Linear Programming and Extensions, Princeton University Press (1963).

9. J.E. Kelley, "The cutting-plane method for solving convex programs," J. Soc. Indust. Appl. Math. 8 (1960), pp. 703-712.

10. B. Curtis Eaves and W.I. Zangwill, "Generalized cutting plane algorithms," SIAM J. Control 9(4) (1971).

11. R.J.-B. Wets, "Stochastic programming: solution techniques and approximation schemes," in Mathematical Programming: The State of the Art, Springer-Verlag (1983).