NOT FOR QUOTATION WITHOUT PERMISSION
OF THE AUTHOR

OPTIMIZATION OF FUNCTIONALS WHICH DEPEND ON DISTRIBUTION FUNCTIONS: 1. NONLINEAR FUNCTIONAL AND LINEAR CONSTRAINTS

A. Gaivoronski

November 1983 WP-83-114
Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute or of its National Member Organizations.

INTERNATIONAL INSTITUTE FOR APPLIED SYSTEMS ANALYSIS, 2361 Laxenburg, Austria
PREFACE
The development of optimization techniques for solving complex decision problems under uncertainty is currently a major topic of research in the System and Decision Sciences Area at IIASA, and in the Adaptation and Optimization group in particular. This paper deals with methods for the solution of problems in which the objective function depends on probability distributions. Such problems are common in reliability theory and various other branches of operations research, but methods for dealing with them have only recently begun to emerge.

ANDRZEJ WIERZBICKI
Chairman
System and Decision Sciences
ABSTRACT
The main purpose of this paper is to discuss numerical optimization procedures for problems in which both the objective function and the constraints depend on distribution functions. The objective function is assumed to be nonlinear and to have directional derivatives, while the constraints are taken as linear. The proposed algorithm involves linearization of the objective function at the current point and solution of an auxiliary linear subproblem. This last problem is solved using duality relations and cutting-plane techniques.
OPTIMIZATION OF FUNCTIONALS WHICH DEPEND ON DISTRIBUTION FUNCTIONS: 1. NONLINEAR FUNCTIONAL AND LINEAR CONSTRAINTS
A. Gaivoronski
1. INTRODUCTION
The purpose of this paper is to present numerical methods for solving optimization problems in which both the objective function and the constraints depend on distribution functions. Such problems occur in stochastic programming [1,2], reliability theory and various branches of operations research (surveyed in [3]), and robust statistics [4], among others.
If both the objective function and the constraints are linear with respect to the distribution functions then the problem can be stated as follows:

min ∫_X f0(x) dH(x)    (1)

subject to

∫_X fi(x) dH(x) ≤ 0 ,  i = 1,...,m    (2)

H ∈ G0 ,    (3)

where G0 denotes the set of all distribution functions concentrated on X, and X is a set in Euclidean space R^n.
Some specific cases of problem (1)-(3) arise in the theory of Markovian moments and can be solved analytically [5]. However, the success of analytical methods is very limited even in the linear case (1)-(3) and therefore numerical algorithms are needed. Such methods began to appear in the middle of the last decade.
One idea is to approximate set X by a sequence of finite subsets X_1, X_2, ..., X_s, ... . This sequence of sets has the property that

max_{x∈X} min_{y∈X_s} ||x − y|| → 0

as s tends to infinity, i.e., the greatest distance between points in set X and set X_s tends to zero. Let G_s be the set of all distribution functions which correspond to probabilistic measures concentrated in points from X_s:

G_s = { (x_i^s , p_i) , i = 1,...,n_s :  Σ_{i=1}^{n_s} p_i = 1 ,  p_i ≥ 0 } .

If we include one more constraint,

H ∈ G_s ,    (4)

problem (1)-(4) becomes a finite-dimensional linear programming problem; this raises the possibility of approximating the original problem (1)-(3) by a sequence of finite-dimensional problems (1)-(4). This idea was explored in [3,6], where it was also used to solve nonlinear and minimax problems involving distribution functions. However, this approach can be used only for sets X of small dimension (in fact not exceeding three) due to the high dimensionality of the associated linear programming problems.
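As an illustration of this discretization (not part of the original text), the sketch below approximates X = [0,1] by a grid and solves the resulting problem (1)-(4) for a single constraint, m = 1. The choice f0(x) = x, f1(x) = 0.5 − x is a hypothetical toy instance; since an optimal measure can be taken to have at most m+1 = 2 support points (see Theorem 1 below), the LP is solved here by enumerating one- and two-point supports instead of calling an LP solver.

```python
import numpy as np

# Toy instance (hypothetical, for illustration only):
# minimize E_H[f0(x)] subject to E_H[f1(x)] <= 0 on X = [0, 1],
# i.e. minimize the mean subject to mean >= 0.5.
f0 = lambda x: x
f1 = lambda x: 0.5 - x

def solve_discretized(xs, f0, f1):
    """Solve min sum p_j f0(x_j) s.t. sum p_j f1(x_j) <= 0, sum p_j = 1, p >= 0,
    by enumerating supports of at most two grid points."""
    c, g = f0(xs), f1(xs)
    best_val, best = np.inf, None
    n = len(xs)
    for i in range(n):
        # one-point candidates: feasible iff g_i <= 0
        if g[i] <= 0 and c[i] < best_val:
            best_val, best = c[i], {xs[i]: 1.0}
        for j in range(i + 1, n):
            # two-point candidate: make the constraint active,
            # p*g_i + (1-p)*g_j = 0  =>  p = g_j / (g_j - g_i)
            if g[i] == g[j]:
                continue
            p = g[j] / (g[j] - g[i])
            if 0.0 <= p <= 1.0:
                val = p * c[i] + (1 - p) * c[j]
                if val < best_val:
                    best_val, best = val, {xs[i]: p, xs[j]: 1 - p}
    return best_val, best

xs = np.linspace(0.0, 1.0, 101)      # finite subset X_s of X
val, measure = solve_discretized(xs, f0, f1)
```

For this instance the optimal value is 0.5 (a point mass at x = 0.5, or any constraint-active two-point mixture with mean 0.5). The quadratic enumeration is exactly the dimensionality problem noted above: it is only workable for very small m.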
Another possible way of solving (1)-(3) is based on the duality relations between problem (1)-(3) and some finite-dimensional minimax problem. First proposed in [1], this idea was taken further in [2,7], where methods of the cutting-plane type are given.
The purpose of this paper is to use this last approach to develop solution techniques for nonlinear problems of the kind:

min Ψ(H)    (5)

subject to

H ∈ G ,    (6)

where G is defined in the following way:

G = { H ∈ G0 : ∫_X fi(x) dH(x) ≤ 0 , i = 1,...,m } .    (7)
We propose an analogue of the linearization (Frank-Wolfe) method in which we are required to solve subproblems of type (1)-(3). It appears that it is not necessary to solve these subproblems precisely: duality relations make it possible to utilize rough solutions of (1)-(3) so that only a limited number of calculations are needed at each iteration.
2. CHARACTERIZATION OF OPTIMAL DISTRIBUTIONS

We shall use the same letter, say H, to denote both the distribution function and the underlying probabilistic measure, where this will not cause confusion. For a given probabilistic measure H we shall denote by B+ the collection of all subsets of X with positive measure H, and by dom H the support set of H, i.e., the smallest closed set of full measure H.
Let us first introduce the class of functionals Ψ(H) considered. What we actually need is some analogue of directional differentiability. Suppose that

Ψ(H* + α(H − H*)) = Ψ(H*) + α ∫_X f0(x,H*) d(H(x) − H*(x)) + r(α,H*,H) ,    (8)

where

r(α,H*,H) = o(α) .

In what follows we assume that functions f0(x,H), fi(x) are such that expressions (7) and (8) are meaningful.
The following simple conditions are necessary, and in the convex case also sufficient, for distribution H* to be a solution of problem (5)-(7):

Lemma 1

If Ψ(H*) ≤ Ψ(H) for some H* ∈ G and all H ∈ G then

∫_X f0(x,H*) dH*(x) = inf_{H∈G} ∫_X f0(x,H*) dH(x) .
Proof

The proof is of the traditional type for necessary conditions. Note that

z* ≡ inf_{H∈G} ∫_X f0(x,H*) dH(x) ≤ ∫_X f0(x,H*) dH*(x)

is always true. Suppose, on the contrary, that there exists a γ > 0 such that

z* ≤ ∫_X f0(x,H*) dH*(x) − 2γ .

Then there exists an H ∈ G such that

∫_X f0(x,H*) dH(x) − ∫_X f0(x,H*) dH*(x) ≤ −γ .

Consider now distributions Hα:

Hα = αH + (1−α)H* .

According to (8),

Ψ(Hα) = Ψ(H*) + α ∫_X f0(x,H*) d(H(x) − H*(x)) + r(α,H*,H) ,

and for small α we obtain Ψ(Hα) < Ψ(H*), which contradicts the optimality of H*. This completes the proof.

Remark. If, additionally, Ψ(H) is convex, i.e.,

Ψ(αH1 + (1−α)H2) ≤ αΨ(H1) + (1−α)Ψ(H2) ,  0 ≤ α ≤ 1 ,

then this lemma also gives sufficient conditions for a global minimum.
Lemma 1 implies that to check the necessary conditions for problem (5)-(7) requires solution of a linear problem of the form (1)-(3) with objective integrand f0(x,H*). The solution of problem (1)-(3) can be characterized through the duality relations summarized in the following theorem, which was proved in [7].

Theorem 1

Suppose that the following assumptions are satisfied:

1. Set X is compact and functions fi(x), i = 0,...,m are continuous on X.

2. co Z ∩ { z : z < 0 } ≠ ∅ , where

Z = { z ∈ R^m : z_i = fi(x) , i = 1,...,m , x ∈ X } .

Then

1. A solution of problem (1)-(3) exists and the optimal value of ∫_X f0(x) dH(x) is equal to the optimal value of the following minimax problem:

max_{u∈U+} φ(u) ,  φ(u) = min_{x∈X} [ f0(x) + Σ_{i=1}^m u_i fi(x) ] ,

where U+ = { u : u ∈ R^m , u ≥ 0 } .

2. For any solution H* of problem (1)-(3) there exists a u* ∈ U+ such that φ(u*) = max_{u∈U+} φ(u) and dom H* ⊂ X*(u*), where

X*(u*) = Arg min_{x∈X} [ f0(x) + Σ_{i=1}^m u_i* fi(x) ] .

3. There exists a finite set X_t = { x^1 , ... , x^t }, t ≤ m+1, and a solution H* of problem (1)-(3) such that dom H* = X_t, i.e., the probabilistic measure H* can be expressed as a collection of t ≤ m+1 pairs { (x^1,p^1) , ... , (x^t,p^t) }. The probabilities p^1, ..., p^t are solutions of the following linear programming problem:

min_p Σ_{j=1}^t p^j f0(x^j)  subject to  Σ_{j=1}^t p^j fi(x^j) ≤ 0 , i = 1,...,m ;  Σ_{j=1}^t p^j = 1 ;  p^j ≥ 0 .

Combining Theorem 1 and Lemma 1 we obtain the following result:
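The duality relation in part 1 of Theorem 1 can be checked numerically on a small example (an editorial illustration, not from the original text). The sketch below uses the hypothetical toy instance f0(x) = x, f1(x) = 0.5 − x on X = [0,1], for which the primal optimum is 0.5, and evaluates the dual function φ(u) on grids.

```python
import numpy as np

# Toy instance (hypothetical): X = [0, 1], f0(x) = x, f1(x) = 0.5 - x.
xs = np.linspace(0.0, 1.0, 1001)
f0, f1 = xs, 0.5 - xs

def phi(u):
    # phi(u) = min_x [ f0(x) + u * f1(x) ], the dual function of (1)-(3)
    return float(np.min(f0 + u * f1))

us = np.linspace(0.0, 3.0, 3001)     # grid over U+ (truncated for the search)
dual_vals = np.array([phi(u) for u in us])
dual_opt = float(dual_vals.max())    # should match the primal optimum 0.5
u_star = float(us[dual_vals.argmax()])
```

Here φ(u) = min{0.5u, 1 − 0.5u} on the grid, its maximum 0.5 is attained at u* = 1, and at u* the Lagrangian is constant in x, so the support condition dom H* ⊂ X*(u*) of part 2 is trivially satisfied.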
Theorem 2

Suppose that Ψ(H*) ≤ Ψ(H) for all H ∈ G and the following assumptions are satisfied:

1. Set X is compact and functions f0(x,H*), fi(x) , i = 1,...,m are continuous on X.

2. co Z ∩ { z : z < 0 } ≠ ∅ , where Z is defined as in Theorem 1.

Then

1. We have

∫_X f0(x,H*) dH*(x) = max_{u∈U+} φ(u) ,

where

φ(u) = min_{x∈X} ( f0(x,H*) + Σ_{i=1}^m u_i fi(x) ) .

2. There exists a u* ∈ U+ with φ(u*) = max_{u∈U+} φ(u) such that dom H* ⊂ X*(u*), where

X*(u*) = Arg min_{x∈X} ( f0(x,H*) + Σ_{i=1}^m u_i* fi(x) ) .

3. NUMERICAL METHOD FOR SOLVING PROBLEMS WITH LINEAR CONSTRAINTS

It is now possible to construct a method which finds points satisfying the necessary conditions of Lemma 1 or, in the convex case, global minima. This method is of the linearization type.

Algorithm 1

1. Begin with an initial distribution H^1.

2. Suppose we have an approximate solution H^s before starting iteration number s. Then at the s-th iteration we do the following:

(i) Find a distribution H̄^s ∈ G such that

∫_X f0(x,H^s) dH̄^s(x) ≤ z_s + ε_s ,    (9)

where

z_s = inf_{H∈G} ∫_X f0(x,H^s) dH(x)    (10)

and ε_s > 0 is the accuracy with which problem (9) is solved. It is not necessary to know the value of ε_s, only that ε_s → 0.

(ii) Check whether functional Ψ(H) decreases in the direction H̄^s − H^s. If not, return to step (i) and solve problem (9) with higher accuracy. Otherwise go to step (iii).

(iii) Choose a stepsize ρ_s : 0 < ρ_s ≤ 1 and calculate a new approximation to the optimal solution:

H^{s+1} = (1 − ρ_s) H^s + ρ_s H̄^s .

Then go to step (i).
Remark. The stepsize can be chosen according to a number of different rules:

(a) ρ_s → 0 ,  Σ_{s=0}^∞ ρ_s = ∞    (11)

(b) ρ_s = arg min_{α≥0} Ψ(H^s + α(H̄^s − H^s))    (12)

(c) Take a sequence α_s with α_s → 0 , Σ_{s=0}^∞ α_s = ∞ , and

ρ_s = min { arg min_{α≥0} Ψ(H^s + α(H̄^s − H^s)) , α_s } .    (13)
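Over a fixed grid, Algorithm 1 acts on the vector of atom weights, and with rule (b)/(12) it reads as the classical Frank-Wolfe scheme. The sketch below is an editorial illustration on a hypothetical unconstrained instance: G is taken as all distributions on a grid in [0,1] and Ψ(H) = (∫x dH − 0.3)², for which expansion (8) gives f0(x,H) = 2(μ_H − 0.3)x with μ_H the mean of H.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 11)       # grid approximation of X = [0, 1]
target = 0.3                          # hypothetical constant in Psi

def psi(p):                           # Psi(H) = (mean(H) - target)^2
    return (p @ xs - target) ** 2

p = np.zeros_like(xs); p[-1] = 1.0    # initial distribution H^1 = point mass at x = 1
for s in range(20):
    mu = p @ xs
    grad = 2.0 * (mu - target) * xs   # f0(x, H^s) from expansion (8)
    j = int(np.argmin(grad))          # subproblem (9): point mass at the minimizer
    xbar = xs[j]
    if abs(mu - xbar) < 1e-15:
        break
    # exact line search, rule (12): closed form for this quadratic Psi
    alpha = min(1.0, max(0.0, (mu - target) / (mu - xbar)))
    p *= (1.0 - alpha)                # H^{s+1} = (1 - alpha) H^s + alpha * delta_xbar
    p[j] += alpha

mu_final, psi_final = p @ xs, psi(p)
```

With the exact line search the mean reaches the target essentially in one step; rules (a) and (c) would instead drive ρ_s → 0 and converge more slowly.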
We now introduce a topology on set G. We shall use the weak (star) convergence topology, which by definition is the weakest topology in the space of all probability measures such that the map

H → ∫_X g(x) dH(x)

is continuous whenever g(x) is bounded and continuous. Note that in this topology the set of all measures with compact support X is compact. In the discussions that follow we shall use this topology when speaking about the continuity of certain functionals with respect to probabilistic measures.

Let ν(H1,H2) denote some distance between distributions H1 and H2 which induces the weak (star) topology on the set of all distributions; for example, the Levy-Prokhorov distance would do.
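Weak (star) convergence can be observed numerically (an editorial aside): for discrete measures H_n concentrated on finer and finer grids of [0,1], the integrals ∫g dH_n approach ∫g dH for bounded continuous g. The check below uses g = cos, uniform weights at grid midpoints, and Lebesgue measure on [0,1] as the limit H; all choices are illustrative.

```python
import math

def integral_under_Hn(g, n):
    # H_n: uniform weights 1/n at the n midpoints (i + 0.5)/n of [0, 1]
    return sum(g((i + 0.5) / n) for i in range(n)) / n

g = math.cos
exact = math.sin(1.0)                 # integral of cos over [0, 1] under Lebesgue H
errs = [abs(integral_under_Hn(g, n) - exact) for n in (10, 100, 1000)]
```

The errors shrink as the grids refine, which is exactly the mode of convergence in which the continuity of the functionals below is understood.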
We shall now prove the convergence of the algorithm given above.

Theorem 3

Suppose that the following statements are true:

1. X ⊂ R^n is a compact set.

2. Functional Ψ(H) satisfies (8), where r(α,H*,H)/α → 0 uniformly over H* ∈ G , H ∈ G.

3. Function f0(x,H) is continuous with respect to x ∈ X and satisfies the Lipschitz condition with respect to H ∈ G:

| f0(x,H1) − f0(x,H2) | ≤ K ν(H1,H2)

for H1 ∈ G , H2 ∈ G , x ∈ X. Functions fi(x) , i = 1,...,m are continuous with respect to x ∈ X.

4. ε_s → 0.

5. Stepsize ρ_s is chosen according to one of (11)-(13).

Then

lim_{s→∞} inf_{H∈G} ∫_X f0(x,H^s) d(H(x) − H^s(x)) = 0

and for all limit distributions H* of sequence H^s

inf_{H∈G} ∫_X f0(x,H*) d(H(x) − H*(x)) = 0 .
Proof

1. Note that under assumption 1 of the theorem set G is compact in the weak (star) topology, and by assumption 3 function f0(x,H) is bounded on X × G. This together with (8) implies the continuity of functional Ψ(H) on G. Therefore Ψ(H) has a minimum on G.

2. The argument given in the proof of Lemma 1 leads to the following inequality:

Ψ(H^{s+1}) ≤ Ψ(H^s) − ρ_s ( y_s − ε_s ) + r_1(ρ_s) ,    (14)

where r_1(ρ_s)/ρ_s → 0 as ρ_s → 0 , ε_s → 0 due to assumption 4 of the theorem, and

y_s = − inf_{H∈G} ∫_X f0(x,H^s) d(H(x) − H^s(x)) ≥ 0 .

The remaining part of the proof depends on the way in which the stepsize ρ_s is chosen.
2(a) Suppose that ρ_s is chosen according to (12). Define

σ(δ) = sup { ρ : r_1(ρ) ≤ δρ } ,  δ > 0 .

From assumption 2 of the theorem we know that σ(δ) > 0 if δ > 0. Taking δ = y_s/2 if y_s > ε_s and evaluating (14) at the trial step σ(δ) (the line-search step (12) gives a value of Ψ(H^{s+1}) no larger than this), we get:

Ψ(H^{s+1}) ≤ Ψ(H^s) − σ(y_s/2) ( y_s/2 − ε_s ) .    (15)

Since Ψ is bounded below, inequality (15) immediately gives max { y_s − ε_s , 0 } → 0, which implies that y_s → 0 because ε_s → 0.

2(b) Now let ρ_s be chosen according to (11).
Suppose that there exists an s̄ such that for s > s̄

y_s − ε_s − r_1(ρ_s)/ρ_s ≥ a ,

where a > 0. Now from (14) we get:

Ψ(H^{s+1}) ≤ Ψ(H^s) − a ρ_s .    (16)

Summing (16) from s̄ to k we obtain

Ψ(H^{k+1}) ≤ Ψ(H^{s̄}) − a Σ_{i=s̄}^k ρ_i ,

which contradicts Ψ(H) > −∞ because Σ_{i=0}^∞ ρ_i = ∞. Therefore there is a subsequence n_k such that

max { 0 , y_{n_k} − ε_{n_k} − r_1(ρ_{n_k})/ρ_{n_k} } → 0 .
Now suppose that there is a subsequence m_k such that

y_{m_k} ≥ a > 0 .

We may assume without loss of generality that m_k < n_k < m_{k+1}. Let us take a sequence l_k such that

y_i − ε_i − r_1(ρ_i)/ρ_i ≥ a  for m_k ≤ i < l_k ,  y_{l_k+1} − ε_{l_k+1} − r_1(ρ_{l_k+1})/ρ_{l_k+1} < a .

We shall now estimate the value of functional Ψ(H) on elements of sequence H^{l_k}. From (14) we can show that

Ψ(H^{l_k}) ≤ Ψ(H^{m_k}) − a Σ_{i=m_k}^{l_k−1} ρ_i .    (17)

To proceed with this estimate further it is necessary to estimate Σ_{i=m_k}^{l_k−1} ρ_i, which can be done as follows:

| y_{l_k} − y_{m_k} | ≥ a − ε̃_{l_k} ,    (18)

where ε̃_{l_k} → 0 as k → ∞. Now let us estimate the term being summed. From the definition of the algorithm we have

ν(H^i , H^{i+1}) ≤ C ρ_i ,    (19)

where C is some positive constant. Assumption 3 of the theorem together with (19) yields the following estimate:

| f0(x,H^{i+1}) − f0(x,H^i) | ≤ K ν(H^i , H^{i+1}) ≤ C_1 ρ_i ,

leading to

| y_{i+1} − y_i | ≤ K_1 ρ_i .    (20)

Here we also used the fact that the function f0(x,H) is bounded on set X × G. Combining (18) and (20) we get:

Σ_{i=m_k}^{l_k−1} ρ_i ≥ ( a − ε̃_{l_k} ) / K_1 .    (21)

Substituting (21) into (17) yields

Ψ(H^{l_k}) ≤ Ψ(H^{m_k}) − a ( a − ε̃_{l_k} ) / K_1 ,

which contradicts inf_{H∈G} Ψ(H) > −∞ since ε̃_{l_k} → 0. This again gives y_s → 0.

2(c) Proof for the case when the stepsize is chosen according to (13) is similar to 2(a) and 2(b).

This completes the proof.
Remark. Assumption 3 of the theorem can be easily stated without introducing the notions of weak (star) topology and Levy-Prokhorov distance. One possible way is to assume the following:

| f0(x,H1) − f0(x,H2) | ≤ | ∫_X λ(x,H1,H2) d(H1(x) − H2(x)) | ,

where | λ(x,H1,H2) | ≤ K < ∞ for some positive K and H1 ∈ G , H2 ∈ G , x ∈ X, and

| f0(x,H) | ≤ C < ∞

for some positive C and H ∈ G , x ∈ X.
In order to obtain a practical method from the general framework described in this section, we have to specify ways of performing step 2(i). This is the purpose of the next section.
4. SOLVING THE LINEAR SUBPROBLEM USING CUTTING-PLANE TECHNIQUES

We shall now consider a method for solving linear subproblem (9) which reduces step 2(i) of Algorithm 1 to the solution of one finite-dimensional linear programming problem. This method uses some of the same ideas as generalized linear programming [8] and cutting-plane algorithms [9], and has much in common with the method proposed in [7] for solution of linear problem (1)-(3). The method is based on the duality relations for problem (1)-(3), which were studied in [7].
Let us assume that the assumptions of Theorems 1 and 3 are fulfilled. Then, according to Theorem 1,

z_s = max_{u∈U+} φ^s(u) ,

where

φ^s(u) = min_{x∈X} [ f0(x,H^s) + Σ_{i=1}^m u_i fi(x) ] .

Suppose that distribution H^s is fixed. Then it is possible to solve the problem

max_{u∈U+} φ^s(u)    (22)

with the help of the following cutting-plane method.
Algorithm 2

1. First select m+1 points x^1 , x^2 , ... , x^{m+1} and set ν = m+1. These points are used to approximate function φ^s(u) by the function

φ^s(u,0) = min_{1≤j≤ν} ( f0(x^j,H^s) + Σ_{i=1}^m u_i fi(x^j) ) .

The initial approximation u^0 to the solution of problem (22) maximizes the function φ^s(u,0):

u^0 = arg max_{u∈U+} φ^s(u,0) ,

so that we have to solve a linear programming problem.
2. Suppose that before beginning iteration number k we have ν points x^1 , x^2 , ... , x^ν and the current estimate of the maximum u^{k−1}. Then iteration number k involves the following stages:

(i) Take ν = ν+1.

(ii) Find

x^ν = arg min_{x∈X} ( f0(x,H^s) + Σ_{i=1}^m u_i^{k−1} fi(x) ) .    (23)

(iii) Calculate the next approximation to the optimal solution:

u^k = arg max_{u∈U+} φ^s(u,k) ,    (24)

where φ^s(u,k) is the current approximation of function φ^s(u):

φ^s(u,k) = min_{1≤j≤ν} ( f0(x^j,H^s) + Σ_{i=1}^m u_i fi(x^j) ) .
It should be realized that this is only a general framework for solution: much has already been done to avoid increasing the number of points x^j stored and to implement approximate solutions of problem (23) (for details see [7], [10]). The advantage of this method is that it becomes possible to obtain approximate solutions of the initial problem (9) during the solution of problem (22). These approximations are discrete distributions containing no more than m+1 points with positive probabilities:

H̄^s = { (x^1,p̄_1) , ... , (x^ν,p̄_ν) } ,

where the p̄_i are nonzero solutions of the following linear programming problem:

min_p Σ_{i=1}^ν p_i f0(x^i,H^s)  subject to  Σ_{i=1}^ν p_i f_j(x^i) ≤ 0 , j = 1,...,m ;  Σ_{i=1}^ν p_i = 1 ;  p_i ≥ 0 ,

and the x^i are the corresponding points. Note that the above problem is actually dual to the linear program equivalent to (24), and therefore both problems can be solved simultaneously.
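A compact sketch of Algorithm 2 follows (an editorial illustration with the same hypothetical toy data as before: f0(x,H^s) = x, f1(x) = 0.5 − x on X = [0,1], m = 1). With a single dual variable, step (iii) is carried out here by maximizing the piecewise-linear model φ^s(u,k) over a grid of u values rather than by the LP of the paper, which would be used for larger m.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 1001)   # grid over X = [0, 1]
us = np.linspace(0.0, 3.0, 3001)   # truncated grid over U+ (m = 1)
f0, f1 = xs, 0.5 - xs              # f0(., H^s) and f1 for the toy instance

def phi_true(u):
    # phi^s(u): exact minimum of the Lagrangian over the X grid
    return float(np.min(f0 + u * f1))

pts = [0, len(xs) - 1]             # indices of the initial m + 1 = 2 cut points
u_cur = 0.0
for k in range(10):
    # step (iii): maximize the cutting-plane model phi^s(u, k) over the u grid
    model = np.min([f0[j] + us * f1[j] for j in pts], axis=0)
    u_cur = float(us[np.argmax(model)])
    # step (ii): the Lagrangian minimizer at u_cur supplies the next cut
    pts.append(int(np.argmin(f0 + u_cur * f1)))

final_val = phi_true(u_cur)        # approaches the dual optimum (0.5 here)
```

The model value never falls below the true dual function, so the method closes the gap from above while φ^s(u_cur) approaches the optimum from below.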
What we actually need while implementing Algorithm 1 is not a precise solution of problem (9) at each step, but rather to track its changing extremum value. The approximate solutions of (9) may be very rough for the first few iterations, gradually increasing in accuracy. It appears that Algorithm 2 can be used to follow the extremum value by tracking the changing optimal solution of dual problem (22). It is only necessary to make one iteration of Algorithm 2 for each iteration of Algorithm 1.
In what follows we shall simplify the notation, writing f0(x,H^s) = f_s^0(x). We now want an algorithm which allows us to follow the optimal solution of problem (22) as the current distribution H^s changes.
Algorithm 2a

1. First select m+1 points x^1 , x^2 , ... , x^{m+1} and set ν = m+1. The initial approximation u^0 to the solution of problem (22) maximizes the function φ^0(u,0):

u^0 = arg max_{u∈U+} φ^0(u,0) ,

φ^0(u,0) = min_{1≤j≤ν} ( f_0^0(x^j) + Σ_{i=1}^m u_i fi(x^j) ) .

2. Suppose that before beginning iteration number s we have ν points x^1 , x^2 , ... , x^ν and the current estimate of the maximum u^{s−1}. Then iteration number s involves the following stages:

(i) Take ν = ν+1.

(ii) Find

x^ν = arg min_{x∈X} ( f_s^0(x) + Σ_{i=1}^m u_i^{s−1} fi(x) ) .

(iii) Calculate the next approximation to the optimal solution:

u^s = arg max_{u∈U+} φ^s(u,s) .
The following theorem proves the convergence of this method.

Theorem 4

Assume that:

1. Set X is compact and functions fi(x) , i = 1,...,m are continuous.

2. Functions f_s^0(x) are continuous for x ∈ X uniformly in s and

max_{x∈X} | f_{s+1}^0(x) − f_s^0(x) | → 0

as s → ∞.

3. There exists a γ > 0 such that for any u ∈ U+ , ||u|| = 1 there exists an i ∈ {1,...,m+1} for which

Σ_{j=1}^m u_j f_j(x^i) ≤ −γ .

Then

max_{u∈U+} φ^s(u) − φ^s(u^s) → 0 .

Proof
1. We shall first prove that sequence u^s is bounded. Take any point ū ∈ U+ and estimate the value of φ^s(u,s) at this point. Select i ≤ m+1 such that

Σ_{j=1}^m ū_j f_j(x^i) ≤ −γ || ū || ,

which from assumption 3 of the theorem will always be possible. Then

φ^s(ū,s) ≤ f_s^0(x^i) − γ || ū || .    (25)

The uniform continuity of functions f_s^0(x) implies the existence of a constant C such that | f_s^0(x) | ≤ C. Combining this with (25) yields

φ^s(ū,s) ≤ C − γ || ū ||

and

φ^s(0,s) ≥ −C .

These two inequalities lead to

φ^s(0,s) − φ^s(ū,s) ≥ γ || ū || − 2C ,

which implies that the norm of any point u* which maximizes φ^s(u,s) is bounded, i.e.,

|| u* || ≤ 2C / γ ,

where constants C and γ do not depend on s. This proves that sequence u^s is bounded, because u^s maximizes φ^s(u,s).
2. Now suppose, arguing by contradiction, that the theorem is not true and that there exists an a > 0 and a sequence s_k such that

max_{u∈U+} φ^{s_k}(u) − φ^{s_k}(u^{s_k}) > a .    (26)

From the boundedness of sequence u^s and assumptions 1 and 2 we may assume without loss of generality that

u^{s_k} → u* ,  u^{s_k+1} → u* ,  φ^{s_k}(u*) → φ* ,    (27)

where ν_k = s_k + 1 + m + 1 denotes the number of points stored at the start of iteration s_k + 1. We shall now estimate the difference

max_{u∈U+} φ^{s_k+1}(u) − φ^{s_k+1}(u^{s_k+1}) .    (28)

We have

| φ^s(u) − φ^s(v) | ≤ K || u − v ||

for some K > 0 from assumption 1. We also know that φ^{s_k}(u*) → φ*, which together with assumptions 1 and 2 implies that

φ^{s_k+1}(u^{s_k}) → φ*  and  φ^{s_k}(u^{s_k}) → φ* .

Hence

φ^{s_k+1}(u^{s_k}) ≥ φ^{s_k}(u^{s_k}) − c_1(k) ,

where c_1(k) → 0 as k → ∞. We have assumed that max_{x∈X} | f_{s+1}^0(x) − f_s^0(x) | → 0 as s → ∞, and this gives

φ^{s_k+1}(u^{s_k}, s_k) ≥ φ^{s_k}(u^{s_k}, s_k) − c_2(k) ,    (29)

where c_2(k) → 0 as k → ∞. But from the algorithm the new point x^{ν_k} minimizes the current Lagrangian, so that

f_{s_k+1}^0(x^{ν_k}) + Σ_{i=1}^m u_i^{s_k} fi(x^{ν_k}) ≥ min_{1≤j≤ν_k−1} [ f_{s_k+1}^0(x^j) + Σ_{i=1}^m u_i^{s_k} fi(x^j) ]

≥ min_{1≤j≤ν_k−1} [ f_{s_k+1}^0(x^j) + Σ_{i=1}^m u_i^{s_k+1} fi(x^j) ] − K || u^{s_k} − u^{s_k+1} ||

≥ max_{u∈U+} min_{1≤j≤ν_k−1} [ f_{s_k+1}^0(x^j) + Σ_{i=1}^m u_i fi(x^j) ] − K || u^{s_k} − u^{s_k+1} ||

≥ max_{u∈U+} φ^{s_k+1}(u) − K || u^{s_k} − u^{s_k+1} || .    (30)

Combining (26)-(30) gives

max_{u∈U+} φ^{s_k+1}(u) − φ^{s_k+1}(u^{s_k+1}) ≤ c_1(k) + c_2(k) + 2K || u^{s_k} − u^{s_k+1} || ≤ c_5(k) ,

where c_5(k) → 0 as k → ∞. This contradicts the initial assumption (26), since the sequence s_k may be chosen so that (26) also holds at s_k + 1, and thus completes the proof.
We shall now give an algorithm based on the results obtained in Sections 3 and 4. It is assumed that the conditions of Theorem 3 are met.

Algorithm 1a

1. We begin by choosing an initial distribution H^0 which satisfies assumption 3 of Theorem 4. Let x^1 , x^2 , ... , x^{m+1} be the m+1 points which form the initial distribution H^0. Consider the linear programming problem (31)-(33) posed in the dual variables u, together with its dual (34)-(36) posed in the probabilities p. Assumption 3 of Theorem 4 is satisfied if and only if problem (31)-(33) has a solution ū_1 , ū_2 , ... , ū_{m+1} such that ū_{m+1} < 0. If this is the case, the solution is the same as that of the dual problem (34)-(36), in which the optimal value of p_{m+2} is less than 0. The distribution H̄, where

H̄ = { (x^1,p̄_1) , (x^2,p̄_2) , ... , (x^{m+1},p̄_{m+1}) }

and p̄ is the optimal solution of (34)-(36), has the property

∫_X fi(x) dH̄(x) < 0 , i = 1,...,m ,    (37)

and therefore condition 2 of Theorem 1 is satisfied. The converse is also true (see [7]). Thus if condition 2 of Theorem 1 is fulfilled, it is possible to find m+1 points x^1 , ... , x^{m+1} such that problem (34)-(36) has a solution p̄_1 , ... , p̄_{m+2} with p̄_{m+2} < 0. This guarantees that condition 3 of Theorem 4 is fulfilled. These points, together with the probabilities p̄_i, now form the desired initial distribution H^0, which is a solution of the following problem:

min μ    (38)

∫_X f_j(x) dH(x) ≤ μ ,  j = 1,...,m ,    (39)

over distributions H concentrated on X.    (40)

This problem is of the form (1)-(3) and may be solved using Algorithm 2 (see [7]). We do not need to solve (38)-(40) exactly: we can stop when the current solution satisfies (37). This will occur after a finite number of iterations of Algorithm 2 if condition 2 of Theorem 1 is satisfied. The initial step of Algorithm 1a therefore involves the following stages:
(i) Take ν_0 = m+1, where ν_s denotes the number of points in distribution H^s.

(ii) Obtain the initial distribution H^1 by applying Algorithm 2 to problem (38)-(40). This algorithm will produce a sequence of distributions H̄^s which after substitution in problem (38)-(40) gives corresponding values μ_s. Take as H^1 the first H̄^s with μ_s < 0.

(iii) Take the initial point u^1 ∈ U+ for solution of the dual problem.

2. Suppose that before beginning iteration number s we have the current approximation to the optimal solution

H^s = { (x^1,p_1^s) , ... , (x^{ν_s},p_{ν_s}^s) }

and point u^s.
Iteration number s then involves the following operations, where steps (i)-(iii) correspond to steps (i)-(iii) of Algorithm 2a and step (i) of Algorithm 1, and steps (iv)-(v) correspond to steps (ii)-(iii) of Algorithm 1:

(i) Take ν_{s+1} = ν_s + 1.

(ii) Find a new point x^{ν_s+1}, where

x^{ν_s+1} = arg min_{x∈X} [ f0(x,H^s) + Σ_{i=1}^m u_i^s fi(x) ] .    (41)
(iii) Solve the linear programming problem (42)-(44) together with its dual. This will give us the next approximation to the solution of the dual problem, u^{s+1}, and also vector

p̄^{s+1} = ( p̄_1^{s+1} , ... , p̄_{ν_s+1}^{s+1} ) ,

which will have no more than m+1 nonzero elements, say ( p̄_{k_1}^{s+1} , ... , p̄_{k_{m+1}}^{s+1} ).

(iv) Take the family of distributions

H^{s+1}(α) = { (x^1,p_1^{s+1}(α)) , ... , (x^{ν_s+1},p_{ν_s+1}^{s+1}(α)) } ,

where

p_l^{s+1}(α) = (1 − α) p_l^s  if l ≠ k_j for j = 1,...,m+1 ;
p_l^{s+1}(α) = (1 − α) p_l^s + α p̄_l^{s+1}  otherwise.

Then find

ᾱ_s = arg min_{0≤α≤1} Ψ(H^{s+1}(α))

and take ρ_s = min { ᾱ_s , β_s } , where β_s is a sequence with β_s → 0 , Σ_s β_s = ∞ (cf. (13)).

(v) Take

H^{s+1} = H^{s+1}(ρ_s)

and go to step 2(i).
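Step (iv) is a convex mixture in weight space: every current weight is damped by (1 − α), and the atoms named by the dual LP receive an extra α-share. A minimal sketch of this update (the numbers below are hypothetical, chosen only to exercise the formula):

```python
import numpy as np

def mix(p_s, p_bar, alpha):
    """Weights of H^{s+1}(alpha): p(alpha) = (1 - alpha) p_s + alpha p_bar,
    where p_bar is the LP solution with at most m+1 nonzero entries."""
    return (1.0 - alpha) * np.asarray(p_s) + alpha * np.asarray(p_bar)

# current 5-point distribution and an LP answer supported on m+1 = 2 atoms
p_s   = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
p_bar = np.array([0.0, 0.7, 0.0, 0.3, 0.0])

p_new = mix(p_s, p_bar, alpha=0.4)
```

Since both inputs are probability vectors, the mixture stays a probability vector for any α in [0,1]; the stepsize ρ_s = min{ᾱ_s, β_s} is then obtained by line search over this one-parameter family.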
Remark. It is not necessary to solve nonlinear programming problem (41) with great accuracy. All we need is a point x^{ν_s+1} such that

lim_{s→∞} { [ f0(x^{ν_s+1},H^s) + Σ_{i=1}^m u_i^s fi(x^{ν_s+1}) ] − min_{x∈X} [ f0(x,H^s) + Σ_{i=1}^m u_i^s fi(x) ] } = 0 .

It is also possible to avoid increases in the dimension of linear programming problem (42)-(44) by considering only points x^j which satisfy some additional inequality (see [7]).
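For the initialization phase, a crude but often sufficient shortcut to (38)-(40) (an editorial aside, not part of the original algorithm) is to search among point masses: for H = δ_x the left-hand side of (39) is just max_j f_j(x), so any x with max_j f_j(x) < 0 yields a strictly feasible H^0 satisfying (37). The constraint functions below are hypothetical.

```python
import numpy as np

# hypothetical constraints on X = [0, 1]: E[x] <= 0.8 and E[x] >= 0.1
fs = [lambda x: x - 0.8, lambda x: 0.1 - x]

xs = np.linspace(0.0, 1.0, 1001)
worst = np.max([f(xs) for f in fs], axis=0)   # max_j f_j(x) at each grid point
i0 = int(np.argmin(worst))
x0, val0 = float(xs[i0]), float(worst[i0])    # point mass at x0; value of (39)
```

Here the minimizer sits where the two constraints balance (x = 0.45, value −0.35), so δ_{0.45} is a valid strictly feasible starting distribution; when no single atom works, the full problem (38)-(40) must be solved as described above.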
Remark. Algorithm 2a adds one additional point to the current approximation of the optimal solution H^s at each iteration, which may not be convenient if we have restrictions on the amount of memory available for storing the distribution. In this case measures should be taken to avoid this expansion, perhaps at the expense of accuracy. Some possible ways of achieving this are discussed below.

1. Suppose we want to find the best possible approximation, in no more than N points, to the optimal solution of (5)-(7). (It is assumed that some additional memory is available for storing N further points.) We then proceed as follows:

(i) Run Algorithm 2a until the current distribution H^s contains 2N points. Arrange these points in order of decreasing probabilities:

p_{i_1}^s ≥ p_{i_2}^s ≥ ... ≥ p_{i_{2N}}^s .

(ii) Start Algorithm 2a again from the distribution formed by the N points of largest probability, with the probabilities renormalized to sum to one.

(iii) Continue this process as long as the new 2N-point distribution has a better value of Ψ(H) than the previous one.

2. Suppose that we want to find an approximation to the optimal solution using at most N points.

(i) Run Algorithm 2a until the current distribution contains N points. Let p̄ = max_{1≤i≤N} p_i^s. Divide the set { 1 , ... , N } into two subsets:

I_1 = { i : p_i^s ≥ κ p̄ } ,  I_2 = { i : p_i^s < κ p̄ } ,

where κ > 0 should be chosen previously.

(ii) Start Algorithm 2a again from the distribution formed by the points with indices in I_1, with the probabilities renormalized to sum to one.

(iii) Continue this process as long as the value of Ψ(H) in consecutive N-point distributions improves and set I_2 is not empty.

3. Another possibility is to use approximation techniques to fit discrete distributions by continuous ones. For example, splines can be used when the dimensions are small or when the distributions H^s have independent components. This approach needs further study.
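The first memory-saving rule above amounts to keeping the N most probable atoms and renormalizing. A minimal sketch (the helper name and data are hypothetical):

```python
def prune(points, probs, N):
    """Keep the N atoms of largest probability and renormalize to a distribution."""
    ranked = sorted(zip(probs, points), reverse=True)[:N]
    total = sum(p for p, _ in ranked)
    return [x for _, x in ranked], [p / total for p, _ in ranked]

# prune a 4-point distribution down to its 2 most probable atoms
pts, prb = prune([0.1, 0.4, 0.7, 0.9], [0.05, 0.50, 0.30, 0.15], N=2)
```

Renormalization keeps the result a probability distribution, but the discarded mass is exactly the accuracy sacrificed, which is why the scheme restarts Algorithm 2a afterwards and monitors Ψ(H).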
The convergence of Algorithm 1a follows directly from Theorems 3 and 4. The next result may be derived using the remark following Theorem 3.

Theorem 5

Let the following conditions be satisfied:

1. X is a compact set.

2. Functions fi(x) are continuous for x ∈ X , i = 1,...,m.

3. Ψ(H) is convex and satisfies (8) and

| f0(x,H1) − f0(x,H2) | ≤ | ∫_X λ(x,H1,H2) d(H1(x) − H2(x)) |

for some summable λ(x,H1,H2) such that | λ(x,H1,H2) | ≤ K < ∞ for any H1 , H2 ∈ G.

4. There exists an H̄(x) such that

∫_X fi(x) dH̄(x) < 0 ,  i = 1,...,m .

Then

lim_{s→∞} [ Ψ(H^s) − min_{H∈G} Ψ(H) ] = 0 .

It is interesting to compare this algorithm with the methods for solving stochastic problems with recourse recently proposed by Wets [11]. Although applied to quite different problems, they both use generalized linear programming techniques and on each iteration require solution of one linear programming problem and one nonlinear optimization problem.
More complex problems with nonlinear constraints can be treated in the same way. If the definition of set G of constraints is changed to the following:

G = { H : Ψ_i(H) ≤ 0 , i = 1,...,m } ,

and functionals Ψ_i(H) have directional derivatives of form (8), then analogues of Lemma 1 and Theorem 2 will hold. In this case we can construct a feasible-direction type algorithm, which inherits all of the important characteristics of Algorithm 1. The same ideas can be used to solve minimax problems which depend on distribution functions. This, together with the results of some numerical experiments, will provide the subject of a subsequent paper.
REFERENCES

1. Yu. Ermoliev, "Method for stochastic programming in randomized strategies," Kibernetika 1 (1970).

2. Yu. Ermoliev and C. Nedeva, "Stochastic optimization problems with partially known distribution functions," CP-82-60, International Institute for Applied Systems Analysis, Laxenburg, Austria (1982).

3. A. Golodnikov, "Optimization problems with distribution functions," Dissertation, V. Glushkov Institute of Cybernetics, Ukrainian Academy of Sciences, Kiev (1979).

4. P.J. Huber, Robust Statistics, John Wiley and Sons (1981).

5. M.G. Krein, "The ideas of P.L. Tchebysheff and A.A. Markov in the theory of limiting values of integrals and their further developments," Amer. Math. Soc. Translations, Series 2, 12 (1951), pp. 1-122.

6. A. Golodnikov, A. Gaivoronski and Le Ding Fung, "A method for solving limit extremum problems applied to the minimization of functionals," Kibernetika 1 (1980).

7. Yu. Ermoliev, A. Gaivoronski and C. Nedeva, "Stochastic optimization problems with incomplete information on distribution functions," WP-83-xx, International Institute for Applied Systems Analysis, Laxenburg, Austria (1983).

8. G.B. Dantzig, Linear Programming and Extensions, Princeton University Press (1963).

9. J.E. Kelley, "The cutting-plane method for solving convex programs," J. Soc. Indust. Appl. Math. 8 (1960), pp. 703-712.

10. B. Curtis Eaves and W.I. Zangwill, "Generalized cutting plane algorithms," SIAM J. Control 9(4) (1971).

11. R.J.-B. Wets, "Stochastic programming: solution techniques and approximation schemes," in Mathematical Programming: The State of the Art, Springer-Verlag (1983).