

Step 1. Solve the problem

minimize h(z) subject to z ∈ K_k,

where K_k is a convex polyhedron. Let z^k be an optimal solution to this problem. If

h_i(z^k) ≥ p_i,  i = 0, ..., m,

then z^k is an optimal solution to problem (5.7). Otherwise go to Step 2.

Step 2. Let λ_k be the largest λ (0 ≤ λ ≤ 1) for which the following inequality holds:

h_i(z^1 + λ(z^k − z^1)) ≥ p_i,  i = 0, ..., m.

Various one-dimensional methods can be applied to solve this problem. Let

y^k = z^1 + λ_k(z^k − z^1).

If h(y^k) − h(z^k) ≤ ε, where ε is a previously chosen small positive number, then we stop and accept y^k as an approximate solution to the optimization problem.

Otherwise choose a subscript i_k for which h_{i_k}(y^k) = p_{i_k} and define

K_{k+1} = {z | z ∈ K_k, ∇h_{i_k}(y^k)(z − y^k) ≥ 0},

and go to Step 1 using k + 1 instead of k. Under the mentioned assumptions the procedure is convergent in the sense that

lim_{k→∞} h(z^k) = min_{z∈D} h(z).

This method was published in [14] and applied to solve probabilistic constrained programming problems in [9].
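The steps above can be sketched numerically. The following is a minimal illustration on a hypothetical toy instance (a linear objective over the unit disk, with the box [-2, 2]^2 as the starting polyhedron K_0; none of this data comes from the text), using bisection for the one-dimensional search in Step 2:

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance (invented): minimize h(z) = -z1 - z2 subject to
# h0(z) = -z1^2 - z2^2 >= -1 (the unit disk), with the box [-2, 2]^2
# as the starting polyhedron K_0 and z^1 = (0, 0) strictly feasible.
c = np.array([-1.0, -1.0])          # h(z) = c @ z
z1_pt = np.array([0.0, 0.0])        # interior point z^1
cuts_A, cuts_b = [], []             # accumulated supporting hyperplanes
eps = 1e-6

def h0(z):
    return -z @ z                   # stand-in constraint function

def grad_h0(z):
    return -2.0 * z

yk = z1_pt
for k in range(50):
    # Step 1: minimize h over the current polyhedron K_k (an LP here).
    res = linprog(c,
                  A_ub=np.array(cuts_A) if cuts_A else None,
                  b_ub=np.array(cuts_b) if cuts_b else None,
                  bounds=[(-2, 2), (-2, 2)], method="highs")
    zk = res.x
    if h0(zk) >= -1.0 - 1e-12:      # z^k already feasible: it is optimal
        yk = zk
        break
    # Step 2: largest lambda in [0, 1] keeping z^1 + lambda (z^k - z^1)
    # feasible, found here by bisection.
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if h0(z1_pt + mid * (zk - z1_pt)) >= -1.0:
            lo = mid
        else:
            hi = mid
    yk = z1_pt + lo * (zk - z1_pt)
    if c @ yk - c @ zk <= eps:      # h(y^k) - h(z^k) <= eps: stop
        break
    # Cut grad h0(y^k) . (z - y^k) >= 0, stored as A z <= b for linprog.
    g = grad_h0(yk)
    cuts_A.append(-g)
    cuts_b.append(-g @ yk)

print(yk, c @ yk)   # y^k approaches the optimizer, value approaches -sqrt(2)
```

Each cut strictly separates the current infeasible z^k from the feasible set, so the feasible values h(y^k) and the lower bounds h(z^k) close in on each other, mirroring the convergence statement above.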


Stochastic Optimization Problems

5.5 Solution by a Variant of the General Reduced Gradient Method

A variant of the GRG method [1], suitably adapted to problem (5.1) where the stochastic constraint reduces to (5.2) and the other constraints are linear, has been reported in [4]. It differs from the GRG method primarily in the formulation of the direction finding problem. Here we always generate feasible solutions and thus avoid the application of intermediate methods to return to the feasible set, which is very important because our function values are noisy.

The problem to be solved is now formulated in the following form:

minimize h(x)
subject to h_0(x) = P(Tx ≥ ξ) ≥ p,
Ax = b,
x ≥ 0.   (5.14)

Concerning this problem the following assumptions are introduced:

- the random variable ξ has a continuous probability distribution with logconcave density function,
- ∇h_0(x) is Lipschitz-continuous and bounded in R^n,
- there exists a feasible x such that h_0(x) > p,
- the m × n matrix A has rank equal to m, and for every feasible x there exists a basis B such that x_j > 0 for j ∈ I_B, where I_B is the set of subscripts of the basis vectors.

We start from a feasible solution x to problem (5.14) and assume that a basis B of the columns of A can be found which, for the sake of simplicity, is assumed to consist of the first m columns of A, with the property that when applying the partition A = (B, C) and the corresponding partition x' = (w', z'), all components of w are strictly positive. We will have a direction finding problem and a step length determination problem.

Direction finding problem. First we formulate the following problem:

minimize y
subject to ∇_w h(x)u + ∇_z h(x)v ≤ y,
∇_w h_0(x)u + ∇_z h_0(x)v + θy ≥ 0, if h_0(x) = p,
Bu + Cv = 0,
v_j ≥ 0 if z_j = 0, j = 1, ..., n − m,  ‖v‖ ≤ 1.   (5.15)

Here θ > 0 is a fixed number and the partition t' = (u', v') corresponds to the partition x' = (w', z'). Introducing the row vectors

r = ∇_z h(x) − ∇_w h(x)B⁻¹C,
s = ∇_z h_0(x) − ∇_w h_0(x)B⁻¹C,

which are called reduced gradients, problem (5.15) can be rewritten in the following manner:

minimize y
subject to rv ≤ y,
sv + θy ≥ 0, if h_0(x) = p,
v_j ≥ 0 if z_j = 0, j = 1, ..., n − m,
‖v‖ ≤ 1.   (5.16)

It can easily be proved that the optimum value of (5.16) is equal to zero if and only if x is a Kuhn-Tucker point. If this is not the case then the optimal value of problem (5.16) is negative, and if v*, y* is an optimal solution of this problem and furthermore u* = −B⁻¹Cv*, then the vector

t* = (u*, v*)

is a feasible direction such that along it the function h is strictly locally decreasing.

If the norm ‖v‖ is chosen in the following manner

‖v‖ = max_i |v_i|,

then problem (5.16) becomes a two-row LP with individual lower resp. upper bounds which can easily be handled. Here we are able to take into account the inaccuracy in the evaluation of ∇h_0. The accuracy can be increased by taking a larger sample in the Monte Carlo evaluation. We remark that when updating the reduced gradients standard LP technique can be used.

Step length determination. Starting from the interval allowed by the nonnegativity restrictions we apply a linear search technique to find a point for which the nonlinear restriction holds with equality. Then we minimize the objective function on the line segment between x and this point. In this one-dimensional optimization we optimize with respect to λ, i.e. we solve the problem

min_λ h(x + λt*).

If its optimal solution is λ*, then the new feasible solution will be

x^(1) = (w^(1), z^(1)) = (w, z) + λ*(u*, v*),

provided all components of w^(1) are strictly positive. Otherwise, by applying subsequent pivoting, we find a basis B^(1) with the property that the corresponding components of x^(1) are already strictly positive.
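The step-length rule above can be sketched as follows. The quadratic objective and the logconcave function below are illustrative stand-ins (not the model of the text), as are the point x and direction t:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Stand-in data (illustrative assumptions): h is the objective, h0 plays
# the role of the probabilistic constraint function with level p, x is the
# current feasible point, t the feasible direction obtained from (5.16).
p = 0.9
h  = lambda x: (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2
h0 = lambda x: np.exp(-x @ x)          # logconcave stand-in for P(Tx >= xi)
x  = np.array([0.1, 0.1])              # h0(x) > p holds here
t  = np.array([1.0, 0.5])

lam_max = 10.0                         # bound from nonnegativity restrictions
# Linear (bisection) search for the largest lam with h0(x + lam t) >= p,
# i.e. the point where the nonlinear restriction holds with equality.
if h0(x + lam_max * t) < p:
    lo, hi = 0.0, lam_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h0(x + mid * t) >= p else (lo, mid)
    lam_max = lo

# One-dimensional minimization of h on the admissible segment.
res = minimize_scalar(lambda lam: h(x + lam * t),
                      bounds=(0.0, lam_max), method="bounded")
x_new = x + res.x * t
print(x_new, h(x_new))
```

The new point stays feasible for the probabilistic restriction while strictly decreasing the objective, which is exactly what the always-feasible variant of the GRG method requires.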

For the sake of simplicity, we did not include into the algorithm all technicalities ensuring the convergence. The paper [4] already referred to gives a full description of these.

5.6 Solution by a Primal-dual Type Algorithm

The problem to be solved has the following form:

minimize c'x
subject to F(y) ≥ p,
Tx ≥ y,  Bx ≥ d,   (5.17)

where x ∈ R^n and y ∈ R^r. We assume that the multivariate probability distribution function F is strictly logarithmically concave and has a continuous gradient in R^r. We will shortly describe the method proposed in [3].

To this problem we assign a problem that we will call the dual problem, although it is not dual in the classical sense. This dual is the following:

max { min_{F(y) ≥ p} u'y + v'd }
subject to T'u + B'v = c,  u ≥ 0, v ≥ 0.   (5.18)

The procedure works in the following manner. First we assume that a pair of vectors (u^1, v^1) is available for which

(u^1, v^1) ∈ V = {(u, v) | T'u + B'v = c, v ≥ 0}.

Suppose that (u^k, v^k) has already been chosen, where u^k ≥ 0. Then the following steps have to be performed.

Step 1. Solve the problem

minimize u^k' y subject to F(y) ≥ p.

Let y(u^k) denote the optimal solution to this problem. Then we solve the following direction finding problem:

maximize [u'y(u^k) + d'v] subject to (u, v) ∈ V.

Let (ū^k, v̄^k) be an optimal solution to this problem. If ū^k = ρu^k for some ρ > 0, then (u^k, v^k) is an optimal solution of the dual problem and the pair x̄, y(u^k) is an optimal solution of the primal problem, where x̄ is an optimal solution of the linear programming problem:

minimize c'x
subject to Tx ≥ y(u^k),  Bx ≥ d.

Probabilistic Constrained Problems

Otherwise go to Step 2.

Step 2. Find λ_k (0 < λ_k < 1) for which the pair

u^{k+1} = λ_k u^k + (1 − λ_k)ū^k,  v^{k+1} = λ_k v^k + (1 − λ_k)v̄^k

satisfies

u^{k+1}' y(u^{k+1}) + v^{k+1}' d > u^k' y(u^k) + v^k' d.

If the procedure is infinite then it can be proved that the sequence (u^k, v^k) converges, and the limiting pair has the same property as (u^k, v^k) in Step 1.

5.7 The Polynomial Distribution

A special multivariate probability distribution has been introduced by the author to approximate the distribution of ξ. This is defined on the unit cube of the n-dimensional space by its probability distribution function as follows:

F(z_1, ..., z_n) = 1 / (Σ_{i=1}^N c_i z_1^{a_{i1}} ⋯ z_n^{a_{in}}),  if 0 < z_i ≤ 1, i = 1, ..., n;   (5.19)

F(z_1, ..., z_n) is suitably defined otherwise. Here a_{i1} ≤ 0, ..., a_{in} ≤ 0, a_{i1} + ⋯ + a_{in} < 0, i = 1, ..., N, and c_1 > 0, ..., c_N > 0; furthermore these are constants.

If a mathematical programming problem has the form of a geometric programming problem and in addition a probabilistic constraint of the type F(z) ≥ p is included, where F(z) is of the above type, then the new problem is again a geometric programming problem, for which methods of solution are available.
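For n = 2 the function (5.19) is easy to evaluate, and its distribution-function behaviour can be spot-checked numerically by verifying that every axis-parallel rectangle receives nonnegative mass. The exponents and coefficients below are illustrative choices (not from the text) satisfying the monotonicity conditions of Theorem 5.7.1, with the c_i normalized so that F(1, 1) = 1:

```python
import numpy as np

# Illustrative data: N = 2 terms in two variables.  Column 1 of A (the
# z1-exponents) is nonincreasing, column 2 (the z2-exponents) is
# nondecreasing, all entries are nonpositive with negative row sums.
A = np.array([[-0.5, -2.0],     # exponents (a_11, a_12) of term 1
              [-1.0, -0.3]])    # exponents (a_21, a_22) of term 2
c = np.array([0.25, 0.75])      # positive, summing to 1 so F(1, 1) = 1

def F(z1, z2):
    return 1.0 / np.sum(c * z1 ** A[:, 0] * z2 ** A[:, 1])

def rect_prob(u1, v1, u2, v2):
    """Mass of the rectangle (u1, v1] x (u2, v2]; must be >= 0."""
    return F(v1, v2) - F(u1, v2) - F(v1, u2) + F(u1, u2)

rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    u1, v1 = np.sort(rng.uniform(0.01, 1.0, 2))
    u2, v2 = np.sort(rng.uniform(0.01, 1.0, 2))
    ok = ok and rect_prob(u1, v1, u2, v2) >= -1e-12
print(F(1.0, 1.0), ok)   # 1.0 True: F behaves like a distribution function
```

Note that as any z_i tends to 0 the negative exponents drive the denominator to infinity, so F tends to 0, as a distribution function on the unit square should.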

We will not consider the algorithmic solution of problems of this type in detail. Our purpose here is to show that under certain conditions the function (5.19) will in fact be a probability distribution function. To illustrate the situation we restrict ourselves to the case of n = 2.

Theorem 5.7.1. If the following conditions hold:

a_{11} ≥ a_{21} ≥ ⋯ ≥ a_{N1},  a_{12} ≤ a_{22} ≤ ⋯ ≤ a_{N2},   (5.20)

then the function (5.19) is a probability distribution function in the unit square 0 < z_1, z_2 ≤ 1.

Proof. The only property that we need to show is that

∂²F(z_1, z_2) / ∂z_1 ∂z_2 ≥ 0,  if 0 < z_1, z_2 < 1.

The other properties of a two-dimensional probability distribution function are satisfied.

Introducing the notation

Σ = Σ_{i=1}^N c_i z_1^{a_{i1}} z_2^{a_{i2}},

the function F can be written as F = 1/Σ.

By differentiating we obtain

82F(ZI, Z2) 2 8 E 8 E 1 82 E

8z I8z2 = E a 8zI 8z2 - E 28z18z2'

The requirement that this be non-negative is equivalent to the following inequality:

(2/Σ)(∂Σ/∂z_1)(∂Σ/∂z_2) − ∂²Σ/∂z_1 ∂z_2 ≥ 0,   (5.21)

or in a more detailed form:

2 (Σ_{i=1}^N c_i a_{i1} z_1^{a_{i1}−1} z_2^{a_{i2}}) (Σ_{j=1}^N c_j a_{j2} z_1^{a_{j1}} z_2^{a_{j2}−1}) ≥ (Σ_{j=1}^N c_j z_1^{a_{j1}} z_2^{a_{j2}}) (Σ_{i=1}^N c_i a_{i1} a_{i2} z_1^{a_{i1}−1} z_2^{a_{i2}−1}).   (5.22)

Multiplying by z_1 z_2 on both sides in (5.22) we get the equivalent inequality

2 (Σ_{i=1}^N c_i a_{i1} z_1^{a_{i1}} z_2^{a_{i2}}) (Σ_{j=1}^N c_j a_{j2} z_1^{a_{j1}} z_2^{a_{j2}}) ≥ (Σ_{j=1}^N c_j z_1^{a_{j1}} z_2^{a_{j2}}) (Σ_{i=1}^N c_i a_{i1} a_{i2} z_1^{a_{i1}} z_2^{a_{i2}}).   (5.23)

Let us introduce the notation

λ_i = c_i z_1^{a_{i1}} z_2^{a_{i2}} / Σ,  i = 1, ..., N.

Then (5.23) is equivalent to

2 (Σ_{i=1}^N λ_i a_{i1}) (Σ_{i=1}^N λ_i a_{i2}) ≥ Σ_{i=1}^N λ_i a_{i1} a_{i2}.   (5.24)

The quantity

Σ_{i=1}^N λ_i a_{i1} a_{i2} − (Σ_{i=1}^N λ_i a_{i1}) (Σ_{i=1}^N λ_i a_{i2})   (5.25)

is the covariance of the two sequences

a_{11}, a_{21}, ..., a_{N1};  a_{12}, a_{22}, ..., a_{N2},

where to the corresponding pairs we assign the probabilities λ_1, λ_2, ..., λ_N, respectively. Assumption (5.20) implies that the covariance (5.25) is nonpositive, as can be seen very easily. Since both means Σ_i λ_i a_{i1} and Σ_i λ_i a_{i2} are nonpositive, their product is nonnegative, hence the nonpositivity of (5.25) implies (5.24), which is the same as (5.23), and the theorem is proved.

The following theorem is useful when considering probabilistic constraints of the form

F(z_1, ..., z_n) ≥ p,  0 < z_i ≤ 1, i = 1, ..., n,   (5.26)

where 0 < p < 1 is a fixed probability.

Theorem 5.7.2. The function F(z_1, ..., z_n) is logconcave in the unit cube 0 < z_1, ..., z_n ≤ 1.

Proof. A well-known theorem due to Artin states that the sum of logconvex functions defined on the same convex set is a logconvex function on that set.

Since a_{i1} ≤ 0, ..., a_{in} ≤ 0, i = 1, ..., N, it follows that each term

c_i z_1^{a_{i1}} ⋯ z_n^{a_{in}}

is a logconvex function in the unit cube, hence the same holds for their sum, which is equal to Σ. Now F = 1/Σ, and this implies that F is a logconcave function in the n-dimensional unit cube. This proves the theorem.

Theorem 5.7.2 shows that the set of n-tuples z_1, ..., z_n determined by the inequality (5.26) is a convex set for every fixed probability p.

5.8 Calculation of Function Values and Gradients

In this section we consider the problem of how to compute the gradient of the function F(Tx). It turns out that many special probability distributions allow the computation of the gradient of F(Tx), as we illustrate in two special cases: the multivariate normal distribution and a special type of multivariate gamma distribution.

Under suitable differentiability assumptions the following equality holds true in all cases:

∂F(z)/∂z_i = F(z_1, ..., z_{i−1}, z_{i+1}, ..., z_r | z_i) f_i(z_i),  i = 1, ..., r,   (5.27)

where F(· | z_i) is the conditional probability distribution function of the remaining components given ξ_i = z_i, and f_i is the probability density function of the random variable ξ_i.

Let us first consider the case of the multivariate normal distribution. It will be convenient to assume that the joint distribution of the variables ξ_1, ..., ξ_r is nondegenerate, and furthermore E(ξ_i) = 0, E(ξ_i²) = 1, i = 1, ..., r. Then the joint probability distribution function is Φ(z; R), where R is the correlation matrix.

It is well known that

∂Φ(z; R)/∂z_i = φ(z_i) Φ((z_j − r_{ji} z_i)/√(1 − r_{ji}²), j = 1, ..., r, j ≠ i; R_i),   (5.28)

where R_i is the (r − 1) × (r − 1) correlation matrix consisting of the correlations

r_{jk·i} = (r_{jk} − r_{ji} r_{ki}) / (√(1 − r_{ji}²) √(1 − r_{ki}²)),  j, k = 1, ..., r, j ≠ i, k ≠ i,   (5.29)

and φ is the one-dimensional standard normal probability density function. It turns out that the gradient of Φ(z; R) can be computed in a similar way as the function value Φ(z; R). The same subroutine can be used in the (r − 1)- and r-dimensional cases, respectively.
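Formulas (5.28)-(5.29) can be checked numerically with a generic multivariate normal cdf routine. The sketch below uses an arbitrary 3 × 3 correlation matrix (an illustrative assumption) and compares the formula against a central finite difference of SciPy's cdf; the step is kept moderately large because the cdf itself is evaluated by numerical integration:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Arbitrary illustrative correlation matrix, r = 3.
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
z = np.array([0.4, -0.1, 0.8])

def dPhi_dzi(z, R, i):
    """Partial derivative of Phi(z; R) via (5.28)-(5.29)."""
    r = len(z)
    j = [k for k in range(r) if k != i]
    s = np.sqrt(1.0 - R[j, i] ** 2)              # sqrt(1 - r_ji^2)
    z_cond = (z[j] - R[j, i] * z[i]) / s         # conditioned arguments
    Ri = (R[np.ix_(j, j)] - np.outer(R[j, i], R[j, i])) / np.outer(s, s)
    np.fill_diagonal(Ri, 1.0)                    # exact ones on the diagonal
    cdf = multivariate_normal(mean=np.zeros(r - 1), cov=Ri).cdf(z_cond)
    return norm.pdf(z[i]) * cdf

# Central finite-difference check against the cdf itself.
mvn = multivariate_normal(mean=np.zeros(3), cov=R)
h = 0.05
for i in range(3):
    e = np.zeros(3)
    e[i] = h
    fd = (mvn.cdf(z + e) - mvn.cdf(z - e)) / (2 * h)
    print(i, dPhi_dzi(z, R, i), fd)   # the two columns should agree closely
```

This mirrors the remark above: the same (r − 1)-dimensional cdf routine used for function values also delivers the gradient components.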

The second example is the multivariate gamma distribution introduced in [8]. Suppose that the random vector ξ has the form

ξ = Aη,   (5.30)

where A is an r × (2^r − 1) matrix whose columns are the different nonzero vectors having 0, 1 components, and η is a (2^r − 1)-dimensional random vector with independent, standard gamma distributed components (some of them may be equal to 0). Then the conditional probability distribution function in formula

(5.27) can be written in the form

P(ξ_2 < z_2, ..., ξ_r < z_r | ξ_1 = z_1)
 = P(ξ_2^(1) + ξ_2^(2) < z_2, ..., ξ_r^(1) + ξ_r^(2) < z_r | ξ_1 = z_1)
 = P(z_1 β_2 + ξ_2^(2) < z_2, ..., z_1 β_r + ξ_r^(2) < z_r | ξ_1 = z_1),   (5.31)

where ξ_j^(1) and ξ_j^(2) collect those gamma components of ξ_j which do and do not enter ξ_1, respectively, and β_j = ξ_j^(1)/ξ_1. Thus the conditional probability distribution function equals the unconditional probability distribution function of the sum z_1 β + η̄, where η̄ has an (r − 1)-dimensional multigamma distribution of the same type that ξ has, and β has a similar structure, but instead of partial sums of standard gamma variables we now use partial sums of components of a random vector having a Dirichlet distribution. Moreover, β and η̄ are independent.


5.9 The Use of Discrete Probability Distributions

The following problem will be considered:

minimize c'x
subject to F(z) ≥ p,
Tx ≥ z,  Bx ≥ b,   (5.32)

where F is the probability distribution function of the random vector ξ. If ξ has possible values z_1, ..., z_N such that all positive values of F are among F(z_1), ..., F(z_N), then the above problem is equivalent to the following mixed variable problem:

minimize c'x
subject to y_1 F(z_1) + ⋯ + y_N F(z_N) ≥ p,
y_1 + ⋯ + y_N = 1,  y_1, ..., y_N ≥ 0 integers,
Tx ≥ y_1 z_1 + ⋯ + y_N z_N,
Bx ≥ b.   (5.33)

Taking a random vector uniformly distributed in the n-dimensional unit cube and discretizing it by a step length h chosen in such a way that

1 − nh = p,   (5.34)

Vizvari [15] proves that the number of lattice points satisfying the probabilistic constraint is equal to (2n choose n), which is a large number for a large n but small as compared to the number of all lattice points (of distance h) in the unit cube; e.g. if p = 0.95 and n = 5 then h = 0.01. The total number of lattice points is 101^5, whereas the number of those which satisfy the probabilistic constraint is only (10 choose 5) = 252.

Computational experiments show that handling problem (5.32) in the form of (5.33) provides a satisfactory solution methodology if n is not very large.
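Because the binary vector y in (5.33) selects exactly one possible value z_i, small instances can be solved by plain enumeration: for every z_i with F(z_i) ≥ p solve the LP in x and keep the cheapest. A sketch with invented data (none of the numbers below come from the text):

```python
import numpy as np
from scipy.optimize import linprog

# Invented data: 2 decision variables, 6 possible values z_i of xi with
# their distribution-function values F(z_i), probability level p = 0.8.
c = np.array([1.0, 2.0])
T = np.eye(2)
B = np.array([[1.0, 1.0]])
b = np.array([1.0])
p = 0.8
Z = [np.array(v) for v in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
                           (1.0, 1.0), (2.0, 1.0), (2.0, 2.0)]]
Fvals = [0.1, 0.3, 0.4, 0.7, 0.85, 1.0]

best = None
for zi, Fi in zip(Z, Fvals):
    if Fi < p:
        continue                      # choosing this z_i would violate (5.33)
    # linprog wants A_ub x <= b_ub, so the >= constraints are negated.
    A_ub = np.vstack([-T, -B])
    b_ub = np.concatenate([-zi, -b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * 2, method="highs")
    if res.success and (best is None or res.fun < best[0]):
        best = (res.fun, zi, res.x)

print(best)   # the cheapest admissible z_i is (2, 1) with cost 4
```

For a large number of possible values one would of course hand (5.33) to a mixed-integer solver instead of enumerating, which is exactly where the "n not very large" caveat above comes from.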

Another mixed variable formulation will be illustrated in the case when ξ is a two-dimensional random vector whose possible values are nonnegative lattice points with coordinates at most N and M, respectively. The mixed variable reformulation of the problem is the following:

minimize c'x
subject to p_{00} y_{00} + ⋯ + p_{N0} y_{N0} + p_{01} y_{01} + ⋯ + p_{N1} y_{N1} + ⋯ + p_{0M} y_{0M} + ⋯ + p_{NM} y_{NM} ≥ p,
y_{00} + ⋯ + y_{N0} = z_1,
y_{00} + y_{01} + ⋯ + y_{0M} = z_2,
...


These models can be used in connection with a continuously distributed random vector ξ too, when approximating its distribution by a discrete distribution. In the higher dimensional case, however, the number of 0, 1 variables becomes too large.

References

[1] J. Abadie and J. Carpentier, "Generalisation de la methode du gradient reduit de Wolfe au cas des contraintes non-lineaires", in: Proc. IFORS Conf., eds. D.B. Hertz and J. Melese, Wiley, New York (1966), 1041-1053.

[2] A.V. Fiacco and G.P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, Wiley, New York (1968).

[3] E. Komaromi, "A dual approach to probabilistic constrained programming problems", Mathematical Programming Study (forthcoming).

[4] J. Mayer, "A nonlinear programming method for the solution of a stochastic programming model of A. Prekopa", Survey of Mathematical Programming, North Holland Publishing Co., New York, Vol. 2 (1980), 129-139.

[5] A. Prekopa, "Eine Erweiterung der sog. 'Methode der zulässigen Richtungen'", Math. Operationsforschung und Statistik 5 (1974), 281-293.

[6] A. Prekopa, "Contributions to the theory of stochastic programming", Mathematical Programming 4 (1973), 202-221.

[7] A. Prekopa, I. Deak, J. Ganczer and K. Patyi, "The STABIL stochastic programming model and its experimental application to the electrical energy sector of the Hungarian economy", in: Stochastic Programming, Proceedings of the International Conference on Stochastic Programming, Oxford, England, ed. M.A.H. Dempster, Academic Press, London (1980), 369-385.

[8] A. Prekopa and T. Szantai, "A new multivariate gamma distribution and its fitting to empirical streamflow data", Water Resources Research 14 (1978), 19-24.

[9] A. Prekopa and T. Szantai, "Flood control reservoir system design", Mathematical Programming Study 9 (1978), 138-151.

[10] A. Prekopa and P. Kelle, "Reliability type inventory models based on stochastic programming", Mathematical Programming Study 9 (1978), 43-58.

[11] A. Prekopa, "Network planning using two-stage programming under uncertainty", in: Recent Results in Stochastic Programming (Proceedings of the International Conference on Stochastic Programming, Oberwolfach, Germany, 1979), Lecture Notes in Economics and Mathematical Systems 179, Springer Verlag (1980), 215-237.

[12] A. Prekopa, "Logarithmic concave measures and related topics", in: Stochastic Programming, ed. M.A.H. Dempster, Academic Press, London (1980), 63-82.

[13] A. Prekopa and T. Szantai, "On optimal regulation of a storage level with application to the water level regulation of a lake", Survey of Mathematical Programming, North Holland Publishing Co., New York (1981), 183-210.

[14] A.F. Veinott, "The supporting hyperplane method for unimodal programming", Operations Research (1967), 147-152.

[15] B. Vizvari, "On the discretization of probabilistic constrained programming problems" (manuscript, in Hungarian).

[16] G. Zoutendijk, Methods of Feasible Directions, Elsevier Publishing Co., Amsterdam and New York (1960).

CHAPTER 6

STOCHASTIC QUASIGRADIENT METHODS

Yu. Ermoliev

As follows from the brief discussion in Chapter 1, the main purpose of the stochastic quasigradient (SQG) methods is the solution of optimization problems with objective functions and constraints of a complex nature. For stochastic programming problems, SQG methods generalize the well-known stochastic approximation methods for unconstrained optimization of the expectation of a random function (see for instance Wasan [45]) to problems involving general constraints and nondifferentiable functions. For deterministic nonlinear programming problems, SQG methods can be regarded as methods of random search (see for instance [42], [67], [68]).

The purpose of this chapter is a discussion of the main directions of development of SQG procedures, their applications, and an overview of the ideas involved in the proofs. The contents of this chapter are close to those of the paper [69].

6.1 The General Idea

Consider the problem of minimization:

minimize F^0(x)   (6.1)

subject to F^i(x) ≤ 0, i = 1:m,   (6.2)

x ∈ X ⊆ R^n.   (6.3)

To start with, let us assume that the functions F^v(x), v = 0:m, are convex. Then for every x we have the inequality

F^v(z) − F^v(x) ≥ (F_x^v(x), z − x),  ∀z ∈ X,

where F_x^v is a subgradient (generalized gradient). We denote by ∂F^v(x) the whole set of subgradients at x, the subgradient set. In stochastic quasigradient methods the sequence of approximations x^s, s = 0, 1, ..., is constructed by using statistical estimates of the F^v(x^s) and F_x^v(x^s): random numbers η_v(s) and random vectors ξ^v(s) which on average are close to F^v(x^s), F_x^v(x^s). These quantities are constructed by using information about the past history of the optimization process, generated by the path (x^0, ..., x^s) and some other variables, for instance the Lagrange multipliers. We denote this history by B_s, and for the sake of simplicity we usually assume that it is (x^0, ..., x^s). Then for η_v(s), ξ^v(s) we have the conditional mathematical expectations

E{η_v(s) | x^0, ..., x^s} = F^v(x^s) + a_v(s);   (6.4)

E{ξ^v(s) | x^0, ..., x^s} = F_x^v(x^s) + b_v(s),   (6.5)

where the numbers a_v(s) and the vectors b_v(s) may depend on (x^0, ..., x^s).

For exact convergence to an optimal solution, the values a_v(s), ‖b_v(s)‖ must be small (in a certain sense) when s → ∞. At some point we must have

a_v(s) → 0,  ‖b_v(s)‖ → 0   (6.6)

directly, or in such a way that

F^v(x*) − F^v(x^s) ≥ (E{ξ^v(s) | x^0, ..., x^s}, x* − x^s) + γ_v(s),   (6.7)

where γ_v(s) → 0 as s → ∞ and x* is an optimal solution. The vector ξ^v(s) is called a stochastic quasigradient when b_v(s) ≢ 0, and a stochastic subgradient or stochastic generalized gradient (a stochastic gradient for a differentiable function F^v(x)) when b_v(s) ≡ 0.

It turns out that for many important classes of optimization problems with functions F^v(x), v = 0:m, of a complex structure it is much easier to generate the statistical estimates η_v(s), ξ^v(s) than to calculate the exact values F^v(x^s) and subgradients F_x^v(x^s). For stochastic programming problems, when

F^v(x) = E f^v(x, ω),  v = 0:m,   (6.8)

typically one can take ξ^v(s) equal to a subgradient (gradient in the differentiable case) of f^v(·, ω) at x^s:

ξ^v(s) = f_x^v(x^s, ω^s),   (6.9)

where ω^s is an observation of ω, since usually, with an appropriate definition of the subgradient set, we have

∂F^v(x) = ∫ ∂f^v(x, ω) P(dω).

More generally,

ξ^v(s) = (1/N_s) Σ_{k=1}^{N_s} f_x^v(x^s, ω^{sk})

with a collection of independent samples ω^{sk}, k = 1:N_s, N_s ≥ 1. Similarly we can take

η_v(s) = f^v(x^s, ω^s),

or, more generally,

η_v(s) = (1/N_s) Σ_{k=1}^{N_s} f^v(x^s, ω^{sk}),

since according to the definition of the functions F^v(x),

F^v(x^s) = E{f^v(x^s, ω) | x^s}.   (6.10)

We consider different special rules for computing ξ^v(s), η_v(s) in Sections 6.7-6.13.
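The estimates (6.9) and (6.10) are easy to illustrate: for the invented example f(x, ω) = (x − ω)² with ω ~ N(0, 1) we have F(x) = E f(x, ω) = x² + 1 and F_x(x) = 2x, and sample averages of f_x(x, ω^k) and f(x, ω^k) approach these values:

```python
import numpy as np

# Illustration (the concrete f is an invented example): f(x, w) = (x - w)^2
# with w ~ N(0, 1), so F(x) = E f(x, w) = x^2 + 1 and F_x(x) = 2x.
rng = np.random.default_rng(1)
x = 1.5

def xi(x, w):                  # stochastic gradient f_x(x, w), as in (6.9)
    return 2.0 * (x - w)

def eta(x, w):                 # function-value estimate, as in (6.10)
    return (x - w) ** 2

w = rng.standard_normal(100_000)
print(np.mean(xi(x, w)), 2 * x)         # sample mean is close to F_x(x) = 3
print(np.mean(eta(x, w)), x ** 2 + 1)   # sample mean is close to F(x) = 3.25
```

A single observation (N_s = 1) already gives an unbiased estimate; larger N_s only reduces the variance, which is the trade-off behind the averaged forms above.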

6.2 Methods for Convex Functions

6.2.1 The Projection Method

Suppose we have to minimize a convex continuous function F^0(x) over x ∈ X ⊂ R^n, where X is a closed convex set such that the projection onto X can easily be calculated:

π_X(y) = arg min{‖y − x‖² : x ∈ X}.

For instance, if X is a hypercube a ≤ x ≤ b, then π_X(y) = max[a, min{y, b}] componentwise. Let X* be the set of optimal solutions. The method is defined by the relations:

x^{s+1} = π_X[x^s − ρ_s ξ^0(s)],  s = 0, 1, ...,   (6.11)

F^0(x*) − F^0(x^s) ≥ (E{ξ^0(s) | x^0, ..., x^s}, x* − x^s) + γ_0(s),   (6.12)

where ρ_s is the step size and γ_0(s) may depend on (x^0, ..., x^s), x* ∈ X*. Let us notice that if the vector ξ^0(s) satisfies (6.5), then

γ_0(s) = −(b^0(s), x* − x^s).   (6.13)

This method was proposed and studied in [1]-[3], [5]. If ξ^0(s) = F_x^0(x^s), we obtain the generalized gradient method, which was suggested by Shor [34] and studied by Ermoliev [35] and Poljak [36]. If X = R^n, F^0(x) = E f^0(x, ω) and

ξ^0(s) = Σ_{j=1}^n [f^0(x^s + Δ_s e_j, ω^{sj}) − f^0(x^s, ω^{s0})] / Δ_s · e_j,

then the method (6.11) corresponds to the well-known stochastic approximation methods which were developed by Robbins and Monro, Kiefer and Wolfowitz, Dvoretsky, Blum and others (see [45]).
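A minimal sketch of method (6.11) on an invented problem: minimize F^0(x) = E ‖x − ω‖², ω ~ N((1, 1), I), over the box X = [0, 0.5]², whose solution is the projection (0.5, 0.5) of the unconstrained minimizer onto X:

```python
import numpy as np

# Toy problem (invented): minimize F0(x) = E ||x - w||^2, w ~ N((1,1), I),
# over the box X = [0, 0.5]^2; the minimizer is (0.5, 0.5).
rng = np.random.default_rng(42)
lo_box = np.zeros(2)
hi_box = 0.5 * np.ones(2)

def proj(y):                           # pi_X for a hypercube: max[a, min{y, b}]
    return np.clip(y, lo_box, hi_box)

x = np.zeros(2)
for s in range(2000):
    w = rng.standard_normal(2) + 1.0   # observation w^s
    g = 2.0 * (x - w)                  # stochastic gradient xi^0(s)
    rho = 1.0 / (s + 1)                # steps: sum rho = inf, sum rho^2 < inf
    x = proj(x - rho * g)

print(x)   # close to (0.5, 0.5)
```

The step-size choice ρ_s = 1/(s + 1) is the classical divergent-sum, square-summable schedule under which convergence with probability 1 is proved.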

It was shown that under natural assumptions, that are also those of interest in practice, the sequence {XS} defined by (6.11),converges to a set of minimum points of the original problem with probability1. The proof of this fact is based on the notion of a stochastic quasi.Feyer sequence [3]. A sequence {Z8}~0is a Feyer sequence for a set Z CR", if[66]