
that minimizes
$$F_0(x) = \int f_0(x,\omega)\,P(d\omega); \qquad (1.43)$$
we suppose that the other constraints have been incorporated in the definition of the set $X$. We deal with a problem involving one expectation functional. Whatever applies to this case also applies to the more general situation (1.39), making the appropriate adjustments to take into account the fact that the functions
$$F_i(x) = \int f_i(x,\omega)\,P(d\omega), \qquad i = 1,\ldots,m,$$
determine constraints.

Given a problem of type (1.43) that does not fall into one of the nice categories mentioned in Section 1.7, one solution strategy may be to replace it by an approximation*. There are two possibilities for simplifying the integration that appears in the objective function: replace $f_0$ by an integrand $f_0^\nu$, or replace $P$ by an approximation $P_\nu$; of course, one could also approximate both quantities at once.

The possibility of finding an acceptable approximate of $f_0$ that renders the calculation of
$$\int f_0(x,\omega)\,P(d\omega) =: F_0(x)$$
sufficiently simple, so that it can be carried out analytically or numerically at low cost, is very much problem dependent. Typically one should search for a separable function of the type
$$f_0^\nu(x,\omega) = \sum_{j=1}^{q} f_{0j}(x,\omega_j),$$

'" Another approach will be discussed in Section 1.9.

(recall that $\Omega \subset \mathbb{R}^q$), so that
$$F_0^\nu(x) = \sum_{j=1}^{q} \int f_{0j}(x,\omega_j)\,P(d\omega) = \sum_{j=1}^{q} \int f_{0j}(x,\omega_j)\,P_j(d\omega_j),$$

where the $P_j$ are the marginal measures associated with the $j$-th component of $\omega$. The multiple integral is then approximated by a sum of one-dimensional integrals, for which a well-developed calculus is available (as well as excellent quadrature subroutines). Let us observe that we do not necessarily have to find approximates that lead to one-dimensional integrals; it would be acceptable to end up with two-dimensional integrals, or even, in some cases when $P$ is of certain specific types, with three-dimensional integrals. In any case, this would mean that the structure of $f_0$ is such that the interactions between the various components of $\omega$ play only a very limited role in determining the cost associated with a pair $(x,\omega)$. Otherwise an approximation of this type could very well throw us far off base. We shall not pursue this question any further, since such approximations are best handled on a problem-by-problem basis.
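To make the reduction concrete, here is a minimal numerical sketch, assuming a hypothetical separable integrand $f_{0j}$ and independent normal marginals $P_j$ (neither is prescribed by the text); it merely illustrates how a $q$-dimensional integral collapses into $q$ one-dimensional quadratures.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

q = 3                                                                 # Omega is a subset of R^q
marginals = [norm(loc=0.0, scale=1.0 + 0.5 * j) for j in range(q)]    # assumed marginals P_j

def f0_j(j, x, wj):
    """Hypothetical j-th term of the separable approximation f_0^nu."""
    return (x[j] - wj) ** 2

def F0_nu(x):
    """F_0^nu(x) = sum over j of the integral of f_0j(x, w_j) against the marginal P_j."""
    total = 0.0
    for j, Pj in enumerate(marginals):
        term, _ = quad(lambda wj: f0_j(j, x, wj) * Pj.pdf(wj), -np.inf, np.inf)
        total += term
    return total

print(F0_nu(np.zeros(q)))   # here equals sum_j Var(w_j) = 1.0 + 2.25 + 4.0 = 7.25
```

Each term is evaluated by a standard one-dimensional quadrature routine, which is precisely the "well-developed calculus" referred to above.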

If $\{f_0^\nu,\ \nu = 1,\ldots\}$ is a sequence of such functions converging, in some sense, to $f_0$, we would want to know if the solutions
$$x^\nu \in \operatorname*{argmin}_{x \in X} F_0^\nu(x), \quad \text{where } F_0^\nu(x) = \int f_0^\nu(x,\omega)\,P(d\omega), \qquad \nu = 1,\ldots,$$
converge to the optimal solution of (1.43) and, if so, at what rate. These questions would be handled very much in the same way as when approximating the probability measure, as will be discussed next.

Finding valid approximates for $f_0$ is only possible in a limited number of cases, while approximating $P$ is always possible in the following sense. Suppose $P_\nu$ is a probability measure (that approximates $P$); then
$$|F_0^\nu(x) - F_0(x)| \le \int |f_0(x,\omega)|\,|P_\nu - P|(d\omega), \qquad (1.44)$$
where now
$$F_0^\nu(x) := \int f_0(x,\omega)\,P_\nu(d\omega).$$

Thus if $f_0$ has Lipschitz properties, for example, then by choosing $P_\nu$ sufficiently close to $P$ we can guarantee a maximal error bound when replacing (1.43) by:

find $x \in X \subset \mathbb{R}^n$ that minimizes
$$F_0^\nu(x) = \int f_0(x,\omega)\,P_\nu(d\omega). \qquad (1.45)$$

Since it is the multidimensional integration with respect to $P$ that was the source of the main difficulties, the natural choice for $P_\nu$ (although in a few concrete cases there are other possibilities) is a discrete distribution that assigns to a finite number of points $\omega^1, \omega^2, \ldots, \omega^L$ the probabilities $p_1, p_2, \ldots, p_L$. Problem (1.45) then becomes:

find $x \in X \subset \mathbb{R}^n$ that minimizes
$$F_0^\nu(x) = \sum_{\ell=1}^{L} p_\ell\, f_0(x,\omega^\ell). \qquad (1.46)$$

At first glance it may now appear that the optimization problem can be solved by any standard nonlinear programming method, the sum $\sum_{\ell=1}^{L}$ involving only a "finite" number of terms, the only question being how "approximate" the solution of (1.46) is. However, if inequality (1.44) is used to design this approximation, then to obtain a relatively sharp bound from (1.44) the number $L$ of discrete points required may be so large that problem (1.46) is in no way easier than our original problem (1.43). To fix the ideas, if $\Omega \subset \mathbb{R}^{10}$ and $P$ is a continuous distribution, a good approximation, as guaranteed by (1.44), may require having $10^{10} \le L \le 10^{11}$! This is jumping from the fire into the frying pan.
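For orientation, the following sketch sets up and solves one instance of (1.46) with a standard nonlinear programming routine. The integrand $f_0$, the sampling of the atoms $\omega^\ell$, the unconstrained choice $X = \mathbb{R}^n$, and the modest value of $L$ are illustrative assumptions only; as the preceding paragraph stresses, such a small $L$ may be far from enough to make (1.46) a good approximation of (1.43).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
d, L = 2, 200                        # for this toy example x and w share the same dimension d
atoms = rng.normal(size=(L, d))      # the points w^1, ..., w^L (assumed: i.i.d. draws from P)
probs = np.full(L, 1.0 / L)          # the probabilities p_1, ..., p_L (here: equal weights)

def f0(x, w):
    """Hypothetical cost of the pair (x, w)."""
    return np.sum((x - w) ** 2)

def F0_nu(x):
    """F_0^nu(x) = sum_l p_l * f_0(x, w^l): a finite sum, hence an ordinary NLP objective."""
    return sum(p * f0(x, w) for p, w in zip(probs, atoms))

res = minimize(F0_nu, x0=np.zeros(d), method="BFGS")   # X = R^d, i.e. unconstrained (assumption)
print(res.x)                         # for this f_0 the minimizer is the weighted mean of the atoms
print(res.fun)                       # optimal value of the approximate problem (1.46)
```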

This clearly indicates a need for more sophisticated approximation schemes.

As background, we have the following convergence results. Suppose $\{P_\nu,\ \nu = 1,\ldots\}$ is a sequence of probability measures that converge in distribution to $P$, suppose that for all $x \in X$ the function $f_0(x,\omega)$ is uniformly integrable with respect to all the $P_\nu$, and suppose there exists a bounded set $D$ such that
$$D \cap \operatorname*{argmin}\left[\,F_0^\nu(x) = \int f_0(x,\omega)\,P_\nu(d\omega)\ \middle|\ x \in X\,\right] \ne \emptyset$$
for almost all $\nu$. Then
$$\inf_X F_0 = \lim_{\nu \to \infty}\Bigl(\inf_X F_0^\nu\Bigr),$$
and if
$$x^\nu \in \operatorname*{argmin}_X F_0^\nu, \qquad x = \lim_{k \to \infty} x^{\nu_k},$$
then $x \in \operatorname*{argmin}_X F_0$.

The convergence result indicates that we are given a wide latitude in the choice of the approximating measures; the only real concern is to guarantee the convergence in distribution of the $P_\nu$ to $P$, the uniform integrability condition being, from a practical viewpoint, a pure technicality.
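As a small illustration of this latitude, one admissible choice (an assumption for this sketch, not a prescription of the text) is to take $P_\nu$ to be the empirical measure of $\nu$ independent samples from $P$; these converge in distribution to $P$, and for a hypothetical one-dimensional $f_0$ with $X = \mathbb{R}$ the optimal values $\inf_X F_0^\nu$ can be seen to approach $\inf_X F_0$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)

def inf_F0_nu(nu):
    """inf over X = R of F_0^nu, with P_nu the empirical measure of nu samples from P = N(0,1)."""
    w = rng.normal(size=nu)                    # atoms of the empirical measure P_nu
    F0_nu = lambda x: np.mean((x - w) ** 2)    # hypothetical f_0(x, w) = (x - w)^2
    return minimize_scalar(F0_nu).fun

for nu in (10, 100, 1000, 10000):
    print(nu, inf_F0_nu(nu))                   # approaches inf_X F_0 = E[w^2] = 1
```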

However, such a result does not provide us with error bounds. But since we can choose the $P_\nu$ in such a wide variety of ways, we could for example have $P_\nu$ such that
$$\inf_X F_0^\nu \le \inf_X F_0 \qquad (1.47)$$
and $P_{\nu+1}$ such that
$$\inf_X F_0 \le \inf_X F_0^{\nu+1}, \qquad (1.48)$$
providing us with upper and lower bounds for the infimum and consequently error bounds for the approximate solutions
$$x^\nu \in \operatorname*{argmin}_X F_0^\nu \qquad \text{and} \qquad x^{\nu+1} \in \operatorname*{argmin}_X F_0^{\nu+1}.$$

X X

This, combined with a sequential procedure for redesigning the approximations $P_\nu$ so as to improve the error bounds, is very attractive from a computational viewpoint, since we may be able to get away with discrete measures that involve only a relatively small number of points (and this seems to be confirmed by computational experience).
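A schematic version of such a sequential procedure is sketched below. The concrete ingredients, namely $\omega$ uniform on $[0,1]$, a hypothetical $f_0$ convex in $\omega$, conditional means of the cells for the measure giving (1.47), and cell endpoints for the measure giving (1.48), are illustrative assumptions that anticipate the two approaches described next; the point is only the structure: solve the two small discrete problems, check the gap, refine the partition.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def f0(x, w):
    """Hypothetical cost, convex in w."""
    return (x - w) ** 2

def bounding_measures(cells):
    """Measure giving the lower bound (1.47): conditional mean of each cell;
    measure giving the upper bound (1.48): cell endpoints (valid here because
    f0 is convex in w and w is uniform on [0, 1])."""
    lower, upper = [], []
    for (a, b) in cells:
        p = b - a                                   # probability of the cell under Uniform(0, 1)
        lower.append((p, 0.5 * (a + b)))            # (weight, atom) for the lower-bounding measure
        upper.extend([(0.5 * p, a), (0.5 * p, b)])  # (weight, atom) for the upper-bounding measure
    return lower, upper

cells, tol = [(0.0, 1.0)], 1e-3
while True:
    lower, upper = bounding_measures(cells)
    lo = minimize_scalar(lambda x: sum(p * f0(x, w) for p, w in lower)).fun
    hi = minimize_scalar(lambda x: sum(p * f0(x, w) for p, w in upper)).fun
    if hi - lo <= tol:
        break
    cells = [half for (a, b) in cells
             for half in ((a, 0.5 * (a + b)), (0.5 * (a + b), b))]

print(len(cells), lo, hi)   # inf_X F_0 = 1/12 is bracketed by [lo, hi] with only a few atoms
```

Here the bracket tightens quickly, so a discrete measure with a handful of atoms already pins down $\inf_X F_0$ to within the tolerance.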

The only question now is how to find these measures that guarantee (1.47) and (1.48). There are basically two approaches: the first one exploits the properties of the function
$$\omega \mapsto f_0(x,\omega)$$
so as to obtain inequalities when taking expectations, and the second one chooses $P_\nu$ in a class of probability measures that have characteristics similar to $P$ but such that $P_\nu$ dominates or is dominated by $P$, and consequently yields the desired inequality (1.47) or (1.48). A typical example of this latter case is to choose $P_\nu$ so that it majorizes or is majorized by $P$; another one is to choose $P_\nu$ so that, at least for some $x \in X$,
$$P_\nu \in \operatorname*{argmax}\left[\,\int f_0(x,\omega)\,Q(d\omega)\ \middle|\ Q \in D\,\right] \qquad (1.49)$$
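To illustrate (1.49) concretely: if, as an assumption for this sketch, the class $D$ consists of all probability measures supported on a fixed grid in $[0,1]$ with prescribed mean $1/2$, then for a fixed $x$ the maximization in (1.49) is a small linear program in the weights. For an $f_0$ that is convex in $\omega$ the optimal mass ends up on the extreme points of the support, and, provided $P$ itself belongs to $D$, the resulting expectation majorizes $F_0(x)$ at that $x$.

```python
import numpy as np
from scipy.optimize import linprog

def f0(x, w):
    """Hypothetical cost, convex in w."""
    return (x - w) ** 2

x = 0.3                                          # fixed decision at which (1.49) is solved
grid = np.linspace(0.0, 1.0, 21)                 # admissible support of Q (assumption)
c = -f0(x, grid)                                 # linprog minimizes, so negate the objective

A_eq = np.vstack([np.ones_like(grid), grid])     # constraints: weights sum to 1, mean equals 1/2
b_eq = np.array([1.0, 0.5])
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0.0, None))

w = res.x
keep = w > 1e-9
print(grid[keep], w[keep])                       # mass concentrates on the extreme points 0 and 1
print(-res.fun)                                  # an upper bound on F_0(x), valid whenever P lies in D
```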