

NONLINEAR PROGRAMMING TECHNIQUES APPLIED TO STOCHASTIC PROGRAMS WITH RECOURSE

4.3.2 Multi-Point Linearization Methods

4.3.2.1 Applications to Recourse Problems

For recourse problems, particularly those of the form (4.5) using tenders, the generalized linear programming (GLP) approach looks very promising.

Using GLP to solve simple recourse problems has an early history. It was first suggested by Williams [74] in the context of the computation of error bounds, and was also used at an early date by Beale [2]. Parikh [57] describes many algorithmic details. The method has also been implemented for specialized applications (see, e.g., Ziemba [78] for an application to portfolio selection). However, as a general computational technique, in particular for nonsimple recourse, it has apparently not been studied until recently; see Nazareth and Wets [56] and Nazareth [51].

The GLP method applied to (4.5) yields the following master program:

minimize cx + Σ_{j=1}^{K} λ_j Ψ(X^(j))

subject to Ax = b

Tx − Σ_{j=1}^{K} λ_j X^(j) = 0

Σ_{j=1}^{K} λ_j = 1        (4.14a)

x ≥ 0, λ_j ≥ 0, j = 1, ..., K.

In order to complete the description of the algorithm it is necessary to specify X, and, if this is not a compact set, to extend the master program (4.14a) by introducing directions of recession (whose associated variables do not appear in the convexity row). In addition, a suitable set of starting tenders which span R^{m2} should be specified. As discussed in Nazareth [51], these considerations can be largely circumvented by using the equivalent form (4.6) and solving master programs of the form:

As discussed in more detail in Nazareth & Wets [56], we expect the above algorithm to perform well, because normally only a few tenders will have nonzero coefficients in the optimal solution, and because one can expect to obtain a good set of starting tenders from the underlying recourse program.

Still at issue is how readily one can compute Ψ(X^(k)) and its subgradients at a given point X^(k). This in turn determines the ease with which one can solve the subproblem (4.14b) and obtain coefficients for the objective row of the master.

For C1 problems Ψ(X) is separable and easy to specify explicitly (see Wets [72]). Algorithms have been given by Parikh [57] and Nazareth [51]. A practical implementation is given in Nazareth [55], where further details may be found.

For C2 problems (i.e. with complete recourse and a relatively small set of scenarios, say h_ℓ, ℓ = 1, ..., L, with known probabilities f_ℓ, ℓ = 1, ..., L) one can solve the subproblem (4.14b) and compute Ψ(X^(k)) in one of two ways, as discussed in Nazareth [51]:

(i) Formulate (4.14b) as a linear program which can be efficiently solved by Schur-complement techniques; see Bisschop & Meeraus [9] and Gill et al. [23], [24]. The values Ψ(X^(k)) are a part of the solution of this linear program.

(ii) Use unconstrained nonsmooth optimization techniques; see Lemarechal [40], [41], Kiwiel [35] and Shor [66]. Information needed by such methods is Ψ(X^(k)) and its subgradient g(X^(k)), and this can be computed by solving a set of linear programs of the form:

ψ(X^(k), h_ℓ) = min { qy | Wy = h_ℓ − X^(k), y ≥ 0 }.

Suppose π_ℓ are the optimal dual multipliers of the above problem. Then

Ψ(X^(k)) = Σ_{ℓ=1}^{L} f_ℓ ψ(X^(k), h_ℓ)

g(X^(k)) = Σ_{ℓ=1}^{L} f_ℓ π_ℓ.

This can be carried out very efficiently using the dual simplex method, coupled with techniques discussed by Wets in [73].
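The scenario-by-scenario computation above can be sketched numerically. The following is a minimal illustration (not the implementation referenced above), assuming a hypothetical one-dimensional simple-recourse instance with W = [1, −1] and cost vector q; the dual multipliers π_ℓ are read from the LP solver's equality-constraint marginals, whose sign convention (sensitivity of the optimal value with respect to the right-hand side) is solver-specific.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical one-dimensional simple-recourse data:
#   psi(chi, h) = min { q y : W y = h - chi, y >= 0 },  W = [1, -1], q = (2, 1),
# i.e. a penalty of 2 per unit of shortage and 1 per unit of surplus.
W = np.array([[1.0, -1.0]])
q = np.array([2.0, 1.0])
h = np.array([1.0, 3.0])        # scenario values h_l
f = np.array([0.5, 0.5])        # scenario probabilities f_l

def psi_and_dual(chi, h_l):
    """Solve one scenario LP; return (psi(chi, h_l), dual multiplier pi_l)."""
    res = linprog(q, A_eq=W, b_eq=[h_l - chi], bounds=[(0, None)] * 2,
                  method="highs")
    assert res.status == 0
    # eqlin.marginals: sensitivity of the optimal value w.r.t. b_eq.
    return res.fun, res.eqlin.marginals[0]

chi = 2.0
vals, duals = zip(*(psi_and_dual(chi, h_l) for h_l in h))
Psi = float(f @ np.array(vals))    # Psi(chi) = sum_l f_l psi(chi, h_l)
g = float(f @ np.array(duals))     # g(chi)   = sum_l f_l pi_l
print(Psi, g)
```

For these data Ψ(2) = 0.5·1 + 0.5·2 = 1.5; in practice one would fix the solver's dual sign convention once against a known instance before using g as a subgradient.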

The method based upon outer linearization mentioned at the end of the previous section has been widely used to solve stochastic programs with recourse (see Van Slyke & Wets [68], Wets [73] and Birge [6]). This is a particular form of Benders' decomposition [3], and it is well known that approaches based upon Benders' decomposition can solve a wider class of nonlinear convex programs than approaches based upon the Dantzig-Wolfe decomposition (see, for example, Lasdon [39]). We shall not, however, discuss this approach in any detail here, because it has already been studied in depth in the references just cited.

4.3.2.2 Extensions

When Ψ(X^(k)) and its subgradients are difficult to compute, the GLP approach continues to appear very promising, but many open questions remain that center on convergence.

Two broad approaches can be distinguished:

(i) Sampling: Stochastic estimates of Ψ(X) and its subgradient can be obtained by sampling the distribution. An approach that uses samples of fixed size and carries out the minimization of the Lagrangian subproblem (4.14b) using smoothing techniques is described by Nazareth [51]. Methods for minimizing noisy functions suggested recently by Atkinson et al. [1] would also be useful in this context. With a fixed level of noise, convergence proofs can rely upon the results of Poljak [58].

Another variant is to use samples of progressively increasing size, tied to the progress of the algorithm, and to solve the Lagrangian subproblem using stochastic quasi-gradient methods; see Ermoliev & Gaivoronski [18]. A particular algorithm (suggested jointly with A. Gaivoronski) is to replace Ψ(X^(k)) in (3.6) by some estimate Ψ_N(X^(k)) which is based upon a suitable sample size N. When no further progress is made, this sample size is incremented by ΔN and the approximation refined for all X^(k) in the current basis. There are, of course, many further details that must be specified, but under appropriate assumptions convergence can be established.
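As a sketch of the increasing-sample-size idea (the details below are invented for illustration, not taken from the algorithm above), consider estimating a simple-recourse expectation Ψ(X) = E|h − X| with h standard normal, enlarging the sample by ΔN whenever the estimate stops changing noticeably:

```python
import numpy as np

rng = np.random.default_rng(0)

def psi(chi, h):
    # Hypothetical nonsmooth integrand: simple recourse with unit penalties,
    # psi(chi, h) = |h - chi|.
    return np.abs(h - chi)

def Psi_N(chi, N):
    """Sample-average estimate of Psi(chi) = E psi(chi, h), sample size N."""
    return psi(chi, rng.standard_normal(N)).mean()

chi, N, dN = 0.0, 2_000, 20_000
est = Psi_N(chi, N)
for _ in range(5):
    new = Psi_N(chi, N)
    if abs(new - est) < 5e-3:   # no further progress at this sample size:
        N += dN                 # increment N by delta-N and refine
    est = new
print(N, est)   # true value is E|h| = sqrt(2/pi), about 0.798
```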

(ii) Approximate Distribution and Compute Bounds: At issue here is how to simultaneously combine approximation and optimization. For example, Birge [7] assumes that converging approximations Ψ_K^U(X) and Ψ_K^L(X) are available for K = 1, 2, ..., and replaces Ψ(X^(k)) in (4.14a) by the upper bound Ψ_K^U(X^(k)), k = 1, ..., K. In the subproblem (4.14b), if

Ψ_{K+1}^U(X^(K+1)) + π^(K) X^(K+1) ≥ θ^(K)

then Ψ_{K+1}^L(X^(K+1)) is computed. If, further, the above inequality is satisfied using this lower bound in place of the upper bound, then X^(K+1) is optimal. Otherwise the approximation is refined and the process continued. Approximation schemes for obtaining bounds rely on the properties of recourse problems, instead of purely on the distance between the given probability distribution and the approximating ones; this allows for sequential schemes that involve far fewer points, as discussed by Kall & Stoyan [31] and Birge & Wets [8].

The interpretation of the optimal solution in Nazareth [51] suggests the possibility of an alternative approach to approximation by an increasingly large number of points. It is shown that if λ*_{k_j} and X^{(k_j)}, j = 1, ..., (m2 + 1), give the optimal solution of (4.14a), then the problem (4.5) is equivalent to the associated discretized problem obtained by replacing the distribution of h(ω) by the distribution whose values are X^{(k_j)}, j = 1, ..., (m2 + 1), with associated probabilities λ*_{k_j}. Note that Σ_{j=1}^{m2+1} λ*_{k_j} = 1 and λ*_{k_j} ≥ 0, so that these quantities do indeed define a probability distribution.

Let us conclude this section with a discussion of some other possibilities.

1. When the technology matrix is nonlinear, i.e. when T is replaced by a smooth nonlinear function, we have the possibility of a generalized programming algorithm where the master program itself is nonlinear. The question of convergence is open. Here an implementation based upon MINOS would be able to immediately draw upon the ability of this routine to solve programs with nonlinear constraints.

2. When some columns of T are stochastic, the transformation discussed at the end of Section 4.2 can also be used within the context of the GLP algorithm to keep the degree of nonlinearity low. This time inner approximation of Ψ(X, z) would be carried out over the convex hull of (X^(k), z^(k)), k = 1, ..., K.

3. Generalized programming techniques appear to be useful for solving programs with probabilistic constraints, for example, of the form:

minimize cx

subject to Ax = b

Prob [ω | Tx ≥ h(ω)] ≥ α

x ≥ 0


With the usual definition of tenders, Tz = X, and under the appropriate assumptions on the distribution of h(·), we can express the above problem as:

minimize cz

subject to Az = b

Tz − X = 0

g(X) ≤ 0

z ≥ 0

where

g(X) = α − Prob [ω | h(ω) < X]

is a nonlinear function which is log-concave for a wide variety of distribution functions, in which case the set {X | g(X) ≤ 0} is convex. In such a situation we can reformulate the constraint g(X) ≤ 0 as

X ∈ D = {y | g(y) ≤ 0},  g(y) = α − ∫_{ξ<y} p(dξ).

Here p(·) denotes the density function of the random vector h(·). Assuming that we have already generated X^1, ..., X^K in D such that

{X = Tz | Az = b, z ≥ 0} ∩ co{X^1, ..., X^K} ≠ ∅,

we would be confronted at step K with the master problem:

minimize cz

subject to  u^K : Az = b

π^K : Tz − Σ_{j=1}^{K} λ_j X^j = 0

θ^K : Σ_{j=1}^{K} λ_j = 1

z ≥ 0, λ_j ≥ 0, j = 1, ..., K

where (u^K, π^K, θ^K) represent the dual variables associated with the optimal solution (z^K, λ^K) of this master problem. The next tender X^{K+1} is obtained by solving the Lagrangian subproblem, involving only X:

minimize { π^K X | X ∈ D }

and this X^{K+1} is introduced in the master problem unless

π^K X^{K+1} ≥ θ^K,


in which case z^K is an optimal solution of the master problem. To find

X^{K+1} ∈ argmin { π^K X | ∫_{ξ<X} p(dξ) ≥ α }

we consider the Lagrangian function

ℓ(X, ρ) = π^K X + ρ ( ∫_{ξ<X} p(dξ) − α ),  ρ ≤ 0,

and the dual problem

maximize h(ρ), ρ ≤ 0, where h(ρ) = inf_X [ ℓ(X, ρ) ].


The function h is a one-dimensional concave function, its (generalized) derivative is a monotone decreasing function, and, moreover, under strict log-concavity of the probability measure, its maximum is attained at a unique point. To search for the optimal ρ^K we can use a secant method for finding the zero of a monotone function. We have that, for fixed ρ,

X(ρ) = argmin_X ℓ(X, ρ)

is obtained by solving the following system of equations:

−π_i / ρ = ∫_{{r | r_ℓ < X_ℓ, ℓ ≠ i}} p(r_1, ..., r_{i−1}, X_i, r_{i+1}, ..., r_{m2}) dr,  i = 1, ..., m2.

If p is simple enough, or if it does not depend on too many variables, then this system can be solved by a quasi-Newton procedure that avoids multidimensional integration.

This application to chance-constrained stochastic linear programming is an open area and certainly deserves further investigation.
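The secant search for the multiplier mentioned above can be sketched generically. The function d below is an illustrative stand-in for the monotone (generalized) derivative of the concave dual function h, not a derivation from any particular distribution:

```python
import math

def secant_zero(d, r0, r1, tol=1e-10, max_iter=50):
    """Secant method for the zero of a monotone scalar function d
    (a stand-in for the derivative of the concave dual function h)."""
    d0, d1 = d(r0), d(r1)
    for _ in range(max_iter):
        if abs(d1) < tol or d1 == d0:
            break
        # Standard secant update; right-hand side uses the old values.
        r0, r1, d0 = r1, r1 - d1 * (r1 - r0) / (d1 - d0), d1
        d1 = d(r1)
    return r1

# A linear monotone function: the secant step is exact in one iteration.
root1 = secant_zero(lambda r: -2.0 - r, -10.0, 0.0)
# A nonlinear monotone function with zero at log(2).
root2 = secant_zero(lambda r: math.exp(r) - 2.0, 0.0, 1.0)
print(root1, root2)
```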

4. It is also worth pointing out that generalized programming methods have recently been applied to the study of problems with partially known distribution functions (incomplete information); see Ermoliev et al. [17] and Gaivoronski [22].


4.4 Variable Reduction (Partitioning) Methods

Methods in this group seek to restrict the search region to one defined by a subset of the variables, and carry out one or more iterations of a gradient (or subgradient) based search procedure. The search region is then revised and the process continued. We can make a distinction between 'homogeneous' and 'global' methods (using the terminology of Lemarechal [42]). Homogeneous or active set methods, in the linearly constrained case, restrict the region of search to an affine subspace within which unconstrained minimization techniques can be used. We shall concentrate on the reduced gradient formulation of Murtagh & Saunders [49], [50], as implemented in MINOS, and seek extensions of an approach which has proved effective for large smooth problems. However, the fact that extension is necessary, in contrast to the methods of the previous section, and the fact that there are theoretical issues of convergence that remain to be settled, mean that such methods are still very much in the development stage.

Global methods treat all constraints simultaneously and define direction finding subproblems which usually involve minimization subject to inequality constraints (often just simple bound constraints). Convergence issues are more easily settled here. We shall consider some methods of this type.

We also include here approaches where the partition of variables is more directly determined by the problem structure, in particular the grouping into linear and nonlinear variables.

Consider first the problem, defined by

minimize f(x)

subject to Ax = b        (4.16)

x ≥ 0

where, initially, f(x) is assumed to be smooth.

The variables at each cycle of the Murtagh and Saunders [49] reduced gradient method are partitioned into three groups, (x_B, x_S, x_N), representing m basic variables, s superbasic variables, and n_b = n − m − s nonbasic variables, respectively. Nonbasics are at their bound. A is partitioned as [B | S | N], where B is an m × m nonsingular matrix, S is an m × s matrix, and N is an m × n_b matrix. Let g = ∇f(x) be similarly partitioned as (g_B, g_S, g_N).

Each cycle of the method can be viewed as being roughly equivalent to:

RG1: one or more iterations of a quasi-Newton method on an unconstrained optimization problem of dimension s, determined by the active set Ax = b, x_N = 0. Here a reduced gradient is computed as

μ = g_S − (g_B^T B^{−1}) S = [−(B^{−1}S)^T | I_{s×s} | 0] g = Z_S^T g.        (4.17)

The columns of Z_S span the space in which the quasi-Newton search direction lies, and this is given by p = −Z_S H Z_S^T g, where H is an inverse Hessian approximation obtained by quasi-Newton update methods and defines the variable metric; e.g. H = I gives the usual projected gradient direction. Along p a line search is usually performed. (Note that in actual computation H would not be computed explicitly. Instead we would work with approximations to the Hessian and solve systems of linear equations to compute the search direction p.)

RG2: an iteration of the revised simplex method on a linear program of dimension m × n_b. Here components of the reduced gradient (Lagrange multipliers) corresponding to the nonbasic components are computed by

λ = g_N − (g_B^T B^{−1}) N        (4.18)

= [−(B^{−1}N)^T | 0 | I_{n_b×n_b}] g = Z_N^T g.        (4.19)

This is completely analogous to the computation of μ in (4.17) above. The difference is in the way that λ is used, namely to revise the active set. In each case above, prices π can be computed by π^T = g_B^T B^{−1}, and μ and λ computed as

μ = g_S − π^T S,  λ = g_N − π^T N.        (4.20)

(It is worth noting that the convex simplex method is a special case of the above in which (RG1) is omitted and (RG2) is replaced by a coordinate line search along a single coordinate direction in the reduced space, given by (Z_N)_{·i}, say, for which λ_i < 0. When there are nonlinear constraints present, the above method can also be suitably generalized.)
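The computations (4.17)-(4.20) are easy to state concretely. The following sketch uses random data (an assumed partition, not from any particular problem) and checks that the reduced gradient μ equals Z_S^T g and that A Z_S = 0, so that steps of the form p = Z_S p_S stay on Ax = b:

```python
import numpy as np

rng = np.random.default_rng(1)
m, s, nb = 3, 2, 2                                 # basic/superbasic/nonbasic counts
B = rng.standard_normal((m, m)) + 3 * np.eye(m)    # nonsingular basis matrix
S = rng.standard_normal((m, s))
N = rng.standard_normal((m, nb))
A = np.hstack([B, S, N])
g = rng.standard_normal(m + s + nb)                # gradient, partitioned (gB, gS, gN)
gB, gS, gN = g[:m], g[m:m + s], g[m + s:]

pi = np.linalg.solve(B.T, gB)                      # prices: pi = B^{-T} gB
mu = gS - S.T @ pi                                 # reduced gradient, (4.17)/(4.20)
lam = gN - N.T @ pi                                # multiplier estimates, (4.18)/(4.20)

# Equivalently, mu = Z_S^T g with Z_S = [-B^{-1} S; I; 0]; the columns of
# Z_S lie in the null space of A, so directions p = Z_S p_S keep Ax = b.
ZS = np.vstack([-np.linalg.solve(B, S), np.eye(s), np.zeros((nb, s))])
print(np.allclose(A @ ZS, 0), np.allclose(mu, ZS.T @ g))
```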

In the nonsmooth case we can proceed along three main directions:

1. Compute μ and λ in place of the above by

μ = Z_S^T { argmin [ g^T (Z_S Z_S^T) g | g ∈ ∂f(x) ] }

λ = Z_N^T { argmin [ g^T (Z_N Z_N^T) g | g ∈ ∂f(x) ] }        (4.21)

where ∂f(x) is the subdifferential of f(x) at x. In effect we are computing steepest descent directions in the appropriate subspaces. Note that it is, in general, not correct to first compute a steepest descent direction ḡ from

ḡ = argmin [ g^T g | g ∈ ∂f(x) ]

and then reduce ḡ to give

μ = Z_S^T ḡ,  λ = Z_N^T ḡ.        (4.22)

The reason for this is that the operations of minimization and projection are not interchangeable. However, this approach does make it possible to restore use of the π vector and therefore yields useful heuristic methods, as we shall see in the next section. In order to ensure convergence, it is necessary to replace ∂f(x) by ∂_ε f(x), the ε-subdifferential (except in special circumstances, e.g. when f(x) is polyhedral and line searches are exact). This is useful from a theoretical standpoint. However, from the point of view of computation it is usually impractical to use the subdifferential, let alone the ε-subdifferential (except again in rather special circumstances). One such instance is when the subdifferential is defined by a small set of vectors, say, u_1, ..., u_N. Then (4.21) leads to the problem:

minimize u^T Z_S H Z_S^T u

subject to u = Σ_{i=1}^{N} λ_i u_i

Σ_{i=1}^{N} λ_i = 1,  λ_i ≥ 0.        (4.23)

If u* is its solution, then μ = Z_S^T u*, with a similar computation for Z_N. We also have p = −Z_S H Z_S^T u*.
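For the special case N = 2 and H = I, problem (4.23) has a closed-form solution: the minimum-norm point of the line segment joining the two subgradients. A small sketch, with hypothetical subgradients:

```python
import numpy as np

def min_norm_pair(u1, u2):
    """Minimum-norm point of the segment [u1, u2]: the N = 2 case of (4.23)
    with H = I, minimizing ||lam*u1 + (1-lam)*u2||^2 over lam in [0, 1]."""
    d = u1 - u2
    denom = d @ d
    # Unconstrained minimizer of the quadratic in lam, clipped to [0, 1].
    lam = 0.5 if denom == 0 else np.clip(-(u2 @ d) / denom, 0.0, 1.0)
    return lam * u1 + (1 - lam) * u2

# Two subgradients whose convex hull contains a much shorter vector:
u_star = min_norm_pair(np.array([2.0, 1.0]), np.array([-2.0, 1.0]))
print(u_star)   # -> [0., 1.]
```

With this u*, the search direction would be p = −Z_S Z_S^T u*; for N > 2 one would solve (4.23) itself with a quadratic programming routine.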

2. Utilize bundle methods, in which the subdifferential is replaced by an approximation composed from subgradients obtained at a number of prior iterations. For the unconstrained case algorithms are given by Lemarechal [40], [41], and an implementable version is given by Kiwiel [35]. An extension of [40] to handle linear constraints in the reduced gradient setting is given by Lemarechal et al. [45]. However, as the authors point out, theoretical issues of convergence remain to be settled in the latter case.

3. Utilize nonmonotonic methods (see, for example, Shor [66]), which require only a single subgradient at each iteration. In effect, nonmonotonic iterations will be carried out in subspaces (see RG1 and RG2 above) determined by Z_S and Z_N, using reduced subgradients Z_S^T u and Z_N^T u. Again convergence issues remain open.

Line searches suitable for use in the above cases (1) and (2) are given by Mifflin [48] and Lemarechal [43].

The reduced gradient method as formulated above benefits from additional structure in objective and constraints, in particular the partition between variables that occur linearly and variables that occur nonlinearly. We shall see instances of this in the discussion of recourse problems. In particular, it is easy to show that when f(x) is replaced by cx + Ψ(X), an optimal solution exists for which the number of superbasics does not exceed the number of nonlinear variables X.

Instead of obtaining an active set from x_N = 0, another approach, which gives a 'global' method, is to reduce the gradient or subgradient only through the equality constraints Ax = b (these are always active) and to define reduced problems for finding the search direction involving bound constraints on the x_N variables. This is discussed in Bihain [5]. (See also Strodiot et al. [67].)

Reduced gradient methods, as discussed above, benefit from the partition of the problem into linear and nonlinear variables, but they do not explicitly utilize it. It is, however, possible to take more immediate advantage of this partition. Possible approaches are given, for example, by Rosen [62] and by Ermoliev [15]. Consider the problem

minimize cx + F(y)

subject to Ax + By = b

x ≥ 0, y ≥ 0.

If the nonlinear variables y are fixed at certain values, we obtain a simpler problem, in this case a linear program (which may have further structure, for example, when A is block-diagonal). The optimal dual multipliers π* of this linear program (assumed feasible) can then be used to define a reduced subproblem, for example, minimize F(y) − (π*)^T By, y ≥ 0. This is then solved to revise the current values of y, for example, by computing a reduced subgradient g − (π*)^T B, g ∈ ∂F(y), and carrying out (nonmonotonic) iterations in the positive orthant of the y variables (see Ermoliev [15]).
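One iteration of this partitioning scheme can be sketched as follows, on hypothetical data with F(y) = ||y||² (so that 2y is its gradient); the duals π* are read from the LP solver's equality-constraint marginals, whose sign convention is solver-specific:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: min cx + F(y) s.t. x + y = b, x >= 0, y >= 0,
# with F(y) = ||y||^2, so a (sub)gradient of F at y is simply 2y.
c = np.array([1.0, 1.0])
b = np.array([2.0, 2.0])
A = np.eye(2)                    # coefficient matrix of x
Bm = np.eye(2)                   # coefficient matrix of y

y = np.array([1.0, 1.0])         # current values of the nonlinear variables
res = linprog(c, A_eq=A, b_eq=b - Bm @ y, bounds=[(0, None)] * 2,
              method="highs")    # linear program in x with y fixed
pi = res.eqlin.marginals         # optimal dual multipliers pi*
red_g = 2 * y - Bm.T @ pi        # reduced subgradient g - B^T pi*, g in dF(y)
y_next = np.maximum(0.0, y - 0.5 * red_g)   # step in the positive orthant
print(pi, red_g, y_next)
```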

An alternative approach is given by Rosen [62].

4.4.1 Applications to Recourse Problems

Since the number of nonlinear variables X in (4.5) is usually small relative to the number of linear variables, the reduced gradient approach outlined above is a natural choice. When Ψ(X) is smooth (and the gradient is computable) the reduced gradient method can be used directly. In the form of the convex simplex method, which is a special case of the reduced gradient method, it has been suggested for the simple recourse problem by Wets [69] and Ziemba [77]. Wets [71] extends the convex simplex method to solve problems with simple recourse when the objective is nonsmooth.

For C1 problems, ∂Ψ_i(X_i) = [v_i^−, v_i^+] (see Nazareth & Wets [56]). The computation of μ and λ in (4.21) thus requires that we solve bound constrained quadratic programs. We can utilize structure in the basis matrix in defining these quadratic programs. Since the X variables are unrestricted, they can be assumed to be always in the basis. A basis matrix will thus have the form

B = ( D   0
      E   I )        (4.24a)

and its inverse (never, of course, computed directly) will therefore be given by

B^{−1} = (  D^{−1}      0
           −E D^{−1}    I ).        (4.24b)

Let g_B = (c_B, g_X), where c_B are coefficients of the objective row corresponding to the x variables in the basis and g_X is a subgradient of Ψ(X) at the current value of X. Also, since superbasics and nonbasics are always drawn from c, we shall use c_S and c_N in place of u_S and u_N. Thus we define u = (c_B, u_X, c_S, c_N).

The quadratic program (4.21) then takes the form

minimize u^T Z_S Z_S^T u

subject to v_i^− ≤ (u_X)_i ≤ v_i^+,  i = 1, ..., m2        (4.25)

where u is defined above and Z_S^T = (−(B^{−1}S)^T | I_{s×s} | 0), with B defined by (4.24a). Note that usually u_X will have relatively few components. A similar bound constrained quadratic program can be defined for Z_N^T. Both can be solved very efficiently using a routine like QPSOL; see [25]. The above approach also requires a line search, and an efficient one, based upon a specialized version of generalized upper bounding, is given in Nazareth [54]. An implementation could thus be based upon MINOS, QPSOL and this line search.

It is possible to avoid the use of quadratic programming by using a heuristic technique in which a steepest-descent direction is first computed as the solution of the expression preceding (4.22). This is given by:

minimize u_X^T u_X

subject to v_i^− ≤ (u_X)_i ≤ v_i^+,  i = 1, ..., m2.        (4.26)

The solution u_X is given explicitly by:

(u_X)_i = { v_i^−   if v_i^− > 0
            0       if 0 ∈ [v_i^−, v_i^+]        (4.27)
            v_i^+   if v_i^+ < 0.
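Formula (4.27) is simply the componentwise projection of zero onto the intervals [v_i^−, v_i^+]; a vectorized sketch:

```python
import numpy as np

def solve_4_27(v_minus, v_plus):
    """Componentwise solution (4.27) of the box-constrained least-norm
    problem (4.26): pick the point of [v_i^-, v_i^+] closest to zero."""
    u = np.zeros_like(v_minus)             # case 0 in [v_i^-, v_i^+]
    u = np.where(v_minus > 0, v_minus, u)  # interval entirely positive
    u = np.where(v_plus < 0, v_plus, u)    # interval entirely negative
    return u

u = solve_4_27(np.array([1.0, -2.0, -3.0]), np.array([2.0, -1.0, 4.0]))
print(u)   # -> [ 1. -1.  0.]
```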

Projected quantities Z_S^T u and Z_N^T u can then be computed, with u defined analogously to g (just before expression (4.27)). This, and use of the line search in Nazareth [54], suggests a very convenient heuristic extension of MINOS. Even the construction of a specialized line search can be avoided by utilizing line search methods designed for smooth problems (again heuristic in this context), as discussed by Lemarechal [44].

For C2 problems, computing μ and λ by (4.21) again requires that we solve the following specially structured quadratic program (Nazareth & Wets [56]): find u such that ||u||_M is minimized, subject to

g_X = Σ_{ℓ=1}^{L} f_ℓ π_ℓ  and  π_ℓ W ≤ q,  π_ℓ (h_ℓ − X) ≥ ψ(X, h_ℓ),  ℓ = 1, ..., L,

where h_ℓ and f_ℓ define the probability distribution of the scenarios, as in Section 4.3.2.1. M defines the metric and, for different choices, the objective takes the form u^T u (or equivalently, in this case, g_X^T g_X), g^T Z_S Z_S^T g or g^T Z_N Z_N^T g. Again, special purpose techniques can be devised to solve such problems. It is, however, often impractical to consider use of the above steepest descent approach, because only Ψ(X) and a subgradient are available. In this case an algorithm would have to be designed around bundle techniques or nonmonotonic optimization, as discussed in Section 4.4, items (2) and (3) (after expression (4.23)), using reduced subgradients given by Z_S^T g and Z_N^T g, with g and other quantities defined as in the paragraph preceding (4.25). In this case an implementation could be based upon a routine for minimizing nonsmooth functions; see Bihain [5].

In the above methods the X variables would normally always be in the basis, since they have no bounds on their value. This means that there are
