Algorithms for Stochastic Programs: The Case for Nonstochastic Tenders

(1)

AJGORZTHMS FOR STOCHASTIC PROGRAMS:

THE

CASE OF NONSTOCHASTIC TENDERS

Larry Nazareth Roger J.-B. Wets

January 1983 WP-83-5

Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute or of its National Member Organizations.

INTERNATIONAL INSTITUTE FOR

APPLIED

SYSTEMS ANALYSIS 2361 Laxenburg, Austria

(2)

ABSTRACT

We consider solution strategies for stochastic programs whose deterministic equivalent programs take on the form: Find z E R ~ ,

x

^ER m such that z

r

0 , A z = b , Tz =

x

^and^z

⁼

^{c z}

⁺

\k(x) is minimized.

We suggest algorithms based upon (i) extensions of the revised simplex method, (ii) inner approximations (generalized programming technique s), (iii) outer approximations (min-max strategies). We briefly dis- cuss implementation and associated software considerations.

(3)

ALGORITHMS FOR STOCHASTIC PROGRAMS:

THE

CASE OF NONSTOCHASTIC TENDERS

Larry Nazareth and Roger J.-B. Wets

1. INTRODUCTION

We report on some approaches to solving certain classes of stochastic programming problems. The main purpose is to provide a basis for discussion with a view to identifying promising algorithms that could eventually be realized as software.

The subclass of stochastic programs (with recourse) that we have in mind, and to which we refer as having nonstochastic tenders, arise as models for the following decision process. An (optimal) decision vector x must be selected when some of the parameters of the problem are only known in probability, i.e. only in a statistical sense, the actual cost depending in part on how well a transformation of x,

x

= Tz matches a random demand or recourse vector p.

(4)

We think of

x

^{as a}t e n d e r , n o n s t o c h a s t i c if the transformation T does not depend on the (unknown) values of the random parameters. For example, stochastic programs with simple recourse and fixed technology matrix are of t h s type. As we shall see in Section 2, for stochastic (linear) programs, the equivalent deterministic program can then be expressed as:

(1.1) Find Z E R ~ , xcRm such that A Z = ~ , T Z

= x , z ~ o ,

and z

=

cz

+

'k(x) is minimized

rhe algorithms that we analyze could be viewed as procedures for convex programs of the type (1.1) that seek to take advantage of the special structure, and to some extent that view is certainly correct. In fact we e ~ p e c t that the suggested techniques will also be efficient whenever nonlinear optimization problems can be cast in the form (1.1). However, because stochastic programming problems present computational chal- lenges of their own, it is their specific properties that are always in the background of our solution strategies. For example, our title is intended to s ~ g e s t that the major task of the solution procedure is the finding of optimal or nearly optimal tenders.

In Section 2, we review briefly the properties of stochastic programs that will be used in the design of algorithmic procedures. In Section 3, we examine the issue of what information can be made available and its cost, and we also exhbit some important special cases when the objective and the underlying distribution functions are such, that the equivalent deterministic programs can be conveniently and inexpensively specified. We

(5)

then turn to the three main methods that we consider here. They are based upon

(i) extensions of the revised simplex method,

(ii) inner approximations (generalized programming techniques), (iii) outer approximations (min-max strategies).

In order to give the essence of each solution strategy, we consider first, in Section 4, a very simple case, viz,, equivalent linear programming formu- lations for finding the minimum of a convex piecewise linear function of one variable. In Sections 5 , 6 and 7 we go into each approach as it applies to our class of stochastic programming problems. Each section is organ- ized along similar lines as follows: the case of simple recourse is considered in detail, extensions to problems with complete recourse are briefly outlined and finally some comments are made related to software choices and implementation.

2. STOCXASHC PROGRAMS WITH RECOUEE: NONSTOCHASTIC TENDERS

We consider stochastic (linear) programs of the type (2.1) find z ^ERnl such that

A z = b , 2 2 0

and z = E i c ( w ) z

+

Q(z ,w){ is minimized ,

where Q is calculated by finding for given decision z and event zu, an optimal recourse y E Rn2, viz.

(6)

Here ~ ( m ~ x n ~ ) , T(rnzxnl), w(rnzxn2) and b ( m l ) are given (fixed) matrices, c ( . ) ( n l ) and p (,)(m2) are random vectors, y ++ q

(Y

;):R n 2e R is a random finite-valued convex function and C is a convex polyhedral subset of Rn2, usually C = R + ~ . n Because W is nonstochastic one refers to (2.1) as having fixed _recourse. Tenders are nonstochastic because T is fixed. (Strictly speaking nonstochastic tenders allow for the possibility of having W random. However because of the computational intractability of that case, it will not be considered here.) With

c

=

E ~ C ( W ) ] and B(z) = E ! Q ( ~ , W ) ] we obtain the equivalent deterministic form of (2.1)

(2.3) find Z E R ~ ' such that AX=^ , 2 2 0

and z = cz

+

Q ( z ) is minimized

We assume that the random elements of the problem are such that all quantities introduced are well-defined, with Q(z) finite, unless

Prob.fw ) ( p ( w )

-

Tz) E W(C)J

>

0

where W(C) = f t

=

W y ( y E Cj, i.e. there is no feasible recourse with positive probability, in whch case Q ( z ) =

+=.

Detailed conditions have been made explicit in [ I ] ; extensions to the multistage case have been provided by P. Olsen [2], consult also [3] for some results in the nonconvex case.

As background to the algorithmic development, we review the basic properties of (2.3), proofs and further details can be found in [I]; see also [4] for a compact treatment for stochastic programs with complete

(7)

r e c o u r s e , i.e. when W ( C )

=

R m 2 and thus Q is everywhere finite

2 . 4 . PROPERTIES. The f u n c t i o n Q .Is l o w e r s e m i c o n t i n u o u s a n d c o n v e z . I t is L i p s c h i t z i f f o r ( a l m o s t ) a l l w , y w q ( y , w ) is L i p s c h i t z . Also t h e s e t

is a c o n v e z p o l y h e d r o n t h a t c a n b e e z p r e s s e d a s

K 2 =

fz IDz 1 d l

f o r s o m e m a t r i x D a n d v e c t o r d . Moreover i f t h e d i s t r i b u t i o n of t h e r a n - d o m e l e m e n t s of t h e p r o b l e m is a b s o l u t e l y c o n t i n u o u s t h e n Q is d i f f e r e n - t i a b l e r e l a t i v e t o

K2.

Because q ( . , w ) is Lipschitz rather than linear, the assertion about Q being Lipschitz does not follow directly from Theorem 7.7 of [ I ] but can be gathered from its proof, or see [ 5 ] , for example.

In the case of nonstochastic tenders it is useful to consider another representation of the deterministic equivalent program. Let

and

Problem (2.3) is then cast in the form ( 1 . 1 ) : ( 2 . 5 ) find z E

R ~ '

,

x

E R m 2 such that

A z = b , T z = x , z S O

and z

=

^cz

+ +(x)

is minimized.

(8)

This program, more exactly the function +, exhibits the same properties as those listed for Q under Properties 2.4. In particular it is finite for all

x

such that

x

= Tz and z E K 2 . Including these constraints explicitly in the formulation of the problem, we get

(2.6) find z E Rn' ,

x

^E^{R m 2}^{such that}

z

=

cz

+

+(x) is minimized, and

Ax = b ,

Dz S d ⁰

B - x = o ,

z 2 0,

i.e. a convex program with

+

finite on the feasible region. In what follows we shall assume that the constraints Dz 2 d have been incorporated in the constraints Az

=

b , z 2 0, SO that they will no longer appear explicitly, and that

+

is finite on ^.

Stochastic programs of this type are said to have relatively complete recourse [ I , Section 61, a situation which is always obtained if the (induced) constraints, determining K2, are incorporated in the original constraints.

When W = I and C

=

R n Z , there is really no need to solve an optimization problem to know the optimal recourse and its associated cost. It is uniquely determined by the relation

and

(9)

The stochastic program is then said to be with s i m p l e recourse, which clearly implies complete recourse: Kz

=

Rnl. Determining the value of \k a t

x

depends then on our capability of performing the multidimensional integration. Usually, the cost-function will be separable. However, if there is dependence between some of the components of the p (.)-vector and the cost depends on the joint realizations, then one must necessarily resort to this more general form. Assuming that the integral is well- defined, we have that the subdifferential of \k is given by

where a,q ( . , w ) denotes the subdifferential with respect t o the first variable. It is easy to see that if the convex function y c ~ q ( y , w ) is differentiable, t h e n so is \k. The function .k is also differentiable if the measure is absolutely continuous. If the random variables a r e independent, then the multidimensional integration to obtain the value of \k or its gradient is reduced t o a number of simple integrals on R1. This also occurs when there is separability.

If in addition to simple recourse, the recourse costs are separable , i.e, for all w

then

(10)

Thus ( 2 . 5 ) becomes a convex separable program:

( 2 . 7 ) find z ^E

R ~ '

,

x

^ER~~ such that

me

and c z

+ C

^'4ji^{( x i )}is minimized .

C = l

This latter optimization problem possesses many properties. Those that are directly relevant to our further development are summarized here below.

2.8 PROPERTIES. For i

=

1, ..., m2, t h e f u n c t i o n s \ki a r e c o n v e z , f i n i t e - v a l u e d a n d thus c o n t i n u o u s . If t h e r a n d o m e l e m e n t s h a v e a discrete dis- t T i b u t h , t h e \ki a r e piecewise l i n e a r w h e n t h e q i ( . , w ) a r e piecewise l i n e a r . O n t h e o t h e r h a n d , i f t h e m a r g i n a k of {

(ai

( . , w ) , pi ( w ) ) , i

=

1

,...

,m2j a r e a b s o l u t e l y c o n t i n u o u s t h e n t h e \ki a r e d i f f e r e n t i a b l e . Moreover, i f p r o b l e m ( 2 . 7 ) is solvable it a d m i t s a n o p t i m a l s o l u t i o n with n o m o r e t h a n m l

+

m2 p o s i t i v e e n t r i e s in t h e z- v e c t o r .

These properties are derived in [ 6 ] , (see also [ ? I ) , except the last assertion which was obtained by Murty

[a]

in a somewhat modified context; a very simple proof appears in [ 9 ] .

(11)

A version of (2.7) w h c h has received a lot of attention, because of its direct amenability to efficient computational schemes and the many applications that can be cast in this form, is when qi is itself independent of w and piecewise linear with respect to y . More precisely qi(.) is given by

with qi

=

^qi+

+

^qi-²^0,yielding the convexity of yi ^I+ qi ( 7,w ~ ). ~ In this case t h e function \ki takes on a form particularly easy to describe. This is done in the next section.

3. AVAlLABILlTY OF INFORMATION ABOUT THE OI3JECM

The exact evaluation of Q or its gradient for general probability distribution p , function q and recourse matrix W , might be prohibitively expensive, if at all possible. The difficulties come from two directions:

(i) for each w , having to evaluate Q(z ,w ) which involves solving a minimization problem, and

(ii) having to perform the multidimensional integration

For simple recourse, the evaluation of Q (z , w ), or equivalently +(x,w), because T is fixed, is easy since the recourse is uniquely determined.

When the recourse costs are also separable, the multidimensional integration is reduced to m2 separate 1-&mensional integrals. With T fixed, it takes the form:

(12)

where Fi is the marginal distribution function of the random elements appearing in this expression, and the integral

f

is a Lebesque-Stieltjes integral. The subgra&ents of the convex function 4? are then the (Carte- sian) product of the subgradients of the 41i which are themselves

a t least when the problem satisfies the regularity conditions suggested a t the beginning of Section 2. In general a U q (pi(w) -

xi

^,w ) is multivalued, in fact closed convex valued, and the integral is then also a closed convex set, In particular if q is piecewise linear as in (2.9), we get the following expression:

where qi

=

qi+ + ^qi-,

Fi-(z)

=

Prob. [pi (w )

<

z ] ,

and

We just note in passing that t h s implies that +i is differentiable whenever the distribution function Fi is continuous. In general, we have the two following representations for .ki

(13)

-

^{1 1}

-

where

pi = ~b~ (.)I.

If the distribution of p i ( . ) is discrete, say with possible values

withpd

<

p i , r + l , and with associated probabilities

- 1

the function \ki is piecewise linear. With = 0, we have

1 =O

r'- 1

pi-(Xi)

= C

f u where 7' = rnin [ki , inf(t

1

pit _xi

)I

1 =1

and

p i ( X i ) = r-1

C

^{f i r} where T = min [ k i , inf ( t lpit > x i ) ] . 1 = 1

Also

and

Note that T' = T unless

xi

⁼^pit ^{for some}t

=

1, ..., ki and then ^T'= 7

-

^1.

For I = 0,

...,k,

we set

and

1

eii = q i + ~ i

-

qi (

C

^Pit^{f i t}⁾

t = 1

(14)

We thus get

Observe that for any value of

xi

the supremum is attained by a t most 2 linear forms. As we shall see in the subsequent sections, both (3.3) and (3.4) yield useful representations for qi when developing algorithmic procedures for problems involving functions of this type. Still another representation of \ki can be exploited in an algorithmic context. Here the value of \ki is obtained as the solution of an optimization problem parametrized by

x i .

^Let

Q

=

p i t i + , - p i l ^, for L

=

1 ,... ,ki-l and

Then

4

subject to

C

^yd

=

^{x i} ,

1 =o

To verify (3.5) it suffices to use the fact that the coefficients sil , L = O , . . . , k i a r e strictly increasing and that consequently yil

>

0 only if

(15)

yio

=

dio and for 0

<

t

<

1 , all y& are a t their upper bounds. Details are worked out in [ l o , Proposition 11.

These expressions derived for qi taken in conjunction with the methods of Section 4 contain the germ of different algorithmic procedures embedded in them.

Before we turn to t h s , and in order not to lose sight of the fact that we a r e also interested in a more general class of problems, not simply stochastic programs with simple recourse with piecewise linear separable cost structure, we also describe a more general case. Suppose

and

x

is such that +(x,w) is finite for all possible p ( w ) , i.e. the linear program defining +(x,w) is feasible and bounded with probability 1. Then parametric analysis, in particular the Basis Decomposition Theorem [ I ] , shows that there is a (simplicial) decomposition of the sample space of p (.) (= t h e activity space),

such t h a t if p (w ) E Sh then

and with ^codenoting the convex hull,

where W ( h ) is a n invertible submatrix of W and q ( h ) the subvector of q corresponding to the columns of W ( h ) . Let S' be any partition generated by S. Then

(16)

and

adding to the second term the normal cone to

Kz

at

x

if

x

is on the boun- dary of

K2.

(We really only need the above, the rest being taken care of through the constraints.) The potential use of the preceding formulas depends very much on how accurate one needs to be. Multidimensional integration over convex polyhedral cones can really only be approached through sampling methods, cf. [I 1],[12] and the references given therein.

If p (,) is discretely distributed by which we mean here that it takes on a finite number of possible values, with

then the above formulas become simply sums, viz.

R-' (pk - ~ ) f k (3.8) * ( x ) = C

C

^{q(h) (h)}

h Ib I P ~ E Sh'l

and a similar expression for

a+.

To actually compute the above we can proceed via a sort of parametric analysis that we now describe. We refer to it as a bunching procedure. Let

Clc=prc ^{- x , k}= I , ...,

N

, and suppose we have solved the linear program

(3.9) find y r 0 such that W y = and q y is minimized,

(17)

with optimal basis W ( l ) and associated subvector g ( l ) of g .

s ; ,

bunch 1.

is defined by

s; =

^{b k}

1 w(;j Ck '

^O1

While constructing this set, identify those C k g S ; such that the vector W C ; ~ ^{c k}has the fewest number (and smallest) negative elements. Let

Cke

be such a vector, We find the optimal solution and a corresponding basis W ( z ) of the linear program (3.9) with

CkB

replacing

C 1 ,

by dual simplex pivoting, starting with the old basis W(l). The second bunch S; is given by

S ,

=

(pk L S ; (

~ @ j Ck

2 Oj .

We continue in this fashion until all pk have been bunched. Alternative procedures can be devised taking further advantage of the combinatorial structure of decompositions, how to do this so as to minimize the work involved needs further investigation. In any case with the above we obtain the value of 4j at X, as well as a subgradient of 4j at X , viz.

z

^{( Y}( h ) ~ & \ ) f k a4j(x) .

Ik I P ~ E $1

Observe that qh

w&\.

the vector of simplex multipliers, remains constant on

s;.

With

the above becomes

z

^{Y ( h )}

^w(%\

^ph ^E'^a*(x)

h

This formula for a subgradient of \k is, in fact, independent of the form of

(18)

the distribution of p (.), the problem being always the evaluations of Ph for a partitioning scheme constructed in the manner described above.

4. AN ILLUSTRATION OF EACH ALGORITHMIC APPROACH

We consider the very simple problem of finding the unconstrained minimum of a 1-dimensional finite piecewise linear convex function p defined on [zO, z H ] by reformulating the problem as an equivalent linear program. (This function p could of course be minimized by some 1- dimensional search procedure or simply by a sort of the slopes to find where they change sign, but this is not our real concern here.) There a r e a t least three ways of formulating t h s equivalent linear program. Each contains the germ of a more general solution strategy considered in later sections.

4.1 Flgure: The function p

(19)

-

^17-

The coordinates of the breakpoints of p are denoted by

( z h , ~ ( z h ) ) , h = ^0,...,H and with slopes

sh for z € [ z h - ] , z h ] , h = I , ..., H

.

The convexity of p implies that

( 4 . 2 ) s l C s 2 S * * . S SH ,

With

ah

=

zh - z h - ~ and eh = p ( z h )

-

shzh , the line segment on [ z ~ - ~ , z h ] takes the form

( 4 . 3 ) p h ( z ) : = s h z

+

^eh

The bounded v a r i a b l e m e t h o d

If we introduce a new variable yh for each interval [ z h m l , z h ] for any given value of ^zit is easily verified that on [ z O , z H ] ,

H

( 4 . 4 ) p ( z ) = p ( z o )

+

min [

C

^{s h y h}^lz

⁼

^{z 0}

⁺

^{C y h}^,

^o

^yh ^ah^,^h⁼l , . . . , ~ ]

h=1 h

The assumption of convexity and hence ( 4 . 2 ) is of course crucial, since this means that y l is preferred to y 2 , and y 2 to y 3 and so on in the minimization of ( 4 . 4 ) . Hence at the optimum point in ( 4 . 3 ) , yh

>

0 implies

(20)

Minimizing p on [ z O , zH] is equivalent to solving the following linear program:

(4.5) find yh€[O , ah] ^, for h = 1 ,..., H , H

such that z =

x

shyh is minimized.

h = l

The optimal z * is determined by

w h e ~

(Yi

^,^h⁼l , . . . , H ) is the optimal solution of (4.5).

Inner approximation

Referring to Flgure 4 . 1 , any point (z , a ) in the shaded region C, i.e.

with a ²p ( z ) , can be written as a convex combination of the extreme points

For any given z it follows that

and 1hus.minimizing p on [zO , zH] is equivalent t o solving the following linear program

H

such that z

= x

^hhp(zh)is minimized.

h =O

(21)

The optimal z

*

is determined by

where (A; , h = O , . . . , H ) is the solution of ( 4 . 7 ) . For an arbitrary finite convex function p with p ( z h ) = p ( z h ) , the function p can be viewed as an inner linearization of p .

Outer approzimation

Since p is piecewise linear, we have that

where the functions ph are defined by ( 4 . 3 ) . When p is expressed in this form, finding its minimum consists in solving the minimax problem:

min max ph ( z )

I E [ z ~ . z ~ ~ h = l , .... H

This is equivalent to solving the following linear program ( 4 . 9 ) ~ E R , a n d z E [ z O , z ~ ] s u c h t h a t

I9Sshz + e h

and z = I9 is minimized.

The methods of inner and outer approximation rely on dual representa- tions of the epigraph of p , but one should note that the linear programs ( 4 . 7 ) and ( 4 . 9 ) are not dual linear programs.

Each of the above three problem manipulations delineates an approach to solving problems of the type ( 1 . 1 ) . We consider each one in this more general setting in the next three sections.

(22)

5. EXTENSIONS OF THE REVISED SIMPLEX MmHOD

An algorithm for solving stochastic programs with simple recourse with separable piecewise linear recourse costs and discrete random variables has been proposed in [lo], a write-up and computer code has been provided by Kallberg and Kusy [13], and computational experience is reported in [14], [15]; cf. also [16]. From (3.4) we know t h a t

with e a c h qi polyhedral with the slopes sib and sio

<

s i l

<

. .

<

s ~ , ~ . We can clearly apply the bounded variable method of Section 4 to the m2- functions + i . Using (3.5) we obtain the following linear program, the ana- log of (4.5):

(5.1) ^findOSzj, j = 1 ,..., n l , and for i = l , . . . , m z ,

o s

yil s dir , I

=

⁰^,...,ki-l , and yieq 2 0;

such that

and z

=

c z

+ 2 2

silyil is minimized.

i = 1 l=O

where A, and Ti a r e the i-th rows of A and T , and for i = ^1,... ,m2,

with pi, E )-m , p i l ( chosen so that for t h e optimal solution z * , T i z

>

pio

(23)

is guaranteed

In seeking the solution of (5.1) using the revised simplex method for linear programs with simple upper bounds, let us assume t h a t we have in hand a nondegenerate basic feasible solution say

&$a ,i

=

^1,..., m2,1 = 0, ..., k i ) which yields the values of the functions

+,

^,ⁱ

⁼

¹^,...,^m2a t a point

X

=

E?

as follows: for e a c h i = ¹,..., m 2 ,

i.e. the

&

actually solve the program (3.5) for

xi

=

y i .

It follows that if

then

gih =

^dh f o r h

<

L ,

In [lo] such a solution is called a perfect basic solution. The fact t h a t t h e optimal solution is of t h a t type and t h a t one can pass from a perfect basic solution to another can be argued as follows: Let ( a , ~ ) E x be the simplex multipliers associated with the solution a t hand. To find the variable to be entered into the basis a t the next iteration, we compute Sib, the reduced cost ( t h e component of the reduced gradient

corresponding to t h e c u r r e n t basis) associated with the variable y*, viz.

The ISih , h

=

O , . . . , k i j a r e increasing, thus for each i among all variables at their lower bound ( h

>

I ) , the variable y i , l + l is the one t h a t yields

(24)

potentially the greatest (marginal) improvement. Similarly, among all variables at their upper bound ( h

<

1 ) the best candidate for decrease is y i , l - l . It is also readily established that when either y i , r + l or yi,l-l is introduced into the basis then yd will move to its appropriate bound and leave the basis. Thus the new basis will also be perfect. This property is not affected by exchanges between z-variables and yih-variables, unless degenerate cases are mishandled. A potential difficulty is that the algorithm could go through a great number of steps and associated basis changes if \ki has many pieces. Thls can partially be overcome by an acceleration procedure that in one sweep makes a number of basis changes involving variables [yih , h = 0,. ..,ki j for a given i , _see[lo]. This algorithm can thus be regarded as a n application of the bounded variable revised simplex method with an acceleration step; in 1101 it is also shown how to exploit the structure of the problem to obtain a good starting basis.

An extension of the above approach that seeks to avoid the difficulties associated with introducing bounded variables, and has also the advantage of greater generality permitting the distribution Fi to be arbitrary and q i ( - , w ) to be nonlinear, is to handle the. variable

xi

explicitly rather than implicitly in terms of a basic variable yd. T h s method, discussed in [9], can be viewed as an extension of the convex simplex method to problems with nonlinear nonsmooth objective functions. Let us briefly review the convex simplex method as it applies to the problem:

( 5 . 2 ) find z r 0 such that Az

=

b , and f ( 2 ) is minimized

Here f :Rn ⁺R is a smooth (continually differentiab1.e) function and A is

(25)

m x n . In contrast to the case when f is linear, we cannot work only with basic solutions Az = b , i.e, with no more than rn-elements of z larger than 0 . At the start of a cycle of the convex simplex method, assume that z is a feasible solution with z g the m-variables associated with a basis B . The remaining variables 2% define the non-basic variables with associated matrix N . For convenience let us assume that the basic variables are the first m variables, i.e. z = ( z B , z N ) , A = [B IN], and

We use t h s relation to eliminate z g from ( 5 . 2 ) . The reduced gradient of f is thengiven by

where

Schematically the c o n v e x s i m p l e x m e t h o d proceeds as follows:

S t e p 0. Find 2 a basic feasible solution of Az = b , z r 0.

S t e p 1 . Select as basic variables the rn components of z^ largest in magnitude such that the associated matrix B is a basis.

Compute Z (2) using (5.3).

S t e p 2 . With dj denoting a change in variable z j that does not violate feasibility, identify t h e improving variables as follows:

Ej

<

O a n d b j

>

0 , or

(26)

Choose a "best" candidate, say j = s , with associated column A S . If none exists, the problem is solved.

Step 3. Changing

z^,

by 6, corresponds to a step to a new point Z given by

EB

=

2B

-

6, ( B - ~ A ~ )

3j

=

Sj , for j # s and nonbasic.

Withd = z - 2 , find

' 5 ] ~ a r g m i n [ f ( f + v d ) l f + v d 2 0 ] .

Assume T j exists. Otherwise the original problem (5.2) could be unbounded or one needs to work with &-approximates of T j .

Define I ,f ,( = 5 ( o l d )

+

7jd and return to Step 1.

In [g] this overall scheme is used to design a method for solving (2.7), i.e. (convex) programs of the type

m,

find z € R n l , x€Rm that minimize cz

+

_\ki

(xi

i = 1

such that Az

=

b , Tz

= x

and z 2 0 _, with the \ki not necessarily differentiable. Let

a\ki

^(xi)= ^{I V}

1

-4 ^ci ^s

zi

^j

Let B be a basis, i.e. a rn x rn- submatrix of

(27)

m

=

m l

+

^{m 2 ,}and (0,n) the associated multipliers defined by (5.4) (u,n) = c ~ B - '

where

( c ~ ) ~ = cj if the k-th basic variable is zj, ( c ~ ) ~ = Cli if the k-th basic variable is

xi

Reduced subgrachents are computed for the nonbasic variables as follows:

-

_cj₌_cj_{- U A ~}_{- n ~ j}_{i f j}_{C n l}

and for the other nonbasic xi-variables

In Step 2 of the algorithm, the "best" candidate is chosen by

(5.5) ^s= arg min [ E ~ , j = ¹,... nl ; - ci

.

ⁱ⁼¹^....,^{m 2 ;}

^-ci

^,ⁱ⁼¹^,...,^m21.

j *i

The remaining operations remain the same, but note that finding

7

in Step 3 is greatly simplified because of the separability of the objective function.

The next logical step is to consider algorithmic procedures for a class of problems whose nonlinear features can still be relegated to the unconstrained optimization of some nonlinear function on a subspace, but this time with nonseparable objective possibly also nonsmooth, such as for problems of type (2.6).

Let us first examine the problems raised by nondifferentiability and suppose that M ( x ) is given in terms of a finite number of vectors, say g l , . . . , g k , i.e. N ( x ) is a polytope. A vector g that can play the role of the

(28)

gradient in this case is the solution of the following quadratic program [ 1 7 , 10, 191:

(5.6) ^{find A}^E

R$

such that

I 1

^v(

1

is minimized.

k k

and

x

^Ai ⁼¹^{, v}

⁼ x

^\gi

i = 1 i = 1

We shall refer to the solution of ( 5 . 6 ) as the gradient of 4? a t X . A variant of t h e convex simplex method with this definition of gradient would naturally lead to a corresponding notion of reduced gradient. When

I 1 . I

( denotes Euclidean norm, 4? is sepzrable and

a4?,(xi)

= [Ei

. ^ei],

( 5 . 6 ) gives:

For i

=

1, ..., m2,

( 5 . 7 ) find v i E

[Zi

,

G ]

such that

1

^{v i}

I

is minimized.

Let u s again denote the solution by g = ( g ] , . ..,g,,). The components of t h e reduced gradient corresponding to t h e variables

xi

are thus

- ci

=

gi

+

ni for i = 1 , ..., m2 ,

where ( u , n) a r e as in ( 5 . 4 ) , the simplex multipliers associated with t h e basis

B .

For t h e components corresponding t o t h e zj variables we have a s before

- cj = c j -uA' - n T J f o r j I n l .

The selection rule for the incoming variable is similar to ( 5 . 5 ) used in t h e convex simplex method implementation, viz.,

(29)

and p r o d u c e s t h e s a m e i n c o m i n g v a r i a b l e , as can easily be verified, provided naturally that the vector n is unambiguously defined. In both cases, we recognize optimality through the condition?, 2 0.

When

+(x)

is not separable we can continue t o define the reduced gradient through (5.6) and use (5.8). A further development is to minim- ize in a subspace of non-basic variables leading to a reduced gradient method for nondifferentiable optimization.

Again, let us briefly review the reduced gradient method for problems of the type (5.2), i.e.

find z 2 0 such that Ax

=

b and f ( z ) is minimized.

for any basic feasible solution say z ' , the constraints Az

=

b a r e always active. Suppose in addition t h a t a number of the constraints z, 2 0 , j

=

1, ... ,n a r e also active:

zi

=

0 for j E J c [ l ... n j

whereas for j E J they a r e inactive, zj

>

0. We thus have partitioned the constraints as follows:

z j = O f o r j ~ J

active A z = b

The normals t o the active constraints a r e the columns of N = [ A ~ , I ( ~ ) ] ^,

where consists of the columns I j of the (n x n)-identity matrix with

(30)

j ^E

J .

Let us denote by Z a matrix with linearly independent columns such that

N T z = 0 , lin. span Z u lin, span N T =

R n

If we consider as inactive all z j r 0 corresponding to a given basis B as well as all other nonzero variables, we have that ( J

I

= n - ( m

+

s ) . Now assuming N to be of full rank, we have

Under the usual nondegeneracy assumptions, if we ignore the inactive constraints in the neighbourhood of z ' problem (5.2) becomes

find x E

Rn

such that N T x =

(t)

and f ( x ) is minimized Let us make a transformation of variables

Z = z l + z y

where y E

R S .

If f ( z ) transforms to y ( y ) the preceding program becomes

( 5 . 9 ) find y E

RS

such that

y ( y )

is minimized.

There are no constraints since they become N T z '

+

N T z y

= (k)

Also

(5.10) v y ( y ) = g" =

z T g

where g

=

Vf ( z )

and

(31)

v2Y(y)

=ii!

= Z ~ H Z where H

= v2f

^{( z ) .}

An unconstrained optimization algorithm utilizing g" and (or suitable approximations to these quantities) combined with an active (index) set strategy for revising Z , gives the usual reduced gradient method, see for example [20]. The above discussion is, of course, clearly related to the one given earlier for the convex simplex method. In particular com- pare (5.10) with (5.3). Note that if we ignore the zero components of Z ( z ) in (5.3), namely, those corresponding to the basic variables, and when

I

^{J (}

=

n

-

rn then (5.10) and (5.3) are alternative expressions for the same quantity. The matrix Z plays a crucial role in the implementation of the reduced gradient method, and it has a special structure. If we partition A as follows:

and

then

This particular form of

Z

is exploited in MINOS which to date is the most effective implementation 01 nonlinear techniques for solving large problems of type (5.21, see [21]. The method is particularly effective on problems that are linear with respect to a large number of variables and nonlinear only with respect to a few variables. Then the dimensionality s of the unconstrained minimum can be kept relatively small. Ths is based

(32)

on the following observation [21].

5.12 PROPOSITION. If p r o b l e m (5.2) is s o l v a b l e a n d t h e r e is an o p t i m a l s o l u t i o n i n v o l v i n g o n l y t of t h e n o n l i n e a r v a r i a b l e s , t h e n t h e r e e z i s t s an o p t i m a l s o l u t i o n at w h i c h t h e n u m b e r of i n a c t i v e c o n s t r a i n t s is l e s s t h a n o r e q u a l t o m

+

t

.

Turning now to the application of these ideas to stochastic programs with complete recourse, let us consider our original problem (2.1) with nonstochastic tenders, i.e. with the equivalent deterministic form, cf.

(2.6):

(5.13) find x r 0 such that Az = b , T z =

x

and z = cz

+

+(x) is minimized.

Note that Proposition 5.12 is precisely the counterpart of Murty's result [B, 91 which asserts that (5.13) admits an optimal solution ( z * , x*) with no more than ( m l

+

a2)-positive entries in the z-vector. Thus there is an optimal solution to (5.13) with no more than m z nonbasic variables with positive value. I we have simple recourse with separable recourse costs and the marginals of the random variables l(qi ( w ) ^,pi ( w ) , i

=

1,

...

, m z ] are absolutely continuous, then the qi are differentiable, see Properties 2.8 and (3.2), and the reduced gradient method is directly applicable. In general however the functions qi are nonsmooth; for simple recourse and discretely distributed random variables, the subgradient at

xi

is given by

as follows from (3. I), where f is the probability associated with event wl , l = l , . . . , k i . In this case the gradient could be computed using (5.6)

(33)

and t h e reduced gradient g" by means of (5.10). This would then give a very natural extension of MINOS to handle this class of stochastic (linear) programs. (In practice one would extend the scheme so as to compute E -

approximates, to ensure convergence .)

For the more general case of nonseparable objective function (5.13) this approach would still apply provided the gradient can be computed using (5.6). Indeed the question of what information about the subdifferential of the function can be povided becomes even more pressing when we are outside the case of simple recourse. Very often it is necessary t o resort to an approximation scheme that would provide upper and lower bounds for the solutions [22, Section 31 or accept the fact that t h e gradient can only be estimated such as in the methods of stochastic quasi-gradient [23]. The approach that has been suggested above, based on adaptation and extension of MINOS, can also be pursued here with naturally some adjustments. We sketch out some of the possibilities in order to high-light the new obstacles t h a t need to be overcome but also to stress t h e fact that there is a natural continuation of t h ~ s approach t h a t provides solution procedures for more sophisticated stochastic programming problems.

Recall t h a t +(x) = E ! $ ( X , W ) { and

in this more general case, see Section 2. We consider only the linear case, i.e. when

(34)

For stochastic programs with complete recourse (that are bounded),

$(.,w) is finite for all (possible) w and

where

If the random variables have a discrete distribution withp (.) and q (.) taking on the values

i ( p L , g L ) , 1 = 1 ,..., L j

with probabilities f l , 1 = 1, ..., L , then a "gradient" as defined by (5.6) is obtained by solving the program:

find v E R~~ such that

I 1

v

1 1

is minimized,

L I I

where v

=

f i n L and nL W qL ,

d(pl

^-^X)²

$(x

^,(q^,p

1).

1 = 1

To solve t h ~ s program efficiently we need to take advantage of its special structure, use the fact that for most 1 there is only a unique nL that satisfies t h e inequalities and that for many 1 , nL will be determined by the same basis of ( W , I ) , and so on.

In general, when the random variables are not discretely distributed or when there a r e too many possible values for the discretely distributed random variables it may not be possible to obtain complete information about 'k(x) or a'k(x). We are then reduced to accepting approximates.

There is at present no theory that allows us to deal directly with t h s case.

What is needed is to extend the subgradient techniques, such as 117, 18,

(35)

191 and in particular [24], with appropriate reduced gradient calculations to handle t h s case. The convergence proofs could be derived by relying on the framework provided by the study of nonlinear programming methods in the presence of noise [25].

The question of how approximate the calculation of 9(x) and a9(x) should be is still very much open to much deeper investigation. The method of stochastic quasi-gradients [23] advocates the use of a single sample point, say ( p S , q S ) , to obtain

M>c'

¹ ^{( q S}⁸p S ) )

as a n estimate to a+(>c'). However, this estimate is only used to slightly change

xs,

rather than as an adjusted cost function as in the extensions of the revised simplex method discussed here.

We conclude this section by making some comments about implementation. As mentioned already earlier, the most natural vehicle for implementing the algorithms described above is the MINOS Code of Mur- tagh and Saunders [21]. But even for the case of simple recourse with discretely distributed random variables several augmentations of the code are necessary, including

1. Design and implementation of a standardized input format. A very natural situation is for a linear programming model specified in MPS format to be later modified to permit some demands and costs to be stochastic. Thus a standardized input would be based on a combination of (a) MPS format for specifying c , the LP matrices A , T, the right-hand

sides b and bounds. In particular, we could easily allow for bounds 1 S z s u in place of z r 0,

(36)

(b) a specification section, like SPECS in MINOS to describe the rows of A and T that correspond t o the technology matrix, the cost functions qi (linear or nonlinear), and the distribution functions Fi (piecewise constant, piecewise linear).

2. Routines to compute the "gradient", e.g, as in (5.6) and (5.10) suitably extended to ensure convergence.

3. Modification of the line search procedure t o the nonsmooth case, see [261.

4. Design of a suitable output routine for interpreting ihe results in statistical terms.

MINOS is also a natural vehcle for incorporating techniques for solving more general problems as discussed above. An alternative starting point is the code of Nguyen and Bihain, see [24], w h c h already handles nonsmooth objective functions but does not have all the linear programming features of MINOS that a r e bound t o play a n important role in t h e efficiency and stability of the method used t o solve problems of type ( l . l ) ,

6. INNER APPROXIMATION

The algorithms we consider next use inner approximation of the type discussed in Section 4, see (4.7). After a general discussion of the algorithm, we consider first how it applies t o problems with simple recourse and, a s in Section 5, see how to extend the approach t o more general classes of stochastic programs (with nonstochastic tenders).

(37)

The resulting algorithm is in effect the generalized programming technique, attributed by Dantzig [27, Chapter 241 to P. Wolfe. Here we apply it t o problems of type (1.1) taking advantage of the special structure and of the form of +(,y). As a means to obtain e r r o r bounds, Williams [28] already suggested a n approach of this nature, but apparently it has not been exploited as a general solution technique.

The algorithm as it applies to (1.1) or equivalently (2.5) can be summarized a s follows:

Step 0. Find a feasible solution of AZO

=

b , z 0 ²0 Set X0

=

T X O .

Choose ,yl, ,

. .

, ,yv (a selection of tenders, v 2 0).

Step 1. Solve the linear program:

Y

(6.1) Minimize c z

+

h1\k(,yL)

=

^z

1 =o

subject to Az = b

Let (uV , rv , gV) be the (optimal) multipliers associated with t h e solution of (6.1).

Step 2 . Find

xu+'

^Ea r g min [\k(x)

+ rVx]

If \k(,yV+')

+

~ r " ~ " + ' 2 gV, stop: optimal.

Otherwise r e t u r n to Step 1 with v = v

+

1

(38)

We have assumed here that for all nv generated in Step 1, the func- tion X H (+(x)

-

rrVx) attains its minimum. There are naturally regularity conditions for stochastic programs that will guarantee this [ I , 281, but mostly we have done so to simplify the presentation and interpretation of the algorithm. Note that both upper and lower bounds for the infimum are available. Let z v denote the optimal value of z , and (hy , 1 = 0 , ..

.

^,v) the optimal values of the h variables in ( 6 . 1 ) . Then

where z * is the optimal value of the original program. The second ine- quality follows from the fact that (6.1) is a n inner approximation, whereas the first one follows from Step 2 w h c h implies that

Adding c z and taking inf on both sides with respect to ( z , ~ ) on the set (z 2 0

1

^Az

=

b , Tz

= xj

yields the desired inequality, it suffices to observe that the first one of these two minimization problems admits for optimal solution the pair ( z v

.

^{h h i )}with ( z V A , 1 = 0 . ) the

1 =o optimal solution of ( 6 . 1 ) . Thus

v

O c z v ^{- Z * S} Max [(+(xV+')

+

rrV~"")

- x

h r ( + ( g ) + n V 2 ) ]

k=O. .... v ₁=o

We interpret the algorithm as the search for a particular (optimal) tender

**x*.**

It is easy to see that if X* is part of the collection

x0,

. .

. ,xu,

then solving (6.1) will yield the optimal z * . One reason for believing that this approach holds promise is that in practice, one should be able to ini- tialize the algorithm with a good choice of tenders

xO,

^.

. . ,xu.

The

(39)

subsequent iterations can t h e n be viewed a s refinements of the original guesses. A line of further research is to find effective strategies for choosing initial tenders.

The convergence of the algorithm, with the following assumptions (6.3) all tenders a r e retained, as part of (6.1),

(6.4) complete information is available about the function values of

+

so that Step 2 can be carried out exactly,

has been proved by Dantzig [ 2 7 , Chapter 241. Further, the algorithm applied to t h e convex program (1.1) is equivalent to a cutting plane algorithm applied t o its dual. We can thus translate the results about retention of cuts [29, 301 into retention of tenders. In particular in our case, they imply that all tenders not associated with a basic variable hl can be dropped a t the next iteration, without affecting the convergence proof of the algorithm.

A large number of tenders could be generated, although this is very unlikely in practice, especially if a good set of initial tenders is used.

From a theoretical standpoint however and for reasons of sound implementation, it is worth examining the question of w h c h tenders should be retained to enhance convergence. A t iteration v with the multipliers ( u V , srV , Ip") we have for all tenders X' , 1 = 0 ,..., v ,

At the next iteration, a tender is developed (several tenders could equally well be formed) and we need to resolve (6.1) with respect t o . Suppose prior to the commencement of the next iteration

(40)

v

+

1, a subset of tenders

{x'

, L E L J

c lX0,

. .

.

, X V + l J

must be found such that the optimal solution of (6.1) is unaffected. Since the (optimal) multipliers

(gv+l , 1 *v+l

I

1

are unknown a t t h s stage, we formulate this problem as (6.6) given any (u , IF , d ) and a fixed index k

find L c [0, ..., v

+

l j such that +(x')

+

nXL 2 d forall L E L implies + ( p )

+ IF^

²^{d .}

Let us write

with D = [Do, ..., DV+l].

6.7 PROPOSITION. A sufficient condition that (6.8) +($)

+

n$ ²d for all L E L

implies

is that

(41)

i . e . t h a t b b e l o n g s t o t h e p o s i t i v e h u l l of o r e g u i v a l e n t l y t h e c o n u e z c o n e , g e n e r a t e d b y t h e c o l u m n s of D c o r r e s p o n d i n g t o L a n d 1'.

Proof. To say that b E pos [I' ;

D i

, L ^EL ] is to say that the linear system

is solvable. Thus the system

(h,rr,rr)~' r 0 , L E L ; h r 0 and ( A , T , 7 9 ) . b

<

0

is not solvable, as follows from Farkas Lemma. Using now the definitions of

D L

and b , we see t h a t thls implies that for a choice of variables (A

=

1 , rr , 29) satisfying ( 6 . 8 ) , we necessarily must satisfy ( 6 . 9 ) .

The question raised in ( 6 . 6 ) can thus be translated into finding a minimum number of generators, i.e. a f r a m e , for the convex poyhedral cone

An algorithm for doing t h s is described by Wets and Witzgall [31]. Note also t h a t it may be worthwhile to also eliminate tenders that have not been utilized in the solution of (6.1) on several prior iterations.

The use of this algorithm in the context of stochastic programming makes assumption (6,4) nontrivial. Even in t h e case of simple recourse, situations can arise when

+(x)

cannot be calculated exactly or the cost of calculating it could be excessive. For example, if q (y , w ) is nonlinear in y and t h e dependence on w is not simple (e.g, linear), the cost of evaluat- ing

E t q ( p ( w ) - X

d l

(42)

could be very large. A similar situation could arise even after approximating the distribution functions by piecewise constant or piecewise linear distributions. In this case the generalized linear programming approach must be revised to include noisy functions and the question of convergence, theoretical and practical, still needs further investigation.

In the case of simple recourse with separable cost the evaluation of the function \k presents no serious challenge since

and each \ki defined on R is given by a 1-dimensional integral, viz.

with the subgradient given by (3.1). Special forms of qi and Fi lead to even simpler representation for \E. such as (3.3). Even more explicit is the expression obtained in (3.4) and (3.5) in the case of piecewise linear recourse costs and piecewise constant distributions; for piecewise linear recourse costs and piecewise linear distributions see [32, $31, for even more detailed expressions for specific distributions consult [33], [34].

Note also that in this case Step 2 of the algorithm consists in finding for i

=

^1,..., m2 , X;C' such that

where the subgradient is given by (3.1). Again, in many cases it is possible to use the special forms of qi and Fi to find efficient solutions pro- cedurs for the preceding relation. For example, in the situation covered by (3.3), the above becomes: find

Xr+l

^{such that}

(43)

It thus suffices to have a bracketing routine for finding t h e point a t w h c h the monotone function F, passes through the value (qi+

-

.rr,V)/ qi.

In the more general case, when it is not feasible to compute the value of 9 a t

x

exactly, see Section 3, there are basically two strategies available. The first one is to accept inaccurate evaluations of 9 , view them as noisy observations of 9 and rely on a convergence in probability argument [25]. How to design a n efficient and reliable algorithm t h a t proceeds in this fashion has not been investigated yet.

The second approach is to proceed by approximations. By this we mean replace the original problem (2.1) by an approximate one, solve the approximating problem, obtain if possible bounds using this approximat- ing solution and repeat the process with a refinement of the approximation if the bounds a r e not sufficiently tight. The subject of approximations, specially via discreteization of the random variables, is reviewed in [22, Section 31 and will not be taken up here. We only want to raised some of the questions that need to be resolved before such a scheme could be made operational:

(i) How should the initial approximation be designed so as t o obtain with minimal computational effort a "good" approximate of the solution?

(ii) How to improve (refine) the approximation so as to "maximize"

the resulting improvement?