
NOT FOR QUOTATION WITHOUT PERMISSION OF THE AUTHOR

LARGE SCALE LINEAR PROGRAMMING TECHNIQUES IN STOCHASTIC PROGRAMMING

Roger J-B. Wets

November 1984 WP-84-90

Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute or of its National Member Organizations.

INTERNATIONAL INSTITUTE FOR APPLIED SYSTEMS ANALYSIS 2361 Laxenburg, Austria


CONTENTS

INTRODUCTION

1. RECOURSE MODELS AS LARGE SCALE LINEAR PROGRAMS
2. METHODS THAT EXPLOIT THE DUAL STRUCTURE
3. METHODS THAT ARE PRIMAL ORIENTED
4. SIFTING, BUNCHING AND BASES UPDATES
5. CONCLUSION

REFERENCES

LARGE SCALE LINEAR PROGRAMMING TECHNIQUES IN STOCHASTIC PROGRAMMING

Roger J-B. Wets

INTRODUCTION

We study the use of large scale linear programming techniques for solving (linear) recourse problems† whose random elements have discrete distributions (with finite support), more precisely problems of the type:

(0.1) find $x \in R^{n_1}_+$ such that $Ax = b$ and $z = cx + \mathcal{Q}(x)$ is minimized,

where

(0.2) $\mathcal{Q}(x) = \sum_{l=1}^{L} p_l\, Q(x,\xi^l) = E\{Q(x,\xi(\omega))\}$

† The potential use of large scale programming techniques for solving stochastic programs with chance-constraints appears to be less promising and has not yet been investigated. The approximation scheme for chance-constraints proposed by Salinetti, 1983, would, if implemented, require a detailed analysis of the structural properties of the resulting (large-scale) linear programs. Much of the analysis laid out in this Section would also be applicable to that case, but it appears that further properties, namely the connections between the upper and lower bounding problems, should be exploited.


and for each $l = 1,\dots,L$, the recourse cost $Q(x,\xi^l)$ is obtained by solving the recourse problem:

(0.3) $Q(x,\xi^l) = \inf\{\, q^l y \mid Wy = h^l - T^l x,\ y \in R^{n_2}_+ \,\}$

where

$\xi^l = (q^l, h^l, T^l) = (q^l_1,\dots,q^l_{n_2};\ h^l_1,\dots,h^l_{m_2};\ t^l_{11},\dots,t^l_{1 n_1},\dots,t^l_{m_2 n_1}) \in R^N$

with $N = n_2 + m_2 + m_2 n_1$, and

$p_l = \mathrm{Prob}[\xi(\omega) = \xi^l]$.

The sizes of the matrices are consistent with $x \in R^{n_1}$, $y \in R^{n_2}$, $b \in R^{m_1}$ and, for all $l$, $h^l \in R^{m_2}$; for a more detailed description of the recourse model consult Part I of this Volume. Because $W$ is nonstochastic we refer to this problem as a model with fixed recourse. The ensuing development is aimed at dealing with problems that exhibit no further structural properties. Problems with simple recourse, for example, i.e. when $W = (I, -I)$, are best dealt with in a nonlinear programming framework, cf. Chapter 4.
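As a concrete illustration of (0.2) and (0.3), the sketch below evaluates the recourse cost by linear programming on a tiny invented simple-recourse instance with $W = (I,-I)$. It is not from the paper; scipy's `linprog` is assumed as a stand-in LP solver, and all the data are made up.

```python
import numpy as np
from scipy.optimize import linprog

def recourse_cost(q, W, h, T, x):
    """Q(x, xi) = inf { q y | W y = h - T x, y >= 0 }; +inf when infeasible."""
    res = linprog(q, A_eq=W, b_eq=h - T @ x,
                  bounds=[(0, None)] * len(q), method="highs")
    return res.fun if res.status == 0 else np.inf

# Invented simple-recourse data: W = (I, -I), shortage costs 2, surplus costs 1.
W = np.hstack([np.eye(2), -np.eye(2)])
q = np.array([2.0, 2.0, 1.0, 1.0])
T = np.eye(2)
scenarios = [(0.5, np.array([1.0, 2.0])),   # (p_l, h^l)
             (0.5, np.array([3.0, 1.0]))]

x = np.array([2.0, 2.0])
# Expected recourse cost (0.2): sum over scenarios of p_l Q(x, xi^l)
Q_exp = sum(p * recourse_cost(q, W, h, T, x) for p, h in scenarios)
print(Q_exp)
```

Each scenario costs one LP solve of order $m_2$; this is exactly the work that the methods discussed below try to organize and amortize.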

Before we embark on the description of solution strategies for the problem at hand, it is useful to review some of the ways in which a problem of this type might arise in practice. First, the problem is indeed a linear recourse model whose random elements follow a known discrete distribution function. In that case either $q$ or $h$ or $T$ is random, usually not all three matrices at once, but the number of independent random variables is liable to be relatively large, and even if each one takes on only a moderate number of possible values, the total number $L$ of possible vectors $\xi^l$ could be truly huge; for example, a problem with 10 independent random variables, each taking on 10 possible values, leads us to consider 10 billion ($= L$) 10-dimensional vectors $\xi^l$. Certainly not the type of data we want, or can, keep in fast access memory.

Second, the original problem is again a stochastic optimization problem of the recourse type, but (0.1) is the result of an approximation scheme, either a discretization of an absolutely continuous probability measure or a coarser discretization of a problem whose "finite" number of possible realizations is too large to contemplate; for more about approximation schemes consult Chapter 2. In this case $L$, the number of possible values taken on by $\xi(\cdot)$, could be relatively small, say a few hundred, in particular if (0.1) is part of a sequential approximation scheme; details can be found in Chapter 2, see also Birge and Wets, 1984, for example.

Third, the original problem is a stochastic optimization problem, but we have only very limited statistical information about the distribution of the random elements, and $\xi^1,\dots,\xi^L$ represents all the statistical data available. Problem (0.1) will be solved using the empirical distribution, the idea being to submit its solution to statistical analysis such as suggested by the work of Dupačová and Wets, 1984. In this case $L$ is usually quite small; we are thinking in terms of $L$ less than 20 or 30.

Fourth, problem (0.1) resulted from an attempt at modeling uncertainty, with no accompanying statistical basis that allows for accurate descriptions of the phenomena by stochastic variables. As indicated in Chapter 1, this mostly comes from situations when there is data uncertainty about some parameters (of a deterministic problem), or we want to analyse decision making or policy setting and the future is modeled in terms of scenarios (projections with tolerances for errors). In this case the number $L$ of possible variants of a key scenario that we want to consider is liable to be quite small, say 5 to 20, and the $\xi^l$ can often be expressed as a sum:

$\xi^l = \xi^0 + \sum_{k=1}^{K} \eta_{kl}\, \xi^k$

where for $k = 1,\dots,K$ the $\xi^k \in R^N$ are fixed vectors and $(\eta_1(\cdot),\dots,\eta_K(\cdot))$ are scalar random variables with possible values $(\eta_{1l},\dots,\eta_{Kl})$ for $l = 1,\dots,L$. We think of $K$ as being 2 or 3. The typical case being when we have a base projection $\xi^0 + \xi^1$, but we want to consider the possibility that certain factors may vary by as much as 25% (plus or minus). In such a case the model assigns to the (only) random variable $\eta_1(\cdot)$ some discrete distribution on the interval $[.75, 1.25]$.

With this as background to our study, it is natural to search for solution procedures for recourse problems with discrete distributions when there is either only a moderate number of vectors $\xi^l$ to consider (scenarios, limited statistical information, approximation) or a relatively large number of possible vectors $\xi^l$ that result from combinations of the values taken on by independent random variables. The techniques discussed further on apply to both classes of problems, but the tendency is to think of software development that would be appropriate for problems with relatively small $L$, say from 5 to 1,000. Not just because this class of problems appears more manageable, but also because when $L$ is actually very large, although finite, the overall solution strategy would still rely on the solution of approximate problems with relatively small $L$.


1. RECOURSE MODELS AS LARGE SCALE LINEAR PROGRAMS

Substituting in (0.1) the expressions for $\mathcal{Q}$ and $Q$, we see that we can obtain the solution by solving the linear program:

(1.1) find $x \in R^{n_1}_+$ and, for $l = 1,\dots,L$, $y^l \in R^{n_2}_+$ such that

$Ax = b,$
$T^l x + W y^l = h^l, \quad l = 1,\dots,L,$

and $z = cx + \sum_{l=1}^{L} p_l q^l y^l$ is minimized.

To each recourse decision to be chosen if $\xi(\cdot)$ takes on the value $\xi^l = (q^l, h^l, T^l)$ corresponds the vector of variables $y^l$. This is a linear program with $m_1 + m_2 L$ constraints and $n_1 + n_2 L$ variables.
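On a small invented instance, the block structure of (1.1) can be assembled row by row and handed to a generic LP solver directly; a minimal sketch, with scipy assumed as the solver and all data made up:

```python
import numpy as np
from scipy.linalg import block_diag
from scipy.optimize import linprog

def extensive_form(c, A, b, q, W, scenarios):
    """Assemble and solve (1.1); scenarios is a list of (p_l, h_l, T_l)."""
    L = len(scenarios)
    (m1, n1), (m2, n2) = A.shape, W.shape
    cost = np.concatenate([c] + [p * q for p, _, _ in scenarios])
    first = np.hstack([A, np.zeros((m1, n2 * L))])            # A x = b row block
    body = np.hstack([np.vstack([T for _, _, T in scenarios]),
                      block_diag(*[W] * L)])                  # T^l x + W y^l = h^l
    A_eq = np.vstack([first, body])
    b_eq = np.concatenate([b] + [h for _, h, _ in scenarios])
    assert A_eq.shape == (m1 + m2 * L, n1 + n2 * L)           # the sizes quoted for (1.1)
    return linprog(cost, A_eq=A_eq, b_eq=b_eq,
                   bounds=[(0, None)] * (n1 + n2 * L), method="highs")

# Tiny instance: n1 = 1, n2 = 2, L = 2, simple-recourse-like W.
c = np.array([1.0]); A = np.array([[1.0]]); b = np.array([1.0])
q = np.array([2.0, 1.0]); W = np.array([[1.0, -1.0]])
scen = [(0.5, np.array([0.5]), np.array([[1.0]])),
        (0.5, np.array([2.0]), np.array([[1.0]]))]
res = extensive_form(c, A, b, q, W, scen)
print(res.fun)
```

For large $L$ this dense assembly is exactly what becomes prohibitive, which motivates the structure-exploiting methods below.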

The possibility of solving this problem using standard linear programming software depends very much on $L$, but even if it were possible to do so, in order to avoid making the solving of (1.1) prohibitively expensive (in terms of time and required computer memory) it is necessary to exploit the properties of this highly structured large scale linear program. The structure of the tableau of detached coefficients takes on the form:

1.2 FIGURE. Structure of discrete stochastic program.

We have here a so-called dual block angular structure, with the important additional feature that all the matrices along the block diagonal, except for $A$, are the same. It is this feature that will lead us to the algorithms that are analysed in Section 3 and which up to now have provided us with the best computational results. It is also this feature which led Dantzig and Madansky, 1961, to suggest a solution procedure for (1.1) by way of the dual. Indeed, the following problem is a dual of (1.1):


(1.3) find $\sigma \in R^{m_1}$ and, for $l = 1,\dots,L$, $\pi^l \in R^{m_2}$ such that

$\sigma A + \sum_{l=1}^{L} p_l \pi^l T^l \le c,$
$\pi^l W \le q^l, \quad l = 1,\dots,L,$

and $w = \sigma b + \sum_{l=1}^{L} p_l \pi^l h^l$ is maximized.

Problem (1.3) is not quite the usual (formal) dual of (1.1). To obtain the classical linear program dual, set $\tilde\pi^l = p_l \pi^l$ and substitute in (1.3). This problem has block angular structure, the block diagonal consisting again of identical matrices $W'$. The tableau with detached coefficients takes on the form:

1.4 FIGURE. Structure of dual problem.


Transposition is denoted by $'$, e.g. $W'$ is the transpose of $W$. Observe that we now have fewer (unconstrained) variables but a larger number of constraints, assuming that $n_2 \ge m_2$, as is usual when the recourse problem (0.3) is given its canonical linear programming formulation. In Section 2 we briefly review the methods that rely on the structure of this dual problem for solving recourse models.
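The duality between (1.1) and (1.3) can be checked numerically on a toy instance. The sketch below (invented data, $L = 2$, $n_1 = m_1 = m_2 = 1$, $n_2 = 2$, scipy assumed) solves the extensive form and the dual (1.3) side by side and compares their optimal values:

```python
import numpy as np
from scipy.optimize import linprog

# Primal: extensive form (1.1), variables (x, y^1_+, y^1_-, y^2_+, y^2_-) >= 0.
cost = np.array([1.0, 1.0, 0.5, 1.0, 0.5])     # (c, p_1 q, p_2 q), q = (2, 1), p_l = 1/2
A_eq = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],    # A x = b
                 [1.0, 1.0, -1.0, 0.0, 0.0],   # T x + W y^1 = h^1
                 [1.0, 0.0, 0.0, 1.0, -1.0]])  # T x + W y^2 = h^2
b_eq = np.array([1.0, 0.5, 2.0])
primal = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 5, method="highs")

# Dual (1.3): free variables (sigma, pi^1, pi^2); maximize sigma b + sum_l p_l pi^l h^l
# subject to sigma A + sum_l p_l pi^l T <= c and pi^l W <= q^l.
obj = -np.array([1.0, 0.25, 1.0])              # -(b, p_1 h^1, p_2 h^2): linprog minimizes
A_ub = np.array([[1.0, 0.5, 0.5],              # sigma + p_1 pi^1 + p_2 pi^2 <= c
                 [0.0, 1.0, 0.0],              # pi^1 W <= q^1:  pi^1 <= 2
                 [0.0, -1.0, 0.0],             #                -pi^1 <= 1
                 [0.0, 0.0, 1.0],              # pi^2 W <= q^2
                 [0.0, 0.0, -1.0]])
b_ub = np.array([1.0, 2.0, 1.0, 2.0, 1.0])
dual = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 3, method="highs")

print(primal.fun, -dual.fun)                   # equal optimal values by LP duality
```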

At least when the technology matrix $T$ is nonstochastic, i.e. when $T^l = T$ for all $l$, a substitution of variables, mentioned in Wets, 1966, leads to a linear programming structure that has received a lot of attention in the literature devoted to large scale dynamical systems. Using the constraints of (1.1), it follows that for all $l = 1,\dots,L-1$,

$Tx = h^l - W y^l$

and substituting in the $(l+1)$-th system, we obtain

$-W y^l + W y^{l+1} = h^{l+1} - h^l.$

Problem (1.1) is thus equivalent to

(1.5) find $x \in R^{n_1}_+$ and, for $l = 1,\dots,L$, $y^l \in R^{n_2}_+$ such that

$Ax = b,$
$Tx + W y^1 = h^1,$
$-W y^l + W y^{l+1} = h^{l+1} - h^l, \quad l = 1,\dots,L-1,$

and $z = cx + \sum_{l=1}^{L} p_l q^l y^l$ is minimized.

With $h^0 = 0$ and, for $l = 1,\dots,L$, $\delta^l = h^l - h^{l-1}$, the tableau of detached coefficients exhibits a staircase structure:


1.5 FIGURE. Equivalent staircase structure.

We bring this to the fore in order to stress at the same time the close relationship and the basic difference between the problem at hand and those encountered in the context of dynamical systems, i.e. discrete versions of continuous linear programs or linear control problems. Superficially, the problems are structurally similar, and indeed the matrix of a linear dynamical system may very well have precisely the structure of the matrix that appears in (1.5). Hence, one may conclude that the results and the computational work for staircase dynamical systems, cf. in particular Perold and Dantzig, 1979, Fourer, 1984, and Saunders, 1983, are in some way transferable to the stochastic programming case. Clearly some of the ideas and artifices that have proved their usefulness in the setting of linear (discrete time) dynamical systems should be explored, adapted and tried in the stochastic programming context. But one should at all times remain aware of the fact that dynamical systems have coefficients (data) that are 1-parameter dependent (time), whereas we can view the coefficients of stochastic problems as being multi-parameter dependent. In some sense, the gap between (1.4) and staircase structured linear programs that arise from dynamical systems is the same as that between ordinary differential equations and partial differential equations. We are not dealing here with a phenomenon that goes forward (in time) but one which can spread all over $R^N$ (which is only partially ordered)! Thus, it is not so surprising that from a computational viewpoint almost no effort has been made to exploit the structure (1.5) to solve stochastic programs with recourse. However, the potential is there and should not remain unexplored.


2. METHODS THAT EXPLOIT THE DUAL STRUCTURE

Dantzig and Madansky, 1961, pointed out that the dual problem (1.3) with matrix structure (1.4) is ripe for the application of the decomposition principle. It was also the properties of (1.4) that led Strazicky, 1980, to suggest and implement a basis factorization scheme, further analysed and modified by Kall, 1979, Wets, 1983, and Birge in Chapter 12. We give a brief description of both methods and study the connections between these two procedures. We begin with the second one, giving a modified compact version of the original proposal.

We assume that $W$ is of full row rank; if not, the recourse problem (0.3) defining $Q$ would be infeasible for some of the values of $h^l$ and $T^l$, unless all belong to the appropriate subspace of $R^{m_2}$, in which case a row transformation would allow us to delete the redundant constraints. We also assume that $A$ is of full row rank ($m_1$ possibly 0 when there are no constraints of that type). Thus with the columns of $A'$ and $W'$ linearly independent (recall that the variables $\sigma$ and $\pi$ are unrestricted), and after introducing the slack variables ($s^0 \in R^{n_1}_+$ and $s^l \in R^{n_2}_+$ for $l = 1,\dots,L$), we see that each basic feasible solution will include at least $n_2$ variables of each subsystem

(2.1) $\pi^l W + s^l = q^l, \quad s^l \ge 0, \quad l = 1,\dots,L,$

namely the (unrestricted) $m_2$ variables $\pi^l$ and a choice of at least $(n_2 - m_2)$ slack variables $(s^l_j,\ j = 1,\dots,n_2)$. Thus the portion of the basic columns that appear in the $l$-th subsystem can be subdivided into two parts

$[B^{l\prime}, I^{l\prime}_2] = [(W', I^{l\prime}_1), I^{l\prime}_2]$

where $(W', I^{l\prime}_1)$ is an $(n_2 \times n_2)$ invertible matrix and the extra columns, if any, are relegated to $I^{l\prime}_2$. Thus, schematically and up to a rearrangement of columns, a feasible basis $B$ has the following structure, in detached coefficient form:

2.2 FIGURE. Basis structure of dual.

The matrix $D'$ corresponds to the columns of $(A', I'_1)$ that belong to this basis, and for $l = 1,\dots,L$, $C^l$ is the $n_1 \times m_2$ matrix:

$C^{l\prime} = [p_l T^{l\prime}, 0]$

(recall that $T^{l\prime}$ is of dimension $n_1 \times m_2$). Each $B^{l\prime}$, after possible rearrangement of rows and columns, is of the following type:

$B^{l\prime} = [W', I_{(l)}]$

2.3 FIGURE. Structure of $B^{l\prime}$.

where $W'_{(l)}$ is an $m_2 \times m_2$ invertible submatrix of $W'$, and $W'_{(\bar l)}$ are the remaining rows of $W'$ that correspond to the rows of the identity that have been included in $B^{l\prime}$ (through $I_{(l)}$). The simplex multipliers associated with this basis $B$, of dimension $n_1 + n_2 L$, are denoted by $(x;\ y^l,\ l = 1,\dots,L)$ and are given by the relations (2.4), where $[\gamma^l, \rho^l]$ is the appropriate rearrangement of the subvector of coefficients of the objective of (1.4) that corresponds to the columns of $B^l$, with $e$ being the subvector of $[b', 0]$ whose components correspond to the columns of $D'$. This (dual feasible) basis is optimal if the vectors $(x;\ y^l,\ l = 1,\dots,L)$ defined through (2.4) are primal feasible, i.e. satisfy the constraints of (1.1). To obtain $x$ and $y$ we see that (2.4) yields

Substituting for $x$ this becomes, for $l = 1,\dots,L$,

(2.5) $y^l = B^{l\prime -1}(\gamma^l - C^{l\prime} x)$

where $\gamma^l$ is the subvector of $[p_l h^l, 0]$ that corresponds to the columns in $B^{l\prime}$. We have used the fact that $B$ is block diagonal with invertible matrices $(B^{l\prime},\ l = 1,\dots,L)$ on the diagonal. Going one step further and using the properties of $h^l$ and $C$, we get the system for $x$:

(2.6) $\left(D - \sum_{l=1}^{L} C^l B^{l\prime -1} C^{l\prime}\right) x = e - \sum_{l=1}^{L} C^l B^{l\prime -1} \gamma^l$

The system (2.6) involves $n_1$ equations in $n_1$ variables, and the $L$ systems (2.5) are of order $n_2$. Thus instead of calculating the inverse of $B$ (a square matrix of order $n_1 + n_2 L$), all that is needed is the inverse of $L$ matrices of order $n_2$ and of a square matrix of order $n_1$. Similarly, to calculate the values to assign to the basic variables associated with this basis, the same inverses are all that is really required, as can easily be verified. In order to implement this method one would need to work out the updating procedures to show that the simplex method can be performed in this compact form, i.e. that the updating procedures involve only the restricted inverses. But there are other features of which one should take advantage before one proceeds with implementation.
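The saving behind (2.5) and (2.6) is ordinary block elimination: a bordered block diagonal system of order $n_1 + n_2 L$ is solved using only the $L$ small inverses and one system of order $n_1$. A numerical sketch with invented dense matrices (numpy assumed; the real method would of course keep factorizations, not inverses):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, L = 3, 4, 5
D = rng.normal(size=(n1, n1)) + 30 * np.eye(n1)                      # master block
Bs = [rng.normal(size=(n2, n2)) + 5 * np.eye(n2) for _ in range(L)]  # diagonal blocks B^l
Cs = [rng.normal(size=(n1, n2)) for _ in range(L)]                   # coupling blocks C^l
e = rng.normal(size=n1)
gs = [rng.normal(size=n2) for _ in range(L)]

# (2.6): form the n1 x n1 system (D - sum C^l B^{-1} C^l') x = e - sum C^l B^{-1} g^l
S, rhs = D.copy(), e.copy()
for B, C, g in zip(Bs, Cs, gs):
    S -= C @ np.linalg.solve(B, C.T)
    rhs -= C @ np.linalg.solve(B, g)
x = np.linalg.solve(S, rhs)
# (2.5): recover each y^l from its own order-n2 system
ys = [np.linalg.solve(B, g - C.T @ x) for B, C, g in zip(Bs, Cs, gs)]

# Compare with assembling and solving the full bordered system of order n1 + n2*L
full = np.zeros((n1 + n2 * L, n1 + n2 * L))
full[:n1, :n1] = D
for l in range(L):
    r = n1 + l * n2
    full[:n1, r:r + n2] = Cs[l]
    full[r:r + n2, :n1] = Cs[l].T
    full[r:r + n2, r:r + n2] = Bs[l]
direct = np.linalg.solve(full, np.concatenate([e] + gs))
print(np.allclose(direct, np.concatenate([x] + ys)))
```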


Recall from 2.3 FIGURE that

(2.7) $B^{l\prime} = [W', I_{(l)}]$

where $W_{(l)}$ is an invertible matrix of size $m_2 \times m_2$. Then the inverse $B^{l\prime -1}$ can be written explicitly in terms of $W_{(l)}^{-1}$ (2.8). Thus it really suffices to know the inverse of $W_{(l)}$, and rather than keeping and updating the $n_2 \times n_2$ matrix $B^{l\prime -1}$, all the information that is really needed can be handled by updating an $m_2 \times m_2$ matrix, relying on sparse updates whenever possible. This should result in substantial savings. The algorithm could even be more efficient by taking advantage of the repetition of similar (sub)bases $W_{(l)}$. We shall not pursue this any further at this time, because all of these computational shortcuts are best handled in the framework of the methods based on the decomposition principle that we describe next.

The decomposition principle, as used to solve the linear program (1.3), generates the master problem from the equations of (1.3) by generating extreme points or directions of recession (directions of unboundedness) of the polyhedral regions determined by the $L$ subproblems,

$\{\pi \mid \pi W \le q^l\}.$

In order to simplify the comparison with the factorization method described earlier, let us assume that

$\{\pi \mid \pi W \le 0\} = \{0\},$

i.e. there are no directions of recession other than 0, which means that for all $l$ the polyhedra $\{\pi \mid \pi W \le q^l\}$ are bounded; feasibility of (1.3)

implying that they are nonempty. For $k = 1,\dots,\nu$, let

$\eta^k = (\eta^{1k},\dots,\eta^{lk},\dots,\eta^{Lk})$

be the extreme point generated by the $k$-th iteration of the decomposition method, where $x^k = (x^k_j,\ j = 1,\dots,n_1)$ are the multipliers associated with the first $n_1$ linear inequalities of the master problem:

(2.10) find $\sigma \in R^{m_1}$, $\lambda_k \in R_+$, $k = 1,\dots,\nu$ such that

$\sigma A + \sum_{k=1}^{\nu} \lambda_k \left( \sum_{l=1}^{L} p_l \eta^{lk} T^l \right) \le c,$
$\sum_{k=1}^{\nu} \lambda_k = 1,$

and $w = \sigma b + \sum_{k=1}^{\nu} \lambda_k \left( \sum_{l=1}^{L} p_l \eta^{lk} h^l \right)$ is maximized.

The basis associated with the master problem is $(n_1 \times n_1)$, whereas the basis for each subproblem is exactly of order $n_2$. In the process of solving the subproblems, the iterations of the simplex method bring us from one basis of type (2.7) to another one of this type (all transposed, naturally), with inverses given by (2.8). Here again, the implementation should take advantage of this structural property, and updates should be in terms of the $m_2 \times m_2$ submatrices $W_{(l)}$. But we should also take advantage of the fact that all these subproblems are identical except for the right-hand sides and/or the cost coefficients, and this, in turn, would lead us to the use of the bunching and sifting procedures of Section 4.

It is remarkable and important to observe that the basis factorization method with the modifications alluded to earlier and the decomposition method applied to the dual, as proposed by Dantzig and Madansky, 1961, require the same computational effort; J. Birge gives a detailed analysis in Chapter 12, and independently B. Strazicky arrived at similar results. In view of all of this, it is appropriate to regard the method relying on basis factorization as a very close parent of the decomposition method as applied to the dual problem (1.3), but it does not give us the organizational flexibility provided by this latter algorithm. On conceptual grounds, as well as in terms of computational efficiency, it is the decomposition based algorithm that should be retained for potential software implementation. In fact, this is essentially what has occurred, but it is a "primal" version of this decomposition algorithm which, in this class of (essentially) equivalent methods, appears best suited for solving linear stochastic programs with recourse. It is a primal method (which means that we always have a feasible $x \in R^{n_1}$ at our disposal) and it allows us to take advantage in the most straightforward manner of some of the properties of recourse models to speed up computations.

3. METHODS THAT ARE PRIMAL ORIENTED

The great difference between the methods that we consider next and those of Section 2 is that finding $x$ that solves the stochastic program (0.1) is now viewed as our major, if not exclusive, concern. Obtaining the corresponding recourse decisions $(y^l,\ l = 1,\dots,L)$ or associated dual multipliers $(\pi^l,\ l = 1,\dots,L)$ is of no real interest, and we only perform some of these calculations because the search for an optimal solution $x$ requires knowing some of these quantities, at least in an amalgamated form. On the other hand, in the methods of Section 2 all the variables $(\sigma, \pi^1,\dots,\pi^L)$ are treated as equals; to have the optimality criterion fail for some variable in subsystem $l$ (even when $p_l$ is relatively small) is handled with the same concern as having the optimality criteria fail for some of the $(\sigma_i,\ i = 1,\dots,m_1)$ variables.


Another important property of these methods is their natural extension to stochastic programs with arbitrary distribution functions. In fact, they are particularly well-suited for use in a sequential scheme for solving stochastic programs by successive refinement of the discretization of the probability measure, each step involving the solution of a problem of type (0.1), cf. Chapter 2.

We stress these conceptual differences because they may lead to different, more flexible, solution strategies; although we are very much aware of the fact that if at each stage of the algorithm all operations are carried out (to optimality), it is possible to find their exact counterpart in the algorithms described in Section 2. For the relationship between the L-shaped algorithm described here and the decomposition method applied to the dual, see Van Slyke and Wets, 1969; between the above and the basis factorization method, see Chapter 12; consult also Ho, 1983, for the relationship between various schemes for piecewise linear functions which are widely utilized for solving certain classes of stochastic programming problems, and Chapter 4.

The L-shaped algorithm, which takes its name from the matrix layout of the problem to be solved, was proposed by Van Slyke and Wets, 1969. It can be viewed as a cutting hyperplane algorithm (outer linearization), but to stay in the framework of our earlier development it is best to interpret it here as a partial decomposition method. We begin with a description of a very crude version of the algorithm; only later do we elaborate the modifications that are vital to make the method really efficient. To describe the method it is useful to consider the problem in its original form (0.1), which we repeat here for easy reference:

(3.1) find $x \in R^{n_1}_+$ such that $Ax = b$ and $z = cx + \mathcal{Q}(x)$ is minimized.

We assume that the problem is feasible and bounded; implementation of the algorithm would require an appropriate coding of the initialization step, relying on the criteria for feasibility and boundedness such as found in Wets, 1972. The method consists of three steps that can be interpreted as follows. In Step 1, we solve an approximation of (3.1) obtained by replacing $\mathcal{Q}$ by an outer linearization; this brings us to solving a linear program whose constraints are $Ax = b$, $x \ge 0$ and the additional constraints (3.2) and (3.3) that come from:

(i) induced feasibility cuts, generated by the fact that the choice of $x$ must be restricted to those for which $\mathcal{Q}(x)$ is finite, or equivalently for which $Q(x,\xi^l) < +\infty$ for all $l = 1,\dots,L$, or still for which there exists $y^l \in R^{n_2}_+$ such that $W y^l = h^l - T^l x$ for all $l = 1,\dots,L$;

(ii) linear approximations to $\mathcal{Q}$ on its domain of finiteness.

These constraints are generated systematically through Steps 2 and 3 of the algorithm, when a proposed solution $x^\nu$ of the linear program in Step 1 fails to satisfy the induced constraints, i.e. $\mathcal{Q}(x^\nu) = +\infty$ (Step 2), or when the approximating problem does not yet match the function $\mathcal{Q}$ at $x^\nu$ (Step 3). The row vector generated in Step 3 is actually a subgradient of $\mathcal{Q}$ at $x^\nu$. The convergence of the algorithm, under the appropriate nondegeneracy assumptions, to an optimal solution of (3.1) is based on the fact that only a finite number of constraints of type (3.2) and (3.3) can be generated by Steps 2 and 3, since each one corresponds to some basis of $W$ and a pair $(h^l, T^l)$, or to a basis of $W$ and one of a finite number of weighted averages of the $(\xi^l,\ l = 1,\dots,L)$.


Step 0. Set $\nu = r = s = 0$.

Step 1. Set $\nu = \nu + 1$ and solve the linear program:

find $x \in R^{n_1}_+$, $\vartheta \in R$ such that

$Ax = b,$
(3.2) $D_k x \ge d_k, \quad k = 1,\dots,r,$
(3.3) $E_k x + \vartheta \ge e_k, \quad k = 1,\dots,s,$

and $cx + \vartheta = z$ is minimized. Let $(x^\nu, \vartheta^\nu)$ be an optimal solution. If there are no constraints of type (3.3), the variable $\vartheta$ is ignored in the computation of the optimal $x^\nu$, and the value of $\vartheta^\nu$ is then fixed at $-\infty$.

Step 2. For $l = 1,\dots,L$ solve the linear programs

(3.4) find $y \in R^{n_2}_+$, $v^+ \in R^{m_2}_+$, $v^- \in R^{m_2}_+$ such that

$Wy + Iv^+ - Iv^- = h^l - T^l x^\nu$

and $ev^+ + ev^- = v^l$ is minimized

(here $e$ denotes the row vector $(1,1,\dots,1)$), until for some $l$ the optimal value $v^l > 0$. Let $\sigma^\nu$ be the associated simplex multipliers and define

$D_{r+1} = \sigma^\nu T^l, \quad d_{r+1} = \sigma^\nu h^l$

to generate an induced feasibility cut. Return to Step 1 adding this new constraint of type (3.2), and set $r = r + 1$. If for all $l$ the optimal value of the linear program (3.4) is $v^l = 0$, go to Step 3.


Step 3. For every $l = 1,\dots,L$, solve the linear program

(3.5) find $y \in R^{n_2}_+$ such that $Wy = h^l - T^l x^\nu$ and $q^l y = w^l$ is minimized.

Let $\pi^{l\nu}$ be the multipliers associated with the optimal solution of problem $l$. Set $s = s + 1$ and define

$E_s = \sum_{l=1}^{L} p_l \pi^{l\nu} T^l,$
$w^\nu = \sum_{l=1}^{L} p_l \pi^{l\nu} (h^l - T^l x^\nu) = e_s - E_s x^\nu.$

If $\vartheta^\nu \ge w^\nu$, we stop; $x^\nu$ is an optimal solution. Otherwise, we return to Step 1 with a new constraint of type (3.3).
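The steps above can be sketched as follows for a problem with complete recourse, so that Step 2 never generates a feasibility cut. The newsvendor-style instance, the use of scipy's `linprog` (with its `eqlin.marginals` as the simplex multipliers $\pi^{l\nu}$), and the large negative bound standing in for $\vartheta = -\infty$ are all illustrative assumptions, not part of the paper.

```python
import numpy as np
from scipy.optimize import linprog

def l_shaped(c, q, W, scenarios, tol=1e-6, max_iter=25):
    """Crude L-shaped iteration (Steps 0-3); master constraints Ax = b omitted
    for brevity (this instance has none). scenarios: list of (p_l, h^l, T^l)."""
    n1, n2 = len(c), len(q)
    cuts = []                                    # optimality cuts (3.3): E_k x + theta >= e_k
    bounds = [(0, None)] * n1 + [(-1e9, None)]   # big-M lower bound stands in for theta = -inf
    for _ in range(max_iter):
        # Step 1: minimize c x + theta subject to the cuts gathered so far.
        A_ub = np.array([np.append(-E, -1.0) for E, e in cuts]) if cuts else None
        b_ub = np.array([-e for E, e in cuts]) if cuts else None
        res = linprog(np.append(c, 1.0), A_ub=A_ub, b_ub=b_ub,
                      bounds=bounds, method="highs")
        x, theta = res.x[:n1], res.x[n1]
        # Step 3: solve the L recourse programs (3.5), average the duals pi^{l,nu}.
        E, e, w = np.zeros(n1), 0.0, 0.0
        for p, h, T in scenarios:
            sub = linprog(q, A_eq=W, b_eq=h - T @ x,
                          bounds=[(0, None)] * n2, method="highs")
            pi = sub.eqlin.marginals             # simplex multipliers of subproblem l
            E += p * (pi @ T)
            e += p * (pi @ h)
            w += p * sub.fun                     # w = sum_l p_l pi (h^l - T^l x)
        if theta >= w - tol:                     # optimality test theta >= w: stop
            return x, c @ x + w
        cuts.append((E, e))                      # add cut of type (3.3), back to Step 1

# Newsvendor-style instance: shortage cost 3, surplus cost 0.5, demand in {1, 3}.
c = np.array([1.0])
q = np.array([3.0, 0.5]); W = np.array([[1.0, -1.0]])
scenarios = [(0.5, np.array([1.0]), np.array([[1.0]])),
             (0.5, np.array([3.0]), np.array([[1.0]]))]
x_opt, val = l_shaped(c, q, W, scenarios)
print(x_opt, val)
```

On this instance the method stops after a handful of cuts; each iteration costs one small master LP plus $L$ recourse LPs, which is precisely the work that Sections 4 and the remarks below try to reduce.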

An efficient implementation of this algorithm, whose steps can be identified with those of the decomposition method applied to the dual problem (see Section 2), depends very much on the acceleration of Steps 2 and 3. This is made possible by relying on the specific properties of the problem at hand (3.1), and it is in order to exploit these properties that we have separated Steps 2 and 3, which are the counterparts of Phase I and Phase II of the simplex method as applied to the recourse problem (0.3). In practice one certainly does not start from scratch when solving the $L$ linear programs in Step 3; Section 4 is devoted to the analysis of Step 3, i.e. how to take advantage of the fact that the $L$ linear programs that need to be solved have the same technology matrix $W$, as well as of the fact that the $\xi^l = (q^l, h^l, T^l)$ are the realizations of a random vector. Here we concern ourselves with the improvements that could be made to speed up Step 2, and we see that in many instances dramatic gains can be realized.

First of all, Step 2 can be skipped altogether if the stochastic program is with complete recourse, i.e. when

(3.6) $\mathrm{pos}\, W = \{t \mid t = Wy,\ y \ge 0\} = R^{m_2},$

a quite common occurrence in practice. This means naturally that no induced feasibility constraints (3.2) need to be generated. This will also be the case if we have a problem with relatively complete recourse, i.e. when for every $x$ satisfying $Ax = b$, $x \ge 0$, and for every $l = 1,\dots,L$, the linear system

$Wy = h^l - T^l x, \quad y \ge 0,$

is feasible. This weaker condition is much more difficult to recognize, and to verify it would precisely require the procedure given in Step 2.
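Complete recourse (3.6) can be tested mechanically: since $\mathrm{pos}\, W$ is a convex cone, it equals $R^{m_2}$ exactly when it contains both $+e_i$ and $-e_i$ for every unit vector $e_i$, which is $2 m_2$ small feasibility LPs. A sketch (scipy assumed, not from the paper):

```python
import numpy as np
from scipy.optimize import linprog

def is_complete_recourse(W):
    """pos W = R^{m2} iff the cone pos W contains +e_i and -e_i for every i."""
    m2, n2 = W.shape
    for i in range(m2):
        for sign in (1.0, -1.0):
            t = np.zeros(m2); t[i] = sign
            res = linprog(np.zeros(n2), A_eq=W, b_eq=t,
                          bounds=[(0, None)] * n2, method="highs")
            if res.status != 0:            # W y = t, y >= 0 infeasible
                return False
    return True

print(is_complete_recourse(np.hstack([np.eye(2), -np.eye(2)])))  # simple recourse: complete
print(is_complete_recourse(np.eye(2)))                           # pos W = positive orthant only
```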

Even in the general case, it may be possible to substitute for Step 2:

Step 2'. Solve the linear program

(3.7) find $y \in R^{n_2}_+$, $v^+ \in R^{m_2}_+$, $v^- \in R^{m_2}_+$ such that

$Wy + Iv^+ - Iv^- = h^\nu - T^\nu x^\nu$

and $ev^+ + ev^- = v^\nu$ is minimized

for some $(h^\nu, T^\nu)$. Let $\sigma^\nu$ be the associated simplex multipliers and, if the optimal value $v^\nu > 0$, define

$D_{r+1} = \sigma^\nu T^\nu, \quad d_{r+1} = \sigma^\nu h^\nu,$

to generate an induced feasibility cut of type (3.2). Return to Step 1 with $r = r + 1$. If the optimal value $v^\nu = 0$, go to Step 3.

This means that we have replaced solving $L$ linear programs by just solving one of them. In some other cases it may be necessary to solve a few problems of type (3.7), but the effort would in no way be commensurate with that of solving all $L$ linear programs of Step 2. In Section 5 of Wets, 1974, one can find a detailed analysis of the cases when such a substitution is possible, as well as some procedures for the choice or construction of the quantities $h^\nu$ and $T^\nu$ that appear in the formulation of (3.7). Here we simply suggest the reasons why this simplification is possible and pay particular attention to the case when the matrix $T$ is nonstochastic.

Let $\preceq$ be the partial ordering induced by the closed convex polyhedral cone $\mathrm{pos}\, W$, see (3.6), i.e. $a^1 \preceq a^2$ if $a^2 - a^1 \in \mathrm{pos}\, W$. Then for given $x^\nu \in R^{n_1}$ and for every $l = 1,\dots,L$, the linear system

(3.8) $Wy = h^l - T^l x^\nu, \quad y \ge 0$

is feasible if there exists $a^\nu \in R^{m_2}$ such that for all $l = 1,\dots,L$,

(3.9) $a^\nu \preceq h^l - T^l x^\nu,$

and the linear system

(3.10) $Wy = a^\nu, \quad y \ge 0$

is feasible

--

or equivalently a" E pos W . There always exists a v t h a t satisfies (3.9), recall

L

i s finite. If in addition, a" can be chosen s o t h a t (3.11) a"

=

A"-Px

for

v

E 11, ..., L j , t h e n (3.8) i s feasible for all 1 if and only if (3.10) i s feasi- ble with a" a s defined by (3.11). Although in g e n e r a l s u c h an a" does n o t eldst, in practice, a t m o s t a fewr e x t r e m e points of t h e s e t

(26)

n e e d t o be considered in o r d e r to verify t h e feasibility of all t h e l i n e a r s y s t e m s (3.8). Computing lower bounds of Sv with r e s p e c t t o

<

m a y r e q u i r e m o r e work than we bargained for, b u t it really suffices, cf.

Theorem 4.17 of Wets, 1974, to c o n s t r u c t lower bounds of SV with r e s p e c t t o a n y closed cone contained in pos

W ,

and t h i s could be, a n d usually is t a k e n to be, a n o r t h a n t . In such a c a s e obtaining a" is effortless.

Let us consider the case when T is nonstochastic, and assume that pos W contains the positive orthant; if it contains another orthant, simply multiply some rows by −1, making the corresponding adjustments in the vectors (h^l, l = 1, ..., L). This certainly would be the case if slack variables are part of the y-vector, for example.

For i = 1, ..., m2, let

   α_i = min_{l = 1, ..., L} h_i^l.

If α = h^{l'} for some l' ∈ {1, ..., L}, which would always be the case if the (h_i(·), i = 1, ..., m2) are independent random variables, then it follows from the above that for l = 1, ..., L, the linear systems

   Wy = h^l − Tx^ν,  y ≥ 0

are feasible if and only if the linear system

   Wy = α − Tx^ν,  y ≥ 0

is feasible. Note that in this case the lower bound α − Tx^ν is a simple function of x^ν.
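The componentwise-minimum shortcut is easy to illustrate numerically. In the sketch below the scenario array h, the matrix T and the point x are made-up data, and pos W is assumed to contain the positive orthant, so a single sign test on α − Tx certifies feasibility of all L systems at once.

```python
import numpy as np

# Illustrative scenario data (L = 4 realizations of h, nonstochastic T).
h = np.array([[4.0, 2.0, 5.0],
              [3.0, 6.0, 2.0],
              [5.0, 3.0, 4.0],
              [3.0, 2.0, 6.0]])          # one row per scenario h^l
T = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
x = np.array([1.0, 1.0])

# Componentwise lower bound: alpha_i = min_l h_i^l.
alpha = h.min(axis=0)

# When pos W contains the positive orthant, alpha - T x >= 0 certifies
# feasibility of W y = h^l - T x, y >= 0 for every scenario at once:
# h^l - T x >= alpha - T x >= 0, and the orthant lies inside pos W.
single_test = bool(np.all(alpha - T @ x >= 0))
all_tests = all(np.all(h[l] - T @ x >= 0) for l in range(len(h)))
print(single_test, all_tests)
```

The single test is only sufficient, not necessary, but when it succeeds all L feasibility checks are dispensed with in O(L m2) comparisons and one matrix-vector product.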

In our description of the L-shaped algorithm the connections to large scale linear programming may have been somewhat lost; if anything, it is how to deal with the "nonlinearity" of Q which has played center stage. To regain a more linear programming perspective, it may be useful to view the algorithm in the following light. Let us return to the dual block angular structure (1.2), from which it is obvious that if we can adjust the simplex method so that it operates separately on the x-variables and the (y_l-variables, l = 1, ..., L), it will be possible to take advantage of the block diagonal structure of the problem with respect to the (y_l-variables, l = 1, ..., L). Given that some x^ν is known which satisfies the constraints x ≥ 0, Ax = b, then finding the optimal solution of (1.2), with the additional constraint x = x^ν, leads to solving a linear program whose tableau of detached coefficients has the structure:

3.12 FIGURE. Structure of the y-problem.

(28)

where for l = 1, ..., L, h^{l,ν} = h^l − T^l x^ν. Clearly, when confronted with such a problem we want to take advantage of its separability properties, and this is precisely what is done in Steps 2 and 3 of the L-shaped algorithm.
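The separability just described can be checked numerically: solving the block-diagonal problem of Figure 3.12 in one piece gives the same value as solving the L small problems independently. The data below (W, q, the weights and the right-hand sides h^{l,ν}) is illustrative, and scipy's linprog stands in for the simplex method.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data: the same W and q in every block, scenario
# right-hand sides h^{l,nu} = h^l - T x^nu already computed.
W = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
q = np.array([1.0, 4.0, 2.0])
p = np.array([0.5, 0.3, 0.2])                      # scenario weights
h_nu = [np.array([2.0, 1.0]), np.array([1.0, 4.0]), np.array([3.0, 2.0])]

# Separately: L small linear programs, one per block.
sep = sum(p_l * linprog(q, A_eq=W, b_eq=h_l, bounds=(0, None),
                        method="highs").fun
          for p_l, h_l in zip(p, h_nu))

# Jointly: one block-diagonal program as in Figure 3.12.
L = len(h_nu)
big_A = np.kron(np.eye(L), W)                      # W repeated on the diagonal
big_c = np.concatenate([p_l * q for p_l in p])     # probability-weighted costs
big_b = np.concatenate(h_nu)
joint = linprog(big_c, A_eq=big_A, b_eq=big_b, bounds=(0, None),
                method="highs").fun

print(sep, joint)   # identical up to solver tolerance
```

The joint problem has L·n2 variables and L·m2 constraints; exploiting the block structure replaces it by L problems of size m2 × n2, which is the computational point of Steps 2 and 3.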

The structure of (3.12), with the same matrix W on the block diagonal, suggests that of a distributed system. A continuous version would take the form:

(3.13)  find y: Ω → R^{n2} such that for all ω ∈ Ω,
        y(ω) ∈ argmin [q(ω)y | Wy = h^ν(ω), y ∈ R^{n2}_+].

Because of the linearity of the objective function, the trajectory ω ↦ y(ω) will be linear with respect to h^ν if the same basis of W remains optimal. The main task in solving (3.13) would be to decompose Ω into regions of linearity of y(·). Once this decomposition is known, the remainder is rather straightforward. Finding this decomposition is essentially the subject of Section 4, which concerns itself with the organization of the computational work so as to bring the effort involved to an acceptable level. Problem (3.13) again brings to the fore the connections between this work and that on dynamical systems (continuous linear programming). With not too much difficulty it should be possible to formulate a bang-bang principle for systems with distributed parameter space (here R^{m2}) that would correspond to our scheme for decomposing Ω.
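A minimal one-dimensional illustration of this decomposition, with made-up data: the optimal solution of a recourse subproblem is piecewise linear in the right-hand side, and grouping right-hand sides by the support of the optimal solution recovers the regions of linearity. The example assumes scipy is available.

```python
from scipy.optimize import linprog

# min y1 + y2  s.t.  y1 - y2 = t,  y >= 0.
# The optimum y(t) = (max(t, 0), max(-t, 0)) is piecewise linear in t,
# with a different optimal basis on each side of t = 0.
def solve(t):
    res = linprog([1.0, 1.0], A_eq=[[1.0, -1.0]], b_eq=[t],
                  bounds=(0, None), method="highs")
    return res.x

def region(y, tol=1e-9):
    # Identify the region of linearity by the support of the solution.
    return tuple(i for i, yi in enumerate(y) if yi > tol)

regions = {}
for t in (-2.0, -1.0, 1.0, 2.0):
    regions.setdefault(region(solve(t)), []).append(t)

print(regions)   # two regions of linearity, split at t = 0
```

Within each region the map t ↦ y(t) is linear (here exactly y = (t, 0) for t > 0 and y = (0, −t) for t < 0), so once the decomposition is known, evaluating y(·) over a whole region costs one basis factorization.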


To conclude our discussion of the L-shaped algorithm, let us record a further modification suggested by L. Nazareth. When the matrix T is nonstochastic, say T^l = T for all l, then the linear program in Step 1 may be reformulated as

(3.14)  find x ∈ R^{n1}_+, χ ∈ R^{m2}, θ ∈ R such that
        Ax = b,
        Tx − χ = 0,
        F_k χ ≥ f_k,      k = 1, ..., r,
        G_k χ + θ ≥ g_k,  k = 1, ..., s, and
        cx + θ = z is minimized.

The induced feasibility constraints are generated as earlier in Step 2 with

   F_{r+1} = σ^ν,  f_{r+1} = σ^ν h^l.

The optimality cuts (approximation cuts) are generated in Step 3 with

   G_{s+1} = Σ_{l=1}^{L} p_l π^{l,ν},  g_{s+1} = Σ_{l=1}^{L} p_l π^{l,ν} h^l.

The linear program that generates the σ^ν and π^{l,ν} as (optimal) simplex multipliers of Phases I and II, respectively, is given by

   find y ∈ R^{n2}_+ such that
   Wy = h^l − χ^ν, and q^l y = w^l is minimized.

Note that now the "nonlinearity" is handled in a space of dimension m2, which is liable to be much smaller than n1, and we should reap all the advantages that usually come from a reduction in the number of nonlinear variables.
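As an illustration of how the aggregated optimality cut would be assembled in the space of tenders, the sketch below recovers subproblem multipliers from scipy's linprog (its eqlin.marginals field, an assumption about the solver, not part of the text) and checks that the resulting cut is tight at the current tender. All data (W, q, probabilities, scenarios) is made up, and W is taken to be the identity so the subproblems are trivially feasible.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative recourse data: the subproblem  min q.y  s.t. y = h^l - chi,
# y >= 0  is feasible whenever h^l >= chi.
W = np.eye(2)
q = np.array([1.0, 2.0])
p = np.array([0.5, 0.5])
h = [np.array([2.0, 3.0]), np.array([4.0, 5.0])]
chi = np.array([1.0, 1.0])

pis, vals = [], []
for h_l in h:
    res = linprog(q, A_eq=W, b_eq=h_l - chi, bounds=(0, None),
                  method="highs")
    pis.append(res.eqlin.marginals)   # Phase-II multipliers pi^{l,nu}
    vals.append(res.fun)

# Aggregated optimality cut  G chi + theta >= g  in tender space:
G = sum(p_l * pi_l for p_l, pi_l in zip(p, pis))
g = sum(p_l * (pi_l @ h_l) for p_l, pi_l, h_l in zip(p, pis, h))

# The cut is tight at the current tender: g - G.chi equals the
# expected recourse cost  sum_l p_l q.y^l.
print(g - G @ chi, p @ np.array(vals))
```

The tightness check g − G·χ^ν = Σ_l p_l w^l is just LP strong duality applied scenario by scenario, and it is a useful sanity test in any implementation of the cut generation.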


All of these simplifications come from the fact that when T is nonstochastic we can interpret the search for an optimal solution as the search for an optimal χ*, "the certainty equivalent". It is easy to see that knowing χ* would allow us to solve the original problem by simply solving

(3.15)  find x ∈ R^{n1}_+ such that
        Ax = b,  Tx = χ*, and z = cx is minimized.

The sequence {χ^ν, ν = 1, ...} generated by the preceding algorithm can be viewed as a sequence of tenders (to be "bet" against the uncertainty represented by h). This then suggests other methods based on finding χ* by considering the best possible convex combination of the tenders generated so far; these are algorithms based on generalized linear programming, see Nazareth and Wets, 1984, and Chapter 4 of this Volume. However, this approach does not appear to be very promising for the general class of problems considered here, not even when T is nonstochastic.

Indeed, the algorithm would proceed as follows:

Step 0. Find a feasible x^0 ∈ R^{n1}_+ such that Ax^0 = b. Set χ^0 = Tx^0. Choose χ^1, ..., χ^ν, potential tenders, ν ≥ 0.

Step 1. Find (u^ν, π^ν, θ^ν), the (optimal) simplex multipliers associated with the solution of the linear program:

   minimize  cx + Σ_{l=0}^{ν} λ_l Ψ(χ^l)
   subject to
      Ax = b                          : u^ν
      Tx − Σ_{l=0}^{ν} λ_l χ^l = 0    : π^ν
      Σ_{l=0}^{ν} λ_l = 1             : θ^ν
      x ≥ 0,  λ_l ≥ 0,  l = 0, ..., ν,

where Ψ denotes the expected recourse cost expressed as a function of the tender χ.
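A toy instance of this master program, with two tenders and hypothetical values of Ψ, can be written down directly. Everything below (c, A, b, T, the tenders and the Ψ values) is invented for illustration, and scipy's linprog is assumed as the solver.

```python
import numpy as np
from scipy.optimize import linprog

# Tiny master over two tenders chi^0 = 0 and chi^1 = 2, with
# hypothetical recourse costs Psi(chi^0) = 4, Psi(chi^1) = 0
# (n1 = 2, m2 = 1).
c = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0]]); b = np.array([2.0])
T = np.array([[1.0, 0.0]])
chis = [np.array([0.0]), np.array([2.0])]
psis = [4.0, 0.0]

# Variables: (x_1, x_2, lambda_0, lambda_1).
n1, nten = len(c), len(chis)
cost = np.concatenate([c, psis])
A_eq = np.vstack([
    np.hstack([A, np.zeros((1, nten))]),                  # A x = b
    np.hstack([T, -np.column_stack(chis)]),               # T x - sum_l lambda_l chi^l = 0
    np.concatenate([np.zeros(n1), np.ones(nten)])[None],  # sum_l lambda_l = 1
])
b_eq = np.concatenate([b, np.zeros(T.shape[0]), [1.0]])

res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
x_opt, lam = res.x[:n1], res.x[n1:]
print(res.fun, x_opt, lam)
```

Here the master puts all its weight on the cheaper tender (λ_1 = 1, forcing Tx = χ^1); the equality multipliers returned by the solver play the role of (u^ν, π^ν, θ^ν) in Step 1.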

