WORKING PAPER

SCENARIOS AND POLICY AGGREGATION IN OPTIMIZATION UNDER UNCERTAINTY

R. T. Rockafellar
R. J-B Wets

December 1987
WP-87-119

International Institute for Applied Systems Analysis

SCENARIOS AND POLICY AGGREGATION IN OPTIMIZATION UNDER UNCERTAINTY

R. T. Rockafellar
R. J-B Wets

December 1987
WP-87-119

Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute or of its National Member Organizations.

INTERNATIONAL INSTITUTE FOR APPLIED SYSTEMS ANALYSIS
A-2361 Laxenburg, Austria

FOREWORD

A common approach in coping with multiperiod optimization problems under uncertainty, where the statistical information is not really strong enough to support a stochastic programming model, has been to set up and analyze a number of scenarios. The aim is then to identify trends and essential features on which a robust decision policy can be based.

This paper develops for the first time a rigorous algorithmic procedure for determining such a policy in response to any weighting of the scenarios. The scenarios are bundled at various levels to reflect the availability of information, and iterative adjustments are made to the decision policy to adapt to this structure and remove the dependence on hindsight.

Alexander B. Kurzhanski
Chairman, System and Decision Sciences Program

CONTENTS

1. Introduction
2. General Framework
3. Basic Assumptions and Properties
4. Optimality and Duality
5. Convergence in the Convex Case
6. Convergence in the Nonconvex Case
References

SCENARIOS AND POLICY AGGREGATION IN OPTIMIZATION UNDER UNCERTAINTY*

R. T. Rockafellar, University of Washington
Roger J-B Wets, University of California, Davis

1. Introduction

Most systems that need to be controlled or analyzed involve some level of uncertainty about the values to be assigned to various parameters or about the actual layout of some of the system's components. Not much is lost by simply assigning "reasonable" values to the unknown elements, as long as their role is relatively insignificant. But in other situations the model builder cannot do this without running the risk of invalidating all the implications that are supposed to be drawn from the analysis.

When a probabilistic description of the unknown elements is at hand, either because a substantial statistical base is available or because a probabilistic law can be derived from conceptual considerations (measurement errors, life and death processes, etc.), one is naturally led to consider stochastic models. When only partial information, or no information at all, is available, however, there is understandably a reluctance to rely on such models. In presuming that probability distributions exist, they seem inherently misdirected. Besides, the problems of stochastic optimization that they lead to can be notoriously hard to solve.

A common approach in practice is to rely on scenario analysis. The uncertainty about parameters or components of the system is modeled by a small number of versions of subproblems derived from an underlying optimization problem. These correspond to different "scenarios", a word that is used to suggest some kind of limited representation of information on the uncertain elements, or of how such information may evolve. The idea is that by studying the different subproblems and their optimal solutions one may be able to discover similarities and trends and eventually come up with a "well hedged" solution to the underlying problem, something that can be expected to perform rather well under all scenarios, relative to some weighting of the scenarios. As examples, see [1] and [2].

* The work of both authors was supported in part by grants from the Air Force Office of Scientific Research and the National Science Foundation.

To give this a mathematical formulation, let us write the scenario subproblems as

$$(P_s)\qquad \text{minimize } f_s(x) \text{ over all } x \in C_s \subset \mathbb{R}^n,$$

where the index s ranges over a relatively modest, finite set S: the set of scenarios. It is not our intention to address in this paper the question of how the scenario subproblems might be chosen or constructed. We take them for granted and suppose that we know how to solve them individually to obtain optimal solutions x_s. The question we do raise is how to work with the different vectors x_s and consolidate them into an overall decision or decision policy. The essential difficulty obviously lies in the fact that actions in the real world must be taken without the hindsight that goes into solving the problems (P_s). In multistage models the actions could, however, respond in time to increasing degrees of information that become available about the particular scenario being followed.

The expression of such an information structure must be an important part of the formulation. Let us suppose we are dealing with time periods t = 1, ..., T and write

$$(1.1)\qquad x = (x_1, \dots, x_T) \in \mathbb{R}^{n_1} \times \dots \times \mathbb{R}^{n_T},$$

where n_1 + ... + n_T = n. The component x_t represents the decision that must be made at time t. More generally, let X denote a function or mapping that assigns to each s ∈ S a vector

$$(1.2)\qquad X(s) = (X_1(s), \dots, X_T(s)),$$

where X_t(s) denotes the decision to be made at time t if the scenario happens to be s. It is such a mapping (let us call it a policy) that we are really looking for, but it has to satisfy the crucial constraint that if two different scenarios s and s' are indistinguishable at time t on the basis of the information available about them at time t, then X_t(s) = X_t(s'). A policy, if it is to make sense, cannot require different courses of action at time t relative to scenarios s and s' if there is no way to tell at time t which of the two scenarios one happens to be following.

A good way of modeling this constraint is to introduce an information structure by scenario bundling, i.e., by partitioning the scenario set S at each time t into finitely many disjoint subsets, which can be termed scenario bundles. The scenarios in any one bundle are regarded as observationally indistinguishable at time t. Denoting the collection of all scenario bundles at time t by 𝒜_t, we impose the requirement that X_t(s) must be constant relative to s ∈ A for each A ∈ 𝒜_t. Thus from the space of all mappings X: S → ℝ^n with components X_t: S → ℝ^{n_t} as in (1.2), a space we denote by ℰ, we single out the subspace

$$(1.3)\qquad \mathcal{N} = \{X \in \mathcal{E} \mid X_t \text{ is constant on each } A \in \mathcal{A}_t \text{ for } t = 1, \dots, T\}$$

as specifying the policies that meet our fundamental constraint of not being based on hindsight. The policies X belonging to 𝒩 will be called implementable policies. We make a distinction here between implementable policies and admissible policies, which belong to the set

$$(1.4)\qquad C = \{X \in \mathcal{E} \mid X(s) \in C_s \text{ for all } s \in S\}.$$

For most purposes it would be reasonable to suppose that the partition 𝒜_{t+1} is a refinement of the partition 𝒜_t, in the sense that each bundle A ∈ 𝒜_t is a union of bundles in 𝒜_{t+1}. This would be consistent with the idea that information increases in time and is never lost. Interestingly enough, though, none of what we say in this paper actually depends on such an assumption. Information about the scenario being followed could be allowed to vary quite generally.

The central question of scenario analysis can now be stated. Given the collection of scenario subproblems (P_s), and a license perhaps to modify them (perturb their objectives) so as to assist in adapting to the information structure, we have the means of generating various policies X ∈ ℰ that may be called contingent policies: X(s) is obtained by solving a version of the scenario subproblem (P_s) for each s ∈ S. How can we use these means to determine an implementable policy X ∈ 𝒩 that in some sense is good for the underlying problem of optimization under uncertainty?

Note that a contingent policy is at least always admissible: X ∈ C. But this condition is not built into our use of the term "implementable". Obviously a policy that is both admissible and implementable is what we really want; this is what we shall mean by a feasible policy. But implementability is a logically inescapable requirement, whereas admissibility might be waived by the modeler in some situations that only risk the violation of X(s) ∈ C_s for a few extreme or unlikely scenarios, or entail mild transgressions of certain non-key constraints in more ordinary scenarios.

The simplest case of a one-stage model (T = 1) helps to illustrate these ideas. In this case we only know the present. We know nothing that would pin down a particular scenario or subclass of scenarios, but are forced to make a decision "here and now". A policy X, with just one time component, is implementable if for all s ∈ S one has X(s) = x for some (fixed) vector x. In other words, the space 𝒩 consists of just the constant mappings from S to ℝ^n, in contrast to the space ℰ, which consists of all possible mappings from S to ℝ^n. (The partition 𝒜_1 in this example is the "trivial partition" consisting of the set S by itself, no scenario being regarded as distinguishable from any other at the time the single decision has to be taken. All of S is a single bundle.) In this setting, the question is one of proceeding from a mapping X that is not constant to a mapping that is constant, by some method making use of the insights gained by solving the individual scenario subproblems in various forms.

An attractive way of passing from a general policy X to a policy that is implementable is to assign to each scenario s ∈ S a weight p_s that reflects its relative importance in the uncertain environment, with

$$(1.5)\qquad p_s > 0 \text{ for all } s \in S, \quad \text{and} \quad \sum_{s \in S} p_s = 1.$$

These weights are used in blending the responses X(s) of X so as to meet the requirement of not allowing the decision at time t to distinguish among the scenarios in a bundle A ∈ 𝒜_t. Specifically, one calculates for every time t and for every A ∈ 𝒜_t the vector

$$(1.6)\qquad \hat{X}_t(A) = \sum_{s \in A} p_s X_t(s) \Big/ \sum_{s \in A} p_s,$$

which represents a "weighted average" of all the responses X_t(s) for scenarios in the bundle A. Then one defines a new policy X̂ by taking

$$(1.7)\qquad \hat{X}_t(s) = \hat{X}_t(A) \text{ for all } s \in A.$$

Clearly X̂ is implementable: X̂ ∈ 𝒩. (In the one-stage model, X̂ would simply be the constant mapping whose value is the vector Σ_{s∈S} p_s X(s).)

The transformation

$$(1.8)\qquad J : X \mapsto \hat{X},$$

defined by (1.6)-(1.7), is obviously linear and satisfies J² = J. It is a projection from ℰ onto 𝒩 which depends only on the weights p_s. We call it the aggregation operator relative to the given information structure and weights. It aggregates the possibly different responses that a policy might make for the scenarios in any bundle into a single compromise response to that bundle.
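To make the bundling and the aggregation operator concrete, here is a minimal Python sketch. The array layout (policies as S × T arrays with each n_t = 1), the toy weights and bundles, and the helper name `aggregate` are our own illustrative assumptions, not part of the paper.

```python
import numpy as np

p = np.array([0.2, 0.3, 0.5])                 # weights p_s as in (1.5)
X = np.array([[1.0, 4.0],                     # X(s) = (X_1(s), X_2(s)) for s = 0, 1, 2
              [2.0, 5.0],
              [3.0, 6.0]])
bundles = [
    [[0, 1, 2]],          # stage t = 1: all scenarios indistinguishable (one bundle)
    [[0, 1], [2]],        # stage t = 2: scenario 2 can be told apart from 0 and 1
]

def aggregate(X, p, bundles):
    """Aggregation operator J of (1.6)-(1.7): bundle-wise weighted averages."""
    Xhat = np.empty_like(X)
    for t, partition in enumerate(bundles):
        for A in partition:
            avg = np.dot(p[A], X[A, t]) / p[A].sum()   # (1.6)
            Xhat[A, t] = avg                            # (1.7)
    return Xhat

Xhat = aggregate(X, p, bundles)
# J is a projection: applying it twice changes nothing (J^2 = J).
assert np.allclose(aggregate(Xhat, p, bundles), Xhat)
print(Xhat)
```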

If we were to start from the contingent policy X^0 in which X^0(s) is for each s an optimal solution to the unmodified scenario subproblem (P_s), which is the typical beginning for all scenario analysis, the corresponding implementable policy X̂^0 = JX^0 might be contemplated as a kind of solution to the underlying problem. There is no guarantee, however, that X̂^0 will inherit from X^0 the property of admissibility. Even if X̂^0 is admissible as well as implementable, therefore feasible, the sense in which it might be regarded as "optimal" needs to be clarified. As a matter of fact, X̂^0 is an optimal solution to a certain "projected" problem, which will be described presently, but this is not at all the problem that one is interested in.

If instead of introducing the weights p_s in an a posteriori manner we were to do so at the outset, we would be led in our search for a well hedged decision policy to the functional

$$(1.9)\qquad F(X) := \sum_{s \in S} p_s f_s(X(s))$$

and the problem

$$(1.10)\qquad \text{minimize } F(X) \text{ over all } X \in C \cap \mathcal{N}.$$

An optimal solution X* to this problem would indeed be admissible and implementable. Among all admissible, implementable policies it would do the best job, in a certain specific sense, of responding to the relative importance of the scenarios as assessed through the weights p_s. It would provide a sound method of hedging against the unknowns.

The weights need not be regarded as "hard data" for this interpretation to be valid. The road is always open at another level to play with the values of the weights and see how sensitively the problem is affected by them, although we do not take that issue up here.

The trouble is that problem (1.10) may be much larger and therefore much harder to solve than the individual scenario subproblems (P_s), so that it cannot be tackled directly. There is little prospect, either, that the desired policy X* is approximated at all closely by the policy X̂^0 already described.

This is seen from the elementary fact that X̂^0 actually solves

$$(1.11)\qquad \text{minimize } \hat{F}(Y) \text{ over all } Y \in \hat{C},$$

where

$$(1.12)\qquad \hat{C} = \{Y \in \mathcal{N} \mid \exists X \in C \text{ with } JX = Y\}, \qquad \hat{F}(Y) = \min\{F(X) \mid X \in C,\ JX = Y\}.$$

The projected problem (1.11) is utterly different from (1.10).

Nonetheless there turns out to be a relationship that can be exploited to trace a path from X̂^0 to X* by solving a sequence of projected problems in which the scenario subproblems are not the original ones but are modified by the incorporation of tentative "information prices" and penalties. At iteration ν we take a contingent policy X^ν obtained by solving modified scenario subproblems (P_s^ν) and aggregate it into an implementable policy X̂^ν whose robustness in the face of all eventualities is increasingly demanded. An advantage of this approach is that even if we do not pursue the search until X̂^ν converges to X*, we always have at hand a solution estimate that is better than just X̂^0 or any other policy that could reliably be gleaned from scenario analysis as practiced until now. The word "better" is given specific meaning by our convergence theory. The very process of blending decision components iteratively in the manner we suggest is likely, moreover, to identify fairly early the trends and activities that will lead to the final solution.

The general principle that allows us to proceed in this manner in generating improving sequences of policies is what we call the principle of progressive hedging in optimization under uncertainty. It enables us by simple means to insist more and more on having our subproblems reflect the ultimate requirement that a policy, to be implementable, cannot distinguish between scenarios that at a particular time are regarded as indistinguishable from each other on the basis of the information so far available. The realization of the principle that we give here is based mathematically on the theory of the proximal point algorithm in nonlinear programming. It does not depend on convexity, although convexity provides a big boost.

A notable byproduct of our hedging algorithm is the generation of information prices relative to the chosen weights p_s. Potentially these might be used in some larger scheme for adjusting the weights or judiciously supplying more detail to the set of scenarios. In the limit the information prices solve a dual problem, which however is likely to have dimension at least as high as that of the primal problem. Because of this high dimensionality, approaches like Dantzig-Wolfe generalized programming, which in effect applies a cutting-plane method to the dual, are not suitable. Our approach is not blocked by this difficulty, and yet it retains properties of decomposition that allow the separate scenario subproblems in each iteration to be solved by parallel processors, if desired.

2. General Framework.

There is no harm in interpreting the weights p_s mathematically as probabilities. They may indeed represent "subjective probabilities", but the reader should not conclude from the probabilistic language that follows that we necessarily regard them so. In passing to a probability framework we merely take advantage of the fact that it provides a convenient scheme for organizing ideas that mathematically fall into the same patterns as are found in dealing with probability. Much the same could be said about the use of geometric language in a nongeometric situation.

From now on, sums with the weights p_s will be written as expectations in the traditional notation; for instance, E{X(s)} stands for Σ_{s∈S} p_s X(s). Then in (1.6) we have

$$(2.1)\qquad \hat{X}_t(A) = E\{X_t(s) \mid A\},$$

the conditional expectation of X_t(s) given that s ∈ A, and we can interpret the projection J: X ↦ X̂ quite simply as the conditional expectation operator relative to the given information structure and values p_s.

The information structure can itself be furnished with a traditional interpretation in terms of fields of sets: 𝓕_t is for each t the collection of all subsets expressible as unions of the (disjoint) sets in 𝒜_t. Then X̂_t is the conditional expectation of X_t relative to 𝓕_t. Such terminology, bringing to mind all the subtleties of measure theory, is not in any way needed, however, in the present context where S is just a finite set. It could just get in the way of a "user-friendly" explanation of ideas that are really quite elementary, so for the purposes at hand we avoid it.

An inner product on the vector space ℰ of all mappings from S to ℝ^n is defined by

$$(2.2)\qquad \langle X, Y \rangle := E\{X(s) \cdot Y(s)\} = \sum_{s \in S} p_s\, X(s) \cdot Y(s).$$

We think of ℰ as a Euclidean space in this sense, the norm being

$$(2.3)\qquad \|X\| := \langle X, X \rangle^{1/2} = \Big(\sum_{s \in S} p_s\, |X(s)|^2\Big)^{1/2},$$

where |·| is the ordinary Euclidean norm on ℝ^n. The aggregation operator J is then actually the orthogonal projection onto the subspace 𝒩, as is well known. The operator

$$(2.4)\qquad K := I - J$$

is the orthogonal projection onto the subspace of ℰ complementary to 𝒩, which we denote by ℳ:

$$(2.5)\qquad \mathcal{M} = \mathcal{N}^{\perp} = \{W \in \mathcal{E} \mid JW = 0\} = \{W \in \mathcal{E} \mid E\{W_t(s) \mid A\} = 0 \text{ for all } A \in \mathcal{A}_t,\ t = 1, \dots, T\}.$$

Clearly, on the other hand,

$$(2.6)\qquad \mathcal{N} = \{X \in \mathcal{E} \mid KX = 0\}.$$

Thus a policy X is implementable if and only if it satisfies the linear constraint equation KX = 0.
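As a quick numerical check of these relations (reusing the toy S × T array layout assumed earlier; the helper names J, K and inner are ours, not the paper's):

```python
import numpy as np

p = np.array([0.2, 0.3, 0.5])
bundles = [[[0, 1, 2]], [[0, 1], [2]]]

def J(X):
    """Aggregation operator (1.6)-(1.7): orthogonal projection of E onto N."""
    Xhat = np.empty_like(X)
    for t, partition in enumerate(bundles):
        for A in partition:
            Xhat[A, t] = np.dot(p[A], X[A, t]) / p[A].sum()
    return Xhat

def K(X):
    """K = I - J as in (2.4): projection onto the complementary subspace M."""
    return X - J(X)

def inner(X, Y):
    """Weighted inner product (2.2): sum_s p_s X(s).Y(s)."""
    return float(np.sum(p[:, None] * X * Y))

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 2))
Y = rng.normal(size=(3, 2))

# J and K are complementary orthogonal projections in this inner product:
assert abs(inner(J(X), K(Y))) < 1e-12
# A policy is implementable exactly when K X = 0, i.e. it already lies in N:
assert np.allclose(K(J(X)), 0.0)
```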

The functional F in (1.9) can be written now as

$$(2.7)\qquad F(X) = E\{f_s(X(s))\}.$$

The problem we wish to solve then has the formulation

$$(P)\qquad \text{minimize } F(X) \text{ subject to } X \in C,\ KX = 0.$$

An optimal solution X* to this problem is what we take to be the best response we can offer to the uncertain environment, relative to the given weighting of the scenarios. The challenge for us, in adopting this point of view as a practical expedient, is that of demonstrating how such an X* can be determined without going beyond the tools that are available.

We see our capabilities as extending in two directions. First, we can readily calculate for any X the corresponding X̂ = JX and therefore also X − X̂ = KX. The projections J and K are thus computable and appropriate to use in the context of an algorithm. Second, we can solve, at least approximately to any desired degree, the scenario subproblems (P_s) and a certain class of modified versions of these subproblems. The specific form of modified scenario subproblem that we work with in this paper is

$$(P_s(\hat{x}, w, r))\qquad \text{minimize } f_s(z) + z \cdot w + \tfrac{1}{2} r\, |z - \hat{x}|^2 \text{ over all } z \in C_s.$$

The vector x̂ will stand for an estimate of z from which we do not want to stray too far; w ∈ ℝ^n will be a price vector and r > 0 a penalty parameter.
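For concreteness, here is a hedged sketch of one such modified subproblem that can be solved in closed form, under the illustrative assumptions (not prescribed by the paper) that f_s is a simple quadratic tracking term and C_s is a box:

```python
import numpy as np

def solve_modified_subproblem(a, w, xhat, r, lo, hi):
    """Solve (P_s(xhat, w, r)) for the illustrative choice
    f_s(z) = 0.5*||z - a||^2 and C_s = box [lo, hi]:
        minimize 0.5*||z - a||^2 + w.z + (r/2)*||z - xhat||^2 over z in C_s.
    The objective is separable and strictly convex per coordinate, so clipping
    the unconstrained minimizer to the box is exact."""
    z_unconstrained = (a + r * xhat - w) / (1.0 + r)
    return np.clip(z_unconstrained, lo, hi)

# Example: 2-dimensional decision, penalty r = 1 (all data made up).
a    = np.array([3.0, -1.0])     # scenario data defining f_s
w    = np.array([0.5,  0.0])     # current price vector
xhat = np.array([1.0,  0.0])     # current estimate to stay close to
lo, hi = np.array([0.0, -2.0]), np.array([2.0, 2.0])

print(solve_modified_subproblem(a, w, xhat, r=1.0, lo=lo, hi=hi))
```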

Motivation comes in part from Lagrangian representations for problem (P). The ordinary Lagrangian for this problem could be defined as the expression F(X) + ⟨KX, Y⟩ with multiplier Y, but since K is an orthogonal projection one has

$$(2.8)\qquad \langle KX, Y \rangle = \langle X, KY \rangle = \langle KX, KY \rangle.$$

Only the component W = KY ∈ ℳ can really matter. We therefore find it convenient to define

$$(2.9)\qquad L(X, W) = F(X) + \langle X, W \rangle \quad \text{for } X \in C,\ W \in \mathcal{M},$$

as the Lagrangian. The multiplier element W will be called an information price system because of its role relative to the implementability constraint KX = 0. More will be said about this later.

The ordinary Lagrangian (2.9), important as it can be for instance in stating optimality conditions, is limited in its numerical usefulness. More powerful in many ways, if one can work with it, and not limited to problems where convexity is present, is the corresponding augmented Lagrangian

$$(2.10)\qquad L_r(X, W) = F(X) + \langle X, W \rangle + \tfrac{1}{2} r\,\|KX\|^2 = F(X) + \langle X, W \rangle + \tfrac{1}{2} r\,\|X - \hat{X}\|^2 \quad \text{for } X \in C,\ W \in \mathcal{M},$$

where r > 0.

There is no place here for a general discussion of augmented Lagrangians, except to say that they combine features of multipliers and penalties. Through a good choice of W ∈ ℳ and r > 0 one can expect that the subproblem

$$(2.11)\qquad \text{minimize } L_r(X, W) \text{ over } X \in C$$

can be used as a close representation of (P), in the sense that its nearly optimal solutions will be good approximates to an optimal solution X* of (P). This is true without any assumption of convexity and does not necessarily entail r getting too large for comfort; much of the work can be done by W. Even in the convex case the augmented Lagrangian can be advantageous by providing greater stability to solution methods. We refer the reader to Bertsekas [3] and Rockafellar [4] for more on this topic.

Unfortunately, the augmented Lagrangian (2.10) cannot serve directly in our scheme. To use it we would have to be able to solve subproblems of the form (2.11), which do not meet our prescription. The difficulty lies in the fact that the term ||KX||² is not "decomposable" into separate terms for each scenario. Nonetheless we are able to take an approach which seems quite similar and does achieve the required decomposition.

The approach can be described quite broadly in terms of the following algorithmic scheme. We shall subsequently make it more specific, in order to have results on convergence. A fixed parameter value r > 0 is considered throughout this paper for simplicity. In practice one might wish to make adjustments in the value of r. This is an issue for which the theoretical backing is incomplete, although some elucidation will be provided in Proposition 5.3 and the comment that follows it.

Progressive Hedging Algorithm. In iteration ν (where ν = 0, 1, ...) one has an admissible but not necessarily implementable policy X^ν ∈ C and a price system W^ν ∈ ℳ. (Initially one can take X^0 to be the policy obtained by letting X^0(s) be, for each scenario s ∈ S, an optimal solution to the given scenario subproblem (P_s). One can take W^0 = 0.)

1. Calculate the policy X̂^ν = JX^ν, which is implementable but not necessarily admissible. (If ever one wishes to stop, this policy X̂^ν is to be offered as the best substitute yet available for a solution to (P).)

2. Calculate as X^{ν+1} an (approximately) optimal solution to the subproblem

$$(P^\nu)\qquad \text{minimize } F(X) + \langle X, W^\nu \rangle + \tfrac{1}{2} r\, \|X - \hat{X}^\nu\|^2 \text{ over all } X \in C.$$

This decomposes into solving (approximately), for each scenario s ∈ S, the subproblem

$$(P_s^\nu)\qquad \text{minimize } f_s(z) + z \cdot W^\nu(s) + \tfrac{1}{2} r\, |z - \hat{X}^\nu(s)|^2 \text{ over all } z \in C_s,$$

in order to get X^{ν+1}(s). The policy X^{ν+1} will again be admissible but not necessarily implementable.

3. Update from W^ν to W^{ν+1} by the rule W^{ν+1} = W^ν + rKX^{ν+1}. The price system W^{ν+1} will again be an element of the subspace ℳ.

4. Return to Step 1 with ν replaced by ν + 1.
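To mirror Steps 1-4 on a small convex example, here is a hedged Python sketch. All of the data, the (S, T) array layout with each n_t = 1, the quadratic/box form of the scenario subproblems, and the stopping test are illustrative assumptions of ours, not the paper's.

```python
import numpy as np

p = np.array([0.2, 0.3, 0.5])                      # scenario weights
a = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]) # data: f_s(z) = 0.5*||z - a_s||^2
lo, hi = -10.0, 10.0                               # C_s = box [lo, hi]^T
bundles = [[[0, 1, 2]], [[0, 1], [2]]]             # information structure A_t
r = 1.0                                            # penalty parameter

def J(X):
    """Aggregation operator (1.6)-(1.7)."""
    Xhat = np.empty_like(X)
    for t, partition in enumerate(bundles):
        for A in partition:
            Xhat[A, t] = np.dot(p[A], X[A, t]) / p[A].sum()
    return Xhat

def solve_subproblem(s, w_s, xhat_s):
    """(P_s^nu): minimize f_s(z) + z.w + (r/2)|z - xhat|^2 over the box (closed form)."""
    return np.clip((a[s] + r * xhat_s - w_s) / (1.0 + r), lo, hi)

# Step 0: X^0 solves the unmodified subproblems (P_s); W^0 = 0.
X = np.clip(a.copy(), lo, hi)
W = np.zeros_like(X)

for nu in range(200):
    Xhat = J(X)                                     # Step 1: aggregate
    X = np.array([solve_subproblem(s, W[s], Xhat[s]) for s in range(len(p))])  # Step 2
    W = W + r * (X - J(X))                          # Step 3: W + r*K(X^{nu+1})
    if np.max(np.abs(X - J(X))) < 1e-9:             # nonimplementability residual
        break

print("hedged policy Xhat =\n", J(X))
# For this instance the optimal policy has first-stage value E{a_{s,1}} = 2.3 in every
# scenario and second-stage values averaged over the bundles {0, 1} and {2}.
```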

Left open in this statement is the sense in which the subproblem in Step 2 need only be solved "approximately". Actually the scenario subproblems in many applications will turn out to be quadratic programming problems of reasonable dimension. Then one could well imagine solving them "exactly". This question of approximation therefore is not a sine qua non. A substantial answer will nevertheless be presented in §5.

The updating rule for the price systems in Step 3 could in principle be replaced by something else without destroying the truly critical property of decomposability in Step 2. This rule is strongly motivated, though, by augmented Lagrangian theory (cf. [4]). It is essential not merely to the proofs of our theorems on convergence but to the very nature of the reformulation of the algorithm on which these proofs rely.

An obvious strength of the procedure we are proposing is that it involves at every iteration both an admissible policy X^ν and an implementable policy X̂^ν. The distance expression

$$(2.12)\qquad \|X^\nu - \hat{X}^\nu\| = \Big(\sum_{s \in S} p_s\, |X^\nu(s) - \hat{X}^\nu(s)|^2\Big)^{1/2}$$

can readily be computed and taken as a measure of how far one is from satisfying all the constraints. Note that (2.12) is a kind of conditional variance relative to the weights p_s. In our convergence theorems for the convex case, at least, this quantity will tend to 0. At the same time, the price systems W^ν will tend to an optimal solution to the Lagrangian dual of problem (P).

Such results for the convex case are established in §5. The nonconvex case is taken up in §6. We have much less to say about it at the present stage of development and try only to indicate a potential in this direction. Our immediate task, in §3 and §4, is to lay the foundations for the derivation of these results.

3. Basic Assumptions and Properties.

It will be assumed throughout the rest of this paper that for each s ∈ S the feasible set C_s in the scenario subproblem (P_s) is nonempty and closed, and the objective function f_s is locally Lipschitz continuous on ℝ^n with all level sets of the form

$$(3.1)\qquad \{z \in C_s \mid f_s(z) \le \alpha\}, \quad \alpha \in \mathbb{R},$$

bounded. This last condition is trivially satisfied, of course, if C_s itself is bounded. The closedness of C_s presumably comes from the constraint structure used to define C_s, but such explicit structure will not play any role here. The local Lipschitz continuity of f_s is present if f_s is smooth (i.e., of class C¹ on ℝ^n) or, on the other hand, if f_s is convex.

We shall speak of the convex case of our problem (P) when for every s ∈ S the function f_s is convex and the set C_s is convex. The linear-quadratic case will refer to the more special situation where f_s is quadratic (convex) and C_s is polyhedral (convex). We regard linear and affine functions as included under the heading of "quadratic".

We proceed with some of the elementary consequences of these conditions. The first topic is their effect on the given scenario subproblems (P_s), whose solution is called for at the outset of our proposed algorithm.

Proposition 3.1. Each of the scenario subproblems (P_s) has finite optimal value and at least one optimal solution. Furthermore, the value

$$(3.2)\qquad \bar{a} := \min_{X \in C} F(X)$$

exists and is given by

$$(3.3)\qquad \bar{a} = E\{a_s\}, \quad \text{where } a_s = \min(P_s).$$

It is a lower bound for the optimal value in (P).

Proof. For the first part the argument is the standard one. The sets (3.1) for α > inf(P_s) are nonempty and compact under our assumptions, and since they are nested they must have a nonempty intersection. This intersection consists of the optimal solutions to (P_s). The existence of an optimal solution implies of course that the optimal value in (P_s) is finite. The second part of the proposition merely records that, because of decomposability, we are actually minimizing F over C when solving each of the problems (P_s). Indeed, C is just the direct product of the sets C_s and F is by (2.7) separable, with components p_s f_s. The minimum value of p_s f_s over C_s is p_s a_s, and the sum of all these quantities p_s a_s is therefore ā. This sum is E{a_s} in our probabilistic notation. Problem (P) requires the minimization of F over C ∩ 𝒩, not just C, so ā is merely a lower bound for the optimal value in (P).

Next we provide background for the decomposed solution of the subproblems (P^ν) appearing in our algorithm.

Proposition 3.2. Every modified scenario subproblem of the form (P_s(x̂, w, r)) (where r > 0) has finite optimal value and at least one optimal solution. In the convex case, this optimal solution is unique.

Proof. Let f̃_s denote the objective function in (P_s(x̂, w, r)),

$$\tilde{f}_s(z) = f_s(z) + z \cdot w + \tfrac{1}{2} r\, |z - \hat{x}|^2.$$

In the convex case, this is of course a strictly convex function on C_s and therefore has at most one minimizing point relative to C_s. To reach the desired conclusions it will suffice (in view of the existence argument used for the preceding proposition) to demonstrate that all level sets of the form {z ∈ C_s | f̃_s(z) ≤ α}, α ∈ ℝ, are closed and bounded. They are obviously closed, since C_s is closed and f̃_s is continuous. That they are bounded can be seen from the inequality

$$\tilde{f}_s(z) \ge a_s + z \cdot w + \tfrac{1}{2} r\, |z - \hat{x}|^2,$$

where a_s is the optimal value in (P_s) as in Proposition 3.1. This yields the inclusion

$$\{z \in C_s \mid \tilde{f}_s(z) \le \alpha\} \subset \{z \in \mathbb{R}^n \mid z \cdot w + \tfrac{1}{2} r\, |z - \hat{x}|^2 \le \alpha - a_s\},$$

where the right side is a certain ball in ℝ^n.

In the convex case (P_s(x̂, w, r)) is a convex programming problem. Thus in executing our algorithm the critical step of solving all the modified scenario subproblems (P_s^ν) is open to the methods of convex programming. In the linear-quadratic case, these problems fall into the category of quadratic (convex) programming: a quadratic function with positive definite Hessian is minimized over a polyhedron. Special techniques such as pivoting algorithms can then produce an "exact" optimal solution to (P_s^ν) as long as the dimension n and the number of linear constraints used in defining C_s are not too large.

In the important case where f_s is linear, i.e., where the original scenario subproblems (P_s) arise from a linear programming model, the nature of (P_s(x̂, w, r)) and (P_s^ν) is even more special. Although the proximal term in |z − x̂|² requires a quadratic programming technique rather than the simplex method, say, in solving such a subproblem, the Hessian matrix is just rI. It is possible then by elementary algebra to reduce the subproblem to one of finding the point of C_s nearest to a certain point in ℝ^n, namely, in the case of (P_s^ν), the point X̂^ν(s) − r^{-1}[c_s + W^ν(s)], where c_s denotes the (constant) gradient of the linear function f_s. Special methods are available for such a problem too.
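As a hedged illustration of this reduction, take C_s to be the unit simplex (chosen only because its nearest-point map is easy to write down; the data below are made up):

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {z : z >= 0, sum(z) = 1} (standard sort-based rule)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def solve_linear_subproblem(c, w, xhat, r):
    """(P_s^nu) with linear f_s(z) = c.z and C_s = unit simplex:
       minimize c.z + w.z + (r/2)|z - xhat|^2
       = (r/2)|z - (xhat - (c + w)/r)|^2 + const,
       i.e. project xhat - (c + w)/r onto C_s."""
    return project_to_simplex(xhat - (c + w) / r)

c    = np.array([1.0, 0.2, 0.5])   # cost vector of the linear f_s (illustrative)
w    = np.array([0.1, -0.3, 0.2])  # current price vector
xhat = np.array([0.3, 0.4, 0.3])   # current aggregated estimate
print(solve_linear_subproblem(c, w, xhat, r=2.0))
```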

Another thing that should be noted about the modified scenario subproblems solved in our algorithm is the quite simple way they can be updated from one iteration to the next. In iteration ν we must (approximately) solve

$$(P_s^\nu)\qquad \text{minimize } f_s(z) + z \cdot W^\nu(s) + \tfrac{1}{2} r\, |z - \hat{X}^\nu(s)|^2 \text{ over } C_s,$$

but in the preceding iteration we already solved

$$(P_s^{\nu-1})\qquad \text{minimize } f_s(z) + z \cdot W^{\nu-1}(s) + \tfrac{1}{2} r\, |z - \hat{X}^{\nu-1}(s)|^2 \text{ over } C_s$$

in order to get X^ν(s), and we then set W^ν(s) = W^{ν-1}(s) + r[X^ν(s) − X̂^ν(s)]. By expanding the objectives in these two subproblems around the initial X^0(s) (as a suitable reference point), we can express the objective in (P_s^{ν-1}) as

$$(3.6)\qquad f_s(z) - a_s + \tfrac{1}{2} r\, |z - X^0(s)|^2 + z \cdot \big(W^{\nu-1}(s) - r[\hat{X}^{\nu-1}(s) - X^0(s)]\big) + \text{const},$$

where a_s = min(P_s), and the objective in (P_s^ν) as

$$(3.7)\qquad f_s(z) - a_s + \tfrac{1}{2} r\, |z - X^0(s)|^2 + z \cdot \big(W^{\nu}(s) - r[\hat{X}^{\nu}(s) - X^0(s)]\big) + \text{const}.$$

The value a_s has been introduced in these expressions, for what it might be worth, because the first two terms are then both nonnegative and vanish when z = X^0(s). The important observation, since constant terms in an objective have no effect on the calculation of an optimal solution, is that the objectives in (P_s^{ν-1}) and (P_s^ν) differ only in a linear term. As a matter of fact, the linear terms in the objectives differ in coefficient only by

$$r\big[X^{\nu}(s) - 2\hat{X}^{\nu}(s) + \hat{X}^{\nu-1}(s)\big].$$

In passing from (P_s^{ν-1}) to (P_s^ν) we therefore need only add to the objective a linear term with this vector as its coefficient vector, in order to move toward calculating the new elements X^{ν+1}(s).

The reason this observation can be useful is that it allows parametric techniques to come into play, particularly in the linear-quadratic case, in solving the modified scenario subproblems. The work involved can thereby be reduced very significantly. Other possibilities for reducing effort could lie in the information structure at hand. If scenarios s and s' are almost the same, for instance if they are indistinguishable to the decision maker until the final time periods, then (P_s^ν) and (P_{s'}^ν) ought to have strong similarities. One might be able to take advantage of an overlap in form to increase efficiency in solving the two problems, or a "bunch" of such problems. This is an idea that can only be developed in terms of greater detail about the scenario subproblems than we are ready to explore in the present paper.

Let us now look at problem (P) itself.

Proposition 3.3. In problem (P) the feasible set C is nonempty and closed, the objective F is locally Lipschitz continuous on ℰ, and all level sets of the form

$$(3.8)\qquad \{X \in C \mid F(X) \le \alpha\}, \quad \alpha \in \mathbb{R},$$

are compact. Consequently, as long as C ∩ 𝒩 ≠ ∅, problem (P) has finite optimal value and at least one optimal solution.

Proof. The assertions about C and F are obvious from the corresponding assumptions about C_s and f_s for each s ∈ S. They imply the closedness of the sets (3.8). The boundedness is verified by using the constant ā in Proposition 3.1 to express the inequality F(X) ≤ α as

$$\alpha - \bar{a} \ge F(X) - \bar{a} = \sum_{s \in S} p_s\,[f_s(X(s)) - a_s].$$

This inequality implies

$$f_s(X(s)) \le a_s + [\alpha - \bar{a}]/p_s \quad \text{for each } s \in S.$$

Any set (3.8) is therefore included in a set of the form

$$\prod_{s \in S} \{z \in C_s \mid f_s(z) \le a_s + [\alpha - \bar{a}]/p_s\},$$

where each factor is bounded by one of our basic assumptions. It follows that any set (3.8) is compact. All sets of the form

$$\{X \in C \cap \mathcal{N} \mid F(X) \le \alpha\}, \quad \alpha \in \mathbb{R},$$

are then compact too. In (P) we minimize F over C ∩ 𝒩, so this compactness leads by the standard existence argument in the proof of Proposition 3.1 to the assurance that, when C ∩ 𝒩 ≠ ∅, problem (P) has an optimal solution and consequently finite optimal value.

A further observation about the nature of (P) will complete this section.

Proposition 3.4. In the convex case (P) is a (large-scale) problem of convex programming: the feasible set C is convex and the objective F is convex. In the linear-quadratic case (P) is a (large-scale) problem of linear or quadratic programming: C is a polyhedron in ℰ and F is linear or (convex) quadratic.

Proof. In the first case C is a product of convex sets and F is a weighted sum of convex functions. In the second case C is a product of polyhedral sets, therefore polyhedral, and F is a weighted sum of functions that are at most quadratic, hence itself at most quadratic.

The large-scale nature of (P), mentioned in Proposition 3.4, stems partly from the very introduction of scenarios in the mathematical model. As soon as one attempts to cover a variety of occurrences that could influence the decision process, one almost inevitably becomes interested in a scenario set S as large as technically can be managed in the calculation of solutions. Then in addition there is the presence of multiple time periods. This could itself lead to large-scale structure. Each of the scenario subproblems (P_s) might itself be a challenge. The fact that we shall be able to decompose (P) into solving modified versions (P_s^ν) of such subproblems may in that situation seem to have only a muted effect, even if parallel processing or the like is available for the subproblems. However, the principle developed in this paper need only cover an outer layer. The problems (P_s^ν), with their multiple time periods but fixed (not "uncertain") structure, could themselves be decomposed by other techniques. In particular we have in mind here the idea of algorithms based on the separable saddle point representations we have developed recently in [5].

4. Optimality and Duality.

The question of what conditions can be used to characterize optimal solutions to (P) has to be addressed for its own reasons, but it is critical also in the formulation of a notion of "approximate" solution that can be used in implementing our algorithm. The interpretation of the multiplier elements W^ν in the algorithm is involved with this matter as well.

To cover with adequate generality the diverse instances of the scenario subproblems (P_s) that interest us, where f_s might be a smooth function but on the other hand might be convex and only piecewise smooth, due to the introduction of penalty terms, we use the notions of nonsmooth analysis. The symbol ∂f_s(z) will denote the set of generalized subgradients of f_s at z, as defined by Clarke [6] for arbitrary Lipschitz continuous functions. The reader does not need to understand fully what this means in order to appreciate our results. The main facts are simply that if f_s happens to be smooth (continuously differentiable) the set ∂f_s(z) consists of the single vector ∇f_s(z), whereas if f_s is convex ∂f_s(z) is the usual subgradient set of convexity theory. In all cases ∂f_s(z) is a nonempty compact convex set that depends on z.

Similarly the symbol N_{C_s}(z) will denote the generalized normal cone to C_s at z, as defined for any closed set C_s [6]. If C_s is convex, this is the normal cone of convex analysis. If C_s, whether convex or not, is defined by a system of smooth constraints such that the Mangasarian-Fromovitz constraint qualification is satisfied at z, then N_{C_s}(z) is the polyhedral cone generated by the gradients of the active constraints at z. (Nonnegative coefficients are used for the gradients of the active inequality constraints, of course, and arbitrary coefficients for the equality constraints.) The set N_{C_s}(z) is always a closed convex cone containing the zero vector, and it reduces solely to the zero vector if and only if z is an interior point of C_s.

This notation and its interpretations can be carried over to C and F in problem (P) as well.

Theorem 4.1. Let X* be a feasible solution to (P): one has X* ∈ 𝒩 and X* ∈ C, i.e.,

$$(4.1)\qquad X^*(s) \in C_s \text{ for all } s \in S.$$

Suppose that X* is locally optimal and that the following constraint qualification is satisfied at X*:

(4.2) the only element W ∈ ℳ satisfying −W(s) ∈ N_{C_s}(X*(s)) for all s ∈ S is W = 0.

Then there exists W* ∈ ℳ satisfying

$$(4.3)\qquad -W^* \in \partial F(X^*) + N_C(X^*),$$

and this is equivalent to

$$(4.4)\qquad -W^*(s) \in \partial f_s(X^*(s)) + N_{C_s}(X^*(s)) \text{ for all } s \in S.$$

In the convex case, the existence of such an element W* implies conversely that X* is an optimal solution to (P) (in the global sense).

Proof. The overall character of this result is not surprising, but its formulation in terms of conditions in (4.1), (4.2) and (4.4) that concern C_s and f_s for each s ∈ S needs to be checked for correctness.

The two crucial formulas which yield this formulation are

$$(4.5)\qquad \partial F(X) = \{Y \in \mathcal{E} \mid Y(s) \in \partial f_s(X(s)) \text{ for all } s \in S\},$$

$$(4.6)\qquad N_C(X) = \{Y \in \mathcal{E} \mid Y(s) \in N_{C_s}(X(s)) \text{ for all } s \in S\}.$$

These are perhaps more subtle than may first appear, because subgradients and normal vectors depend by definition on the inner product being used in the Euclidean space in question, and our inner product (2.2) is a specially adapted one.

We can think of the Euclidean space ℰ as the direct product of Euclidean spaces ℰ_s for s ∈ S, where ℰ_s is ℝ^n under the rescaled inner product

$$(4.7)\qquad \langle x, y \rangle_s = p_s\, x \cdot y.$$

Correspondingly F can be viewed in the separable form

$$(4.8)\qquad F(X) = \sum_{s \in S} F_s(X(s)) \quad \text{with } F_s(z) = p_s f_s(z),$$

and C can be viewed of course as the product of the sets C_s in the spaces ℰ_s. According to a general formula of nonsmooth analysis proved in Rockafellar [7, Prop. 2.5 and Corollary 2.5.1], one then has

$$(4.9)\qquad \partial F(X) = \prod_{s \in S} \tilde{\partial} F_s(X(s)),$$

$$(4.10)\qquad N_C(X) = \prod_{s \in S} \tilde{N}_{C_s}(X(s)),$$

where the tilde is introduced to indicate that the subgradient set and normal cone are to be taken relative to the inner product (4.7) rather than the canonical one. In the case of the normal cones this modification makes no difference at all, because the nature of a cone is not affected by a positive rescaling. Thus (4.10) is equivalent to (4.6). On the other hand

$$\tilde{\partial} F_s(X(s)) = \{p_s^{-1} z \mid z \in \partial F_s(X(s))\}, \qquad \partial F_s(X(s)) = \{p_s\, y \mid y \in \partial f_s(X(s))\},$$

so in the end we just have ∂̃F_s(X(s)) = ∂f_s(X(s)). Formula (4.9) therefore reduces to (4.5).

Armed with (4.5) and (4.6) we can apply the general theory of necessary conditions in nonsmooth analysis to problem (P). Viewing (P) in terms of minimizing F (which is Lipschitz continuous by Proposition 3.3) over C ∩ 𝒩, we first invoke the basic result [7, Corollary 2.4.1] to conclude that if X* gives a local minimum then

$$(4.11)\qquad 0 \in \partial F(X^*) + N_{C \cap \mathcal{N}}(X^*).$$

Next we recall from [7, Corollary 8.1.2] that

$$(4.12)\qquad N_{C \cap \mathcal{N}}(X^*) \subset N_C(X^*) + N_{\mathcal{N}}(X^*)$$

as long as there does not exist

$$(4.13)\qquad W \in N_{\mathcal{N}}(X^*) \text{ with } -W \in N_C(X^*),\ W \ne 0.$$

Formula (4.6) gives us N_C(X*), and since 𝒩 is a subspace of ℰ, the normal cone N_𝒩(X*) is just the subspace orthogonally complementary to 𝒩 (with respect to the specified inner product for ℰ), namely ℳ. The nonexistence of a vector W having the properties in (4.13) is thus the condition we have set up in (4.2) as the constraint qualification for (P). The combination of (4.11) and (4.12) now comes down to the assertion that

$$(4.14)\qquad -W^* \in \partial F(X^*) + N_C(X^*) \quad \text{for some } W^* \in \mathcal{M},$$

where the subgradient condition reduces by (4.5) and (4.6) to the relations claimed in (4.4).

In the convex case, of course, all these subgradient calculations can be carried out in the less demanding context of convex analysis rather than general nonsmooth analysis. The asserted conditions for optimality, which are equivalent to (4.14), are then sufficient because of the stronger meaning assigned to subgradients and normal vectors in that context. Specifically, (4.14) says that for some Y ∈ ∂F(X*), which means

$$(4.15)\qquad F(X) \ge F(X^*) + \langle X - X^*, Y \rangle \text{ for all } X \in C,$$

the vector −W* − Y belongs to N_C(X*), which means

$$(4.16)\qquad \langle X - X^*, -W^* - Y \rangle \le 0 \text{ for all } X \in C.$$

Taking arbitrary X ∈ C ∩ 𝒩 and using the fact that ⟨X, W*⟩ = 0 and ⟨X*, W*⟩ = 0 (because W* ∈ ℳ and X, X* ∈ 𝒩), we see in (4.15) that (4.16) implies ⟨X − X*, Y⟩ ≥ 0 and therefore F(X) ≥ F(X*). Thus X* is globally optimal for (P) in this case. □

Theorem 4.2. In the convex case, the decomposed conditions (4.1) and (4.4) on a pair (X*, W*) ∈ 𝒩 × ℳ are equivalent to (X*, W*) being a saddle point of the ordinary Lagrangian L(X, W) = F(X) + ⟨X, W⟩ relative to minimizing over X ∈ C and maximizing over W ∈ ℳ.

Proof. This is just a small extension of the argument with which we concluded the preceding proof. It fits the standard patterns of convex analysis, so we omit it. □

Theorem 4.3. In the linear-quadratic case, the constraint qualification in Theorem 4.1 is superfluous. The condition given for optimality is always both necessary and sufficient.

Proof. In this case (P) is just a linear or quadratic programming problem, albeit of large size; cf. Proposition 3.4. In particular C is a polyhedron and F is smooth, so no constraint qualification is needed for the general optimality condition (4.3) to be necessary. □
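To see conditions (4.1) and (4.4) at work numerically, here is a hedged check on the toy unconstrained quadratic instance used earlier (so N_{C_s} is the zero cone and (4.4) simply pins down W*(s) as the negative gradient of f_s at X*(s)); all data and helper names are illustrative assumptions, not the paper's.

```python
import numpy as np

# Toy instance (assumed): f_s(z) = 0.5*||z - a_s||^2, C_s = R^n, policies as (S, T) arrays.
p = np.array([0.2, 0.3, 0.5])
a = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])
bundles = [[[0, 1, 2]], [[0, 1], [2]]]

def J(X):
    Xhat = np.empty_like(X)
    for t, partition in enumerate(bundles):
        for A in partition:
            Xhat[A, t] = np.dot(p[A], X[A, t]) / p[A].sum()
    return Xhat

F = lambda X: 0.5 * float(np.sum(p[:, None] * (X - a) ** 2))
inner = lambda X, Y: float(np.sum(p[:, None] * X * Y))
L = lambda X, W: F(X) + inner(X, W)          # ordinary Lagrangian (2.9)

# For this instance (P) minimizes ||X - a||^2 over N, so X* = J(a); condition (4.4)
# with a trivial normal cone then forces W*(s) = a_s - X*(s).
X_star = J(a)
W_star = a - X_star

assert np.allclose(J(W_star), 0.0)           # W* lies in M (its bundle averages vanish)

# Saddle-point property of Theorem 4.2 in the X argument: X* minimizes L(., W*) over C.
rng = np.random.default_rng(1)
for _ in range(100):
    X_other = X_star + rng.normal(size=a.shape)
    assert L(X_star, W_star) <= L(X_other, W_star) + 1e-12
print("X* =\n", X_star, "\nW* =\n", W_star)
```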

As support for our algorithm we must develop optimality conditions for the subproblems (P^ν) and (P_s^ν) as well. Fortunately the circumstances in these problems are closely parallel to the ones already treated, so there is no call for going through the arguments in detail. We simply state the results without writing out the proofs.

Proposition 4.4. If a policy X^{ν+1} is locally optimal for the subproblem

$$(P^\nu)\qquad \text{minimize } F(X) + \langle X, W^\nu \rangle + \tfrac{1}{2} r\, \|X - \hat{X}^\nu\|^2 \text{ over } C,$$

it satisfies

$$(4.17)\qquad X^{\nu+1} \in C \quad \text{and} \quad -W^\nu - r[X^{\nu+1} - \hat{X}^\nu] \in \partial F(X^{\nu+1}) + N_C(X^{\nu+1}),$$

and this is equivalent to

$$(4.18)\qquad X^{\nu+1}(s) \in C_s \text{ for all } s \in S,$$

$$(4.19)\qquad -W^\nu(s) - r[X^{\nu+1}(s) - \hat{X}^\nu(s)] \in \partial f_s(X^{\nu+1}(s)) + N_{C_s}(X^{\nu+1}(s)) \text{ for all } s \in S.$$

In the convex case, this property of X^{ν+1} implies conversely that X^{ν+1} is the unique (globally) optimal solution to (P^ν). Conditions (4.18) and (4.19) in fact characterize in the same pattern the optimality of X^{ν+1}(s) for the subproblem (P_s^ν).

The main point here is that problem (P^ν) decomposes into the individual problems (P_s^ν). The conditions in Proposition 4.4 are the ones obtained for each (P_s^ν). No constraint qualification is needed, because the subspace 𝒩 is not involved.

Finally, the connection with duality in the convex case must be noted. The problem dual to (P) with respect to the ordinary Lagrangian L is

$$(D)\qquad \text{maximize } G(W) \text{ over all } W \in \mathcal{M},$$

where

$$(4.20)\qquad G(W) = \inf_{X \in C} L(X, W) = \inf_{X \in C}\{F(X) + \langle X, W \rangle\}.$$

The working out of the formula for the dual objective is not really relevant for our purposes. Instead we are interested in the relationship between (D) and (P) insofar as it reflects on the character and interpretation of the multipliers W. The facts can be derived from the general duality theory for convex programming problems in [8]. They focus most significantly on the function

$$(4.22)\qquad \Phi(U) = \min\{F(X) \mid X \in C,\ KX = U\} \quad \text{for } U \in \mathcal{M}.$$

This expresses the optimal value in a perturbed form of (P), where the implementability constraint KX = 0 is relaxed to KX = U. Note that the minimum in the formula is indeed attained as long as there does exist an X ∈ C satisfying KX = U. This is clear from the compactness in Proposition 3.3. When there does not exist such an X, Φ(U) is regarded as ∞. Thus Φ is extended-real-valued but nowhere takes on −∞ (because of the attainment of the minimum). Its domain of finiteness is the nonempty set KC, the projection of C on ℳ.

Proposition 4.5. The functional Φ on ℳ is lower semicontinuous; in fact its level sets {U ∈ ℳ | Φ(U) ≤ α} for α ∈ ℝ are all compact. Furthermore

$$\Phi(0) = \min(P)$$

(where min(P) is the optimal value in (P) and is interpreted as ∞ if (P) has no feasible solution, i.e., if C ∩ 𝒩 = ∅), and

$$\min_{U \in \mathcal{M}} \Phi(U) = \bar{a},$$

where ā is the value in Proposition 3.1. In the convex case, Φ is convex on ℳ.

Proof. The level set {U ∈ ℳ | Φ(U) ≤ α} is simply the image under the projection K of the level set {X ∈ C | F(X) ≤ α}. The latter is compact by Proposition 3.3, so the former is compact as well. This point of view also makes obvious the fact that the minimum value of Φ on ℳ is the same as the minimum value of F on C, which is ā by Proposition 3.1. The epigraph of Φ is seen in the same way to be the image of the epigraph of F + δ_C (with δ_C the indicator of C) under the extended projection (X, α) ↦ (KX, α) from ℰ × ℝ onto ℳ × ℝ. In the convex case the epigraph of F + δ_C is a convex set, hence so is the epigraph of Φ. Thus Φ is a convex function. □

Theorem 4.6. In the convex case the relation

$$(4.23)\qquad -\infty < \min(P) = \sup(D) \le \infty$$

holds, and moreover the set of all optimal solutions to (D) is given by

$$(4.24)\qquad \operatorname{argmax}(D) = -\partial\Phi(0) = \{W^* \in \mathcal{M} \mid \Phi(U) \ge \Phi(0) - \langle U, W^* \rangle \text{ for all } U \in \mathcal{M}\}.$$

The elements W* in this set, if any, are precisely the ones associated with an optimal solution X* to (P) by the optimality conditions in Theorem 4.1 or Theorem 4.2.

In particular the set (4.24) is nonempty if (P) has an optimal solution X* and the constraint qualification in Theorem 4.1 is satisfied. In the linear-quadratic case it is sure to be nonempty just from (P) being feasible, i.e., having C ∩ 𝒩 ≠ ∅.

Proof. This specializes the theory in [8, §30] to the present case. The assertions about the set (4.24) being nonempty are justified by Theorems 4.1 and 4.2 (and to some extent the existence in Proposition 3.3 of an optimal solution to (P) when C ∩ 𝒩 ≠ ∅). □

The importance of formula (4.24) in Theorem 4.6 is that it identifies the optimal multipliers W* in our framework with differential properties of the convex functional Φ. The subgradient inequality

$$(4.25)\qquad \Phi(U) \ge \Phi(0) - \langle U, W^* \rangle \text{ for all } U \in \mathcal{M}$$

furnishes us a means of seeing what W* represents.

Let us go back to the idea that W* is an "information price system" and give it the following, more specific interpretation: W*(s) is a price vector that can be used, if the scenario finally turns out to be s, to take the decision X(s), which had to be chosen as part of an implementable policy X, and change it with hindsight to a different decision X'(s) = X(s) + U(s). The cost of this service is U(s) · W*(s). Taking all possible scenarios into account with their various weights, and imagining how one might want to alter decisions after the fact in all cases, we come up with the cost expression

$$\sum_{s \in S} p_s\, U(s) \cdot W^*(s) = \langle U, W^* \rangle.$$

Only deviations U that belong to ℳ need to be considered, because all other aspects of the uncertain environment could already be taken care of in our model through the selection of X as an implementable policy.

The inequality (4.25) expresses W* as a system of "equilibrium" prices in the sense that under this system there is no incentive for such a posteriori changes of decisions. A change represented by a deviation U ∈ ℳ would achieve Φ(U) in place of Φ(0) as the optimal value in the problem, but the cost of the change, as perceived at the time of decision making, would be ⟨U, W*⟩. The net result for the decision maker would be Φ(U) + ⟨U, W*⟩. Because of the inequality in (4.25), there is no advantage in this procedure as compared with just accepting the implementability constraint X ∈ 𝒩 and the corresponding optimal value Φ(0).

In summary, the price systems W* are the ones that would charge for hindsight everything it might be worth. They do therefore truly embody the value of information in the uncertain environment.

Tighter expressions than (4.25) can be derived under additional assumptions. For instance, if W* is unique then W* = −∇Φ(0). We refer to the theory of subgradients of convex functions in [8, §§23-25].
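The equilibrium-price inequality (4.25) can be checked directly on the same toy instance. The closed-form Phi below is valid only for this unconstrained quadratic example and is an illustrative assumption of ours, not a general formula from the paper.

```python
import numpy as np

# Toy instance (assumed): f_s(z) = 0.5*||z - a_s||^2, C_s = R^n.  Then
#     Phi(U) = min { F(X) : K X = U } = F( J(a - U) + U ),
# and the optimal information price system for this instance is W* = K a = a - J a.

p = np.array([0.2, 0.3, 0.5])
a = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])
bundles = [[[0, 1, 2]], [[0, 1], [2]]]

def J(X):
    Xhat = np.empty_like(X)
    for t, partition in enumerate(bundles):
        for A in partition:
            Xhat[A, t] = np.dot(p[A], X[A, t]) / p[A].sum()
    return Xhat

K = lambda X: X - J(X)
F = lambda X: 0.5 * float(np.sum(p[:, None] * (X - a) ** 2))
inner = lambda X, Y: float(np.sum(p[:, None] * X * Y))

def Phi(U):
    """Optimal value of (P) with K X = 0 relaxed to K X = U (closed form for this toy)."""
    return F(J(a - U) + U)

W_star = K(a)                                   # candidate prices for this instance

rng = np.random.default_rng(2)
for _ in range(200):
    U = K(rng.normal(size=a.shape))             # random perturbation in M
    assert Phi(U) >= Phi(0 * a) - inner(U, W_star) - 1e-12   # inequality (4.25)
print("W* =\n", W_star, "\nPhi(0) =", Phi(0 * a))
```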

5. Convergence in the Convex Case.

This section contains our main results. So as not to overburden the reader with all the details at once, we begin with the form of the algorithm in which exact solutions are calculated for the subproblems in Step 2. This is referred to as the case of exact minimization, in contrast to the case of approximate minimization that will be treated afterward. Exact minimization, as the reader will recall, is entirely appropriate when the scenario subproblems fall within the realm of linear or quadratic programming and are not themselves of larger scale.

Theorem 5.1. Consider the algorithm in the convex case with exact minimization. Let {X^ν}_{ν=0}^∞ and {W^ν}_{ν=0}^∞ be the sequences it generates from arbitrary initial X^0 ∈ C and W^0 ∈ ℳ. (In particular X^0(s) could be obtained by solving (P_s), but that is not essential here.)

These sequences will be bounded if and only if optimal solutions exist for the problems (P) and (D), i.e., there exist elements X* and W* satisfying the optimality conditions in Theorem 4.1, or equivalently the saddle point condition in Theorem 4.2. In that case, for some particular pair of such elements X* and W* (even though optimal solutions to (P) and (D) might not be unique), it will be true that

$$(5.1)\qquad \hat{X}^\nu \to X^* \quad \text{and} \quad W^\nu \to W^*.$$

Furthermore, in terms of the norm expression

$$(5.2)\qquad \|(X, W)\|_r := \big(\|X\|^2 + r^{-2}\|W\|^2\big)^{1/2},$$

one will have in every iteration ν = 0, 1, 2, ... that

$$(5.3)\qquad \|(\hat{X}^{\nu+1}, W^{\nu+1}) - (X^*, W^*)\|_r \le \|(\hat{X}^{\nu}, W^{\nu}) - (X^*, W^*)\|_r,$$

with strict inequality unless (X̂^ν, W^ν) = (X*, W*).

Thus every iteration of the algorithm from the start makes a definite improvement until solutions are attained (if that occurs in finitely many steps). One will also have in every iteration ν = 1, 2, ... that

$$(5.4)\qquad \|(\hat{X}^{\nu+1}, W^{\nu+1}) - (\hat{X}^{\nu}, W^{\nu})\|_r \le \|(\hat{X}^{\nu}, W^{\nu}) - (\hat{X}^{\nu-1}, W^{\nu-1})\|_r.$$
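A hedged way to watch (5.3) in practice on the earlier toy instance: run the progressive hedging loop and record the distance of (X̂^ν, W^ν) to the known (X*, W*) in the norm of (5.2). The data, the closed-form subproblem step, the choice r = 1, and the zero starting policy are all illustrative assumptions of ours.

```python
import numpy as np

p = np.array([0.2, 0.3, 0.5])
a = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])
bundles = [[[0, 1, 2]], [[0, 1], [2]]]
r = 1.0

def J(X):
    Xhat = np.empty_like(X)
    for t, partition in enumerate(bundles):
        for A in partition:
            Xhat[A, t] = np.dot(p[A], X[A, t]) / p[A].sum()
    return Xhat

norm = lambda X: float(np.sqrt(np.sum(p[:, None] * X ** 2)))   # norm (2.3)

# Known solution of the toy problem (f_s(z) = 0.5*||z - a_s||^2, C_s = R^n):
X_star, W_star = J(a), a - J(a)

X, W = np.zeros_like(a), np.zeros_like(a)   # arbitrary admissible start, W^0 in M
for nu in range(25):
    Xhat = J(X)
    dist = np.sqrt(norm(Xhat - X_star) ** 2 + norm(W - W_star) ** 2 / r ** 2)  # (5.2)
    print(nu, dist)                          # per (5.3), nonincreasing in nu
    X = (a + r * Xhat - W) / (1.0 + r)       # exact solution of each (P_s^nu)
    W = W + r * (X - J(X))
```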

Proof. A slight shift of notation will be useful. Let

$$(5.5)\qquad V = JX = \hat{X}, \qquad V^\nu = JX^\nu = \hat{X}^\nu,$$

and rescale all multiplier vectors by

$$(5.6)\qquad \bar{W} = r^{-1}W, \quad \bar{W}^\nu = r^{-1}W^\nu, \quad \bar{W}^* = r^{-1}W^* \quad \text{(etc.)}.$$

Our essential line of argument will be to show that, in terms of the generation of the sequence {(V^ν, W̄^ν)}_{ν=0}^∞, our procedure can be construed as a certain instance of the proximal point algorithm in Rockafellar [9]. It will be the form of that algorithm obtained from the maximal monotone operator associated with the projected saddle function

$$(5.8)\qquad l_r(V, \bar{W}) = \inf\{r^{-1}F(X) + \langle X, \bar{W} \rangle \mid X \in C,\ JX = V\} \quad \text{for } V \in \mathcal{N},\ \bar{W} \in \mathcal{M}.$$

In this we are claiming that

(5.9) (V^{ν+1}, W̄^{ν+1}) is the (unique) saddle point of

$$l_r^\nu(V, \bar{W}) = l_r(V, \bar{W}) + \tfrac{1}{2}\|V - V^\nu\|^2 - \tfrac{1}{2}\|\bar{W} - \bar{W}^\nu\|^2$$

subject to minimizing in V ∈ 𝒩 and maximizing in W̄ ∈ ℳ.

The nature of the function l_r must be clarified first. We can write the formula for l_r in (5.8) alternatively as

$$l_r(V, \bar{W}) = \inf_{U \in \mathcal{M}}\{\varphi_r(V, U) + \langle U, \bar{W} \rangle\},$$

where

$$\varphi_r(V, U) = r^{-1}F(X) + \delta_C(X) \quad \text{for the unique } X \in \mathcal{E} \text{ having } JX = V,\ KX = U.$$

This just amounts to a "change of coordinates" corresponding to the orthogonal decomposition ℰ = 𝒩 × ℳ, which expresses the closed proper convex function r^{-1}F + δ_C (where δ_C is the indicator of
