WORKING PAPER
ASYMPTOTIC BEHAVIOR OF STATISTICAL ESTIMATORS
AND
OPTIMAL SOLUTIONS FOR STOCHASTIC OPTIMIZATION PROBLEMS
Jitka Dupačová
Roger Wets
International Institute for Applied Systems Analysis
NOT FOR QUOTATION WITHOUT THE PERMISSION OF THE AUTHORS
August 1986 WP-86-41
Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute or of its National Member Organizations.
INTERNATIONAL INSTITUTE FOR APPLIED SYSTEMS ANALYSIS 2361 Laxenburg, Austria
FOREWORD
This paper presents the first results on a new statistical approach to the problem of incomplete information in stochastic programming. The tools of nondifferentiable optimization used here help to prove the consistency of (approximate) optimal solutions based on increasing information about the true probability distribution, without unnatural smoothness assumptions. They also make it possible to take the presence of constraints fully into account.
Alexander B. Kurzhanski
Chairman
System and Decision Sciences Program
CONTENTS
1 Introduction
2 Examples
3 Consistency: Convergence of Optimal Solutions
References
1. INTRODUCTION
The calculation of estimates for various statistical parameters has been one of the main concerns of Statistics since its inception, and a number of elegant formulas have been developed to obtain such estimates in a number of particular instances. Typically such cases correspond to a situation when the random phenomenon is univariate in nature, and there are no "active" restrictions on the estimate of the unknown statistical parameter. However, that is not the case in general: many estimation problems are multivariate in nature and there are restrictions on the choice of the parameters. These could be simple nonnegativity constraints, but also much more complex restrictions involving certain mathematical relations between the parameters that need to be estimated. Classical techniques, which can still be used to handle, for example, least squares estimation with linear equality constraints on the parameters, break down if there are inequality constraints or a nondifferentiable criterion function. In such cases one cannot expect that a simple formula will yield the relationship between the samples and the best estimates. Usually, the latter must be found by solving an optimization problem.
Naturally the solution of such a problem depends on the collected samples, and one is confronted with the questions of the consistency and of the asymptotic behavior of such estimators. This is the subject of this article.
To overcome the technical problems caused by the intrinsic lack of smoothness, we rely on the guidelines and the tools provided by the theory of nondifferentiable optimization. In fact, the problem of proving consistency of the estimators, and the study of their asymptotic behavior, is closely related to that of obtaining confidence intervals for the solution of stochastic optimization problems when there is only partial information about the probability distribution of the random coefficients of the problem. Indeed, it was the need to deal with this class of problems that originally motivated this study. We shall see in Section 2 that stochastic optimization problems as well as the problem of finding statistical estimators are two instances of the following general class of problems:
find $x \in R^n$ that minimizes $E\{f(x, \xi)\}$,

where $f : R^n \times \Xi \to R \cup \{+\infty\}$ is an extended real-valued function and $\xi$ is a random variable with values in $\Xi$; for more details see Section 3. It is implicit in this formulation that the expectation is calculated with respect to the true probability distribution $P$ of the random variable $\xi$, whereas in fact all that is known is a certain approximate $P^\nu$. Our objective is to study the behavior of the optimal solution (estimate) $x^\nu$, obtained by solving the optimization problem using $P^\nu$ instead of $P$ to calculate the expectation, when $\{P^\nu, \nu = 1, \ldots\}$ is a sequence of probability measures converging to $P$. In Section 3 we give conditions under which consistency can be proved. Constraints on the choice of the optimal $x$ are incorporated in the formulation of the problem by allowing the function $f$ to take on the value $+\infty$. The results are obtained without explicit reference to the form of these constraints.

There is of course a substantial statistical literature dealing with the questions broached here, beginning with the seminal article of Wald (1949) and the work of Huber (1967) on maximum likelihood estimators. Of more direct parentage, at least as far as formulation and use of mathematical techniques are concerned, is the work on stochastic programming problems with partial information. Wets (1979) reports some preliminary results; further developments were presented at the 1980 meeting on stochastic optimization at IIASA (Laxenburg, Austria) and recorded in Solis and Wets (1981), see also Dupačová (1983a, b) and (1984b) for a special case. In a projected paper we shall deal with estimates of the convergence rates, as well as with the convergence of the associated Lagrangian function.
2. EXAMPLES
The results apply equally well to estimation or stochastic optimization problems with or without constraints, with differentiable or nondifferentiable criterion functions. However, the examples that we detail here are those that fall outside the classical mold, viz. that of unconstrained smooth problems.
Restrictions on the statistical estimates or the optimal decisions of stochastic optimization problems follow from technical and modeling considerations as well as natural statistical assumptions. The least squares estimation problem with linear equality constraints, a basic statistical method, see e.g. Rao (1965), can be solved by the usual tools of differential calculus. Inequality constraints, however, introduce a lack of smoothness that does not allow us to fall back on the old stand-bys. In Judge and Takayama (1966) and Liew (1976), the theory of quadratic programming is used to exhibit and discuss the statistical properties of least squares estimates subject to inequality constraints for the case of large and small samples.
In connection with maximum likelihood estimation, the case of parameter restrictions in the form of smooth nonlinear equations was studied by Aitchison and Silvey (1958), including results on asymptotic normality of the estimates. The Lagrangian approach was further developed by Silvey (1959) and extended to the case of a multisample situation by Sen (1979), including an analysis of the situation when the true parameter value does not fulfill the constraints (the nonnull case).
Typically one must take nonnegativity restrictions into account in the estimation of variances and variance components. Unconstrained maximum likelihood estimation in factor analysis and in more complicated structural analysis models, see e.g. Lee (1980), may lead to negative estimates of the variances. Replacing these inappropriate estimates by zeros gives estimates which are no longer optimal with respect to the chosen fitting function. Similarly, there is a problem of getting negative estimates of variance components, see Example 2.3. In statistical practice, these nonpositive variance estimates are usually fixed at zero and the data are eventually reanalyzed. In general, such an approach may lead to plausible results when only one restricted parameter is estimated, and it is mostly inappropriate in multidimensional situations; see e.g. the evidence given by Lee (1980).
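A small numerical sketch may help to see why fixing an inadmissible estimate at zero differs from re-optimizing under the constraint. The quadratic criterion `q` and its coefficients below are hypothetical, chosen only so that the unconstrained minimizer has a negative second component; they are not taken from the studies cited above.

```python
# Hypothetical strictly convex fitting function of two parameters (b1, b2).
# The restriction of interest is b2 >= 0 (think of b2 as a variance).
def q(b1, b2):
    return (b1 - 1.0) ** 2 + (b2 + 0.5) ** 2 + b1 * b2

# Unconstrained minimizer: the stationarity conditions are
#   2*b1 + b2 = 2   and   b1 + 2*b2 = -1,
# solved here by Cramer's rule.
det = 2.0 * 2.0 - 1.0 * 1.0
b_unc = ((2.0 * 2.0 - 1.0 * (-1.0)) / det,
         (2.0 * (-1.0) - 1.0 * 2.0) / det)   # (5/3, -4/3): b2 is negative

# Ad hoc fix: keep b1 as estimated and replace the negative b2 by zero.
b_trunc = (b_unc[0], max(0.0, b_unc[1]))

# Constrained minimizer: the constraint b2 >= 0 is active, so minimize
# q(b1, 0) = (b1 - 1)^2 + 0.25 over b1, which gives b1 = 1.
b_con = (1.0, 0.0)

print("truncated  :", b_trunc, q(*b_trunc))
print("constrained:", b_con, q(*b_con))
```

In this sketch the truncated point has a strictly larger criterion value than the constrained minimizer, because the remaining free parameter is not re-optimized once the restricted one is moved to the boundary.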
The possibility of using mathematical programming techniques to get constrained estimates was explored by Arthanari and Dodge (1981). As mentioned in the introduction, we use mathematical programming theory not only to get inequality constrained estimates but to get asymptotic results for a large class of decision and estimation problems which contains, inter alia, restricted M-estimates and stochastic programming with incomplete information. In comparison with the results of ad hoc approaches, valid mostly for one-dimensional restricted estimation, our method can be used for high-dimensional cases and without unnatural smoothness assumptions, in spite of the fact that the violation of differentiability assumptions cannot be easily bypassed by the use of directional derivatives (in contrast to the one-dimensional case).
EXAMPLE 2.1 Inequality constrained least squares estimation of regression coefficients. Assume that the dependent variable $y$ can be explained or predicted on the basis of information provided by independent variables $x_1, \ldots, x_p$. In the simplest case of a linear model, the observations $y_j$ on $y$ are supposed to be generated according to

$$ y_j = \sum_{i=1}^{p} x_{ij} \beta_i + \varepsilon_j, \quad j = 1, \ldots, \nu, $$

where $\beta_1, \ldots, \beta_p$ are unknown parameters to be estimated, $\varepsilon_j$, $j = 1, \ldots, \nu$, denote the observed values of the residual and $X = (x_{ij})$ is a $(p, \nu)$ matrix whose rows consist of the observed values of the independent variables.

In the practical implementation of this model, there may be in addition some a priori constraints imposed on the parameters, such as nonnegativity constraints on the elasticities, see Liew (1976), or a required preassigned positive difference between input and output tonnage due to the melting loss, Arthanari and Dodge (1981). Assume that these constraints are of the form

$$ A\beta \le c, $$

where $A(m, p)$, $c(m, 1)$ are given matrices. The use of the least squares method leads to the optimization problem:
$$ \text{minimize} \quad \sum_{j=1}^{\nu} \Big[ y_j - \sum_{i=1}^{p} x_{ij} \beta_i \Big]^2 \quad \text{subject to} \quad \sum_{i=1}^{p} a_{ki} \beta_i \le c_k, \quad k = 1, \ldots, m, \tag{2.1} $$

which can be solved by quadratic programming techniques.
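In our general framework, problem (2.1) corresponds to the case of the objective function

$$ f(\beta, \xi) = \begin{cases} \Big[ y - \sum_{i=1}^{p} x_i \beta_i \Big]^2 & \text{if } \beta \in S := \{\beta \mid A\beta \le c\}, \\ +\infty & \text{otherwise}, \end{cases} \tag{2.2} $$

where $\xi = (y, x_1, \ldots, x_p)$, with the $P^\nu$ the empirical distributions.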
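The way the constraints are absorbed into the integrand can be sketched in a few lines. The function names `f` and `empirical_expectation` and the three observations below are illustrative assumptions, not part of the original text; the sketch only shows that averaging the extended real-valued integrand over the sample reproduces the criterion of (2.1) up to the factor $1/\nu$ on the feasible set and yields $+\infty$ elsewhere.

```python
import math

def f(beta, xi, A, c):
    """Extended real-valued integrand for constrained least squares.

    xi = (y, x) is a single observation; the value is +inf whenever
    the constraint A beta <= c is violated.
    """
    feasible = all(
        sum(a_ki * b_i for a_ki, b_i in zip(row, beta)) <= c_k
        for row, c_k in zip(A, c)
    )
    if not feasible:
        return math.inf
    y, x = xi
    residual = y - sum(x_i * b_i for x_i, b_i in zip(x, beta))
    return residual ** 2

def empirical_expectation(beta, sample, A, c):
    """Expectation of f(beta, .) under the empirical measure of the sample."""
    return sum(f(beta, xi, A, c) for xi in sample) / len(sample)

# Hypothetical data: nu = 3 observations, p = 2 regressors, and the
# nonnegativity constraints -beta_1 <= 0, -beta_2 <= 0.
sample = [(1.0, (1.0, 0.0)), (2.0, (0.0, 1.0)), (3.0, (1.0, 1.0))]
A = [(-1.0, 0.0), (0.0, -1.0)]
c = (0.0, 0.0)

value_feasible = empirical_expectation((1.0, 2.0), sample, A, c)
value_infeasible = empirical_expectation((-1.0, 2.0), sample, A, c)
print(value_feasible, value_infeasible)
```

Since the factor $1/\nu$ does not affect the minimizers, minimizing this empirical expectation over all of $R^p$ is equivalent to solving (2.1).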
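Alternatively, minimizing the sum of absolute errors corresponds to the optimization problem

$$ \text{minimize} \quad \sum_{j=1}^{\nu} \Big| y_j - \sum_{i=1}^{p} x_{ij} \beta_i \Big| \quad \text{subject to} \quad \sum_{i=1}^{p} a_{ki} \beta_i \le c_k, \quad 1 \le k \le m, \tag{2.3} $$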
which can be solved by means of the simplex method for linear programming, see e.g. Arthanari and Dodge (1981). The formulation of (2.3) is again based on the empirical distribution function $P^\nu$; the objective function is:

$$ f(\beta, \xi) = \begin{cases} \Big| y - \sum_{i=1}^{p} x_i \beta_i \Big| & \text{if } \beta \in S = \{\beta \mid A\beta \le c\}, \\ +\infty & \text{otherwise}. \end{cases} \tag{2.4} $$

Note that this function $f$ is not differentiable on $S$.
Finally, when robustizing the least squares approach, instead of minimizing a sum of squares, a sum of less rapidly increasing functions of the residuals is minimized, see e.g. Huber (1973):

$$ \text{minimize} \quad \sum_{j=1}^{\nu} \rho \Big( y_j - \sum_{i=1}^{p} x_{ij} \beta_i \Big) \quad \text{subject to} \quad \sum_{i=1}^{p} a_{ki} \beta_i \le c_k, \quad 1 \le k \le m. \tag{2.5} $$
The function $\rho$ is assumed to be convex, non-monotone and to possess bounded derivatives of sufficiently high order, e.g.

$$ \rho(u) = \begin{cases} \tfrac{1}{2} u^2 & \text{for } |u| < c, \\ c|u| - \tfrac{1}{2} c^2 & \text{for } |u| \ge c. \end{cases} $$
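The example of $\rho$ given above is easy to examine numerically. The sketch below is only an illustration: the threshold `c = 1.5` is an arbitrary assumption, and the names `rho` and `rho_prime` are introduced here for convenience.

```python
def rho(u, c=1.5):
    """Quadratic for |u| < c, linear growth with slope c beyond that."""
    if abs(u) < c:
        return 0.5 * u * u
    return c * abs(u) - 0.5 * c * c

def rho_prime(u, c=1.5):
    """First derivative of rho: the residual clipped to [-c, c]."""
    return max(-c, min(c, u))

# The two branches agree at |u| = c, so rho is continuous there,
# and the derivative never exceeds c in absolute value.
for u in (-4.0, -1.5, 0.0, 1.5, 4.0):
    print(u, rho(u), rho_prime(u))
```

The bounded derivative is what limits the influence of a single large residual on the criterion (2.5), in contrast to the square function whose derivative grows without bound.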
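This also fits the general framework; the objective function is:

$$ f(\beta, \xi) = \begin{cases} \rho \Big( y - \sum_{i=1}^{p} x_i \beta_i \Big) & \text{if } \beta \in S = \{\beta \mid A\beta \le c\}, \\ +\infty & \text{otherwise}, \end{cases} \tag{2.6} $$

and the empirical distribution function $P^\nu$ is again used to obtain (2.5).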
EXAMPLE 2.2 Heywood cases in factor analysis. The model for confirmatory factor analysis (Jöreskog (1969)) is

$$ x = \Lambda f + e, $$

where $x(n, 1)$ is a column vector containing the observed variables, $f$ is a column vector containing the $k$ common factors, $e(n, 1)$ is a column vector containing the individual parts of the observed components and $\Lambda(n, k)$ is the matrix of factor loadings. It is assumed that $f$ and $e$ are normally distributed with mean zero, $\operatorname{var} f = \Phi$ and $\operatorname{var} e = \Psi$, which is diagonal. Consequently, $x$ is normally distributed with mean zero and with the variance matrix

$$ \Sigma = \Lambda \Phi \Lambda^T + \Psi. \tag{2.7} $$

The parameter vector consists of the free elements of $\Lambda$, $\Psi$ and $\Phi$, and it should be estimated using the sample variance matrix $S$ of the observables $x$. This is done by minimizing a suitable fitting function, such as
$$ f_1(\Sigma, S) = \log |\Sigma| + \operatorname{tr}(S \Sigma^{-1}) - \log |S| - n \tag{2.8} $$

(the maximum likelihood method), or a second fitting function $f_2(\Sigma, S)$, given by (2.9) [the display is missing from the source], where $V$ is a matrix of weights (the weighted least squares method). Evidently, both (2.8) and (2.9), with (2.7) substituted for $\Sigma$, are objective functions of nontrivial unconstrained optimization problems, which can be solved by different methods such as the method of Davidon-Fletcher-Powell (see Fletcher and Powell (1963)) or by the Gauss-Newton algorithm. In practice, however, about one third of the data yield one or more nonpositive estimates of the diagonal elements of the matrix $\Psi$, which are individual variances. These solutions are called Heywood cases, and to deal with them, (2.8) or (2.9) should be minimized under the conditions $\psi_{ii} \ge 0$, $i = 1, \ldots, n$. Thus the appropriate formulation defines $f$ as follows:

$$ f(\Lambda, \Phi, \Psi, S) = \begin{cases} f_1(\Sigma, S) & \text{if } \psi_{ii} \ge 0, \; i = 1, \ldots, n, \\ +\infty & \text{otherwise}, \end{cases} $$

and similarly for $f_2$.
EXAMPLE 2.3 Negative estimates of variance components. Consider a general linear model with random effects

$$ y = Z\gamma + \sum_{i=1}^{p} X_i \beta_i + \varepsilon, \tag{2.10} $$

where $y(\nu, 1)$ is the vector of observations on the variable $y$; $Z(\nu, r)$, $X_i(\nu, r_i)$, $i = 1, \ldots, p$, are given matrices; $\beta_i$, $i = 1, \ldots, p$, and $\varepsilon$ are mutually uncorrelated random vectors with $E\beta_i = 0$, $\operatorname{var} \beta_i = \sigma_i^2 I_{r_i}$, $i = 1, \ldots, p$, and $E\varepsilon = 0$, $\operatorname{var} \varepsilon = \sigma_0^2 I_\nu$; and $\gamma_1, \ldots, \gamma_r$, $\sigma_0^2, \ldots, \sigma_p^2$ are unknown parameters to be estimated.

One of the simplest examples is the following variance analysis model for the random effect one-way classification: Consider $k$ populations where the $j$-th measurement (observation) in the $i$-th population is given by

$$ y_{ij} = \mu + a_i + e_{ij}, \quad i = 1, \ldots, k, \; j = 1, \ldots, n. \tag{2.11} $$
In (2.11), $\mu$ is the fixed effect, $a_i$, $i = 1, \ldots, k$, is the random effect of the $i$-th population and $e_{ij}$ is the residual. The random variables $a_1, \ldots, a_k$ and $e_{11}, \ldots, e_{kn}$ are independent with distributions $N(0, \sigma_a^2)$ and $N(0, \sigma_e^2)$, respectively. The parameters $\mu$, $\sigma_a^2$, $\sigma_e^2$ are to be estimated. The traditional estimates of the variance components $\sigma_a^2$, $\sigma_e^2$ in model (2.11) are obtained by a simple procedure: one equates the mean squares

$$ \frac{1}{k(n-1)} \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2 \quad \text{and} \quad \frac{n}{k-1} \sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2, $$

where $\bar{y}_{i\cdot} = \frac{1}{n} \sum_{j=1}^{n} y_{ij}$, $i = 1, \ldots, k$, and $\bar{y}_{\cdot\cdot} = \frac{1}{nk} \sum_{i=1}^{k} \sum_{j=1}^{n} y_{ij}$, with their expectations $\sigma_e^2$ and $n\sigma_a^2 + \sigma_e^2$; this gives the estimates

$$ s_e^2 = \frac{1}{k(n-1)} \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2, \tag{2.12} $$

$$ s_a^2 = \frac{1}{n} \Big[ \frac{n}{k-1} \sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 - s_e^2 \Big]. \tag{2.13} $$

Whereas $s_e^2$ is evidently nonnegative, this need not be the case for $s_a^2$, so that the problem of a negative estimate of the variance component $\sigma_a^2$ comes to the fore.

The resulting estimates (2.12), (2.13) of the variance components in (2.11) also follow as a special result of the MIVQUE and MINQUE estimation developed for the general model (2.10): unbiased estimates of a linear parametric function
$\sum_{i=0}^{p} q_i \sigma_i^2$ are sought in the form $y^T A y$, where

$$ AZ = 0, \quad A(\nu, \nu) \text{ is a symmetric matrix}, \tag{2.14} $$

and which are optimal in some sense. The MIVQUE estimates correspond to a matrix $A$ that minimizes the variance of $y^T A y$ subject to the conditions (2.14), and the MINQUE estimates correspond to a matrix $A$ that minimizes

$$ \operatorname{tr} \Big( A \Big( I + \sum_{i=1}^{p} X_i X_i^T \Big) \Big)^2 $$

subject to conditions (2.14). In none of the mentioned approaches, however, are the natural nonnegativity constraints on the estimates of the variances $\sigma_i^2$, $i = 1, \ldots, p$, introduced explicitly.

Again, there are two possible explanations of negative estimates of variance components: the model may be incorrect, or statistical noise may have obscured the underlying situation. Among others, Herbach (1959) and Thompson (1962) studied variance analysis models with random effects by means of different variants of the maximum likelihood method under nonnegativity constraints. Correspondingly, in terms of the general model, we have for instance
$$ f(\sigma_a^2, \sigma_e^2, \mu, y) = \begin{cases} -(2\pi)^{-\frac{nk}{2}} (\sigma_e^2 + n\sigma_a^2)^{-\frac{k}{2}} (\sigma_e^2)^{-\frac{k(n-1)}{2}} \exp \Big\{ -\frac{1}{2\sigma_e^2} \Big[ \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \mu)^2 - \frac{\sigma_a^2}{\sigma_e^2 + n\sigma_a^2} \sum_{i=1}^{k} \Big( \sum_{j=1}^{n} (y_{ij} - \mu) \Big)^2 \Big] \Big\} & \text{if } \sigma_a^2 \ge 0, \; \sigma_e^2 > 0, \\ +\infty & \text{otherwise}. \end{cases} $$

Similarly, nonnegative MINQUE and MIVQUE estimates are of interest.

EXAMPLE 2.4 M-estimates. Let $\Theta$ be a given locally compact parameter set, $(\Xi, \mathcal{A}, P)$ a probability space and $f : \Theta \times \Xi \to R$ a given function. For a sample $\{\xi_1, \ldots, \xi_\nu\}$ from the considered distribution, any estimate $T^\nu = T^\nu(\xi_1, \ldots, \xi_\nu) \in \Theta$ defined by the condition

$$ T^\nu \in \operatorname{argmin}_{T \in \Theta} \sum_{j=1}^{\nu} f(T, \xi_j) \tag{2.15} $$

is called an M-estimate. In the pioneering paper by Huber (1967) (see also Huber (1981)), nonstandard sufficient conditions were given under which $\{T^\nu\}$ converges a.s. (or in probability) to a constant $\theta_0 \in \Theta$, and asymptotic normality of $\sqrt{\nu}\,(T^\nu - \theta_0)$ was proved under the assumption that $\Theta$ is an open set.

The problem (2.15) is evidently a special case of our general framework; the $P^\nu$ again correspond to the empirical distribution functions and we have an unconstrained criterion function. We shall aim to remove both of these assumptions to get results valid for a whole class of probability measures $P^\nu$ estimating $P$, which contains the empirical probability measure connected with the original definition (2.15) of M-estimates, and for constrained estimates.
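A constrained variant of (2.15) can be sketched for a scalar location parameter. Everything specific in the sketch is an assumption made for illustration: the integrand $f(T, \xi) = |\xi - T|$, the interval $\Theta = [\theta_{lo}, \theta_{hi}]$, the five sample values, and the use of ternary search, which is applicable here only because the criterion is convex in $T$.

```python
import math

def f(T, xi, theta_lo, theta_hi):
    """Integrand |xi - T|, set to +inf when T lies outside Theta."""
    if not (theta_lo <= T <= theta_hi):
        return math.inf
    return abs(xi - T)

def m_estimate(sample, theta_lo, theta_hi, iters=200):
    """Ternary search for a minimizer of T -> sum_j f(T, xi_j) over Theta.

    The search is valid because the criterion is convex on the interval.
    """
    def criterion(T):
        return sum(f(T, xi, theta_lo, theta_hi) for xi in sample)

    lo, hi = theta_lo, theta_hi
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if criterion(m1) <= criterion(m2):
            hi = m2   # a minimizer lies in [lo, m2]
        else:
            lo = m1   # a minimizer lies in [m1, hi]
    return 0.5 * (lo + hi)

sample = [2.0, -1.0, 0.5, 3.0, 10.0]          # hypothetical observations
T_wide = m_estimate(sample, -100.0, 100.0)    # constraint inactive
T_tight = m_estimate(sample, -5.0, 1.0)       # constraint active
print(T_wide, T_tight)
```

With the wide interval the estimate is the sample median; with the tight interval the minimizer sits on the boundary of $\Theta$, which is the situation excluded by the assumption that $\Theta$ is open.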
EXAMPLE 2.5 Stochastic optimization with incomplete information. Consider the following decision model of stochastic optimization:

Given a probability space $(\Xi, \mathcal{A}, P)$, a random element $\xi$ on $\Xi$, a measurable function $f : R^n \times \Xi \to R$ and a set $S \subset R^n$,

$$ \text{minimize} \quad E\{f(x, \xi)\} = \int_{\Xi} f(x, \xi) \, P(d\xi) \quad \text{on the set } S \subset R^n. \tag{2.16} $$

A wide variety of stochastic optimization problems, e.g., stochastic programs with recourse or probability constrained models (see e.g. Dempster (1980), Ermoliev et al. (1985), Kall (1976), Prékopa (1973), Wets (1983)), fit into this abstract framework.
In many practical situations, however, the probability measure $P$ need not be known completely. One way to deal with such a situation is to estimate the optimal solution $x^*$ of (2.16) by an optimal solution of the problem

$$ \text{minimize} \quad \int_{\Xi} f(x, \xi) \, P^\nu(d\xi) \quad \text{on the set } S \subset R^n, $$

where $P^\nu$ is a suitable estimate of $P$ based on the observed data. In this context, there are different possibilities to estimate or approximate $P$, and the use of the empirical distribution is only one of them. The case of $P$ belonging to a given parametric family of probability measures but with an unknown parameter vector was studied e.g. in Dupačová (1984a, b).
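As an illustration of replacing $P$ by an estimate $P^\nu$, the following sketch solves a small instance with the empirical distribution. The cost function (a single-period ordering problem with unit cost `c` and selling price `p`), the ten demand observations and the finite feasible sets are all hypothetical choices made for this example only.

```python
def cost(x, xi, c=1.0, p=4.0):
    """Hypothetical f(x, xi): order x units at cost c, sell min(x, xi) at price p."""
    return c * x - p * min(x, xi)

def empirical_objective(x, sample):
    """Integral of f(x, .) with respect to the empirical measure of the sample."""
    return sum(cost(x, xi) for xi in sample) / len(sample)

def solve_on_grid(sample, S):
    """Minimize the empirical objective over a finite feasible set S."""
    return min(S, key=lambda x: empirical_objective(x, sample))

demand_sample = [3, 5, 8, 4, 6, 7, 5, 9, 2, 6]          # hypothetical observed data
x_free = solve_on_grid(demand_sample, range(0, 11))    # S = {0, 1, ..., 10}
x_capped = solve_on_grid(demand_sample, range(0, 7))   # S = {0, 1, ..., 6}
print(x_free, x_capped)
```

The second call shows how a restriction on $S$ moves the estimated solution to the boundary of the feasible set; a different estimate $P^\nu$ (for instance a fitted parametric distribution) would simply replace the averaging step.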
For problem (2.16), large dimensionality of the decision vector $x$ is typical. This circumstance, together with nondifferentiability (or even noncontinuity) of $f$ and with the presence of constraints, raises qualitatively new problems.
3. CONSISTENCY: CONVERGENCE OF OPTIMAL SOLUTIONS
From a conceptual viewpoint or for theoretical purposes, it is convenient as well as expedient to study problems of statistical estimation as well as stochastic optimization problems with partial information in the following general framework.
Let $(\Xi, \mathcal{A}, P)$ be a probability space, with $\Xi$ (the support of $P$) a closed subset of a Polish space $X$, and $\mathcal{A}$ the Borel sigma-field relative to $\Xi$; we may think of $\Xi$ as the set of possible values of the random element $\xi$ defined on the probability space of events $(\Omega, \mathcal{A}', P')$.
If $P$ is known, the problem is to:

find $x^* \in R^n$ that minimizes $Ef(x)$, (3.1)

where

$$ Ef(x) := E\{f(x, \xi)\} = \int_{\Xi} f(x, \xi) \, P(d\xi) $$

and

$$ f : R^n \times \Xi \to R \cup \{+\infty\} $$

is a random lower semicontinuous function; we set

$$ Ef(x) = +\infty $$

whenever $\xi \mapsto f(x, \xi)$ is not bounded above by a summable (extended real-valued) function. We refer to

$$ \operatorname{dom} Ef := \{x \mid Ef(x) < \infty\} $$

as the effective domain of $Ef$. Points that do not belong to $\operatorname{dom} Ef$ cannot minimize $Ef$ and thus are effectively excluded from the optimization problem (3.1). Hence, the model makes specific provisions for the presence of constraints that may limit the choice of $x$. Note that by definition of the integral, we always have
$$ \operatorname{dom} Ef \subset \{x \mid f(x, \xi) < \infty \text{ a.s.}\}. $$

An extended real-valued function $h : R^n \to \overline{R} = [-\infty, \infty]$ is said to be proper if $h > -\infty$ and not identically $+\infty$; it is lower semicontinuous (l.sc.) at $x$ if for any sequence $\{x^k\}_{k=1}^{\infty}$ converging to $x$,

$$ \liminf_{k \to \infty} h(x^k) \ge h(x), $$

where the quantities involved could be $\infty$ or $-\infty$.
The extended real-valued function $f$ defined on $R^n \times \Xi$ is a random lower semicontinuous function if

for all $\xi \in \Xi$, $f(\cdot, \xi)$ is l.sc., (3.3i)

$f$ is $\mathcal{B}^n \otimes \mathcal{A}$-measurable, (3.3ii)

where $\mathcal{B}^n$ is the Borel sigma-field on $R^n$. This concept, under the name of "normal integrand", was introduced by Rockafellar (1976), as a generalization of Carathéodory integrands, to handle problems in the Calculus of Variations and Optimal Control Theory. When dealing with problems of that type, as well as stochastic optimization problems such as (3.1), the traditional tools of functional analysis are no longer quite appropriate. The classical geometrical approach that associates functions with their graph must be abandoned in favor of a new geometrical viewpoint that associates functions with their "epigraphs" (or hypographs); for more about the motivation and the underlying principles of the epigraphical approach consult Rockafellar and Wets (1984). The epigraph of a function $h : R^n \to \overline{R}$ is the set

$$ \operatorname{epi} h = \{(x, \alpha) \in R^n \times R \mid h(x) \le \alpha\}. $$
Rockafellar (1976) shows that $f : R^n \times \Xi \to \overline{R}$ is a random l.sc. function if and only if

the multifunction $\xi \mapsto \operatorname{epi} f(\cdot, \xi)$ is nonempty, closed-valued, (3.4i)

the multifunction $\xi \mapsto \operatorname{epi} f(\cdot, \xi)$ is measurable; (3.4ii)

recall that a multifunction $\xi \mapsto \Gamma(\xi) : \Xi \rightrightarrows R^{n+1}$ is measurable if for all closed sets $F \subset R^{n+1}$

$$ \Gamma^{-1}(F) := \{\xi \in \Xi \mid \Gamma(\xi) \cap F \ne \emptyset\} \in \mathcal{A}; $$

for further details about measurable multifunctions see Rockafellar (1976), Castaing and Valadier (1976), and the bibliography of Wagner (1977) supplemented by Ioffe (1978). We shall use repeatedly the following result due to Yankov, von Neumann, and Kuratowski and Ryll-Nardzewski.
PROPOSITION 3.1 Theorem of Measurable Selections. If $\Gamma : \Xi \rightrightarrows R^n$ is a closed-valued measurable multifunction, then there exists at least one measurable selector, i.e. a measurable function $x : \operatorname{dom} \Gamma \to R^n$ such that for all $\xi \in \operatorname{dom} \Gamma$, $x(\xi) \in \Gamma(\xi)$, where

$$ \operatorname{dom} \Gamma := \{\xi \in \Xi \mid \Gamma(\xi) \ne \emptyset\} = \Gamma^{-1}(R^n) \in \mathcal{A}. $$
For a proof see Rockafellar (1976), for example. As immediate consequences of the definition (3.3) of random l.sc. functions, the equivalence with the conditions (3.4) and the preceding proposition, we have:

PROPOSITION 3.2 Let $f : R^n \times \Xi \to \overline{R}$ be a random l.sc. function. Then for any $\mathcal{A}$-measurable function $x : \Xi \to R^n$, the function

$$ \xi \mapsto f(x(\xi), \xi) $$

is $\mathcal{A}$-measurable. Moreover, the infimal function

$$ \xi \mapsto \inf f(\cdot, \xi) := \inf_{x \in R^n} f(x, \xi) $$

is $\mathcal{A}$-measurable, and the set of optimal solutions

$$ \xi \mapsto \operatorname{argmin} f(\cdot, \xi) := \{x \mid f(x, \xi) = \inf f(\cdot, \xi)\} $$

is a closed-valued measurable multifunction from $\Xi$ into $R^n$, and this implies that there exists a measurable function

$$ \xi \mapsto x^*(\xi) : \operatorname{dom}(\operatorname{argmin} f(\cdot, \xi)) \to R^n $$

such that $x^*(\xi)$ minimizes $f(\cdot, \xi)$ whenever $\operatorname{argmin} f(\cdot, \xi) \ne \emptyset$.

For a succinct proof, see Section 3 of Rockafellar and Wets (1984).
If instead of $P$ we only have limited information available about $P$, e.g. some knowledge about the shape of the distribution and a finite sample of values of $\xi$ or of a function of $\xi$, then to estimate $x^*$ we usually have to rely on the solution of an optimization problem that "approximates" (3.1), viz.

find $x^\nu \in R^n$ that minimizes $E^\nu f(x)$, (3.5)

where

$$ E^\nu f(x) := E^\nu\{f(x, \xi)\} = \int_{\Xi} f(x, \xi) \, P^\nu(d\xi). $$

The measure $P^\nu$ is not necessarily the empirical measure, but more generally the "best" (in terms of a given criterion) approximate to $P$ on the basis of the information available. As more information is collected, we could refine the approximation to $P$ and hopefully find a better estimate of $x^*$. To model this process, we rely on the following set-up: let $(Z, \mathcal{F}, \mu)$ be a sample space with $(\mathcal{F}^\nu)_{\nu=1}^{\infty}$ an increasing sequence of sigma-fields contained in $\mathcal{F}$. A sample $\zeta$, e.g. $\zeta = \{\xi^1, \xi^2, \ldots\}$ obtained by independent sampling of the values of $\xi$, leads us to a sequence $\{P^\nu(\cdot, \zeta), \nu = 1, \ldots\}$ of probability measures defined on $(\Xi, \mathcal{A})$. Since only the information collected up to stage $\nu$ can be used in the choice of $P^\nu$, we must also require that for all $A \in \mathcal{A}$

$$ \zeta \mapsto P^\nu(A, \zeta) \text{ is } \mathcal{F}^\nu\text{-measurable}. $$

Since $P^\nu$ depends on $\zeta$, so does the approximate problem (3.5), in particular its solution $x^\nu$. A sequence of estimators

$$ \{x^\nu : Z \to R^n, \; \nu = 1, \ldots\} $$

is (strongly) consistent if $\mu$-almost surely they converge to $x^*$; this, of course, implies weak consistency (convergence in probability).

The following results extend the classical Consistency Theorem of Wald (1949) and the extensions by Huber (1967) to the more general setting laid out above. Consistency is obtained by relying on assumptions that are weaker than those of Huber (1967), even in the unconstrained case. To do so, we rely on the theory of epi-convergence in conjunction with the theory of random sets (measurable multifunctions) and random l.sc. functions.
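The notion of consistency in the presence of a constraint can be illustrated by simulation. The sketch below rests on assumptions chosen for illustration: the integrand is $f(x, \xi) = (x - \xi)^2$ for $x \ge$ `lower` and $+\infty$ otherwise, $P$ is a normal distribution with mean 0.3, and $P^\nu$ is the empirical measure of the first $\nu$ draws. Under these assumptions the minimizer of the approximate problem is the sample mean projected onto the feasible half-line, and $x^* = \max(\text{lower}, 0.3)$.

```python
import random

def x_nu(sample, lower):
    """Minimizer of the empirical mean of (x - xi)^2 subject to x >= lower.

    The empirical criterion equals (x - mean)^2 plus a constant, so the
    constrained minimizer is the projection of the mean onto [lower, inf).
    """
    mean = sum(sample) / len(sample)
    return max(lower, mean)

rng = random.Random(0)   # fixed seed, for reproducibility only
true_mean = 0.3          # hypothetical mean of the true distribution P
xi = [rng.gauss(true_mean, 1.0) for _ in range(20000)]

for nu in (10, 100, 1000, 20000):
    inactive = x_nu(xi[:nu], lower=0.0)   # x* = 0.3, constraint inactive in the limit
    active = x_nu(xi[:nu], lower=1.0)     # x* = 1.0, constraint active in the limit
    print(nu, round(inactive, 4), round(active, 4))
```

In the second case the estimates reach the boundary point and stay there, so their limiting distribution cannot be normal; this is one reason why the constrained case calls for tools other than the classical differentiable analysis.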
A sequence of functions $\{g^\nu : R^n \to \overline{R}, \; \nu = 1, \ldots\}$ is said to epi-converge to $g : R^n \to \overline{R}$ if for all $x$ in $R^n$, we have

$$ \liminf_{\nu \to \infty} g^\nu(x^\nu) \ge g(x) \quad \text{for all } \{x^\nu\}_{\nu=1}^{\infty} \text{ converging to } x, \tag{3.7} $$

and

$$ \text{for some } \{x^\nu\}_{\nu=1}^{\infty} \text{ converging to } x, \quad \limsup_{\nu \to \infty} g^\nu(x^\nu) \le g(x). \tag{3.8} $$

Note that any one of these conditions implies that $g$ is lower semicontinuous. We then say that $g$ is the epi-limit of the $g^\nu$, and write $g = \operatorname{epi-lim}_{\nu \to \infty} g^\nu$. We refer to this type of convergence as epi-convergence, since it is equivalent to the set-convergence of the epigraphs. For more about epi-convergence and its properties, consult Attouch (1984). Our interest in epi-convergence stems from the fact that from a variational viewpoint it is the weakest type of convergence that possesses the following properties:
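PROPOSITION 3.3 [Attouch and Wets (1981), Salinetti and Wets (1986)]. Suppose $\{g;\ g^\nu : R^n \to \overline{R},\ \nu = 1, \ldots\}$ is a collection of functions such that $g = \text{epi-lim}_{\nu \to \infty}\, g^\nu$. Then

$$\limsup_{\nu \to \infty}\, (\inf g^\nu) \le \inf g, \tag{3.9}$$

and, if $x^k \in \operatorname{argmin} g^{\nu_k}$ for some subsequence $\{\nu_k,\ k = 1, \ldots\}$ and $x = \lim_{k \to \infty} x^k$, it follows that

$$x \in \operatorname{argmin} g, \quad \text{and} \quad \lim_{k \to \infty}\, (\inf g^{\nu_k}) = \inf g;$$

so in particular if there exists a bounded set $D \subset R^n$ such that for some subsequence $\{\nu_k,\ k = 1, \ldots\}$,

$$\operatorname{argmin} g^{\nu_k} \cap D \ne \emptyset,$$

then the minimum of $g$ is attained at some point in the closure of $D$.

Moreover, if $\operatorname{argmin} g \ne \emptyset$, then $\lim_{\nu \to \infty} (\inf g^\nu) = \inf g$ if and only if $x \in \operatorname{argmin} g$ implies the existence of sequences $\{\varepsilon_\nu \ge 0,\ \nu = 1, \ldots\}$ and $\{x^\nu \in R^n,\ \nu = 1, \ldots\}$ with

$$\lim_{\nu \to \infty} \varepsilon_\nu = 0, \quad \text{and} \quad \lim_{\nu \to \infty} x^\nu = x,$$

such that for all $\nu = 1, \ldots$

$$x^\nu \in \varepsilon_\nu\text{-}\operatorname{argmin} g^\nu := \{x \mid g^\nu(x) \le \varepsilon_\nu + \inf g^\nu\}.$$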
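The difference between epi-convergence and pointwise convergence can be seen on a small numerical sketch. The family below is a hypothetical example, not taken from the paper: $g^\nu$ has a dip of depth 1 centred at $1/\nu$, so its pointwise limit is identically 0, while its epi-limit equals $-1$ at the origin and 0 elsewhere. The infima and minimizers of the $g^\nu$ track the epi-limit, as Proposition 3.3 predicts, and not the pointwise limit.

```python
def g(nu: int, x: float) -> float:
    """Hypothetical g^nu: a dip of depth 1 centred at 1/nu, zero outside (0, 2/nu)."""
    return min(0.0, nu * abs(x - 1.0 / nu) - 1.0)


def epi_limit(x: float) -> float:
    """Epi-limit of the family above: -1 at the origin, 0 elsewhere."""
    return -1.0 if x == 0.0 else 0.0


def pointwise_limit(x: float) -> float:
    """Pointwise limit of the family above: identically 0 (note g(nu, 0) = 0)."""
    return 0.0


# Powers of two keep 1/nu exactly representable in floating point.
for nu in (2, 16, 128, 1024):
    minimizer = 1.0 / nu          # argmin g^nu, which converges to 0
    print(nu, minimizer, g(nu, minimizer), g(nu, 0.0))

# inf g^nu = -1 for every nu, which matches inf of the epi-limit (-1),
# whereas the pointwise limit has infimum 0.
```

The design choice here is to make the dip move toward the origin faster than any fixed point can notice it, which is exactly the situation where pointwise limits lose information about minimizers.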
The next theorem, which proves the $\mu$-almost sure epi-convergence of expectation functionals, is built upon approximation results for stochastic optimization problems, first derived in the case $f(\cdot, \xi)$ convex (Theorem 3.3, Wets (1984)), and later for the locally Lipschitz case (Theorem 2.8, Birge and Wets (1986)). We work with the following assumptions.

ASSUMPTION 3.4 "Continuities" of $f$. The function

$$f : R^n \times \Xi \to R \cup \{\infty\}$$

with

$$\operatorname{dom} f := \{(x, \xi) \mid f(x, \xi) < \infty\} = S \times \Xi, \quad S \subset R^n \text{ closed and nonempty},$$

is such that for all $x \in S$,

$$\xi \mapsto f(x, \xi) \ \text{ is continuous on } \Xi,$$

and for all $\xi \in \Xi$,

$$x \mapsto f(x, \xi) \ \text{ is lower semicontinuous on } R^n$$

and locally lower Lipschitz on $S$, in the following sense: to any $x$ in $S$, there corresponds a neighborhood $V$ of $x$ and a bounded continuous function $\beta : \Xi \to R$ such that for all $x' \in V \cap S$ and $\xi \in \Xi$,

$$f(x, \xi) - f(x', \xi) \le \beta(\xi)\, \|x - x'\|. \tag{3.10}$$

ASSUMPTION 3.5 Convergence in distribution. Given the sample space $(Z, \mathcal{F}, \mu)$ and an increasing sequence of sigma-fields $(\mathcal{F}^\nu)_{\nu=1}^\infty$ contained in $\mathcal{F}$, let

$$P^\nu : \mathcal{A} \times Z \to [0, 1], \quad \nu = 1, \ldots$$

be such that for all $\zeta \in Z$

$$P^\nu(\cdot, \zeta) \ \text{ is a probability measure on } (\Xi, \mathcal{A}),$$

and for all $A \in \mathcal{A}$

$$\zeta \mapsto P^\nu(A, \zeta) \ \text{ is } \mathcal{F}^\nu\text{-measurable}.$$

For $\mu$-almost all $\zeta$ in $Z$, the sequence $\{P^\nu(\cdot, \zeta),\ \nu = 1, \ldots\}$ converges in distribution to $P$, and with $P =: P^0(\cdot, \zeta)$, for all $x \in S$, the sequence $\{P^\nu(\cdot, \zeta)\}_{\nu=0}^\infty$ is $f(x, \cdot)$-tight (asymptotic negligibility), i.e. to every $x \in S$ and $\varepsilon > 0$ there corresponds a compact set $K_\varepsilon \subset \Xi$ such that for $\nu = 0, 1, \ldots$

$$\int_{\Xi \setminus K_\varepsilon} |f(x, \xi)|\, P^\nu(d\xi, \zeta) < \varepsilon, \tag{3.11}$$

and

$$\int_\Xi \inf_{x \in R^n} f(x, \xi)\, P^\nu(d\xi, \zeta) > -\infty. \tag{3.12}$$
The assumption that

$$\xi \mapsto \operatorname{dom} f(\cdot, \xi) := \{x \mid f(x, \xi) < \infty\} = S$$

is constant, which is satisfied by all the examples in Section 2, may appear more restrictive than it actually is. Indeed, it is easy to see that

$$\operatorname{dom} Ef = \bigcap_{\xi \in \Xi} \operatorname{dom} f(\cdot, \xi),$$

if $\Xi$ is the support of the measure $P$ and for all $x \in \bigcap_{\xi \in \Xi} \operatorname{dom} f(\cdot, \xi)$, the function $f(x, \cdot)$ is bounded above by a summable function. Then, with $S = \bigcap_{\xi \in \Xi} \operatorname{dom} f(\cdot, \xi)$ and

$$f^+(x, \xi) = \begin{cases} f(x, \xi) & \text{if } x \in S, \\ +\infty & \text{otherwise}, \end{cases}$$

we may as well work with $f^+$ instead of $f$, since

$$Ef = Ef^+,$$

and now $\xi \mapsto \operatorname{dom} f^+(\cdot, \xi) = S$ is constant.

Assumption 3.4 implies that $f$ is a random lower semicontinuous function (normal integrand). Indeed, for all $\xi \in \Xi$, $f(\cdot, \xi)$ is proper and lower semicontinuous (3.3.i) and $(x, \xi) \mapsto f(x, \xi)$ is $\mathcal{B}^n \otimes \mathcal{A}$-measurable (3.3.ii), since for all $\alpha \in R$,

$$\operatorname{lev}_\alpha f := \{(x, \xi) \mid f(x, \xi) \le \alpha\} \ \text{ is closed}.$$

To see this, suppose $\{(x^k, \xi^k)\}_{k=1}^\infty \subset \operatorname{lev}_\alpha f$ is a sequence converging to $(x, \xi)$; then from Assumption 3.4 we have that for $k$ sufficiently large, and all $\xi' \in \Xi$,

$$f(x, \xi') \le f(x^k, \xi') + \beta(\xi')\, \|x - x^k\|,$$

in particular

$$f(x, \xi^k) \le f(x^k, \xi^k) + \beta(\xi^k)\, \|x - x^k\| \le \alpha + B\, \|x - x^k\|,$$

where $B = \sup_{\xi} \beta(\xi)$ is finite, since $\beta(\cdot)$ is bounded. Now $\xi \mapsto f(x, \xi)$ is continuous on $\Xi$, thus taking limits as $k$ goes to $\infty$, we obtain

$$f(x, \xi) \le \alpha + B \lim_{k \to \infty} \|x - x^k\| = \alpha,$$

i.e. $(x, \xi) \in \operatorname{lev}_\alpha f$. Since $f$ is a random l.sc. function, it follows from Proposition 3.2 that

$$\xi \mapsto \gamma(\xi) := \inf_{x \in R^n} f(x, \xi)$$

is measurable. Thus condition (3.12) does not sneak in another measurability condition; it requires simply that the measurable function $\gamma$ be quasi-integrable.
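The locally lower Lipschitz requirement of Assumption 3.4 can be checked numerically on a simple case. The sketch below is an illustration only: the integrand $f(x, \xi) = |x - \xi|$ on $S = [0, 1]$ (with value $+\infty$ off $S$) and the constant bound $\beta \equiv 1$ are assumptions made for this example, not objects from the paper. The check samples points and counts violations of $f(x, \xi) - \beta(\xi)\|x - x'\| \le f(x', \xi)$.

```python
import math
import random

S_LOW, S_HIGH = 0.0, 1.0  # hypothetical closed feasible set S = [0, 1]


def f(x: float, xi: float) -> float:
    """Hypothetical integrand: absolute deviation on S, +infinity off S."""
    return abs(x - xi) if S_LOW <= x <= S_HIGH else math.inf


def beta(xi: float) -> float:
    """Bounded continuous Lipschitz bound; the constant 1 suffices for |x - xi|."""
    return 1.0


def violates(x: float, x_prime: float, xi: float, tol: float = 1e-12) -> bool:
    """True if f(x, xi) - beta(xi) * |x - x'| exceeds f(x', xi) beyond round-off."""
    return f(x, xi) - beta(xi) * abs(x - x_prime) > f(x_prime, xi) + tol


rng = random.Random(0)
samples = [
    (rng.uniform(S_LOW, S_HIGH), rng.uniform(S_LOW, S_HIGH), rng.uniform(-5.0, 5.0))
    for _ in range(10_000)
]
n_violations = sum(violates(x, x_prime, xi) for x, x_prime, xi in samples)
print("violations:", n_violations)
```

For this integrand the inequality is just the triangle inequality, so no violations are expected; an integrand with a downward jump in $x$ inside $S$ would fail the same check.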
Huber (1967), as well as others, see e.g. Ibragimov and Has'minski (1981), assumes that $S$ is open. Since constraints usually do not involve strict inequalities, this is an unnatural restriction, except when there are no constraints, i.e. $S = R^n$, in which case $S$ is also closed. In any case, whatever the optimality results one may be able to prove with $S$ open, they remain valid when $S$ is replaced by its closure, assuming minimal continuity properties for the expectation functionals, but the converse does not hold.

To simplify notation we shall, whenever it is convenient, drop the explicit reference to the dependence on $\zeta$ of the probability measures $P^\nu$ and the resulting expectation functionals $E^\nu f$; nonetheless the reader should always be aware that all $\mu$-a.s. statements refer to the underlying probability space $(Z, \mathcal{F}, \mu)$. We begin by showing that $Ef$, as well as the $E^\nu f$, are well-defined functions.

LEMMA 3.6 Under Assumptions 3.4 and 3.5, there exists $Z_0 \in \mathcal{F}$, $\mu(Z_0) = 1$, such that for all $\zeta \in Z_0$, $Ef$ and $\{E^\nu f,\ \nu = 1, \ldots\}$ are proper lower semicontinuous functions such that

$$S = \operatorname{dom} Ef = \operatorname{dom} E^\nu f(\cdot, \zeta),$$

on which the expectation functionals are finite.
PROOF Let us first fix $\zeta$, and assume that for this $\zeta$ all the conditions of Assumption 3.5 are satisfied. If $x \notin S$, then $f(x, \xi) = \infty$ for all $\xi$ in $\Xi$ and hence $Ef(x) = E^\nu f(x) = \infty$, i.e.,

$$S \supset \operatorname{dom} Ef, \quad S \supset \operatorname{dom} E^\nu f.$$

With $P^0 = P$, for $x \in S$ and any $\varepsilon > 0$, there is a compact set $K_\varepsilon$ (Assumption 3.5) such that

$$E^\nu f(x) \le \int_{K_\varepsilon} |f(x, \xi)|\, P^\nu(d\xi) + \varepsilon \le \max_{\xi \in K_\varepsilon} |f(x, \xi)| + \varepsilon,$$

as follows from (3.11) and the fact that $f(x, \cdot)$ is continuous and finite on $K_\varepsilon \subset \Xi$. Thus $E^\nu f(x) < \infty$.

The fact that $Ef > -\infty$ and $E^\nu f > -\infty$ follows directly from condition (3.12). It is also this condition that we use to show that the expectation functionals are lower semicontinuous, since it allows us to appeal to Fatou's Lemma to obtain: given $\{x^\nu\}_{\nu=1}^\infty$ a sequence converging to $x$,

$$\liminf_{\nu \to \infty} Ef(x^\nu) \ge \int \liminf_{\nu \to \infty} f(x^\nu, \xi)\, P(d\xi) \ge \int f(x, \xi)\, P(d\xi) = Ef(x),$$

where the last inequality follows from the lower semicontinuity of $f(\cdot, \xi)$ at $x$. Of course, the same string of inequalities holds for all $\{P^\nu,\ \nu = 1, \ldots\}$.

Since the above holds for every $\nu$ $\mu$-almost surely on $Z$, the set

$$Z_0 = \{\zeta \in Z \mid E^\nu f(\cdot, \zeta) \text{ is finite, l.sc. on } S, \text{ for } \nu = 0, 1, \ldots\}$$

is of measure 1. $\Box$
THEOREM 3.7 Suppose $\{E^\nu f,\ \nu = 1, \ldots\}$ is a sequence of expectation functionals defined by

$$E^\nu f(x, \zeta) := \int_\Xi f(x, \xi)\, P^\nu(d\xi, \zeta)$$

and $Ef(x) = E\{f(x, \xi)\}$, such that $f$ and the collection $\{P;\ P^\nu,\ \nu = 1, \ldots\}$ satisfy Assumptions 3.4 and 3.5. Then, $\mu$-almost surely

$$Ef = \text{epi-lim}_{\nu \to \infty}\, E^\nu f = \text{ptwse-lim}_{\nu \to \infty}\, E^\nu f,$$

where $\text{ptwse-lim}_{\nu \to \infty} E^\nu f$ denotes the pointwise limit.
PROOF The argument essentially follows that of Theorem 2.8 of Birge and Wets (1986), with minor modifications to take care of the slightly weaker assumptions and the fact that the expectation functionals depend on $\zeta$.

We begin by showing that $\mu$-almost surely $Ef$ is the pointwise limit of the $E^\nu f$. We fix $\zeta \in Z$, and assume that the conditions of Assumption 3.5 are satisfied for this particular $\zeta$. Suppose $x \in S$, and set

$$h(\xi) := f(x, \xi).$$

From condition (3.11), it follows that for all $\varepsilon > 0$, there is a compact set $K_\varepsilon$ such that for all $\nu$

$$\int_{\Xi \setminus K_\varepsilon} |h(\xi)|\, P^\nu(d\xi) < \varepsilon.$$

Let $\gamma_\varepsilon := \max_{\xi \in K_\varepsilon} |h(\xi)|$. We know that $\gamma_\varepsilon$ is finite since $K_\varepsilon$ is compact and $h$ is continuous on $\Xi$ (Assumption 3.4). Let $h^\varepsilon$ be a truncation of $h$, defined by

$$h^\varepsilon(\xi) = \begin{cases} h(\xi) & \text{if } |h(\xi)| \le \gamma_\varepsilon, \\ \gamma_\varepsilon & \text{if } h(\xi) > \gamma_\varepsilon, \\ -\gamma_\varepsilon & \text{if } h(\xi) < -\gamma_\varepsilon. \end{cases}$$

The function $h^\varepsilon$ is bounded and continuous, and for all $\xi$ in $\Xi$

$$|h^\varepsilon(\xi)| \le |h(\xi)|.$$

Now, from the convergence in distribution of the $P^\nu$,

$$\lim_{\nu \to \infty} \Big[ a_\nu^\varepsilon := \int_\Xi h^\varepsilon(\xi)\, P^\nu(d\xi) \Big] = \int_\Xi h^\varepsilon(\xi)\, P(d\xi) =: a^\varepsilon. \tag{3.13}$$

Moreover, for all $\nu$

$$\int_{\Xi \setminus K_\varepsilon} |h^\varepsilon(\xi)|\, P^\nu(d\xi) \le \int_{\Xi \setminus K_\varepsilon} |h(\xi)|\, P^\nu(d\xi) < \varepsilon.$$

Now, let

$$a_\nu := E^\nu f(x) = \int_\Xi h(\xi)\, P^\nu(d\xi).$$

Since $h = h^\varepsilon$ on $K_\varepsilon$, we have that for all $\nu$

$$|a_\nu - a_\nu^\varepsilon| = \Big| \int_{\Xi \setminus K_\varepsilon} \big(h(\xi) - h^\varepsilon(\xi)\big)\, P^\nu(d\xi) \Big| < 2\varepsilon,$$

and also

$$|Ef(x) - a^\varepsilon| < 2\varepsilon.$$

These two last estimates, when used in conjunction with (3.13), yield: for all $\varepsilon > 0$ and all $\nu$ sufficiently large,

$$|Ef(x) - a_\nu| < 6\varepsilon.$$

Thus for all $x$ in $S$

$$Ef(x) = \lim_{\nu \to \infty} E^\nu f(x) = \lim_{\nu \to \infty} a_\nu,$$

and since, by Lemma 3.6, $S = \operatorname{dom} Ef = \operatorname{dom} E^\nu f$, it means that $Ef = \text{ptwse-lim}_{\nu \to \infty} E^\nu f$, and that condition (3.8) of epi-convergence is satisfied, since we can choose $\{x^\nu = x\}_{\nu=1}^\infty$ for the sequence converging to $x$.

There remains to verify condition (3.7) of epi-convergence. If $x \notin S$, then for every sequence $\{x^\nu\}_{\nu=1}^\infty$ converging to $x$, since $S$ is closed we have that $x^\nu \notin S$ for $\nu$ sufficiently large and hence $E^\nu f(x^\nu) = \infty$, which implies that

$$\liminf_{\nu \to \infty} E^\nu f(x^\nu) = \infty \ge Ef(x) = \infty.$$

If $x \in S$, and $\{x^\nu\}_{\nu=1}^\infty$ is a sequence converging to $x$, unless $x^\nu$ is in $S$ infinitely often, $\liminf_{\nu \to \infty} E^\nu f(x^\nu) = \infty$, and then condition (3.7) is trivially satisfied. So let us assume that $\{x^\nu\}_{\nu=1}^\infty \subset S$. For $\nu$ sufficiently large, from (3.10) it follows that there is a bounded continuous function $\beta$ such that

$$f(x, \xi) - \beta(\xi)\, \|x - x^\nu\| \le f(x^\nu, \xi).$$

Integrating both sides with respect to $P^\nu$, and taking $\liminf_{\nu \to \infty}$, we obtain

$$\lim_{\nu \to \infty} E^\nu f(x) - \lim_{\nu \to \infty} \beta^\nu\, \|x - x^\nu\| \le \liminf_{\nu \to \infty} E^\nu f(x^\nu),$$

where the

$$\beta^\nu = \int \beta(\xi)\, P^\nu(d\xi)$$

converge to a finite limit since the $P^\nu$ converge in distribution to $P$, and by pointwise convergence of the $E^\nu f$ this yields

$$Ef(x) \le \liminf_{\nu \to \infty} E^\nu f(x^\nu). \qquad \Box$$
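The truncation step of the proof can be illustrated on a concrete case. The following is a rough sketch under assumptions that are not in the paper: $h(\xi) = \xi$ on $\Xi = [0, \infty)$, $P$ the unit exponential distribution, and $K = [0, M]$ as the compact set. It approximates the tail integral outside $K$ and the gap between the expectations of $h$ and its truncation $h^\varepsilon$ by a midpoint rule, and compares the gap with twice the tail integral, mirroring the estimate $|a_\nu - a_\nu^\varepsilon| < 2\varepsilon$.

```python
import math


def truncate(value: float, gamma: float) -> float:
    """Clip value to [-gamma, gamma], as in the definition of h^eps."""
    return max(-gamma, min(gamma, value))


def expectation(func, upper: float = 60.0, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of func(xi) * exp(-xi) on [0, upper]."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        xi = (i + 0.5) * width
        total += func(xi) * math.exp(-xi)
    return total * width


def h(xi: float) -> float:
    """Hypothetical integrand at a fixed x; unbounded on the support."""
    return xi


def estimates(M: float):
    """Return (tail integral of |h| outside K = [0, M], |E h - E h^eps|)."""
    tail = expectation(lambda xi: abs(h(xi)) if xi > M else 0.0)
    gamma = M  # max of |h| over K = [0, M]
    gap = abs(expectation(h) - expectation(lambda xi: truncate(h(xi), gamma)))
    return tail, gap


for M in (2.0, 5.0, 8.0):
    tail, gap = estimates(M)
    print(M, tail, gap, gap <= 2.0 * tail)
```

In this example the tail integral equals $(M + 1)e^{-M}$ and the gap equals $e^{-M}$ in closed form, so enlarging the compact set drives both to zero together.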
To apply Propositions 3.2 and 3.3 in this context, we must show that the expectation functionals $\{E^\nu f,\ \nu = 1, \ldots\}$ are random l.sc. functions.

THEOREM 3.8 Under Assumptions 3.4 and 3.5, the expectation functionals

$$E^\nu f : R^n \times Z \to \overline{R}, \quad \text{for } \nu = 1, \ldots,$$

are $\mu$-almost surely random lower semicontinuous functions, such that

$$\zeta \mapsto \operatorname{epi} E^\nu f(\cdot, \zeta) \ \text{ is } \mathcal{F}^\nu\text{-measurable}.$$
PROOF Lemma 3.6 shows that there exists a set $Z_0 \subset Z$ of $\mu$-measure 1 such that for all $\zeta \in Z_0$, the multifunction

$$\zeta \mapsto \operatorname{epi} E^\nu f(\cdot, \zeta) : Z_0 \rightrightarrows R^{n+1}$$

is nonempty, closed-valued. This is condition (3.4.i), thus there remains only to establish (3.4.ii), i.e.

$$\zeta \mapsto \operatorname{epi} E^\nu f(\cdot, \zeta) \ \text{ is } \mathcal{F}^\nu\text{-measurable}$$

for $\nu = 1, \ldots$. Theorem 3.7 proves that with respect to the topology of convergence in distribution, the map

$$P^\nu \mapsto \operatorname{epi} E^\nu f \ \text{ is continuous}.$$

Moreover, since $\zeta \mapsto P^\nu(A, \zeta)$ is $\mathcal{F}^\nu$-measurable for all $A \in \mathcal{A}$, it means that given any finite collection of closed sets $\{F_i \subset \Xi\}_{i=1}^q$ and scalars $\{\beta_i\}_{i=1}^q \subset [0, 1]$, the set

$$\{\zeta \mid P^\nu(F_i, \zeta) < \beta_i,\ i = 1, \ldots, q\} \in \mathcal{F}^\nu,$$

which means that the function

$$\zeta \mapsto P^\nu(\cdot, \zeta) : Z \to \mathcal{P} := \{\text{probability measures on } (\Xi, \mathcal{A})\}$$

is $\mathcal{F}^\nu$-measurable. To see this, observe that the "convergence in distribution" topology can be obtained from the base of open sets

$$\{Q \in \mathcal{P} \mid Q(F_i) < \beta_i,\ i = 1, \ldots, q\},$$

see Billingsley (1968), that also generate the Borel field on $\mathcal{P}$. Thus

$$\zeta \mapsto \operatorname{epi} E^\nu f(\cdot, \zeta)$$

is the composition of a continuous function and an $\mathcal{F}^\nu$-measurable function, and hence is $\mathcal{F}^\nu$-measurable. $\Box$

In the proof of Theorem 3.8, we have used the continuity of the map $P^\nu \mapsto \operatorname{epi} E^\nu f$; in fact Theorem 3.7 only proves epi-convergence, without introducing explicitly the epi-topology for the space of lower semicontinuous functions. The fact that epi-convergence induces a topology on the space of l.sc. functions is well established, see for example Dolecki, Salinetti and Wets (1983) and Attouch (1984), and thus with this proviso, Theorem 3.7 proves the epi-continuity of the map $P^\nu \mapsto \operatorname{epi} E^\nu f$.
THEOREM 3.9 Consistency. Under Assumptions 3.4 and 3.5 we have that $\mu$-almost surely

$$\limsup_{\nu \to \infty}\, (\inf E^\nu f) \le \inf Ef. \tag{3.14}$$

Moreover, there exists $Z_0 \in \mathcal{F}$ with $\mu(Z \setminus Z_0) = 0$, such that

(i) for all $\zeta \in Z_0$, any cluster point $\hat{x}$ of any sequence $\{x^\nu,\ \nu = 1, \ldots\}$ with $x^\nu \in \operatorname{argmin} E^\nu f(\cdot, \zeta)$ belongs to $\operatorname{argmin} Ef$ (i.e. is an optimal estimate),

(ii) for $\nu = 1, \ldots$

$$\zeta \mapsto \operatorname{argmin} E^\nu f(\cdot, \zeta) : Z_0 \rightrightarrows R^n$$

is a closed-valued $\mathcal{F}^\nu$-measurable multifunction.

In particular, if there is a compact set $D \subset R^n$ such that for $\nu = 1, \ldots$

$$(\operatorname{argmin} E^\nu f) \cap D \ \text{ is nonempty } \mu\text{-a.s.},$$

and

$$\{x^*\} = \operatorname{argmin} Ef \cap D,$$

then there exist $\{x^\nu : Z \to R^n\}_{\nu=1}^\infty$, $\mathcal{F}^\nu$-measurable selections of $\{\operatorname{argmin} E^\nu f\}_{\nu=1}^\infty$, such that

$$x^* = \lim_{\nu \to \infty} x^\nu(\zeta) \quad \text{for } \mu\text{-almost all } \zeta,$$

and also

$$\inf Ef = \lim_{\nu \to \infty}\, (\inf E^\nu f) \quad \mu\text{-a.s.}$$
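The statement of Theorem 3.9 can be illustrated by a small simulation. This is only a sketch under assumptions chosen for the example, none of which come from the paper: $f(x, \xi) = |x - \xi|$ on the constraint set $S = [0, 1]$, $\xi$ exponentially distributed with unit rate, and $P^\nu$ the empirical measure of the first $\nu$ draws. The integrand is convex but not differentiable, and its true minimizer over $S$ is the median $\ln 2$. For a convex function of one variable, clamping an unconstrained minimizer (here a sample median) to the interval $S$ gives a constrained minimizer.

```python
import math
import random
import statistics


def constrained_argmin(sample, low: float = 0.0, high: float = 1.0) -> float:
    """A minimizer over S = [low, high] of the empirical mean of |x - xi|:
    a sample median clamped to S (valid because the objective is convex in x)."""
    return min(high, max(low, statistics.median(sample)))


rng = random.Random(0)                      # fixed seed so runs are repeatable
draws = [rng.expovariate(1.0) for _ in range(20_000)]
x_star = math.log(2.0)                      # median of the unit exponential, lies in S

# Nested samples mimic the increasing information available at stage nu.
for nu in (10, 100, 1_000, 20_000):
    x_nu = constrained_argmin(draws[:nu])
    print(nu, round(x_nu, 4), round(abs(x_nu - x_star), 4))
```

The estimates stay inside $S$ at every stage, and the distance to $\ln 2$ shrinks as $\nu$ grows, which is the behavior the consistency theorem describes along almost every sample path.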
PROOF The inequality (3.14) follows immediately from (3.9) and the $\mu$-almost sure epi-convergence of the expectation functionals $E^\nu f$ to $Ef$ (Theorem 3.7), as does the assertion (i) about cluster points of optimal solutions (Proposition 3.2). The fact that $(\operatorname{argmin} E^\nu f)$ is a closed-valued $\mathcal{F}^\nu$-measurable multifunction follows from Theorem 3.8 and Proposition 3.2.

Now suppose $Z_0 \subset Z$ is such that $\mu(Z_0) = 1$, for all $\zeta \in Z_0$, $Ef = \text{epi-lim}_{\nu \to \infty} E^\nu f$, and for all $\nu = 1, \ldots$, $(\operatorname{argmin} E^\nu f) \cap D$ is nonempty. For all $\nu$, the multifunction

$$\zeta \mapsto (\operatorname{argmin} E^\nu f(\cdot, \zeta) \cap D) : Z_0 \rightrightarrows R^n$$

is nonempty compact-valued and $\mathcal{F}^\nu$-measurable; it is the intersection of two closed-valued measurable multifunctions, see Rockafellar (1976). Now for any $\zeta \in Z_0$, let $\{\bar{x}^\nu\}_{\nu=1}^\infty$ be any sequence in $R^n$ such that for all $\nu$,

$$\bar{x}^\nu(\zeta) \in \operatorname{argmin} E^\nu f(\cdot, \zeta) \cap D.$$

Then, any cluster point of the sequence is in $D$, since it is compact, and in $\operatorname{argmin} Ef$ as follows from Proposition 3.2. Actually, $x^* = \lim_{\nu \to \infty} \bar{x}^\nu$. To see this, note that if $x^*$ is not the limit point of the sequence, there exists a subsequence $\{\nu_k\}_{k=1}^\infty$ such that for some $\delta > 0$, and all $k = 1, \ldots$,

$$\bar{x}^{\nu_k} \in \operatorname{argmin} E^{\nu_k} f \cap D, \quad \text{and} \quad \|x^* - \bar{x}^{\nu_k}\| > \delta,$$

but this is contradicted by the fact that this subsequence, being included in $D$, contains a further subsequence that is convergent, and whose limit must then be $x^*$, the only point of $\operatorname{argmin} Ef \cap D$.
Now, for $\nu = 1, \ldots$, let $x^\nu : Z \to R^n$ be an $\mathcal{F}^\nu$-measurable selection of the $\mathcal{F}^\nu$-measurable multifunction $\zeta \mapsto (\operatorname{argmin} E^\nu f(\cdot, \zeta) \cap D)$, cf. Proposition 3.1. By the preceding argument, for all $\zeta \in Z_0$, where $\mu(Z_0) = 1$,

$$x^* = \lim_{\nu \to \infty} x^\nu(\zeta),$$

and from Proposition 3.3, it then also follows that

$$\lim_{\nu \to \infty}\, (\inf E^\nu f(\cdot, \zeta)) = \inf Ef = Ef(x^*)$$

for all $\zeta \in Z_0$. $\Box$

It should be noted that, contrary to earlier work (see Wald (1940), Huber (1967)), we do not assume the uniqueness of the optimal solutions; at least in the case of the stochastic programming model introduced in Section 2, this would not be a natural assumption. Also, let us observe that we have not given here the most general possible version of the Consistency Theorem that could be obtained by relying on the tools introduced here. There are conditions that are necessary and sufficient for the convergence of infima (see Salinetti and Wets (1986), Robinson (1985)) that could be used here in conjunction with convergence results for measurable selections (Salinetti and Wets (1981)) to yield a slightly sharper theorem, but the conditions would be much harder to verify, and would be of very limited interest in this context. Also, since epi-convergence is of local character, we could reword our statements to obtain "local" consistency by restricting our attention to a neighborhood of some $x^*$ in $\operatorname{argmin} Ef$.

We conclude with an existence result. A function $h : R^n \to \overline{R}$ is inf-compact if for all $\alpha \in R$

$$\operatorname{lev}_\alpha h := \{x \in R^n \mid h(x) \le \alpha\} \ \text{ is compact}.$$
If $h$ is proper ($h > -\infty$, $\operatorname{dom} h \ne \emptyset$) and inf-compact, then $(\inf h)$ is finite and attained for some $x \in R^n$. For example, if $h = g + \psi_S$, where $g$ is continuous and $\psi_S$ is the indicator function of the nonempty compact set $S$ ($\psi_S(x) = 0$ if $x \in S$, and $\infty$ otherwise), then $h$ is inf-compact. Another sufficient condition is to have $g$ coercive. Inf-compactness is the most general condition that is verifiable under which existence can be established. The next proposition generalizes results of Wets (1973) and Hiriart-Urruty (1976). Essentially, we assume that $f(\cdot, \xi)$ is inf-compact with positive probability.

PROPOSITION 3.10 Under Assumptions 3.4 and 3.5, suppose there exists $A \in \mathcal{A}$ with $P(A) > 0$ (resp. $P^\nu(A) > 0$) such that for all $\alpha \in R$, the set

$$\operatorname{lev}_\alpha f \cap (R^n \times A) \ \text{ is bounded}.$$

Then $Ef$ is inf-compact (resp. $E^\nu f$ is $\mu$-a.s. inf-compact).
PROOF It clearly suffices to prove the proposition for $P$; the same argument applies for all $P^\nu$ $\mu$-a.s. Let

$$\gamma(\xi) := \inf \{0,\ \inf_{x \in R^n} f(x, \xi)\}.$$

The function $\gamma$ is measurable (Proposition 3.2) and $P$-summable, see (3.12). The function $f'$, defined by

$$f'(x, \xi) := f(x, \xi) - \gamma(\xi),$$

is then nonnegative. Moreover $f' \ge f$ and thus

$$Ef' \ge Ef.$$

Set $\alpha_1 := \alpha / P(A)$ and let $A_1$ be the projection on $R^n$ of $\operatorname{lev}_{\alpha_1} f' \cap (R^n \times A)$. Then if $x \notin A_1$ and $\xi \in A$,

$$f'(x, \xi) > \alpha_1,$$

and since $f'$ is nonnegative, with $\bar{\gamma} = E\{\gamma(\xi)\}$,

$$Ef(x) - \bar{\gamma} = Ef'(x) \ge \int_A f'(x, \xi)\, P(d\xi) > \alpha_1 P(A) = \alpha.$$

Hence $\operatorname{lev}_{\alpha + \bar{\gamma}} Ef \subset A_1$, a bounded set. To complete the proof it suffices to observe that from Lemma 3.6 we know that $\operatorname{lev}_\alpha Ef$ is closed, since $Ef$ is lower semicontinuous, and this with the above implies that $\operatorname{lev}_{\alpha + \bar{\gamma}} Ef$ is compact for all $\alpha \in R$. $\Box$