Asymptotic Behavior of Statistical Estimators and Optimal Solutions for Stochastic Optimization Problems

WORKING PAPER


NOT FOR QUOTATION WITHOUT THE PERMISSION OF THE AUTHORS

ASYMPTOTIC BEHAVIOR OF STATISTICAL ESTIMATORS AND OPTIMAL SOLUTIONS FOR STOCHASTIC OPTIMIZATION PROBLEMS

Jitka Dupačová
Roger Wets

August 1986
WP-86-41

Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute or of its National Member Organizations.

INTERNATIONAL INSTITUTE FOR APPLIED SYSTEMS ANALYSIS 2361 Laxenburg, Austria


FOREWORD

This paper presents the first results on a new statistical approach to the problem of incomplete information in stochastic programming. The tools of nondifferentiable optimization used here help to prove the consistency of (approximate) optimal solutions based on increasing information about the true probability distribution, without unnatural smoothness assumptions. They also make it possible to take fully into account the presence of constraints.

Alexander B. Kurzhanski
Chairman
System and Decision Sciences Program

CONTENTS

1 Introduction
2 Examples
3 Consistency: Convergence of Optimal Solutions
References

ASYMPTOTIC BEHAVIOR OF STATISTICAL ESTIMATORS AND OPTIMAL SOLUTIONS FOR STOCHASTIC OPTIMIZATION PROBLEMS

Jitka Dupačová and Roger Wets

1. INTRODUCTION

The calculation of estimates for various statistical parameters has been one of the main concerns of Statistics since its inception, and a number of elegant formulas have been developed to obtain such estimates in a number of particular instances. Typically such cases correspond to a situation when the random phenomenon is univariate in nature, and there are no "active" restrictions on the estimate of the unknown statistical parameter. However, that is not the case in general: many estimation problems are multivariate in nature and there are restrictions on the choice of the parameters. These could be simple nonnegativity constraints, but also much more complex restrictions involving certain mathematical relations between the parameters that need to be estimated. Classical techniques, which can still be used to handle, for example, least squares estimation with linear equality constraints on the parameters, break down if there are inequality constraints or a nondifferentiable criterion function. In such cases one cannot expect that a simple formula will yield the relationship between the samples and the best estimates. Usually, the latter must be found by solving an optimization problem. Naturally the solution of such a problem depends on the collected samples and one is confronted with the questions of the consistency and of the asymptotic behavior of such estimators. This is the subject of this article.

To overcome the technical problems caused by the intrinsic lack of smoothness, we rely on the guidelines and the tools provided by the theory of nondifferentiable optimization. In fact, the problem of proving consistency of the estimators, and the study of their asymptotic behavior, is closely related to that of obtaining confidence intervals for the solution of stochastic optimization problems when there is only partial information about the probability distribution of the random coefficients of the problem. Indeed, it was the need to deal with this class of problems that originally motivated this study. We shall see in Section 2 that stochastic optimization problems as well as the problem of finding statistical estimators are two instances of the following general class of problems:

$$ \text{find } x \in R^n \text{ that minimizes } E\{f(x, \xi)\}, $$

where $f: R^n \times \Xi \to R \cup \{+\infty\}$ is an extended real-valued function and $\xi$ is a random variable with values in $\Xi$; for more details see Section 3. It is implicit in this formulation that the expectation is calculated with respect to the true probability distribution $P$ of the random variable $\xi$, whereas in fact all that is known is a certain approximate $P^\nu$. Our objective is to study the behavior of the optimal solution (estimate) $x^\nu$, obtained by solving the optimization problem using $P^\nu$ instead of $P$ to calculate the expectation, when $\{P^\nu, \nu = 1, \dots\}$ is a sequence of probability measures converging to $P$. In Section 3 we give conditions under which consistency can be proved. Constraints on the choice of the optimal $x$ are incorporated in the formulation of the problem by allowing the function $f$ to take on the value $+\infty$. The results are obtained without explicit reference to the form of these constraints.
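As a minimal sketch of this formulation, assume the scalar integrand $f(x, \xi) = (x - \xi)^2$ with the constraint $x \ge 0$ encoded by the value $+\infty$, and take $P^\nu$ to be the empirical measure of a small sample. Both choices, and all function names below, are illustrative assumptions and are not taken from the paper. In this special case the approximate problem has a closed-form solution: the sample mean projected onto the feasible half-line.

```python
import math

def f(x, xi):
    """Integrand: squared error if x is feasible (x >= 0), +inf otherwise."""
    return (x - xi) ** 2 if x >= 0 else math.inf

def expected_f(x, sample):
    """Expectation of f(x, .) under the empirical measure of `sample`."""
    return sum(f(x, xi) for xi in sample) / len(sample)

def solve_empirical(sample):
    """Minimizer of the empirical expectation: the sample mean projected
    onto the feasible set [0, +inf)."""
    return max(0.0, sum(sample) / len(sample))

# The sample mean is -0.5, so the constraint is active at the solution.
sample = [-1.5, -0.5, 0.25, -0.25]
x_nu = solve_empirical(sample)
```

Infeasible points receive an infinite expected cost, so they are excluded from the minimization without the constraint ever being written out separately.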

There is of course a substantial statistical literature dealing with the questions broached here, beginning with the seminal article of Wald (1949) and the work of Huber (1967) on maximum likelihood estimators. Of more direct parentage, at least as far as formulation and use of mathematical techniques, is the work on stochastic programming problems with partial information. Wets (1979) reports some preliminary results; further developments were presented at the 1980 meeting on stochastic optimization at IIASA (Laxenburg, Austria) and recorded in Solis and Wets (1981); see also Dupačová (1983a, b) and (1984b) for a special case. In a projected paper we shall deal with estimates of the convergence rates, as well as with the convergence of the associated Lagrangian function.

2. EXAMPLES

The results apply equally well to estimation or stochastic optimization problems with or without constraints, with differentiable or nondifferentiable criterion function. However, the examples that we detail here are those that fall outside the classical mold of unconstrained smooth problems.

Restrictions on the statistical estimates or the optimal decisions of stochastic optimization problems follow from technical and modeling considerations as well as natural statistical assumptions. The least squares estimation problem with linear equality constraints, a basic statistical method, see e.g. Rao (1965), can be solved by the usual tools of differential calculus. The inequality constraints, however, introduce a lack of smoothness that does not allow us to fall back on the old stand-bys. In Judge and Takayama (1966) and Liew (1976), the theory of quadratic programming is used to exhibit and discuss the statistical properties of least squares estimates subject to inequality constraints for the case of large and small samples.

In connection with maximum likelihood estimation, the case of parameter restrictions in the form of smooth nonlinear equations was studied by Aitchison and Silvey (1958), including results on asymptotic normality of the estimates. The Lagrangian approach was further developed by Silvey (1959) and extended to the case of a multisample situation by Sen (1979), including analysis of the situation when the true parameter value does not fulfill the constraints (the nonnull case).

Typically one must take into account nonnegativity restrictions in the estimation of variances and variance components. Unconstrained maximum likelihood estimation in factor analysis and in more complicated structural analysis models, see e.g. Lee (1980), may lead to negative estimates of the variances. Replacing these inappropriate estimates by zeros gives estimates which are no longer optimal with respect to the chosen fitting function. Similarly, there is a problem of getting negative estimates of variance components, see Example 2.3. In statistical practice, these nonpositive variance estimates are usually fixed at zero and the data is eventually reanalyzed. In general, such an approach may lead to plausible results when only one restricted parameter is estimated, and it is mostly inappropriate in multi-dimensional situations; see e.g. the evidence given by Lee (1980).

The possibility of using mathematical programming techniques to get constrained estimates was explored by Arthanari and Dodge (1981). As mentioned in the introduction, we use mathematical programming theory not only to get inequality constrained estimates but to get asymptotic results for a large class of decision and estimation problems which contains, inter alia, restricted M-estimates and stochastic programming with incomplete information. In comparison with the results of ad hoc approaches, valid mostly for one-dimensional restricted estimation, our method can be used for high-dimensional cases and without unnatural smoothness assumptions, in spite of the fact that the violation of differentiability assumptions cannot be easily bypassed by the use of directional derivatives (in contrast to the one-dimensional case).

EXAMPLE 2.1 Inequality constrained least squares estimation of regression coefficients. Assume that the dependent variable $y$ can be explained or predicted on the basis of information provided by independent variables $x_1, \dots, x_p$. In the simplest case of a linear model, the observations $y_j$ on $y$ are supposed to be generated according to

$$ y_j = \sum_{i=1}^{p} x_{ij}\beta_i + \varepsilon_j, \qquad j = 1, \dots, \nu, $$

where $\beta_1, \dots, \beta_p$ are unknown parameters to be estimated, $\varepsilon_j$, $j = 1, \dots, \nu$, denote the observed values of the residual and $X = (x_{ij})$ is a $(p, \nu)$ matrix whose rows consist of the observed values of the independent variables.

In the practical implementation of this model, there may be in addition some a priori constraints imposed on the parameters, such as nonnegativity constraints on the elasticities, see Liew (1976), or a required preassigned positive difference between input and output tonnage due to the melting loss, Arthanari and Dodge (1981). Assume that these constraints are of the form

$$ A\beta \le c, $$

where $A(m, p)$, $c(m, 1)$ are given matrices. The use of the least squares method leads to the optimization problem:

$$ \text{minimize } \sum_{j=1}^{\nu} \Big[ y_j - \sum_{i=1}^{p} x_{ij}\beta_i \Big]^2 \quad \text{subject to } \sum_{i=1}^{p} a_{ki}\beta_i \le c_k, \quad k = 1, \dots, m, \tag{2.1} $$

which can be solved by quadratic programming techniques.

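In our general framework, problem (2.1) corresponds to the case of objective function:

$$ f(\beta, \xi) = \begin{cases} \Big( y - \sum_{i=1}^{p} x_i\beta_i \Big)^2 & \text{if } A\beta \le c, \\ +\infty & \text{otherwise,} \end{cases} $$

with $\xi = (y, x_1, \dots, x_p)$ and with the $P^\nu$ the empirical distributions.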
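To make the effect of an inequality constraint concrete, the following sketch treats the simplest instance of the constrained least squares problem: a single regressor ($p = 1$) whose coefficient is restricted to an interval. The interval bounds, the data and the function name are hypothetical. With one parameter the criterion is a convex quadratic, so the constrained minimizer is the unconstrained least squares slope clipped to the interval; in higher dimensions a quadratic programming method is needed instead.

```python
def constrained_ls_slope(x, y, lower=0.0, upper=float("inf")):
    """Minimize sum_j (y_j - x_j * beta)^2 subject to lower <= beta <= upper.

    The criterion is a convex quadratic in the single parameter beta, so the
    constrained minimizer is the unconstrained one clipped to the interval.
    """
    sxx = sum(xj * xj for xj in x)
    sxy = sum(xj * yj for xj, yj in zip(x, y))
    beta_free = sxy / sxx                      # ordinary least squares slope
    return min(max(beta_free, lower), upper)   # projection onto [lower, upper]

x = [1.0, 2.0, 3.0]
y = [-1.0, -2.5, -2.5]
beta_free = constrained_ls_slope(x, y, lower=float("-inf"))  # about -0.964
beta_nonneg = constrained_ls_slope(x, y)                     # constraint beta >= 0 is active
```

Here the unconstrained slope is negative, so the nonnegativity constraint is active and the estimate is no longer given by the usual least squares formula.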

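Alternatively, minimizing the sum of absolute errors corresponds to the optimization problem

$$ \text{minimize } \sum_{j=1}^{\nu} \Big| y_j - \sum_{i=1}^{p} x_{ij}\beta_i \Big| \quad \text{subject to } \sum_{i=1}^{p} a_{ki}\beta_i \le c_k, \quad 1 \le k \le m, \tag{2.3} $$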

which can be solved by means of the simplex method for linear programming, see e.g. Arthanari and Dodge (1981). The formulation of (2.3) is again based on the empirical distribution function $P^\nu$; the objective function is:

$$ f(\beta, \xi) = \begin{cases} \Big| y - \sum_{i=1}^{p} x_i\beta_i \Big| & \text{if } A\beta \le c, \\ +\infty & \text{otherwise.} \end{cases} $$

Note that this function $f$ is not differentiable on $S$.
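The lack of differentiability can be seen numerically in the simplest location version of the absolute-error problem, sketched below under illustrative assumptions (a single parameter, one upper bound, made-up data, hypothetical function names). The unconstrained minimizer of the sum of absolute errors is a sample median; since the criterion is convex in one variable, the constrained minimizer is the median clipped to the feasible half-line. At a data point the left and right slopes of the criterion differ.

```python
import statistics

def sum_abs(beta, y):
    """Sum of absolute errors for the location model y_j = beta + error."""
    return sum(abs(yj - beta) for yj in y)

def constrained_lad_location(y, upper):
    """Minimize sum_j |y_j - beta| subject to beta <= upper.

    A sample median minimizes the convex criterion; clipping it to the
    feasible set gives the constrained minimizer in one dimension.
    """
    return min(statistics.median(y), upper)

def one_sided_slopes(beta, y, h=1e-6):
    """Left and right difference quotients of the criterion at beta."""
    left = (sum_abs(beta, y) - sum_abs(beta - h, y)) / h
    right = (sum_abs(beta + h, y) - sum_abs(beta, y)) / h
    return left, right

y = [1.0, 2.0, 10.0]
beta_hat = constrained_lad_location(y, upper=1.5)   # median 2.0 is infeasible
left, right = one_sided_slopes(2.0, y)              # kink at the data point 2.0
```

The two one-sided slopes at the data point are approximately -1 and +1, so no gradient exists there and first-order conditions must be stated with subgradients or directional derivatives.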

Finally, when robustizing the least squares approach, instead of minimizing a sum of squares, a sum of less rapidly increasing functions of residuals is minimized, see e.g. Huber (1973):

$$ \text{minimize } \sum_{j=1}^{\nu} \rho\Big( y_j - \sum_{i=1}^{p} x_{ij}\beta_i \Big) \quad \text{subject to } \sum_{i=1}^{p} a_{ki}\beta_i \le c_k, \quad 1 \le k \le m. \tag{2.5} $$

The function $\rho$ is assumed to be convex, non-monotone and to possess bounded derivatives of sufficiently high order, e.g.

$$ \rho(u) = \begin{cases} \tfrac{1}{2}u^2 & \text{for } |u| < c, \\ c|u| - \tfrac{1}{2}c^2 & \text{for } |u| \ge c. \end{cases} $$
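The example function above is quadratic near zero and linear in the tails. A short sketch, with an illustrative default $c = 1.0$ that is not taken from the paper and with hypothetical function names, shows the function and its derivative; the derivative is the residual clipped to $[-c, c]$, which is why large residuals have bounded influence.

```python
def huber_rho(u, c=1.0):
    """Quadratic for |u| < c, linear with slope c beyond; continuous at |u| = c."""
    a = abs(u)
    return 0.5 * u * u if a < c else c * a - 0.5 * c * c

def huber_psi(u, c=1.0):
    """Derivative of huber_rho: u clipped to [-c, c], hence bounded."""
    return max(-c, min(c, u))

small = huber_rho(0.5)      # quadratic branch: 0.5 * 0.25
large = huber_rho(3.0)      # linear branch: 1.0 * 3.0 - 0.5
at_knot = huber_rho(1.0)    # both branches agree at |u| = c
```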

This also fits the general framework; the objective function is:

$$ f(\beta, \xi) = \begin{cases} \rho\Big( y - \sum_{i=1}^{p} x_i\beta_i \Big) & \text{if } A\beta \le c, \\ +\infty & \text{otherwise,} \end{cases} $$

and the empirical distribution function $P^\nu$ is again used to obtain (2.5).

EXAMPLE 2.2 Heywood cases in factor analysis. The model for confirmatory factor analysis (Jöreskog (1969)) is

$$ x = \Lambda f + e, $$

where $x(n, 1)$ is a column vector containing the observed variables, $f$ is a column vector containing the $k$ common factors, $e(n, 1)$ is a column vector containing the individual parts of the observable components and $\Lambda(n, k)$ is the matrix of factor loadings. It is assumed that $f$ and $e$ are normally distributed with mean zero, $\operatorname{var} f = \Phi$ and $\operatorname{var} e = \Psi$, which is diagonal. Consequently, $x$ is normally distributed with mean zero and with the variance matrix

$$ \Sigma = \Lambda \Phi \Lambda^T + \Psi. \tag{2.7} $$

The parameter vector consists of the free elements of $\Lambda$, $\Psi$ and $\Phi$ and it should be estimated using the sample variance matrix $S$ of the observables $x$. This is done by minimizing a suitable fitting function, such as

$$ f_1(\Sigma, S) = \log|\Sigma| + \operatorname{tr}(S\Sigma^{-1}) - \log|S| - n \tag{2.8} $$

(the maximum likelihood method), or a second fitting function $f_2(\Sigma, S)$, labeled (2.9), in which $V$ is a matrix of weights (the weighted least squares method). Evidently, both (2.8) and (2.9), with (2.7) substituted for $\Sigma$, are objective functions of nontrivial unconstrained optimization problems, which can be solved by different methods such as the method of Davidon-Fletcher-Powell (see Fletcher and Powell (1963)) or by the Gauss-Newton algorithm. In practice, however, about one third of the data yield one or more nonpositive estimates of the diagonal elements of the matrix $\Psi$, which are individual variances. These solutions are called Heywood cases and, to deal with them, (2.8) or (2.9) should be minimized under the conditions $\psi_{ii} \ge 0$, $i = 1, \dots, n$. Thus the appropriate formulation defines $f$ as follows:

$$ f = \begin{cases} f_1(\Sigma, S) & \text{if } \psi_{ii} \ge 0, \ i = 1, \dots, n, \\ +\infty & \text{otherwise,} \end{cases} $$

and similarly for $f_2$.

EXAMPLE 2.3 Negative estimates of variance components. Consider a general linear model with random effects

$$ y = Z\gamma + \sum_{i=1}^{p} X_i\beta_i + \varepsilon, \tag{2.10} $$

where $y(\nu, 1)$ is the vector of observations on the variable $y$, $Z(\nu, r)$, $X_i(\nu, r_i)$, $i = 1, \dots, p$, are given matrices, $\beta_i$, $i = 1, \dots, p$, and $\varepsilon$ are mutually uncorrelated random vectors with $E\beta_i = 0$, $\operatorname{var}\beta_i = \sigma_i^2 I_{r_i}$, $i = 1, \dots, p$, and $E\varepsilon = 0$, $\operatorname{var}\varepsilon = \sigma_0^2 I_\nu$, and $\gamma_1, \dots, \gamma_r$, $\sigma_0^2, \dots, \sigma_p^2$ are unknown parameters to be estimated.

One of the simplest examples is the following variance analysis model for random effect one-way classification: Consider $k$ populations where the $j$-th measurement (observation) in the $i$-th population is given by

$$ y_{ij} = \mu + a_i + e_{ij}, \qquad i = 1, \dots, k, \quad j = 1, \dots, n. \tag{2.11} $$

In (2.11), $\mu$ is the fixed effect, $a_i$, $i = 1, \dots, k$, is the random effect of the $i$-th population and $e_{ij}$ is the residual. The random variables $a_1, \dots, a_k$ and $e_{11}, \dots, e_{kn}$ are independent with distributions $N(0, \sigma_a^2)$ and $N(0, \sigma_e^2)$, respectively. The parameters $\mu$, $\sigma_a^2$, $\sigma_e^2$ are to be estimated. The traditional estimates of the variance components $\sigma_a^2$, $\sigma_e^2$ in model (2.11) are obtained by a simple procedure: one equates the mean squares

$$ \frac{1}{k(n-1)} \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2 $$

and

$$ \frac{n}{k-1} \sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2, $$

where $\bar{y}_{i\cdot} = \frac{1}{n} \sum_{j=1}^{n} y_{ij}$, $i = 1, \dots, k$, and $\bar{y}_{\cdot\cdot} = \frac{1}{nk} \sum_{i=1}^{k} \sum_{j=1}^{n} y_{ij}$, with their expectations $\sigma_e^2$ and $\sigma_a^2 n + \sigma_e^2$, which gives the estimates

$$ s_e^2 = \frac{1}{k(n-1)} \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2, \tag{2.12} $$

$$ s_a^2 = \frac{1}{n} \Big[ \frac{n}{k-1} \sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 - s_e^2 \Big]. \tag{2.13} $$

Whereas $s_e^2$ is evidently nonnegative, this need not be the case for $s_a^2$, so that the problem of a negative estimate of the variance component $\sigma_a^2$ comes to the fore.
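A small numerical sketch shows how the moment procedure just described can return a negative value. The data set, the function name and the variable names are made up for illustration; the code equates the within-group and between-group mean squares with their expectations in the balanced one-way model.

```python
def anova_variance_estimates(groups):
    """Moment estimates of the residual and random-effect variances in the
    balanced one-way model; `groups` is a list of k lists of n observations."""
    k, n = len(groups), len(groups[0])
    group_means = [sum(g) / n for g in groups]
    grand_mean = sum(group_means) / k
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    ) / (k * (n - 1))
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    s_resid = ms_within                        # estimates the residual variance
    s_effect = (ms_between - ms_within) / n    # may be negative
    return s_resid, s_effect

# Two groups with identical means but different spread.
groups = [[0.0, 2.0], [1.0, 1.0]]
s_resid, s_effect = anova_variance_estimates(groups)
s_effect_truncated = max(0.0, s_effect)   # common practice: fix at zero
```

For these data the between-group mean square is zero while the within-group mean square is one, so the random-effect variance estimate equals -0.5. Truncating it at zero restores feasibility but, as noted earlier in this section, the result is in general no longer optimal for the chosen criterion.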

The resulting estimates (2.12), (2.13) of the variance components in (2.11) follow also as a special result of the MIVQUE and MINQUE estimation developed for the general model (2.10): Unbiased estimates of a linear parametric function $\sum_{i=0}^{p} \sigma_i^2 q_i$ are sought in the form $y^T A y$ where

$$ AZ = 0, \quad A(\nu, \nu) \text{ is a symmetric matrix,} \tag{2.14} $$

and which are optimal in some sense. The MIVQUE estimates correspond to a matrix $A$ that minimizes the variance of $y^T A y$ subject to the conditions (2.14) and the MINQUE estimates correspond to a matrix $A$ that minimizes $\operatorname{tr}\big[A\big(I + \sum_{i=1}^{p} X_i X_i^T\big)\big]^2$ subject to conditions (2.14). In none of the mentioned approaches, however, are the natural nonnegativity constraints on the estimates of the variances $\sigma_i^2$, $i = 1, \dots, p$, introduced explicitly.

Again, there are two possible explanations of negative estimates of variance components: the model may be incorrect or statistical noise obscured the underlying situation. Among others, Herbach (1959) and Thompson (1962) studied variance analysis models with random effects by means of different variants of the maximum likelihood method under nonnegativity constraints. Correspondingly, in terms of the general model, we have for instance

$$ f(\sigma_a^2, \sigma_e^2, \mu, y) = \begin{cases} -(2\pi)^{-\frac{nk}{2}} (\sigma_e^2 + n\sigma_a^2)^{-\frac{k}{2}} (\sigma_e^2)^{-\frac{k(n-1)}{2}} \exp\Big\{ -\dfrac{1}{2\sigma_e^2} \Big[ \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \mu)^2 - \dfrac{\sigma_a^2}{\sigma_e^2 + n\sigma_a^2} \sum_{i=1}^{k} \Big( \sum_{j=1}^{n} (y_{ij} - \mu) \Big)^2 \Big] \Big\} & \text{if } \sigma_a^2 \ge 0, \ \sigma_e^2 > 0, \\ +\infty & \text{otherwise.} \end{cases} $$

Similarly, nonnegative MINQUE and MIVQUE estimates are of interest.

EXAMPLE 2.4 M-estimates. Let $\Theta$ be a given locally compact parameter set, $(\Xi, \mathcal{A}, P)$ a probability space and $f: \Theta \times \Xi \to R$ a given function. For a sample $\{\xi_1, \dots, \xi_\nu\}$ from the considered distribution, any estimate $T^\nu = T^\nu(\xi_1, \dots, \xi_\nu) \in \Theta$ defined by the condition

$$ T^\nu \in \operatorname*{argmin}_{T \in \Theta} \sum_{j=1}^{\nu} f(T, \xi_j) \tag{2.15} $$

is called an M-estimate. In the pioneering paper by Huber (1967) (see also Huber (1981)), nonstandard sufficient conditions were given under which $\{T^\nu\}$ converges a.s. (or in probability) to a constant $\theta_0 \in \Theta$, and asymptotic normality of $\sqrt{\nu}\,(T^\nu - \theta_0)$ was proved under the assumption that $\Theta$ is an open set.

The problem (2.15) is evidently a special case of our general framework; the $P^\nu$ again correspond to the empirical distribution functions and we have an unconstrained criterion function. We shall aim to remove both of these assumptions to get results valid for a whole class of probability measures $P^\nu$ estimating $P$, which contains the empirical probability measure connected with the original definition (2.15) of M-estimates, and for constrained estimates.
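As an illustration of a constrained M-estimate, the sketch below minimizes the empirical criterion over a closed interval rather than an open parameter set. The loss (Huber's function with $c = 1$), the interval bounds, the data and all names are assumptions made for the example. Because the criterion is convex in the location parameter, a ternary search over the interval is sufficient here; it is a stand-in for whatever optimization method one would actually use.

```python
def huber(u, c=1.0):
    """Huber's loss: quadratic near zero, linear in the tails."""
    a = abs(u)
    return 0.5 * u * u if a < c else c * a - 0.5 * c * c

def m_estimate(sample, lo, hi, loss=huber, tol=1e-8):
    """Minimize T -> sum_j loss(xi_j - T) over the closed interval [lo, hi].

    Ternary search is valid because the criterion is convex in T.
    """
    def criterion(t):
        return sum(loss(xi - t) for xi in sample)

    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if criterion(m1) <= criterion(m2):
            hi = m2   # a minimizer lies in [lo, m2]
        else:
            lo = m1   # a minimizer lies in [m1, hi]
    return 0.5 * (lo + hi)

sample = [0.0, 0.2, -0.2, 10.0]                  # one gross outlier
t_free = m_estimate(sample, -5.0, 5.0)           # interior solution, about 1/3
t_restricted = m_estimate(sample, -1.0, 0.25)    # upper bound is active
```

With the wide interval the estimate solves the usual first-order condition and equals about 1/3; with the narrower interval the minimizer sits on the boundary, which is the kind of solution an open parameter set cannot represent.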

EXAMPLE 2.5 Stochastic optimization with incomplete information. Consider the following decision model of stochastic optimization:

Given a probability space $(\Xi, \mathcal{A}, P)$, a random element $\xi$ on $\Xi$, a measurable function $f: R^n \times \Xi \to R$ and a set $S \subset R^n$,

$$ \text{minimize } E\{f(x, \xi)\} = \int_\Xi f(x, \xi)\, P(d\xi) \quad \text{on the set } S \subset R^n. \tag{2.16} $$

A wide variety of stochastic optimization problems, e.g., stochastic programs with recourse or probability constrained models (see e.g. Dempster (1980), Ermoliev et al. (1985), Kall (1976), Prékopa (1973), Wets (1983)) fit into this abstract framework.

In many practical situations, however, the probability measure $P$ need not be known completely. One way to deal with such a situation is to estimate the optimal solution $x^*$ of (2.16) by an optimal solution of the problem

$$ \text{minimize } \int_\Xi f(x, \xi)\, P^\nu(d\xi) \quad \text{on the set } S \subset R^n, $$

where $P^\nu$ is a suitable estimate of $P$ based on the observed data. In this context, there are different possibilities to estimate or approximate $P$ and the use of the empirical distribution is only one of them. The case of $P$ belonging to a given parametric family of probability measures but with an unknown parameter vector was studied e.g. in Dupačová (1984a, b).

For problem (2.16), large dimensionality of the decision vector $x$ is typical. This circumstance together with nondifferentiability (or even with noncontinuity) of $f$ and with the presence of constraints raises qualitatively new problems.

3. CONSISTENCY: CONVERGENCE OF OPTIMAL SOLUTIONS

From a conceptual viewpoint or for theoretical purposes, it is convenient as well as expedient to study problems of statistical estimation as well as stochastic optimization problems with partial information in the following general framework.

Let $(\Xi, \mathcal{A}, P)$ be a probability space, with $\Xi$ (the support of $P$) a closed subset of a Polish space $X$, and $\mathcal{A}$ the Borel sigma-field relative to $\Xi$; we may think of $\Xi$ as the set of possible values of the random element $\xi$ defined on the probability space of events $(\Omega, \mathcal{A}', P')$.

If $P$ is known, the problem is to:

$$ \text{find } x^* \in R^n \text{ that minimizes } Ef(x), \tag{3.1} $$

where

$$ Ef(x) := E\{f(x, \xi)\} = \int_\Xi f(x, \xi)\, P(d\xi) $$

and

$$ f: R^n \times \Xi \to R \cup \{+\infty\} $$

is a random lower semicontinuous function; we set

$$ Ef(x) := +\infty $$

whenever $\xi \mapsto f(x, \xi)$ is not bounded above by a summable (extended real-valued) function. We refer to

$$ \operatorname{dom} Ef := \{x \mid Ef(x) < +\infty\} $$

as the effective domain of $Ef$. Points that do not belong to $\operatorname{dom} Ef$ cannot minimize $Ef$ and thus are effectively excluded from the optimization problem (3.1). Hence, the model makes specific provisions for the presence of constraints that may limit the choice of $x$. Note that by definition of the integral, we always have

$$ \operatorname{dom} Ef \subset \{x \mid f(x, \xi) < +\infty \text{ a.s.}\}. $$

An extended real-valued function $h: R^n \to \bar{R} = [-\infty, \infty]$ is said to be proper if $h > -\infty$ and not identically $+\infty$; it is lower semicontinuous (l.sc.) at $x$ if for any sequence $\{x^k\}_{k=1}^{\infty}$ converging to $x$

$$ \liminf_{k \to \infty} h(x^k) \ge h(x), $$

where the quantities involved could be $\infty$ or $-\infty$.

The extended real-valued function $f$ defined on $R^n \times \Xi$ is a random lower semicontinuous function if

$$ \text{for all } \xi \in \Xi, \ f(\cdot, \xi) \text{ is l.sc.,} \tag{3.3i} $$

$$ f \text{ is } \mathcal{B}^n \otimes \mathcal{A}\text{-measurable,} \tag{3.3ii} $$

where $\mathcal{B}^n$ is the Borel sigma-field on $R^n$. This concept, under the name of "normal integrand", was introduced by Rockafellar (1976), as a generalization of Carathéodory integrands, to handle problems in the Calculus of Variations and Optimal Control Theory. When dealing with problems of that type, as well as stochastic optimization problems such as (3.1), the traditional tools of functional analysis are no longer quite appropriate. The classical geometrical approach that associates functions with their graph must be abandoned in favor of a new geometrical viewpoint that associates functions with their "epigraphs" (or hypographs); for more about the motivation and the underlying principles of the epigraphical approach consult Rockafellar and Wets (1984). The epigraph of a function $h: R^n \to \bar{R}$ is the set

$$ \operatorname{epi} h = \{(x, \alpha) \in R^n \times R \mid h(x) \le \alpha\}. $$

Rockafellar (1976) shows that $f: R^n \times \Xi \to \bar{R}$ is a random l.sc. function if and only if

$$ \text{the multifunction } \xi \mapsto \operatorname{epi} f(\cdot, \xi) \text{ is nonempty, closed-valued,} \tag{3.4i} $$

$$ \text{the multifunction } \xi \mapsto \operatorname{epi} f(\cdot, \xi) \text{ is measurable;} \tag{3.4ii} $$

recall that a multifunction $\xi \mapsto \Gamma(\xi): \Xi \rightrightarrows R^{n+1}$ is measurable if for all closed sets $F \subset R^{n+1}$

$$ \Gamma^{-1}(F) := \{\xi \in \Xi \mid \Gamma(\xi) \cap F \ne \emptyset\} \in \mathcal{A}; $$

for further details about measurable multifunctions see Rockafellar (1976), Castaing and Valadier (1976), and the bibliography of Wagner (1977) supplemented by Ioffe (1978). We shall use repeatedly the following result due to Yankov, von Neumann, and Kuratowski and Ryll-Nardzewski.

PROPOSITION 3.1 Theorem of Measurable Selections. If $\Gamma: \Xi \rightrightarrows R^n$ is a closed-valued measurable multifunction, then there exists at least one measurable selector, i.e. a measurable function $x: \operatorname{dom} \Gamma \to R^n$ such that for all $\xi \in \operatorname{dom} \Gamma$, $x(\xi) \in \Gamma(\xi)$, where

$$ \operatorname{dom} \Gamma := \{\xi \in \Xi \mid \Gamma(\xi) \ne \emptyset\} = \Gamma^{-1}(R^n) \in \mathcal{A}. $$

For a proof see Rockafellar (1976), for example. As immediate consequences of the definition (3.3) of random l.sc. functions, the equivalence with the conditions (3.4) and the preceding proposition, we have:

PROPOSITION 3.2 Let $f: R^n \times \Xi \to \bar{R}$ be a random l.sc. function. Then for any $\mathcal{A}$-measurable function $x: \Xi \to R^n$, the function

$$ \xi \mapsto f(x(\xi), \xi) \text{ is } \mathcal{A}\text{-measurable.} $$

Moreover, the infimal function

$$ \xi \mapsto \inf f(\cdot, \xi) := \inf_{x \in R^n} f(x, \xi) $$

is $\mathcal{A}$-measurable, and the set of optimal solutions

$$ \xi \mapsto \operatorname{argmin} f(\cdot, \xi) := \{x \mid f(x, \xi) = \inf f(\cdot, \xi)\} $$

is a closed-valued measurable multifunction from $\Xi$ into $R^n$, and this implies that there exists a measurable function

$$ \xi \mapsto x^*(\xi): \operatorname{dom}(\operatorname{argmin} f(\cdot, \xi)) \to R^n $$

such that $x^*(\xi)$ minimizes $f(\cdot, \xi)$ whenever $\operatorname{argmin} f(\cdot, \xi) \ne \emptyset$.

For a succinct proof, see Section 3 of Rockafellar and Wets (1984).

If instead of $P$, we only have limited information available about $P$ (e.g. some knowledge about the shape of the distribution and a finite sample of values of $\xi$ or of a function of $\xi$), then to estimate $x^*$ we usually have to rely on the solution of an optimization problem that "approximates" (3.1), viz.

$$ \text{find } x^\nu \in R^n \text{ that minimizes } E^\nu f(x), \tag{3.5} $$

where

$$ E^\nu f(x) := E^\nu\{f(x, \xi)\} = \int_\Xi f(x, \xi)\, P^\nu(d\xi). $$

The measure $P^\nu$ is not necessarily the empirical measure, but more generally the "best" (in terms of a given criterion) approximate to $P$ on the basis of the information available. As more information is collected, we could refine the approximation to $P$ and hopefully find a better estimate of $x^*$. To model this process, we rely on the following set-up: let $(Z, F, \mu)$ be a sample space with $(F^\nu)_{\nu=1}^{\infty}$ an increasing sequence of sigma-fields contained in $F$. A sample $\zeta$, e.g. $\zeta = \{\xi^1, \xi^2, \dots\}$ obtained by independent sampling of the values of $\xi$, leads us to a sequence $\{P^\nu(\cdot, \zeta), \nu = 1, \dots\}$ of probability measures defined on $(\Xi, \mathcal{A})$. Since only the information collected up to stage $\nu$ can be used in the choice of $P^\nu$, we must also require that for all $A \in \mathcal{A}$

$$ \zeta \mapsto P^\nu(A, \zeta) \text{ is } F^\nu\text{-measurable.} $$

Since $P^\nu$ depends on $\zeta$, so does the approximate problem (3.5), in particular its solution $x^\nu$. A sequence of estimators $\{x^\nu, \nu = 1, \dots\}$ is (strongly) consistent if $\mu$-almost surely they converge to $x^*$; this, of course, implies weak consistency (convergence in probability).
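The notion of consistency can be illustrated by following one sample path and watching the estimates settle near the true solution. The sketch below is only a simulation under assumed ingredients that do not come from the paper: an exponential distribution with mean 1, the integrand $(x - \xi)^2$ with the constraint $x \ge 0$, empirical measures for $P^\nu$, and a fixed random seed. It demonstrates the behavior on one path and proves nothing.

```python
import random

def estimate(sample, lower=0.0):
    """Minimizer of the empirical mean of (x - xi)^2 subject to x >= lower:
    the sample mean projected onto the feasible half-line."""
    return max(lower, sum(sample) / len(sample))

rng = random.Random(0)                                  # fixed seed: one sample path
path = [rng.expovariate(1.0) for _ in range(10_000)]    # true mean is 1
x_star = 1.0                                            # solution under the true distribution

# Estimation error after the first nu observations of the same path.
errors = {nu: abs(estimate(path[:nu]) - x_star) for nu in (10, 100, 1_000, 10_000)}
```

Each entry of `errors` uses only the first $\nu$ observations, mirroring the requirement that the approximating measure at stage $\nu$ depends only on the information collected so far.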

The following results extend the classical Consistency Theorem of Wald (1940) and the extensions by Huber (1967) to the more general setting laid out above. Consistency is obtained by relying on assumptions that are weaker than those of Huber (1967) even in the unconstrained case. To do so, we rely on the theory of epi-convergence in conjunction with the theory of random sets (measurable multifunctions) and random l.sc. functions.

A sequence of functions $\{g^\nu : R^n \to \bar{R},\ \nu = 1, \ldots\}$ is said to epi-converge to $g : R^n \to \bar{R}$ if for all $x$ in $R^n$, we have

$$\liminf_{\nu \to \infty} g^\nu(x^\nu) \ge g(x) \ \text{ for all } \{x^\nu\}_{\nu=1}^\infty \text{ converging to } x, \quad (3.7)$$

and

$$\text{for some } \{x^\nu\}_{\nu=1}^\infty \text{ converging to } x, \quad \limsup_{\nu \to \infty} g^\nu(x^\nu) \le g(x). \quad (3.8)$$

Note that any one of these conditions implies that $g$ is lower semicontinuous. We then say that $g$ is the epi-limit of the $g^\nu$, and write $g = \text{epi-lim}_{\nu \to \infty}\, g^\nu$. We refer to this type of convergence as epi-convergence, since it is equivalent to the set-convergence of the epigraphs. For more about epi-convergence and its properties, consult Attouch (1984). Our interest in epi-convergence stems from the fact that from a variational viewpoint it is the weakest type of convergence that possesses the following properties:

PROPOSITION 3.3 [Attouch and Wets (1981), Salinetti and Wets (1986)]. Suppose $\{g;\ g^\nu : R^n \to \bar{R},\ \nu = 1, \ldots\}$ is a collection of functions such that $g = \text{epi-lim}_{\nu \to \infty}\, g^\nu$. Then

$$\limsup_{\nu \to \infty}\, (\inf g^\nu) \le \inf g, \quad (3.9)$$

and, if $x^k \in \operatorname{argmin} g^{\nu_k}$ for some subsequence $\{\nu_k,\ k = 1, \ldots\}$ and $x = \lim_{k \to \infty} x^k$, it follows that

$$x \in \operatorname{argmin} g, \quad \text{and} \quad \lim_{k \to \infty}\, (\inf g^{\nu_k}) = \inf g;$$

so in particular if there exists a bounded set $D \subset R^n$ such that for some subsequence $\{\nu_k,\ k = 1, \ldots\}$,

$$\operatorname{argmin} g^{\nu_k} \cap D \neq \emptyset,$$

then the minimum of $g$ is attained at some point in the closure of $D$.

Moreover, if $\operatorname{argmin} g \neq \emptyset$, then $\lim_{\nu \to \infty}\, (\inf g^\nu) = \inf g$ if and only if $x \in \operatorname{argmin} g$ implies the existence of sequences $\{\varepsilon_\nu \ge 0,\ \nu = 1, \ldots\}$ and $\{x^\nu \in R^n,\ \nu = 1, \ldots\}$ with

$$\lim_{\nu \to \infty} \varepsilon_\nu = 0, \quad \text{and} \quad \lim_{\nu \to \infty} x^\nu = x,$$

such that for all $\nu = 1, \ldots$

$$x^\nu \in \varepsilon_\nu\text{-}\operatorname{argmin} g^\nu := \{x \mid g^\nu(x) \le \varepsilon_\nu + \inf g^\nu\}.$$
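To see why epi-convergence, rather than pointwise convergence, is the notion that carries infima and minimizers to the limit, consider the following small sketch. The functions `g_nu` and `g_epi` are a textbook-style example chosen for illustration and do not come from the paper: $g^\nu(x) = \min\{0,\ \nu |x - 1/\nu| - 1\}$ converges pointwise to the zero function, whereas its epi-limit equals $-1$ at $x = 0$ and $0$ elsewhere.

```python
def g_nu(nu, x):
    # A narrow "dip" of depth 1 centred at 1/nu; zero away from the dip.
    return min(0.0, nu * abs(x - 1.0 / nu) - 1.0)


def g_epi(x):
    # Epi-limit of the g_nu: the dip collapses onto the point x = 0.
    return -1.0 if x == 0.0 else 0.0


def summary(nus=(1, 10, 100, 1000)):
    # For each nu: the minimizer 1/nu, the infimum g_nu(1/nu),
    # and the value at 0 (which tends to the pointwise limit 0).
    rows = []
    for nu in nus:
        x_nu = 1.0 / nu
        rows.append((nu, x_nu, g_nu(nu, x_nu), g_nu(nu, 0.0)))
    return rows


if __name__ == "__main__":
    for row in summary():
        print(row)
```

The minimizers $1/\nu$ converge to $0$, which minimizes the epi-limit, and $\inf g^\nu = -1 = \inf g$, in line with the proposition. The pointwise limit, being identically zero, would give the wrong infimum.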

The next theorem, which proves the $\mu$-almost sure epi-convergence of expectation functionals, is built upon approximation results for stochastic optimization problems, first derived in the case $f(\cdot, \xi)$ convex (Theorem 3.3, Wets (1984)), and later for the locally Lipschitz case (Theorem 2.8, Birge and Wets (1986)). We work with the following assumptions.

ASSUMPTION 3.4 "Continuities" of f. The function

$$f : R^n \times \Xi \to \bar{R}$$

with

$$\operatorname{dom} f := \{(x, \xi) \mid f(x, \xi) < \infty\} = S \times \Xi, \quad S \subset R^n \text{ closed and nonempty},$$

is such that for all $x \in S$, $\xi \mapsto f(x, \xi)$ is continuous on $\Xi$, and for all $\xi \in \Xi$, $x \mapsto f(x, \xi)$ is locally lower Lipschitz on $S$, in the following sense: to any $x$ in $S$, there corresponds a neighborhood $V$ of $x$ and a bounded continuous function $\beta : \Xi \to R$ such that for all $x' \in V \cap S$ and $\xi \in \Xi$,

$$f(x, \xi) - f(x', \xi) \le \beta(\xi)\, \|x - x'\|. \quad (3.10)$$

ASSUMPTION 3.5 Convergence in distribution. Given the sample space $(Z, \mathcal{F}, \mu)$ and an increasing sequence of sigma-fields $(\mathcal{F}^\nu)_{\nu=1}^\infty$ contained in $\mathcal{F}$, let

$$P^\nu : \mathcal{A} \times Z \to [0, 1], \quad \nu = 1, \ldots$$

be such that for all $\zeta \in Z$

$$P^\nu(\cdot, \zeta) \text{ is a probability measure on } (\Xi, \mathcal{A}),$$

and for all $A \in \mathcal{A}$

$$\zeta \mapsto P^\nu(A, \zeta) \ \text{ is } \mathcal{F}^\nu\text{-measurable}.$$

For $\mu$-almost all $\zeta$ in $Z$, the sequence

$$\{P^\nu(\cdot, \zeta),\ \nu = 1, \ldots\} \text{ converges in distribution to } P,$$

and with $P =: P^0(\cdot, \zeta)$, for all $x \in S$, the sequence $\{P^\nu(\cdot, \zeta)\}_{\nu=0}^\infty$ is $f(x, \cdot)$-tight (asymptotic negligibility), i.e. to every $x \in S$ and $\varepsilon > 0$ there corresponds a compact set $K_\varepsilon \subset \Xi$ such that for $\nu = 0, 1, \ldots$

$$\int_{\Xi \setminus K_\varepsilon} |f(x, \xi)|\, P^\nu(d\xi, \zeta) < \varepsilon, \quad (3.11)$$

and

The assumption that

$$\xi \mapsto \operatorname{dom} f(\cdot, \xi) := \{x \mid f(x, \xi) < \infty\} = S$$

is constant, which is satisfied by all the examples in Section 2, may appear more restrictive than it actually is. Indeed, it is easy to see that

$$\operatorname{dom} Ef = \bigcap_{\xi \in \Xi} \operatorname{dom} f(\cdot, \xi),$$

if $\Xi$ is the support of the measure $P$ and for all $x \in \bigcap_{\xi \in \Xi} \operatorname{dom} f(\cdot, \xi)$, the function $f(x, \cdot)$ is bounded above by a summable function. Then, with $S = \bigcap_{\xi \in \Xi} \operatorname{dom} f(\cdot, \xi)$ and

$$f^+(x, \xi) := \begin{cases} f(x, \xi) & \text{if } x \in S, \\ +\infty & \text{otherwise,} \end{cases}$$

we may as well work with $f^+$ instead of $f$, since

$$Ef^+ = Ef,$$

and now $\xi \mapsto \operatorname{dom} f^+(\cdot, \xi) = S$ is constant.
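The passage from $f$ to $f^+$ can be sketched in a few lines. The integrand below, the assumed support $[1, 2]$ of $\xi$ (which gives $S = (-\infty, 1]$), and the helper name `make_f_plus` are all hypothetical and serve only to show the construction.

```python
import math


def f(x, xi):
    # Hypothetical integrand whose domain depends on xi:
    # dom f(., xi) = (-inf, xi].
    return math.sqrt(xi - x) if x <= xi else math.inf


def make_f_plus(f, in_S):
    # Build f+ : equal to f on S, +infinity outside S.
    def f_plus(x, xi):
        return f(x, xi) if in_S(x) else math.inf
    return f_plus


# Assumption for this example: the support of xi is [1, 2], so the
# intersection of the domains dom f(., xi) is S = (-inf, 1].
f_plus = make_f_plus(f, lambda x: x <= 1.0)

if __name__ == "__main__":
    print(f(1.5, 2.0), f_plus(1.5, 2.0))   # finite vs. infinite
    print(f(0.0, 1.0), f_plus(0.0, 1.0))   # identical on S
```

For $x = 1.5$ the original integrand is finite for some $\xi$ but infinite on a set of positive probability, so the expectation is infinite either way; replacing $f$ by $f^+$ only makes the domain independent of $\xi$.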

Assumption 3.4 implies that $f$ is a random lower semicontinuous function (normal integrand). Indeed, for all $\xi \in \Xi$, $f(\cdot, \xi)$ is proper and lower semicontinuous (3.3.i) and $(x, \xi) \mapsto f(x, \xi)$ is $\mathcal{B}^n \otimes \mathcal{A}$-measurable (3.3.ii) since for all $\alpha \in R$,

$$\operatorname{lev}_\alpha f := \{(x, \xi) \mid f(x, \xi) \le \alpha\} \text{ is closed}.$$

To see this, suppose $\{(x^k, \xi^k)\}_{k=1}^\infty \subset \operatorname{lev}_\alpha f$ is a sequence converging to $(x, \xi)$; then from Assumption 3.4 we have that for $k$ sufficiently large, and all $\xi' \in \Xi$,

$$f(x, \xi') - f(x^k, \xi') \le \beta(\xi')\, \|x - x^k\|,$$

in particular

$$f(x, \xi^k) \le f(x^k, \xi^k) + \beta(\xi^k)\, \|x - x^k\| \le \alpha + B\, \|x - x^k\|,$$

where $B = \max_{\xi} \beta(\xi)$ is finite, since $\beta(\cdot)$ is bounded. Now $\xi \mapsto f(x, \xi)$ is continuous on $\Xi$, thus taking limits as $k$ goes to $\infty$, we obtain

$$f(x, \xi) \le \alpha + B \lim_{k \to \infty} \|x - x^k\| = \alpha,$$

i.e. $(x, \xi) \in \operatorname{lev}_\alpha f$. Since $f$ is a random l.sc. function, it follows from Proposition 3.2 that the function $\gamma$ appearing in condition (3.12) is measurable. Thus condition (3.12) does not sneak in another measurability condition; it requires simply that the measurable function $\gamma$ be quasi-integrable.

Huber (1967), as well as others, see e.g. Ibragimov and Has'minski (1981), assumes that $S$ is open. Since constraints usually do not involve strict inequalities, this is an unnatural restriction, except when there are no constraints, i.e. $S = R^n$, in which case $S$ is also closed. In any case, whatever optimality results one may be able to prove with $S$ open, they remain valid when $S$ is replaced by its closure, assuming minimal continuity properties for the expectation functionals, but the converse does not hold.

To simplify notation we shall, whenever it is convenient, drop the explicit reference to the dependence on $\zeta$ of the probability measures $P^\nu$ and the resulting expectation functionals $E^\nu f$; nonetheless the reader should always be aware that all $\mu$-a.s. statements refer to the underlying probability space $(Z, \mathcal{F}, \mu)$. We begin by showing that $Ef$, as well as the $E^\nu f$, are well-defined functions.

LEMMA 3.6 Under Assumptions 3.4 and 3.5, there exists $Z_0 \in \mathcal{F}$, $\mu(Z_0) = 1$, such that for all $\zeta \in Z_0$, $Ef$ and $\{E^\nu f,\ \nu = 1, \ldots\}$ are proper lower semicontinuous functions such that

$$S = \operatorname{dom} Ef = \operatorname{dom} E^\nu f(\cdot, \zeta),$$

on which the expectation functionals are finite.

PROOF Let us first fix $\zeta$, and assume that for this $\zeta$ all the conditions of Assumption 3.5 are satisfied. If $x \notin S$, then $f(x, \xi) = \infty$ for all $\xi$ in $\Xi$ and hence $Ef(x) = E^\nu f(x) = \infty$, i.e.,

$$S \supset \operatorname{dom} Ef, \quad S \supset \operatorname{dom} E^\nu f.$$

With $P^0 = P$, for $x \in S$ and any $\varepsilon > 0$, there is a compact set $K_\varepsilon$ (Assumption 3.5) such that

$$\int_\Xi |f(x, \xi)|\, P^\nu(d\xi) \le \int_{K_\varepsilon} |f(x, \xi)|\, P^\nu(d\xi) + \varepsilon \le \max_{\xi \in K_\varepsilon} |f(x, \xi)| + \varepsilon < \infty,$$

as follows from (3.11) and the fact that $f(x, \cdot)$ is continuous and finite on $K_\varepsilon \subset \Xi$. Thus $E^\nu f(x) < \infty$.

The fact that $Ef > -\infty$ and $E^\nu f > -\infty$ follows directly from condition (3.12). It is also this condition that we use to show that the expectation functionals are lower semicontinuous, since it allows us to appeal to Fatou's Lemma to obtain: given $\{x^\nu\}_{\nu=1}^\infty$ a sequence converging to $x$,

$$\liminf_{\nu \to \infty} Ef(x^\nu) \ge \int \liminf_{\nu \to \infty} f(x^\nu, \xi)\, P(d\xi) \ge \int f(x, \xi)\, P(d\xi) = Ef(x),$$

where the last inequality follows from the lower semicontinuity of $f(\cdot, \xi)$ at $x$. Of course, the same string of inequalities holds for all $\{P^\nu,\ \nu = 1, \ldots\}$.

Since the above holds for every $\nu$ $\mu$-almost surely on $Z$, the set

$$Z_0 = \{\zeta \in Z \mid E^\nu f(\cdot, \zeta) \text{ is finite, l.sc. on } S, \text{ for } \nu = 0, 1, \ldots\}$$

is of measure 1. □

THEOREM 3.7 Suppose $\{E^\nu f,\ \nu = 1, \ldots\}$ is a sequence of expectation functionals defined by

$$E^\nu f(x, \zeta) := \int_\Xi f(x, \xi)\, P^\nu(d\xi, \zeta)$$

and $Ef(x) = E\{f(x, \xi)\}$, such that $f$ and the collection $\{P;\ P^\nu,\ \nu = 1, \ldots\}$ satisfy Assumptions 3.4 and 3.5. Then, $\mu$-almost surely

$$Ef = \text{epi-lim}_{\nu \to \infty}\, E^\nu f = \text{ptwse-lim}_{\nu \to \infty}\, E^\nu f,$$

where $\text{ptwse-lim}_{\nu \to \infty}\, E^\nu f$ denotes the pointwise limit.

PROOF The argument essentially follows that of Theorem 2.8 of Birge and Wets (1986), with minor modifications to take care of the slightly weaker assumptions and the fact that the expectation functionals depend on $\zeta$.

We begin by showing that $\mu$-almost surely $Ef$ is the pointwise limit of the $E^\nu f$. We fix $\zeta \in Z$, and assume that the conditions of Assumption 3.5 are satisfied for this particular $\zeta$.

Suppose $x \in S$, and set

$$h(\xi) := f(x, \xi).$$

From condition (3.11), it follows that for all $\varepsilon > 0$, there is a compact set $K_\varepsilon$ such that for all $\nu$

$$\int_{\Xi \setminus K_\varepsilon} |h(\xi)|\, P^\nu(d\xi) < \varepsilon.$$

Let $\gamma_\varepsilon := \max_{\xi \in K_\varepsilon} |h(\xi)|$. We know that $\gamma_\varepsilon$ is finite since $K_\varepsilon$ is compact and $h$ is continuous on $\Xi$ (Assumption 3.4). Let $h^\varepsilon$ be a truncation of $h$, defined by

$$h^\varepsilon(\xi) = \begin{cases} h(\xi) & \text{if } |h(\xi)| \le \gamma_\varepsilon, \\ \gamma_\varepsilon & \text{if } h(\xi) > \gamma_\varepsilon, \\ -\gamma_\varepsilon & \text{if } h(\xi) < -\gamma_\varepsilon. \end{cases}$$

The function $h^\varepsilon$ is bounded and continuous, and for all $\xi$ in $\Xi$

$$|h^\varepsilon(\xi)| \le |h(\xi)|.$$

Now, from the convergence in distribution of the $P^\nu$,

$$\lim_{\nu \to \infty} \Big[ a_\nu^\varepsilon := \int_\Xi h^\varepsilon(\xi)\, P^\nu(d\xi) \Big] = \int_\Xi h^\varepsilon(\xi)\, P(d\xi) =: a^\varepsilon.$$
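The truncation step of the proof is easy to mimic numerically. In the sketch below, the choice $h(\xi) = \xi^3$, the compact set $K_\varepsilon = [-2, 2]$ (hence $\gamma_\varepsilon = 8$), and the helper names are assumptions made purely for illustration.

```python
def truncate(h, gamma):
    # Return h^eps: the values of h clipped to the interval [-gamma, gamma].
    def h_eps(xi):
        return max(-gamma, min(gamma, h(xi)))
    return h_eps


def h(xi):
    # Hypothetical continuous but unbounded function xi -> f(x, xi).
    return xi ** 3


# With K_eps = [-2, 2] the bound is gamma_eps = max |h| on K_eps = 8.
h_eps = truncate(h, 8.0)


def integral(g, sample):
    # Integral of g with respect to the empirical measure of `sample`.
    return sum(g(xi) for xi in sample) / len(sample)


if __name__ == "__main__":
    sample = [-3.0, -1.0, 0.5, 2.0, 4.0]
    print(integral(h, sample), integral(h_eps, sample))
```

The truncated function agrees with $h$ on $K_\varepsilon$, is bounded and continuous, and never exceeds $|h|$ in absolute value; these are the only properties the proof uses when it passes to the limit under convergence in distribution.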

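Moreover, for all $\nu$

$$\int_{\Xi \setminus K_\varepsilon} |h^\varepsilon(\xi)|\, P^\nu(d\xi) \le \int_{\Xi \setminus K_\varepsilon} |h(\xi)|\, P^\nu(d\xi) < \varepsilon.$$

Now, let

$$a_\nu := E^\nu f(x) = \int_\Xi h(\xi)\, P^\nu(d\xi).$$

We have that for all $\nu$

$$|a_\nu - a_\nu^\varepsilon| = \Big| \int_{\Xi \setminus K_\varepsilon} \big(h(\xi) - h^\varepsilon(\xi)\big)\, P^\nu(d\xi) \Big| < 2\varepsilon,$$

and also

$$|Ef(x) - a^\varepsilon| < 2\varepsilon.$$

These two last estimates, when used in conjunction with (3.13), yield: for all $\varepsilon > 0$ and all $\nu$ sufficiently large,

$$|Ef(x) - a_\nu| < 6\varepsilon.$$

Thus for all $x$ in $S$

$$Ef(x) = \lim_{\nu \to \infty} E^\nu f(x) = \lim_{\nu \to \infty} a_\nu,$$

and since, by Lemma 3.6, $S = \operatorname{dom} Ef = \operatorname{dom} E^\nu f$, it means that $Ef = \text{ptwse-lim}_{\nu \to \infty}\, E^\nu f$, and that condition (3.8) of epi-convergence is satisfied, since we can choose $\{x^\nu = x\}$ for the sequence converging to $x$.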

There remains to verify condition (3.7) of epi-convergence. If $x \notin S$, then for every sequence $\{x^\nu\}_{\nu=1}^\infty$ converging to $x$, since $S$ is closed we have that $x^\nu \notin S$ for $\nu$ sufficiently large and hence $E^\nu f(x^\nu) = \infty$, which implies that

$$\liminf_{\nu \to \infty} E^\nu f(x^\nu) = \infty \ge Ef(x) = \infty.$$

If $x \in S$, and $\{x^\nu\}_{\nu=1}^\infty$ is a sequence converging to $x$, unless $x^\nu$ is in $S$ infinitely often, $\liminf_{\nu \to \infty} E^\nu f(x^\nu) = \infty$, and then condition (3.7) is trivially satisfied. So let us assume that $\{x^\nu\}_{\nu=1}^\infty \subset S$. For $\nu$ sufficiently large, from (3.10) it follows that there is a bounded continuous function $\beta$ such that

$$f(x, \xi) - \beta(\xi)\, \|x - x^\nu\| \le f(x^\nu, \xi).$$

Integrating both sides with respect to $P^\nu$, and taking $\liminf_{\nu \to \infty}$, we obtain

$$\lim_{\nu \to \infty} E^\nu f(x) - \lim_{\nu \to \infty} \beta^\nu\, \|x - x^\nu\| \le \liminf_{\nu \to \infty} E^\nu f(x^\nu),$$

where the $\beta^\nu = \int \beta(\xi)\, P^\nu(d\xi)$ converge to a finite limit since the $P^\nu$ converge in distribution to $P$, and by pointwise convergence of the $E^\nu f$ this yields

$$Ef(x) \le \liminf_{\nu \to \infty} E^\nu f(x^\nu). \quad \square$$

To apply Propositions 3.2 and 3.3 in this context, we must show that the expectation functionals $\{E^\nu f,\ \nu = 1, \ldots\}$ are random l.sc. functions.

THEOREM 3.8 Under Assumptions 3.4 and 3.5, the expectation functionals $E^\nu f : R^n \times Z \to \bar{R}$, for $\nu = 1, \ldots$, are $\mu$-almost surely random lower semicontinuous functions, such that $\zeta \mapsto \operatorname{epi} E^\nu f(\cdot, \zeta)$ is $\mathcal{F}^\nu$-measurable.

PROOF Lemma 3.6 shows that there exists a set $Z_0 \subset Z$ of $\mu$-measure 1 such that for all $\zeta \in Z_0$, the multifunction

$$\zeta \mapsto \operatorname{epi} E^\nu f(\cdot, \zeta) : Z_0 \rightrightarrows R^{n+1} \ \text{ is nonempty, closed-valued}.$$

This is condition (3.4.i), thus there remains only to establish (3.4.ii), i.e.

$$\zeta \mapsto \operatorname{epi} E^\nu f(\cdot, \zeta) \ \text{ is } \mathcal{F}^\nu\text{-measurable}$$

for $\nu = 1, \ldots$. Theorem 3.7 proves that with respect to the topology of convergence in distribution, the map

$$P^\nu \mapsto \operatorname{epi} E^\nu f \ \text{ is continuous}.$$

Moreover, since $\zeta \mapsto P^\nu(A, \zeta)$ is $\mathcal{F}^\nu$-measurable for all $A \in \mathcal{A}$, it means that given any finite collection of closed sets $\{F_i \subset \Xi\}_{i=1}^I$ and scalars $\{\beta_i\}_{i=1}^I \subset [0, 1]$, the set

$$\{\zeta \in Z \mid P^\nu(F_i, \zeta) < \beta_i,\ i = 1, \ldots, I\} \in \mathcal{F}^\nu,$$

which means that the function

$$\zeta \mapsto P^\nu(\cdot, \zeta) : Z \to \mathcal{P} := \{\text{probability measures on } (\Xi, \mathcal{A})\}$$

is $\mathcal{F}^\nu$-measurable. To see this, observe that the "convergence in distribution"-topology can be obtained from the base of open sets

$$\{Q \in \mathcal{P} \mid Q(F_i) < \beta_i,\ i = 1, \ldots, I\},$$

see Billingsley (1968), that also generate the Borel field on $\mathcal{P}$. Thus

$$\zeta \mapsto \operatorname{epi} E^\nu f(\cdot, \zeta)$$

is the composition of a continuous function and an $\mathcal{F}^\nu$-measurable function, and hence is $\mathcal{F}^\nu$-measurable. □

In the proof of Theorem 3.8, we have used the continuity of the map $P^\nu \mapsto \operatorname{epi} E^\nu f$; in fact Theorem 3.7 only proves epi-convergence, without introducing explicitly the epi-topology for the space of lower semicontinuous functions. The fact that epi-convergence induces a topology on the space of l.sc. functions is well established, see for example Dolecki, Salinetti and Wets (1983) and Attouch (1984), and thus with this proviso, Theorem 3.7 proves the epi-continuity of the map $P^\nu \mapsto \operatorname{epi} E^\nu f$.

THEOREM 3.9 Consistency. Under Assumptions 3.4 and 3.5 we have that $\mu$-almost surely

$$\limsup_{\nu \to \infty}\, (\inf E^\nu f) \le \inf Ef. \quad (3.14)$$

Moreover, there exists $Z_0 \in \mathcal{F}$ with $\mu(Z \setminus Z_0) = 0$, such that

(i) for all $\zeta \in Z_0$, any cluster point $\hat{x}$ of any sequence $\{x^\nu,\ \nu = 1, \ldots\}$ with $x^\nu \in \operatorname{argmin} E^\nu f(\cdot, \zeta)$ belongs to $\operatorname{argmin} Ef$ (i.e. is an optimal estimate),

(ii) for $\nu = 1, \ldots$,

$$\zeta \mapsto \operatorname{argmin} E^\nu f(\cdot, \zeta) : Z_0 \rightrightarrows R^n$$

is a closed-valued $\mathcal{F}^\nu$-measurable multifunction.

In particular, if there is a compact set $D \subset R^n$ such that for $\nu = 1, \ldots$

$$(\operatorname{argmin} E^\nu f) \cap D \ \text{ is nonempty } \mu\text{-a.s.},$$

and

$$\{x^*\} = \operatorname{argmin} Ef \cap D,$$

then there exist $\{x^\nu : Z_0 \to R^n\}_{\nu=1}^\infty$, $\mathcal{F}^\nu$-measurable selections of $\{\operatorname{argmin} E^\nu f\}_{\nu=1}^\infty$, such that

$$x^* = \lim_{\nu \to \infty} x^\nu(\zeta) \ \text{ for } \mu\text{-almost all } \zeta,$$

and also

$$\inf Ef = \lim_{\nu \to \infty}\, (\inf E^\nu f) \quad \mu\text{-a.s.}$$
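A small simulation can make the statement concrete in a constrained case where the optimum sits on the boundary of the closed set $S$. Everything in this sketch is an assumption made for illustration: the integrand $f(x, \xi) = (x - \xi)^2$, the constraint set $S = [1, \infty)$, standard normal $\xi$ (so that $x^* = 1$ and $\inf Ef = 2$), empirical measures for the $P^\nu$, and the function names.

```python
import random


def f(x, xi):
    # Hypothetical integrand: squared deviation.
    return (x - xi) ** 2


def solve_stage(sample, lower=1.0):
    # Minimize the empirical expectation over S = [lower, +inf).
    # For this convex quadratic the constrained minimizer is the
    # projection of the sample mean onto S.
    mean = sum(sample) / len(sample)
    x_nu = max(lower, mean)
    value = sum(f(x_nu, xi) for xi in sample) / len(sample)
    return x_nu, value


def run(sizes=(50, 500, 5000), seed=1):
    # Nested samples: stage nu only uses the first nu observations.
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(max(sizes))]
    return [(n,) + solve_stage(draws[:n]) for n in sizes]


if __name__ == "__main__":
    for n, x_nu, value in run():
        print(n, x_nu, round(value, 4))
```

Here the estimates $x^\nu$ stay at the boundary point $x^* = 1$ and the optimal values approach $\inf Ef = E(1 - \xi)^2 = 2$. An open constraint set would exclude this boundary solution, which is one reason the paper works with closed $S$.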

PROOF The inequality (3.14) immediately follows from (3.9) and the $\mu$-almost sure epi-convergence of the expectation functionals $E^\nu f$ to $Ef$ (Theorem 3.7), as does the assertion (i) about cluster points of optimal solutions (Proposition 3.3). The fact that $(\operatorname{argmin} E^\nu f)$ is a closed-valued $\mathcal{F}^\nu$-measurable multifunction follows from Theorem 3.8 and Proposition 3.2.

Now suppose $Z_0 \subset Z$ is such that $\mu(Z_0) = 1$, for all $\zeta \in Z_0$, $Ef = \text{epi-lim}_{\nu \to \infty}\, E^\nu f$, and for all $\nu = 1, \ldots$, $(\operatorname{argmin} E^\nu f) \cap D$ is nonempty. For all $\nu$, the multifunction

$$\zeta \mapsto (\operatorname{argmin} E^\nu f(\cdot, \zeta) \cap D) : Z_0 \rightrightarrows R^n$$

is nonempty compact-valued, and $\mathcal{F}^\nu$-measurable; it is the intersection of two closed-valued measurable multifunctions, see Rockafellar (1976). Now for any $\zeta \in Z_0$, let $\{\hat{x}^\nu\}_{\nu=1}^\infty$ be any sequence in $R^n$ such that for all $\nu$,

$$\hat{x}^\nu(\zeta) \in \operatorname{argmin} E^\nu f(\cdot, \zeta) \cap D.$$

Then, any cluster point of the sequence is in $D$, since it is compact, and in $\operatorname{argmin} Ef$ as follows from Proposition 3.3. Actually, $x^* = \lim_{\nu \to \infty} \hat{x}^\nu$. To see this note that, if $x^*$ is not the limit point of the sequence, there exists a subsequence $\{\nu_k\}_{k=1}^\infty$ such that for some $\delta > 0$, and all $k = 1, \ldots$,

$$\hat{x}^{\nu_k} \in \operatorname{argmin} E^{\nu_k} f \cap D, \quad \text{and} \quad \|x^* - \hat{x}^{\nu_k}\| > \delta,$$

but this is contradicted by the fact that this subsequence, included in $D$, contains a further subsequence that is convergent, and whose limit must then belong to $\operatorname{argmin} Ef \cap D = \{x^*\}$.

Now, for $\nu = 1, \ldots$, let $x^\nu : Z \to R^n$ be an $\mathcal{F}^\nu$-measurable selection of the $\mathcal{F}^\nu$-measurable multifunction $\zeta \mapsto (\operatorname{argmin} E^\nu f(\cdot, \zeta) \cap D)$, cf. Proposition 3.1. By the preceding argument, for all $\zeta \in Z_0$, where $\mu(Z_0) = 1$,

$$x^* = \lim_{\nu \to \infty} x^\nu(\zeta),$$

and from Proposition 3.3, it then also follows that

$$\lim_{\nu \to \infty}\, (\inf E^\nu f(\cdot, \zeta)) = \inf Ef = Ef(x^*)$$

for all $\zeta \in Z_0$. □

It should be noted that, contrary to earlier work (see Wald (1940), Huber (1967)), we do not assume the uniqueness of the optimal solutions; at least in the case of the stochastic programming model introduced in Section 2, this would not be a natural assumption. Also, let us observe that we have not given here the most general possible version of the Consistency Theorem that could be obtained by relying on the tools introduced here. There are conditions that are necessary and sufficient for the convergence of infima (see Salinetti and Wets (1986), Robinson (1985)) that could be used here in conjunction with convergence results for measurable selections (Salinetti and Wets (1981)) to yield a slightly sharper theorem, but the conditions would be much harder to verify, and would be of very limited interest in this context. Also, since epi-convergence is of local character, we could reword our statements to obtain "local" consistency by restricting our attention to a neighborhood of some $x^*$ in $\operatorname{argmin} Ef$.

We conclude with an existence result. A function $h : R^n \to \bar{R}$ is inf-compact if for all $\alpha \in R$

$$\operatorname{lev}_\alpha h := \{x \in R^n \mid h(x) \le \alpha\} \ \text{ is compact}.$$

If $h$ is proper ($h > -\infty$, $\operatorname{dom} h \neq \emptyset$) and inf-compact, then $(\inf h)$ is finite and attained for some $x \in R^n$. For example, if $h = g + \psi_S$, where $g$ is continuous and $\psi_S$ is the indicator function of the nonempty compact set $S$ ($\psi_S(x) = 0$ if $x \in S$, and $\infty$ otherwise), then $h$ is inf-compact. Another sufficient condition is to have $g$ coercive. Inf-compactness is the most general condition that is verifiable under which existence can be established. The next proposition generalizes results of Wets (1973) and Hiriart-Urruty (1976). Essentially, we assume that $f(\cdot, \xi)$ is inf-compact with positive probability.
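The example $h = g + \psi_S$ can be checked on a grid. In this sketch the function $g(x) = -x$ (continuous but unbounded below on $R$), the compact set $S = [0, 2]$, and the helper names are assumptions chosen for illustration only.

```python
import math


def indicator(lo, hi):
    # psi_S for the compact interval S = [lo, hi]: 0 on S, +inf elsewhere.
    return lambda x: 0.0 if lo <= x <= hi else math.inf


def make_h(g, psi):
    # h = g + psi_S.
    return lambda x: g(x) + psi(x)


def level_set_on_grid(h, alpha, grid):
    # Grid points belonging to lev_alpha h = {x : h(x) <= alpha}.
    return [x for x in grid if h(x) <= alpha]


# g is unbounded below on R, yet h = g + psi_S is inf-compact.
h = make_h(lambda x: -x, indicator(0.0, 2.0))
grid = [i / 10 for i in range(-50, 51)]

if __name__ == "__main__":
    lev = level_set_on_grid(h, 0.0, grid)
    print(min(lev), max(lev))      # level set stays inside S
    print(min(grid, key=h))        # minimizer of h on the grid
```

Every level set of $h$ is contained in $S$, hence bounded, and the minimum is attained (here at $x = 2$) even though $g$ alone has no minimizer on $R$.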

PROPOSITION 3.10 Under Assumptions 3.4 and 3.5, suppose that there exists $A \in \mathcal{A}$ with $P(A) > 0$ (resp. $P^\nu(A) > 0$) such that for all $\alpha \in R$, the set

$$\operatorname{lev}_\alpha f \cap (R^n \times A) \ \text{ is bounded}.$$

Then $Ef$ is inf-compact (resp. $E^\nu f$ is $\mu$-a.s. inf-compact).

PROOF It clearly suffices to prove the proposition for $P$; the same argument applies for all $P^\nu$ $\mu$-a.s. Let

$$\gamma(\xi) := \inf\,\{0,\ \inf_{x \in R^n} f(x, \xi)\}.$$

The function $\gamma$ is measurable (Proposition 3.2) and $P$-summable, see (3.12). The function $f'$, defined by

$$f'(x, \xi) := f(x, \xi) - \gamma(\xi),$$

is then nonnegative. Moreover $f' \ge f$ and thus

$$Ef' \ge Ef.$$

Set $\alpha_1 := \alpha / P(A)$ and let $A_1$ be the projection on $R^n$ of $\operatorname{lev}_{\alpha_1} f' \cap (R^n \times A)$. Then if $x \notin A_1$ and $\xi \in A$

$$f'(x, \xi) > \alpha_1,$$

and since $f'$ is nonnegative, with $\bar{\gamma} = E\{\gamma(\xi)\}$,

$$Ef(x) - \bar{\gamma} = Ef'(x) \ge \int_A f'(x, \xi)\, P(d\xi) > \alpha_1 P(A) = \alpha.$$

Hence $\operatorname{lev}_{\alpha + \bar{\gamma}} Ef \subset A_1$, a bounded set. To complete the proof it suffices to observe that from Lemma 3.6 we know that $\operatorname{lev}_\alpha Ef$ is closed since $Ef$ is lower semicontinuous, and this with the above implies that $\operatorname{lev}_{\alpha + \bar{\gamma}} Ef$ is compact for all $\alpha \in R$. □
