Asymptotic Behavior of Statistical Estimators and of Optimal Solutions of Stochastic Optimization Problems, II


NOT FOR QUOTATION WITHOUT THE PERMISSION OF THE AUTHORS


Jitka Dupačová, Roger J.-B. Wets*

*Supported in part by a grant of the National Science Foundation.

Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute or of its National Member Organizations.

INTERNATIONAL INSTITUTE FOR APPLIED SYSTEMS ANALYSIS A-2361 Laxenburg, Austria


FOREWORD

This paper supplements the results of a new statistical approach to the problem of incomplete information in stochastic programming. The tools of nondifferentiable optimization used here help to prove the consistency and asymptotic normality of (approximate) optimal solutions without unnatural smoothness assumptions. This allows the theory to take into account the presence of constraints.

Alexander B. Kurzhanski, Chairman, System and Decision Sciences Program

ASYMPTOTIC BEHAVIOR OF STATISTICAL ESTIMATORS AND OF OPTIMAL SOLUTIONS OF STOCHASTIC OPTIMIZATION PROBLEMS, II

Jitka Dupačová¹ and Roger J.-B. Wets²

¹Mathematical Statistics, Charles University, Prague
²Mathematics, University of California, Davis

INTRODUCTION

These results complement those of Dupačová and Wets (1986). We use the same notation and identical set-up; the reader is thus referred to that article, where the definitions and the consistency results can be found. We even continue the numbering of sections and equations, so we start with Section 4.

4 ASYMPTOTICS. CONVERGENCE RATES

In Section 3 of Dupačová and Wets (1986) we exhibited sufficient conditions for the convergence with probability 1 of the estimators $\{x^\nu : Z \to R^n,\ \nu = 1, \ldots\}$ to $x^*$, the optimal solution of the limit problem. Here we go one step further and analyze the rate of convergence in probabilistic terms. The argumentation is related to that of Huber (1967), adapted to fit the more general class of problems under consideration; this was already the pattern followed by Solis and Wets (1981) in the unconstrained case and by Dupačová (1983a, 1983b, 1984) for stochastic programs with recourse under special assumptions. We extend the results of Huber (1967) in a number of directions: (i) we allow for constraints, (ii) the probability measures converging to $P$ are not necessarily the empirical measures, and (iii) there are no differentiability assumptions on the likelihood (criterion) function (in terms of Huber's set-up, this would correspond to the case when his function $\psi$ is not uniquely determined, see Section 3 of Huber (1967)).

One way to look at the results of this section is to view them as providing limiting conditions under which one may be able to obtain asymptotic normality. Note that when there are constraints, one should usually not expect the asymptotic distribution to be Gaussian. This, in turn, allows us to obtain certain probabilistic estimates for the convergence "rates". To approximate the distribution of $x^\nu$, to obtain confidence intervals for example, we need an assertion that a suitably normalized sequence converges in distribution to a nondegenerate random vector. The normalizing coefficients need not be unique but they suggest a rate of convergence. Following Lehmann (1983) we shall say that the sequence $x^\nu - x^*$ goes to 0 with the rate of convergence $1/k_\nu$, if $k_\nu \to \infty$ as $\nu \to \infty$ and if there is a continuous distribution function $H$ such that

We begin with a quick review of the main definitions and results that provide us with a good notion of subgradients for not necessarily differentiable functions.

Any assumption of differentiability of $f(\cdot, \xi)$ would be inappropriate and would for one reason or another eliminate from the domain of applicability all the examples mentioned in Section 2. To handle the lack of differentiability, we rely on the theory of subdifferentiability developed to handle nonsmooth functions, see Clarke (1983), Rockafellar (1983), Aubin and Ekeland (1984).

The contingent derivative of a lower semicontinuous function $h : R^n \to (-\infty, +\infty]$ at $x$, a point at which $h$ is finite, with respect to the direction $y$ is

$$h'(x; y) := \operatorname{epi-lim\,inf}_{t \downarrow 0} \frac{h(x + ty) - h(x)}{t}$$

using the convention $\infty - \infty = \infty$. It is not difficult to see that $h'$ is always well defined with values in the extended reals. If $x \notin \operatorname{dom} h$, then $h'(x; \cdot) \equiv \infty$, otherwise

$$h'(x; y) = \liminf_{y' \to y,\ t \downarrow 0} \frac{h(x + ty') - h(x)}{t}.$$

The (upper) epi-derivative of $h$ at $x$, where $h$ is finite, in direction $y$, is the epi-limit superior of the collection $\{h'(x'; \cdot),\ x' \in R^n\}$ at $x$, i.e.

$$h^{\uparrow}(x; \cdot) := \operatorname{epi-lim\,sup}_{x' \to x} h'(x'; \cdot),$$

$$h^{\uparrow}(x; y) = \inf_{\{x' \to x\}} \limsup_{\{y' \to y\}} h'(x'; y'),$$

where by writing $\{x' \to x\}$ and $\{y' \to y\}$ we mean that the infimum must be taken with respect to all nets, or equivalently here sequences, converging to $x$ and $y$, see Aubin and Ekeland (1984), Chapter 7, Section 3.

It is remarkable that if $h$ is proper, and $x \in \operatorname{dom} h$, the function $y \mapsto h^{\uparrow}(x; y)$ is sublinear and l.sc. [Theorems 1 and 2, Rockafellar (1980)]. Moreover, if $h$ is Lipschitzian around $x$, then $h^{\uparrow}(x; \cdot)$ is everywhere finite (and hence continuous); in particular if $h$ is continuously differentiable at $x$ then $h^{\uparrow}(x; y)$ is the directional derivative of $h$ in direction $y$, and if $h$ is convex in a neighborhood of $x$, then

$$h^{\uparrow}(x; y) = \lim_{t \downarrow 0} \frac{h(x + ty) - h(x)}{t}$$

is the one-sided directional derivative in direction $y$. The sublinearity and lower semicontinuity of $h^{\uparrow}(x; \cdot)$ make it possible to define the notion of a subgradient of $h$ at $x$, by exploiting the fact that there is a one-to-one correspondence between the proper lower semicontinuous sublinear functions $g$ and the nonempty closed convex subsets $C$ of $R^n$, given by

$$g(y) = \sup_{v \in C} v \cdot y \quad \text{for all } y \in R^n, \quad \text{and}$$

$$C = \{v \in R^n \mid v \cdot y \le g(y) \text{ for all } y \in R^n\},$$

see Rockafellar (1970). Assuming that $h^{\uparrow}(x; \cdot)$ is proper, let $\partial h(x)$ be the nonempty closed convex set such that for all $y$,

$$h^{\uparrow}(x; y) = \sup_{v \in \partial h(x)} v \cdot y.$$

Every vector $v \in \partial h(x)$ is a subgradient of $h$ at $x$. If $h$ is smooth (continuously differentiable) then

$$\partial h(x) = \{\nabla h(x), \text{ the gradient of } h \text{ at } x\};$$

if $h$ is convex, then

$$\partial h(x) = \{v \mid h(z) \ge h(x) + v \cdot (z - x) \text{ for all } z \in R^n\}$$

is the usual definition of the subgradients of a convex function. More generally if $h$ is locally Lipschitz at $x$, then

$$\partial h(x) = \operatorname{co}\{v = \lim_{x' \to x} \nabla h(x') \mid h \text{ is smooth at } x'\}.$$
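As a small numerical illustration of the one-sided directional derivative and of the support-function correspondence between the epi-derivative and the set of subgradients, the following Python sketch treats convex functions of one real variable. The function names, the step size `t` and the two test functions are our own assumptions for illustration; the difference quotient only approximates the limit as $t \downarrow 0$.

```python
def one_sided_derivative(h, x, y, t=1e-6):
    """Forward difference quotient (h(x + t*y) - h(x)) / t for a small t > 0."""
    return (h(x + t * y) - h(x)) / t


def subdifferential_interval(h, x, t=1e-6):
    """Approximate end points [min v, max v] of the subgradient set of a convex h on R.

    In one dimension the support function evaluated at y = +1 gives the largest
    subgradient, and at y = -1 it gives minus the smallest subgradient.
    """
    upper = one_sided_derivative(h, x, 1.0, t)
    lower = -one_sided_derivative(h, x, -1.0, t)
    return lower, upper


if __name__ == "__main__":
    # Kink of |x| at 0: the interval is close to [-1, 1].
    print(subdifferential_interval(abs, 0.0))
    # Smooth case x^2 at 1: the interval collapses around the gradient 2.
    print(subdifferential_interval(lambda x: x * x, 1.0))
```

For the absolute value at the origin the two end points differ, which reflects the lack of differentiability; for a smooth function they coincide up to the discretization error.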

For the proofs of these preceding assertions and further details, consult Rockafellar (1981) and Aubin and Ekeland (1984).

Before we return to the problem at hand, we state the results about the additivity of subgradients that are relevant to our analysis. We begin with a general result that shows that the derivatives and subgradient functions of the random l.sc. function $f$ and the expectation functionals $E^\nu f$ and $Ef$ have the appropriate measurability properties.

THEOREM 4.1 Suppose $h : R^n \times \Xi \to \bar{R}$ is a random lower semicontinuous function. Then, so are its contingent derivative and its (upper) epi-derivative. Moreover, for all $x \in R^n$, $\xi \mapsto \partial h(x, \xi)$ is a random closed convex set.

PROOF A theorem of Salinetti and Wets (1981) tells us that the lim sup and lim inf of sequences of random closed sets (closed-valued measurable multifunctions) are random closed sets. Since the epigraphs of the epi-lim sup and epi-lim inf are respectively the lim inf and lim sup of the corresponding sequence of epigraphs (see for example, Section 2 of Dolecki, Salinetti and Wets (1983)), the assertion about the derivatives follows from their definitions and property (3.4) of random lower semicontinuous functions. Since $h^{\uparrow}(x; \cdot, \xi)$ is sublinear, it follows that its conjugate (another random l.sc. function, Rockafellar (1976)) is the indicator of the random closed convex set $\xi \mapsto \partial h(x, \xi)$.

Our interest in subdifferential theory is conditioned by the fact that for a very large class of functions (with values in the extended reals), we can characterize optimality in terms of a differential inclusion: a point $x^0$ that minimizes the proper l.sc. function $h$ on $R^n$ must necessarily satisfy

$$0 \in \partial h(x^0);$$

if $h$ is convex this is also a sufficient condition. There is a subdifferential calculus, but for our purposes the following results about the subdifferentials of sums of l.sc. functions are all we need. We say that a l.sc. function $h$ is subdifferentially regular at $x$ if $h'(x; \cdot) = h^{\uparrow}(x; \cdot)$. If $h$ is convex or subsmooth on a neighborhood of $x$, thus in particular if $h$ is $C^1$ at $x$, it is subdifferentially regular at $x$; $h$ is subsmooth on a neighborhood $V$ of $x$ if for all $y \in V$

$$h(y) = \max_{t \in T} \varphi_t(y),$$

where $T$ is a compact topological space, each $\varphi_t$ is of class $C^1$, and both $\varphi_t(x)$ and $\nabla_x \varphi_t(x)$ are continuous with respect to $(t, x)$. If $h$ is subsmooth on an open set $U$, it is also locally Lipschitz on $U$, Clarke (1975).

LEMMA 4.2 Rockafellar (1979) Suppose $h_1$ and $h_2$ are l.sc. functions on $R^n$ and $x$ a point at which both $h_1$ and $h_2$ are finite. Suppose that $\operatorname{dom} h_1^{\uparrow}(x; \cdot)$ is nonempty and $h_2$ is locally Lipschitz at $x$. Then

$$\partial(h_1 + h_2)(x) \subset \partial h_1(x) + \partial h_2(x).$$

Moreover equality holds if $h_1$ and $h_2$ are subdifferentially regular at $x$.

LEMMA 4.3 Clarke (1983) Let $U$ be an open subset of $R^n$, and suppose $h : U \times \Xi \to R$ is measurable with respect to $\xi$ and there exists a summable function $\beta$ such that for all $x^0, x^1$ in $U$ and $\xi \in \Xi$

$$|h(x^0, \xi) - h(x^1, \xi)| \le \beta(\xi)\, \|x^0 - x^1\|.$$

Suppose moreover that for some $x \in U$, $Eh(x)$ is finite. Then $Eh$ is finite and Lipschitz on $U$, and for all $x$ in $U$,

$$\partial Eh(x) \subset \int \partial h(x, \xi)\, P(d\xi).$$

Moreover, equality holds whenever $h(\cdot, \xi)$ is a.s. subdifferentially regular at $x$, in which case also $Eh$ is subdifferentially regular at $x$.
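To make the interchange of subdifferentiation and expectation concrete, here is a minimal Python sketch for the convex (hence subdifferentially regular) integrand $h(x, \xi) = |x - \xi|$ on the real line, with $\xi$ taking finitely many values. The integrand, the discrete distribution and the function names are assumptions chosen for illustration only. In this case the integral of the subdifferentials is an interval, and a point is a minimizer of $Eh$ exactly when 0 lies in it.

```python
from fractions import Fraction


def expected_subdifferential(x, atoms, probs):
    """Interval [lo, hi] obtained by summing p * (subdifferential of |x - xi|) over atoms."""
    lo = hi = Fraction(0)
    for xi, p in zip(atoms, probs):
        if x > xi:        # |x - xi| has derivative +1
            lo += p
            hi += p
        elif x < xi:      # |x - xi| has derivative -1
            lo -= p
            hi -= p
        else:             # kink: the subdifferential is the interval [-1, 1]
            lo -= p
            hi += p
    return lo, hi


def is_stationary(x, atoms, probs):
    """Check whether 0 belongs to the expected subdifferential at x."""
    lo, hi = expected_subdifferential(x, atoms, probs)
    return lo <= 0 <= hi


if __name__ == "__main__":
    atoms = [0, 1, 2]
    probs = [Fraction(1, 3)] * 3
    print(expected_subdifferential(1, atoms, probs))  # interval around 0 at the median
    print(is_stationary(0, atoms, probs))             # 0 is not a median here
```

Exact rational arithmetic is used so that membership of 0 in the interval is not blurred by rounding.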

Theorem 4.1 shows that $\xi \mapsto \partial h(x, \xi)$ is a random (nonempty) closed set; it is easy to verify that under the assumptions of Lemma 4.3, $h$ is a random l.sc. function on $U \times \Xi$. In fact for all $\xi$, $\partial h(x, \xi)$ is a compact subset of $R^n$, see Proposition 2.1.2 of Clarke (1983). The integral of a random closed set $\Gamma$ defined on $\Xi$ (with values in the closed subsets of $R^n$) is

$$\int \Gamma(\xi)\, P(d\xi) := \Big\{ \int u(\xi)\, P(d\xi) \;\Big|\; u \text{ is a summable selection of } \Gamma \Big\},$$

see Aumann (1965). If $P$ is absolutely continuous, and $\Gamma$ is integrably bounded (the function $\xi \mapsto \sup\{\|x\| \mid x \in \Gamma(\xi)\}$ is summable), then $\int \Gamma(\xi)\, P(d\xi) = \int \operatorname{co} \Gamma(\xi)\, P(d\xi)$ is convex, where $\operatorname{co} \Gamma(\xi)$ is the convex hull of $\Gamma(\xi)$. If $\Gamma$ is uniformly bounded then $\int \Gamma(\xi)\, P(d\xi)$ is a compact subset of $R^n$.

We shall be working with the same set-up as in Section 3, but with a somewhat more restricted class of random l.sc. functions. Instead of Assumption 3.4, we shall be using the following one:

ASSUMPTION 4.4 The function $f : R^n \times \Xi \to (-\infty, \infty]$ is of the following type:

$$f(x, \xi) = f_0(x, \xi) + \psi_S(x),$$

where $\psi_S$ is the indicator function of the closed nonempty set $S \subset R^n$, i.e., $\psi_S(x) = 0$ if $x \in S$, and $= \infty$ otherwise, and $f_0$ is a finite valued function on $R^n \times \Xi$, with $\xi \mapsto f_0(x, \xi)$ relatively continuous on $\Xi$, for all $x \in S$, and any open set $U$ that contains $S$, the function $x \mapsto f_0(x, \xi)$ is locally Lipschitz for all $\xi \in \Xi$, and such that to any bounded open set $V$ there corresponds a $P$-summable function $\beta_V$ such that for any pair $x^0, x^1$ in $V$:

$$|f_0(x^0, \xi) - f_0(x^1, \xi)| \le \beta_V(\xi)\, \|x^0 - x^1\|.$$

The only condition of Assumption 3.4 that does not appear explicitly in Assumption 4.4, either in exactly the same form or in a stronger form, is the lower semicontinuity of $f(\cdot, \xi)$ on $R^n$ for all $\xi$ in $\Xi$. But that is an immediate consequence of the fact that $f_0(\cdot, \xi)$ is locally Lipschitz and $S$ is closed. Thus, $f$ is a proper random lower semicontinuous function, and so is $f_0$. Moreover all the results and the observations of Section 3 are immediately applicable to both $f$ and $f_0$, as well as to the corresponding expectation functionals. Of course these functions will now have Lipschitz properties that we shall exploit in our analysis. In the convex case it might be possible to work with weaker restrictions on the function $f$ by relying on finer results about the additivity of subgradients, see Rockafellar and Wets (1982). Combining the results of Section 3 with those about subgradients of random l.sc. functions, in particular Lemma 4.3, we can show that:

LEMMA 4.5 Under Assumptions 4.4 and 3.5, we have that $\mu$-a.s. $Ef$ and $\{E^\nu f,\ \nu = 1, \ldots\}$ are proper lower semicontinuous functions that are locally Lipschitz on $S$. Moreover we always have

$$\partial E f_0(x) \subset \int \partial f_0(x, \xi)\, P(d\xi)$$

with equality if for all $\xi$, $f_0(\cdot, \xi)$ is subdifferentially regular at $x$. Moreover, if $x \in S$

$$\partial E f(x) \subset \int \partial f_0(x, \xi)\, P(d\xi) + \partial \psi_S(x)$$

with equality if $\psi_S$ and for all $\xi$, $f_0(\cdot, \xi)$ are subdifferentially regular at $x$.

REMARK 4.6 If $x \in S$, $\partial \psi_S(x)$ is the polar of the tangent cone $T_S(x)$ to $S$ at $x$, Clarke (1975). If $S$ is a differentiable manifold, then $\partial \psi_S(x)$ is the orthogonal complement of the tangent space at $x$ and, of course, $\psi_S$ is subdifferentially regular at $x$. This is also the case when $S$ is locally convex at $x$, or if $x$ belongs to the boundary of $S$ and this boundary is locally a differentiable manifold. More generally, $\psi_S$ is subdifferentially regular at $x$ if the tangent cone to $S$ at $x$ has the following representation

$$T_S(x) = \{y \mid \exists\, h_k \downarrow 0,\ y_k \to y \text{ with } x + h_k y_k \in S\}.$$
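The polarity between the tangent cone and the subgradients of the indicator function can be checked by hand in the simplest convex case. The Python sketch below assumes that $S$ is the nonnegative orthant of $R^n$, a choice made purely for illustration (the function names and the tolerance are likewise our own): a direction is tangent at $x$ if it does not decrease the active coordinates, and a vector is normal if it vanishes on the inactive coordinates and is nonpositive on the active ones.

```python
def in_normal_cone_orthant(x, v, tol=1e-12):
    """True if v is in the normal cone at x to S = {x : x_i >= 0 for all i}."""
    if any(xi < -tol for xi in x):
        raise ValueError("x must belong to S")
    for xi, vi in zip(x, v):
        if xi > tol:
            if abs(vi) > tol:   # inactive coordinate: the component must vanish
                return False
        elif vi > tol:          # active coordinate: the component must be <= 0
            return False
    return True


def in_tangent_cone_orthant(x, y, tol=1e-12):
    """True if y is in the tangent cone at x to the nonnegative orthant."""
    return all(yi >= -tol for xi, yi in zip(x, y) if xi <= tol)


if __name__ == "__main__":
    x = (0.0, 2.0)                                   # first constraint active
    print(in_normal_cone_orthant(x, (-1.0, 0.0)))    # normal vector
    print(in_tangent_cone_orthant(x, (3.0, -4.0)))   # tangent direction
```

Any normal vector returned as valid has a nonpositive inner product with every tangent direction, which is the polarity relation stated in the remark.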

So far, we have limited our assumptions to certain continuity properties of the function $f$ with respect to $x$ and $\xi$. In order to derive the asymptotic behavior we need to impose some additional conditions about the way the information collected from the samples is included in the approximating probability measures $P^\nu$, in particular on how it affects the subgradients of the functions $E^\nu f$. Let us introduce the following notation: $u_0(x, \xi)$ will always denote an element of $\partial f_0(x, \xi)$ and $v_s(x)$ an element of $\partial \psi_S(x)$. In view of Theorem 4.1 and Lemma 4.5, if $x \in S$, we always have that $v(x) \in \partial Ef(x)$ implies the existence of $v_s(x) \in \partial \psi_S(x)$ and $u_0(x, \cdot)$ measurable with $u_0(x, \xi) \in \partial f_0(x, \xi)$ $P$-a.s. such that

$$v(x) = \int u_0(x, \xi)\, P(d\xi) + v_s(x).$$

Moreover similar formulas hold $\mu$-a.s. if the integration is with respect to $P^\nu(\cdot, \zeta)$ instead of $P$. If the functions $f_0(\cdot, \xi)$, as well as $\psi_S$, are a.s. subdifferentially regular, then a type of converse statement also holds. We have that

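$$0 \in \partial Ef(x^*)$$

implies the existence of $v_s \in \partial \psi_S(x^*)$ and of a random function $u_0(x^*, \cdot)$ from $\Xi$ to $R^n$ with $u_0(x^*, \xi) \in \partial f_0(x^*, \xi)$ $P$-a.s. such that

$$0 = \int u_0(x^*, \xi)\, P(d\xi) + v_s(x^*).$$

Similarly,

$$0 \in \partial E^\nu f(x^\nu, \zeta)$$

means that there exist $v_s(x^\nu) \in \partial \psi_S(x^\nu)$, and a random function $u_0(x^\nu, \cdot)$ from $\Xi$ to $R^n$ with $u_0(x^\nu, \xi) \in \partial f_0(x^\nu, \xi)$ $P^\nu$-a.s. such that

$$0 = \int u_0(x^\nu, \xi)\, P^\nu(d\xi, \zeta) + v_s(x^\nu).$$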

ASSUMPTION 4.7 Statistical Information. The probability measures $\{P^\nu,\ \nu = 1, \ldots\}$ are such that for some $v^\nu \in \partial E^\nu f(x^*, \zeta)$ and $v \in \partial Ef(x^\nu(\zeta))$

(i) $\sqrt{\nu}\,[v^\nu(x^*, \zeta) + v(x^\nu(\zeta))]$ converges to 0 in probability;

(ii) $\sqrt{\nu}\,[v_s(x^\nu(\zeta)) - v_s(x^*)]$ converges to 0 in probability;

(iii) $\sqrt{\nu}\, v^\nu(x^*, \zeta)$ is asymptotically Gaussian with distribution function $N(0, \Sigma_1)$ where $\Sigma_1$ is the covariance matrix. Moreover

(iv) $Ef_0$ is twice continuously differentiable at $x^*$ with nonsingular Hessian $H$.

Before we proceed with the main result of this section, let us examine some of the implications of these assumptions. The assumption that $Ef_0$ is of class $C^2$ is of course rather restrictive, but without it it may be hard to obtain asymptotic normality; a more general class of limiting distributions (piecewise normal) for constrained problems has recently been identified by King and Rockafellar (1986). Note that this does not require that $f_0$ be of class $C^2$.

The assumption that $\sqrt{\nu}\,[v_s(x^\nu(\zeta)) - v_s(x^*)]$ converges in probability to 0 essentially means that the convergence of $x^\nu$ to $x^*$ is "smooth". Of course, it will be satisfied if $x^*$ belongs to the interior of the set $S$ of constraints, in which case $v_s(x^*)$ and $\mu$-a.s. $v_s(x^\nu(\zeta))$ are zero for $\nu$ sufficiently large. It will also be trivially satisfied if the binding constraints are linear and $x^*$ and $\mu$-a.s. $x^\nu(\zeta)$ belong to the linear variety spanned by these constraints. In fact, we can expect this condition to be satisfied unless the vector $x^*$ is a boundary point at which the boundary has high curvature, in particular at a point at which the boundary is not smooth.

The condition about asymptotic normality of the subgradients $v^\nu(x^*)$ is best understood in the following context. Suppose condition (ii) is satisfied, in fact let us assume that $v_s(x^*) = v_s(x^\nu(\zeta))$ a.s. And suppose also that $P^\nu$ is the empirical distribution. Then $\|v^\nu(x^*, \zeta)\|$ records the error of the estimate of the subgradients of $Ef$ at $x^*$; note that $0 \in \partial Ef(x^*)$.

The first condition yields an estimate for the errors of the subgradients of $E^\nu f$ at $x^*$ and $Ef$ at $x^\nu(\zeta)$. The assumption is that enough information is collected so as to guarantee a certain convergence rate to 0. This is a crucial assumption, and after the statement of the theorem we will return to this condition and give sufficient conditions that imply it.

THEOREM 4.8 Under Assumptions 4.4, 3.5 and 4.7, $\sqrt{\nu}\,(x^\nu(\cdot) - x^*)$ is asymptotically normal with distribution $N(0, \Sigma)$ where $\Sigma = H^{-1} \Sigma_1 (H^{-1})^T$.

PROOF Since $Ef_0$ is assumed to be $C^2$, and $x^\nu(\cdot)$ converges to $x^*$, for $\nu$ sufficiently large,

$$\nabla Ef_0(x^\nu) - \nabla Ef_0(x^*) = H(x^\nu - x^*) + o(\|x^\nu - x^*\|).$$

Now, since $v(x^*) = 0$,

$$\sqrt{\nu}\,[\nabla Ef_0(x^\nu) - \nabla Ef_0(x^*)] = \sqrt{\nu}\,[v^\nu(x^*) + v(x^\nu)] - \sqrt{\nu}\, v^\nu(x^*) + \sqrt{\nu}\,[v_s(x^*) - v_s(x^\nu)].$$

By Assumption 4.7 the first term converges to zero in probability, the second one converges in distribution to $N(0, \Sigma_1)$ and the third one converges in probability to zero. Hence $\sqrt{\nu}\,[\nabla Ef_0(x^\nu) - \nabla Ef_0(x^*)]$ converges in distribution to $N(0, \Sigma_1)$ (Slutsky's Theorem). This is then also the asymptotic distribution of $\sqrt{\nu}\, H(x^\nu - x^*)$. The result now follows by the nonsingularity of the matrix $H$. □
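A one-dimensional Monte Carlo sketch can make the covariance formula tangible. We assume, purely for illustration, the unconstrained nondifferentiable criterion $f_0(x, \xi) = |x - \xi|$ with $\xi$ standard normal and $P^\nu$ the empirical measure, so that $x^\nu$ is a sample median and $x^* = 0$. Then $Ef_0$ has second derivative $H = 2p(0)$, with $p$ the normal density, and the subgradient $\operatorname{sign}(x^* - \xi)$ has variance $\Sigma_1 = 1$, so the scalar version of the formula gives $\Sigma = 1/(4p(0)^2) = \pi/2$. The function names, sample sizes and seed are assumptions of this sketch.

```python
import math
import random
import statistics


def sandwich_variance(hessian, sigma1):
    """Scalar version of H^{-1} * Sigma_1 * (H^{-1})^T."""
    return sigma1 / (hessian * hessian)


def simulate_scaled_medians(nu, replications, seed=0):
    """Draw sqrt(nu) * (sample median - x*) for standard normal data, with x* = 0."""
    rng = random.Random(seed)
    draws = []
    for _ in range(replications):
        sample = [rng.gauss(0.0, 1.0) for _ in range(nu)]
        draws.append(math.sqrt(nu) * statistics.median(sample))
    return draws


if __name__ == "__main__":
    density_at_median = 1.0 / math.sqrt(2.0 * math.pi)
    theory = sandwich_variance(2.0 * density_at_median, 1.0)
    draws = simulate_scaled_medians(nu=401, replications=2000)
    # The empirical variance should be close to the theoretical value pi / 2.
    print(theory, statistics.pvariance(draws))
```

The criterion is not differentiable in $x$ for any fixed $\xi$, yet its expectation is twice differentiable, which is the situation covered by condition (iv) of Assumption 4.7.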

The remainder of this section is devoted to recording certain conditions that will yield condition (i) of Assumption 4.7. In view of Markov's inequality it would suffice to control the variance of $\|v^\nu(x^*) + v(x^\nu)\|$ to obtain the desired convergence. More generally we have the following:

LEMMA 4.9 Suppose that $E_\mu\{v^\nu(x^*, \zeta)\} = 0$, that

$$E_\mu\{\|v_0^\nu(x^*, \zeta) - v_0(x^*)\|^2\} \le \beta^2/\nu$$

and that

$$\frac{\|v^\nu(x^*) + v(x^\nu(\zeta))\|}{\nu^{-1/2} + \|v(x^\nu(\zeta))\|} \text{ converges to 0 in probability } (\mu).$$

Then, under Assumptions 4.4 and 3.5, for any (measurable) selections $v^\nu(x^*, \cdot)$ with

such that $\mu$-a.s. $v(x^*) = 0$, the random vector

$$\sqrt{\nu}\,[v^\nu(x^*, \zeta) + v(x^\nu(\zeta))]$$

converges to 0 in probability as $\nu$ goes to $\infty$.

PROOF We need to show that to any $\varepsilon > 0$, there corresponds $\nu_\varepsilon$ such that for all $\nu \ge \nu_\varepsilon$,

where $\delta_\varepsilon$ goes to zero as $\varepsilon$ goes to zero.

Chebychev's inequality and the assumptions of the Lemma imply that for all $\alpha$,

And hence with $\alpha^2 = 2\beta^2/\varepsilon$, we have

This, in conjunction with the last one of our assumptions, i.e.,

implies that the events

$$\|v^\nu(x^*) + v(x^\nu)\| < \varepsilon\,(\nu^{-1/2} + \|v(x^\nu)\|) \quad \text{and} \quad \|v^\nu(x^*)\| \le \nu^{-1/2} \beta \sqrt{2/\varepsilon}$$

have probability ($\mu$) at least $1 - \varepsilon$. Thus for $\varepsilon$ small,

since $\|v(x^\nu)\| \le \|v^\nu(x^*) + v(x^\nu)\| + \|v^\nu(x^*)\|$. This, together with (4.4), gives

and this yields the desired expression with

$$\delta_\varepsilon = \varepsilon \Big(1 + \frac{\alpha + \varepsilon}{1 - \varepsilon}\Big).$$

It is easy to see why the condition $E_\mu\{v^\nu(x^*, \zeta)\} = 0$ would be satisfied when the $P^\nu$ are providing moment estimates that are at least as good as the empirical distributions. The same holds for the second assumption in Lemma 4.9: there is a reduction in the variance estimate that is at least as significant as that which would be attained by using the empirical distribution. Finally, the last assumption of Lemma 4.9 means that we can allow for a certain slack in the convergence in probability of $\sqrt{\nu}\,\|v^\nu(x^*) + v(x^\nu)\|$ to zero. In the Appendix, we give a derivation of this condition by using assumptions that are related to those used by Huber (1967).

5 ASYMPTOTIC LAGRANGIANS

The results of Sections 3 and 4 can be extended to Lagrangians by relying on the theory of epi/hypo-convergence for saddle functions, Attouch and Wets (1983a). This gives us not just asymptotic properties for the sequence $\{x^\nu,\ \nu = 1, \ldots\}$ of optimal solutions but also for the associated Lagrange multipliers.

We now introduce an explicit representation of the constraints in the formulation of the problem:

minimize $z = E\{f_0(x, \xi)\}$    (5.1)
subject to $f_i(x) \le 0$, $i = 1, \ldots, s$,
$f_i(x) = 0$, $i = s+1, \ldots, m$,
$x \in X \subset R^n$,

where for $i = 1, \ldots, m$, the $f_i$ are finite-valued continuous functions, $f_0$ is a finite-valued random l.sc. function, and $X$ is a closed subset of $R^n$. When instead of $P$ we use $P^\nu$, then the objective function is modified and becomes

The (standard) associated Lagrangians are

$$L^\nu(x, y) = \begin{cases} E^\nu f_0(x) + \sum_{i=1}^m y_i f_i(x) & \text{if } x \in X \text{ and } y_i \ge 0 \text{ for } i = 1, \ldots, s, \\ \infty & \text{if } x \notin X, \\ -\infty & \text{otherwise,} \end{cases}$$

and

$$L(x, y) = \begin{cases} E f_0(x) + \sum_{i=1}^m y_i f_i(x) & \text{if } x \in X \text{ and } y_i \ge 0 \text{ for } i = 1, \ldots, s, \\ \infty & \text{if } x \notin X, \\ -\infty & \text{otherwise.} \end{cases}$$

Consistency can be studied in the same framework as that described at the beginning of Section 3. The Lagrangians $L^\nu$ are then also dependent on $\zeta$. Suppose that $f_0$ satisfies the conditions of Assumption 3.4; note that some of these conditions are automatically satisfied since $f_0$ is a finite-valued random l.sc. function. Suppose also that the $\{P^\nu,\ \nu = 1, \ldots\}$ satisfy Assumption 3.5 with $f_0$ replacing $f$ (in the asymptotic negligibility condition); then it follows from Lemma 3.6 that $\mu$-a.s. the Lagrangians $L^\nu$ are finite-valued random l.sc. functions on $(R^n \times (R^s_+ \times R^{m-s})) \times Z$; on the complement all functions $L^\nu$ are $-\infty$. This is all we need to guarantee the required measurability properties, in particular we have that

$$((x, y), \zeta) \mapsto L^\nu(x, y, \zeta) \text{ is } \mathcal{B}^{n+m} \otimes \mathcal{A}\text{-measurable.}$$
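The extended-real-valued Lagrangian of problem (5.1), with value $+\infty$ when $x \notin X$ and $-\infty$ when $x \in X$ but an inequality multiplier is negative, can be sketched as follows. The function signature, the callable arguments and the toy problem are assumptions made for illustration: we take $Ef_0(x) = x^2 + 1$ (for instance $f_0(x, \xi) = (x - \xi)^2$ with $\xi$ of mean 0 and variance 1), the single constraint $1 - x \le 0$ and $X = R$, for which $(x^*, y^*) = (1, 2)$ is a saddle point.

```python
import math


def lagrangian(x, y, objective, constraints, in_X, s):
    """Evaluate L(x, y) with the +inf / -inf conventions of the piecewise definition.

    objective   : callable returning the expected objective at x
    constraints : list of callables f_1, ..., f_m
    in_X        : callable returning True when x belongs to X
    s           : number of inequality constraints (multipliers y_1..y_s must be >= 0)
    """
    if not in_X(x):
        return math.inf                      # x outside X
    if any(y[i] < 0 for i in range(s)):
        return -math.inf                     # x in X, inequality multiplier negative
    return objective(x) + sum(yi * fi(x) for yi, fi in zip(y, constraints))


if __name__ == "__main__":
    objective = lambda x: x * x + 1.0        # assumed closed form of the expectation
    constraints = [lambda x: 1.0 - x]        # single inequality constraint 1 - x <= 0
    whole_line = lambda x: True              # X = R
    print(lagrangian(1.0, [2.0], objective, constraints, whole_line, 1))
    print(lagrangian(1.0, [-1.0], objective, constraints, whole_line, 1))
```

At the saddle point the Lagrangian is maximal in $y$ and minimal in $x$, which can be verified on a grid by comparing `lagrangian(1.0, [y], ...)` and `lagrangian(x, [2.0], ...)` with the saddle value.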

DEFINITION 5.1 The sequence of functions $\{h^\nu : R^n \times R^m \to [-\infty, \infty],\ \nu = 1, \ldots\}$ epi/hypo-converges to $h : R^n \times R^m \to [-\infty, \infty]$ if for all $(x, y)$ we have

(i) for every subsequence $\{h^{\nu_k},\ k = 1, \ldots\}$ and sequence $\{x_k\}_{k=1}^\infty$ converging to $x$, there exists a sequence $\{y_k\}_{k=1}^\infty$ converging to $y$ such that

$$h(x, y) \le \liminf_{k \to \infty} h^{\nu_k}(x_k, y_k),$$

and

(ii) for every subsequence $\{h^{\nu_k},\ k = 1, \ldots\}$ and sequence $\{y_k\}_{k=1}^\infty$ converging to $y$, there exists a sequence $\{x_k\}_{k=1}^\infty$ converging to $x$ such that

$$h(x, y) \ge \limsup_{k \to \infty} h^{\nu_k}(x_k, y_k).$$

This type of convergence of bivariate functions was introduced by Attouch and Wets (1983a) in order to study the convergence of saddle points; in Attouch and Wets (1983b) it is argued that it actually is the weakest type of convergence that will guarantee the convergence of saddle points.

THEOREM 5.2 Consistency. From Assumptions 3.4 and 3.5, with $f$ replaced by $f_0$, it follows that there exists $Z_0 \in F$ with $\mu(Z \setminus Z_0) = 0$ such that

$$L = \operatorname{epi/hypo-lim}_{\nu \to \infty} L^\nu \quad \mu\text{-a.s.}$$

and hence:

(i) for all $\zeta \in Z_0$, any cluster point $(\hat{x}, \hat{y})$ of any sequence $\{(x^\nu, y^\nu),\ \nu = 1, \ldots\}$, with $(x^\nu, y^\nu)$ a saddle point of $L^\nu(\cdot, \cdot, \zeta)$, is a saddle point of $L$;

(ii) if $D$ is a compact subset of $R^n \times R^m$ that meets for all $\nu$, or at least for some subsequence, the set of saddle points of $L^\nu(\cdot, \cdot, \zeta)$ for some $\zeta \in Z_0$, then there exist $(x^\nu, y^\nu)$ saddle points of $L^\nu(\cdot, \cdot, \zeta)$ for $\nu = 1, \ldots$ that have at least one cluster point;

(iii) moreover, if the preceding condition is satisfied for all $\zeta \in Z_0$ and $L$ has a unique saddle point, then there exists a sequence of $F^\nu$-measurable functions that for all $\zeta \in Z_0$ determine saddle points of the $L^\nu$, and converge to the saddle point of $L$.

We note that sufficient conditions for the existence of saddle points are provided by the condition introduced in Proposition 3.10 (with $f$ the essential objective function of problem (5.1)), in conjunction with the Mangasarian-Fromovitz constraint qualification.

ASYMPTOTIC NORMALITY 5.3 The techniques of Section 4 can also be used to obtain asymptotic normality results. However, there is not yet a good concept of subdifferentiability for bivariate functions, except in the convex case (Rockafellar (1964)), and in the differentiable case, of course. With $\partial L$ ($\partial L^\nu$ resp.) the set of subgradients of the Lagrangians in the convex or differentiable case, the condition that $(x^*, y^*)$ is a saddle point of $L$ can be expressed as

$$0 \in \partial L(x^*, y^*)$$

and $0 \in \partial L^\nu(x^\nu, y^\nu, \zeta)$ in the case of $L^\nu$. For example, in the convex case when all the functions $\{f_i,\ i = 0, 1, \ldots, m\}$ are differentiable and $X = R^n$, this condition is equivalent to:

$$\nabla Ef_0(x^*) + \sum_{i=1}^m y_i^* \nabla f_i(x^*) = 0,$$

$$f_i(x^*) \le 0,\ y_i^* \ge 0,\ y_i^* f_i(x^*) = 0,\ i = 1, \ldots, s; \qquad f_i(x^*) = 0,\ i = s+1, \ldots, m,$$

and similarly for $L^\nu$.

It is easy to see that when Assumptions 4.4 and 3.5 hold (with $f_0$ instead of $f$), as well as Assumption 4.7, but this time with $v^\nu$ and $v$ subgradients of $L^\nu$ and $L$ respectively, and $S = X \times (R^s_+ \times R^{m-s})$, then by the same argument as in the proof of Theorem 4.8, we obtain:

$$\sqrt{\nu}\,(x^\nu(\cdot) - x^*,\ y^\nu(\cdot) - y^*) \text{ is asymptotically normal.}$$

For an application of the above results to the case of linearly restricted $L_1$-regression (2.3) see Dupačová (1987).

APPENDIX

We shall show that the assumption

$$\frac{\|v^\nu(x^*) + v(x^\nu)\|}{\nu^{-1/2} + \|v(x^\nu)\|} \text{ converges in probability to 0}$$

of Lemma 4.9 follows from a series of sufficient conditions similar to those of Huber (1967) by a slight modification of the paving technique of the same paper. The main difference is due to the fact that the probability measures $P^\nu(\cdot, \zeta)$ are not necessarily the empirical ones so that the expectation $E_\mu E^\nu f(x, \zeta) = \int_Z E^\nu f(x, \zeta)\, \mu(d\zeta)$ need not be equal to $Ef(x)$, etc. and that subgradients are used instead of gradients.

ASSUMPTION A.1 There is $d_0 > 0$, $a > 0$ such that for all $x \in N(x^*) = \{x : \|x - x^*\| < d_0\}$ and for an arbitrary $v(x) \in \partial Ef(x)$

$$\|v(x) - v(x^*)\| \ge a\, \|x - x^*\|.$$

ASSUMPTION A.2 For a n y measurable selection u,(x;) such that uo(x, t ) E afo(x, t ) P-a.s. denote

and assume

( i ) for all ⁰

<

^d^S^{do, x}^E^N(x*)there i s M1

>

0 such that both

and

E,EV{G(x,

t ,

^{d )}^j⁵^Mld

(ii) for all 0

<

^d

^s

^do,^x^E~ ( x * ) there i s

M2 >

⁰ândîjiÊ^{(I/ 2 ,}

1 3

^{such that}

var,EvfG(x. t , d ) { r M.$v-'

.

ASSUMPTION A.3 For all x E ~ ( x * ) , for a n y measurable selection v t ( x ) E aEvfo(x) w i t h v[(x*) E 8 ~ ' f ~ ( x * ) p-a.s. and for a n y vo(x) E 8Efo(x) w i t h vo(x*) E aEVfO(x*) there i s M 2

>

0 and a €(I/ 2 ,

11

such that

LEMMA A.4 Under Assumptions A.1, A.2, A.3

$$\frac{\|v^\nu(x^*) + v(x^\nu)\|}{\nu^{-1/2} + \|v(x^\nu)\|} \to 0$$

in $\mu$-probability as $\nu \to \infty$.

PROOF Put

$$Z^\nu(x, x') = \frac{\|v^\nu(x') - v^\nu(x) - v(x') + v(x)\|}{\nu^{-1/2} + \|v(x') - v(x)\|}.$$

Using (4.2) and (4.3), we can write

$$Z^\nu(x, x') = \frac{\|v_0^\nu(x') - v_0^\nu(x) - v_0(x') + v_0(x)\|}{\nu^{-1/2} + \|v(x') - v(x)\|}$$

and

according to Chebychev's inequality and Assumption A.3. This estimate, however, does not yield the assertion of the Lemma.

As in Huber (1967) we cover $N(x^*)$ by shrinking neighborhoods whose size decreases and whose number does not increase too rapidly as $\nu \to \infty$.

Let y be such t h a t

-

1

<

^y

<

^min^(a,

a).

P u t Ndo

=

~ ( x * ) and denote by 2

By the same argument as above

The area $N_{d_0} \setminus N_{d_0 \nu^{-\gamma}}$ will be covered by finitely many nonoverlapping "borders" of the form

where

and for each $\nu$, $\delta$ is fixed in such a way that

with $M_0 \ge 2$ an integer to be defined later. As a result,

$$\delta = \frac{\log M_0 - \log (M_0 - 1)}{\log \nu}.$$

To simplify the notation we shall put

As the next step, we shall cover each of the "borders" $N_{(k)}$ by nonoverlapping neighborhoods of an equal volume with centers $x'$ such that

and diameters

$$2 d_{(k)} = d_k - d_{k+1} = d_0 \nu^{-k\delta}\,[1 - \nu^{-\delta}].$$

Their number will not exceed

Using (A.3), w e h a v e

-

1 ⁵^IJ-

2 and ~ I J 2 - 1 ~

+

^{I J - ~}²^{I J - ~}

.

(A.4)

Let $N$ be any of the neighborhoods of the covering $N(k)$, i.e., $N = \{x : \|x - x'\| \le d_{(k)}\}$. We have according to Assumption A.1

$$\sup_{x \in N} Z^\nu(x, x^*) \le \sup_{x \in N} \frac{\| v_0^\nu(x) - v_0^\nu(x^*) - v_0(x) + v_0(x^*) \|}{\nu^{-1/2} + a\, d_0\, \nu^{-(k+1)\delta}}$$

Using Assumption A.3, the Chebychev inequality and (A.4)

Similarly, according to Assumption A.2 (ii) and the Chebychev inequality

where

7

=

~ a d ~ v - ( ' " ) ~

-

E f 3 ( x f .

1.

d ) ]

-

E,Evfii(xl,

t ,

^{d ) ]}

M

1

according to Assumption A.2 (i), (A.3) and (A.4). For $M_0$ sufficiently large the lower bound in (A.6) is nontrivial and we have that

Finally, according to (A.1), (A.5) and (A.7)

p f ( : s u p ZV(x, X*) 5 2 c j 5 p f ( : s u p ZV(x, x*) 5

& I

x ^EN(x') x ENdlS,

K, -1

+ z

^{p f ( :} ^{s u p} ^ZV(x,^x*)⁵^{E ]} ^M

k = O x E N cNk)


In addition, for $\nu$ large enough, $1 - K_\nu \delta - \alpha < 0$ and $K_\nu \delta - \alpha < 0$, $K_\nu \delta - \bar\alpha < 0$ due to our choice of $\gamma$ and (A.2).

Summarizing: for an arbitrary $\varepsilon > 0$, $1/2 < \gamma < \min(\alpha, \bar\alpha)$ it is possible to bound the probability

from above by an expression which converges to zero as $\nu \to \infty$. $\square$

REFERENCES

Attouch, H. and R. Wets (1983a): A convergence theory for saddle functions, Transactions Amer. Math. Soc. 280, 1-41.

Attouch, H. and R. Wets (1983b): A convergence for bivariate functions aimed at the convergence of saddle values, in Mathematical Theories of Optimization, eds. J. Cecconi and T. Zolezzi, Springer Verlag Lecture Notes in Mathematics 979, 1-42.

Aubin, J.-P. and I. Ekeland (1984): Applied Nonlinear Analysis, Wiley Interscience, New York.

Aumann, R.J. (1965): Integrals of set-valued functions, J. Mathematical Analysis and Applications 12, 1-12.

Clarke, F. (1975): Generalized gradients and applications, Transactions American Mathematical Society 205, 247-262.

Clarke, F. (1983): Optimization and Nonsmooth Analysis, Wiley Interscience, New York.

Dolecki, S., G. Salinetti and R. Wets (1983): Convergence of functions: equisemicontinuity, Trans. Amer. Math. Soc. 276, 409-429.

Dupačová, J. (1983a): Stability in stochastic programming with recourse, Acta Univ. Carol.-Math. et Phys. 24, 23-34.

Dupačová, J. (1983b): The problem of stability in stochastic programming (in Czech). Dissertation for the Doctor of Sciences degree, Faculty of Mathematics and Physics, Charles University, Prague.

Dupačová, J. (1984): On asymptotic normality of inequality constrained optimal decisions. In: Proc. 3rd Prague Conference on Asymptotic Statistics, eds. P. Mandl and M. Hušková, Elsevier, Amsterdam, 249-257.

Dupačová, J. (1987): Asymptotic properties of restricted L1-estimates. To appear in Proc. of the 1st International Conference on Statistical Data Analysis Based on the L1 Norm and Related Methods, Neuchâtel, 31 August - 4 September 1987.


Dupačová, J. and R. Wets (1986): Asymptotic behavior of statistical estimators and of optimal solutions for stochastic optimization problems, IIASA WP-86-41, Laxenburg, Austria.

Huber, P. (1967): The behavior of maximum likelihood estimates under nonstandard conditions, Proc. Fifth Berkeley Symp. Math. Stat. Prob. 1, University of California Press, Berkeley, 221-233.

King, A. and R.T. Rockafellar (1986): Non-normal asymptotic behavior of solution estimates in linear-quadratic stochastic optimization, Manuscript, Univ. Washington, Seattle.

Lehmann, E.L. (1983): Theory of Point Estimation, Wiley Interscience, New York.

Rockafellar, R.T. (1964): Minimax theorems and conjugate saddle functions, Mathematica Scandinavica 14, 151-173.

Rockafellar, R.T. (1970): Convex Analysis, Princeton University Press, Princeton.

Rockafellar, R.T. (1976): Integral functionals, normal integrands and measurable multifunctions, in Nonlinear Operators and the Calculus of Variations, eds. J. Gossez and L. Waelbroeck, Springer Verlag Lecture Notes in Mathematics 543, Berlin, 157-207.

Rockafellar, R.T. (1979): Directionally Lipschitzian functions and subdifferential calculus, Proceedings London Mathematical Society 39, 331-355.

Rockafellar, R.T. (1980): Generalized directional derivatives and subgradients of nonconvex functions, Canadian J. Mathematics 32, 257-280.

Rockafellar, R.T. (1981): The Theory of Subgradients and its Applications: Convex and Nonconvex Functions, Heldermann Verlag, Berlin.

Rockafellar, R.T. (1983): Generalized subgradients in mathematical programming, in Mathematical Programming: The State-of-the-Art, Springer Verlag, Berlin, 368-390.

Rockafellar, R.T. and R. Wets (1982): On the interchange of subdifferentiation and conditional expectation for convex functionals, Stochastics 7, 173-182.

Solis, R. and R. Wets (1981): A statistical view of stochastic programming. Tech. report, Univ. Kentucky.
