NOT FOR QUOTATION WITHOUT THE PERMISSION OF THE AUTHORS
ASYMPTOTIC BEHAVIOR OF STATISTICAL ESTIMATORS AND OF OPTIMAL SOLUTIONS OF STOCHASTIC OPTIMIZATION PROBLEMS, II
Jitka Dupačová
Roger J.-B. Wets*
*Supported in part by a grant of the National Science Foundation.
Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute or of its National Member Organizations.
INTERNATIONAL INSTITUTE FOR APPLIED SYSTEMS ANALYSIS A-2361 Laxenburg, Austria
FOREWORD
This paper supplements the results of a new statistical approach to the problem of incomplete information in stochastic programming. The tools of nondifferentiable optimization used here help to prove the consistency and asymptotic normality of (approximate) optimal solutions without unnatural smoothness assumptions. This allows the theory to take into account the presence of constraints.
Alexander B. Kurzhanski
Chairman
System and Decision Sciences Program
Jitka Dupačová (1) and Roger J.-B. Wets (2)
(1) Mathematical Statistics, Charles University, Prague
(2) Mathematics, University of California, Davis

INTRODUCTION
These results complement those of Dupačová and Wets (1986). We use the same notation and identical set-up; the reader is thus referred to that article, where he shall find definitions and the consistency results. We even continue the numbering of sections and equations, so we start with Section 4.
4 ASYMPTOTICS. CONVERGENCE RATES
In Section 3 of Dupačová and Wets (1986) we exhibited sufficient conditions for the convergence with probability 1 of the estimators $\{x^\nu : Z \to R^n,\ \nu = 1, \ldots\}$ to $x^*$, the optimal solution of the limit problem. Here we go one step further and analyze the rate of convergence in probabilistic terms. The argumentation is related to that of Huber (1967), adapted to fit the more general class of problems under consideration; this was already the pattern followed by Solis and Wets (1981) in the unconstrained case and by Dupačová (1983a, 1983b, 1984) for stochastic programs with recourse under special assumptions. We extend the results of Huber (1967) in a number of directions: (i) we allow for constraints, (ii) the probability measures converging to $P$ are not necessarily the empirical measures, and (iii) there are no differentiability assumptions on the likelihood (criterion) function (in terms of Huber's set-up, this would correspond to the case when his function $\psi$ is not uniquely determined, see Section 3 of Huber (1967)).

One way to look at the results of this section is to view them as providing limiting conditions under which one may be able to obtain asymptotic normality. Note that when there are constraints, one should usually not expect the asymptotic distribution to be Gaussian. This, in turn, allows us to obtain certain probabilistic estimates for the convergence "rates". To approximate the distribution of $x^\nu$, to obtain confidence intervals for example, we need an assertion that a suitably normalized sequence converges in distribution to a nondegenerate random vector. The normalizing coefficients need not be unique but they suggest a rate of convergence. Following Lehmann (1983) we shall say that the sequence $x^\nu - x^*$ goes to 0 with the rate of convergence $1/k_\nu$, if $k_\nu \to \infty$ as $\nu \to \infty$ and if there is a continuous distribution function $H$ such that

$$\lim_{\nu \to \infty} \mu\{k_\nu \|x^\nu - x^*\| \le a\} = H(a) \quad \text{for all } a.$$

We begin with a quick review of the main definitions and results that provide us with a good notion for the subgradients of not necessarily differentiable functions.
Any assumption of differentiability of $f(\cdot, \xi)$ would be inappropriate and would, for one reason or another, eliminate from the domain of applicability all the examples mentioned in Section 2. To handle the lack of differentiability, we rely on the theory of subdifferentiability developed to handle nonsmooth functions, see Clarke (1983), Rockafellar (1983), Aubin and Ekeland (1984).

The contingent derivative of a lower semicontinuous function $h : R^n \to (-\infty, +\infty]$ at $x$, a point at which $h$ is finite, with respect to the direction $y$ is

$$h'(x; y) := \operatorname*{epi\text{-}lim\,inf}_{t \downarrow 0} \frac{h(x + ty) - h(x)}{t},$$

using the convention $\infty - \infty = \infty$. It is not difficult to see that $h'$ is always well defined with values in the extended reals. If $x \notin \operatorname{dom} h$, then $h'(x; \cdot) \equiv \infty$, otherwise

$$h'(x; y) = \liminf_{y' \to y,\ t \downarrow 0} \frac{h(x + ty') - h(x)}{t}.$$

The (upper) epi-derivative of $h$ at $x$, where $h$ is finite, in direction $y$, is the epi-limit superior of the collection $\{h'(x'; \cdot),\ x' \in R^n\}$ at $x$, i.e.

$$h^{\uparrow}(x; \cdot) := \operatorname*{epi\text{-}lim\,sup}_{x' \to x} h'(x'; \cdot), \qquad h^{\uparrow}(x; y) = \inf_{\{x' \to x\},\ \{y' \to y\}} \limsup h'(x'; y'),$$

where by writing $\{x' \to x\}$ and $\{y' \to y\}$ we mean that the infimum must be taken with respect to all nets (or equivalently here, sequences) converging to $x$ and $y$, see Aubin and Ekeland (1984), Chapter 7, Section 3.

It is remarkable that if $h$ is proper and $x \in \operatorname{dom} h$, the function $y \mapsto h^{\uparrow}(x; y)$ is sublinear and l.sc. [Theorems 1 and 2, Rockafellar (1980)]. Moreover, if $h$ is Lipschitzian around $x$, then $h^{\uparrow}(x; \cdot)$ is everywhere finite (and hence continuous); in particular if $h$ is continuously differentiable at $x$ then $h^{\uparrow}(x; y)$ is the directional derivative of $h$ in direction $y$, and if $h$ is convex in a neighborhood of $x$, then

$$h^{\uparrow}(x; y) = \lim_{t \downarrow 0} \frac{h(x + ty) - h(x)}{t}$$
is the one-sided directional derivative in direction $y$. The sublinearity and lower semicontinuity of $h^{\uparrow}(x; \cdot)$ make it possible to define the notion of a subgradient of $h$ at $x$, by exploiting the fact that there is a one-to-one correspondence between the proper lower semicontinuous, sublinear functions $g$ and the nonempty closed convex subsets $C$ of $R^n$, given by

$$g(y) = \sup_{v \in C} v \cdot y \ \text{ for all } y \in R^n, \quad \text{and} \quad C = \{v \in R^n \mid v \cdot y \le g(y) \text{ for all } y \in R^n\},$$

see Rockafellar (1970). Assuming that $h^{\uparrow}(x; \cdot)$ is proper, let $\partial h(x)$ be the nonempty closed convex set such that for all $y$,

$$h^{\uparrow}(x; y) = \sup_{v \in \partial h(x)} v \cdot y.$$

Every vector $v \in \partial h(x)$ is a subgradient of $h$ at $x$. If $h$ is smooth (continuously differentiable) then $\partial h(x) = \{\nabla h(x)\}$, the gradient of $h$ at $x$; if $h$ is convex, then

$$\partial h(x) = \{v \mid h(x') \ge h(x) + v \cdot (x' - x) \text{ for all } x' \in R^n\}$$

is the usual definition of the subgradients of a convex function. More generally, if $h$ is locally Lipschitz at $x$, then

$$\partial h(x) = \operatorname{co}\{v = \lim_{x' \to x} \nabla h(x') \mid h \text{ is smooth at } x'\}.$$

For the proofs of these preceding assertions and further details, consult Rockafellar (1981) and Aubin and Ekeland (1984).
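The last formula, which describes the subgradient set of a locally Lipschitz function as the convex hull of limits of nearby gradients, can be illustrated numerically in one dimension, where a convex hull is simply an interval. The following is a minimal sketch and not part of the original analysis: the function name `clarke_interval`, the sampling radius and the finite-difference step are all assumptions made for the illustration.

```python
def clarke_interval(h, x, radius=1e-3, samples=200, step=1e-7):
    """Approximate the subgradient set of a locally Lipschitz h: R -> R at x.

    In one dimension the convex hull of limiting gradients is an interval,
    so we return (min, max) of central-difference slopes taken at points
    x' close to x (hypothetical helper, for illustration only).
    """
    slopes = []
    for i in range(samples):
        # Offsets are strictly positive and larger than `step`, so each
        # difference quotient stays on one side of x.
        offset = radius * (i + 0.5) / samples
        for xp in (x - offset, x + offset):
            slopes.append((h(xp + step) - h(xp - step)) / (2.0 * step))
    return min(slopes), max(slopes)


# h(x) = |x| is not differentiable at 0; nearby gradients are -1 and +1.
kink_lo, kink_hi = clarke_interval(abs, 0.0)

# For a smooth function the interval collapses (approximately) to the gradient.
smooth_lo, smooth_hi = clarke_interval(lambda t: t * t, 1.0)

print(kink_lo, kink_hi, smooth_lo, smooth_hi)
```

For $h(x) = |x|$ the interval approximates $[-1, 1]$, which contains 0, as one expects at a minimizer; for the smooth function it stays close to the single gradient value.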
Before we return to the problem at hand, we state the results about the additivity of subgradients that are relevant to our analysis. We begin with a general result that shows that the derivatives and subgradient functions of the random l.sc. function $f$ and the expectation functionals $E^\nu f$ and $Ef$ have the appropriate measurability properties.
THEOREM 4.1 Suppose $h : R^n \times \Xi \to \overline{R}$ is a random lower semicontinuous function. Then so are its contingent derivative and its (upper) epi-derivative. Moreover, for all $x \in R^n$, $\xi \mapsto \partial h(x, \xi)$ is a random closed convex set.

PROOF A theorem of Salinetti and Wets (1981) tells us that the lim sup and lim inf of sequences of random closed sets (closed-valued measurable multifunctions) are random closed sets. Since the epigraphs of the epi-lim sup and epi-lim inf are respectively the lim inf and lim sup of the corresponding sequence of epigraphs (see, for example, Section 2 of Dolecki, Salinetti and Wets (1983)), the assertion about the derivatives follows from their definitions and property (3.4) of random lower semicontinuous functions. Since $h^{\uparrow}(x; \cdot, \xi)$ is sublinear, it follows that its conjugate (another random l.sc. function, Rockafellar (1976)) is the indicator of the random closed convex set $\xi \mapsto \partial h(x, \xi)$.

Our interest in subdifferential theory is conditioned by the fact that for a very large class of functions (with values in the extended reals), we can characterize optimality in terms of a differential inclusion: a point $x^0$ that minimizes the proper l.sc. function $h$ on $R^n$ must necessarily satisfy

$$0 \in \partial h(x^0);$$

if $h$ is convex this is also a sufficient condition. There is a subdifferential calculus, but for our purposes the following results about the subdifferentials of sums of l.sc. functions are all we need. We say that a l.sc. function $h$ is subdifferentially regular at $x$ if $h'(x; \cdot) = h^{\uparrow}(x; \cdot)$. If $h$ is convex or subsmooth on a neighborhood of $x$, thus in particular if $h$ is $C^1$ at $x$, it is subdifferentially regular at $x$; $h$ is subsmooth on a neighborhood $V$ of $x$ if for all $y \in V$

$$h(y) = \max_{t \in T} \varphi_t(y),$$

where $T$ is a compact topological space, each $\varphi_t$ is of class $C^1$, and both $\varphi_t(x)$ and $\nabla_x \varphi_t(x)$ are continuous with respect to $(t, x)$. If $h$ is subsmooth on an open set $U$, it is also locally Lipschitz on $U$, Clarke (1975).
LEMMA 4.2 Rockafellar (1979) Suppose $h_1$ and $h_2$ are l.sc. functions on $R^n$ and $x$ a point at which both $h_1$ and $h_2$ are finite. Suppose that $\operatorname{dom} h_1^{\uparrow}(x; \cdot)$ is nonempty and $h_2$ is locally Lipschitz at $x$. Then

$$\partial (h_1 + h_2)(x) \subset \partial h_1(x) + \partial h_2(x).$$

Moreover equality holds if $h_1$ and $h_2$ are subdifferentially regular at $x$.
LEMMA 4.3 Clarke (1983) Let $U$ be an open subset of $R^n$, and suppose $h : U \times \Xi \to R$ is measurable with respect to $\xi$ and there exists a summable function $\beta$ such that for all $x^0, x^1$ in $U$ and $\xi \in \Xi$

$$|h(x^0, \xi) - h(x^1, \xi)| \le \beta(\xi)\, \|x^0 - x^1\|.$$

Suppose moreover that for some $x \in U$, $Eh(x)$ is finite. Then $Eh$ is finite and Lipschitz on $U$, and for all $x$ in $U$,

$$\partial Eh(x) \subset \int \partial h(x, \xi)\, P(d\xi).$$

Moreover, equality holds whenever $h(\cdot, \xi)$ is a.s. subdifferentially regular at $x$, in which case also $Eh$ is subdifferentially regular at $x$.
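To see what the inclusion of Lemma 4.3 says in the simplest possible setting, take $h(x, \xi) = |x - \xi|$ on the real line and let the probability measure be the empirical measure of a finite sample. Each $h(\cdot, \xi)$ is convex, hence subdifferentially regular, so equality holds and the set-valued integral is just the average of the intervals $\partial h(x, \xi_i)$. The sketch below is an illustration under these assumptions only; the helper name `empirical_subdifferential_abs` is hypothetical.

```python
def empirical_subdifferential_abs(sample, x):
    """Subgradient interval at x of the sample mean of |x - xi|.

    Each term contributes +1 if xi < x, -1 if xi > x, and the whole
    interval [-1, 1] if xi == x; averaging the intervals gives the result.
    """
    n = len(sample)
    below = sum(1 for xi in sample if xi < x)
    above = sum(1 for xi in sample if xi > x)
    ties = n - below - above
    return (below - above - ties) / n, (below - above + ties) / n


# At the sample median the interval contains 0 (optimality condition).
median_lo, median_hi = empirical_subdifferential_abs([1.0, 2.0, 3.0], 2.0)

# Away from the sample points the function is differentiable: a single slope.
slope_lo, slope_hi = empirical_subdifferential_abs([1.0, 2.0, 3.0], 2.5)

print((median_lo, median_hi), (slope_lo, slope_hi))
```

The first interval contains 0, which identifies the sample median as a minimizer of the empirical criterion; the second degenerates to a single slope.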
Theorem 4.1 shows that $\xi \mapsto \partial h(x, \xi)$ is a random (nonempty) closed set; it is easy to verify that under the assumptions of Lemma 4.3, $h$ is a random l.sc. function on $U \times \Xi$. In fact for all $\xi$, $\partial h(x, \xi)$ is a compact subset of $R^n$, see Proposition 2.1.2 of Clarke (1983). The integral of a random closed set $\Gamma$ defined on $\Xi$ (with values in the closed subsets of $R^n$) is

$$\int \Gamma(\xi)\, P(d\xi) := \Big\{ \int u(\xi)\, P(d\xi) \ \Big|\ u \text{ summable},\ u(\xi) \in \Gamma(\xi) \text{ a.s.} \Big\},$$

see Aumann (1965). If $P$ is absolutely continuous, and $\Gamma$ is integrably bounded (the function $\xi \mapsto \sup\{\|x\| \mid x \in \Gamma(\xi)\}$ is summable), then

$$\int \Gamma(\xi)\, P(d\xi) = \int \operatorname{co} \Gamma(\xi)\, P(d\xi)$$

is convex, where $\operatorname{co} \Gamma(\xi)$ is the convex hull of $\Gamma(\xi)$. If $\Gamma$ is uniformly bounded then $\int \Gamma(\xi)\, P(d\xi)$ is a compact subset of $R^n$.

We shall be working with the same set-up as in Section 3, but with a somewhat more restricted class of random l.sc. functions. Instead of Assumption 3.4, we shall be using the following one:
ASSUMPTION 4.4 The function $f : R^n \times \Xi \to (-\infty, \infty]$ is of the following type:

$$f(x, \xi) = f_0(x, \xi) + \psi_S(x),$$

where $\psi_S$ is the indicator function of the closed nonempty set $S \subset R^n$, i.e., $\psi_S(x) = 0$ if $x \in S$, and $= \infty$ otherwise, and $f_0$ is a finite valued function on $R^n \times \Xi$, with $\xi \mapsto f_0(x, \xi)$ relatively continuous on $\Xi$ for all $x \in S$, and, for any open set $U$ that contains $S$, the function $x \mapsto f_0(x, \xi)$ is locally Lipschitz for all $\xi \in \Xi$, and such that to any bounded open set $V$ there corresponds a $P$-summable function $\beta$ such that for any pair $x^0, x^1$ in $V$:

$$|f_0(x^0, \xi) - f_0(x^1, \xi)| \le \beta(\xi)\, \|x^0 - x^1\|.$$

The only condition of Assumption 3.4 that does not appear explicitly in Assumption 4.4, either in exactly the same form or in a stronger form, is the lower semicontinuity of $f(\cdot, \xi)$ on $R^n$ for all $\xi$ in $\Xi$. But that is an immediate consequence of the fact that $f_0(\cdot, \xi)$ is locally Lipschitz and $S$ is closed. Thus, $f$ is a proper random lower semicontinuous function, and so is $f_0$. Moreover all the results and the observations of Section 3 are immediately applicable to both $f$ and $f_0$, as well as to the corresponding expectation functionals. Of course these functions will now have Lipschitz properties that we shall exploit in our analysis. In the convex case it might be possible to work with weaker restrictions on the function $f$ by relying on finer results about the additivity of subgradients, see Rockafellar and Wets (1982). Combining the results of Section 3 with those about subgradients of random l.sc. functions, in particular Lemma 4.3, we can show that:
LEMMA 4.5 Under Assumptions 4.4 and 3.5, we have that $\mu$-a.s. $Ef$ and $\{E^\nu f,\ \nu = 1, \ldots\}$ are proper lower semicontinuous functions that are locally Lipschitz on $S$. Moreover we always have

$$\partial Ef_0(x) \subset \int \partial f_0(x, \xi)\, P(d\xi),$$

with equality if for all $\xi$, $f_0(\cdot, \xi)$ is subdifferentially regular at $x$. Moreover, if $x \in S$,

$$\partial Ef(x) \subset \int \partial f_0(x, \xi)\, P(d\xi) + \partial \psi_S(x),$$

with equality if $\psi_S$ and, for all $\xi$, $f_0(\cdot, \xi)$ are subdifferentially regular at $x$.

REMARK 4.6 If $x \in S$, $\partial \psi_S(x)$ is the polar of the tangent cone $T_S(x)$ to $S$ at $x$, Clarke (1975). If $S$ is a differentiable manifold, then $\partial \psi_S(x)$ is the orthogonal complement of the tangent space at $x$ and, of course, $\psi_S$ is subdifferentially regular at $x$.
This is also the case when $S$ is locally convex at $x$, or if $x$ belongs to the boundary of $S$ and this boundary is locally a differentiable manifold. More generally, $\psi_S$ is subdifferentially regular at $x$ if the tangent cone to $S$ at $x$ has the following representation:

$$T_S(x) = \{y \mid \exists\, h_k \downarrow 0,\ y_k \to y \text{ with } x + h_k y_k \in S\}.$$

So far, we have limited our assumptions to certain continuity properties of the function $f$ with respect to $x$ and $\xi$. In order to derive the asymptotic behavior we need to impose some additional conditions about the way the information collected from the samples is included in the approximating probability measures $P^\nu$, in particular on how it affects the subgradients of the functions $E^\nu f$. Let us introduce the following notation: $u_0(x, \xi)$ will always denote an element of $\partial f_0(x, \xi)$ and $v_S(x)$ an element of $\partial \psi_S(x)$. In view of Theorem 4.1 and Lemma 4.5, if $x \in S$, we always have that $v(x) \in \partial Ef(x)$ implies the existence of $v_S(x) \in \partial \psi_S(x)$ and $u_0(x, \cdot)$ measurable with $u_0(x, \xi) \in \partial f_0(x, \xi)$ $P$-a.s. such that

$$v(x) = \int u_0(x, \xi)\, P(d\xi) + v_S(x).$$

Moreover similar formulas hold $\mu$-a.s. if the integration is with respect to $P^\nu(\cdot, \zeta)$ instead of $P$. If the functions $f_0(\cdot, \xi)$, as well as $\psi_S$, are a.s. subdifferentially regular, then a type of converse statement also holds. We have that

$$0 \in \partial Ef(x^*)$$

implies the existence of $v_S(x^*) \in \partial \psi_S(x^*)$ and of a random function $u_0(x^*, \cdot)$ from $\Xi$ to $R^n$ with $u_0(x^*, \xi) \in \partial f_0(x^*, \xi)$ $P$-a.s. such that

$$0 = \int u_0(x^*, \xi)\, P(d\xi) + v_S(x^*).$$

Similarly,

$$0 \in \partial E^\nu f(x^\nu)$$

means that there exist $v_S(x^\nu) \in \partial \psi_S(x^\nu)$, and a random function $u_0(x^\nu, \cdot)$ from $\Xi$ to $R^n$ with $u_0(x^\nu, \cdot) \in \partial f_0(x^\nu, \cdot)$ $P^\nu$-a.s. such that

$$0 = \int u_0(x^\nu, \xi)\, P^\nu(d\xi) + v_S(x^\nu).$$
ASSUMPTION 4.7 Statistical Information. The probability measures $\{P^\nu,\ \nu = 1, \ldots\}$ are such that for some $v^\nu \in \partial E^\nu f(x^*, \zeta)$ and $v \in \partial Ef(x^\nu(\zeta))$
(i) $\sqrt{\nu}\,[v^\nu(x^*, \zeta) + v(x^\nu(\zeta))]$ converges to 0 in probability;
(ii) $\sqrt{\nu}\,[v_S(x^\nu(\zeta)) - v_S(x^*)]$ converges to 0 in probability;
(iii) $\sqrt{\nu}\, v^\nu(x^*, \zeta)$ is asymptotically Gaussian with distribution function $N(0, \Sigma_1)$, where $\Sigma_1$ is the covariance matrix.
Moreover
(iv) $Ef_0$ is twice continuously differentiable at $x^*$ with nonsingular Hessian $H$.

Before we proceed with the main result of this section, let us examine some of the implications of these assumptions. The assumption that $Ef_0$ is of class $C^2$ is of course rather restrictive, but without it, it may be hard to obtain asymptotic normality; a more general class of limiting distributions (piecewise normal) for constrained problems has recently been identified by King and Rockafellar (1986). Note that this does not require that $f_0$ be of class $C^2$. The assumption that $\sqrt{\nu}\,[v_S(x^\nu(\zeta)) - v_S(x^*)]$ converges in probability to 0 essentially means that the convergence of $x^\nu$ to $x^*$ is "smooth". Of course, it will be satisfied if $x^*$ belongs to the interior of the set $S$ of constraints, in which case $v_S(x^*)$ and $\mu$-a.s. $v_S(x^\nu(\zeta))$ are zero for $\nu$ sufficiently large. It will also be trivially satisfied if the binding constraints are linear and $x^*$, and $\mu$-a.s. $x^\nu(\zeta)$, belong to the linear variety spanned by these constraints. In fact, we can expect this condition to be satisfied unless the vector $x^*$ is a boundary point at which the boundary has high curvature, in particular a point at which the boundary is not smooth.
The condition about asymptotic normality of the subgradients $v^\nu(x^*)$ is best understood in the following context. Suppose condition (ii) is satisfied; in fact, let us assume that $v_S(x^*) = v_S(x^\nu(\zeta))$ a.s. And suppose also that $P^\nu$ is the empirical distribution. Then $\|v^\nu(x^*, \zeta)\|$ records the error of the estimate of the subgradients of $Ef$ at $x^*$; note that $0 \in \partial Ef(x^*)$.

The first condition yields an estimate for the errors of the subgradients of $E^\nu f$ at $x^*$ and $Ef$ at $x^\nu(\zeta)$. The assumption is that enough information is collected so as to guarantee a certain convergence rate to 0. This is a crucial assumption, and after the statement of the theorem we will return to this condition and give sufficient conditions that imply it.
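Conditions (iii) and (iv) of Assumption 4.7 can be made concrete on a one-dimensional unconstrained example that is not taken from the paper: $f_0(x, \xi) = |x - \xi|$ with $\xi$ standard normal and $P^\nu$ the empirical measure. Here $x^* = 0$, the subgradient $\operatorname{sign}(x^* - \xi)$ has variance $\Sigma_1 = 1$, and $Ef_0$ is twice differentiable with $H = 2\varphi(0)$, where $\varphi$ is the standard normal density, even though $f_0$ itself is not smooth. The sketch below compares the scalar form of the covariance $H^{-1} \Sigma_1 (H^{-1})^T$ appearing in Theorem 4.8 with a simulation; the function names, sample sizes and seed are assumptions made for the illustration.

```python
import math
import random
import statistics


def sandwich_variance(hessian, sigma1):
    """Scalar case of H^{-1} * Sigma_1 * (H^{-1})^T."""
    return sigma1 / (hessian * hessian)


def simulate_scaled_errors(nu, reps, seed=0):
    """Return sqrt(nu) * (x^nu - x*) over `reps` independent samples.

    x^nu is the sample median, a minimizer of the empirical mean of
    |x - xi|; the data are N(0, 1), so the limit solution is x* = 0.
    """
    rng = random.Random(seed)
    scaled = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(nu)]
        x_nu = statistics.median(sample)
        scaled.append(math.sqrt(nu) * (x_nu - 0.0))
    return scaled


H = 2.0 / math.sqrt(2.0 * math.pi)   # second derivative of E|x - xi| at x* = 0
SIGMA_1 = 1.0                        # variance of sign(x* - xi)

predicted = sandwich_variance(H, SIGMA_1)   # analytically pi / 2
observed = statistics.variance(simulate_scaled_errors(nu=400, reps=2000))
print(f"predicted {predicted:.3f}, observed {observed:.3f}")
```

The empirical variance of $\sqrt{\nu}(x^\nu - x^*)$ should be close to the predicted value $\pi/2$, up to Monte Carlo and finite-sample error.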
THEOREM 4.8 Under Assumptions 4.4, 3.5 and 4.7, $\sqrt{\nu}\,(x^\nu(\cdot) - x^*)$ is asymptotically normal with distribution $N(0, \Sigma)$ where $\Sigma = H^{-1} \Sigma_1 (H^{-1})^T$.

PROOF Since $Ef_0$ is assumed to be $C^2$, and $x^\nu(\cdot)$ converges to $x^*$, for $\nu$ sufficiently large,

$$\nabla Ef_0(x^\nu) - \nabla Ef_0(x^*) = H(x^\nu - x^*) + o(\|x^\nu - x^*\|).$$

Now, since $v(x^*) = 0$,

$$\sqrt{\nu}\,[\nabla Ef_0(x^\nu) - \nabla Ef_0(x^*)] = \sqrt{\nu}\,[v(x^\nu) + v^\nu(x^*)] - \sqrt{\nu}\, v^\nu(x^*) + \sqrt{\nu}\,[v_S(x^*) - v_S(x^\nu)].$$

By Assumption 4.7 the first term converges to zero in probability, the second one converges in distribution to $N(0, \Sigma_1)$ and the third one converges in probability to zero. Hence $\sqrt{\nu}\,[\nabla Ef_0(x^\nu) - \nabla Ef_0(x^*)]$ converges in distribution to $N(0, \Sigma_1)$ (Slutsky's Theorem). This is then also the asymptotic distribution of $\sqrt{\nu}\, H(x^\nu - x^*)$. The result now follows by the nonsingularity of the matrix $H$. □

The remainder of this section is devoted to recording certain conditions that will yield condition (i) of Assumption 4.7. In view of Markov's inequality it would suffice to control the variance of $\|v^\nu(x^*) + v(x^\nu)\|$ to obtain the desired convergence. More generally we have the following:

LEMMA 4.9 Suppose that $E_\mu\{v^\nu(x^*, \zeta)\} = 0$, that

$$E_\mu\{\|v_0^\nu(x^*, \zeta) - v_0(x^*)\|^2\} \le \beta^2 / \nu$$

and that

$$\frac{\|v^\nu(x^*) + v(x^\nu(\zeta))\|}{\nu^{-1/2} + \|v(x^\nu(\zeta))\|} \ \text{converges to 0 in probability } (\mu).$$

Then, under Assumptions 4.4 and 3.5, for any (measurable) selections $v^\nu(x^*, \cdot)$ with

$$v^\nu(x^*, \zeta) \in \partial E^\nu f(x^*, \zeta)$$

such that $\mu$-a.s. $v(x^*) = 0$, the random vector

$$\sqrt{\nu}\,[v^\nu(x^*, \zeta) + v(x^\nu(\zeta))]$$

converges to 0 in probability as $\nu$ goes to $\infty$.
PROOF We need to show that to any $\varepsilon > 0$, there corresponds $\nu_\varepsilon$ such that for all $\nu \ge \nu_\varepsilon$,

$$\mu\{\sqrt{\nu}\, \|v^\nu(x^*) + v(x^\nu)\| > \delta_\varepsilon\} < \varepsilon,$$

where $\delta_\varepsilon$ goes to zero as $\varepsilon$ goes to zero.

Chebychev's inequality and the assumptions of the Lemma imply that for all $a$,

$$\mu\{\|v^\nu(x^*)\| \ge \nu^{-1/2} a\} \le \beta^2 / a^2.$$

And hence with $a^2 = 2\beta^2/\varepsilon$, we have

$$\mu\{\|v^\nu(x^*)\| \ge \nu^{-1/2} a\} \le \varepsilon / 2.$$

This, in conjunction with the last one of our assumptions, i.e.,

$$\frac{\|v^\nu(x^*) + v(x^\nu)\|}{\nu^{-1/2} + \|v(x^\nu)\|} \to 0 \ \text{in probability},$$

implies that, for $\nu$ sufficiently large, the events

$$\|v^\nu(x^*) + v(x^\nu)\| < \varepsilon\,(\nu^{-1/2} + \|v(x^\nu)\|) \quad \text{and} \quad \|v^\nu(x^*)\| \le \nu^{-1/2} a$$

have probability ($\mu$) at least $1 - \varepsilon$. Thus for $\varepsilon$ small,

$$\|v(x^\nu)\| \le \nu^{-1/2}\, \frac{a + \varepsilon}{1 - \varepsilon},$$

since $\|v(x^\nu)\| \le \|v^\nu(x^*) + v(x^\nu)\| + \|v^\nu(x^*)\|$. This, together with (4.4), gives

$$\sqrt{\nu}\, \|v^\nu(x^*) + v(x^\nu)\| \le \varepsilon \Big(1 + \frac{a + \varepsilon}{1 - \varepsilon}\Big),$$

and this yields the desired expression with

$$\delta_\varepsilon = \varepsilon \Big(1 + \frac{a + \varepsilon}{1 - \varepsilon}\Big).$$

It is easy to see why the condition $E_\mu\{v^\nu(x^*, \zeta)\} = 0$ would be satisfied when the $P^\nu$ are providing moment estimates that are at least as good as the empirical distributions. The same holds for the second assumption in Lemma 4.9: there is a reduction in the variance estimate that is at least as significant as that which would be attained by using the empirical distribution. Finally, the last assumption of Lemma 4.9 means that we can allow for a certain slack in the convergence in probability of $\sqrt{\nu}\, \|v^\nu(x^*) + v(x^\nu)\|$ to zero. In the Appendix, we give a derivation of this condition by using assumptions that are related to those used by Huber (1967).

5 ASYMPTOTIC LAGRANGIANS
The results of Sections 3 and 4 can be extended to Lagrangians by relying on the theory of epi/hypo-convergence for saddle functions, Attouch and Wets (1983a). This gives us not just asymptotic properties for the sequence $\{x^\nu,\ \nu = 1, \ldots\}$ of optimal solutions but also for the associated Lagrange multipliers.

We now introduce an explicit representation of the constraints in the formulation of the problem:

$$\begin{aligned} &\text{minimize } z = E\{f_0(x, \xi)\} \\ &\text{subject to } f_i(x) \le 0, \quad i = 1, \ldots, s, \\ &\phantom{\text{subject to }} f_i(x) = 0, \quad i = s+1, \ldots, m, \\ &\phantom{\text{subject to }} x \in X, \end{aligned} \tag{5.1}$$

where for $i = 1, \ldots, m$, the $f_i$ are finite-valued continuous functions, $f_0$ is a finite-valued random l.sc. function, and $X$ is a closed subset of $R^n$. When instead of $P$ we use $P^\nu$, then the objective function is modified and becomes

$$E^\nu f_0(x) = \int f_0(x, \xi)\, P^\nu(d\xi).$$

The (standard) associated Lagrangians are

$$L^\nu(x, y) = \begin{cases} E^\nu f_0(x) + \sum_{i=1}^m y_i f_i(x) & \text{if } x \in X \text{ and } y_i \ge 0 \text{ for } i = 1, \ldots, s, \\ \infty & \text{if } x \notin X, \\ -\infty & \text{otherwise,} \end{cases}$$

and

$$L(x, y) = \begin{cases} Ef_0(x) + \sum_{i=1}^m y_i f_i(x) & \text{if } x \in X \text{ and } y_i \ge 0 \text{ for } i = 1, \ldots, s, \\ \infty & \text{if } x \notin X, \\ -\infty & \text{otherwise.} \end{cases}$$

Consistency can be studied in the same framework as that described at the beginning of Section 3. The Lagrangians $L^\nu$ are then also dependent on $\zeta$. Suppose that $f_0$ satisfies the conditions of Assumption 3.4; note that some of these conditions are automatically satisfied since $f_0$ is a finite-valued random l.sc. function. Suppose also that the $\{P^\nu,\ \nu = 1, \ldots\}$ satisfy Assumption 3.5 with $f_0$ replacing $f$ (in the asymptotic negligibility condition); then it follows from Lemma 3.6 that $\mu$-a.s. the Lagrangians $L^\nu$ are finite-valued random l.sc. functions on $(R^n \times (R^s_+ \times R^{m-s})) \times Z$; on the complement all functions $L^\nu$ are $\pm\infty$. This is all we need to guarantee the required measurability properties; in particular we have that

$$((x, y), \zeta) \mapsto L^\nu(x, y, \zeta) \ \text{ is } \mathcal{B}^{n+m} \otimes \mathcal{A}\text{-measurable}.$$
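The three-branch definition of the Lagrangians translates directly into code. The sketch below is only an illustration under assumed data: the factory `make_lagrangian`, the quadratic criterion, the three-point sample standing in for $P^\nu$, and the single inequality constraint $1 - x \le 0$ are all hypothetical choices, not taken from the paper.

```python
import math


def make_lagrangian(objective, constraints, s, in_X):
    """Build L(x, y) with the conventions used for the Lagrangians above.

    objective   : x -> float, standing in for E^nu f_0 (or E f_0)
    constraints : list of functions f_1, ..., f_m
    s           : number of inequality constraints (the first s of them)
    in_X        : x -> bool, membership test for the closed set X
    """
    def lagrangian(x, y):
        if not in_X(x):
            return math.inf            # x outside X
        if any(y[i] < 0 for i in range(s)):
            return -math.inf           # x in X, inadmissible multiplier
        return objective(x) + sum(yi * fi(x) for yi, fi in zip(y, constraints))
    return lagrangian


# Hypothetical sample defining the empirical criterion E^nu (x - xi)^2.
sample = [0.0, 0.5, 1.0]


def sample_objective(x):
    return sum((x - xi) ** 2 for xi in sample) / len(sample)


# One inequality constraint f_1(x) = 1 - x <= 0 and X = R.
L_nu = make_lagrangian(sample_objective, [lambda x: 1.0 - x], s=1,
                       in_X=lambda x: True)

# (x, y) = (1, 1) satisfies stationarity: 2 * (1 - 0.5) - 1 = 0.
saddle_value = L_nu(1.0, [1.0])
print(saddle_value, L_nu(1.0, [-1.0]))
```

At $(x, y) = (1, 1)$ the function $x \mapsto L^\nu(x, 1)$ is minimized and $y \mapsto L^\nu(1, y)$ is constant on $y \ge 0$, so the pair is a saddle point of this sample Lagrangian.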
DEFINITION 5.1 The sequence of functions $\{h^\nu : R^n \times R^m \to [-\infty, \infty],\ \nu = 1, \ldots\}$ epi/hypo-converges to $h : R^n \times R^m \to [-\infty, \infty]$ if for all $(x, y)$ we have
(i) for every subsequence $\{h^{\nu_k},\ k = 1, \ldots\}$ and sequence $\{x_k\}_{k=1}^\infty$ converging to $x$, there exists a sequence $\{y_k\}_{k=1}^\infty$ converging to $y$ such that

$$h(x, y) \le \liminf_{k \to \infty} h^{\nu_k}(x_k, y_k),$$

and
(ii) for every subsequence $\{h^{\nu_k},\ k = 1, \ldots\}$ and sequence $\{y_k\}_{k=1}^\infty$ converging to $y$, there exists a sequence $\{x_k\}_{k=1}^\infty$ converging to $x$ such that

$$\limsup_{k \to \infty} h^{\nu_k}(x_k, y_k) \le h(x, y).$$
This type of convergence of bivariate functions was introduced by Attouch and Wets (1983a) in order to study the convergence of saddle points; in Attouch and Wets (1983b) it is argued that it actually is the weakest type of convergence that will guarantee the convergence of saddle points.
THEOREM 5.2 Consistency. From Assumptions 3.4 and 3.5, with $f$ replaced by $f_0$, it follows that there exists $Z_0 \in \mathcal{F}$ with $\mu(Z \setminus Z_0) = 0$ such that

$$L = \operatorname*{epi/hypo\text{-}lim}_{\nu \to \infty} L^\nu \quad \mu\text{-a.s.}$$

and hence:
(i) for all $\zeta \in Z_0$, any cluster point $(\hat{x}, \hat{y})$ of any sequence $\{(x^\nu, y^\nu),\ \nu = 1, \ldots\}$, with $(x^\nu, y^\nu)$ a saddle point of $L^\nu(\cdot, \cdot, \zeta)$, is a saddle point of $L$;
(ii) if $D$ is a compact subset of $R^n \times R^m$ that meets for all $\nu$, or at least for some subsequence, the set of saddle points of $L^\nu(\cdot, \cdot, \zeta)$ for some $\zeta \in Z_0$, then there exist $(x^\nu, y^\nu)$, saddle points of $L^\nu(\cdot, \cdot, \zeta)$ for $\nu = 1, \ldots$, that have at least one cluster point;
(iii) moreover, if the preceding condition is satisfied for all $\zeta \in Z_0$ and $L$ has a unique saddle point, then there exists a sequence

$$\{(x^\nu, y^\nu) : Z \to R^n \times R^m,\ \nu = 1, \ldots\}$$

of $\mathcal{F}^\nu$-measurable functions that for all $\zeta \in Z_0$ determine saddle points of the $L^\nu$ and converge to the saddle point of $L$.
We note that sufficient conditions for the existence of saddle points are provided by the condition introduced in Proposition 3.10 (with $f$ the essential objective function of problem (5.1)), in conjunction with the Mangasarian-Fromovitz constraint qualification.
ASYMPTOTIC NORMALITY 5.3 The techniques of Section 4 can also be used to obtain asymptotic normality results. However, there is not yet a good concept of subdifferentiability for bivariate functions, except in the convex case (Rockafellar (1964)), and in the differentiable case, of course. With $\partial L$ ($\partial L^\nu$ resp.) the set of subgradients of the Lagrangians in the convex or differentiable case, the condition that $(x^*, y^*)$ is a saddle point of $L$ can be expressed as

$$0 \in \partial L(x^*, y^*)$$

and $0 \in \partial L^\nu(x^\nu, y^\nu, \zeta)$ in the case of $L^\nu$. For example, in the convex case when all the functions $\{f_i,\ i = 0, 1, \ldots, m\}$ are differentiable and $X = R^n$, this condition is equivalent to:

$$\begin{aligned} &\nabla Ef_0(x^*) + \sum_{i=1}^m y_i^* \nabla f_i(x^*) = 0, \\ &f_i(x^*) \le 0, \quad y_i^* \ge 0, \quad y_i^* f_i(x^*) = 0, \quad i = 1, \ldots, s, \\ &f_i(x^*) = 0, \quad i = s+1, \ldots, m, \end{aligned}$$

and similarly for $L^\nu$.
It is easy to see that when Assumptions 4.4 and 3.5 hold (with $f_0$ instead of $f$), as well as Assumption 4.7, but this time with $v^\nu$ and $v$ subgradients of $L^\nu$ and $L$ respectively, and $S = X \times (R^s_+ \times R^{m-s})$, then by the same argument as in the proof of Theorem 4.8, we obtain:

$$\sqrt{\nu}\,(x^\nu(\cdot) - x^*,\ y^\nu(\cdot) - y^*) \ \text{is asymptotically normal.}$$

For an application of the above results to the case of linearly restricted $L_1$-regression (2.3) see Dupačová (1987).
APPENDIX
We shall show that the assumption

$$\frac{\|v^\nu(x^*) + v(x^\nu)\|}{\nu^{-1/2} + \|v(x^\nu)\|} \ \text{converges in probability to } 0$$

of Lemma 4.9 follows from a series of sufficient conditions similar to those of Huber (1967) by a slight modification of the paving technique of the same paper. The main difference is due to the fact that the probability measures $P^\nu(\cdot, \zeta)$ are not necessarily the empirical ones, so that the expectation $E_\mu E^\nu f(x, \zeta) = \int_Z E^\nu f(x, \zeta)\, \mu(d\zeta)$ need not be equal to $Ef(x)$, etc., and that subgradients are used instead of gradients.
ASSUMPTION A.1 There is $d_0 > 0$, $a > 0$ such that for all $x \in N(x^*) = \{x : \|x - x^*\| < d_0\}$ and for an arbitrary $v(x) \in \partial Ef(x)$

$$\|v(x) - v(x^*)\| \ge a\, \|x - x^*\|.$$
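ASSUMPTION A.2 For any measurable selection $u_0(x, \cdot)$ such that $u_0(x, \xi) \in \partial f_0(x, \xi)$ $P$-a.s. denote

$$\bar{u}(x, \xi, d) = \sup_{\|x' - x\| \le d} \|u_0(x', \xi) - u_0(x, \xi)\|$$

and assume
(i) for all $0 < d \le d_0$, $x \in N(x^*)$ there is $M_1 > 0$ such that both

$$E\{\bar{u}(x, \xi, d)\} \le M_1 d \quad \text{and} \quad E_\mu E^\nu\{\bar{u}(x, \xi, d)\} \le M_1 d;$$

(ii) for all $0 < d \le d_0$, $x \in N(x^*)$ there is $M_2 > 0$ and $\bar{\alpha} \in (1/2, 1]$ such that

$$\operatorname{var}_\mu E^\nu\{\bar{u}(x, \xi, d)\} \le M_2\, d\, \nu^{-\bar{\alpha}}.$$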
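ASSUMPTION A.3 For all $x \in N(x^*)$, for any measurable selection $v_0^\nu(x) \in \partial E^\nu f_0(x)$ with $v_0^\nu(x^*) \in \partial E^\nu f_0(x^*)$ $\mu$-a.s. and for any $v_0(x) \in \partial Ef_0(x)$ with $v_0(x^*) \in \partial Ef_0(x^*)$ there is $M_2 > 0$ and $\alpha \in (1/2, 1]$ such that

LEMMA A.4 Under Assumptions A.1, A.2, A.3

$$\sup_{x \in N(x^*)} \frac{\|v^\nu(x) - v^\nu(x^*) - v(x) + v(x^*)\|}{\nu^{-1/2} + \|v(x) - v(x^*)\|} \to 0$$

in $\mu$-probability as $\nu \to \infty$.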
PROOF Put

$$Z^\nu(x, x') = \frac{\|v^\nu(x') - v^\nu(x) - v(x') + v(x)\|}{\nu^{-1/2} + \|v(x') - v(x)\|}.$$

Using (4.2) and (4.3), we can write

$$Z^\nu(x, x') = \frac{\|v_0^\nu(x') - v_0^\nu(x) - v_0(x') + v_0(x)\|}{\nu^{-1/2} + \|v(x') - v(x)\|}$$

and

according to Chebychev's inequality and Assumption A.3. This estimate, however, does not yield the assertion of the Lemma.
As in Huber (1967) we cover $N(x^*)$ by shrinking neighborhoods whose size decreases and whose number does not increase too rapidly as $\nu \to \infty$. Let $\gamma$ be such that $1/2 < \gamma < \min(\alpha, \bar{\alpha})$. Put $N_{d_0} = N(x^*)$ and denote by $N_{d_0 \nu^{-\gamma}}$ the neighborhood $\{x : \|x - x^*\| < d_0 \nu^{-\gamma}\}$. By the same argument as above
The area $N_{d_0} \setminus N_{d_0 \nu^{-\gamma}}$ will be covered by finitely many nonoverlapping "borders" of the form

where

and for each $\nu$, $\delta$ is fixed in such a way that

with $M_0 \ge 2$ an integer to be defined later. As a result,

$$\delta = \frac{\log M_0 - \log (M_0 - 1)}{\log \nu}.$$

To simplify the notation we shall put
As the next step, we shall cover each of the "borders" $N_{(k)}$ by nonoverlapping neighborhoods of an equal volume with centers $x'$ such that

and diameters

$$2 d_{(k)} = d_k - d_{k+1} = d_0 \nu^{-k\delta}\,[1 - \nu^{-\delta}].$$

Their number will not exceed

Using (A.3), we have
-
1 5 IJ-2 and ~ I J 2 - 1 ~
+
I J - ~ 2 I J - ~.
(A.4)

Let $N$ be any of the neighborhoods of the covering of $N_{(k)}$, i.e., $N = \{x : \|x - x'\| \le d_{(k)}\}$. We have according to Assumption A.1

$$\sup_{x \in N} Z^\nu(x, x^*) \le \sup_{x \in N} \frac{\|v_0^\nu(x) - v_0^\nu(x^*) - v_0(x) + v_0(x^*)\|}{\nu^{-1/2} + a\, d_0\, \nu^{-(k+1)\delta}}.$$
Using Assumption A.3, Chebychev's inequality and (A.4)

Similarly, according to Assumption A.2 (ii) and Chebychev's inequality

where
=
~ a d ~ v - ( ' " ) ~-
E f 3 ( x f .1.
d ) ]-
E,Evfii(xl,t ,
d ) ]M
1according to Assumption A.2 (i), (A.3) and (A.4). F o r Mo
> -
t h e lower bound in 2 ~ a(A.6) is nontrivial and w e have t h a t
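Finally, according to (A.1), (A.5) and (A.7)

$$\mu\Big\{\zeta : \sup_{x \in N(x^*)} Z^\nu(x, x^*) \ge 2\varepsilon\Big\} \le \mu\Big\{\zeta : \sup_{x \in N_{d_0 \nu^{-\gamma}}} Z^\nu(x, x^*) \ge \varepsilon\Big\} + \sum_{k=0}^{K_\nu - 1} \sum_{N \subset N_{(k)}} \mu\Big\{\zeta : \sup_{x \in N} Z^\nu(x, x^*) \ge \varepsilon\Big\}.$$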
In addition, for $\nu$ large enough, $1 - K_\nu \delta - \alpha < 0$ and $K_\nu \delta - \alpha < 0$, $K_\nu \delta - \bar{\alpha} < 0$ due to our choice of $\gamma$ and (A.2).

Summarizing: for an arbitrary $\varepsilon > 0$ and $1/2 < \gamma < \min(\alpha, \bar{\alpha})$ it is possible to bound the probability

$$\mu\Big\{\zeta : \sup_{x \in N(x^*)} Z^\nu(x, x^*) \ge 2\varepsilon\Big\}$$

from above by an expression which converges to zero as $\nu \to \infty$. □
REFERENCES
Attouch, H. and R. Wets (1983a): A convergence theory for saddle functions, Transactions Amer. Math. Soc., 280, 1-41.
Attouch, H. and R. Wets (1983b): A convergence for bivariate functions aimed at the convergence of saddle values, in Mathematical Theories of Optimization, eds. J. Cecconi and T. Zolezzi, Springer Verlag Lecture Notes in Mathematics 979, 1-42.
Aubin, J.-P. and I. Ekeland (1984): Applied Nonlinear Analysis, Wiley Interscience, New York.
Aumann, R.J. (1965): Integrals of set-valued functions, J. Mathematical Analysis and Applications 12, 1-12.
Clarke, R. (1975): Generalized gradients and applications, Transactions American Mathematical Society, 205, 247-262.
Clarke, R. (1983): Optimization and Nonsmooth Analysis, Wiley Interscience, New York.
Dolecki, S., G. Salinetti and R. Wets (1983): Convergence of functions: equisemicontinuity. Trans. Amer. Math. Soc. 276, 409-429.
Dupačová, J. (1983a): Stability in stochastic programming with recourse. Acta Univ. Carol.-Math. et Phys. 24, 23-34.
Dupačová, J. (1983b): The problem of stability in stochastic programming (in Czech). Dissertation for the Doctor of Sciences degree, Faculty of Mathematics and Physics, Charles University, Prague.
Dupačová, J. (1984): On asymptotic normality of inequality constrained optimal decisions. In: Proc. 3rd Prague conference on asymptotic statistics, eds. P. Mandl and M. Hušková. Elsevier, Amsterdam, p. 249-257.
Dupačová, J. (1987): Asymptotic properties of restricted L1-estimates. To appear in Proc. of the 1st International Conference on Statistical Data Analysis based on the L1 norm and Related Methods, Neuchatel, 31 August - 4 September 1987.
Dupačová, J. and R. Wets (1986): Asymptotic behavior of statistical estimators and of optimal solutions for stochastic optimization problems, IIASA WP-86-41, Laxenburg, Austria.
Huber, P. (1967): The behavior of maximum likelihood estimates under nonstandard conditions, Proc. Fifth Berkeley Symp. Math. Stat. Prob. 1, University of California Press, Berkeley, 221-233.
King, A. and R.T. Rockafellar (1986): Non-normal asymptotic behavior of solution estimates in linear-quadratic stochastic optimization, Manuscript, Univ. Washington, Seattle.
Lehmann, E.L. (1983): Theory of point estimation. Wiley Interscience, New York.
Rockafellar, R.T. (1964): Minimax theorems and conjugate saddle functions, Mathematica Scandinavica, 14, 151-173.
Rockafellar, R.T. (1970): Convex Analysis, Princeton University Press, Princeton.
Rockafellar, R.T. (1976): Integral functionals, normal integrands and measurable multifunctions, in Nonlinear Operators and the Calculus of Variations, eds. J. Gossez and L. Waelbroeck, Springer Verlag Lecture Notes in Mathematics 543, Berlin, 157-207.
Rockafellar, R.T. (1979): Directionally Lipschitzian functions and subdifferential calculus, Proceedings London Mathematical Society 39, 331-355.
Rockafellar, R.T. (1980): Generalized directional derivatives and subgradients of nonconvex functions, Canadian J. Mathematics 32, 257-280.
Rockafellar, R.T. (1981): The Theory of Subgradients and its Applications: Convex and Nonconvex Functions, Halderman Verlag, Berlin.
Rockafellar, R.T. (1983): Generalized subgradients in mathematical programming, in Mathematical Programming: The State-of-the-Art, Springer Verlag, Berlin, 368-390.
Rockafellar, R.T. and R. Wets (1982): On the interchange of subdifferentiation and conditional expectation for convex functionals, Stochastics 7, 173-182.
Solis, R. and R. Wets (1981): A statistical view of stochastic programming. Tech. report, Univ. Kentucky.