University of Heidelberg and Rutgers University draft, September 1997


Two M-Functionals of Scatter

Lutz Duembgen and David Tyler


Abstract.

Let $P$ be a probability distribution on $\mathbb{R}^q$. A detailed study of the breakdown properties of two M-functionals of scatter, $P \mapsto \Sigma(P)$ and $P \mapsto \Sigma(P \ominus P)$, is given. Here $\Sigma(\cdot)$ denotes Tyler's (1987) M-functional of scatter, taking values in the set of symmetric, positive definite $q \times q$ matrices. It assumes zero as a given center of the underlying distributions. The second functional avoids this assumption by operating on the symmetrized distribution $P \ominus P := \mathcal{L}(\boldsymbol{x} - \boldsymbol{y})$ with independent random vectors $\boldsymbol{x}, \boldsymbol{y} \sim P$. Let $P$ be smooth in the sense of assigning probability zero to hyperplanes. Then:

(1) The (contamination) breakdown point of $P \mapsto \Sigma(P)$ equals $1/q$.

(2) The breakdown point of $P \mapsto \Sigma(P \ominus P)$ equals $1 - \sqrt{1 - 1/q} \in \,]1/(2q), 1/q[$.

(3) If we restrict attention to "tight contamination", then the breakdown point of $P \mapsto \Sigma(P \ominus P)$ equals $\sqrt{1/q} > 1/q$.

In all three cases the sources of breakdown are investigated. It turns out that breakdown is only caused by rather special contaminating distributions that are concentrated near low-dimensional subspaces.

Keywords and phrases:

breakdown, coplanar contamination, tight contamination, M-functional, scatter matrix, symmetrization

Research supported in part by European Union Human Capital and Mobility Program ERB CHRX-CT 940693.


Let $P, Q$ be nondegenerate probability distributions on $\mathbb{R}^q$. The covariance matrix of $P$,

$$\int (x - \mu(P))(x - \mu(P))^\top \, P(dx) \quad \text{with} \quad \mu(P) := \int x \, P(dx),$$

is known to be very sensitive to small perturbations in $P$. Various robust surrogates for the covariance functional have been proposed. In the present paper we investigate the breakdown properties of two particular M-functionals of scatter.
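To make the sensitivity claim concrete, here is a small Python sketch (NumPy assumed; the sample size and the outlier location are arbitrary illustrative choices, not taken from the paper). It evaluates the covariance functional at an empirical distribution $P_n$ and at a version in which one of 200 points has been moved far away, i.e. a contamination of mass $1/200$.

```python
import numpy as np


def covariance(x: np.ndarray) -> np.ndarray:
    """Covariance matrix of the empirical distribution of the rows of x
    (divisor n, matching the integral with respect to P_n)."""
    mu = x.mean(axis=0)
    centered = x - mu
    return centered.T @ centered / x.shape[0]


rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 2))
# Move a single observation (contamination mass 1/200) far away.
dirty = clean.copy()
dirty[0] = [1e3, 1e3]

print(np.linalg.eigvalsh(covariance(clean)))  # eigenvalues of moderate size
print(np.linalg.eigvalsh(covariance(dirty)))  # the largest eigenvalue explodes
```

A single point suffices to make the largest eigenvalue arbitrarily large, so the breakdown point of the covariance functional is zero.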

The first one is Tyler's (1987) M-functional $\Sigma(P)$, which is defined as follows: Let $\mathbb{M}$ be the set of symmetric matrices in $\mathbb{R}^{q \times q}$, and let $\mathbb{M}_+$ be the set of all positive definite $M \in \mathbb{M}$. For $M \in \mathbb{M}_+$ let

$$G(P, M) := \frac{q}{P(\mathbb{R}^q \setminus \{0\})} \int_{\mathbb{R}^q \setminus \{0\}} \frac{M^{-1/2} x x^\top M^{-1/2}}{x^\top M^{-1} x} \, P(dx),$$

which is a nonnegative definite matrix in $\mathbb{M}$ with trace $q$. If there is a unique matrix $M \in \mathbb{M}_+$ with

$$G(P, M) = I \quad \text{and} \quad \operatorname{trace}(M) = q,$$

then we define $\Sigma(P) := M$. Otherwise we define arbitrarily $\Sigma(P) := 0$. In what follows we utilize the following two properties of $\Sigma$; see Kent and Tyler (1988) and Duembgen (1996).
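For an empirical distribution $P_n$ the defining equation $G(P_n, M) = I$ can be rewritten as $M = (q/n') \sum_i x_i x_i^\top / (x_i^\top M^{-1} x_i)$, the sum running over the $n'$ nonzero observations, which suggests the usual fixed-point iteration. The following Python sketch (NumPy assumed; the function names `tyler_scatter` and `tyler_G` are hypothetical) assumes that condition (1.1) of Proposition 1.1 holds, so that the solution exists and is unique. It illustrates the definition and is not the authors' code.

```python
import numpy as np


def tyler_scatter(x: np.ndarray, tol: float = 1e-10, max_iter: int = 10_000) -> np.ndarray:
    """Sketch of Sigma(P_n) for the empirical distribution of the rows of x via
        M <- (q / n') * sum_i x_i x_i^T / (x_i^T M^{-1} x_i),  then trace(M) = q,
    where zero rows are dropped, as in the definition of G(P, M).
    Assumes condition (1.1), so that a unique solution in M_+ exists."""
    x = x[np.any(x != 0, axis=1)]
    n, q = x.shape
    m = np.eye(q)
    for _ in range(max_iter):
        # Quadratic forms x_i^T M^{-1} x_i for all rows at once.
        d = np.einsum("ij,jk,ik->i", x, np.linalg.inv(m), x)
        m_new = (q / n) * (x / d[:, None]).T @ x
        m_new *= q / np.trace(m_new)  # normalization trace(M) = q
        if np.max(np.abs(m_new - m)) < tol:
            return m_new
        m = m_new
    raise RuntimeError("fixed-point iteration did not converge")


def tyler_G(x: np.ndarray, m: np.ndarray) -> np.ndarray:
    """G(P_n, M) exactly as defined above (zero rows ignored)."""
    x = x[np.any(x != 0, axis=1)]
    n, q = x.shape
    w, v = np.linalg.eigh(m)
    z = x @ (v @ np.diag(w ** -0.5) @ v.T)  # rows are M^{-1/2} x_i
    return (q / n) * (z / np.sum(z * z, axis=1)[:, None]).T @ z


rng = np.random.default_rng(1)
data = rng.standard_normal((500, 3)) @ np.diag([3.0, 1.0, 0.5])
sigma = tyler_scatter(data)
print(np.round(sigma, 3))
print(np.round(tyler_G(data, sigma), 6))  # approximately the identity matrix
```

Since $\operatorname{trace} G(P_n, M) = q$ for every $M$, the trace normalization in the loop does not affect the fixed point; it only fixes the scale that the definition leaves open.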

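Proposition 1.1

Let $\mathcal{V}$ be the set of linear subspaces $V$ of $\mathbb{R}^q$ with $1 \le \dim(V) < q$. Suppose that $P\{0\} = 0$. Then $\Sigma(P) \in \mathbb{M}_+$ if, and only if,

$$P(V) < \frac{\dim(V)}{q} \quad \text{for all } V \in \mathcal{V}. \tag{1.1}$$

If $G(P, M) = I$ for some matrix $M \in \mathbb{M}_+$ but $P(V) \ge \dim(V)/q$ for some space $V \in \mathcal{V}$, then there is a second space $W \in \mathcal{V}$ such that $V \cap W = \{0\}$ and $P(V \cup W) = 1$.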
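In the plane ($q = 2$), $\mathcal{V}$ consists of the lines through the origin, so (1.1) reduces to the requirement that no such line carries mass $1/2$ or more. A minimal exact check for the empirical distribution of integer points, using Python's `fractions` module (the helpers `line_key`, `max_line_mass` and `condition_1_1_holds` are hypothetical names, and $P\{0\} = 0$ is assumed):

```python
from collections import Counter
from fractions import Fraction
from math import gcd


def line_key(x: int, y: int) -> tuple:
    """Canonical label of the line through 0 and the integer point (x, y)."""
    if (x, y) == (0, 0):
        raise ValueError("P{0} = 0 is assumed: zero points are not allowed")
    g = gcd(x, y)
    x, y = x // g, y // g
    if x < 0 or (x == 0 and y < 0):  # identify v and -v
        x, y = -x, -y
    return (x, y)


def max_line_mass(points) -> Fraction:
    """max_V P_n(V) over lines V through 0, P_n the empirical distribution."""
    counts = Counter(line_key(x, y) for x, y in points)
    return Fraction(max(counts.values()), len(points))


def condition_1_1_holds(points) -> bool:
    """Condition (1.1) for q = 2: P_n(V) < dim(V)/q = 1/2 for every line V."""
    return max_line_mass(points) < Fraction(1, 2)


print(condition_1_1_holds([(1, 0), (0, 1), (1, 1), (2, -1)]))  # True
print(condition_1_1_holds([(1, 0), (2, 0), (0, 1), (1, 1)]))   # False: mass 1/2 on the x-axis
```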

Proposition 1.2

Suppose that $P\{0\} = 0$ and $\Sigma(P) \in \mathbb{M}_+$. Then

$$\Sigma(Q) \to \Sigma(P) \quad \text{as } Q \to P. \tag{1.2}$$

Here and throughout the sequel the space of probability measures on $\mathbb{R}^q$ is equipped with the topology of weak convergence.

The definition of $\Sigma(P)$ assumes zero as a given and known "center" of $P$. The second M-functional investigated here is $\Sigma(P \ominus P)$, where generally

$$P \ominus Q := \mathcal{L}(\boldsymbol{x} - \boldsymbol{y}) \quad \text{with independent random vectors } \boldsymbol{x} \sim P, \ \boldsymbol{y} \sim Q.$$

This modified functional, proposed in Duembgen (1996), avoids assumptions on, or estimation of, location parameters.
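For an empirical distribution $P_n$ with support points $x_1, \dots, x_n$, the symmetrized distribution $P_n \ominus P_n$ puts mass $1/n^2$ on each difference $x_i - x_j$. A short NumPy sketch (the name `symmetrized_sample` is hypothetical) builds the differences with $i \ne j$ and shows that they do not change when the data are shifted, which is why no location estimate is needed:

```python
import numpy as np


def symmetrized_sample(x: np.ndarray) -> np.ndarray:
    """Support points of P_n ⊖ P_n: all differences x_i - x_j with i != j.
    (The n diagonal pairs give the point 0, which G(., M) ignores anyway.)"""
    n = x.shape[0]
    diffs = x[:, None, :] - x[None, :, :]  # shape (n, n, q)
    off_diagonal = ~np.eye(n, dtype=bool)
    return diffs[off_diagonal]  # shape (n * (n - 1), q)


rng = np.random.default_rng(2)
x = rng.standard_normal((50, 3))
d1 = symmetrized_sample(x)
d2 = symmetrized_sample(x + np.array([10.0, -5.0, 2.0]))  # shifted data
print(d1.shape, np.allclose(d1, d2))  # (2450, 3) True
```

The price is a sample of size $n(n-1)$ instead of $n$, which matters for the computational cost of the fixed-point iteration.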

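A quantity describing the robustness of the functional $P \mapsto \Sigma(P)$ is its contamination breakdown point (cf. Huber 1981). This is defined to be the supremum $\varepsilon^*(P)$ of all $\varepsilon \in \,]0, 1]$ such that

$$\Sigma(Q) \in \mathbb{M}_+ \ \text{for all } Q \in \mathcal{U}(P, \varepsilon) \quad \text{and} \quad \sup_{Q \in \mathcal{U}(P, \varepsilon)} \frac{\lambda_1(\Sigma(Q))}{\lambda_q(\Sigma(Q))} < \infty.$$

Here $\mathcal{U}(P, \varepsilon)$ denotes the contamination neighborhood

$$\mathcal{U}(P, \varepsilon) := \bigl\{ (1 - \varepsilon) P + \varepsilon H : H \text{ some distribution on } \mathbb{R}^q \bigr\}$$

of $P$, and $\lambda_1(M) \ge \lambda_2(M) \ge \cdots \ge \lambda_q(M)$ denote the ordered eigenvalues of $M \in \mathbb{M}$. If $P$ is "smooth" in the sense that

$$P(H) = 0 \quad \text{for any hyperplane } H \subset \mathbb{R}^q, \tag{1.3}$$

then it turns out that

$$\varepsilon^*(P) = \frac{1}{q}.$$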

This result is known to several people, though it never appeared in a journal. Our purpose is not only to give a general expression for $\varepsilon^*(P)$ and a precise proof but also to investigate the case $\varepsilon = \varepsilon^*(P)$ in more detail. Namely, in this case it turns out that for any sequence of distributions $Q_k = (1 - \varepsilon) P + \varepsilon H_k \in \mathcal{U}(P, \varepsilon)$ with $\Sigma(Q_k) \in \mathbb{M}_+$, the condition numbers $(\lambda_1 / \lambda_q)(\Sigma(Q_k))$ tend to infinity if, and only if, the distributions $H_k$ are concentrated near suitable linear subspaces of $\mathbb{R}^q$. In accordance with Tyler (1986) we call this "coplanar contamination".


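Analogous considerations are made for $P \mapsto \Sigma(P \ominus P)$. The breakdown point $\varepsilon_s^*(P)$ of this functional is defined as $\varepsilon^*(P)$ with $\Sigma(Q \ominus Q)$ in place of $\Sigma(Q)$. In case of smooth distributions $P$,

$$\varepsilon_s^*(P) = 1 - \sqrt{1 - \frac{1}{q}} \in \Bigl] \frac{1}{2q}, \frac{1}{q} \Bigr[.$$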
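A heuristic reading of the value $1 - \sqrt{1 - 1/q}$ (our own sketch, not the paper's proof, which is deferred to Section 4): for $Q = (1 - \varepsilon) P + \varepsilon H$, expanding $Q \ominus Q$ shows that it is a contamination of $P \ominus P$ of mass $\varepsilon'$, where $H'$ denotes some distribution on $\mathbb{R}^q$. If $P$ is smooth, so is $P \ominus P$, because $(P \ominus P)(A) = \int P(y + A) \, P(dy)$ and $y + A$ is a hyperplane whenever $A$ is. The smooth-case value $1/q$ for $P \mapsto \Sigma(P)$ then suggests the threshold:

```latex
\begin{aligned}
Q \ominus Q
  &= (1-\varepsilon)^2 \, P \ominus P
   + \varepsilon(1-\varepsilon) \, (P \ominus H + H \ominus P)
   + \varepsilon^2 \, H \ominus H \\
  &= (1-\varepsilon') \, P \ominus P + \varepsilon' H',
  \qquad \varepsilon' := 1 - (1-\varepsilon)^2 = 2\varepsilon - \varepsilon^2, \\
\varepsilon' < \frac{1}{q}
  &\iff (1-\varepsilon)^2 > 1 - \frac{1}{q}
   \iff \varepsilon < 1 - \sqrt{1 - \frac{1}{q}} .
\end{aligned}
```

By itself this argument only gives $\varepsilon_s^*(P) \ge 1 - \sqrt{1 - 1/q}$, since $Q \ominus Q$ ranges over a special subset of the contamination neighborhood of $P \ominus P$. Equality is part of Theorem 3.1.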

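In case of $\varepsilon = \varepsilon_s^*(P)$, the conditions on a sequence of distributions $Q_k = (1 - \varepsilon) P + \varepsilon H_k \in \mathcal{U}(P, \varepsilon)$ in order to achieve unbounded condition numbers $(\lambda_1 / \lambda_q)(\Sigma(Q_k \ominus Q_k))$ are even more restrictive. In particular, a necessary condition is that

$$|\boldsymbol{y}_k| \to_p \infty \quad (k \to \infty) \quad \text{if } \boldsymbol{y}_k \sim H_k.$$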

This observation is important, because coplanar contamination "at infinity" is easier to detect than arbitrary coplanar contamination. This leads to the question of breakdown caused by "tight" contamination. That means, we replace the neighborhood $\mathcal{U}(P, \varepsilon)$ with

$$\mathcal{U}(P, \varepsilon | \psi) := \bigl\{ (1 - \varepsilon) P + \varepsilon H : H \text{ some distribution on } \mathbb{R}^q \text{ such that } H\{x : |x| > r\} \le \psi(r) \text{ for all } r > 0 \bigr\},$$

where $\psi$ is some continuous function from $[0, \infty]$ into $[0, 1]$ with $\psi(0) = 1$ and $\psi(\infty) = 0$.
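Checking whether a given contamination $H$ is admissible in $\mathcal{U}(P, \varepsilon | \psi)$ is easy for an empirical $H$ if one additionally assumes (this is not required in the text) that $\psi$ is nonincreasing. The Python sketch below (the name `is_tight` and the choice $\psi(r) = 1/(1 + r)$ are hypothetical) reduces the condition "for all $r > 0$" to finitely many comparisons at the distinct positive norms $s$ of the sample: on each gap between consecutive norms the tail $H\{x : |x| > r\}$ is constant, and by continuity its binding constraint is $\psi(s)$ at the right end of the gap.

```python
from typing import Callable, Sequence


def is_tight(norms: Sequence[float], psi: Callable[[float], float]) -> bool:
    """Check H{x : |x| > r} <= psi(r) for all r > 0, where H is the empirical
    distribution with the given norms |x_i|.

    Assumes (hypothetically) that psi is continuous and nonincreasing. For r in
    [s_prev, s) between consecutive distinct positive norms, the tail equals
    H{|x| >= s}, and the infimum of psi over that interval is psi(s)."""
    n = len(norms)
    for s in sorted({t for t in norms if t > 0}):
        tail = sum(1 for t in norms if t >= s) / n
        if tail > psi(s):
            return False
    return True


def psi(r: float) -> float:
    """Hypothetical choice: continuous, nonincreasing, psi(0) = 1, psi(inf) = 0."""
    return 1.0 / (1.0 + r)


print(is_tight([0.0, 0.0, 0.1, 0.2], psi))     # True
print(is_tight([0.0, 50.0, 60.0, 70.0], psi))  # False: too much mass far out
```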

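In case of $P \mapsto \Sigma(P)$, replacing $\mathcal{U}(P, \varepsilon)$ with $\mathcal{U}(P, \varepsilon | \psi)$ does not alter the breakdown point. However, let $\varepsilon_s^*(P | \psi)$ be defined as $\varepsilon_s^*(P)$ with $\mathcal{U}(P, \varepsilon | \psi)$ in place of $\mathcal{U}(P, \varepsilon)$. Then it turns out that for smooth $P$,

$$\varepsilon_s^*(P | \psi) = \sqrt{\frac{1}{q}}.$$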
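The three smooth-case values can be compared numerically. A minimal Python sketch (the function name `breakdown_points` is hypothetical; $q \ge 2$ is assumed so that $\mathcal{V}$ is nonempty) checks the ordering $1/(2q) < 1 - \sqrt{1 - 1/q} < 1/q < \sqrt{1/q}$:

```python
import math


def breakdown_points(q: int) -> dict:
    """Smooth-case breakdown points on R^q, as stated in the introduction.

    eps_tyler: P -> Sigma(P),      arbitrary contamination
    eps_sym:   P -> Sigma(P ⊖ P),  arbitrary contamination
    eps_tight: P -> Sigma(P ⊖ P),  tight contamination
    """
    if q < 2:
        raise ValueError("q must be at least 2")
    return {
        "eps_tyler": 1.0 / q,
        "eps_sym": 1.0 - math.sqrt(1.0 - 1.0 / q),
        "eps_tight": math.sqrt(1.0 / q),
    }


for q in (2, 3, 5, 10):
    bp = breakdown_points(q)
    print(q, {k: round(v, 4) for k, v in bp.items()})
```

Since $1 - \sqrt{1 - t} = t/2 + O(t^2)$ as $t \to 0$, symmetrization roughly halves the breakdown point for large $q$, while restricting to tight contamination raises it to order $q^{-1/2}$.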

All proofs are deferred to Section 4.

2 The breakdown properties of $P \mapsto \Sigma(P)$

Condition (1.1) is equivalent to $\Delta(P) > 0$, where

$$\Delta(P) := \min_{V \in \mathcal{V}} \frac{[\dim(V)/q - P(V)]^+}{1 - P(V)} \in \Bigl[0, \frac{1}{q}\Bigr]$$

with $0/0 := 0$. That this minimum is well-defined follows from Lemma 4.1 in Section 4.
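For $q = 2$ the quantity $\Delta(P)$ of an empirical distribution can be computed exactly: every $V \in \mathcal{V}$ is a line through $0$, and $p \mapsto [1/2 - p]^+ / (1 - p)$ is nonincreasing on $[0, 1]$, so the minimum is attained at the line of largest mass (lines carrying only a possible atom at $0$ always exist because $P_n$ has finite support). A Python sketch for integer points (the names `line_key` and `delta_2d` are hypothetical):

```python
from collections import Counter
from fractions import Fraction
from math import gcd


def line_key(x: int, y: int) -> tuple:
    """Canonical label of the line through 0 and the nonzero integer point (x, y)."""
    g = gcd(x, y)
    x, y = x // g, y // g
    return (-x, -y) if x < 0 or (x == 0 and y < 0) else (x, y)


def delta_2d(points) -> Fraction:
    """Delta(P_n) for the empirical distribution of integer points in R^2.

    An atom at 0 lies on every line, so it is added to every line mass."""
    n = len(points)
    zeros = sum(1 for p in points if tuple(p) == (0, 0))
    counts = Counter(line_key(x, y) for x, y in points if (x, y) != (0, 0))
    p_max = Fraction(zeros + max(counts.values(), default=0), n)
    if p_max == 1:
        return Fraction(0)  # convention 0/0 := 0
    return max(Fraction(1, 2) - p_max, Fraction(0)) / (1 - p_max)


print(delta_2d([(1, 0), (0, 1), (1, 1), (2, -1)]))  # 1/3
print(delta_2d([(1, 0), (2, 0), (0, 1), (1, 1)]))   # 0: condition (1.1) fails
```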

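The set of all $V \in \mathcal{V}$ such that $[\dim(V)/q - P(V)]^+ / (1 - P(V))$ equals $\Delta(P)$ is denoted by $\mathcal{V}(P)$. Another useful abbreviation is

$$\tilde{P} := \tfrac{1}{2} \mathcal{L}\bigl( |\boldsymbol{x}|^{-1} \boldsymbol{x} \,\big|\, \boldsymbol{x} \ne 0 \bigr) + \tfrac{1}{2} \mathcal{L}\bigl( -|\boldsymbol{x}|^{-1} \boldsymbol{x} \,\big|\, \boldsymbol{x} \ne 0 \bigr) \quad \text{with } \boldsymbol{x} \sim P.$$

This is a symmetric distribution on the unit sphere $\mathbb{S}^{q-1}$ of $\mathbb{R}^q$. Note that $G(\tilde{P}, \cdot) \equiv G(P, \cdot)$ and thus $\Sigma(\tilde{P}) = \Sigma(P)$.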
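The identity $G(\tilde{P}, \cdot) \equiv G(P, \cdot)$ holds because each integrand $x x^\top / (x^\top M^{-1} x)$ is invariant under $x \mapsto c x$ for every $c \ne 0$, and zeros are excluded anyway. A NumPy sketch for an empirical distribution (the names `tilde_sample` and `G`, and the particular matrix $M$, are illustrative choices) confirms this numerically:

```python
import numpy as np


def tilde_sample(x: np.ndarray) -> np.ndarray:
    """Support of the empirical version of P~: each nonzero row x_i is replaced
    by the two antipodal points +-x_i/|x_i|, each with half its weight."""
    x = x[np.any(x != 0, axis=1)]
    u = x / np.linalg.norm(x, axis=1, keepdims=True)
    return np.vstack([u, -u])


def G(x: np.ndarray, m: np.ndarray) -> np.ndarray:
    """G(P_n, M) for the empirical distribution of the rows of x (zeros ignored)."""
    x = x[np.any(x != 0, axis=1)]
    n, q = x.shape
    w, v = np.linalg.eigh(m)
    z = x @ (v @ np.diag(w ** -0.5) @ v.T)  # rows are M^{-1/2} x_i
    return (q / n) * (z / np.sum(z * z, axis=1, keepdims=True)).T @ z


rng = np.random.default_rng(3)
x = rng.standard_normal((100, 3)) * np.array([5.0, 1.0, 0.2])
x[:10] = 0.0  # an atom at zero is ignored by G
m = np.array([[2.0, 0.3, 0.0], [0.3, 0.7, 0.1], [0.0, 0.1, 0.3]])  # positive definite
print(np.allclose(G(x, m), G(tilde_sample(x), m)))  # True
print(np.trace(G(x, m)))                            # q = 3, up to rounding
```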

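Theorem 2.1

Suppose that $\Sigma(P) \in \mathbb{M}_+$. Let $P = P\{0\} \delta_0 + (1 - P\{0\}) P_o$, where $\delta_x$ denotes Dirac measure in $x \in \mathbb{R}^q$ and $P_o$ is a distribution on $\mathbb{R}^q \setminus \{0\}$. Then

$$\varepsilon^*(P) = \begin{cases} \dfrac{(1 - P\{0\}) \, \Delta(P_o)}{1 - P\{0\} \, \Delta(P_o)} & \text{in general}, \\[2ex] \Delta(P) & \text{if } P\{0\} = 0, \\[1ex] \dfrac{1}{q} & \text{if } P \text{ is smooth in the sense of (1.3).} \end{cases}$$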

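Suppose that $\varepsilon = \varepsilon^*(P)$. For any $Q = (1 - \varepsilon) P + \varepsilon H$ in $\mathcal{U}(P, \varepsilon)$, one has $\Sigma(Q) = 0$ if, and only if, $H\{0\} = 0$ and $H(V) = 1$ for some $V \in \mathcal{V}(P_o)$. Moreover, for $k \ge 1$ let $Q_k = (1 - \varepsilon) P + \varepsilon H_k \in \mathcal{U}(P, \varepsilon)$ such that $\Sigma(Q_k) \in \mathbb{M}_+$. Then $\lim_{k \to \infty} (\lambda_1 / \lambda_q)(\Sigma(Q_k)) = \infty$ if, and only if, the following two conditions are satisfied:

$$\lim_{k \to \infty} H_k\{0\} = 0, \tag{2.1}$$

$$\text{any cluster point } \tilde{H} \text{ of } (\tilde{H}_k)_k \text{ is supported by some } V \in \mathcal{V}(P_o). \tag{2.2}$$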

By comparison, the M-estimators introduced by Maronna (1976) have breakdown point at most $1/(q + 1) < 1/q$; cf. Stahel (1981) or Tyler (1986).


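Theorem 3.1

Suppose that $\Sigma(P \ominus P) \in \mathbb{M}_+$. Then

$$\varepsilon_s^*(P) = \begin{cases} 1 - \sqrt{\dfrac{1 - \Delta((P \ominus P)_o)}{1 - (P \ominus P)\{0\} \, \Delta((P \ominus P)_o)}} & \text{in general}, \\[3ex] 1 - \sqrt{1 - \Delta(P \ominus P)} & \text{if } P \text{ has no atoms}, \\[1ex] 1 - \sqrt{1 - \dfrac{1}{q}} & \text{if } P \text{ is smooth in the sense of (1.3).} \end{cases}$$

Suppose that $\varepsilon = \varepsilon_s^*(P)$. Then $\Sigma(Q \ominus Q) \in \mathbb{M}_+$ for any $Q$ in $\mathcal{U}(P, \varepsilon)$. Moreover, for $k \ge 1$ let $Q_k = (1 - \varepsilon) P + \varepsilon H_k \in \mathcal{U}(P, \varepsilon)$. Then $\lim_k (\lambda_1 / \lambda_q)(\Sigma(Q_k \ominus Q_k)) = \infty$ if, and only if, the following three conditions are satisfied:

$$\lim_{k \to \infty} \max_{x \in \mathbb{R}^q} H_k\{x\} = 0, \tag{3.1}$$

$$|\boldsymbol{y}_k| \to_p \infty \ (k \to \infty) \quad \text{where } \boldsymbol{y}_k \sim H_k, \tag{3.2}$$

$$\text{for any cluster point } (\tilde{H}_1, \tilde{H}_2) \text{ of } \bigl(\tilde{H}_k, \widetilde{H_k \ominus H_k}\bigr)_k \text{ there is a space } V \in \mathcal{V}((P \ominus P)_o) \text{ such that } \tilde{H}_1(V) = \tilde{H}_2(V) = 1. \tag{3.3}$$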
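One way to read the general formula, which we checked only algebraically: $Q \ominus Q$ is a contamination of $P \ominus P$ with mass $\varepsilon' = 1 - (1 - \varepsilon)^2$, and solving $1 - (1 - \varepsilon)^2 = \varepsilon^*$, with $\varepsilon^*$ given by the general formula of Theorem 2.1 applied to $P \ominus P$ (that is, with $(P \ominus P)\{0\}$ and $\Delta((P \ominus P)_o)$ in place of $P\{0\}$ and $\Delta(P_o)$), reproduces the expression above. A small Python sketch (hypothetical function names) encodes both formulas so that this identity can be checked:

```python
import math


def eps_star(p0: float, delta_o: float) -> float:
    """Theorem 2.1, general case, from P{0} and Delta(P_o)."""
    return (1.0 - p0) * delta_o / (1.0 - p0 * delta_o)


def eps_s_star(pp0: float, delta_o: float) -> float:
    """Theorem 3.1, general case, from (P ⊖ P){0} and Delta((P ⊖ P)_o)."""
    return 1.0 - math.sqrt((1.0 - delta_o) / (1.0 - pp0 * delta_o))


print(eps_s_star(0.0, 1.0 / 3.0))  # smooth P, q = 3: 1 - sqrt(2/3), about 0.1835
```

The identity behind the check is $1 - \varepsilon^*(a, d) = (1 - d)/(1 - a d)$.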

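This theorem shows that symmetrization lowers the breakdown point of the M-functional. However, the type of contamination required in order to cause breakdown of the functional $P \mapsto \Sigma(P \ominus P)$ is far more special than in the case of $P \mapsto \Sigma(P)$. The quantity $\Delta((P \ominus P)_o)$ is difficult to compute. However, for $V \in \mathcal{V}$,

$$(P \ominus P)_o(V) = \frac{(P \ominus P)(V) - (P \ominus P)\{0\}}{1 - (P \ominus P)\{0\}} \le (P \ominus P)(V) \le \max_{x \in \mathbb{R}^q} P(x + V),$$

so that

$$\Delta((P \ominus P)_o) \ge \Delta_s(P) := \min_{x \in \mathbb{R}^q, \, V \in \mathcal{V}} \frac{[\dim(V)/q - P(x + V)]^+}{1 - P(x + V)}.$$

Here is the result on tight contamination mentioned in the introduction.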
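The lower bound $\Delta_s(P)$ only involves the largest mass of an affine subspace $x + V$. For $q = 2$ these are the affine lines, and for an empirical distribution of integer points the maximal line mass can be found exactly in $O(n^2)$ steps by grouping, around each sample point, the other points by direction. A Python sketch (the names `direction`, `max_affine_line_mass` and `delta_s_2d`, and the sample points, are hypothetical):

```python
from collections import Counter
from fractions import Fraction
from math import gcd, sqrt


def direction(dx: int, dy: int) -> tuple:
    """Canonical label of the direction of the nonzero integer vector (dx, dy)."""
    g = gcd(dx, dy)
    dx, dy = dx // g, dy // g
    return (-dx, -dy) if dx < 0 or (dx == 0 and dy < 0) else (dx, dy)


def max_affine_line_mass(points) -> Fraction:
    """max over x in R^2 and lines V through 0 of P_n(x + V), for integer points.
    Every affine line with positive mass passes through some sample point p."""
    best = 0
    for p in points:
        same = sum(1 for r in points if r == p)  # copies of p lie on every line through p
        dirs = Counter(direction(r[0] - p[0], r[1] - p[1]) for r in points if r != p)
        best = max(best, same + max(dirs.values(), default=0))
    return Fraction(best, len(points))


def delta_s_2d(points) -> Fraction:
    """Delta_s(P_n) for q = 2; [1/2 - p]^+ / (1 - p) is nonincreasing in p."""
    p = max_affine_line_mass(points)
    if p == 1:
        return Fraction(0)  # convention 0/0 := 0
    return max(Fraction(1, 2) - p, Fraction(0)) / (1 - p)


pts = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 3), (5, 2), (3, 8), (8, 1)]
print(max_affine_line_mass(pts))              # 3/8: (0,0), (1,0), (2,0) are collinear
print(delta_s_2d(pts), sqrt(delta_s_2d(pts)))  # 1/5 and about 0.447
```

Any two distinct points span a line, so $\Delta_s(P_n) = 0$ as soon as $n \le 4$ in the plane; the bound is informative only for larger samples in general position.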


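Theorem 3.2

Suppose that $\Sigma(P \ominus P) \in \mathbb{M}_+$. Then

$$\varepsilon_s^*(P | \psi) \begin{cases} \ge \sqrt{\Delta_s(P)} & \text{in general}, \\[1ex] = \sqrt{\dfrac{1}{q}} & \text{if } P \text{ is smooth in the sense of (1.3).} \end{cases}$$

Suppose that $P$ satisfies (1.3), and let $\varepsilon = \varepsilon_s^*(P | \psi)$. Then $\Sigma(Q \ominus Q) = 0$ for $Q = (1 - \varepsilon) P + \varepsilon H \in \mathcal{U}(P, \varepsilon | \psi)$ if, and only if, $H$ has no atoms and $H$ is supported by some one-dimensional affine subspace of $\mathbb{R}^q$. Similarly, for $k \ge 1$ let $Q_k = (1 - \varepsilon) P + \varepsilon H_k \in \mathcal{U}(P, \varepsilon | \psi)$ such that $\Sigma(Q_k \ominus Q_k) \in \mathbb{M}_+$. Then $\lim_{k \to \infty} (\lambda_1 / \lambda_q)(\Sigma(Q_k \ominus Q_k)) = \infty$ if, and only if, the following two conditions are satisfied:

$$\lim_{k \to \infty} \max_{x \in \mathbb{R}^q} H_k\{x\} = 0, \tag{3.4}$$

$$\text{any cluster point of } \bigl(\widetilde{H_k \ominus H_k}\bigr)_k \text{ is supported by some } V \in \mathcal{V} \text{ with } \dim(V) = 1. \tag{3.5}$$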

One can easily show that Condition (3.5) implies that any cluster point of $(H_k)_k$ is supported by some one-dimensional affine subspace of $\mathbb{R}^q$.
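The mechanism behind the value $\sqrt{1/q}$ in Theorem 3.2 can be illustrated numerically. If $H$ is supported by an affine line $a + \operatorname{span}(v)$, all differences of two $H$-distributed points lie in the linear line $\operatorname{span}(v)$. For smooth $P$ the mixed terms $P \ominus H$, $H \ominus P$ and the term $P \ominus P$ put no mass on that line, so $(Q \ominus Q)(\operatorname{span}(v)) = \varepsilon^2$ when $H$ has no atoms, which reaches $\dim(V)/q = 1/q$ exactly at $\varepsilon = \sqrt{1/q}$. A NumPy sketch (the offset $a$, direction $v$ and sample are hypothetical choices; the zero differences on the diagonal are an artifact of the finite sample):

```python
import numpy as np

rng = np.random.default_rng(4)
q = 3
a = np.array([4.0, -1.0, 2.0])       # offset of the affine line (hypothetical)
v = np.array([1.0, 2.0, 2.0]) / 3.0  # unit direction of the line
t = rng.standard_normal(200)         # continuous H: no atoms
h = a + t[:, None] * v               # sample from H, supported by a + span(v)

# Differences y_i - y_j of the H-sample all lie in the linear line span(v).
d = (h[:, None, :] - h[None, :, :]).reshape(-1, q)
residual = d - np.outer(d @ v, v)    # components orthogonal to v
print(np.max(np.abs(residual)))      # of the order of rounding error

# Mass of Q ⊖ Q on span(v) is eps^2 for smooth P; it equals 1/q at eps = sqrt(1/q).
eps = np.sqrt(1.0 / q)
print(eps ** 2, 1.0 / q)
```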

4 Proofs

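Lemma 4.1

For $0 \le d < q$ let $\mathcal{V}(d)$ be the set of all $d$-dimensional linear subspaces of $\mathbb{R}^q$. Then both

$$\max_{V \in \mathcal{V}(d)} Q(V) \quad \text{and} \quad \max_{x \in \mathbb{R}^q, \, V \in \mathcal{V}(d)} Q(x + V)$$

are well-defined and upper semicontinuous in $Q$.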

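Proof of Lemma 4.1.

Let $(Q_k)_k$ be any sequence of distributions converging weakly to some $Q$. Let $V_k \in \mathcal{V}(d)$ and $x_k \in \mathbb{R}^q$ such that either

$$x_k = 0 \quad \text{and} \quad Q_k(V_k) > \sup_{V \in \mathcal{V}(d)} Q_k(V) - k^{-1} \tag{4.1}$$

or

$$x_k \in V_k^\perp \quad \text{and} \quad Q_k(x_k + V_k) > \sup_{x \in \mathbb{R}^q, \, V \in \mathcal{V}(d)} Q_k(x + V) - k^{-1}. \tag{4.2}$$

Let $M_k \in \mathbb{M}$ describe the orthogonal projection from $\mathbb{R}^q$ onto $V_k$. After replacing $(Q_k)_k$ with a subsequence if necessary, one may assume that $(M_k)_k$ converges to some projection matrix $M$, and we define $V := M \mathbb{R}^q$. Further one may assume that

$$\lim_{k \to \infty} |x_k| = \infty \quad \text{or} \quad \lim_{k \to \infty} x_k = x \in \mathbb{R}^q.$$

Since $x_k + V_k \subset \{y : |y| \ge |x_k|\}$, one easily deduces from $\lim_k Q_k = Q$ and $\lim_k |x_k| = \infty$ that $\lim_k Q_k(x_k + V_k) = 0$. If $\lim_k x_k = x$, then for any $R > 0$,

$$\begin{aligned} \limsup_{k \to \infty} Q_k(x_k + V_k) &\le \lim_{k \to \infty} \int \bigl(1 - R |y - M_k y - x_k|\bigr)^+ \, Q_k(dy) \\ &= \int \bigl(1 - R |y - M y - x|\bigr)^+ \, Q(dy) \\ &\to Q(x + V) \quad (R \to \infty). \end{aligned}$$

These considerations show that $\sup_{V \in \mathcal{V}(d)} Q(V)$ and $\sup_{x \in \mathbb{R}^q, \, V \in \mathcal{V}(d)} Q(x + V)$ are upper semicontinuous in $Q$. In the special case $(Q_k)_k \equiv Q$ one realizes that both suprema are attained. $\Box$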
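The key device in this proof is the continuous, bounded function $y \mapsto (1 - R|y - My - x|)^+$, which dominates the indicator of $x + V$ and decreases to it as $R \to \infty$. A NumPy sketch for an empirical $Q$ in the plane (the name `smoothed_mass` and the four sample points are hypothetical) shows the integrals decreasing to $Q(x + V)$:

```python
import numpy as np


def smoothed_mass(y: np.ndarray, m: np.ndarray, x: np.ndarray, R: float) -> float:
    """Empirical version of  int (1 - R |y - M y - x|)^+ Q(dy),
    an upper approximation of Q(x + V) with V = M R^q."""
    dist = np.linalg.norm(y - y @ m.T - x, axis=1)  # |y_i - M y_i - x|
    return float(np.mean(np.clip(1.0 - R * dist, 0.0, None)))


m = np.array([[1.0, 0.0], [0.0, 0.0]])  # projection onto the first axis, V = R x {0}
x = np.array([0.0, 1.0])                # x lies in the orthogonal complement of V
y = np.array([[3.0, 1.0], [-2.0, 1.0], [0.0, 1.1], [5.0, 0.0]])
for R in (1.0, 10.0, 100.0):
    print(R, smoothed_mass(y, m, x, R))  # nonincreasing in R, equal to Q(x + V) = 1/2 from R = 10 on
```

The point $(0, 1.1)$ lies at distance $0.1$ from $x + V$ and contributes only for $R < 10$; this is the mechanism that makes the limit an upper bound rather than an equality for finite $R$.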

Propositions 1.1 and 1.2 entail the following two facts:

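Lemma 4.2 (a)

Let $\mathcal{Q}$ be a family of nondegenerate distributions on $\mathbb{R}^q$ such that $\Sigma(Q) \in \mathbb{M}_+$ for all $Q \in \mathcal{Q}$, and let $\{\tilde{Q} : Q \in \mathcal{Q}\}$ be closed. Then

$$\sup_{Q \in \mathcal{Q}} \frac{\lambda_1}{\lambda_q}(\Sigma(Q)) < \infty.$$

(b)

Let $(Q_k)_k$ be a sequence of nondegenerate distributions on $\mathbb{R}^q$ such that $\Sigma(Q_k) \in \mathbb{M}_+$ for all $k$ and

$$\lim_{k \to \infty} \tilde{Q}_k = \tilde{Q}, \qquad \lim_{k \to \infty} \frac{\lambda_1}{\lambda_q}(\Sigma(Q_k)) = \gamma \in [1, \infty].$$

If $\gamma = \infty$, then $\tilde{Q}(V) \ge \dim(V)/q$ for some $V \in \mathcal{V}$. If $\gamma < \infty$ but $\tilde{Q}(V) \ge \dim(V)/q$ for some space $V \in \mathcal{V}$, then there is a second space $W \in \mathcal{V}$ such that $V \cap W = \{0\}$ and $\tilde{Q}(V \cup W) = 1$.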


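Proof of Lemma 4.2.

As for part (a), Prohorov's Theorem implies that $\{\tilde{Q} : Q \in \mathcal{Q}\}$ is even compact. Since $\Sigma(\tilde{Q}) = \Sigma(Q) \in \mathbb{M}_+$ for all $Q \in \mathcal{Q}$, Proposition 1.2 yields

$$\sup_{Q \in \mathcal{Q}} \frac{\lambda_1}{\lambda_q}(\Sigma(Q)) = \max_{Q \in \mathcal{Q}} \frac{\lambda_1}{\lambda_q}(\Sigma(Q)) < \infty.$$

In part (b) suppose first that $\tilde{Q}(V) < \dim(V)/q$ for all $V \in \mathcal{V}$. Then $\Sigma(\tilde{Q}) \in \mathbb{M}_+$ by Proposition 1.1, and $\Sigma(\tilde{Q}) = \lim_k \Sigma(Q_k)$ by Proposition 1.2, whence $\gamma = (\lambda_1 / \lambda_q)(\Sigma(\tilde{Q})) < \infty$.

Now suppose that $\gamma < \infty$. After replacing $(Q_k)_k$ with a subsequence if necessary, one may assume that $\lim_k \Sigma(Q_k) = M \in \mathbb{M}_+$. But then

$$I = \lim_{k \to \infty} G(\tilde{Q}_k, \Sigma(Q_k)) = G(\tilde{Q}, M)$$

because $G(\cdot, \Sigma(Q_k))$ converges uniformly to $G(\cdot, M)$ as $k \to \infty$. Thus if $\tilde{Q}(V) \ge \dim(V)/q$ for some $V \in \mathcal{V}$, then the second part of Proposition 1.1 says that $V \cap W = \{0\}$ and $\tilde{Q}(V \cup W) = 1$ for some $W \in \mathcal{V}$. $\Box$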

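Proof of Theorem 2.1.

Note first that $\{\tilde{Q} : Q \in \mathcal{U}(P, \varepsilon)\}$ is equal to the closed set

$$\bigl\{ (1 - \varepsilon_o) \tilde{P} + \varepsilon_o \tilde{H} : \tilde{H} \text{ any symmetric distribution on } \mathbb{S}^{q-1} \bigr\},$$

where

$$\varepsilon_o := \frac{\varepsilon}{1 - (1 - \varepsilon) P\{0\}}.$$

For if $Q = (1 - \varepsilon) P + \varepsilon H \in \mathcal{U}(P, \varepsilon)$, then

$$\tilde{Q} = \frac{(1 - \varepsilon)(1 - P\{0\}) \tilde{P} + \varepsilon (1 - H\{0\}) \tilde{H}}{(1 - \varepsilon)(1 - P\{0\}) + \varepsilon (1 - H\{0\})} = (1 - \varepsilon') \tilde{P} + \varepsilon' \tilde{H}$$

for some symmetric distribution $\tilde{H}$ on $\mathbb{S}^{q-1}$ and

$$\varepsilon' := \frac{\varepsilon (1 - H\{0\})}{(1 - \varepsilon)(1 - P\{0\}) + \varepsilon (1 - H\{0\})} \le \varepsilon_o.$$

Further,

$$\tilde{Q}(V) \le (1 - \varepsilon_o) \tilde{P}(V) + \varepsilon_o = (1 - \varepsilon_o) P_o(V) + \varepsilon_o$$

with equality if, and only if, $H\{0\} = 0$ and $H(V) = 1$. This is strictly smaller than $\dim(V)/q$ if, and only if,

$$\varepsilon_o < \frac{\dim(V)/q - P_o(V)}{1 - P_o(V)}.$$

Hence we can conclude the following: If $\varepsilon_o < \Delta(P_o)$, then $\Sigma(\cdot) \in \mathbb{M}_+$ on $\mathcal{U}(P, \varepsilon)$, and Lemma 4.2 (a) yields that $(\lambda_1 / \lambda_q)(\Sigma(\cdot))$ is bounded on $\mathcal{U}(P, \varepsilon)$. If $\varepsilon_o = \Delta(P_o)$, then $\Sigma(Q) = 0$ for $Q = (1 - \varepsilon) P + \varepsilon H \in \mathcal{U}(P, \varepsilon)$ if, and only if, $H\{0\} = 0$ and $H(V) = 1$ for some $V \in \mathcal{V}(P_o)$. Since $\varepsilon_o$ is strictly increasing in $\varepsilon$, inverting the equation $\varepsilon_o = \Delta(P_o)$ yields

$$\varepsilon^*(P) = \frac{(1 - P\{0\}) \, \Delta(P_o)}{1 - P\{0\} \, \Delta(P_o)}.$$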
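The inversion step can be checked numerically. In the Python sketch below (hypothetical function names; the values $P\{0\} = 0.2$ and $\Delta(P_o) = 1/3$, as for a smooth $P_o$ on $\mathbb{R}^3$, are illustrative), the closed-form $\varepsilon^*(P)$ is plugged back into $\varepsilon_o$:

```python
def eps_o(eps: float, p0: float) -> float:
    """Effective contamination of Q~: eps / (1 - (1 - eps) P{0})."""
    return eps / (1.0 - (1.0 - eps) * p0)


def eps_star(p0: float, delta_o: float) -> float:
    """Solution of eps_o(eps) = Delta(P_o) (Theorem 2.1, general case)."""
    return (1.0 - p0) * delta_o / (1.0 - p0 * delta_o)


# Hypothetical values: an atom of mass 0.2 at zero, Delta(P_o) = 1/3.
p0, d = 0.2, 1.0 / 3.0
e = eps_star(p0, d)
print(e, eps_o(e, p0))  # 2/7 (about 0.2857), and eps_o recovers 1/3
```

An atom at the origin thus lowers the breakdown point (here from $1/3$ to $2/7$): the mass $P\{0\}$ is invisible to $G$, so the same absolute contamination becomes a larger fraction $\varepsilon_o > \varepsilon$ of $\tilde{Q}$.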

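Let $\varepsilon = \varepsilon^*(P)$ and $Q_k = (1 - \varepsilon) P + \varepsilon H_k \in \mathcal{U}(P, \varepsilon)$ as stated in the theorem. After replacing $(Q_k)_k$ with a subsequence if necessary, one may assume that $\lim_k H_k\{0\} = a \in [0, 1]$, $\lim_k \tilde{H}_k = \tilde{H}$ (where $\tilde{\delta}_0$ may be defined arbitrarily) and $\lim_k (\lambda_1 / \lambda_q)(\Sigma(Q_k)) = \gamma \in [1, \infty]$. This implies that

$$\lim_{k \to \infty} \tilde{Q}_k = \tilde{Q} := \frac{(1 - \varepsilon)(1 - P\{0\}) \tilde{P} + \varepsilon (1 - a) \tilde{H}}{(1 - \varepsilon)(1 - P\{0\}) + \varepsilon (1 - a)}.$$

Since $\Sigma(P) \in \mathbb{M}_+$, $\tilde{P}(V \cup W) < 1$ for arbitrary $V, W \in \mathcal{V}$ with $V \cap W = \{0\}$. The limit distribution $\tilde{Q}$ inherits this property. Thus one can apply Lemma 4.2 (b) and conclude that $\gamma = \infty$ if, and only if, $\tilde{Q}(V) \ge \dim(V)/q$ for some $V \in \mathcal{V}$. But for any $V \in \mathcal{V}$,

$$\begin{aligned} \tilde{Q}(V) &= \frac{(1 - \varepsilon)(1 - P\{0\}) P_o(V) + \varepsilon (1 - a) \tilde{H}(V)}{(1 - \varepsilon)(1 - P\{0\}) + \varepsilon (1 - a)} \\ &\le \frac{(1 - \varepsilon)(1 - P\{0\}) P_o(V) + \varepsilon (1 - a)}{(1 - \varepsilon)(1 - P\{0\}) + \varepsilon (1 - a)} \\ &\le \frac{(1 - \varepsilon)(1 - P\{0\}) P_o(V) + \varepsilon}{(1 - \varepsilon)(1 - P\{0\}) + \varepsilon} \\ &= (1 - \varepsilon_o) P_o(V) + \varepsilon_o \\ &\le (1 - \varepsilon_o) \frac{\dim(V)/q - \Delta(P_o)}{1 - \Delta(P_o)} + \varepsilon_o \\ &= \dim(V)/q, \end{aligned}$$

where the last two steps use the definition of $\Delta(P_o)$ and $\varepsilon_o = \Delta(P_o)$, with equality if, and only if, $\tilde{H}(V) = 1$, $a = 0$ and $V \in \mathcal{V}(P_o)$. $\Box$

The following preliminary result for the proof of Theorems 3.1 and 3.2 describes the possible limits of a sequence $(\widetilde{P \ominus H_k})_k$.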


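Proposition 4.3

Let $(H_k)_{k \ge 1}$ be a sequence of distributions on $\mathbb{R}^q$. A pair $(a, \tilde{B})$ is a cluster point of the sequence $\bigl( (P \ominus H_k)\{0\}, \widetilde{P \ominus H_k} \bigr)_{k \ge 1}$ if, and only if, it can be represented as follows:

$$a = \sum_{x \in \mathbb{R}^q} P\{x\} a_x$$

and

$$\tilde{B} = \frac{\alpha \tilde{B}_\infty + \sum_{x \in \mathbb{R}^q} P\{x\} \bigl((1 - \alpha) H\{x\} - a_x\bigr) \tilde{B}_x + (1 - \alpha)\bigl(1 - (P \ominus H)\{0\}\bigr) \widetilde{P \ominus H}}{1 - \sum_{x \in \mathbb{R}^q} P\{x\} a_x}$$

with $\alpha := \lim_{r \to \infty} \liminf_{k \to \infty} H_k\{x : |x| > r\}$, some distribution $H$ on $\mathbb{R}^q$, numbers $a_x \in [0, (1 - \alpha) H\{x\}]$, and symmetric distributions $\tilde{B}_\infty$ and $\tilde{B}_x$ on $\mathbb{S}^{q-1}$.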

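Proof of Proposition 4.3.

We compactify $\mathbb{R}^q$ via the mapping

$$x \mapsto \phi(x) := (1 + |x|)^{-1} x \in U(0, 1),$$

where $U(y, \rho)$ and $B(y, \rho)$ denote, respectively, the open and closed ball around $y \in \mathbb{R}^q$ with radius $\rho \ge 0$. Without loss of generality one may assume that

$$\lim_{k \to \infty} H_k \circ \phi^{-1} = D \quad \text{and} \quad \alpha = D(\mathbb{S}^{q-1}).$$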
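The map $\phi$ is a homeomorphism of $\mathbb{R}^q$ onto the open unit ball, with inverse $u \mapsto u / (1 - |u|)$; points escaping to infinity are sent towards the sphere $\mathbb{S}^{q-1}$, which is where the mass $\alpha$ of $D$ sits. A NumPy sketch (the names `phi` and `phi_inv` are illustrative):

```python
import numpy as np


def phi(x: np.ndarray) -> np.ndarray:
    """phi(x) = x / (1 + |x|), applied row-wise; maps R^q onto U(0, 1)."""
    return x / (1.0 + np.linalg.norm(x, axis=-1, keepdims=True))


def phi_inv(u: np.ndarray) -> np.ndarray:
    """Inverse on the open unit ball: x = u / (1 - |u|)."""
    return u / (1.0 - np.linalg.norm(u, axis=-1, keepdims=True))


x = np.array([[0.0, 0.0], [3.0, 4.0], [1e6, 0.0]])
u = phi(x)
print(np.linalg.norm(u, axis=1))   # 0, 5/6 and almost 1: far points approach S^{q-1}
print(np.allclose(phi_inv(u), x))  # True
```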

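Even if $D$ is concentrated on $U(0, 1)$, the Continuous Mapping Theorem is not applicable to $(\widetilde{P \ominus H_k})_k$, because the points in

$$\mathcal{X} := \bigl\{ x \in \mathbb{R}^q : D\{\phi(x)\} > 0 \bigr\}$$

require special attention. Since

$$D\{\phi(x)\} = \lim_{\rho \downarrow 0} \liminf_{k \to \infty} H_k U(x, \rho) = \lim_{\rho \downarrow 0} \limsup_{k \to \infty} H_k B(x, \rho) \quad \text{for any } x \in \mathcal{X}$$

and

$$\alpha = \lim_{r \uparrow \infty} \liminf_{k \to \infty} H_k\bigl(\mathbb{R}^q \setminus B(0, r)\bigr) = \lim_{r \uparrow \infty} \limsup_{k \to \infty} H_k\bigl(\mathbb{R}^q \setminus U(0, r)\bigr),$$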


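one can find numbers $\rho_{xk} \ge 0$ and $r_k > 0$ such that with $U_{xk} := U(x, \rho_{xk})$ and $U_{\infty k} := \mathbb{R}^q \setminus B(0, r_k)$ the following requirements are met:

$$\lim_{k \to \infty} \rho_{xk} = 0 \quad \text{and} \quad \lim_{k \to \infty} H_k U_{xk} = D\{\phi(x)\} \quad \text{for } x \in \mathcal{X},$$

$$\lim_{k \to \infty} r_k = \infty \quad \text{and} \quad \lim_{k \to \infty} H_k U_{\infty k} = \alpha,$$

$$U_{xk} \cap U_{yk} = \emptyset \quad \text{for different } x, y \in \mathcal{X} \cup \{\infty\}.$$

After replacing $(H_k)_k$ with a suitable subsequence if necessary, one may assume further that for any $x \in \mathcal{X}$,

$$\lim_{k \to \infty} H_k\{x\} = a_x \in \bigl[0, D\{\phi(x)\}\bigr], \qquad \lim_{k \to \infty} \widetilde{\mathcal{L}}\bigl( x - \boldsymbol{y}_k \,\big|\, \boldsymbol{y}_k \in U_{xk} \setminus \{x\} \bigr) = \tilde{B}_x \quad \text{if } \boldsymbol{y}_k \sim H_k.$$

Since $\lim_k H_k\{x\} = 0$ whenever $D\{\phi(x)\} = 0$, this implies that

$$\lim_{k \to \infty} (P \ominus H_k)\{0\} = \sum_{x \in \mathcal{X}} P\{x\} a_x. \tag{4.3}$$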

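Further we write $D = \alpha \tilde{B}_\infty + (1 - \alpha) H \circ \phi^{-1}$ with distributions $\tilde{B}_\infty$ on $\mathbb{S}^{q-1}$ and $H$ on $\mathbb{R}^q$. Now let

$$f(x) := \begin{cases} g(|x|^{-1} x) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0 \end{cases}$$

for some even, continuous function $g$ on $\mathbb{S}^{q-1}$, and let $\boldsymbol{x} \sim P$, $\boldsymbol{y}_k \sim H_k$ and $\boldsymbol{y} \sim H$ be independent. Then, as $k \to \infty$,

$$\begin{aligned} \mathbb{E} f(\boldsymbol{x} - \boldsymbol{y}_k) &= \mathbb{E}\, 1\{\boldsymbol{y}_k \in U_{\infty k}\} f(\boldsymbol{x} - \boldsymbol{y}_k) + \mathbb{E}\, 1\{\boldsymbol{y}_k \notin U_{\infty k}\} f(\boldsymbol{x} - \boldsymbol{y}_k) \\ &= \alpha \int g \, d\tilde{B}_\infty + \mathbb{E}\, 1\{\boldsymbol{y}_k \notin U_{\infty k}\} f(\boldsymbol{x} - \boldsymbol{y}_k) + o(1) \\ &= \alpha \int g \, d\tilde{B}_\infty + \sum_{x \in \mathcal{X}} P\{x\} \, \mathbb{E}\, 1\{\boldsymbol{y}_k \in U_{xk} \setminus \{x\}\} f(x - \boldsymbol{y}_k) \\ &\quad + \sum_{x \in \mathcal{X}} P\{x\} \, \mathbb{E}\, 1\{\boldsymbol{y}_k \notin U_{xk} \cup U_{\infty k}\} f(x - \boldsymbol{y}_k) + \mathbb{E}\, 1\{\boldsymbol{x} \notin \mathcal{X}, \, \boldsymbol{y}_k \notin U_{\infty k}\} f(\boldsymbol{x} - \boldsymbol{y}_k) + o(1) \\ &= \alpha \int g \, d\tilde{B}_\infty + \sum_{x \in \mathcal{X}} P\{x\} \bigl(D\{\phi(x)\} - a_x\bigr) \int g \, d\tilde{B}_x + (1 - \alpha) \sum_{x \in \mathcal{X}} P\{x\} \, \mathbb{E}\, 1\{\boldsymbol{y} \ne x\} f(x - \boldsymbol{y}) \end{aligned}$$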
