University of Heidelberg and Rutgers University draft, September 1997


Two M-Functionals of Scatter

Lutz Duembgen and David Tyler


Abstract.

Let $P$ be a probability distribution on $\mathbb{R}^q$. A detailed study of the breakdown properties of two M-functionals of scatter, $P \mapsto \Sigma(P)$ and $P \mapsto \Sigma(P \ominus P)$, is given. Here $\Sigma(\cdot)$ denotes Tyler's (1987) M-functional of scatter, taking values in the set of symmetric, positive definite $q \times q$ matrices. It assumes zero as a given center of the underlying distributions. The second functional avoids this assumption by operating on the symmetrized distribution $P \ominus P := \mathcal{L}(\boldsymbol{x} - \boldsymbol{y})$ with independent random vectors $\boldsymbol{x}, \boldsymbol{y} \sim P$. Let $P$ be smooth in the sense of assigning probability zero to hyperplanes. Then:

(1) The (contamination) breakdown point of $P \mapsto \Sigma(P)$ equals $1/q$.

(2) The breakdown point of $P \mapsto \Sigma(P \ominus P)$ equals $1 - \sqrt{1 - 1/q} \in \left]1/(2q), 1/q\right[$.

(3) If we restrict attention to "tight contamination", then the breakdown point of $P \mapsto \Sigma(P \ominus P)$ equals $\sqrt{1/q} > 1/q$.

In all three cases the sources of breakdown are investigated. It turns out that breakdown is only caused by rather special contaminating distributions that are concentrated near low-dimensional subspaces.

Keywords and phrases:

breakdown, coplanar contamination, tight contamination, M-functional, scatter matrix, symmetrization

Research supported in part by European Union Human Capital and Mobility Program ERB CHRX-CT 940693.


Let $P, Q$ be nondegenerate probability distributions on $\mathbb{R}^q$. The covariance matrix of $P$,

$$\int (x - \mu(P))(x - \mu(P))^\top \, P(dx) \quad\text{with}\quad \mu(P) := \int x \, P(dx),$$

is known to be very sensitive to small perturbations in $P$. Various robust surrogates for the covariance functional have been proposed. In the present paper we investigate the breakdown properties of two particular M-functionals of scatter.

The first one is Tyler's (1987) M-functional $\Sigma(P)$, which is defined as follows. Let $\mathbb{M}$ be the set of symmetric matrices in $\mathbb{R}^{q \times q}$, and let $\mathbb{M}^+$ be the set of all positive definite $M \in \mathbb{M}$. For $M \in \mathbb{M}^+$ let

$$G(P, M) := \frac{q}{P(\mathbb{R}^q \setminus \{0\})} \int_{\mathbb{R}^q \setminus \{0\}} \frac{M^{-1/2} x x^\top M^{-1/2}}{x^\top M^{-1} x} \, P(dx),$$

which is a nonnegative definite matrix in $\mathbb{M}$ with trace $q$. If there is a unique matrix $M \in \mathbb{M}^+$ with

$$G(P, M) = I \quad\text{and}\quad \operatorname{trace}(M) = q,$$

then we define $\Sigma(P) := M$. Otherwise we define arbitrarily $\Sigma(P) := 0$. In what follows we utilize the following two properties of $\Sigma(\cdot)$; see Kent and Tyler (1988) and Duembgen (1996).

Proposition 1.1

Let $\mathbb{V}$ be the set of linear subspaces $V$ of $\mathbb{R}^q$ with $1 \le \dim(V) < q$. Suppose that $P\{0\} = 0$. Then $\Sigma(P) \in \mathbb{M}^+$ if, and only if,

$$P(V) < \frac{\dim(V)}{q} \quad\text{for all } V \in \mathbb{V}. \tag{1.1}$$

If $G(P, M) = I$ for some matrix $M \in \mathbb{M}^+$ but $P(V) \ge \dim(V)/q$ for some space $V \in \mathbb{V}$, then there is a second space $W \in \mathbb{V}$ such that $V \cap W = \{0\}$ and $P(V \cup W) = 1$.

Proposition 1.2

Suppose that $P\{0\} = 0$ and $\Sigma(P) \in \mathbb{M}^+$. Then

$$\Sigma(Q) \to \Sigma(P) \quad\text{as } Q \to P. \tag{1.2}$$


Here and throughout the sequel the space of probability measures on $\mathbb{R}^q$ is equipped with the topology of weak convergence.
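For the empirical distribution $P_n$ of points $x_1, \dots, x_n$, the equation $G(P_n, M) = I$ is equivalent to $M = (q/n')\sum_{x_i \neq 0} x_i x_i^\top/(x_i^\top M^{-1} x_i)$, where $n'$ is the number of nonzero points. The Python sketch below solves this by the usual fixed-point iteration, rescaling to trace $q$ after each step. It assumes that condition (1.1) holds for $P_n$, so that $\Sigma(P_n) \in \mathbb{M}^+$. Convergence is not proved here: the loop simply stops after `max_iter` steps or once the change drops below `tol`. The helper names `mat_inv` and `tyler_scatter` are illustrative only and are not taken from the paper or from any library.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def mat_inv(a: Matrix) -> Matrix:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def tyler_scatter(points: Sequence[Sequence[float]],
                  max_iter: int = 2000, tol: float = 1e-12) -> Matrix:
    """Fixed-point sketch of Sigma(P_n) for the empirical distribution of `points`.

    Zero vectors are dropped, mirroring the normalization by P(R^q minus {0}) in G(P, M).
    """
    xs = [[float(v) for v in x] for x in points if any(v != 0 for v in x)]
    q = len(xs[0])
    m = [[1.0 if i == j else 0.0 for j in range(q)] for i in range(q)]
    for _ in range(max_iter):
        inv = mat_inv(m)
        s = [[0.0] * q for _ in range(q)]
        for x in xs:
            d = sum(x[i] * inv[i][j] * x[j] for i in range(q) for j in range(q))  # x^T M^{-1} x
            for i in range(q):
                for j in range(q):
                    s[i][j] += x[i] * x[j] / d
        tr = sum(s[i][i] for i in range(q))
        new = [[q * v / tr for v in row] for row in s]  # rescale so that trace(M) = q
        change = max(abs(new[i][j] - m[i][j]) for i in range(q) for j in range(q))
        m = new
        if change < tol:
            break
    return m


pts = [(1, 0), (0, 1), (1, 1), (1, -2), (2, 1), (-1, 3)]  # no line through 0 carries half the mass
print(tyler_scatter(pts))
```

Rescaling the trace at every step does not change the fixed point. If $M = cS(M)$ with $S(M) := \sum x_i x_i^\top/(x_i^\top M^{-1}x_i)$, then taking $\operatorname{trace}(M^{-1}\,\cdot)$ on both sides gives $q = c\,n'$. A fixed point of the rescaled iteration therefore solves $G(P_n, M) = I$.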

The definition of $\Sigma(P)$ assumes zero as a given and known "center" of $P$. The second M-functional investigated here is $\Sigma(P \ominus P)$, where generally

$$P \ominus Q := \mathcal{L}(\boldsymbol{x} - \boldsymbol{y}) \quad\text{with independent random vectors } \boldsymbol{x} \sim P,\ \boldsymbol{y} \sim Q.$$

This modified functional, proposed in Duembgen (1996), avoids assumptions on, or estimation of, location parameters.
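For distributions with finite support, $P \ominus Q$ can be computed exactly by enumerating all pairs of support points. The Python sketch below represents a distribution as a dictionary mapping points to exact `Fraction` probabilities; the function names are illustrative. It also checks the identity $P \ominus Q\{0\} = \sum_x P\{x\}\,Q\{x\}$, which underlies (4.3) in Section 4.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Sequence, Tuple

Point = Tuple[int, ...]
Dist = Dict[Point, Fraction]


def empirical(points: Sequence[Sequence[int]]) -> Dist:
    """Empirical distribution of a finite sample, as {point: probability}."""
    n = len(points)
    dist: Dict[Point, Fraction] = defaultdict(Fraction)
    for x in points:
        dist[tuple(x)] += Fraction(1, n)
    return dict(dist)


def difference_law(p: Dist, q: Dist) -> Dist:
    """Law of x - y for independent x ~ p and y ~ q, i.e. P (-) Q in the text."""
    out: Dict[Point, Fraction] = defaultdict(Fraction)
    for x, px in p.items():
        for y, qy in q.items():
            out[tuple(a - b for a, b in zip(x, y))] += px * qy
    return dict(out)


def atom_at_zero(p: Dist, q: Dist) -> Fraction:
    """(P (-) Q){0} = sum over x of P{x} Q{x}."""
    return sum((px * q.get(x, Fraction(0)) for x, px in p.items()), Fraction(0))


P = empirical([(0, 0), (1, 2), (3, -1)])
S = difference_law(P, P)
print(S[(0, 0)], atom_at_zero(P, P))  # 1/3 1/3
```

For the empirical distribution of $n$ distinct points, $P \ominus P$ always has an atom of mass $1/n$ at the origin. This is one reason why the part $(P \ominus P)^o$ away from zero enters the statements in Section 3.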

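A quantity describing the robustness of the functional $P \mapsto \Sigma(P)$ is its contamination breakdown point (cf. Huber 1981). This is defined to be the supremum $\varepsilon^*(P)$ of all $\varepsilon \in \left]0, 1\right[$ such that

$$\Sigma(Q) \in \mathbb{M}^+ \ \text{for all } Q \in \mathcal{U}_\varepsilon(P) \quad\text{and}\quad \sup_{Q \in \mathcal{U}_\varepsilon(P)} \frac{\lambda_1(\Sigma(Q))}{\lambda_q(\Sigma(Q))} < \infty.$$

Here $\mathcal{U}_\varepsilon(P)$ denotes the contamination neighborhood

$$\mathcal{U}_\varepsilon(P) := \bigl\{(1-\varepsilon)P + \varepsilon H : H \text{ some distribution on } \mathbb{R}^q\bigr\}$$

of $P$, and $\lambda_1(M) \ge \lambda_2(M) \ge \cdots \ge \lambda_q(M)$ denote the ordered eigenvalues of $M \in \mathbb{M}$. If $P$ is "smooth" in the sense that

$$P(H) = 0 \quad\text{for any hyperplane } H \subset \mathbb{R}^q, \tag{1.3}$$

then it turns out that

$$\varepsilon^*(P) = \frac{1}{q}.$$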

This result has been known to several people, though it has never appeared in a journal. Our purpose is not only to give a general expression for $\varepsilon^*(P)$ and a precise proof, but also to investigate the case $\varepsilon = \varepsilon^*(P)$ in more detail. Namely, in this case it turns out that for any sequence of distributions $Q_k = (1-\varepsilon)P + \varepsilon H_k \in \mathcal{U}_\varepsilon(P)$ with $\Sigma(Q_k) \in \mathbb{M}^+$, the condition numbers $(\lambda_1/\lambda_q)(\Sigma(Q_k))$ tend to infinity if, and only if, the distributions $H_k$ are concentrated near suitable linear subspaces of $\mathbb{R}^q$. In accordance with Tyler (1986) we call this "coplanar contamination".


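Analogous considerations are made for $P \mapsto \Sigma(P \ominus P)$. The breakdown point $\varepsilon^*_s(P)$ of this functional is defined as $\varepsilon^*(P)$ with $\Sigma(Q \ominus Q)$ in place of $\Sigma(Q)$. In case of smooth distributions $P$,

$$\varepsilon^*_s(P) = 1 - \sqrt{1 - \frac{1}{q}} \in \left]\frac{1}{2q}, \frac{1}{q}\right[.$$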
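The short Python sketch below makes the three breakdown points quoted for smooth $P$ concrete. It evaluates $1/q$, $1 - \sqrt{1 - 1/q}$ and $\sqrt{1/q}$ (the last one is the tight-contamination value stated below) and checks the ordering $1/(2q) < 1 - \sqrt{1 - 1/q} < 1/q < \sqrt{1/q}$ for a few dimensions $q \ge 2$. The function name is illustrative.

```python
import math
from typing import Tuple


def smooth_breakdown_points(q: int) -> Tuple[float, float, float]:
    """Return (1/q, 1 - sqrt(1 - 1/q), sqrt(1/q)) for a smooth P on R^q, q >= 2."""
    if q < 2:
        raise ValueError("q must be at least 2")
    plain = 1.0 / q                                # P -> Sigma(P)
    symmetrized = 1.0 - math.sqrt(1.0 - 1.0 / q)   # P -> Sigma(P (-) P)
    symmetrized_tight = math.sqrt(1.0 / q)         # P -> Sigma(P (-) P), tight contamination
    return plain, symmetrized, symmetrized_tight


for q in (2, 3, 5, 10, 50):
    a, b, c = smooth_breakdown_points(q)
    assert 1.0 / (2 * q) < b < a < c
    print(f"q={q:3d}  1/q={a:.4f}  1-sqrt(1-1/q)={b:.4f}  sqrt(1/q)={c:.4f}")
```

For large $q$ the symmetrized value behaves like $1/(2q)$, since $1 - \sqrt{1 - t} = t/2 + O(t^2)$ as $t \to 0$.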

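In case of $\varepsilon = \varepsilon^*_s(P)$, the conditions on a sequence of distributions $Q_k = (1-\varepsilon)P + \varepsilon H_k \in \mathcal{U}_\varepsilon(P)$ in order to achieve unbounded condition numbers $(\lambda_1/\lambda_q)(\Sigma(Q_k \ominus Q_k))$ are even more restrictive. In particular, a necessary condition is that

$$|\boldsymbol{y}_k| \to_p \infty \quad (k \to \infty) \quad\text{if } \boldsymbol{y}_k \sim H_k.$$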

This observation is important, because coplanar contamination "at infinity" is easier to detect than arbitrary coplanar contamination. It leads to the question of breakdown caused by "tight" contamination. That is, we replace the neighborhood $\mathcal{U}_\varepsilon(P)$ with

$$\mathcal{U}_\varepsilon(P \mid \gamma) := \bigl\{(1-\varepsilon)P + \varepsilon H : H \text{ some distribution on } \mathbb{R}^q \text{ such that } H\{x : |x| > r\} \le \gamma(r) \text{ for all } r > 0\bigr\},$$

where $\gamma$ is some continuous function from $[0, \infty]$ into $[0, 1]$ with $\gamma(0) = 1$ and $\gamma(\infty) = 0$.
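For a contaminating distribution $H$ with finite support, the tightness condition $H\{x : |x| > r\} \le \gamma(r)$ for all $r > 0$ can be checked at finitely many radii. The Python sketch below does this under one extra assumption that the text does not make: $\gamma$ is nonincreasing. Under that assumption, on each interval where the tail function is constant, continuity of $\gamma$ puts the binding constraint at the right endpoint. The function name and the example $\gamma$ are hypothetical.

```python
from typing import Callable, Dict, Sequence


def in_tight_class(radii: Sequence[float], weights: Sequence[float],
                   gamma: Callable[[float], float], slack: float = 1e-12) -> bool:
    """Check H{x : |x| > r} <= gamma(r) for all r > 0.

    H puts mass weights[i] on points of norm radii[i]; gamma is ASSUMED
    continuous and nonincreasing (an assumption beyond the text).
    """
    mass: Dict[float, float] = {}
    for r, w in zip(radii, weights):
        mass[r] = mass.get(r, 0.0) + w
    levels = sorted(s for s in mass if s > 0)
    # Tail H{|x| > r} on the first interval (0, s_1): all mass away from the origin.
    tail = sum(w for s, w in mass.items() if s > 0)
    for s in levels:
        # The tail is constant on the interval ending at s; since gamma is
        # nonincreasing and continuous, the tightest constraint there is gamma(s).
        if tail > gamma(s) + slack:
            return False
        tail -= mass[s]
    return True  # beyond the largest radius the tail is 0


def gamma(r: float) -> float:
    """Hypothetical tail bound: continuous, nonincreasing, gamma(0) = 1, gamma(inf) = 0."""
    return min(1.0, max(0.0, 3.0 - r))


print(in_tight_class([1.0, 2.5], [0.5, 0.5], gamma))  # True
```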

In case of $P \mapsto \Sigma(P)$, replacing $\mathcal{U}_\varepsilon(P)$ with $\mathcal{U}_\varepsilon(P \mid \gamma)$ does not alter the breakdown point.

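However, let $\varepsilon^*_s(P \mid \gamma)$ be defined as $\varepsilon^*_s(P)$ with $\mathcal{U}_\varepsilon(P \mid \gamma)$ in place of $\mathcal{U}_\varepsilon(P)$. Then it turns out that for smooth $P$,

$$\varepsilon^*_s(P \mid \gamma) = \sqrt{\frac{1}{q}}.$$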
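Simple bookkeeping makes the two values for the symmetrized functional plausible. The following is only a heuristic sketch, not the proof given in Section 4. For $Q = (1-\varepsilon)P + \varepsilon H$, expanding the law of $\boldsymbol{x} - \boldsymbol{y}$ for independent $\boldsymbol{x}, \boldsymbol{y} \sim Q$ gives

$$Q \ominus Q = (1-\varepsilon)^2\, P \ominus P + \varepsilon(1-\varepsilon)\bigl(P \ominus H + H \ominus P\bigr) + \varepsilon^2\, H \ominus H.$$

Only the first term is free of contamination, so $Q \ominus Q$ behaves like a contaminated version of the smooth distribution $P \ominus P$ with contamination fraction $1 - (1-\varepsilon)^2$. Equating this fraction with the breakdown point $1/q$ of $\Sigma(\cdot)$ at a smooth distribution suggests

$$1 - (1-\varepsilon)^2 = \frac{1}{q} \iff \varepsilon = 1 - \sqrt{1 - \frac{1}{q}}.$$

Under tight contamination the contaminating mass cannot drift to infinity, and a direct way to violate (1.1) is through the last term. Suppose $H$ has no atoms and is supported by an affine line $x_0 + V$ with $\dim(V) = 1$. Then $(H \ominus H)(V) = 1$, while $(Q \ominus Q)\{0\} = 0$ because $Q$ has no atoms. Hence

$$(Q \ominus Q)(V) \ge \varepsilon^2\,(H \ominus H)(V) = \varepsilon^2,$$

which reaches $\dim(V)/q = 1/q$ exactly when $\varepsilon = \sqrt{1/q}$.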

All proofs are deferred to Section 4.

2 The breakdown properties of $P \mapsto \Sigma(P)$

Condition (1.1) is equivalent to $\delta(P) > 0$, where

$$\delta(P) := \min_{V \in \mathbb{V}} \frac{\bigl[\dim(V)/q - P(V)\bigr]^+}{1 - P(V)} \in \left[0, \frac{1}{q}\right]$$

with $0/0 := 0$. That this minimum is well defined follows from Lemma 4.1 in Section 4.

The set of all $V \in \mathbb{V}$ such that $[\dim(V)/q - P(V)]^+/(1 - P(V))$ equals $\delta(P)$ is denoted by $\mathbb{V}(P)$. Another useful abbreviation is

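$$\tilde P := \frac{1}{2}\Bigl(\mathcal{L}\bigl(|\boldsymbol{x}|^{-1}\boldsymbol{x} \bigm| \boldsymbol{x} \neq 0\bigr) + \mathcal{L}\bigl(-|\boldsymbol{x}|^{-1}\boldsymbol{x} \bigm| \boldsymbol{x} \neq 0\bigr)\Bigr) \quad\text{with } \boldsymbol{x} \sim P.$$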

This is a symmetric distribution on the unit sphere $\mathbb{S}^{q-1}$ of $\mathbb{R}^q$. Note that $G(\tilde P, \cdot) \equiv G(P, \cdot)$ and thus $\Sigma(\tilde P) = \Sigma(P)$.
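In dimension $q = 2$, the class $\mathbb{V}$ consists of the lines through the origin. The map $t \mapsto [1/2 - t]^+/(1 - t)$ is nonincreasing, so for a finitely supported $P$ with $P\{0\} = 0$ the minimum defining $\delta(P)$ is attained at a line of maximal mass; lines of mass zero contribute $1/q = 1/2$. The Python sketch below computes $\delta(P)$ exactly for the empirical distribution of integer points. Restricting to $q = 2$ and integer coordinates is an assumption made only to keep the example short, and the function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from math import gcd
from typing import Sequence, Tuple


def line_key(x: Tuple[int, int]) -> Tuple[int, int]:
    """Canonical primitive direction of the line through 0 and the nonzero integer point x."""
    g = gcd(x[0], x[1])
    a, b = x[0] // g, x[1] // g
    return (-a, -b) if a < 0 or (a == 0 and b < 0) else (a, b)


def delta_planar(points: Sequence[Tuple[int, int]]) -> Fraction:
    """delta(P) for the empirical distribution P of nonzero integer points in R^2."""
    if any(tuple(x) == (0, 0) for x in points):
        raise ValueError("this sketch assumes P{0} = 0")
    n = len(points)
    half = Fraction(1, 2)  # dim(V)/q for a line V in R^2
    best = half            # any line V with P(V) = 0 contributes (1/2 - 0)/(1 - 0)
    for count in Counter(line_key(tuple(x)) for x in points).values():
        t = Fraction(count, n)
        value = Fraction(0) if t >= half else (half - t) / (1 - t)  # covers 0/0 := 0 at t = 1
        best = min(best, value)
    return best


print(delta_planar([(1, 0), (-3, 0), (0, 1), (1, 1), (1, 2)]))  # 1/6
```

In this example the horizontal axis carries mass $2/5$, giving $(1/2 - 2/5)/(3/5) = 1/6$. Condition (1.1) fails exactly when some line carries at least half of the points, in which case the sketch returns $0$.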

Theorem 2.1

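Suppose that $\Sigma(P) \in \mathbb{M}^+$. Let $P = P\{0\}\,\delta_0 + (1 - P\{0\})\,P^o$, where $\delta_x$ denotes Dirac measure in $x \in \mathbb{R}^q$ and $P^o$ is a distribution on $\mathbb{R}^q \setminus \{0\}$. Then

$$\varepsilon^*(P) = \begin{cases} \dfrac{(1 - P\{0\})\,\delta(P^o)}{1 - P\{0\}\,\delta(P^o)} & \text{in general},\\[2ex] \delta(P) & \text{if } P\{0\} = 0,\\[1ex] \dfrac{1}{q} & \text{if } P \text{ is smooth in the sense of (1.3)}. \end{cases}$$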

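Suppose that $\varepsilon = \varepsilon^*(P)$. For any $Q = (1-\varepsilon)P + \varepsilon H$ in $\mathcal{U}_\varepsilon(P)$, one has $\Sigma(Q) = 0$ if, and only if, $H\{0\} = 0$ and $H(V) = 1$ for some $V \in \mathbb{V}(P^o)$. Moreover, for $k \ge 1$ let $Q_k = (1-\varepsilon)P + \varepsilon H_k \in \mathcal{U}_\varepsilon(P)$ such that $\Sigma(Q_k) \in \mathbb{M}^+$. Then $\lim_{k\to\infty} (\lambda_1/\lambda_q)(\Sigma(Q_k)) = \infty$ if, and only if, the following two conditions are satisfied:

$$\lim_{k\to\infty} H_k\{0\} = 0, \tag{2.1}$$

$$\text{any cluster point } \tilde H \text{ of } (\tilde H_k)_k \text{ is supported by some } V \in \mathbb{V}(P^o). \tag{2.2}$$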

Interestingly, the M-estimators introduced by Maronna (1976) have breakdown point at most $1/(q+1)$, which is smaller than $1/q$; cf. Stahel (1981) or Tyler (1986).
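The general formula of Theorem 2.1 is easy to evaluate once $P\{0\}$ and $\delta(P^o)$ are known. The Python sketch below uses hypothetical function names and exact `Fraction` arithmetic. It implements the formula together with the map $\varepsilon \mapsto \varepsilon_o := \varepsilon/(1 - (1-\varepsilon)P\{0\})$ from the proof of Theorem 2.1, and checks that $\varepsilon_o = \delta(P^o)$ at $\varepsilon = \varepsilon^*(P)$. It also illustrates that an atom at the origin can only lower the breakdown point, because $(1-p)\delta/(1-p\delta) \le \delta$ whenever $0 \le \delta \le 1$.

```python
from fractions import Fraction


def eps_star(p0: Fraction, delta_o: Fraction) -> Fraction:
    """epsilon*(P) = (1 - P{0}) delta(P^o) / (1 - P{0} delta(P^o)), as in Theorem 2.1."""
    return (1 - p0) * delta_o / (1 - p0 * delta_o)


def eps_o(eps: Fraction, p0: Fraction) -> Fraction:
    """epsilon_o = eps / (1 - (1 - eps) P{0}), the largest weight of the contamination in Q-tilde."""
    return eps / (1 - (1 - eps) * p0)


p0, d = Fraction(1, 4), Fraction(1, 3)
e = eps_star(p0, d)
print(e, eps_o(e, p0))  # 3/11 1/3
```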


Theorem 3.1

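Suppose that $\Sigma(P \ominus P) \in \mathbb{M}^+$. Then

$$\varepsilon^*_s(P) = \begin{cases} 1 - \sqrt{\dfrac{1 - \delta((P \ominus P)^o)}{1 - P \ominus P\{0\}\,\delta((P \ominus P)^o)}} & \text{in general},\\[3ex] 1 - \sqrt{1 - \delta((P \ominus P)^o)} & \text{if } P \text{ has no atoms},\\[1ex] 1 - \sqrt{1 - \dfrac{1}{q}} & \text{if } P \text{ is smooth in the sense of (1.3)}. \end{cases}$$

Suppose that $\varepsilon = \varepsilon^*_s(P)$. Then $\Sigma(Q \ominus Q) \in \mathbb{M}^+$ for any $Q$ in $\mathcal{U}_\varepsilon(P)$. Moreover, for $k \ge 1$ let $Q_k = (1-\varepsilon)P + \varepsilon H_k \in \mathcal{U}_\varepsilon(P)$. Then $\lim_k (\lambda_1/\lambda_q)(\Sigma(Q_k \ominus Q_k)) = \infty$ if, and only if, the following three conditions are satisfied: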

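$$\lim_{k\to\infty} \max_{x \in \mathbb{R}^q} H_k\{x\} = 0, \tag{3.1}$$

$$|\boldsymbol{y}_k| \to_p \infty \quad (k \to \infty), \quad\text{where } \boldsymbol{y}_k \sim H_k, \tag{3.2}$$

$$\text{for any cluster point } (\tilde H_1, \tilde H_2) \text{ of } \bigl(\tilde H_k, \widetilde{H_k \ominus H_k}\bigr)_k \text{ there is a space } V \in \mathbb{V}((P \ominus P)^o) \text{ such that } \tilde H_1(V) = \tilde H_2(V) = 1. \tag{3.3}$$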
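The general expression in Theorem 3.1 can be read as the formula of Theorem 2.1 applied to $P \ominus P$, with the effective contamination fraction $1 - (1-\varepsilon)^2$ in place of $\varepsilon$. Writing $p := P \ominus P\{0\}$ and $\delta := \delta((P \ominus P)^o)$, one has $(1-\varepsilon)^2 = (1-\delta)/(1-p\delta)$ if, and only if, $1 - (1-\varepsilon)^2 = (1-p)\delta/(1-p\delta)$. The Python sketch below checks this identity numerically; the function names are illustrative.

```python
import math


def eps_star_sym(p: float, delta: float) -> float:
    """Theorem 3.1: 1 - sqrt((1 - delta) / (1 - p * delta)), p = (P(-)P){0}, delta = delta((P(-)P)^o)."""
    return 1.0 - math.sqrt((1.0 - delta) / (1.0 - p * delta))


def eps_star_plain(p0: float, delta_o: float) -> float:
    """Theorem 2.1: (1 - P{0}) delta(P^o) / (1 - P{0} delta(P^o))."""
    return (1.0 - p0) * delta_o / (1.0 - p0 * delta_o)


for p, delta in [(0.0, 0.5), (0.1, 0.25), (0.3, 0.2)]:
    eps = eps_star_sym(p, delta)
    effective = 1.0 - (1.0 - eps) ** 2  # weight of Q (-) Q not coming from (1 - eps)^2 P (-) P
    assert math.isclose(effective, eps_star_plain(p, delta))
    print(f"p={p}, delta={delta}: eps*_s = {eps:.4f}")
```

With $p = 0$ this reduces to the case "$P$ has no atoms", and with $\delta = 1/q$ it gives the smooth case $1 - \sqrt{1 - 1/q}$.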

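Theorem 3.1 shows that symmetrization lowers the breakdown point of the M-functional. However, the type of contamination required to cause breakdown of the functional $P \mapsto \Sigma(P \ominus P)$ is far more special than in the case of $P \mapsto \Sigma(P)$. The quantity $\delta((P \ominus P)^o)$ is difficult to compute. However, for $V \in \mathbb{V}$,

$$(P \ominus P)^o(V) = \frac{P \ominus P(V) - P \ominus P\{0\}}{1 - P \ominus P\{0\}} \le P \ominus P(V) = \int P(y + V)\,P(dy) \le \max_{x \in \mathbb{R}^q} P(x + V),$$

so that, since $t \mapsto [\dim(V)/q - t]^+/(1 - t)$ is nonincreasing,

$$\delta((P \ominus P)^o) \ge \delta_s(P) := \min_{x \in \mathbb{R}^q,\, V \in \mathbb{V}} \frac{\bigl[\dim(V)/q - P(x + V)\bigr]^+}{1 - P(x + V)}.$$

Here is the result on tight contamination mentioned in the introduction.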


Theorem 3.2

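Suppose that $\Sigma(P \ominus P) \in \mathbb{M}^+$. Then

$$\varepsilon^*_s(P \mid \gamma) \begin{cases} \ge \sqrt{\delta_s(P)} & \text{in general},\\[1ex] = \sqrt{\dfrac{1}{q}} & \text{if } P \text{ is smooth in the sense of (1.3)}. \end{cases}$$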

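Suppose that $P$ satisfies (1.3). Then $\Sigma(Q \ominus Q) = 0$ for $Q = (1-\varepsilon)P + \varepsilon H \in \mathcal{U}_\varepsilon(P \mid \gamma)$ if, and only if, $H$ has no atoms and $H$ is supported by some one-dimensional affine subspace of $\mathbb{R}^q$. Similarly, for $k \ge 1$ let $Q_k = (1-\varepsilon)P + \varepsilon H_k \in \mathcal{U}_\varepsilon(P)$ such that $\Sigma(Q_k \ominus Q_k) \in \mathbb{M}^+$. Then $\lim_{k\to\infty} (\lambda_1/\lambda_q)(\Sigma(Q_k \ominus Q_k)) = \infty$ if, and only if, the following two conditions are satisfied: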

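$$\lim_{k\to\infty} \max_{x \in \mathbb{R}^q} H_k\{x\} = 0, \tag{3.4}$$

$$\text{any cluster point of } \bigl(\widetilde{H_k \ominus H_k}\bigr)_k \text{ is supported by some } V \in \mathbb{V} \text{ with } \dim(V) = 1. \tag{3.5}$$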

One can easily show that Condition (3.5) implies that any cluster point of $(H_k)_k$ is supported by some one-dimensional affine subspace of $\mathbb{R}^q$.
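The lower bound $\delta((P \ominus P)^o) \ge \delta_s(P)$ from Section 3 can be checked on small examples. For $q = 2$ and the empirical distribution $P$ of integer points, both quantities reduce to the map $t \mapsto [1/2 - t]^+/(1 - t)$ evaluated at a maximal line mass. For $\delta_s(P)$ the maximum is over affine lines; for $(P \ominus P)^o$ it is over lines through the origin. The Python sketch below computes both exactly. The function names are illustrative, and $q = 2$ with integer coordinates is a simplifying assumption.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import gcd
from typing import List, Sequence, Tuple

Point = Tuple[int, int]
HALF = Fraction(1, 2)  # dim(V)/q for a line V in R^2


def f(t: Fraction) -> Fraction:
    """[1/2 - t]^+ / (1 - t) with 0/0 := 0; nonincreasing in t on [0, 1]."""
    return Fraction(0) if t >= HALF else (HALF - t) / (1 - t)


def line_key(x: Point) -> Point:
    """Canonical direction of the line through 0 and the nonzero integer vector x."""
    g = gcd(x[0], x[1])
    a, b = x[0] // g, x[1] // g
    return (-a, -b) if a < 0 or (a == 0 and b < 0) else (a, b)


def max_affine_line_mass(points: Sequence[Point]) -> Fraction:
    """max over x in R^2 and lines V of P(x + V) for the empirical P of `points`."""
    best = max(Counter(points).values())  # a line through a single (possibly repeated) point
    for a, b in combinations(set(points), 2):
        on_line = sum(1 for c in points
                      if (b[0] - a[0]) * (c[1] - a[1]) == (b[1] - a[1]) * (c[0] - a[0]))
        best = max(best, on_line)
    return Fraction(best, len(points))


def delta_s(points: Sequence[Point]) -> Fraction:
    return f(max_affine_line_mass(points))


def delta_sym_o(points: Sequence[Point]) -> Fraction:
    """delta((P (-) P)^o): drop the atom at 0 of P (-) P, renormalize, use the heaviest line through 0."""
    diffs: List[Point] = [(x[0] - y[0], x[1] - y[1]) for x in points for y in points]
    nonzero = [d for d in diffs if d != (0, 0)]
    if not nonzero:
        raise ValueError("P is a point mass, so (P (-) P)^o is undefined")
    heaviest = max(Counter(line_key(d) for d in nonzero).values())
    return f(Fraction(heaviest, len(nonzero)))


pts = [(0, 0), (1, 0), (0, 1), (2, 3), (3, 1)]  # no three points collinear
print(delta_s(pts), delta_sym_o(pts))  # 1/6 3/8
```

In this example the bound is far from tight. Every affine line carries at most $2/5$ of the mass, whereas the differences spread over many directions; the horizontal direction carries $4$ of the $20$ nonzero differences.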

4 Proofs

Lemma 4.1

For $0 \le d < q$ let $\mathbb{V}(d)$ be the set of all $d$-dimensional linear subspaces of $\mathbb{R}^q$. Then both

$$\max_{V \in \mathbb{V}(d)} Q(V) \quad\text{and}\quad \max_{x \in \mathbb{R}^q,\, V \in \mathbb{V}(d)} Q(x + V)$$

are well defined and upper semicontinuous in $Q$.

Proof of Lemma 4.1.

Let $(Q_k)_k$ be any sequence of distributions converging weakly to some $Q$. Let $V_k \in \mathbb{V}(d)$ and $x_k \in \mathbb{R}^q$ be such that either

$$x_k = 0 \quad\text{and}\quad Q_k(V_k) > \sup_{V \in \mathbb{V}(d)} Q_k(V) - k^{-1} \tag{4.1}$$

or

$$x_k \in V_k^\perp \quad\text{and}\quad Q_k(x_k + V_k) > \sup_{x \in \mathbb{R}^q,\, V \in \mathbb{V}(d)} Q_k(x + V) - k^{-1}. \tag{4.2}$$


Let $M_k \in \mathbb{M}$ describe the orthogonal projection from $\mathbb{R}^q$ onto $V_k$. After replacing $(Q_k)_k$ with a subsequence if necessary, one may assume that $(M_k)_k$ converges to some projection matrix $M$, and we define $V := M\mathbb{R}^q$. Further, one may assume that

$$\lim_{k\to\infty} |x_k| = \infty \quad\text{or}\quad \lim_{k\to\infty} x_k = x \in \mathbb{R}^q.$$

Since $x_k + V_k \subset \{y : |y| \ge |x_k|\}$, one easily deduces from $\lim_k Q_k = Q$ and $\lim_k |x_k| = \infty$ that $\lim_k Q_k(x_k + V_k) = 0$. If $\lim_k x_k = x$, then for any $R > 0$,

$$\limsup_{k\to\infty} Q_k(x_k + V_k) \le \lim_{k\to\infty} \int \bigl(1 - R\,|y - M_k y - x_k|\bigr)^+ \, Q_k(dy) = \int \bigl(1 - R\,|y - M y - x|\bigr)^+ \, Q(dy) \to Q(x + V) \quad (R \to \infty).$$

These considerations show that $\sup_{V \in \mathbb{V}(d)} Q(V)$ and $\sup_{x \in \mathbb{R}^q,\, V \in \mathbb{V}(d)} Q(x + V)$ are upper semicontinuous in $Q$. In the special case $(Q_k)_k \equiv Q$ one realizes that both suprema are attained. □

Propositions 1.1 and 1.2 entail the following two facts:

Lemma 4.2 (a)

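Let $\mathcal{Q}$ be a family of nondegenerate distributions on $\mathbb{R}^q$ such that $\Sigma(Q) \in \mathbb{M}^+$ for all $Q \in \mathcal{Q}$, and let $\{\tilde Q : Q \in \mathcal{Q}\}$ be closed. Then

$$\sup_{Q \in \mathcal{Q}} \frac{\lambda_1}{\lambda_q}(\Sigma(Q)) < \infty.$$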

(b)

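Let $(Q_k)_k$ be a sequence of nondegenerate distributions on $\mathbb{R}^q$ such that $\Sigma(Q_k) \in \mathbb{M}^+$ for all $k$ and

$$\lim_{k\to\infty} \tilde Q_k = \tilde Q, \qquad \lim_{k\to\infty} \frac{\lambda_1}{\lambda_q}(\Sigma(Q_k)) = \kappa \in [1, \infty].$$

If $\kappa = \infty$, then $\tilde Q(V) \ge \dim(V)/q$ for some $V \in \mathbb{V}$. If $\kappa < \infty$ but $\tilde Q(V) \ge \dim(V)/q$ for some space $V \in \mathbb{V}$, then there is a second space $W \in \mathbb{V}$ such that $V \cap W = \{0\}$ and $\tilde Q(V \cup W) = 1$.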


Proof of Lemma 4.2.

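As for part (a), Prohorov's Theorem implies that $\{\tilde Q : Q \in \mathcal{Q}\}$ is even compact. Since $\Sigma(\tilde Q) = \Sigma(Q) \in \mathbb{M}^+$ for all $Q \in \mathcal{Q}$, Proposition 1.2 yields

$$\sup_{Q \in \mathcal{Q}} \frac{\lambda_1}{\lambda_q}(\Sigma(Q)) = \max_{Q \in \mathcal{Q}} \frac{\lambda_1}{\lambda_q}(\Sigma(Q)) < \infty.$$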

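In part (b) suppose first that $\tilde Q(V) < \dim(V)/q$ for all $V \in \mathbb{V}$. Then $\Sigma(\tilde Q) \in \mathbb{M}^+$ by Proposition 1.1, and $\Sigma(\tilde Q) = \lim_k \Sigma(Q_k)$ by Proposition 1.2, whence $\kappa = (\lambda_1/\lambda_q)(\Sigma(\tilde Q)) < \infty$.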

Now suppose that $\kappa < \infty$. After replacing $(Q_k)_k$ with a subsequence if necessary, one may assume that $\lim_k \Sigma(Q_k) = M \in \mathbb{M}^+$. But then

$$I = \lim_{k\to\infty} G(\tilde Q_k, \Sigma(Q_k)) = G(\tilde Q, M),$$

because $G(\cdot, \Sigma(Q_k))$ converges uniformly to $G(\cdot, M)$ as $k \to \infty$. Thus if $\tilde Q(V) \ge \dim(V)/q$ for some $V \in \mathbb{V}$, then the second part of Proposition 1.1 says that $V \cap W = \{0\}$ and $\tilde Q(V \cup W) = 1$ for some $W \in \mathbb{V}$. □

Proof of Theorem 2.1.

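Note first that $\{\tilde Q : Q \in \mathcal{U}_\varepsilon(P)\}$ is equal to the closed set

$$\bigl\{(1 - \varepsilon_o)\tilde P + \varepsilon_o \tilde H : \tilde H \text{ any symmetric distribution on } \mathbb{S}^{q-1}\bigr\},$$

where

$$\varepsilon_o := \frac{\varepsilon}{1 - (1-\varepsilon)P\{0\}}.$$

For if $Q = (1-\varepsilon)P + \varepsilon H \in \mathcal{U}_\varepsilon(P)$, then

$$\tilde Q = \frac{(1-\varepsilon)(1 - P\{0\})\tilde P + \varepsilon(1 - H\{0\})\tilde H}{(1-\varepsilon)(1 - P\{0\}) + \varepsilon(1 - H\{0\})} = (1 - \varepsilon')\tilde P + \varepsilon'\tilde H$$

for some symmetric distribution $\tilde H$ on $\mathbb{S}^{q-1}$ and

$$\varepsilon' := \frac{\varepsilon(1 - H\{0\})}{(1-\varepsilon)(1 - P\{0\}) + \varepsilon(1 - H\{0\})} \le \varepsilon_o,$$

and $(1 - \varepsilon')\tilde P + \varepsilon'\tilde H = (1 - \varepsilon_o)\tilde P + \varepsilon_o\bigl((1 - \varepsilon'/\varepsilon_o)\tilde P + (\varepsilon'/\varepsilon_o)\tilde H\bigr)$; conversely, choosing $H := \tilde H$ yields $\varepsilon' = \varepsilon_o$. Further, for any $V \in \mathbb{V}$,

$$\tilde Q(V) \le (1 - \varepsilon_o)\tilde P(V) + \varepsilon_o = (1 - \varepsilon_o)P^o(V) + \varepsilon_o$$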


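with equality if, and only if, $H\{0\} = 0$ and $H(V) = 1$. This is strictly smaller than $\dim(V)/q$ if, and only if,

$$\varepsilon_o < \frac{\dim(V)/q - P^o(V)}{1 - P^o(V)}.$$

Hence we can conclude the following: If $\varepsilon_o < \delta(P^o)$, then $\Sigma(\cdot) \in \mathbb{M}^+$ on $\mathcal{U}_\varepsilon(P)$, and Lemma 4.2 (a) yields that $(\lambda_1/\lambda_q)(\Sigma(\cdot))$ is bounded on $\mathcal{U}_\varepsilon(P)$. If $\varepsilon_o = \delta(P^o)$, then $\Sigma(Q) = 0$ for $Q = (1-\varepsilon)P + \varepsilon H \in \mathcal{U}_\varepsilon(P)$ if, and only if, $H\{0\} = 0$ and $H(V) = 1$ for some $V \in \mathbb{V}(P^o)$. Since $\varepsilon_o$ is strictly increasing in $\varepsilon$, inverting the equation $\varepsilon_o = \delta(P^o)$ yields

$$\varepsilon^*(P) = \frac{(1 - P\{0\})\,\delta(P^o)}{1 - P\{0\}\,\delta(P^o)}.$$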
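For completeness, here is the elementary computation behind the last two steps. Write $p := P\{0\}$, assumed to satisfy $p < 1$, and $\delta := \delta(P^o)$. Then

$$\frac{d\varepsilon_o}{d\varepsilon} = \frac{1 - p}{\bigl(1 - (1-\varepsilon)p\bigr)^2} > 0,$$

and

$$\frac{\varepsilon}{1 - (1-\varepsilon)p} = \delta \iff \varepsilon = \delta(1 - p) + \varepsilon p\delta \iff \varepsilon = \frac{(1-p)\,\delta}{1 - p\,\delta}.$$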

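Let $\varepsilon = \varepsilon^*(P)$ and $Q_k = (1-\varepsilon)P + \varepsilon H_k \in \mathcal{U}_\varepsilon(P)$ as stated in the theorem. After replacing $(Q_k)_k$ with a subsequence if necessary, one may assume that $\lim_k H_k\{0\} = a \in [0, 1]$, $\lim_k \tilde H_k = \tilde H$ (where $\tilde H_k$ may be defined arbitrarily if $H_k\{0\} = 1$) and $\lim_k (\lambda_1/\lambda_q)(\Sigma(Q_k)) = \kappa \in [1, \infty]$. This implies that

$$\lim_{k\to\infty} \tilde Q_k = \tilde Q := \frac{(1-\varepsilon)(1 - P\{0\})\tilde P + \varepsilon(1 - a)\tilde H}{(1-\varepsilon)(1 - P\{0\}) + \varepsilon(1 - a)}.$$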

Since $\Sigma(P) \in \mathbb{M}^+$, $\tilde P(V \cup W) < 1$ for arbitrary $V, W \in \mathbb{V}$ with $V \cap W = \{0\}$. The limit distribution $\tilde Q$ inherits this property. Thus one can apply Lemma 4.2 (b) and conclude that $\kappa = \infty$ if, and only if, $\tilde Q(V) \ge \dim(V)/q$ for some $V \in \mathbb{V}$. But for any $V \in \mathbb{V}$,

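$$\begin{aligned}
\tilde Q(V) &= \frac{(1-\varepsilon)(1 - P\{0\})P^o(V) + \varepsilon(1 - a)\tilde H(V)}{(1-\varepsilon)(1 - P\{0\}) + \varepsilon(1 - a)}\\
&\le \frac{(1-\varepsilon)(1 - P\{0\})P^o(V) + \varepsilon(1 - a)}{(1-\varepsilon)(1 - P\{0\}) + \varepsilon(1 - a)}\\
&\le \frac{(1-\varepsilon)(1 - P\{0\})P^o(V) + \varepsilon}{(1-\varepsilon)(1 - P\{0\}) + \varepsilon}\\
&= (1 - \varepsilon_o)P^o(V) + \varepsilon_o\\
&\le (1 - \varepsilon_o)\,\frac{\dim(V)/q - \delta(P^o)}{1 - \delta(P^o)} + \varepsilon_o\\
&= \dim(V)/q,
\end{aligned}$$

with equality if, and only if, $\tilde H(V) = 1$, $a = 0$ and $V \in \mathbb{V}(P^o)$. □

The following preliminary result for the proof of Theorems 3.1 and 3.2 describes the possible limits of a sequence $(\widetilde{P \ominus H_k})_k$.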


Proposition 4.3

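Let $(H_k)_{k \ge 1}$ be a sequence of distributions on $\mathbb{R}^q$. A pair $(a, \tilde B)$ is a cluster point of the sequence $\bigl(P \ominus H_k\{0\}, \widetilde{P \ominus H_k}\bigr)_{k \ge 1}$ if, and only if, it can be represented as follows:

$$a = \sum_{x \in \mathbb{R}^q} P\{x\}\,a_x$$

and

$$\tilde B = \frac{\beta\,\tilde B_\infty + \sum_{x \in \mathbb{R}^q} P\{x\}\bigl((1-\beta)H\{x\} - a_x\bigr)\tilde B_x + (1-\beta)\bigl(1 - P \ominus H\{0\}\bigr)\,\widetilde{P \ominus H}}{1 - \sum_{x \in \mathbb{R}^q} P\{x\}\,a_x}$$

with $\beta := \lim_{r\to\infty} \liminf_{k\to\infty} H_k\{x : |x| > r\}$, some distribution $H$ on $\mathbb{R}^q$, numbers $a_x \in [0, (1-\beta)H\{x\}]$, and symmetric distributions $\tilde B_\infty$ and $\tilde B_x$ on $\mathbb{S}^{q-1}$.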

Proof of Proposition 4.3.

We compactify $\mathbb{R}^q$ via the mapping

$$x \mapsto \phi(x) := (1 + |x|)^{-1}x \in U(0, 1),$$

where $U(y, \rho)$ and $B(y, \rho)$ denote, respectively, the open and closed ball around $y \in \mathbb{R}^q$ with radius $\rho \ge 0$. Without loss of generality one may assume that

$$\lim_{k\to\infty} H_k \circ \phi^{-1} = D \quad\text{and}\quad \beta = D(\mathbb{S}^{q-1}).$$
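The map $\phi$ is a homeomorphism from $\mathbb{R}^q$ onto the open unit ball $U(0, 1)$, with inverse $y \mapsto (1 - |y|)^{-1}y$. Points $r u$ with $|u| = 1$ are mapped towards $u \in \mathbb{S}^{q-1}$ as $r \to \infty$, which is why mass of $H_k$ escaping to infinity shows up as mass of $D$ on the sphere. The Python sketch below uses illustrative names; `math.hypot` with more than two arguments requires Python 3.8 or later.

```python
import math
from typing import Sequence, Tuple


def phi(x: Sequence[float]) -> Tuple[float, ...]:
    """phi(x) = x / (1 + |x|), mapping R^q into the open unit ball."""
    r = math.hypot(*x)
    return tuple(c / (1.0 + r) for c in x)


def phi_inv(y: Sequence[float]) -> Tuple[float, ...]:
    """Inverse map y -> y / (1 - |y|) on the open unit ball."""
    s = math.hypot(*y)
    if s >= 1.0:
        raise ValueError("phi_inv is only defined on the open unit ball")
    return tuple(c / (1.0 - s) for c in y)


print(phi((3.0, 4.0)))   # |x| = 5, so phi(x) = (3/6, 4/6)
print(phi((1e9, 0.0)))   # close to the boundary point (1, 0) of the sphere
```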

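Even if $D$ is concentrated on $U(0, 1)$, the Continuous Mapping Theorem is not applicable to $(\widetilde{P \ominus H_k})_k$, because the points in

$$\mathcal{X} := \bigl\{x \in \mathbb{R}^q : D\{\phi(x)\} > 0\bigr\}$$

require special attention. Since

$$D\{\phi(x)\} = \lim_{\rho \downarrow 0} \liminf_{k\to\infty} H_k U(x, \rho) = \lim_{\rho \downarrow 0} \limsup_{k\to\infty} H_k B(x, \rho) \quad\text{for any } x \in \mathcal{X}$$

and

$$\beta = \lim_{r \uparrow \infty} \liminf_{k\to\infty} H_k\bigl(\mathbb{R}^q \setminus B(0, r)\bigr) = \lim_{r \uparrow \infty} \limsup_{k\to\infty} H_k\bigl(\mathbb{R}^q \setminus U(0, r)\bigr),$$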


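one can find numbers $\rho_{x,k} \ge 0$ and $r_k > 0$ such that, with $U_{x,k} := U(x, \rho_{x,k})$ and $U_{\infty,k} := \mathbb{R}^q \setminus B(0, r_k)$, the following requirements are met:

$$\lim_{k\to\infty} \rho_{x,k} = 0 \quad\text{and}\quad \lim_{k\to\infty} H_k U_{x,k} = D\{\phi(x)\} \quad\text{for } x \in \mathcal{X},$$

$$\lim_{k\to\infty} r_k = \infty \quad\text{and}\quad \lim_{k\to\infty} H_k U_{\infty,k} = \beta,$$

$$U_{x,k} \cap U_{y,k} = \emptyset \quad\text{for different } x, y \in \mathcal{X} \cup \{\infty\}.$$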

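After replacing $(H_k)_k$ with a suitable subsequence if necessary, one may assume further that for any $x \in \mathcal{X}$,

$$\lim_{k\to\infty} H_k\{x\} = a_x \in \bigl[0, D\{\phi(x)\}\bigr], \qquad \lim_{k\to\infty} \widetilde{\mathcal{L}}\bigl(x - \boldsymbol{y}_k \bigm| \boldsymbol{y}_k \in U_{x,k} \setminus \{x\}\bigr) = \tilde B_x \quad\text{if } \boldsymbol{y}_k \sim H_k.$$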

Since $\lim_k H_k\{x\} = 0$ whenever $D\{\phi(x)\} = 0$, this implies that

$$\lim_{k\to\infty} P \ominus H_k\{0\} = \sum_{x \in \mathcal{X}} P\{x\}\,a_x. \tag{4.3}$$

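Further we write $D = \beta\tilde B_\infty + (1-\beta)H \circ \phi^{-1}$ with distributions $\tilde B_\infty$ on $\mathbb{S}^{q-1}$ and $H$ on $\mathbb{R}^q$. Now let

$$f(x) := \begin{cases} g(|x|^{-1}x) & \text{if } x \neq 0,\\ 0 & \text{if } x = 0 \end{cases}$$

for some even, continuous function $g$ on $\mathbb{S}^{q-1}$, and let $\boldsymbol{x} \sim P$, $\boldsymbol{y}_k \sim H_k$ and $\boldsymbol{y} \sim H$ be independent. Then, as $k \to \infty$,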

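$$\begin{aligned}
\mathbb{E} f(\boldsymbol{x} - \boldsymbol{y}_k) &= \mathbb{E}\,1\{\boldsymbol{y}_k \in U_{\infty,k}\} f(\boldsymbol{x} - \boldsymbol{y}_k) + \mathbb{E}\,1\{\boldsymbol{y}_k \notin U_{\infty,k}\} f(\boldsymbol{x} - \boldsymbol{y}_k)\\
&= \beta \int g\,d\tilde B_\infty + \mathbb{E}\,1\{\boldsymbol{y}_k \notin U_{\infty,k}\} f(\boldsymbol{x} - \boldsymbol{y}_k) + o(1)\\
&= \beta \int g\,d\tilde B_\infty + \sum_{x \in \mathcal{X}} P\{x\}\,\mathbb{E}\,1\{\boldsymbol{y}_k \in U_{x,k} \setminus \{x\}\} f(x - \boldsymbol{y}_k)\\
&\qquad + \sum_{x \in \mathcal{X}} P\{x\}\,\mathbb{E}\,1\{\boldsymbol{y}_k \notin U_{x,k} \cup U_{\infty,k}\} f(x - \boldsymbol{y}_k)\\
&\qquad + \mathbb{E}\,1\{\boldsymbol{x} \notin \mathcal{X},\ \boldsymbol{y}_k \notin U_{\infty,k}\} f(\boldsymbol{x} - \boldsymbol{y}_k) + o(1)\\
&= \beta \int g\,d\tilde B_\infty + \sum_{x \in \mathcal{X}} P\{x\}\bigl(D\{\phi(x)\} - a_x\bigr)\int g\,d\tilde B_x + (1-\beta)\sum_{x \in \mathcal{X}} P\{x\}\,\mathbb{E}\,1\{\boldsymbol{y} \neq x\} f(x - \boldsymbol{y})
\end{aligned}$$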