Two M-Functionals of Scatter
Lutz Duembgen and David Tyler
University of Heidelberg and Rutgers University
Draft, September 1997
Abstract.
Let \(P\) be a probability distribution on \(\mathbb{R}^q\). A detailed study of the breakdown properties of two M-functionals of scatter, \(P \mapsto \Sigma(P)\) and \(P \mapsto \Sigma(P \ominus P)\), is given. Here \(\Sigma(\cdot)\) denotes Tyler's (1987) M-functional of scatter, taking values in the set of symmetric, positive definite \(q \times q\) matrices. It assumes zero as a given center of the underlying distributions. The second functional avoids this assumption by operating on the symmetrized distribution \(P \ominus P := \mathcal{L}(\mathbf{x} - \mathbf{y})\) with independent random vectors \(\mathbf{x}, \mathbf{y} \sim P\). Let \(P\) be smooth in the sense of assigning probability zero to hyperplanes. Then:
(1) The (contamination) breakdown point of \(P \mapsto \Sigma(P)\) equals \(1/q\).
(2) The breakdown point of \(P \mapsto \Sigma(P \ominus P)\) equals \(1 - \sqrt{1 - 1/q} \in \,]1/(2q), 1/q[\).
(3) If we restrict attention to "tight contamination", then the breakdown point of \(P \mapsto \Sigma(P \ominus P)\) equals \(\sqrt{1/q} > 1/q\).
In all three cases the sources of breakdown are investigated. It turns out that breakdown is only caused by rather special contaminating distributions that are concentrated near low-dimensional subspaces.
Keywords and phrases: breakdown, coplanar contamination, tight contamination, M-functional, scatter matrix, symmetrization.

Research supported in part by European Union Human Capital and Mobility Program ERB CHRX-CT 940693.
Let \(P, Q\) be nondegenerate probability distributions on \(\mathbb{R}^q\). The covariance matrix of \(P\),
\[ \int (x - \mu(P))(x - \mu(P))^\top \, P(dx) \quad \text{with} \quad \mu(P) := \int x \, P(dx), \]
is known to be very sensitive to small perturbations in \(P\). Various robust surrogates for the covariance functional have been proposed. In the present paper we investigate the breakdown properties of two particular M-functionals of scatter.
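The first one is Tyler's (1987) M-functional \(\Sigma(P)\), which is defined as follows. Let \(\mathbb{M}\) be the set of symmetric matrices in \(\mathbb{R}^{q \times q}\), and let \(\mathbb{M}_+\) be the set of all positive definite \(M \in \mathbb{M}\). For \(M \in \mathbb{M}_+\) let
\[ G(P, M) := \frac{q}{P(\mathbb{R}^q \setminus \{0\})} \int_{\mathbb{R}^q \setminus \{0\}} \frac{M^{-1/2} x x^\top M^{-1/2}}{x^\top M^{-1} x} \, P(dx), \]
which is a nonnegative definite matrix in \(\mathbb{M}\) with trace \(q\). If there is a unique matrix \(M \in \mathbb{M}_+\) with
\[ G(P, M) = I \quad \text{and} \quad \operatorname{trace}(M) = q, \]
then we define \(\Sigma(P) := M\). Otherwise we define arbitrarily \(\Sigma(P) := 0\). In what follows we utilize the following two properties of \(\Sigma\); see Kent and Tyler (1988) and Duembgen (1996).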
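For an empirical distribution, the defining equations \(G(P, M) = I\) and \(\operatorname{trace}(M) = q\) can be approached numerically. The following Python sketch is an illustration only and is not taken from this paper. It assumes NumPy and takes \(P\) to be the empirical distribution of the rows of an \(n \times q\) array. It uses the commonly used fixed-point iteration \(M \leftarrow (q/n) \sum_i x_i x_i^\top / (x_i^\top M^{-1} x_i)\), rescaled to trace \(q\). A fixed point of this map satisfies \(G(P, M) = I\). Convergence of the iteration is assumed here, not proved. The helper names `G` and `tyler_scatter` are hypothetical.

```python
import numpy as np


def _nonzero_rows(X):
    """Drop rows equal to 0: G only integrates over R^q without the origin."""
    X = np.asarray(X, dtype=float)
    return X[np.any(X != 0, axis=1)]


def G(X, M):
    """G(P, M) for the empirical distribution P of the rows of X (hypothetical helper)."""
    X = _nonzero_rows(X)
    n, q = X.shape
    vals, vecs = np.linalg.eigh(M)
    M_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T   # symmetric M^{-1/2}
    Z = X @ M_inv_half                                    # row i is M^{-1/2} x_i
    # |M^{-1/2} x_i|^2 = x_i^T M^{-1} x_i
    return (q / n) * (Z.T / np.sum(Z ** 2, axis=1)) @ Z


def tyler_scatter(X, tol=1e-10, max_iter=10_000):
    """Fixed-point sketch for Sigma(P): iterate
    M <- (q/n) sum_i x_i x_i^T / (x_i^T M^{-1} x_i), rescaled to trace q.
    A fixed point satisfies G(P, M) = I and trace(M) = q."""
    X = _nonzero_rows(X)
    n, q = X.shape
    M = np.eye(q)
    for _ in range(max_iter):
        w = np.einsum("ij,jk,ik->i", X, np.linalg.inv(M), X)  # x_i^T M^{-1} x_i
        M_new = (q / n) * (X.T / w) @ X
        M_new *= q / np.trace(M_new)                           # normalisation trace(M) = q
        if np.max(np.abs(M_new - M)) < tol:
            return M_new
        M = M_new
    raise RuntimeError("no convergence; condition (1.1) may be violated")
```

When the iteration converges, `G(X, tyler_scatter(X))` is close to the identity. Applying `G` to the points \(\pm x_i/|x_i|\) instead of \(x_i\) gives the same matrix, in line with the identity \(G(\bar{P}, \cdot) \equiv G(P, \cdot)\) noted in Section 2.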
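Proposition 1.1
Let \(\mathbb{V}\) be the set of linear subspaces \(V\) of \(\mathbb{R}^q\) with \(1 \le \dim(V) < q\). Suppose that \(P\{0\} = 0\). Then \(\Sigma(P) \in \mathbb{M}_+\) if, and only if,
\[ P(V) < \frac{\dim(V)}{q} \quad \text{for all } V \in \mathbb{V}. \qquad (1.1) \]
If \(G(P, M) = I\) for some matrix \(M \in \mathbb{M}_+\) but \(P(V) \ge \dim(V)/q\) for some space \(V \in \mathbb{V}\), then there is a second space \(W \in \mathbb{V}\) such that \(V \cap W = \{0\}\) and \(P(V \cup W) = 1\).

Proposition 1.2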
Suppose that \(P\{0\} = 0\) and \(\Sigma(P) \in \mathbb{M}_+\). Then
\[ \Sigma(Q) \to \Sigma(P) \quad \text{as } Q \to P. \qquad (1.2) \]
Here and throughout the sequel the space of probability measures on \(\mathbb{R}^q\) is equipped with the topology of weak convergence.

The definition of \(\Sigma(P)\) assumes zero as a given and known "center" of \(P\). The second M-functional investigated here is \(\Sigma(P \ominus P)\), where generally
\[ P \ominus Q := \mathcal{L}(\mathbf{x} - \mathbf{y}) \quad \text{with independent random vectors } \mathbf{x} \sim P, \ \mathbf{y} \sim Q. \]
This modified functional, proposed in Duembgen (1996), avoids assumptions on, or estimation of, location parameters.
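For the empirical distribution \(P\) of \(n\) points, the symmetrized distribution \(P \ominus P\) is again empirical. It puts mass \(1/n^2\) on each difference \(x_i - x_j\). The minimal NumPy sketch below uses a hypothetical function name, `symmetrize`. It builds these support points and illustrates two properties. First, the result does not change when all points are shifted by a common vector, so \(\Sigma(P \ominus P)\) needs no location parameter. Second, it is symmetric about zero. Note that for distinct points the \(n\) pairs with \(i = j\) produce an atom of mass \(1/n\) at zero.

```python
import numpy as np


def symmetrize(X):
    """Support points of the empirical version of P (-) P (hypothetical helper).

    Row i*n + j is x_i - x_j; every row carries mass 1/n^2.
    """
    X = np.asarray(X, dtype=float)
    n, q = X.shape
    return (X[:, None, :] - X[None, :, :]).reshape(n * n, q)


rng = np.random.default_rng(1)
X = rng.standard_normal((5, 2))
D = symmetrize(X)
# a common shift of all points leaves the differences unchanged
print(np.allclose(symmetrize(X + np.array([3.0, -7.0])), D))
```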
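A quantity describing the robustness of the functional \(P \mapsto \Sigma(P)\) is its contamination breakdown point (cf. Huber 1981). This is defined to be the supremum \(\varepsilon^*(P)\) of all \(\varepsilon \in \,]0,1[\) such that
\[ \Sigma(Q) \in \mathbb{M}_+ \text{ for all } Q \in \mathcal{U}(P,\varepsilon) \quad \text{and} \quad \sup_{Q \in \mathcal{U}(P,\varepsilon)} (\lambda_1/\lambda_q)(\Sigma(Q)) < \infty. \]
Here \(\mathcal{U}(P,\varepsilon)\) denotes the contamination neighborhood
\[ \mathcal{U}(P,\varepsilon) := \bigl\{ (1-\varepsilon)P + \varepsilon H : H \text{ some distribution on } \mathbb{R}^q \bigr\} \]
of \(P\), and \(\lambda_1(M) \ge \lambda_2(M) \ge \cdots \ge \lambda_q(M)\) denote the ordered eigenvalues of \(M \in \mathbb{M}\). If \(P\) is "smooth" in the sense that
\[ P(H) = 0 \quad \text{for any hyperplane } H \subset \mathbb{R}^q, \qquad (1.3) \]
then it turns out that
\[ \varepsilon^*(P) = \frac{1}{q}. \]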
This result is known to several people, though it never appeared in a journal. Our purpose is not only to give a general expression for \(\varepsilon^*(P)\) and a precise proof, but also to investigate the case \(\varepsilon = \varepsilon^*(P)\) in more detail. In this case it turns out that for any sequence of distributions \(Q_k = (1-\varepsilon)P + \varepsilon H_k \in \mathcal{U}(P,\varepsilon)\) with \(\Sigma(Q_k) \in \mathbb{M}_+\), the condition numbers \((\lambda_1/\lambda_q)(\Sigma(Q_k))\) tend to infinity if, and only if, the distributions \(H_k\) are concentrated near suitable linear subspaces of \(\mathbb{R}^q\). In accordance with Tyler (1986) we call this "coplanar contamination".

Analogous considerations are made for \(P \mapsto \Sigma(P \ominus P)\). The breakdown point \(\varepsilon^*_s(P)\) of this functional is defined as \(\varepsilon^*(P)\) with \(\Sigma(Q \ominus Q)\) in place of \(\Sigma(Q)\). In case of smooth distributions \(P\),
\[ \varepsilon^*_s(P) = 1 - \sqrt{1 - \frac{1}{q}} \in \Bigl] \frac{1}{2q}, \frac{1}{q} \Bigr[. \]
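In case of \(\varepsilon = \varepsilon^*_s(P)\), the conditions on a sequence of distributions \(Q_k = (1-\varepsilon)P + \varepsilon H_k \in \mathcal{U}(P,\varepsilon)\) that lead to unbounded condition numbers \((\lambda_1/\lambda_q)(\Sigma(Q_k \ominus Q_k))\) are even more restrictive. In particular, a necessary condition is that
\[ |\mathbf{y}_k| \to_p \infty \quad (k \to \infty) \quad \text{if } \mathbf{y}_k \sim H_k. \]
This observation is important, because coplanar contamination "at infinity" is easier to detect than arbitrary coplanar contamination. This leads to the question of breakdown caused by "tight" contamination. That means, we replace the neighborhood \(\mathcal{U}(P,\varepsilon)\) with
\[ \mathcal{U}(P,\varepsilon \mid \rho) := \bigl\{ (1-\varepsilon)P + \varepsilon H : H \text{ some distribution on } \mathbb{R}^q \text{ such that } H\{x : |x| > r\} \le \rho(r) \text{ for all } r > 0 \bigr\}, \]
where \(\rho\) is some continuous function from \([0,\infty]\) into \([0,1]\) with \(\rho(0) = 1\) and \(\rho(\infty) = 0\).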
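In case of \(P \mapsto \Sigma(P)\), replacing \(\mathcal{U}(P,\varepsilon)\) with \(\mathcal{U}(P,\varepsilon \mid \rho)\) does not alter the breakdown point. However, let \(\varepsilon^*_s(P \mid \rho)\) be defined as \(\varepsilon^*_s(P)\) with \(\mathcal{U}(P,\varepsilon \mid \rho)\) in place of \(\mathcal{U}(P,\varepsilon)\). Then it turns out that for smooth \(P\),
\[ \varepsilon^*_s(P \mid \rho) = \sqrt{\frac{1}{q}}. \]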
All proofs are deferred to Section 4.
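The three values for smooth \(P\) are easy to compare numerically. The short Python sketch below (the function name is hypothetical) evaluates \(1/q\), \(1 - \sqrt{1 - 1/q}\) and \(\sqrt{1/q}\) for a few dimensions. The ordering \(1/(2q) < 1 - \sqrt{1 - 1/q} < 1/q < \sqrt{1/q}\) for \(q \ge 2\) follows from \(1 - t < \sqrt{1 - t} < 1 - t/2\) for \(t = 1/q \in \,]0,1[\).

```python
import math


def smooth_breakdown_points(q):
    """Breakdown points for smooth P in dimension q >= 2, as quoted above (hypothetical helper)."""
    plain = 1 / q                            # P -> Sigma(P)
    symmetrized = 1 - math.sqrt(1 - 1 / q)   # P -> Sigma(P (-) P), arbitrary contamination
    tight = math.sqrt(1 / q)                 # P -> Sigma(P (-) P), tight contamination
    return plain, symmetrized, tight


for q in (2, 3, 5, 10):
    plain, symmetrized, tight = smooth_breakdown_points(q)
    print(f"q={q:2d}  1/q={plain:.4f}  1-sqrt(1-1/q)={symmetrized:.4f}  sqrt(1/q)={tight:.4f}")
```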
2 The breakdown properties of \(P \mapsto \Sigma(P)\)
Condition (1.1) is equivalent to \(\Delta(P) > 0\), where
\[ \Delta(P) := \min_{V \in \mathbb{V}} \frac{[\dim(V)/q - P(V)]^+}{1 - P(V)} \in \Bigl[0, \frac{1}{q}\Bigr] \]
with \(0/0 := 0\). That this minimum is well-defined follows from Lemma 4.1 in Section 4.
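The set of all \(V \in \mathbb{V}\) such that \([\dim(V)/q - P(V)]^+/(1 - P(V))\) equals \(\Delta(P)\) is denoted by \(\mathbb{V}(P)\). Another useful abbreviation is
\[ \bar{P} := \frac{1}{2}\Bigl( \mathcal{L}\bigl(|\mathbf{x}|^{-1}\mathbf{x} \bigm| \mathbf{x} \ne 0\bigr) + \mathcal{L}\bigl(-|\mathbf{x}|^{-1}\mathbf{x} \bigm| \mathbf{x} \ne 0\bigr) \Bigr) \quad \text{with } \mathbf{x} \sim P. \]
This is a symmetric distribution on the unit sphere \(\mathbb{S}^{q-1}\) of \(\mathbb{R}^q\). Note that \(G(\bar{P}, \cdot) \equiv G(P, \cdot)\) and thus \(\Sigma(\bar{P}) = \Sigma(P)\).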
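To make \(\Delta(P)\) concrete, consider \(q = 2\). Then \(\mathbb{V}\) consists of the lines through the origin, each with \(\dim(V) = 1\). Take the empirical distribution of finitely many points. Every line carries mass \(P\{0\}\) plus the share of nonzero points on it. Lines without data points carry just \(P\{0\}\). The Python sketch below evaluates the minimum in the definition directly. It is an illustration under stated assumptions: the helper names are hypothetical, exact arithmetic uses `fractions.Fraction`, and integer coordinates are assumed so that directions can be compared exactly. By the remark above, condition (1.1) holds exactly when the result is positive.

```python
from collections import Counter
from fractions import Fraction
from math import gcd


def line_masses(points):
    """P(V) for lines V through 0, P = empirical distribution of integer points in R^2."""
    n = len(points)
    p0 = Fraction(sum(1 for p in points if p == (0, 0)), n)  # every line contains 0
    counts = Counter()
    for a, b in points:
        if (a, b) == (0, 0):
            continue
        g = gcd(a, b)
        a, b = a // g, b // g
        if a < 0 or (a == 0 and b < 0):
            a, b = -a, -b          # x and -x span the same line
        counts[(a, b)] += 1
    # the extra entry p0 stands for all lines that carry no nonzero data point
    return [p0 + Fraction(c, n) for c in counts.values()] + [p0]


def delta_2d(points):
    """Delta(P) for q = 2, where every V in the family has dim(V) = 1."""
    def ratio(p):
        if p == 1:
            return Fraction(0)     # convention 0/0 := 0
        return max(Fraction(1, 2) - p, Fraction(0)) / (1 - p)
    return min(ratio(p) for p in line_masses(points))


print(delta_2d([(1, 0), (0, 1), (1, 1)]))   # 1/4
```

In this example each data point is an atom lying on a line through \(0\) with mass \(1/3\), so \(\Delta(P) = (1/2 - 1/3)/(2/3) = 1/4 < 1/q\).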
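Theorem 2.1
Suppose that \(\Sigma(P) \in \mathbb{M}_+\). Let \(P = P\{0\}\delta_0 + (1 - P\{0\})P_o\), where \(\delta_x\) denotes Dirac measure in \(x \in \mathbb{R}^q\) and \(P_o\) is a distribution on \(\mathbb{R}^q \setminus \{0\}\). Then
\[ \varepsilon^*(P) = \begin{cases} \dfrac{(1 - P\{0\})\,\Delta(P_o)}{1 - P\{0\}\,\Delta(P_o)} & \text{in general,} \\[1.5ex] \Delta(P) & \text{if } P\{0\} = 0, \\[1ex] \dfrac{1}{q} & \text{if } P \text{ is smooth in the sense of (1.3).} \end{cases} \]
Suppose that \(\varepsilon = \varepsilon^*(P)\). For any \(Q = (1-\varepsilon)P + \varepsilon H\) in \(\mathcal{U}(P,\varepsilon)\), one has \(\Sigma(Q) = 0\) if, and only if, \(H\{0\} = 0\) and \(H(V) = 1\) for some \(V \in \mathbb{V}(P_o)\). Moreover, for \(k \ge 1\) let \(Q_k = (1-\varepsilon)P + \varepsilon H_k \in \mathcal{U}(P,\varepsilon)\) such that \(\Sigma(Q_k) \in \mathbb{M}_+\). Then \(\lim_{k \to \infty} (\lambda_1/\lambda_q)(\Sigma(Q_k)) = \infty\) if, and only if, the following two conditions are satisfied:
\[ \lim_{k \to \infty} H_k\{0\} = 0, \qquad (2.1) \]
\[ \text{any cluster point } \tilde{H} \text{ of } (\bar{H}_k)_k \text{ is supported by some } V \in \mathbb{V}(P_o). \qquad (2.2) \]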
An interesting fact is that the M-estimators introduced by Maronna (1976) have breakdown point at most \(1/(q+1)\); cf. Stahel (1981) or Tyler (1986).
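Theorem 3.1
Suppose that \(\Sigma(P \ominus P) \in \mathbb{M}_+\). Then
\[ \varepsilon^*_s(P) = \begin{cases} 1 - \sqrt{\dfrac{1 - \Delta((P \ominus P)_o)}{1 - (P \ominus P)\{0\}\,\Delta((P \ominus P)_o)}} & \text{in general,} \\[2.5ex] 1 - \sqrt{1 - \Delta(P \ominus P)} & \text{if } P \text{ has no atoms,} \\[1.5ex] 1 - \sqrt{1 - \dfrac{1}{q}} & \text{if } P \text{ is smooth in the sense of (1.3).} \end{cases} \]
Suppose that \(\varepsilon = \varepsilon^*_s(P)\). Then \(\Sigma(Q \ominus Q) \in \mathbb{M}_+\) for any \(Q\) in \(\mathcal{U}(P,\varepsilon)\). Moreover, for \(k \ge 1\) let \(Q_k = (1-\varepsilon)P + \varepsilon H_k \in \mathcal{U}(P,\varepsilon)\). Then \(\lim_k (\lambda_1/\lambda_q)(\Sigma(Q_k \ominus Q_k)) = \infty\) if, and only if, the following three conditions are satisfied:
\[ \lim_{k \to \infty} \max_{x \in \mathbb{R}^q} H_k\{x\} = 0, \qquad (3.1) \]
\[ |\mathbf{y}_k| \to_p \infty \ (k \to \infty), \quad \text{where } \mathbf{y}_k \sim H_k, \qquad (3.2) \]
for any cluster point \((\tilde{H}_1, \tilde{H}_2)\) of \(\bigl(\bar{H}_k, \overline{H_k \ominus H_k}\bigr)_k\) there is a space \(V \in \mathbb{V}((P \ominus P)_o)\) such that
\[ \tilde{H}_1(V) = \tilde{H}_2(V) = 1. \qquad (3.3) \]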
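One way to read the general formula (an observation added here for illustration) starts from the expansion of \(Q \ominus Q\) for \(Q = (1-\varepsilon)P + \varepsilon H\). This distribution contains \(P \ominus P\) with weight \((1-\varepsilon)^2\). With \(a = (P \ominus P)\{0\}\) and \(\Delta = \Delta((P \ominus P)_o)\), the formula is algebraically equivalent to \(1 - (1 - \varepsilon^*_s(P))^2 = (1-a)\Delta/(1 - a\Delta)\). The right-hand side is the general expression of Theorem 2.1 with \(P\{0\}\) and \(\Delta(P_o)\) replaced by \(a\) and \(\Delta\). The Python sketch below (hypothetical function names, floating point) checks this identity. It also checks that the symmetrized value is the smaller one.

```python
import math


def eps_plain(p0, delta_o):
    """General formula of Theorem 2.1, written with P{0} = p0 and Delta(P_o) = delta_o."""
    return (1 - p0) * delta_o / (1 - p0 * delta_o)


def eps_symmetrized(a, delta_o):
    """General formula of Theorem 3.1: a = (P (-) P){0}, delta_o = Delta((P (-) P)_o)."""
    return 1 - math.sqrt((1 - delta_o) / (1 - a * delta_o))


for a in (0.0, 0.1, 0.3):
    for d in (0.1, 0.25, 0.5):
        e = eps_symmetrized(a, d)
        # (1 - e)^2 is the weight of the uncontaminated part P (-) P inside Q (-) Q
        print(f"a={a}, Delta={d}: eps_s={e:.4f}, "
              f"1-(1-eps_s)^2={1 - (1 - e) ** 2:.4f}, Thm 2.1 expression={eps_plain(a, d):.4f}")
```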
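This theorem shows that symmetrization lowers the breakdown point of the M-functional. However, the type of contamination required to cause breakdown of the functional \(P \mapsto \Sigma(P \ominus P)\) is far more special than in the case of \(P \mapsto \Sigma(P)\). The quantity \(\Delta((P \ominus P)_o)\) is difficult to compute. However, for \(V \in \mathbb{V}\),
\[ (P \ominus P)_o(V) = \frac{(P \ominus P)(V) - (P \ominus P)\{0\}}{1 - (P \ominus P)\{0\}} \le (P \ominus P)(V) = \int P(y + V)\, P(dy) \le \max_{x \in \mathbb{R}^q} P(x + V), \]
so that, because \([\dim(V)/q - p]^+/(1 - p)\) is nonincreasing in \(p\),
\[ \Delta((P \ominus P)_o) \ge \Delta_s(P) := \min_{x \in \mathbb{R}^q,\ V \in \mathbb{V}} \frac{[\dim(V)/q - P(x+V)]^+}{1 - P(x+V)}. \]
Here is the result on tight contamination mentioned in the introduction.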
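Theorem 3.2
Suppose that \(\Sigma(P \ominus P) \in \mathbb{M}_+\). Then
\[ \varepsilon^*_s(P \mid \rho) \ \begin{cases} \ge \sqrt{\Delta_s(P)} & \text{in general,} \\[1ex] = \sqrt{\dfrac{1}{q}} & \text{if } P \text{ is smooth in the sense of (1.3).} \end{cases} \]
Suppose that \(P\) satisfies (1.3), and let \(\varepsilon = \varepsilon^*_s(P \mid \rho)\). Then \(\Sigma(Q \ominus Q) = 0\) for \(Q = (1-\varepsilon)P + \varepsilon H \in \mathcal{U}(P,\varepsilon \mid \rho)\) if, and only if, \(H\) has no atoms and \(H\) is supported by some one-dimensional affine subspace of \(\mathbb{R}^q\). Similarly, for \(k \ge 1\) let \(Q_k = (1-\varepsilon)P + \varepsilon H_k \in \mathcal{U}(P,\varepsilon \mid \rho)\) such that \(\Sigma(Q_k \ominus Q_k) \in \mathbb{M}_+\). Then \(\lim_{k \to \infty} (\lambda_1/\lambda_q)(\Sigma(Q_k \ominus Q_k)) = \infty\) if, and only if, the following two conditions are satisfied:
\[ \lim_{k \to \infty} \max_{x \in \mathbb{R}^q} H_k\{x\} = 0, \qquad (3.4) \]
\[ \text{any cluster point of } \bigl(\overline{H_k \ominus H_k}\bigr)_k \text{ is supported by some } V \in \mathbb{V} \text{ with } \dim(V) = 1. \qquad (3.5) \]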
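For a finite point set in the plane (\(q = 2\)), the lower bound \(\sqrt{\Delta_s(P)}\) of Theorem 3.2 can be computed exactly. In this case the sets \(x + V\) with \(V \in \mathbb{V}\) are just the affine lines. Since \([1/2 - p]^+/(1 - p)\) is nonincreasing in \(p\), the minimum defining \(\Delta_s(P)\) is attained at a line carrying the largest share of points. The Python sketch below is illustrative only. The helper names are hypothetical. The points are assumed to be distinct with integer coordinates, so each has mass \(1/n\) and directions can be compared exactly. The theorem's assumption \(\Sigma(P \ominus P) \in \mathbb{M}_+\) is not checked.

```python
from collections import Counter
from fractions import Fraction
from math import gcd, sqrt


def max_line_share(points):
    """Largest fraction of the (distinct, integer) points lying on one affine line in R^2."""
    n = len(points)
    assert len(set(points)) == n, "points must be distinct"
    if n == 1:
        return Fraction(1)
    best = 1
    for i, (x0, y0) in enumerate(points):
        directions = Counter()                 # lines through the anchor point i
        for j, (x1, y1) in enumerate(points):
            if j == i:
                continue
            a, b = x1 - x0, y1 - y0
            g = gcd(a, b)
            a, b = a // g, b // g
            if a < 0 or (a == 0 and b < 0):
                a, b = -a, -b                  # opposite directions give the same line
            directions[(a, b)] += 1
        best = max(best, 1 + max(directions.values()))
    return Fraction(best, n)


def delta_s_2d(points):
    """Delta_s(P) for q = 2: [1/2 - p]^+ / (1 - p) at the fullest line p, with 0/0 := 0."""
    p = max_line_share(points)
    if p == 1:
        return Fraction(0)
    return max(Fraction(1, 2) - p, Fraction(0)) / (1 - p)


pts = [(0, 0), (1, 0), (0, 1), (2, 3), (3, 1)]   # no three of these points are collinear
print(delta_s_2d(pts), sqrt(delta_s_2d(pts)))    # Delta_s(P) and the lower bound
```

For these five points in general position the largest line share is \(2/5\), so \(\Delta_s(P) = (1/2 - 2/5)/(3/5) = 1/6\).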
One can easily show that Condition (3.5) implies that any cluster point of \((H_k)_k\) is supported by some one-dimensional affine subspace of \(\mathbb{R}^q\).

4 Proofs
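Lemma 4.1
For \(0 \le d < q\) let \(\mathbb{V}(d)\) be the set of all \(d\)-dimensional linear subspaces of \(\mathbb{R}^q\). Then both
\[ \max_{V \in \mathbb{V}(d)} Q(V) \quad \text{and} \quad \max_{x \in \mathbb{R}^q,\ V \in \mathbb{V}(d)} Q(x + V) \]
are well-defined and upper semicontinuous in \(Q\).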
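Proof of Lemma 4.1.
Let \((Q_k)_k\) be any sequence of distributions converging weakly to some \(Q\). Let \(V_k \in \mathbb{V}(d)\) and \(x_k \in \mathbb{R}^q\) such that either
\[ x_k = 0 \quad \text{and} \quad Q_k(V_k) > \sup_{V \in \mathbb{V}(d)} Q_k(V) - k^{-1} \qquad (4.1) \]
or
\[ x_k \in V_k^\perp \quad \text{and} \quad Q_k(x_k + V_k) > \sup_{x \in \mathbb{R}^q,\ V \in \mathbb{V}(d)} Q_k(x + V) - k^{-1}. \qquad (4.2) \]
Let \(M_k \in \mathbb{M}\) describe the orthogonal projection from \(\mathbb{R}^q\) onto \(V_k\). After replacing \((Q_k)_k\) with a subsequence if necessary, one may assume that \((M_k)_k\) converges to some projection matrix \(M\), again of rank \(d\) because \(\operatorname{trace}(M_k) = d\), and we define \(V := M\mathbb{R}^q\). Further one may assume that
\[ \lim_{k \to \infty} |x_k| = \infty \quad \text{or} \quad \lim_{k \to \infty} x_k = x \in \mathbb{R}^q. \]
Since \(x_k \perp V_k\), we have \(x_k + V_k \subset \{y : |y| \ge |x_k|\}\), and one easily deduces from \(\lim_k Q_k = Q\) and \(\lim_k |x_k| = \infty\) that \(\lim_k Q_k(x_k + V_k) = 0\). If \(\lim_k x_k = x\), then for any \(R > 0\),
\[ \begin{aligned} \limsup_{k \to \infty} Q_k(x_k + V_k) &\le \lim_{k \to \infty} \int \bigl(1 - R\,|y - M_k y - x_k|\bigr)^+ \, Q_k(dy) \\ &= \int \bigl(1 - R\,|y - M y - x|\bigr)^+ \, Q(dy) \\ &\to Q(x + V) \quad (R \to \infty). \end{aligned} \]
These considerations show that \(\sup_{V \in \mathbb{V}(d)} Q(V)\) and \(\sup_{x \in \mathbb{R}^q,\ V \in \mathbb{V}(d)} Q(x + V)\) are upper semicontinuous in \(Q\). In the special case \((Q_k)_k \equiv Q\) one realizes that both suprema are attained. \(\Box\)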
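Propositions 1.1 and 1.2 entail the following two facts:

Lemma 4.2
(a) Let \(\mathcal{Q}\) be a family of nondegenerate distributions on \(\mathbb{R}^q\) such that \(\Sigma(Q) \in \mathbb{M}_+\) for all \(Q \in \mathcal{Q}\), and let \(\{\bar{Q} : Q \in \mathcal{Q}\}\) be closed. Then
\[ \sup_{Q \in \mathcal{Q}} (\lambda_1/\lambda_q)(\Sigma(Q)) < \infty. \]
(b) Let \((Q_k)_k\) be a sequence of nondegenerate distributions on \(\mathbb{R}^q\) such that \(\Sigma(Q_k) \in \mathbb{M}_+\) for all \(k\) and
\[ \lim_{k \to \infty} \bar{Q}_k = \tilde{Q}, \qquad \lim_{k \to \infty} (\lambda_1/\lambda_q)(\Sigma(Q_k)) = \gamma \in [1, \infty]. \]
If \(\gamma = \infty\), then \(\tilde{Q}(V) \ge \dim(V)/q\) for some \(V \in \mathbb{V}\). If \(\gamma < \infty\) but \(\tilde{Q}(V) \ge \dim(V)/q\) for some space \(V \in \mathbb{V}\), then there is a second space \(W \in \mathbb{V}\) such that \(V \cap W = \{0\}\) and \(\tilde{Q}(V \cup W) = 1\).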
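Proof of Lemma 4.2.
As for part (a), all distributions \(\bar{Q}\) live on the compact set \(\mathbb{S}^{q-1}\), so Prohorov's Theorem implies that \(\{\bar{Q} : Q \in \mathcal{Q}\}\) is even compact. Since \(\Sigma(\bar{Q}) = \Sigma(Q) \in \mathbb{M}_+\) for all \(Q \in \mathcal{Q}\), Proposition 1.2 yields
\[ \sup_{Q \in \mathcal{Q}} (\lambda_1/\lambda_q)(\Sigma(Q)) = \max_{Q \in \mathcal{Q}} (\lambda_1/\lambda_q)(\Sigma(Q)) < \infty. \]
In part (b) suppose first that \(\tilde{Q}(V) < \dim(V)/q\) for all \(V \in \mathbb{V}\). Then \(\Sigma(\tilde{Q}) \in \mathbb{M}_+\) by Proposition 1.1, and \(\Sigma(\tilde{Q}) = \lim_k \Sigma(Q_k)\) by Proposition 1.2, whence \(\gamma = (\lambda_1/\lambda_q)(\Sigma(\tilde{Q})) < \infty\).

Now suppose that \(\gamma < \infty\). After replacing \((Q_k)_k\) with a subsequence if necessary, one may assume that \(\lim_k \Sigma(Q_k) = M \in \mathbb{M}_+\). But then
\[ I = \lim_{k \to \infty} G(\bar{Q}_k, \Sigma(Q_k)) = G(\tilde{Q}, M), \]
because \(G(\cdot, \Sigma(Q_k))\) converges uniformly to \(G(\cdot, M)\) as \(k \to \infty\). Thus if \(\tilde{Q}(V) \ge \dim(V)/q\) for some \(V \in \mathbb{V}\), then the second part of Proposition 1.1 says that \(V \cap W = \{0\}\) and \(\tilde{Q}(V \cup W) = 1\) for some \(W \in \mathbb{V}\). \(\Box\)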
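Proof of Theorem 2.1.
Note first that \(\{\bar{Q} : Q \in \mathcal{U}(P,\varepsilon)\}\) is equal to the closed set
\[ \bigl\{ (1-\varepsilon_o)\bar{P} + \varepsilon_o \tilde{H} : \tilde{H} \text{ any symmetric distribution on } \mathbb{S}^{q-1} \bigr\}, \quad \text{where} \quad \varepsilon_o := \frac{\varepsilon}{1 - (1-\varepsilon)P\{0\}}. \]
For if \(Q = (1-\varepsilon)P + \varepsilon H \in \mathcal{U}(P,\varepsilon)\), then
\[ \bar{Q} = \frac{(1-\varepsilon)(1-P\{0\})\bar{P} + \varepsilon(1-H\{0\})\bar{H}}{(1-\varepsilon)(1-P\{0\}) + \varepsilon(1-H\{0\})} = (1-\varepsilon_o)\bar{P} + \varepsilon_o \tilde{H} \]
for some symmetric distribution \(\tilde{H}\) on \(\mathbb{S}^{q-1}\), because
\[ \varepsilon' := \frac{\varepsilon(1-H\{0\})}{(1-\varepsilon)(1-P\{0\}) + \varepsilon(1-H\{0\})} \le \varepsilon_o. \]
Further, for \(V \in \mathbb{V}\),
\[ \bar{Q}(V) \le (1-\varepsilon_o)\bar{P}(V) + \varepsilon_o = (1-\varepsilon_o)P_o(V) + \varepsilon_o \]
with equality if, and only if, \(H\{0\} = 0\) and \(H(V) = 1\). This is strictly smaller than \(\dim(V)/q\) if, and only if,
\[ \varepsilon_o < \frac{\dim(V)/q - P_o(V)}{1 - P_o(V)}. \]
Hence we can conclude the following. If \(\varepsilon_o < \Delta(P_o)\), then \(\Sigma(\cdot) \in \mathbb{M}_+\) on \(\mathcal{U}(P,\varepsilon)\), and Lemma 4.2 (a) yields that \((\lambda_1/\lambda_q)(\Sigma(\cdot))\) is bounded on \(\mathcal{U}(P,\varepsilon)\). If \(\varepsilon_o = \Delta(P_o)\), then \(\Sigma(Q) = 0\) for \(Q = (1-\varepsilon)P + \varepsilon H \in \mathcal{U}(P,\varepsilon)\) if, and only if, \(H\{0\} = 0\) and \(H(V) = 1\) for some \(V \in \mathbb{V}(P_o)\). Since \(\varepsilon_o\) is strictly increasing in \(\varepsilon\), inverting the equation \(\varepsilon_o = \Delta(P_o)\) yields
\[ \varepsilon^*(P) = \frac{(1-P\{0\})\,\Delta(P_o)}{1 - P\{0\}\,\Delta(P_o)}. \]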
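The inversion in the last step can be checked with exact rational arithmetic. The Python sketch below uses hypothetical function names. It implements \(\varepsilon \mapsto \varepsilon_o = \varepsilon/(1 - (1-\varepsilon)P\{0\})\) and the resulting expression for \(\varepsilon^*(P)\), and confirms that the latter is mapped back to \(\Delta(P_o)\). It also reflects that an atom at zero can only lower the breakdown point, since \((1 - P\{0\})/(1 - P\{0\}\Delta(P_o)) \le 1\).

```python
from fractions import Fraction


def eps_o(eps, p0):
    """Contamination weight in bar-Q when H{0} = 0 (proof of Theorem 2.1); p0 = P{0} < 1."""
    return eps / (1 - (1 - eps) * p0)


def eps_star(p0, delta_o):
    """Solution eps of eps_o(eps, p0) = delta_o, i.e. the general formula of Theorem 2.1."""
    return (1 - p0) * delta_o / (1 - p0 * delta_o)


p0, delta_o = Fraction(1, 4), Fraction(1, 3)
e = eps_star(p0, delta_o)
print(e, eps_o(e, p0))   # 3/11 1/3
```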
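Let \(\varepsilon = \varepsilon^*(P)\) and \(Q_k = (1-\varepsilon)P + \varepsilon H_k \in \mathcal{U}(P,\varepsilon)\) as stated in the theorem. After replacing \((Q_k)_k\) with a subsequence if necessary, one may assume that \(\lim_k H_k\{0\} = a \in [0,1]\), \(\lim_k \bar{H}_k = \tilde{H}\) (where \(\bar{\delta}_0\) may be defined arbitrarily) and \(\lim_k (\lambda_1/\lambda_q)(\Sigma(Q_k)) = \gamma \in [1,\infty]\). This implies that
\[ \lim_{k \to \infty} \bar{Q}_k = \tilde{Q} := \frac{(1-\varepsilon)(1-P\{0\})\bar{P} + \varepsilon(1-a)\tilde{H}}{(1-\varepsilon)(1-P\{0\}) + \varepsilon(1-a)}. \]
Since \(\Sigma(P) \in \mathbb{M}_+\), \(\bar{P}(V \cup W) < 1\) for arbitrary \(V, W \in \mathbb{V}\) with \(V \cap W = \{0\}\). The limit distribution \(\tilde{Q}\) inherits this property. Thus one can apply Lemma 4.2 (b) and conclude that \(\gamma = \infty\) if, and only if, \(\tilde{Q}(V) \ge \dim(V)/q\) for some \(V \in \mathbb{V}\). But for any \(V \in \mathbb{V}\),
\[ \begin{aligned} \tilde{Q}(V) &= \frac{(1-\varepsilon)(1-P\{0\})P_o(V) + \varepsilon(1-a)\tilde{H}(V)}{(1-\varepsilon)(1-P\{0\}) + \varepsilon(1-a)} \\ &\le \frac{(1-\varepsilon)(1-P\{0\})P_o(V) + \varepsilon(1-a)}{(1-\varepsilon)(1-P\{0\}) + \varepsilon(1-a)} \\ &\le \frac{(1-\varepsilon)(1-P\{0\})P_o(V) + \varepsilon}{(1-\varepsilon)(1-P\{0\}) + \varepsilon} \\ &= (1-\varepsilon_o)P_o(V) + \varepsilon_o \\ &\le (1-\varepsilon_o)\,\frac{\dim(V)/q - \Delta(P_o)}{1 - \Delta(P_o)} + \varepsilon_o \\ &= \dim(V)/q, \end{aligned} \]
with equality if, and only if, \(\tilde{H}(V) = 1\), \(a = 0\) and \(V \in \mathbb{V}(P_o)\). \(\Box\)

The following preliminary result for the proof of Theorems 3.1 and 3.2 describes the possible limits of a sequence \((\overline{P \ominus H_k})_k\).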
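Proposition 4.3
Let \((H_k)_{k \ge 1}\) be a sequence of distributions on \(\mathbb{R}^q\). A pair \((a, \tilde{B})\) is a cluster point of the sequence \(\bigl((P \ominus H_k)\{0\}, \overline{P \ominus H_k}\bigr)_{k \ge 1}\) if, and only if, it can be represented as follows:
\[ a = \sum_{x \in \mathbb{R}^q} P\{x\}\, a_x \]
and
\[ \tilde{B} = \frac{\eta\,\tilde{B}_\infty + \sum_{x \in \mathbb{R}^q} P\{x\}\bigl((1-\eta)H\{x\} - a_x\bigr)\tilde{B}_x + (1-\eta)\bigl(1 - (P \ominus H)\{0\}\bigr)\,\overline{P \ominus H}}{1 - \sum_{x \in \mathbb{R}^q} P\{x\}\, a_x} \]
with
\[ \eta := \lim_{r \to \infty} \liminf_{k \to \infty} H_k\{y : |y| > r\}, \]
some distribution \(H\) on \(\mathbb{R}^q\), numbers \(a_x \in [0, (1-\eta)H\{x\}]\) and symmetric distributions \(\tilde{B}_\infty\) and \(\tilde{B}_x\) on \(\mathbb{S}^{q-1}\).

Proof of Proposition 4.3.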
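We compactify \(\mathbb{R}^q\) via the mapping
\[ x \mapsto \varphi(x) := (1 + |x|)^{-1} x \in U(0,1), \]
where \(U(y,r)\) and \(B(y,r)\) denote, respectively, the open and closed ball around \(y \in \mathbb{R}^q\) with radius \(r \ge 0\). Without loss of generality one may assume that
\[ \lim_{k \to \infty} H_k \circ \varphi^{-1} = D \quad \text{and} \quad \eta = D(\mathbb{S}^{q-1}). \]
Even if \(D\) is concentrated on \(U(0,1)\), the Continuous Mapping Theorem is not applicable to \((\overline{P \ominus H_k})_k\), because the points in
\[ \mathcal{X} := \bigl\{ x \in \mathbb{R}^q : D\{\varphi(x)\} > 0 \bigr\} \]
require special attention. Since
\[ D\{\varphi(x)\} = \lim_{r \downarrow 0} \liminf_{k \to \infty} H_k\bigl(U(x,r)\bigr) = \lim_{r \downarrow 0} \limsup_{k \to \infty} H_k\bigl(B(x,r)\bigr) \quad \text{for any } x \in \mathcal{X} \]
and
\[ \eta = \lim_{r \uparrow \infty} \liminf_{k \to \infty} H_k\bigl(\mathbb{R}^q \setminus B(0,r)\bigr) = \lim_{r \uparrow \infty} \limsup_{k \to \infty} H_k\bigl(\mathbb{R}^q \setminus U(0,r)\bigr), \]
one can find numbers \(r_{x,k} > 0\) and \(r_k > 0\) such that, with \(U_{x,k} := U(x, r_{x,k})\) and \(U_{\infty,k} := \mathbb{R}^q \setminus B(0, r_k)\), the following requirements are met:
\[ \begin{aligned} & \lim_{k \to \infty} r_{x,k} = 0 \ \text{ and } \ \lim_{k \to \infty} H_k(U_{x,k}) = D\{\varphi(x)\} \quad \text{for } x \in \mathcal{X}, \\ & \lim_{k \to \infty} r_k = \infty \ \text{ and } \ \lim_{k \to \infty} H_k(U_{\infty,k}) = \eta, \\ & U_{x,k} \cap U_{y,k} = \emptyset \quad \text{for different } x, y \in \mathcal{X} \cup \{\infty\}. \end{aligned} \]
After replacing \((H_k)_k\) with a suitable subsequence if necessary, one may assume further that for any \(x \in \mathcal{X}\),
\[ \lim_{k \to \infty} H_k\{x\} = a_x \in \bigl[0, D\{\varphi(x)\}\bigr], \qquad \lim_{k \to \infty} \overline{\mathcal{L}\bigl(x - \mathbf{y}_k \bigm| \mathbf{y}_k \in U_{x,k} \setminus \{x\}\bigr)} = \tilde{B}_x \quad \text{if } \mathbf{y}_k \sim H_k. \]
Since \(\lim_k H_k\{x\} = 0\) whenever \(D\{\varphi(x)\} = 0\), this implies that
\[ \lim_{k \to \infty} (P \ominus H_k)\{0\} = \sum_{x \in \mathcal{X}} P\{x\}\, a_x. \qquad (4.3) \]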
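Further we write \(D = \eta\tilde{B}_\infty + (1-\eta)H \circ \varphi^{-1}\) with distributions \(\tilde{B}_\infty\) on \(\mathbb{S}^{q-1}\) and \(H\) on \(\mathbb{R}^q\). Now let
\[ f(x) := \begin{cases} g(|x|^{-1}x) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0, \end{cases} \]
for some even, continuous function \(g\) on \(\mathbb{S}^{q-1}\), and let \(\mathbf{x} \sim P\), \(\mathbf{y}_k \sim H_k\) and \(\mathbf{y} \sim H\) be independent. Then, as \(k \to \infty\),
\[
\begin{aligned}
\mathrm{E} f(\mathbf{x} - \mathbf{y}_k)
&= \mathrm{E}\, 1\{\mathbf{y}_k \in U_{\infty,k}\} f(\mathbf{x} - \mathbf{y}_k) + \mathrm{E}\, 1\{\mathbf{y}_k \notin U_{\infty,k}\} f(\mathbf{x} - \mathbf{y}_k) \\
&= \eta \int g \, d\tilde{B}_\infty + \mathrm{E}\, 1\{\mathbf{y}_k \notin U_{\infty,k}\} f(\mathbf{x} - \mathbf{y}_k) + o(1) \\
&= \eta \int g \, d\tilde{B}_\infty + \sum_{x \in \mathcal{X}} P\{x\}\, \mathrm{E}\, 1\{\mathbf{y}_k \in U_{x,k} \setminus \{x\}\} f(x - \mathbf{y}_k) \\
&\qquad + \sum_{x \in \mathcal{X}} P\{x\}\, \mathrm{E}\, 1\{\mathbf{y}_k \notin U_{x,k} \cup U_{\infty,k}\} f(x - \mathbf{y}_k) + \mathrm{E}\, 1\{\mathbf{x} \notin \mathcal{X},\ \mathbf{y}_k \notin U_{\infty,k}\} f(\mathbf{x} - \mathbf{y}_k) + o(1) \\
&= \eta \int g \, d\tilde{B}_\infty + \sum_{x \in \mathcal{X}} P\{x\}\bigl(D\{\varphi(x)\} - a_x\bigr) \int g \, d\tilde{B}_x + (1-\eta) \sum_{x \in \mathcal{X}} P\{x\}\, \mathrm{E}\, 1\{
\end{aligned}
\]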