The Asymptotic Behavior of Tyler's M-Estimator of Scatter in High Dimension
Lutz Duembgen
December 1994, revised May 1997
Note.
This is an extended version of the paper "On Tyler's M-functional of scatter in high dimension", which has been tentatively accepted for publication in the Annals of the Institute of Statistical Mathematics. The present version contains some additional results and more detailed proofs.

Abstract.
Let y_1, y_2, ..., y_n ∈ R^p be independent, identically distributed random vectors with nonsingular covariance matrix Σ, and let S = S(y_1, ..., y_n) be an estimator for Σ. A quantity of particular interest is the condition number of Σ^{-1}S. If the y_i are Gaussian and S is the sample covariance matrix, the condition number of Σ^{-1}S, i.e. the ratio of its extreme eigenvalues, equals 1 + O_p((p/n)^{1/2}) as p → ∞ and p/n → 0. The present paper shows that the same result can be achieved with estimators based on Tyler's (1987) M-functional of scatter, assuming only elliptical symmetry of L(y_i) or less. The main tool is a linear expansion for this M-functional which holds uniformly in the dimension p. As a by-product we obtain continuous Fréchet-differentiability with respect to weak convergence.

Keywords and phrases: differentiability, dimensional asymptotics, elliptical symmetry, M-functional, scatter matrix, symmetrization

Author's address: Institut für Angewandte Mathematik, Universität Heidelberg, Im Neuenheimer Feld 294, D-69120 Heidelberg, Germany; lutz@statlab.uni-heidelberg.de

Research supported in part by European Union Human Capital and Mobility Program ERB CHRX-CT 940693.
1 Introduction
It has been noted by numerous authors that asymptotic results, where the dimension of the underlying model is fixed while the number of observations tends to infinity, are often inappropriate for real applications; see e.g. Portnoy (1988) or Girko (1995). In particular, the literature on M-estimation in linear regression models with increasing dimension is vast and still growing; see for instance Huber (1981), Portnoy (1984, 1985), Bai and Wu (1994 a-b), Mammen (1996) and the references cited therein. In the present paper we investigate the related problem of M-estimation of a high-dimensional covariance matrix.
Let P̂_n be the empirical distribution of independent random vectors y_{n1}, y_{n2}, ..., y_{nn} in R^p with unknown distribution P_n, and let S_n = S_n(P̂_n) be an estimator for the covariance matrix Σ_n of P_n, both assumed to be positive definite. Of particular interest is the condition number of Σ_n^{-1} S_n,

    κ(Σ_n^{-1} S_n) := λ_1(Σ_n^{-1} S_n) / λ_p(Σ_n^{-1} S_n),

where λ_1(A) ≥ λ_2(A) ≥ ... ≥ λ_p(A) denote the ordered real eigenvalues of A ∈ R^{p×p}. There are explicit bounds for various scale-invariant functions of S_n and Σ_n such as correlations, partial and canonical correlations, regression coefficients or eigenspaces, all in terms of κ(Σ_n^{-1} S_n) (cf. Dümbgen 1994). An example is the following sharp inequality for correlations, where (·)' denotes transposition:

    | arctanh( x'Σ_n y / (x'Σ_n x · y'Σ_n y)^{1/2} ) − arctanh( x'S_n y / (x'S_n x · y'S_n y)^{1/2} ) | ≤ log κ(Σ_n^{-1} S_n) / 2

for arbitrary x, y ∈ R^p \ {0}. Therefore it is of interest to study the probabilistic behavior of κ(Σ_n^{-1} S_n). If P_n is multivariate normal and S_n is the sample covariance matrix, a modification of Silverstein's (1985) arguments reveals that

    κ(Σ_n^{-1} S_n) = 1 + 4(p/n)^{1/2} + o_p((p/n)^{1/2});    (1.1)

see also the proof of Theorem 5.4. In connection with P_n, P̂_n we assume tacitly that the dimension p = p_n may depend on the sample size n such that p/n → 0. Asymptotic statements refer to n → ∞. Expansions such as (1.1) hold under more general assumptions on the distribution P_n, provided that it has sufficiently light tails (cf. Girko 1995). On the other hand, the distribution of the extremal eigenvalues of the sample covariance matrix is very sensitive to deviations from normality, so that even the weaker assertion

    κ(Σ_n^{-1} S_n) = 1 + O_p((p/n)^{1/2})    (1.2)

may be false, even under elliptical symmetry of P_n. It is thus desirable to have an estimator, whose distribution is less model-dependent, such that expansion (1.1) or at least (1.2) holds.

Possible alternatives to the sample covariance matrix are M-estimators of scatter as proposed by Maronna (1976) and Tyler (1987). The present paper focuses on two estimators related to Tyler's (1987) M-functional. The latter is defined in Section 2 as a matrix-valued function Q ↦ Σ(Q) on the space of probability measures on R^p \ {0}. Section 3 provides a basic linear expansion for Σ(·) with a rather explicit bound for the remainder term. As a by-product we obtain continuous Fréchet-differentiability of Σ(·) with respect to the weak topology on the space of probability measures on R^p \ {0}.

Section 4 describes estimators based on Σ(·). One obvious choice is the M-estimator Σ(P̂_n), which is distribution-free if P is elliptically symmetric around zero. In addition we propose the estimator Σ(P̂_n^s), where P̂_n^s is a symmetrization of P̂_n. This is an intuitively appealing method to get rid of unknown location parameters. The linear expansion of Section 3 implies asymptotic normality of both estimators and consistency of certain bootstrap methods. Some of these results and conclusions are not entirely new but nevertheless stated explicitly for the reader's convenience. In connection with the bootstrap we use similar arguments as Bickel and Freedman (1981).

In order to prove Fréchet-differentiability for fixed dimension p, one could also apply general methods of Clarke (1983). An advantage of our explicit expansion is that it enables us to investigate the asymptotic behavior of Σ(P̂_n) and Σ(P̂_n^s) as p = p_n → ∞. This is done in Section 5. Under certain regularity assumptions assertion (1.2) is valid for both estimators Σ(P̂_n) and Σ(P̂_n^s). In particular, if P is elliptically symmetric, Σ(P̂_n) is shown to have the same asymptotic behavior as the sample covariance matrix in the Gaussian model, including expansion (1.1).

Another approach to the problem of unknown location, pursued by Tyler (1987), is to re-center P̂_n around an estimator μ̂_n = μ̂_n(P̂_n) for P's center. Section 6 contains some additional results on this method, also in view of dimensional asymptotics. All proofs are deferred to Section 7.
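The correlation inequality above is straightforward to check numerically. The following sketch is our illustration, not from the paper; the matrices Σ and S and all sizes are arbitrary choices. It draws a random pair (Σ, S), computes κ(Σ^{-1}S) via the symmetric matrix Σ^{-1/2} S Σ^{-1/2}, and compares the arctanh-transformed correlations with the bound:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5

# arbitrary positive definite "truth" Sigma and "estimate" S (illustrative only)
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)
B = rng.standard_normal((2 * p, p))
S = B.T @ B / (2 * p)

# condition number kappa(Sigma^{-1} S): ratio of extreme eigenvalues,
# computed from the symmetric matrix Sigma^{-1/2} S Sigma^{-1/2}
w, V = np.linalg.eigh(Sigma)
Sig_ih = V @ np.diag(w ** -0.5) @ V.T          # Sigma^{-1/2}
ev = np.linalg.eigvalsh(Sig_ih @ S @ Sig_ih)   # ascending eigenvalues
kappa = ev[-1] / ev[0]

def corr(M, x, y):
    """Correlation of x and y in the inner product induced by M."""
    return (x @ M @ y) / np.sqrt((x @ M @ x) * (y @ M @ y))

x, y = rng.standard_normal(p), rng.standard_normal(p)
gap = abs(np.arctanh(corr(S, x, y)) - np.arctanh(corr(Sigma, x, y)))
print(gap, np.log(kappa) / 2)
```

Per the inequality, the printed gap never exceeds log κ(Σ^{-1}S)/2, whatever x and y are chosen.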
2 Definition and basic properties of the M-functional Σ(·)

Let us first introduce some notation. Throughout, the set of symmetric matrices in R^{p×p} is denoted by M, while M^+ denotes the set of positive definite M ∈ M. For M ∈ M^+ the unique matrix N ∈ M^+ with NN = M is denoted by M^{1/2}, and M^{-1/2} := (M^{-1})^{1/2} = (M^{1/2})^{-1}. Further we consider the following affine subspaces of M, where I stands for the identity matrix in R^{p×p}:

    M(0) := { M ∈ M : trace(M) = 0 },
    M(p) := { M ∈ M : trace(M) = p } = I + M(0).

Let f be a real- or vector-valued function on R^p, and let μ be a signed measure on R^p. Then f(μ) stands for ∫ f(x) μ(dx). This convention will be particularly convenient for functions of several arguments. Further, for A ∈ R^{p×p} we denote by Aμ the transformed signed measure μ(A^{-1} ·).
Throughout, let P and Q be probability distributions on R^p \ {0}. We regard Q as rotationally symmetric around zero in a weak sense if G(Q) = ∫ G(x) Q(dx) is equal to I, where

    G(x) := p |x|^{-2} x x' ∈ M(p) if x ≠ 0, and G(0) := 0;

here |x| denotes the standard Euclidean norm (x'x)^{1/2} of x. Note that G(Q) equals p times the matrix of second moments of |z|^{-1} z, where z ~ Q. If Q is spherically symmetric around zero, one easily verifies that in fact G(Q) = I. More generally, this equality holds if the vectors z = (z_i)_{1≤i≤p} and (σ_i z_{π(i)})_{1≤i≤p} have the same distribution for arbitrary σ ∈ {−1, 1}^p and permutations π of {1, 2, ..., p}. In general one tries to find M ∈ M^+ such that

    G(M^{-1/2} Q) = p ∫ (M^{-1/2} x x' M^{-1/2}) / (x' M^{-1} x) Q(dx) = I.

Note that G(M^{-1/2} Q) = G((sM)^{-1/2} Q) for all s > 0, so that G(·) is only useful in connection with scale-invariant functions on M^+ such as correlations.

Definition. If the equality G(M^{-1/2} Q) = I has a unique solution M in M^+(p) := M^+ ∩ M(p), this matrix M is denoted by Σ(Q). Otherwise we define arbitrarily Σ(Q) := 0.

An important property of G(·) and Σ(·) is linear equivariance. For nonsingular A ∈ R^{p×p} and M ∈ M^+ one easily verifies that

    G((AMA')^{-1/2} (AQ)) = T G(M^{-1/2} Q) T',    (2.1)

where T := (AMA')^{-1/2} A M^{1/2} is orthonormal. Thus G(M^{-1/2} Q) = I if, and only if, G((AMA')^{-1/2} (AQ)) = I. Hence

    Σ(AQ) = r A Σ(Q) A'  with r := p / trace(A Σ(Q) A').    (2.2)

Necessary and sufficient conditions for Σ(Q) ∈ M^+ are as follows.
Theorem 2.1 Let V be the set of proper linear subspaces V of R^p, i.e. 1 ≤ dim(V) < p.

[a] If G(M^{-1/2} Q) = I for some M ∈ M^+, then Q(V) ≤ dim(V)/p for all V ∈ V.

[b] Suppose that

    Q(V) < dim(V)/p for all V ∈ V.    (2.3)

Then there exists a unique M ∈ M^+(p) such that G(M^{-1/2} Q) = I.

[c] Suppose that G(Q) = I but Q(V) = dim(V)/p for some V ∈ V. Then Q(V ∪ V^⊥) = 1 and

    G((aΠ + b(I − Π))^{-1/2} Q) = I for all a, b > 0,

where Π ∈ M describes the orthogonal projection from R^p onto V, and V^⊥ stands for the orthogonal complement of V.

Parts [a, b] are due to Tyler (1987) and Kent and Tyler (1988). Their proofs are formulated for empirical distributions Q, but the extension to arbitrary distributions is mainly straightforward, requiring only notational changes. The only exception is the existence statement in part [b]. Two possible proofs are given in Section 7. Part [c], combined with (2.1), supplements part [b] in that condition (2.3) is even necessary for Σ(Q) ∈ M^+. This will be needed in the proof of Theorem 3.2 below.
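For an empirical distribution Q̂ with n > p observations in general position, so that condition (2.3) holds almost surely, Σ(Q̂) can be computed by a simple fixed-point iteration: M ← (p/n) Σ_i x_i x_i'/(x_i'M^{-1}x_i), renormalized to trace p. The following NumPy sketch is our illustration, not part of the paper; the function name `tyler_scatter` and the stopping rule are arbitrary choices.

```python
import numpy as np

def tyler_scatter(X, tol=1e-9, max_iter=1000):
    """Fixed-point iteration for Tyler's M-functional of an empirical distribution.

    X : (n, p) data matrix, observations centered at the known center 0.
    Returns M with trace(M) = p solving
    (p/n) * sum_i x_i x_i' / (x_i' M^{-1} x_i) = M, i.e. G(M^{-1/2} Q^hat) = I.
    """
    n, p = X.shape
    M = np.eye(p)
    for _ in range(max_iter):
        w = np.einsum('ij,jk,ik->i', X, np.linalg.inv(M), X)  # x_i' M^{-1} x_i
        M_new = (p / n) * (X.T / w) @ X
        M_new *= p / np.trace(M_new)      # renormalize onto M^+(p)
        done = np.max(np.abs(M_new - M)) < tol
        M = M_new
        if done:
            break
    return M

# the estimate depends on the data only through the directions x/|x|,
# since G(x) does; rescaling each observation leaves it unchanged:
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
M1 = tyler_scatter(X)
M2 = tyler_scatter(rng.exponential(1.0, size=200)[:, None] * X)  # random radii
print(np.allclose(M1, M2, atol=1e-6))
```

The invariance under random radii is the numerical face of the distribution-freeness under elliptical symmetry discussed in Section 4.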
3 Differentiability of Σ(·)

For M ∈ M we define ‖M‖ := |λ_1(M)| ∨ |λ_p(M)|. Since the dimension p may vary, this particular choice of a norm is important. It is particularly useful in connection with eigenvalues, because |λ_i(A) − λ_i(B)| ≤ ‖A − B‖ for A, B ∈ M and 1 ≤ i ≤ p. By way of contrast, for growing dimension p expansions involving the Euclidean norm ‖M‖_E = trace(M²)^{1/2} would be of little use. This is one reason why the results of Portnoy (1988) cannot be applied here without unnecessary restrictions on p. Generally, we always use the norm

    ‖L‖ := max_{y ∈ S(B)} ‖Ly‖

of a linear operator L from a normed vector space (B, ‖·‖) into another normed space, where S(B) denotes the unit sphere {y ∈ B : ‖y‖ = 1}.

Now we investigate Σ(Q) if Q is close to P in a certain sense and G(P) = I. By equivariance of G(·) and Σ(·) it suffices to consider the latter case.
The function G(M^{-1/2} x) is differentiable with respect to M ∈ M^+ with

    D(x, B) := (∂/∂t)|_{t=0} G((I + tB)^{-1/2} x) = F(x, B) − 2^{-1}(B G(x) + G(x) B),

    F(x, B) := |x|^{-2} (x'Bx) G(x) = p |x|^{-4} (x'Bx) x x' if x ≠ 0, and F(0, B) := 0.
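For completeness, the formula for D(x, B) can be verified by a direct computation (added here as a reading aid; it is not displayed in the original text). With y(t) := (I + tB)^{-1/2} x one has ẏ(0) = −Bx/2, and the quotient rule gives, for x ≠ 0,

```latex
\frac{\partial}{\partial t}\Big|_{t=0} \frac{p\,y(t)y(t)'}{y(t)'y(t)}
  = \frac{p\,\bigl(\dot y(0)\,x' + x\,\dot y(0)'\bigr)}{|x|^{2}}
    - \frac{2p\,\bigl(x'\dot y(0)\bigr)\,xx'}{|x|^{4}}
  = -\,2^{-1}\bigl(B\,G(x) + G(x)\,B\bigr) + p\,|x|^{-4}(x'Bx)\,xx',
```

which equals F(x, B) − 2^{-1}(B G(x) + G(x) B), as asserted.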
Note that D(x, I) = 0 and trace(D(x, B)) = 0 for all B ∈ M. The next lemma shows that condition (2.3) is closely related to the operator D(Q, ·).

Lemma 3.1 The operator D(Q, ·) is nonsingular on M(0) if, and only if, Q(V ∪ V^⊥) < 1 for arbitrary V ∈ V. In that case,

    trace(D(Q, B) B) < 0 for all B ∈ M(0) \ {0}.

The inverse operator of D(P, ·) : M(0) → M(0), if existent, is denoted by D^{-1}(P, ·). Here is our basic linear expansion for Σ(·).
For anyb <
1there exist constants(b
)<
1and(b
)>
0 (not depending onp
orP
) such that
(
Q
);I
+D
;1(P G
(Q
;P
)) (b
)kF
(Q
;P
)kkG
(Q
;P
)k whenever(
P
) =I
kD
;1(P
)kb
and kF
(Q
;P
)k(b
):
The latter two norms kk refer to the linear operators
D
;1(P
) :M
(0)!M
(0) andF
(Q
;P
) :M
!M
. Note also thatkG
(Q
;P
)k=kF
(Q
;P I
)kkF
(Q
;P
)k.Theorem 3.2, Lemma 3.1 and (2.2) together imply that () is Frechet-dierentiable with respect to the weak topology. The reason is that
x
7!F
(x
) is a bounded, continuous mapping fromR
pnf0g into the nite-dimensional space of linear operatorsL
:M
!M
, so thatkF
(Q
;P
)k!0 asQ
!P
weakly.Corollary 3.3
Suppose that (P
) =I
. Then, asQ
!P
weakly,G
(Q
) !I
and (Q
);I
= ;D
;1P G
(Q
);I
+o
kG
(Q
);I
k:
2 One can even show that () is continuously Frechet-dierentiable. Instead of pursuing this issue, we shall prove a related statement about limiting distributions of (P
bn) and (P
bns) in the next section.4 Related estimators and their properties in xed dimen- sion
At this point it is convenient to define Σ(Q̃) := Σ(Q̃(· | R^p \ {0})) for any probability measure Q̃ on R^p with Q̃{0} < 1.

Suppose first that the distribution P_n has a known "center" μ_n ∈ R^p. Without loss of generality one may assume that μ_n = 0. Then a straightforward estimator for Σ(P_n) is given by Σ(P̂_n). An important example are elliptically symmetric distributions P_n = L(R_n Σ_n^{1/2} u), where R_n > 0 and u are stochastically independent, u is uniformly distributed on the unit sphere of R^p, and Σ_n ∈ M^+(p). Clearly Σ(P_n) = Σ_n, and the empirical distribution P̂_n satisfies condition (2.3) almost surely if n > p. Moreover, the distribution of κ(Σ_n^{-1} Σ(P̂_n)) depends neither on Σ_n nor on L(R_n) (cf. Tyler 1987).

The center μ_n, no matter how it is defined, is rarely known in advance. In order to avoid definition and estimation of an unknown location parameter one can also consider the functional Q ↦ Σ(Q^s) with the symmetrized distribution

    Q^s := L(z_1 − z_2 | z_1 ≠ z_2), where (z_1, z_2) ~ Q ⊗ Q.

Here μ_1 ⊗ μ_2 denotes the product measure on R^p × R^p of (signed) measures μ_1, μ_2 on R^p. One motivation for the functional Q ↦ Σ(Q^s) is the representation 2^{-1} IE (z_1 − z_2)(z_1 − z_2)' of the covariance matrix of Q. Moreover, if z ~ Q has independent, identically distributed components, then G(Q^s) = I, whereas G(Q) may be different from I. Thus symmetrization partly corrects a possible deficiency of M-estimators.
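The effect of symmetrization on G is easy to see in simulation. The following sketch is our illustration, with an arbitrary choice of distribution and constants: the components are iid but not symmetric, so the empirical G(Q̂) has sizable off-diagonal entries, while pairwise differencing pushes G toward I.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 400, 3

def G_hat(X):
    """Empirical G(Q): p times the average of x x' / |x|^2 over the rows of X."""
    nrm2 = np.sum(X ** 2, axis=1)
    return p * (X.T / nrm2) @ X / len(X)

# iid exponential components: permutation-symmetric but not sign-symmetric
X = rng.exponential(1.0, size=(n, p))

# paired symmetrization z_1 - z_2: components become symmetric iid
D = X[0::2] - X[1::2]

def max_offdiag(M):
    return np.max(np.abs(M - np.diag(np.diag(M))))

print(max_offdiag(G_hat(X)), max_offdiag(G_hat(D)))
```

The first printed value stays bounded away from zero as n grows, while the second one vanishes at rate m^{-1/2} with m = n/2 pairs.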
One easily verifies that Q ↦ Σ(Q^s) is affinely invariant in that

    A Σ(Q^s) A' = r Σ((μ + AQ)^s)  with r := trace(A'A Σ(Q^s))/p    (4.1)

for any nonsingular A ∈ R^{p×p} and μ ∈ R^p, where μ + AQ := L(μ + Az), z ~ Q. If Q is elliptically symmetric around μ with scatter matrix Σ_o ∈ M^+(p), then Q^s is elliptically symmetric around zero with the same scatter matrix Σ_o.

An application of Theorem 3.2 utilizing the explicit error bound is the following Central Limit Theorem for the distributions of Σ(P̂_n) and Σ(P̂_n^s).

Corollary 4.1 Suppose that P_n converges weakly to some distribution P on R^p.

[a] Let P{0} = 0 and Σ(P) = I. Let L_n(· | P_n) denote the distribution of

    n^{1/2} (Σ(P_n)^{-1/2} Σ(P̂_n) Σ(P_n)^{-1/2} − I)

(provided that Σ(P_n) ∈ M^+). Then Σ(P_n) → I and

    L_n(· | P_n) →_w L(W),

where W ∈ M(0) is a random matrix with centered Gaussian distribution and the same covariance function as D^{-1}(P, G(y) − I), y ~ P.

[b] Let P{μ} = 0 for all μ ∈ R^p and Σ(P^s) = I. Let L_n^s(· | P_n) denote the distribution of

    n^{1/2} (Σ(P_n^s)^{-1/2} Σ(P̂_n^s) Σ(P_n^s)^{-1/2} − I)

(provided that Σ(P_n^s) ∈ M^+). Then Σ(P_n^s) → I and

    L_n^s(· | P_n) →_w L(W^s),

where W^s ∈ M(0) is a random matrix with centered Gaussian distribution and the same covariance function as 2 D^{-1}(P^s, G̃(y, P) − I), y ~ P. Here G̃(x, y) := G(x − y).
Remark 4.2 The covariance function of a random matrix W ∈ M(0) is defined as the function (A, B) ↦ Cov(trace(WA), trace(WB)) on M(0) × M(0).

Remark 4.3 In case of P being spherically symmetric around zero one can deduce from equations (7.11) and (7.12) in Lemma 7.5 that

    IE trace(WA) trace(WB) = 2(1 + 2/p) trace(AB) for A, B ∈ M(0).

Remark 4.4 If P_n → P weakly, then the empirical distribution P̂_n converges weakly to P in probability. More precisely, d_w(P̂_n, P) converges to zero in probability, where d_w(·, ·) metrizes weak convergence of probability measures on R^p. Consequently, the bootstrap distributions L_n(· | P̂_n) and L_n^s(· | P̂_n) are consistent estimators of L_n(· | P_n) and L_n^s(· | P_n), respectively.

Remark 4.5 Utilizing the equivariance properties (2.2) and (4.1) of Σ(·), one can deduce from Corollary 4.1 that

    n^{1/2} ( κ(Σ(P_n)^{-1} Σ(P̂_n)) − 1 ) →_L (λ_1 − λ_p)(W) in part [a],
    n^{1/2} ( κ(Σ(P_n^s)^{-1} Σ(P̂_n^s)) − 1 ) →_L (λ_1 − λ_p)(W^s) in part [b].
5 Asymptotic behavior of Σ(P̂_n) and Σ(P̂_n^s) in high dimension

Now we consider the case where p = p_n → ∞ but p/n → 0. For the sake of simplicity it is assumed that P_n has no atoms.

Theorem 5.1 Suppose that Σ(P_n) = I for all n. Let

    γ_n² := max_{u ∈ S(R^p)} ∫ (u'G(y)u)² P_n(dy) = O(1),
    δ_n² := max_{B ∈ S(M(0))} ∫ ( (y'By)/(y'y) )² P_n(dy) = o(1).

Further let p = O(n^{1/2}). Then

    IE ‖G(P̂_n) − I‖ = o(1) and IE ‖Σ(P̂_n) − G(P̂_n)‖ = o( IE ‖G(P̂_n) − I‖ ).

If in addition p = O(n^{1/3}), then

    IE ‖G(P̂_n) − I‖ = O((p/n)^{1/2}).
Remark 5.2 Suppose that y_n = (y_{ni})_{1≤i≤p} ~ P_n has independent, identically distributed components with continuous, symmetric distribution such that IE(y_{n1}²) = 1 and IE(y_{n1}⁴) = O(1). Then γ_n² = O(1) and δ_n² = O(p^{-1}). For it follows from the one-sided version of Bennett's (1962) inequality that IP{ |y_n|²/p ≤ 1/2 } ≤ exp(−a_n p) for some number a_n, depending on the fourth moment of y_{n1} and p, such that liminf_{n→∞} a_n > 0. Therefore, since (u'G(y)u)² ≤ p² and (y'By)²/(y'y)² ≤ 1, one may replace the integrands of γ_n² and δ_n² with 4(u'y)⁴ and 4p^{-2}(y'By)², respectively. Then the assertion follows from tedious but elementary moment calculations.
Remark 5.3 The conclusions of Theorem 5.1 and Remark 5.2 remain valid if (P_n, P̂_n) is replaced with (P_n^s, P̂_n^s), where the symmetry condition in Remark 5.2 becomes superfluous. For the proof of Theorem 5.1 consists essentially of bounding IE ‖F(P̂_n − P_n, ·)‖² and IE ‖G(P̂_n − P_n)‖². But F(P̂_n^s, B) can be written as a matrix-valued U-statistic

    (n(n−1)/2)^{-1} Σ_{1≤i<j≤n} F(y_{ni} − y_{nj}, B).

Let P̄_n^s be the empirical distribution of y_{n1}^s, y_{n2}^s, ..., y_{nm}^s, where m = m_n := ⌊n/2⌋ and y_{ni}^s := y_{n,2i−1} − y_{n,2i}. Then a simple convexity argument due to Hoeffding (1963) yields

    IE ‖G(P̂_n^s − P_n^s)‖² ≤ IE ‖G(P̄_n^s − P_n^s)‖²,
    IE ‖F(P̂_n^s − P_n^s, ·)‖² ≤ IE ‖F(P̄_n^s − P_n^s, ·)‖²;    (5.1)

see also equation (7.20) in Section 7. Now the signed measure P̄_n^s − P_n^s can be handled analogously as P̂_n − P_n.

Under spherical symmetry of P_n, restrictions on p beyond p = o(n) are superfluous, and one can obtain rather precise expansions.
Theorem 5.4 Suppose that P_n is spherically symmetric around zero for all n.

[a] Then

    IE ‖G(P̂_n) − I‖ = O((p/n)^{1/2}),
    IE ‖Σ(P̂_n) − I − (1 + 2/p)(G(P̂_n) − I)‖ = O( log(n/p) p/n ).

Moreover, one can couple Σ(P̂_n) with a standard Wishart matrix M_n ∈ M with n degrees of freedom such that

    IE ‖Σ(P̂_n) − n^{-1} M_n‖ = o((p/n)^{1/2}).

In particular, κ(Σ(P̂_n)) = 1 + 4(p/n)^{1/2} + o_p((p/n)^{1/2}).

[b] As for P̂_n^s,

    IE ‖Σ(P̂_n^s) − I‖ = O((p/n)^{1/2}),
    IE ‖Σ(P̂_n^s) − I − n^{-1} Σ_{i=1}^n H_n(|y_{ni}|)(G(y_{ni}) − I)‖ = o((p/n)^{1/2}),

where H_n is an increasing function from [0, ∞) into [0, 2]. If in addition |y_n|²/p converges in probability to a positive constant, then

    IE ‖Σ(P̂_n^s) − Σ(P̂_n)‖ = o((p/n)^{1/2}).
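Part [a] and expansion (1.1) are visible in a small simulation with the sample covariance matrix of Gaussian data (our sketch; the values n = 4000, p = 40 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 4000, 40          # p/n = 0.01, so 1 + 4*(p/n)^{1/2} = 1.4

# standard Gaussian data, true covariance I: sample covariance matrix
X = rng.standard_normal((n, p))
S = X.T @ X / n

ev = np.linalg.eigvalsh(S)            # ascending eigenvalues
kappa = ev[-1] / ev[0]                # condition number kappa(S)
print(kappa, 1 + 4 * np.sqrt(p / n))
```

The two printed values agree up to the o_p((p/n)^{1/2}) term in (1.1); by Theorem 5.4 [a], replacing S with Σ(P̂_n) would give the same first-order behavior under spherical symmetry.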
6 The impact of plugging in estimates of location
For μ ∈ R^p let

    Q(μ) := L(z − μ | z ≠ μ), where z ~ Q.

If P{0} = 0, one can easily show that Q(μ) converges weakly to P as Q →_w P and μ → 0. Thus Corollary 3.3 implies that

    Σ(Q(μ)) → Σ(P) as Q →_w P and μ → 0    (6.1)

whenever Σ(P) ∈ M^+. The following two results show that under moderate moment assumptions on P the difference Σ(Q(μ)) − Σ(Q) can be expanded explicitly, extending results of Tyler (1987, Section 4).

Theorem 6.1 Suppose that P{0} = 0, Σ(P) = I and ∫ |x|^{-1} P(dx) < ∞. Define

    H(x, μ) := p |x|^{-2} (x μ' + μ x') − 2 |x|^{-2} (x'μ) G(x).

Then

    Σ(Q(μ)) − I = −D^{-1}(P, G(Q) − I + H(P, μ)) + o( ‖G(P − Q)‖ + |μ| )

as Q →_w P, ∫ |x|^{-1} Q(dx) → ∫ |x|^{-1} P(dx) and μ → 0.

Note that H(·, μ) is an odd function. Thus the bias term H(P, μ) equals zero if P is symmetric in that P(S) = P(−S) for all Borel sets S ⊂ R^p. As for the moment condition, note that ∫ |x|^{-r} P(dx) < ∞ if r < p and P has a bounded density with respect to Lebesgue measure.
Theorem 6.2 Suppose that p = p_n → ∞ and p/n → 0. Let Σ(P_n) = I and P_n({μ}) = 0 for arbitrary μ ∈ R^p and all n. Moreover, let

    p ∫ |x|^{-2} P_n(dx) = O(1),

and suppose that either

    γ_n² = O(1), δ_n² = o(1) and p = O(n^{1/3}) (cf. Theorem 5.1),

or P_n is spherically symmetric around zero for all n. Then ‖H(P_n, ·)‖ = O(1), and for any sequence of positive numbers ε_n = o((p/n)^{1/4}),

    sup_{|μ| ≤ ε_n} ‖Σ(P̂_n(μ)) − I + D^{-1}(P_n, G(P̂_n) − I + H(P_n, μ))‖ = o_p((p/n)^{1/2}).

Since ∫ |x|^{-2} N_p(0, I)(dx) = (p − 2)^{-1}, the first moment condition is satisfied for mixtures

    P_n = ∫ N_p(0, σ²I) ν_n(dσ),

provided that ∫ σ^{-2} ν_n(dσ) = O(1). If μ̂_n = μ̂_n(P̂_n) is an estimator such that

    |μ̂_n| = O_p((p/n)^{1/2}),    (6.2)

then under the assumptions of Theorem 6.2,

    Σ(P̂_n(μ̂_n)) − I = O_p((p/n)^{1/2}),
    Σ(P̂_n(μ̂_n)) − Σ(P̂_n) + D^{-1}(P_n, H(P_n, μ̂_n)) = o_p((p/n)^{1/2}).

In case of p^{-1} ∫ |x|² P_n(dx) = O(1), the sample mean μ̂_n = ∫ x P̂_n(dx) satisfies condition (6.2). Alternatively consider Tukey's median

    μ̂_n := argmax_{μ ∈ R^p} inf_{u ∈ S(R^p)} P̂_n{ x ∈ R^p : x'u ≥ μ'u }.

Here |μ̂_n| = O_p((p/n)^{1/2}), provided that

    liminf_{n→∞} inf_{u ∈ S(R^p)} ε_n^{-1} ( P_n{ x ∈ R^p : u'x ≤ ε_n } − 1/2 ) > 0 whenever ε_n ↓ 0.    (6.3)

This follows straightforwardly from the fact that

    IE sup_{halfspaces H ⊂ R^p} ( P̂_n(H) − P_n(H) )² ≤ c p/n

for some universal constant c. This is a consequence of Alexander (1984, Corollary 2.9); see also Pollard (1990, Sections 1-4) for techniques to prove it. If P_n is a mixture of normal distributions as above, condition (6.3) is satisfied if

    liminf_{n→∞} ν_n([0, r]) > 0 for some r < ∞.
7 Proofs
7.1 Proofs for Section 2
Proof of Theorem 2.1 [a, c]: Let G(Q) = I, and let V ∈ V with corresponding projection matrix Π ∈ M. Then

    dim(V) = trace(Π) = p ∫ |x|^{-2} x'Πx Q(dx) ≥ p Q(V),

with equality if, and only if, Q(V ∪ V^⊥) = 1. In this case G((aΠ + b(I − Π))^{-1/2} Q) equals G(Q) = I for all a, b > 0, because G((aΠ + b(I − Π))^{-1/2} x) = G(x) for any nonzero x ∈ V ∪ V^⊥. Note that (aΠ + b(I − Π))^γ = a^γ Π + b^γ (I − Π) for any real γ. □
First proof of the existence statement in Theorem 2.1 [b]. The arguments of Kent and Tyler (1988) can be modified as follows. Without loss of generality let Q be supported by the unit sphere S(R^p). Any local maximum A ∈ M^+(p) of the functional

    ℓ(A) := log det(A) − p ∫ log(x'Ax) Q(dx)

satisfies G(A^{1/2} Q) = I, because

    (∂/∂t)|_{t=0} ℓ(A + tΔ) = trace(A^{-1/2} Δ A^{-1/2}) − p ∫ (x'Δx)/(x'Ax) Q(dx)
                            = trace( A^{-1/2} Δ A^{-1/2} (I − G(A^{1/2} Q)) )

for arbitrary Δ ∈ M. Existence of such a local maximum is guaranteed if we can show that lim_{k→∞} ℓ(A_k) = −∞ for any sequence (A_k)_k in M^+(p) with limit A ∈ M(p) \ M^+.

For that purpose assume without loss of generality that A_k = Σ_{i=1}^p λ_i(A_k) θ_{ki} θ_{ki}' with an orthonormal matrix (θ_{k1}, θ_{k2}, ..., θ_{kp}) converging to (θ_1, θ_2, ..., θ_p) as k → ∞. For fixed ε > 0 and 1 ≤ j ≤ p define

    S_j := { x ∈ S(R^p) : Σ_{i=j}^p (θ_i'x)² > 1 − ε² } and D_j := S_j \ S_{j+1},

where S_{p+1} := ∅. Note that S_j is just the intersection of the unit sphere S(R^p) with the open ε-neighborhood of the space span(θ_j, ..., θ_p). Since

    liminf_{k→∞} min_{x ∈ S(R^p) \ S_{j+1}} (x'A_k x)/λ_j(A_k) ≥ liminf_{k→∞} min_{x ∈ S(R^p) \ S_{j+1}} Σ_{i=1}^j (θ_{ki}'