The Asymptotic Behavior of Tyler's M-Estimator of Scatter in High Dimension
Lutz Duembgen
December 1994, revised May 1997
Note.
This is an extended version of the paper "On Tyler's M-functional of scatter in high dimension", which has been tentatively accepted for publication in the Annals of the Institute of Statistical Mathematics. The present version contains some additional results and more detailed proofs.

Abstract.
Let y_1, y_2, ..., y_n ∈ R^p be independent, identically distributed random vectors with nonsingular covariance matrix Σ, and let S = S(y_1, ..., y_n) be an estimator for Σ. A quantity of particular interest is the condition number of Σ^{-1}S. If the y_i are Gaussian and S is the sample covariance matrix, the condition number of Σ^{-1}S, i.e. the ratio of its extreme eigenvalues, equals 1 + O_p((p/n)^{1/2}) as p → ∞ and p/n → 0. The present paper shows that the same result can be achieved with estimators based on Tyler's (1987) M-functional of scatter, assuming only elliptical symmetry of L(y_i) or less. The main tool is a linear expansion for this M-functional which holds uniformly in the dimension p. As a by-product we obtain continuous Fréchet-differentiability with respect to weak convergence.

Keywords and phrases: differentiability, dimensional asymptotics, elliptical symmetry, M-functional, scatter matrix, symmetrization

Author's address: Institut für Angewandte Mathematik, Universität Heidelberg, Im Neuenheimer Feld 294, D-69120 Heidelberg, Germany; lutz@statlab.uni-heidelberg.de

Research supported in part by European Union Human Capital and Mobility Program ERB CHRX-CT 940693.
1 Introduction
It has been noted by numerous authors that asymptotic results, where the dimension of the underlying model is fixed while the number of observations tends to infinity, are often inappropriate for real applications; see e.g. Portnoy (1988) or Girko (1995). In particular, the literature on M-estimation in linear regression models with increasing dimension is vast and still growing; see for instance Huber (1981), Portnoy (1984, 1985), Bai and Wu (1994 a-b), Mammen (1996) and the references cited therein. In the present paper we investigate the related problem of M-estimation of a high-dimensional covariance matrix.
Let P̂_n be the empirical distribution of independent random vectors y_{n1}, y_{n2}, ..., y_{nn} in R^p with unknown distribution P_n, and let S_n = S_n(P̂_n) be an estimator for the covariance matrix Σ_n of P_n, both assumed to be positive definite. Of particular interest is the condition number of Σ_n^{-1} S_n,

    κ(Σ_n^{-1} S_n) := λ_1(Σ_n^{-1} S_n) / λ_p(Σ_n^{-1} S_n),

where λ_1(A) ≥ λ_2(A) ≥ ... ≥ λ_p(A) denote the ordered real eigenvalues of A ∈ R^{p×p}. There are explicit bounds for various scale-invariant functions of S_n and Σ_n such as correlations, partial and canonical correlations, regression coefficients or eigenspaces, all in terms of κ(Σ_n^{-1} S_n) (cf. Dümbgen 1994). An example is the following sharp inequality for correlations, where (·)' denotes transposition:

    | arctanh( x'Σ_n y / (x'Σ_n x · y'Σ_n y)^{1/2} ) − arctanh( x'S_n y / (x'S_n x · y'S_n y)^{1/2} ) | ≤ log κ(Σ_n^{-1} S_n) / 2

for arbitrary x, y ∈ R^p \ {0}. Therefore it is of interest to study the probabilistic behavior of κ(Σ_n^{-1} S_n). If P_n is multivariate normal and S_n is the sample covariance matrix, a modification of Silverstein's (1985) arguments reveals that

    κ(Σ_n^{-1} S_n) = 1 + 4(p/n)^{1/2} + o_p((p/n)^{1/2});    (1.1)

see also the proof of Theorem 5.4. In connection with P_n, P̂_n we assume tacitly that the dimension p = p_n may depend on the sample size n such that p/n → 0. Asymptotic statements refer to n → ∞. Expansions such as (1.1) hold under more general assumptions on the distribution P_n, provided that it has sufficiently light tails (cf. Girko 1995). On the other hand, the distribution of the extremal eigenvalues of the sample covariance matrix is very sensitive to deviations from normality, so that even the weaker assertion

    κ(Σ_n^{-1} S_n) = 1 + O_p((p/n)^{1/2})    (1.2)

may be false, even under elliptical symmetry of P_n. It is thus desirable to have an estimator, whose distribution is less model-dependent, such that expansion (1.1) or at least (1.2) holds.

Possible alternatives to the sample covariance matrix are M-estimators of scatter as proposed by Maronna (1976) and Tyler (1987). The present paper focuses on two estimators related to Tyler's (1987) M-functional. The latter is defined in Section 2 as a matrix-valued function Q ↦ Σ(Q) on the space of probability measures on R^p \ {0}. Section 3 provides a basic linear expansion for Σ(·) with a rather explicit bound for the remainder term. As a by-product we obtain continuous Fréchet-differentiability of Σ(·) with respect to the weak topology on the space of probability measures on R^p \ {0}.

Section 4 describes estimators based on Σ(·). One obvious choice is the M-estimator Σ(P̂_n), which is distribution-free if P is elliptically symmetric around zero. In addition we propose the estimator Σ(P̂_n^s), where P̂_n^s is a symmetrization of P̂_n. This is an intuitively appealing method to get rid of unknown location parameters. The linear expansion of Section 3 implies asymptotic normality of both estimators and consistency of certain bootstrap methods. Some of these results and conclusions are not entirely new but nevertheless stated explicitly for the reader's convenience. In connection with the bootstrap we use similar arguments as Bickel and Freedman (1981).

In order to prove Fréchet-differentiability for fixed dimension p, one could also apply general methods of Clarke (1983). An advantage of our explicit expansion is that it enables us to investigate the asymptotic behavior of Σ(P̂_n) and Σ(P̂_n^s) as p = p_n → ∞. This is done in Section 5. Under certain regularity assumptions assertion (1.2) is valid for both estimators Σ(P̂_n) and Σ(P̂_n^s). In particular, if P is elliptically symmetric, Σ(P̂_n) is shown to have the same asymptotic behavior as the sample covariance matrix in the Gaussian model, including expansion (1.1).

Another approach to the problem of unknown location, pursued by Tyler (1987), is to re-center P̂_n around an estimator μ̂_n = μ̂_n(P̂_n) for P's center. Section 6 contains some additional results on this method, also in view of dimensional asymptotics. All proofs are deferred to Section 7.
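The correlation inequality above is straightforward to check numerically. The following sketch is our illustration, not from the paper; the matrices Σ and S and all sizes are arbitrary choices. It draws a random pair (Σ, S), computes κ(Σ^{-1}S) via the symmetric matrix Σ^{-1/2} S Σ^{-1/2}, and compares the arctanh-transformed correlations with the bound:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5

# arbitrary positive definite "truth" Sigma and "estimate" S (illustrative only)
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)
B = rng.standard_normal((2 * p, p))
S = B.T @ B / (2 * p)

# condition number kappa(Sigma^{-1} S): ratio of extreme eigenvalues,
# computed from the symmetric matrix Sigma^{-1/2} S Sigma^{-1/2}
w, V = np.linalg.eigh(Sigma)
Sig_ih = V @ np.diag(w ** -0.5) @ V.T          # Sigma^{-1/2}
ev = np.linalg.eigvalsh(Sig_ih @ S @ Sig_ih)   # ascending eigenvalues
kappa = ev[-1] / ev[0]

def corr(M, x, y):
    """Correlation of x and y in the inner product induced by M."""
    return (x @ M @ y) / np.sqrt((x @ M @ x) * (y @ M @ y))

x, y = rng.standard_normal(p), rng.standard_normal(p)
gap = abs(np.arctanh(corr(S, x, y)) - np.arctanh(corr(Sigma, x, y)))
print(gap, np.log(kappa) / 2)
```

Per the inequality, the printed gap never exceeds log κ(Σ^{-1}S)/2, whatever x and y are chosen.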
2 Definition and basic properties of the M-functional Σ(·)

Let us first introduce some notation. Throughout, the set of symmetric matrices in R^{p×p} is denoted by M, while M^+ denotes the set of positive definite M ∈ M. For M ∈ M^+ the unique matrix N ∈ M^+ with NN = M is denoted by M^{1/2}, and M^{-1/2} := (M^{-1})^{1/2} = (M^{1/2})^{-1}. Further we consider the following affine subspaces of M, where I stands for the identity matrix in R^{p×p}:

    M(0) := { M ∈ M : trace(M) = 0 },
    M(p) := { M ∈ M : trace(M) = p } = I + M(0).

Let f be a real- or vector-valued function on R^p, and let μ be a signed measure on R^p. Then f(μ) stands for ∫ f(x) μ(dx). This convention will be particularly convenient for functions of several arguments. Further, for A ∈ R^{p×p} we denote by Aμ the transformed signed measure μ(A^{-1} ·).
Throughout, let P and Q be probability distributions on R^p \ {0}. We regard Q as rotationally symmetric around zero in a weak sense if G(Q) = ∫ G(x) Q(dx) is equal to I, where

    G(x) := p |x|^{-2} x x' ∈ M(p) if x ≠ 0, and G(0) := 0;

here |x| denotes the standard Euclidean norm (x'x)^{1/2} of x. Note that G(Q) equals p times the matrix of second moments of |z|^{-1} z, where z ~ Q. If Q is spherically symmetric around zero, one easily verifies that in fact G(Q) = I. More generally, this equality holds if the vectors z = (z_i)_{1≤i≤p} and (σ_i z_{π(i)})_{1≤i≤p} have the same distribution for arbitrary σ ∈ {−1, 1}^p and permutations π of {1, 2, ..., p}. In general one tries to find M ∈ M^+ such that

    G(M^{-1/2} Q) = p ∫ (M^{-1/2} x x' M^{-1/2}) / (x' M^{-1} x) Q(dx) = I.

Note that G(M^{-1/2} Q) = G((sM)^{-1/2} Q) for all s > 0, so that G(·) is only useful in connection with scale-invariant functions on M^+ such as correlations.

Definition. If the equality G(M^{-1/2} Q) = I has a unique solution M in M^+(p) := M^+ ∩ M(p), this matrix M is denoted by Σ(Q). Otherwise we define arbitrarily Σ(Q) := 0.

An important property of G(·) and Σ(·) is linear equivariance. For nonsingular A ∈ R^{p×p} and M ∈ M^+ one easily verifies that

    G((AMA')^{-1/2} (AQ)) = T G(M^{-1/2} Q) T',    (2.1)

where T := (AMA')^{-1/2} A M^{1/2} is orthonormal. Thus G(M^{-1/2} Q) = I if, and only if, G((AMA')^{-1/2} (AQ)) = I. Hence

    Σ(AQ) = r A Σ(Q) A'  with r := p / trace(A Σ(Q) A').    (2.2)

Necessary and sufficient conditions for Σ(Q) ∈ M^+ are as follows.
Theorem 2.1 Let V be the set of proper linear subspaces V of R^p, i.e. 1 ≤ dim(V) < p.

[a] If G(M^{-1/2} Q) = I for some M ∈ M^+, then Q(V) ≤ dim(V)/p for all V ∈ V.

[b] Suppose that

    Q(V) < dim(V)/p for all V ∈ V.    (2.3)

Then there exists a unique M ∈ M^+(p) such that G(M^{-1/2} Q) = I.

[c] Suppose that G(Q) = I but Q(V) = dim(V)/p for some V ∈ V. Then Q(V ∪ V^⊥) = 1 and

    G((aΠ + b(I − Π))^{-1/2} Q) = I for all a, b > 0,

where Π ∈ M describes the orthogonal projection from R^p onto V, and V^⊥ stands for the orthogonal complement of V.

Parts [a, b] are due to Tyler (1987) and Kent and Tyler (1988). Their proofs are formulated for empirical distributions Q, but the extension to arbitrary distributions is mainly straightforward, requiring only notational changes. The only exception is the existence statement in part [b]. Two possible proofs are given in Section 7. Part [c], combined with (2.1), supplements part [b] in that condition (2.3) is even necessary for Σ(Q) ∈ M^+. This will be needed in the proof of Theorem 3.2 below.
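For an empirical distribution Q̂ with n > p observations in general position, so that condition (2.3) holds almost surely, Σ(Q̂) can be computed by a simple fixed-point iteration: M ← (p/n) Σ_i x_i x_i'/(x_i'M^{-1}x_i), renormalized to trace p. The following NumPy sketch is our illustration, not part of the paper; the function name `tyler_scatter` and the stopping rule are arbitrary choices.

```python
import numpy as np

def tyler_scatter(X, tol=1e-9, max_iter=1000):
    """Fixed-point iteration for Tyler's M-functional of an empirical distribution.

    X : (n, p) data matrix, observations centered at the known center 0.
    Returns M with trace(M) = p solving
    (p/n) * sum_i x_i x_i' / (x_i' M^{-1} x_i) = M, i.e. G(M^{-1/2} Q^hat) = I.
    """
    n, p = X.shape
    M = np.eye(p)
    for _ in range(max_iter):
        w = np.einsum('ij,jk,ik->i', X, np.linalg.inv(M), X)  # x_i' M^{-1} x_i
        M_new = (p / n) * (X.T / w) @ X
        M_new *= p / np.trace(M_new)      # renormalize onto M^+(p)
        done = np.max(np.abs(M_new - M)) < tol
        M = M_new
        if done:
            break
    return M

# the estimate depends on the data only through the directions x/|x|,
# since G(x) does; rescaling each observation leaves it unchanged:
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
M1 = tyler_scatter(X)
M2 = tyler_scatter(rng.exponential(1.0, size=200)[:, None] * X)  # random radii
print(np.allclose(M1, M2, atol=1e-6))
```

The invariance under random radii is the numerical face of the distribution-freeness under elliptical symmetry discussed in Section 4.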
3 Differentiability of Σ(·)

For M ∈ M we define ‖M‖ := |λ_1(M)| ∨ |λ_p(M)|. Since the dimension p may vary, this particular choice of a norm is important. It is particularly useful in connection with eigenvalues, because |λ_i(A) − λ_i(B)| ≤ ‖A − B‖ for A, B ∈ M and 1 ≤ i ≤ p. By way of contrast, for growing dimension p expansions involving the Euclidean norm ‖M‖_E = trace(M²)^{1/2} would be of little use. This is one reason why the results of Portnoy (1988) cannot be applied here without unnecessary restrictions on p. Generally, we always use the norm

    ‖L‖ := max_{y ∈ S(B)} ‖Ly‖

of a linear operator L from a normed vector space (B, ‖·‖) into another normed space, where S(B) denotes the unit sphere {y ∈ B : ‖y‖ = 1}.

Now we investigate Σ(Q) if Q is close to P in a certain sense and G(P) = I. By equivariance of G(·) and Σ(·) it suffices to consider the latter case.
The function G(M^{-1/2} x) is differentiable with respect to M ∈ M^+ with

    D(x, B) := (∂/∂t)|_{t=0} G((I + tB)^{-1/2} x) = F(x, B) − 2^{-1}(B G(x) + G(x) B),

    F(x, B) := |x|^{-2} (x'Bx) G(x) = p |x|^{-4} (x'Bx) x x' if x ≠ 0, and F(0, B) := 0.
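For completeness, the formula for D(x, B) can be verified by a direct computation (added here as a reading aid; it is not displayed in the original text). With y(t) := (I + tB)^{-1/2} x one has ẏ(0) = −Bx/2, and the quotient rule gives, for x ≠ 0,

```latex
\frac{\partial}{\partial t}\Big|_{t=0} \frac{p\,y(t)y(t)'}{y(t)'y(t)}
  = \frac{p\,\bigl(\dot y(0)\,x' + x\,\dot y(0)'\bigr)}{|x|^{2}}
    - \frac{2p\,\bigl(x'\dot y(0)\bigr)\,xx'}{|x|^{4}}
  = -\,2^{-1}\bigl(B\,G(x) + G(x)\,B\bigr) + p\,|x|^{-4}(x'Bx)\,xx',
```

which equals F(x, B) − 2^{-1}(B G(x) + G(x) B), as asserted.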
Note that D(x, I) = 0 and trace(D(x, B)) = 0 for all B ∈ M. The next lemma shows that condition (2.3) is closely related to the operator D(Q, ·).

Lemma 3.1 The operator D(Q, ·) is nonsingular on M(0) if, and only if, Q(V ∪ V^⊥) < 1 for arbitrary V ∈ V. In that case,

    trace(D(Q, B) B) < 0 for all B ∈ M(0) \ {0}.

The inverse operator of D(P, ·) : M(0) → M(0), if existent, is denoted by D^{-1}(P, ·). Here is our basic linear expansion for Σ(·).
For anyb <
1there exist constants(b
)<
1and(b
)>
0 (not depending onp
orP
) such that
(
Q
);I
+D
;1(P G
(Q
;P
)) (b
)kF
(Q
;P
)kkG
(Q
;P
)k whenever(
P
) =I
kD
;1(P
)kb
and kF
(Q
;P
)k(b
):
The latter two norms kk refer to the linear operators
D
;1(P
) :M
(0)!M
(0) andF
(Q
;P
) :M
!M
. Note also thatkG
(Q
;P
)k=kF
(Q
;P I
)kkF
(Q
;P
)k.Theorem 3.2, Lemma 3.1 and (2.2) together imply that () is Frechet-dierentiable with respect to the weak topology. The reason is that
x
7!F
(x
) is a bounded, continuous mapping fromR
pnf0g into the nite-dimensional space of linear operatorsL
:M
!M
, so thatkF
(Q
;P
)k!0 asQ
!P
weakly.Corollary 3.3
Suppose that (P
) =I
. Then, asQ
!P
weakly,G
(Q
) !I
and (Q
);I
= ;D
;1P G
(Q
);I
+o
kG
(Q
);I
k:
2 One can even show that () is continuously Frechet-dierentiable. Instead of pursuing this issue, we shall prove a related statement about limiting distributions of (P
bn) and (P
bns) in the next section.4 Related estimators and their properties in xed dimen- sion
At this point it is convenient to define Σ(Q̃) := Σ(Q̃(· | R^p \ {0})) for any probability measure Q̃ on R^p with Q̃{0} < 1.

Suppose first that the distribution P_n has a known "center" μ_n ∈ R^p. Without loss of generality one may assume that μ_n = 0. Then a straightforward estimator for Σ(P_n) is given by Σ(P̂_n). An important example are elliptically symmetric distributions P_n = L(R_n Σ_n^{1/2} u), where R_n > 0 and u are stochastically independent, u is uniformly distributed on the unit sphere of R^p, and Σ_n ∈ M^+(p). Clearly Σ(P_n) = Σ_n, and the empirical distribution P̂_n satisfies condition (2.3) almost surely if n > p. Moreover, the distribution of κ(Σ_n^{-1} Σ(P̂_n)) depends neither on Σ_n nor on L(R_n) (cf. Tyler 1987).

The center μ_n, no matter how it is defined, is rarely known in advance. In order to avoid definition and estimation of an unknown location parameter one can also consider the functional Q ↦ Σ(Q^s) with the symmetrized distribution

    Q^s := L(z_1 − z_2 | z_1 ≠ z_2), where (z_1, z_2) ~ Q ⊗ Q.

Here μ_1 ⊗ μ_2 denotes the product measure on R^p × R^p of (signed) measures μ_1, μ_2 on R^p. One motivation for the functional Q ↦ Σ(Q^s) is the representation 2^{-1} IE (z_1 − z_2)(z_1 − z_2)' of the covariance matrix of Q. Moreover, if z ~ Q has independent, identically distributed components, then G(Q^s) = I, whereas G(Q) may be different from I. Thus symmetrization partly corrects a possible deficiency of M-estimators.
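The effect of symmetrization on G is easy to see in simulation. The following sketch is our illustration, with an arbitrary choice of distribution and constants: the components are iid but not symmetric, so the empirical G(Q̂) has sizable off-diagonal entries, while pairwise differencing pushes G toward I.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 400, 3

def G_hat(X):
    """Empirical G(Q): p times the average of x x' / |x|^2 over the rows of X."""
    nrm2 = np.sum(X ** 2, axis=1)
    return p * (X.T / nrm2) @ X / len(X)

# iid exponential components: permutation-symmetric but not sign-symmetric
X = rng.exponential(1.0, size=(n, p))

# paired symmetrization z_1 - z_2: components become symmetric iid
D = X[0::2] - X[1::2]

def max_offdiag(M):
    return np.max(np.abs(M - np.diag(np.diag(M))))

print(max_offdiag(G_hat(X)), max_offdiag(G_hat(D)))
```

The first printed value stays bounded away from zero as n grows, while the second one vanishes at rate m^{-1/2} with m = n/2 pairs.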
One easily verifies that Q ↦ Σ(Q^s) is affinely invariant in that

    A Σ(Q^s) A' = r Σ((μ + AQ)^s)  with r := trace(A'A Σ(Q^s))/p    (4.1)

for any nonsingular A ∈ R^{p×p} and μ ∈ R^p, where μ + AQ := L(μ + Az), z ~ Q. If Q is elliptically symmetric around μ with scatter matrix Σ_o ∈ M^+(p), then Q^s is elliptically symmetric around zero with the same scatter matrix Σ_o.

An application of Theorem 3.2 utilizing the explicit error bound is the following Central Limit Theorem for the distributions of Σ(P̂_n) and Σ(P̂_n^s).

Corollary 4.1 Suppose that P_n converges weakly to some distribution P on R^p.

[a] Let P{0} = 0 and Σ(P) = I. Let L_n(· | P_n) denote the distribution of

    n^{1/2} (Σ(P_n)^{-1/2} Σ(P̂_n) Σ(P_n)^{-1/2} − I)

(provided that Σ(P_n) ∈ M^+). Then Σ(P_n) → I and

    L_n(· | P_n) →_w L(W),

where W ∈ M(0) is a random matrix with centered Gaussian distribution and the same covariance function as D^{-1}(P, G(y) − I), y ~ P.

[b] Let P{μ} = 0 for all μ ∈ R^p and Σ(P^s) = I. Let L_n^s(· | P_n) denote the distribution of

    n^{1/2} (Σ(P_n^s)^{-1/2} Σ(P̂_n^s) Σ(P_n^s)^{-1/2} − I)

(provided that Σ(P_n^s) ∈ M^+). Then Σ(P_n^s) → I and

    L_n^s(· | P_n) →_w L(W^s),

where W^s ∈ M(0) is a random matrix with centered Gaussian distribution and the same covariance function as 2 D^{-1}(P^s, G̃(y, P) − I), y ~ P. Here G̃(x, y) := G(x − y).
Remark 4.2 The covariance function of a random matrix W ∈ M(0) is defined as the function (A, B) ↦ Cov(trace(WA), trace(WB)) on M(0) × M(0).

Remark 4.3 In case of P being spherically symmetric around zero one can deduce from equations (7.11) and (7.12) in Lemma 7.5 that

    IE trace(WA) trace(WB) = 2(1 + 2/p) trace(AB) for A, B ∈ M(0).

Remark 4.4 If P_n → P weakly, then the empirical distribution P̂_n converges weakly to P in probability. More precisely, d_w(P̂_n, P) converges to zero in probability, where d_w(·, ·) metrizes weak convergence of probability measures on R^p. Consequently, the bootstrap distributions L_n(· | P̂_n) and L_n^s(· | P̂_n) are consistent estimators of L_n(· | P_n) and L_n^s(· | P_n), respectively.

Remark 4.5 Utilizing the equivariance properties (2.2) and (4.1) of Σ(·), one can deduce from Corollary 4.1 that

    n^{1/2} ( κ(Σ(P_n)^{-1} Σ(P̂_n)) − 1 ) →_L (λ_1 − λ_p)(W) in part [a],
    n^{1/2} ( κ(Σ(P_n^s)^{-1} Σ(P̂_n^s)) − 1 ) →_L (λ_1 − λ_p)(W^s) in part [b].
5 Asymptotic behavior of Σ(P̂_n) and Σ(P̂_n^s) in high dimension

Now we consider the case where p = p_n → ∞ but p/n → 0. For the sake of simplicity it is assumed that P_n has no atoms.

Theorem 5.1 Suppose that Σ(P_n) = I for all n. Let

    γ_n² := max_{u ∈ S(R^p)} ∫ (u'G(y)u)² P_n(dy) = O(1),
    δ_n² := max_{B ∈ S(M(0))} ∫ ( (y'By)/(y'y) )² P_n(dy) = o(1).

Further let p = O(n^{1/2}). Then

    IE ‖G(P̂_n) − I‖ = o(1) and IE ‖Σ(P̂_n) − G(P̂_n)‖ = o( IE ‖G(P̂_n) − I‖ ).

If in addition p = O(n^{1/3}), then

    IE ‖G(P̂_n) − I‖ = O((p/n)^{1/2}).
Remark 5.2 Suppose that y_n = (y_{ni})_{1≤i≤p} ~ P_n has independent, identically distributed components with continuous, symmetric distribution such that IE(y_{n1}²) = 1 and IE(y_{n1}⁴) = O(1). Then γ_n² = O(1) and δ_n² = O(p^{-1}). For it follows from the one-sided version of Bennett's (1962) inequality that IP{ |y_n|²/p ≤ 1/2 } ≤ exp(−a_n p) for some number a_n, depending on the fourth moment of y_{n1} and p, such that liminf_{n→∞} a_n > 0. Therefore, since (u'G(y)u)² ≤ p² and (y'By)²/(y'y)² ≤ 1, one may replace the integrands of γ_n² and δ_n² with 4(u'y)⁴ and 4p^{-2}(y'By)², respectively. Then the assertion follows from tedious but elementary moment calculations.
Remark 5.3 The conclusions of Theorem 5.1 and Remark 5.2 remain valid if (P_n, P̂_n) is replaced with (P_n^s, P̂_n^s), where the symmetry condition in Remark 5.2 becomes superfluous. For the proof of Theorem 5.1 consists essentially of bounding IE ‖F(P̂_n − P_n, ·)‖² and IE ‖G(P̂_n − P_n)‖². But F(P̂_n^s, B) can be written as a matrix-valued U-statistic

    (n(n−1)/2)^{-1} Σ_{1≤i<j≤n} F(y_{ni} − y_{nj}, B).

Let P̄_n^s be the empirical distribution of y_{n1}^s, y_{n2}^s, ..., y_{nm}^s, where m = m_n := ⌊n/2⌋ and y_{ni}^s := y_{n,2i−1} − y_{n,2i}. Then a simple convexity argument due to Hoeffding (1963) yields

    IE ‖G(P̂_n^s − P_n^s)‖² ≤ IE ‖G(P̄_n^s − P_n^s)‖²,
    IE ‖F(P̂_n^s − P_n^s, ·)‖² ≤ IE ‖F(P̄_n^s − P_n^s, ·)‖²;    (5.1)

see also equation (7.20) in Section 7. Now the signed measure P̄_n^s − P_n^s can be handled analogously as P̂_n − P_n.

Under spherical symmetry of P_n, restrictions on p beyond p = o(n) are superfluous, and one can obtain rather precise expansions.
Theorem 5.4 Suppose that P_n is spherically symmetric around zero for all n.

[a] Then

    IE ‖G(P̂_n) − I‖ = O((p/n)^{1/2}),
    IE ‖Σ(P̂_n) − I − (1 + 2/p)(G(P̂_n) − I)‖ = O( log(n/p) p/n ).

Moreover, one can couple Σ(P̂_n) with a standard Wishart matrix M_n ∈ M with n degrees of freedom such that

    IE ‖Σ(P̂_n) − n^{-1} M_n‖ = o((p/n)^{1/2}).

In particular, κ(Σ(P̂_n)) = 1 + 4(p/n)^{1/2} + o_p((p/n)^{1/2}).

[b] As for P̂_n^s,

    IE ‖Σ(P̂_n^s) − I‖ = O((p/n)^{1/2}),
    IE ‖Σ(P̂_n^s) − I − n^{-1} Σ_{i=1}^n H_n(|y_{ni}|)(G(y_{ni}) − I)‖ = o((p/n)^{1/2}),

where H_n is an increasing function from [0, ∞) into [0, 2]. If in addition |y_n|²/p converges in probability to a positive constant, then

    IE ‖Σ(P̂_n^s) − Σ(P̂_n)‖ = o((p/n)^{1/2}).
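Part [a] and expansion (1.1) are visible in a small simulation with the sample covariance matrix of Gaussian data (our sketch; the values n = 4000, p = 40 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 4000, 40          # p/n = 0.01, so 1 + 4*(p/n)^{1/2} = 1.4

# standard Gaussian data, true covariance I: sample covariance matrix
X = rng.standard_normal((n, p))
S = X.T @ X / n

ev = np.linalg.eigvalsh(S)            # ascending eigenvalues
kappa = ev[-1] / ev[0]                # condition number kappa(S)
print(kappa, 1 + 4 * np.sqrt(p / n))
```

The two printed values agree up to the o_p((p/n)^{1/2}) term in (1.1); by Theorem 5.4 [a], replacing S with Σ(P̂_n) would give the same first-order behavior under spherical symmetry.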
6 The impact of plugging in estimates of location
For μ ∈ R^p let

    Q(μ) := L(z − μ | z ≠ μ), where z ~ Q.

If P{0} = 0, one can easily show that Q(μ) converges weakly to P as Q →_w P and μ → 0. Thus Corollary 3.3 implies that

    Σ(Q(μ)) → Σ(P) as Q →_w P and μ → 0    (6.1)

whenever Σ(P) ∈ M^+. The following two results show that under moderate moment assumptions on P the difference Σ(Q(μ)) − Σ(Q) can be expanded explicitly, extending results of Tyler (1987, Section 4).

Theorem 6.1 Suppose that P{0} = 0, Σ(P) = I and ∫ |x|^{-1} P(dx) < ∞. Define

    H(x, μ) := p |x|^{-2} (x μ' + μ x') − 2 |x|^{-2} (x'μ) G(x).

Then

    Σ(Q(μ)) − I = −D^{-1}(P, G(Q) − I + H(P, μ)) + o( ‖G(P − Q)‖ + |μ| )

as Q →_w P, ∫ |x|^{-1} Q(dx) → ∫ |x|^{-1} P(dx) and μ → 0.

Note that H(·, μ) is an odd function. Thus the bias term H(P, μ) equals zero if P is symmetric in that P(S) = P(−S) for all Borel sets S ⊂ R^p. As for the moment condition, note that ∫ |x|^{-r} P(dx) < ∞ if r < p and P has a bounded density with respect to Lebesgue measure.
Theorem 6.2 Suppose that p = p_n → ∞ and p/n → 0. Let Σ(P_n) = I and P_n({μ}) = 0 for arbitrary μ ∈ R^p and all n. Moreover, let

    p ∫ |x|^{-2} P_n(dx) = O(1),

and suppose that either

    γ_n² = O(1), δ_n² = o(1) and p = O(n^{1/3}) (cf. Theorem 5.1),

or P_n is spherically symmetric around zero for all n. Then ‖H(P_n, ·)‖ = O(1), and for any sequence of positive numbers ε_n = o((p/n)^{1/4}),

    sup_{|μ| ≤ ε_n} ‖Σ(P̂_n(μ)) − I + D^{-1}(P_n, G(P̂_n) − I + H(P_n, μ))‖ = o_p((p/n)^{1/2}).

Since ∫ |x|^{-2} N_p(0, I)(dx) = (p − 2)^{-1}, the first moment condition is satisfied for mixtures

    P_n = ∫ N_p(0, σ²I) ν_n(dσ),

provided that ∫ σ^{-2} ν_n(dσ) = O(1). If μ̂_n = μ̂_n(P̂_n) is an estimator such that

    |μ̂_n| = O_p((p/n)^{1/2}),    (6.2)

then under the assumptions of Theorem 6.2,

    Σ(P̂_n(μ̂_n)) − I = O_p((p/n)^{1/2}),
    Σ(P̂_n(μ̂_n)) − Σ(P̂_n) + D^{-1}(P_n, H(P_n, μ̂_n)) = o_p((p/n)^{1/2}).

In case of p^{-1} ∫ |x|² P_n(dx) = O(1), the sample mean μ̂_n = ∫ x P̂_n(dx) satisfies condition (6.2). Alternatively consider Tukey's median

    μ̂_n := argmax_{μ ∈ R^p} inf_{u ∈ S(R^p)} P̂_n{ x ∈ R^p : x'u ≥ μ'u }.

Here |μ̂_n| = O_p((p/n)^{1/2}), provided that

    liminf_{n→∞} inf_{u ∈ S(R^p)} ε_n^{-1} ( P_n{ x ∈ R^p : u'x ≤ ε_n } − 1/2 ) > 0 whenever ε_n ↓ 0.    (6.3)

This follows straightforwardly from the fact that

    IE sup_{halfspaces H ⊂ R^p} ( P̂_n(H) − P_n(H) )² ≤ c p/n

for some universal constant c. This is a consequence of Alexander (1984, Corollary 2.9); see also Pollard (1990, Sections 1-4) for techniques to prove it. If P_n is a mixture of normal distributions as above, condition (6.3) is satisfied if

    liminf_{n→∞} ν_n([0, r]) > 0 for some r < ∞.
7 Proofs
7.1 Proofs for Section 2
Proof of Theorem 2.1 [a, c]: Let G(Q) = I, and let V ∈ V with corresponding projection matrix Π ∈ M. Then

    dim(V) = trace(Π) = p ∫ |x|^{-2} x'Πx Q(dx) ≥ p Q(V),

with equality if, and only if, Q(V ∪ V^⊥) = 1. In this case G((aΠ + b(I − Π))^{-1/2} Q) equals G(Q) = I for all a, b > 0, because G((aΠ + b(I − Π))^{-1/2} x) = G(x) for any nonzero x ∈ V ∪ V^⊥. Note that (aΠ + b(I − Π))^γ = a^γ Π + b^γ (I − Π) for any real γ. □
First proof of the existence statement in Theorem 2.1 [b]. The arguments of Kent and Tyler (1988) can be modified as follows. Without loss of generality let Q be supported by the unit sphere S(R^p). Any local maximum A ∈ M^+(p) of the functional

    ℓ(A) := log det(A) − p ∫ log(x'Ax) Q(dx)

satisfies G(A^{1/2} Q) = I, because

    (∂/∂t)|_{t=0} ℓ(A + tΔ) = trace(A^{-1/2} Δ A^{-1/2}) − p ∫ (x'Δx)/(x'Ax) Q(dx)
                            = trace( A^{-1/2} Δ A^{-1/2} (I − G(A^{1/2} Q)) )

for arbitrary Δ ∈ M. Existence of such a local maximum is guaranteed if we can show that lim_{k→∞} ℓ(A_k) = −∞ for any sequence (A_k)_k in M^+(p) with limit A ∈ M(p) \ M^+.

For that purpose assume without loss of generality that A_k = Σ_{i=1}^p λ_i(A_k) θ_{ki} θ_{ki}' with an orthonormal matrix (θ_{k1}, θ_{k2}, ..., θ_{kp}) converging to (θ_1, θ_2, ..., θ_p) as k → ∞. For fixed ε > 0 and 1 ≤ j ≤ p define

    S_j := { x ∈ S(R^p) : Σ_{i=j}^p (θ_i'x)² > 1 − ε² } and D_j := S_j \ S_{j+1},

where S_{p+1} := ∅. Note that S_j is just the intersection of the unit sphere S(R^p) with the open ε-neighborhood of the space span(θ_j, ..., θ_p). Since

    liminf_{k→∞} min_{x ∈ S(R^p) \ S_{j+1}} (x'A_k x)/λ_j(A_k) ≥ liminf_{k→∞} min_{x ∈ S(R^p) \ S_{j+1}} Σ_{i=1}^j (θ_{ki}'