Superefficient Estimation of Multivariate Trend

Rudolf Beran, Department of Statistics, University of California, Berkeley
Berkeley, CA 94720-3860, USA. Revised July 1998.

Abstract.

The question of recovering a multiband signal from noisy observations motivates a model in which the multivariate data points consist of an unknown deterministic trend observed with multivariate Gaussian errors. A cognate random trend model suggests affine shrinkage estimators $\hat\xi_A$ and $\hat\xi_B$ for the trend $\xi$, which are related to an extended Efron-Morris estimator. When represented canonically, $\hat\xi_A$ performs componentwise James-Stein shrinkage in a coordinate system that is determined by the data. Under the original deterministic trend model, $\hat\xi_A$ and its relatives are asymptotically minimax in Pinsker's sense over certain classes of subsets of the parameter space. In such fashion, $\hat\xi_A$ and its cousins dominate the classically efficient least squares estimator. We illustrate their use to improve on the least squares fit of the multivariate linear model.

AMS classification: 62H12, 62J05

Keywords and phrases: multivariate linear model, deterministic trend, risk estimator, minimum $C_L$, adaptive estimator, Efron-Morris estimator, asymptotic minimax, Pinsker bound.

1. Introduction.

The least squares fit to a multivariate trend that is observed with error at many points is unsatisfactory because it emphasizes unbiasedness at the expense of risk. This paper develops adaptive affine shrinkage estimators that have two advantages: they asymptotically dominate least squares fits, pointwise in the parameter space; and they are asymptotically minimax over certain classes of subsets of the parameter space.

Consider the multivariate trend model in which we observe the independent $p \times 1$ random vectors $\{x_t : 1 \le t \le n\}$, the distribution of $x_t$ being $N_p(\xi_t, \Sigma)$ with $n$ greater than $p$. The $p \times 1$ mean vectors $\{\xi_t : 1 \le t \le n\}$ are unknown constants, as is the positive definite $p \times p$ covariance matrix $\Sigma$. The observations $\{x_t\}$ are organized into the $n \times p$ data matrix $X = (x_1, x_2, \ldots, x_n)'$ whose expectation is the matrix $\xi = EX = (\xi_1, \xi_2, \ldots, \xi_n)'$.

Research supported in part by National Science Foundation Grant DMS 95-30492 and by the Alexander von Humboldt Foundation.


Let $\hat\xi = (\hat\xi_1, \hat\xi_2, \ldots, \hat\xi_n)'$ denote any estimator of $\xi$. The quality of $\hat\xi$ is assessed through the quadratic loss

$$L_n(\hat\xi, \xi, \Sigma) = (np)^{-1}\,\mathrm{tr}[(\hat\xi - \xi)\Sigma^{-1}(\hat\xi - \xi)'] = (np)^{-1}\sum_{t=1}^n (\hat\xi_t - \xi_t)'\Sigma^{-1}(\hat\xi_t - \xi_t), \eqno(1.1)$$

where $\mathrm{tr}$ denotes the trace operator. The risk $R_n(\hat\xi, \xi, \Sigma)$ is the expectation of this loss.

The adaptive estimator $\hat\xi_A$ developed in this paper has asymptotic risk as follows. Let

$$V = n^{-1}\sum_{t=1}^n \xi_t\xi_t' = n^{-1}\xi'\xi, \qquad W = \Sigma^{-1/2}V\Sigma^{-1/2}, \eqno(1.2)$$

and denote the eigenvalues of $W$ by $\lambda_1(W) \ge \ldots \ge \lambda_p(W) \ge 0$. We will show, among other results, that for every finite positive $r$,

$$\lim_{n\to\infty}\sup_{\lambda_1(W) \le r} |R_n(\hat\xi_A, \xi, \Sigma) - \rho(W)| = 0, \eqno(1.3)$$

where

$$\rho(W) = 1 - p^{-1}\,\mathrm{tr}[(I_p + W)^{-1}] < 1 \quad \text{for every } \xi \text{ and } \Sigma. \eqno(1.4)$$

The quantity $\lambda_1(W)$ that defines the domain of $\xi$ in the supremum is a multivariate measure of signal-to-noise ratio. The asymptotic risk in (1.3) dominates the risk of the least squares estimator $\hat\xi_{LS} = X$, which is $1$ for every value of $\xi$ and $\Sigma$. Moreover, unlike $\hat\xi_{LS}$, the adaptive affine estimator $\hat\xi_A$ turns out to be asymptotically minimax over certain classes of subsets of the parameter space centered at $\xi = 0$. In this sense, $\hat\xi_A$ is asymptotically superefficient relative to the classically efficient $\hat\xi_{LS}$.

The construction of $\hat\xi_A$ involves the following steps. Let $\hat\Sigma$ be an independent consistent estimator of $\Sigma$ and let

$$\hat V = n^{-1}\sum_{t=1}^n x_tx_t' - \hat\Sigma = n^{-1}X'X - \hat\Sigma, \qquad \hat W = \hat\Sigma^{-1/2}\hat V\hat\Sigma^{-1/2}. \eqno(1.5)$$

Suppose that $\hat\lambda_1 \ge \ldots \ge \hat\lambda_p \ge -1$ are the eigenvalues of $\hat W$ and $\{\hat\gamma_j\}$ are corresponding eigenvectors. Letting $[\cdot]_+$ denote the positive-part function, define

$$\hat A = \sum_{j=1}^p [\hat\lambda_j/(1 + \hat\lambda_j)]_+\,\hat\gamma_j\hat\gamma_j'. \eqno(1.6)$$

The adaptive estimator of $\xi$ is then

$$\hat\xi_A = X\hat\Sigma^{-1/2}\hat A\hat\Sigma^{1/2}. \eqno(1.7)$$
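The construction (1.5)-(1.7) is short enough to state in code. The following NumPy sketch is ours, not the paper's; the function names are invented, the symmetric square root of $\hat\Sigma$ is computed by eigendecomposition, and no numerical refinements are attempted.

```python
import numpy as np

def sym_sqrt(S):
    """Symmetric square root of a positive definite matrix, and its inverse."""
    w, Q = np.linalg.eigh(S)
    return (Q * np.sqrt(w)) @ Q.T, (Q / np.sqrt(w)) @ Q.T

def xi_hat_A(X, Sigma_hat):
    """Adaptive affine shrinkage estimator of the trend matrix xi, per (1.5)-(1.7)."""
    n, p = X.shape
    S_half, S_half_inv = sym_sqrt(Sigma_hat)
    V_hat = X.T @ X / n - Sigma_hat                 # (1.5)
    W_hat = S_half_inv @ V_hat @ S_half_inv         # (1.5); symmetric by construction
    lam, Gam = np.linalg.eigh(W_hat)
    shrink = np.maximum(lam / (1.0 + lam), 0.0)     # [lambda_j / (1 + lambda_j)]_+
    A_hat = (Gam * shrink) @ Gam.T                  # (1.6)
    return X @ S_half_inv @ A_hat @ S_half          # (1.7)
```

When $n^{-1}X'X$ is positive definite, as happens almost surely for $n > p$, the eigenvalues of $\hat W$ exceed $-1$ and the division above is safe.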


Expressions (1.6) and (1.7) reveal that $\hat\xi_A$ carries out componentwise James-Stein shrinkage in a canonical coordinate system for $R^p$ that is estimated from the data. Multiple shrinkage in a fixed coordinate system was introduced by Stein [17]. Unlike the least squares estimator $\hat\xi_{LS}$, the affine shrinkage estimator $\hat\xi_A$ uses $\hat W$, which estimates the matrix $W$, in order to reduce asymptotic risk.

The estimator $\hat\xi_A$ is more easily understood in a multivariate random trend model. Instead of the model described above, suppose that the $\{\xi_t\}$ are independent random vectors, each having a $N_p(0, V)$ distribution. Given $\xi$, suppose that the $\{x_t\}$ are conditionally independent, the conditional distribution of $x_t$ being $N_p(\xi_t, \Sigma)$. If $X$ is observed and $V$, $\Sigma$ are known, then the minimum risk predictor of $\xi$ under loss (1.1) is $\tilde\xi = X\Sigma^{-1/2}\tilde A\Sigma^{1/2}$, where

$$\tilde A = I_p - (I_p + W)^{-1} = W(I_p + W)^{-1}. \eqno(1.8)$$

When $W$ is nonsingular, it is also true that $\tilde A = (I_p + W^{-1})^{-1}$. Since $\hat A$ is a consistent, positive semidefinite estimator of $\tilde A$ in the random trend model, $\hat\xi_A$ is the natural empirical version of the minimum risk predictor for $\xi$. Another consistent estimator of $\tilde A$ is $\bar A = I_p - (I_p + \hat W)^{-1}$ which, unlike $\hat A$, need not be positive semidefinite. This generates the alternative empirical predictor

$$\hat\xi_B = X\hat\Sigma^{-1/2}\bar A\hat\Sigma^{1/2} = X[I_p - n(X'X)^{-1}\hat\Sigma]. \eqno(1.9)$$

Related predictors, albeit more complex, have been used to analyze multiband satellite image data (see [6]).
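The closed form on the right of (1.9) makes $\hat\xi_B$ a one-liner; a hedged sketch in the same illustrative style as above, where $\bar A$ needs no positive-part truncation:

```python
import numpy as np

def xi_hat_B(X, Sigma_hat):
    """Alternative empirical predictor (1.9), via X [I_p - n (X'X)^{-1} Sigma_hat]."""
    n, p = X.shape
    return X @ (np.eye(p) - n * np.linalg.solve(X.T @ X, Sigma_hat))
```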

Up to second-order refinements, $\hat\xi_A$ and $\hat\xi_B$ are multivariate versions of James-Stein [7] estimators for univariate trend. Indeed, when $p = 1$, $\hat W = (n\hat\sigma^2)^{-1}\sum_{t=1}^n x_t^2 - 1 = \hat\lambda_1$ and $\hat\gamma_1 = 1$. Then

$$\hat\xi_A = \Big[1 - n\hat\sigma^2\Big/\sum_{t=1}^n x_t^2\Big]_+ X, \qquad \hat\xi_B = \Big[1 - n\hat\sigma^2\Big/\sum_{t=1}^n x_t^2\Big] X. \eqno(1.10)$$

Statistical folklore, supported by a growing number of results, posits that procedures with good behavior in models with many random parameters may also have desirable properties when those parameters are deterministic. Pfanzagl [13] discussed one aspect of the matter, estimation of a real parameter in the presence of many nuisance parameters, and reviewed earlier contributions to the literature. Another aspect is estimation of the entire high-dimensional parameter $\xi$ under the deterministic trend model described above.
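A quick numerical check of the $p = 1$ reduction (1.10), reusing the illustrative xi_hat_A sketched earlier; the sample size, mean, and variance below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 200, 1.3
x = (0.5 + np.sqrt(sigma2) * rng.standard_normal(n)).reshape(n, 1)

shrink = max(1.0 - n * sigma2 / np.sum(x ** 2), 0.0)     # (1.10), positive part
assert np.allclose(xi_hat_A(x, np.array([[sigma2]])), shrink * x)
```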

When $\Sigma = \hat\Sigma = I_p$, Efron and Morris [4] showed that a refinement of $\hat\xi_B$ is globally minimax and dominates $\hat\xi_{LS}$. Bilodeau and Kariya [2] extended both the Efron-Morris estimator and its global asymptotic minimaxity to the case of unknown $\Sigma$. For details, see (3.12).

The aim of this paper is to study the performance of $\hat\xi_A$ as $n$ tends to infinity with $p$ fixed. Section 2 of the paper draws on Pinsker's [14] theorem to give an asymptotic minimax bound for the estimation of $\xi$ over a certain rich class of subsets of the parameter space. The idealized estimator $\tilde\xi$, suggested by the random trend model when $V$, $\Sigma$ are known, is asymptotically minimax in this deterministic setting, unlike the least squares estimator $\hat\xi_{LS}$. The results of Section 3, on the success of adaptation by minimizing estimated risk, entail limit (1.3) and the asymptotic minimaxity, over designated subsets, of $\hat\xi_A$, $\hat\xi_B$, and the extended Efron-Morris estimator. Section 4 describes how these estimators may be used to improve fitting of the multivariate linear model.

2. Asymptotic Minimax Bound.

This section obtains asymptotic minimax bounds for estimation of $\xi$ over certain subsets of the parameter space and constructs two pertinent estimators. The first of these is asymptotically minimax for a specified subset of the parameter space. The second is asymptotically minimax over all subsets of the form considered but still requires knowledge of $\Sigma$ and $W$. The adaptive estimator to be developed in Section 3 depends only on the data.

For the purposes of this section, we will reduce the estimation problem to a canonical form. Let $\Gamma$ denote a $p \times p$ matrix whose $j$-th column is an eigenvector of $W$ corresponding to the eigenvalue $\lambda_j(W)$. Define

$$y_t = \Gamma'\Sigma^{-1/2}x_t, \qquad \eta_t = \Gamma'\Sigma^{-1/2}\xi_t. \eqno(2.1)$$

The distribution of $y_t$ is $N_p(\eta_t, I_p)$ and the $\{y_t\}$ are independent random vectors that define the data matrix $Y = (y_1, y_2, \ldots, y_n)'$. Moreover,

$$n^{-1}\sum_{t=1}^n \eta_t\eta_t' = \mathrm{diag}\{\lambda_j(W)\}. \eqno(2.2)$$

Any estimator $\hat\xi$ of $\xi$ induces the estimator

$$\hat H = (\hat\eta_1, \hat\eta_2, \ldots, \hat\eta_n)' = \hat\xi\Sigma^{-1/2}\Gamma \eqno(2.3)$$

of $H = (\eta_1, \eta_2, \ldots, \eta_n)' = \xi\Sigma^{-1/2}\Gamma$. The correspondence between $\hat\xi$ and $\hat H$ is one-to-one, as is the correspondence between $\xi$ and $H$. Risks map through the identity

$$L_n(\hat\xi, \xi, \Sigma) = L_n(\hat H, H, I_p) = (np)^{-1}\sum_{t=1}^n |\hat\eta_t - \eta_t|^2. \eqno(2.4)$$

The problem of estimating $\xi$ under loss (1.1) is therefore equivalent to the simpler problem of estimating $H$ under quadratic loss, as exhibited in (2.4).
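Identity (2.4) is easy to confirm numerically. The check below uses arbitrary made-up inputs and is only an illustration of definitions (2.1)-(2.4):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
B = rng.standard_normal((p, p)); Sigma = B @ B.T + p * np.eye(p)
xi = rng.standard_normal((n, p)); xi_hat = rng.standard_normal((n, p))

w, Q = np.linalg.eigh(Sigma)
S_inv_half = (Q / np.sqrt(w)) @ Q.T                       # Sigma^{-1/2}
W = S_inv_half @ (xi.T @ xi / n) @ S_inv_half             # (1.2)
Gamma = np.linalg.eigh(W)[1]                              # eigenvectors of W

H, H_hat = xi @ S_inv_half @ Gamma, xi_hat @ S_inv_half @ Gamma    # (2.3)
L_orig = np.trace((xi_hat - xi) @ np.linalg.inv(Sigma) @ (xi_hat - xi).T) / (n * p)
L_canon = np.sum((H_hat - H) ** 2) / (n * p)
assert np.isclose(L_orig, L_canon)                        # identity (2.4)
```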

Let $\mathcal{M}$ consist of all vectors in $R^p$ whose components $\{b_j\}$ each satisfy $1 \le b_j \le \infty$. For every $b \in \mathcal{M}$ and every $r > 0$, define

$$D(r, b) = \{\xi : p^{-1}\sum_{j=1}^p b_j\lambda_j(W) \le r\}, \eqno(2.5)$$


a subset of the original parameter space for $\xi$. Evidently, $\xi \in D(r, b)$ if and only if the canonical parameter $H$ lies in

$$E(r, b) = \{H : (np)^{-1}\sum_{j=1}^p b_j\sum_{t=1}^n \eta_{tj}^2 \le r\}. \eqno(2.6)$$

Application of Pinsker's (1980) theorem to the canonical estimation problem (see Section 5) yields the asymptotic minimax bound

$$\lim_{n\to\infty}\inf_{\hat H}\sup_{H \in E(r,b)} (np)^{-1}E\sum_{t=1}^n |\hat\eta_t - \eta_t|^2 = \rho_0(r, b), \eqno(2.7)$$

where

$$\rho_0(r, b) = p^{-1}\sum_{j=1}^p \frac{[(b_j\mu)^{-1/2} - 1]_+}{1 + [(b_j\mu)^{-1/2} - 1]_+} \eqno(2.8)$$

and $\mu = \mu(r, b)$ is the unique positive real number such that

$$p^{-1}\sum_{j=1}^p [(b_j/\mu)^{1/2} - b_j]_+ = r. \eqno(2.9)$$

The bound (2.7) is attained asymptotically by the linear estimator $\hat H^*$ given by $\hat\eta_t^* = \mathrm{diag}\{g_j\}y_t$, where $g_j = [1 - (b_j\mu)^{1/2}]_+$ for $1 \le j \le p$. Let $X = (x_1, x_2, \ldots, x_n)'$. For any symmetric matrix $A$ and positive definite matrix $S$, both of dimensions $p \times p$, let

$$\hat\xi(A, S) = XS^{-1/2}AS^{1/2}. \eqno(2.10)$$

The image of estimator $\hat H^*$ in the original parametrization is

$$\hat H^*\Gamma'\Sigma^{1/2} = X\Sigma^{-1/2}A^*\Sigma^{1/2} = \hat\xi(A^*, \Sigma), \eqno(2.11)$$

where

$$A^* = A^*(r, b, W) = \Gamma\,\mathrm{diag}\{g_j\}\,\Gamma'. \eqno(2.12)$$

The discussion in this and the preceding paragraph yields the following asymptotic minimax theorem.

Theorem 2.1.

For every $b \in \mathcal{M}$, every $r > 0$, and every positive definite $\Sigma$,

$$\lim_{n\to\infty}\inf_{\hat\xi}\sup_{\xi \in D(r,b)} R_n(\hat\xi, \xi, \Sigma) = \rho_0(r, b), \eqno(2.13)$$

and $\hat\xi(A^*, \Sigma)$ is an asymptotically minimax estimator of $\xi$ in that

$$\lim_{n\to\infty}\sup_{\xi \in D(r,b)} R_n(\hat\xi(A^*, \Sigma), \xi, \Sigma) = \rho_0(r, b). \eqno(2.14)$$
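Computationally, the quantities in Theorem 2.1 reduce to finding the scalar root $\mu$ of (2.9), for which bisection suffices since the left side of (2.9) decreases in $\mu$. The sketch below is illustrative only; it assumes $b$ is a NumPy array of finite components, ordered to match the decreasing eigenvalues of $W$.

```python
import numpy as np

def pinsker_mu(r, b, lo=1e-12, hi=1e12, iters=200):
    """Solve (2.9): p^{-1} sum_j [(b_j/mu)^{1/2} - b_j]_+ = r, by log-scale bisection."""
    f = lambda mu: np.mean(np.maximum(np.sqrt(b / mu) - b, 0.0)) - r
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)   # f decreases in mu
    return np.sqrt(lo * hi)

def rho_0(r, b):
    """Pinsker bound (2.8)."""
    t2 = np.maximum((b * pinsker_mu(r, b)) ** -0.5 - 1.0, 0.0)
    return np.mean(t2 / (1.0 + t2))

def A_star(r, b, W):
    """Minimax shrinkage matrix (2.12); eigenvectors ordered by decreasing eigenvalue."""
    lam, Gam = np.linalg.eigh(W)
    Gam = Gam[:, np.argsort(lam)[::-1]]
    g = np.maximum(1.0 - np.sqrt(b * pinsker_mu(r, b)), 0.0)   # g_j
    return (Gam * g) @ Gam.T
```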


A major drawback to the estimator $\hat\xi(A^*, \Sigma)$ is its dependence on $r$, $b$, $W$, and $\Sigma$. The second estimator to be discussed in this section dispenses with knowledge of $r$ and $b$, though still requiring $W$ and $\Sigma$, and will lead to the fully adaptive estimator $\hat\xi_A$ that is treated in Section 3.

Let $\mathcal{A}$ denote all symmetric $p \times p$ matrices with eigenvalues restricted to $[0, 1]$. Evidently $A^* \in \mathcal{A}$. Consider the class of candidate estimators $\{\hat\xi(A, \Sigma) : A \in \mathcal{A}\}$ defined through (2.10). This class assumes knowledge of $\Sigma$. Let

$$\tilde A = W(I_p + W)^{-1} = I_p - (I_p + W)^{-1}. \eqno(2.15)$$

Using the spectral decomposition $W = \Gamma\Omega\Gamma'$ yields the spectral decomposition $\tilde A = \sum_{j=1}^p [\lambda_j/(1 + \lambda_j)]\gamma_j\gamma_j'$, which shows that $\tilde A \in \mathcal{A}$. The risk of the candidate estimator $\hat\xi(A, \Sigma)$ simplifies algebraically to

$$R_n(\hat\xi(A, \Sigma), \xi, \Sigma) = p^{-1}\mathrm{tr}[A^2 + (I_p - A)^2W] = p^{-1}\mathrm{tr}[(A - \tilde A)^2(I_p + W)] + p^{-1}\mathrm{tr}[W(I_p + W)^{-1}] = \rho(A, W), \text{ say}. \eqno(2.16)$$

It follows from this display that $\tilde A = \mathrm{argmin}_{A \in \mathcal{A}}\,\rho(A, W)$ and

$$\min_{A \in \mathcal{A}}\rho(A, W) = \rho(\tilde A, W) = \rho(W) \eqno(2.17)$$

for $\rho(W)$ defined by (1.4).
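Both the algebraic identity in (2.16) and the minimizing property (2.17) can be spot-checked numerically; the snippet below, with an arbitrary $W$ and a random member of $\mathcal{A}$, is only such a check:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4
M = rng.standard_normal((p, p)); W = M @ M.T              # a PSD signal matrix
Q = np.linalg.qr(rng.standard_normal((p, p)))[0]
A = (Q * rng.uniform(0, 1, p)) @ Q.T                      # random element of the class A

I = np.eye(p)
A_tilde = W @ np.linalg.inv(I + W)                        # (2.15)
rho_a = np.trace(A @ A + (I - A) @ (I - A) @ W) / p       # first form in (2.16)
D = A - A_tilde
rho_b = (np.trace(D @ D @ (I + W)) + np.trace(W @ np.linalg.inv(I + W))) / p
assert np.isclose(rho_a, rho_b)                           # identity (2.16)
rho_W = 1.0 - np.trace(np.linalg.inv(I + W)) / p          # (1.4)
assert rho_W <= rho_a + 1e-12                             # (2.17): A-tilde minimizes
```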

Because $A^* \in \mathcal{A}$,

$$\sup_{\xi \in D(r,b)} R_n(\hat\xi(\tilde A, \Sigma), \xi, \Sigma) \le \sup_{\xi \in D(r,b)} R_n(\hat\xi(A^*, \Sigma), \xi, \Sigma). \eqno(2.18)$$

This inequality and the limit (2.14) yield

Corollary 2.2.

For every $b \in \mathcal{M}$, every $r > 0$, and every positive definite $\Sigma$,

$$\lim_{n\to\infty}\sup_{\xi \in D(r,b)} R_n(\hat\xi(\tilde A, \Sigma), \xi, \Sigma) = \rho_0(r, b). \eqno(2.19)$$

Thus, the estimator

$$\hat\xi(\tilde A, \Sigma) = X\Sigma^{-1/2}\tilde A\Sigma^{1/2} = X - X(I_p + \Sigma^{-1}V)^{-1}, \eqno(2.20)$$

which requires knowledge of $W$ and $\Sigma$, is asymptotically minimax for every choice of $b \in \mathcal{M}$ and $r > 0$. The next section devises a fully adaptive asymptotically minimax estimator that depends only on data.


3. Adaptive Estimation.

Let $\hat\Sigma$ be a consistent estimator of $\Sigma$. The risk function $\rho(A, W)$ in (2.16) is estimated plausibly by

$$\hat\rho(A) = p^{-1}\mathrm{tr}[A^2 + (I_p - A)^2\hat W], \eqno(3.1)$$

where $\hat W$ defined in (1.5) approximates $W$. By analogy with the construction of $\hat\xi(\tilde A, \Sigma)$, the proposed adaptive estimator of $\xi$ is

$$\hat\xi_A = \hat\xi(\hat A, \hat\Sigma) = X\hat\Sigma^{-1/2}\hat A\hat\Sigma^{1/2} \eqno(3.2)$$

with $\hat A = \mathrm{argmin}_{A \in \mathcal{A}}\,\hat\rho(A)$. Lemma 5.3 in Section 5 verifies that $\hat A$ is given explicitly by (1.6).

The procedure just described is a multivariate version of adaptation by minimizing $C_L$, a methodology that Mallows [10] first discussed critically and connected to Stein estimation. Li [9] developed properties of minimum $C_L$ procedures, relating them to cross-validation methods. Kneip [8] treated the success of minimum $C_L$ for ordered linear smoothers. On the other hand, Efroimovich and Pinsker [3] and Golubev [5] pioneered adaptive estimators whose maximum risk converges asymptotically to the Pinsker bound for each member of a class of ellipsoids in the parameter space. The extensive univariate literature on such adaptive asymptotically minimax estimators is reviewed by Nussbaum [12].

For every $b \in \mathcal{M}$ and $r > 0$, the set $D(r, b)$ defined in (2.5) satisfies

$$D(r, b) \subset \{\xi : \lambda_1(W) \le pr\}. \eqno(3.3)$$

This ordering links the results in the next two theorems with the task of proving that $\hat\xi_A$ is asymptotically minimax in the sense of Theorem 2.1.

We will impose the following assumption on the estimator $\hat\Sigma$ of $\Sigma$. Note that the condition includes the case when $\Sigma$ is known and $\hat\Sigma = \Sigma$. For any matrix argument, $|\cdot|$ will denote the Frobenius norm, which is defined by $|A|^2 = \mathrm{tr}[AA'] = \mathrm{tr}[A'A]$. We note for later use that if $\{A_i : 1 \le i \le k\}$ are $p \times p$ matrices, then $|\mathrm{tr}[\prod_{i=1}^k A_i]| \le \prod_{i=1}^k |A_i|$.

Condition C.

The estimator $\hat\Sigma$ and $X$ are independent. Let $\hat Z = \Sigma^{-1/2}\hat\Sigma^{1/2}$. For every $r > 0$,

$$\lim_{n\to\infty}\sup_{\lambda_1(W) \le r} EJ = 0, \eqno(3.4)$$

where $J$ is any one of $|\hat Z^{-1} - I_p|^2$, $|\hat Z^{-1}|^2|\hat Z - I_p|^2$, or $|\hat Z^{-1}|^2|\hat Z^{-1} - I_p|^2$.

The next two theorems, proved in Section 5, establish that the estimated risk function $\hat\rho(A)$ and the adaptive estimator $\hat\xi_A$, both defined above, serve asymptotically as surrogates for the true risk function $\rho(A, W)$ and for $\hat\xi(\tilde A, \Sigma)$.


Theorem 3.1.

Suppose that Condition C holds. Then, for every $r > 0$ and every positive definite $\Sigma$,

$$\lim_{n\to\infty}\sup_{\lambda_1(W) \le r} E\sup_{A \in \mathcal{A}} |L_n(\hat\xi(A, \hat\Sigma), \xi, \Sigma) - \rho(A, W)| = 0 \eqno(3.5)$$

and

$$\lim_{n\to\infty}\sup_{\lambda_1(W) \le r} E\sup_{A \in \mathcal{A}} |\hat\rho(A) - \rho(A, W)| = 0. \eqno(3.6)$$

From this result follows

Theorem 3.2.

Suppose that Condition C holds. Then, for every $r > 0$ and every positive definite $\Sigma$,

$$\lim_{n\to\infty}\sup_{\lambda_1(W) \le r} E|T - \rho(W)| = 0, \eqno(3.7)$$

where $T$ can be any one of $L_n(\hat\xi_A, \xi, \Sigma)$, $L_n(\hat\xi(\tilde A, \Sigma), \xi, \Sigma)$ or $\hat\rho(\hat A)$, and $\rho(W)$ is defined in (1.4).

The convergence (1.3) of the risk of $\hat\xi_A$ is immediate from this result. Another consequence is the following corollary, which establishes the asymptotic minimaxity of $\hat\xi_A$.

Corollary 3.3.

Suppose that Condition C holds. For every $b \in \mathcal{M}$, every $r > 0$, and every positive definite $\Sigma$,

$$\lim_{n\to\infty}\sup_{\lambda_1(W) \le r} |R_n(\hat\xi_A, \xi, \Sigma) - R_n(\hat\xi(\tilde A, \Sigma), \xi, \Sigma)| = 0 \eqno(3.8)$$

and

$$\lim_{n\to\infty}\sup_{\xi \in D(r,b)} R_n(\hat\xi_A, \xi, \Sigma) = \rho_0(r, b). \eqno(3.9)$$

To verify (3.8), observe that

$$\sup_{\lambda_1(W) \le r} |R_n(\hat\xi_A, \xi, \Sigma) - R_n(\hat\xi(\tilde A, \Sigma), \xi, \Sigma)| \le \sup_{\lambda_1(W) \le r} E|L_n(\hat\xi_A, \xi, \Sigma) - L_n(\hat\xi(\tilde A, \Sigma), \xi, \Sigma)|, \eqno(3.10)$$

which tends to zero by Theorem 3.2. Corollary 2.2, (3.3), and (3.10) then imply (3.9).

Related to Corollary 3.3 are the following remarks:

a) A uniform integrability argument yields

$$\lim_{n\to\infty}\sup_{\lambda_1(W) \le r} (np)^{-1}E|\Sigma^{-1/2}(\hat\xi_B - \hat\xi(\tilde A, \Sigma))'|^2 = 0. \eqno(3.11)$$

Consequently, by Corollary 2.2, the estimator $\hat\xi_B$ is asymptotically minimax in the sense (3.9).


b) Suppose that $\hat\Sigma$ is independent of $X$ and $(m + p + 1)\hat\Sigma$ has a Wishart $W_p(m, \Sigma)$ distribution. Bilodeau and Kariya [2] showed that the extended Efron-Morris estimator

$$\hat\xi_{EM} = X - X[(n - p - 1)(X'X)^{-1} + (p - 1)I_p/\mathrm{tr}(X'X)]\hat\Sigma \eqno(3.12)$$

is then globally minimax. Under the hypotheses just stated, this refinement of $\hat\xi_B$ also has the Pinsker asymptotic minimaxity (3.9), provided $m$ tends to infinity with $n$. (A code sketch of (3.12) follows these remarks.)

c) Specialized to the case $p = 1$, Corollary 3.3 implies that the James-Stein estimator and the positive-part James-Stein estimator are asymptotically minimax over every ball centered at the origin in the parameter space. Of course, this result also follows directly from Pinsker's theorem (see Theorem 5.2) or by developing ideas sketched in Stein [16] (see [1]).
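The display (3.12) from remark b), transcribed as a hedged NumPy sketch in the style of the earlier ones; this is our rendering of the formula, not code from [2]:

```python
import numpy as np

def xi_hat_EM(X, Sigma_hat):
    """Extended Efron-Morris estimator (3.12)."""
    n, p = X.shape
    shrinker = ((n - p - 1) * np.linalg.inv(X.T @ X)
                + (p - 1) * np.eye(p) / np.trace(X.T @ X))
    return X - X @ shrinker @ Sigma_hat
```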

4. Application to the Multivariate Linear Model.

This section describes some implications of $\hat\xi_A$ and its cousins for improved fitting of the Gaussian multivariate linear model (see also [2]). For the univariate linear model, Rao and Toutenberg [15] reviewed various biased estimation techniques that have smaller risk than least squares. The multivariate case presents the additional possibility of estimating and using information shared between response variables.

Consider the multivariate linear model $Y = CB + E$, where the observation matrix $Y$ is $m \times p$, the regression matrix $C$ is $m \times n$, the coefficient matrix $B$ is $n \times p$, and the rows of the error matrix $E$ are independent Gaussian random vectors with mean $0$ and covariance matrix $\Sigma$. Here $C$ is a given matrix constant while both $B$ and $\Sigma$ are unknown. We will assume that $\mathrm{rank}(C) = n < m$ and that $p < n$. The problem is to estimate $M = EY = CB$.

Reducing this linear model to canonical form enables us to apply the preceding results on estimation of multivariate trend. Let $N$ be an $m \times n$ matrix whose columns are orthonormal and span the same subspace of $R^m$ as do the columns of $C$. One possible algebraic construction of $N$ is through the singular value decomposition of $C$,

$$C = NLP', \eqno(4.1)$$

where $P$ is $n \times n$, $N'N = P'P = PP' = I_n$, and $L = \mathrm{diag}\{l_i\}$ with $l_1 \ge l_2 \ge \ldots \ge l_n > 0$. The columns of $P$ are eigenvectors of $C'C$ and $l_i$ is the positive square root of the $i$-th largest eigenvalue.

Having chosen $N$, construct the $m \times (m - n)$ matrix $\bar N$ so that the matrix $O = \{N \mid \bar N\}$ is orthogonal. If $N$ comes from the singular value decomposition (4.1), then the columns of $O$ are eigenvectors of $CC'$, ordered in decreasing order of the eigenvalues. Let

$$X = N'Y, \qquad \bar X = \bar N'Y, \eqno(4.2)$$


and define $\xi = EX = LP'B$, an $n \times p$ matrix. Because $(X' \mid \bar X')' = O'Y$, the rows of $X$ and $\bar X$ are independent Gaussian random vectors, each having covariance matrix $\Sigma$. This structure is a canonical form of the original linear model.

The mapping between $\xi$ and $M = CB$ is one-to-one, because $M = N\xi$ and $\xi = N'M$. The columns of the canonical parameter $\xi$ can take any value in $R^n$; the columns of the original parameter $M$ are restricted to the $n$-dimensional subspace $\mathcal{L}(C)$ of $R^m$ spanned by the columns of $C$. The same one-to-one mapping exists between any estimator $\hat M = C\hat B$ of $CB$ and the corresponding estimator $\hat\xi = N'\hat M$ of $\xi$. Because

$$L_{mn}(\hat M, M, \Sigma) = (np)^{-1}\mathrm{tr}[\Sigma^{-1}(\hat M - M)'(\hat M - M)] = (np)^{-1}\mathrm{tr}[\Sigma^{-1}(\hat\xi - \xi)'(\hat\xi - \xi)], \eqno(4.3)$$

estimation of $M = CB$ under the loss to the left is equivalent to estimation of the canonical parameter $\xi$ under the loss to the right. Denote the corresponding risk by $R_{mn}(\hat M, M, \Sigma)$.

Let $\hat\Sigma = (m - n)^{-1}\bar X'\bar X$ be the usual estimator of $\Sigma$ based upon the rows of $\bar X$. In terms of the original parametrization, $\hat\Sigma = (m - n)^{-1}(Y - C\hat B_{LS})'(Y - C\hat B_{LS})$, where $\hat B_{LS} = (C'C)^{-1}C'Y$ is the least squares estimator of $B$ (cf. Mardia, Kent and Bibby [11], chapter 6). Define the estimator $\hat\xi_A$ as in (1.7). Asymptotic minimaxity of $\hat\xi_A$, as stated in Corollary 3.3, entails asymptotic minimaxity under loss (4.3) of the estimator

$$\hat M_A = N\hat\xi_A = C\hat B_A, \eqno(4.4)$$

where $\hat B_A = PL^{-1}\hat\xi_A = PL^{-1}X\hat\Sigma^{-1/2}\hat A\hat\Sigma^{1/2}$.
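Assembled in code, the reduction (4.1)-(4.2) plus the Section 1 construction gives the improved fit (4.4). The sketch below reuses the illustrative xi_hat_A defined in the Introduction and assumes the stated rank conditions hold:

```python
import numpy as np

def fit_M_A(Y, C):
    """Affine-shrinkage fit M_hat_A = C B_hat_A of the multivariate linear model (4.4)."""
    m, p = Y.shape
    n = C.shape[1]
    U, l, Vt = np.linalg.svd(C, full_matrices=True)   # C = N L P', as in (4.1)
    N, N_bar = U[:, :n], U[:, n:]
    X, X_bar = N.T @ Y, N_bar.T @ Y                   # canonical data (4.2)
    Sigma_hat = X_bar.T @ X_bar / (m - n)             # usual estimator of Sigma
    xi_A = xi_hat_A(X, Sigma_hat)                     # Section 1 construction (1.7)
    B_A = Vt.T @ (xi_A / l[:, None])                  # B_hat_A = P L^{-1} xi_hat_A
    return C @ B_A
```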

More precisely, note that $W$, defined by (1.2), can be expressed in terms of $M$ through

$$W = n^{-1}\Sigma^{-1/2}M'NN'M\Sigma^{-1/2}, \eqno(4.5)$$

and that $\xi \in D(r, b)$ if and only if $M \in C(r, b)$, where

$$C(r, b) = \{M : M \in \mathcal{L}(C),\ p^{-1}\sum_{j=1}^p b_j\lambda_j(W) \le r\}. \eqno(4.6)$$

The estimator $\hat\Sigma$ defined above satisfies Condition C with $n$ replaced by $m - n$. Thus, (2.13) and (3.9) imply

Corollary 4.1.

Let $q = \min(n, m - n)$. For every $b \in \mathcal{M}$, every $r > 0$, and every positive definite $\Sigma$,

$$\lim_{q\to\infty}\inf_{\hat M}\sup_{M \in C(r,b)} R_{mn}(\hat M, M, \Sigma) = \rho_0(r, b) \eqno(4.7)$$

and

$$\lim_{q\to\infty}\sup_{M \in C(r,b)} R_{mn}(\hat M_A, M, \Sigma) = \rho_0(r, b). \eqno(4.8)$$


Example.

Suppose we observe $k$ independent replicates of the deterministic trend model described in Section 1. Equivalent is the multivariate linear model in which $m = kn$, $B$ is $n \times p$, and

$$C = (I_n \mid I_n \mid \ldots \mid I_n)'. \eqno(4.9)$$

Thus $M = CB = (B' \mid B' \mid \ldots \mid B')'$. The singular value decomposition of $C$ has $P = I_n$, $N = k^{-1/2}(I_n \mid I_n \mid \ldots \mid I_n)'$, and $L = k^{1/2}I_n$. Let $Y_1$ denote the first $n$ rows of $Y$, $Y_2$ the next $n$ rows, and so forth until $Y_k$. If $\bar Y = k^{-1}\sum_{i=1}^k Y_i$, then the least squares estimator of $B$ is $\hat B_{LS} = \bar Y$. Consequently, $\hat M_{LS} = (\bar Y' \mid \bar Y' \mid \ldots \mid \bar Y')'$, $X = k^{1/2}\bar Y$,

$$\hat\Sigma = [(k - 1)n]^{-1}\sum_{i=1}^k (Y_i - \bar Y)'(Y_i - \bar Y), \eqno(4.10)$$

and $\hat W = kn^{-1}\hat\Sigma^{-1/2}\bar Y'\bar Y\hat\Sigma^{-1/2} - I_p$. By Corollary 4.1, construction (4.4) yields a superefficient estimator $\hat M_A$ of $M$ that is asymptotically minimax when $k$ is fixed and $n$ tends to infinity. Since $\xi = k^{1/2}B$ and $W = kn^{-1}\Sigma^{-1/2}B'B\Sigma^{-1/2}$, it follows from (1.3) and (1.4) that $\hat M_A$ improves most significantly on $\hat M_{LS}$ when $k$ is small.
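A small simulation of this replicated model illustrates the risk improvement. Every constant below (dimensions, the trend, the numbers of replicates and Monte Carlo runs) is invented for the demonstration, and xi_hat_A is again the sketch from the Introduction:

```python
import numpy as np

rng = np.random.default_rng(3)
k, n, p = 3, 100, 2
B = 0.4 * rng.standard_normal((n, p))                 # modest signal-to-noise ratio
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
L_chol = np.linalg.cholesky(Sigma)
Sigma_inv = np.linalg.inv(Sigma)

loss_A = loss_LS = 0.0
for _ in range(200):
    Ys = [B + rng.standard_normal((n, p)) @ L_chol.T for _ in range(k)]
    Y_bar = sum(Ys) / k
    Sigma_hat = sum((Yi - Y_bar).T @ (Yi - Y_bar) for Yi in Ys) / ((k - 1) * n)  # (4.10)
    B_A = xi_hat_A(np.sqrt(k) * Y_bar, Sigma_hat) / np.sqrt(k)   # since xi = k^{1/2} B
    loss_A += np.trace((B_A - B) @ Sigma_inv @ (B_A - B).T)
    loss_LS += np.trace((Y_bar - B) @ Sigma_inv @ (Y_bar - B).T)
print(loss_A / loss_LS)        # ratio typically well below 1 in this regime
```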

5. Argument Details.

This section substantiates various claims made earlier in the paper.

The Pinsker bound.

Suppose we observe $u = (u_1, u_2, \ldots, u_m)'$, the $\{u_i\}$ being independent random variables and the distribution of $u_i$ being $N(\theta_i, 1)$. The problem is to estimate the means $\theta = (\theta_1, \theta_2, \ldots, \theta_m)'$ under normalized quadratic loss. The risk of an estimator $\hat\theta = (\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_m)'$ is

$$R_m(\hat\theta, \theta) = m^{-1}E\sum_{i=1}^m (\hat\theta_i - \theta_i)^2. \eqno(5.1)$$

When specialized to this problem, Pinsker's [14] paper yields two theorems stated below.

We emphasize that these two theorems are useful corollaries to Pinsker's more general analysis. Nussbaum's [12] extensive survey reviews other applications of the Pinsker bound.

Let $\mathcal{N} = \{a \in R^m : a_i \in [1, \infty],\ 1 \le i \le m\}$. Define addition, subtraction, multiplication and division of $f$ and $g$ in $R^m$ by the specified operation on components, as in coding S-Plus. For instance, $fg = (f_1g_1, f_2g_2, \ldots, f_mg_m)$. Let $\mathrm{ave}(f) = m^{-1}\sum_{i=1}^m f_i$. For every $a \in \mathcal{N}$ and $r > 0$, define the ellipsoid

$$B(r, a) = \{\theta \in R^m : \mathrm{ave}(a\theta^2) \le r\}. \eqno(5.2)$$

Let $\tau_0^2 = [(a\mu)^{-1/2} - 1]_+$, where $\mu$ is the unique positive real number such that $\mathrm{ave}(a\tau_0^2) = r$. Define $\nu_m(r, a) = \mathrm{ave}[\tau_0^2/(1 + \tau_0^2)]$ and $f_0 = \tau_0^2/(1 + \tau_0^2)$.


The first theorem drawn from Pinsker's reasoning treats linear estimators for $\theta$ of the form $\hat\theta = fu$.

Theorem 5.1.

For every $a \in \mathcal{N}$ and every $r > 0$,

$$\inf_{f \in R^m}\sup_{\theta \in B(r,a)} R_m(fu, \theta) = \nu_m(r, a) = \sup_{\theta \in B(r,a)} R_m(f_0u, \theta). \eqno(5.3)$$

The second theorem from the same source shows that the minimax linear estimator is often asymptotically minimax among all estimators.

Theorem 5.2.

If $\lim_{m\to\infty} m\nu_m(r, a) = \infty$, then for every $a \in \mathcal{N}$ and every $r > 0$,

$$\lim_{m\to\infty}\inf_{\hat\theta}\sup_{\theta \in B(r,a)} [R_m(\hat\theta, \theta)/\nu_m(r, a)] = 1. \eqno(5.4)$$

If $\lim_{m\to\infty} \nu_m(r, a) = \nu_0 > 0$, then also

$$\lim_{m\to\infty}\inf_{\hat\theta}\sup_{\theta \in B(r,a)} R_m(\hat\theta, \theta) = \nu_0. \eqno(5.5)$$

Proof of (2.7).

The canonical estimation problem of Section 2, described in equations (2.1) through (2.7), can be re-expressed in the notation above. Form $u$ by stacking vertically the columns of $Y$. Similarly, form $\theta$ and $\hat\theta$ by stacking the columns of $H$ and $\hat H$. Thus $m = np$. Form $a$ by stacking $n$ replicates of $b_1$ atop $n$ replicates of $b_2$ and so on through $n$ replicates of $b_p$. With these identifications, equation (5.5) above is equivalent to (2.7).

Lemma 5.3.

The matrix $\hat A = \mathrm{argmin}_{A \in \mathcal{A}}\,\hat\rho(A)$ is given explicitly by (1.6).

Proof.

Let $\bar A = I_p - (I_p + \hat W)^{-1}$. As in the second line in (2.16),

$$\hat\rho(A) = p^{-1}\mathrm{tr}[(A - \bar A)^2(I_p + \hat W)] + p^{-1}\mathrm{tr}[\hat W(I_p + \hat W)^{-1}]. \eqno(5.6)$$

Let $\mathcal{S}$ denote the set of all $p \times p$ symmetric matrices. From (5.6), $\bar A = \mathrm{argmin}_{A \in \mathcal{S}}\,\hat\rho(A)$. Write $\hat\alpha_j = \hat\lambda_j/(1 + \hat\lambda_j)$. If $\hat\Omega = \mathrm{diag}\{\hat\lambda_j\}$ and $\hat\Gamma = \{\hat\gamma_1, \hat\gamma_2, \ldots, \hat\gamma_p\}$, then $\hat W$ has the spectral representation $\hat W = \hat\Gamma\hat\Omega\hat\Gamma'$. Consequently, $\bar A = \sum_{j=1}^p \hat\alpha_j\hat\gamma_j\hat\gamma_j'$. Because $1 + \hat\lambda_j \ge 0$, it follows that $\hat\alpha_j \le 1$ but need not be positive. Consequently $\bar A$ is not, in general, an element of $\mathcal{A}$.

Define

$$\bar A_+ = \sum_{\hat\alpha_j \ge 0} \hat\alpha_j\hat\gamma_j\hat\gamma_j', \qquad \bar A_- = \sum_{\hat\alpha_j < 0} \hat\alpha_j\hat\gamma_j\hat\gamma_j', \eqno(5.7)$$

noting that $\bar A_+ \in \mathcal{A}$, $\bar A = \bar A_+ + \bar A_-$, and $\bar A_+\bar A_- = 0$. For brevity, put

$$\hat K = I_p + \hat W = \sum_{j=1}^p (1 + \hat\lambda_j)\hat\gamma_j\hat\gamma_j'. \eqno(5.8)$$


Then

$$\mathrm{tr}[(A - \bar A)^2\hat K] = \mathrm{tr}[\{(A - \bar A_+) - \bar A_-\}^2\hat K] = \mathrm{tr}[(A - \bar A_+)^2\hat K] - \mathrm{tr}[A\bar A_-\hat K] - \mathrm{tr}[\hat K\bar A_-A] + \mathrm{tr}[\bar A_-^2\hat K]. \eqno(5.9)$$

For every $A \in \mathcal{A}$,

$$-\mathrm{tr}[A\bar A_-\hat K] = -\mathrm{tr}\Big[A\sum_{\hat\alpha_j < 0}\hat\alpha_j(1 + \hat\lambda_j)\hat\gamma_j\hat\gamma_j'\Big] = -\sum_{\hat\alpha_j < 0}\hat\alpha_j(1 + \hat\lambda_j)\hat\gamma_j'A\hat\gamma_j \ge 0, \eqno(5.10)$$

because $A$ is positive semidefinite and $1 + \hat\lambda_j \ge 0$. Similarly, $-\mathrm{tr}[\hat K\bar A_-A] \ge 0$. It now follows from (5.10) that

$$\mathrm{tr}[(A - \bar A)^2\hat K] \ge \mathrm{tr}[(A - \bar A_+)^2\hat K] + \mathrm{tr}[\bar A_-^2\hat K] \eqno(5.11)$$

for every $A \in \mathcal{A}$. This implies that $\hat A = \bar A_+$, as was to be shown.
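The positive-part rule just proved can be checked by brute force: evaluate $\hat\rho$ at the claimed minimizer (1.6) and at many random members of $\mathcal{A}$. The snippet below is such an illustrative check, not part of the paper's argument, with a synthetic $\hat W$ whose eigenvalues stay above $-1$ as the lemma requires:

```python
import numpy as np

rng = np.random.default_rng(4)
p = 3
M = rng.standard_normal((p, p))
W_hat = M @ M.T - 0.8 * np.eye(p)        # symmetric, eigenvalues >= -0.8 > -1

I = np.eye(p)
rho_hat = lambda A: np.trace(A @ A + (I - A) @ (I - A) @ W_hat) / p   # (3.1)

lam, G = np.linalg.eigh(W_hat)
A_hat = (G * np.maximum(lam / (1.0 + lam), 0.0)) @ G.T                # (1.6)

for _ in range(1000):
    Q = np.linalg.qr(rng.standard_normal((p, p)))[0]
    A = (Q * rng.uniform(0, 1, p)) @ Q.T                              # random member of A
    assert rho_hat(A_hat) <= rho_hat(A) + 1e-12                       # Lemma 5.3
```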

Proof of Theorem 3.1.

We first prove (3.6). Let $z_t = \Sigma^{-1/2}x_t$ and $\zeta_t = \Sigma^{-1/2}\xi_t$. The $\{z_t : 1 \le t \le n\}$ are independent random vectors, the distribution of $z_t$ being $N_p(\zeta_t, I_p)$. If $\hat U = n^{-1}\sum_{t=1}^n z_tz_t'$ and $\hat Z$ is the matrix defined in Condition C, then

$$\hat W = \hat Z^{-1}\hat U(\hat Z^{-1})' - I_p \eqno(5.12)$$

and

$$\hat U = W + I_p + \hat F + \hat F' + \hat G, \eqno(5.13)$$

where

$$\hat F = n^{-1}\sum_{t=1}^n (z_t - \zeta_t)\zeta_t', \qquad \hat G = n^{-1}\sum_{t=1}^n (z_t - \zeta_t)(z_t - \zeta_t)' - I_p. \eqno(5.14)$$

By direct calculations, $\sup_{\lambda_1(W) \le r} E|\hat F|^2 = O(n^{-1})$ and $\sup_{\lambda_1(W) \le r} E|\hat G|^2 = O(n^{-1})$; consequently

$$\sup_{\lambda_1(W) \le r} E|\hat U - W - I_p| = O(n^{-1/2}). \eqno(5.15)$$

Evidently,

$$\hat\rho(A) - \rho(A, W) = p^{-1}\mathrm{tr}[(I_p - A)^2(\hat W - W)] = p^{-1}\mathrm{tr}[(I_p - A)^2\{\hat Z^{-1}\hat U(\hat Z^{-1})' - W - I_p\}] = p^{-1}\sum_{j=1}^3 T_j, \eqno(5.16)$$


where

$$|T_1| = |\mathrm{tr}[(I_p - A)^2\hat Z^{-1}\hat U\{(\hat Z^{-1})' - I_p\}]| \le |I_p - A|^2\,|\hat Z^{-1}|\,|\hat U|\,|\hat Z^{-1} - I_p|,$$
$$|T_2| = |\mathrm{tr}[(I_p - A)^2(\hat Z^{-1} - I_p)\hat U]| \le |I_p - A|^2\,|\hat Z^{-1} - I_p|\,|\hat U|,$$
$$|T_3| = |\mathrm{tr}[(I_p - A)^2(\hat U - W - I_p)]| \le |I_p - A|^2\,|\hat U - W - I_p|. \eqno(5.17)$$

For every $A \in \mathcal{A}$, $|I_p - A|^2 \le p$. Combining the last three displays with Condition C yields (3.6).

To verify (3.5), write $A_Z = \hat ZA\hat Z^{-1}$ and observe that

$$L_n(\hat\xi(A, \hat\Sigma), \xi, \Sigma) = (np)^{-1}\sum_{t=1}^n |A_Zz_t - \zeta_t|^2 = (np)^{-1}\sum_{t=1}^n |A_Z(z_t - \zeta_t) - (I_p - A_Z)\zeta_t|^2 = p^{-1}\mathrm{tr}[A_Z'A_Z(I_p + \hat G) + (I_p - A_Z)'(I_p - A_Z)W - 2(I_p - A_Z)'A_Z\hat F]. \eqno(5.18)$$

Since

$$|A_Z - A| \le |A|\,|\hat Z^{-1}|\,|\hat Z - I_p| + |A|\,|\hat Z^{-1} - I_p|, \eqno(5.19)$$

the limit (3.5) follows from the preceding two displays, the statement after (5.14), and Condition C.

Proof of Theorem 3.2.

Limit (3.6) implies that

$$\lim_{n\to\infty}\sup_{\lambda_1(W) \le r} E|\hat\rho(\hat A) - \rho(\hat A, W)| = 0 \eqno(5.20)$$

and

$$\lim_{n\to\infty}\sup_{\lambda_1(W) \le r} E|\hat\rho(\hat A) - \rho(\tilde A, W)| = 0. \eqno(5.21)$$

Since $\rho(\tilde A, W) = \rho(W)$, limit (3.7) holds for $T = \hat\rho(\hat A)$ and

$$\lim_{n\to\infty}\sup_{\lambda_1(W) \le r} E|\rho(\hat A, W) - \rho(W)| = 0. \eqno(5.22)$$

On the other hand, limit (3.5) gives

$$\lim_{n\to\infty}\sup_{\lambda_1(W) \le r} E|L_n(\hat\xi_A, \xi, \Sigma) - \rho(\hat A, W)| = 0. \eqno(5.23)$$

Combining this with (5.22) entails (3.7) for $T = L_n(\hat\xi_A, \xi, \Sigma)$. Finally, taking $\hat\Sigma = \Sigma$ yields (3.7) for $T = L_n(\hat\xi(\tilde A, \Sigma), \xi, \Sigma)$.


6. Discussion.

This paper approaches from several directions the affine shrinkage estimator $\hat\xi_A$ for the multivariate trend $\xi$. In a Gaussian random trend model, $\hat\xi_A$ is an estimated minimum risk predictor of $\xi$. For the deterministic trend model used in our analysis, $\hat\xi_A$ is that member of a certain class of candidate affine shrinkage estimators that minimizes estimated risk, or equivalently, minimizes the $C_L$ criterion. Analysis shows that $\hat\xi_A$ is asymptotically minimax in Pinsker's sense over certain subsets of trends centered at $\xi = 0$. The asymptotic maximum risk of $\hat\xi_A$ over such subsets strictly dominates that of the least squares trend estimator. Unlike $\hat\xi(A^*, \Sigma)$ and $\hat\xi(\tilde A, \Sigma)$, the other asymptotically minimax estimators studied in Section 2, the estimator $\hat\xi_A$ is fully adaptive, depending only on data. As exhibited in the Introduction, $\hat\xi_A$ achieves superefficiency relative to the least squares estimator by performing componentwise James-Stein shrinkage in a coordinate system that is estimated from the data. The construction of $\hat\xi_A$, applied to the Gaussian multivariate linear model in canonical form, yields improved regression fits. These main results carry over to cousins of $\hat\xi_A$ such as $\hat\xi_B$ and the extended Efron-Morris estimator.

The historically distinct ideas of Stein, Mallows, and Pinsker on estimation of a high-dimensional parameter form the background to this paper.

References

[1] R. Beran, Stein estimation in high dimensions: a retrospective, In: Madan Puri Festschrift (E. Brunner and M. Denker, eds.), VSP, Zeist (1996), pp. 91-110.

[2] M. Bilodeau and T. Kariya, Minimax estimators in the normal MANOVA model, J. Multivariate Anal., 28 (1989), pp. 260-270.

[3] S. Yu. Efroimovich and M. S. Pinsker, Learning algorithm for nonparametric filtering, Automat. Remote Control, 45 (1984), pp. 1434-1440.

[4] B. Efron and C. Morris, Multivariate empirical Bayes and estimation of covariance matrices, Ann. Statist., 4 (1976), pp. 22-32.

[5] G. K. Golubev, Adaptive asymptotically minimax estimators of smooth signals, Problems Inform. Transmission, 23 (1987), pp. 57-67.

[6] A. A. Green, M. Berman, P. Switzer, and M. D. Craig, A transformation for ordering multispectral data in terms of image quality with implications for noise removal, IEEE Trans. Geosci. Remote Sensing, 26 (1988), pp. 65-74.

[7] W. James and C. Stein, Estimation with quadratic loss, In: Proc. Fourth Berkeley Symp. Math. Statist. Prob., 1 (J. Neyman, ed.), University of California Press, Berkeley (1961), pp. 361-380.

[8] A. Kneip, Ordered linear smoothers, Ann. Statist., 22 (1994), pp. 835-866.

[9] K.-C. Li, Asymptotic optimality for $C_p$, $C_L$, cross-validation and generalized cross-validation: discrete index set, Ann. Statist., 15 (1987), pp. 958-975.

[10] C. L. Mallows, Some comments on $C_p$, Technometrics, 15 (1973), pp. 661-675.

[11] K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis, Academic Press, London, 1979.

[12] M. Nussbaum, The Pinsker bound: a review, In: Encyclopedia of Statistical Sciences: Update Volume 3 (S. Kotz, C. B. Read, eds.), Wiley, New York, to appear.

[13] J. Pfanzagl, Incidental versus random nuisance parameters, Ann. Statist., 21 (1993), pp. 1663-1691.

[14] M. S. Pinsker, Optimal filtration of square-integrable signals in Gaussian noise, Problems Inform. Transmission, 16 (1980), pp. 120-133.

[15] C. R. Rao and H. Toutenberg, Linear Models: Least Squares and Alternatives, Springer, New York, 1995.

[16] C. Stein, Inadmissibility of the usual estimator for the mean of a multivariate normal distribution, In: Proc. Third Berkeley Symp. Math. Statist. Prob., 1 (J. Neyman, ed.), University of California Press, Berkeley (1956), pp. 197-206.

[17] C. Stein, An approach to recovery of inter-block information in balanced incomplete block designs, In: Research Papers in Statistics: Festschrift for Jerzy Neyman (F. N. David, ed.), Wiley, London (1966), pp. 351-366.
