THE RELATIONSHIP BETWEEN TWO
REPEATEDLY MEASURED VARIABLES
A. Ruckstuhl, A. H. Welsh and R. J. Carroll
January 13, 1997
Abstract
We describe methods for estimating the regression function nonparametrically and for estimating the variance components in a simple variance component model which is sometimes used for repeated measures data or data with a simple clustered structure. We consider a number of different ways of estimating the regression function. The main results are that the simple pooled estimator which treats the data as independent performs very well asymptotically, but that we can construct estimators which perform better asymptotically in some circumstances.
Key words and phrases: Local linear regression, local quasi-likelihood estimator, smoothing, semiparametric estimation, variance components.
Short title: Nonparametric Regression with Repeated Measures.

Andreas Ruckstuhl is Lecturer and Alan H. Welsh is Reader, Department of Statistics, Australian National University, Canberra ACT 2601. R. J. Carroll is Professor of Statistics, Nutrition and Toxicology, Department of Statistics, Texas A&M University, College Station, TX 77843-3143. Our research was supported by a grant from the National Cancer Institute (CA-57030). Carroll's research was partially completed while visiting the Institut für Statistik und Ökonometrie, Sonderforschungsbereich 373, Humboldt-Universität zu Berlin, with partial support from a senior Alexander von Humboldt Foundation research award.
1 INTRODUCTION

In this paper, we consider the semiparametric model
\[
Y_{ij} = \alpha_i + m(X_{ij}) + \epsilon_{ij}, \qquad i = 1, \ldots, n, \quad j = 1, \ldots, J, \tag{1}
\]
where the $\alpha_i$ and $\epsilon_{ij}$ are independent mean zero random variables with variances $\sigma_\alpha^2 > 0$ and $\sigma_\epsilon^2 > 0$, respectively, and $m(\cdot)$ is an unknown smooth function. Let $Y_i = (Y_{i1}, \ldots, Y_{iJ})^t$, $X_i = (X_{i1}, \ldots, X_{iJ})^t$, and $m(X_i) = \{m(X_{i1}), \ldots, m(X_{iJ})\}^t$. The model implies that the $Y_i$ are independent with $E(Y_i \mid X_i) = m(X_i)$ and, if $e_J = (1, \ldots, 1)^t$ is the $J$-vector of ones, $\mathrm{cov}(Y_i \mid X_i) = \Sigma = \sigma_\alpha^2 e_J e_J^t + \sigma_\epsilon^2 I$. We address the general problem of how, when $J$ is fixed and typically small, to estimate the function $m(\cdot)$ nonparametrically and, at the same time, how to estimate the variances $\sigma_\alpha^2$ and $\sigma_\epsilon^2$.

We will show in Section 5 that the variance components $(\sigma_\alpha^2, \sigma_\epsilon^2)$ can be estimated at the parametric rate $O_P(n^{-1/2})$ and thus can effectively be treated as known for the purpose of developing and analysing estimators of $m(\cdot)$. We therefore treat both variances as known for our theoretical investigation in Sections 2-4. For definiteness, we focus on the use of local linear kernel smoothing. Where local linear kernel smoothing yields surprising results (Section 3), we compare these results with results obtained using kernel (local average) smoothing and local quadratic kernel smoothing.

In Section 2, we investigate two simple approaches to the problem of estimating $m(x_0)$ at a fixed point $x_0$. The first, which we call pooled estimation, ignores the dependence structure in model (1) and simply fits a single nonparametric regression model (with a bandwidth depending on $x_0$ but not on $j$) through all the data. The second approach, which we call component estimation, involves fitting separate nonparametric regression models relating the $j$th component of $Y$ to the $j$th component of $X$ (allowing different local bandwidths at $x_0$ for each component, $j = 1, \ldots, J$) and then combining these estimators to produce an overall estimator of the common regression function $m(x_0)$. Pooled estimation has the advantage of simplicity, since only one regression fit is required.
Component estimation requires $J$ regression fits and may be adversely affected by boundary effects: if the support of the components of $X$ depends on $j$, the components estimator may end up combining estimators from components affected by boundary effects with estimators from components unaffected by them. However, we show here that for local linear kernel estimation, pooled estimation is asymptotically equivalent to the optimal linear combination of the component estimators. The property on which this result depends is that for local polynomial kernel regression, the estimators of the component functions are asymptotically independent. (The well-known correspondence between local polynomial kernel regression with local bandwidths and local polynomial nearest neighbor (loess) regression suggests that the same asymptotic independence result holds for the latter.)
Severini & Staniswalis (1994) introduced quasi-likelihood estimation for so-called partially linear models, which consist of a linear parametric component, a nonparametric component, and a general covariance structure. Hence model (1) is a simple special case of a partially linear model. We discuss quasi-likelihood estimation in the context of model (1) in Section 3. Severini and Staniswalis focus their analysis on estimating, and deriving asymptotic results for, the parameters of the parametric component of a partially linear model, while we derive asymptotic results for the estimator of the nonparametric component based on local polynomial estimators, restricting, however, our attention to simpler models like (1). Since the calculations yield a complicated expression for the asymptotic variance, preventing direct comparisons of estimation methods, we explore in detail the case of independent and identically distributed explanatory variables $X$. In this case, the asymptotic variance of the locally linear quasi-likelihood estimator is larger than that of the pooled estimator. We found this result surprising, so we explored the properties of kernel and local quadratic kernel smoothing in quasi-likelihood estimation. We found that (i) the asymptotic variance of the locally linear quasi-likelihood estimator is even larger than that of a (locally averaged) kernel quasi-likelihood estimator (without the bias necessarily being smaller) and (ii) the asymptotic variance of a locally quadratic quasi-likelihood estimator is of a different order than that of the locally linear quasi-likelihood estimator, namely of order $O_P\{(nh^5)^{-1}\}$. The increase in the size of the variance of the locally linear quasi-likelihood estimator compared to that of the pooled estimator is caused by the off-diagonal elements of the inverse covariance matrix. In Section 3 we also show that a modified version of the quasi-likelihood estimator, in which the inverse covariance matrix is replaced by the diagonal matrix with the diagonal of the inverse covariance matrix on its diagonal, results in an estimator which is asymptotically equivalent to the pooled estimator.

Although the pooled estimator is the (asymptotically) best estimator we have considered so far and is easy to apply, it makes no use of the covariance structure in the components of $Y$ and therefore ought to be capable of being improved upon. Because of the local nature of nonparametric regression, constructing an estimator which accounts for the covariance structure and improves upon the pooled estimator is a surprisingly difficult task (cf. Section 3). In Section 4 we propose a two-step estimator. The intuition for it is very simple: in model (1), multiply both sides of the model by the square-root of the inverse covariance matrix and rearrange terms so that we have "expression" $= m(X_{ij}) + \epsilon_{ij}$, where the $\epsilon_{ij}$ are now independent and identically distributed. The "expression" depends on $m(X_{ij})$, which we estimate by the pooled estimator. The two-step estimator has a smaller asymptotic variance than the pooled estimator and an asymptotic bias which can be smaller than that of the pooled estimator.

We require the following assumptions.
C-1 $K(\cdot)$ is a symmetric, compactly supported, bounded kernel density function with unit variance; define $K_h(v) = h^{-1}K(v/h)$, with bandwidth $h$. Let $\mu(r) = \int z^r K(z)\,dz$ and $\nu(r) = \int z^r K^2(z)\,dz < \infty$, $r = 1, 2, \ldots$, with $\nu = \nu(0) > 0$.

C-2 $h \to 0$ as $n \to \infty$ such that $nh \to \infty$.

C-3 $m(\cdot)$ has continuous second derivatives.

C-4 $x_0$ is an interior point of the support of the distribution of $X_i$, the density of $X_{ij}$ is twice continuously differentiable, and $f_j(\cdot)$, the marginal density of $X_{ij}$, satisfies $f_j(x_0) > 0$.

For local linear quasi-likelihood estimation we require longer expansions (and hence stronger conditions) than for the other estimation methods. In this case, we replace conditions C-3 and C-4 by the following stronger conditions.

C-5 $m(\cdot)$ has continuous fifth derivatives.

C-6 $x_0$ is an interior point of the support of the distribution of $X_i$, the density of $X_{ij}$ is continuously differentiable, the marginal density $f_j$ of $X_{ij}$ is twice continuously differentiable and satisfies $f_j(x_0) > 0$, and the bivariate joint density $f_{jk}$ of $(X_{ij}, X_{ik})$ is twice continuously differentiable. Moreover, we require
\[
\int (x_j - x_0)^2 f_{jk}(x_j, x_0)\,dx_j < \infty, \qquad
\int (x_j - x_0)^r \frac{\partial^\ell}{\partial x_0^\ell} f_{jk}(x_j, x_0)\,dx_j < \infty, \quad \ell = 1, 2, \; r = 1, 2.
\]
It is sometimes helpful to frame results in the context of an arbitrary covariance matrix. When this is the case, the covariance matrix of $Y$ given $X$ is denoted $\Sigma = (\sigma_{jk})$, the inverse covariance matrix is denoted $V = \Sigma^{-1} = (v_{jk})$, and we set $v_k = \sum_{j=1}^J v_{kj}$. Recall that under the variance component model, the covariance matrix of $Y$ given $X$ is $\Sigma = \sigma_\epsilon^2 I + \sigma_\alpha^2 e_J e_J^t$, so
\[
V = \Sigma^{-1} = \sigma_\epsilon^{-2}\left\{I - (d_J/J)\,e_J e_J^t\right\}, \qquad
V^{1/2} = \sigma_\epsilon^{-1}\left[I - \left\{1 - (1 - d_J)^{1/2}\right\}/J\;e_J e_J^t\right],
\]
where $d_J = J\sigma_\alpha^2/(\sigma_\epsilon^2 + J\sigma_\alpha^2)$. Under the variance component model, the diagonal and off-diagonal elements of these matrices are constant, so it is convenient to denote the diagonal elements $\sigma_{jj}$ of $\Sigma$ by $\sigma_d^2$, the diagonal elements $v_{jj}$ and off-diagonal elements $v_{jk}$ of $V = \Sigma^{-1}$ by $v_d$ and $v_o$ respectively, and the diagonal and off-diagonal elements of $V^{1/2}$ by $\tilde v_d$ and $\tilde v_o$ respectively. Finally, under the variance component model, $v_k$ is also constant, so we write $v_k = v$.
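The closed forms above for $V = \Sigma^{-1}$ and its symmetric square root $V^{1/2}$ under the variance component model are easy to check numerically. The following sketch is our own illustration (not from the paper) and verifies both identities for arbitrary positive values of $J$, $\sigma_\alpha^2$ and $\sigma_\epsilon^2$:

```python
import numpy as np

# Illustrative check of the closed forms for the inverse and the symmetric
# square root of Sigma = sigma_eps^2 I + sigma_alpha^2 e_J e_J^t.
J, sig_a2, sig_e2 = 4, 2.0, 1.5
eJ = np.ones((J, 1))
Sigma = sig_e2 * np.eye(J) + sig_a2 * (eJ @ eJ.T)

dJ = J * sig_a2 / (sig_e2 + J * sig_a2)
V = (np.eye(J) - (dJ / J) * (eJ @ eJ.T)) / sig_e2            # Sigma^{-1}
Vhalf = (np.eye(J)
         - ((1.0 - np.sqrt(1.0 - dJ)) / J) * (eJ @ eJ.T)) / np.sqrt(sig_e2)

print(np.allclose(V, np.linalg.inv(Sigma)))    # True
print(np.allclose(Vhalf @ Vhalf, V))           # True
```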
2 POOLED AND COMPONENT ESTIMATION
The pooled estimator $\hat m_{\mathrm{pool}}(x_0, h)$ of $m(x_0)$ is defined as the local linear kernel regression estimator with kernel function $K(\cdot)$ and bandwidth $h$ when all the $Y$'s and $X$'s are combined into a single data set of length $nJ$. That is,
\[
\hat m_{\mathrm{pool}}(x_0) = (1, 0)
\left\{ n^{-1}\sum_{i=1}^n \sum_{j=1}^J
\begin{pmatrix} 1 \\ (X_{ij} - x_0)/h \end{pmatrix}
\begin{pmatrix} 1 \\ (X_{ij} - x_0)/h \end{pmatrix}^t
K_h(X_{ij} - x_0) \right\}^{-1}
\left\{ n^{-1}\sum_{i=1}^n \sum_{j=1}^J
\begin{pmatrix} 1 \\ (X_{ij} - x_0)/h \end{pmatrix}
Y_{ij}\, K_h(X_{ij} - x_0) \right\}.
\]
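In practice the pooled estimator is a single weighted least squares fit on the $nJ$ pooled points. A minimal NumPy sketch (our own illustration, assuming an Epanechnikov kernel rescaled to unit variance as in condition C-1):

```python
import numpy as np

def local_linear_pooled(x0, X, Y, h):
    """Pooled local linear estimate of m(x0).

    X, Y are (n, J) arrays of repeated measures; all nJ points are pooled
    and treated as if they were independent observations.
    """
    x, y = X.ravel(), Y.ravel()
    # Epanechnikov kernel rescaled to unit variance (support |t| <= sqrt(5))
    t = (x - x0) / (h * np.sqrt(5.0))
    w = np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2) / np.sqrt(5.0), 0.0) / h
    # Weighted least squares fit of y on {1, x - x0}; the intercept estimates m(x0)
    Z = np.column_stack([np.ones_like(x), x - x0])
    WZ = Z * w[:, None]
    return np.linalg.solve(Z.T @ WZ, WZ.T @ y)[0]
```

With a linear $m$ the estimator is conditionally unbiased, which makes the sketch easy to sanity-check by simulation.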
The optimal pooled estimator minimizes the mean squared error of $\hat m_{\mathrm{pool}}(x_0, h)$ at $x_0$ over $h$. To define the components estimator $\hat m_W(x_0, h, c)$, for $j = 1, \ldots, J$, let $\hat m_j(x_0, h_j)$ be the local linear kernel regression estimator of the $(Y_{ij})$ on the $(X_{ij})$, with bandwidth $h_j$. That is,
\[
\hat m_j(x_0, h_j) = (1, 0)
\left\{ n^{-1}\sum_{i=1}^n
\begin{pmatrix} 1 \\ (X_{ij} - x_0)/h_j \end{pmatrix}
\begin{pmatrix} 1 \\ (X_{ij} - x_0)/h_j \end{pmatrix}^t
K_{h_j}(X_{ij} - x_0) \right\}^{-1}
\left\{ n^{-1}\sum_{i=1}^n
\begin{pmatrix} 1 \\ (X_{ij} - x_0)/h_j \end{pmatrix}
Y_{ij}\, K_{h_j}(X_{ij} - x_0) \right\}.
\]
Then if $h = (h_1, \ldots, h_J)^t$ and $c = (c_1, \ldots, c_J)^t$, the components estimator is the weighted average of the component estimators given by
\[
\hat m_W(x_0, h, c) = \sum_{j=1}^J c_j \hat m_j(x_0, h_j), \qquad \sum_{j=1}^J c_j = 1. \tag{2}
\]
The optimal components estimator minimizes the mean squared error at $x_0$ over both $h$ and $c$. The following result is proved in Appendices A.1 and A.2.

Theorem 1.
Suppose that conditions C-1 to C-4 hold. Define $s(x_0) = J^{-1}\sum_{j=1}^J f_j(x_0)$. For local linear kernel regression, the optimal pooled estimator and the optimal components estimator are asymptotically equivalent. The bias, variance, optimal bandwidth and mean squared error at this optimal bandwidth for the former are given by
\[
\mathrm{bias}\{\hat m_{\mathrm{pool}}(x_0, h)\} \approx (1/2)\,h^2 m^{(2)}(x_0), \tag{3}
\]
\[
\mathrm{var}\{\hat m_{\mathrm{pool}}(x_0, h)\} \approx \nu\,(\sigma_\alpha^2 + \sigma_\epsilon^2)\{nhJ\,s(x_0)\}^{-1}, \tag{4}
\]
\[
h^5_{\mathrm{opt,pool}}(\text{local linear}) \approx \frac{\nu\,(\sigma_\alpha^2 + \sigma_\epsilon^2)}{nJ\,s(x_0)\{m^{(2)}(x_0)\}^2}, \tag{5}
\]
\[
\mathrm{mse}_{\mathrm{opt,pool}}(\text{local linear}) \approx (5/4)\{m^{(2)}(x_0)\}^{2/5}\left[\nu\,(\sigma_\alpha^2 + \sigma_\epsilon^2)\{nJ\,s(x_0)\}^{-1}\right]^{4/5}. \tag{6}
\]
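Formulas (3)-(6) fit together as standard bias-variance bookkeeping: at the bandwidth (5), the squared bias from (3) and the variance from (4) sum exactly to (6). A small numerical check of this identity, with illustrative values for all quantities (assumptions of this sketch, not values from the paper):

```python
# Illustrative values (assumptions, not from the paper); the identity holds
# for any positive choices.
nu, sa2, se2 = 0.3, 1.0, 0.5        # nu(0), sigma_alpha^2, sigma_eps^2
n, J, s0, m2 = 400, 3, 1.2, 2.0     # n, J, s(x0), m^{(2)}(x0)

A = nu * (sa2 + se2) / (n * J * s0)  # the factor nu(sa2 + se2)/{nJ s(x0)}
h = (A / m2**2) ** 0.2               # optimal bandwidth, from (5)
bias2 = (0.5 * h**2 * m2) ** 2       # squared bias, from (3)
var = A / h                          # variance, from (4)

mse_formula = 1.25 * m2**0.4 * A**0.8          # formula (6)
assert abs(bias2 + var - mse_formula) < 1e-12  # they agree exactly
```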
Theorem 1 shows that the pooled estimator has the same asymptotic properties under the model (1) as it has under the nonparametric regression model in which the errors are independent and identically distributed with variance $\sigma_\alpha^2 + \sigma_\epsilon^2$. Moreover, working componentwise as in the components estimator does not enable us to make use of the known dependence structure in the model (1), in the sense that we can do no better than using the pooled estimator.

3 QUASI-LIKELIHOOD ESTIMATION
In this section we apply Severini & Staniswalis' (1994) proposal to model (1). Because of its undesirable asymptotic properties, we modify the quasi-likelihood estimator in the second part of this section. The modification yields an estimator asymptotically equivalent to the pooled estimator.
3.1 Ordinary Quasi-likelihood Estimator
Recall that $V = \Sigma^{-1}$. Then $\hat m_{p\mathrm{qle}}(x_0, h)$, the intercept in the solution of
\[
\sum_{i=1}^n
\begin{pmatrix}
1 & \cdots & 1 \\
\vdots & & \vdots \\
(X_{i1} - x_0)^p & \cdots & (X_{iJ} - x_0)^p
\end{pmatrix}
V
\begin{pmatrix}
K_h(X_{i1} - x_0)\left\{Y_{i1} - \sum_{k=0}^p b_k (X_{i1} - x_0)^k\right\} \\
\vdots \\
K_h(X_{iJ} - x_0)\left\{Y_{iJ} - \sum_{k=0}^p b_k (X_{iJ} - x_0)^k\right\}
\end{pmatrix}
= 0, \tag{7}
\]
is the local polynomial version of the quasi-likelihood estimator in Severini & Staniswalis (1994, Equation (18)) for model (1). The local linear quasi-likelihood estimator which we consider first has $p = 1$. In Appendix A.3, we prove the following asymptotic results:
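Because the link is the identity, equation (7) is linear in $b = (b_0, \ldots, b_p)^t$ and can be solved directly as a generalized weighted least squares problem. A minimal sketch for $p = 1$ (our own illustration with an Epanechnikov kernel; not code from the paper):

```python
import numpy as np

def qle_local_linear(x0, X, Y, h, V):
    """Solve (7) for p = 1; returns the intercept b0, the estimate of m(x0).

    X, Y: (n, J) repeated measures; V: (J, J) working inverse covariance.
    """
    n, J = X.shape
    # Epanechnikov kernel rescaled to unit variance, divided by h
    t = (X - x0) / (h * np.sqrt(5.0))
    K = np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2) / np.sqrt(5.0), 0.0) / h
    A = np.zeros((2, 2))
    c = np.zeros(2)
    for i in range(n):
        Zi = np.column_stack([np.ones(J), X[i] - x0])  # J x 2 local design
        VKi = V @ np.diag(K[i])                        # V times kernel weights
        A += Zi.T @ VKi @ Zi
        c += Zi.T @ VKi @ Y[i]
    return np.linalg.solve(A, c)[0]
```

Setting `V = np.eye(J)` recovers the pooled fit, `V` equal to $\Sigma^{-1}$ gives the ordinary quasi-likelihood estimator, and `np.diag(np.diag(V))` gives the modified estimator of Section 3.2.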
Theorem 2. Suppose that conditions C-1, C-2 and C-5, C-6 hold. Let $\hat m_{p\mathrm{qle}}(x_0, h)$ be the solution of (7). Then
\[
\mathrm{bias}\{\hat m_{1\mathrm{qle}}(x_0, h)\} \approx h^2 m^{(2)}(x_0)/2, \qquad
\mathrm{var}\{\hat m_{1\mathrm{qle}}(x_0, h)\} = O_P\{(nh)^{-1}\}.
\]
For the variance component model (1), where the $X_k$'s are independent and identically distributed with marginal density $f(\cdot)$ and variance $\sigma_X^2$, the asymptotic variance reduces to
\[
\mathrm{var}\{\hat m_{1\mathrm{qle}}(x_0, h)\} \approx
\frac{\nu(0)(\sigma_\alpha^2 + \sigma_\epsilon^2)}{nhJ f(x_0)}
\left[1 + (J - \rho)\left\{\frac{f^{(1)}(x_0)}{f(x_0)}\right\}^2 \rho\,(J - 1)\,\sigma_X^2\right],
\]
where $\rho = J\sigma_\alpha^2/(J\sigma_\alpha^2 + \sigma_\epsilon^2)$.

Note that the local linear quasi-likelihood estimator has the same asymptotic bias but a larger asymptotic variance than that of the pooled estimator in the variance component model with independent and identically distributed $X_j$'s. In view of this result (and the lack of results on estimating the nonparametric component of the model referred to in the introduction), we also obtained results for kernel and local quadratic quasi-likelihood estimators ($p = 0$ and $p = 2$, respectively) of the regression function.

Under similar conditions to those in Theorem 2, we show in Appendix A.3 that the kernel estimator has
\[
\mathrm{bias}\{\hat m_{0\mathrm{qle}}(x_0, h)\} \approx h^2\left\{
\frac{m^{(1)}(x_0)\sum_{k=1}^J v_k f_k^{(1)}(x_0)}{\sum_{k=1}^J v_k f_k(x_0)}
+ \frac{m^{(2)}(x_0)}{2}\right\},
\]
\[
\mathrm{var}\{\hat m_{0\mathrm{qle}}(x_0, h)\} \approx
\frac{\nu(0)\left\{\sum_{k=1}^J (v_k)^2 \sigma_{kk} f_k(x_0)\right\}}
{nh\left\{\sum_{k=1}^J v_k f_k(x_0)\right\}^2},
\]
where $v_k = \sum_{j=1}^J v_{kj}$, with $v_{jk}$ the elements of $V$. For the variance component model (1), where the $X_k$'s are independent and identically distributed with marginal density $f(\cdot)$ and variance $\sigma_X^2$, the asymptotic variance reduces to
\[
\mathrm{var}\{\hat m_{0\mathrm{qle}}(x_0, h)\} \approx
\frac{\nu(0)(\sigma_\alpha^2 + \sigma_\epsilon^2)}{nhJ f(x_0)}.
\]
The kernel quasi-likelihood estimator has the same asymptotic variance as the pooled estimator.
However, its bias generally depends on the design.

For the local quadratic quasi-likelihood estimator, we show that
\[
\mathrm{bias}\{\hat m_{2\mathrm{qle}}(x_0, h)\} \approx
h^4\,\frac{\mu(6) - \mu(2)\mu(4)}{\mu(4) - 1}
\left\{\frac{m^{(3)}(x_0)}{3!}\,\frac{S_N}{S_D} - \frac{m^{(4)}(x_0)}{4!}\right\},
\]
where $S_D$ and $S_N$ are given in (23) and (24), respectively, and for the variance component model (1), where the $X_k$'s are independent and identically distributed with marginal density $f(\cdot)$ and variance $\sigma_X^2$, the asymptotic variance is
\[
\mathrm{var}\{\hat m_{2\mathrm{qle}}(x_0, h)\} \approx
\frac{\nu(0)(\sigma_\alpha^2 + \sigma_\epsilon^2)}{nh^5 J f(x_0)}\,
\frac{(J - \rho)\,\rho^2(J - 1)}{\{\mu(4) - 1\}^2}
\left\{\int (x - x_0)^4 f(x)\,dx - \sigma_X^4\right\},
\]
where $\rho = J\sigma_\alpha^2/(J\sigma_\alpha^2 + \sigma_\epsilon^2)$. The asymptotic variance of the local quadratic estimator is of higher order, $O_P(n^{-1}h^{-5})$, than the order $O_P(n^{-1}h^{-1})$ obtained by the pooled estimator.

3.2 Modified Quasi-likelihood Estimator
Analysing the proof of the asymptotic results for the quasi-likelihood estimator shows that the slow rate of convergence of the asymptotic variance is caused by the off-diagonal elements of $V = \Sigma^{-1}$. This suggests that we modify the quasi-likelihood estimator by replacing $V = \Sigma^{-1}$ by $V = \mathrm{diag}(\Sigma^{-1})$.
= diag(;1).Theorem 3
Suppose the conditions of Theorem 2 hold. Letm
b1mqle(x
0h
) be the solution of (7) forp
= 1, whereV
= ;1= (v
jk) is replaced byV
= diag(;1). Thenbiasf
m
b1mqle(x
0h
)gh
2m
(2)(x
0)=
2 varfm
b1mqle(x
0h
)g 1nh
PJj=1
v
jj2 jjf
j(x
0)n
PJj=1
v
jjf
j(x
0)o2 (0):
where = (
jk). For the variance component model (1), the asymptotic variances reduce to varfm
b1mqle(x
0h
)g 1nh
(2+2)PJj=1
f
j(x
0) (0)and hence the local linear modied quasi-likelihood estimator is asymptotically equivalent to the
pooled estimator. fth:mqleg
For the kernel and local quadratic modified quasi-likelihood estimators, we obtain
\[
\mathrm{bias}\{\hat m_{0\mathrm{mqle}}(x_0, h)\} \approx h^2\left\{
\frac{m^{(1)}(x_0)\sum_{j=1}^J v_{jj} f_j^{(1)}(x_0)}{\sum_{j=1}^J v_{jj} f_j(x_0)}
+ \frac{m^{(2)}(x_0)}{2}\right\},
\]
\[
\mathrm{bias}\{\hat m_{2\mathrm{mqle}}(x_0, h)\} \approx
h^4\,\frac{\mu(2)\mu(4) - \mu(6)}{\mu(4) - 1}\left\{
\frac{m^{(3)}(x_0)}{3!}\,
\frac{\sum_{j=1}^J v_{jj} f_j^{(1)}(x_0)}{\sum_{j=1}^J v_{jj} f_j(x_0)}
+ \frac{m^{(4)}(x_0)}{4!}\right\}
\]
and
\[
\mathrm{var}\{\hat m_{p\mathrm{mqle}}(x_0, h)\} \approx
\frac{1}{nh}\,
\frac{\sum_{j=1}^J v_{jj}^2\,\sigma_{jj} f_j(x_0)}
{\left\{\sum_{j=1}^J v_{jj} f_j(x_0)\right\}^2}\,F_p,
\]
where $\Sigma = (\sigma_{jk})$ and
\[
F_p = \nu(0) \;\text{ for } p = 0, \qquad
F_p = \nu(0) + \frac{2\mu(4)\{\nu(0) - \nu(2)\} + \nu(4) - \nu(0)}{\{\mu(4) - 1\}^2}
\;\text{ for } p = 2.
\]
For the variance component model (1), the asymptotic variances reduce to
\[
\mathrm{var}\{\hat m_{p\mathrm{mqle}}(x_0, h)\} \approx
\frac{1}{nh}\,\frac{\sigma_\alpha^2 + \sigma_\epsilon^2}{\sum_{j=1}^J f_j(x_0)}\,F_p.
\]
For $p = 2$, the modified quasi-likelihood estimator is asymptotically better than the quasi-likelihood estimator because its variance converges at a faster rate. For $p = 0$, it is easy to see that a sufficient condition for asymptotic equivalence is $v_j/v_{jj} = \text{constant}$, which is, for example, satisfied by the variance component model (1).

Note that both the asymptotic bias and the asymptotic variance are invariant to multiplying $V$ by a constant, i.e., the matrix $V$ has to be determined only up to a multiplicative factor.

4 TWO-STEP ESTIMATION
In this section, we propose a two-step estimator which exhibits some asymptotic improvement over the pooled and modified quasi-likelihood estimators.

Again let $V = \Sigma^{-1}$ and let $V^{1/2}$ be its symmetric square root. Let $L = \gamma V^{1/2}$ and let $\hat m_{1\mathrm{pool}}(\cdot)$ be the pooled estimator of Section 2. Write
\[
Z = LY - (L - I)\,\hat m_{1\mathrm{pool}}(X).
\]
We propose to estimate $m(x_0)$ by $\hat m_C(x_0)$, the local linear kernel regression estimator of the regression of the $Z$'s on the $X$'s. That is,
\[
\hat m_C(x_0) = (1, 0)
\left\{ n^{-1}\sum_{i=1}^n \sum_{j=1}^J
\begin{pmatrix} 1 \\ (X_{ij} - x_0)/h \end{pmatrix}
\begin{pmatrix} 1 \\ (X_{ij} - x_0)/h \end{pmatrix}^t
K_h(X_{ij} - x_0) \right\}^{-1}
\left\{ n^{-1}\sum_{i=1}^n \sum_{j=1}^J
\begin{pmatrix} 1 \\ (X_{ij} - x_0)/h \end{pmatrix}
Z_{ij}\, K_h(X_{ij} - x_0) \right\}.
\]
The intuition for this estimator is very simple: write $Y = m(X) + \epsilon$, multiply both sides by $\gamma\Sigma^{-1/2}$ and rearrange terms so that we have "expression" $= m(X) + \epsilon^*$, where the $\epsilon^*_i$ are now independent and identically distributed. The "expression" we obtain equals $Z$.
In the appendix, we prove the following result:

Theorem 4. Suppose the conditions of Theorem 1 hold. Define $d_J = J\sigma_\alpha^2/(\sigma_\epsilon^2 + J\sigma_\alpha^2)$. Then, for $\gamma > 0$,
\[
\mathrm{bias}\{\hat m_C(x_0, h)\} \approx
\gamma\left[\frac{1}{2}h^2 m^{(2)}(x_0)\,\tilde v_d
+ \frac{1}{2}h^2\,\tilde v_o\,
\frac{\sum_{j=1}^J f_j(x_0)\sum_{k \ne j} E\{m^{(2)}(X_{1k}) \mid X_{1j} = x_0\}}
{\sum_{j=1}^J f_j(x_0)}\right], \tag{8}
\]
\[
\mathrm{var}\{\hat m_C(x_0, h)\} \approx
(nh)^{-1}\gamma^2\,\nu\left\{\sum_{j=1}^J f_j(x_0)\right\}^{-1}, \tag{9}
\]
where $\tilde v_d$ and $\tilde v_o$ are the diagonal and the off-diagonal elements of $V^{1/2}$, respectively.

An optimal $\gamma$ can be determined by minimizing the asymptotic mean squared error. The minimization problem results in a cubic equation in $\gamma$. Note that since the bias is design dependent (because of the structure of $Z$), so also is the optimal $\gamma$.

If we choose $\gamma$ equal to $\sigma_\epsilon$, then the asymptotic variance of the two-step estimator is smaller than that of the pooled estimator. If we wish to treat $Z = m(X) + \epsilon^*$ on the same scale as the original data (1), we should set $\gamma$ equal to $\tilde v_d^{-1}$. In this case,
\[
\gamma^2 = \sigma_\epsilon^2\left[1 - \left\{1 - (1 - d_J)^{1/2}\right\}/J\right]^{-2},
\]
where $d_J = J\sigma_\alpha^2/(\sigma_\epsilon^2 + J\sigma_\alpha^2)$. Thus the asymptotic variance is
\[
\mathrm{var}\{\hat m_C(x_0, h)\} \approx
\frac{\nu\,\sigma_\epsilon^2}{nh\sum_{j=1}^J f_j(x_0)}
\left[1 - \left\{1 - (1 - d_J)^{1/2}\right\}/J\right]^{-2}.
\]
Now
\[
\sigma_\epsilon^2 \le \sigma_\epsilon^2\left[1 - \left\{1 - (1 - d_J)^{1/2}\right\}/J\right]^{-2} \le \sigma_\alpha^2 + \sigma_\epsilon^2,
\]
so the two-step estimator with $\gamma = \tilde v_d^{-1}$ has larger asymptotic variance than the two-step estimator with $\gamma = \sigma_\epsilon$, but still has smaller asymptotic variance than the pooled estimator. In either case, the asymptotic biases of the two estimators are difficult to compare, but note that the asymptotic bias of the two-step estimator can be smaller than that of the pooled estimator because $\tilde v_o$ is negative, allowing the possibility of cancellation to occur.

5 ESTIMATION OF THE VARIANCE COMPONENTS
Let $Y = (Y_1^t, \ldots, Y_n^t)^t$ be the vector of pooled responses and let $E$ be the deviations of $Y$ from the regression line $\{m^t(X_1), \ldots, m^t(X_n)\}^t$. For all estimators used in this paper, there is an $(nJ) \times (nJ)$ matrix $S$ with the property that the vector of predicted values equals $SY$, and hence the vector of residuals is $(I - S)Y = D$; explicit formulae in special cases are given in the appendix.

The simplest approach to estimating the variance components is to pretend that the residuals have mean zero and covariance matrix the same as if $m(\cdot)$ were known. For example, the Gaussian "likelihood" for $\theta = \sigma_\epsilon^2$ and $\lambda = \sigma_\epsilon^2 + J\sigma_\alpha^2$ can be written as
\[
-n(J - 1)\log\theta - n\log\lambda
- \theta^{-1}\sum_{i=1}^n \sum_{j=1}^J
\left\{Y_{ij} - m(X_{ij}) - (\bar Y_i - \bar m_i)\right\}^2
- J\lambda^{-1}\sum_{i=1}^n (\bar Y_i - \bar m_i)^2,
\]
where $\bar Y_i = J^{-1}\sum_{j=1}^J Y_{ij}$ and $\bar m_i = J^{-1}\sum_{j=1}^J m(X_{ij})$. This "likelihood" is maximized at
\[
\hat\lambda = \frac{J}{n}\sum_{i=1}^n (\bar Y_i - \bar m_i)^2, \qquad
\hat\theta = \frac{1}{n(J - 1)}\sum_{i=1}^n \sum_{j=1}^J
\left\{Y_{ij} - m(X_{ij}) - (\bar Y_i - \bar m_i)\right\}^2
\]
when $\hat\lambda > \hat\theta$, and at
\[
\hat\lambda = \hat\theta = \frac{1}{nJ}\sum_{i=1}^n \sum_{j=1}^J \{Y_{ij} - m(X_{ij})\}^2
\]
otherwise. Substituting a consistent estimator of $m(\cdot)$ yields consistent estimates of $(\sigma_\alpha^2, \sigma_\epsilon^2)$, and from results of Gutierrez & Carroll (1996) combined with (10), it can be shown that the resulting estimators have the same limit distribution as if $m(\cdot)$ actually were known.

However, as described below, the covariance matrix of the residuals is not the same as if $m(\cdot)$ were known, and following the procedure used in many venues (e.g., Chambers & Hastie, 1992, pp. 368-369), we can adjust for the loss of degrees of freedom due to estimating $m(\cdot)$. In practice, we center the residuals at their mean, using $\tilde D = D - e_{nJ} e_{nJ}^t D/(nJ)$, which has approximately mean zero and covariance matrix $C(\sigma_\alpha^2, \sigma_\epsilon^2) = \sigma_\alpha^2 C_1 + \sigma_\epsilon^2 C_2$, where $C_1$ and $C_2$ are the known $(nJ) \times (nJ)$ matrices
\[
C_1 = (I - S)\,\mathrm{diag}(e_J e_J^t)\,(I - S)^t, \qquad
C_2 = (I - S)(I - S)^t.
\]
In principle, we can still use normal-theory maximum likelihood (with covariance $C(\sigma_\alpha^2, \sigma_\epsilon^2) = \sigma_\alpha^2 C_1 + \sigma_\epsilon^2 C_2$) to estimate $(\sigma_\alpha^2, \sigma_\epsilon^2)$. However, the difficulty with maximum likelihood in this context is that the $(nJ) \times (nJ)$ matrix $C(\cdot)$ is impractical to invert. We consider two alternative methods of adjustment.

One approach is to make a restricted maximum likelihood (REML) style adjustment by substituting the estimate of $m(\cdot)$ into the estimating equations, taking their (approximate) expectations, subtracting these expectations from the original estimating equations and then solving the resulting (approximately) unbiased estimating equations. For the case that $\hat\lambda > \hat\theta$, after some considerable algebra, we obtain the approximately unbiased estimating equations
\[
0 = \tilde D^t U U^t \tilde D - w_1\theta - J^{-1}u_1\lambda, \qquad
0 = \tilde D^t \tilde D - w_2\theta - J^{-1}u_2\lambda,
\]
where
\[
w_r = \mathrm{trace}\left(W_r - \frac{1}{J}\,U U^t W_r\right), \qquad
u_r = \mathrm{trace}(U U^t W_r),
\]
\[
U = \begin{pmatrix}
e_J & 0 & \cdots & 0 \\
0 & e_J & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & e_J
\end{pmatrix}, \qquad
W_1 = (I - S)^t U U^t (I - S), \qquad
W_2 = (I - S)^t (I - S).
\]
Solving these two equations, we obtain
\[
\hat\theta = \frac{u_2\,\tilde D^t U U^t \tilde D - u_1\,\tilde D^t \tilde D}{w_1 u_2 - w_2 u_1}, \qquad
\hat\lambda = J\,\frac{w_1\,\tilde D^t \tilde D - w_2\,\tilde D^t U U^t \tilde D}{w_1 u_2 - w_2 u_1}.
\]
Alternatively, we can abandon the "likelihood" and employ a method of moments device. Let $\mathrm{otrace}(\cdot)$ be the sum of the off-diagonal elements of a matrix. Then we can solve the two equations
\[
\mathrm{trace}(\tilde D \tilde D^t) = \sigma_\alpha^2\,\mathrm{trace}(C_1) + \sigma_\epsilon^2\,\mathrm{trace}(C_2), \qquad
\mathrm{otrace}(\tilde D \tilde D^t) = \sigma_\alpha^2\,\mathrm{otrace}(C_1) + \sigma_\epsilon^2\,\mathrm{otrace}(C_2),
\]
so that
\[
\hat\sigma_\epsilon^2 = \frac{\mathrm{otrace}(C_1)\,\mathrm{trace}(\tilde D \tilde D^t) - \mathrm{trace}(C_1)\,\mathrm{otrace}(\tilde D \tilde D^t)}
{\mathrm{otrace}(C_1)\,\mathrm{trace}(C_2) - \mathrm{trace}(C_1)\,\mathrm{otrace}(C_2)},
\]
\[
\hat\sigma_\alpha^2 = \frac{\mathrm{otrace}(C_2)\,\mathrm{trace}(\tilde D \tilde D^t) - \mathrm{trace}(C_2)\,\mathrm{otrace}(\tilde D \tilde D^t)}
{\mathrm{otrace}(C_2)\,\mathrm{trace}(C_1) - \mathrm{trace}(C_2)\,\mathrm{otrace}(C_1)}.
\]
These estimators can be shown to have the same limiting distribution as the method of moments estimators for known $m(\cdot)$, namely
\[
\hat\sigma_\alpha^2\{m(\cdot)\ \text{known}\} = \mathrm{otrace}(E E^t)/\{nJ(J - 1)\}, \qquad
\hat\sigma_\epsilon^2\{m(\cdot)\ \text{known}\} = \mathrm{trace}(E E^t)/(nJ) - \hat\sigma_\alpha^2.
\]
6 DISCUSSION
We have considered a number of different approaches (more than we have reported on here) to estimating the regression function when we have a simple dependence structure between observations. The simple pooled estimator which ignores the dependence structure performs very well asymptotically. Intuitively, this is because dependence is a global property of the error structure which (at least in the form we have examined) is not important to methods which act locally in the covariate space. Specifically, in the limit, local estimation methods are effectively dealing only with independent observations.
The performance of the pooled estimator raises the question of whether there is some method of local estimation which nonetheless exploits the dependence structure in such a way that it performs better than the pooled estimator. The quasi-likelihood estimator is very appealing for estimating the parametric component in a partially linear model, and the general approach for estimating nonparametric components described by Carroll, Ruppert and Welsh (1996) suggests that the extension we have considered in this paper is well worth considering. We were surprised to find that quasi-likelihood estimation is asymptotically no better than pooled estimation. After trying a number of alternative approaches, we discovered that the two-step method has smaller asymptotic variance than the pooled estimator but does not necessarily have a lower asymptotic bias. The question of whether it is possible to construct an estimator with uniformly smaller asymptotic mean squared error than the pooled estimator remains open.
It is interesting to note that even if we were to assume a parametric form for the regression function, we would gain conflicting intuition about the problem of estimating the regression function in our problem. First, notice that if we were to assume a constant regression function, then the maximum likelihood estimator (under Gaussianity) of the constant regression function is the sample mean, which is, in this context, the pooled estimator. On the other hand, if we assume a linear regression function, the maximum likelihood estimator (under Gaussianity) of the linear regression function is the weighted least squares estimator, which performs better than the least squares estimator which is, in this context, the pooled estimator. Thus the intuition we gain depends on which parametric model we consider.
REFERENCES

Carroll, R. J., Ruppert, D. & Welsh, A. (1996). Nonparametric estimation via local estimating equations, with applications to nutrition calibration. Preprint.

Carroll, R. J. & Wand, M. P. (1991). Semiparametric estimation in logistic measurement error models. Journal of the Royal Statistical Society, Series B, 53, 573-585.

Chambers, J. M. & Hastie, T. J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole, Pacific Grove, CA.

Fan, J. (1992). Design-adaptive nonparametric regression. Journal of the American Statistical Association, 87, 998-1004.

Gutierrez, R. G. & Carroll, R. J. (1996). Plug-in semiparametric estimating equations. Biometrika, to appear.

Ruppert, D. & Wand, M. P. (1994). Multivariate locally weighted least squares regression. Annals of Statistics, 22, 1346-1370.

Severini, T. A. & Staniswalis, J. G. (1994). Quasi-likelihood estimation in semiparametric models. Journal of the American Statistical Association, 89, 501-511.
Appendix A PROOFS OF THEOREMS

A.1 Proof of Theorem 1
We first derive the results for the component estimator. From Fan (1992), Ruppert & Wand (1994) and Carroll, Ruppert & Welsh (1996), we have the results
\[
\mathrm{bias}\{\hat m_{pj}(x_0, h)\} \approx (1/2)\,h^2 m^{(2)}(x_0), \qquad
\mathrm{var}\{\hat m_{pj}(x_0, h)\} \approx \nu\,(\sigma_\alpha^2 + \sigma_\epsilon^2)\{nh f_j(x_0)\}^{-1},
\]
\[
\hat m_{pj}(x_0, h) - m(x_0) - (h^2/2)\,m^{(2)}(x_0) \approx
\{n f_j(x_0)\}^{-1}\sum_{i=1}^n \{Y_{ij} - m(X_{ij})\}\,K_h(X_{ij} - x_0). \tag{10}
\]
The last step is implicit in the first two papers and explicit in the third. It is easily seen from (10) that for $j \ne k$, $\mathrm{cov}\{\hat m_{pj}(x_0, h), \hat m_{pk}(x_0, h)\} = O(n^{-1})$, and hence for asymptotic arguments, the component estimators $\hat m_{pj}(x_0, h)$ are independent.

It thus follows that
\[
\mathrm{bias}\{\hat m_W(x_0, h, c)\} \approx (1/2)\,m^{(2)}(x_0)\sum_{j=1}^J c_j h_j^2
= \sum_{j=1}^J c_j b_j(x_0, h_j),
\]
\[
\mathrm{var}\{\hat m_W(x_0, h, c)\} \approx \nu\,(\sigma_\alpha^2 + \sigma_\epsilon^2)\,n^{-1}\sum_{j=1}^J c_j^2\{h_j f_j(x_0)\}^{-1}
= \sum_{j=1}^J c_j^2 v_j(x_0, h_j).
\]
The individual component bias functions are $b_j(x_0, h_j)$ and the individual component variance functions are $v_j(x_0, h_j)$. The problem becomes to minimize (in $h$ and $c$) the function
\[
\mathrm{mse}_W(x_0, h, c) \approx \left\{\sum_{j=1}^J c_j b_j(x_0, h_j)\right\}^2 + \sum_{j=1}^J c_j^2 v_j(x_0, h_j).
\]