
PROPERTIES OF A BACKFITTING PROJECTION ALGORITHM UNDER WEAK CONDITIONS

O. Linton$^1$, E. Mammen$^2$, and J. Nielsen$^3$

May 8, 1998

Abstract

We derive the asymptotic distribution of a new backfitting procedure for estimating the closest additive approximation to a nonparametric regression function. The procedure employs a recent projection interpretation of popular kernel estimators provided by Mammen et al. (1997), and the asymptotic theory of our estimators is derived using the theory of additive projections reviewed in Bickel et al. (1995). Our procedure achieves the same bias and variance as the oracle estimator based on knowing the other components, and in this sense improves on the method analyzed in Opsomer and Ruppert (1997). We provide `high level' conditions independent of the sampling scheme. We then verify that these conditions are satisfied in a time series autoregression under weak conditions.


AMS 1991 subject classifications. Primary 62G07; secondary 62G20.

Keywords and phrases. Additive models; alternating projections; backfitting; kernel smoothing; local polynomials; nonparametric regression.

Short title. Backfitting under weak conditions.

1 Introduction

Separable models are important in exploratory analyses of nonparametric regression. The backfitting technique has long been the state of the art method for estimating these models, see Hastie and Tibshirani (1991). While backfitting has proven very useful in application and simulation studies, it has been somewhat difficult to analyze theoretically, which has long been a drawback to its universal acceptance. Recently, a new method, called marginal integration, has been proposed, see Linton and Nielsen (1995), Tjøstheim and Auestad (1994) and Newey (1994) [see also earlier work by Auestad and Tjøstheim (1991)]. This method is perhaps easier to understand for non-statisticians since it involves averaging rather than iterative solution of nonlinear equations. Its statistical properties are trivial to obtain, and have been established in the aforementioned papers. Although tractable, marginal integration is not generally efficient. Fan, Mammen, and Härdle (1996) and Linton (1996) showed how to improve on the efficiency of the marginal integration estimator in regression; in the latter paper, this was achieved by carrying out one backfitting iteration from this initial consistent


$^1$Cowles Foundation for Research in Economics, Yale University, 30 Hillhouse Avenue, New Haven, CT 06520-8281, USA. Phone: (203) 432-3699. Fax: (203) 432-6167. http://www.econ.yale.edu/~linton. Supported by the National Science Foundation and the North Atlantic Treaty Organization.

$^2$Institut für Angewandte Mathematik, Ruprecht-Karls-Universität Heidelberg, Im Neuenheimer Feld 294, 69120 Heidelberg, Germany. Supported by the Deutsche Forschungsgemeinschaft, Sonderforschungsbereich 373 "Quantifikation und Simulation ökonomischer Prozesse", Humboldt-Universität zu Berlin.

$^3$PFA Pension, Sundkrogsgade 4, DK-2100 Copenhagen, Denmark.


starting point. This modification actually achieves full oracle efficiency, i.e., one achieves the same result as if one knew the other components. This suggests that backfitting itself is also efficient in the same sense. Moreover, backfitting, since it relies only on one-dimensional smooths, is free from the curse of dimensionality.

Recent work by Opsomer and Ruppert (1997) and Opsomer (1997) has addressed the algorithmic and statistical properties of backfitting. Specifically, they gave sufficient conditions for the existence and uniqueness of a version of backfitting, or rather an exact solution to the empirical projection equations, suitable for any (recentred) smoother matrix. They also derived an expansion for the conditional mean squared error of their version of backfitting: the asymptotic variance is equal to the oracle bound, while the precise form of the bias, as for the integration method, depends on the way recentering is carried out, but in any case is not oracle, except when the covariates are mutually independent. This important work confirms the efficiency, at least with respect to variance, of [their version of] backfitting. Unfortunately, their version of backfitting is not design adaptive, which is somewhat surprising given that they use local polynomial smoothers throughout. Furthermore, their proof technique required rather strong conditions; specifically, the amount of dependence in the covariates was strictly limited.

In this paper, we define a new backfitting-type estimator for additive nonparametric regression. We make use of an interpretation of the Nadaraya-Watson estimator and the local linear estimator as projections in an appropriate Hilbert space, which was first provided by Mammen et al. (1997). Our additive estimator is defined as the further projection of these multivariate estimators down onto the space of additive functions. We examine this estimator and show how, in both the Nadaraya-Watson case and in the local linear case, the estimator can be interpreted as a backfitting estimator defined through iterative solution of the empirical equations. We establish the geometric convergence of the backfitting equations to the unique solution using the theory of additive projections, see Bickel et al. (1995). We use this result to establish the limiting behaviour of the estimates: we give both


the asymptotic distribution and a uniform convergence result. Our procedure achieves the same bias and variance as the oracle estimator based on knowing the other components, and in this sense improves on the method analyzed in Opsomer and Ruppert (1997). Although the criterion function is defined in terms of the high-dimensional estimates, we show that the estimator is also characterized by equations that only depend on one- and two-dimensional marginals, so that the curse of dimensionality truly does not operate here. Our first results are established using ideas from Hilbert space mathematics and hold under `high level' conditions, which are formulated independently of specific sampling assumptions. We then verify these conditions in a time series regression with strong mixing data. Our conditions are strictly weaker than those of Opsomer and Ruppert (1997), and do not necessarily restrict the dependence between the covariates in any way.

This paper is organized as follows. In section 2 we show how local polynomial estimators can be interpreted as projections. In section 3 we introduce our additive estimators in the simplest situation, i.e., for the Nadaraya-Watson-like pilot estimator, establishing the convergence of the backfitting algorithm and the asymptotic distribution of the estimator under high level conditions that are suitable for a range of sampling schemes. In section 4 we extend the analysis to local polynomials. In section 5 we investigate a time series setting and give primitive conditions that imply the high level conditions. In section 6 we illustrate our procedure on financial data. All proofs are contained in the appendix.

2 A projection interpretation of local polynomials

Let $(Y, X)$ be random variables of dimensions 1 and $d$ respectively, and let $(Y_1, X_1), \ldots, (Y_n, X_n)$ be a random sample drawn from $(Y, X)$. We first provide a new interpretation of local polynomial estimators of the regression function $m(x_1, \ldots, x_d) = E(Y \mid X = x)$ evaluated at the vector $x = (x_1, \ldots, x_d)^T$; see Mammen, Marron, Turlach and Wand (1997). This new point of view will be useful for interpreting our estimators of the restricted additive function $m(x) = \alpha + m_1(x_1) + \cdots + m_d(x_d)$.

The full-dimensional local [$q$th order] polynomial regression smoother, which we denote by $\widehat{\mathbf m}(x) = (\widehat m_0(x), \ldots, \widehat m_{qd}(x))^T$, satisfies

$$\widehat{\mathbf m}(x) = \operatorname*{arg\,min}_{\mathbf m = (m_0, \ldots, m_{qd})^T}\ \sum_{i=1}^n \Big[ Y_i - m_0 - \frac{X_{i1} - x_1}{h}\, m_1 - \cdots - \Big( \frac{X_{id} - x_d}{h} \Big)^q m_{qd} \Big]^2 \prod_{\ell=1}^d K_h(X_{i\ell} - x_\ell), \tag{1}$$

where $q$ is the order of the polynomial approximation. In fact, for simplicity of notation we will concentrate on the local linear case considered in Ruppert and Wand (1995), for which $q = 1$; the Nadaraya-Watson case, for which $q = 0$, is even simpler, see below. Define the matrices [of dimension $n \times (d+1)$ and $n \times n$, respectively]

$$\mathbf X(x) = \begin{pmatrix} 1 & \frac{X_{11} - x_1}{h} & \cdots & \frac{X_{1d} - x_d}{h} \\ \vdots & \vdots & & \vdots \\ 1 & \frac{X_{n1} - x_1}{h} & \cdots & \frac{X_{nd} - x_d}{h} \end{pmatrix}, \qquad \mathbf K(x) = \operatorname{diag}\Big( \prod_{\ell=1}^d K_h(X_{1\ell} - x_\ell), \ldots, \prod_{\ell=1}^d K_h(X_{n\ell} - x_\ell) \Big),$$

and write

$$\widehat{\mathbf m}(x) = \big[ \mathbf X(x)^T \mathbf K(x)\, \mathbf X(x) \big]^{-1} \mathbf X(x)^T \mathbf K(x)\, \mathbf Y \equiv \widehat{\mathbf V}^{-1}(x)\, \widehat{\mathbf R}(x), \tag{2}$$

where $\mathbf Y = (Y_1, \ldots, Y_n)^T$, $\widehat{\mathbf V}(x) = \mathbf X(x)^T \mathbf K(x)\, \mathbf X(x)$ and $\widehat{\mathbf R}(x) = \mathbf X(x)^T \mathbf K(x)\, \mathbf Y$.
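To fix ideas, here is a minimal numerical sketch of the estimator (2). It is an illustration only (the paper contains no code); the Gaussian product kernel, the single bandwidth h, and the function names are choices made here for concreteness.

```python
import numpy as np

def local_linear(x, X, Y, h):
    """Evaluate the full-dimensional local linear smoother of (2) at the point x.

    X: (n, d) array of covariates, Y: (n,) responses, h: scalar bandwidth.
    Returns the (d + 1)-vector (m_0(x), ..., m_d(x)); m_0(x) estimates m(x).
    """
    n, d = X.shape
    U = (X - x) / h                              # rows are ((X_i - x)/h)^T
    Xmat = np.column_stack([np.ones(n), U])      # design matrix X(x), n x (d+1)
    # diagonal of K(x): product Gaussian kernel weights prod_l K_h(X_il - x_l)
    Kdiag = np.prod(np.exp(-0.5 * U**2) / (np.sqrt(2 * np.pi) * h), axis=1)
    V = Xmat.T @ (Kdiag[:, None] * Xmat)         # V(x) = X(x)^T K(x) X(x)
    R = Xmat.T @ (Kdiag * Y)                     # R(x) = X(x)^T K(x) Y
    return np.linalg.solve(V, R)                 # m(x) = V(x)^{-1} R(x)

# toy usage: here m(0, 0) = sin(0) + 0 = 0, so the first entry should be near 0
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = np.sin(np.pi * X[:, 0]) + X[:, 1]**2 + 0.1 * rng.standard_normal(200)
print(local_linear(np.zeros(2), X, Y, h=0.3))
```

Only the first coordinate of the output is the regression estimate; the remaining coordinates are the local (rescaled) slope coefficients of (1).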

For the new interpretation of local linear estimators we think of the data $\mathbf Y = (Y_1, \ldots, Y_n)^T$ as an element of the space of tuples of functions

$$\mathcal F = \big\{ (f_{ij} : i = 1, \ldots, n;\ j = 0, \ldots, d) \big\},$$

where the $f_{ij}$ are functions from $\mathbb R^d$ to $\mathbb R$. We do this by putting $f_{i0}(x) \equiv Y_i$ and $f_{ij}(x) \equiv 0$ for $j \neq 0$. We define the following norm on $\mathcal F$:

$$\|f\|^2 = \int \frac{1}{n} \sum_{i=1}^n \Big[ f_{i0}(x) + \sum_{j=1}^d f_{ij}(x)\, \frac{x_j - X_{ij}}{h} \Big]^2 \prod_{j=1}^d K_h(X_{ij} - x_j)\, dx,
$$

where $K_h(\cdot) = K(\cdot/h)/h$ with $K(\cdot)$ a univariate kernel. Consider now the following subspaces of $\mathcal F$:

$$\mathcal F_{\mathrm{full}} = \{ f \in \mathcal F : f_{ij} \text{ does not depend on } i \text{ for } j = 0, \ldots, d \},$$

$$\mathcal F_{\mathrm{add}} = \{ f \in \mathcal F_{\mathrm{full}} : f_{i0}(x) = g_1(x_1) + \cdots + g_d(x_d) \text{ for some functions } g_j : \mathbb R \to \mathbb R,\ j = 1, \ldots, d, \text{ and, for } j \neq 0,\ f_{ij}(x) = \tilde g_j(x_j) \text{ for some functions } \tilde g_j : \mathbb R \to \mathbb R,\ j = 1, \ldots, d \}.$$

The estimate $\widehat{\mathbf m}(x)$ defines an element of $\mathcal F$ by putting $f_{ij}(x) = \widehat m_j(x)$, $j = 0, 1, \ldots, d$. This is an element of $\mathcal F_{\mathrm{full}}$. It is easy to see that, with respect to $\|\cdot\|$, $\widehat{\mathbf m}$ is the orthogonal projection of $\mathbf Y$ onto $\mathcal F_{\mathrm{full}}$. Below we introduce our version $\widetilde{\mathbf m}$ of the backfitting estimator as the orthogonal projection of $\widehat{\mathbf m}$ onto $\mathcal F_{\mathrm{add}}$ [with respect to $\|\cdot\|$]. For an understanding of $\widetilde{\mathbf m}$ it will become essential that it is also the orthogonal projection of $\mathbf Y$ onto $\mathcal F_{\mathrm{add}}$. For the definition of such norms and linear spaces for higher order local polynomials and for other smoothers we refer to Mammen, Marron, Turlach and Wand (1997). Each local polynomial estimator corresponds to a specific choice of inner product in a Hilbert space, and the definition of the corresponding additive estimators is then the projection further down onto $\mathcal F_{\mathrm{add}}$. In particular, for the local constant estimator (Nadaraya-Watson-like smoothers) one chooses:

$$\mathcal F = \{ (f_i : i = 1, \ldots, n) \}, \text{ where the } f_i \text{ are functions from } \mathbb R^d \text{ to } \mathbb R,$$

$$\mathcal F_{\mathrm{full}} = \{ f \in \mathcal F : f_i \text{ does not depend on } i \},$$

$$\mathcal F_{\mathrm{add}} = \{ f \in \mathcal F_{\mathrm{full}} : f_i(x) = g_1(x_1) + \cdots + g_d(x_d) \text{ for some functions } g_j : \mathbb R \to \mathbb R \},$$

$$\|f\|^2 = \int \frac1n \sum_{i=1}^n \big[ f_i(x) \big]^2 \prod_{j=1}^d K_h(X_{ij} - x_j)\, dx.$$

Note that for functions $\mathbf m$ in $\mathcal F_{\mathrm{full}}$ [i.e. $m := m_1 = \cdots = m_n$] we get

$$\|\mathbf m\|^2 = \int m(x)^2\, \widehat p(x)\, dx,$$

where $\widehat p(x) = n^{-1} \sum_{i=1}^n \prod_{j=1}^d K_h(X_{ij} - x_j)$ is the kernel density estimate of the design density. In particular, in this case $\widetilde{\mathbf m}$ is the projection of the full-dimensional Nadaraya-Watson estimate onto the subspace of additive functions with respect to the norm of the space $L_2(\widehat p)$. We give a slightly different motivation for the projection estimate $\widetilde{\mathbf m}$ in the next section, see (7). There we will discuss the case of local constant smoothing in detail.

3 Estimation with Nadaraya-Watson-Like Smoothers

In this section we will motivate our backfitting estimate based on regression smoothers like the Nadaraya-Watson estimator

$$\widehat m(x) = \frac{ n^{-1} \sum_{i=1}^n \prod_{\ell=1}^d K_h(x_\ell - X_{i\ell})\, Y_i }{ n^{-1} \sum_{i=1}^n \prod_{\ell=1}^d K_h(x_\ell - X_{i\ell}) }. \tag{3}$$

The specific choice of the Nadaraya-Watson estimator is not important, but the smoother is supposed to have the ratio form

$$\widehat m(x) = \frac{\widehat r(x)}{\widehat p(x)} = \sum_{i=1}^n w_i(x)\, Y_i, \tag{4}$$

where $\widehat p(x)$, which depends only on $\mathcal X_n = \{X_1, \ldots, X_n\}$, is an estimator of $p(x)$, the marginal density of $X$. Here, the weighting sequence $\{w_i(x)\}_{i=1}^n$ only depends on $\mathcal X_n$, as does the weighting sequence $\{\bar w_i(x)\}_{i=1}^n$ of the numerator $\widehat r(x) = \sum_{i=1}^n \bar w_i(x)\, Y_i$. The assumption that the pilot estimate $\widehat m$ exists [i.e., is everywhere and always finite] will be dropped in our asymptotic analysis in the next section, which will allow us to include the case of high dimensions $d$.
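As a concrete sketch of the ratio form (4) (again an illustration with a Gaussian product kernel; none of this code or naming is from the paper):

```python
import numpy as np

def nw_weights(x, X, h):
    """Numerator weights w_bar_i(x) and density estimate p_hat(x) for (4)."""
    U = (X - x) / h
    Kprod = np.prod(np.exp(-0.5 * U**2) / (np.sqrt(2 * np.pi) * h), axis=1)
    w_bar = Kprod / len(X)       # r_hat(x) = sum_i w_bar_i(x) Y_i
    return w_bar, w_bar.sum()    # p_hat(x) = n^{-1} sum_i prod_l K_h(x_l - X_il)

def nw(x, X, Y, h):
    """Nadaraya-Watson pilot estimate (3): m_hat(x) = r_hat(x) / p_hat(x)."""
    w_bar, p_hat = nw_weights(x, X, h)
    return (w_bar @ Y) / p_hat   # equals sum_i w_i(x) Y_i with w_i = w_bar_i / p_hat
```

Note that both weight sequences depend on the covariates only, as the ratio form requires.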

We assume for the most part that

$$m(x) = \alpha + m_1(x_1) + \cdots + m_d(x_d), \tag{5}$$

although our definitions make sense more generally [i.e., when the regression function is not additive], in which case the target function is the closest additive approximation to the regression function. For identifiability we assume that

$$\int m_j(x_j)\, p_j(x_j)\, dx_j = 0, \quad j = 1, \ldots, d, \tag{6}$$

where the marginal density of $X_j$ is denoted by $p_j(\cdot)$. Denote also the marginal density of $(X_i, X_j)$ by $p_{ij}(\cdot)$, respectively ($i, j = 1, \ldots, d$). The vector $(X_i : i \neq j)$ is denoted by $X_{-j}$ and its density by $p_{-j}$.

Recall that backfitting is motivated as solving an empirical version of the set of equations

$$\begin{aligned} m_1(x_1) &= E(Y \mid X_1 = x_1) - \alpha - E\{m_2(X_2) \mid X_1 = x_1\} - \cdots - E\{m_d(X_d) \mid X_1 = x_1\} \\ &\ \ \vdots \\ m_d(x_d) &= E(Y \mid X_d = x_d) - \alpha - E\{m_1(X_1) \mid X_d = x_d\} - \cdots - E\{m_{d-1}(X_{d-1}) \mid X_d = x_d\}. \end{aligned}$$

In the sample, one replaces $E(Y \mid X_j = x_j)$ by one-dimensional smoothers $\widehat m_j(\cdot)$, and iterates from some arbitrary starting values for $m_j(\cdot)$; see Hastie and Tibshirani (1991, p. 108). Let $\widehat p(x)$ and $\widehat m(x)$ be the multidimensional density and regression smoothers defined above. We define backfitting estimates $\widetilde m_j$ as the minimizers of the following norm

$$\|\widehat m - m\|^2_{\widehat p} = \int \big[ \widehat m(x) - \alpha - m_1(x_1) - \cdots - m_d(x_d) \big]^2\, \widehat p(x)\, dx, \tag{7}$$

where the minimization runs over all functions $m(x) = \alpha + \sum_j m_j(x_j)$ with $\int m_j(x_j)\, \widehat p_j(x_j)\, dx_j = 0$ [see Nielsen and Linton (1996); we suppose that the density estimate $\widehat p$ is non-negative]. This means that $\widetilde m(x) = \widehat\alpha + \widetilde m_1(x_1) + \cdots + \widetilde m_d(x_d)$ is the projection in the space $L_2(\widehat p)$ of $\widehat m$ onto the affine subspace of additive functions $\{ m \in L_2(\widehat p) : m(x) = \alpha + m_1(x_1) + \cdots + m_d(x_d) \}$. This is a central point of our discussion. For projection operators backfitting is well understood (method of alternating projections, see below). Therefore, this interpretation will enable us to understand the convergence of the backfitting algorithm and the asymptotics of $\widetilde m_j$. We remark that not every backfitting algorithm based on iterative smoothing can be interpreted as an alternating projection method. The solution to (7) is characterized by the following system of equations ($j = 1, \ldots, d$):

$$\widetilde m_j(x_j) = \int \widehat m(x)\, \frac{\widehat p(x)}{\widehat p_j(x_j)}\, dx_{-j} - \sum_{k \neq j} \int \widetilde m_k(x_k)\, \frac{\widehat p(x)}{\widehat p_j(x_j)}\, dx_{-j} - \widehat\alpha, \tag{8}$$

$$\widehat\alpha = \int \widehat m(x)\, \widehat p(x)\, dx, \tag{9}$$

where $\widehat m_j(x_j) = n^{-1} \sum_{i=1}^n K_h(x_j - X_{ij})\, Y_i \big/ \widehat p_j(x_j)$ is the univariate Nadaraya-Watson regression smoother, in which $\widehat p_j(x_j) = \int \widehat p(x)\, dx_{-j}$ is the marginal of the density estimate $\widehat p(x)$. Straightforward algebra gives

$$\int \widehat m(x)\, \frac{\widehat p(x)}{\widehat p_j(x_j)}\, dx_{-j} = \widehat p_j^{\,-1}(x_j)\, n^{-1} \sum_{i=1}^n K_h(x_j - X_{ij})\, Y_i \int \prod_{\ell \neq j} K_h(x_\ell - X_{i\ell})\, dx_{-j} = \widehat m_j(x_j).$$

Furthermore, $\widehat\alpha = \int \widehat m(x)\, \widehat p(x)\, dx = \int \widehat r(x)\, dx$, and when $\int \bar w_i(x)\, dx = n^{-1}$ [as holds for the Nadaraya-Watson weights] we find, as in Hastie and Tibshirani (1991), that $\widehat\alpha = n^{-1} \sum_{i=1}^n Y_i$, i.e., that $\widehat\alpha$ is the sample mean. So $\widehat\alpha$ is a $\sqrt n$-consistent estimate of the population mean $\alpha$, and the randomness from this estimation is of smaller order and can be effectively ignored. Note also that

$$\widehat\alpha = \int \widehat m_j(x_j)\, \widehat p_j(x_j)\, dx_j \quad \text{for } j = 1, \ldots, d. \tag{10}$$

We therefore define a backfitting estimator $\widetilde m_j(x_j)$, $j = 1, \ldots, d$, as a solution to the system of equations

$$\widetilde m_j(x_j) = \widehat m_j(x_j) - \sum_{k \neq j} \int \widetilde m_k(x_k)\, \frac{\widehat p(x)}{\widehat p_j(x_j)}\, dx_{-j} - \widehat\alpha, \quad j = 1, \ldots, d,$$

with $\widehat\alpha$ defined by (10). Up to now we have assumed that multivariate estimates of the density and of the regression function exist. This assumption is not reasonable for large dimensions $d$ (or at least such estimates can perform very poorly). Furthermore, this assumption is not necessary. Note that (8) can be rewritten as

$$\widetilde m_j(x_j) = \widehat m_j(x_j) - \sum_{k \neq j} \int \widetilde m_k(x_k)\, \frac{\widehat p_{jk}(x_j, x_k)}{\widehat p_j(x_j)}\, dx_k - \widehat\alpha. \tag{11}$$

In this equation only two-dimensional marginals of $\widehat p$ are used. Note also that the solutions $\widetilde m_j(x_j)$ to (11) inherit the smoothness properties of $\widehat m(x)$ and $\widehat p(x)$. We can therefore estimate the derivatives of $m_j(x_j)$, for example, by

$$\frac{d^r \widetilde m_j(x_j)}{dx_j^r} = \frac{d^r \widehat m_j(x_j)}{dx_j^r} - \sum_{k \neq j} \int \widetilde m_k(x_k)\, \frac{d^r}{dx_j^r}\, \frac{\widehat p_{jk}(x_j, x_k)}{\widehat p_j(x_j)}\, dx_k, \quad r = 1, 2, \ldots
$$

In the next section we will discuss estimates $\widetilde m_j$ that are defined by (11), along with their asymptotic properties. In practice, our backfitting algorithm works as follows. One starts with an arbitrary initial guess $\widetilde m_j^{[0]}$ for $\widetilde m_j$. In the $j$-th step of the $r$-th iteration cycle one puts

$$\widetilde m_j^{[r]}(x_j) = \widehat m_j(x_j) - \sum_{k < j} \int \widetilde m_k^{[r]}(x_k)\, \frac{\widehat p_{jk}(x_j, x_k)}{\widehat p_j(x_j)}\, dx_k - \sum_{k > j} \int \widetilde m_k^{[r-1]}(x_k)\, \frac{\widehat p_{jk}(x_j, x_k)}{\widehat p_j(x_j)}\, dx_k - \widehat\alpha,$$

and the process is iterated until a desired convergence criterion is satisfied. The integrals are computed numerically; see section 4 below for further comments.
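The following grid-based sketch implements this iteration (our illustration, not the authors' code: the Gaussian kernel, equally spaced grids, plain Riemann sums for the integrals, the recentring step enforcing the empirical norming $\int \widetilde m_j\, \widehat p_j\, dx_j = 0$ from (7), and all function names are choices made here). It uses only the one- and two-dimensional marginals $\widehat p_j$ and $\widehat p_{jk}$, as in (11).

```python
import numpy as np

def kh(u, h):
    """Univariate Gaussian kernel K_h(u) = K(u/h)/h (illustrative choice)."""
    return np.exp(-0.5 * (u / h) ** 2) / (np.sqrt(2 * np.pi) * h)

def backfit_nw(X, Y, h, G=101, max_cycles=100, tol=1e-8):
    """Backfitting estimates m_tilde_j of (11) on per-coordinate grids."""
    n, d = X.shape
    grids = [np.linspace(X[:, j].min(), X[:, j].max(), G) for j in range(d)]
    dx = [g[1] - g[0] for g in grids]
    # one-dimensional ingredients: kernel matrices, p_hat_j, and NW smooths m_hat_j
    Kmats = [kh(g[:, None] - X[:, j][None, :], h) for j, g in enumerate(grids)]
    p = [Km.mean(axis=1) for Km in Kmats]                    # p_hat_j on the grid
    m_hat = [(Km @ Y) / n / pj for Km, pj in zip(Kmats, p)]  # m_hat_j as in (8)
    # two-dimensional marginals p_hat_jk(x_j, x_k) on grid x grid
    p2 = {(j, k): Kmats[j] @ Kmats[k].T / n
          for j in range(d) for k in range(d) if j != k}
    alpha = Y.mean()                                         # alpha_hat, see (10)
    m = [np.zeros(G) for _ in range(d)]                      # starting values m^{[0]}
    for _ in range(max_cycles):
        change = 0.0
        for j in range(d):                                   # j-th step of the cycle
            corr = sum(p2[(j, k)] @ (m[k] * dx[k]) for k in range(d) if k != j)
            new = m_hat[j] - corr / p[j] - alpha
            new -= (new * p[j] * dx[j]).sum() / (p[j] * dx[j]).sum()  # norming, cf. (6)-(7)
            change = max(change, np.max(np.abs(new - m[j])))
            m[j] = new
        if change < tol:     # successive differences shrink geometrically; cf. Theorem 1 below
            break
    return grids, m, alpha
```

Bandwidth, grid size, and stopping rule are arbitrary here, and truncating the integrals to the sample range introduces boundary effects that this sketch ignores.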

3.1 Asymptotics for the Nadaraya-Watson-like Version

We consider now estimates $\widetilde m_j$ that are defined by (11), with $\widehat\alpha$ defined by (10), where $\widehat m_j$, $\widehat p_{jk}$ and $\widehat p_j$ are some given estimates. The next theorem gives conditions under which, with probability tending to one, there exists a solution $\widetilde m_j$ of (11) that is unique and that can be calculated by backfitting. Furthermore, the backfitting algorithm converges with geometric rate. Our assumptions, given below, are `high-level' and refer only to properties of $\widehat m_j$, $\widehat p_{jk}$ and $\widehat p_j$ [for example, we do not require that $p$ is the underlying density of $X$, or that $\widehat m_j$, $\widehat p_{jk}$ and $\widehat p_j$ are kernel estimates]; these properties can be verified for a range of smoothers under quite general heterogeneous and dependent sampling schemes, and we investigate this in section 5 below.

Assumptions. We suppose that there exists a density function $p$ on $\mathbb R^d$ with marginals

$$p_j(x_j) = \int p(x)\, dx_{-j} \qquad \text{and} \qquad p_{jk}(x_j, x_k) = \int p(x)\, dx_{-(j,k)} \quad \text{for } j \neq k.$$

(A1) For all $j \neq k$ it holds that

$$\int \frac{p_{jk}^2(x_j, x_k)}{p_k(x_k)\, p_j(x_j)}\, dx_j\, dx_k < \infty.$$

(A2) For all $j \neq k$ it holds that

$$\int \Big[ \frac{\widehat p_{jk}(x_j, x_k)}{p_k(x_k)\, \widehat p_j(x_j)} - \frac{p_{jk}(x_j, x_k)}{p_k(x_k)\, p_j(x_j)} \Big]^2 p_k(x_k)\, p_j(x_j)\, dx_j\, dx_k = o_P(1).$$

Furthermore, $\int \widehat m_j(x_j)\, \widehat p_j(x_j)\, dx_j \equiv$ const. By definition this constant is equal to $\widehat\alpha$, see (10).

(A3) There exists a constant $C$ such that, with probability tending to one, for all $j$,

$$\int \widehat m_j^2(x_j)\, p_j(x_j)\, dx_j \le C.$$

(A4) There exists a constant $C$ such that, with probability tending to one, for all $j \neq k$,

$$\sup_{x_k} \int \frac{\widehat p_{jk}^2(x_j, x_k)}{\widehat p_k^2(x_k)\, \widehat p_j(x_j)}\, dx_j \le C.$$

(A5) We suppose that for a sequence $\delta_n \downarrow 0$ the one-dimensional smoothers $\widehat m_j$ can be decomposed as $\widehat m_j = \widehat m_j^A + \widehat m_j^B$ with $\int \widehat m_j(x_j)\, \widehat p_j(x_j)\, dx_j$ not depending on $j$, and where the first component $\widehat m_j^A$ is mean zero and satisfies

$$\sup_{x_k} \Big| \int \frac{\widehat p_{jk}(x_j, x_k)}{\widehat p_k(x_k)}\, \widehat m_j^A(x_j)\, dx_j \Big| = o_P\Big( \frac{\delta_n}{\log n} \Big).$$

For $s = A$ and $s = B$ we define $\widetilde m_j^s$ as the solution of the following equation:

$$\widetilde m_j^s(x_j) = \widehat m_j^s(x_j) - \sum_{k \neq j} \int \widetilde m_k^s(x_k)\, \frac{\widehat p_{jk}(x_j, x_k)}{\widehat p_j(x_j)}\, dx_k - \widehat\alpha_s, \tag{12}$$

where $\widehat\alpha_s = \int \widehat m^s(x)\, \widehat p(x)\, dx$. Existence and uniqueness of $\widetilde m_j^A$ and $\widetilde m_j^B$ is stated in the next theorem. Note that $\widetilde m_j^s$ is defined as $\widetilde m_j$ in equation (11) with $\widehat m_j$ replaced by $\widehat m_j^s$. We suppose that for (deterministic) functions $\beta_{jn}(\cdot)$ the term $\widetilde m_j^B$ satisfies

$$\widetilde m_j^B(x_j) = \beta_{jn}(x_j) + o_P(\delta_n).$$

These conditions, which we discuss further below, are all straightforward to verify, except perhaps (A5), and turn out to be weaker than those made by Opsomer and Ruppert (1997).

The following result is crucial in establishing the asymptotic properties of the estimates.

Theorem 1 [Convergence of backfitting]. Suppose that conditions (A1)-(A2) hold. Then, with probability tending to one, there exists a solution $\widetilde m_j$ of (11) and (10) that is unique. Furthermore, there exist constants $0 < \gamma < 1$ and $c > 0$ such that, with probability tending to one, the following inequality holds:

$$\int \big[ \widetilde m_j^{[r]}(x_j) - \widetilde m_j(x_j) \big]^2\, p_j(x_j)\, dx_j \le c\, \gamma^{2r} \int \widetilde m^{[0]}(x)^2\, p(x)\, dx. \tag{13}$$

Here, for $r = 0$, the function $\widetilde m^{[r]}(x) = \widetilde m_1^{[r]}(x_1) + \cdots + \widetilde m_d^{[r]}(x_d)$ is the starting value of the backfitting algorithm.


Furthermore, for $s = A$ and $s = B$, with probability tending to one there exists a solution $\widetilde m_j^s$ of (12) that is unique.

Our next theorem states that the stochastic part of the backfitting estimate is easy to understand. It coincides with the stochastic part of a one-dimensional smooth. Therefore, for an understanding of the asymptotic properties of the backfitting estimate it remains to study its asymptotic bias. This will be done after the theorem for the special case that an asymptotic theory is available for the pilot estimate $\widehat m$.

Theorem 2. Suppose that conditions (A1)-(A5) hold for a sequence $\delta_n$. Then it holds that

$$\sup_{x_j} \big| \widetilde m_j^A(x_j) - \widehat m_j^A(x_j) \big| = o_P(\delta_n).$$

In particular, one gets

$$\widetilde m_j(x_j) = \widehat m_j^A(x_j) + \beta_{jn}(x_j) + o_P(\delta_n).$$

We now apply Theorem 2 to the case that full-dimensional pilot estimates $\widehat p(x)$, $\widehat r(x)$ and $\widehat m(x) = \widehat r(x)/\widehat p(x) = \sum_{i=1}^n w_i(x)\, Y_i$ exist and that $\widehat\alpha, \widetilde m_1, \ldots, \widetilde m_d$ are defined as minimizers of (7) [i.e., $\widehat\alpha + \widetilde m_1 + \cdots + \widetilde m_d$ is the projection of $\widehat m$ onto the class of additive functions in $L_2(\widehat p)$]. For the one-dimensional smooths $\widehat m_j$ we have, with appropriate weights $w_{ji}(x_j)$, that

$$\widehat m_j(x_j) = \int \widehat m(x)\, \frac{\widehat p(x)}{\widehat p_j(x_j)}\, dx_{-j} = \sum_{i=1}^n w_{ji}(x_j)\, Y_i.$$

We compare now the estimate $\widetilde m_j$ with the infeasible estimate $\bar m_j$ that uses the knowledge of the other components $m_l$ with $l \neq j$. More precisely, we define the infeasible estimator $\bar m_j(x_j)$ to be the one-dimensional smooth of the unobserved data $\bar Y_i = m_j(X_{ij}) + \varepsilon_i$, with $\varepsilon_i = Y_i - [\alpha + \sum_{k=1}^d m_k(X_{ik})]$, on $X_{ij}$; thus

$$\bar m_j(x_j) = \sum_{i=1}^n w_{ji}(x_j)\, \bar Y_i, \quad j = 1, \ldots, d. \tag{14}$$

Then, under appropriate regularity conditions,

$$n^{2/5} \{ \bar m_j(x_j) - m_j(x_j) \} \Longrightarrow N\{ b_j(x_j), v_j(x_j) \}, \quad j = 1, \ldots, d, \tag{15}$$

for certain functions $b_j(\cdot)$ and $v_j(\cdot)$. Moreover, because $\operatorname{cov}\{ \bar m_j(x_j), \bar m_k(x_k) \} = o(n^{-4/5})$, one has that

$$n^{2/5} \{ \bar m_j(x_j) - m_j(x_j) \} \ \text{and} \ n^{2/5} \{ \bar m_k(x_k) - m_k(x_k) \} \ \text{are asymptotically independent for } j \neq k. \tag{16}$$

The additional information that $\int m_j(x_j)\, p_j(x_j)\, dx_j = 0$ may have some value, and we can define the mean-corrected version of $\bar m_j(x_j)$ by $\bar m_j^c(x_j) = \bar m_j(x_j) - n^{-1} \sum_{i=1}^n \bar m_j(X_{ij})$, which has the same asymptotic variance as $\bar m_j(x_j)$ but bias $b_j^c(x_j) = b_j(x_j) - \int b_j(x_j)\, p_j(x_j)\, dx_j$.

We suppose now that our conditions hold with $\widehat m^A(x) = \sum_{i=1}^n w_i(x)\, \varepsilon_i$ and $\widehat m^B(x) = \sum_{i=1}^n w_i(x)\, m(X_i)$. One can decompose

$$\widehat m_j^A(x_j) = \int \widehat m^A(x)\, \frac{\widehat p(x)}{\widehat p_j(x_j)}\, dx_{-j} = \sum_{i=1}^n w_{ji}(x_j)\, \varepsilon_i, \qquad \widehat m_j^B(x_j) = \int \widehat m^B(x)\, \frac{\widehat p(x)}{\widehat p_j(x_j)}\, dx_{-j} = \sum_{i=1}^n w_{ji}(x_j)\, m(X_i).$$

Suppose now that it can be shown for a function $b$ that

$$\widehat m^B(x) = m(x) + n^{-2/5}\, b(x) + o_P(n^{-2/5}). \tag{17}$$

We have the following

Corollary 1. Suppose that conditions (A1)-(A5) hold with $\delta_n = n^{-2/5}$ and that (14)-(17) apply.

Then

$$n^{2/5} \begin{bmatrix} \widetilde m_1(x_1) - m_1(x_1) \\ \vdots \\ \widetilde m_d(x_d) - m_d(x_d) \end{bmatrix} \Longrightarrow N\left( \begin{bmatrix} b_1^*(x_1) \\ \vdots \\ b_d^*(x_d) \end{bmatrix},\ \begin{bmatrix} v_1^*(x_1) & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & v_d^*(x_d) \end{bmatrix} \right),$$

where $v_j^*(x_j) = v_j(x_j)$, $j = 1, \ldots, d$, are defined above, while the $b_j^*(x_j)$ are solutions to the following minimization problem:

$$\min_{b_0,\, b_1^*(\cdot), \ldots, b_d^*(\cdot)} \int \big[ b(x) - b_0 - b_1^*(x_1) - \cdots - b_d^*(x_d) \big]^2\, p(x)\, dx \quad \text{s.t.} \quad \int b_j^*(x_j)\, p_j(x_j)\, dx_j = 0, \quad j = 1, \ldots, d.$$

For the special case that the function $b$ is already of additive form, $b(x) = b_1(x_1) + \cdots + b_d(x_d)$, the bias functions $b_j^*(x_j)$ coincide with the bias $b_j^c(x_j)$ of the `corrected' oracle estimate $\bar m_j^c(x_j)$. Also

$$n^{2/5} \{ \widetilde m(x) - m(x) \} \Longrightarrow N\big[ b^+(x), v^+(x) \big],$$

where $b^+(x) = \sum_j b_j^*(x_j)$ and $v^+(x) = \sum_j v_j(x_j)$.

Suppose additionally that for a sequence $\tau_n$ with $n^{-2/5} = o(\tau_n)$,

$$\sup_x \big| \widehat m^B(x) - m(x) - n^{-2/5}\, b(x) \big| = O_P(\tau_n), \qquad \sup_{x_j} \big| \bar m_j(x_j) - m_j(x_j) \big| = O_P(\tau_n) \quad \text{for } j = 1, \ldots, d.$$

Then, we have for $j = 1, \ldots, d$,

$$\sup_{x_j} \big| \widetilde m_j(x_j) - \bar m_j(x_j) \big| = O_P(\tau_n).$$
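To see the oracle phenomenon of Corollary 1 numerically, one can compare the backfitting component with the infeasible oracle smooth (14) computed from the unobservable data. The sketch below reuses kh and backfit_nw from the sketch in the previous section; the design, bandwidth, and sample size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h = 500, 0.25
X = rng.uniform(-1.0, 1.0, size=(n, 2))
m2 = X[:, 1] ** 2 - 1.0 / 3.0                     # centred second component on [-1, 1]
Y = np.sin(np.pi * X[:, 0]) + m2 + 0.2 * rng.standard_normal(n)

grids, m_tilde, alpha = backfit_nw(X, Y, h)

# oracle data Y_bar_i = Y_i - alpha - m_2(X_i2), smoothed on X_i1 as in (14)
Ybar = Y - alpha - m2
K1 = kh(grids[0][:, None] - X[:, 0][None, :], h)
m_bar = (K1 @ Ybar) / K1.sum(axis=1)              # univariate NW smooth of the oracle data

# the two curves should be close, up to sampling and boundary error
print(np.max(np.abs(m_tilde[0] - m_bar)))
```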

4 Estimation with Local Polynomials

We discuss now local polynomials. For simplicity of notation we consider only local linear smoothing. All arguments and theoretical results given for this special case can be easily generalized to local polynomials of higher degree.

Backfitting estimators based on local polynomials can be written in the form of equation (7) by choosing

$$\widehat p(x) = \widehat V^{00}(x) - \widehat V^{0,-0}(x)\, \big[ \widehat V^{-0,-0}(x) \big]^{-1}\, \widehat V^{-0,0}(x),$$

where

$$\widehat{\mathbf V}(x) = \begin{pmatrix} \widehat V^{00}(x) & \widehat V^{0,-0}(x) \\ \widehat V^{-0,0}(x) & \widehat V^{-0,-0}(x) \end{pmatrix} \equiv \frac1n\, \mathbf X(x)^T \mathbf K(x)\, \mathbf X(x)$$

[relative to (2) we normalize $\widehat{\mathbf V}$ by $n^{-1}$ here; the factor cancels in (2)], with the scalar $\widehat V^{00}(x) = n^{-1} \sum_{i=1}^n \prod_{\ell=1}^d K_h(X_{i\ell} - x_\ell)$, and with $\widehat V^{-0,0}(x)$ and $\widehat V^{-0,-0}(x)$ defined appropriately; the index $-0$ collects the components $1, \ldots, d$.

This approach has two disadvantages. First, it may work only in low dimensions [since for the asymptotics, existence of the inverse $[\widehat V^{-0,-0}(x)]^{-1}$ and convergence of $\widehat V^{-0,-0}(x)$ are required under our assumptions, and this may hold only for a low-dimensional argument $x$]. Second, the corresponding backfitting algorithm does not consist in iterative local polynomial smoothing.

We now discuss another approach based on local polynomials that works in higher dimensions and that is based on iterative local polynomial smoothing. We motivate this approach for the case that $\widehat{\mathbf V}(x)$ does exist, but we will see that the definition of the backfitting estimate is based on only one- and two-dimensional `marginals' of $\widehat{\mathbf V}(x)$. So its asymptotic treatment requires only consistency of these marginals, and the asymptotics work also for higher dimensions. This is similar to the discussion in the last section, where consistency was needed only for one- and two-dimensional marginals of the kernel density estimate $\widehat p$.

For functions $f = (f_0, \ldots, f_d)$ with components $f_j : \mathbb R^d \to \mathbb R$ [the class of all such tuples will be denoted $\mathcal M$], and a $(d+1) \times (d+1)$ positive definite matrix function $M(\cdot)$, define the norm

$$\|f\|_M^2 = \int f(x)^T M(x)\, f(x)\, dx.$$

There is a one-to-one correspondence between such functions $f$ and functions in $\mathcal F_{\mathrm{full}}$. Furthermore, taking $M = \widehat{\mathbf V}$, the norm $\|\cdot\|_M$ is simply the norm induced by the norm $\|\cdot\|$. In Section 2 our version $\widetilde{\mathbf m}(x) = (\widetilde m_0(x), \ldots, \widetilde m_d(x))^T$ of the backfitting estimate was defined as the projection of the function in $\mathcal F_{\mathrm{full}}$ [corresponding to $\widehat{\mathbf m}$, see (1)] with respect to $\|\cdot\|$ onto the space $\mathcal F_{\mathrm{add}}$. Therefore, $\widetilde{\mathbf m}$ coincides with the $L_2(\widehat{\mathbf V})$ projection, with respect to the norm $\|f\|_{\widehat{\mathbf V}}$, of $\widehat{\mathbf m}$ onto the subspace $\mathcal M_{\mathrm{add}}$, where

$$\mathcal M_{\mathrm{add}} = \Big\{ \mathbf u(x) = (u_0(x), \ldots, u_d(x))^T \in \mathcal M :\ u_0(x) = \alpha + u_1(x_1) + \cdots + u_d(x_d) \text{ and } u_\ell(x) = w_\ell(x_\ell) \text{ for } \ell = 1, \ldots, d, \text{ where } u_1, \ldots, u_d \text{ are functions } \mathbb R \to \mathbb R \text{ with } \int \widehat V_j^{00}(x_j)\, u_j(x_j)\, dx_j = 0 \text{ for } j = 1, \ldots, d, \text{ and where } w_\ell,\ \ell = 1, \ldots, d, \text{ are functions } \mathbb R \to \mathbb R \Big\},$$

where for each $j$ the $(d+1) \times (d+1)$ matrix $\widehat{\mathbf V}_j(x_j) = \int \widehat{\mathbf V}(x)\, dx_{-j}$. The class $\mathcal M_{\mathrm{add}}$ contains functions that are additive in the first component [for $\ell = 0$] and whose other components [for $\ell = 1, \ldots, d$] depend only on a one-dimensional argument. A function $f$ in $\mathcal M_{\mathrm{add}}$ is specified by a constant and $2d$ functions $\mathbb R \to \mathbb R$. Because $f_\ell$, $\ell = 1, \ldots, d$, depend only on one argument, in abuse of notation we also write $f_\ell(x_\ell)$ instead of $f_\ell(x)$. Note that there is a one-to-one correspondence between elements of $\mathcal M_{\mathrm{add}}$ and $\mathcal F_{\mathrm{add}}$.

We now discuss how $\widetilde{\mathbf m}$ is calculated by backfitting. Note that $\widetilde{\mathbf m}$ is defined as the minimizer of $\|\widehat{\mathbf m} - \mathbf m\|_{\widehat{\mathbf V}}$. Recall that this is equivalent to minimizing $\|\mathbf Y - \mathbf m\|^2$ over $\mathcal F_{\mathrm{add}}$. We discuss now minimization of this term with respect to the $j$-th components, namely $m_j(x_j)$ and the slope component $m_j^{(1)}(x_j) := w_j(x_j)$. Define for each $j$

$$\|f\|_j^2(x_j) = \int \frac1n \sum_{i=1}^n \Big[ f_{i0}(x) + \sum_{k=1}^d f_{ik}(x)\, \frac{x_k - X_{ik}}{h} \Big]^2 \prod_{k=1}^d K_h(X_{ik} - x_k)\, dx_{-j},$$

and note the obvious fact that

$$\|f\|^2 = \int \|f\|_j^2(x_j)\, dx_j, \quad j = 1, \ldots, d.$$

Therefore, because an integral is minimized by minimizing the integrand, our problem is solved by minimizing $\|\mathbf Y - \mathbf m\|_j^2(x_j)$ for fixed $x_j$ with respect to $m_j(x_j)$ and $m_j^{(1)}(x_j)$, for $j = 1, \ldots, d$. After some standard calculations, this leads to:

$$\widetilde m_j(x_j)\, \widehat V_j^{00}(x_j) + \widetilde m_j^{(1)}(x_j)\, \widehat V_j^{j0}(x_j) = \frac1n \sum_{i=1}^n K_h(X_{ij} - x_j)\, Y_i - \widehat\alpha\, \widehat V_j^{00}(x_j) - \sum_{\ell \neq j} \int \widetilde m_\ell(x_\ell)\, \widehat V_{\ell j}^{00}(x_\ell, x_j)\, dx_\ell - \sum_{\ell \neq j} \int \widetilde m_\ell^{(1)}(x_\ell)\, \widehat V_{\ell j}^{\ell 0}(x_\ell, x_j)\, dx_\ell, \tag{18}$$

$$\widetilde m_j(x_j)\, \widehat V_j^{j0}(x_j) + \widetilde m_j^{(1)}(x_j)\, \widehat V_j^{jj}(x_j) = \frac1n \sum_{i=1}^n \frac{X_{ij} - x_j}{h}\, K_h(X_{ij} - x_j)\, Y_i - \widehat\alpha\, \widehat V_j^{j0}(x_j) - \sum_{\ell \neq j} \int \widetilde m_\ell(x_\ell)\, \widehat V_{\ell j}^{0j}(x_\ell, x_j)\, dx_\ell - \sum_{\ell \neq j} \int \widetilde m_\ell^{(1)}(x_\ell)\, \widehat V_{\ell j}^{\ell j}(x_\ell, x_j)\, dx_\ell. \tag{19}$$

Here, we have used one- and two-dimensional marginals of the matrix $\widehat{\mathbf V}$:

$$\widehat{\mathbf V}_r(x_r) = \int \widehat{\mathbf V}(x)\, dx_{-r}, \tag{20}$$

$$\widehat{\mathbf V}_{rs}(x_r, x_s) = \int \widehat{\mathbf V}(x)\, dx_{-(r,s)}. \tag{21}$$

The elements of these matrices are denoted by $\widehat V_r^{pq}(x_r)$ and $\widehat V_{rs}^{pq}(x_r, x_s)$ with $p, q = 0, \ldots, d$. Together with the norming condition

$$\int \widetilde m_j(x_j)\, \widehat V_j^{00}(x_j)\, dx_j = 0, \tag{22}$$

equations (18) and (19) define $\widehat\alpha$, $\widetilde m_j$ and $\widetilde m_j^{(1)}$ for given $\mathbf Y$ and $[\widetilde m_\ell, \widetilde m_\ell^{(1)} : \ell \neq j]$.

Equations (18) and (19) can be rewritten as

$$\widetilde m_j(x_j) = \check m_j(x_j) + \omega_j(x_j), \tag{23}$$

$$\widetilde m_j^{(1)}(x_j) = \check m_j^{(1)}(x_j) + \omega_j^{(1)}(x_j), \tag{24}$$

where $\check m_j(x_j)$, $\omega_j(x_j)$, $\check m_j^{(1)}(x_j)$ and $\omega_j^{(1)}(x_j)$ are defined by:

$$\check m_j(x_j)\, \widehat V_j^{00}(x_j) + \check m_j^{(1)}(x_j)\, \widehat V_j^{j0}(x_j) = \frac1n \sum_{i=1}^n K_h(X_{ij} - x_j)\, Y_i. \tag{25}$$
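A grid-based sketch of one backfitting step for this system, solving the 2x2 systems (18)-(19) pointwise (our illustration under the same Gaussian-kernel and Riemann-sum conventions as the earlier sketches; cycling over j and recentring via (22) completes the algorithm):

```python
import numpy as np

def kh(u, h):
    return np.exp(-0.5 * (u / h) ** 2) / (np.sqrt(2 * np.pi) * h)

def ll_update_j(j, m, m1, X, Y, grids, h, alpha):
    """Update the value m[j] and slope m1[j] on the grid via (18)-(19).

    Uses only the one- and two-dimensional marginal elements of V_hat
    from (20)-(21), estimated directly from the data.
    """
    n, d = X.shape
    Dj = (X[:, j][None, :] - grids[j][:, None]) / h          # (X_ij - x_j)/h, shape (G, n)
    Kj = kh(X[:, j][None, :] - grids[j][:, None], h)
    V00, Vj0, Vjj = Kj.mean(1), (Dj * Kj).mean(1), (Dj**2 * Kj).mean(1)   # elements of (20)
    r0 = (Kj @ Y) / n - alpha * V00                          # right side of (18)
    r1 = ((Dj * Kj) @ Y) / n - alpha * Vj0                   # right side of (19)
    for l in range(d):
        if l == j:
            continue
        dxl = grids[l][1] - grids[l][0]
        Dl = (X[:, l][None, :] - grids[l][:, None]) / h
        Kl = kh(X[:, l][None, :] - grids[l][:, None], h)
        r0 -= (m[l] * dxl) @ (Kl @ Kj.T / n)                 # V_{lj}^{00} term of (18)
        r0 -= (m1[l] * dxl) @ ((Dl * Kl) @ Kj.T / n)         # V_{lj}^{l0} term of (18)
        r1 -= (m[l] * dxl) @ (Kl @ (Dj * Kj).T / n)          # V_{lj}^{0j} term of (19)
        r1 -= (m1[l] * dxl) @ ((Dl * Kl) @ (Dj * Kj).T / n)  # V_{lj}^{lj} term of (19)
    det = V00 * Vjj - Vj0**2                                 # 2x2 solve at each grid point
    m[j] = (Vjj * r0 - Vj0 * r1) / det
    m1[j] = (V00 * r1 - Vj0 * r0) / det
```

Sweeping over j = 1, ..., d as in section 3, recentring each m[j] after its update to impose (22), and iterating to convergence yields the local linear backfitting estimator.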
