
A Appendix: Proofs


Before we come to the proofs of our results let us collect some facts about iterative projections. Let us define the following spaces of additive functions:

  $H = \{ m \in L_2(p) : m(x) = m_1(x_1) + \dots + m_d(x_d) \ (p\text{-a.s.}), \ \int m(x)\,p(x)\,dx = 0 \},$

  $H_j = \{ m \in H : m(x) = m_j(x_j) \ (p\text{-a.s.}) \text{ for a function } m_j \in L_2(p_j) \}.$

The norm in the space $H$ is denoted by $\|m\|^2 = \int m^2(x)\,p(x)\,dx$ for $m \in H$. For $m \in H_j$ we get, with $m_j(x_j) = m(x)$ ($p$-a.s.), that $\|m\|^2 = \int m^2(x)\,p(x)\,dx = \int m_j^2(x_j)\,p_j(x_j)\,dx_j$. The projection of an element of $H$ onto $H_j$ is denoted by $\Pi_j$. The operator $Q_j = I - \Pi_j$ gives the projection onto the linear space

  $H_j^\perp = \{ m \in H : \int m(x)\,\varphi(x_j)\,p(x)\,dx = 0 \text{ for all } \varphi \in H_j \}$
        $= \{ m \in H : \int m(x)\,p(x)\,dx_{-j} = 0 \ (p_j\text{-a.s.}) \},$

where $dx_{-j}$ denotes integration over all components of $x$ except $x_j$. For $m(x) = m_1(x_1) + \dots + m_d(x_d) \in H$ we get

  $Q_j m(x) = m_1(x_1) + \dots + m_{j-1}(x_{j-1}) + m_j^*(x_j) + m_{j+1}(x_{j+1}) + \dots + m_d(x_d)$   (36)

with

  $m_j^*(x_j) = -\sum_{k \ne j} \int m_k(x_k)\,\frac{p_{jk}(x_j, x_k)}{p_j(x_j)}\,dx_k.$   (37)

We define the operator $\hat Q_j$ as $Q_j$ but with $m_j^*(x_j)$ on the right hand side of (36) replaced by

  $\hat m_j^*(x_j) = -\sum_{k \ne j} \int m_k(x_k)\,\frac{\hat p_{jk}(x_j, x_k)}{\hat p_j(x_j)}\,dx_k.$   (38)


Put $T = Q_d \cdots Q_1$ and $\hat T = \hat Q_d \cdots \hat Q_1$. We will see below that in our setup the backfitting algorithm is based on iterative applications of $\hat T$. A central tool for understanding backfitting is the next lemma, which describes iterative applications of $T$.
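The backfitting cycle behind $\hat T$ can be sketched numerically. The following is a minimal illustration only, not the estimator analysed in this paper: it runs the generic update $m_j \leftarrow$ (centred) smooth of the partial residuals with linear smoother matrices, and the Nadaraya-Watson smoother, the toy data, and all function names are hypothetical choices made for the sketch.

```python
import numpy as np

def backfit(S, y, n_iter=50):
    """Backfitting with linear smoother matrices S[j]:
    cycle m_j <- centre( S_j (y - sum_{k != j} m_k) ) for j = 1..d."""
    n, d = len(y), len(S)
    m = [np.zeros(n) for _ in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            partial = y - sum(m[k] for k in range(d) if k != j)
            m[j] = S[j] @ partial
            m[j] -= m[j].mean()   # each component integrates to zero, as in H
    return m

def nw_smoother(x, h=0.1):
    """Nadaraya-Watson smoother matrix with a Gaussian kernel (toy choice)."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return K / K.sum(axis=1, keepdims=True)

# Hypothetical additive data with two covariates.
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
y = np.sin(2 * np.pi * x1) + x2 ** 2 + 0.1 * rng.normal(size=n)

m = backfit([nw_smoother(x1), nw_smoother(x2)], y - y.mean())
fit = y.mean() + m[0] + m[1]
```

Each pass through the inner loop is one application of the empirical analogue of $Q_d \cdots Q_1$ to the current additive fit.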

Lemma [norm of the operator $T$]. Suppose that condition A1 holds. Then $T : L_2(p) \to L_2(p)$ is a positive self-adjoint operator with operator norm $\gamma = \sup\{ \|Tf\| : \|f\| \le 1 \} < 1$. Hence, for every $m \in H$ we get

  $\|T^r m\| \le \gamma^r \|m\|.$   (39)

Furthermore, for every $m \in H$ there exist $m_j \in H_j$ ($1 \le j \le d$) such that $m(u) = m_1(u_1) + \dots + m_d(u_d)$ ($p$-a.s.) and, with a constant $c > 0$,

  $\|m\| \ge c \max\{ \|m_1\|, \dots, \|m_d\| \}.$   (40)

Proof of Lemma. We start by proving (39). It is known that (39) holds with

  $\gamma^2 \le 1 - \prod_{j=1}^{d-1} \sin^2(\varphi_j),$

where $\cos \varphi_j = \rho(H_j, H_{j+1} + \dots + H_d)$ and where for two subspaces $L_1$ and $L_2$ the quantity $\rho(L_1, L_2)$ is the cosine of the minimal angle between $L_1$ and $L_2$, i.e.,

  $\rho(L_1, L_2) = \sup\{ \int h_1(x)\,h_2(x)\,p(x)\,dx : h_j \in L_j \cap (L_1 \cap L_2)^\perp, \ \|h_j\| \le 1 \ (j = 1, 2) \}.$

This result was shown in Smith, Solomon, and Wagner (1977). For a discussion, see Deutsch (1985) and Bickel, Klaassen, Ritov and Wellner (1993), Appendix A.4. We will show now that for $1 \le j \le d$ the subspaces $M_j = H_1 + \dots + H_j$ are closed subsets of $L_2(p)$. This implies that $\rho(H_{j+1}, M_j) < 1$ for $j = 1, \dots, d-1$; see again Deutsch (1985), Lemma 2.5, and Bickel, Klaassen, Ritov and Wellner (1993), Appendix A.4, Proposition 2.

To prove that $M_j$ is closed we will use the following two facts. For two closed subspaces $L_1$ and $L_2$ of $L_2(p)$, the sum $L_1 + L_2$ is closed if and only if there exists a constant $c > 0$ such that for all $m \in L_1 + L_2$ there exist $m_1 \in L_1$ and $m_2 \in L_2$ with $m(u) = m_1(u_1) + m_2(u_2)$ ($p$-a.s.) and

  $\|m\| \ge c \max[\, \|m_1\|, \|m_2\| \,].$   (41)

Furthermore, $L_1 + L_2$ is closed if the projection of $L_2$ onto $L_1$ is compact. For the proof of these two statements see Bickel, Klaassen, Ritov and Wellner (1993), Appendix A.4, Proposition 2. Suppose


now that it has already been proved for $j \le j_0 - 1$ that $M_j$ is closed and that we want to show that $M_{j_0}$ is closed. As mentioned above, for this claim it suffices to show that $\Pi_{j_0}|_{M_{j_0-1}}$ is compact. We remark first that (41) implies that for every $m \in M_{j_0-1}$ there exist $m_j \in H_j$ ($j \le j_0 - 1$) such that $m(u) = m_1(u_1) + \dots + m_{j_0-1}(u_{j_0-1})$ ($p$-a.s.) and, with a constant $c > 0$,

  $\|m\| \ge c \max[\, \|m_1\|, \dots, \|m_{j_0-1}\| \,].$   (42)

We will prove that

  $\|\Pi_{j_0} m\|^2 \le \text{const.} \left[ \sum_{j=1}^{j_0-1} \int R_{j,j_0}^2(x_j, x_{j_0})\, p_j(x_j)\, p_{j_0}(x_{j_0})\,dx_j\,dx_{j_0} \right] \|m\|^2$   (43)

with

  $R_{j,j_0}(x_j, x_{j_0}) = \frac{p_{j,j_0}(x_j, x_{j_0})}{p_{j_0}(x_{j_0})\, p_j(x_j)}.$

Inequality (43) implies compactness of $\Pi_{j_0}|_{M_{j_0-1}}$. To see this one argues as in the standard proofs of compactness of Hilbert-Schmidt operators; see, e.g., Example 3.2.4 in Balakrishnan (1981).

It remains to show (43). This follows from (42) with applications of the Cauchy-Schwarz inequality. Equation (40) follows as (42).
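The contraction (39) and its dependence on the angle between subspaces can be seen in a finite-dimensional toy analogue, where the spaces $H_j^\perp$ are replaced by ordinary column spans and $T$ by a product of two orthogonal projection matrices. This is an illustration of the mechanism only, not part of the proof:

```python
import numpy as np

rng = np.random.default_rng(1)

def proj(A):
    """Orthogonal projection matrix onto the column span of A."""
    Q, _ = np.linalg.qr(A)
    return Q @ Q.T

# Two generic 2-dimensional subspaces of R^5: their intersection is {0},
# so the cosine of the minimal angle between them is strictly below 1.
P1 = proj(rng.normal(size=(5, 2)))
P2 = proj(rng.normal(size=(5, 2)))

T = P2 @ P1
gamma = np.linalg.norm(T, 2)   # operator norm of the product of projections
x = rng.normal(size=5)
```

By submultiplicativity, $\|T^r x\| \le \gamma^r \|x\|$, the finite-dimensional counterpart of (39).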

Proof of Theorem 1. The following lemma establishes the result.

Lemma [norm of the operator $\hat T$]. Suppose that conditions A1-A2 hold. Choose $\tilde\gamma$ with $\gamma < \tilde\gamma < 1$. Then, with probability tending to one, the operator norm $\sup\{ \|\hat T f\| : \|f\| \le 1 \}$ is bounded by $\tilde\gamma$.


Proof of Lemma. We remark first that the distance between $m_j^*$ and $\hat m_j^*$, see (37)-(38), can be bounded as follows:

  $\|\hat m_j^* - m_j^*\| \le \sum_{k \ne j} \|m_k\|\, S_{jk}$

with

  $S_{jk}^2 = \int \left[ \frac{p_{jk}(x_j, x_k)}{p_k(x_k)\, p_j(x_j)} - \frac{\hat p_{jk}(x_j, x_k)}{p_k(x_k)\, \hat p_j(x_j)} \right]^2 p_k(x_k)\, p_j(x_j)\,dx_j\,dx_k.$

With $S_j = \max_{k \ne j} |S_{jk}|$, this and equation (40) imply

  $\|\hat m_j^* - m_j^*\| \le \frac{d}{c}\, \|m\|\, S_j.$

Now, because of (A2), $S_j = o_P(1)$. This gives $\|\hat Q_j - Q_j\| = o_P(1)$. The statement of the lemma now follows from

  $\|\hat T - T\| = o_P(1).$

Lemma [stochastic expansion of $\tilde m$]. Suppose that conditions A1-A2 hold. Then there exist constants $0 < \gamma < 1$ and $C > 0$ such that, with probability tending to one, for $\tilde m$ the following stochastic expansion holds for $s \ge 1$:

  $\tilde m(x) = \sum_{r=0}^{s} \hat T_1^r\, \hat m_1(x) + \dots + \sum_{r=0}^{s} \hat T_d^r\, \hat m_d(x) + R^{[s]}(x),$

where $\hat T_j = \hat Q_j \hat Q_{j-1} \cdots \hat Q_1 \hat Q_d \hat Q_{d-1} \cdots \hat Q_{j+1}$ and $R^{[s]}(x) = R_1^{[s]}(x_1) + \dots + R_d^{[s]}(x_d)$ is a function in $H$ with

  $\|R_j^{[s]}\| \le C \gamma^s.$   (44)

Under the additional assumption of (A3) it holds that

  $\sup_{x_j} |R_j^{[s]}(x_j)| \le C \gamma^s.$   (45)

Proof of Lemma. We remark first that (11) can be rewritten as

  $\tilde m(x) = \hat Q_j \tilde m(x) + \hat m_j(x_j).$

Iterative application of this equation for $j = 1, \dots, d$ gives $\tilde m(x) = \hat T \tilde m(x) + \hat b(x)$, where

  $\hat b(x) = \hat Q_d \cdots \hat Q_2\, \hat m_1(x) + \dots + \hat Q_d\, \hat m_{d-1}(x) + \hat m_d(x_d).$

With the last equality we get the following expansion:

  $\tilde m(x) = \sum_{r=0}^{\infty} \hat T^r \hat b(x).$

Plugging the definition of $\hat b$ into this equation gives

  $\tilde m(x) = \sum_{r=0}^{\infty} \hat T_1^r\, \hat m_1(x) + \dots + \sum_{r=0}^{\infty} \hat T_d^r\, \hat m_d(x).$

The operator norms of $\hat T_1, \dots, \hat T_d$ are smaller than $\gamma$, with probability tending to one, for $\gamma < 1$ large enough. This follows from the last lemma, and it shows that the infinite series expansion in the last equation is well defined. Furthermore, it can be used to prove that for $C_1 > 0$ large enough, with probability tending to one, $\|R_j^{[s]}\| \le C_1 \gamma^s$. This implies claim (44) because of (40).

Assume now (A4). For the proof of (45) note that for $C_2 > 0$ large enough, with probability tending to one, for all functions $f, g$ in $H_j$ with $\sup_{x_j} |f(x_j)| \le 1$ and $\|g\| \le 1$ it holds for $k \ne j$ that

  $\left| \int \frac{\hat p_{jk}(x_j, x_k)}{\hat p_k(x_k)}\, f(x_j)\,dx_j \right| \le C_2,$   (46)

  $\left| \int \frac{\hat p_{jk}(x_j, x_k)}{\hat p_k(x_k)}\, g(x_j)\,dx_j \right| \le C_2.$   (47)

Equation (47) follows from assumption (A4) by application of the Cauchy-Schwarz inequality. Equations (46) and (47) imply that for $C_3 > 0$ large enough, with probability tending to one, for all functions $h$ in $H$ with $\|h\| \le 1$ it holds that

  $\sup_x |\hat T h(x)| \le C_3.$   (48)

Claim (45) can be shown by using (44) and (48).
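The expansion $\tilde m = \sum_r \hat T^r \hat b$ is a Neumann series for the fixed point of $\tilde m = \hat T \tilde m + \hat b$, and truncating it at $s$ leaves a tail of order $\gamma^s$, which is the mechanism behind the remainder bound (44). A generic finite-dimensional sketch of this (the matrix $T$ below is an arbitrary contraction, not the backfitting operator):

```python
import numpy as np

rng = np.random.default_rng(2)

# A contraction T with operator norm 0.9, standing in for T-hat,
# whose norm is below some gamma < 1 with probability tending to one.
M = rng.normal(size=(6, 6))
T = 0.9 * M / np.linalg.norm(M, 2)
gamma = np.linalg.norm(T, 2)
b = rng.normal(size=6)

# Exact fixed point of m = T m + b.
m_exact = np.linalg.solve(np.eye(6) - T, b)

def neumann(T, b, s):
    """Truncated Neumann series sum_{r=0}^{s} T^r b."""
    m, term = b.copy(), b.copy()
    for _ in range(s):
        term = T @ term
        m += term
    return m
```

The truncation error is bounded by the geometric tail $\gamma^{s+1} \|b\| / (1 - \gamma)$, so a logarithmic number of terms already gives polynomial accuracy, as exploited with $s = C \log n$ below.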

Proof of Theorem 2. The following lemma establishes the result.

Lemma [behaviour of the stochastic component of $\tilde m$]. Suppose (A1)-(A5). Then we have that

  $\sup_{x_j} |\tilde m_j^A(x_j) - \hat m_j^A(x_j)| = O_P(\log n / n^{1/2}).$   (49)

Proof of Lemma. Proceeding as in the last lemma we get, with $s = C \log n$ (where $C$ is chosen large enough),

  $\tilde m^A(x) - \hat m^A(x) = \sum_{r=1}^{s} \hat T_1^r\, \hat m_1^A(x) + \dots + \sum_{r=1}^{s} \hat T_d^r\, \hat m_d^A(x) + R^{[s]}(x),$

where $R^{[s]}(x) = R_1^{[s]}(x_1) + \dots + R_d^{[s]}(x_d)$ is a function in $H$ with

  $\sup_{x_j} |R_j^{[s]}(x_j)| \le C \gamma^s,$

which for $C$ chosen large enough is of smaller order than $n^{-1/2}$. It remains to show that

  $\sup_x |\hat T_1^r\, \hat m_1^A(x)| = O_P(\gamma^r \log n / n^{1/2}).$

This follows from assumption (A5) by arguments as in the proof of the last lemma.

Proofs of Theorems 1' and 2'. The theorems follow as Theorems 1 and 2 by essentially the same arguments. In particular, instead of $L_2(p)$ we consider now

  $L_2(V, p) = \{ f = (f_0, \dots, f_d) : f_j : R^d \to R \text{ with } \int f^T(x)\, V\, f(x)\, p(x)\,dx < \infty \}.$

Furthermore, the spaces $H$ and $H_j$ are now defined as

  $H = \{ m \in L_2(V, p) : m_0(x) = m_1(x_1) + \dots + m_d(x_d) \ (p\text{-a.s.}) \text{ for functions } m_1 \in L_2(p_1), \dots, m_d \in L_2(p_d),$
      $\int m_0(x)\,p(x)\,dx = 0, \text{ and for } j = 1, \dots, d \text{ the function } m_j \text{ depends only on } x_j \},$

  $H_j = \{ m \in H : m_0(x) = m_j(x_j) \ (p\text{-a.s.}) \text{ for a function } m_j \in L_2(p_j), \text{ and for } \ell \ne j \text{ it holds that } m_\ell(x) \equiv 0 \}.$

Note that again every function $f$ in $H$ is a sum of functions in the $H_j$: there exist functions $f_j : R \to R$ such that $x \mapsto (e_0 + e_j)\, f_j(x_j)$ is a function in $H_j$ and

  $f(x) = \sum_{j=1}^{d} (e_0 + e_j)\, f_j(x_j).$

Here, for $j = 0, \dots, d$, the vector $e_j$ denotes the $(j+1)$st unit vector of $R^{d+1}$. The operator $Q_j$ is now defined as in (36) with

  $m_j^*(x_j) = -\sum_{k \ne j} \int M_j^{-1}\, S_{jk}\, \frac{p_{jk}(x_j, x_k)}{p_j(x_j)}\, m_k(x_k)\,dx_k.$

Furthermore, we define the operator $\hat Q_j$ now as $Q_j$ but with $m_j^*(x_j)$ on the right hand side of (36) replaced by

  $\hat m_j^*(x_j) = -\sum_{k \ne j} \int \hat M_j^{-1}(x_j)\, \hat S_{jk}(x_j, x_k)\, m_k(x_k)\,dx_k.$

Proceeding as above, one can show that the norm of the operators $T = Q_d \cdots Q_1$ and $\hat T = \hat Q_d \cdots \hat Q_1$ is smaller than $\gamma < 1$ [with probability tending to one]. Theorems 1' and 2' follow by stochastic expansions of $\tilde m$; compare the last two lemmas.

Proof of Theorem 3. Let $\|g\|_\infty = \sup_x |g(x)|$. Then, under these conditions, $\|\hat p_{jk} - E(\hat p_{jk})\|_\infty = O(\cdots)$, and by (50) and (51), assumptions A2 and A4 are also satisfied by straightforward use of the geometric series expansion and the above result. Specifically, we have

  $\frac{1}{\hat p_j(x_j)} = \frac{1}{p_j(x_j)} - \frac{\hat p_j(x_j) - p_j(x_j)}{\hat p_j(x_j)\, p_j(x_j)}.$
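The displayed expansion is an exact algebraic identity (multiply through by $\hat p_j p_j$ to check); iterating it generates the geometric series in $(\hat p_j - p_j)/p_j$ mentioned above. A one-line numeric check, with purely hypothetical density values:

```python
# Hypothetical values of p_j-hat(x_j) and p_j(x_j) at a fixed point x_j.
p_hat, p = 0.42, 0.40

lhs = 1.0 / p_hat
rhs = 1.0 / p - (p_hat - p) / (p_hat * p)  # the expansion's right-hand side
```

Because the identity is exact, the two sides agree to machine precision whenever both densities are bounded away from zero.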

Likewise, assumption A3 is satisfied by B2, B3, and (52). By the triangle inequality, $\sup_x \cdots$. As for the first term, without loss of generality we can suppose that

  $\hat m_j^A(x_j) = n^{-1} \sum_{i=1}^{n} \cdots$

by a straightforward change of variables. The argument is now quite similar to that given in Masry (1996). We drop the $k$ subscript for convenience. Since the support of $X$ is compact, it can be covered by a finite number $c(n)$ of cubes $I_{n,r}$ with centres $x_r$ and side length $l(n)$. We then have

  $\sup_{x \in X} \cdots.$

To handle the second term we must use an exponential inequality and a blocking argument as in Masry's proof. In conclusion, by appropriate choice of $c(n)$ we obtain $Q_1 + Q_2 = O(\log n / n^{1/2})$ with probability one.


References

[1] Auestad, B. and Tjøstheim, D. (1991). Functional identification in nonlinear time series. In Nonparametric Functional Estimation and Related Topics, ed. G. Roussas, Kluwer Academic: Amsterdam, pp. 493-507.

[2] Balakrishnan, A. V. (1981). Applied Functional Analysis. Springer, New York, Heidelberg, Berlin.

[3] Bickel, P. J., Klaassen, C. A. J., Ritov, Y., and Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. The Johns Hopkins University Press, Baltimore and London.

[4] Bollerslev, T., Engle, R. F., and Nelson, D. B. (1994). ARCH models. In The Handbook of Econometrics, vol. IV, eds. D. F. McFadden and R. F. Engle III. North Holland.

[5] Deutsch, F. (1985). Rate of convergence of the method of alternating projections. In Parametric Optimization and Approximation, eds. B. Brosowski and F. Deutsch, pp. 96-107. Birkhäuser, Basel, Boston, Stuttgart.

[6] Fan, J., Mammen, E., and Härdle, W. (1996). Direct estimation of low dimensional components in additive models. Preprint.

[7] Härdle, W. (1990). Applied Nonparametric Regression. Cambridge: Cambridge University Press.

[8] Härdle, W. and Yang, L. (1996). Nonparametric autoregression with multiplicative volatility and additive mean. Forthcoming in J. Time Ser. Anal.

[9] Hastie, T. and Tibshirani, R. (1991). Generalized Additive Models. Chapman and Hall, London.

[10] Linton, O. B. (1996). Efficient estimation of additive nonparametric regression models. Biometrika, to appear.

[11] Linton, O. B. and Härdle, W. (1996). Estimating additive regression models with known links. Biometrika 83.

[12] Linton, O. B. and Nielsen, J. P. (1995). Estimating structured nonparametric regression by the kernel method. Biometrika 82, 93-101.

[13] Mammen, E., Marron, J. S., Turlach, B., and Wand, M. P. (1997). A general framework for smoothing. Preprint.

[14] Masry, E. (1996). Multivariate regression estimation: local polynomial fitting for time series. Stochastic Processes and their Applications 65, 81-101.

[15] Masry, E. (1996). Multivariate local polynomial regression for time series: uniform strong consistency and rates. J. Time Ser. Anal. 17, 571-599.

[16] Newey, W. K. (1994). Kernel estimation of partial means. Econometric Theory 10, 233-253.

[17] Nielsen, J. P. and Linton, O. B. (1997). An optimization interpretation of integration and backfitting estimators for separable nonparametric models. J. Roy. Statist. Soc., Ser. B, forthcoming.

[18] Opsomer, J. D. (1997). On the existence and asymptotic properties of backfitting estimators. Preprint.

[19] Opsomer, J. D. and Ruppert, D. (1997). Fitting a bivariate additive model by local polynomial regression. Ann. Statist. 25, 186-211.

[20] Robinson, P. M. (1983). Nonparametric estimators for time series. J. Time Ser. Anal. 4, 185-197.

[21] Rosenblatt, M. (1956). A central limit theorem and a strong mixing condition. Proc. Nat. Acad. Sci. 42, 43-47.

[22] Ruppert, D. and Wand, M. P. (1994). Multivariate locally weighted least squares regression. Ann. Statist. 22, 1346-1370.

[23] Smith, K. T., Solomon, D. C., and Wagner, S. L. (1977). Practical and mathematical aspects of the problem of reconstructing objects from radiographs. Bull. Amer. Math. Soc. 83, 1227-1270.

[24] Tjøstheim, D. and Auestad, B. (1994). Nonparametric identification of nonlinear time series: projections. J. Am. Stat. Assoc. 89, 1398-1409.
