SEMIPARAMETRIC PARTIALLY LINEAR ERRORS-IN-VARIABLES MODEL
Hua Liang, Wolfgang Härdle and Raymond J. Carroll
April 4, 1997
Abstract
We consider the partially linear model relating a response $Y$ to predictors $(X, T)$ with mean function $X^T\beta + g(T)$ when the $X$'s are measured with additive error. The semiparametric likelihood estimate of Severini and Staniswalis (1994) leads to biased estimates of both the parameter $\beta$ and the function $g(\cdot)$ when measurement error is ignored. We derive a simple modification of their estimator which is a semiparametric version of the usual parametric correction for attenuation. The resulting estimator of $\beta$ is shown to be consistent and its asymptotic distribution theory is derived. Consistent standard error estimates using sandwich-type ideas are also developed.
Key Words and Phrases: Errors-in-variables, functional relations, measurement error, nonparametric likelihood, orthogonal regression, partially linear model, semiparametric models, structural relations.
Short title: Partially Linear Models and Measurement Error.
AMS 1991 subject classification: Primary: 62J99, 62H12, 62E25, 62F10; Secondary: 62H25, 62F10, 62F12, 60F05.
Hua Liang is Associate Professor of Statistics, Institute of Systems Science, Chinese Academy of Sciences, Beijing 100080, China. Wolfgang Härdle is Professor of Econometrics at the Institut für Statistik und Ökonometrie, Humboldt-Universität zu Berlin, D-10178 Berlin, Germany. Raymond J. Carroll is Professor of Statistics, Nutrition and Toxicology, Department of Statistics, Texas A&M University, College Station, TX 77843-3143. This research was supported in part by Sonderforschungsbereich 373 "Quantifikation und Simulation Ökonomischer Prozesse". The first author was supported by the Alexander von Humboldt Foundation. The third author was supported by a grant from the National Cancer Institute (CA 57030) and by the Alexander von Humboldt Foundation. The authors would like to thank Stefan Profit for his comments.
1 INTRODUCTION
Consider the semiparametric partially linear model based on a sample of size $n$,
$$ Y_i = X_i^T\beta + g(T_i) + \varepsilon_i, \qquad (1) $$
where $X_i$ is a possibly vector-valued covariate, $T_i$ is a scalar covariate, the function $g(\cdot)$ is unknown, and the model errors $\varepsilon_i$ are independent with conditional mean zero given the covariates. The partially linear model was introduced by Engle, et al. (1986) to study the effect of weather on electricity demand, and further studied by Heckman (1986), Chen (1988), Speckman (1988), Cuzick (1992a,b), Liang & Härdle (1997) and Severini & Staniswalis (1994). We are interested in estimation of the unknown parameter
$\beta$ and unknown function $g(\cdot)$ in model (1) when the covariates $X$ are measured with error; instead of observing $X_i$, we observe
$$ W_i = X_i + U_i, \qquad (2) $$
where the measurement errors $U_i$ are independent and identically distributed, independent of $(Y_i, X_i, T_i)$, with mean zero and covariance matrix $\Sigma_{uu}$. We will assume that $\Sigma_{uu}$ is known, taking up the case that it is estimated in Section 5. The measurement error literature has been surveyed by Fuller (1987) and Carroll, et al. (1995). If the
$X$'s are observable, estimation of $\beta$ at ordinary rates of convergence can be obtained by a local-likelihood algorithm, as follows. For every fixed $\beta$, let $\hat g_\beta(T)$ be an estimator of $g(T)$. For example, in the Severini and Staniswalis implementation, $\hat g_\beta(T)$ maximizes a weighted likelihood assuming that the model errors $\varepsilon_i$ are homoscedastic and normally distributed, with the weights being kernel weights with symmetric kernel density function $K(\cdot)$ and bandwidth $h$. Having obtained $\hat g_\beta(T)$, $\beta$ is estimated by a least squares operation: minimize
$$ \sum_{i=1}^n \left\{ Y_i - X_i^T\beta - \hat g_\beta(T_i) \right\}^2. $$
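This least-squares step can be sketched numerically. The following is a minimal illustration, not code from the paper: it assumes Nadaraya-Watson smoothing with a quartic kernel on simulated data, and estimates $\beta$ by ordinary least squares after kernel-centering $Y$ and $X$ on $T$.

```python
import numpy as np

def nw_smooth(t_grid, T, V, h):
    """Nadaraya-Watson regression of V on T, evaluated at t_grid.
    V may be (n,) or (n, p); a quartic (biweight) kernel is used."""
    u = (t_grid[:, None] - T[None, :]) / h
    K = np.where(np.abs(u) <= 1, (15 / 16) * (1 - u**2) ** 2, 0.0)
    W = K / K.sum(axis=1, keepdims=True)      # rows sum to one
    return W @ V

rng = np.random.default_rng(0)
n, beta_true = 500, np.array([1.5])
T = np.sort(rng.uniform(0, 1, n))
X = (2 * T + rng.normal(0, 1, n))[:, None]    # X depends on T
g = np.sin(2 * np.pi * T)                     # unknown smooth function
Y = X @ beta_true + g + rng.normal(0, 0.3, n)

h = n ** (-1 / 5)                             # usual-order bandwidth
X_res = X - nw_smooth(T, T, X, h)             # X - ghat_x(T)
Y_res = Y - nw_smooth(T, T, Y, h)             # Y - ghat_y(T)
beta_hat = np.linalg.solve(X_res.T @ X_res, X_res.T @ Y_res)
print(beta_hat)                               # consistent for beta = 1.5
```

With the $X$'s observed, this partialling-out estimator attains the parametric rate without undersmoothing.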
In this particular case, the estimate for $\beta$ can be determined explicitly by a projected least squares algorithm. Let $\hat g_{yh}(\cdot)$ and $\hat g_{xh}(\cdot)$ be the kernel regressions with bandwidth $h$ of $Y$ and $X$ on $T$, respectively. Then
$$ \hat\beta = \left[ \sum_{i=1}^n \{X_i - \hat g_{xh}(T_i)\}\{X_i - \hat g_{xh}(T_i)\}^T \right]^{-1} \sum_{i=1}^n \{X_i - \hat g_{xh}(T_i)\}\{Y_i - \hat g_{yh}(T_i)\}. \qquad (3) $$
One of the important features of the estimator (3) is that it does not require undersmoothing, so that bandwidths of the usual order $h \sim n^{-1/5}$ lead to the result
$$ n^{1/2}(\hat\beta_n - \beta) \Rightarrow \mathrm{Normal}(0, B^{-1} C B^{-1}), \qquad (4) $$
where $B$ is the covariance matrix of $X - E(X|T)$ and $C$ is the covariance matrix of $\varepsilon\{X - E(X|T)\}$. The least squares form of (3) can be used to show that if one ignores measurement error and replaces $X$ by $W$, the resulting estimate is inconsistent for $\beta$. The form, though, suggests even more. It is well known that in linear regression, inconsistency caused by measurement error can be overcome by applying the so-called "correction for attenuation". In our context, this suggests that we use the estimator
$$ \hat\beta_n = \left[ \sum_{i=1}^n \{W_i - \hat g_{wh}(T_i)\}\{W_i - \hat g_{wh}(T_i)\}^T - n\Sigma_{uu} \right]^{-1} \sum_{i=1}^n \{W_i - \hat g_{wh}(T_i)\}\{Y_i - \hat g_{yh}(T_i)\}. \qquad (5) $$
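The effect of the correction can be seen in a small simulation (our own illustration, assuming $\Sigma_{uu}$ known and $X$ scalar): the naive estimator that simply replaces $X$ by $W$ is attenuated toward zero, while subtracting $n\Sigma_{uu}$ from the denominator as in (5) restores consistency.

```python
import numpy as np

def nw_smooth(T, V, h):
    """Nadaraya-Watson fit of V on T at the design points (Epanechnikov kernel)."""
    u = (T[:, None] - T[None, :]) / h
    K = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)
    return (K / K.sum(axis=1, keepdims=True)) @ V

rng = np.random.default_rng(1)
n, beta_true, sigma_uu = 2000, 1.0, 0.5 ** 2
T = rng.uniform(0, 1, n)
X = 1.0 + T + rng.normal(0, 1, n)
Y = X * beta_true + np.cos(2 * np.pi * T) + rng.normal(0, 0.2, n)
W = X + rng.normal(0, np.sqrt(sigma_uu), n)   # W = X + U, var(U) known

h = n ** (-1 / 5)
W_res = W - nw_smooth(T, W, h)                # W - ghat_w(T)
Y_res = Y - nw_smooth(T, Y, h)                # Y - ghat_y(T)

beta_naive = np.sum(W_res * Y_res) / np.sum(W_res**2)
beta_corr = np.sum(W_res * Y_res) / (np.sum(W_res**2) - n * sigma_uu)
print(beta_naive, beta_corr)   # naive is attenuated; corrected is near 1
```

Here $\mathrm{var}(X\mid T)=1$ and $\Sigma_{uu}=0.25$, so the naive estimate is biased toward $\beta/1.25$.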
The estimator (5) can be derived in much the same way as the Severini-Staniswalis estimator. For every $\beta$, let $\hat g_\beta(T)$ maximize the weighted likelihood ignoring measurement error, and then form $\hat\beta$ via a negatively penalized operation: minimize
$$ \sum_{i=1}^n \left\{ Y_i - W_i^T\beta - \hat g_\beta(T_i) \right\}^2 - n\beta^T\Sigma_{uu}\beta. \qquad (6) $$
The negative sign in the second term in (6) looks odd until one remembers that the effect of measurement error is attenuation, i.e., to underestimate $\beta$ in absolute value when it is scalar, and thus one must correct for attenuation by making $\hat\beta$ larger, not by shrinking it further towards zero.
In this paper, we analyze the estimate (5), showing that it is consistent and asymptotically normally distributed with a variance different from (4). Just as in the Severini-Staniswalis algorithm, in kernel weighting ordinary bandwidths of order $h \sim n^{-1/5}$ may be used.
The outline of the paper is as follows. In Section 2, we define the weighting scheme to be used and hence the estimators of $\beta$ and $g(\cdot)$. Section 3 is the statement of the main results for $\beta$, while the results for $g(\cdot)$ are stated in Section 4. Section 5 states the corresponding results when the measurement error variance $\Sigma_{uu}$ is estimated. Section 6 gives a numerical illustration. Final remarks are given in Section 7. All proofs are delayed until the appendix.
2 DEFINITION OF THE ESTIMATORS
For technical convenience we will assume that the $T_i$ are confined to the interval $[0,1]$. Throughout, we shall employ $C$ ($0 < C < \infty$) to denote a constant not depending on $n$, which may take different values at each appearance. In our proofs and statements of results, we will let the $X$'s be unknown fixed constants, a situation which is commonly called the functional relation; see Kendall & Stuart (1992) and Anderson (1984). The results apply immediately to the case that the $X$'s are independent random variables; see Section 7. Let
$\omega_{ni}(t) = \omega_{ni}(t; T_1, \ldots, T_n)$ be probability weight functions depending only on the design points $T_1, \ldots, T_n$. For example,
$$ \omega_{ni}(t) = \frac{1}{h_n} \int_{s_{i-1}}^{s_i} K\!\left(\frac{t-s}{h_n}\right) ds, \qquad 1 \le i \le n, \qquad (7) $$
where $s_0 = 0$, $s_n = 1$ and $s_i = (1/2)(T_i + T_{i+1})$ for $1 \le i \le n-1$. Here $h_n$ is a sequence of bandwidth parameters which tends to zero as $n \to \infty$, and $K(\cdot)$ is a kernel function with compact support satisfying
$$ \mathrm{supp}(K) = [-1,1], \quad \sup_x |K(x)| \le C < \infty, \quad \int K(u)\,du = 1, \quad K(u) = K(-u). $$
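The weights in (7) are kernel integrals over the cells $[s_{i-1}, s_i]$, so they can be computed exactly from the kernel's antiderivative. A sketch assuming the Epanechnikov kernel (our choice; the paper only requires the conditions above):

```python
import numpy as np

def kernel_cdf(u):
    """CDF of the Epanechnikov kernel K(u) = 0.75*(1 - u^2) on [-1, 1]."""
    u = np.clip(u, -1.0, 1.0)
    return 0.25 * (2 + 3 * u - u**3)

def weights(t, T, h):
    """Probability weights of display (7):
    omega_ni(t) = (1/h) * int_{s_{i-1}}^{s_i} K((t - s)/h) ds."""
    T = np.sort(T)
    s = np.concatenate(([0.0], 0.5 * (T[:-1] + T[1:]), [1.0]))  # cell boundaries s_0..s_n
    # substituting u = (t - s)/h turns each cell integral into a CDF difference
    return kernel_cdf((t - s[:-1]) / h) - kernel_cdf((t - s[1:]) / h)

rng = np.random.default_rng(2)
T = np.sort(rng.uniform(0, 1, 200))
w = weights(0.5, T, h=200 ** (-1 / 5))
print(w.sum())   # sums to 1 away from the boundary
```

Because the kernel CDF is monotone, each weight is nonnegative, and away from the boundary they telescope to exactly one.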
In the paper, for any sequence of variables or functions $(S_1, \ldots, S_n)$, we always denote $S^T = (S_1, \ldots, S_n)$, $\tilde S_i = S_i - \sum_{j=1}^n \omega_{nj}(T_i) S_j$ and $\tilde S^T = (\tilde S_1, \ldots, \tilde S_n)$. For example, $\tilde W^T = (\tilde W_1, \ldots, \tilde W_n)$ with $\tilde W_i = W_i - \sum_{j=1}^n \omega_{nj}(T_i) W_j$, and $\tilde g_i = g(T_i) - \sum_{k=1}^n \omega_{nk}(T_i) g(T_k)$ with $\tilde G = (\tilde g_1, \ldots, \tilde g_n)^T$.
The fact that $g(t) = E(Y_i - X_i^T\beta \mid T = t) = E(Y_i - W_i^T\beta \mid T = t)$ suggests
$$ \hat g_n(t) = \sum_{j=1}^n \omega_{nj}(t) (Y_j - W_j^T \hat\beta_n) \qquad (8) $$
as the estimator of $g(t)$. In some cases, it may be reasonable to assume that the model errors
$\varepsilon_i$ are homoscedastic with common variance $\sigma^2$. In this event, since $E\{Y_i - X_i^T\beta - g(T_i)\}^2 = \sigma^2$ and $E\{Y_i - W_i^T\beta - g(T_i)\}^2 = E\{Y_i - X_i^T\beta - g(T_i)\}^2 + \beta^T\Sigma_{uu}\beta$, we define
$$ \hat\sigma_n^2 = n^{-1} \sum_{i=1}^n (\tilde Y_i - \tilde W_i^T \hat\beta_n)^2 - \hat\beta_n^T \Sigma_{uu} \hat\beta_n \qquad (9) $$
as the estimator of $\sigma^2$.
3 MAIN RESULTS
We make the following assumptions.
Assumption 1.1. There exist functions $h_j(\cdot)$ defined on $[0,1]$ such that the $j$th component of $X_i$, namely $X_{ij}$, satisfies $X_{ij} = h_j(T_i) + V_{ij}$, $1 \le i \le n$, $1 \le j \le p$, where the $V_{ij}$ are real numbers satisfying $\lim_{n\to\infty} n^{-1}\sum_{i=1}^n V_i = 0$ and $\lim_{n\to\infty} n^{-1}\sum_{i=1}^n V_i V_i^T = B$, a positive definite matrix, where $V_i = (V_{i1}, \ldots, V_{ip})^T$.
Assumption 1.2. $g(\cdot)$ and the $h_j(\cdot)$ are Lipschitz continuous of order 1.
Assumption 1.3. The weight functions $\omega_{ni}(\cdot)$ satisfy:
(i) $\max_{1\le i\le n} \sum_{j=1}^n \omega_{ni}(T_j) = O(1)$,
(ii) $\max_{1\le i,j\le n} \omega_{ni}(T_j) = O(b_n)$,
(iii) $\max_{1\le i\le n} \sum_{j=1}^n \omega_{nj}(T_i) I(|T_j - T_i| > c_n) = O(c_n)$,
where $b_n = n^{-4/5}$ and $c_n = n^{-1/5}\log n$.
Our two main results concern the limit distributions of the estimates of $\beta$ and $\sigma^2$.
THEOREM 3.1. Suppose Assumptions 1.1-1.3 hold and $E(\varepsilon^4 + \|U\|^4) < \infty$. Then $\hat\beta_n$ is an asymptotically normal estimator, i.e. $n^{1/2}(\hat\beta_n - \beta) \Rightarrow N(0, B^{-1}\Gamma B^{-1})$, where
$$ \Gamma = E\left[ (\varepsilon - U^T\beta)^2 \{X - E(X|T)\}\{X - E(X|T)\}^T \right] + E\{(UU^T - \Sigma_{uu})\beta\beta^T(UU^T - \Sigma_{uu})\} + E(UU^T\varepsilon^2). $$
Note that $\Gamma = E(\varepsilon - U^T\beta)^2 B + E\{(UU^T - \Sigma_{uu})\beta\beta^T(UU^T - \Sigma_{uu})\} + \Sigma_{uu}\sigma^2$ if $\varepsilon$ is homoscedastic and independent of $(X,T)$.
THEOREM 3.2. Suppose the conditions of Theorem 3.1 hold, and that the $\varepsilon$'s are homoscedastic with variance $\sigma^2$ and independent of $(X,T)$. Then $n^{1/2}(\hat\sigma_n^2 - \sigma^2) \Rightarrow N(0, \sigma_*^2)$, where $\sigma_*^2 = E\{(\varepsilon - U^T\beta)^2 - (\beta^T\Sigma_{uu}\beta + \sigma^2)\}^2$.
Remarks
As described in the introduction, an important aspect of the results of Severini and Staniswalis is that their methods lead to asymptotically normal parameter estimates in kernel regression even with bandwidths of the usual order $h_n \sim n^{-1/5}$. The same holds for our estimators in general. For example, suppose that the design points $T_i$ are such that there exist constants $M_1, M_2$ satisfying
$$ M_1/n \le \min_{i\le n} |T_i - T_{i-1}| \le \max_{i\le n} |T_i - T_{i-1}| \le M_2/n. $$
Then Assumptions 1.3(i)-(iii) are satisfied by simple verification.
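These orders are also easy to check numerically for an equispaced design; a sketch using the integrated-kernel weights of (7) with the Epanechnikov kernel (our own illustration):

```python
import numpy as np

def kernel_cdf(u):
    """CDF of the Epanechnikov kernel 0.75*(1 - u^2) on [-1, 1]."""
    u = np.clip(u, -1.0, 1.0)
    return 0.25 * (2 + 3 * u - u**3)

for n in (200, 800, 3200):
    T = (np.arange(1, n + 1) - 0.5) / n          # equispaced design, M1 = M2 = 1
    s = np.concatenate(([0.0], 0.5 * (T[:-1] + T[1:]), [1.0]))
    h = n ** (-1 / 5)
    # full matrix of omega_ni(T_j), computed as CDF differences over cells
    Wmat = (kernel_cdf((T[:, None] - s[None, :-1]) / h)
            - kernel_cdf((T[:, None] - s[None, 1:]) / h))
    # max weight scaled by n^{4/5} stays bounded (Assumption 1.3(ii));
    # row sums stay O(1) (Assumption 1.3(i))
    print(n, Wmat.max() * n ** (4 / 5), Wmat.sum(axis=1).max())
```

The scaled maximum weight stabilizes near $K(0)=0.75$, consistent with $b_n = n^{-4/5}$ when $h_n = n^{-1/5}$.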
It is relatively easy to estimate the covariance matrix of $\hat\beta_n$. Let $\dim(X)$ be the number of components of $X$. A consistent estimate of $B$ is just
$$ \{n - \dim(X)\}^{-1} \sum_{i=1}^n \{W_i - \hat g_{wh}(T_i)\}\{W_i - \hat g_{wh}(T_i)\}^T - \Sigma_{uu}. $$
In the general case, one can use (30) to construct a consistent sandwich-type estimate of $\Gamma$, namely
$$ n^{-1} \sum_{i=1}^n \left\{ \tilde W_i (\tilde Y_i - \tilde W_i^T \hat\beta_n) + \Sigma_{uu}\hat\beta_n \right\} \left\{ \tilde W_i (\tilde Y_i - \tilde W_i^T \hat\beta_n) + \Sigma_{uu}\hat\beta_n \right\}^T. $$
In the homoscedastic case, namely that $\varepsilon$ is independent of $(X, T, U)$ with variance $\sigma^2$, and with $U$ being normally distributed, a different formula can be used. Let $C(\beta) = E\{(UU^T - \Sigma_{uu})\beta\beta^T(UU^T - \Sigma_{uu})\}$. Then a consistent estimate of $\Gamma$ is
$$ (\hat\sigma_n^2 + \hat\beta_n^T \Sigma_{uu} \hat\beta_n)\hat B_n + \hat\sigma_n^2 \Sigma_{uu} + C(\hat\beta_n). $$
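The sandwich estimate of $\Gamma$ is straightforward to compute from the centred quantities. A self-contained sketch in the scalar-$X$ case, on simulated data (all settings are our own illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta, sigma_uu = 3000, 1.0, 0.25
T = rng.uniform(0, 1, n)
X = T + rng.normal(0, 1, n)
Y = X * beta + np.sin(2 * np.pi * T) + rng.normal(0, 0.5, n)
W = X + rng.normal(0, np.sqrt(sigma_uu), n)

# kernel-centred W~ and Y~ (Nadaraya-Watson, Epanechnikov kernel)
h = n ** (-1 / 5)
u = (T[:, None] - T[None, :]) / h
K = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)
S = K / K.sum(axis=1, keepdims=True)
W_t, Y_t = W - S @ W, Y - S @ Y

# correction-for-attenuation estimate (5) and sandwich standard error
B_hat = np.mean(W_t**2) - sigma_uu                 # estimate of B
beta_hat = np.sum(W_t * Y_t) / (np.sum(W_t**2) - n * sigma_uu)
score = W_t * (Y_t - W_t * beta_hat) + sigma_uu * beta_hat
Gamma_hat = np.mean(score**2)                      # sandwich middle term
se = np.sqrt(Gamma_hat / B_hat**2 / n)             # sqrt(B^-1 Gamma B^-1 / n)
print(beta_hat, se)
```

The same computation carries over to vector $X$ by replacing the scalar products with matrix ones.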
In the classical functional model, instead of obtaining an estimate of $\Sigma_{uu}$ through replication, it is instead assumed that the ratio of $\Sigma_{uu}$ to $\sigma^2$ is known. Without loss of generality, we set this ratio equal to the identity matrix. The resulting analogue of the parametric estimators in the partially linear model is to solve the minimization problem
$$ \sum_{i=1}^n \left| \frac{\tilde Y_i - \tilde W_i^T\beta}{\sqrt{1 + \|\beta\|^2}} \right|^2 = \min!, $$
where here and in the sequel $\|\cdot\|$ denotes the Euclidean norm. One can use the techniques of this paper to show that this estimator is consistent and asymptotically normally distributed. The asymptotic variance of the estimate of $\beta$ in this case, when $\varepsilon$ is independent of $(X,T)$, can be shown to equal
$$ B^{-1}\left[ (1 + \|\beta\|^2)^2 \sigma^2 B + \frac{E\{(\varepsilon - U^T\beta)^2 \Gamma_1 \Gamma_1^T\}}{1 + \|\beta\|^2} \right] B^{-1}, \qquad \text{where } \Gamma_1 = (1 + \|\beta\|^2) U + (\varepsilon - U^T\beta)\beta. $$
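In the scalar case this criterion is classical orthogonal (total least squares) regression applied to the centred data, and the minimizer has a closed form obtained by setting the derivative of the criterion to zero. A simulated sketch (our own illustration, with equal error variances as assumed above):

```python
import numpy as np

rng = np.random.default_rng(5)
n, beta = 3000, 1.3
T = rng.uniform(0, 1, n)
X = T + rng.normal(0, 1, n)
sigma2 = 0.2 ** 2                       # common variance of eps and U (known ratio = 1)
Y = X * beta + np.sin(2 * np.pi * T) + rng.normal(0, np.sqrt(sigma2), n)
W = X + rng.normal(0, np.sqrt(sigma2), n)

# centre W and Y by Nadaraya-Watson fits on T
h = n ** (-1 / 5)
u = (T[:, None] - T[None, :]) / h
K = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)
S = K / K.sum(axis=1, keepdims=True)
W_t, Y_t = W - S @ W, Y - S @ Y

# stationary point of sum (Y~ - W~ b)^2 / (1 + b^2):
# Swy * b^2 + (Sww - Syy) * b - Swy = 0, take the minimizing root
Sww, Syy, Swy = np.sum(W_t**2), np.sum(Y_t**2), np.sum(W_t * Y_t)
beta_hat = ((Syy - Sww) + np.sqrt((Syy - Sww) ** 2 + 4 * Swy**2)) / (2 * Swy)
print(beta_hat)
```

This is the familiar orthogonal-regression slope formula, here applied after partialling out $T$.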
4 ASYMPTOTIC RESULTS FOR THE NONPARAMETRIC PART
THEOREM 4.1. Suppose Assumptions 1.1-1.3 hold, the $\omega_{ni}(t)$ are Lipschitz continuous of order 1 for all $i = 1, \ldots, n$, and $E(\varepsilon^4 + \|U\|^4) < \infty$. Then for fixed $t$, the asymptotic bias and asymptotic variance of $\hat g_n(t)$ are, respectively, $\sum_{i=1}^n \omega_{ni}(t) g(T_i) - g(t)$ and $\sum_{i=1}^n \omega_{ni}^2(t) (\beta^T\Sigma_{uu}\beta + \sigma^2)$. These are both of order $O(n^{-2/5})$ for kernel estimators.
If the $(X_i, T_i)$ are random, then the bias and variance formulae are the usual ones for nonparametric kernel regression.
5 THE MEASUREMENT ERROR VARIANCE ESTIMATED
Although in some cases the measurement error covariance matrix $\Sigma_{uu}$ has been established by independent experiments, in others it is unknown and must be estimated. The usual method of doing so is by partial replication, so that we observe $W_{ij} = X_i + U_{ij}$, $j = 1, \ldots, m_i$. We consider here only the usual case that $m_i \le 2$, and assume that a fraction $\delta$ of the data has such replicates. Let $\overline W_i$ be the sample mean of the replicates. Then a consistent, unbiased method of moments estimate for $\Sigma_{uu}$ is
$$ \hat\Sigma_{uu} = \frac{\sum_{i=1}^n \sum_{j=1}^{m_i} (W_{ij} - \overline W_i)(W_{ij} - \overline W_i)^T}{\sum_{i=1}^n (m_i - 1)}. $$
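This method-of-moments estimate is easy to compute from within-subject replicate differences; a simulated sketch (our own illustration, scalar $W$ with about half the subjects replicated):

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma_uu = 4000, 0.09                     # true measurement error variance
X = rng.normal(0, 1, n)
m = np.where(rng.uniform(size=n) < 0.5, 2, 1)   # m_i = 2 for the replicated fraction
W1 = X + rng.normal(0, np.sqrt(sigma_uu), n)
W2 = X + rng.normal(0, np.sqrt(sigma_uu), n)    # only used when m_i = 2

rep = m == 2
Wbar = np.where(rep, 0.5 * (W1 + W2), W1)       # sample mean of the replicates
# numerator: sum over subjects and replicates of (W_ij - Wbar_i)^2
num = np.sum((W1[rep] - Wbar[rep]) ** 2 + (W2[rep] - Wbar[rep]) ** 2)
sigma_hat = num / np.sum(m - 1)                 # divide by sum(m_i - 1)
print(sigma_hat)
```

Each replicated pair contributes $(W_{i1}-W_{i2})^2/2$, which has mean $\Sigma_{uu}$, so the estimate is unbiased.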
The estimator changes only slightly to accommodate the replicates, becoming
$$ \hat\beta_n = \left[ \sum_{i=1}^n \{\overline W_i - \hat g_{\bar w h}(T_i)\}\{\overline W_i - \hat g_{\bar w h}(T_i)\}^T - n(1 - \delta/2)\hat\Sigma_{uu} \right]^{-1} \sum_{i=1}^n \{\overline W_i - \hat g_{\bar w h}(T_i)\}\{Y_i - \hat g_{yh}(T_i)\}, \qquad (10) $$
where $\hat g_{\bar w h}(\cdot)$ is the kernel regression of the $\overline W_i$'s on $T_i$. Using the techniques in the appendix, one can show that the limit distribution of (10) is $\mathrm{Normal}(0, B^{-1}\Gamma_2 B^{-1})$, with
$$ \begin{aligned} \Gamma_2 = {}& (1-\delta)\, E\left[ (\varepsilon - U^T\beta)^2 \{X - E(X|T)\}\{X - E(X|T)\}^T \right] \\ & + \delta\, E\left[ (\varepsilon - \overline U^T\beta)^2 \{X - E(X|T)\}\{X - E(X|T)\}^T \right] \\ & + (1-\delta)\, E\left[ \{UU^T - (1-\delta/2)\Sigma_{uu}\}\beta\beta^T\{UU^T - (1-\delta/2)\Sigma_{uu}\} + UU^T\varepsilon^2 \right] \\ & + \delta\, E\left[ \{\overline U\,\overline U^T - (1-\delta/2)\Sigma_{uu}\}\beta\beta^T\{\overline U\,\overline U^T - (1-\delta/2)\Sigma_{uu}\} + \overline U\,\overline U^T\varepsilon^2 \right]. \qquad (11) \end{aligned} $$
In (11), $\overline U$ refers to the mean of two $U$'s. In the case that $\varepsilon$ is independent of $(X,T)$, the sum of the first two terms simplifies to $\{\sigma^2 + (1-\delta/2)\beta^T\Sigma_{uu}\beta\}B$.
Standard error estimates can also be derived. A consistent estimate of $B$ is
$$ \hat B_n = \{n - \dim(X)\}^{-1} \sum_{i=1}^n \{\overline W_i - \hat g_{\bar w h}(T_i)\}\{\overline W_i - \hat g_{\bar w h}(T_i)\}^T - (1-\delta/2)\hat\Sigma_{uu}. $$
Estimates of $\Gamma_2$ are also easily developed. In the homoscedastic case with normal errors, the sum of the first two terms is estimated by $\{\hat\sigma_n^2 + (1-\delta/2)\hat\beta_n^T \hat\Sigma_{uu} \hat\beta_n\}\hat B_n$. The sum of the last two terms is a deterministic function of $(\beta, \sigma^2, \Sigma_{uu})$, and these estimates are simply substituted into the formula. A general sandwich-type estimator is developed as follows. Define
$\zeta = n^{-1}\sum_{i=1}^n m_i^{-1}$, and define
$$ R_i = \tilde{\overline W}_i (\tilde Y_i - \tilde{\overline W}_i^T \hat\beta_n) + \hat\Sigma_{uu}\hat\beta_n/m_i + (\delta/\zeta)(m_i - 1)\left\{ (1/2)(W_{i1} - W_{i2})(W_{i1} - W_{i2})^T - \hat\Sigma_{uu} \right\}\hat\beta_n. $$
Then a consistent estimate of $\Gamma_2$ is the sample covariance matrix of the $R_i$'s.
[Figure: kernel fit of average SBP on patient age, ages 40-60.]
Figure 1: Estimate of the function $g(T)$ in the Framingham data ignoring measurement error.
6 NUMERICAL EXAMPLE
To illustrate the method, we consider data from the Framingham Heart Study. We considered $n = 1615$ males with $Y$ being their average blood pressure in a fixed 2-year period, $T$ being their age and $W$ being the logarithm of the observed cholesterol level, for which there are two replicates.
We did two analyses. In the first, we used both cholesterol measurements, so that in the notation of Section 5, $\delta = 1$. In this analysis, there is not a great deal of measurement error. Thus, in our second analysis, which is given for illustrative purposes, we used only the first cholesterol measurement, but fixed the measurement error variance at the value obtained in the first analysis, in which case $\delta = 0$. For nonparametric fitting, we chose the bandwidth using cross-validation to predict the response. More precisely, we computed the squared error over a geometric sequence of 191 bandwidths ranging in $[1, 20]$, and selected the bandwidth minimizing the squared error among these 191 candidates. An analysis ignoring measurement error found some curvature in $T$; see Figure 1 for the estimate of $g(T)$.
We computed each of the four resulting cases using XploRe (see Härdle, et al. (1995)). Our results are as follows. First consider the case that the measurement error variance was estimated, with both cholesterol values used to estimate $\Sigma_{uu}$. The estimator of $\beta$ ignoring measurement error was 9.438, with estimated standard error 0.187. When we accounted for measurement error, the estimate increased to $\hat\beta = 12.540$, and the standard error increased to 0.195.
In the second analysis, we fixed the measurement error variance and used only the first cholesterol value. The estimator of $\beta$ ignoring measurement error was 10.744, with estimated standard error 0.492. When we accounted for measurement error, the estimate increased to $\hat\beta = 13.690$, and the standard error increased to 0.495.
7 DISCUSSION
Our results have been phrased as if the $X$'s were fixed constants. If they are random variables, the proofs simplify and the same results are obtained, with now $V_i = X_i - E(X_i|T_i)$.
The nonparametric regression estimator (8) is based on locally weighted averages. In the random-$X$ context, the same results apply if (8) is replaced by a locally linear kernel regression estimator.
If we ignore measurement error, the estimator of $\beta$ is given by (3) but with the unobserved $X$ replaced by the observed $W$. This differs from the correction-for-attenuation estimator (5) by a simple factor which is the inverse of the reliability matrix (Gleser, 1992). In other words, the estimator which ignores measurement error is multiplied by the inverse of the reliability matrix to produce a consistent estimate of $\beta$. This same algorithm is widely employed in parametric measurement error problems for generalized linear models, where it is often known as an example of regression calibration (see Carroll, et al., 1995, for discussion and references). The use of regression calibration in our semiparametric context thus appears to hold promise when (1) is replaced by a semiparametric generalized linear model.
We have treated the case that the parametric part $X$ of the model has measurement error and the nonparametric part $T$ is measured exactly. An interesting problem is to interchange the roles of $X$ and $T$, so that the parametric part is measured exactly and the nonparametric part is measured with error, i.e., $E(Y \mid X, T) = T\beta + g(X)$. Fan and Truong (1993) have shown in this case that with normally distributed measurement error, the nonparametric function $g(\cdot)$ can be estimated only at logarithmic rates, and not with rate $n^{-2/5}$. We conjecture even so that $\beta$ is estimable at parametric rates, but this remains an open problem.
REFERENCES
Anderson, T.W. (1984). Estimating Linear Statistical Relationships. Annals of Statistics, 12, 1-45.
Carroll, R.J., Ruppert, D. and Stefanski, L.A. (1995), Nonlinear Measurement Error Models. Chap- man and Hall, New York.
Chen, H. (1988). Convergence Rates for Parametric Components in a Partly Linear Model. Annals of Statistics, 16, 136-146.
Cuzick, J. (1992a). Semiparametric Additive Regression. Journal of the Royal Statistical Society, Series B, 54, 831-843.
Cuzick, J. (1992b). Efficient Estimates in Semiparametric Additive Regression Models with Unknown Error Distribution. Annals of Statistics, 20, 1129-1136.
Engle, R. F., Granger, C.W.J., Rice, J. and Weiss, A. (1986). Semiparametric estimates of the relation between weather and electricity sales. Journal of the American Statistical Association, 81, 310-320.
Fan, J. and Truong, Y. K. (1993). Nonparametric regression with errors in variables. Annals of Statistics, 21, 1900-1925.
Fuller, W. A. (1987). Measurement Error Models. Wiley, New York.
Gleser, L.J. (1992). The importance of assessing measurement reliability in multivariate regression.
Journal of the American Statistical Association, 87, 696-707.
Härdle, W., Klinke, S. and Turlach, B.A. (1995). XploRe: An Interactive Statistical Computing Environment. Springer-Verlag.
Heckman, N. E. (1986). Spline smoothing in partly linear models. Journal of the Royal Statistical Society, Series B, 48, 244-248.
Kendall, M. and Stuart, A. (1992). The Advanced Theory of Statistics 2, 4th ed. Charles Griffin, London.
Liang, H. and Härdle, W. (1997). Asymptotic normality of parametric part in partially linear heteroscedastic regression models. Manuscript.
Speckman, P. (1988). Kernel Smoothing in Partial Linear Models. Journal of the Royal Statistical Society, Series B, 50, 413-436.
Severini, T.A. and Staniswalis, J.G. (1994). Quasilikelihood Estimation in Semiparametric Models.
Journal of the American Statistical Association, 89, 501-511.
8 APPENDIX
In this appendix, we prove several required lemmas. Lemma A.1 provides bounds for $h_j(T_i) - \sum_{k=1}^n \omega_{nk}(T_i) h_j(T_k)$ and $g(T_i) - \sum_{k=1}^n \omega_{nk}(T_i) g(T_k)$. The proof is immediate.
Lemma A.1. Suppose that Assumptions 1.1 and 1.3(iii) hold. Then
$$ \max_{1\le i\le n} \Big| G_j(T_i) - \sum_{k=1}^n \omega_{nk}(T_i) G_j(T_k) \Big| = O(c_n) \qquad \text{for } j = 0, \ldots, p, $$
where $G_0(\cdot) = g(\cdot)$ and $G_l(\cdot) = h_l(\cdot)$ for $l = 1, \ldots, p$.
Lemma A.2. If Assumptions 1.1-1.3 hold, then
$$ \lim_{n\to\infty} n^{-1} \tilde X^T \tilde X = B. $$
Proof. Denote $h_{ns}(T_i) = h_s(T_i) - \sum_{k=1}^n \omega_{nk}(T_i) X_{ks}$. It follows from $X_{js} = h_s(T_j) + V_{js}$ that the $(s,m)$ element of $\tilde X^T \tilde X$ ($s, m = 1, \ldots, p$) is
$$ \sum_{j=1}^n \tilde X_{js}\tilde X_{jm} = \sum_{j=1}^n V_{js}V_{jm} + \sum_{j=1}^n h_{ns}(T_j)V_{jm} + \sum_{j=1}^n h_{nm}(T_j)V_{js} + \sum_{j=1}^n h_{ns}(T_j)h_{nm}(T_j) \overset{\mathrm{def}}{=} \sum_{j=1}^n V_{js}V_{jm} + \sum_{q=1}^3 R^{(q)}_{nsm}. $$
The strong law of large numbers implies that $\lim_{n\to\infty} n^{-1}\sum_{i=1}^n V_i V_i^T = B$, and Lemma A.1 entails $R^{(3)}_{nsm} = o(n)$, which together with the Cauchy-Schwarz inequality shows that $R^{(1)}_{nsm} = o(n)$ and $R^{(2)}_{nsm} = o(n)$. This completes the proof of the lemma.
Lemma A.3.
(Bennett's inequality) Let $\Gamma_1, \ldots, \Gamma_n$ be independent random variables with zero means and bounded ranges: $|\Gamma_i| \le M$. Then for each $\eta > 0$,
$$ P\Big\{ \Big| \sum_{i=1}^n \Gamma_i \Big| > \eta \Big\} \le 2\exp\Big[ -\eta^2 \Big/ \Big\{ 2\Big( \sum_{i=1}^n \mathrm{var}(\Gamma_i) + M\eta \Big) \Big\} \Big]. $$
Denote $\varepsilon'_j = \varepsilon_j I(|\varepsilon_j| \le n^{1/4})$ and $\varepsilon''_j = \varepsilon_j - \varepsilon'_j$, $j = 1, \ldots, n$. We next establish several results for nonparametric regression.
Lemma A.4.
Assume that Assumption 1.3 holds and that $E\varepsilon = 0$ and $E\varepsilon^4 < \infty$. Then
$$ \max_{1\le i\le n} \Big| \sum_{k=1}^n \omega_{nk}(T_i)\varepsilon_k \Big| = O_P(n^{-1/4}\log^{-1/2} n). $$
Proof. Let $M = C b_n n^{1/4}$. Lemma A.3 implies
$$ P\Big( \max_{1\le i\le n} \Big| \sum_{j=1}^n \omega_{nj}(T_i)(\varepsilon'_j - E\varepsilon'_j) \Big| > C_1 n^{-1/4}\log^{-1/2} n \Big) \le \sum_{i=1}^n P\Big\{ \Big| \sum_{j=1}^n \omega_{nj}(T_i)(\varepsilon'_j - E\varepsilon'_j) \Big| > C_1 n^{-1/4}\log^{-1/2} n \Big\} $$
$$ \le 2n \exp\Big\{ -\frac{C_1^2\, n^{-1/2}\log^{-1} n}{\sum_{j=1}^n \omega_{nj}^2(T_i) E\varepsilon_j'^2 + 2 b_n \log^{-1/2} n} \Big\} \le 2n \exp(-C_1^2 C \log n) \longrightarrow 0 $$
for some large $C_1 > 0$, which implies that
$$ \max_{1\le i\le n} \Big| \sum_{j=1}^n \omega_{nj}(T_i)(\varepsilon'_j - E\varepsilon'_j) \Big| = O_P(n^{-1/4}\log^{-1/2} n). \qquad (12) $$
On the other hand, we know that
$$ \max_{1\le i\le n} \sum_{j=1}^n \omega_{nj}(T_i)|E\varepsilon''_j| \le \max_{1\le k\le n}\max_{1\le i\le n} \omega_{nk}(T_i) \sum_{j=1}^n n^{-1}E|\varepsilon_j|^4 \le C n^{-2/3} E|\varepsilon|^4 = O(n^{-1/2}) = O_P(n^{-1/4}\log^{-1/2} n) \qquad (13) $$
and
$$ \sum_{j=1}^n E|\varepsilon''_j| \le n^{-1}\sum_{j=1}^n E\varepsilon_j^4 = E\varepsilon^4. \qquad (14) $$
Moreover, the Hartman-Wintner law of the iterated logarithm entails that
$$ \sum_{j=1}^n \big(|\varepsilon''_j| - E|\varepsilon''_j|\big) = O\Bigg[\Bigg\{ \sum_{j=1}^n E|\varepsilon''_j|^2 \log\log\Big( \sum_{j=1}^n E|\varepsilon''_j|^2 \Big) \Bigg\}^{1/2}\Bigg] = O_P(n^{1/4}\log\log^{1/2} n). \qquad (15) $$
It follows from (14) and (15) that
$$ \sum_{j=1}^n |\varepsilon''_j| = O_P(n^{1/4}\log\log^{1/2} n) \qquad (16) $$
and
$$ \max_{1\le i\le n} \Big| \sum_{j=1}^n \omega_{nj}(T_i)\varepsilon''_j \Big| \le \max_{1\le k,i\le n} |\omega_{nk}(T_i)| \sum_{j=1}^n |\varepsilon''_j| = O_P(n^{-5/12}\log\log^{1/2} n) = O_P(n^{-1/4}\log^{-1/2} n). $$
Combining the results of (12), (13) and (16), we obtain
$$ \max_{1\le i\le n} \Big| \sum_{k=1}^n \omega_{nk}(T_i)\varepsilon_k \Big| = O_P(n^{-1/4}\log^{-1/2} n). $$
This completes the proof of Lemma A.4.
Lemma A.5. Assume that Assumptions 1.1-1.3 hold. If $E\varepsilon = 0$ and $E\varepsilon^4 < \infty$, then $I_n = o_P(n^{1/2})$, where
$$ I_n = \sum_{i=1}^n \sum_{j\ne i} \omega_{nj}(T_i)(\varepsilon'_j - E\varepsilon'_j)(\varepsilon'_i - E\varepsilon'_i). $$
Proof. Let $j_n = [n^{2/3}\log^2 n]$ ($[a]$ denotes the integer part of $a$),
$$ A_j = \big\{ [(j-1)n/j_n] + 1, \ldots, [jn/j_n] \big\}, \qquad A_j^c = \{1, 2, \ldots, n\} - A_j, \qquad A_{ji} = A_j - \{i\}. $$
Observe that
$$ I_n = \sum_{j=1}^{j_n} \sum_{i\in A_j} \sum_{k\in A_{ji}} \omega_{nk}(T_i)(\varepsilon'_k - E\varepsilon'_k)(\varepsilon'_i - E\varepsilon'_i) + \sum_{j=1}^{j_n} \sum_{i\in A_j} \sum_{k\in A_j^c} \omega_{nk}(T_i)(\varepsilon'_k - E\varepsilon'_k)(\varepsilon'_i - E\varepsilon'_i) \overset{\mathrm{def}}{=} \sum_{j=1}^{j_n} U_{nj} + \cdots $$