PENALIZED QUASI-LIKELIHOOD ESTIMATION IN PARTIAL LINEAR MODELS*

By ENNO MAMMEN and SARA VAN DE GEER

Institut für Stochastik
Fachbereich Mathematik
Humboldt-Universität zu Berlin
Unter den Linden 6
10099 Berlin
Germany

Mathematical Institute
University of Leiden
P.O. Box 9512
2300 RA Leiden
The Netherlands

Abstract. Consider a partial linear model, where the expectation of a random variable $Y$ depends on covariates $(x,z)$ through $F(x\theta_0 + m_0(z))$, with $\theta_0$ an unknown parameter, and $m_0$ an unknown function. We apply the theory of empirical processes to derive the asymptotic properties of the penalized quasi-likelihood estimator.

AMS 1991 subject classifications. Primary 62G05, secondary 62G20.

Key words and phrases. Asymptotic normality, penalized quasi-likelihood, rates of convergence.

1. Introduction. Let $(Y_1,T_1),(Y_2,T_2),\dots$ be independent copies of $(Y,T)$, where $Y$ is a real-valued random variable and $T \in \mathbb{R}^d$. Denote the distribution of $(Y,T)$ by $P$, and write

$\mu_0(t) = E(Y \mid T = t)$

for the conditional expectation of $Y$ given $T = t$. In this paper, we shall study the partial linear model, where $T = (X,Z)$, $X \in \mathbb{R}^{d_1}$, $Z \in \mathbb{R}^{d_2}$, $d_1 + d_2 = d$, and

(1.1) $\mu_0(x,z) = F(x\theta_0 + m_0(z)),$

* This research was supported by the Deutsche Forschungsgemeinschaft, Sonderforschungsbereich 373 "Quantifikation und Simulation Ökonomischer Prozesse", Humboldt-Universität zu Berlin, and the European Union Human Capital and Mobility Programme ERB CHRX-CT 940693.
with $F : \mathbb{R} \to \mathbb{R}$ a given function, $\theta_0 \in \mathbb{R}^{d_1}$ an unknown parameter, $x\theta_0$ the product of $x$ and the transpose of $\theta_0$, and $m_0$ an unknown function in a given class of smooth functions. Model (1.1) offers a flexible approach. The inclusion of the linear component allows discrete covariates. The link function $F$ may be useful in case of a bounded variable $Y$ (see for instance Example 2 below, where binary observations are considered).

For simplicity, we shall restrict ourselves to the case $d_1 = d_2 = 1$. We shall assume that $T = (X,Z)$ has bounded support, say $[0,1]^2$, and that $m_0$ is in the Sobolev class $\{m : J^2(m) < \infty\}$, where

(1.2) $J^2(m) = \int_0^1 \bigl(m^{(k)}(z)\bigr)^2\,dz.$

Here, $k \ge 1$ is a fixed integer, and $m^{(k)}$ denotes the $k$-th derivative of the function $m$. In summary, the model is

$\mu_0 = F(g_0), \qquad g_0 \in \mathcal{G},$

with

$\mathcal{G} = \bigl\{g : g(x,z) = x\theta + m(z),\ \theta \in \mathbb{R},\ J(m) < \infty\bigr\}.$

For $g \in \mathcal{G}$, $g(x,z) = x\theta + m(z)$, we shall often write $J(g) = J(m)$.

Define the quasi-(log-)likelihood function

(1.3) $Q(y;\mu) = \int_y^\mu \frac{y - s}{V(s)}\,ds,$

with $V$ a known function $: (a,b) \to (0,\infty)$, $-\infty \le a < b \le \infty$, given. The quasi-likelihood function was first considered by Wedderburn (1974). Properties of quasi-likelihood functions are discussed in McCullagh (1983) and McCullagh and Nelder (1989). There, the function $V$ has been chosen as the conditional variance of the response $Y$, and it has been assumed that $V$ depends only on the conditional mean $\vartheta$ of $Y$, i.e. $\mathrm{var}(Y) = V(\vartheta)$. The quasi-likelihood approach is a generalization of generalized linear models. The log-likelihood of an exponential family is replaced by a quasi-likelihood, in which only the relation between the conditional mean and the conditional variance has to be specified. To see the relations of the quasi-likelihood functions with generalized linear models note for instance that the maximum likelihood estimate $\hat\vartheta$ based on an i.i.d. sample $Y_1,\dots,Y_n$ from an exponential family with mean $\vartheta$ and variance $V(\vartheta)$ is given by

$\sum_{i=1}^n \frac{d}{d\vartheta}\,Q(Y_i;\hat\vartheta) = 0.$

In this paper we do not assume that $V(F(g_0(T)))$ is the conditional variance of $Y$. The only assumptions on the distribution of $Y$ we use in this paper concern the form of the conditional mean (see (1.1)) and subexponential tails (see (A0), below). In particular, our results may be used in case of model misspecification.
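For concreteness (our illustration, not part of the paper), the quasi-likelihood (1.3) can be evaluated numerically; for the variance function $V(\mu) = \mu(1-\mu)$ of Example 2 it reproduces the Bernoulli log-likelihood exactly. The function names below are ours.

```python
import math

def Q(y, mu, V, n=20000):
    """Quasi-likelihood Q(y; mu) = integral from y to mu of (y - s)/V(s) ds,
    evaluated with the midpoint rule (which avoids the endpoints, where the
    integrand can be 0/0 for binary y)."""
    h = (mu - y) / n
    total = 0.0
    for i in range(n):
        s = y + (i + 0.5) * h
        total += (y - s) / V(s)
    return total * h

V_bern = lambda mu: mu * (1.0 - mu)   # V for binary Y (Example 2)

mu = 0.3
# For binary y, Q coincides with the Bernoulli log-likelihood:
print(Q(1.0, mu, V_bern), math.log(mu))        # both approx -1.20397
print(Q(0.0, mu, V_bern), math.log(1.0 - mu))  # both approx -0.35667
```

With $V \equiv 1$ the same routine returns $-(y-\mu)^2/2$, the penalized least squares criterion of Example 1.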
The penalized quasi-likelihood estimator is defined by

(1.4) $\hat g_n = \underset{g \in \mathcal{G}}{\mathrm{argmax}}\ \bigl[\,Q_n\bigl(F(g)\bigr) - \lambda_n^2\,J^2(g)\,\bigr],$

where

$Q_n\bigl(F(g)\bigr) = \frac1n \sum_{i=1}^n Q\bigl(Y_i;\,F(g(T_i))\bigr).$

Throughout, we assume that indeed a solution $\hat g_n$ of the maximization problem (1.4) exists. Then $\hat g_n(x,z) = x\hat\theta_n + \hat m_n(z)$, where $\hat\theta_n \in \mathbb{R}$, and $J(\hat m_n) < \infty$. The estimated conditional expectation is $\hat\mu_n = F(\hat g_n)$.

Generalized linear models of the form (1.1) have first been considered by Green and Yandell (1985) and Green (1987). The generalization to quasi-likelihood models has also been studied in e.g. Chen (1988), Speckman (1988) and Severini and Staniswalis (1994). These papers however use different estimation procedures, such as polynomial approximation or kernel smoothing. Local polynomial smoothing based on quasi-likelihood functions is discussed in Fan, Heckman and Wand (1995).

Our main aim is to obtain asymptotic normality of the penalized quasi-likelihood estimator $\hat\theta_n$ of $\theta_0$, but first we derive a rate of convergence for $\hat g_n$. The asymptotic properties of the estimators depend of course on the behaviour of the smoothing parameter $\lambda_n$ as $n \to \infty$. It may be random (e.g. determined through cross-validation). We assume $\lambda_n = o_P(n^{-1/4})$, and $1/\lambda_n = O_P\bigl(n^{k/(2k+1)}\bigr)$.

The following example is an important special case.

Example 1. Let $F$ be the identity, and $V \equiv 1$. Then $Q(y;\mu) = -(y-\mu)^2/2$, so that $\hat g_n$ is the penalized least squares estimator. It is called a partial smoothing spline. If $\lambda_n$ is non-random, $\hat\theta_n$ and $\hat m_n$ are linear in $Y_1,\dots,Y_n$. See e.g. Wahba (1984), Silverman (1985). Denote the conditional expectation of $X$ given $Z = z$ by $h(z)$, $z \in [0,1]$. If $J(h) < \infty$ and $\lambda_n$ is of the order given above and non-random, then the bias of $\hat\theta_n$ is $O(\lambda_n^2) = o(n^{-1/2})$, whereas its variance is $O(1/n)$. This is a result of Rice (1986). It indicates that the smoothness imposed on $\hat m_n$ (in terms of the number of derivatives $k$) should not exceed the smoothness of $h$. In Theorem 4.1, we shall prove $\sqrt n$-consistency and asymptotic normality of $\hat\theta_n$ under the condition $J(h) < \infty$. In Remark 4.1, we show that in case of rough functions $h$, $\sqrt n$-consistency of $\hat\theta_n$ can be guaranteed by undersmoothing. More precisely, there we allow that $h$ depends on $n$ and that $J(h_n) \to \infty$. We show that $\hat\theta_n$ is $\sqrt n$-consistent and asymptotically normal, as long as $\lambda_n$ is chosen small enough. Even for the optimal choice $\lambda_n \sim n^{-k/(2k+1)}$, $J(h_n)$ may tend to infinity. This shows that much less smoothness is needed for $h$ than for $m_0$.
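The partial smoothing spline of Example 1 can be sketched numerically. The following illustration is ours, not the paper's: it uses a discrete second-difference penalty as a stand-in for $J^2$ with $k = 2$, evaluates $m$ at the (sorted) design points, and solves the resulting normal equations; the constant `alpha` plays the role of $\lambda_n^2$ (all names are our choices).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
z = np.linspace(0.0, 1.0, n)           # sorted covariate for the smooth part
x = rng.uniform(0.0, 1.0, n)           # covariate for the linear part
theta0 = 2.0
y = theta0 * x + np.sin(2 * np.pi * z) + 0.1 * rng.standard_normal(n)

# Parameters beta = (theta, m(z_1), ..., m(z_n)).
A = np.column_stack([x, np.eye(n)])

# Discrete roughness penalty: squared second differences of m (k = 2).
D = np.diff(np.eye(n), 2, axis=0)      # (n-2) x n second-difference matrix
B = np.zeros((n + 1, n + 1))
B[1:, 1:] = D.T @ D

alpha = 1.0                            # penalty weight (role of lambda_n^2)
beta = np.linalg.solve(A.T @ A + alpha * B, A.T @ y)
theta_hat, m_hat = beta[0], beta[1:]
print(theta_hat)                       # close to theta0 = 2
```

Because the penalty leaves linear functions of $z$ unpenalized but heavily charges the rough, noise-like direction spanned by $x$, the linear coefficient remains identified even though $m$ has one parameter per observation.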
The theory for general penalized quasi-likelihood estimators essentially boils down to that for Example 1, provided one can properly linearize in a neighbourhood of the true parameters. For this purpose, we first need to prove consistency, which is not too difficult if $V$ stays away from zero. Unfortunately, this is frequently not the case, as we see in Examples 2 and 3 below. In Section 7, we shall employ an ad hoc method to handle
this case: there, the conditions ensuring asymptotic normality are relatively innocent, but proving consistency can be somewhat involved.

Example 2. Let $Y \in \{0,1\}$, $P(Y = 1 \mid T = t) = F(g_0(t))$, and let $V(\mu) = \mu(1-\mu)$, $\mu \in (0,1)$. In this case, the quasi-likelihood is the exact likelihood, so that $\hat g_n$ is the penalized maximum likelihood estimator.

Example 3. Let $Y \in (0,\infty)$, and $V(\mu) = \mu^2$, $\mu > 0$. Then $Q(y;\mu)$ is the log-likelihood corresponding to the exponential distribution with parameter $1/\mu$.

This paper can be seen as a statistical application of empirical process theory as considered in Dudley (1984), Giné and Zinn (1984), Pollard (1984, 1990), Ossiander (1987), and others. Some concepts and results in this field are presented in Section 2. In Section 3, rates of convergence are obtained, and Section 4 uses the rates to establish asymptotic normality. In Section 5, we discuss bootstrapping the distribution of $\hat\theta_n$. Examples 1-3 are studied in Section 6, and Section 7 revisits Example 2.

2. Main assumptions, notation and technical tools.

2.1. Main assumptions. We recall the assumption $T = (X,Z) \in [0,1]^2$, and

(2.1) $\lambda_n = o_P\bigl(n^{-1/4}\bigr), \qquad 1/\lambda_n = O_P\bigl(n^{k/(2k+1)}\bigr).$

We also suppose throughout that $f(s) = dF(s)/ds$ exists for all $s$.

Write $W = Y - F(g_0(T))$ ($W_i = Y_i - F(g_0(T_i))$, $i = 1,2,\dots$). The following condition is essential in Section 3: for some constant $0 < C_0 < \infty$,

(A0) $E\bigl(e^{|W|/C_0} \mid T\bigr) \le C_0, \quad \text{almost surely}.$

Let $\psi_j(x,z) = z^{j-1}$, $j = 1,\dots,k$, and $\psi_{k+1}(x,z) = x$. We assume that the matrix

$\Sigma = \int \psi\psi^T\,dP$

is non-singular. Here, $\psi^T$ denotes the transpose of $\psi = (\psi_1,\dots,\psi_{k+1})^T$.
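For intuition (our illustration, not the paper's): with $k = 2$ and $T$ uniform on $[0,1]^2$, the matrix $\Sigma = \int \psi\psi^T\,dP$ can be approximated by its empirical counterpart and checked for non-singularity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x, z = rng.uniform(size=n), rng.uniform(size=n)

# k = 2: psi_j(x, z) = z**(j-1) for j = 1, 2, and psi_3(x, z) = x
psi = np.column_stack([np.ones(n), z, x])
Sigma_hat = psi.T @ psi / n
print(np.linalg.eigvalsh(Sigma_hat))   # all eigenvalues bounded away from 0
```

For this design the smallest eigenvalue stays near 0.05, so the non-singularity assumption on $\Sigma$ is easily satisfied; it would fail, for example, if $X$ were a polynomial in $Z$ of degree less than $k$.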
By the Sobolev-embedding Theorem, one can write

$m(z) = m_1(z) + m_2(z),$

with

$m_1(z) = \sum_{j=1}^{k} \delta_j\,z^{j-1},$

and $\|m_2\|_\infty \le J(m_2) = J(m)$ (see e.g. Oden and Reddy (1976)). So for $g(x,z) = x\theta + m(z)$,

$g(x,z) = g_1(x,z) + g_2(x,z),$
with

$g_1(x,z) = \sum_{j=1}^{k+1} \theta_j\,\psi_j(x,z),$

i.e. $g_1 = \theta^T\psi$, and $g_2(x,z) = m_2(z)$, $\|g_2\|_\infty \le J(g_2) = J(g)$.

2.2. Notation. For a measurable function $a : [0,1]^2 \times \mathbb{R} \to \mathbb{R}$, $E\,a(Y,T) = \int a\,dP$ denotes the expectation of $a(Y,T)$ (whenever it exists) and

$\|a\|^2 = E\,a^2(Y,T), \qquad \|a\|_n^2 = \frac1n\sum_{i=1}^n a^2(Y_i,T_i).$

With a slight abuse of notation, we also write for $a : [0,1]^2 \to \mathbb{R}$ depending only on $t \in [0,1]^2$,

$\|a\|^2 = E\,a^2(T), \qquad \|a\|_n^2 = \frac1n\sum_{i=1}^n a^2(T_i).$

Moreover,

$\|a\|_\infty = \sup_{t \in [0,1]^2} |a(t)|,$

and for a vector $v$, $|v|$ denotes the Euclidean norm.

Definition. Let $\mathcal{A}$ be a subset of a (pseudo-)metric space $(\mathcal{L}, d)$ of real-valued functions. The $\delta$-covering number $N(\delta,\mathcal{A},d)$ of $\mathcal{A}$ is the smallest value of $N$ for which there exist functions $a_1,\dots,a_N$ in $\mathcal{L}$, such that for each $a \in \mathcal{A}$, $d(a,a_j) \le \delta$ for some $j \in \{1,\dots,N\}$. The $\delta$-covering number with bracketing $N_B(\delta,\mathcal{A},d)$ is the smallest value of $N$ for which there exist pairs of functions $[a_j^L, a_j^U] \subset \mathcal{L}$, with $d(a_j^L, a_j^U) \le \delta$, $j = 1,\dots,N$, such that for each $a \in \mathcal{A}$ there is a $j \in \{1,\dots,N\}$ such that $a_j^L \le a \le a_j^U$. The $\delta$-entropy ($\delta$-entropy with bracketing) is defined as $H(\delta,\mathcal{A},d) = \log N(\delta,\mathcal{A},d)$ ($H_B(\delta,\mathcal{A},d) = \log N_B(\delta,\mathcal{A},d)$).
2.3. Technical tools.

Theorem 2.1. For each $0 < C < \infty$ we have

$\sup_{\delta > 0}\ \delta^{1/k}\,H\bigl(\delta,\ \{g \in \mathcal{G} : \|g\|_\infty \le C,\ J(g) \le C\},\ \|\cdot\|_\infty\bigr) < \infty.$

Proof. See Birman and Solomjak (1967).

Write (AA0) for the assumption that given $T$, $W$ is (uniformly) sub-Gaussian, i.e. for some constant $0 < C_0 < \infty$,

(AA0) $E\bigl(e^{W^2/C_0^2} \mid T\bigr) \le C_0, \quad\text{almost surely}.$

Theorem 2.2. Let $\mathcal{A}$ be a uniformly bounded class of functions depending only on $t \in [0,1]^2$. Let $0 < \alpha < 2$. Suppose that either (A0) holds and

(2.2) $\limsup_{\delta\downarrow 0}\ \delta^\alpha\,H_B\bigl(\delta, \mathcal{A}, \|\cdot\|_\infty\bigr) < \infty,$

or that (AA0) holds and

(2.3) $\limsup_{\delta\downarrow 0}\ \sup_n\ \delta^\alpha\,H\bigl(\delta, \mathcal{A}, \|\cdot\|_n\bigr) < \infty, \quad\text{almost surely}.$

Then

(2.4) $\sup_{a\in\mathcal{A}}\ \frac{\bigl|n^{-1}\sum_{i=1}^n W_i\,a(T_i)\bigr|}{\|a\|_n^{1-\alpha/2} \vee n^{-1/2}} = O_P\bigl(n^{-1/2}\bigr).$

Proof. It is shown in van de Geer (1990) that (AA0) and (2.3) imply (2.4). Similar arguments as there, combined with e.g. a result of Birgé and Massart (1991, Theorem 4), show that (AA0) can be relaxed to (A0), provided (2.3) is strengthened to (2.2) (see also van de Geer (1995)).

The following theorem will be used in Section 4, only to show that consistency in $\|\cdot\|$-norm implies consistency in $\|\cdot\|_n$-norm. Nevertheless, we present it in its full strength, so that one can verify that the rates for the $\|\cdot\|$-norm and $\|\cdot\|_n$-norm coincide.

Theorem 2.3. Suppose $\mathcal{A}$ is uniformly bounded and that for some $0 < \alpha < 2$,

(2.5) $\sup_{\delta > 0}\ \delta^\alpha\,H_B\bigl(\delta, \mathcal{A}, \|\cdot\|\bigr) < \infty.$

Then for all $\epsilon > 0$ there exists a $\delta > 0$ such that

(2.6) $\limsup_{n\to\infty}\ \sup_{a\in\mathcal{A},\,\|a\|\ge\delta}\ \Bigl|\frac{\|a\|_n}{\|a\|} - 1\Bigr| \le \epsilon, \quad\text{almost surely}.$

Proof. See van de Geer (1988, Lemma 6.3.4).

Theorem 2.4. Suppose that for some $0 < \alpha < 2$,

(2.7) $\sup_{\delta > 0}\ \delta^\alpha\,H_B\bigl(\delta, \mathcal{A}, \|\cdot\|\bigr) < \infty.$

Then for all $\epsilon > 0$ there is a $\delta > 0$ such that

(2.8) $\limsup_{n\to\infty}\ P\Bigl(\sup_{a\in\mathcal{A},\,\|a\|\le\delta}\ \Bigl|\frac{1}{\sqrt n}\sum_{i=1}^n \bigl(a(Y_i,T_i) - \int a\,dP\bigr)\Bigr| > \epsilon\Bigr) \le \epsilon.$

Condition (2.7) ensures that $\mathcal{A}$ is a Donsker class, and (2.8) is the implied asymptotic equicontinuity of the empirical process. See e.g. Pollard (1990) and the references there for general theory on Donsker classes.

3. Rates of convergence. Define

$l(g)(t) = \int_{c_0}^{F(g(t))} \frac{ds}{V(s)}, \qquad g \in \mathcal{G},$

with $c_0 \in (a,b)$ a fixed point. Then

$Q_n\bigl(F(g)\bigr) - Q_n\bigl(F(g_0)\bigr) = \frac1n\sum_{i=1}^n W_i\bigl(l(g)(T_i) - l(g_0)(T_i)\bigr) - \frac1n\sum_{i=1}^n \int_{F(g_0(T_i))}^{F(g(T_i))} \frac{s - F(g_0(T_i))}{V(s)}\,ds.$
We shall use the following assumptions: for some constants $0 < C_1, C_2 < \infty$,

(A1) $\frac{1}{C_1} \le V(s) \le C_1$ for all $s \in (a,b),$

and

(A2) $\frac{1}{C_2} \le f(s) \le C_2$ for all $s$, where $f = dF/ds$.

Clearly, (A1) and (A2) hold in Example 1, where $l(g) = g$ and $V \equiv 1$. In fact, under (A1) and (A2), the rates problem essentially reduces to the one of Example 1. It is possible to avoid these assumptions, as we shall illustrate in Section 7.

Lemma 3.1. Suppose (A1) and (A2) are met. Then

(3.1) $\|\hat g_n - g_0\|_n = O_P(\lambda_n),$

and

(3.2) $J(\hat g_n) = O_P(1).$

Proof. We have

$Q_n\bigl(F(\hat g_n)\bigr) - Q_n\bigl(F(g_0)\bigr) = \frac1n\sum_{i=1}^n W_i\bigl(l(\hat g_n)(T_i) - l(g_0)(T_i)\bigr) - \frac1n\sum_{i=1}^n \int_{F(g_0(T_i))}^{F(\hat g_n(T_i))} \frac{s - F(g_0(T_i))}{V(s)}\,ds.$

So, by the Cauchy-Schwarz inequality and (A1),

(3.3) $Q_n\bigl(F(\hat g_n)\bigr) - Q_n\bigl(F(g_0)\bigr) \le \Bigl(\frac1n\sum_{i=1}^n W_i^2\Bigr)^{1/2}\bigl\|l(\hat g_n) - l(g_0)\bigr\|_n - \frac{1}{2C_1}\bigl\|F(\hat g_n) - F(g_0)\bigr\|_n^2.$

Note that $\frac1n\sum_{i=1}^n W_i^2 = O(1)$ almost surely (by (A0)). On the other hand, because $\hat g_n$ maximizes $Q_n(F(g)) - \lambda_n^2 J^2(g)$, we have

(3.4) $Q_n\bigl(F(\hat g_n)\bigr) - Q_n\bigl(F(g_0)\bigr) \ge \lambda_n^2\bigl(J^2(\hat g_n) - J^2(g_0)\bigr) \ge -o_P(1).$

The combination of (3.3) and (3.4) gives

$\bigl\|F(\hat g_n) - F(g_0)\bigr\|_n^2 \le O_P(1)\,\bigl\|l(\hat g_n) - l(g_0)\bigr\|_n + o_P(1),$

which implies $\|F(\hat g_n) - F(g_0)\|_n = O_P(1)$.
Under (A1) and (A2), there exists a constant $C$ such that

(3.5) $\frac1C\,\bigl|g(t) - \tilde g(t)\bigr| \le \bigl|l(g)(t) - l(\tilde g)(t)\bigr| \le C\,\bigl|g(t) - \tilde g(t)\bigr|$

for all $t \in [0,1]^2$ and all $g, \tilde g \in \mathcal{G}$. So also $\|l(\hat g_n) - l(g_0)\|_n = O_P(1)$, so that $\|\hat g_n - g_0\|_n = O_P(1)$.

We shall now show that $\|\hat g_n\|_\infty/(1 + J(\hat g_n)) = O_P(1)$. As in Section 2.2, write

$\hat g_n = \hat g_{1,n} + \hat g_{2,n},$

with $\hat g_{1,n} = \hat\theta_n^T\psi$, and $\|\hat g_{2,n}\|_\infty \le J(\hat g_n)$. Then

(3.6) $\frac{\|\hat g_{1,n}\|_n}{1 + J(\hat g_n)} \le \frac{\|\hat g_n\|_n}{1 + J(\hat g_n)} + \frac{\|\hat g_{2,n}\|_n}{1 + J(\hat g_n)} = O_P(1).$

Now, $\Sigma = \int \psi\psi^T\,dP$ is assumed to be non-singular, and

$\frac1n\sum_{i=1}^n \psi(T_i)\,\psi^T(T_i) \to \Sigma, \quad\text{almost surely}.$

Thus, (3.6) implies that $|\hat\theta_n|/(1 + J(\hat g_n)) = O_P(1)$. Because $T$ is in a bounded set, also $\|\hat g_{1,n}\|_\infty/(1 + J(\hat g_n)) = O_P(1)$. So $\|\hat g_n\|_\infty/(1 + J(\hat g_n)) = O_P(1)$.

In view of (3.5), we now have $\|l(\hat g_n) - l(g_0)\|_\infty/(1 + J(\hat g_n)) = O_P(1)$. Moreover, $J(l(\tilde g)) \le C\,J(\tilde g)$, $\tilde g \in \mathcal{G}$. So by Theorem 2.1,

$\sup_{\delta > 0}\ \delta^{1/k}\,H\Bigl(\delta,\ \Bigl\{\frac{l(g) - l(g_0)}{1 + J(g)} : g \in \mathcal{G},\ \frac{\|l(g) - l(g_0)\|_\infty}{1 + J(g)} \le C\Bigr\},\ \|\cdot\|_\infty\Bigr) < \infty.$

Using Theorem 2.2, assumption (A0), and the fact that $\|l(\hat g_n) - l(g_0)\|_n \le C\,\|\hat g_n - g_0\|_n$, we find

(3.7) $\Bigl|\frac1n\sum_{i=1}^n W_i\bigl(l(\hat g_n)(T_i) - l(g_0)(T_i)\bigr)\Bigr| \le \Bigl(\|\hat g_n - g_0\|_n^{1-1/(2k)}\bigl(1 + J(\hat g_n)\bigr)^{1/(2k)} \vee \bigl(1 + J(\hat g_n)\bigr)\,n^{-1/2}\Bigr)\,O_P\bigl(n^{-1/2}\bigr).$

Invoke this in (3.4):

$\lambda_n^2\bigl(J^2(\hat g_n) - J^2(g_0)\bigr) \le Q_n\bigl(F(\hat g_n)\bigr) - Q_n\bigl(F(g_0)\bigr),$

so that

(3.8) $\frac{1}{2C}\,\|\hat g_n - g_0\|_n^2 + \lambda_n^2 J^2(\hat g_n) \le \Bigl|\frac1n\sum_{i=1}^n W_i\bigl(l(\hat g_n)(T_i) - l(g_0)(T_i)\bigr)\Bigr| + \lambda_n^2 J^2(g_0)$
$\le \Bigl(\|\hat g_n - g_0\|_n^{1-1/(2k)}\bigl(1 + J(\hat g_n)\bigr)^{1/(2k)} \vee \bigl(1 + J(\hat g_n)\bigr)\,n^{-1/2}\Bigr)\,O_P\bigl(n^{-1/2}\bigr) + \lambda_n^2 J^2(g_0).$
Thus,

$\lambda_n^2 J^2(\hat g_n) \le \Bigl(\|\hat g_n - g_0\|_n^{1-1/(2k)}\bigl(1 + J(\hat g_n)\bigr)^{1/(2k)} \vee \bigl(1 + J(\hat g_n)\bigr)\,n^{-1/2}\Bigr)\,O_P\bigl(n^{-1/2}\bigr) + \lambda_n^2 J^2(g_0),$

as well as

$\|\hat g_n - g_0\|_n^2 \le \Bigl(\|\hat g_n - g_0\|_n^{1-1/(2k)}\bigl(1 + J(\hat g_n)\bigr)^{1/(2k)} \vee \bigl(1 + J(\hat g_n)\bigr)\,n^{-1/2}\Bigr)\,O_P\bigl(n^{-1/2}\bigr) + \lambda_n^2 J^2(g_0).$

Solve these two inequalities to find that

$\|\hat g_n - g_0\|_n + \lambda_n J(\hat g_n) = O_P\bigl(\lambda_n + n^{-k/(2k+1)} + n^{-1/2}\bigr).$

Because we assumed $1/\lambda_n = O_P(n^{k/(2k+1)})$, this completes the proof.

Remark 3.1. The situation can be adjusted to the case of triangular arrays. Let $(Y_{1,n},T_{1,n}),\dots,(Y_{n,n},T_{n,n})$ be independent copies of $(Y_n,T_n)$, and suppose that the conditional expectation of $Y_n$ given $T_n$ is equal to $F(g_{0,n}(T_n))$, with $g_{0,n} \in \mathcal{G}$, $n = 1,2,\dots$. Assume that (A0) holds for $W_n = Y_n - F(g_{0,n}(T_n))$ and $T_n$, with constant $C_0$ not depending on $n$. Assume moreover that for $P_n$ being the distribution of $T_n$, we have

$\int a^2\,dP_n \ge c_0\,\|a\|^2 \quad\text{for all } a,$

where $c_0 > 0$ is independent of $n$. Then one finds under (A1) and (A2), for $1/\lambda_n = O_P\bigl(n^{k/(2k+1)}(1 + J(g_{0,n}))^{-1/(2k+1)}\bigr)$,

$\|\hat g_n - g_{0,n}\|_n = O_P\bigl(\lambda_n(1 + J(g_{0,n}))\bigr)$

and

$J(\hat g_n) = O_P\bigl(1 + J(g_{0,n})\bigr).$

4. Asymptotic normality. For $g \in \mathcal{G}$, write

$l_g(t) = \frac{f(g(t))}{V\bigl(F(g(t))\bigr)}, \qquad f_g(t) = f\bigl(g(t)\bigr),$

and set $l_0 = l_{g_0}$, $f_0 = f_{g_0}$. We shall use the assumptions: for some constants $0 < C_3, C_4 < \infty$, and for all $t \in [0,1]^2$, we have

(A3) $|l_0(t)| \le C_3$ and $|l_g(t) - l_0(t)| \le C_3\,|g(t) - g_0(t)|$ for all $g \in \mathcal{G}$,

and

(A4) $|f_0(t)| \le C_4$ and $|f_g(t) - f_0(t)| \le C_4\,|g(t) - g_0(t)|$ for all $g \in \mathcal{G}$.

Take

$h_0(z) = \frac{E\bigl(X\,f_0(T)\,l_0(T) \mid Z = z\bigr)}{E\bigl(f_0(T)\,l_0(T) \mid Z = z\bigr)},$
and set

$h_0(x,z) = x - h_0(z).$

Also define

$\tilde h(z) = E(X \mid Z = z)$

and

$\tilde h(x,z) = x - \tilde h(z).$

Theorem 4.1 below gives conditions for asymptotic normality of $\hat\theta_n$. If the conditional distribution of $Y$ belongs to an exponential family with mean $\mu$ and variance $V(\mu)$, then $\hat\theta_n$ is asymptotically efficient. The conditional variance of $Y$ given $T$ is in that case

$\mathrm{var}(Y \mid T) = V_0(T) = V\bigl(F(g_0(T))\bigr),$

so that

$E\bigl(W^2\,l_0^2(T)\,h_0^2(T)\bigr) = E\bigl(f_0(T)\,l_0(T)\,h_0^2(T)\bigr).$

According to Theorem 4.1, the asymptotic variance of $\sqrt n\,(\hat\theta_n - \theta_0)$ is then $E\bigl(f_0(T)\,l_0(T)\,h_0^2(T)\bigr)^{-1}$.

Theorem 4.1. Suppose (A3) and (A4) are met. Assume moreover that

(4.1) $\|\hat g_n - g_0\|_n = o_P\bigl(n^{-1/4}\bigr),$

(4.2) $J(\hat g_n) = O_P(1),$

(4.3) $\|\tilde h\| > 0,$

(4.4) $Z$ has density bounded away from $0$ on its support,

(4.5) $J(h_0) < \infty,$

and

(4.6) $E\bigl(f_0(T)\,l_0(T)\,h_0^2(T)\bigr) > 0.$

Then,

$\sqrt n\,\bigl(\hat\theta_n - \theta_0\bigr) = \frac{n^{-1/2}\sum_{i=1}^n W_i\,l_0(T_i)\,h_0(T_i)}{E\bigl(f_0(T)\,l_0(T)\,h_0^2(T)\bigr)} + o_P(1).$

Proof. We shall apply Theorem 2.3, to conclude from (4.1) that $\|\hat g_n - g_0\| = o_P(1)$. Because Theorem 2.3 is on uniformly bounded classes, we first verify that $\|\hat g_n\|_\infty = O_P(1)$. This follows by the same arguments as used in Lemma 3.1. Because (4.2) holds, also
R
X X
X
X
X 1
1 1
2
1
1
0
P P
P P
P P
P
P
P
P
P
j j j j
k k k 0 k j 0 j
k 0 k
j 0 j
j 0 j
0 2G
2
0 j
j 0 0 0
f 0 2G j 0 j g
k 0 k k 0 k
0
0 0 0
2 1
1
2 0 0
0
supp ort( )
0
0 0
2
1
2 2
=0
=0
=1
2
=1
0 2
0 2 0 0
0 0
=1
0 2
1 2
=1
0 0 0 2
=1
0 0 0 0 2
n n n
n n
n n
n
z Z
n
n
ns n
n n
n ns
n
ns s
n n n n
n ns s
n
i
i n i i
n
i
n i i n i i
n n
n
i
i i i
=
n
i
n i i i i i
n
i
n i i n i i i i i
A g O g O
h > g g o o
m m o
m z m z o :
: g g ;
g x;z g x;z sh x;z
s x m z sh z ;
s
:
d
ds
Q F g J g :
l l g f f g
d
ds
Q F g
n
W l T h T
n
T T l T h T I II:
y t l g t h t g ; g g ; J g C
g g o l l
o
: I
n
W l T h T o n :
II
n
g T g T f T l T h T
n
T T g T g T f T l T h T
, this implies ^ = (1), so ^ = (1).
Now,
~
0, so ^ = (1) implies
^
= (1). Hence, also
^ = (1). Assumption(4.4) ensuresthat
sup ^ ( ) ( ) = (1)
Therefore,we may withoutloss of generality assumethat
(47) ^
so that we can use (A3) and (A4).
Becauseof (4.5), we havethat
^ ( )= ^ ( )+ ( )
=(
^
+ ) +(^ ( ) ( ))
forall . Thus,
(48) [
( (^ )) (^ )] =0
Clearly, for
^
= (^ ),
^
= (^ ),
( (^ )) = 1
^
( ) ( ) 1
[^ ( ) ( )]
^
( ) ( )=
Use (A3) and Theorem 2.1,to nd that the class
[ ( )] ( ( )) ( ): ( )
satises (2.7)of Theorem2.4. Since,also by(A3), ^ = (1)implies
^
=
(1),we obtain
(49) =
1
( ) ( )+ ( )
Let us write
= 1
[(^ ( ) ( )) ( )] ( ) ( )
+ 1
[^ ( ) ( ) (^ ( ) ( )) ( )] ( ) ( )
$\quad + \frac1n\sum_{i=1}^n \bigl[F(\hat g_n(T_i)) - F(g_0(T_i))\bigr]\bigl(\hat l_n(T_i) - l_0(T_i)\bigr)\,h_0(T_i)$
$=: III + IV + V.$

Observe that

$\hat g_n(x,z) - g_0(x,z) = (\hat\theta_n - \theta_0)\,x + \hat m_n(z) - m_0(z) = (\hat\theta_n - \theta_0)\,h_0(x,z) + \hat a_n(z),$

where $\hat a_n(z) = (\hat\theta_n - \theta_0)\,h_0(z) + \hat m_n(z) - m_0(z)$. Hence,

(4.10) $III = (\hat\theta_n - \theta_0)\,\frac1n\sum_{i=1}^n f_0(T_i)\,l_0(T_i)\,h_0^2(T_i) + \frac1n\sum_{i=1}^n \hat a_n(Z_i)\,f_0(T_i)\,l_0(T_i)\,h_0(T_i).$

Because $|\hat\theta_n - \theta_0| = o_P(1)$ and $\|\hat m_n - m_0\|_\infty = o_P(1)$, also $\|\hat a_n\|_\infty = o_P(1)$. Moreover, for any measurable function $a : [0,1] \to \mathbb{R}$, $E\bigl(a(Z)\,f_0(T)\,l_0(T)\,h_0(T)\bigr) = 0$. So, according to Theorem 2.1, combined with Theorem 2.4, the second term in (4.10) is $o_P(n^{-1/2})$. This, and the law of large numbers, yields

$III = (\hat\theta_n - \theta_0)\bigl(E\,f_0(T)\,l_0(T)\,h_0^2(T) + o_P(1)\bigr) + o_P\bigl(n^{-1/2}\bigr).$

Invoke (A3) and (A4) to conclude that under (4.7),

$IV \le C_4\,\frac1n\sum_{i=1}^n \bigl|\hat g_n(T_i) - g_0(T_i)\bigr|^2\,\bigl|l_0(T_i)\,h_0(T_i)\bigr| \le C\,\|\hat g_n - g_0\|_n^2 = o_P\bigl(n^{-1/2}\bigr),$

and similarly,

$V \le C\,\|\hat g_n - g_0\|_n^2 = o_P\bigl(n^{-1/2}\bigr).$

Thus,

(4.11) $II = (\hat\theta_n - \theta_0)\bigl(E\,f_0(T)\,l_0(T)\,h_0^2(T) + o_P(1)\bigr) + o_P\bigl(n^{-1/2}\bigr).$

Finally, we note that (4.2), (4.5) and the condition $\lambda_n = o_P(n^{-1/4})$ give

(4.12) $\Bigl|\frac{d}{ds}\,\lambda_n^2\,J^2(\hat g_{ns})\Big|_{s=0}\Bigr| \le 2\,\lambda_n^2\,J(\hat g_n)\,J(h_0) = o_P\bigl(n^{-1/2}\bigr).$

Combine (4.8), (4.9), (4.11) and (4.12) to obtain

$0 = \frac1n\sum_{i=1}^n W_i\,l_0(T_i)\,h_0(T_i) - (\hat\theta_n - \theta_0)\bigl(E\,f_0(T)\,l_0(T)\,h_0^2(T) + o_P(1)\bigr) + o_P\bigl(n^{-1/2}\bigr).$

Apply condition (4.6) to complete the proof.
Remark 4.1. The results can be extended to the case of triangular arrays, as considered in Remark 3.1. Let us suppose the assumptions given there are met, and that in addition (A3) and (A4) hold, with constants $C_3$ and $C_4$ not depending on $n$. Suppose that also (4.3), (4.4) and (4.6) hold uniformly in $n$. Replace (4.1) and (4.2) by the condition

$\lambda_n\bigl(1 + J(g_{0,n})\bigr) = o_P\bigl(n^{-1/4}\bigr),$

and replace (4.5) by

(4.13) $\lambda_n^2\bigl(1 + J(g_{0,n})\bigr)\,J(h_{0,n}) = o_P\bigl(n^{-1/2}\bigr).$

Then the conclusion of Theorem 4.1 is valid, provided that we can apply Theorem 2.3 to conclude that $\|\hat g_n - g_{0,n}\| = o_P(1)$. For this purpose, we assume in addition to the above that

$\lambda_n\bigl(1 + J(g_{0,n})\bigr)^2 = o_P(1).$

For bounded $J(g_{0,n})$, condition (4.13) holds if $J(h_{0,n})$ is bounded. This follows from our assumption $\lambda_n = o_P(n^{-1/4})$. For optimal choices $\lambda_n \sim n^{-k/(2k+1)}$, for (4.13) it suffices that $J(h_{0,n}) = o\bigl(n^{(2k-1)/(4k+2)}\bigr)$, i.e. $J(h_{0,n})$ may converge to infinity. This means that weaker conditions on the smoothness of $h_{0,n}$ are needed than on $m_0$. Furthermore, if $J(h_{0,n}) \to \infty$, $\sqrt n$-consistency of $\hat\theta_n$ can always be guaranteed by choosing $\lambda_n$ small (i.e. undersmoothing).

5. Estimating the distribution of the parametric component using Wild Bootstrap. Inference on the parametric component $\theta_0$ of the model could be based on our asymptotic result in Theorem 4.1. There it is stated that the distribution of $\hat\theta_n$ is not affected by the nonparametric nature of the other component of the model, at least asymptotically. This statement may be misleading for small sample sizes. An approach which reflects more carefully the influence of the nonparametric component is bootstrap. We discuss here three versions of bootstrap. The first version is Wild Bootstrap, which is related to proposals of Wu (1986) (see also Beran (1986) and Mammen (1992)) and which was first proposed by Härdle and Mammen (1993) in nonparametric set-ups. Note that in our model the conditional distribution of $Y$ is not specified besides (1.1) and (A0).

The Wild Bootstrap procedure works as follows.

STEP 1. Calculate residuals $\hat W_i = Y_i - F(\hat g_n(T_i))$.

STEP 2. Generate i.i.d. random variables $\varepsilon_1,\dots,\varepsilon_n$ with mean 0, variance 1 and which fulfill for a constant $C_\varepsilon$ that $|\varepsilon_i| \le C_\varepsilon$ (a.s.) for $i = 1,\dots,n$.

STEP 3. Put $Y_i^* = F(\hat g_n(T_i)) + \hat W_i\,\varepsilon_i$ for $i = 1,\dots,n$.

STEP 4. Use the (pseudo) sample $\bigl((Y_1^*,T_1),\dots,(Y_n^*,T_n)\bigr)$ for the calculation of the parametric estimate $\hat\theta_n^*$.

STEP 5. The distribution of $\hat\theta_n - \theta_0$ is estimated by the (conditional) distribution (given $(Y_1,T_1),\dots,(Y_n,T_n)$), of $\hat\theta_n^* - \hat\theta_n$.
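The five steps above can be sketched numerically. The following illustration is ours, not the paper's: it takes the partial smoothing spline of Example 1 (identity link, discrete second-difference penalty) as the estimator and Rademacher multipliers, which have mean 0, variance 1 and are bounded, as Step 2 requires. All function names are our choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_partial_spline(y, x, alpha=1.0):
    """Penalized least squares (Example 1): returns (theta_hat, m_hat)."""
    n = len(y)
    A = np.column_stack([x, np.eye(n)])
    D = np.diff(np.eye(n), 2, axis=0)        # discrete roughness penalty, k = 2
    B = np.zeros((n + 1, n + 1))
    B[1:, 1:] = D.T @ D
    beta = np.linalg.solve(A.T @ A + alpha * B, A.T @ y)
    return beta[0], beta[1:]

n = 100
z = np.linspace(0.0, 1.0, n)
x = rng.uniform(size=n)
y = 2.0 * x + np.sin(2 * np.pi * z) + 0.1 * rng.standard_normal(n)

theta_hat, m_hat = fit_partial_spline(y, x)
fitted = theta_hat * x + m_hat               # F is the identity here
resid = y - fitted                           # Step 1
boot = []
for _ in range(200):
    eps = rng.choice([-1.0, 1.0], size=n)    # Step 2: bounded, mean 0, variance 1
    y_star = fitted + resid * eps            # Step 3
    theta_star, _ = fit_partial_spline(y_star, x)   # Step 4
    boot.append(theta_star - theta_hat)      # Step 5
print(theta_hat, np.std(boot))               # bootstrap estimate of the spread of theta_hat
```

The empirical distribution of `boot` approximates the law of $\hat\theta_n - \theta_0$, which is the content of Theorem 5.1 below.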
In the case where the conditional variance of $Y$ is specified by the model, i.e.

$\mathrm{var}(Y \mid T = t) = V\bigl(F(g_0(t))\bigr),$

we propose the following modification of the resampling. In Step 3 put $Y_i^* = F(\hat g_n(T_i)) + V^{1/2}\bigl(F(\hat g_n(T_i))\bigr)\,\varepsilon_i$ for $i = 1,\dots,n$. In this case the condition that $\varepsilon_i$ is bounded can be weakened to the assumption that $\varepsilon_i$ has subexponential tails, i.e. for a constant $C_\varepsilon$ it holds that $E\bigl(e^{|\varepsilon_i|/C_\varepsilon}\bigr) \le C_\varepsilon$ for $i = 1,\dots,n$ (compare (A0)).

In the special situation that $Q(y;\mu)$ is the log-likelihood (a semiparametric generalized linear model), the conditional distribution of $Y$ is specified by $F(g_0)$. Then we recommend to generate independent $Y_1^*,\dots,Y_n^*$ with distributions defined by $F(\hat g_n(T_1)),\dots,F(\hat g_n(T_n))$, respectively. This is a version of parametric bootstrap. The following theorem states that these three bootstrap procedures work (for their corresponding models).

Theorem 5.1. Assume that conditions (A0)-(A4) are met. In case of application of the second or third version of bootstrap assume that the just mentioned additional model assumptions hold. Then

$d_K\Bigl(\mathcal{L}^*\bigl(\hat\theta_n^* - \hat\theta_n\bigr),\ \mathcal{L}\bigl(\hat\theta_n - \theta_0\bigr)\Bigr) \to 0$

in probability. Here $d_K$ denotes the Kolmogorov distance (i.e. the sup norm of the corresponding distribution functions).

Proof. We will give only a sketch of the proof for the first version of resampling (Wild Bootstrap). The proof for the other versions is more simple and follows similarly.

We have to go again through the proofs of Lemma 3.1 and Theorem 4.1. We start with proving

(5.1) $\|\hat g_n^* - \hat g_n\|_n = O_P(\lambda_n)$

and

(5.2) $J(\hat g_n^*) = O_P(1).$

We write first for $W_i^* = Y_i^* - F(\hat g_n(T_i))$

$W_i^* = W_i\,\varepsilon_i + \bigl(F(g_0(T_i)) - F(\hat g_n(T_i))\bigr)\,\varepsilon_i = W_i' + W_i''.$

In the proof of Lemma 3.1 the main ingredient from empirical process theory was formula (2.4) (see (3.7)). We argue now that the following analogue formulas hold for $j = 1$ and $j = 2$:

(5.3) $\sup_{a\in\mathcal{A}}\ \frac{\bigl|n^{-1}\sum_{i=1}^n W_i^{(j)}\,a(T_i)\bigr|}{\|a\|_n^{1-\alpha/2} \vee n^{-1/2}} = O_P\bigl(n^{-1/2}\bigr),$

where $W_i^{(1)} = W_i'$ and $W_i^{(2)} = W_i''$. For $j = 1$ equation (5.3) follows from the fact that because of the boundedness of $\varepsilon_i$ for $i = 1,\dots,n$, we have that there exists a constant $C$ with

$E\bigl(e^{|W_i\varepsilon_i|/C} \mid T_1,\dots,T_n\bigr) \le C,$
almost surely. For $j = 2$ we have for every constant $C'$ that on the event $A_n = \bigl\{|F(g_0(T_i)) - F(\hat g_n(T_i))| \le C',\ i = 1,\dots,n\bigr\}$ the following holds

$E\bigl(e^{|W_i''|/C'} \mid T_1,\dots,T_n\bigr) \le e,$

almost surely. Because the probability of $A_n$ tends to one, we arrive at (5.3).

We would like to make here the following remark for two random variables $U$ and $V$. If $U$ fulfills $U = O_P(c_n)$ for a sequence $c_n$, then this implies that for every $0 < \delta < 1$ there exists a set $B$ and a constant $C_\delta$ with

$P(B) > 1 - \delta,$

$P\bigl(U \le C_\delta\,c_n \mid V = v,\ B\bigr) = 1$

for $v > 0$. This remark may help to understand why we can continue as in the proof of Lemma 3.1 to show (5.1) and (5.2).

The next step is to show that

(5.4) $\sqrt n\,\bigl(\hat\theta_n^* - \hat\theta_n\bigr) = \frac{n^{-1/2}\sum_{i=1}^n W_i^*\,l_0(T_i)\,h_0(T_i)}{E\bigl(f_0(T)\,l_0(T)\,h_0^2(T)\bigr)} + o_P(1).$

For seeing (5.4) we proceed similarly as in the proof of Theorem 4.1. In particular, we replace $\hat g_{ns}$ by $\hat g_{ns}^* = \hat g_n^* + s\,h_0$.

Now one applies (5.4) for the proof of

$d_K\Bigl(N\bigl(0,\hat\sigma_n^2\bigr),\ \mathcal{L}^*\bigl(\sqrt n(\hat\theta_n^* - \hat\theta_n)\bigr)\Bigr) \to 0$

(in probability), where

$\hat\sigma_n^2 = \frac{\frac1n\sum_{i=1}^n \hat W_i^2\,l_0^2(T_i)\,h_0^2(T_i)}{\bigl[E\bigl(f_0(T)\,l_0(T)\,h_0^2(T)\bigr)\bigr]^2}.$

Because of $\hat\sigma_n^2 \to E\bigl(W^2 l_0^2(T) h_0^2(T)\bigr)\big/\bigl[E\bigl(f_0(T) l_0(T) h_0^2(T)\bigr)\bigr]^2$ (in probability) we get the statement of the theorem.

6. Examples.

Example 1. Recall that in this case,

$Y = X\theta_0 + m_0(Z) + W,$

where $E(W \mid X, Z) = 0$, and that $\hat g_n(x,z) = x\hat\theta_n + \hat m_n(z)$ is the penalized least squares estimator. In van de Geer (1990), Lemma 3.1 has been proved under the condition (AA0) that the error $W$ in the regression model is sub-Gaussian, using the same approach as in the proof of Lemma 3.1. Condition (AA0) can be relaxed to (A0), as a consequence of Theorem 2.2. This is in accordance with earlier results on rates of convergence (see e.g. Rice and Rosenblatt (1981) and Silverman (1985)).
Note further that $h_0 = \tilde h$ and $h_0(x,z) = \tilde h(x,z)$. If $W$ is normally distributed, then according to Theorem 4.1, the partial smoothing spline estimator $\hat\theta_n$ is an asymptotically efficient estimator of $\theta_0$.

Example 2. In this case, we have

$P(Y = 1 \mid X, Z) = 1 - P(Y = 0 \mid X, Z) = F\bigl(X\theta_0 + m_0(Z)\bigr),$

and $V(\mu) = \mu(1-\mu)$, $\mu \in (0,1)$. Let us consider the common choice

$F(s) = \frac{e^s}{1 + e^s}.$

Then

$f(s) = \frac{e^s}{(1 + e^s)^2} = V\bigl(F(s)\bigr),$

so that $l_0 \equiv 1$. We cannot use Lemma 3.1, because (A1) is not satisfied. Therefore, we present a separate proof of (3.1) and (3.2) in the next section. Since conditions (A3) and (A4) are met, Theorem 4.1 can then be applied.
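As a quick numerical check (ours, not the paper's) of the identity $f(s) = F'(s) = V(F(s))$ for the logistic link, which is what makes the weight $l_0$ identically one:

```python
import math

F = lambda s: math.exp(s) / (1.0 + math.exp(s))   # logistic link
V = lambda mu: mu * (1.0 - mu)                     # binomial variance function

eps = 1e-6
for s in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    f = (F(s + eps) - F(s - eps)) / (2 * eps)      # numerical derivative F'(s)
    print(s, f, V(F(s)))                           # f and V(F(s)) agree, so l = f/V(F) = 1
```

The same computation with $F(s) = e^s$ and $V(\mu) = \mu^2$ reproduces the weight $l(s) = e^{-s}$ of Example 3.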
Example 3. Let us assume that the conditional density of $Y$ given $T = t$ is

$p(y \mid t) = \gamma_0(t)\,e^{-\gamma_0(t)\,y}, \qquad y > 0,$

with $\gamma_0(t) = 1/\mu_0(t)$, $\mu_0(t) = F\bigl(x\theta_0 + m_0(z)\bigr)$, and with

$F(s) = e^s.$

Take $V(\mu) = \mu^2$, $\mu > 0$. Then

$f(s) = e^s, \qquad V\bigl(F(s)\bigr) = e^{2s},$

and $l(s) = e^{-s}$. Observe that (A0) is met. Again, we cannot apply Lemma 3.1, because (A1) and (A2) only hold on a bounded set. So if we show by separate means that the parameters are in a bounded set, then the result of Lemma 3.1 follows immediately. Conditions (A3) and (A4) hold, so asymptotic normality would also be implied by this. Note that $f\,l \equiv 1$, so as in Example 1, $h_0 = \tilde h$ and $h_0(x,z) = \tilde h(x,z)$.

7. Rates of convergence for Example 2. Consider the model

$P(Y = 1 \mid T) = 1 - P(Y = 0 \mid T) = F\bigl(X\theta_0 + m_0(Z)\bigr) = F\bigl(g_0(T)\bigr),$

with $g_0(x,z) = x\theta_0 + m_0(z)$, $g_0 \in \mathcal{G}$,
and $F : \mathbb{R} \to (0,1)$ given. Furthermore, take $V(\mu) = \mu(1-\mu)$, $\mu \in (0,1)$. Assume that for some $0 < C_5 < \infty$,

(A5) $f(s) \le C_5$ for all $s$.

Lemma 7.1. Under condition (A5), we have

$\sup_n\ \sup_{\delta > 0}\ \delta^{1/k}\,H\Bigl(\delta,\ \Bigl\{\frac{F(g)}{1 + J(g)} : g \in \mathcal{G}\Bigr\},\ \|\cdot\|_n\Bigr) < \infty.$

Proof. We can write for $g \in \mathcal{G}$,

$g = g_1 + g_2,$

with $g_1 = \theta^T\psi$, and $\|g_2\|_\infty \le J(g_2) = J(g)$ (see Section 2.2). Now, let $\tilde g$ be a fixed function and consider the class

$\bigl\{F(g_1 + \tilde g) : \theta \in \mathbb{R}^{k+1}\bigr\}.$

Since $F$ is of bounded variation, the collection of graphs

$\Bigl\{\bigl\{(t,y) : 0 \le y \le F\bigl(g_1(t) + \tilde g(t)\bigr)\bigr\} : \theta \in \mathbb{R}^{k+1}\Bigr\}$

is a Vapnik-Chervonenkis class, i.e. it forms a polynomial class of sets (see Pollard (1984, Chapter II) for definitions). Therefore (Pollard (1984, Lemma II.25)),

(7.1) $N\bigl(\delta,\ \{F(g_1 + \tilde g) : \theta \in \mathbb{R}^{k+1}\},\ \|\cdot\|_n\bigr) \le A\,\delta^{-w} \quad\text{for all } \delta > 0,$

where the constants $A$ and $w$ depend on $F$ and $k$, but not on $\tilde g$ and $n$. (Here, we use the fact that the class is uniformly bounded by 1.)

Define for $g = g_1 + g_2$,

$v(g) = \bigl[\,\delta^{-1}\bigl(1 + J(g)\bigr)\,\bigr],$

where $[u]$ denotes the integer part of $u \ge 0$. Then

$\Bigl\{\frac{g_2}{1 + J(g)} : g \in \mathcal{G}\Bigr\} \subset \bigl\{h : \|h\|_\infty \le 1,\ J(h) \le 1\bigr\},$

so by Theorem 2.1,

(7.2) $\sup_{\delta > 0}\ \delta^{1/k}\,H\Bigl(\delta,\ \Bigl\{\frac{g_2}{1 + J(g)} : g \in \mathcal{G}\Bigr\},\ \|\cdot\|_\infty\Bigr) < \infty.$

Of course, if we replace here the $\|\cdot\|_\infty$-norm by the $\|\cdot\|_n$-norm, the result remains true and holds uniformly in $n$.
Together, (7.1) and (7.2) give the required result. To see this, let $g \in \mathcal{G}$, $g = g_1 + g_2$, and let $h = g_2/(1 + J(g))$. Suppose that $\tilde h$ is such that

$\|h - \tilde h\|_n \le \delta,$

and that $\tilde g_1$ is such that

$\bigl\|F\bigl(g_1 + (1 + J(g))\tilde h\bigr) - F\bigl(\tilde g_1 + (1 + J(g))\tilde h\bigr)\bigr\|_n \le \delta.$

Then

$\Bigl\|\frac{F(g)}{1 + J(g)} - \frac{F\bigl(\tilde g_1 + (1 + J(g))\tilde h\bigr)}{1 + J(g)}\Bigr\|_n \le \frac{\bigl\|F(g_1 + g_2) - F\bigl(g_1 + (1 + J(g))\tilde h\bigr)\bigr\|_n + \delta}{1 + J(g)} \le (C_5 + 1)\,\delta,$

since $|F(s) - F(\tilde s)| \le C_5\,|s - \tilde s|$, by condition (A5).

The entropy result of Lemma 7.1 can be applied to establish a rate of convergence in the same way as in Lemma 3.1. For this purpose, we need the assumption: for some constant $0 < C_6 < \infty$,

(A6) $\frac{1}{C_6} \le F\bigl(g_0(t)\bigr) \le 1 - \frac{1}{C_6}$ for all $t \in [0,1]^2$.

Lemma 7.2. Suppose (A5) and (A6) hold true. Then

(7.3) $\bigl\|F(\hat g_n) - F(g_0)\bigr\|_n = O_P(\lambda_n),$

and

(7.4) $J(\hat g_n) = O_P(1).$

Proof. Define

$\bar F_n(t) = \bigl(F(\hat g_n(t)) + F(g_0(t))\bigr)/2.$

By the concavity of the log-function, and the definition of $\hat g_n$,

$Q_n\bigl(\bar F_n\bigr) - Q_n\bigl(F(g_0)\bigr) = \frac1n\sum_{i=1}^n Y_i\,\log\Bigl(\frac{\bar F_n(T_i)}{F(g_0(T_i))}\Bigr) + \frac1n\sum_{i=1}^n (1 - Y_i)\,\log\Bigl(\frac{1 - \bar F_n(T_i)}{1 - F(g_0(T_i))}\Bigr)$

(7.5) $\ge \frac12\,Q_n\bigl(F(\hat g_n)\bigr) - \frac12\,Q_n\bigl(F(g_0)\bigr) \ge \frac12\,\lambda_n^2\bigl(J^2(\hat g_n) - J^2(g_0)\bigr).$
0
0
On the other hand, since $\log x = 2 \log\sqrt{x} \le 2(\sqrt{x} - 1)$,

$$ Q_n(\bar p_n) - Q_n(F(g_0)) \le \frac{2}{n}\sum_{i=1}^n Y_i \Bigl( \sqrt{\frac{\bar p_n(T_i)}{F(g_0(T_i))}} - 1 \Bigr) + \frac{2}{n}\sum_{i=1}^n (1 - Y_i) \Bigl( \sqrt{\frac{1 - \bar p_n(T_i)}{1 - F(g_0(T_i))}} - 1 \Bigr) $$

(7.6) $\qquad = \displaystyle \frac{2}{n}\sum_{i=1}^n \frac{Y_i}{\sqrt{F(g_0(T_i))}} \Bigl( \sqrt{\bar p_n(T_i)} - \sqrt{F(g_0(T_i))} \Bigr) + \frac{2}{n}\sum_{i=1}^n \frac{1 - Y_i}{\sqrt{1 - F(g_0(T_i))}} \Bigl( \sqrt{1 - \bar p_n(T_i)} - \sqrt{1 - F(g_0(T_i))} \Bigr). $

The combination of (7.5) and (7.6) gives an inequality of the same form as inequality (3.8) in the proof of Lemma 3.1. Moreover, we can invoke Lemma 7.1 in Theorem 2.2. First of all, condition (AA0) holds for $W_i$. Furthermore, for each $g, \tilde g \in \mathcal{G}$ we have

$$ \bigl| \sqrt{F(g)} - \sqrt{F(\tilde g)} \bigr| = \frac{| F(g) - F(\tilde g) |}{\sqrt{F(g)} + \sqrt{F(\tilde g)}} \le \frac{\sqrt{C}}{2}\, | F(g) - F(\tilde g) |, $$

by (A6). So the entropy condition (2.3) with $\alpha = 1$ holds for the class

$$ \Bigl\{ \bigl( \sqrt{F(g)} - \sqrt{F(g_0)} \bigr) \big/ \bigl(1 + J(g)\bigr) : g \in \mathcal{G} \Bigr\}. $$

Thus,

$$ \Bigl| \frac{1}{n}\sum_{i=1}^n \frac{W_i}{\sqrt{F(g_0(T_i))}} \Bigl( \sqrt{F(\hat g_n(T_i))} - \sqrt{F(g_0(T_i))} \Bigr) \Bigr| = O_P\bigl(n^{-1/2}\bigr)\, \bigl\| \sqrt{F(\hat g_n)} - \sqrt{F(g_0)} \bigr\|_n^{\,1 - 1/(2k)} \bigl(1 + J(\hat g_n)\bigr)^{1/(2k)}. $$

Similar results can be derived for $\bigl( \sqrt{1 - F(\hat g_n)} - \sqrt{1 - F(g_0)} \bigr)$. So, proceeding as in the proof of Lemma 3.1, we find $J(\hat g_n) = O_P(1)$, and

(7.7) $\qquad \bigl\| \sqrt{F(\hat g_n)} - \sqrt{F(g_0)} \bigr\|_n = O_P\bigl(n^{-k/(2k+1)}\bigr), $

as well as

(7.8) $\qquad \bigl\| \sqrt{1 - F(\hat g_n)} - \sqrt{1 - F(g_0)} \bigr\|_n = O_P\bigl(n^{-k/(2k+1)}\bigr). $

Clearly, (7.7) and (7.8) yield (7.3). □
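The elementary inequality invoked at the start of this display follows from the tangent-line bound $\log u \le u - 1$, applied with $u = \sqrt{x}$:

```latex
% Tangent-line bound: \log u \le u - 1 for all u > 0,
% with equality iff u = 1 (concavity of \log at u = 1).
% Substituting u = \sqrt{x}:
\log x \;=\; 2\log\sqrt{x} \;\le\; 2\bigl(\sqrt{x} - 1\bigr),
\qquad x > 0 .
```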
Lemma 7.3. Then, under conditions (A5), (A6), (A7), (4.3) and (4.4), we have

$$ \| F(\hat g_n) - F(g_0) \| = o_P(1) \quad \text{and} \quad \| \hat g_n - g_0 \| = o_P(1). $$

Proof. We need an identifiability condition. Assume that for some constants $0 < c \le C < \infty$ and for all $t \in [0,1]^d$, we have, for $p_0(t) = F(g_0(t))$, $c \le p_0(t) \le 1 - c$, and that

(7.9) $\qquad \inf\bigl\{ Q(F(g_0)) - Q(F(g)) : g \in \mathcal{G},\ \| g - g_0 \| \ge \delta \bigr\} > 0 \quad$ for all $\delta > 0$.

Due to Lemma 7.1 and a result of e.g. Pollard (1984, Theorem II.24) on uniform laws of large numbers, we have, for all $C_0 > 0$,

$$ \sup\bigl\{ \bigl| Q_n(F(g)) - Q(F(g)) \bigr| : g \in \mathcal{G},\ J(g) \le C_0 \bigr\} = o(1) \quad \text{almost surely.} $$

So $Q(F(\hat g_n)) - Q(F(g_0)) = o_P(1)$. By (7.9), this implies

(7.10) $\qquad \| \hat g_n - g_0 \| = o_P(1). $

As in the proof of Theorem 4.1, we see that (4.3) and (4.4), together with (7.10), yield $|\hat\theta_n - \theta_0| = o_P(1)$. Application of (A7) and Lemma 7.2 completes the proof. □

8. References.
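A uniform law of large numbers of the kind invoked here can be illustrated numerically. The function class, distribution, and grid below are illustrative assumptions for this sketch only; they are not the class $\{F(g) : g \in \mathcal{G},\ J(g) \le C_0\}$ from the proof.

```python
import math
import random

# Illustrative uniform law of large numbers over a smooth function class.
# Class: f_theta(x) = cos(theta * x) for theta on a grid in (0, 1];
# X ~ Uniform[0, 1], so the population mean is E f_theta(X) = sin(theta)/theta.
# The supremum over theta of |empirical mean - population mean| should be
# small for large n.  (Toy example; not the class used in the proof.)
random.seed(42)
n = 20000
xs = [random.random() for _ in range(n)]

sup_dev = 0.0
for i in range(1, 101):
    theta = i / 100.0
    emp = sum(math.cos(theta * x) for x in xs) / n
    pop = math.sin(theta) / theta
    sup_dev = max(sup_dev, abs(emp - pop))

print(f"sup over the class of |empirical - population mean|: {sup_dev:.5f}")
assert sup_dev < 0.05
```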
[1] Beran, R. (1986). Comment on "Jackknife, bootstrap, and other resampling methods in regression analysis" by C. F. J. Wu. Ann. Statist. 14 1295-1298.
[2] Birgé, L. and Massart, P. (1991). Rates of convergence for minimum contrast estimators. Techn. Report 140, Université Paris 6.
[3] Birman, M. S. and Solomjak, M. J. (1967). Piece-wise polynomial approximations of functions of the classes $W_p^\alpha$. Mat. Sbornik 73 295-317.
[4] Chen, H. (1988). Convergence rates for parametric components in a partly linear model. Ann. Statist. 16 136-146.
[5] Dudley, R. M. (1984). A course on empirical processes. École d'Été de Probabilités de St. Flour, 1982. Lecture Notes in Math. 1097 1-122. Springer, Berlin.
[6] Fan, J., Heckman, N. E. and Wand, M. P. (1995). Local polynomial regression for generalized linear models and quasi-likelihood functions. J. Amer. Statist. Assoc. 90 141-150.
[7] Giné, E. and Zinn, J. (1984). On the central limit theorem for empirical processes. Ann. Probab. 12 929-989.
[8] Green, P. J. and Yandell, B. S. (1985). Semi-parametric generalized linear models. Proceedings 2nd International GLIM Conference. Lecture Notes in Statistics 32 44-55. Springer Verlag, New York.
[9] Green, P. J. (1987). Penalized likelihood for general semi-parametric regression models. Int. Statist. Rev. 55 245-260.
[10] Härdle, W. and Mammen, E. (1993). Comparing nonparametric versus parametric regression fits. Ann. Statist. 21 1926-1947.
[11] Mammen, E. (1992). When does bootstrap work: asymptotic results and simulations. Lecture Notes in Statist. 77. Springer, Berlin.
[12] McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (2nd edit.). Chapman and Hall, London.
[13] Oden, J. T. and Reddy, J. N. (1976). An Introduction to the Mathematical Theory of Finite Elements. Wiley, New York.
[14] Ossiander, M. (1987). A central limit theorem under metric entropy with $L_2$ bracketing. Ann. Probab. 15 897-919.
[15] Pollard, D. (1984). Convergence of Stochastic Processes. Springer, New York.
[16] Pollard, D. (1990). Empirical Processes: Theory and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics 2.
[17] Rice, J. and Rosenblatt, M. (1981). Integrated mean square error of a smoothing spline. J. Approx. Theory 33 353-369.
[18] Rice, J. (1986). Convergence rates for partially splined models. Statist. Probab. Letters 4 203-208.
[19] Severini, T. A. and Staniswalis, J. G. (1994). Quasi-likelihood estimation in semiparametric models. J. Amer. Statist. Assoc. 89 501-511.
[20] Silverman, B. W. (1985). Some aspects of the spline smoothing approach to nonparametric regression curve fitting (with discussion). J. Roy. Statist. Soc. Ser. B 47 1-52.
[21] Speckman, P. (1988). Kernel smoothing in partial linear models. J. Roy. Statist. Soc. Ser. B 50 413-436.
[22] van de Geer, S. (1988). Regression Analysis and Empirical Processes. CWI Tract 45. Centre for Mathematics and Computer Science, Amsterdam.
[23] van de Geer, S. (1990). Estimating a regression function. Ann. Statist. 18 907-924.
[24] van de Geer, S. (1995). A maximal inequality for the empirical process. Techn. Report TW 95-05, University of Leiden.
[25] Wahba, G. (1984). Partial spline models for the semi-parametric estimation of functions of several variables. In Statistical Analysis of Time Series 319-329. Institute of Statistical Mathematics, Tokyo.
[26] Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika 61 439-447.
[27] Wu, C. F. J. (1986). Jackknife, bootstrap, and other resampling methods in regression analysis. Ann. Statist. 14 1261-1350.