PENALIZED QUASI-LIKELIHOOD ESTIMATION IN PARTIAL LINEAR MODELS*

By ENNO MAMMEN and SARA VAN DE GEER

Institut für Stochastik
Fachbereich Mathematik
Humboldt-Universität zu Berlin
Unter den Linden 6
10099 Berlin
Germany

Mathematical Institute
University of Leiden
P.O. Box 9512
2300 RA Leiden
The Netherlands

Abstract. Consider a partial linear model, where the expectation of a random variable $Y$ depends on covariates $(x,z)$ through $F(x\theta_0 + m_0(z))$, with $\theta_0$ an unknown parameter, and $m_0$ an unknown function. We apply the theory of empirical processes to derive the asymptotic properties of the penalized quasi-likelihood estimator.

AMS 1991 subject classifications. Primary 62G05, secondary 62G20.

Key words and phrases. Asymptotic normality, penalized quasi-likelihood, rates of convergence.

1. Introduction. Let $(Y_1,T_1),(Y_2,T_2),\dots$ be independent copies of $(Y,T)$, where $Y$ is a real-valued random variable and $T \in \mathbb{R}^d$. Denote the distribution of $(Y,T)$ by $P$, and write

$\mu_0(t) = E(Y \mid T = t)$

for the conditional expectation of $Y$ given $T = t$. In this paper, we shall study the partial linear model, where $T = (X,Z)$, $X \in \mathbb{R}^{d_1}$, $Z \in \mathbb{R}^{d_2}$, $d_1 + d_2 = d$, and

(1.1) $\mu_0(x,z) = F(x\theta_0 + m_0(z)),$

* This research was supported by the Deutsche Forschungsgemeinschaft, Sonderforschungsbereich 373 "Quantifikation und Simulation Ökonomischer Prozesse", Humboldt-Universität zu Berlin, and the European Union Human Capital and Mobility Programme ERB CHRX-CT 940693.
with $F : \mathbb{R} \to \mathbb{R}$ a given function, $\theta_0 \in \mathbb{R}^{d_1}$ an unknown parameter, $x\theta_0$ the product of $x$ and the transpose of $\theta_0$, and $m_0$ an unknown function in a given class of smooth functions. Model (1.1) offers a flexible approach. The inclusion of the linear component allows discrete covariates. The link function $F$ may be useful in case of a bounded variable $Y$ (see for instance Example 2 below, where binary observations are considered).

For simplicity, we shall restrict ourselves to the case $d_1 = d_2 = 1$. We shall assume that $T = (X,Z)$ has bounded support, say $[0,1]^2$, and that $m_0$ is in the Sobolev class $\{m : J^2(m) < \infty\}$, where

(1.2) $J^2(m) = \int_0^1 \bigl(m^{(k)}(z)\bigr)^2\,dz.$

Here, $k \ge 1$ is a fixed integer, and $m^{(k)}$ denotes the $k$-th derivative of the function $m$. In summary, the model is

$\mu_0 = F(g_0), \qquad g_0 \in \mathcal{G},$

with

$\mathcal{G} = \bigl\{g : g(x,z) = x\theta + m(z),\ \theta \in \mathbb{R},\ J(m) < \infty\bigr\}.$

For $g \in \mathcal{G}$, $g(x,z) = x\theta + m(z)$, we shall often write $J(g) = J(m)$.

Define the quasi-(log-)likelihood function

(1.3) $Q(y;\mu) = \int_y^\mu \frac{y - s}{V(s)}\,ds,$

with $V$ a known function $: (a,b) \to (0,\infty)$, $-\infty \le a < b \le \infty$, given. The quasi-likelihood function was first considered by Wedderburn (1974). Properties of quasi-likelihood functions are discussed in McCullagh (1983) and McCullagh and Nelder (1989). There, the function $V$ has been chosen as the conditional variance of the response $Y$, and it has been assumed that $V$ depends only on the conditional mean $\vartheta$ of $Y$, i.e. $\mathrm{var}(Y) = V(\vartheta)$. The quasi-likelihood approach is a generalization of generalized linear models. The log-likelihood of an exponential family is replaced by a quasi-likelihood, in which only the relation between the conditional mean and the conditional variance has to be specified. To see the relations of the quasi-likelihood functions with generalized linear models note for instance that the maximum likelihood estimate $\hat\vartheta$ based on an i.i.d. sample $Y_1,\dots,Y_n$ from an exponential family with mean $\vartheta$ and variance $V(\vartheta)$ is given by

$\sum_{i=1}^n \frac{d}{d\vartheta}\,Q(Y_i;\hat\vartheta) = 0.$

In this paper we do not assume that $V(F(g_0(T)))$ is the conditional variance of $Y$. The only assumptions on the distribution of $Y$ we use in this paper concern the form of the conditional mean (see (1.1)) and subexponential tails (see (A0), below). In particular, our results may be used in case of model misspecification.
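For concreteness (our illustration, not part of the paper), the quasi-likelihood (1.3) can be evaluated numerically; for the variance function $V(\mu) = \mu(1-\mu)$ of Example 2 it reproduces the Bernoulli log-likelihood exactly. The function names below are ours.

```python
import math

def Q(y, mu, V, n=20000):
    """Quasi-likelihood Q(y; mu) = integral from y to mu of (y - s)/V(s) ds,
    evaluated with the midpoint rule (which avoids the endpoints, where the
    integrand can be 0/0 for binary y)."""
    h = (mu - y) / n
    total = 0.0
    for i in range(n):
        s = y + (i + 0.5) * h
        total += (y - s) / V(s)
    return total * h

V_bern = lambda mu: mu * (1.0 - mu)   # V for binary Y (Example 2)

mu = 0.3
# For binary y, Q coincides with the Bernoulli log-likelihood:
print(Q(1.0, mu, V_bern), math.log(mu))        # both approx -1.20397
print(Q(0.0, mu, V_bern), math.log(1.0 - mu))  # both approx -0.35667
```

With $V \equiv 1$ the same routine returns $-(y-\mu)^2/2$, the penalized least squares criterion of Example 1.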
The penalized quasi-likelihood estimator is defined by

(1.4) $\hat g_n = \underset{g \in \mathcal{G}}{\mathrm{argmax}}\ \bigl[\,Q_n\bigl(F(g)\bigr) - \lambda_n^2\,J^2(g)\,\bigr],$

where

$Q_n\bigl(F(g)\bigr) = \frac1n \sum_{i=1}^n Q\bigl(Y_i;\,F(g(T_i))\bigr).$

Throughout, we assume that indeed a solution $\hat g_n$ of the maximization problem (1.4) exists. Then $\hat g_n(x,z) = x\hat\theta_n + \hat m_n(z)$, where $\hat\theta_n \in \mathbb{R}$, and $J(\hat m_n) < \infty$. The estimated conditional expectation is $\hat\mu_n = F(\hat g_n)$.

Generalized linear models of the form (1.1) have first been considered by Green and Yandell (1985) and Green (1987). The generalization to quasi-likelihood models has also been studied in e.g. Chen (1988), Speckman (1988) and Severini and Staniswalis (1994). These papers however use different estimation procedures, such as polynomial approximation or kernel smoothing. Local polynomial smoothing based on quasi-likelihood functions is discussed in Fan, Heckman and Wand (1995).

Our main aim is to obtain asymptotic normality of the penalized quasi-likelihood estimator $\hat\theta_n$ of $\theta_0$, but first we derive a rate of convergence for $\hat g_n$. The asymptotic properties of the estimators depend of course on the behaviour of the smoothing parameter $\lambda_n$ as $n \to \infty$. It may be random (e.g. determined through cross-validation). We assume $\lambda_n = o_P(n^{-1/4})$, and $1/\lambda_n = O_P\bigl(n^{k/(2k+1)}\bigr)$.

The following example is an important special case.

Example 1. Let $F$ be the identity, and $V \equiv 1$. Then $Q(y;\mu) = -(y-\mu)^2/2$, so that $\hat g_n$ is the penalized least squares estimator. It is called a partial smoothing spline. If $\lambda_n$ is non-random, $\hat\theta_n$ and $\hat m_n$ are linear in $Y_1,\dots,Y_n$. See e.g. Wahba (1984), Silverman (1985). Denote the conditional expectation of $X$ given $Z = z$ by $h(z)$, $z \in [0,1]$. If $J(h) < \infty$ and $\lambda_n$ is of the order given above and non-random, then the bias of $\hat\theta_n$ is $O(\lambda_n^2) = o(n^{-1/2})$, whereas its variance is $O(1/n)$. This is a result of Rice (1986). It indicates that the smoothness imposed on $\hat m_n$ (in terms of the number of derivatives $k$) should not exceed the smoothness of $h$. In Theorem 4.1, we shall prove $\sqrt n$-consistency and asymptotic normality of $\hat\theta_n$ under the condition $J(h) < \infty$. In Remark 4.1, we show that in case of rough functions $h$, $\sqrt n$-consistency of $\hat\theta_n$ can be guaranteed by undersmoothing. More precisely, there we allow that $h$ depends on $n$ and that $J(h_n) \to \infty$. We show that $\hat\theta_n$ is $\sqrt n$-consistent and asymptotically normal, as long as $\lambda_n$ is chosen small enough. Even for the optimal choice $\lambda_n \sim n^{-k/(2k+1)}$, $J(h_n)$ may tend to infinity. This shows that much less smoothness is needed for $h$ than for $m_0$.
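The partial smoothing spline of Example 1 can be sketched numerically. The following illustration is ours, not the paper's: it uses a discrete second-difference penalty as a stand-in for $J^2$ with $k = 2$, evaluates $m$ at the (sorted) design points, and solves the resulting normal equations; the constant `alpha` plays the role of $\lambda_n^2$ (all names are our choices).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
z = np.linspace(0.0, 1.0, n)           # sorted covariate for the smooth part
x = rng.uniform(0.0, 1.0, n)           # covariate for the linear part
theta0 = 2.0
y = theta0 * x + np.sin(2 * np.pi * z) + 0.1 * rng.standard_normal(n)

# Parameters beta = (theta, m(z_1), ..., m(z_n)).
A = np.column_stack([x, np.eye(n)])

# Discrete roughness penalty: squared second differences of m (k = 2).
D = np.diff(np.eye(n), 2, axis=0)      # (n-2) x n second-difference matrix
B = np.zeros((n + 1, n + 1))
B[1:, 1:] = D.T @ D

alpha = 1.0                            # penalty weight (role of lambda_n^2)
beta = np.linalg.solve(A.T @ A + alpha * B, A.T @ y)
theta_hat, m_hat = beta[0], beta[1:]
print(theta_hat)                       # close to theta0 = 2
```

Because the penalty leaves linear functions of $z$ unpenalized but heavily charges the rough, noise-like direction spanned by $x$, the linear coefficient remains identified even though $m$ has one parameter per observation.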
The theory for general penalized quasi-likelihood estimators essentially boils down to that for Example 1, provided one can properly linearize in a neighbourhood of the true parameters. For this purpose, we first need to prove consistency, which is not too difficult if $V$ stays away from zero. Unfortunately, this is frequently not the case, as we see in Examples 2 and 3 below. In Section 7, we shall employ an ad hoc method to handle
this case: there, the conditions ensuring asymptotic normality are relatively innocent, but proving consistency can be somewhat involved.

Example 2. Let $Y \in \{0,1\}$, $P(Y = 1 \mid T = t) = F(g_0(t))$, and let $V(\mu) = \mu(1-\mu)$, $\mu \in (0,1)$. In this case, the quasi-likelihood is the exact likelihood, so that $\hat g_n$ is the penalized maximum likelihood estimator.

Example 3. Let $Y \in (0,\infty)$, and $V(\mu) = \mu^2$, $\mu > 0$. Then $Q(y;\mu)$ is the log-likelihood corresponding to the exponential distribution with parameter $1/\mu$.

This paper can be seen as a statistical application of empirical process theory as considered in Dudley (1984), Giné and Zinn (1984), Pollard (1984, 1990), Ossiander (1987), and others. Some concepts and results in this field are presented in Section 2. In Section 3, rates of convergence are obtained, and Section 4 uses the rates to establish asymptotic normality. In Section 5, we discuss bootstrapping the distribution of $\hat\theta_n$. Examples 1-3 are studied in Section 6, and Section 7 revisits Example 2.

2. Main assumptions, notation and technical tools.

2.1. Main assumptions. We recall the assumption $T = (X,Z) \in [0,1]^2$, and

(2.1) $\lambda_n = o_P\bigl(n^{-1/4}\bigr), \qquad 1/\lambda_n = O_P\bigl(n^{k/(2k+1)}\bigr).$

We also suppose throughout that $f(s) = dF(s)/ds$ exists for all $s$.

Write $W = Y - F(g_0(T))$ ($W_i = Y_i - F(g_0(T_i))$, $i = 1,2,\dots$). The following condition is essential in Section 3: for some constant $0 < C_0 < \infty$,

(A0) $E\bigl(e^{|W|/C_0} \mid T\bigr) \le C_0, \quad \text{almost surely}.$

Let $\psi_j(x,z) = z^{j-1}$, $j = 1,\dots,k$, and $\psi_{k+1}(x,z) = x$. We assume that the matrix

$\Sigma = \int \psi\psi^T\,dP$

is non-singular. Here, $\psi^T$ denotes the transpose of $\psi = (\psi_1,\dots,\psi_{k+1})^T$.
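For intuition (our illustration, not the paper's): with $k = 2$ and $T$ uniform on $[0,1]^2$, the matrix $\Sigma = \int \psi\psi^T\,dP$ can be approximated by its empirical counterpart and checked for non-singularity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x, z = rng.uniform(size=n), rng.uniform(size=n)

# k = 2: psi_j(x, z) = z**(j-1) for j = 1, 2, and psi_3(x, z) = x
psi = np.column_stack([np.ones(n), z, x])
Sigma_hat = psi.T @ psi / n
print(np.linalg.eigvalsh(Sigma_hat))   # all eigenvalues bounded away from 0
```

For this design the smallest eigenvalue stays near 0.05, so the non-singularity assumption on $\Sigma$ is easily satisfied; it would fail, for example, if $X$ were a polynomial in $Z$ of degree less than $k$.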
By the Sobolev-embedding Theorem, one can write

$m(z) = m_1(z) + m_2(z),$

with

$m_1(z) = \sum_{j=1}^{k} \delta_j\,z^{j-1},$

and $\|m_2\|_\infty \le J(m_2) = J(m)$ (see e.g. Oden and Reddy (1976)). So for $g(x,z) = x\theta + m(z)$,

$g(x,z) = g_1(x,z) + g_2(x,z),$
with

$g_1(x,z) = \sum_{j=1}^{k+1} \theta_j\,\psi_j(x,z),$

i.e. $g_1 = \theta^T\psi$, and $g_2(x,z) = m_2(z)$, $\|g_2\|_\infty \le J(g_2) = J(g)$.

2.2. Notation. For a measurable function $a : [0,1]^2 \times \mathbb{R} \to \mathbb{R}$, $E\,a(Y,T) = \int a\,dP$ denotes the expectation of $a(Y,T)$ (whenever it exists) and

$\|a\|^2 = E\,a^2(Y,T), \qquad \|a\|_n^2 = \frac1n\sum_{i=1}^n a^2(Y_i,T_i).$

With a slight abuse of notation, we also write for $a : [0,1]^2 \to \mathbb{R}$ depending only on $t \in [0,1]^2$,

$\|a\|^2 = E\,a^2(T), \qquad \|a\|_n^2 = \frac1n\sum_{i=1}^n a^2(T_i).$

Moreover,

$\|a\|_\infty = \sup_{t \in [0,1]^2} |a(t)|,$

and for a vector $v$, $|v|$ denotes the Euclidean norm.

Definition. Let $\mathcal{A}$ be a subset of a (pseudo-)metric space $(\mathcal{L}, d)$ of real-valued functions. The $\delta$-covering number $N(\delta,\mathcal{A},d)$ of $\mathcal{A}$ is the smallest value of $N$ for which there exist functions $a_1,\dots,a_N$ in $\mathcal{L}$, such that for each $a \in \mathcal{A}$, $d(a,a_j) \le \delta$ for some $j \in \{1,\dots,N\}$. The $\delta$-covering number with bracketing $N_B(\delta,\mathcal{A},d)$ is the smallest value of $N$ for which there exist pairs of functions $[a_j^L, a_j^U] \subset \mathcal{L}$, with $d(a_j^L, a_j^U) \le \delta$, $j = 1,\dots,N$, such that for each $a \in \mathcal{A}$ there is a $j \in \{1,\dots,N\}$ such that $a_j^L \le a \le a_j^U$. The $\delta$-entropy ($\delta$-entropy with bracketing) is defined as $H(\delta,\mathcal{A},d) = \log N(\delta,\mathcal{A},d)$ ($H_B(\delta,\mathcal{A},d) = \log N_B(\delta,\mathcal{A},d)$).
2.3. Technical tools.

Theorem 2.1. For each $0 < C < \infty$ we have

$\sup_{\delta > 0}\ \delta^{1/k}\,H\bigl(\delta,\ \{g \in \mathcal{G} : \|g\|_\infty \le C,\ J(g) \le C\},\ \|\cdot\|_\infty\bigr) < \infty.$

Proof. See Birman and Solomjak (1967).

Write (AA0) for the assumption that given $T$, $W$ is (uniformly) sub-Gaussian, i.e. for some constant $0 < C_0 < \infty$,

(AA0) $E\bigl(e^{W^2/C_0^2} \mid T\bigr) \le C_0, \quad\text{almost surely}.$

Theorem 2.2. Let $\mathcal{A}$ be a uniformly bounded class of functions depending only on $t \in [0,1]^2$. Let $0 < \alpha < 2$. Suppose that either (A0) holds and

(2.2) $\limsup_{\delta\downarrow 0}\ \delta^\alpha\,H_B\bigl(\delta, \mathcal{A}, \|\cdot\|_\infty\bigr) < \infty,$

or that (AA0) holds and

(2.3) $\limsup_{\delta\downarrow 0}\ \sup_n\ \delta^\alpha\,H\bigl(\delta, \mathcal{A}, \|\cdot\|_n\bigr) < \infty, \quad\text{almost surely}.$

Then

(2.4) $\sup_{a\in\mathcal{A}}\ \frac{\bigl|n^{-1}\sum_{i=1}^n W_i\,a(T_i)\bigr|}{\|a\|_n^{1-\alpha/2} \vee n^{-1/2}} = O_P\bigl(n^{-1/2}\bigr).$

Proof. It is shown in van de Geer (1990) that (AA0) and (2.3) imply (2.4). Similar arguments as there, combined with e.g. a result of Birgé and Massart (1991, Theorem 4), show that (AA0) can be relaxed to (A0), provided (2.3) is strengthened to (2.2) (see also van de Geer (1995)).

The following theorem will be used in Section 4, only to show that consistency in $\|\cdot\|$-norm implies consistency in $\|\cdot\|_n$-norm. Nevertheless, we present it in its full strength, so that one can verify that the rates for the $\|\cdot\|$-norm and $\|\cdot\|_n$-norm coincide.

Theorem 2.3. Suppose $\mathcal{A}$ is uniformly bounded and that for some $0 < \alpha < 2$,

(2.5) $\sup_{\delta > 0}\ \delta^\alpha\,H_B\bigl(\delta, \mathcal{A}, \|\cdot\|\bigr) < \infty.$

Then for all $\epsilon > 0$ there exists a $\delta > 0$ such that

(2.6) $\limsup_{n\to\infty}\ \sup_{a\in\mathcal{A},\,\|a\|\ge\delta}\ \Bigl|\frac{\|a\|_n}{\|a\|} - 1\Bigr| \le \epsilon, \quad\text{almost surely}.$

Proof. See van de Geer (1988, Lemma 6.3.4).

Theorem 2.4. Suppose that for some $0 < \alpha < 2$,

(2.7) $\sup_{\delta > 0}\ \delta^\alpha\,H_B\bigl(\delta, \mathcal{A}, \|\cdot\|\bigr) < \infty.$

Then for all $\epsilon > 0$ there is a $\delta > 0$ such that

(2.8) $\limsup_{n\to\infty}\ P\Bigl(\sup_{a\in\mathcal{A},\,\|a\|\le\delta}\ \Bigl|\frac{1}{\sqrt n}\sum_{i=1}^n \bigl(a(Y_i,T_i) - \int a\,dP\bigr)\Bigr| > \epsilon\Bigr) \le \epsilon.$

Condition (2.7) ensures that $\mathcal{A}$ is a Donsker class, and (2.8) is the implied asymptotic equicontinuity of the empirical process. See e.g. Pollard (1990) and the references there for general theory on Donsker classes.

3. Rates of convergence. Define

$l(g)(t) = \int_{c_0}^{F(g(t))} \frac{ds}{V(s)}, \qquad g \in \mathcal{G},$

with $c_0 \in (a,b)$ a fixed point. Then

$Q_n\bigl(F(g)\bigr) - Q_n\bigl(F(g_0)\bigr) = \frac1n\sum_{i=1}^n W_i\bigl(l(g)(T_i) - l(g_0)(T_i)\bigr) - \frac1n\sum_{i=1}^n \int_{F(g_0(T_i))}^{F(g(T_i))} \frac{s - F(g_0(T_i))}{V(s)}\,ds.$
We shall use the following assumptions: for some constants $0 < C_1, C_2 < \infty$,

(A1) $\frac{1}{C_1} \le V(s) \le C_1$ for all $s \in (a,b),$

and

(A2) $\frac{1}{C_2} \le f(s) \le C_2$ for all $s$, where $f = dF/ds$.

Clearly, (A1) and (A2) hold in Example 1, where $l(g) = g$ and $V \equiv 1$. In fact, under (A1) and (A2), the rates problem essentially reduces to the one of Example 1. It is possible to avoid these assumptions, as we shall illustrate in Section 7.

Lemma 3.1. Suppose (A1) and (A2) are met. Then

(3.1) $\|\hat g_n - g_0\|_n = O_P(\lambda_n),$

and

(3.2) $J(\hat g_n) = O_P(1).$

Proof. We have

$Q_n\bigl(F(\hat g_n)\bigr) - Q_n\bigl(F(g_0)\bigr) = \frac1n\sum_{i=1}^n W_i\bigl(l(\hat g_n)(T_i) - l(g_0)(T_i)\bigr) - \frac1n\sum_{i=1}^n \int_{F(g_0(T_i))}^{F(\hat g_n(T_i))} \frac{s - F(g_0(T_i))}{V(s)}\,ds.$

So, by the Cauchy-Schwarz inequality and (A1),

(3.3) $Q_n\bigl(F(\hat g_n)\bigr) - Q_n\bigl(F(g_0)\bigr) \le \Bigl(\frac1n\sum_{i=1}^n W_i^2\Bigr)^{1/2}\bigl\|l(\hat g_n) - l(g_0)\bigr\|_n - \frac{1}{2C_1}\bigl\|F(\hat g_n) - F(g_0)\bigr\|_n^2.$

Note that $\frac1n\sum_{i=1}^n W_i^2 = O(1)$ almost surely (by (A0)). On the other hand, because $\hat g_n$ maximizes $Q_n(F(g)) - \lambda_n^2 J^2(g)$, we have

(3.4) $Q_n\bigl(F(\hat g_n)\bigr) - Q_n\bigl(F(g_0)\bigr) \ge \lambda_n^2\bigl(J^2(\hat g_n) - J^2(g_0)\bigr) \ge -o_P(1).$

The combination of (3.3) and (3.4) gives

$\bigl\|F(\hat g_n) - F(g_0)\bigr\|_n^2 \le O_P(1)\,\bigl\|l(\hat g_n) - l(g_0)\bigr\|_n + o_P(1),$

which implies $\|F(\hat g_n) - F(g_0)\|_n = O_P(1)$.
Under (A1) and (A2), there exists a constant $C$ such that

(3.5) $\frac1C\,\bigl|g(t) - \tilde g(t)\bigr| \le \bigl|l(g)(t) - l(\tilde g)(t)\bigr| \le C\,\bigl|g(t) - \tilde g(t)\bigr|$

for all $t \in [0,1]^2$ and all $g, \tilde g \in \mathcal{G}$. So also $\|l(\hat g_n) - l(g_0)\|_n = O_P(1)$, so that $\|\hat g_n - g_0\|_n = O_P(1)$.

We shall now show that $\|\hat g_n\|_\infty/(1 + J(\hat g_n)) = O_P(1)$. As in Section 2.2, write

$\hat g_n = \hat g_{1,n} + \hat g_{2,n},$

with $\hat g_{1,n} = \hat\theta_n^T\psi$, and $\|\hat g_{2,n}\|_\infty \le J(\hat g_n)$. Then

(3.6) $\frac{\|\hat g_{1,n}\|_n}{1 + J(\hat g_n)} \le \frac{\|\hat g_n\|_n}{1 + J(\hat g_n)} + \frac{\|\hat g_{2,n}\|_n}{1 + J(\hat g_n)} = O_P(1).$

Now, $\Sigma = \int \psi\psi^T\,dP$ is assumed to be non-singular, and

$\frac1n\sum_{i=1}^n \psi(T_i)\,\psi^T(T_i) \to \Sigma, \quad\text{almost surely}.$

Thus, (3.6) implies that $|\hat\theta_n|/(1 + J(\hat g_n)) = O_P(1)$. Because $T$ is in a bounded set, also $\|\hat g_{1,n}\|_\infty/(1 + J(\hat g_n)) = O_P(1)$. So $\|\hat g_n\|_\infty/(1 + J(\hat g_n)) = O_P(1)$.

In view of (3.5), we now have $\|l(\hat g_n) - l(g_0)\|_\infty/(1 + J(\hat g_n)) = O_P(1)$. Moreover, $J(l(\tilde g)) \le C\,J(\tilde g)$, $\tilde g \in \mathcal{G}$. So by Theorem 2.1,

$\sup_{\delta > 0}\ \delta^{1/k}\,H\Bigl(\delta,\ \Bigl\{\frac{l(g) - l(g_0)}{1 + J(g)} : g \in \mathcal{G},\ \frac{\|l(g) - l(g_0)\|_\infty}{1 + J(g)} \le C\Bigr\},\ \|\cdot\|_\infty\Bigr) < \infty.$

Using Theorem 2.2, assumption (A0), and the fact that $\|l(\hat g_n) - l(g_0)\|_n \le C\,\|\hat g_n - g_0\|_n$, we find

(3.7) $\Bigl|\frac1n\sum_{i=1}^n W_i\bigl(l(\hat g_n)(T_i) - l(g_0)(T_i)\bigr)\Bigr| \le \Bigl(\|\hat g_n - g_0\|_n^{1-1/(2k)}\bigl(1 + J(\hat g_n)\bigr)^{1/(2k)} \vee \bigl(1 + J(\hat g_n)\bigr)\,n^{-1/2}\Bigr)\,O_P\bigl(n^{-1/2}\bigr).$

Invoke this in (3.4):

$\lambda_n^2\bigl(J^2(\hat g_n) - J^2(g_0)\bigr) \le Q_n\bigl(F(\hat g_n)\bigr) - Q_n\bigl(F(g_0)\bigr),$

so that

(3.8) $\frac{1}{2C}\,\|\hat g_n - g_0\|_n^2 + \lambda_n^2 J^2(\hat g_n) \le \Bigl|\frac1n\sum_{i=1}^n W_i\bigl(l(\hat g_n)(T_i) - l(g_0)(T_i)\bigr)\Bigr| + \lambda_n^2 J^2(g_0)$
$\le \Bigl(\|\hat g_n - g_0\|_n^{1-1/(2k)}\bigl(1 + J(\hat g_n)\bigr)^{1/(2k)} \vee \bigl(1 + J(\hat g_n)\bigr)\,n^{-1/2}\Bigr)\,O_P\bigl(n^{-1/2}\bigr) + \lambda_n^2 J^2(g_0).$
Thus,

$\lambda_n^2 J^2(\hat g_n) \le \Bigl(\|\hat g_n - g_0\|_n^{1-1/(2k)}\bigl(1 + J(\hat g_n)\bigr)^{1/(2k)} \vee \bigl(1 + J(\hat g_n)\bigr)\,n^{-1/2}\Bigr)\,O_P\bigl(n^{-1/2}\bigr) + \lambda_n^2 J^2(g_0),$

as well as

$\|\hat g_n - g_0\|_n^2 \le \Bigl(\|\hat g_n - g_0\|_n^{1-1/(2k)}\bigl(1 + J(\hat g_n)\bigr)^{1/(2k)} \vee \bigl(1 + J(\hat g_n)\bigr)\,n^{-1/2}\Bigr)\,O_P\bigl(n^{-1/2}\bigr) + \lambda_n^2 J^2(g_0).$

Solve these two inequalities to find that

$\|\hat g_n - g_0\|_n + \lambda_n J(\hat g_n) = O_P\bigl(\lambda_n + n^{-k/(2k+1)} + n^{-1/2}\bigr).$

Because we assumed $1/\lambda_n = O_P(n^{k/(2k+1)})$, this completes the proof.

Remark 3.1. The situation can be adjusted to the case of triangular arrays. Let $(Y_{1,n},T_{1,n}),\dots,(Y_{n,n},T_{n,n})$ be independent copies of $(Y_n,T_n)$, and suppose that the conditional expectation of $Y_n$ given $T_n$ is equal to $F(g_{0,n}(T_n))$, with $g_{0,n} \in \mathcal{G}$, $n = 1,2,\dots$. Assume that (A0) holds for $W_n = Y_n - F(g_{0,n}(T_n))$ and $T_n$, with constant $C_0$ not depending on $n$. Assume moreover that for $P_n$ being the distribution of $T_n$, we have

$\int a^2\,dP_n \ge c_0\,\|a\|^2 \quad\text{for all } a,$

where $c_0 > 0$ is independent of $n$. Then one finds under (A1) and (A2), for $1/\lambda_n = O_P\bigl(n^{k/(2k+1)}(1 + J(g_{0,n}))^{-1/(2k+1)}\bigr)$,

$\|\hat g_n - g_{0,n}\|_n = O_P\bigl(\lambda_n(1 + J(g_{0,n}))\bigr)$

and

$J(\hat g_n) = O_P\bigl(1 + J(g_{0,n})\bigr).$

4. Asymptotic normality. For $g \in \mathcal{G}$, write

$l_g(t) = \frac{f(g(t))}{V\bigl(F(g(t))\bigr)}, \qquad f_g(t) = f\bigl(g(t)\bigr),$

and set $l_0 = l_{g_0}$, $f_0 = f_{g_0}$. We shall use the assumptions: for some constants $0 < C_3, C_4 < \infty$, and for all $t \in [0,1]^2$, we have

(A3) $|l_0(t)| \le C_3$ and $|l_g(t) - l_0(t)| \le C_3\,|g(t) - g_0(t)|$ for all $g \in \mathcal{G}$,

and

(A4) $|f_0(t)| \le C_4$ and $|f_g(t) - f_0(t)| \le C_4\,|g(t) - g_0(t)|$ for all $g \in \mathcal{G}$.

Take

$h_0(z) = \frac{E\bigl(X\,f_0(T)\,l_0(T) \mid Z = z\bigr)}{E\bigl(f_0(T)\,l_0(T) \mid Z = z\bigr)},$
and set

$h_0(x,z) = x - h_0(z).$

Also define

$\tilde h(z) = E(X \mid Z = z)$

and

$\tilde h(x,z) = x - \tilde h(z).$

Theorem 4.1 below gives conditions for asymptotic normality of $\hat\theta_n$. If the conditional distribution of $Y$ belongs to an exponential family with mean $\mu$ and variance $V(\mu)$, then $\hat\theta_n$ is asymptotically efficient. The conditional variance of $Y$ given $T$ is in that case

$\mathrm{var}(Y \mid T) = V_0(T) = V\bigl(F(g_0(T))\bigr),$

so that

$E\bigl(W^2\,l_0^2(T)\,h_0^2(T)\bigr) = E\bigl(f_0(T)\,l_0(T)\,h_0^2(T)\bigr).$

According to Theorem 4.1, the asymptotic variance of $\sqrt n\,(\hat\theta_n - \theta_0)$ is then $E\bigl(f_0(T)\,l_0(T)\,h_0^2(T)\bigr)^{-1}$.

Theorem 4.1. Suppose (A3) and (A4) are met. Assume moreover that

(4.1) $\|\hat g_n - g_0\|_n = o_P\bigl(n^{-1/4}\bigr),$

(4.2) $J(\hat g_n) = O_P(1),$

(4.3) $\|\tilde h\| > 0,$

(4.4) $Z$ has density bounded away from $0$ on its support,

(4.5) $J(h_0) < \infty,$

and

(4.6) $E\bigl(f_0(T)\,l_0(T)\,h_0^2(T)\bigr) > 0.$

Then,

$\sqrt n\,\bigl(\hat\theta_n - \theta_0\bigr) = \frac{n^{-1/2}\sum_{i=1}^n W_i\,l_0(T_i)\,h_0(T_i)}{E\bigl(f_0(T)\,l_0(T)\,h_0^2(T)\bigr)} + o_P(1).$

Proof. We shall apply Theorem 2.3, to conclude from (4.1) that $\|\hat g_n - g_0\| = o_P(1)$. Because Theorem 2.3 is on uniformly bounded classes, we first verify that $\|\hat g_n\|_\infty = O_P(1)$. This follows by the same arguments as used in Lemma 3.1. Because (4.2) holds, also
R
X X
X
X
X 1
1 1
2
1
1
0
P P
P P
P P
P
P
P
P
P
j j j j
k k k 0 k j 0 j
k 0 k
j 0 j
j 0 j
0 2G
2
0 j
j 0 0 0
f 0 2G j 0 j g
k 0 k k 0 k
0
0 0 0
2 1
1
2 0 0
0
supp ort( )
0
0 0
2
1
2 2
=0
=0
=1
2
=1
0 2
0 2 0 0
0 0
=1
0 2
1 2
=1
0 0 0 2
=1
0 0 0 0 2
n n n
n n
n n
n
z Z
n
n
ns n
n n
n ns
n
ns s
n n n n
n ns s
n
i
i n i i
n
i
n i i n i i
n n
n
i
i i i
=
n
i
n i i i i i
n
i
n i i n i i i i i
A g O g O
h > g g o o
m m o
m z m z o :
: g g ;
g x;z g x;z sh x;z
s x m z sh z ;
s
:
d
ds
Q F g J g :
l l g f f g
d
ds
Q F g
n
W l T h T
n
T T l T h T I II:
y t l g t h t g ; g g ; J g C
g g o l l
o
: I
n
W l T h T o n :
II
n
g T g T f T l T h T
n
T T g T g T f T l T h T
, this implies ^ = (1), so ^ = (1).
Now,
~
0, so ^ = (1) implies
^
= (1). Hence, also
^ = (1). Assumption(4.4) ensuresthat
sup ^ ( ) ( ) = (1)
Therefore,we may withoutloss of generality assumethat
(47) ^
so that we can use (A3) and (A4).
Becauseof (4.5), we havethat
^ ( )= ^ ( )+ ( )
=(
^
+ ) +(^ ( ) ( ))
forall . Thus,
(48) [
( (^ )) (^ )] =0
Clearly, for
^
= (^ ),
^
= (^ ),
( (^ )) = 1
^
( ) ( ) 1
[^ ( ) ( )]
^
( ) ( )=
Use (A3) and Theorem 2.1,to nd that the class
[ ( )] ( ( )) ( ): ( )
satises (2.7)of Theorem2.4. Since,also by(A3), ^ = (1)implies
^
=
(1),we obtain
(49) =
1
( ) ( )+ ( )
Let us write
= 1
[(^ ( ) ( )) ( )] ( ) ( )
+ 1
[^ ( ) ( ) (^ ( ) ( )) ( )] ( ) ( )
$\quad + \frac1n\sum_{i=1}^n \bigl[F(\hat g_n(T_i)) - F(g_0(T_i))\bigr]\bigl(\hat l_n(T_i) - l_0(T_i)\bigr)\,h_0(T_i)$
$=: III + IV + V.$

Observe that

$\hat g_n(x,z) - g_0(x,z) = (\hat\theta_n - \theta_0)\,x + \hat m_n(z) - m_0(z) = (\hat\theta_n - \theta_0)\,h_0(x,z) + \hat a_n(z),$

where $\hat a_n(z) = (\hat\theta_n - \theta_0)\,h_0(z) + \hat m_n(z) - m_0(z)$. Hence,

(4.10) $III = (\hat\theta_n - \theta_0)\,\frac1n\sum_{i=1}^n f_0(T_i)\,l_0(T_i)\,h_0^2(T_i) + \frac1n\sum_{i=1}^n \hat a_n(Z_i)\,f_0(T_i)\,l_0(T_i)\,h_0(T_i).$

Because $|\hat\theta_n - \theta_0| = o_P(1)$ and $\|\hat m_n - m_0\|_\infty = o_P(1)$, also $\|\hat a_n\|_\infty = o_P(1)$. Moreover, for any measurable function $a : [0,1] \to \mathbb{R}$, $E\bigl(a(Z)\,f_0(T)\,l_0(T)\,h_0(T)\bigr) = 0$. So, according to Theorem 2.1, combined with Theorem 2.4, the second term in (4.10) is $o_P(n^{-1/2})$. This, and the law of large numbers, yields

$III = (\hat\theta_n - \theta_0)\bigl(E\,f_0(T)\,l_0(T)\,h_0^2(T) + o_P(1)\bigr) + o_P\bigl(n^{-1/2}\bigr).$

Invoke (A3) and (A4) to conclude that under (4.7),

$IV \le C_4\,\frac1n\sum_{i=1}^n \bigl|\hat g_n(T_i) - g_0(T_i)\bigr|^2\,\bigl|l_0(T_i)\,h_0(T_i)\bigr| \le C\,\|\hat g_n - g_0\|_n^2 = o_P\bigl(n^{-1/2}\bigr),$

and similarly,

$V \le C\,\|\hat g_n - g_0\|_n^2 = o_P\bigl(n^{-1/2}\bigr).$

Thus,

(4.11) $II = (\hat\theta_n - \theta_0)\bigl(E\,f_0(T)\,l_0(T)\,h_0^2(T) + o_P(1)\bigr) + o_P\bigl(n^{-1/2}\bigr).$

Finally, we note that (4.2), (4.5) and the condition $\lambda_n = o_P(n^{-1/4})$ give

(4.12) $\Bigl|\frac{d}{ds}\,\lambda_n^2\,J^2(\hat g_{ns})\Big|_{s=0}\Bigr| \le 2\,\lambda_n^2\,J(\hat g_n)\,J(h_0) = o_P\bigl(n^{-1/2}\bigr).$

Combine (4.8), (4.9), (4.11) and (4.12) to obtain

$0 = \frac1n\sum_{i=1}^n W_i\,l_0(T_i)\,h_0(T_i) - (\hat\theta_n - \theta_0)\bigl(E\,f_0(T)\,l_0(T)\,h_0^2(T) + o_P(1)\bigr) + o_P\bigl(n^{-1/2}\bigr).$

Apply condition (4.6) to complete the proof.
Remark 4.1. The results can be extended to the case of triangular arrays, as considered in Remark 3.1. Let us suppose the assumptions given there are met, and that in addition (A3) and (A4) hold, with constants $C_3$ and $C_4$ not depending on $n$. Suppose that also (4.3), (4.4) and (4.6) hold uniformly in $n$. Replace (4.1) and (4.2) by the condition

$\lambda_n\bigl(1 + J(g_{0,n})\bigr) = o_P\bigl(n^{-1/4}\bigr),$

and replace (4.5) by

(4.13) $\lambda_n^2\bigl(1 + J(g_{0,n})\bigr)\,J(h_{0,n}) = o_P\bigl(n^{-1/2}\bigr).$

Then the conclusion of Theorem 4.1 is valid, provided that we can apply Theorem 2.3 to conclude that $\|\hat g_n - g_{0,n}\| = o_P(1)$. For this purpose, we assume in addition to the above that

$\lambda_n\bigl(1 + J(g_{0,n})\bigr)^2 = o_P(1).$

For bounded $J(g_{0,n})$, condition (4.13) holds if $J(h_{0,n})$ is bounded. This follows from our assumption $\lambda_n = o_P(n^{-1/4})$. For optimal choices $\lambda_n \sim n^{-k/(2k+1)}$, for (4.13) it suffices that $J(h_{0,n}) = o\bigl(n^{(2k-1)/(4k+2)}\bigr)$, i.e. $J(h_{0,n})$ may converge to infinity. This means that weaker conditions on the smoothness of $h_{0,n}$ are needed than on $m_0$. Furthermore, if $J(h_{0,n}) \to \infty$, $\sqrt n$-consistency of $\hat\theta_n$ can always be guaranteed by choosing $\lambda_n$ small (i.e. undersmoothing).

5. Estimating the distribution of the parametric component using Wild Bootstrap. Inference on the parametric component $\theta_0$ of the model could be based on our asymptotic result in Theorem 4.1. There it is stated that the distribution of $\hat\theta_n$ is not affected by the nonparametric nature of the other component of the model, at least asymptotically. This statement may be misleading for small sample sizes. An approach which reflects more carefully the influence of the nonparametric component is bootstrap. We discuss here three versions of bootstrap. The first version is Wild Bootstrap, which is related to proposals of Wu (1986) (see also Beran (1986) and Mammen (1992)) and which was first proposed by Härdle and Mammen (1993) in nonparametric set-ups. Note that in our model the conditional distribution of $Y$ is not specified besides (1.1) and (A0).

The Wild Bootstrap procedure works as follows.

STEP 1. Calculate residuals $\hat W_i = Y_i - F(\hat g_n(T_i))$.

STEP 2. Generate i.i.d. random variables $\varepsilon_1,\dots,\varepsilon_n$ with mean 0, variance 1 and which fulfill for a constant $C_\varepsilon$ that $|\varepsilon_i| \le C_\varepsilon$ (a.s.) for $i = 1,\dots,n$.

STEP 3. Put $Y_i^* = F(\hat g_n(T_i)) + \hat W_i\,\varepsilon_i$ for $i = 1,\dots,n$.

STEP 4. Use the (pseudo) sample $\bigl((Y_1^*,T_1),\dots,(Y_n^*,T_n)\bigr)$ for the calculation of the parametric estimate $\hat\theta_n^*$.

STEP 5. The distribution of $\hat\theta_n - \theta_0$ is estimated by the (conditional) distribution (given $(Y_1,T_1),\dots,(Y_n,T_n)$), of $\hat\theta_n^* - \hat\theta_n$.
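The five steps above can be sketched numerically. The following illustration is ours, not the paper's: it takes the partial smoothing spline of Example 1 (identity link, discrete second-difference penalty) as the estimator and Rademacher multipliers, which have mean 0, variance 1 and are bounded, as Step 2 requires. All function names are our choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_partial_spline(y, x, alpha=1.0):
    """Penalized least squares (Example 1): returns (theta_hat, m_hat)."""
    n = len(y)
    A = np.column_stack([x, np.eye(n)])
    D = np.diff(np.eye(n), 2, axis=0)        # discrete roughness penalty, k = 2
    B = np.zeros((n + 1, n + 1))
    B[1:, 1:] = D.T @ D
    beta = np.linalg.solve(A.T @ A + alpha * B, A.T @ y)
    return beta[0], beta[1:]

n = 100
z = np.linspace(0.0, 1.0, n)
x = rng.uniform(size=n)
y = 2.0 * x + np.sin(2 * np.pi * z) + 0.1 * rng.standard_normal(n)

theta_hat, m_hat = fit_partial_spline(y, x)
fitted = theta_hat * x + m_hat               # F is the identity here
resid = y - fitted                           # Step 1
boot = []
for _ in range(200):
    eps = rng.choice([-1.0, 1.0], size=n)    # Step 2: bounded, mean 0, variance 1
    y_star = fitted + resid * eps            # Step 3
    theta_star, _ = fit_partial_spline(y_star, x)   # Step 4
    boot.append(theta_star - theta_hat)      # Step 5
print(theta_hat, np.std(boot))               # bootstrap estimate of the spread of theta_hat
```

The empirical distribution of `boot` approximates the law of $\hat\theta_n - \theta_0$, which is the content of Theorem 5.1 below.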
In the case where the conditional variance of $Y$ is specified by the model, i.e.

$\mathrm{var}(Y \mid T = t) = V\bigl(F(g_0(t))\bigr),$

we propose the following modification of the resampling. In Step 3 put $Y_i^* = F(\hat g_n(T_i)) + V^{1/2}\bigl(F(\hat g_n(T_i))\bigr)\,\varepsilon_i$ for $i = 1,\dots,n$. In this case the condition that $\varepsilon_i$ is bounded can be weakened to the assumption that $\varepsilon_i$ has subexponential tails, i.e. for a constant $C_\varepsilon$ it holds that $E\bigl(e^{|\varepsilon_i|/C_\varepsilon}\bigr) \le C_\varepsilon$ for $i = 1,\dots,n$ (compare (A0)).

In the special situation that $Q(y;\mu)$ is the log-likelihood (a semiparametric generalized linear model), the conditional distribution of $Y$ is specified by $F(g_0)$. Then we recommend to generate independent $Y_1^*,\dots,Y_n^*$ with distributions defined by $F(\hat g_n(T_1)),\dots,F(\hat g_n(T_n))$, respectively. This is a version of parametric bootstrap. The following theorem states that these three bootstrap procedures work (for their corresponding models).

Theorem 5.1. Assume that conditions (A0)-(A4) are met. In case of application of the second or third version of bootstrap assume that the just mentioned additional model assumptions hold. Then

$d_K\Bigl(\mathcal{L}^*\bigl(\hat\theta_n^* - \hat\theta_n\bigr),\ \mathcal{L}\bigl(\hat\theta_n - \theta_0\bigr)\Bigr) \to 0$

in probability. Here $d_K$ denotes the Kolmogorov distance (i.e. the sup norm of the corresponding distribution functions).

Proof. We will give only a sketch of the proof for the first version of resampling (Wild Bootstrap). The proof for the other versions is more simple and follows similarly.

We have to go again through the proofs of Lemma 3.1 and Theorem 4.1. We start with proving

(5.1) $\|\hat g_n^* - \hat g_n\|_n = O_P(\lambda_n)$

and

(5.2) $J(\hat g_n^*) = O_P(1).$

We write first for $W_i^* = Y_i^* - F(\hat g_n(T_i))$

$W_i^* = W_i\,\varepsilon_i + \bigl(F(g_0(T_i)) - F(\hat g_n(T_i))\bigr)\,\varepsilon_i = W_i' + W_i''.$

In the proof of Lemma 3.1 the main ingredient from empirical process theory was formula (2.4) (see (3.7)). We argue now that the following analogue formulas hold for $j = 1$ and $j = 2$:

(5.3) $\sup_{a\in\mathcal{A}}\ \frac{\bigl|n^{-1}\sum_{i=1}^n W_i^{(j)}\,a(T_i)\bigr|}{\|a\|_n^{1-\alpha/2} \vee n^{-1/2}} = O_P\bigl(n^{-1/2}\bigr),$

where $W_i^{(1)} = W_i'$ and $W_i^{(2)} = W_i''$. For $j = 1$ equation (5.3) follows from the fact that because of the boundedness of $\varepsilon_i$ for $i = 1,\dots,n$, we have that there exists a constant $C$ with

$E\bigl(e^{|W_i\varepsilon_i|/C} \mid T_1,\dots,T_n\bigr) \le C,$
almost surely. For $j = 2$ we have for every constant $C'$ that on the event $A_n = \bigl\{|F(g_0(T_i)) - F(\hat g_n(T_i))| \le C',\ i = 1,\dots,n\bigr\}$ the following holds

$E\bigl(e^{|W_i''|/C'} \mid T_1,\dots,T_n\bigr) \le e,$

almost surely. Because the probability of $A_n$ tends to one, we arrive at (5.3).

We would like to make here the following remark for two random variables $U$ and $V$. If $U$ fulfills $U = O_P(c_n)$ for a sequence $c_n$, then this implies that for every $0 < \delta < 1$ there exists a set $B$ and a constant $C_\delta$ with

$P(B) > 1 - \delta,$

$P\bigl(U \le C_\delta\,c_n \mid V = v,\ B\bigr) = 1$

for $v > 0$. This remark may help to understand why we can continue as in the proof of Lemma 3.1 to show (5.1) and (5.2).

The next step is to show that

(5.4) $\sqrt n\,\bigl(\hat\theta_n^* - \hat\theta_n\bigr) = \frac{n^{-1/2}\sum_{i=1}^n W_i^*\,l_0(T_i)\,h_0(T_i)}{E\bigl(f_0(T)\,l_0(T)\,h_0^2(T)\bigr)} + o_P(1).$

For seeing (5.4) we proceed similarly as in the proof of Theorem 4.1. In particular, we replace $\hat g_{ns}$ by $\hat g_{ns}^* = \hat g_n^* + s\,h_0$.

Now one applies (5.4) for the proof of

$d_K\Bigl(N\bigl(0,\hat\sigma_n^2\bigr),\ \mathcal{L}^*\bigl(\sqrt n(\hat\theta_n^* - \hat\theta_n)\bigr)\Bigr) \to 0$

(in probability), where

$\hat\sigma_n^2 = \frac{\frac1n\sum_{i=1}^n \hat W_i^2\,l_0^2(T_i)\,h_0^2(T_i)}{\bigl[E\bigl(f_0(T)\,l_0(T)\,h_0^2(T)\bigr)\bigr]^2}.$

Because of $\hat\sigma_n^2 \to E\bigl(W^2 l_0^2(T) h_0^2(T)\bigr)\big/\bigl[E\bigl(f_0(T) l_0(T) h_0^2(T)\bigr)\bigr]^2$ (in probability) we get the statement of the theorem.

6. Examples.

Example 1. Recall that in this case,

$Y = X\theta_0 + m_0(Z) + W,$

where $E(W \mid X, Z) = 0$, and that $\hat g_n(x,z) = x\hat\theta_n + \hat m_n(z)$ is the penalized least squares estimator. In van de Geer (1990), Lemma 3.1 has been proved under the condition (AA0) that the error $W$ in the regression model is sub-Gaussian, using the same approach as in the proof of Lemma 3.1. Condition (AA0) can be relaxed to (A0), as a consequence of Theorem 2.2. This is in accordance with earlier results on rates of convergence (see e.g. Rice and Rosenblatt (1981) and Silverman (1985)).
Note further that $h_0 = \tilde h$ and $h_0(x,z) = \tilde h(x,z)$. If $W$ is normally distributed, then according to Theorem 4.1, the partial smoothing spline estimator $\hat\theta_n$ is an asymptotically efficient estimator of $\theta_0$.

Example 2. In this case, we have

$P(Y = 1 \mid X, Z) = 1 - P(Y = 0 \mid X, Z) = F\bigl(X\theta_0 + m_0(Z)\bigr),$

and $V(\mu) = \mu(1-\mu)$, $\mu \in (0,1)$. Let us consider the common choice

$F(s) = \frac{e^s}{1 + e^s}.$

Then

$f(s) = \frac{e^s}{(1 + e^s)^2} = V\bigl(F(s)\bigr),$

so that $l_0 \equiv 1$. We cannot use Lemma 3.1, because (A1) is not satisfied. Therefore, we present a separate proof of (3.1) and (3.2) in the next section. Since conditions (A3) and (A4) are met, Theorem 4.1 can then be applied.
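As a quick numerical check (ours, not the paper's) of the identity $f(s) = F'(s) = V(F(s))$ for the logistic link, which is what makes the weight $l_0$ identically one:

```python
import math

F = lambda s: math.exp(s) / (1.0 + math.exp(s))   # logistic link
V = lambda mu: mu * (1.0 - mu)                     # binomial variance function

eps = 1e-6
for s in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    f = (F(s + eps) - F(s - eps)) / (2 * eps)      # numerical derivative F'(s)
    print(s, f, V(F(s)))                           # f and V(F(s)) agree, so l = f/V(F) = 1
```

The same computation with $F(s) = e^s$ and $V(\mu) = \mu^2$ reproduces the weight $l(s) = e^{-s}$ of Example 3.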
Example 3. Let us assume that the conditional density of $Y$ given $T = t$ is

$p(y \mid t) = \gamma_0(t)\,e^{-\gamma_0(t)\,y}, \qquad y > 0,$

with $\gamma_0(t) = 1/\mu_0(t)$, $\mu_0(t) = F\bigl(x\theta_0 + m_0(z)\bigr)$, and with

$F(s) = e^s.$

Take $V(\mu) = \mu^2$, $\mu > 0$. Then

$f(s) = e^s, \qquad V\bigl(F(s)\bigr) = e^{2s},$

and $l(s) = e^{-s}$. Observe that (A0) is met. Again, we cannot apply Lemma 3.1, because (A1) and (A2) only hold on a bounded set. So if we show by separate means that the parameters are in a bounded set, then the result of Lemma 3.1 follows immediately. Conditions (A3) and (A4) hold, so asymptotic normality would also be implied by this. Note that $f\,l \equiv 1$, so as in Example 1, $h_0 = \tilde h$ and $h_0(x,z) = \tilde h(x,z)$.

7. Rates of convergence for Example 2. Consider the model

$P(Y = 1 \mid T) = 1 - P(Y = 0 \mid T) = F\bigl(X\theta_0 + m_0(Z)\bigr) = F\bigl(g_0(T)\bigr),$

with $g_0(x,z) = x\theta_0 + m_0(z)$, $g_0 \in \mathcal{G}$,
and $F : \mathbb{R} \to (0,1)$ given. Furthermore, take $V(\mu) = \mu(1-\mu)$, $\mu \in (0,1)$. Assume that for some $0 < C_5 < \infty$,

(A5) $f(s) \le C_5$ for all $s$.

Lemma 7.1. Under condition (A5), we have

$\sup_n\ \sup_{\delta > 0}\ \delta^{1/k}\,H\Bigl(\delta,\ \Bigl\{\frac{F(g)}{1 + J(g)} : g \in \mathcal{G}\Bigr\},\ \|\cdot\|_n\Bigr) < \infty.$

Proof. We can write for $g \in \mathcal{G}$,

$g = g_1 + g_2,$

with $g_1 = \theta^T\psi$, and $\|g_2\|_\infty \le J(g_2) = J(g)$ (see Section 2.2). Now, let $\tilde g$ be a fixed function and consider the class

$\bigl\{F(g_1 + \tilde g) : \theta \in \mathbb{R}^{k+1}\bigr\}.$

Since $F$ is of bounded variation, the collection of graphs

$\Bigl\{\bigl\{(t,y) : 0 \le y \le F\bigl(g_1(t) + \tilde g(t)\bigr)\bigr\} : \theta \in \mathbb{R}^{k+1}\Bigr\}$

is a Vapnik-Chervonenkis class, i.e. it forms a polynomial class of sets (see Pollard (1984, Chapter II) for definitions). Therefore (Pollard (1984, Lemma II.25)),

(7.1) $N\bigl(\delta,\ \{F(g_1 + \tilde g) : \theta \in \mathbb{R}^{k+1}\},\ \|\cdot\|_n\bigr) \le A\,\delta^{-w} \quad\text{for all } \delta > 0,$

where the constants $A$ and $w$ depend on $F$ and $k$, but not on $\tilde g$ and $n$. (Here, we use the fact that the class is uniformly bounded by 1.)

Define for $g = g_1 + g_2$,

$v(g) = \bigl[\,\delta^{-1}\bigl(1 + J(g)\bigr)\,\bigr],$

where $[u]$ denotes the integer part of $u \ge 0$. Then

$\Bigl\{\frac{g_2}{1 + J(g)} : g \in \mathcal{G}\Bigr\} \subset \bigl\{h : \|h\|_\infty \le 1,\ J(h) \le 1\bigr\},$

so by Theorem 2.1,

(7.2) $\sup_{\delta > 0}\ \delta^{1/k}\,H\Bigl(\delta,\ \Bigl\{\frac{g_2}{1 + J(g)} : g \in \mathcal{G}\Bigr\},\ \|\cdot\|_\infty\Bigr) < \infty.$

Of course, if we replace here the $\|\cdot\|_\infty$-norm by the $\|\cdot\|_n$-norm, the result remains true and holds uniformly in $n$.
Together, (7.1) and (7.2) give the required result. To see this, let $g \in \mathcal{G}$, $g = g_1 + g_2$, and let $h = g_2/(1 + J(g))$. Suppose that $\tilde h$ is such that

$\|h - \tilde h\|_n \le \delta,$

and that $\tilde g_1$ is such that

$\bigl\|F\bigl(g_1 + (1 + J(g))\tilde h\bigr) - F\bigl(\tilde g_1 + (1 + J(g))\tilde h\bigr)\bigr\|_n \le \delta.$

Then

$\Bigl\|\frac{F(g)}{1 + J(g)} - \frac{F\bigl(\tilde g_1 + (1 + J(g))\tilde h\bigr)}{1 + J(g)}\Bigr\|_n \le \frac{\bigl\|F(g_1 + g_2) - F\bigl(g_1 + (1 + J(g))\tilde h\bigr)\bigr\|_n + \delta}{1 + J(g)} \le (C_5 + 1)\,\delta,$

since $|F(s) - F(\tilde s)| \le C_5\,|s - \tilde s|$, by condition (A5).

The entropy result of Lemma 7.1 can be applied to establish a rate of convergence in the same way as in Lemma 3.1. For this purpose, we need the assumption: for some constant $0 < C_6 < \infty$,

(A6) $\frac{1}{C_6} \le F\bigl(g_0(t)\bigr) \le 1 - \frac{1}{C_6}$ for all $t \in [0,1]^2$.

Lemma 7.2. Suppose (A5) and (A6) hold true. Then

(7.3) $\bigl\|F(\hat g_n) - F(g_0)\bigr\|_n = O_P(\lambda_n),$

and

(7.4) $J(\hat g_n) = O_P(1).$

Proof. Define

$\bar F_n(t) = \bigl(F(\hat g_n(t)) + F(g_0(t))\bigr)/2.$

By the concavity of the log-function, and the definition of $\hat g_n$,

$Q_n\bigl(\bar F_n\bigr) - Q_n\bigl(F(g_0)\bigr) = \frac1n\sum_{i=1}^n Y_i\,\log\Bigl(\frac{\bar F_n(T_i)}{F(g_0(T_i))}\Bigr) + \frac1n\sum_{i=1}^n (1 - Y_i)\,\log\Bigl(\frac{1 - \bar F_n(T_i)}{1 - F(g_0(T_i))}\Bigr)$

(7.5) $\ge \frac12\,Q_n\bigl(F(\hat g_n)\bigr) - \frac12\,Q_n\bigl(F(g_0)\bigr) \ge \frac12\,\lambda_n^2\bigl(J^2(\hat g_n) - J^2(g_0)\bigr).$
0
0
On the other hand, since $\log x = 2 \log\sqrt{x} \le 2(\sqrt{x} - 1)$,

$$ Q_n(\bar p_n) - Q_n(F(g_0)) \le \frac{2}{n}\sum_{i=1}^n Y_i \Bigl( \sqrt{\frac{\bar p_n(T_i)}{F(g_0(T_i))}} - 1 \Bigr) + \frac{2}{n}\sum_{i=1}^n (1 - Y_i) \Bigl( \sqrt{\frac{1 - \bar p_n(T_i)}{1 - F(g_0(T_i))}} - 1 \Bigr) $$

(7.6) $\qquad = \displaystyle \frac{2}{n}\sum_{i=1}^n \frac{Y_i}{\sqrt{F(g_0(T_i))}} \Bigl( \sqrt{\bar p_n(T_i)} - \sqrt{F(g_0(T_i))} \Bigr) + \frac{2}{n}\sum_{i=1}^n \frac{1 - Y_i}{\sqrt{1 - F(g_0(T_i))}} \Bigl( \sqrt{1 - \bar p_n(T_i)} - \sqrt{1 - F(g_0(T_i))} \Bigr). $

The combination of (7.5) and (7.6) gives an inequality of the same form as inequality (3.8) in the proof of Lemma 3.1. Moreover, we can invoke Lemma 7.1 in Theorem 2.2. First of all, condition (AA0) holds for $W_i$. Furthermore, for each $g, \tilde g \in \mathcal{G}$ we have

$$ \bigl| \sqrt{F(g)} - \sqrt{F(\tilde g)} \bigr| = \frac{| F(g) - F(\tilde g) |}{\sqrt{F(g)} + \sqrt{F(\tilde g)}} \le \frac{\sqrt{C}}{2}\, | F(g) - F(\tilde g) |, $$

by (A6). So the entropy condition (2.3) with $\alpha = 1$ holds for the class

$$ \Bigl\{ \bigl( \sqrt{F(g)} - \sqrt{F(g_0)} \bigr) \big/ \bigl(1 + J(g)\bigr) : g \in \mathcal{G} \Bigr\}. $$

Thus,

$$ \Bigl| \frac{1}{n}\sum_{i=1}^n \frac{W_i}{\sqrt{F(g_0(T_i))}} \Bigl( \sqrt{F(\hat g_n(T_i))} - \sqrt{F(g_0(T_i))} \Bigr) \Bigr| = O_P\bigl(n^{-1/2}\bigr)\, \bigl\| \sqrt{F(\hat g_n)} - \sqrt{F(g_0)} \bigr\|_n^{\,1 - 1/(2k)} \bigl(1 + J(\hat g_n)\bigr)^{1/(2k)}. $$

Similar results can be derived for $\bigl( \sqrt{1 - F(\hat g_n)} - \sqrt{1 - F(g_0)} \bigr)$. So, proceeding as in the proof of Lemma 3.1, we find $J(\hat g_n) = O_P(1)$, and

(7.7) $\qquad \bigl\| \sqrt{F(\hat g_n)} - \sqrt{F(g_0)} \bigr\|_n = O_P\bigl(n^{-k/(2k+1)}\bigr), $

as well as

(7.8) $\qquad \bigl\| \sqrt{1 - F(\hat g_n)} - \sqrt{1 - F(g_0)} \bigr\|_n = O_P\bigl(n^{-k/(2k+1)}\bigr). $

Clearly, (7.7) and (7.8) yield (7.3). □
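The elementary inequality invoked at the start of this display follows from the tangent-line bound $\log u \le u - 1$, applied with $u = \sqrt{x}$:

```latex
% Tangent-line bound: \log u \le u - 1 for all u > 0,
% with equality iff u = 1 (concavity of \log at u = 1).
% Substituting u = \sqrt{x}:
\log x \;=\; 2\log\sqrt{x} \;\le\; 2\bigl(\sqrt{x} - 1\bigr),
\qquad x > 0 .
```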
Lemma 7.3. Then, under conditions (A5), (A6), (A7), (4.3) and (4.4), we have

$$ \| F(\hat g_n) - F(g_0) \| = o_P(1) \quad \text{and} \quad \| \hat g_n - g_0 \| = o_P(1). $$

Proof. We need an identifiability condition. Assume that for some constants $0 < c \le C < \infty$ and for all $t \in [0,1]^d$, we have, for $p_0(t) = F(g_0(t))$, $c \le p_0(t) \le 1 - c$, and that

(7.9) $\qquad \inf\bigl\{ Q(F(g_0)) - Q(F(g)) : g \in \mathcal{G},\ \| g - g_0 \| \ge \delta \bigr\} > 0 \quad$ for all $\delta > 0$.

Due to Lemma 7.1 and a result of e.g. Pollard (1984, Theorem II.24) on uniform laws of large numbers, we have, for all $C_0 > 0$,

$$ \sup\bigl\{ \bigl| Q_n(F(g)) - Q(F(g)) \bigr| : g \in \mathcal{G},\ J(g) \le C_0 \bigr\} = o(1) \quad \text{almost surely.} $$

So $Q(F(\hat g_n)) - Q(F(g_0)) = o_P(1)$. By (7.9), this implies

(7.10) $\qquad \| \hat g_n - g_0 \| = o_P(1). $

As in the proof of Theorem 4.1, we see that (4.3) and (4.4), together with (7.10), yield $|\hat\theta_n - \theta_0| = o_P(1)$. Application of (A7) and Lemma 7.2 completes the proof. □

8. References.
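A uniform law of large numbers of the kind invoked here can be illustrated numerically. The function class, distribution, and grid below are illustrative assumptions for this sketch only; they are not the class $\{F(g) : g \in \mathcal{G},\ J(g) \le C_0\}$ from the proof.

```python
import math
import random

# Illustrative uniform law of large numbers over a smooth function class.
# Class: f_theta(x) = cos(theta * x) for theta on a grid in (0, 1];
# X ~ Uniform[0, 1], so the population mean is E f_theta(X) = sin(theta)/theta.
# The supremum over theta of |empirical mean - population mean| should be
# small for large n.  (Toy example; not the class used in the proof.)
random.seed(42)
n = 20000
xs = [random.random() for _ in range(n)]

sup_dev = 0.0
for i in range(1, 101):
    theta = i / 100.0
    emp = sum(math.cos(theta * x) for x in xs) / n
    pop = math.sin(theta) / theta
    sup_dev = max(sup_dev, abs(emp - pop))

print(f"sup over the class of |empirical - population mean|: {sup_dev:.5f}")
assert sup_dev < 0.05
```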
[1] Beran, R. (1986). Comment on "Jackknife, bootstrap, and other resampling methods in regression analysis" by C. F. J. Wu. Ann. Statist. 14 1295-1298.
[2] Birgé, L. and Massart, P. (1991). Rates of convergence for minimum contrast estimators. Techn. Report 140, Université Paris 6.
[3] Birman, M. S. and Solomjak, M. J. (1967). Piece-wise polynomial approximations of functions of the classes $W_p^\alpha$. Mat. Sbornik 73 295-317.
[4] Chen, H. (1988). Convergence rates for parametric components in a partly linear model. Ann. Statist. 16 136-146.
[5] Dudley, R. M. (1984). A course on empirical processes. École d'Été de Probabilités de St. Flour, 1982. Lecture Notes in Math. 1097 1-122. Springer, Berlin.
[6] Fan, J., Heckman, N. E. and Wand, M. P. (1995). Local polynomial regression for generalized linear models and quasi-likelihood functions. J. Amer. Statist. Assoc. 90 141-150.
[7] Giné, E. and Zinn, J. (1984). On the central limit theorem for empirical processes. Ann. Probab. 12 929-989.
[8] Green, P. J. and Yandell, B. S. (1985). Semi-parametric generalized linear models. Proceedings 2nd International GLIM Conference. Lecture Notes in Statistics 32 44-55. Springer Verlag, New York.
[9] Green, P. J. (1987). Penalized likelihood for general semi-parametric regression models. Int. Statist. Rev. 55 245-260.
[10] Härdle, W. and Mammen, E. (1993). Comparing nonparametric versus parametric regression fits. Ann. Statist. 21 1926-1947.
[11] Mammen, E. (1992). When does bootstrap work: asymptotic results and simulations. Lecture Notes in Statist. 77. Springer, Berlin.
[12] McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (2nd edit.). Chapman and Hall, London.
[13] Oden, J. T. and Reddy, J. N. (1976). An Introduction to the Mathematical Theory of Finite Elements. Wiley, New York.
[14] Ossiander, M. (1987). A central limit theorem under metric entropy with $L_2$ bracketing. Ann. Probab. 15 897-919.
[15] Pollard, D. (1984). Convergence of Stochastic Processes. Springer, New York.
[16] Pollard, D. (1990). Empirical Processes: Theory and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics 2.
[17] Rice, J. and Rosenblatt, M. (1981). Integrated mean square error of a smoothing spline. J. Approx. Theory 33 353-369.
[18] Rice, J. (1986). Convergence rates for partially splined models. Statist. Probab. Letters 4 203-208.
[19] Severini, T. A. and Staniswalis, J. G. (1994). Quasi-likelihood estimation in semiparametric models. J. Amer. Statist. Assoc. 89 501-511.
[20] Silverman, B. W. (1985). Some aspects of the spline smoothing approach to nonparametric regression curve fitting (with discussion). J. Roy. Statist. Soc. Ser. B 47 1-52.
[21] Speckman, P. (1988). Kernel smoothing in partial linear models. J. Roy. Statist. Soc. Ser. B 50 413-436.
[22] van de Geer, S. (1988). Regression Analysis and Empirical Processes. CWI Tract 45. Centre for Mathematics and Computer Science, Amsterdam.
[23] van de Geer, S. (1990). Estimating a regression function. Ann. Statist. 18 907-924.
[24] van de Geer, S. (1995). A maximal inequality for the empirical process. Techn. Report TW 95-05, University of Leiden.
[25] Wahba, G. (1984). Partial spline models for the semi-parametric estimation of functions of several variables. In Statistical Analysis of Time Series 319-329. Institute of Statistical Mathematics, Tokyo.
[26] Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika 61 439-447.
[27] Wu, C. F. J. (1986). Jackknife, bootstrap, and other resampling methods in regression analysis. Ann. Statist. 14 1261-1350.