Parametric versus Nonparametric Goodness of Fit: Another View

H. Läuter and M. Nikulin

January 26, 1999

Department of Statistics, Institute of Mathematics, University of Potsdam, Germany
Mathematical Statistics, UFR MI2S, University Bordeaux 2, France
& Steklov Mathematical Institute, Saint Petersburg, Russia
Abstract

We consider chi-squared type tests of the hypothesis $H_0$ that the density $f$ of observations $X_1,\dots,X_n$ lies in a parametric class of densities $\mathcal F$. The test is a chi-squared type test built from kernel estimates of the density. The main result, following Liero, Läuter and Konakov (1998), is the derivation of the asymptotic behavior of the power of the test under Pitman and "sharp peak" type local alternatives. We study the connection between the rate of convergence of these local alternatives, the bandwidth of the kernel estimator, the parametric estimator, and the power of the test.

Keywords and Phrases: chi-squared test, goodness-of-fit test, density, kernel estimators, maximum likelihood estimator, local alternative, Pitman alternative, sharp peak alternative, asymptotic power.

AMS 1991 Subject Classification: Primary 62G10, 62G20.
The research for this paper was carried out within Sonderforschungsbereich 373 and was printed using funds made available by the Deutsche Forschungsgemeinschaft.
1. Introduction
Let $X_1, X_2, \dots, X_n$ be a sample of i.i.d. random variables with density $f$. We wish to test whether $f$ belongs to some parametric family $\mathcal F$ of density functions
\[
\mathcal F = \{f : f(\cdot) = f(\cdot,\theta),\ \theta \in \Theta \subset \mathbb R^k\} \tag{1}
\]
against the nonparametric alternative $f \notin \mathcal F$.
We consider the kernel estimator $\hat f_n$ of the density function $f$,
\[
\hat f_n(t) = \frac1n \sum_{i=1}^n \frac1{h_n} K\!\Big(\frac{t - X_i}{h_n}\Big), \tag{2}
\]
where the kernel function $K(\cdot)$ is bounded, of bounded variation, has bounded support and satisfies
\[
\int K(x)\,dx = 1.
\]
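As a concrete numerical illustration of estimator (2) — with the Epanechnikov kernel and Gaussian data as our own choices, not ones prescribed by the paper — a minimal sketch reads:

```python
import numpy as np

def epanechnikov(x):
    # bounded, of bounded variation, compactly supported on [-1, 1], integrates to one
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x**2), 0.0)

def kernel_density_estimate(t, sample, h):
    # \hat f_n(t) = (1/n) sum_{i=1}^n (1/h) K((t - X_i)/h), as in (2)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    u = (t[:, None] - sample[None, :]) / h
    return epanechnikov(u).mean(axis=1) / h

rng = np.random.default_rng(0)
X = rng.normal(size=2000)            # i.i.d. sample, here from N(0, 1)
h = len(X) ** (-1 / 5)               # bandwidth with h_n -> 0 and n h_n -> infinity
grid = np.linspace(-4.0, 4.0, 801)
fhat = kernel_density_estimate(grid, X, h)
mass = fhat.sum() * (grid[1] - grid[0])   # total mass on the grid, close to one
```

The bandwidth exponent $-1/5$ is the usual mean-squared-error-optimal rate for twice differentiable densities; any sequence with $h_n \to 0$, $nh_n \to \infty$ fits the assumptions above.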
The random function $\hat f_n(t)$ may be represented in the form
\[
\hat f_n(t) = f^{(n)}(t,\theta) + \frac1{\sqrt n}\int \frac1{h_n} K\!\Big(\frac{t-u}{h_n}\Big)\,dW_n(u), \tag{3}
\]
where we denote, for a (possibly vector-valued) function $\varphi$,
\[
\varphi^{(n)}(t) = \int \frac1{h_n} K\!\Big(\frac{t-u}{h_n}\Big)\,\varphi(u)\,du, \tag{4}
\]
\[
W_n(t) = \sqrt n\,\big(F_n(t) - F(t,\theta)\big).
\]
Here $F$ is the distribution function of the random variables $X_i$ and $F_n$ is its empirical version:
\[
F_n(t) = \frac1n \sum_{j=1}^n \mathbb 1_{(-\infty,\,t]}(X_j).
\]
We suppose also that
\[
h_n \to 0 \quad\text{and}\quad n h_n \to \infty \quad\text{as } n \to \infty.
\]
It is well known that in this case $\mathbb E\,\hat f_n(t) \to f(t,\theta)$, i.e. $\hat f_n(t)$ is asymptotically unbiased for $f(t,\theta)$ at any point of continuity of $f$, and since
\[
n h_n \operatorname{Var} \hat f_n(t)
= h_n \operatorname{Var}\Big[\frac1{h_n} K\!\Big(\frac{t - X_1}{h_n}\Big)\Big]
= \frac1{h_n}\int K^2\!\Big(\frac{t-x}{h_n}\Big) f(x,\theta)\,dx
- h_n \Big[\frac1{h_n}\int K\!\Big(\frac{t-x}{h_n}\Big) f(x,\theta)\,dx\Big]^2,
\]
we have
\[
\lim_{n\to\infty} n h_n \operatorname{Var} \hat f_n(t) = f(t,\theta)\int K^2(x)\,dx = k\,f(t,\theta),
\]
i.e. $\hat f_n(t)$ is mean squared consistent for $f(t,\theta)$. For more about the properties of $\hat f_n$ and its applications see, for example, Silverman (1986), Bickel & Rosenblatt (1973), Härdle & Marron (1990), Härdle & Mammen (1993).
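The variance limit above can be checked deterministically by quadrature: for a fixed density and shrinking $h$, the exact expression for $n h_n \operatorname{Var}\hat f_n(t)$ approaches $f(t,\theta)\,k$. A sketch, assuming the Epanechnikov kernel ($k = 3/5$) and the standard normal density as illustrative choices:

```python
import numpy as np

def K(u):
    # Epanechnikov kernel, support [-1, 1], integral one
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def f(x):
    # standard normal density, playing the role of f(x, theta)
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def scaled_variance(t, h, m=20001):
    # n h_n Var \hat f_n(t) computed by quadrature after substituting u = (t - x)/h:
    # (1/h) int K^2((t-x)/h) f dx - h [ (1/h) int K((t-x)/h) f dx ]^2
    u = np.linspace(-1.0, 1.0, m)
    du = u[1] - u[0]
    first = np.sum(K(u)**2 * f(t - h * u)) * du
    second = np.sum(K(u) * f(t - h * u)) * du
    return first - h * second**2

t = 0.0
k = 0.6                                  # int K^2(x) dx = 3/5 for this kernel
limit = f(t) * k                         # the limit k f(t, theta)
vals = [scaled_variance(t, h) for h in (0.5, 0.1, 0.02)]
errs = [abs(v - limit) for v in vals]    # shrinks as h decreases
```

The second (squared-mean) term is of order $h$, which is exactly why it drops out of the limit.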
For a function $\varphi(x,\theta)$ and the random point $\theta = \hat\theta_n$ (where $\hat\theta_n$ is the maximum likelihood estimator of $\theta$) we introduce the notation
\[
\hat\varphi(x) = \varphi(x,\hat\theta_n), \qquad
\nabla\varphi(x,\theta) = \Big(\frac{\partial}{\partial\theta_1}\varphi(x,\theta),\dots,\frac{\partial}{\partial\theta_k}\varphi(x,\theta)\Big)^T, \qquad
\hat\nabla\varphi(x) = \Big(\frac{\partial}{\partial\theta_1}\varphi(x,\hat\theta_n),\dots,\frac{\partial}{\partial\theta_k}\varphi(x,\hat\theta_n)\Big)^T.
\]
Further, for simplicity of notation, we will write $f$ instead of $f(\cdot,\theta)$ and $\hat f$ instead of $f(\cdot,\hat\theta_n)$. For a nonnegative function $\omega$ we denote by $L^2_\omega$ the $L^2$-space generated by the measure with density $\omega$. Let $I(\theta)$ be the information matrix,
\[
I(\theta) = \|I_{ij}\|_{i,j=1}^{k} = \int \frac{\partial}{\partial\theta_i} f(x,\theta)\,\frac{\partial}{\partial\theta_j} f(x,\theta)\, f^{-1}(x,\theta)\,dx.
\]
We denote by $\psi_j(x,\theta)$, $j = 1,\dots,k$, the coordinates of the vector function $\psi = (\psi_1,\dots,\psi_k)^T$,
\[
\psi(x,\theta) = I^{-1/2}(\theta)\,\nabla f(x,\theta).
\]
So we have
\[
\big(\psi_i(\cdot,\theta), \psi_j(\cdot,\theta)\big)_{f^{-1}} = \delta_{ij},
\]
where
\[
(\cdot,\cdot)_{f^{-1}} = (\cdot,\cdot)_{1/f}
\]
is the inner product in the space $L^2_{f^{-1}}$. The same relation is true for the coordinates $\hat\psi_j(x)$, $j = 1,\dots,k$, of the vector function $\hat\psi = \psi(x,\hat\theta_n)$. For a function $h \in L^2_{f^{-1}}$ we put
\[
h]_L(x) = \sum_{j=1}^k \big(h(\cdot),\psi_j(\cdot)\big)_{f^{-1}}\,\psi_j(x), \qquad h]_{L^\perp} = h - h]_L. \tag{5}
\]
It is clear that $h]_L$ is the projection of $h$ in the space $L^2_{f^{-1}}$ on the linear space $L = L_n$ spanned by the set $\{\psi_j(x),\ j = 1,\dots,k\}$.
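The projection (5) can be made concrete by quadrature. A sketch under our own illustrative assumptions (not the paper's): the one-parameter normal location family $f(x,\theta) = \phi(x-\theta)$ at $\theta = 0$, for which $k = 1$, $\nabla f(x) = x\,\phi(x)$, $I(\theta) = 1$, and hence $\psi(x) = x\,\phi(x)$; the test function $h(x) = (x + x^2)\phi(x)$ is likewise arbitrary.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
f = phi                                  # density under H0
psi = x * phi                            # psi = I^{-1/2}(theta) grad f, here I = 1

def inner(g1, g2):
    # inner product (g1, g2)_{f^{-1}} = int g1 g2 f^{-1} dx
    return np.sum(g1 * g2 / f) * dx

h = (x + x**2) * phi                     # an element of L^2_{f^{-1}}
c = inner(h, psi)                        # Fourier coefficient in (5)
h_L = c * psi                            # h]_L, projection onto L = span{psi}
h_perp = h - h_L                         # h]_{L^perp}

norm_psi = inner(psi, psi)               # should be one (orthonormality)
resid_orth = inner(h_perp, psi)          # should vanish
pythagoras = inner(h, h) - inner(h_L, h_L) - inner(h_perp, h_perp)
```

The three checked identities — $\|\psi\|_{f^{-1}} = 1$, $(h]_{L^\perp},\psi)_{f^{-1}} = 0$, and the Pythagorean decomposition — are exactly what makes the split $T_1^{(n)}$/$T_2^{(n)}$ below well defined.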
As test statistic we suggest a functional measuring the normalized deviation of the kernel estimator $\hat f_n(x)$ from a modified parametric estimator $f^{(n)}(x,\hat\theta_n)$. We consider these estimators as elements of $L^2_\omega$; however, we measure the distance between functions from $L^2_\omega$ with the help of two seminorms $\|\cdot\|_1$ and $\|\cdot\|_2$:
\[
\|h\|_1 = \|h]_L\|_{f^{-1}}, \qquad \|h\|_2 = \|h]_{L^\perp}\|_\omega.
\]
The process
\[
\xi^{(n)}(t) = \sqrt n\,\big[\hat f_n(t) - f^{(n)}(t,\hat\theta_n)\big] \tag{6}
\]
will further be called the observable empirical process. We define
\[
T_1^{(n)} = \|\xi^{(n)}(\cdot)]_L\|_{f^{-1}}^2 \qquad\text{and}\qquad T_2^{(n)} = \|\xi^{(n)}(\cdot)]_{L^\perp}\|_\omega^2. \tag{7}
\]
For nonnegative functions $\omega$ and $h$ we put
\[
\mu(h) = k \int h(x)\,\omega(x)\,dx, \qquad \sigma^2(h) = \tilde k \int h^2(x)\,\omega^2(x)\,dx, \tag{8}
\]
where
\[
k = \int K^2(x)\,dx, \qquad \tilde k = \int (K * K)^2(x)\,dx.
\]
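The kernel constants $k$ and $\tilde k$ in (8) are fixed numbers once $K$ is chosen, and are easy to obtain numerically. A sketch for the Epanechnikov kernel (again our illustrative choice; $k = 3/5$ exactly, and $\tilde k$ is computed via a discrete self-convolution):

```python
import numpy as np

m = 4001
u = np.linspace(-1.0, 1.0, m)
du = u[1] - u[0]
K = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

k = np.sum(K**2) * du                # k = int K^2(x) dx, equals 3/5 for this kernel
KK = np.convolve(K, K) * du          # continuous convolution (K*K), supported on [-2, 2]
kk_mass = np.sum(KK) * du            # (K*K) integrates to one
k_tilde = np.sum(KK**2) * du         # k~ = int (K*K)^2(x) dx
```

Since $(K*K)$ has total mass one, is bounded by $k$, and is supported on an interval of length $4$, one always has $1/4 \le \tilde k \le k$ here, which the computed value respects.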
As test statistic we suggest the statistic
\[
T^{(n)} = T_1^{(n)} + \left\{ \frac{h_n^{-1/2}\big[h_n T_2^{(n)} - \mu(f)\big]}{\sigma(f)} \right\}^2. \tag{9}
\]
Now we introduce the assumptions under which we plan to investigate the asymptotic behavior of the statistic $T^{(n)}$.

A1. The kernel satisfies $K \in L_1$ and $\int K(x)\,dx = 1$.

A2. The least even decreasing majorant $\overline K(x)$ of $|K(x)|$ belongs to the $L_1$-space.

A3. The density function $f(x,\theta)$, as a function $\theta : \Theta \to L^2_{f^{-1}}$, is continuously differentiable on the interior of $\Theta$. That is, there exists a vector function $\nabla f = \big(\frac{\partial f}{\partial\theta_1},\dots,\frac{\partial f}{\partial\theta_k}\big)^T$ such that
\[
f(\cdot,\theta) - f(\cdot,\theta_0) = \big\langle \nabla f(\cdot,\theta_0),\ \theta - \theta_0 \big\rangle + r_{\theta_0}(\cdot,\theta),
\]
where $\langle\cdot,\cdot\rangle$ is the inner product in $\mathbb R^k$ and $\|r_{\theta_0}(\cdot,\theta)\|_{f^{-1}}/|\theta - \theta_0| \to 0$ as $|\theta - \theta_0| \to 0$.

A4. The Fisher information matrix $I(\theta)$ is positive definite.

2. The kernel smoothing
Assume A1 and let $\overline K(x)$ be the least even decreasing majorant of $|K(x)|$,
\[
\overline K(x) = \sup_{|t| \ge x} |K(t)|, \qquad x \ge 0.
\]
We consider the operators
\[
A_h f(x) = \int \frac1h K\!\Big(\frac{x-t}{h}\Big) f(t)\,dt = f_h(x), \quad h > 0,
\qquad
M[f](x) = \sup_{h>0} \int \frac1h\, \overline K\!\Big(\frac{x-t}{h}\Big) f(t)\,dt.
\]
Suppose that the nonnegative function $\omega(t)$ satisfies the condition
\[
\sup_I \Big(\frac1{|I|}\int_I \omega(x)\,dx\Big)\Big(\frac1{|I|}\int_I \omega^{-1}(x)\,dx\Big) < \infty, \tag{10}
\]
where the supremum is taken over intervals $I$ and $|I|$ is the length of $I$.
Theorem 2.1 (see J. B. Garnett (1981)). If $\overline K \in L_1$ and the weight function $\omega$ satisfies condition (10), then:
1. $M$ is a bounded operator on $L^2_\omega$;
2. the operators $A_h$, $h > 0$, are uniformly bounded on $L^2_\omega$;
3. if $f \in L^2_\omega$, then $f_h \to f$ in the metric of the space $L^2_\omega$ as $h \to 0$.
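Part 3 of the theorem is easy to observe numerically. A sketch under our own assumptions: $f$ the standard normal density, $\omega \equiv 1$ (the Lebesgue weight, which trivially satisfies (10)), and the Epanechnikov kernel in $A_h$; the error $\|f_h - f\|_\omega$ shrinks like $h^2$ for this smooth $f$.

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 16001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def A_h(f_vals, h):
    # A_h f(x) = int (1/h) K((x-t)/h) f(t) dt via discrete convolution,
    # with the Epanechnikov kernel
    n_half = int(round(h / dx))
    u = np.arange(-n_half, n_half + 1) * dx
    Kh = np.where(np.abs(u) <= h, 0.75 * (1.0 - (u / h)**2), 0.0) / h
    return np.convolve(f_vals, Kh, mode="same") * dx

errors = []
for h in (1.0, 0.3, 0.1):
    fh = A_h(f, h)
    errors.append(np.sqrt(np.sum((fh - f)**2) * dx))   # ||f_h - f||_omega
```
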
3. Maximum likelihood estimator
Consider a sample $X = (X_1, X_2, \dots, X_n)$ of i.i.d. random variables $X_i$ with density function $f(\cdot) = f(\cdot,\theta)$, $\theta \in \Theta \subset \mathbb R^k$. The maximum likelihood estimator $\hat\theta_n$ is a measurable solution of the likelihood equation
\[
\nabla L_n(\theta, X) = \sum_{j=1}^n \nabla \log f(X_j,\theta) = 0. \tag{11}
\]
It is well known (see, for example, Witting & Nölle (1970), Konakov (1978), Greenwood & Nikulin (1996)) that under $H_0$ and some smoothness conditions on the function $f(x,\theta)$ we have
\[
\sqrt n\,(\hat\theta_n - \theta) = \frac1{\sqrt n} \sum_{j=1}^n I^{-1}(\theta)\,\nabla \log f(X_j,\theta) + r_n, \tag{12}
\]
where $r_n$ is a random vector such that $r_n \to^P 0$ (we shall write $r_n = o_P(1)$). Thus,
\[
\sqrt n\,(\hat\theta_n - \theta) = I^{-1}(\theta) \int \nabla \log f(x,\theta)\,dW_n(x) + o_P(1). \tag{13}
\]
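A sketch (our own illustration, not from the paper) of (11)-(13) in the exponential family $f(x,\theta) = \theta e^{-\theta x}$, $\theta > 0$, where $\nabla \log f(x,\theta) = 1/\theta - x$, $I(\theta) = 1/\theta^2$, and the MLE has the closed form $\hat\theta_n = 1/\bar X$. The code solves the likelihood equation by Newton's method and checks by Monte Carlo that the variance of $\sqrt n(\hat\theta_n - \theta)$ is near $I^{-1}(\theta) = \theta^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 500, 2000

def mle_newton(sample, t0=1.0, iters=50):
    # solve sum_j grad log f(X_j, theta) = sum_j (1/theta - X_j) = 0
    t, m, s = t0, len(sample), sample.sum()
    for _ in range(iters):
        score = m / t - s
        d_score = -m / t**2
        t -= score / d_score             # Newton step
    return t

X = rng.exponential(1 / theta, size=n)
theta_hat = mle_newton(X)
closed_form = 1 / X.mean()               # known closed-form MLE

# Monte Carlo check of (12)-(13): Var[sqrt(n)(theta_hat - theta)] ~ theta^2 = 4
z = np.array([np.sqrt(n) * (1 / rng.exponential(1 / theta, size=n).mean() - theta)
              for _ in range(reps)])
var_z = z.var()
```
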
It is clear that under smoothness conditions on the function $f(x,\theta)$,
\[
\int \big|[\nabla \log f]^{(h)}(x,\theta) - \nabla \log f(x,\theta)\big|^2 f(x,\theta)\,dx \to 0
\]
as $h \to 0$, where $[\,\cdot\,]^{(h)}$ denotes the smoothing (4) with bandwidth $h$. Therefore we deduce from (13) that
\[
\sqrt n\,(\hat\theta_n - \theta) = I^{-1}(\theta) \int [\nabla \log f]^{(n)}(x,\theta)\,dW_n(x) + o_P(1). \tag{14}
\]
A similar representation holds under the set $K_n$ of nonparametric local alternatives
\[
K_n = \Big\{ f : f(\cdot) = f(\cdot,\theta_n) + N_n\,\varphi\Big(\frac{\cdot - c}{b_n}\Big) \Big\}, \quad n = 1, 2, \dots, \tag{15}
\]
where $\theta_n = \theta + \gamma n^{-\delta}$, $\gamma$ is a given vector, $\delta > 0$, $\varphi$ is a given function, $N_n$ and $b_n$ are sequences of positive numbers tending to $0$, and $c$ is a constant (see e.g. Liero et al. (1998)).
4. The modified empirical process

We consider the smoothed empirical process
\[
\zeta^{(n)}(t) = \int \frac1{h_n} K\!\Big(\frac{t-x}{h_n}\Big)\,dW_n(x), \tag{16}
\]
where
\[
W_n(x) = \sqrt n\,\big(F_n(x) - F(x,\theta)\big),
\]
and investigate the limiting behavior of the $L^2_\omega$-norm of the projection $\zeta_L^{(n)}(t)$ of the process $\zeta^{(n)}(t)$ on some finite-dimensional subspace $L$. For a nonnegative function $\omega$ we put, using (8),
\[
\|\varphi\|_\omega^2 = \int |\varphi(x)|^2\,\omega(x)\,dx, \qquad
\mu(f) = k \int f(x)\,\omega(x)\,dx, \qquad
\sigma^2(f) = \tilde k \int f^2(x)\,\omega^2(x)\,dx.
\]
It is well known that under appropriate conditions (see Bickel and Rosenblatt (1973) and Hall (1984))
\[
P\left\{ \frac{h_n^{-1/2}\big[h_n \|\zeta^{(n)}\|_\omega^2 - \mu(f)\big]}{\sigma(f)} \le x \right\} \to \Phi(x). \tag{17}
\]
Let $\varphi$ be a function such that
\[
\varphi \in L^2_\omega, \qquad \|\varphi\|_\omega = 1,
\]
and let $\zeta_\varphi^{(n)}$ be the projection of $\zeta^{(n)}$ on the one-dimensional subspace generated by $\varphi$,
\[
\zeta_\varphi^{(n)}(t) = \Big(\int \zeta^{(n)}(x)\,\varphi(x)\,\omega(x)\,dx\Big)\,\varphi(t).
\]
So $\zeta_\varphi^{(n)} = a\varphi$, where the random coefficient $a$ is defined by
\[
a = \int \zeta^{(n)}(x)\,\varphi(x)\,\omega(x)\,dx = \int (\varphi\omega)^{(n)}(x)\,dW_n(x),
\qquad
(\varphi\omega)^{(n)}(x) = \int \frac1{h_n} K\!\Big(\frac{x-t}{h_n}\Big)\,\varphi(t)\,\omega(t)\,dt.
\]
Since
\[
\mathbb E\,a = 0 \qquad\text{and}\qquad \operatorname{Var}(a) \le \int \big[(\varphi\omega)^{(n)}(x)\big]^2 f(x,\theta)\,dx,
\]
we obtain from Theorem 2.1 the following proposition.

Proposition 4.1.
Suppose that the weight function $\omega$ satisfies condition (10) and that for some $C = C(\theta) < \infty$
\[
f(x,\theta) < C\,\omega(x) \quad\text{for all } x. \tag{18}
\]
Then there exists a constant $C_1$, depending only on $C$, the weight $\omega$ and the kernel $K$ (and not on $\varphi$), such that
\[
\mathbb E\,\|\zeta_\varphi^{(n)}\|_\omega^2 < C_1. \tag{19}
\]
From Proposition 4.1 one easily deduces:

Proposition 4.2. Suppose that the weight function $\omega$ satisfies condition (10) and $L$ is a finite-dimensional subspace of the space $L^2_\omega$. Then under condition (18) there exists a constant $C_2$, depending only on $C$, the weight $\omega$, the kernel $K$ and $\dim L$, such that
\[
\mathbb E\,\|\zeta_L^{(n)}\|_\omega^2 < C_2, \tag{20}
\]
where $\zeta_L^{(n)}$ is the projection of $\zeta^{(n)}$ on $L$.

Now we denote by $L_n$ a finite-dimensional subspace, $\dim L_n = m$. Let $L_n^\perp$ denote the orthogonal complement of $L_n$ in the space $L^2_\omega$ and $\zeta_{L_n^\perp}^{(n)}$ be the projection of $\zeta^{(n)}$ on $L_n^\perp$.
Theorem 4.1. Suppose that the weight function $\omega$ satisfies condition (10), $L_n$ is a subspace of the space $L^2_\omega$, $\dim L_n$ is fixed, and $h_n \to 0$ as $n \to \infty$. Then under condition (18)
\[
P\left\{ \frac{h_n^{-1/2}\big[h_n \|\zeta_{L_n^\perp}^{(n)}\|_\omega^2 - \mu(f)\big]}{\sigma(f)} \le x \right\} \to \Phi(x). \tag{21}
\]

Proof. Since
\[
\|\zeta_{L_n^\perp}^{(n)}\|_\omega^2 = \|\zeta^{(n)}\|_\omega^2 - \|\zeta_{L_n}^{(n)}\|_\omega^2,
\]
and from Proposition 4.2 we can conclude that
\[
\sup_n P\big\{\|\zeta_{L_n}^{(n)}\|_\omega^2 > y\big\} \to 0 \quad\text{as } y \to \infty,
\]
(17) yields (21).

Now we consider the case
\[
L = L_n = \operatorname{span}\{\psi_j(x),\ j = 1,\dots,k\}.
\]
Recall (see (7)) that $T_2^{(n)} = \|\xi^{(n)}(\cdot)]_{L^\perp}\|_\omega^2$. From Theorem 4.1 we obtain the next

Theorem 4.2. Suppose that the weight function $\omega$ satisfies condition (10), and the density function $f(x,\theta)$ and the kernel $K(x)$ satisfy conditions A1-A4. Then under (18)
\[
P\left\{ \frac{h_n^{-1/2}\big[h_n T_2^{(n)} - \mu(f)\big]}{\sigma(f)} \le x \right\} \to \Phi(x). \tag{22}
\]
5. The observable empirical process

In reality we deal with the observable empirical process
\[
\xi^{(n)}(t) = \sqrt n\,\big[\hat f_n(t) - f^{(n)}(t,\hat\theta_n)\big]
= \sqrt n\,\big[\hat f_n(t) - f^{(n)}(t,\theta)\big] + \sqrt n\,\big[f^{(n)}(t,\theta) - f^{(n)}(t,\hat\theta_n)\big]. \tag{23}
\]
It is clear (see (12), (13), (14)) that under smoothness conditions on the function $f(x,\theta)$
\[
\xi^{(n)}(t) = \zeta^{(n)}(t) + \eta^{(n)}(t) + R_n(t,\theta), \tag{24}
\]
where the process $R_n(t,\theta)$ converges weakly to zero as $n \to \infty$, and
\[
\eta^{(n)}(t) = -\nabla f^{(n)}(t,\theta)^T\, I^{-1}(\theta) \int [\nabla \log f]^{(n)}(x,\theta)\,dW_n(x). \tag{25}
\]
Let $P_{L_n}$ be the orthoprojector in the space $L^2_{f^{-1}}$ on the subspace
\[
L_n = \operatorname{span}\Big\{\frac{\partial}{\partial\theta_j} f^{(n)}(x,\theta),\ j = 1,\dots,k\Big\}.
\]
We introduce the matrix $R_n(\theta) = \|r_{ij}\|_{i,j=1}^k$, where
\[
r_{ij} = \int \frac{\partial}{\partial\theta_i} f^{(n)}(x,\theta)\,\frac{\partial}{\partial\theta_j} f^{(n)}(x,\theta)\, f^{-1}(x,\theta)\,dx, \qquad i,j = 1,\dots,k,
\]
and denote by $R_n^{-1/2}(\theta)$ the inverse of its square root. Let
\[
\varphi^{(n)}(x,\theta) = \big(\varphi_1^{(n)}(x,\theta),\dots,\varphi_k^{(n)}(x,\theta)\big)^T = R_n^{-1/2}(\theta)\,\nabla f^{(n)}(x,\theta).
\]
It is obvious that
\[
\big(\varphi_i^{(n)}(\cdot,\theta), \varphi_j^{(n)}(\cdot,\theta)\big)_{f^{-1}} = \delta_{ij},
\]
where $(\cdot,\cdot)_{f^{-1}}$ is the inner product in the space $L^2_{f^{-1}}$. So the system $\{\varphi_i^{(n)}(\cdot,\theta),\ i = 1,\dots,k\}$ forms an orthonormal basis (in the metric of the space $L^2_{f^{-1}}$) of the subspace $L_n$. Therefore
\[
P_{L_n}[h](t) = \sum_{i=1}^k \Big(\int h(x)\,\varphi_i^{(n)}(x,\theta)\,f^{-1}(x,\theta)\,dx\Big)\,\varphi_i^{(n)}(t,\theta), \qquad h \in L^2_{f^{-1}}.
\]
Now we denote by $\hat L_n$ the subspace
\[
\hat L_n = \operatorname{span}\Big\{\frac{\partial}{\partial\theta_j} f^{(n)}(x,\hat\theta_n),\ j = 1,\dots,k\Big\}
\]
and put
\[
P_{\hat L_n} h(t) = \sum_{i=1}^k \Big(\int h(x)\,\varphi_i^{(n)}(x,\hat\theta_n)\,f^{-1}(x,\hat\theta_n)\,dx\Big)\,\varphi_i^{(n)}(t,\hat\theta_n), \qquad h \in L^2_{f^{-1}(\cdot,\hat\theta_n)}.
\]
It seems reasonable to expect that the operators $P_{L_n}$ and $P_{\hat L_n}$ converge weakly (in an appropriate sense), as $n \to \infty$, to the orthoprojector $P_L$ in the space $L^2_{f^{-1}}$ on the subspace
\[
L = \operatorname{span}\Big\{\frac{\partial}{\partial\theta_j} f(x,\theta),\ j = 1,\dots,k\Big\}.
\]
For our needs it suffices to prove that the asymptotic distribution of the process $P_{\hat L_n}\,\xi^{(n)}(t)$ is the same as that of the process $P_L\,\xi^{(n)}(t)$. This may be proved by the usual weak convergence techniques. One can verify that the limiting distribution of the statistic $T_1^{(n)}$ is the $\chi^2_k$ distribution.

REFERENCES
1. Aguirre, N., Nikulin, M. (1994). Chi-squared goodness-of-fit test for the family of logistic distributions. Kybernetika, 30, 214-222.
2. Bickel, P., Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates. Ann. Statist., 1, 1071-1095.
3. Garnett, J.B. (1981). Bounded Analytic Functions. Academic Press, N.Y.
4. Ghosh, B.K., Huang, Wei-Min (1991). The power and optimal kernel of the Bickel-Rosenblatt test for goodness of fit. Ann. Statist., 19, 999-1009.
5. Härdle, W., Mammen, E. (1993). Comparing nonparametric versus parametric regression fits. Ann. Statist., 21, 1926-1947.
6. Härdle, W., Marron, J.S. (1990). Semiparametric comparison of regression curves. Ann. Statist., 18, 63-89.
7. Hall, P. (1984). Central limit theorem for integrated square error of multivariate nonparametric density estimators. J. Multivar. Anal., 14, 1-16.
8. Konakov, V. (1978). Complete asymptotic expansions for the maximum deviation of the empirical density function. Theory Probab. Appl., 28, 495-509.
9. Liero, H., Läuter, H., Konakov, V. (1998). Nonparametric versus parametric goodness-of-fit. Statistics, 31, 115-149.
10. Greenwood, P.E., Nikulin, M. (1996). A Guide to Chi-squared Testing. J. Wiley & Sons, N.Y.
11. Rosenblatt, M. (1975). A quadratic measure of deviation of two-dimensional density estimates and a test of independence. Ann. Statist., 3, 1-14.
12. Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.
13. Witting, H., Nölle, G. (1970). Angewandte Mathematische Statistik. Teubner, Stuttgart.