Parametric versus Nonparametric Goodness of Fit: Another View

H. Läuter and M. Nikulin

January 26, 1999

Department of Statistics, Institute of Mathematics, University of Potsdam, Germany
Mathematical Statistics, UFR MI2S, University Bordeaux 2, France
& Steklov Mathematical Institute, Saint Petersburg, Russia
Abstract

We consider chi-squared type tests of the hypothesis $H_0$ that the density $f$ of observations $X_1,\dots,X_n$ lies in a parametric class of densities $\mathcal F$. The test is a chi-squared type test built from kernel estimates of the density. The main result, following Liero, Läuter and Konakov (1998), is the derivation of the asymptotic behavior of the power of the test under Pitman and "sharp peak" type local alternatives. We study the connection between the rate of convergence of these local alternatives, the bandwidth of the kernel estimator, the parametric estimator, and the power of the test.

Keywords and Phrases: chi-squared test, goodness-of-fit test, density, kernel estimators, maximum likelihood estimator, local alternative, Pitman alternative, sharp peak alternative, asymptotic power.

AMS 1991 Subject Classification: Primary 62G10, 62G20.
The research for this paper was carried out within Sonderforschungsbereich 373 and was printed using funds made available by the Deutsche Forschungsgemeinschaft.
1. Introduction
Let $X_1, X_2, \dots, X_n$ be a sample of i.i.d. random variables with density $f$. We wish to test whether $f$ belongs to some parametric family $\mathcal F$ of density functions
\[
\mathcal F = \{f : f(\cdot) = f(\cdot,\theta),\ \theta \in \Theta \subset \mathbb R^k\} \tag{1}
\]
against the nonparametric alternative $f \notin \mathcal F$.
We consider the kernel estimator $\hat f_n$ of the density function $f$,
\[
\hat f_n(t) = \frac1n \sum_{i=1}^n \frac1{h_n} K\!\Big(\frac{t - X_i}{h_n}\Big), \tag{2}
\]
where the kernel function $K(\cdot)$ is bounded, of bounded variation, has bounded support and satisfies
\[
\int K(x)\,dx = 1.
\]
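As a concrete numerical illustration of estimator (2) — with the Epanechnikov kernel and Gaussian data as our own choices, not ones prescribed by the paper — a minimal sketch reads:

```python
import numpy as np

def epanechnikov(x):
    # bounded, of bounded variation, compactly supported on [-1, 1], integrates to one
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x**2), 0.0)

def kernel_density_estimate(t, sample, h):
    # \hat f_n(t) = (1/n) sum_{i=1}^n (1/h) K((t - X_i)/h), as in (2)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    u = (t[:, None] - sample[None, :]) / h
    return epanechnikov(u).mean(axis=1) / h

rng = np.random.default_rng(0)
X = rng.normal(size=2000)            # i.i.d. sample, here from N(0, 1)
h = len(X) ** (-1 / 5)               # bandwidth with h_n -> 0 and n h_n -> infinity
grid = np.linspace(-4.0, 4.0, 801)
fhat = kernel_density_estimate(grid, X, h)
mass = fhat.sum() * (grid[1] - grid[0])   # total mass on the grid, close to one
```

The bandwidth exponent $-1/5$ is the usual mean-squared-error-optimal rate for twice differentiable densities; any sequence with $h_n \to 0$, $nh_n \to \infty$ fits the assumptions above.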
The random function $\hat f_n(t)$ may be represented in the form
\[
\hat f_n(t) = f^{(n)}(t,\theta) + \frac1{\sqrt n}\int \frac1{h_n} K\!\Big(\frac{t-u}{h_n}\Big)\,dW_n(u), \tag{3}
\]
where we denote, for a (possibly vector-valued) function $\varphi$,
\[
\varphi^{(n)}(t) = \int \frac1{h_n} K\!\Big(\frac{t-u}{h_n}\Big)\,\varphi(u)\,du, \tag{4}
\]
\[
W_n(t) = \sqrt n\,\big(F_n(t) - F(t,\theta)\big).
\]
Here $F$ is the distribution function of the random variables $X_i$ and $F_n$ is its empirical version:
\[
F_n(t) = \frac1n \sum_{j=1}^n \mathbb 1_{(-\infty,\,t]}(X_j).
\]
We suppose also that
\[
h_n \to 0 \quad\text{and}\quad n h_n \to \infty \quad\text{as } n \to \infty.
\]
It is well known that in this case $\mathbb E\,\hat f_n(t) \to f(t,\theta)$, i.e. $\hat f_n(t)$ is asymptotically unbiased for $f(t,\theta)$ at any point of continuity of $f$, and since
\[
n h_n \operatorname{Var} \hat f_n(t)
= h_n \operatorname{Var}\Big[\frac1{h_n} K\!\Big(\frac{t - X_1}{h_n}\Big)\Big]
= \frac1{h_n}\int K^2\!\Big(\frac{t-x}{h_n}\Big) f(x,\theta)\,dx
- h_n \Big[\frac1{h_n}\int K\!\Big(\frac{t-x}{h_n}\Big) f(x,\theta)\,dx\Big]^2,
\]
we have
\[
\lim_{n\to\infty} n h_n \operatorname{Var} \hat f_n(t) = f(t,\theta)\int K^2(x)\,dx = k\,f(t,\theta),
\]
i.e. $\hat f_n(t)$ is mean squared consistent for $f(t,\theta)$. For more about the properties of $\hat f_n$ and its applications see, for example, Silverman (1986), Bickel & Rosenblatt (1973), Härdle & Marron (1990), Härdle & Mammen (1993).
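The variance limit above can be checked deterministically by quadrature: for a fixed density and shrinking $h$, the exact expression for $n h_n \operatorname{Var}\hat f_n(t)$ approaches $f(t,\theta)\,k$. A sketch, assuming the Epanechnikov kernel ($k = 3/5$) and the standard normal density as illustrative choices:

```python
import numpy as np

def K(u):
    # Epanechnikov kernel, support [-1, 1], integral one
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def f(x):
    # standard normal density, playing the role of f(x, theta)
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def scaled_variance(t, h, m=20001):
    # n h_n Var \hat f_n(t) computed by quadrature after substituting u = (t - x)/h:
    # (1/h) int K^2((t-x)/h) f dx - h [ (1/h) int K((t-x)/h) f dx ]^2
    u = np.linspace(-1.0, 1.0, m)
    du = u[1] - u[0]
    first = np.sum(K(u)**2 * f(t - h * u)) * du
    second = np.sum(K(u) * f(t - h * u)) * du
    return first - h * second**2

t = 0.0
k = 0.6                                  # int K^2(x) dx = 3/5 for this kernel
limit = f(t) * k                         # the limit k f(t, theta)
vals = [scaled_variance(t, h) for h in (0.5, 0.1, 0.02)]
errs = [abs(v - limit) for v in vals]    # shrinks as h decreases
```

The second (squared-mean) term is of order $h$, which is exactly why it drops out of the limit.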
For a function $\varphi(x,\theta)$ and the random point $\theta = \hat\theta_n$ (where $\hat\theta_n$ is the maximum likelihood estimator of $\theta$) we introduce the notation
\[
\hat\varphi(x) = \varphi(x,\hat\theta_n), \qquad
\nabla\varphi(x,\theta) = \Big(\frac{\partial}{\partial\theta_1}\varphi(x,\theta),\dots,\frac{\partial}{\partial\theta_k}\varphi(x,\theta)\Big)^T, \qquad
\hat\nabla\varphi(x) = \Big(\frac{\partial}{\partial\theta_1}\varphi(x,\hat\theta_n),\dots,\frac{\partial}{\partial\theta_k}\varphi(x,\hat\theta_n)\Big)^T.
\]
Further, for simplicity of notation, we will write $f$ instead of $f(\cdot,\theta)$ and $\hat f$ instead of $f(\cdot,\hat\theta_n)$. For a nonnegative function $\omega$ we denote by $L^2_\omega$ the $L^2$-space generated by the measure with density $\omega$. Let $I(\theta)$ be the information matrix,
\[
I(\theta) = \|I_{ij}\|_{i,j=1}^{k} = \int \frac{\partial}{\partial\theta_i} f(x,\theta)\,\frac{\partial}{\partial\theta_j} f(x,\theta)\, f^{-1}(x,\theta)\,dx.
\]
We denote by $\psi_j(x,\theta)$, $j = 1,\dots,k$, the coordinates of the vector function $\psi = (\psi_1,\dots,\psi_k)^T$,
\[
\psi(x,\theta) = I^{-1/2}(\theta)\,\nabla f(x,\theta).
\]
So we have
\[
\big(\psi_i(\cdot,\theta), \psi_j(\cdot,\theta)\big)_{f^{-1}} = \delta_{ij},
\]
where
\[
(\cdot,\cdot)_{f^{-1}} = (\cdot,\cdot)_{1/f}
\]
is the inner product in the space $L^2_{f^{-1}}$. The same relation is true for the coordinates $\hat\psi_j(x)$, $j = 1,\dots,k$, of the vector function $\hat\psi = \psi(x,\hat\theta_n)$. For a function $h \in L^2_{f^{-1}}$ we put
\[
h]_L(x) = \sum_{j=1}^k \big(h(\cdot),\psi_j(\cdot)\big)_{f^{-1}}\,\psi_j(x), \qquad h]_{L^\perp} = h - h]_L. \tag{5}
\]
It is clear that $h]_L$ is the projection of $h$ in the space $L^2_{f^{-1}}$ on the linear space $L = L_n$ spanned by the set $\{\psi_j(x),\ j = 1,\dots,k\}$.
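The projection (5) can be made concrete by quadrature. A sketch under our own illustrative assumptions (not the paper's): the one-parameter normal location family $f(x,\theta) = \phi(x-\theta)$ at $\theta = 0$, for which $k = 1$, $\nabla f(x) = x\,\phi(x)$, $I(\theta) = 1$, and hence $\psi(x) = x\,\phi(x)$; the test function $h(x) = (x + x^2)\phi(x)$ is likewise arbitrary.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
f = phi                                  # density under H0
psi = x * phi                            # psi = I^{-1/2}(theta) grad f, here I = 1

def inner(g1, g2):
    # inner product (g1, g2)_{f^{-1}} = int g1 g2 f^{-1} dx
    return np.sum(g1 * g2 / f) * dx

h = (x + x**2) * phi                     # an element of L^2_{f^{-1}}
c = inner(h, psi)                        # Fourier coefficient in (5)
h_L = c * psi                            # h]_L, projection onto L = span{psi}
h_perp = h - h_L                         # h]_{L^perp}

norm_psi = inner(psi, psi)               # should be one (orthonormality)
resid_orth = inner(h_perp, psi)          # should vanish
pythagoras = inner(h, h) - inner(h_L, h_L) - inner(h_perp, h_perp)
```

The three checked identities — $\|\psi\|_{f^{-1}} = 1$, $(h]_{L^\perp},\psi)_{f^{-1}} = 0$, and the Pythagorean decomposition — are exactly what makes the split $T_1^{(n)}$/$T_2^{(n)}$ below well defined.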
As test statistic we suggest a functional measuring the normalized deviation of the kernel estimator $\hat f_n(x)$ from a modified parametric estimator $f^{(n)}(x,\hat\theta_n)$. We consider these estimators as elements of $L^2_\omega$; however, we measure the distance between functions from $L^2_\omega$ with the help of two seminorms $\|\cdot\|_1$ and $\|\cdot\|_2$:
\[
\|h\|_1 = \|h]_L\|_{f^{-1}}, \qquad \|h\|_2 = \|h]_{L^\perp}\|_\omega.
\]
The process
\[
\xi^{(n)}(t) = \sqrt n\,\big[\hat f_n(t) - f^{(n)}(t,\hat\theta_n)\big] \tag{6}
\]
will further be called the observable empirical process. We define
\[
T_1^{(n)} = \|\xi^{(n)}(\cdot)]_L\|_{f^{-1}}^2 \qquad\text{and}\qquad T_2^{(n)} = \|\xi^{(n)}(\cdot)]_{L^\perp}\|_\omega^2. \tag{7}
\]
For nonnegative functions $\omega$ and $h$ we put
\[
\mu(h) = k \int h(x)\,\omega(x)\,dx, \qquad \sigma^2(h) = \tilde k \int h^2(x)\,\omega^2(x)\,dx, \tag{8}
\]
where
\[
k = \int K^2(x)\,dx, \qquad \tilde k = \int (K * K)^2(x)\,dx.
\]
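The kernel constants $k$ and $\tilde k$ in (8) are fixed numbers once $K$ is chosen, and are easy to obtain numerically. A sketch for the Epanechnikov kernel (again our illustrative choice; $k = 3/5$ exactly, and $\tilde k$ is computed via a discrete self-convolution):

```python
import numpy as np

m = 4001
u = np.linspace(-1.0, 1.0, m)
du = u[1] - u[0]
K = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

k = np.sum(K**2) * du                # k = int K^2(x) dx, equals 3/5 for this kernel
KK = np.convolve(K, K) * du          # continuous convolution (K*K), supported on [-2, 2]
kk_mass = np.sum(KK) * du            # (K*K) integrates to one
k_tilde = np.sum(KK**2) * du         # k~ = int (K*K)^2(x) dx
```

Since $(K*K)$ has total mass one, is bounded by $k$, and is supported on an interval of length $4$, one always has $1/4 \le \tilde k \le k$ here, which the computed value respects.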
As test statistic we suggest the statistic
\[
T^{(n)} = T_1^{(n)} + \left\{ \frac{h_n^{-1/2}\big[h_n T_2^{(n)} - \mu(f)\big]}{\sigma(f)} \right\}^2. \tag{9}
\]
Now we introduce the assumptions under which we plan to investigate the asymptotic behavior of the statistic $T^{(n)}$.

A1. The kernel satisfies $K \in L_1$ and $\int K(x)\,dx = 1$.

A2. The least even decreasing majorant $\overline K(x)$ of $|K(x)|$ belongs to the $L_1$-space.

A3. The density function $f(x,\theta)$, as a function $\theta : \Theta \to L^2_{f^{-1}}$, is continuously differentiable on the interior of $\Theta$. That is, there exists a vector function $\nabla f = \big(\frac{\partial f}{\partial\theta_1},\dots,\frac{\partial f}{\partial\theta_k}\big)^T$ such that
\[
f(\cdot,\theta) - f(\cdot,\theta_0) = \big\langle \nabla f(\cdot,\theta_0),\ \theta - \theta_0 \big\rangle + r_{\theta_0}(\cdot,\theta),
\]
where $\langle\cdot,\cdot\rangle$ is the inner product in $\mathbb R^k$ and $\|r_{\theta_0}(\cdot,\theta)\|_{f^{-1}}/|\theta - \theta_0| \to 0$ as $|\theta - \theta_0| \to 0$.

A4. The Fisher information matrix $I(\theta)$ is positive definite.

2. The kernel smoothing
Assume A1 and let $\overline K(x)$ be the least even decreasing majorant of $|K(x)|$,
\[
\overline K(x) = \sup_{|t| \ge x} |K(t)|, \qquad x \ge 0.
\]
We consider the operators
\[
A_h f(x) = \int \frac1h K\!\Big(\frac{x-t}{h}\Big) f(t)\,dt = f_h(x), \quad h > 0,
\qquad
M[f](x) = \sup_{h>0} \int \frac1h\, \overline K\!\Big(\frac{x-t}{h}\Big) f(t)\,dt.
\]
Suppose that the nonnegative function $\omega(t)$ satisfies the condition
\[
\sup_I \Big(\frac1{|I|}\int_I \omega(x)\,dx\Big)\Big(\frac1{|I|}\int_I \omega^{-1}(x)\,dx\Big) < \infty, \tag{10}
\]
where the supremum is taken over intervals $I$ and $|I|$ is the length of $I$.
Theorem 2.1 (see J. B. Garnett (1981)). If $\overline K \in L_1$ and the weight function $\omega$ satisfies condition (10), then:
1. $M$ is a bounded operator on $L^2_\omega$;
2. the operators $A_h$, $h > 0$, are uniformly bounded on $L^2_\omega$;
3. if $f \in L^2_\omega$, then $f_h \to f$ in the metric of the space $L^2_\omega$ as $h \to 0$.
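Part 3 of the theorem is easy to observe numerically. A sketch under our own assumptions: $f$ the standard normal density, $\omega \equiv 1$ (the Lebesgue weight, which trivially satisfies (10)), and the Epanechnikov kernel in $A_h$; the error $\|f_h - f\|_\omega$ shrinks like $h^2$ for this smooth $f$.

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 16001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def A_h(f_vals, h):
    # A_h f(x) = int (1/h) K((x-t)/h) f(t) dt via discrete convolution,
    # with the Epanechnikov kernel
    n_half = int(round(h / dx))
    u = np.arange(-n_half, n_half + 1) * dx
    Kh = np.where(np.abs(u) <= h, 0.75 * (1.0 - (u / h)**2), 0.0) / h
    return np.convolve(f_vals, Kh, mode="same") * dx

errors = []
for h in (1.0, 0.3, 0.1):
    fh = A_h(f, h)
    errors.append(np.sqrt(np.sum((fh - f)**2) * dx))   # ||f_h - f||_omega
```
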
3. Maximum likelihood estimator
Consider a sample $X = (X_1, X_2, \dots, X_n)$ of i.i.d. random variables $X_i$ with density function $f(\cdot) = f(\cdot,\theta)$, $\theta \in \Theta \subset \mathbb R^k$. The maximum likelihood estimator $\hat\theta_n$ is a measurable solution of the likelihood equation
\[
\nabla L_n(\theta, X) = \sum_{j=1}^n \nabla \log f(X_j,\theta) = 0. \tag{11}
\]
It is well known (see, for example, Witting & Nölle (1970), Konakov (1978), Greenwood & Nikulin (1996)) that under $H_0$ and some smoothness conditions on the function $f(x,\theta)$ we have
\[
\sqrt n\,(\hat\theta_n - \theta) = \frac1{\sqrt n} \sum_{j=1}^n I^{-1}(\theta)\,\nabla \log f(X_j,\theta) + r_n, \tag{12}
\]
where $r_n$ is a random vector such that $r_n \to^P 0$ (we shall write $r_n = o_P(1)$). Thus,
\[
\sqrt n\,(\hat\theta_n - \theta) = I^{-1}(\theta) \int \nabla \log f(x,\theta)\,dW_n(x) + o_P(1). \tag{13}
\]
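A sketch (our own illustration, not from the paper) of (11)-(13) in the exponential family $f(x,\theta) = \theta e^{-\theta x}$, $\theta > 0$, where $\nabla \log f(x,\theta) = 1/\theta - x$, $I(\theta) = 1/\theta^2$, and the MLE has the closed form $\hat\theta_n = 1/\bar X$. The code solves the likelihood equation by Newton's method and checks by Monte Carlo that the variance of $\sqrt n(\hat\theta_n - \theta)$ is near $I^{-1}(\theta) = \theta^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 500, 2000

def mle_newton(sample, t0=1.0, iters=50):
    # solve sum_j grad log f(X_j, theta) = sum_j (1/theta - X_j) = 0
    t, m, s = t0, len(sample), sample.sum()
    for _ in range(iters):
        score = m / t - s
        d_score = -m / t**2
        t -= score / d_score             # Newton step
    return t

X = rng.exponential(1 / theta, size=n)
theta_hat = mle_newton(X)
closed_form = 1 / X.mean()               # known closed-form MLE

# Monte Carlo check of (12)-(13): Var[sqrt(n)(theta_hat - theta)] ~ theta^2 = 4
z = np.array([np.sqrt(n) * (1 / rng.exponential(1 / theta, size=n).mean() - theta)
              for _ in range(reps)])
var_z = z.var()
```
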
It is clear that under smoothness conditions on the function $f(x,\theta)$,
\[
\int \big|[\nabla \log f]^{(h)}(x,\theta) - \nabla \log f(x,\theta)\big|^2 f(x,\theta)\,dx \to 0
\]
as $h \to 0$, where $[\,\cdot\,]^{(h)}$ denotes the smoothing (4) with bandwidth $h$. Therefore we deduce from (13) that
\[
\sqrt n\,(\hat\theta_n - \theta) = I^{-1}(\theta) \int [\nabla \log f]^{(n)}(x,\theta)\,dW_n(x) + o_P(1). \tag{14}
\]
A similar representation holds under the set $K_n$ of nonparametric local alternatives
\[
K_n = \Big\{ f : f(\cdot) = f(\cdot,\theta_n) + N_n\,\varphi\Big(\frac{\cdot - c}{b_n}\Big) \Big\}, \quad n = 1, 2, \dots, \tag{15}
\]
where $\theta_n = \theta + \gamma n^{-\delta}$, $\gamma$ is a given vector, $\delta > 0$, $\varphi$ is a given function, $N_n$ and $b_n$ are sequences of positive numbers tending to $0$, and $c$ is a constant (see e.g. Liero et al. (1998)).
4. The modified empirical process

We consider the smoothed empirical process
\[
\zeta^{(n)}(t) = \int \frac1{h_n} K\!\Big(\frac{t-x}{h_n}\Big)\,dW_n(x), \tag{16}
\]
where
\[
W_n(x) = \sqrt n\,\big(F_n(x) - F(x,\theta)\big),
\]
and investigate the limiting behavior of the $L^2_\omega$-norm of the projection $\zeta_L^{(n)}(t)$ of the process $\zeta^{(n)}(t)$ on some finite-dimensional subspace $L$. For a nonnegative function $\omega$ we put, using (8),
\[
\|\varphi\|_\omega^2 = \int |\varphi(x)|^2\,\omega(x)\,dx, \qquad
\mu(f) = k \int f(x)\,\omega(x)\,dx, \qquad
\sigma^2(f) = \tilde k \int f^2(x)\,\omega^2(x)\,dx.
\]
It is well known that under appropriate conditions (see Bickel and Rosenblatt (1973) and Hall (1984))
\[
P\left\{ \frac{h_n^{-1/2}\big[h_n \|\zeta^{(n)}\|_\omega^2 - \mu(f)\big]}{\sigma(f)} \le x \right\} \to \Phi(x). \tag{17}
\]
Let $\varphi$ be a function such that
\[
\varphi \in L^2_\omega, \qquad \|\varphi\|_\omega = 1,
\]
and let $\zeta_\varphi^{(n)}$ be the projection of $\zeta^{(n)}$ on the one-dimensional subspace generated by $\varphi$,
\[
\zeta_\varphi^{(n)}(t) = \Big(\int \zeta^{(n)}(x)\,\varphi(x)\,\omega(x)\,dx\Big)\,\varphi(t).
\]
So $\zeta_\varphi^{(n)} = a\varphi$, where the random coefficient $a$ is defined by
\[
a = \int \zeta^{(n)}(x)\,\varphi(x)\,\omega(x)\,dx = \int (\varphi\omega)^{(n)}(x)\,dW_n(x),
\qquad
(\varphi\omega)^{(n)}(x) = \int \frac1{h_n} K\!\Big(\frac{x-t}{h_n}\Big)\,\varphi(t)\,\omega(t)\,dt.
\]
Since
\[
\mathbb E\,a = 0 \qquad\text{and}\qquad \operatorname{Var}(a) \le \int \big[(\varphi\omega)^{(n)}(x)\big]^2 f(x,\theta)\,dx,
\]
we obtain from Theorem 2.1 the following proposition.

Proposition 4.1.
Suppose that the weight function $\omega$ satisfies condition (10) and that for some $C = C(\theta) < \infty$
\[
f(x,\theta) < C\,\omega(x) \quad\text{for all } x. \tag{18}
\]
Then there exists a constant $C_1$, depending only on $C$, the weight $\omega$ and the kernel $K$ (and not on $\varphi$), such that
\[
\mathbb E\,\|\zeta_\varphi^{(n)}\|_\omega^2 < C_1. \tag{19}
\]
From Proposition 4.1 one easily deduces:

Proposition 4.2. Suppose that the weight function $\omega$ satisfies condition (10) and $L$ is a finite-dimensional subspace of the space $L^2_\omega$. Then under condition (18) there exists a constant $C_2$, depending only on $C$, the weight $\omega$, the kernel $K$ and $\dim L$, such that
\[
\mathbb E\,\|\zeta_L^{(n)}\|_\omega^2 < C_2, \tag{20}
\]
where $\zeta_L^{(n)}$ is the projection of $\zeta^{(n)}$ on $L$.

Now we denote by $L_n$ a finite-dimensional subspace, $\dim L_n = m$. Let $L_n^\perp$ denote the orthogonal complement of $L_n$ in the space $L^2_\omega$ and $\zeta_{L_n^\perp}^{(n)}$ be the projection of $\zeta^{(n)}$ on $L_n^\perp$.
Theorem 4.1. Suppose that the weight function $\omega$ satisfies condition (10), $L_n$ is a subspace of the space $L^2_\omega$, $\dim L_n$ is fixed, and $h_n \to 0$ as $n \to \infty$. Then under condition (18)
\[
P\left\{ \frac{h_n^{-1/2}\big[h_n \|\zeta_{L_n^\perp}^{(n)}\|_\omega^2 - \mu(f)\big]}{\sigma(f)} \le x \right\} \to \Phi(x). \tag{21}
\]

Proof. Since
\[
\|\zeta_{L_n^\perp}^{(n)}\|_\omega^2 = \|\zeta^{(n)}\|_\omega^2 - \|\zeta_{L_n}^{(n)}\|_\omega^2,
\]
and from Proposition 4.2 we can conclude that
\[
\sup_n P\big\{\|\zeta_{L_n}^{(n)}\|_\omega^2 > y\big\} \to 0 \quad\text{as } y \to \infty,
\]
(17) yields (21).

Now we consider the case
\[
L = L_n = \operatorname{span}\{\psi_j(x),\ j = 1,\dots,k\}.
\]
Recall (see (7)) that $T_2^{(n)} = \|\xi^{(n)}(\cdot)]_{L^\perp}\|_\omega^2$. From Theorem 4.1 we obtain the next

Theorem 4.2. Suppose that the weight function $\omega$ satisfies condition (10), and the density function $f(x,\theta)$ and the kernel $K(x)$ satisfy conditions A1-A4. Then under (18)
\[
P\left\{ \frac{h_n^{-1/2}\big[h_n T_2^{(n)} - \mu(f)\big]}{\sigma(f)} \le x \right\} \to \Phi(x). \tag{22}
\]
5. The observable empirical process

In reality we deal with the observable empirical process
\[
\xi^{(n)}(t) = \sqrt n\,\big[\hat f_n(t) - f^{(n)}(t,\hat\theta_n)\big]
= \sqrt n\,\big[\hat f_n(t) - f^{(n)}(t,\theta)\big] + \sqrt n\,\big[f^{(n)}(t,\theta) - f^{(n)}(t,\hat\theta_n)\big]. \tag{23}
\]
It is clear (see (12), (13), (14)) that under smoothness conditions on the function $f(x,\theta)$
\[
\xi^{(n)}(t) = \zeta^{(n)}(t) + \eta^{(n)}(t) + R_n(t,\theta), \tag{24}
\]
where the process $R_n(t,\theta)$ converges weakly to zero as $n \to \infty$, and
\[
\eta^{(n)}(t) = -\nabla f^{(n)}(t,\theta)^T\, I^{-1}(\theta) \int [\nabla \log f]^{(n)}(x,\theta)\,dW_n(x). \tag{25}
\]
Let $P_{L_n}$ be the orthoprojector in the space $L^2_{f^{-1}}$ on the subspace
\[
L_n = \operatorname{span}\Big\{\frac{\partial}{\partial\theta_j} f^{(n)}(x,\theta),\ j = 1,\dots,k\Big\}.
\]
We introduce the matrix $R_n(\theta) = \|r_{ij}\|_{i,j=1}^k$, where
\[
r_{ij} = \int \frac{\partial}{\partial\theta_i} f^{(n)}(x,\theta)\,\frac{\partial}{\partial\theta_j} f^{(n)}(x,\theta)\, f^{-1}(x,\theta)\,dx, \qquad i,j = 1,\dots,k,
\]
and denote by $R_n^{-1/2}(\theta)$ the inverse of its square root. Let
\[
\varphi^{(n)}(x,\theta) = \big(\varphi_1^{(n)}(x,\theta),\dots,\varphi_k^{(n)}(x,\theta)\big)^T = R_n^{-1/2}(\theta)\,\nabla f^{(n)}(x,\theta).
\]
It is obvious that
\[
\big(\varphi_i^{(n)}(\cdot,\theta), \varphi_j^{(n)}(\cdot,\theta)\big)_{f^{-1}} = \delta_{ij},
\]
where $(\cdot,\cdot)_{f^{-1}}$ is the inner product in the space $L^2_{f^{-1}}$. So the system $\{\varphi_i^{(n)}(\cdot,\theta),\ i = 1,\dots,k\}$ forms an orthonormal basis (in the metric of the space $L^2_{f^{-1}}$) of the subspace $L_n$. Therefore
\[
P_{L_n}[h](t) = \sum_{i=1}^k \Big(\int h(x)\,\varphi_i^{(n)}(x,\theta)\,f^{-1}(x,\theta)\,dx\Big)\,\varphi_i^{(n)}(t,\theta), \qquad h \in L^2_{f^{-1}}.
\]
Now we denote by $\hat L_n$ the subspace
\[
\hat L_n = \operatorname{span}\Big\{\frac{\partial}{\partial\theta_j} f^{(n)}(x,\hat\theta_n),\ j = 1,\dots,k\Big\}
\]
and put
\[
P_{\hat L_n} h(t) = \sum_{i=1}^k \Big(\int h(x)\,\varphi_i^{(n)}(x,\hat\theta_n)\,f^{-1}(x,\hat\theta_n)\,dx\Big)\,\varphi_i^{(n)}(t,\hat\theta_n), \qquad h \in L^2_{f^{-1}(\cdot,\hat\theta_n)}.
\]
It seems reasonable to expect that the operators $P_{L_n}$ and $P_{\hat L_n}$ converge weakly (in an appropriate sense), as $n \to \infty$, to the orthoprojector $P_L$ in the space $L^2_{f^{-1}}$ on the subspace
\[
L = \operatorname{span}\Big\{\frac{\partial}{\partial\theta_j} f(x,\theta),\ j = 1,\dots,k\Big\}.
\]
For our needs it suffices to prove that the asymptotic distribution of the process $P_{\hat L_n}\,\xi^{(n)}(t)$ is the same as that of the process $P_L\,\xi^{(n)}(t)$. This may be proved by the usual weak convergence techniques. One can verify that the limiting distribution of the statistic $T_1^{(n)}$ is the $\chi^2_k$ distribution.

REFERENCES
1. Aguirre, N., Nikulin, M. (1994). Chi-squared goodness-of-fit test for the family of logistic distributions. Kybernetika, 30, 214-222.
2. Bickel, P., Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates. Ann. Statist., 1, 1071-1095.
3. Garnett, J.B. (1981). Bounded Analytic Functions. Academic Press, N.Y.
4. Ghosh, B.K., Huang, Wei-Min (1991). The power and optimal kernel of the Bickel-Rosenblatt test for goodness of fit. Ann. Statist., 19, 999-1009.
5. Härdle, W., Mammen, E. (1993). Comparing nonparametric versus parametric regression fits. Ann. Statist., 21, 1926-1947.
6. Härdle, W., Marron, J.S. (1990). Semiparametric comparison of regression curves. Ann. Statist., 18, 63-89.
7. Hall, P. (1984). Central limit theorem for integrated square error of multivariate nonparametric density estimators. J. Multivar. Anal., 14, 1-16.
8. Konakov, V. (1978). Complete asymptotic expansions for the maximum deviation of the empirical density function. Theory Probab. Appl., 28, 495-509.
9. Liero, H., Läuter, H., Konakov, V. (1998). Nonparametric versus parametric goodness-of-fit. Statistics, 31, 115-149.
10. Greenwood, P.E., Nikulin, M. (1996). A Guide to Chi-squared Testing. J. Wiley & Sons, N.Y.
11. Rosenblatt, M. (1975). A quadratic measure of deviation of two-dimensional density estimates and a test of independence. Ann. Statist., 3, 1-14.
12. Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.
13. Witting, H., Nölle, G. (1970). Angewandte Mathematische Statistik. Teubner, Stuttgart.