
Parametric versus Nonparametric Goodness of Fit

Another View

H. Läuter and M. Nikulin

January 26, 1999

Department of Statistics, Institute of Mathematics, University of Potsdam, Germany

Mathematical Statistics, UFR MI2S, University Bordeaux 2, France, & Steklov Mathematical Institute, Saint Petersburg, Russia

Abstract

We consider chi-squared type tests for testing the hypothesis H0 that the density f of observations X1, ..., Xn lies in a parametric class of densities F. We consider a version of the chi-squared type test that uses kernel estimates of the density. The main result, following Liero, Läuter and Konakov (1998), is the derivation of the asymptotic behavior of the power of the test under Pitman and "sharp peak" type alternatives. The connections between the rate of convergence of these local alternatives, the bandwidth of the kernel estimator, the parametric estimator, and the power of the test are studied.

Keywords and Phrases: Chi-squared test, goodness-of-fit test, density, kernel estimators, maximum likelihood estimator, local alternative, Pitman alternative, sharp peak alternative, asymptotic power.

AMS 1991 Subject Classification: Primary 62G10, 62G20.

The research for this paper was carried out within Sonderforschungsbereich 373 and was printed using funds made available by the Deutsche Forschungsgemeinschaft.


1. Introduction

Let X1, X2, ..., Xn be a sample of i.i.d. random variables with density f.

We wish to test whether f belongs to some parametric family F of density functions

$$\mathcal F = \left\{ f : f(\cdot) = f(\cdot,\theta),\ \theta \in \Theta \subset \mathbb R^k \right\} \qquad (1)$$

against the nonparametric alternative

$$f \notin \mathcal F.$$

We consider the kernel estimator $\hat f_n$ of the density function f,

$$\hat f_n(t) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h_n} K\!\left(\frac{t - X_i}{h_n}\right), \qquad (2)$$

where the kernel function K(·) is bounded, of bounded variation, has a bounded support and

$$\int K(x)\,dx = 1.$$

The random function $\hat f_n(t)$ may be represented in the form

$$\hat f_n(t) = f^{(n)}(t,\theta) + \frac{1}{\sqrt n}\int \frac{1}{h_n} K\!\left(\frac{t-u}{h_n}\right) dW_n(u), \qquad (3)$$

where we denote for a function φ (possibly vector-valued)

$$\varphi^{(n)}(t,\theta) = \int \frac{1}{h_n} K\!\left(\frac{t-u}{h_n}\right) \varphi(u,\theta)\,du, \qquad (4)$$

$$W_n(t) = \sqrt n\,\bigl(F_n(t) - F(t,\theta)\bigr).$$

Here F is the distribution function of the random variables $X_i$ and $F_n$ is its empirical version:

$$F_n(t) = \frac{1}{n}\sum_{j=1}^{n} \mathbf 1_{(-\infty,\,t]}(X_j).$$

We suppose also that

$$h_n \to 0 \quad\text{and}\quad n h_n \to \infty \quad\text{when } n \to \infty.$$
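As an aside, the estimator (2) is straightforward to sketch numerically. The snippet below is an illustration only, not part of the paper; the Epanechnikov kernel is one common choice that is bounded, of bounded variation, has bounded support and integrates to 1, and the sample is drawn from a standard normal for concreteness:

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: bounded, support [-1, 1], integrates to 1."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kde(t, sample, h):
    """Kernel density estimator f_hat_n(t) = (1/(n h)) sum_i K((t - X_i)/h), cf. (2)."""
    t = np.atleast_1d(t)
    u = (t[:, None] - sample[None, :]) / h   # shape (len(t), n)
    return epanechnikov(u).sum(axis=1) / (len(sample) * h)

rng = np.random.default_rng(0)
x = rng.normal(size=2000)                    # sample from N(0, 1)
grid = np.linspace(-3, 3, 61)
f_hat = kde(grid, x, h=0.4)
f_true = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)
print(np.max(np.abs(f_hat - f_true)))        # small for this n and h
```

Refining n and h jointly, so that $h_n \to 0$ while $n h_n \to \infty$, drives this error to zero.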


It is well-known that in this case

$$\mathbb E\,\hat f_n(t) \to f(t,\theta),$$

i.e. $\hat f_n(t)$ is asymptotically unbiased for f(t,θ) at any point of continuity of f, and since

$$n h_n \operatorname{Var}\hat f_n(t) = h_n \operatorname{Var}\frac{1}{h_n}K\!\left(\frac{t-X_1}{h_n}\right) = \frac{1}{h_n}\int K^2\!\left(\frac{t-x}{h_n}\right) f(x)\,dx - h_n\left(\frac{1}{h_n}\int K\!\left(\frac{t-x}{h_n}\right) f(x)\,dx\right)^{2},$$

we have

$$\lim_{n\to\infty} n h_n \operatorname{Var}\hat f_n(t) = f(t,\theta)\int K^2(x)\,dx = \kappa_K f(t,\theta),$$

i.e. $\hat f_n(t)$ is mean squared consistent for f(t,θ). More about the properties of $\hat f_n$ and its applications can be found, for example, in Silverman (1986), Bickel & Rosenblatt (1973), Härdle & Marron (1990), and Härdle & Mammen (1993).
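The limit $n h_n \operatorname{Var}\hat f_n(t) \to \kappa_K f(t,\theta)$ can be illustrated by simulation. The following check is not from the paper; it assumes the Epanechnikov kernel, for which $\kappa_K = \int K^2(x)\,dx = 3/5$, and a standard normal f:

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

# kappa_K = int K^2(x) dx; for the Epanechnikov kernel this equals 3/5.
u = np.linspace(-1, 1, 200001)
kappa_K = np.sum(epanechnikov(u)**2) * (u[1] - u[0])

# Monte Carlo check of n * h_n * Var(f_hat_n(t)) ~ kappa_K * f(t) at t = 0
rng = np.random.default_rng(1)
n, h, t, reps = 20000, 0.1, 0.0, 400
est = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    est[r] = epanechnikov((t - x) / h).sum() / (n * h)   # f_hat_n(0) for one sample
f_t = 1.0 / np.sqrt(2 * np.pi)   # true N(0,1) density at 0
print(n * h * est.var(), kappa_K * f_t)   # approximately equal
```

The agreement is only approximate at finite n and h, since the limit neglects terms of smaller order in $h_n$.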

For a function φ(x,θ) and a random point θ = θ̂_n (where θ̂_n is the maximum likelihood estimator for θ) we introduce the notation

$$\varphi(x) = \varphi(x,\theta), \qquad \nabla\varphi(x) = \left(\frac{\partial}{\partial\theta_1}\varphi(x,\theta), \ldots, \frac{\partial}{\partial\theta_k}\varphi(x,\theta)\right)^{T},$$

$$\hat\nabla\varphi(x) = \left(\frac{\partial}{\partial\theta_1}\varphi(x,\hat\theta_n), \ldots, \frac{\partial}{\partial\theta_k}\varphi(x,\hat\theta_n)\right)^{T}.$$

Further, for simplicity of notation, we will write f instead of f(·,θ) and $\hat f$ instead of f(·,θ̂_n). For a nonnegative function ν we denote by $L^2_\nu$ the $L^2$-space generated by the measure with density ν. Let I(θ) be the information matrix,

$$I(\theta) = \|I_{ij}\|_{i,j=1}^{k} = \int \frac{\partial}{\partial\theta_i} f(x,\theta)\,\frac{\partial}{\partial\theta_j} f(x,\theta)\, f^{-1}(x,\theta)\,dx.$$

We denote by $\psi_j(x,\theta)$, j = 1, ..., k, the coordinates of the vector-function $\psi = (\psi_1, \ldots, \psi_k)^T$,

$$\psi(x,\theta) = I^{-1/2}(\theta)\,\nabla f(x,\theta).$$


So we have

$$\bigl(\psi_i(\cdot,\theta),\, \psi_j(\cdot,\theta)\bigr)_{f^{-1}} = \delta_{ij},$$

where $(\cdot,\cdot)_{f^{-1}}$ is the inner product in the space $L^2_{f^{-1}}$. The same relation is true for the coordinates $\hat\psi_j(x)$ (j = 1, ..., k) of the vector-function $\hat\psi = \psi(x,\hat\theta_n)$. For a function $h \in L^2_{f^{-1}}$ we put

$$[h]_L(x) = \sum_{j=1}^{k} \bigl(h(\cdot),\, \psi_j(\cdot)\bigr)_{f^{-1}}\,\psi_j(x), \qquad [h]_{L^\perp} = h - [h]_L. \qquad (5)$$

It is clear that $[h]_L$ is the projection of h in the space $L^2_{f^{-1}}$ on the linear space $L = L_n$ spanned by the set $\{\psi_j(x),\ j = 1, \ldots, k\}$.
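For a concrete feel for the projection (5), take the simplest case: the N(θ, 1) location family with k = 1, where $\nabla f(x,\theta) = (x-\theta)f(x,\theta)$ and I(θ) = 1, so $\psi(x) = (x-\theta)f(x,\theta)$. The quadrature sketch below (an illustration, not part of the paper) checks the orthonormality of ψ in $L^2_{f^{-1}}$ and the orthogonality of the residual $[h]_{L^\perp}$:

```python
import numpy as np

# Toy case: N(mu, 1) location family (k = 1), so nabla f = (x - mu) f(x)
# and I(mu) = int (x - mu)^2 f(x) dx = 1; hence psi(x) = (x - mu) f(x).
mu = 0.0
x = np.linspace(-8, 8, 200001)
dx = x[1] - x[0]
f = np.exp(-(x - mu)**2 / 2) / np.sqrt(2 * np.pi)
psi = (x - mu) * f

def inner(g, h):
    """Inner product in L2_{f^{-1}}: int g(x) h(x) f^{-1}(x) dx, by quadrature."""
    return np.sum(g * h / f) * dx

print(inner(psi, psi))          # ~ 1: psi is orthonormal in L2_{f^{-1}}

# Projection [h]_L(x) = (h, psi)_{f^{-1}} psi(x), from (5)
h = np.sin(x) * f               # an arbitrary element of L2_{f^{-1}}
h_L = inner(h, psi) * psi
h_perp = h - h_L
print(inner(h_perp, psi))       # ~ 0: the residual is orthogonal to L
```

The same Gram-matrix construction extends to k > 1 by orthonormalizing the partial derivatives of f.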

As test statistic we suggest a functional measuring the normalized deviation of the kernel estimator $\hat f_n(x)$ from a modified parametric estimator $f^{(n)}(x,\hat\theta_n)$. We will consider these estimators as elements of $L^2_\nu$. However, we measure the distance between functions from $L^2_\nu$ with the help of two seminorms $\|\cdot\|_1$ and $\|\cdot\|_2$:

$$\|h\|_1 = \|[h]_L\|_{f^{-1}}, \qquad \|h\|_2 = \|[h]_{L^\perp}\|_{\nu}.$$

The process

$$\xi^{(n)}(t) = \sqrt n \left[ \hat f_n(t) - f^{(n)}(t,\hat\theta_n) \right] \qquad (6)$$

will further be called the observable empirical process. We define

$$T^{(1n)} = \|[\xi^{(n)}]_L\|^2_{f^{-1}} \qquad\text{and}\qquad T^{(2n)} = \|[\xi^{(n)}]_{L^\perp}\|^2_{\nu}. \qquad (7)$$

For nonnegative functions ν and h we put

$$\mu_\nu(h) = \kappa_K \int h(x)\,\nu(x)\,dx, \qquad \sigma^2_\nu(h) = \bar\kappa_K \int h^2(x)\,\nu^2(x)\,dx, \qquad (8)$$

where

$$\kappa_K = \int K^2(x)\,dx, \qquad \bar\kappa_K = \int (K * K)^2(x)\,dx.$$


As test statistic we suggest the statistic $T^{(n)}$,

$$T^{(n)} = T^{(1n)} + \left\{ \frac{h_n^{-1/2}\left[ h_n T^{(2n)} - \mu_\nu(f) \right]}{\sigma_\nu(f)} \right\}^{2}. \qquad (9)$$

Now we introduce the assumptions under which we plan to investigate the asymptotic behavior of the statistic $T^{(n)}$.

A1. For the kernel, $K \in L_1$ and $\int K(x)\,dx = 1$.

A2. The least even decreasing majorant $\bar K(x)$ of $|K(x)|$ belongs to the $L_1$-space.

A3. The density function f(x,θ), as a function $\Theta \to L^2_{f^{-1}}$, is continuously differentiable on the interior of Θ. That is, there exists a vector function

$$\nabla f = \left(\frac{\partial}{\partial\theta_1} f, \ldots, \frac{\partial}{\partial\theta_k} f\right)$$

such that

$$f(\cdot,\theta) - f(\cdot,\theta_0) = \langle \nabla f(\cdot,\theta_0),\, \theta - \theta_0 \rangle + r_{\theta_0}(\theta),$$

where $\langle\cdot,\cdot\rangle$ is the inner product in $\mathbb R^k$ and

$$\frac{\|r_{\theta_0}(\theta)\|_{f^{-1}}}{|\theta - \theta_0|} \to 0 \quad\text{when } \theta \to \theta_0.$$

A4. The Fisher information matrix I(θ) is positive definite.

2. The kernel smoothing

Assume A1 and let $\bar K(x)$ be the least even decreasing majorant of $|K(x)|$,

$$\bar K(x) = \sup_{|t| \ge x} |K(t)|, \qquad x \ge 0.$$

We consider the operators

$$A_h f(x) = \int \frac{1}{h} K\!\left(\frac{x-t}{h}\right) f(t)\,dt = f_h(x), \quad h > 0, \qquad M[f](x) = \sup_{h>0} \int \frac{1}{h} \bar K\!\left(\frac{x-t}{h}\right) f(t)\,dt.$$


Suppose that the nonnegative function ω(t) satisfies the condition

$$\sup_I \left( \frac{1}{|I|} \int_I \omega(x)\,dx \right) \left( \frac{1}{|I|} \int_I \omega^{-1}(x)\,dx \right) < \infty, \qquad (10)$$

where I is an interval and |I| is the length of I.

Theorem 2.1 (see J.B. Garnett (1981)). If $\bar K \in L_1$ and the weight function ω satisfies condition (10), then

1. M is a bounded operator on $L^2_\omega$;

2. the operators $A_h$ are uniformly bounded (for h > 0) on $L^2_\omega$;

3. if $f \in L^2_\omega$, then $f_h \to f$ in the metric of the space $L^2_\omega$ when $h \to 0$.
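Part 3 of Theorem 2.1, $f_h \to f$ in $L^2_\omega$ as $h \to 0$ (here with ω ≡ 1), can be observed numerically. The sketch below is illustrative and not from the paper; it applies the operator $A_h$ by quadrature to a normal density with the Epanechnikov kernel:

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def A_h(f_vals, x, h):
    """Smoothing operator (A_h f)(x) = int (1/h) K((x - t)/h) f(t) dt, by quadrature."""
    dx = x[1] - x[0]
    u = (x[:, None] - x[None, :]) / h          # (x - t)/h on the grid
    return epanechnikov(u) @ f_vals * dx / h

x = np.linspace(-6, 6, 1201)
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)     # a density in L2

# ||f_h - f|| shrinks as h -> 0 (Theorem 2.1, part 3, with omega = 1)
errs = [np.sqrt(np.sum((A_h(f, x, h) - f)**2) * (x[1] - x[0]))
        for h in (0.8, 0.4, 0.1)]
print(errs)                                     # decreasing in h
```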

3. Maximum likelihood estimator

Consider a sample X = (X₁, X₂, ..., Xₙ) of i.i.d. random variables $X_i$ with density function f(·) = f(·,θ), θ ∈ Θ ⊂ ℝ^k. The maximum likelihood estimator $\hat\theta_n$ is a measurable solution of the likelihood equation

$$\nabla L_n(\theta, X) = \sum_{j=1}^{n} \nabla \log f(X_j, \theta) = 0. \qquad (11)$$

It is well-known (see, for example, Witting & Nölle (1970), Konakov (1978), Greenwood & Nikulin (1996)) that under H₀ and some smoothness conditions on the function f(x,θ) we have

$$\sqrt n\,(\hat\theta_n - \theta) = \frac{1}{\sqrt n} \sum_{j=1}^{n} I^{-1}(\theta)\,\nabla\log f(X_j,\theta) + r_n, \qquad (12)$$

where $r_n$ is a random vector such that $r_n \xrightarrow{P} 0$ (we shall write $r_n = o_P(1)$). Thus,

$$\sqrt n\,(\hat\theta_n - \theta) = I^{-1}(\theta) \int \nabla\log f(x,\theta)\,dW_n(x) + o_P(1). \qquad (13)$$

It is clear that under smoothness conditions on the function f(x,θ)

$$\int \bigl| [\nabla\log f]_h(x,\theta) - \nabla\log f(x,\theta) \bigr|^2 f(x,\theta)\,dx \to 0$$

when $h \to 0$. Therefore we deduce from (13) that

$$\sqrt n\,(\hat\theta_n - \theta) = I^{-1}(\theta) \int [\nabla\log f]^{(n)}(x,\theta)\,dW_n(x) + o_P(1). \qquad (14)$$

A similar representation holds under the set $K_n$ of nonparametric local alternatives

$$K_n = \left\{ f : f(\cdot) = f(\cdot,\theta_n) + N_n\,\pi\!\left(\frac{\cdot - c}{b_n}\right) \right\}, \qquad n = 1, 2, \ldots, \qquad (15)$$

where $\theta_n = \theta + n^{-\delta}\gamma$, γ is a given vector, δ > 0, π is a given function, $N_n$, $b_n$ are sequences of positive numbers tending to 0, and c is a constant (see e.g. Liero et al. (1998)).
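As a toy illustration of the likelihood equation (11), and not a construction from the paper, take the exponential family $f(x,\theta) = \theta e^{-\theta x}$, for which the score equation $n/\theta - \sum_j X_j = 0$ has the closed-form root $\hat\theta_n = 1/\bar X$; a Newton iteration on (11) recovers it:

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true = 2.0
x = rng.exponential(scale=1.0 / theta_true, size=5000)   # f(x, theta) = theta exp(-theta x)

def score(theta):
    """Likelihood equation (11): sum_j d/dtheta log f(X_j, theta) = n/theta - sum_j X_j."""
    return len(x) / theta - x.sum()

def score_deriv(theta):
    return -len(x) / theta**2

# Newton iteration for a measurable root of the likelihood equation
theta = 1.0
for _ in range(30):
    theta -= score(theta) / score_deriv(theta)

print(theta, 1.0 / x.mean())    # the Newton root coincides with the closed-form MLE
```

In this family the root is unique, so the "measurable solution" in (11) is simply $1/\bar X$; for multi-root likelihoods a consistent root must be selected.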

4. The modified empirical process

We consider the smoothed empirical process

$$\zeta^{(n)}(t) = \int \frac{1}{h_n} K\!\left(\frac{t-x}{h_n}\right) dW_n(x), \qquad (16)$$

where

$$W_n(x) = \sqrt n\,\bigl(F_n(x) - F(x,\theta)\bigr),$$

and investigate the limiting behavior of the $L^2_\nu$-norm of the projection $\zeta_L^{(n)}(t)$ of the process $\zeta^{(n)}(t)$ on some finite-dimensional subspace L. For a nonnegative function ν we put, using (8),

$$\|\zeta\|^2_\nu = \int |\zeta(x)|^2\,\nu(x)\,dx,$$

$$\mu_\nu(f) = \kappa_K \int f(x)\,\nu(x)\,dx, \qquad \sigma^2_\nu(f) = \bar\kappa_K \int f^2(x)\,\nu^2(x)\,dx.$$

It is well-known that under some appropriate conditions (see Bickel and Rosenblatt (1973) and Hall (1984))

$$P\left\{ \frac{h_n^{-1/2}\left[ h_n \|\zeta^{(n)}\|^2_\nu - \mu_\nu(f) \right]}{\sigma_\nu(f)} \le x \right\} \to \Phi(x). \qquad (17)$$


Let η be a function such that

$$\eta \in L^2_\nu, \qquad \|\eta\|_\nu = 1,$$

and let $\zeta_\eta^{(n)}$ be the projection of $\zeta^{(n)}$ on the one-dimensional subspace generated by η,

$$\zeta_\eta^{(n)}(t) = \left( \int \zeta^{(n)}(x)\,\eta(x)\,\nu(x)\,dx \right) \eta(t).$$

So $\zeta_\eta^{(n)} = a\eta$, where the random coefficient a is defined by

$$a = \int \zeta^{(n)}(x)\,\eta(x)\,\nu(x)\,dx = \int \tilde\eta^{(n)}(x)\,dW_n(x), \qquad \tilde\eta^{(n)}(x) = \int \frac{1}{h_n} K\!\left(\frac{x-t}{h_n}\right) \eta(t)\,\nu(t)\,dt.$$

Since $\mathbb E\,a = 0$ and

$$\operatorname{Var}(a) \le \int \bigl(\tilde\eta^{(n)}(x)\bigr)^2 f(x)\,dx,$$

we obtain from Theorem 2.1 the following proposition.

Proposition 4.1. Suppose that the weight function ν satisfies condition (10) and for some C = C(θ) < ∞

$$f(x,\theta) < C\,\nu(x) \quad\text{for all } x. \qquad (18)$$

Then there exists a constant $C_1$, which depends only on C, the weight ν and the kernel K (and does not depend on η), such that

$$\mathbb E\,\|\zeta_\eta^{(n)}\|^2_\nu < C_1. \qquad (19)$$

It is easily deduced from Proposition 4.1 that:

Proposition 4.2. Suppose that the weight function ν satisfies condition (10) and L is a finite-dimensional subspace of the space $L^2_\nu$. Then under condition (18) there exists a constant $C_2$, which depends only on C, the weight ν, the kernel K and dim L, such that

$$\mathbb E\,\|\zeta_L^{(n)}\|^2_\nu < C_2, \qquad (20)$$

where $\zeta_L^{(n)}$ is the projection of $\zeta^{(n)}$ on L.

Now we denote by $L_n$ a finite-dimensional subspace, $\dim L_n = m$. Let $L_n^\perp$ denote the orthogonal complement of $L_n$ in the space $L^2_\nu$ and let $\zeta_{L_n^\perp}^{(n)}$ be the projection of $\zeta^{(n)}$ on $L_n^\perp$.

Theorem 4.1. Suppose that the weight function ν satisfies condition (10), $L_n$ is a subspace of the space $L^2_\nu$, $\dim L_n$ is fixed, and $h_n \to 0$ when $n \to \infty$. Then under condition (18)

$$P\left\{ \frac{h_n^{-1/2}\left[ h_n \|\zeta_{L_n^\perp}^{(n)}\|^2_\nu - \mu_\nu(f) \right]}{\sigma_\nu(f)} \le x \right\} \to \Phi(x). \qquad (21)$$

Proof. Since

$$\|\zeta_{L_n^\perp}^{(n)}\|^2_\nu = \|\zeta^{(n)}\|^2_\nu - \|\zeta_{L_n}^{(n)}\|^2_\nu,$$

and from Proposition 4.2 we can conclude that

$$\sup_n P\left\{ \|\zeta_{L_n}^{(n)}\|^2_\nu > y \right\} \to 0 \quad\text{when } y \to \infty,$$

(21) follows from (17).

Now we consider the case when

$$L = L_n = \operatorname{span}\{\hat\psi_j(x),\ j = 1, \ldots, k\}.$$

Recall (see (7)) that $T^{(2n)} = \|[\xi^{(n)}]_{L^\perp}\|^2_\nu$. From Theorem 4.1 we obtain the following theorem.

Theorem 4.2. Suppose that the weight function ν satisfies condition (10) and the density function f(x,θ) and kernel K(x) satisfy the conditions A1–A4. Then under (18)

$$P\left\{ \frac{h_n^{-1/2}\left[ h_n T^{(2n)} - \mu_\nu(f) \right]}{\sigma_\nu(f)} \le x \right\} \to \Phi(x). \qquad (22)$$


5. The observable empirical process

In fact we deal with the observable empirical process

$$\xi^{(n)}(t) = \sqrt n\left[ \hat f_n(t) - f^{(n)}(t,\hat\theta_n) \right] = \sqrt n\left[ \hat f_n(t) - f^{(n)}(t,\theta) \right] + \sqrt n\left[ f^{(n)}(t,\theta) - f^{(n)}(t,\hat\theta_n) \right]. \qquad (23)$$

It is clear (see (12), (13), (14)) that under smoothness conditions on the function f(x,θ)

$$\xi^{(n)}(t) = \zeta^{(n)}(t) + \beta^{(n)}(t) + R_n(t,\theta), \qquad (24)$$

where the process $R_n(t,\theta)$ converges weakly to zero when $n \to \infty$, and

$$\beta^{(n)}(t) = -\nabla f^{(n)}(t,\theta)^{T}\, I^{-1}(\theta) \int [\nabla\log f]^{(n)}(x,\theta)\,dW_n(x). \qquad (25)$$

Let $P_{L_n}$ be the orthoprojector in the space $L^2_{f^{-1}}$ on the subspace

$$L_n = \operatorname{span}\left\{ \frac{\partial}{\partial\theta_j} f^{(n)}(x,\theta),\ j = 1, \ldots, k \right\}.$$

We introduce the matrix $R_n(\theta) = \|r_{ij}\|_{i,j=1}^{k}$, where

$$r_{ij} = \int \frac{\partial}{\partial\theta_i} f^{(n)}(x,\theta)\,\frac{\partial}{\partial\theta_j} f^{(n)}(x,\theta)\, f^{-1}(x,\theta)\,dx, \qquad i, j = 1, \ldots, k,$$

and denote by $R_n^{-1/2}(\theta)$ the inverse of its square root. Let

$$\varphi^{(n)}(x,\theta) = \bigl(\varphi_1^{(n)}(x,\theta), \ldots, \varphi_k^{(n)}(x,\theta)\bigr)^{T} = R_n^{-1/2}(\theta)\,\nabla f^{(n)}(x,\theta).$$

It is obvious that

$$\bigl(\varphi_i^{(n)}(\cdot,\theta),\, \varphi_j^{(n)}(\cdot,\theta)\bigr)_{f^{-1}} = \delta_{ij},$$

where $(\cdot,\cdot)_{f^{-1}}$ is the inner product in the space $L^2_{f^{-1}}$. So the system $\{\varphi_i^{(n)}(\cdot,\theta),\ i = 1, \ldots, k\}$ forms an orthonormal basis (in the metric of the space $L^2_{f^{-1}}$) of the subspace $L_n$. Therefore

$$P_{L_n}[h](t) = \sum_{i=1}^{k} \left( \int h(x)\,\varphi_i^{(n)}(x,\theta)\, f^{-1}(x,\theta)\,dx \right) \varphi_i^{(n)}(t,\theta), \qquad h \in L^2_{f^{-1}}.$$

Now we denote by $\hat L_n$ the subspace

$$\hat L_n = \operatorname{span}\left\{ \frac{\partial}{\partial\theta_j} f^{(n)}(x,\hat\theta_n),\ j = 1, \ldots, k \right\}$$

and put

$$\hat P_{L_n} h(t) = \sum_{i=1}^{k} \left( \int h(x)\,\varphi_i^{(n)}(x,\hat\theta_n)\, f^{-1}(x,\hat\theta_n)\,dx \right) \varphi_i^{(n)}(t,\hat\theta_n), \qquad h \in L^2_{f^{-1}(\cdot,\hat\theta_n)}.$$

It seems reasonable to expect that the operators $P_{L_n}$ and $\hat P_{L_n}$ converge weakly (in an appropriate sense), when $n \to \infty$, to the orthoprojector $P_L$ in the space $L^2_{f^{-1}}$ on the subspace

$$L = \operatorname{span}\left\{ \frac{\partial}{\partial\theta_j} f(x,\theta),\ j = 1, \ldots, k \right\}.$$

For our needs it is sufficient to prove that the asymptotic distribution of the process $\hat P_{L_n}\xi^{(n)}(t)$ is the same as the asymptotic distribution of the process $P_L\,\xi^{(n)}(t)$. This may be proved by the usual weak convergence technique. One can verify that the limiting distribution of the statistic $T^{(1n)}$ is the $\chi^2_k$-distribution.

REFERENCES

1. Aguirre, N., Nikulin, M. (1994). Chi-squared goodness-of-fit test for the family of logistic distributions. Kybernetika, 30, 214–222.

2. Bickel, P., Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates. Ann. Statist., 1, 1071–1095.

3. Garnett, J.B. (1981). Bounded Analytic Functions. Academic Press, N.Y.

4. Ghosh, B.K., Huang, Wei-Min (1991). The power and optimal kernel of the Bickel–Rosenblatt test for goodness-of-fit. Ann. Statist., 19, 999–1009.

5. Härdle, W., Mammen, E. (1993). Comparing nonparametric versus parametric regression fits. Ann. Statist., 21, 1926–1947.

6. Härdle, W., Marron, J.S. (1990). Semiparametric comparison of regression curves. Ann. Statist., 18, 63–89.

7. Hall, P. (1984). Central limit theorem for integrated square error of multivariate nonparametric density estimators. J. Multivar. Anal., 14, 1–16.

8. Konakov, V. (1978). Complete asymptotic expansions for the maximum deviation of the empirical density function. Theory Probability Appl., 28, 495–509.

9. Liero, H., Läuter, H., Konakov, V. (1998). Nonparametric versus parametric goodness-of-fit. Statistics, 31, 115–149.

10. Greenwood, P.E., Nikulin, M. (1996). A Guide to Chi-squared Testing. J. Wiley & Sons, N.Y.

11. Rosenblatt, M. (1975). A quadratic measure of deviation of two-dimensional density estimates and a test of independence. Ann. Statist., 3, 1–14.

12. Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.

13. Witting, H., Nölle, G. (1970). Angewandte Mathematische Statistik. Teubner, Stuttgart.
