
HOW TO IMPROVE ACCURACY OF ESTIMATION.

LEPSKI O.V.

Humboldt Universität zu Berlin,

Spandauer Str. 1, 10178 Berlin, Germany

Abstract. A new approach is proposed which allows one to take into account additional information coming from the data. The main idea is to obtain from the data some information about the structure of the model in order to improve the accuracy of estimation. This seems important, since the standard nonparametric accuracy of estimation is usually very low. To improve it, statisticians often impose some additional structure on the model under consideration, which can lead to an inadequate model.

To avoid both of these disadvantages, a special form of estimation procedure is applied, based on a combination of adaptive techniques and hypothesis testing. From the mathematical point of view it leads to the consideration of a new kind of minimax risk. From the practical point of view it allows one to improve the accuracy of estimation procedures even in cases when the guess about the special structure of the model turns out to be wrong.

The paper was printed using funds made available by the Deutsche Forschungsgemeinschaft.

1. Introduction

The paper deals with an approach that allows one to improve the quality of estimation procedures. The approach is general and can be applied to any statistical model and to arbitrary structural assumptions. For this reason it is convenient to present all general definitions and to explain the approach itself in terms of an abstract statistical model, in other words, in terms of a sequence of statistical experiments, see Ibragimov and Khasminskii (1981). Let $\{\mathcal{X}^\varepsilon, \mathcal{B}^\varepsilon, P_f^\varepsilon,\ f(\cdot)\in\Sigma\}$ be the statistical experiment generated by the observation $X^\varepsilon$. Here $(\mathcal{X}^\varepsilon, \mathcal{B}^\varepsilon)$ is some measurable space, $P_f^\varepsilon$ is the probability measure defined on this space, $\varepsilon\in(0,1)$ is the small parameter, and later on the asymptotics will be studied w.r.t. $\varepsilon\to 0$. The set $\Sigma$ is some given set of functions defined on the Euclidean space $\mathbb{R}^s$, $f(\cdot):\mathbb{R}^s\to\mathbb{R}^1$. Here and later we assume that $\Sigma\subseteq\Sigma_s(L)$, where

$$\Sigma_s(L) = \Big\{ f(\cdot):\ \sup_{t\in[0,1]^s} |f(t)| \le L < \infty \Big\}$$

for some $L>0$. For $1\le p\le\infty$ denote

$$\|f(\cdot)\|_p = \begin{cases} \Big(\int_{[0,1]^s} |f(t)|^p\,dt\Big)^{1/p}, & 1\le p<\infty, \\ \sup_{t\in[0,1]^s} |f(t)|, & p=\infty, \end{cases}$$

and consider the minimax risk on the set $\Sigma$:

$$R_\varepsilon\big(\tilde f_\varepsilon,\Sigma,\varphi_\varepsilon(\Sigma)\big) = \sup_{f(\cdot)\in\Sigma} E_f^\varepsilon\big\{ \varphi_\varepsilon^{-1}(\Sigma)\,\|\tilde f_\varepsilon(\cdot)-f(\cdot)\|_p \big\}^q, \quad (1.1)$$

where $\tilde f_\varepsilon(t) = \tilde f_\varepsilon(t, X^\varepsilon)$, $t\in[0,1]^s$, is some estimator, i.e. a function defined on $[0,1]^s$ and measurable w.r.t. the observation $X^\varepsilon$; $E_f^\varepsilon$ is the mathematical expectation w.r.t. the measure $P_f^\varepsilon$; $q>0$ is some fixed number; and $\varphi_\varepsilon(\Sigma)>0$ is a normalizing factor (n.f.), i.e. $\varphi_\varepsilon(\Sigma)\to 0$ as $\varepsilon\to 0$.

The normalizing factor $\varphi_\varepsilon(\Sigma)$ is called the minimax rate of convergence (MRC) if

$$\liminf_{\varepsilon\to 0}\ \inf_{\tilde f_\varepsilon} R_\varepsilon\big(\tilde f_\varepsilon,\Sigma,\varphi_\varepsilon(\Sigma)\big) > 0, \qquad \limsup_{\varepsilon\to 0}\ R_\varepsilon\big(\hat f_\varepsilon,\Sigma,\varphi_\varepsilon(\Sigma)\big) < \infty$$

for some estimator $\hat f_\varepsilon$, which is then called asymptotically efficient (a.e.) in the minimax sense.

Throughout the paper the set $\Sigma$ is assumed to be such that the MRC $\varphi_\varepsilon(\Sigma)$ and the a.e. estimator $\hat f_\varepsilon(\cdot)$ exist and are known. Note that the MRC can be treated as the accuracy of estimation. As follows from the definition of the MRC, this accuracy is attainable and unimprovable in the minimax sense. However, what should a statistician do when the accuracy of estimation is bad, say, when $\varphi_\varepsilon(\Sigma)$ tends to zero too slowly? Can it be improved? The answer to the last question is, of course, negative as long as one considers minimax risks of type (1.1). On the other hand, the consideration of other types of minimax risks may lead to a positive solution of this problem. Before answering this question we should answer another one: why do we hope that such an improvement is possible at all? One answer, which seems reasonable, is the following. Suppose we have a strong suspicion (hypothesis)

$$H_0:\ f(\cdot)\in\Sigma_0, \qquad \Sigma_0\subset\Sigma.$$

It is supposed to be known that there exist an estimator $\hat f_\varepsilon^{(0)}$ and an n.f. $\varphi_\varepsilon(\Sigma_0)$, being the MRC on $\Sigma_0$, such that

$$\limsup_{\varepsilon\to 0}\ R_\varepsilon\big(\hat f_\varepsilon^{(0)},\Sigma_0,\varphi_\varepsilon(\Sigma_0)\big) < \infty, \qquad \frac{\varphi_\varepsilon(\Sigma_0)}{\varphi_\varepsilon(\Sigma)} \to 0 \quad\text{as } \varepsilon\to 0,$$

where for an arbitrary estimator $\tilde f_\varepsilon$

$$R_\varepsilon\big(\tilde f_\varepsilon,\Sigma_0,\varphi_\varepsilon(\Sigma_0)\big) = \sup_{f(\cdot)\in\Sigma_0} E_f^\varepsilon\big\{ \varphi_\varepsilon^{-1}(\Sigma_0)\,\|\tilde f_\varepsilon(\cdot)-f(\cdot)\|_p \big\}^q. \quad (1.2)$$

Thus, the hope of improving the accuracy of estimation is based on the hypothesis that the estimated function belongs to the set $\Sigma_0$, where more precise estimation procedures are available. In this context there are at least two possibilities to give mathematical sense to the words "improvement of the accuracy of estimation". The first one is to use the so-called adaptive approach. This approach has become very popular recently and there exist a lot of publications on the topic; one could mention the papers [1]-[12], [18]-[24], [27] and [31] among others.
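To make these definitions concrete, here is the instance treated in Sections 4.1 and 4.1.1 below, restated in the present notation: for the Hölder ball $\Sigma = H(\beta,Q)\cap\Sigma_1(L)$ observed in white Gaussian noise and a parametric subfamily $\Sigma_0$,

$$\varphi_\varepsilon(\Sigma) = \varepsilon^{\frac{2\beta}{2\beta+1}}, \qquad \varphi_\varepsilon(\Sigma_0) = \varepsilon, \qquad \frac{\varphi_\varepsilon(\Sigma_0)}{\varphi_\varepsilon(\Sigma)} = \varepsilon^{\frac{1}{2\beta+1}} \to 0,$$

so under the parametric hypothesis the attainable accuracy improves by a polynomial factor in $\varepsilon$.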

1.1. Adaptive approach.

The setup of adaptive estimation, reduced to our purposes, consists in the following: one needs to find an estimator $f_\varepsilon^{(a)}(\cdot)$ such that

$$\limsup_{\varepsilon\to 0}\ R_\varepsilon\big(f_\varepsilon^{(a)},\Sigma,\varphi_\varepsilon(\Sigma)\big) < \infty, \qquad \limsup_{\varepsilon\to 0}\ R_\varepsilon\big(f_\varepsilon^{(a)},\Sigma_0,\varphi_\varepsilon(\Sigma_0)\big) < \infty. \quad (1.3)$$

Any estimator satisfying (1.3) is called adaptive. The definition of an adaptive estimator can be given in the following form, equivalent to (1.3). Put

$$\hat\rho_\varepsilon(f) = \begin{cases} \varphi_\varepsilon(\Sigma_0), & f(\cdot)\in\Sigma_0, \\ \varphi_\varepsilon(\Sigma), & f(\cdot)\in\Sigma\setminus\Sigma_0, \end{cases}$$

and consider the risk, which could be called adaptive,

$$R_\varepsilon^{(a)}\big(\tilde f_\varepsilon\big) = \sup_{f(\cdot)\in\Sigma} E_f^\varepsilon\big\{ \hat\rho_\varepsilon^{-1}(f)\,\|\tilde f_\varepsilon(\cdot)-f(\cdot)\|_p \big\}^q. \quad (1.4)$$

Then an adaptive estimator $f_\varepsilon^{(a)}(\cdot)$ is an estimator providing the finiteness of the risk (1.4), i.e.

$$\limsup_{\varepsilon\to 0}\ R_\varepsilon^{(a)}\big(f_\varepsilon^{(a)}\big) < \infty; \quad (1.5)$$

see Lepski and Spokoiny (1995a) for more details on risks of type (1.4). Obviously, conditions (1.3) and (1.5) are equivalent; the short computation below makes this explicit.
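The equivalence follows by splitting the supremum in (1.4) over $\Sigma_0$ and $\Sigma\setminus\Sigma_0$:

$$R_\varepsilon^{(a)}\big(\tilde f_\varepsilon\big) = \max\Big\{ R_\varepsilon\big(\tilde f_\varepsilon,\Sigma_0,\varphi_\varepsilon(\Sigma_0)\big),\ \sup_{f(\cdot)\in\Sigma\setminus\Sigma_0} E_f^\varepsilon\big\{ \varphi_\varepsilon^{-1}(\Sigma)\,\|\tilde f_\varepsilon(\cdot)-f(\cdot)\|_p \big\}^q \Big\}.$$

The first term is exactly the $\Sigma_0$-risk in (1.3), and the second term is dominated by $R_\varepsilon(\tilde f_\varepsilon,\Sigma,\varphi_\varepsilon(\Sigma))$; conversely, since $\hat\rho_\varepsilon(f)\le\varphi_\varepsilon(\Sigma)$, the $\varphi_\varepsilon(\Sigma)$-normalized risk over all of $\Sigma$ is dominated by $R_\varepsilon^{(a)}(\tilde f_\varepsilon)$. Hence both risks in (1.3) remain bounded if and only if (1.5) holds.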

Note also two important facts.

First, considering risks of type (1.4), we leave the framework of the standard minimax approach, because now the n.f. may depend on the function to be estimated. However, any estimator $f_\varepsilon^{(a)}$ satisfying (1.5) is an a.e. estimator on the set $\Sigma$ w.r.t. the minimax risk (1.1). This follows from the facts that $\varphi_\varepsilon(\Sigma)$ is the MRC on $\Sigma$ and that $\hat\rho_\varepsilon(f)\le\varphi_\varepsilon(\Sigma)$ for all $f(\cdot)\in\Sigma$.

Next, consider the set of pairs $(a_\varepsilon, b_\varepsilon)$, where $0<a_\varepsilon<b_\varepsilon\le\varphi_\varepsilon(\Sigma)$. Denote by $M_\varepsilon$ the set of n.f.'s $\rho_\varepsilon(f)$, $f(\cdot)\in\Sigma$, represented as

$$\rho_\varepsilon(f) = \begin{cases} a_\varepsilon, & f(\cdot)\in\Sigma_0, \\ b_\varepsilon, & f(\cdot)\in\Sigma\setminus\Sigma_0. \end{cases} \quad (1.6)$$

For every $\rho_\varepsilon(\cdot)\in M_\varepsilon$ introduce the risk

$$R_\varepsilon^{(a)}\big(\tilde f_\varepsilon,\rho_\varepsilon(\cdot)\big) = \sup_{f(\cdot)\in\Sigma} E_f^\varepsilon\big\{ \rho_\varepsilon(f)^{-1}\,\|\tilde f_\varepsilon(\cdot)-f(\cdot)\|_p \big\}^q.$$

Note that $\hat\rho_\varepsilon(\cdot)\in M_\varepsilon$ and $R_\varepsilon^{(a)}(\tilde f_\varepsilon) = R_\varepsilon^{(a)}(\tilde f_\varepsilon,\hat\rho_\varepsilon(\cdot))$. The following two statements are simple consequences of the fact that $\varphi_\varepsilon(\Sigma)$ and $\varphi_\varepsilon(\Sigma_0)$ are the MRC on the sets $\Sigma$ and $\Sigma_0$ respectively.

For every $\rho_\varepsilon(\cdot)\in M_\varepsilon$ such that $b_\varepsilon/\varphi_\varepsilon(\Sigma)\to 0$ as $\varepsilon\to 0$,

$$\liminf_{\varepsilon\to 0}\ \inf_{\tilde f_\varepsilon} R_\varepsilon^{(a)}\big(\tilde f_\varepsilon,\rho_\varepsilon(\cdot)\big) = +\infty.$$

For every $\rho_\varepsilon(\cdot)\in M_\varepsilon$ such that $a_\varepsilon/\varphi_\varepsilon(\Sigma_0)\to 0$ as $\varepsilon\to 0$,

$$\liminf_{\varepsilon\to 0}\ \inf_{\tilde f_\varepsilon} R_\varepsilon^{(a)}\big(\tilde f_\varepsilon,\rho_\varepsilon(\cdot)\big) = +\infty.$$

These two results together with (1.5) allow us to conclude that the n.f. $\hat\rho_\varepsilon(\cdot)$ is optimal among the n.f.'s of type (1.6).

Unfortunately, even having constructed an adaptive estimator $f_\varepsilon^{(a)}(\cdot)$ satisfying (1.5), one can say nothing about its accuracy of estimation. The reason is that the n.f. $\hat\rho_\varepsilon(f)$ describing this accuracy depends on the estimated function and is therefore unknown. More precisely, the n.f. $\hat\rho_\varepsilon(\cdot)$ depends on whether the function $f(\cdot)$ belongs to the set $\Sigma_0$ or not. Since this sort of information cannot be extracted from noisy data exactly, one can only state the "theoretical optimality" of the estimator $f_\varepsilon^{(a)}(\cdot)$, which follows from the optimality of the n.f. $\hat\rho_\varepsilon(\cdot)$ mentioned above.

On the other hand, it seems reasonable to test the hypothesis $H_0$ (the hypothesis that the estimated function belongs to the set $\Sigma_0$) and then to use the obtained result for the construction of estimators and for the study of their properties, in particular, the accuracy of estimation. Evidently, the result of a test is random; it may be correct or not, but we may expect to receive some additional information whose use would be natural.

These considerations lead us to the following idea (the second possibility) of how to improve the accuracy of estimation. The idea consists in the consideration of minimax risks with random normalizing factors of a special type.

1.2. Minimax risks with random normalizing factor.

By analogy with (1.6), for every $\varepsilon\in(0,1)$ let us consider the family of bounded random variables $\rho_\varepsilon$, measurable w.r.t. $X^\varepsilon$ and taking the two values $\varphi_\varepsilon(\Sigma)$ and $a_\varepsilon$, where $0<a_\varepsilon<\varphi_\varepsilon(\Sigma)$. For every such $\rho_\varepsilon$ and for every estimator $\tilde f_\varepsilon(\cdot)$ introduce the risk

$$R_\varepsilon^{(r)}\big(\tilde f_\varepsilon,\rho_\varepsilon\big) = \sup_{f(\cdot)\in\Sigma} E_f^\varepsilon\big\{ \rho_\varepsilon^{-1}\,\|\tilde f_\varepsilon(\cdot)-f(\cdot)\|_p \big\}^q. \quad (1.7)$$

Note that, considering risks of type (1.7), we understand an improvement of the accuracy of estimation as the occurrence of the event $\{\rho_\varepsilon\ne\varphi_\varepsilon(\Sigma)\}$. Since this is a random event, we have to be sure that it holds in some appropriate sense. Otherwise, the following artificial example,

$$\rho_\varepsilon = \begin{cases} a_\varepsilon, & \text{on an event of probability zero}, \\ \varphi_\varepsilon(\Sigma), & \text{otherwise}, \end{cases}$$

shows the formality of the notion of "improvement". We had no such difficulties when considering the adaptive risk (1.4), since $\hat\rho_\varepsilon(f) = \varphi_\varepsilon(\Sigma_0) \ll \varphi_\varepsilon(\Sigma)$ for all $f(\cdot)\in\Sigma_0$. Remember, however, that the theoretical possibility to improve the accuracy of estimation is connected with the acceptance of the hypothesis $H_0$. So let us demand that the event $\{\rho_\varepsilon\ne\varphi_\varepsilon(\Sigma)\}$ holds for functions belonging to the set $\Sigma_0$. In order to give these words mathematical sense, introduce the following subfamily $\Lambda_\varepsilon$ of random normalizing factors (r.n.f.). Let $0<\alpha<1$ be some given number and let $\alpha_\varepsilon$, $\varepsilon\in(0,1)$, be some fixed function such that $0<\alpha_\varepsilon\le 1-\alpha$ for all $\varepsilon\in(0,1)$.

We will say that the r.n.f. $\rho_\varepsilon\in\Lambda_\varepsilon$ if

$$\limsup_{\varepsilon\to 0}\ \alpha_\varepsilon^{-1} \sup_{f(\cdot)\in\Sigma_0} P_f^\varepsilon\big\{\rho_\varepsilon = \varphi_\varepsilon(\Sigma)\big\} \le 1.$$

Note that we do not require $\alpha_\varepsilon\to 0$ as $\varepsilon\to 0$; in particular, one can take $\alpha_\varepsilon$ to be a constant $\tilde\alpha>0$ for all $\varepsilon\in(0,1)$. In fact $\Lambda_\varepsilon = \Lambda_\varepsilon(\alpha(\cdot))$, but we will omit the dependence on $\alpha(\cdot)$ in the notation, because further all statements are formulated for an arbitrary but fixed $\alpha(\cdot)$. The sense of the condition $\rho_\varepsilon\in\Lambda_\varepsilon$ is rather natural. It means that for every $f(\cdot)\in\Sigma_0$ and every $\rho_\varepsilon\in\Lambda_\varepsilon$

$$P_f^\varepsilon\big\{\rho_\varepsilon\ne\varphi_\varepsilon(\Sigma)\big\} \ge 1-\alpha_\varepsilon \ge \alpha > 0$$

for all small enough $\varepsilon\in(0,1)$. The last inequality means that the probability of the "improvement of the accuracy of estimation" is positive, at least for each function belonging to the set $\Sigma_0$. At first glance it seems reasonable to choose $\alpha_\varepsilon$ tending to zero as fast as possible. This would guarantee

$$P_f^\varepsilon\big\{\rho_\varepsilon\ne\varphi_\varepsilon(\Sigma)\big\}\to 1, \quad \varepsilon\to 0, \quad \forall f(\cdot)\in\Sigma_0.$$

However, as we will see later, the choice of $\alpha_\varepsilon$ is a delicate problem, and at the moment we require only $0<\alpha_\varepsilon\le 1-\alpha$. Note also that the introduction of the subfamily $\Lambda_\varepsilon$ allows us to define an optimal r.n.f. and an estimator which is asymptotically efficient w.r.t. the risk (1.7).

Let $0<a_\varepsilon<\varphi_\varepsilon(\Sigma)$, $\varepsilon\in(0,1)$, be some fixed function. Denote by $F(a(\cdot))$ the set of functions $b_\varepsilon$, $\varepsilon\in(0,1)$, such that

$$\lim_{\varepsilon\to 0} \frac{b_\varepsilon}{a_\varepsilon} = 0,$$

and put

$$\Lambda_\varepsilon\big(a(\cdot)\big) = \Big\{ \rho_\varepsilon = \big(\{\varphi_\varepsilon(\Sigma)\},\{b_\varepsilon\}\big)\in\Lambda_\varepsilon:\ b(\cdot)\in F\big(a(\cdot)\big) \Big\}.$$

Thus, $\Lambda_\varepsilon(a(\cdot))$ consists of the r.n.f.'s belonging to the family $\Lambda_\varepsilon$ whose second value is "better in order" than the given function $a(\cdot)$.

Definition 1. The r.n.f. $\hat\rho_\varepsilon = \big(\{\varphi_\varepsilon(\Sigma)\},\{\varphi_\varepsilon^{\alpha}(\Sigma)\}\big)\in\Lambda_\varepsilon$ is called optimal (asymptotically optimal) if:

1. There exist an estimator $f_\varepsilon^{\alpha}(\cdot)$ and a constant $M<\infty$, independent of $\varepsilon\in(0,1)$ and of $\alpha(\cdot)$, such that

$$\limsup_{\varepsilon\to 0}\ R_\varepsilon^{(r)}\big(f_\varepsilon^{\alpha},\hat\rho_\varepsilon\big) \le M. \quad (1.8)$$

2. For every $\rho_\varepsilon\in\Lambda_\varepsilon\big(\varphi^{\alpha}(\Sigma)\big)$

$$\liminf_{\varepsilon\to 0}\ \inf_{\tilde f_\varepsilon} R_\varepsilon^{(r)}\big(\tilde f_\varepsilon,\rho_\varepsilon\big) = +\infty, \quad (1.9)$$

where the inf is taken over all possible estimators.

Definition 2. Let $\hat\rho_\varepsilon$ be an optimal r.n.f. Then an estimator $f_\varepsilon^{\alpha}(\cdot)$ satisfying (1.8) is called $\alpha(\cdot)$-adaptive.

Remark 1. As follows from (1.9), the function $\varphi_\varepsilon^{\alpha}(\Sigma)$, being the second value of the optimal r.n.f. $\hat\rho_\varepsilon$, cannot be improved in order. The first value $\varphi_\varepsilon(\Sigma)$ cannot be improved in order either, because it is the MRC. Both these facts together with (1.8) explain why $\hat\rho_\varepsilon$ is called optimal.

Remark 2. By definition $\hat\rho_\varepsilon\le\varphi_\varepsilon(\Sigma)$ for all $\varepsilon\in(0,1)$ and for any $\alpha(\cdot)$. Therefore, by (1.8), any $\alpha(\cdot)$-adaptive estimator is an a.e. estimator on the set $\Sigma$ w.r.t. the risk (1.1). It means that, considering risks of type (1.7), we in fact do not leave the framework of the standard minimax approach.

Remark 3. We know only that by definition $\varphi_\varepsilon^{\alpha}(\Sigma) < \varphi_\varepsilon(\Sigma)$. So if $\varphi_\varepsilon^{\alpha}(\Sigma) \asymp \varphi_\varepsilon(\Sigma)$ as $\varepsilon\to 0$, then by (1.9) any improvement of the accuracy of estimation is impossible in this case. Such statistical setups exist; e.g., this is typical for cases when the minimax risk is described by the uniform norm ($p=\infty$). However, the study of such problems lies beyond the scope of the paper. We refer to the recent paper of Low (1996), where similar results for the case $p=\infty$ were obtained. Certainly, the case $\varphi_\varepsilon^{\alpha}(\Sigma) = o\big(\varphi_\varepsilon(\Sigma)\big)$ as $\varepsilon\to 0$ is much more interesting for our purposes, and only such setups are studied later on.

Another interesting question is: what is the connection between $\varphi_\varepsilon^{\alpha}(\Sigma)$ and $\varphi_\varepsilon(\Sigma_0)$? It is intuitively clear that $\varphi_\varepsilon^{\alpha}(\Sigma)$ cannot be better in order than $\varphi_\varepsilon(\Sigma_0)$. The exact statement is given by the following Proposition 1.

Proposition 1. Let $\hat\rho_\varepsilon = \big(\{\varphi_\varepsilon(\Sigma)\},\{\varphi_\varepsilon^{\alpha}(\Sigma)\}\big)$ be some optimal r.n.f. Suppose there exists an estimator $\bar f_\varepsilon(\cdot)$ such that for some $q_1>q$

$$\limsup_{\varepsilon\to 0}\ \sup_{f(\cdot)\in\Sigma_0} E_f^\varepsilon\big\{ \varphi_\varepsilon^{-1}(\Sigma_0)\,\|\bar f_\varepsilon(\cdot)-f(\cdot)\|_p \big\}^{q_1} =: R < \infty.$$

Then for every $\alpha(\cdot)$ such that $0<\alpha_\varepsilon\le\alpha_0$, $\varepsilon\in(0,1)$,

$$\liminf_{\varepsilon\to 0} \left( \frac{\varphi_\varepsilon^{\alpha}(\Sigma)}{\varphi_\varepsilon(\Sigma_0)} \right) \ge \left( \frac{l_0}{2M} \right)^{1/q},$$

where

$$0 < l_0 = \liminf_{\varepsilon\to 0}\ \inf_{\tilde f_\varepsilon} R_\varepsilon\big(\tilde f_\varepsilon,\Sigma_0,\varphi_\varepsilon(\Sigma_0)\big), \qquad \alpha_0 = \big(2^{-1} l_0\big)^{\frac{q_1}{q_1-q}}\, R^{-\frac{q}{q_1-q}}.$$

Remark 4. The assumption of the proposition means that $\varphi_\varepsilon(\Sigma_0)$ is the MRC on the set $\Sigma_0$ not only for the loss function $|\cdot|^q$ but for the loss function $|\cdot|^{q_1}$ as well. It is typical in asymptotic statistics that one and the same function is the MRC for a wide class of loss functions, see, e.g., Ibragimov and Khasminskii (1981). Note also that $l_0>0$ by the definition of the MRC.
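The role of the stronger $q_1$-moment assumption can be seen from the standard interpolation step (our reading of where the constants in Proposition 1 come from, not a transcription of the original proof): if $Z\ge 0$ satisfies $E Z^{q_1}\le R$ and $A$ is an event with $P(A)\le\alpha_\varepsilon$, then Hölder's inequality yields

$$E\big\{Z^q\,\mathbf{1}_A\big\} \le \big(E Z^{q_1}\big)^{\frac{q}{q_1}}\big(P(A)\big)^{\frac{q_1-q}{q_1}} \le R^{\frac{q}{q_1}}\,\alpha_\varepsilon^{\frac{q_1-q}{q_1}}.$$

Requiring the right-hand side to be at most $l_0/2$ leads exactly to the threshold $\alpha_\varepsilon \le (2^{-1}l_0)^{q_1/(q_1-q)} R^{-q/(q_1-q)} = \alpha_0$, so at least half of the minimax lower bound $l_0$ survives on the complementary event.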

2. Application to the construction of confidence sets

In this section we show how to apply the notions of the optimal r.n.f. and the $\alpha(\cdot)$-adaptive estimator to the construction of confidence sets. Fix some $0<\alpha<1$ and let $\hat\rho_\varepsilon$ and $f_\varepsilon^{\alpha}(\cdot)$ be the optimal r.n.f. and the $\alpha(\cdot)$-adaptive estimator respectively. The function $\alpha(\cdot)$ may be chosen in an arbitrary way. Then, from (1.8) and the Markov inequality, one obtains for all small enough $\varepsilon>0$

$$\sup_{f(\cdot)\in\Sigma} P_f^\varepsilon\Big\{ \hat\rho_\varepsilon^{-1}\,\|f_\varepsilon^{\alpha}(\cdot)-f(\cdot)\|_p > M_\alpha \Big\} \le \alpha,$$

where $M_\alpha<\infty$ is a constant depending only on $M$, $q$ and $\alpha$. This is equivalent to

$$P_f^\varepsilon\Big\{ \|f_\varepsilon^{\alpha}(\cdot)-f(\cdot)\|_p \le M_\alpha\,\hat\rho_\varepsilon \Big\} \ge 1-\alpha$$

uniformly w.r.t. $f(\cdot)\in\Sigma$. It means that with the given probability $1-\alpha$ the estimated function lies inside the $L_p$-ball with center at the "point" $f_\varepsilon^{\alpha}(\cdot)$ and radius $M_\alpha\hat\rho_\varepsilon$. By definition, the pair $(f_\varepsilon^{\alpha}(\cdot), \hat\rho_\varepsilon)$ is computable from the observation $X^\varepsilon$ (measurable w.r.t. $X^\varepsilon$), and therefore, if the event $\{\hat\rho_\varepsilon = \varphi_\varepsilon^{\alpha}(\Sigma)\}$ holds, we guarantee an essentially more precise coverage of the estimated function.

3. Relations to adaptive estimation and to hypothesis testing

In this section we briefly discuss the relations between the problem of finding an optimal r.n.f. and an $\alpha(\cdot)$-adaptive estimator, and the problems arising in adaptive estimation and in hypothesis testing.

Let $\hat f_\varepsilon(\cdot)$ and $\hat f_\varepsilon^{(0)}(\cdot)$ be a.e. estimators on the sets $\Sigma$ and $\Sigma_0$ respectively. Further we will see that the $\alpha(\cdot)$-adaptive estimator $f_\varepsilon^{\alpha}(\cdot)$ can often be represented as follows:

$$f_\varepsilon^{\alpha}(\cdot) = \begin{cases} \hat f_\varepsilon^{(0)}(\cdot), & \hat\rho_\varepsilon = \varphi_\varepsilon^{\alpha}(\Sigma), \\ \hat f_\varepsilon(\cdot), & \hat\rho_\varepsilon = \varphi_\varepsilon(\Sigma). \end{cases} \quad (3.1)$$

Proposition 2. Let $\alpha_\varepsilon = O\big(\{\varphi_\varepsilon(\Sigma_0)\}^q\big)$ as $\varepsilon\to 0$ and let $f_\varepsilon^{\alpha}(\cdot)$ be an $\alpha(\cdot)$-adaptive estimator represented by (3.1). Then $f_\varepsilon^{\alpha}(\cdot)$ is an adaptive estimator, i.e. the risk (1.5) (or (1.2)) of this estimator is finite.

As follows from Proposition 2, if $\alpha_\varepsilon$ tends to zero quickly enough, then an $\alpha(\cdot)$-adaptive estimator is at the same time an adaptive estimator.
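A schematic rendering of (3.1) in code may clarify the mechanics: the r.n.f. acts as a test of $H_0$, and the final estimator simply switches between the two base procedures. Everything below (the function names and the callable interfaces) is our illustrative scaffolding, not a prescription from the paper.

    from typing import Callable

    import numpy as np

    def switching_estimator(
        x_obs: np.ndarray,                            # the observation X^eps (discretized)
        f_hat: Callable[[np.ndarray], np.ndarray],    # a.e. estimator on Sigma
        f_hat_0: Callable[[np.ndarray], np.ndarray],  # a.e. estimator on Sigma_0
        rho_hat: Callable[[np.ndarray], float],       # optimal r.n.f., acting as a test of H0
        phi_alpha: float,                             # its second value phi_eps^alpha(Sigma)
    ) -> np.ndarray:
        """Return f_hat_0 if the r.n.f. took its small value (H0 accepted),
        otherwise fall back to the rate-optimal estimator on the whole set Sigma."""
        if rho_hat(x_obs) == phi_alpha:
            return f_hat_0(x_obs)
        return f_hat(x_obs)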

Let us now consider the relations to hypothesis testing problems. The occurrence of the event $\{\hat\rho_\varepsilon = \varphi_\varepsilon^{\alpha}(\Sigma)\}$ can be treated as the acceptance of the hypothesis $H_0: f(\cdot)\in\Sigma_0$. Then the assumption $\hat\rho_\varepsilon\in\Lambda_\varepsilon$ means that the first-type error probability is bounded by $\alpha(\cdot)$.

Proposition 3. Let $\hat\rho_\varepsilon$ be an optimal r.n.f. and let the $\alpha(\cdot)$-adaptive estimator $f_\varepsilon^{\alpha}(\cdot)$ be represented by (3.1). Suppose also that $\hat f_\varepsilon^{(0)}(\cdot)\in\Sigma_0$ for all $\varepsilon\in(0,1)$. Then for every $\delta$, $0<\delta<1$, there exists $H>0$ such that

$$\limsup_{\varepsilon\to 0}\ \sup_{f(\cdot)\in\Sigma_\varepsilon(H)} P_f^\varepsilon\big\{ \hat\rho_\varepsilon = \varphi_\varepsilon^{\alpha}(\Sigma) \big\} \le \delta,$$

where for every $H>0$

$$\Sigma_\varepsilon(H) = \Big\{ f(\cdot)\in\Sigma:\ \inf_{f_0(\cdot)\in\Sigma_0} \|f(\cdot)-f_0(\cdot)\|_p \ge H\,\varphi_\varepsilon^{\alpha}(\Sigma) \Big\}.$$

The statement of Proposition 3 means that the hypothesis $H_0$ can be tested against the family of local alternatives $H_\varepsilon: f(\cdot)\in\Sigma_\varepsilon(H)$ with prescribed first- and second-type error probabilities.

4. Examples of statistical models and of the hypotheses corresponding to them

In this section we consider three particular models: the white Gaussian noise model, the multivariate regression model and the probability density model. For each model we discuss several hypotheses $H_0$ which seem natural to investigate in the context of the problems presented in the paper. All these hypotheses possess the following property: more precise estimation procedures are available under them. We also want to mention that all mathematical results presented in the examples are valid under some additional assumptions. We will not describe and discuss them, because this is not required for what follows, and we only give references to the papers where the exact results can be found.

4.1. White Gaussian Noise Model.

Let the statistical experiment be generated by the observation $X^\varepsilon$, which is the sample path of the stochastic process $X_\varepsilon(\cdot)$ satisfying on the interval $[0,1]$ the stochastic differential equation

$$dX_\varepsilon(t) = f(t)\,dt + \varepsilon\,dw(t),$$

where $w(t)$ is a standard Wiener process. Thus $X^\varepsilon = (X_\varepsilon(t),\ 0\le t\le 1)$. Recall that $\varepsilon$ is the small parameter, and the case $\varepsilon\to 0$ is of interest to us. Let $\beta>0$, $Q>0$ be some given constants, and let $\beta = m+\kappa$, where $m\ge 0$ is an integer and $0<\kappa\le 1$. Let $H(\beta,Q)$ be the Hölder space, i.e. the set of $m$-times continuously differentiable functions $f(\cdot)$ whose $m$-th derivative satisfies on $[0,1]$ the Hölder condition with exponent $\kappa$ and constant $Q$, i.e.

$$|f^{(m)}(t_1) - f^{(m)}(t_2)| \le Q\,|t_1-t_2|^{\kappa} \quad \forall\, t_1,t_2\in[0,1].$$

Here $f^{(m)}(\cdot)$ denotes the $m$-th derivative of $f(\cdot)$. In this model we consider the univariate case, hence $s=1$, and $\|\cdot\|_p$, $1\le p<\infty$, and $\|\cdot\|_\infty$ are the usual $L_p$-norm and $C$-norm respectively, taken on $[0,1]$.

Let $\Sigma = H(\beta,Q)\cap\Sigma_1(L)$ for some given $\beta$, $Q$ and $L$. It is well known (Ibragimov and Khasminskii (1980)) that

$$\varphi_\varepsilon(\Sigma) = \begin{cases} \varepsilon^{\frac{2\beta}{2\beta+1}}, & 1\le p<\infty, \\ \Big(\varepsilon\sqrt{\ln\tfrac{1}{\varepsilon}}\Big)^{\frac{2\beta}{2\beta+1}}, & p=\infty. \end{cases}$$

This rate is attained by linear estimators, e.g. by a kernel estimator with properly chosen kernel and bandwidth. Other classes of smooth functions, such as Sobolev and Besov ones, can be used for the description of the set $\Sigma$ as well. Now let us consider some possible hypotheses.
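As a numerical illustration of the rate $\varepsilon^{2\beta/(2\beta+1)}$, here is a minimal simulation sketch. It discretizes the white-noise model on a grid of $n$ points (so the increments become Gaussian observations with drift $f(t_i)/n$) and applies a kernel smoother with the rate-optimal bandwidth $h\asymp\varepsilon^{2/(2\beta+1)}$. The test function, the kernel and all constants are our choices for the demonstration, not taken from the paper.

    import numpy as np

    # Discretized white Gaussian noise model dX(t) = f(t) dt + eps dw(t) on [0,1]:
    # on a grid of n points the increments are f(t_i)/n + eps * N(0, 1/n).

    def simulate_increments(f, eps, n, rng):
        t = (np.arange(n) + 0.5) / n
        return t, f(t) / n + eps * rng.normal(0.0, np.sqrt(1.0 / n), n)

    def kernel_estimate(t_grid, increments, h):
        # Local-average smoother with an Epanechnikov kernel; dividing by the
        # (discretized) kernel mass turns sums of increments into estimates of f.
        n = len(t_grid)
        est = np.empty(n)
        for i, t0 in enumerate(t_grid):
            u = (t_grid - t0) / h
            w = np.maximum(0.75 * (1.0 - u * u), 0.0)  # Epanechnikov kernel
            est[i] = np.sum(w * increments) / (np.sum(w) / n)
        return est

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        beta = 1.0                                  # m = 0, kappa = 1 (Lipschitz functions)
        f = lambda t: np.sin(2.0 * np.pi * t)       # our test function
        for eps in [0.1, 0.05, 0.025, 0.0125]:
            h = eps ** (2.0 / (2.0 * beta + 1.0))   # rate-optimal bandwidth h ~ eps^(2/(2b+1))
            t, dX = simulate_increments(f, eps, n=4000, rng=rng)
            err = np.sqrt(np.mean((kernel_estimate(t, dX, h) - f(t)) ** 2))
            rate = eps ** (2.0 * beta / (2.0 * beta + 1.0))
            print(f"eps={eps:7.4f}  empirical L2 error={err:.4f}  eps^(2b/(2b+1))={rate:.4f}")

Halving $\varepsilon$ should shrink the empirical $L_2$ error by roughly the factor $2^{-2\beta/(2\beta+1)}$, which the printed column makes easy to check.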

4.1.1. Hypothesis on parametric subfamily. Consider the following hypothesis:

$$H_0:\ f(\cdot)\in\Sigma_0 = \big\{ f(t) = f_0(t,\theta)\ \forall t\in[0,1],\ \theta\in\Theta\subseteq\mathbb{R}^s \big\},$$

where the function $f_0(\cdot,\cdot)$, the set $\Theta$ and the integer $s\ge 1$ are given. Under some regularity assumptions (Ibragimov and Khasminskii (1981, Ch. 2)) the MRC on the set $\Sigma_0$ is $\varphi_\varepsilon(\Sigma_0) = \varepsilon$, and an a.e. estimator can be constructed as $\hat f_\varepsilon^{(0)}(\cdot) = f_0(\cdot,\hat\theta_\varepsilon)$, where $\hat\theta_\varepsilon$ is the maximum likelihood estimator. Note that $\varepsilon = \varphi_\varepsilon(\Sigma_0) \ll \varphi_\varepsilon(\Sigma) = \varepsilon^{2\beta/(2\beta+1)}$ ($p<\infty$) or $(\varepsilon\sqrt{\ln(1/\varepsilon)})^{2\beta/(2\beta+1)}$ ($p=\infty$) for all $\beta\in(0,\infty)$. One of the classical examples of a parametric subfamily is yielded by polynomial regression, i.e.

$$f_0(t,\theta) = \sum_{i=1}^{s} \theta_i\, t^{\,i-1}, \qquad (\theta_1,\dots,\theta_s)\in\Theta\subseteq\mathbb{R}^s.$$

4.1.2. Hypothesis on smoothness. Let

$$H_0:\ \Sigma_0 = H(\gamma,P)\cap\Sigma,$$

where $\gamma>\beta$ and $H(\gamma,P)$, $P>0$, is another Hölder space. In this case the set $\Sigma_0$ consists of functions which are smoother than the functions belonging to the set $\Sigma$. The MRC $\varphi_\varepsilon(\Sigma_0)$ is given by the formula

$$\varphi_\varepsilon(\Sigma_0) = \begin{cases} \varepsilon^{\frac{2\gamma}{2\gamma+1}}, & 1\le p<\infty, \\ \Big(\varepsilon\sqrt{\ln\tfrac{1}{\varepsilon}}\Big)^{\frac{2\gamma}{2\gamma+1}}, & p=\infty, \end{cases}$$

and we see that again $\varphi_\varepsilon(\Sigma_0) \ll \varphi_\varepsilon(\Sigma)$.

For the sets $\Sigma_0$ described in Examples 4.1.1 and 4.1.2, the construction of adaptive estimators is a simple application of the results obtained in Lepskii (1991), for all $1\le p\le\infty$, $0<\beta<\gamma<\infty$ and $q>0$.

In Section 5, for the hypotheses presented in Examples 4.1.1 and 4.1.2, we find an optimal r.n.f. and construct an $\alpha(\cdot)$-adaptive estimator.

4.2. Multivariate regression.

Let the statistical experiment be generated by the observation $X^n$ (here and in the next example $1/n$ plays the role of $\varepsilon$) obtained in the multivariate regression model, i.e. $X^n = \{(y_1,Z_1),\dots,(y_n,Z_n)\}$, where

$$y_i = f(Z_i) + \xi_i, \qquad i=1,\dots,n.$$

Here $Z_i = \big(z_i^{(1)},\dots,z_i^{(s)}\big)$, $i=1,\dots,n$, are i.i.d. random vectors with common probability density $p(\cdot)$ defined on the unit cube $[0,1]^s$, and $\xi_i$, $i=1,\dots,n$, are i.i.d. random variables with

$$E\xi_1 = 0, \qquad E\xi_1^2 = \sigma^2 < \infty.$$

Let $H_s(\beta,Q)$, $\beta = m+\kappa$, $Q>0$, be the isotropic Hölder space on the unit cube $[0,1]^s$. There are several equivalent definitions of isotropic (anisotropic) Hölder (or Sobolev and Besov) spaces, see, e.g., Nikolskii (1975). We will use the following one. Fix some $i\in\{1,2,\dots,s\}$ and denote $f_i^{(m)}(Z) = \partial^m f(Z)/\partial z_i^m$. Put $Z^{(l)} = \big(z_1^{(l)},\dots,z_i^{(l)},\dots,z_s^{(l)}\big)$, $l=1,2$, where the two points differ only in the $i$-th coordinate. We say that a function $f(\cdot)$ belongs to the Hölder space $H_s(\beta,Q)$ if

$$\sup_{(z_1,\dots,z_{i-1},z_{i+1},\dots,z_s)\in[0,1]^{s-1}} \Big| f_i^{(m)}\big(Z^{(1)}\big) - f_i^{(m)}\big(Z^{(2)}\big) \Big| \le Q\,\big|z_i^{(1)} - z_i^{(2)}\big|^{\kappa}$$

for every $i=1,\dots,s$ and all $z_i^{(1)}, z_i^{(2)}\in[0,1]$.

Roughly speaking, for each fixed $(z_1,\dots,z_{i-1},z_{i+1},\dots,z_s)\in[0,1]^{s-1}$ the functions $g_i f(z) = f(z_1,\dots,z_{i-1},z,z_{i+1},\dots,z_s)$ belong, as functions of $z$, to the Hölder space $H(\beta,Q)$ on the interval $[0,1]$ for every $i=1,\dots,s$. In other words, any function $f(\cdot)\in H_s(\beta,Q)$ has one and the same smoothness $\beta = m+\kappa$ in each direction.

Let $\Sigma = H_s(\beta,Q)\cap\Sigma_s(L)$ for some given $\beta$, $Q$, $L$ and $s$. Put also $p=2$, i.e. we will consider the losses given by the $L_2$-norm on the unit cube $[0,1]^s$. The MRC is given by the formula $\varphi_n(\Sigma) = n^{-\frac{\beta}{2\beta+s}}$, which was obtained by Nussbaum (1986).

Now let us describe some possible hypotheses. Certainly, the hypothesis on a parametric subfamily and the hypothesis on smoothness are of interest in this case as well. However, as it seems to us, hypotheses which could be called "hypotheses on structure" are more important in the consideration of multidimensional statistical models.

4.2.1. Dimensionality hypothesis. This hypothesis consists in the assumption that the regression function actually depends on $t<s$ significant variables. Thus, formally,

$$H_0:\ \exists\ 1\le t<s,\ i_1,\dots,i_t,\ F:\mathbb{R}^t\to\mathbb{R}^1:\quad f(z_1,\dots,z_s) = F\big(z_{i_1},\dots,z_{i_t}\big).$$

Evidently, the implication $f(\cdot)\in H_s(\beta,Q) \Longrightarrow F(\cdot)\in H_t(\beta,Q)$ takes place, and hence $\varphi_n(\Sigma_0) = n^{-\frac{\beta}{2\beta+t}}$, so that $\varphi_n(\Sigma_0) \ll \varphi_n(\Sigma)$ for every $1\le t<s$.

4.2.2. Hypothesis on additive structure. This hypothesis consists in the assumption that the multivariate regression function can be represented as a sum of univariate functions:

$$H_0:\ \exists\ f_k:\mathbb{R}^1\to\mathbb{R}^1,\ k=1,\dots,s:\quad f(z_1,\dots,z_s) = \sum_{k=1}^{s} f_k(z_k).$$

A lot of papers are devoted to the estimation problem under the additivity hypothesis. The reason is that the MRC under this hypothesis coincides with the MRC for the univariate case, i.e.

$$\varphi_n(\Sigma_0) = n^{-\frac{\beta}{2\beta+1}} \ll \varphi_n(\Sigma).$$

Apparently, the first paper where this result was obtained is Stone (1985). We also mention the paper of Linton and Nielsen (1995), where the same result was proved under rather mild assumptions.

It is also reasonable to combine the hypothesis on additive structure with the hypothesis on smoothness:

$$\tilde H_0:\ \exists\ f_k(\cdot)\in H(\gamma,P),\ k=1,\dots,s,\ \gamma>\beta:\quad f(z_1,\dots,z_s) = \sum_{k=1}^{s} f_k(z_k).$$

For the same reasons, under this hypothesis

$$\varphi_n(\Sigma_0) = n^{-\frac{\gamma}{2\gamma+1}}.$$

4.2.3. Hypothesis on single-index structure. This hypothesis consists in the assumption that there exists some direction along which the multivariate function is a univariate one:

$$H_0:\ \exists\,\theta\in\mathbb{R}^s,\ \|\theta\|=1,\ \text{and}\ F:\mathbb{R}^1\to\mathbb{R}^1:\quad f(z_1,\dots,z_s) = F\Big(\sum_{k=1}^{s}\theta_k z_k\Big).$$

As follows, e.g., from Speckman (1988), the MRC under this hypothesis is the same as in the univariate case,

$$\varphi_n(\Sigma_0) = n^{-\frac{\beta}{2\beta+1}},$$

and we see that $\varphi_n(\Sigma_0) \ll \varphi_n(\Sigma)$.

It is also possible to combine the hypothesis on single-index structure with the hypothesis on a parametric subfamily. To do this, it is enough to suppose that the function $F(\cdot)$ is known. For example, putting $F(x) = x$ for all $x\in\mathbb{R}^1$, we arrive at the "linearity" hypothesis

$$\tilde H_0:\ f(\cdot)\in\Sigma_0 = \big\{ f(\cdot):\ f(z) = \theta_1 z_1 + \dots + \theta_s z_s \big\}.$$

It is clear that under this hypothesis

$$\varphi_n(\Sigma_0) = n^{-\frac{1}{2}}.$$

4.3. Probability density estimation.

Let the statistical experiment be generated by the observation $X^n = (X_1,\dots,X_n)$, where $X_i = \big(X_i^{(1)},\dots,X_i^{(s)}\big)$, $i=1,\dots,n$, are i.i.d. random vectors with common probability density $f(x_1,\dots,x_s)$. Let $f(\cdot)\in\Sigma$ where, as in Example 4.2, $\Sigma = H_s(\beta,Q)\cap\Sigma_s(L)$ for some fixed $\beta$, $Q$, $L$ and $s$. Let again $p=2$. Then (Nussbaum (1986)) the MRC on the set $\Sigma$ is

$$\varphi_n(\Sigma) = n^{-\frac{\beta}{2\beta+s}}.$$

4.3.1. Hypothesis of independence. This hypothesis is classical in the theory of hypothesis testing. It consists in the assumption that for every $i=1,\dots,n$ the components of the vector $X_i = \big(X_i^{(1)},\dots,X_i^{(s)}\big)$ are independent random variables:

$$H_0:\ \exists\ f_k(\cdot):\mathbb{R}^1\to\mathbb{R}^1,\ \int_{-\infty}^{\infty} f_k(x)\,dx = 1,\ k=1,\dots,s:\quad f(x_1,\dots,x_s) = \prod_{k=1}^{s} f_k(x_k).$$

Under this hypothesis, for every $k=1,\dots,s$ each univariate density $f_k(\cdot)$ can be estimated separately, using only the corresponding observations $X_1^{(k)}, X_2^{(k)},\dots,X_n^{(k)}$. Let $\hat f_k(\cdot) = \hat f_k\big(\cdot;\,X_1^{(k)},\dots,X_n^{(k)}\big)$ be, for example, a kernel estimator providing the univariate MRC $n^{-\beta/(2\beta+1)}$. Under the hypothesis $H_0$ the estimators $\hat f_k(\cdot)$, $k=1,\dots,s$, are independent, and therefore the estimator $\hat f^{(0)}(\cdot) = \prod_{k=1}^{s}\hat f_k(\cdot)$ provides the univariate MRC of estimation for $f(x_1,\dots,x_s) = \prod_{k=1}^{s} f_k(x_k)$. Thus,

$$\varphi_n(\Sigma_0) = n^{-\frac{\beta}{2\beta+1}}.$$
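A short sketch of the product construction $\hat f^{(0)}(\cdot) = \prod_k \hat f_k(\cdot)$, with a Gaussian kernel density estimator standing in for the univariate $\hat f_k$ and an off-the-shelf bandwidth; all concrete choices here (test distribution, bandwidth, grid) are ours, for illustration only.

    import numpy as np

    def kde_1d(samples, x, h):
        """Univariate Gaussian kernel density estimate evaluated at the points x."""
        u = (x[:, None] - samples[None, :]) / h
        return np.exp(-0.5 * u * u).sum(axis=1) / (len(samples) * h * np.sqrt(2.0 * np.pi))

    def product_density_estimate(X, grid, h):
        """Estimator f^(0) under H0: the product of the s univariate marginal KDEs.

        X is the (n, s) sample; grid is an (m, s) array of evaluation points."""
        est = np.ones(grid.shape[0])
        for k in range(X.shape[1]):
            est *= kde_1d(X[:, k], grid[:, k], h)  # k-th factor uses only X^(k)
        return est

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        n, s = 2000, 2
        X = rng.beta(2.0, 3.0, size=(n, s))        # independent components, so H0 holds
        grid = rng.uniform(0.1, 0.9, size=(5, s))
        h = n ** (-0.2)                            # textbook univariate bandwidth rate (our choice)
        true = np.prod(12.0 * grid * (1.0 - grid) ** 2, axis=1)  # product of Beta(2,3) densities
        print(product_density_estimate(X, grid, h))
        print(true)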

As regards the problem of adaptive estimation for multivariate statistical models, much less is known in comparison with the univariate case. In the case of isotropic spaces, apparently the general theory developed in Lepski (1991, 1992a) could be applied. In the case of anisotropic spaces we know only the recent papers of Neumann (1995) and Birgé and Massart (1995), concerned with adaptive estimation over scales of anisotropic Besov spaces and of anisotropic Hölder spaces respectively.

The consideration of minimax risks with r.n.f. for multidimensional models is the subject of a series of separate papers. Here we would only like to present one conjecture in this direction.

Conjecture 1. Let $\Sigma = H_s(\beta,Q)\cap\Sigma_s(L)$ and let us believe the hypothesis on additive structure or the hypothesis on single-index structure. Then the function $\varphi_n^{\alpha}(\Sigma)$, being the second value of the optimal r.n.f. and understood as the "improvement of the accuracy of estimation", is given by the following formula:

$$\varphi_n^{\alpha}(\Sigma) = \left( \frac{n}{\ln\frac{1}{\alpha_n}} \right)^{-\frac{2\beta}{4\beta+s}}.$$

As we see, if our conjecture is true, then the "improvement" always exists, and for dimension $s=2$ this "improvement" differs from the MRC under the hypothesis only by the factor $\big(\ln\frac{1}{\alpha_n}\big)^{\frac{2\beta}{4\beta+2}}$. In particular, if $\alpha_n$ does not tend to zero, then for $s=2$ the function $\varphi_n^{\alpha}(\Sigma)$ coincides in order with the MRC on the hypothesis set, and, due to Proposition 1, this is the best possible improvement of the accuracy of estimation.
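The factor quoted for $s=2$ is a direct computation: since $\frac{2\beta}{4\beta+2} = \frac{\beta}{2\beta+1}$,

$$\left( \frac{n}{\ln\frac{1}{\alpha_n}} \right)^{-\frac{2\beta}{4\beta+2}} = n^{-\frac{\beta}{2\beta+1}} \cdot \left( \ln\frac{1}{\alpha_n} \right)^{\frac{2\beta}{4\beta+2}},$$

and $n^{-\beta/(2\beta+1)}$ is exactly the MRC $\varphi_n(\Sigma_0)$ under the additive or single-index hypothesis; for bounded $\ln(1/\alpha_n)$ the two coincide in order.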

5. White Gaussian Noise Model

Suppose we observe the sample path of the stochastic process $X_\varepsilon(\cdot)$ satisfying on the interval $[0,1]$ the stochastic differential equation

$$dX_\varepsilon(t) = f(t)\,dt + \varepsilon\,dw(t), \quad (5.1)$$

where $w(\cdot)$ is a standard Wiener process and $\varepsilon\to 0$ is the small parameter.

Let $\beta>0$, $Q>0$ be some given constants, and let $\beta = m+\kappa$, where $m\ge 0$ is an integer and $0<\kappa\le 1$. Here and later we suppose that the function $f(\cdot)$ driving the equation (5.1) belongs to the set $\Sigma = H(\beta,Q)\cap\Sigma_1(L)$.

5.1. Hypothesis on parametric subfamily.

Let us suppose that we believe the following hypothesis:

$$H_0:\ f(\cdot)\in\Sigma_0 = \big\{ f(\cdot)\in\Sigma:\ f(t) = f_0(t,\theta)\ \forall t\in[0,1],\ \theta\in\Theta\subseteq\mathbb{R}^s \big\},$$

where the function $f_0(\cdot,\cdot)$, the set $\Theta$ and $s\ge 1$ are given.

For each integer $l\ge 1$ and each vector $z\in\mathbb{R}^l$ denote $\|z\| = \big(\sum_{i=1}^{l} z_i^2\big)^{1/2}$; we will omit the dependence on $l$ in the notation $\|\cdot\|$. We suppose that the following assumptions are fulfilled.

A1. The set $\Theta$ is a bounded, closed subset of $\mathbb{R}^s$, $s\ge 1$.

A2. There exist $\nu>0$ and $L_0>0$ such that

$$\sup_{t\in[0,1]} |f_0(t,\theta_1) - f_0(t,\theta_2)| \le L_0\,\|\theta_1-\theta_2\|^{\nu} \quad \forall\,\theta_1,\theta_2\in\Theta.$$

A3. $f_0(\cdot,\theta)\in H(\beta,Q)\cap\Sigma_1(L)$ for every $\theta\in\Theta$.

A4. There exist $\theta_0\in\Theta$ and $Q_1<Q$ such that $f_0(\cdot,\theta_0)\in H(\beta,Q_1)$.

Put $\varphi_\varepsilon(\Sigma) = \varepsilon^{\frac{2\beta}{2\beta+1}}$ and let $\|f(\cdot)\|_2 = \big(\int_0^1 f^2(t)\,dt\big)^{1/2}$ be the $L_2$-norm on the interval $[0,1]$. Recall that $\varphi_\varepsilon(\Sigma)$ is the MRC on the set $H(\beta,Q)$, in particular when the minimax risk is described by $L_2$-losses.

Now let us introduce the minimax risk with r.n.f. corresponding to the estimation problem on the set $\Sigma$ and to the hypothesis $H_0$.

Fix some $0<\alpha<1$ and let $0<\alpha_\varepsilon\le 1-\alpha$, $\varepsilon\in(0,1]$, be some given function. Denote by $\Lambda_\varepsilon$ the family of random variables measurable w.r.t. $(X_\varepsilon(t),\ 0\le t\le 1)$, taking the two values $\varphi_\varepsilon(\Sigma)$ and $a_\varepsilon$, where $0<a_\varepsilon<\varphi_\varepsilon(\Sigma)$, and satisfying the following inequality:

$$\limsup_{\varepsilon\to 0}\ \alpha_\varepsilon^{-1} \sup_{\theta\in\Theta} P_\theta\big\{ \rho_\varepsilon = \varphi_\varepsilon(\Sigma) \big\} \le 1. \quad (5.2)$$

Here $P_\theta$ denotes the measure generated by the process $X_\varepsilon(t)$, $0\le t\le 1$, when the function $f(\cdot)$ in (5.1) belongs to the set $\Sigma_0$, i.e. $f(\cdot) = f_0(\cdot,\theta)$ for some $\theta\in\Theta$. For every $\rho_\varepsilon\in\Lambda_\varepsilon$ and for an arbitrary estimator $\tilde f_\varepsilon(\cdot)$ consider the risk

$$R_\varepsilon^{(r)}\big(\tilde f_\varepsilon,\rho_\varepsilon\big) = \sup_{f(\cdot)\in\Sigma} E_f^\varepsilon\big\{ \rho_\varepsilon^{-1}\,\|\tilde f_\varepsilon(\cdot)-f(\cdot)\|_2 \big\}^q, \quad (5.3)$$

where $q>0$ is some fixed constant.

Now let us construct the r.n.f. $\hat\rho_\varepsilon\in\Lambda_\varepsilon$ which is optimal in accordance with Definition 1, and the $\alpha(\cdot)$-adaptive estimator $f_\varepsilon^{\alpha}(\cdot)$.

Let $\Psi = \Psi_m = \{\gamma_{ij}\}_{i,j=1}^{m+1}$ be the matrix with the elements $\gamma_{ij} = (i+j-1)^{-1}$. For an arbitrary integer $N$ and $k=1,\dots,N$ put $t_k = t_k(N) = k/N$ and $\Delta_k = [t_{k-1}, t_k)$. Also, for every $k=1,\dots,N$ introduce the vectors $(\mathbf{x}^{(k)})' = \big(x_1^{(k)},\dots,x_{m+1}^{(k)}\big)$ and $(\mathbf{y}^{(k)})' = \big(y_1^{(k)},\dots,y_{m+1}^{(k)}\big)$ as

$$x_j^{(k)} = N^j \int_{\Delta_k} (t - t_{k-1})^{j-1}\, dX_\varepsilon(t), \quad j=1,\dots,m+1, \qquad \mathbf{y}^{(k)} = \Psi^{-\frac{1}{2}}\,\mathbf{x}^{(k)}.$$

Here the sign $'$ means transposition. The vectors $\mathbf{y}^{(k)}$, $k=1,\dots,N$, are well defined, since the matrix $\Psi$ is strictly positive definite for every $m\ge 1$. For every $k=1,\dots,N$ and every $t\in\Delta_k$ put

$$\big(\mathbf{d}^{(k)}(t)\big)' = \Big( 1,\ N(t-t_{k-1}),\ \dots,\ \big(N(t-t_{k-1})\big)^m \Big).$$

Denote for every $t\in[0,1]$

$$f_N(t) = \sum_{k=1}^{N} \big(\mathbf{x}^{(k)}\big)'\, \Psi^{-1}\, \mathbf{d}^{(k)}(t)\, I\{t\in\Delta_k\}, \quad (5.4)$$

or, in equivalent form,

$$f_N(t) = \sum_{k=1}^{N} \big(\mathbf{y}^{(k)}\big)'\, \Psi^{-\frac{1}{2}}\, \mathbf{d}^{(k)}(t)\, I\{t\in\Delta_k\}. \quad (5.5)$$
