Two adaptive rates of convergence in pointwise density estimation
Cristina BUTUCEA
Humboldt-Universität zu Berlin, SFB 373, Spandauer Strasse 1, D-10117 Berlin, Germany
Université Paris VI, UPRES-A 7055 CNRS, 4, Place Jussieu, F-75005 Paris, France
(E-mail: butucea@wiwi.hu-berlin.de, butucea@ccr.jussieu.fr)
Abstract
We consider density pointwise estimation and look for the best attainable asymptotic rates of convergence. The problem is adaptive, which means that the regularity parameter $\alpha$, describing the class of densities, varies in a set $B$. We shall consider, successively, two classes of densities, issued from a generalization of $L_2$ Sobolev classes: $W_\alpha(p,L)$ and $M_\alpha(p,L)$.
Keywords: nonparametric density estimation, adaptive rates, Sobolev classes
1 Introduction
1.1 Adaptivity
We want to estimate the common probability density $f : \mathbb{R} \to [0,+\infty)$ of $n$ independent, identically distributed random variables $X_1,\dots,X_n$, at a real point $x_0$. We assume that $f$ belongs to a large nonparametric class of functions, $H_\alpha = H_\alpha(p,L)$, characterized by its smoothness (e.g., order of derivability) $\alpha$, a norm $L_p$, $p>1$, and a radius $L>0$.

For any estimator $\hat f_n$ of $f$, fixed real $x_0$ and $q>1$, we consider a sequence $\varphi_n$ of positive numbers and define the maximal risk over the class $H_\alpha$:

$$ R_n(\hat f_n, \varphi_n, H_\alpha) = \sup_{f \in H_\alpha} \varphi_n^{-q}\, E_f\Big[ \big| \hat f_n(x_0) - f(x_0) \big|^q \Big], \qquad (1.1) $$

where $E_f[\cdot]$ is the expectation with respect to the distribution $P_f$ of $X_1,\dots,X_n$ when the underlying probability density is $f$.
We say that $\varphi_n$ is an optimal rate of convergence over the class $H_\alpha = H_\alpha(p,L)$ if the maximal risk over this class stays asymptotically positive for all possible estimators, and if there is an estimator whose maximal risk stays asymptotically finite. Minimax theory is concerned with finding the estimators attaining the optimal rates, which are given by minimizing the maximal risk over all estimators.

We are interested in adaptive estimation, which means that the regularity parameter $\alpha$ is supposed unknown within a given set. An estimator $\hat f_n$ is called optimal rate adaptive if, for the optimal rate of convergence $\varphi_n^\alpha$ over each class and a constant $C>0$, we have

$$ \limsup_{n\to\infty}\ \sup_{\alpha \in B} R_n(\hat f_n, \varphi_n^\alpha, H_\alpha) \le C < \infty, \qquad (1.2) $$

where $B$ is a non-empty set of values of $\alpha$.

We shall prove here that, over two different classes of probability density functions, to be defined below and commonly denoted by $H_\alpha = H_\alpha(p,L)$, we can find no optimal rate adaptive estimator. Similar results were obtained by Lepskii [20], Brown and Low [3] (on Hölder classes of functions) and Tsybakov [27] (on $L_2$ Sobolev classes), for the Gaussian white noise model. They are characteristic of pointwise (not global) estimation. We shall then introduce the definition of adaptive rate of convergence, which is a modification by Tsybakov [27] of the definition of Lepskii [20] (see also Lepskii [21] and [22]). We also compute the adaptive rate over the same classes of functions as well as the corresponding rate adaptive estimators.

More precisely, let us define the considered classes of densities. At first, we define for integer $\alpha \ge 1$, $L>0$ and $1<p<\infty$ the class of functions in $L_p$

$$ W_\alpha(p,L) = \Big\{ f : \mathbb{R} \to [0,+\infty) \ : \ \int_{\mathbb{R}} f = 1,\ \ \int_{\mathbb{R}} \big| f^{(\alpha)}(x) \big|^p\, dx \le L^p \Big\}, $$

where $f^{(\alpha)}$, the derivative of order $\alpha$ of $f$, is supposed to exist.

Secondly, let us introduce, for any absolutely integrable function $f : \mathbb{R} \to [0,+\infty)$, its Fourier transform $F(f)(x) = \int_{\mathbb{R}} f(y)\, e^{-ixy}\, dy$, for any $x$ in $\mathbb{R}$. We define now, for real $\alpha > 1 - 1/p$ and $2 \le p < \infty$, the class of absolutely integrable functions whose Fourier transforms belong to $L_p$ and

$$ M_\alpha(p,L) = \Big\{ f : \mathbb{R} \to [0,+\infty) \ : \ \int_{\mathbb{R}} f = 1,\ \ \int_{\mathbb{R}} \big| F(f)(x) \big|^p |x|^{\alpha p}\, dx \le L^p \Big\}. $$
From the results on optimal recovery of Donoho and Low [9], it is straightforward to obtain the optimal pointwise rates of convergence over these classes:

$$ \varphi_n^\alpha\big(W_\alpha(p,L)\big) = \Big(\frac 1n\Big)^{\frac{\alpha-1/p}{2(\alpha-1/p)+1}} \qquad\text{and}\qquad \varphi_n^\alpha\big(M_\alpha(p,L)\big) = \Big(\frac 1n\Big)^{\frac{\alpha+1/p-1}{2(\alpha+1/p)-1}}. \qquad (1.3) $$

In this paper, we prove that no optimal adaptive estimator can be found and we look for the adaptive rates of convergence on the previously defined classes $W_\alpha(p,L)$ and $M_\alpha(p,L)$, for $\alpha$ belonging to a set $B_{N_n}$ to be defined for each class. We prove that the adaptive rates of convergence are within a factor $\log n$ slower than the optimal rates.
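To fix ideas, here is one concrete evaluation of (1.3); the parameter values $\alpha = 2$, $p = 2$ are illustrative choices of ours, not taken from the paper. For the Sobolev class $W_2(2,L)$,

$$ \varphi_n^2\big(W_2(2,L)\big) = \Big(\frac 1n\Big)^{\frac{2 - 1/2}{2(2 - 1/2) + 1}} = n^{-3/8}, $$

and the class $M_2(2,L)$ gives the same exponent, $\frac{2 + 1/2 - 1}{2(2 + 1/2) - 1} = \frac 38$, in accordance with Remark 1.1 below (here $p = p' = 2$).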
Remark 1.1. If $p'$ is the conjugate of $p$ (i.e. $1/p + 1/p' = 1$), then the optimal rates of convergence (1.3) coincide, for integer $\alpha$, on the classes $W_\alpha(p,L)$ and $M_\alpha(p',L)$ (as do the adaptive rates (2.5) below). Moreover, by a result of Stein and Weiss [24] we have, for integer $\alpha \ge 1$ and $1 < p \le 2$,

$$ M_\alpha(p',L) \subseteq W_\alpha(p,L). $$

Thus, parts of our results on one scale of classes can be deduced from the results on the other scale, for certain values of the parameters. Nevertheless, our setups are considerably larger and the classes $W_\alpha$ and $M_\alpha$ are not comparable except in the particular case described above. For these reasons, we prefer the above notation and give independent proofs for both setups.

1.2 Previous results
The asymptotic study of minimax risks of estimators in the nonparametric framework has developed considerably since the first results of Stone [25] and [26], Bretagnolle and Huber [2], Ibragimov and Hasminskii [15] and [16]. Besides the density model, nonparametric regression and Gaussian white noise models were studied. Estimation was done over Hölder, Sobolev or Besov classes. For an overview of the results in this area see Korostelev and Tsybakov [19] and Härdle, Kerkyacharian, Picard and Tsybakov [14].
Almost optimal rates of convergence in density pointwise estimation over $L_p$ Sobolev classes $W_\alpha(p,L)$ were obtained by Wahba [28], where techniques of Farrell [11] for the proof of the lower bounds yielded a rate of $(1/n)^{\frac{\alpha-1/s}{2(\alpha-1/s)+1}}$, for $s = p + \varepsilon$, $\varepsilon > 0$ arbitrarily small. Note that the optimal rate for $W_\alpha(p,L)$, as given in (1.3), is obtained from this expression with $\varepsilon = 0$.

Techniques of optimal recovery of Donoho [5], Donoho and Liu [8], Donoho and Low [9] allow one to compute optimal rates of convergence for different risks, in different setups. In these papers the classes $M_\alpha(p,L)$ and the corresponding rate in (1.3) first appear.

Lepskii [20] and Brown and Low [3] showed that for pointwise estimation on Hölder classes optimal rate adaptive estimators cannot be found, both in the Gaussian white noise and density models. In the Gaussian white noise model, Lepskii [20] first considered the problem of finding the adaptive rates. He showed that a loss of logarithmic order is unavoidable and introduced a procedure providing the adaptive estimator. For a detailed overview of adaptive rates of convergence we refer to Donoho, Johnstone, Kerkyacharian and Picard [6] and Härdle, Kerkyacharian, Picard and Tsybakov [14], who give adaptive rates over Besov classes using the wavelet thresholding procedure. Lepski, Mammen and Spokoiny [23], Goldenshluger and Nemirovski [12] and Juditsky [17] also gave adaptive rates of convergence using Lepski's scheme of adaptation. Most of these results are obtained for the Gaussian white noise model.
In density estimation, wavelet techniques were used in the minimax adaptive setup for Besov classes and $L_p$ risk by Donoho, Johnstone, Kerkyacharian and Picard [7], Kerkyacharian, Picard and Tribouley [18] and Juditsky [17]. Sharp results, where the asymptotic value of the maximal risk was found explicitly, were obtained over $L_2$ Sobolev classes in $L_2$ risk by Efromovich [10] and Golubev [13], and pointwise in Butucea [4].

In this paper, we are interested in adaptive rates in pointwise density estimation over the $L_p$ Sobolev-type classes $W_\alpha(p,L)$ and $M_\alpha(p,L)$.

2 Results
We consider the adaptive density estimation problem, at a fixed real point $x_0$, over the classes $H_\alpha = H_\alpha(p,L)$, when $\alpha$ belongs to the discrete set $B_{N_n} = \{\alpha_1,\dots,\alpha_{N_n}\}$.

Assumption (A). The set $B_{N_n}$ is such that $\alpha_1 < \dots < \alpha_{N_n} < \infty$, for a non-decreasing sequence of positive integers $N_n$. From now on, we shall consider two setups. When $H_\alpha = W_\alpha(p,L)$, the set $B_{N_n}$ contains positive integer values of $\alpha$ ($\alpha_1 \ge 1$) and $p > 1$, while $H_\alpha = M_\alpha(p,L)$ implies that $\alpha$ can take real values ($\alpha_1 > 1 - 1/p$) and $p \ge 2$. Moreover, we suppose that $\lim_{n\to\infty} \alpha_{N_n} = \infty$ and, if $\Delta_n = \min_{i=1,\dots,N_n-1} |\alpha_{i+1} - \alpha_i|$, we assume that it satisfies

$$ \limsup_{n\to\infty} \Delta_n < +\infty \qquad (2.1) $$

together with

$$ \lim_{n\to\infty} \frac{\Delta_n \log n}{\alpha_{N_n}^2 \log\log n} = \infty. \qquad (2.2) $$

The following definition of an adaptive rate of convergence was introduced by Lepski; see Tsybakov [27]. The original definition of the adaptive rate of convergence by Lepskii [20] is not used here since it has a more special form.
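As an illustration of Assumption (A) together with (2.1) and (2.2), the short sketch below builds a grid of the kind used in the $W$-setup; the specific growth chosen for $N_n$ is ours and is only one of many admissible choices.

```python
import math

def integer_grid(n):
    """An illustrative grid B_{N_n} = {1, 2, ..., N_n} of integer smoothness values.
    With spacing Delta_n = 1, condition (2.2) asks that
    log n / (N_n^2 log log n) -> infinity; taking N_n of order (log n)^0.4 keeps
    N_n^2 log log n = o(log n).  The exponent 0.4 is an arbitrary choice
    (any exponent strictly below 1/2 would do)."""
    N_n = max(2, int(math.log(n) ** 0.4))
    return list(range(1, N_n + 1))

# For n = 10**6 this gives a very short grid, reflecting how slowly N_n may grow.
print(integer_grid(10**6))
```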
Definition 2.1. The sequence $\psi_n^\alpha$ is an adaptive rate of convergence over the scale of classes $\{H_\alpha,\ \alpha \in B_{N_n}\}$ if:

1. There exists an estimator $f_n^*$, independent of $\alpha$ over $B_{N_n}$, which is called a rate adaptive estimator, such that

$$ \limsup_{n\to\infty}\ \sup_{\alpha \in B_{N_n}} R_n(f_n^*, \psi_n^\alpha, H_\alpha) < \infty. \qquad (2.3) $$

2. If there exist another sequence of positive reals $\rho_n^\alpha$ and an estimator $f_n$ such that

$$ \limsup_{n\to\infty}\ \sup_{\alpha \in B_{N_n}} R_n(f_n, \rho_n^\alpha, H_\alpha) < \infty $$

and, at some $\alpha'$ in $B_{N_n}$, $\rho_n^{\alpha'}/\psi_n^{\alpha'} \to 0$ as $n \to \infty$, then there is another $\alpha''$ in $B_{N_n}$ such that

$$ \frac{\rho_n^{\alpha'}}{\psi_n^{\alpha'}} \cdot \frac{\rho_n^{\alpha''}}{\psi_n^{\alpha''}} \ \underset{n\to\infty}{\longrightarrow}\ +\infty. $$

Note that condition (2.3) introduces a wide class of rates. We choose between those rates by a criterion of uniformity over the set $B_{N_n}$, expressed in the second part of Definition 2.1. If some other rate satisfies a condition similar to (2.3) and if this rate is faster at some point $\alpha'$, then the loss at some other point $\alpha''$ has to be infinitely greater for large sample sizes $n$.

Remark 2.2.
If an optimal adaptive estimator exists, it is also rate adaptive. Indeed, an optimal adaptive estimator satisfies (2.3) by definition, with the optimal rate of convergence $\psi_n^\alpha = \varphi_n^\alpha$. We can easily verify that in this case condition 2 of Definition 2.1 is redundant, since such a sequence $\rho_n^\alpha$ cannot exist.

In what follows we assign to any $\alpha$ in $B_{N_n}$ the value

$$ \tilde e_\alpha = \tilde e(\alpha, H) = \begin{cases} \alpha - 1/p + 1/2, & \text{if } H = W, \\ \alpha + 1/p - 1/2, & \text{if } H = M, \end{cases} \qquad (2.4) $$

where the equalities $H = W$ and $H = M$ denote the cases when we consider the scales of classes $\{W_\alpha(p,L),\ \alpha \in B_{N_n}\}$ or $\{M_\alpha(p,L),\ \alpha \in B_{N_n}\}$, respectively. We remark that in both setups $\tilde e_\alpha > 1/2$.

Let us define $B^- = B_{N_n} \setminus \{\alpha_{N_n}\}$ and

$$ \psi_n^\alpha = \psi_n^\alpha(H) = \begin{cases} \big( \log n / n \big)^{\frac{\tilde e_\alpha - 1/2}{2\tilde e_\alpha}}, & \text{if } \alpha \in B^-, \\[4pt] \big( 1/n \big)^{\frac{\tilde e_\alpha - 1/2}{2\tilde e_\alpha}}, & \text{if } \alpha = \alpha_{N_n}. \end{cases} \qquad (2.5) $$

Then, for $\alpha \in B^-$, the rate $\psi_n^\alpha(H)$ is slower than the optimal rate of convergence (it is the optimal rate (1.3) with $n$ replaced by $n/\log n$), and it coincides with the optimal rate only at the last point $\alpha_{N_n}$. As, by our hypothesis, $\lim_{n\to\infty} \alpha_{N_n} = \infty$, this asymptotic phenomenon is not characteristic and we can use the set $B^-$ instead of $B_{N_n}$.

2.1 The adaptive procedure
Let us proceed to the construction of the estimator $f_n^*$, called the adaptive estimator. We start, for each $\alpha$ in $B_{N_n}$, with the corresponding kernel estimator

$$ f_n^\alpha(x_0) = \frac{1}{n h_n^\alpha} \sum_{i=1}^n K_\alpha\Big( \frac{X_i - x_0}{h_n^\alpha} \Big). \qquad (2.6) $$

Here the kernel $K_\alpha$ is defined in the next section (differently for each setup) and the bandwidth is, in both problems,

$$ h_n^\alpha = \Big( \frac{\log n}{n} \Big)^{\frac{1}{2\tilde e_\alpha}}, \ \text{ if } \alpha \in B^-, \qquad\text{and}\qquad h_n^{\alpha_{N_n}} = \Big( \frac 1n \Big)^{\frac{1}{2\tilde e_{N_n}}}, $$

where $\tilde e_\alpha = \tilde e(\alpha, H)$ and $\tilde e_{N_n} = \tilde e(\alpha_{N_n}, H)$ are given by (2.4). We shall evaluate the regularity of the estimated density and plug it into the kernel estimator $f_n^\alpha$ in order to obtain $f_n^*$, the adaptive estimator, in the spirit of Lepskii [20].
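The kernel estimate (2.6) at the single point $x_0$ is cheap to compute. The sketch below evaluates it for one value of $\alpha$ in the $W$-setup; the Gaussian-shaped kernel is a placeholder (the kernels $K_\alpha$ actually used are constructed in Section 3), and the function names and parameter values are ours.

```python
import numpy as np

def bandwidth(n, e_tilde, last_point=False):
    """Bandwidth of (2.6): (log n / n)^{1/(2 e~)} on B^-, (1/n)^{1/(2 e~)} at alpha_{N_n}."""
    base = 1.0 / n if last_point else np.log(n) / n
    return base ** (1.0 / (2.0 * e_tilde))

def kernel_estimate(x, x0, h, kernel):
    """Pointwise kernel density estimate (2.6) at x0 with bandwidth h."""
    return np.mean(kernel((x - x0) / h)) / h

# Illustrative use: W-setup with alpha = 2, p = 2, so e~ = alpha - 1/p + 1/2 = 2.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=5000)                                       # sample of size n
    gauss = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)    # placeholder kernel
    h = bandwidth(len(x), e_tilde=2.0)
    print(kernel_estimate(x, x0=0.0, h=h, kernel=gauss))
```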
More precisely, let $a > 0$ be a sufficiently large constant and

$$ \eta_n^\alpha = a \Big( \frac{\log n}{n} \Big)^{\frac{\tilde e_\alpha - 1/2}{2\tilde e_\alpha}}. $$

Then we define

$$ \hat\alpha = \hat\alpha(H) = \max\big\{ \alpha \in B_{N_n} \ : \ \big| f_n^\beta(x_0) - f_n^\alpha(x_0) \big| \le \eta_n^\beta \ \ \forall\, \beta < \alpha,\ \beta \in B_{N_n} \big\}. $$

In the sequel, $\tilde e_\beta$ (appearing in $\eta_n^\beta$) is defined as in (2.4). Finally,

$$ f_n^*(x_0) = f_n^{\hat\alpha}(x_0). \qquad (2.7) $$
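A compact sketch of the selection rule defining $\hat\alpha$ and $f_n^*(x_0)$ above; it assumes the pointwise estimates $f_n^\beta(x_0)$ have already been computed (for instance with the kernel-estimator sketch after (2.6)), and the constant $a = 1$ is an arbitrary illustrative choice rather than the theoretical one.

```python
import numpy as np

def lepski_select(estimates, e_tilde, n, a=1.0):
    """Lepski-type rule of Section 2.1.

    estimates : dict alpha -> f_n^alpha(x0), pointwise kernel estimates on the grid B_{N_n}
    e_tilde   : dict alpha -> e~(alpha, H) from (2.4)
    Returns (alpha_hat, f_n^*(x0)) as in (2.7).  Thresholds follow
    eta_n^beta = a * (log n / n)^{(e~_beta - 1/2)/(2 e~_beta)}."""
    grid = sorted(estimates)
    eta = {b: a * (np.log(n) / n) ** ((e_tilde[b] - 0.5) / (2.0 * e_tilde[b])) for b in grid}
    alpha_hat = grid[0]
    for alpha in grid:
        # keep alpha if every coarser beta < alpha agrees with it up to eta_n^beta
        if all(abs(estimates[beta] - estimates[alpha]) <= eta[beta]
               for beta in grid if beta < alpha):
            alpha_hat = alpha            # the largest admissible alpha is retained
    return alpha_hat, estimates[alpha_hat]

# Example with made-up numbers: three candidate smoothness values.
print(lepski_select({1: 0.40, 2: 0.41, 3: 0.55}, {1: 1.5, 2: 2.0, 3: 3.0}, n=10_000))
```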
2.2 Statement of results

Theorem 2.3. In both pointwise density estimation problems described above, no optimal rate adaptive estimator (in the sense of (1.2)) can be found over the scale of classes $\{H_\alpha(p,L),\ \alpha \in B\}$, as soon as $B$ has at least two different elements and $B \subseteq B_{N_n}$, where $B_{N_n}$ satisfies Assumption (A).

Theorem 2.4. The estimator $f_n^*(x_0)$ of $f(x_0)$ in (2.7) is a rate adaptive estimator and $\psi_n^\alpha(H)$ in (2.5) is the adaptive rate of convergence, in the sense of Definition 2.1, over the scale $\{H_\alpha(p,L),\ \alpha \in B\}$, where the set $B$ satisfies Assumption (A).

The proof is organized as follows. In Section 3 we prove that $f_n^*(x_0)$ in (2.7) satisfies, for a constant $C > 0$,

$$ \limsup_{n\to\infty}\ \sup_{\alpha \in B_{N_n}} R_n(f_n^*, \psi_n^\alpha, H_\alpha) \le C < \infty. \qquad (2.8) $$

This result will be called the upper bound. Section 4 is devoted to the proof of the lower bound

$$ \liminf_{n\to\infty}\ \inf_{f_n}\ \sup_{\alpha \in \{\beta,\gamma\}} R_n(f_n, \psi_n^\alpha, H_\alpha) \ge c > 0, $$

where $\beta$ and $\gamma$ are arbitrarily chosen elements of $B_{N_n}$ such that $\beta < \gamma$, $c > 0$, and the infimum is taken over all possible estimators $f_n$ of $f$. These relations, Theorem 2.3 (proved in Section 5) and the fact that $\psi_n^\alpha(H)$ is the adaptive rate of convergence over the set $B_{N_n}$ (see also Section 5) imply Theorem 2.4.

3 Upper bounds
We shall prove that the estimator $f_n^*$, independent of $\alpha$ in $B_{N_n}$ and defined in (2.7), is such that the upper bound (2.8) holds. Throughout this section, $C$, $c_i$ and $C_i$, $i = 1, 2, \dots$, denote positive constants, depending possibly on the fixed $q$, $\alpha_1$ and $L$.

3.1 Auxiliary results
Definition 3.1. Let the density $f$ belong to the class $H_\alpha = H_\alpha(p,L)$. Define, for any kernel estimator $f_n^\beta$ of $f$ (see (2.6)) with $\beta$, $\alpha$ in $B_{N_n}$ such that $\beta \le \alpha$, its bias term

$$ B_n^\beta = B_n^\beta(x_0, H_\alpha) = \big| E_f[f_n^\beta(x_0)] - f(x_0) \big| $$

and its stochastic term

$$ Z_n^\beta = Z_n^\beta(x_0, H_\alpha) = \big| f_n^\beta(x_0) - E_f[f_n^\beta(x_0)] \big|. $$

Besov, Il'in and Nikol'skii [1], Theorem 15.1, implies the following.

Lemma 3.2. Let $\alpha$, $\beta$ be integers with $0 \le \beta < \alpha$, let $1 \le p_0, p_1, p \le \infty$ and $\alpha > 1/p$. If there exists $\kappa \in (\beta/\alpha, 1)$ such that

$$ \frac{1}{p_0} - \beta = (1 - \kappa)\,\frac{1}{p_1} + \kappa\Big( \frac 1p - \alpha \Big), \qquad (3.1) $$

then any function $f \in L_{p_1}(\mathbb{R})$ with $\| f^{(\alpha)} \|_p < \infty$ satisfies

$$ \big\| f^{(\beta)} \big\|_{p_0} \le C\, \| f \|_{p_1}^{1-\kappa}\, \big\| f^{(\alpha)} \big\|_p^{\kappa}, $$

where $C$ is a constant that depends only on $p_0$, $p_1$, $p$, $\alpha$, $\beta$.

Lemma 3.3. There exists a finite constant $\nu = \nu(L,\alpha,p)$, depending on $L$, $\alpha$ and $p$ only, such that $\sup_{f \in H_\alpha(p,L)} \| f \|_\infty \le \nu$.

Proof. For $f \in W_\alpha(p,L)$, we apply the previous result with $\beta = 0$, $p_0 = \infty$, $p_1 = 1$. Then (3.1) takes the form

$$ 0 = (1 - \kappa) + \kappa\Big( \frac 1p - \alpha \Big), $$

which implies $\kappa = 1/(\alpha + 1 - 1/p)$. Then $\kappa \in (\beta/\alpha, 1) = (0, 1)$ if $\alpha > 1/p$, which holds by hypothesis. Thus, we apply the previous result, Lemma 3.2, and get

$$ \| f \|_\infty \le C\, \| f \|_1^{1-\kappa}\, \big\| f^{(\alpha)} \big\|_p^{\kappa} \le C L^{\kappa} $$

for all $f$ in $W_\alpha(p,L)$.

If $f \in M_\alpha(p,L)$, then $\| F(f) \|_\infty \le 1$ since $f$ is a density. We have

$$ \| f \|_\infty \le \frac{1}{2\pi} \int_{\mathbb{R}} | F(f)(y) |\, dy = \frac{1}{2\pi} \int_{\mathbb{R}} | F(f)(y) |\, \big( 1 + |y|^\alpha \big)\, \frac{dy}{1 + |y|^\alpha} \le \frac{1}{2\pi} \Big( \int_{\mathbb{R}} | F(f)(y) |^p \big( 1 + |y|^\alpha \big)^p\, dy \Big)^{1/p} \Big( \int_{\mathbb{R}} \frac{dy}{(1 + |y|^\alpha)^{p'}} \Big)^{1/p'}, $$

where $1/p + 1/p' = 1$. This is less than a constant $\nu(L,\alpha,p) > 0$ for $f$ in the class $M_\alpha(p,L)$. $\square$

Lemma 3.4.
If $f \in H_\alpha(p,L)$ and $\beta$ is in $B_{N_n}$ such that $\beta < \alpha$, then $f \in H_\beta(p, L')$, where $L' > 0$ depends only on $L$ and $p$.

Proof. For the classes $W_\alpha(p,L)$, put $p_0 = p$, $p_1 = 1$ in the auxiliary Lemma 3.2. Then (3.1) takes the form

$$ \frac 1p - \beta = 1 - \tilde\kappa + \tilde\kappa\Big( \frac 1p - \alpha \Big), $$

which gives $\tilde\kappa = (\beta + 1 - 1/p)/(\alpha + 1 - 1/p)$, and thus $\tilde\kappa \in (\beta/\alpha, 1)$ if $\alpha > 1/p$ (true by hypothesis). By Lemma 3.2 we get

$$ \big\| f^{(\beta)} \big\|_p \le \tilde C\, \| f \|_1^{1 - \tilde\kappa}\, \big\| f^{(\alpha)} \big\|_p^{\tilde\kappa} \le \tilde C L^{\tilde\kappa} $$

for all $f$ in $W_\alpha(p,L)$.

For $f \in M_\alpha(p,L)$, as $p > 1$ and $\| F(f) \|_\infty \le 1$, we write

$$ \int_{\mathbb{R}} | F(f)(y) |^p |y|^{\beta p}\, dy \le \int_{|y| \le 1} | F(f)(y) |^p\, dy + \int_{|y| > 1} | F(f)(y) |^p |y|^{\alpha p}\, dy \le 2 + L^p. $$
If and are inB
Nn such that and iff
belongs toH
=H
(p L
) then there existsb
(H
)>
0 (given in the proof and depending also onL
andp
), such thatB
n(x
0H
)b
(H
)h
n;1=p, ifH
=W
(p L
),B
n(x
0H
)b
(H
)h
n;1+1=p, ifH
=M
(p L
), andE
fZ
n(x
0H
)]2 kK
k22nh
n Def=s
2n. (3.2)Moreover, for the kernels f
K
2B
Nng used in the proof, we can nd constantsK
max,k
min,k
max andb
max depending possibly on xedp
and 1, such thatk
K
k1K
maxk
minkK
k2k
maxfor all
inB
Nn andb
(H
)b
max for all and inB
Nn such that .Remark 3.6
From now on, $\tilde e_\beta = \tilde e(\beta, H)$ is obtained as in (2.4). Then Lemma 3.5 says that $B_n^\beta(x_0, H_\alpha) \le b_\beta\, \big( h_n^\beta \big)^{\tilde e_\beta(H) - 1/2}$.
IfH
=W
=W
(p L
), let us introduce a kernelK
of order , in the expression of the kernel estimator (2:
6). Such a kernel must be bounded uniformly in (kK
k1K
max, for all inB
Nn), absolutely integrable, with a bounded L2 norm (k
min kK
k2k
max, for all inB
Nn), such thatRIRK
(y
)dy
= 1, RIRy
jK
(y
)dy
= 0 forj
= 1:::
;1 andZ
IR
j
K
(y
)jjy
j;1=pdy
L
0<
1 (3.3) whereL
0 depends only on xedp
and 1. It is not dicult to nd examples of such kernels. For example, the kernelK
having Fourier transform F(K
)(u
) = 1=
(1 +ju
jp) satisfy these conditions and the proofs are given later on.From now on we denoteR =RIR. Then the bias can be bounded as follows
B
n(x W
) =Z
K
(y
)f
(x
+yh
n);f
(x
)]dy
Z
K
(y
);X1j=1
(
yh
n)jj
!f
(j)(x
)dy
+Z
K
(y
)Z
x+yhn
x
(
x
+yh
n;u
);1(
;1)!f
()(u
)dudy
;1
X
j=1
h
jnj
!f
(j)(x
)Z
y
jK
(y
)dy
+ +Z
j
K
(y
)jf
()p
j
yh
nj;1=p(
;1)!((;1)p
0+ 1)1=p0dy
where the rst term is zero by hypotheses on the kernel and we applied the Holder in- equality with 1
=p
+ 1=p
0 = 1 for the second term. This givesB
n(x W
)L
0(
;1)!h
;n1=p ((;1)p
0+ 1)1=p0Z
j
K
(y
)jjy
j;1=pdy
b
(W
)h
en(W );1=2 whereb
(W
) =L
0 (;1)!R
j
K
(y
)jjy
j;1=pdy
((;1)p
0+ 1)1=p0:
We can also see that
b
(W
)b
max,b
max depending only onp
,L
and1, for all and inB
Nn, .If
H
=M
=M
(p L
), let us choose the kernelK
dened by its Fourier transform as followsF(
K
)(u
) = 1 1 +ju
jp:
This kernel has, by Plancherel's formula:k
K
k2 = 1p2k F(K
)k2 = 1p2Z
du
(1 +ju
jp)2
1
p2
Z
juj1
du
1 +j
u
jp 12 =k
min(p
1) alsok
K
k2 1 + 1p2Z
juj>1
du
1 +j
u
jp 12 =k
max(p
1) andk
K
k1 21Z
j F(
K
)(u
)jdu
1 + 12Z
juj>1
du
1 +j
u
jp 1 =K
max(p
1) sincep
1>
1, in our setting. Then the bias isB
n(x M
) = Z 1h
nK
y
;x h
n
f
(y
)dy
;f
(x
)= 12
Z
F(
f
)(y
)e
ixyF(K
)(h
ny
);1]dy
21
Z
jF(
f
)(y
)j jh
ny
jp 1 +jh
ny
jpdy:
Then we apply Holder's inequality for 1
=p
+ 1=p
0 = 1 as followsB
n(x M
)h
n2
Z
j F(
f
)(y
)jjy
j jh
ny
j(p;1) 1 +jh
ny
jpdy
L
0h
;n1=p0 2
Z
j
y
jp (1 +jy
jp)p0dy
!1=p0
=
b
(M
)h
en(M);1=2 whereL
0 is the constant from Lemma 3:
4 andb
(M
) =L
0 2
Z
j
y
jp (1 +jy
jp)p0dy
!1=p0
:
This term is bounded as followsb
(M
)L
0 20
B
@ Z
jyj1
dy
1 +j
y
jp 1p0 +Z
jyj>1
dy
j
y
jp0 11
C
A
1=p0
=
b
max(p L
1):
Let us check at last that condition (3:
3) is fullled:Z
IR
j
K
(y
)jjy
j;1=pdy
Zjyj1j
K
(y
)jdy
+Zjyj>1j
K
(y
)jjy
j;1=pdy
K
max+
Z
jyj>1j
K
(y
)j2jy
j2dy
!1=2Z
jyj>1j
y
j;2=pdy
!1=2
L
0(p
1):
For the variance term, we write, using Lemma 3
:
3E
fZ
n(x H
)]2 1nh
nZ 1
h
nK
2y
;x h
n
f
(x
)dx
kK
k22nh
n:
2
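The properties of the Fourier-defined kernel above (unit mass, finite $L_2$ norm computable through Plancherel) are easy to check numerically. The sketch below does so by direct quadrature for one illustrative pair $(\beta, p)$; the grids and truncation levels are arbitrary choices of ours.

```python
import numpy as np

beta, p = 2.0, 2.0
u = np.linspace(-200.0, 200.0, 2 ** 15)             # frequency grid (truncated)
du = u[1] - u[0]
Fk = 1.0 / (1.0 + np.abs(u) ** (beta * p))           # F(K_beta)(u) = 1 / (1 + |u|^{beta p})

# K_beta(y) = (1/2pi) * integral of F(K_beta)(u) cos(uy) du  (F(K_beta) is real and even)
y = np.linspace(-15.0, 15.0, 601)
dy = y[1] - y[0]
K = np.array([(Fk * np.cos(u * t)).sum() * du for t in y]) / (2.0 * np.pi)

mass = K.sum() * dy                                   # should be close to F(K_beta)(0) = 1
l2_time = np.sqrt((K ** 2).sum() * dy)                # ||K_beta||_2 in the time domain
l2_freq = np.sqrt((Fk ** 2).sum() * du / (2.0 * np.pi))   # same norm via Plancherel
print(round(mass, 3), round(l2_time, 3), round(l2_freq, 3))
```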
Let us recall the following inequalities (see, e.g., Härdle, Kerkyacharian, Picard and Tsybakov [14]).

Lemma 3.7. (Rosenthal's inequality) Let $q \ge 2$ and let $Y_1, \dots, Y_n$ be independent random variables such that $E[Y_i] = 0$ and $E[|Y_i|^q] < \infty$. Then there exists a constant $C(q)$, depending on $q$, such that

$$ E\Big[ \Big| \sum_{i=1}^n Y_i \Big|^q \Big] \le C(q) \Big\{ \sum_{i=1}^n E\big[ |Y_i|^q \big] + \Big( \sum_{i=1}^n E\big[ Y_i^2 \big] \Big)^{q/2} \Big\}. $$

(Bernstein's inequality) Let $Y_1, \dots, Y_n$ be i.i.d. random variables such that $|Y_i| \le M$, $E[Y_i] = 0$, and denote $b_n^2 = \sum_{i=1}^n E[Y_i^2]$. Then for any $\lambda > 0$,

$$ P\Big[ \Big| \sum_{i=1}^n Y_i \Big| \ge \lambda \Big] \le 2 \exp\Big( - \frac{\lambda^2}{2\big( b_n^2 + M\lambda/3 \big)} \Big). $$
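A quick numerical sanity check of Bernstein's inequality as recalled above; all parameter values are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M, lam, trials = 200, 1.0, 15.0, 20000
Y = rng.uniform(-M, M, size=(trials, n))        # i.i.d., centered, bounded by M
S = np.abs(Y.sum(axis=1))
b2 = n * M ** 2 / 3.0                           # sum of variances of Uniform[-M, M]
empirical = (S >= lam).mean()
bernstein = 2.0 * np.exp(-lam ** 2 / (2.0 * (b2 + M * lam / 3.0)))
print(empirical, bernstein)                     # the empirical tail stays below the bound
```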
Lemma 3.8. If $f$ belongs to $H_\alpha = H_\alpha(p,L)$ and $\beta < \alpha$, if $K_\beta$ is the kernel function and

$$ Z_n^\beta(x_0, H_\alpha) = \Big| \frac{1}{n h_n^\beta} \sum_{i=1}^n \Big[ K_\beta\Big( \frac{X_i - x_0}{h_n^\beta} \Big) - E_f K_\beta\Big( \frac{X_i - x_0}{h_n^\beta} \Big) \Big] \Big|, $$

then for any $u > 0$

$$ P_f\big[ Z_n^\beta(x_0, H_\alpha) \ge u \big] \le 2 \exp\Big( - \frac{u^2}{2\, s_n^2(\beta)\, ( \nu + c_0 u )} \Big), $$

where $c_0 > 0$ does not depend on $\beta$.

Proof. Indeed, we can apply Bernstein's inequality with $\lambda = nu$ to the i.i.d., centered variables

$$ Y_i = \frac{1}{h_n^\beta} \Big[ K_\beta\Big( \frac{X_i - x_0}{h_n^\beta} \Big) - E_f K_\beta\Big( \frac{X_i - x_0}{h_n^\beta} \Big) \Big], $$

bounded as follows: $| Y_i | \le 2 \| K_\beta \|_\infty / h_n^\beta$. Then $b_n^2 \le \nu\, n^2 s_n^2(\beta)$, with $s_n^2(\beta) = \| K_\beta \|_2^2 / (n h_n^\beta)$, by (3.2), and, by Lemma 3.5, $2 \| K_\beta \|_\infty / \| K_\beta \|_2^2 \le 2 K_{\max} / k_{\min}^2 =: c_0$, which does not depend on $\beta$. $\square$
Remark 3.9. For $q > 1$, we can find a constant $c(q) > 0$ such that the stochastic term of the kernel estimator satisfies $E_f\big[ \big( Z_n^\beta(x_0, H_\alpha) \big)^q \big] \le c(q)\, s_n^q(\beta)$, where we denoted $s_n^2(\beta) = \| K_\beta \|_2^2 / (n h_n^\beta)$.

Indeed, for $q > 2$, we apply Rosenthal's inequality to the previous centered variables $Y_i$, bounded as follows: $| Y_i | \le 2 \| K_\beta \|_\infty / h_n^\beta$. Then we can find a constant $c'(q)$, depending on $q$, such that

$$ E_f\Big[ \Big| \frac 1n \sum_{i=1}^n Y_i \Big|^q \Big] \le c'(q) \Big\{ \Big( \frac{2 \| K_\beta \|_\infty}{n h_n^\beta} \Big)^{q-2} \frac 1n E_f\big[ Y_1^2 \big] + \Big( \frac 1n E_f\big[ Y_1^2 \big] \Big)^{q/2} \Big\}, $$

and this leads to our result, for some constant $c(q)$, because of inequality (3.2). For $1 < q \le 2$, we can easily deduce the result from (3.2) by standard convexity inequalities.

Let us introduce the sequence

$$ \tau_n^2 = C\, q\, s_n^2(\beta) \Big( \frac{1}{2\tilde e_\beta} - \frac{1}{2\tilde e_\alpha} \Big) \log n, $$

where $\beta < \alpha$ are in $B_{N_n}$, $C > 0$, and $\tilde e_\beta$ and $\tilde e_\alpha$ are defined by (2.4).

Lemma 3.10.
1. If the set $B_{N_n}$ satisfies conditions (2.1) and (2.2), then, as $n \to \infty$,

$$ \frac{\log n}{\tilde e_N} \to \infty, \qquad \frac{\tilde e_N \log \tilde e_N}{\log n} \to 0 \qquad\text{and}\qquad \frac{\tilde e_N \log(1/\Delta_n)}{\log n} \to 0, $$

where $\tilde e_N = \tilde e(\alpha_{N_n}, H)$ is defined by the transformation (2.4).

2. If $\beta$, $\alpha$ are in $B_{N_n}$ such that $\beta < \alpha$, then there exist constants $C_1$, $C_2$, depending only on previously fixed constants, such that

$$ \sup_{f \in H_\alpha} \frac{\big[ B_n^\beta(x_0, H_\alpha) + s_n(\beta) \big]^q}{\big( \psi_n^\beta \big)^q} \le C_1, \qquad \sup_{f \in H_\alpha} \frac{\big[ B_n^\beta(x_0, H_\alpha) + \eta_n^\beta \big]^q}{\big( \psi_n^\alpha \big)^q} \le C_2 \Big[ \sqrt{\log n}\, \Big( \frac 1n \Big)^{-\frac 12 \big( \frac{1}{2\tilde e_\beta} - \frac{1}{2\tilde e_\alpha} \big)} \Big]^q. $$

Proof. 1. The limits are easy consequences of hypotheses (2.1) and (2.2).

2. By Lemma 3.5, there exist $b_{\max}$ and $k_{\max}$, not depending on $\beta$, such that $b_\beta \le b_{\max}$ and $\| K_\beta \|_2 \le k_{\max}$ for any $\beta$ in $B_{N_n}$. Thus, for $\alpha \in B^-$ and $\beta < \alpha$,

$$ B_n^\beta(x_0, H_\alpha) \le \psi_n^\beta\, b_{\max}, \qquad s_n(\beta) \le \psi_n^\beta\, \frac{k_{\max}}{\sqrt{\log n}}, $$

and

$$ B_n^\beta(x_0, H_\alpha) \le \psi_n^\alpha\, b_{\max} \Big( \frac{\log n}{n} \Big)^{-\frac 12 \big( \frac{1}{2\tilde e_\beta} - \frac{1}{2\tilde e_\alpha} \big)}. $$

Finally,

$$ \frac{\eta_n^\beta}{\psi_n^\alpha} \le a \Big( \frac{\log n}{n} \Big)^{-\frac 12 \big( \frac{1}{2\tilde e_\beta} - \frac{1}{2\tilde e_\alpha} \big)}. $$

Because $\tilde e_N / \log n \to 0$ when $n \to \infty$, we get the lemma for $\alpha \in B^-$. For the case $\alpha = \alpha_{N_n}$, denoted $\alpha_N$,

$$ B_n^{\alpha_N}(x_0, H_{\alpha_N}) \le \psi_n^{\alpha_N}\, b_{\max}, \qquad s_n(\alpha_N) \le \psi_n^{\alpha_N}\, k_{\max}. $$

Moreover,

$$ B_n^\beta(x_0, H_{\alpha_N}) \le \psi_n^{\alpha_N}\, b_{\max}\, \sqrt{\log n}\, \Big( \frac 1n \Big)^{-\frac 12 \big( \frac{1}{2\tilde e_\beta} - \frac{1}{2\tilde e_N} \big)} \qquad\text{and}\qquad \frac{\eta_n^\beta}{\psi_n^{\alpha_N}} \le a\, \sqrt{\log n}\, \Big( \frac 1n \Big)^{-\frac 12 \big( \frac{1}{2\tilde e_\beta} - \frac{1}{2\tilde e_N} \big)}. \qquad \square $$