https://doi.org/10.7892/boris.70416

Optimal Confidence Bands for Shape-Restricted Curves

Lutz Dümbgen

Department of Mathematical Statistics and Actuarial Science, University of Bern
Sidlerstrasse 5, CH-3012 Bern, Switzerland
E-mail: duembgen@stat.unibe.ch, URL: www.imsv.unibe.ch

August 2002, updated December 2013

This paper has been published in Bernoulli 9, 423–449 (2003). The present version corrects two typos in the published version and contains updated references.

Abstract

Let $Y$ be a stochastic process on $[0,1]$ satisfying $dY(t) = n^{1/2}f(t)\,dt + dW(t)$, where $n\ge 1$ is a given scale parameter ("sample size"), $W$ is standard Brownian motion and $f$ is an unknown function. Utilizing suitable multiscale tests we construct confidence bands for $f$ with guaranteed given coverage probability, assuming that $f$ is isotonic or convex. These confidence bands are computationally feasible and shown to be asymptotically sharp optimal in an appropriate sense.

Running title. Confidence Bands for Shape-Restricted Curves

Keywords and phrases. adaptivity, concave, convex, isotonic, kernel estimator, local smoothness, minimax bounds, multiscale testing


1 Introduction

Nonparametric statistical models often involve some unknown function $f$ defined on a real interval $J$. For instance $f$ might be the probability density of some distribution or a regression function.

Nonparametric point estimators for such a curve $f$ are abundant. The available methods are based on kernels, splines, local polynomials, or orthogonal series, including wavelets; see Hart (1997) and references cited therein. In order to quantify the precision of estimation, one often wants to replace a point estimator with a confidence band $(\hat\ell,\hat u)$ for $f$. The latter consists of two functions $\hat\ell = \hat\ell(\cdot,\mathrm{data})$ and $\hat u = \hat u(\cdot,\mathrm{data})$ on $J$ with values in $[-\infty,\infty]$ such that, hopefully, $\hat\ell\le f\le\hat u$ pointwise. More precisely, one is aiming at a confidence band such that

(1)    $\mathrm{IP}\{\hat\ell\le f\le\hat u\} \ \ge\ 1-\alpha$

for a given level $\alpha\in\,]0,1[$, while $\hat\ell$ and $\hat u$ should be as close to each other as possible.

Unfortunately, curve estimation is an ill-posed problem, and usually there are no nontrivial bands $(\hat\ell,\hat u)$ satisfying (1) for arbitrary $f$; see Donoho (1988). Therefore one has to impose some additional restrictions on $f$. One possibility are smoothness constraints on $f$, for instance an upper bound on a certain derivative of $f$. Under such restrictions, (1) can be achieved approximately for large sample sizes; see for example Bickel and Rosenblatt (1973), Knafl et al. (1985), Hall and Titterington (1988), Härdle and Marron (1991), Eubank and Speckman (1993), Fan and Zhang (2000), and the references cited therein.

A problem with the aforementioned methods is that smoothness constraints are hard to justify in practical situations. More precisely, even if the underlying curve $f$ is infinitely often differentiable, the actual coverage probabilities of the confidence bands mentioned above depend on quantitative properties of certain derivatives of $f$ which are difficult to obtain from the data.

In many applications qualitative assumptions about $f$ such as monotonicity, unimodality or concavity/convexity are plausible. One example are growth curves in medicine, e.g. where $f(x)$ is the mean body height of newborns at age $x$. Here isotonicity of $f$ is a plausible assumption. Another example are so-called Engel curves in econometrics, where $f(x)$ is the mean expenditure for certain consumer goods of households with annual income $x$. Here one expects $f$ to be isotonic and sometimes concave as well. Under such qualitative assumptions it is possible to construct $(1-\alpha)$–confidence sets for $f$ based on certain goodness-of-fit tests without relying on asymptotic arguments. Examples for such procedures can be found in Davies (1995), Hengartner and Stark (1995) and Dümbgen (1998). In particular, these papers present confidence bands $(\hat\ell,\hat u)$ for $f$ such that

(2)    $\mathrm{IP}\{\hat\ell\le f\le\hat u\} \ \ge\ 1-\alpha \quad\text{whenever } f\in\mathcal F.$

Here $\mathcal F$ denotes the specified class of functions. Given a suitable distance measure $D(\cdot,\cdot)$ for functions, the goal is to find a band $(\hat\ell,\hat u)$ satisfying (2) such that either $D(\hat u,\hat\ell)$ or $D(\hat\ell,f)$ and $D(\hat u,f)$ are as small as possible. The phrase "as small as possible" can be interpreted in the sense of optimal rates of convergence to zero as the sample size $n$ tends to infinity. The papers of Hengartner and Stark (1995) and Dümbgen (1998) contain such optimality results.

In the present paper we investigate optimality of confidence bands in more detail. In addition to optimal rates of convergence we obtain optimal constants and discuss the impact of local smoothness properties of $f$. Compared to the general confidence sets of Dümbgen (1998), the methods developed here are more stringent and computationally simpler. They are based on multiscale tests as developed by Dümbgen and Spokoiny (2001), who considered tests of qualitative assumptions rather than confidence bands. For further results on testing in nonparametric curve estimation see Hart (1997), Fan et al. (2001), and the references cited there.

2 Basic setting and overview

For mathematical convenience we focus on a continuous white noise model: Suppose that one observes a stochastic process $Y$ on the unit interval $[0,1]$, where
$$Y(t) \;=\; n^{1/2}\int_0^t f(x)\,dx + W(t).$$
Here $f$ is an unknown function in $L_2[0,1]$, $n\ge 1$ is a given scale parameter ("sample size"), and $W$ is standard Brownian motion. In this context the bounding functions $\hat\ell,\hat u$ are defined on $[0,1]$, but for notational convenience the function $f$ is tacitly assumed to be defined on the whole real line with values in $[-\infty,\infty]$. From now on we assume that

$$f \in \mathcal G \cap L_2[0,1],$$
where $\mathcal G$ denotes one of the following two function classes:
$$\mathcal G_\uparrow := \big\{\text{non-decreasing functions } g:\mathbb R\to[-\infty,\infty]\big\},$$
$$\mathcal G_{\mathrm{conv}} := \big\{\text{convex functions } g:\mathbb R\to\,]-\infty,\infty]\big\}.$$

The paper is organized as follows. In Section 3 we treat the case $\mathcal G=\mathcal G_\uparrow$ and measure the quality of a confidence band $(\hat\ell,\hat u)$ by quantities related to the Lévy distance $d_L(\hat\ell,\hat u)$. Generally,
$$d_L(g,h) := \inf\big\{\epsilon>0 : g\le h(\cdot+\epsilon)+\epsilon \ \text{and}\ h\le g(\cdot+\epsilon)+\epsilon \ \text{on } [0,1-\epsilon]\big\}$$


for isotonic functions $g,h:[0,1]\to[-\infty,\infty]$. It turns out that a confidence band which is based on a suitable multiscale test as introduced by Dümbgen and Spokoiny (2001) is asymptotically optimal in a strong sense. Throughout this paper asymptotic statements refer to $n\to\infty$, unless stated otherwise.

In Section 4 we treat both classes $\mathcal G_\uparrow$ and $\mathcal G_{\mathrm{conv}}$ simultaneously. We discuss the construction of confidence bands $(\hat\ell,\hat u)$ satisfying (2) such that $D(\hat\ell,f)$ and $D(f,\hat u)$ are as small as possible whenever $f$ satisfies some additional smoothness constraints. Here $D(g,h)$ is a distance measure of the form
$$D(g,h) := \sup_{x\in[0,1]} w(x,f)\,\big(h(x)-g(x)\big)$$
for some weight function $w(\cdot,f)\ge 0$ reflecting local smoothness properties of $f$. Again it turns out that suitable multiscale procedures yield nearly optimal procedures without additional prior information on $f$.

In Section 5 we present some numerical examples for the procedures of Section 4. The proofs are deferred to Sections 6, 7 and 8. In particular, Section 7 contains a new minimax bound for confidence rectangles in a gaussian shift model, which may be of independent interest.

As for the white noise model, the results of Brown and Low (1996), Nussbaum (1996) and Grama and Nussbaum (1998) on asymptotic equivalence can be used to transfer the lower bounds of the present paper to other models. Moreover, one can mimic the confidence bands developed here in traditional regression models under minimal assumptions; see Dümbgen and Johns (2004) and Dümbgen (2007).

3 Optimality for isotonic functions in terms of Lévy-type distances

In this section we consider the class $\mathcal G_\uparrow$. For isotonic functions $g,h:[0,1]\to[-\infty,\infty]$ and $\epsilon>0$ let
$$D_\epsilon(g,h) := \inf\big\{\lambda\ge 0 : g\le h(\cdot+\epsilon)+\lambda \ \text{and}\ h\le g(\cdot+\epsilon)+\lambda \ \text{on } [0,1-\epsilon]\big\}.$$
Then the Lévy distance $d_L(g,h)$ is the infimum of all $\epsilon>0$ such that $D_\epsilon(g,h)\le\epsilon$. We use these functionals $D_\epsilon(\cdot,\cdot)$ in order to quantify differences between isotonic functions. Figure 1 depicts one such function $g$, and the shaded areas represent the set of all functions $h$ with $D_{0.05}(g,h)\le 0.1$ and $D_{0.05}(g,h)\le 0.025$, respectively.

The next theorem provides lower bounds for $D_\epsilon(\hat\ell,\hat u)$, $0<\epsilon\le 1$. Here and throughout the sequel the dependence of probabilities, expectations and distributions on the functional parameter $f$ is sometimes indicated by a subscript $f$.

Figure 1: Two $D_{0.05}(\cdot,\cdot)$–neighborhoods of some function $g$.

Theorem 3.1. There exists a universal function $b$ on $]0,1]$ with $\lim_{\epsilon\downarrow 0} b(\epsilon)=0$ such that
$$\inf_{f\in\mathcal G_\uparrow\cap L_2[0,1]} \mathrm{IP}_f\Big\{\hat\ell\le f\le\hat u \ \text{and}\ D_\epsilon(\hat\ell,\hat u) < \frac{(8\log(e/\epsilon))^{1/2}-b(\epsilon)}{(n\epsilon)^{1/2}}\Big\} \ \le\ b(\epsilon)$$
for any confidence band $(\hat\ell,\hat u)$ and arbitrary $\epsilon\in\,]0,1]$.

Theorem 3.1 entails a lower bound for $d_L(\hat\ell,\hat u)$. For let $\epsilon=\epsilon_n := c(\log(n)/n)^{1/3}-\delta n^{-1/3}$ with any fixed $c,\delta>0$. Then one can show that for sufficiently large $n$,
$$\frac{(8\log(e/\epsilon))^{1/2}-b(\epsilon)}{(n\epsilon)^{1/2}} \;=\; \Big(\frac{8}{3c}\Big)^{1/2}\Big(\frac{\log n}{n}\Big)^{1/3} + o(n^{-1/3}) \;\ge\; \epsilon,$$
provided that $c$ equals $(8/3)^{1/3}\approx 1.387$.

Corollary 3.2. For each $n\ge 1$ there exists a universal constant $\beta_n$ such that $\beta_n\to 0$ and
$$\inf_{f\in\mathcal G_\uparrow\cap L_2[0,1]} \mathrm{IP}_f\Big\{\hat\ell\le f\le\hat u \ \text{and}\ d_L(\hat\ell,\hat u) < \Big(\frac 83\Big)^{1/3}\Big(\frac{\log n}{n}\Big)^{1/3} - \beta_n n^{-1/3}\Big\} \ \le\ \beta_n$$
for any confidence band $(\hat\ell,\hat u)$.

It is possible to get close to these lower bounds for $D_\epsilon(\hat\ell,\hat u)$ simultaneously for all $\epsilon\in\,]0,1]$ while (2) is satisfied. For let $\kappa_\alpha$ be a real number such that
$$\mathrm{IP}\Big\{\frac{|W(t)-W(s)|}{(t-s)^{1/2}} \le \Gamma(t-s)+\kappa_\alpha \ \text{for } 0\le s<t\le 1\Big\} \ \ge\ 1-\alpha,$$
where
$$\Gamma(u) := (2\log(e/u))^{1/2} \quad\text{for } 0<u\le 1.$$

The existence of such a critical value $\kappa_\alpha$ follows from Dümbgen and Spokoiny (2001, Theorem 2.1). With the local averages
$$F_f(s,t) := \frac{1}{t-s}\int_s^t f(x)\,dx$$
of $f$ and their natural estimators
$$\hat F(s,t) := \frac{Y(t)-Y(s)}{n^{1/2}(t-s)}$$
it follows that
$$\mathrm{IP}_f\Big\{\big|\hat F(s,t)-F_f(s,t)\big| \le \frac{\Gamma(t-s)+\kappa_\alpha}{(n(t-s))^{1/2}} \ \text{for } 0\le s<t\le 1\Big\} \ \ge\ 1-\alpha.$$
But for $0\le s<t\le 1$,
$$f(s) \ \le\ F_f(s,t) \ \le\ f(t)$$
whenever $f\in\mathcal G_\uparrow$. This implies the first assertion of the following theorem.

Theorem 3.3. With the critical value $\kappa_\alpha$ above let
$$\hat\ell(x) := \sup_{0\le s<t\le x}\Big(\hat F(s,t) - \frac{\Gamma(t-s)+\kappa_\alpha}{\sqrt{n(t-s)}}\Big),$$
$$\hat u(x) := \inf_{x\le s<t\le 1}\Big(\hat F(s,t) + \frac{\Gamma(t-s)+\kappa_\alpha}{\sqrt{n(t-s)}}\Big).$$
This defines a confidence band $(\hat\ell,\hat u)$ for $f$ satisfying (2) with $\mathcal F = \mathcal G_\uparrow\cap L_2[0,1]$. Moreover, in case of $\hat\ell\le\hat u$,
$$D_\epsilon(\hat\ell,\hat u) \ \le\ \frac{(8\log(e/\epsilon))^{1/2}+2\kappa_\alpha}{(n\epsilon)^{1/2}} \quad\text{for } 0<\epsilon\le 1,$$
$$d_L(\hat\ell,\hat u) \ \le\ \Big(\frac 83\Big)^{1/3}\Big(\frac{\log n}{n}\Big)^{1/3} + o(n^{-1/3}).$$

Proof. The preceding upper bound for $D_\epsilon(\hat\ell,\hat u)$ follows from the fact that for any $x\in[0,1-\epsilon]$,
$$\hat u(x)-\hat\ell(x+\epsilon) \ \le\ \Big(\hat F(x,x+\epsilon)+\frac{\Gamma(\epsilon)+\kappa_\alpha}{(n\epsilon)^{1/2}}\Big) - \Big(\hat F(x,x+\epsilon)-\frac{\Gamma(\epsilon)+\kappa_\alpha}{(n\epsilon)^{1/2}}\Big) \ =\ \frac{2\Gamma(\epsilon)+2\kappa_\alpha}{(n\epsilon)^{1/2}} \ =\ \frac{(8\log(e/\epsilon))^{1/2}+2\kappa_\alpha}{(n\epsilon)^{1/2}}.$$
Letting $\epsilon=\epsilon_n=(8/3)^{1/3}(\log(n)/n)^{1/3}$ yields the upper bound for $d_L(\hat\ell,\hat u)$.
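For illustration, the band of Theorem 3.3 can be transcribed literally on a discretized observation of $Y$, taking suprema/infima over all grid pairs $(s,t)$. The following sketch (function and argument names are ours) assumes the critical value $\kappa_\alpha$ is already available, e.g. from simulation; it is quadratic in the grid size and meant as a transcription of the formulas, not an efficient algorithm:

```python
import numpy as np

def band_isotonic(Y, n_scale, kappa):
    """Brute-force version of the band in Theorem 3.3 on a grid.

    Y       : observed process at the grid points j/m, j = 0, ..., m (Y[0] = 0);
    n_scale : the scale parameter n in dY = n^{1/2} f dt + dW;
    kappa   : a critical value kappa_alpha, assumed to be given.
    """
    m = len(Y) - 1
    t = np.arange(m + 1) / m
    Gamma = lambda u: np.sqrt(2.0 * np.log(np.e / u))
    lower = np.full(m + 1, -np.inf)
    upper = np.full(m + 1, np.inf)
    for i in range(m):                      # s = t[i]
        for j in range(i + 1, m + 1):       # t = t[j] > s
            dt = t[j] - t[i]
            Fhat = (Y[j] - Y[i]) / (np.sqrt(n_scale) * dt)
            margin = (Gamma(dt) + kappa) / np.sqrt(n_scale * dt)
            lower[j:] = np.maximum(lower[j:], Fhat - margin)          # sup over s < t <= x
            upper[:i + 1] = np.minimum(upper[:i + 1], Fhat + margin)  # inf over x <= s < t
    return lower, upper
```

Each pair $(s,t)$ contributes to $\hat\ell(x)$ for all $x\ge t$ and to $\hat u(x)$ for all $x\le s$, mirroring the index ranges in the two suprema/infima above.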


4 Bands for potentially smooth functions

A possible criticism of the preceding results is the fact that the minimax bounds are attained at special step functions. On the other hand one often expects the underlying curve $f$ to be smooth in some vague sense. Therefore we aim now at confidence bands satisfying (2) with $\mathcal F=\mathcal G\cap L_2[0,1]$ which are as small as possible whenever $f$ satisfies some additional smoothness conditions. Throughout, $\mathcal G$ stands for $\mathcal G_\uparrow$ or $\mathcal G_{\mathrm{conv}}$.

In the sequel let $\langle g,h\rangle := \int_{-\infty}^{\infty} g(x)h(x)\,dx$ and $\|g\| := \langle g,g\rangle^{1/2}$ for measurable functions $g,h$ on the real line such that these integrals are defined. The confidence bands to be presented here can be described either in terms of kernel estimators for $f$ or in terms of tests. Both viewpoints have their own merits.

4.1 Kernel estimators for $f$

Let $\psi$ be some kernel function in $L_2(\mathbb R)$. For technical reasons we assume that $\psi$ satisfies the following three regularity conditions:

(3)    $\psi$ has bounded total variation; $\psi$ is supported by $[-a,b]$, where $a,b\ge 0$; and $\langle 1,\psi\rangle > 0$.

For any bandwidth $h>0$ and location parameter $t\in\mathbb R$ let
$$\psi_{h,t}(x) := \psi\Big(\frac{x-t}{h}\Big).$$
Then $\langle g,\psi_{h,t}\rangle = h\,\langle g(t+h\,\cdot),\psi\rangle$ and $\|\psi_{h,t}\| = h^{1/2}\|\psi\|$. A kernel estimator for $f(t)$ with kernel function $\psi$ and bandwidth $h$ is given by
$$\hat f_h(t) := \frac{\psi Y(h,t)}{n^{1/2}h\,\langle 1,\psi\rangle}, \quad\text{where}\quad \psi Y(h,t) := \int_0^1 \psi_{h,t}(x)\,dY(x).$$
From now on suppose that $ah\le t\le 1-bh$. Then $\psi_{h,t}$ is supported by $[0,1]$ and one may write
$$\mathrm{IE}\,\hat f_h(t) = \frac{\langle f,\psi_{h,t}\rangle}{h\langle 1,\psi\rangle} = \frac{\langle f(t+h\,\cdot),\psi\rangle}{\langle 1,\psi\rangle}, \qquad \mathrm{Var}\big(\hat f_h(t)\big) = \frac{\|\psi_{h,t}\|^2}{nh^2\langle 1,\psi\rangle^2} = \frac{\|\psi\|^2}{nh\,\langle 1,\psi\rangle^2}.$$


The random fluctuations of these kernel estimators can be bounded uniformly in $h>0$. For that purpose we define the multiscale statistic
$$T(\pm\psi) := \sup_{h>0}\ \sup_{t\in[ah,1-bh]}\Big(\frac{\pm\psi W(h,t)}{h^{1/2}\|\psi\|} - \Gamma((a+b)h)\Big) = \sup_{h>0}\ \sup_{t\in[ah,1-bh]}\Big(\frac{\pm\big(\hat f_h(t)-\mathrm{IE}\,\hat f_h(t)\big)}{\mathrm{Var}(\hat f_h(t))^{1/2}} - \Gamma((a+b)h)\Big),$$
similarly as in Dümbgen and Spokoiny (2001). It follows from Theorem 2.1 in the latter paper that $0\le T(\pm\psi)<\infty$ almost surely. In particular, $|\hat f_h(t)-\mathrm{IE}\,\hat f_h(t)| \le (nh)^{-1/2}\log(e/h)^{1/2}\,O_p(1)$, uniformly in $h>0$ and $ah\le t\le 1-bh$.

It is well-known that kernel estimators are biased in general. But our shape restrictions may be used to construct two kernel estimators whose bias is always non-positive or non-negative, respectively. Precisely, let $\psi^{(\ell)}$ and $\psi^{(u)}$ be two kernel functions satisfying (3) with respective supports $[-a^{(\ell)},b^{(\ell)}]$ and $[-a^{(u)},b^{(u)}]$. In addition suppose that

(4)    $\langle g,\psi^{(\ell)}\rangle \ \le\ g(0)\,\langle 1,\psi^{(\ell)}\rangle$ for all $g\in\mathcal G\cap L_2[-a^{(\ell)},b^{(\ell)}]$,

(5)    $\langle g,\psi^{(u)}\rangle \ \ge\ g(0)\,\langle 1,\psi^{(u)}\rangle$ for all $g\in\mathcal G\cap L_2[-a^{(u)},b^{(u)}]$.

These inequalities imply that the corresponding kernel estimators satisfy $\mathrm{IE}\,\hat f^{(\ell)}_h(t)\le f(t)\le\mathrm{IE}\,\hat f^{(u)}_h(t)$, and the definition of $T(\pm\psi)$ yields that

(6)    $f(t) \ \ge\ \hat f^{(\ell)}_h(t) - \dfrac{\|\psi^{(\ell)}\|\big(\Gamma(d^{(\ell)}h)+T(\psi^{(\ell)})\big)}{\langle 1,\psi^{(\ell)}\rangle (nh)^{1/2}},$

(7)    $f(t) \ \le\ \hat f^{(u)}_h(t) + \dfrac{\|\psi^{(u)}\|\big(\Gamma(d^{(u)}h)+T(-\psi^{(u)})\big)}{\langle 1,\psi^{(u)}\rangle (nh)^{1/2}}.$

Here $d^{(z)} := a^{(z)}+b^{(z)}$. Now let $\kappa_\alpha$ be the $(1-\alpha)$–quantile of the combined statistic $T := \max\big(T(\psi^{(\ell)}),\,T(-\psi^{(u)})\big)$, i.e. the smallest real number such that $\mathrm{IP}\{T\le\kappa_\alpha\}\ge 1-\alpha$. Then
$$\hat\ell(t) := \sup_{h>0:\ t\in[a^{(\ell)}h,\,1-b^{(\ell)}h]}\Big(\hat f^{(\ell)}_h(t) - \frac{\|\psi^{(\ell)}\|\big(\Gamma(d^{(\ell)}h)+\kappa_\alpha\big)}{\langle 1,\psi^{(\ell)}\rangle(nh)^{1/2}}\Big),$$
$$\hat u(t) := \inf_{h>0:\ t\in[a^{(u)}h,\,1-b^{(u)}h]}\Big(\hat f^{(u)}_h(t) + \frac{\|\psi^{(u)}\|\big(\Gamma(d^{(u)}h)+\kappa_\alpha\big)}{\langle 1,\psi^{(u)}\rangle(nh)^{1/2}}\Big)$$
defines a confidence band $(\hat\ell,\hat u)$ for $f$ satisfying (2).

Equality holds in (2) if $\mathcal G=\mathcal G_\uparrow$ and $f$ is constant, or if $\mathcal G=\mathcal G_{\mathrm{conv}}$ and $f$ is linear, provided that $\kappa_\alpha>0$. For then it follows from (4) and (5) with $g(x)=\pm 1$ or $g(x)=\pm x$ that the kernel estimators are unbiased. Thus the event $\hat\ell\le f\le\hat u$ fails precisely when $T>\kappa_\alpha$. Moreover, using general theory for gaussian measures on Banach spaces one can show that the distribution of $T$ is continuous on $]0,\infty[$.

Sufficient conditions for requirements (4) and (5) in general are provided by Lemma 8.1 in Section 8. The confidence band presented in Section 3 is a special case of the one derived here, if we define $\psi^{(\ell)}(x) := 1\{x\in[-1,0]\}$ and $\psi^{(u)}(x) := 1\{x\in[0,1]\}$ and apply postprocessing as described below.

4.2 Postprocessing of confidence bands

Any confidence band $(\hat\ell,\hat u)$ for $f$ can be enhanced if we replace $\hat\ell(x)$ and $\hat u(x)$ with
$$\hat{\hat\ell}(x) := \inf\big\{g(x) : g\in\mathcal G,\ \hat\ell\le g\le\hat u\big\} \quad\text{and}\quad \hat{\hat u}(x) := \sup\big\{g(x) : g\in\mathcal G,\ \hat\ell\le g\le\hat u\big\},$$
respectively. Here we assume tacitly that the set $\{g\in\mathcal G : \hat\ell\le g\le\hat u\}$ is nonempty.

In case of $\mathcal G=\mathcal G_\uparrow$ one can easily show that
$$\hat{\hat\ell}(x) = \sup_{t\in[0,x]}\hat\ell(t) \quad\text{and}\quad \hat{\hat u}(x) = \inf_{s\in[x,1]}\hat u(s).$$
Note also that $\hat{\hat\ell}$ and $\hat{\hat u}$ are isotonic, whereas the raw functions $\hat\ell$ and $\hat u$ need not be.
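In the isotonic case the two formulas above are just running extrema of the raw bounds, e.g. in NumPy (a sketch; the function name is ours):

```python
import numpy as np

def postprocess_isotonic(lower, upper):
    """Running extrema implementing the two formulas above
    (a sketch; the function name is ours)."""
    ll = np.maximum.accumulate(lower)              # sup of lower(t) over t <= x
    uu = np.minimum.accumulate(upper[::-1])[::-1]  # inf of upper(s) over s >= x
    return ll, uu

lo = np.array([0.0, -1.0, 0.5, 0.2])
up = np.array([2.0, 1.0, 3.0, 2.5])
ll, uu = postprocess_isotonic(lo, up)
print(ll.tolist())  # [0.0, 0.0, 0.5, 0.5]
print(uu.tolist())  # [1.0, 1.0, 2.5, 2.5]
```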

In case of $\mathcal G=\mathcal G_{\mathrm{conv}}$ the modified upper bound $\hat{\hat u}$ is the greatest convex minorant of $\hat u$ and can be computed (in discrete models) by means of the pool-adjacent-violators algorithm (cf. Robertson et al. 1988). The modified lower bound $\hat{\hat\ell}(x)$ can be shown to be
$$\hat{\hat\ell}(x) = \max\bigg\{\sup_{0\le s<t\le x}\Big(\hat{\hat u}(s)+\frac{\hat\ell(t)-\hat{\hat u}(s)}{t-s}\,(x-s)\Big),\ \ \sup_{x\le s<t\le 1}\Big(\hat{\hat u}(t)-\frac{\hat{\hat u}(t)-\hat\ell(s)}{t-s}\,(t-x)\Big)\bigg\}.$$
This improved bound $\hat{\hat\ell}$ is not a convex function, though more regular than the raw function $\hat\ell$. Figure 2 depicts some hypothetical confidence band $(\hat\ell,\hat u)$ for a function $f\in\mathcal G_{\mathrm{conv}}$ and its improvement $(\hat{\hat\ell},\hat{\hat u})$.

4.3 Adaptivity in terms of rates

Whenever we construct a band following the recipe above we end up with a confidence band adapting to the unknown smoothness of $f$ in terms of rates of convergence. For $\beta,L>0$ the Hölder smoothness class $\mathcal H^{\beta,L}$ is defined as follows: In case of $0<\beta\le 1$ let
$$\mathcal H^{\beta,L} := \big\{g : |g(x)-g(y)|\le L|x-y|^\beta \ \text{for all } x,y\big\}.$$
In case of $1<\beta\le 2$ let
$$\mathcal H^{\beta,L} := \big\{g\in\mathcal C^1 : g'\in\mathcal H^{\beta-1,L}\big\}.$$

Figure 2: Improvement $(\hat{\hat\ell},\hat{\hat u})$ of a band $(\hat\ell,\hat u)$ if $\mathcal G=\mathcal G_{\mathrm{conv}}$.

Theorem 4.1. Suppose that $f\in\mathcal G\cap\mathcal H^{\beta,L}$, where either $\mathcal G=\mathcal G_\uparrow$ and $\beta\le 1$, or $\mathcal G=\mathcal G_{\mathrm{conv}}$ and $1\le\beta\le 2$. Let $(\hat\ell,\hat u)$ be the confidence band for $f$ based on test functions $\psi^{(\ell)},\psi^{(u)}$ as described previously. Then there exists a constant $\Delta$ depending only on $(\beta,L)$ and $(\psi^{(\ell)},\psi^{(u)})$ such that
$$\sup_{t\in[\epsilon_n,1-\epsilon_n]}\big(\hat u(t)-\hat\ell(t)\big) \ \le\ \Delta\rho_n\Big(1+\frac{\kappa_\alpha+T(\psi^{(u)})+T(-\psi^{(\ell)})}{\log(en)^{1/2}}\Big),$$
where $\epsilon_n := \rho_n^{1/\beta}$ and
$$\rho_n := \Big(\frac{\log(en)}{n}\Big)^{\beta/(2\beta+1)}.$$
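To make the rate concrete, a small computation (ours, not from the paper) evaluates $\rho_n$ for two smoothness levels and shows that smoother $f$, i.e. larger $\beta$, yields a faster rate and hence narrower bands:

```python
import numpy as np

def rho(n, beta):
    """Width rate rho_n = (log(e*n)/n)^(beta/(2*beta+1)) from Theorem 4.1."""
    return (np.log(np.e * n) / n) ** (beta / (2.0 * beta + 1.0))

for n in (100, 10_000, 1_000_000):
    # larger beta (smoother f) gives a faster rate, hence narrower bands
    print(n, rho(n, 1.0), rho(n, 2.0))
```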

Using the same arguments as Khas'minskii (1978) one can show that for any $0\le r<s\le 1$,
$$\inf_{f\in\mathcal G\cap\mathcal H^{\beta,L}}\mathrm{IP}_f\Big\{\sup_{t\in[r,s]}\big(\hat u(t)-\hat\ell(t)\big)\le\Delta\rho_n\Big\} \ \to\ 0,$$
provided that $\Delta>0$ is sufficiently small. Thus our confidence bands adapt to the unknown smoothness of $f$.

4.4 Testing hypotheses about $f(t)$

In order to find suitable kernel functions $\psi^{(\ell)},\psi^{(u)}$ we proceed similarly as Dümbgen and Spokoiny (2001, Section 3.2). That means we consider temporarily tests of the null hypothesis
$$\mathcal F_o := \big\{f\in\mathcal G\cap L_2[0,1] : f(t)\le r-\delta\big\}$$
versus the alternative hypothesis
$$\mathcal F_A := \big\{f\in\mathcal G\cap\mathcal H^{k,L} : f(t)\ge r\big\}.$$
Here $t\in[0,1]$, $r\in\mathbb R$ and $L,\delta>0$ are arbitrary fixed numbers, while

(8)    $(\mathcal G,k) = (\mathcal G_\uparrow,1)$ or $(\mathcal G,k) = (\mathcal G_{\mathrm{conv}},2)$.

Note that $\mathcal F_o$ and $\mathcal F_A$ are closed, convex subsets of $L_2[0,1]$. Suppose that there are functions $f_o\in\mathcal F_o$ and $f_A\in\mathcal F_A$ such that
$$\int_0^1 (f_o-f_A)(x)^2\,dx \;=\; \min_{g_o\in\mathcal F_o,\ g_A\in\mathcal F_A}\int_0^1 (g_o-g_A)(x)^2\,dx.$$
Then optimal tests of $\mathcal F_o$ versus $\mathcal F_A$ are based on the linear test statistic $\int_0^1 (f_A-f_o)\,dY$, where critical values have to be computed under the assumption $f=f_o$. The problem of finding such functions $f_o,f_A$ is treated in Section 8. Here is the conclusion: Let

(9)    $\psi^{(\ell)}(x) := \begin{cases} 1\{x\in[-1,0]\}\,(1+x) & \text{if }\mathcal G=\mathcal G_\uparrow,\\ 1\{x\in[-2,2]\}\,\big(1-(3/2)|x|+x^2/2\big) & \text{if }\mathcal G=\mathcal G_{\mathrm{conv}}.\end{cases}$

Then the functions

(10)    $f_A(s) := \begin{cases} r+L(s-t) & \text{if }\mathcal G=\mathcal G_\uparrow,\\ r+L(s-t)^2/2 & \text{if }\mathcal G=\mathcal G_{\mathrm{conv}},\end{cases}$

and
$$f_o := f_A - \delta\,\psi^{(\ell)}_{h,t} \quad\text{with } h := (\delta/L)^{1/k}$$
solve our minimization problem, provided that $a^{(\ell)}h\le t\le 1-b^{(\ell)}h$. Thus the optimal linear test statistic may be written as $\int_0^1 \psi_{h,t}\,dY = \psi Y(h,t)$. Elementary considerations show that the inequality
$$\hat f^{(\ell)}_h(t) - \frac{\|\psi^{(\ell)}\|\big(\Gamma(d^{(\ell)}h)+\kappa_\alpha\big)}{\langle 1,\psi^{(\ell)}\rangle(nh)^{1/2}} \ \le\ r_o$$
is equivalent to
$$\psi Y(h,t) \ \le\ n^{1/2}h\,r_o\,\langle 1,\psi^{(\ell)}\rangle + h^{1/2}\|\psi^{(\ell)}\|\big(\Gamma(d^{(\ell)}h)+\kappa_\alpha\big) \ =\ \mathrm{IE}_{f_o}\big(\psi Y(h,t)\big) + \mathrm{Var}\big(\psi Y(h,t)\big)^{1/2}\big(\Gamma(d^{(\ell)}h)+\kappa_\alpha\big).$$
Thus our lower confidence bound $\hat\ell$ may be interpreted as a multiple test of all null hypotheses $\{f\in\mathcal G : f(t)\le r_o\}$ with $t\in[0,1]$ and $r_o\in\mathbb R$.

Analogous considerations yield a candidate for $\psi^{(u)}$: Let
$$\mathcal F_o := \big\{f\in\mathcal G\cap L_2[0,1] : f(t)\ge r+\delta\big\}$$
and
$$\mathcal F_A := \big\{f\in\mathcal G\cap\mathcal H^{k,L} : f(t)\le r\big\}.$$
Then the function $f_A$ in (10) and
$$f_o := f_A + \delta\,\psi^{(u)}_{h,t} \quad\text{with } h := (\delta/L)^{1/k}$$
form a least favorable pair $(f_o,f_A)$ in $\mathcal F_o\times\mathcal F_A$, where

(11)    $\psi^{(u)}(x) := \begin{cases} 1\{x\in[0,1]\}\,(1-x) & \text{if }\mathcal G=\mathcal G_\uparrow,\\ 1\{x\in[-2^{1/2},2^{1/2}]\}\,(1-x^2/2) & \text{if }\mathcal G=\mathcal G_{\mathrm{conv}}.\end{cases}$

Figures 3 and 4 depict the functions $\psi^{(\ell)}$ in (9) and $\psi^{(u)}$ in (11).

Figure 3: Kernel functions $\psi^{(\ell)},\psi^{(u)}$ for $\mathcal G_\uparrow$.

Figure 4: Kernel functions $\psi^{(\ell)},\psi^{(u)}$ for $\mathcal G_{\mathrm{conv}}$.

4.5 Optimal constants and local adaptivity

Now we are going to show that our multiscale confidence band $(\hat\ell,\hat u)$, if constructed with the kernel functions in (9) and (11), is locally adaptive in a certain sense. Precisely, we consider an arbitrary fixed function $f_o\in\mathcal G\cap\mathcal C^k$ with $(\mathcal G,k)$ as specified in (8). We analyze quantities such as
$$\|(\hat u-f_o)\,w\|^+_{r,s} \quad\text{and}\quad \|(f_o-\hat\ell)\,w\|^+_{r,s},$$
where $w$ is some positive weight function on the unit interval and
$$\|g\|^+_{r,s} := \sup_{t\in[r,s]} g(t).$$
The function $w$ should reflect local smoothness properties of $f_o$ in an appropriate way. The following theorem demonstrates that the $k$-th derivative of $f_o$, denoted by $\nabla^k f_o$, plays a crucial role.

Theorem 4.2. For arbitrary fixed numbers $0\le r<s\le 1$ let
$$L := \max_{t\in[r,s]} \nabla^k f_o(t).$$
Then for any $\gamma\in\,]0,1[$,
$$\inf_{(\hat\ell,\hat u)} \mathrm{IP}_{f_o}\big\{\|f-\hat\ell\|^+_{r,s} \ge \gamma\,\Delta^{(\ell)} L^{1/(2k+1)}\rho_n\big\} \ \ge\ 1-\alpha+o(1),$$
$$\inf_{(\hat\ell,\hat u)} \mathrm{IP}_{f_o}\big\{\|\hat u-f\|^+_{r,s} \ge \gamma\,\Delta^{(u)} L^{1/(2k+1)}\rho_n\big\} \ \ge\ 1-\alpha+o(1),$$
where both infima are taken over all confidence bands $(\hat\ell,\hat u)$ satisfying (2), and
$$\Delta^{(z)} := \big((k+1/2)\,\|\psi^{(z)}\|^2\big)^{-k/(2k+1)}, \qquad \rho_n := \Big(\frac{\log(en)}{n}\Big)^{k/(2k+1)}.$$
In case of $\mathcal G=\mathcal G_\uparrow$, the critical constants are $\Delta^{(\ell)} = \Delta^{(u)} = 2^{1/3} \approx 1.260$. In case of $\mathcal G=\mathcal G_{\mathrm{conv}}$,
$$\Delta^{(\ell)} = (3/4)^{2/5} \approx 0.891 \quad\text{and}\quad \Delta^{(u)} = 3^{2/5}/128^{1/5} \approx 0.588.$$
This indicates that bounding a convex function from below is more difficult than finding an upper bound.

In view of Theorem 4.2 we introduce for arbitrary fixed $\epsilon>0$ the weight function
$$w_\epsilon := \big(\max(\nabla^k f_o,\,\epsilon)\big)^{-1/(2k+1)}$$
reflecting the local smoothness of $f_o$. The next theorem shows that our particular confidence band $(\hat\ell,\hat u)$ attains the lower bounds of Theorem 4.2 pointwise. Suprema such as $\|(f_o-\hat\ell)w_\epsilon\|^+_{r,s}$ and $\|(\hat u-f_o)w_\epsilon\|^+_{r,s}$ attain their respective lower bounds $\Delta^{(\ell)},\Delta^{(u)}$ up to a multiplicative factor $2^{k/(k+1/2)}+o_p(1)$.

Theorem 4.3. Let $(\hat\ell,\hat u)$ be the confidence band based on the kernel functions in (9) and (11). If $f=f_o$, then for arbitrary $\epsilon>0$ and any $t\in\,]0,1[$,
$$(f_o-\hat\ell)(t)\,w_\epsilon(t) \le \big(\Delta^{(\ell)}+o_p(1)\big)\rho_n, \qquad (\hat u-f_o)(t)\,w_\epsilon(t) \le \big(\Delta^{(u)}+o_p(1)\big)\rho_n.$$
Moreover,
$$\|(f_o-\hat\ell)w_\epsilon\|^+_{\epsilon,1-\epsilon} \le \big(2^{k/(k+1/2)}\Delta^{(\ell)}+o_p(1)\big)\rho_n, \qquad \|(\hat u-f_o)w_\epsilon\|^+_{\epsilon,1-\epsilon} \le \big(2^{k/(k+1/2)}\Delta^{(u)}+o_p(1)\big)\rho_n.$$
If we used kernel functions differing from (9) and (11), then pointwise optimality would be lost, and the constants for the supremum distances would get worse.

5 Simulations and numerical examples

Here we demonstrate the performance of the procedures in Section 4. We replace the continuous white noise model with a discrete one: Suppose that one observes a random vector $\vec Y\in\mathbb R^n$ with components

(12)    $Y_i = f(x_i) + \epsilon_i,$

where $x_i := (i-1/2)/n$, and the random errors $\epsilon_i$ are independent with Gaussian distribution $N(0,\sigma^2)$. Our kernel functions $\psi^{(\ell)}$ and $\psi^{(u)}$ are rescaled as follows:
$$\psi^{(\ell)}(x) := \begin{cases} 1\{x\in[-1,0]\}\,(1+x) & \text{if }\mathcal G=\mathcal G_\uparrow,\\ 1\{x\in[-1,1]\}\,\big(1-3|x|+2x^2\big) & \text{if }\mathcal G=\mathcal G_{\mathrm{conv}},\end{cases}$$
$$\psi^{(u)}(x) := \begin{cases} 1\{x\in[0,1]\}\,(1-x) & \text{if }\mathcal G=\mathcal G_\uparrow,\\ 1\{x\in[-1,1]\}\,(1-x^2) & \text{if }\mathcal G=\mathcal G_{\mathrm{conv}}.\end{cases}$$
Note that now $a^{(\ell)},a^{(u)},b^{(\ell)},b^{(u)}\in\{0,1\}$. For convenience we compute kernel estimators and confidence bounds for $f$ only on the grid $T_n := \{1/n,2/n,\dots,1-1/n\}$, while the bandwidth parameter $h$ is restricted to
$$H_n := \begin{cases}\{1/n,2/n,\dots,1\} & \text{if }\mathcal G=\mathcal G_\uparrow,\\ \{1/n,2/n,\dots,\lfloor n/2\rfloor/n\} & \text{if }\mathcal G=\mathcal G_{\mathrm{conv}}.\end{cases}$$

Let $\psi$ stand for $\psi^{(\ell)}$ or $\psi^{(u)}$ with support $[-a,b]$. Then for $h\in H_n$ and $t\in T_n$ with $ah\le t\le 1-bh$ we define
$$\psi\vec Y(h,t) := \sum_{i=1}^n \psi\Big(\frac{x_i-t}{h}\Big)\,Y_i \;=\; \sum_{j=1-anh}^{bnh} \psi\Big(\frac{j-1/2}{nh}\Big)\,Y_{nt+j}$$
and
$$\hat f_h(t) := \frac{\psi\vec Y(h,t)}{S_{nh}},$$
where $S_d$ stands for $\sum_{j=1-d}^{d}\psi((j-1/2)/d)$. The standard deviation of $\hat f_h(t)$ equals $\sigma_h := \sigma R_{nh}^{1/2}/S_{nh}$, where $R_d := \sum_{j=1-d}^{d}\psi((j-1/2)/d)^2$. Tedious but elementary calculations show that in case of $\mathcal G=\mathcal G_\uparrow$,
$$S_d = d/2 \quad\text{and}\quad R_d = d/3 - 1/(12d).$$
In case of $\mathcal G=\mathcal G_{\mathrm{conv}}$,
$$S^{(\ell)}_d = d/3 - 1/(3d) \quad\text{and}\quad R^{(\ell)}_d = 4d/15 - 1/(2d) + 7/(30d^3),$$
$$S^{(u)}_d = 4d/3 + 1/(6d) \quad\text{and}\quad R^{(u)}_d = 16d/15 + 7/(120d^3).$$
Note that here $S^{(\ell)}_1 = 0 = \psi^{(\ell)}\vec Y(1/n,\cdot)$, whence the bandwidth $1/n$ is excluded from any computation involving $\psi^{(\ell)}$.

As for the bias of these kernel estimators, one can deduce from Lemma 8.1 that $\mathrm{IE}\,\hat f^{(\ell)}_h(t)\le f(t)$ and $\mathrm{IE}\,\hat f^{(u)}_h(t)\ge f(t)$ whenever $f\in\mathcal G$. Here is a discrete version of our multiscale test statistic: $T_n := \max\big(T_n(\psi^{(\ell)}),\,T_n(-\psi^{(u)})\big)$, where
$$T_n(\pm\psi) := \max_{h\in H_n}\ \max_{t\in T_n\cap[ah,1-bh]}\Big(\pm\sigma^{-1}R_{nh}^{-1/2}\,\psi\vec E(h,t) - \Gamma((a+b)h)\Big)$$
with $\vec E := (\epsilon_i)_{i=1}^n$. Let $\kappa_{\alpha,n}$ be the $(1-\alpha)$–quantile of $T_n$. Then
$$\hat\ell(t) := \max_{h\in H_n:\ t\in[a^{(\ell)}h,\,1-b^{(\ell)}h]}\Big(\hat f^{(\ell)}_h(t) - \sigma^{(\ell)}_h\big(\Gamma(d^{(\ell)}h)+\kappa_{\alpha,n}\big)\Big),$$
$$\hat u(t) := \min_{h\in H_n:\ t\in[a^{(u)}h,\,1-b^{(u)}h]}\Big(\hat f^{(u)}_h(t) + \sigma^{(u)}_h\big(\Gamma(d^{(u)}h)+\kappa_{\alpha,n}\big)\Big)$$
defines a confidence band for $f$ such that
$$\mathrm{IP}\big\{\hat\ell\le f\le\hat u \ \text{on } T_n\big\} \ \ge\ 1-\alpha \quad\text{whenever } f\in\mathcal G.$$
Equality holds if $\mathcal G=\mathcal G_\uparrow$ and $f$ is constant, or if $\mathcal G=\mathcal G_{\mathrm{conv}}$ and $f$ is linear. If the noise variance $\sigma^2$ is unknown, it may be estimated as described in Dümbgen and Spokoiny (2001). Then, under moderate regularity assumptions on $f$, our confidence bands have asymptotic coverage probability at least $1-\alpha$ as $n$ tends to infinity.


Critical values. For various values of $n$ we estimated several quantiles $\kappa_{\alpha,n}$ in 9999 Monte Carlo simulations; see Table 1. One can easily show that the critical value $\kappa_{\alpha,n}$ converges to the corresponding quantile $\kappa_\alpha$ for the continuous white noise model as $n\to\infty$. Software for the computation of critical values as well as confidence bands may be obtained from the author's URL.

                 $\mathcal G_\uparrow$                                          $\mathcal G_{\mathrm{conv}}$
    n     $\kappa_{0.5,n}$  $\kappa_{0.1,n}$  $\kappa_{0.05,n}$    $\kappa_{0.5,n}$  $\kappa_{0.1,n}$  $\kappa_{0.05,n}$
  100        0.330            1.092            1.349                0.350            1.053            1.283
  200        0.433            1.146            1.392                0.430            1.121            1.342
  300        0.475            1.169            1.416                0.470            1.126            1.342
  400        0.507            1.204            1.446                0.489            1.128            1.340
  500        0.526            1.222            1.450                0.512            1.143            1.358
  700        0.570            1.252            1.492                0.536            1.162            1.380
 1000        0.585            1.250            1.483                0.552            1.178            1.393

Table 1: Some critical values for the discrete white noise model.
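A Monte Carlo estimate of $\kappa_{\alpha,n}$ along these lines can be sketched as follows (illustrative only: tiny simulation size instead of 9999 runs, $\sigma=1$, and boundary handling simplified via a "valid" sliding window; all function names are ours):

```python
import numpy as np

def Gamma(u):
    return np.sqrt(2.0 * np.log(np.e / u))

def T_n(eps, psi, a, b):
    """Discrete multiscale statistic T_n(psi) for a noise vector eps with
    sigma = 1; support of psi is [-a, b] with a, b in {0, 1}."""
    n = len(eps)
    best = -np.inf
    for m in range(1, n + 1):                 # bandwidth h = m/n
        j = np.arange(1 - a * m, b * m + 1)
        w = psi((j - 0.5) / m)
        R = np.sum(w ** 2)
        if R <= 0:                            # e.g. S_1 = 0 for psi^(l) in the convex case
            continue
        s = np.convolve(eps, w[::-1], mode="valid") / np.sqrt(R)
        best = max(best, s.max() - Gamma((a + b) * m / n))
    return best

# tiny Monte Carlo run for the isotonic class:
rng = np.random.default_rng(1)
n, sims = 100, 200
psi_l = lambda x: np.where((x >= -1) & (x <= 0), 1 + x, 0.0)
psi_u = lambda x: np.where((x >= 0) & (x <= 1), 1 - x, 0.0)
T = [max(T_n(e, psi_l, 1, 0), T_n(-e, psi_u, 0, 1))
     for e in rng.standard_normal((sims, n))]
print(np.quantile(T, 0.95))   # rough counterpart of kappa_{0.05,100} in Table 1
```

Because the statistic is linear in the noise, $T_n(-\psi^{(u)})$ is obtained by feeding $-\vec E$ to the same routine.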

Two numerical examples. Figure 5 shows a simulated data vector $\vec Y$ with $n=500$ components together with the corresponding 95%–confidence band $(\hat\ell,\hat u)$ after postprocessing, where $f$ is assumed to be isotonic. The latter function is depicted as well. Note that the band is comparatively narrow in the middle of $]0,1/3[$, on which $f$ is constant. On $]1/3,1]$ the width $\hat u-\hat\ell$ tends to increase, as does $\nabla f$. These findings are in accordance with Theorem 4.3.

An analogous plot for a convex function $f$ can be seen in Figure 6. Note that the deviation $f-\hat\ell$ is mostly greater than $\hat u-f$, as predicted by Theorem 4.3.

6 Proofs

Proof of Theorem 3.1. In order to prove lower bounds we construct unfavorable subfamilies of $\mathcal G_\uparrow$ similarly as Khas'minskii (1978). For a given integer $m>0$ we define $I_1 := [0,1/m]$ and $I_j := \,](j-1)/m,\,j/m]$ for $1<j\le m$. Then we define step functions $g$ and $h_\xi$ for $\xi\in\mathbb R^m$ via
$$g(t) := 2j-1 \quad\text{and}\quad h_\xi(t) := \xi_j \quad\text{for } t\in I_j,\ 1\le j\le m.$$
For any $\delta>0$ and $\xi\in[-\delta,\delta]^m$ the function $\delta g+h_\xi$ is isotonic on $[0,1]$. Now we restrict our attention to the parametric submodel $\mathcal F_o = \big\{\delta g+h_\xi : \xi\in[-\delta,\delta]^m\big\}$ of $\mathcal G_\uparrow\cap L_2[0,1]$. Any confidence band $(\hat\ell,\hat u)$ for $f=\delta g+h_\xi$ defines a confidence set $S = S_1\times S_2\times\cdots\times S_m$ for $\xi$ via
$$S_j := \Big[\sup_{t\in I_j}\hat\ell(t) - \delta(2j-1),\ \inf_{t\in I_j}\hat u(t) - \delta(2j-1)\Big].$$

Figure 5: Data $\vec Y$ and 95%–confidence band for $f\in\mathcal G_\uparrow$.

Figure 6: Data $\vec Y$ and 95%–confidence band for $f\in\mathcal G_{\mathrm{conv}}$.

Here $\hat\ell\le f\le\hat u$ if, and only if, $\xi\in S$. Moreover,
$$D_\epsilon(\hat\ell,\hat u) \ \ge\ \max_{j=1,\dots,m}\mathrm{length}(S_j) \quad\text{for } 1/(m+1)\le\epsilon<1/m.$$
However,
$$\log\frac{d\,\mathrm{IP}_{\delta g+h_\xi}}{d\,\mathrm{IP}_{\delta g}}(Y) \;=\; n^{1/2}\int_0^1 h_\xi\,d\tilde Y - n\int_0^1 h_\xi(t)^2\,dt\big/2 \;=\; \sum_{j=1}^m\Big((n/m)^{1/2}\xi_j X_j - (n/m)\xi_j^2/2\Big) \;=\; \log\frac{dN_m\big((n/m)^{1/2}\xi,\,I\big)}{dN_m(0,I)}(X),$$
where $\tilde Y(t) := Y(t) - n^{1/2}\int_0^t \delta g(s)\,ds$ and $X := (X_j)_{j=1}^m$ with components
$$X_j := m^{1/2}\big(\tilde Y(j/m) - \tilde Y((j-1)/m)\big).$$
In case of $f=\delta g$ these random variables are independent and standard normal. Consequently, $X$ is a sufficient statistic for the parametric submodel $\mathcal F_o$, with distribution $N_m\big((n/m)^{1/2}\xi,\,I\big)$ in case of $f=\delta g+h_\xi$. In particular, the conditional distribution of $S$ given $X$ does not depend on $\xi$. Hence letting $\delta = (n/m)^{-1/2}c_m$ with $c_m := (2\log m)^{1/2}$, it follows from Theorem 7.1 (b) in Section 7 that for $1/(m+1)\le\epsilon<1/m$,
$$\inf_{f\in\mathcal G_\uparrow\cap L_2[0,1]} \mathrm{IP}_f\Big\{\hat\ell\le f\le\hat u \ \text{and}\ D_\epsilon(\hat\ell,\hat u)\le\frac{2c_m-b_m}{(n/m)^{1/2}}\Big\} \ \le\ \min_{\xi\in[-\delta,\delta]^m}\mathrm{IP}_\xi\Big\{\xi\in S \ \text{and}\ \max_{j=1,\dots,m}\mathrm{length}(S_j)\le\frac{2c_m-b_m}{(n/m)^{1/2}}\Big\} \ \le\ b_m,$$
where $b_1,b_2,b_3,\dots$ are universal positive numbers such that $\lim_{m\to\infty}b_m=0$. This entails the assertion of Theorem 3.1 with $\log(1/\epsilon)$ in place of $\log(e/\epsilon)$ and
$$b(\epsilon) := (2\log(1/\epsilon))^{1/2} - (m\epsilon)^{1/2}(c_m-b_m) \quad\text{for } 1/(m+1)\le\epsilon<1/m.$$
Finally note that $\log(e/\epsilon)^{1/2} = \log(1/\epsilon)^{1/2} + o(1)$ as $\epsilon\downarrow 0$.

Proof of Theorem 4.1. Instead of an upper bound for $\hat u-\hat\ell$ we prove an upper bound for $\hat u-f$, because analogous arguments apply to $f-\hat\ell$. In what follows let $\psi=\psi^{(u)}$ with support $[-a,b]$. For $t\in[0,1]$ and $h>0$ with $ah\le t\le 1-bh$,
$$\hat u(t)-f(t) \ \le\ \hat f_h(t)-f(t) + \frac{\|\psi\|\big(\Gamma((a+b)h)+\kappa_\alpha\big)}{\langle 1,\psi\rangle(nh)^{1/2}}$$
$$=\ \frac{\big\langle f(t+h\,\cdot)-f(t),\,\psi\big\rangle}{\langle 1,\psi\rangle} + \frac{\psi W(h,t)}{n^{1/2}h\,\langle 1,\psi\rangle} + \frac{\|\psi\|\big(\Gamma((a+b)h)+\kappa_\alpha\big)}{\langle 1,\psi\rangle(nh)^{1/2}}$$
$$\le\ \frac{\big\langle f(t+h\,\cdot)-f(t),\,\psi\big\rangle}{\langle 1,\psi\rangle} + \frac{\|\psi\|\big(2\Gamma((a+b)h)+\kappa_\alpha+T(\psi)\big)}{\langle 1,\psi\rangle(nh)^{1/2}}. \tag{13}$$
For any function $g\in\mathcal H^{\beta,L}$,
$$|g(x)-g(0)| \le L|x|^\beta \ \text{if }\beta\le 1, \qquad |g(x)-g(0)-g'(0)x| \le L|x|^\beta \ \text{if } 1<\beta\le 2.$$
Since $f(t+h\,\cdot)\in\mathcal H^{\beta,Lh^\beta}$ if $f\in\mathcal H^{\beta,L}$, this implies that
$$\frac{\big\langle f(t+h\,\cdot)-f(t),\,\psi\big\rangle}{\langle 1,\psi\rangle} \ \le\ \frac{Lh^\beta\int_{-a}^b |x|^\beta\,|\psi(x)|\,dx}{\langle 1,\psi\rangle} \ \le\ \Delta h^\beta.$$
Here and subsequently $\Delta$ denotes a generic constant depending only on $(\beta,L)$ and $\psi$. Its value may vary from one place to another. In case of $t\in[\epsilon_n,1-\epsilon_n]$ and $h=\epsilon_n/\max(a,b)$ the right-hand side of (13) is not greater than
$$\Delta\epsilon_n^\beta + \Delta\,\frac{\log(en)^{1/2}+\kappa_\alpha+T(\psi)}{(n\epsilon_n)^{1/2}} \ =\ \Delta\rho_n\Big(1+\frac{\kappa_\alpha+T(\psi)}{\log(en)^{1/2}}\Big).$$

Proof of Theorem 4.2. We prove only the lower bound for $f_o-\hat\ell$, because $\hat u-f_o$ can be treated analogously. It suffices to consider the case $L>0$ and to show that for any fixed number $\gamma\in\,]0,1[$,
$$\mathrm{IP}_{f_o}\big\{\|f_o-\hat\ell\|^+_{r,s} \ge \gamma\,\Delta^{(\ell)}L^{1/(2k+1)}\rho_n\big\} \ \ge\ 1-\alpha+o(1)$$
for arbitrary confidence bands $(\hat\ell,\hat u) = (\hat\ell_n,\hat u_n)$ satisfying (2). Without loss of generality one may assume that
$$\nabla^k f_o \ \ge\ L \quad\text{on } [r,s].$$
Otherwise one could increase $\gamma$ and decrease $L$ without changing $\gamma L^{1/(2k+1)}$, and replace $[r,s]$ with some nondegenerate subinterval. Let $\psi$ stand for $\psi^{(\ell)}$ with support $[-a,b]$. For $0<h\le(s-r)/(a+b)$ and positive integers $j\le m := \lfloor(s-r)/((a+b)h)\rfloor$ let
$$t_j := r+ah+(j-1)(a+b)h \quad\text{and}\quad f_j := f_o - Lh^k\,\psi_{h,t_j}.$$
It follows from Lemma 8.4 that these functions $f_j$ belong to $\mathcal G\cap L_2[0,1]$. Thus (2) implies that the event
$$A := \big\{\hat\ell\le f_j \ \text{for some } j\le m\big\}$$
satisfies the inequality $\mathrm{IP}_{f_j}(A)\ge 1-\alpha$ for all $j\le m$. Since $\|f_o-f_j\|^+_{r,s} \ge Lh^k$, this entails the inequality
$$\mathrm{IP}_{f_o}\big\{\|f_o-\hat\ell\|^+_{r,s}\ge Lh^k\big\} \ \ge\ \mathrm{IP}_{f_o}(A) \ \ge\ 1-\alpha-\min_{j\le m}\big(\mathrm{IP}_{f_j}(A)-\mathrm{IP}_{f_o}(A)\big).$$


Now let $h := (c\rho_n)^{1/k}$ so that $Lh^k = Lc\rho_n$, where $c>0$ is some number to be specified later. For sufficiently large $n$ this bandwidth $h$ is smaller than $(s-r)/(a+b)$. Then
$$\log\frac{d\,\mathrm{IP}_{f_j}}{d\,\mathrm{IP}_{f_o}}(Y) \;=\; n^{1/2}h^{k+1/2}L\|\psi\|\,X_j - nh^{2k+1}L^2\|\psi\|^2/2,$$
where $X_j := h^{-1/2}\|\psi\|^{-1}\int_0^1\psi_{h,t_j}\,d\tilde Y$ and $\tilde Y(t) := Y(t)-n^{1/2}\int_0^t f_o(x)\,dx$. Thus $X := (X_j)_{j=1}^m$ is a sufficient statistic for the restricted model $\{f_o,f_1,f_2,\dots,f_m\}$, where $\mathcal L_{f_o}(X)$ is a standard normal distribution on $\mathbb R^m$. Thus it follows from Theorem 7.1 (a) and a standard sufficiency argument that
$$\lim_{n\to\infty}\ \min_{1\le j\le m}\big(\mathrm{IP}_{f_j}(A)-\mathrm{IP}_{f_o}(A)\big) = 0 \qquad\text{if}\quad \lim_{n\to\infty}\frac{nh^{2k+1}L^2\|\psi\|^2}{2\log m} < 1.$$
Since $\log m = (1+o(1))\log(n)/(2k+1)$, the limit on the right hand side is equal to
$$c^{(2k+1)/k}L^2\|\psi\|^2(k+1/2)$$
and smaller than one if $c$ equals $\gamma\,\Delta^{(\ell)}L^{-2k/(2k+1)}$. In that case, the lower bound $Lh^k = Lc\rho_n$ for $\|f_o-\hat\ell\|^+_{r,s}$ equals $\gamma\,\Delta^{(\ell)}L^{1/(2k+1)}\rho_n$, as desired.

Proof of Theorem 4.3. Again we restrict our attention to $f_o-\hat\ell$ and let $\psi := \psi^{(\ell)}$ with support $[-a,b]$. For any fixed $\epsilon>0$ and arbitrary $t\in[0,1]$ let $h_t>0$ and
$$L_t := \max_{s\in[t-ah_t,\,t+bh_t]\cap[0,1]} \max\big(\nabla^k f_o(s),\,\epsilon\big).$$
In case of $ah_t\le t\le 1-bh_t$ the inequality $(f_o-\hat\ell)(t)\ge L_t h_t^k$ implies that
$$\hat f_{h_t}(t) - \frac{\|\psi\|\big(\Gamma((a+b)h_t)+\kappa_\alpha\big)}{(nh_t)^{1/2}\langle 1,\psi\rangle} \ \le\ f_o(t)-L_t h_t^k.$$
Since $f=f_o$, this can be rewritten as
$$\frac{\psi W(h_t,t)}{h_t^{1/2}\|\psi\|} \ \le\ -\frac{(nh_t)^{1/2}}{\|\psi\|}\big\langle f_o(t+h_t\,\cdot)-f_o(t)+L_t h_t^k,\ \psi\big\rangle + \Gamma((a+b)h_t)+\kappa_\alpha \ \le\ -n^{1/2}L_t h_t^{k+1/2}\|\psi\| + \Gamma((a+b)h_t)+\kappa_\alpha,$$
where the latter inequality follows from Lemma 8.4 (c). Specifically let
$$h_t := c\,w_\epsilon(t)^2\,\rho_n^{1/k}$$
for some positive constant $c$ to be specified later. By continuity of $\nabla^k f_o$, the weight function $w_\epsilon$ is bounded away from zero and infinity. Hence $h_t\to 0$ and $L_t\max(\nabla^k f_o(t),\epsilon)^{-1}\to 1$, uniformly in $t\in[0,1]$. In particular,
$$\Gamma((a+b)h_t) \ \le\ (k+1/2)^{-1/2}\log(en)^{1/2} \quad\text{for } n\ge n_o,$$
$$n^{1/2}L_t h_t^{k+1/2}\|\psi\| \ \ge\ c^{k+1/2}\|\psi\|\log(en)^{1/2},$$
$$L_t h_t^k \ \le\ w_\epsilon(t)^{-1}c^k(1+b_n)\rho_n,$$
where $n_o$ and $b_n$ are positive numbers depending only on $f_o$, $\epsilon$ and $c$ such that $b_n\to 0$. Consequently, for $n\ge n_o$,
$$ah_t\le t\le 1-bh_t \quad\text{and}\quad (f_o-\hat\ell)(t)\,w_\epsilon(t) \ \ge\ c^k(1+b_n)\rho_n$$
implies that
$$\frac{\psi W(h_t,t)}{h_t^{1/2}\|\psi\|} \ \le\ -\big(c^{k+1/2}\|\psi\|-(k+1/2)^{-1/2}\big)\log(en)^{1/2}+\kappa_\alpha.$$
Whenever $c>(\Delta^{(\ell)})^{1/k}$, the right-hand side of the preceding inequality tends to minus infinity, while the random variable on the left-hand side has mean zero and variance one. Since the limit of $c^k(1+b_n)$ can be arbitrarily close to $\Delta^{(\ell)}$, these considerations show that $(f_o-\hat\ell)(t)\,w_\epsilon(t) \le \big(\Delta^{(\ell)}+o_p(1)\big)\rho_n$ for any fixed $t\in\,]0,1[$.

If $n$ is sufficiently large, then $ah_t\le t\le 1-bh_t$ and
$$\frac{\psi W(h_t,t)}{h_t^{1/2}\|\psi\|} \ \ge\ -T(-\psi)-\Gamma((a+b)h_t)$$
for all $t\in[\epsilon,1-\epsilon]$. Consequently,
$$\sup_{t\in[\epsilon,1-\epsilon]}(f_o-\hat\ell)(t)\,w_\epsilon(t) \ \ge\ c^k(1+b_n)\rho_n$$
implies that
$$T(-\psi) \ \ge\ n^{1/2}L_t h_t^{k+1/2}\|\psi\|-2\Gamma((a+b)h_t)-\kappa_\alpha \ \ge\ \big(c^{k+1/2}\|\psi\|-2(k+1/2)^{-1/2}\big)\log(en)^{1/2}-\kappa_\alpha.$$
Whenever $c>2^{1/(k+1/2)}(\Delta^{(\ell)})^{1/k}$, the right hand side of the preceding inequality tends to infinity. Since the limit of $c^k(1+b_n)$ can be arbitrarily close to $2^{k/(k+1/2)}\Delta^{(\ell)}$, these considerations reveal that $\|(f_o-\hat\ell)w_\epsilon\|^+_{\epsilon,1-\epsilon}$ is not greater than $\big(2^{k/(k+1/2)}\Delta^{(\ell)}+o_p(1)\big)\rho_n$.

7 Some decision theory

Let $X = (X_i)_{i=1}^m$ be a random vector with distribution $N_m(\theta,I)$. In what follows we consider tests $\phi:\mathbb R^m\to[0,1]$ and confidence sets
$$S = S_1\times S_2\times\cdots\times S_m$$
for $\theta$ with random intervals $S_j\subset\mathbb R$. The conditional distribution of $S$, given $X$, does not depend on $\theta$. The possibility of randomized confidence sets $S$, i.e. confidence sets not just being a function of $X$, has to be included for technical reasons. Unless specified differently, asymptotic statements in this section refer to $m\to\infty$.

Theorem 7.1. Let $c_m := (2 \log m)^{1/2}$. There are universal positive numbers $b_m$ with $b_m \to 0$ such that the following two inequalities are satisfied:

(a) For arbitrary tests $\phi$,
$$\min_{j=1,\dots,m} \Bigl( \mathrm{IE}_{(c_m - b_m) e_j} \phi(X) - \mathrm{IE}_0 \phi(X) \Bigr) \;\le\; b_m,$$
where $e_1, e_2, \dots, e_m$ denotes the standard basis of $\mathbb{R}^m$.

(b) For arbitrary confidence sets $S$ as above,
$$\min_{\theta \in [-c_m, c_m]^m} \mathrm{IP}_\theta \Bigl\{ \theta \in S \ \text{and}\ \max_{j=1,\dots,m} \mathrm{length}(S_j) < 2(c_m - b_m) \Bigr\} \;\le\; b_m.$$

Proof of Theorem 7.1. Part (a) is classical and can be proved by a Bayesian argument; see for instance Ingster (1993) or Dümbgen and Spokoiny (2001). In order to prove part (b) we also consider a Bayesian model: Let $\theta$ have independent components, each of which is uniformly distributed on the three-point set $K_m := \{-\kappa_m, 0, \kappa_m\}$, where $\kappa_m := c_m - b_m$ with constants $b_m \in [0, c_m]$ to be specified later on. Let $\mathcal{L}(X \mid \theta) = N_m(\theta, I)$. Let $\mathrm{IP}(\cdot), \mathrm{IE}(\cdot)$ denote probabilities and expectations in this Bayesian context, whereas $\mathrm{IP}_\theta(\cdot), \mathrm{IE}_\theta(\cdot)$ are used in case of a fixed parameter $\theta$. For any confidence set $S$,
$$\min_{\theta \in [-c_m, c_m]^m} \mathrm{IP}_\theta \Bigl\{ \theta \in S \ \text{and}\ \max_{j=1,\dots,m} \mathrm{length}(S_j) < 2\kappa_m \Bigr\} \;\le\; \mathrm{IP} \Bigl\{ \theta \in S \ \text{and}\ \max_{j=1,\dots,m} \mathrm{length}(S_j) < 2\kappa_m \Bigr\} \;\le\; \mathrm{IP}\{\theta \in \tilde S\},$$
where
$$\tilde S \;:=\; \begin{cases} S & \text{if } \max_{j=1,\dots,m} \mathrm{length}(S_j) < 2\kappa_m, \\ \{0\} \times \cdots \times \{0\} & \text{else.} \end{cases}$$
The conditional distribution of $\theta$ given $(X, S)$ is also a product of $m$ probability measures: For any $\eta \in K_m^m$,
$$\mathrm{IP}(\theta = \eta \mid X, S) \;=\; \prod_{i=1}^m g(\eta_i \mid X_i) \quad\text{with}\quad g(z \mid x) \;:=\; \frac{\exp(-(x - z)^2/2)}{\sum_{y \in K_m} \exp(-(x - y)^2/2)}.$$
Since each factor $\tilde S_j$ of $\tilde S$ contains at most two points from $K_m$,
$$\mathrm{IP}\{\theta \in \tilde S\} \;=\; \mathrm{IE}\, \mathrm{IP}(\theta \in \tilde S \mid X, S) \;\le\; \mathrm{IE} \max_{\eta \in K_m^m} \mathrm{IP}(\theta_i \ne \eta_i \ \text{for}\ i = 1, \dots, m \mid X, S) \;=\; \mathrm{IE} \prod_{i=1}^m \Bigl( 1 - \min_{z \in K_m} g(z \mid X_i) \Bigr) \;=\; \Bigl( 1 - \mathrm{IE} \min_{z \in K_m} g(z \mid X_1) \Bigr)^m \;\le\; \Bigl( 1 - 3^{-1}\, \mathrm{IE} \min_{z \in K_m} \exp(-(X_1 - z)^2/2) \Bigr)^m.$$
The latter expectation can be bounded from below as follows:
$$3^{-1}\, \mathrm{IE} \min_{z \in K_m} \exp(-(X_1 - z)^2/2) \;\ge\; 3^{-1}\, \mathrm{IP}\{|X_1| \le b_m/2\} \exp(-(\kappa_m + b_m/2)^2/2) \;\ge\; 3^{-1}\, \mathrm{IP}\{\theta_1 = 0,\, |X_1| \le b_m/2\} \exp(-(c_m - b_m/2)^2/2) \;=\; 9^{-1} (2\pi)^{-1/2} \bigl( b_m + O(b_m^2) \bigr) \exp(c_m b_m/2 - b_m^2/8)\, m^{-1}.$$
In case of $b_m := 1\{m > 1\}\, c_m^{-1/2} = o(1)$ the latter bound is easily seen to be $a_m m^{-1}$ with $a_m = a_m(b_m) \to \infty$. Thus
$$\mathrm{IP}\{\theta \in \tilde S\} \;\le\; (1 - a_m m^{-1})^m \;\to\; 0.$$
Replacing $b_m$ with $\max\{b_m, (1 - a_m m^{-1})^m\}$ yields the assertion of part (b).
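The quantities in this proof are easy to probe numerically. The sketch below (my own illustration, not from the paper) checks that the posterior factors $g(\cdot \mid x)$ are probability distributions on $K_m$, and estimates $3^{-1}\,\mathrm{IE}\min_{z \in K_m} \exp(-(X_1 - z)^2/2)$ by Monte Carlo under the Bayes model, comparing it with the conservative analytic lower bound from the proof:

```python
import math
import random

def g(z, x, K):
    """Posterior weight g(z|x) of z under the uniform three-point prior on K."""
    den = sum(math.exp(-(x - y) ** 2 / 2) for y in K)
    return math.exp(-(x - z) ** 2 / 2) / den

m = 10_000
c_m = math.sqrt(2 * math.log(m))
b_m = c_m ** -0.5                      # the choice b_m = 1{m > 1} c_m^{-1/2}
kappa = c_m - b_m
K = (-kappa, 0.0, kappa)

# each posterior factor is a probability distribution on K_m
assert abs(sum(g(z, 0.7, K) for z in K) - 1.0) < 1e-12

# Monte Carlo estimate of 3^{-1} IE min_z exp(-(X_1 - z)^2 / 2)
# under the Bayes model X_1 = theta_1 + N(0,1), theta_1 uniform on K_m
random.seed(1)
n = 200_000
acc = 0.0
for _ in range(n):
    x1 = random.choice(K) + random.gauss(0.0, 1.0)
    acc += min(math.exp(-(x1 - z) ** 2 / 2) for z in K)
est = acc / (3 * n)

# conservative lower bound from the proof:
# 3^{-1} IP{theta_1 = 0, |X_1| <= b_m/2} exp(-(c_m - b_m/2)^2 / 2)
Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2)))
lower = (1 / 9) * (2 * Phi(b_m / 2) - 1) * math.exp(-(c_m - b_m / 2) ** 2 / 2)
assert est >= lower > 0
```

Both quantities are of order $m^{-1}$ up to the diverging factor $a_m$, which is what drives $(1 - a_m m^{-1})^m \to 0$.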

8 Related optimization problems

As in Section 4 let $(\mathcal{G}, k)$ be either $(\mathcal{G}^\uparrow, 1)$ or $(\mathcal{G}_{\mathrm{conv}}, 2)$. In view of future applications to other regression models we extend our framework slightly and consider $\langle g, h \rangle := \int g h \, d\mu$, $\|g\| := \langle g, g \rangle^{1/2}$ for some measure $\mu$ on the real line such that $\mu(C) < \infty$ for bounded intervals $C \subset \mathbb{R}$.

Let $\psi$ be some bounded function on the real line with $\psi(x) = 0$ for $x \notin [-a, b]$ and $\langle 1, \psi \rangle \ge 0$, where $a, b \ge 0$. The next lemma provides sufficient conditions for one of the following two requirements:
$$\langle g, \psi \rangle \;\le\; g(0) \langle 1, \psi \rangle \quad\text{whenever } g \in \mathcal{G},\ 1_{[-a,b]} g \in L^1(\mu), \qquad (14)$$
$$\langle g, \psi \rangle \;\ge\; g(0) \langle 1, \psi \rangle \quad\text{whenever } g \in \mathcal{G},\ 1_{[-a,b]} g \in L^1(\mu). \qquad (15)$$

Lemma 8.1. Let $\mathcal{G} = \mathcal{G}^\uparrow$ and $\psi \ge 0$. Then $b = 0$ entails condition (14), while $a = 0$ implies condition (15).

Let $\mathcal{G} = \mathcal{G}_{\mathrm{conv}}$ and $\int_{-\infty}^{\infty} x \psi(x) \, \mu(dx) = 0$. Condition (15) is satisfied if $\psi \ge 0$. On the other hand, condition (14) is a consequence of the following two requirements: $\int x^{\pm} \psi(x) \, \mu(dx) = 0$ and
$$\psi \ \begin{cases} \ge 0 & \text{on } [c, d], \\ \le 0 & \text{on } \mathbb{R} \setminus [c, d], \end{cases}$$
for some numbers $c < 0 < d$, where $\mu([-a, c]), \mu([d, b]) > 0$. (Here $y^{+} := \max(y, 0)$ and $y^{-} := \max(-y, 0)$.)
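The convex case of this lemma is Jensen's inequality in disguise, and one can probe it numerically. The sketch below (my own illustration, not from the paper) takes $\mu$ to be Lebesgue measure discretized on $[-1, 1]$ and $\psi(x) = (1 - x^2)_+$, which is nonnegative with $\int x \psi(x) \, dx = 0$ by symmetry, and checks condition (15) for a few convex functions $g$:

```python
import math

N = 20_000
dx = 2 / N
xs = [-1 + (i + 0.5) * dx for i in range(N)]        # midpoint grid on [-1, 1]

def inner(f1, f2):
    """Discretized <f1, f2> with respect to Lebesgue measure on [-1, 1]."""
    return sum(f1(x) * f2(x) for x in xs) * dx

psi = lambda x: max(1 - x * x, 0.0)                 # psi >= 0, supported on [-1, 1]
one = lambda x: 1.0

assert abs(inner(lambda x: x, psi)) < 1e-9          # centering: <x, psi> = 0

for g in (math.exp, lambda x: x * x + x, lambda x: abs(x) - 0.3 * x):
    # condition (15): <g, psi> >= g(0) <1, psi> for convex g
    assert inner(g, psi) >= g(0) * inner(one, psi) - 1e-9
```

This mirrors the proof: $P(dx) = \langle 1, \psi \rangle^{-1} \psi(x) \, \mu(dx)$ is a centered probability measure, so $\mathrm{E}_P\, g \ge g(\mathrm{E}_P\, x) = g(0)$ for convex $g$.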

With Lemma 8.1 at hand one can solve two minimization problems leading to the special kernels in (9) and (11). In both cases we consider two disjoint convex sets $\mathcal{G}_o, \mathcal{G}_A \subset \mathcal{G}$ and construct functions $G_o \in \mathcal{G}_o$, $G_A \in \mathcal{G}_A$ such that
$$\|G_o - G_A\| \;=\; \min_{g_o \in \mathcal{G}_o,\ g_A \in \mathcal{G}_A} \|g_o - g_A\|. \qquad (16)$$

Theorem 8.2. Let $\mathcal{G}_o := \{ g \in \mathcal{G} : g(0) \le -1 \}$ and $\mathcal{G}_A := \{ g \in \mathcal{G} \cap \mathcal{H}_{k,1} : g(0) \ge 0 \}$. In case of $\mathcal{G} = \mathcal{G}^\uparrow$ let $G_A(x) := x$ and
$$G_o(x) \;:=\; \begin{cases} -1 & \text{if } x \in [-1, 0], \\ G_A(x) & \text{else.} \end{cases}$$
In case of $\mathcal{G} = \mathcal{G}_{\mathrm{conv}}$ let $G_A(x) := x^2/2$ and
$$G_o(x) \;:=\; \begin{cases} -1 + (a/2 + 1/a)\, x^{-} + (b/2 + 1/b)\, x^{+} & \text{if } x \in [-a, b], \\ G_A(x) & \text{else,} \end{cases}$$
where $a, b \ge 2^{1/2}$ are chosen such that $\int x^{\pm} (G_A - G_o)(x) \, \mu(dx) = 0$.

Then equation (16) holds in both cases. More precisely, the function $\psi := G_A - G_o$ satisfies the inequalities $\langle 1, \psi \rangle \ge \|\psi\|^2$, (14) and
$$\langle g, \psi \rangle \;\ge\; \|\psi\|^2 - \langle 1, \psi \rangle \quad\text{whenever } g \in \mathcal{H}_{k,1},\ g(0) \ge 0. \qquad (17)$$
In case of $\mu$ being Lebesgue measure, $\psi = G_A - G_o$ coincides with the function $\psi^{(\ell)}$ in (9), where $a = b = 2$.
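For Lebesgue measure, the convex case of this construction can be checked numerically. The sketch below takes $G_A(x) = x^2/2$ and the piecewise-linear $G_o$ above with $a = b = 2$ (so both slope coefficients equal $3/2$); the closed form used for $G_o$ is my reading of the display, with $x^{-} := \max(-x, 0)$ and $x^{+} := \max(x, 0)$. The checks cover the moment conditions determining $a, b$, and the constants $\langle 1, \psi \rangle = 2/3$ and $\|\psi\|^2 = 8/15$ quoted for $\psi^{(\ell)}$ in Lemma 8.4(b):

```python
a = b = 2.0
G_A = lambda x: x * x / 2

def G_o(x):
    # piecewise-linear minorant with G_o(0) = -1, matching G_A at -a and b
    if -a <= x <= b:
        return -1 + (a / 2 + 1 / a) * max(-x, 0) + (b / 2 + 1 / b) * max(x, 0)
    return G_A(x)

psi = lambda x: G_A(x) - G_o(x)

# midpoint quadrature over the support [-a, b]
N = 100_000
dx = (a + b) / N
xs = [-a + (i + 0.5) * dx for i in range(N)]
I = lambda f: sum(f(x) for x in xs) * dx

# moment conditions that determine a and b: int x^{+-} psi(x) dx = 0
assert abs(I(lambda x: max(x, 0) * psi(x))) < 1e-6
assert abs(I(lambda x: max(-x, 0) * psi(x))) < 1e-6
# <1, psi> = 2/3 and ||psi||^2 = 8/15, with <1, psi> >= ||psi||^2
assert abs(I(psi) - 2 / 3) < 1e-6
assert abs(I(lambda x: psi(x) ** 2) - 8 / 15) < 1e-6
assert I(psi) >= I(lambda x: psi(x) ** 2)
```

On $[-2, 2]$ this $\psi$ simplifies to $x^2/2 + 1 - (3/2)|x|$, a nonpositive-tailed kernel of the shape required by Lemma 8.1.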

Theorem 8.3. Let $\mathcal{G}_o := \{ g \in \mathcal{G} : g(0) \ge 1 \}$, $\mathcal{G}_A := \{ g \in \mathcal{G} \cap \mathcal{H}_{k,1} : g(0) \le 0 \}$, and define $G_A$ as in Theorem 8.2. In case of $\mathcal{G} = \mathcal{G}^\uparrow$ let
$$G_o(x) \;:=\; \begin{cases} 1 & \text{if } x \in [0, 1], \\ G_A(x) & \text{else.} \end{cases}$$
In case of $\mathcal{G} = \mathcal{G}_{\mathrm{conv}}$ suppose that $\mu(\,]-\infty, 0[\,), \mu(\,]0, \infty[\,) > 0$ and let
$$G_o(x) \;:=\; \begin{cases} 1 + cx & \text{if } x \in [-a, b], \\ G_A(x) & \text{else,} \end{cases}$$
where $a := -c + (c^2 + 2)^{1/2}$, $b := c + (c^2 + 2)^{1/2}$, and $c$ is chosen such that $\int x (G_o - G_A)(x) \, \mu(dx) = 0$.

Then equation (16) is satisfied in both cases. More precisely, the function $\psi := G_o - G_A$ satisfies the inequalities $\langle 1, \psi \rangle \ge \|\psi\|^2$, (15) and
$$\langle g, \psi \rangle \;\le\; \langle 1, \psi \rangle - \|\psi\|^2 \quad\text{whenever } g \in \mathcal{H}_{k,1},\ g(0) \le 0. \qquad (18)$$
In case of $\mu$ being Lebesgue measure, $\psi = G_o - G_A$ coincides with the function $\psi^{(u)}$ in (11), where $c = 0$ and $a = b = 2^{1/2}$.

The following lemma summarizes essential properties of the optimal kernels $\psi^{(\ell)}$ and $\psi^{(u)}$.

Lemma 8.4. Let $\psi^{(\ell)}$ and $\psi^{(u)}$ be the kernel functions in (9) and (11), and let $h, L > 0$ and $t \in \mathbb{R}$.

(a) If $\mathcal{G} = \mathcal{G}^\uparrow$, then $\langle 1, \psi^{(\ell)} \rangle = \langle 1, \psi^{(u)} \rangle = 1/2$ and $\|\psi^{(\ell)}\|^2 = \|\psi^{(u)}\|^2 = 1/3$. If $f : \mathbb{R} \to \mathbb{R}$ satisfies $f(y) - f(x) \ge L(y - x)$ for all $x < y$, then
$$f - Lh\, \psi^{(\ell)}_{h,t},\ f + Lh\, \psi^{(u)}_{h,t} \;\in\; \mathcal{G}^\uparrow.$$

(b) If $\mathcal{G} = \mathcal{G}_{\mathrm{conv}}$, then $\langle 1, \psi^{(\ell)} \rangle = 2/3$, $\|\psi^{(\ell)}\|^2 = 8/15$, $\langle 1, \psi^{(u)} \rangle = 2^{5/2}/3$ and $\|\psi^{(u)}\|^2 = 2^{9/2}/15$. Let $f : \mathbb{R} \to \mathbb{R}$ be absolutely continuous with derivative $f'$ such that $f'(y) - f'(x) \ge L(y - x)$ for all $x < y$. Then
$$f - Lh^2\, \psi^{(\ell)}_{h,t},\ f + Lh^2\, \psi^{(u)}_{h,t} \;\in\; \mathcal{G}_{\mathrm{conv}}.$$

(c) In general, for any function $f \in \mathcal{H}_{k,L}$,
$$\bigl\langle f(t + h\,\cdot) - r + Lh^k,\ \psi^{(\ell)} \bigr\rangle \;\ge\; Lh^k \|\psi^{(\ell)}\|^2 \quad\text{if } f(t) \ge r,$$
$$\bigl\langle f(t + h\,\cdot) - r - Lh^k,\ \psi^{(u)} \bigr\rangle \;\le\; -Lh^k \|\psi^{(u)}\|^2 \quad\text{if } f(t) \le r.$$
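The constants in parts (a) and (b) can be verified by quadrature, assuming the closed forms that Theorems 8.2 and 8.3 yield for Lebesgue measure: $\psi^{(\ell)}(x) = (1 + x) 1_{[-1,0]}(x)$ and $\psi^{(u)}(x) = (1 - x) 1_{[0,1]}(x)$ in the isotonic case (the latter takes $G_o = 1$ on $[0, 1]$, so that $G_o(0) \ge 1$), and $\psi^{(u)}(x) = (1 - x^2/2) 1_{[-\sqrt{2}, \sqrt{2}]}(x)$ in the convex case. These closed forms are my reading of the theorems, not quoted from the paper:

```python
# each entry: psi, its support, and the claimed <1, psi> and ||psi||^2
kernels = {
    "iso lower":  (lambda x: 1 + x,         (-1.0, 0.0),       1 / 2,       1 / 3),
    "iso upper":  (lambda x: 1 - x,         (0.0, 1.0),        1 / 2,       1 / 3),
    "conv upper": (lambda x: 1 - x * x / 2, (-2 ** 0.5, 2 ** 0.5), 2 ** 2.5 / 3, 2 ** 4.5 / 15),
}

N = 100_000
for name, (psi, (lo, hi), m1, m2) in kernels.items():
    dx = (hi - lo) / N
    xs = [lo + (i + 0.5) * dx for i in range(N)]   # midpoint quadrature
    i1 = sum(psi(x) for x in xs) * dx              # <1, psi>
    i2 = sum(psi(x) ** 2 for x in xs) * dx         # ||psi||^2
    assert abs(i1 - m1) < 1e-6, name
    assert abs(i2 - m2) < 1e-6, name
    assert i1 >= i2          # <1, psi> >= ||psi||^2 in every case
```

The convex lower kernel $\psi^{(\ell)}$ (with $\langle 1, \psi^{(\ell)} \rangle = 2/3$, $\|\psi^{(\ell)}\|^2 = 8/15$) follows the same pattern from the construction in Theorem 8.2 with $a = b = 2$.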

Proof of Lemma 8.1. The assertions for $\mathcal{G} = \mathcal{G}^\uparrow$ are a simple consequence of $g \le g(0)$ on $]-\infty, 0]$ and $g \ge g(0)$ on $[0, \infty[$.

Now let $\mathcal{G} = \mathcal{G}_{\mathrm{conv}}$. If $\psi \ge 0$ and $\int x \psi(x) \, \mu(dx) = 0$, then condition (15) follows from Jensen's inequality applied to the probability measure $P(dx) = \langle 1, \psi \rangle^{-1} \psi(x) \, \mu(dx)$.

On the other hand, suppose that $\psi \ge 0$ on $[c, d]$ and $\psi \le 0$ on $\mathbb{R} \setminus [c, d]$, where $c < 0 < d$ and $\mu([-a, c]), \mu([d, b]) > 0$. For $g \in \mathcal{G}_{\mathrm{conv}}$ with $1_{[-a, b]} g \in L^1(\mu)$, both $g(c)$ and $g(d)$ have to be finite, and we define
$$\tilde g(x) \;:=\; g(x) - \begin{cases} d^{-1} (g(d) - g(0))\, x & \text{if } x \ge 0, \\ c^{-1} (g(c) - g(0))\, x & \text{if } x \le 0. \end{cases}$$
