
1998, Vol. 26, No. 1, 288–314

NEW GOODNESS-OF-FIT TESTS AND THEIR APPLICATION TO NONPARAMETRIC CONFIDENCE SETS¹

By Lutz Dümbgen

Universität Heidelberg and Medizinische Universität zu Lübeck

Suppose one observes a process V on the unit interval, where dV = f_o dt + dW with an unknown parameter f_o ∈ L¹[0,1] and standard Brownian motion W. We propose a particular test of one-point hypotheses about f_o which is based on suitably standardized increments of V. This test is shown to have desirable consistency properties if, for instance, f_o is restricted to various Hölder classes of functions. The test is mimicked in the context of nonparametric density estimation, nonparametric regression and interval-censored data. Under shape restrictions on the parameter, such as monotonicity or convexity, we obtain confidence sets for f_o adapting to its unknown smoothness.

1. Introduction. Suppose one observes a stochastic process V_n = F_n + n^{-1/2}W on [0,1], where F_n is an unknown parameter in C[0,1] with F_n(0) = 0, W is standard Brownian motion on [0,1], and n > 1 is a known scale parameter. Estimation within this model is closely related to estimation of regression functions or densities based on samples of size n; see Brown and Low (1996) and Nussbaum (1996). Let C_n(V_n, α) be a confidence set for F_n with coverage probability 1−α ∈ (0,1). Given a model M ⊂ C[0,1] for F_n and any function φ on M, the set φ(C_n(V_n, α) ∩ M) is obviously a (1−α)-confidence set for φ(F_n). Numerous applications of this type are described, for instance, by Donoho (1988), Davies (1995) and Hengartner and Stark (1995). The first two authors investigate sets C_n(V_n, α) based on standard goodness-of-fit tests such as the Kolmogorov–Smirnov test. In the context of density estimation, Hengartner and Stark (1995) utilize a special test criterion which may, but need not, give optimal confidence bands. The present paper introduces a new type of goodness-of-fit test such that the resulting confidence sets φ(C_n(V_n, α) ∩ M) have optimal size in terms of rates of convergence simultaneously for various classes M and functionals φ.

Suppose we want to test the hypothesis "F_n = 0" versus "F_n ≠ 0." For fixed numbers 0 ≤ s < t ≤ 1 and r ∈ R\{0} consider the special alternative

"F_n(·) = ±r Leb([s,t] ∩ [0,·])." Then an optimal test rejects for large values of nV_n(s,t)²/(t−s); generally h(s,t) stands for the increment h(t) − h(s) of a function h on the line. Now we combine these special test statistics. Suppose that the triplet (r, s, t) above has an improper prior distribution with Lebesgue

Received May 1996; revised October 1997.

¹Research supported in part by European Union Human Capital and Mobility Program ERB CHRX-CT 940693.

AMS 1991 subject classifications. 62G07, 62G15.

Key words and phrases. Adaptivity, conditional median, convexity, distribution-free, interval censoring, modality, monotonicity, signs of residuals, spacings.


density 1{(s,t) ∈ Δ_δ}(t−s)^{1/2}, where Δ_δ = {(s,t) ∈ [0,1]² : 0 < t−s ≤ δ} and 0 < δ ≤ 1. Then the corresponding Bayes test statistic is given by T(n^{1/2}V_n), where

T(h) = ∫∫_{Δ_δ} exp( h(s,t)² / (2(t−s)) ) ds dt.

This statistic is reminiscent of a goodness-of-fit statistic proposed by Shorack and Wellner (1982). The main difference is the exponential function in the integrand, which is essential for our results. It is true, but not obvious, that T(W) is finite almost surely with continuous distribution; see Section 5. (Note that E T(W) = ∞.) Thus we reject the hypothesis "F_n = 0" at level α if T(n^{1/2}V_n) exceeds c_α, the (1−α)-quantile of L(T(W)). The corresponding (1−α)-confidence set for F_n is given by

C_n(V_n, α) = { F ∈ C[0,1] : F(0) = 0, T(n^{1/2}(V_n − F)) ≤ c_α }.

As for the power of this test and the size of C_n(V_n, α), note that T(·) is convex on C[0,1]. Hence

2T(n^{1/2}(G−F)/2) ≤ T(n^{1/2}(V_n−G)) + T(n^{1/2}(V_n−F))    (1)

for arbitrary F, G ∈ C[0,1]. In particular, letting G = 0 and F = F_n shows that T(n^{1/2}V_n) → ∞ in probability whenever T(n^{1/2}F_n/2) tends to infinity. For any fixed F_o ≠ 0, it follows from Fatou's lemma that T(n^{1/2}F_o/2) → ∞. Hence T(·) yields an omnibus test. Unless stated otherwise, asymptotic statements refer to n → ∞.
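
The following minimal sketch (added here, not part of the original paper) shows one way to approximate T(n^{1/2}V_n) and the critical value c_α numerically. The grid size m, the number of simulations, and the default δ = 1 are arbitrary illustration choices, and the double integral is replaced by a plain Riemann sum.

```python
import numpy as np

def T_statistic(h, delta=1.0):
    """Discretized T(h): sum of exp(h(s,t)^2 / (2(t-s))) over grid pairs s < t
    with t - s <= delta, weighted by the grid cell area 1/m^2."""
    h = np.asarray(h, dtype=float)
    m = len(h) - 1                                  # h given on 0, 1/m, ..., 1
    idx = np.arange(m + 1)
    d = (idx[None, :] - idx[:, None]) / m           # d[j, k] = (k - j)/m
    incr = h[None, :] - h[:, None]                  # incr[j, k] = h(k/m) - h(j/m)
    mask = (d > 0) & (d <= delta)
    return np.exp(incr[mask] ** 2 / (2.0 * d[mask])).sum() / m**2

def simulate_c(alpha, m=100, sims=2000, delta=1.0, rng=None):
    """Monte Carlo (1-alpha)-quantile of T(W), W standard Brownian motion on the grid."""
    rng = np.random.default_rng(rng)
    stats = []
    for _ in range(sims):
        W = np.concatenate([[0.0], np.cumsum(rng.normal(scale=m ** -0.5, size=m))])
        stats.append(T_statistic(W, delta))
    return float(np.quantile(stats, 1 - alpha))

# To test "F_n = 0": reject at level alpha if
#   T_statistic(np.sqrt(n) * V_n_on_grid, delta) > simulate_c(alpha),
# where V_n_on_grid denotes the (hypothetical) values of V_n at 0, 1/m, ..., 1.
```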

In order to investigate the power of T(·) more thoroughly, we consider the set

C^{(1)}_n(V_n, α) = { f ∈ L¹[0,1] : ∫_0^· f(x) dx ∈ C_n(V_n, α) }.

This is certainly a (1−α)-confidence set for the L¹-derivative f_n of F_n (if existent). We consider the intersection of C^{(1)}_n(V_n, α) with Hölder smoothness classes: let I ⊂ R be an interval and β = k + γ with a nonnegative integer k and 0 < γ ≤ 1. Then F(β, L, I) stands for the set of all real functions f that are k times differentiable on I such that

|f^{(k)}(x) − f^{(k)}(y)| ≤ L|x−y|^γ for all x, y ∈ I;

here f^{(k)} denotes the kth derivative of f (where f^{(0)} = f). Further we define the supremum norm ‖f‖_I = sup_{x∈I} |f(x)|. All subsequent consistency results are formulated in terms of

ρ_n = log(n)/n.

Theorem 1.1. For arbitrary fixed β, L > 0, let I_n ⊂ [0,1] be an interval with length

Leb(I_n) ≥ ρ_n^{1/(2β+1)}.

Then there exists a constant R = R(β, L) such that

sup{ ‖f−g‖_{I_n} : f, g ∈ C^{(1)}_n(V_n, α), f−g ∈ F(β, L, I_n) } ≤ R ρ_n^{β/(2β+1)}

for any fixed α ∈ (0,1) and sufficiently large n.

This result has two straightforward consequences. Suppose that F_n is differentiable with derivative f_n ∈ F(β, L, [0,1]). When testing "f_n = 0" versus

"f_n ∈ { f ∈ F(β, L, [0,1]) : ‖f‖_{[0,1]} ≥ ε_n }"

at fixed level α, it was shown by Ingster (1993) that the maximin power converges to one or α as ρ_n^{−β/(2β+1)} ε_n tends to infinity or zero, respectively. In fact,

T(n^{1/2}V_n) →_p ∞ provided that ‖f_n‖_{[0,1]} / ρ_n^{β/(2β+1)} → ∞.

Thus our test 1{T(n^{1/2}V_n) ≥ c_α} is asymptotically optimal in terms of rates of consistency for arbitrary Hölder classes. Another interesting reference in the context of nonparametric testing is Spokoiny (1996).

A second implication is that

sup{ ‖f − f_n‖_{[0,1]} : f ∈ C^{(1)}_n(V_n, α) ∩ F(β, L, [0,1]) } ≤ O_p( ρ_n^{β/(2β+1)} ).

Note that the confidence set C^{(1)}_n(V_n, α) ∩ F(β, L, [0,1]) may be empty, where sup ∅ := −∞. In that case F(β, L, [0,1]) is regarded as a questionable model for f_n. The rate O_p(ρ_n^{β/(2β+1)}) was shown by Khas'minskii (1978) to be optimal for estimating f_n ∈ F(β, L, [0,1]) under sup-norm loss.
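
As an added heuristic (not taken from the paper's text), the exponent β/(2β+1) can be recovered by balancing smoothness against the detection threshold of T: by Lemma 5.2 a deviation of size η between two functions whose difference lies in F(β, L, I) persists on an interval of length of order η^{1/β}, while the proof of Theorem 1.1 shows that T(n^{1/2}·) detects a deviation of size η on an interval of length ℓ once nη²ℓ is of larger order than log n.

```latex
\[
  \eta \;\gtrsim\; \Bigl(\tfrac{\rho_n}{\ell}\Bigr)^{1/2},
  \qquad
  \ell \;\asymp\; \eta^{1/\beta}
  \quad\Longrightarrow\quad
  \eta \;\asymp\; \rho_n^{\beta/(2\beta+1)} .
\]
```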

Smoothness assumptions such as "f_n ∈ F(β, L, [0,1])" are difficult to justify in practice. It would be desirable to have a (1−α)-confidence set for f_n whose size is automatically of the right order of magnitude, depending on the unknown smoothness of f_n. As pointed out by Low (1997), this is essentially impossible. However, some adaptivity is possible if f_n satisfies shape restrictions such as monotonicity. Restrictions of this type are indeed plausible in many applications. Precisely, we shall investigate the classes

F_↑(I) = { f nondecreasing on I } and F_↓(I) = −F_↑(I),

F_conv(I) = { f convex on I },

F_cc(I) = { f convex–concave or concave–convex on I }.

Rather than doing so in the present white noise model, we propose and analyze modifications of T for three different models.

Section 2 investigates tests for distribution functions on the line and their application to density estimation. Let X_n = (X_{1n}, X_{2n}, ..., X_{nn}) be the order statistic of n independent random variables with unknown distribution function F_n in F_c, the set of all continuous distribution functions on the line.

Recall that (F_n(X_{in}))_{1≤i≤n} is distributed as the order statistic of n independent random variables with uniform distribution on [0,1] [cf. Shorack and Wellner (1986), Chapter 1]. Thus

{ F ∈ F_c : (F(X_{in}))_{1≤i≤n} ∈ B_n }

defines a confidence set for F_n whose coverage probability depends only on the set B_n ⊂ [0,1]^n. Hengartner and Stark (1995) constructed confidence bands for shape-restricted densities (monotonicity or unimodality) with the help of simultaneous confidence bounds for F_n((X_{(i−1)K,n}, X_{iK,n}]), 1 ≤ i ≤ n/K, where X_{0n} := −∞, X_{n+1,n} := ∞ and K = K(n) is a bandwidth parameter. One can get rid of the tuning parameter K by considering (essentially) all spacings (X_{jn}, X_{kn}], 0 ≤ j < k ≤ n+1, in a suitable way. Our particular modification results in greater computational complexity involving convex rather than linear programming, the reward being (almost) optimal rates of convergence for several functions of F_n.

Section 3 is concerned with nonparametric regression. Suppose that one observes Y_{in} = f_n(t_{in}) + E_{in}, 1 ≤ i ≤ n, with an unknown function f_n on R^d, fixed design points t_{in} ∈ R^d and independent errors E_{in} with median zero. Davies (1995) obtained tests and confidence sets for (functions of) f_n via inversion of the runs test, applied to the random vector

sign(Y_n, f) = ( sign(Y_{in} − f(t_{in})) )_{1≤i≤n},

where f is a candidate for f_n. For a different application of sign tests in nonparametric regression, see Müller (1991). We propose a test criterion, also based on sign(Y_n, f), that yields adaptively optimal confidence bands for f_n. These results complement the literature on point estimation under shape restrictions [cf. Mammen (1991) and the references therein]. Some numerical examples for our confidence bands are given.

A possible application of the present methods to interval-censored observations is discussed briefly in Section 4. For a detailed treatment of efficient estimation within this model, see Groeneboom and Wellner (1992).

All proofs are deferred to Section 5.

2. Distribution functions and density estimation. The idea is to replace the process n^{1/2}(V_n − F) in Section 1 with the process

t ↦ n^{1/2}( F(X_{(n+1)t, n}) − t ).

Let D̄_n denote the set of pairs (j, k) of integers such that 0 ≤ j < k ≤ n+1. Note that F_n((X_{jn}, X_{kn}]) has a Beta distribution with parameters k−j and n+1−k+j [cf. Shorack and Wellner (1986), Chapter 3.1]. We utilize the following bounds for tail probabilities of the Beta distribution.

Proposition 2.1. For 0 < p < 1, define

Λ(x, p) = p log(p/x) + (1−p) log((1−p)/(1−x))

if x ∈ (0,1), and Λ(x, p) = ∞ otherwise. Let B be a random variable with distribution Beta(mp, m(1−p)), m > 0. Then

P(B ≥ x) ≤ exp(−m Λ(x, p)) for x ≥ p,    P(B ≤ x) ≤ exp(−m Λ(x, p)) for x ≤ p.

The function Λ(·, p) is strictly convex on (0,1) with minimum Λ(p, p) = 0. For any c ≥ 0, Λ(x, p) ≤ c implies that

−(2p(1−p)c)^{1/2} − (1−2p)^− c ≤ x − p ≤ (2p(1−p)c)^{1/2} + (1−2p)^+ c.

With δ_{jkn} = (k−j)/(n+1) the precise definition of our test statistic is

T_n(X_n, F) = n^{-2} Σ_{(j,k) ∈ D_n} exp( n Λ( F((X_{jn}, X_{kn}]), δ_{jkn} ) ),

where D_n = { (j,k) ∈ D̄_n : δ_{min,n} ≤ δ_{jkn} ≤ δ_n } is a nonvoid subset of D̄_n determined by numbers 0 < δ_{min,n} ≤ δ_n ≤ 1. A possible reason for using a lower bound δ_{min,n} > 1/(n+1) for δ_{jkn} is discretization error in the data X_{in}. It is assumed throughout that

δ_{min,n} = O(ρ_n) and δ_n → δ.

Using an upper bound δ_n < 1 reduces computational complexity and emphasizes smaller intervals. With the (1−α)-quantile b_{nα} of T_n(X_n, F_n), the set

C_n(X_n, α) = { F ∈ F_c : T_n(X_n, F) ≤ b_{nα} }

is a (1−α)-confidence set for F_n. The next proposition summarizes some properties of T_n(X_n, F_n) and C_n(X_n, α).
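
Before turning to that proposition, here is a small computational sketch (added for illustration, not from the paper) of T_n(X_n, F) and of the distribution-free quantile b_{nα}. The function names, the Monte Carlo size, and the concrete choices of δ_{min,n} and δ_n are assumptions of this sketch.

```python
import numpy as np

def Lambda(x, p):
    """Lambda(x, p) = p log(p/x) + (1-p) log((1-p)/(1-x)) from Proposition 2.1,
    with Lambda(x, p) = +inf for x outside (0, 1)."""
    if not 0.0 < x < 1.0:
        return np.inf
    return p * np.log(p / x) + (1 - p) * np.log((1 - p) / (1 - x))

def T_n(x_sorted, F, delta_min, delta_bar):
    """T_n(X_n, F): n^{-2} times the sum of exp(n Lambda(F((X_jn, X_kn]), delta_jkn))
    over all pairs 0 <= j < k <= n+1 with delta_min <= (k - j)/(n + 1) <= delta_bar."""
    n = len(x_sorted)
    # F evaluated at X_0n = -inf, X_1n, ..., X_nn, X_{n+1,n} = +inf
    u = np.concatenate([[0.0], F(np.asarray(x_sorted, dtype=float)), [1.0]])
    total = 0.0
    for j in range(n + 1):
        for k in range(j + 1, n + 2):
            d = (k - j) / (n + 1)
            if delta_min <= d <= delta_bar:
                total += np.exp(n * Lambda(u[k] - u[j], d))
    return total / n**2

def simulate_b(n, alpha, delta_min, delta_bar, sims=1000, rng=None):
    """Monte Carlo (1-alpha)-quantile b_{n,alpha} of T_n(X_n, F_n): since the values
    F_n(X_in) are uniform order statistics, simulating uniform samples suffices."""
    rng = np.random.default_rng(rng)
    stats = [T_n(np.sort(rng.uniform(size=n)), lambda v: v, delta_min, delta_bar)
             for _ in range(sims)]
    return float(np.quantile(stats, 1 - alpha))
```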

Proposition 2.2.

(a) T_n(X_n, F_n) →_d ∫∫_{Δ_δ} exp( B(s,t)² / (2(t−s)(1−t+s)) ) ds dt, where B is a Brownian bridge on [0,1].

(b) There is a constant K_o depending only on (D_n)_n such that the following inequalities hold for any α ∈ (0,1) and n greater than some integer n_o(α) ≥ 2:

|F(J) − G(J)| ≤ ( K_o ρ_n F(J) )^{1/2} + K_o ρ_n

for arbitrary F, G ∈ C_n(X_n, α) and intervals J ⊂ R.

Part (b) is the key to various consistency results. One particular application is confidence bands for monotone densities. Similarly to Section 1, we define C^{(1)}_n(X_n, α) to be the set of all probability densities on the line whose distribution function belongs to C_n(X_n, α). A possible notion of consistency is in terms of Hausdorff distance between graphs [cf. Marron and Tsybakov (1995)]. In case of monotone functions this is essentially equivalent to considering a Lévy distance: for functions f, g ∈ F_↓(I), define

d(f, g, I) = inf{ ε > 0 : ( f(x+ε) − g(x) ) ∨ ( g(x+ε) − f(x) ) ≤ ε whenever x, x+ε ∈ I }.

Theorem 2.3 (Monotone densities). Let f and g be arbitrary probability densities in C^{(1)}_n(X_n, α) ∩ F_↓(I) for some interval I ⊂ R. With K_o and n_o(α) as in Proposition 2.2(b),

d(f, g, I ∩ [a, ∞)) ≤ ( K_o f(a) ρ_n )^{1/3} + ( K_o ρ_n )^{1/2} for all a ∈ I,

provided that n ≥ n_o(α).

In addition suppose that f ∈ F(β, L, I) for some β ∈ (0,1]. Then there exists a constant K_1 = K_1(K_o, β, L) such that

g(x) − f(x) ≤ K_1 ( f(x) ρ_n )^{β/(2β+1)} + K_1 ρ_n^{β/(β+1)}    for x ∈ I with x − ( f(x) ρ_n )^{1/(2β+1)} − ρ_n^{1/(β+1)} ∈ I,

f(x) − g(x) ≤ K_1 ( f(x) ρ_n )^{β/(2β+1)} + K_1 ρ_n^{β/(β+1)}    for x ∈ I with x + ( inf_{y∈I} f(y) ρ_n )^{1/(2β+1)} ∈ I,

provided that n ≥ n_o(α).

Analogous inequalities hold in case of f, g being nondecreasing on some interval. Theorem 2.3 also applies to unimodal or piecewise monotone densities. For instance, let F_uni be the class of unimodal distributions. That means, F ∈ F_uni if it has a density which is nondecreasing on (−∞, m(F)] and nonincreasing on [m(F), ∞) for some real number m(F), a mode of F. Let F_n = F_o ∈ F_uni with unique mode m(F_o) and density f_o. Theorem 2.4 below shows that for any fixed neighborhood [s, t] of m(F_o), with high asymptotic probability the mode m(F) of any F ∈ C_n(X_n, α) ∩ F_uni is contained in [s, t]. In that case Theorem 2.3 applies to the two intervals (−∞, s] and [t, ∞), respectively, so that C_n(X_n, α) ∩ F_uni gives nontrivial confidence bands for f_o.

Theorem 2.4 (Inference about the mode). Suppose that F_n = F_o ∈ F_uni with unique mode m(F_o). Then for any α ∈ (0,1),

sup{ |m(F) − m(F_o)| : F ∈ C_n(X_n, α) ∩ F_uni } →_p 0.

In particular, suppose that the density f_o of F_o satisfies

lim_{x→m(F_o)} ( f_o(m(F_o)) − f_o(x) ) / ( m(F_o) − x )² = γ > 0.    (2)

Then

sup{ |m(F) − m(F_o)| : F ∈ C_n(X_n, α) ∩ F_uni } = O_p( ρ_n^{1/5} ).

The rate O(ρ_n^{1/5}) for estimating m(F_o) is close to the optimal rate O(n^{−1/5}) [cf. Khas'minskii (1979) and Romano (1988)].

3. Confidence sets for regression functions. Given an index set T_n = {t_{1n}, t_{2n}, ..., t_{nn}} of n points in R^d, let Y_n be a random vector in R^n with components

Y_{in} = f_n(t_{in}) + E_{in}

for some unknown function f_n on R^d and a random error E_n ∈ R^n having independent components E_{in} with median zero. That means P(E_{in} ≥ 0) ∧ P(E_{in} ≤ 0) ≥ 1/2. For a function f on R^d, define

sign(Y_n, f) = { s ∈ {−1, 1}^n : sign(Y_{in} − f(t_{in})) ∈ {0, s_i} for 1 ≤ i ≤ n }.

This somewhat unusual definition is made in order to deal with possibly discrete error distributions. Let ξ_n be uniformly distributed on {−1, 1}^n. If all components of Y_n have a continuous distribution, then sign(Y_n, f_n) = {sign(E_n)} consists almost surely of one point whose distribution is L(ξ_n). In general, one can easily couple the random vectors ξ_n and E_n in such a way that

ξ_n ∈ sign(Y_n, f_n) almost surely.    (3)

Now we define the test statistic

T_n(Y_n, f) = min{ τ_n(s) : s ∈ sign(Y_n, f) },

τ_n(s) = (#A_n)^{-1} Σ_{A ∈ A_n} exp( ( Σ_{i=1}^n 1{t_{in} ∈ A} s_i )² / (2 #A) ),

where A_n is a family of nonvoid subsets of T_n. A corresponding (1−α)-confidence set for f_n is given by

C^{(1)}_n(Y_n, α) = { f : T_n(Y_n, f) ≤ c_{nα} },

with c_{nα} denoting the (1−α)-quantile of L(τ_n(ξ_n)). Note that T_n(Y_n, f_n) ≤ τ_n(ξ_n) almost surely if (3) holds.

Example. Let n = m^d for some integers m, d > 0. Then define

T_n^{(d)} = {1/m, 2/m, ..., 1}^d,

A_n^{(d)}(δ_n) = { (x, y] ∩ T_n^{(d)} : x, y ∈ T_n^{(d)} with 0 < y_i − x_i ≤ δ_n for all i },

where (x, y] = ∏_{i=1}^d (x_i, y_i] and δ_n → δ. Here #A_n ≤ ( m(m−1)/2 )^d ≤ n². Table 1 gives some Monte Carlo estimates for c_{nα} in dimension one.
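
The following sketch (added here, not from the paper) shows one way to compute τ_n(s) for the family A_n^{(1)}(δ_n) and to estimate c_{nα} by Monte Carlo, in the spirit of Table 1 below. The vectorization over interval widths and the default number of simulations are choices of this sketch.

```python
import numpy as np

def tau_n(s, delta):
    """tau_n(s) for A_n^{(1)}(delta): average of exp((sum_{t_in in A} s_i)^2 / (2 #A))
    over the discrete intervals A = {(j+1)/n, ..., k/n}, 1 <= j < k <= n, (k-j)/n <= delta."""
    s = np.asarray(s, dtype=float)
    n = len(s)
    cums = np.concatenate([[0.0], np.cumsum(s)])
    total, count = 0.0, 0
    for width in range(1, int(np.floor(delta * n)) + 1):   # width = k - j = #A
        j = np.arange(1, n - width + 1)                    # left endpoints x = j/n
        block = cums[j + width] - cums[j]                  # sum of the s_i with t_in in A
        total += np.exp(block**2 / (2.0 * width)).sum()
        count += len(j)
    return total / count

def critical_value(n, delta, alpha, sims=2000, rng=None):
    """Monte Carlo (1-alpha)-quantile c_{n,alpha} of tau_n(xi_n), xi_n uniform on {-1,1}^n.
    (Table 1 reports values based on 20000 simulations; a smaller default keeps this fast.)"""
    rng = np.random.default_rng(rng)
    stats = [tau_n(rng.choice([-1.0, 1.0], size=n), delta) for _ in range(sims)]
    return float(np.quantile(stats, 1 - alpha))
```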

Here is the analogue to Proposition 2.2.

Proposition 3.1. Suppose that #A_n → ∞.

(a) In general,

τ_n(ξ_n) = O_p( log #A_n ).

Table 1
Estimated quantiles c_{nα} of τ_n(ξ_n) for (T_n^{(1)}, A_n^{(1)}(δ_n)) (20000 simulations)

                  A_n^{(1)}(0.5)                       A_n^{(1)}(0.25)
   n     c_{n,0.5}  c_{n,0.1}  c_{n,0.05}     c_{n,0.5}  c_{n,0.1}  c_{n,0.05}
  50       1.85       3.85       5.56           1.97       3.51       4.47
 100       1.87       4.13       6.17           2.02       3.96       5.43
 150       1.92       4.29       6.53           2.05       4.12       5.77
 200       1.93       4.44       6.67           2.08       4.22       5.82
 250       1.95       4.50       6.73           2.08       4.34       6.37
 300       1.96       4.53       7.03           2.10       4.35       6.43
 400       1.96       4.64       7.15           2.12       4.55       6.60
 500       1.98       4.65       7.19           2.13       4.55       6.66

Suppose that A_n ⊂ { D ∩ T_n : D ∈ 𝒟 } for some Vapnik–Cervonenkis class 𝒟 of subsets of R^d, and let

lim sup_{n→∞} (#A_n)^{-1} Σ_{A ∈ A_n} log(n/#A) < ∞.

Then τ_n(ξ_n) = O_p(1). In particular, for (T_n, A_n) = (T_n^{(1)}, A_n^{(1)}(δ_n)),

(δ − δ²/2) τ_n(ξ_n) →_d T(W).

(b) Let H: (0,1] → (0,∞) be a fixed nondecreasing function such that all random vectors E_n satisfy

P( E_{in} ≤ H(u) ) ∧ P( E_{in} ≥ −H(u) ) ≥ (1+u)/2 for 1 ≤ i ≤ n, u ∈ (0,1].

Then for any fixed α ∈ (0,1), the probability that

min_{t∈A}( f(t) − f_n(t) ) ∨ min_{t∈A}( f_n(t) − f(t) ) > H( ( 3 log(#A_n) / #A )^{1/2} )

for some A ∈ A_n and some f ∈ C^{(1)}_n(Y_n, α) tends to zero, where H(u) := ∞ for u > 1.

As for part (b), if all components of E_n are Gaussian with variance not greater than τ², one may take H(u) = τ Φ^{-1}((1+u)/2) with the standard normal quantile function Φ^{-1}. A key condition on H is

lim sup_{u↓0} H(u)/u < ∞.    (4)
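
As an added remark (not in the original text), condition (4) is easily verified for the Gaussian choice of H just described, since Φ^{-1}(1/2 + v) = √(2π) v + O(v³) as v → 0:

```latex
\[
  H(u) \;=\; \tau\,\Phi^{-1}\!\Bigl(\tfrac{1+u}{2}\Bigr)
        \;=\; \tau\sqrt{\tfrac{\pi}{2}}\,u + O(u^{3})
  \qquad (u \downarrow 0),
  \qquad\text{so}\qquad
  \limsup_{u\downarrow 0}\frac{H(u)}{u} \;=\; \tau\sqrt{\tfrac{\pi}{2}} \;<\; \infty .
\]
```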

Theorem 3.2 (Isotonic regression). Let all f_n belong to the class F_↑([0,1]^d) of functions f on [0,1]^d such that f(s) ≤ f(t) whenever s ≤ t componentwise.

For f, g ∈ F_↑([0,1]^d), define a Lévy distance

d(f, g) = inf{ ε > 0 : ( f(s) − g(s + ε1) ) ∨ ( g(s) − f(s + ε1) ) ≤ ε whenever s, s+ε1 ∈ [0,1]^d },

where 1 = (1, 1, ..., 1)' ∈ R^d. Let (T_n, A_n) = (T_n^{(d)}, A_n^{(d)}(δ_n)) as above. Suppose that the assumption of Proposition 3.1(b) holds with a function H satisfying (4). Then

sup{ d(f, f_n) : f ∈ C^{(1)}_n(Y_n, α) ∩ F_↑([0,1]^d) } ≤ O_p( ρ_n^{1/(2+d)} ).

Theorem 3.3 (Convex–concave regression). Let (T_n, A_n) = (T_n^{(1)}, A_n^{(1)}(δ_n)). Suppose that all f_n belong to F_cc([0,1]) ∩ F(β, L, [a,b]) for some 0 ≤ a < b ≤ 1, β ∈ (0,2] and L > 0. Further suppose that the assumption of Proposition 3.1(b) holds with a function H satisfying (4). Then

sup{ ‖f − f_n‖_{[a + ρ_n^{1/(2β+1)}, b − ρ_n^{1/(2β+1)}]} : f ∈ C^{(1)}_n(Y_n, α) ∩ F_cc([0,1]) } = O_p( ρ_n^{β/(2β+1)} ).

Numerical examples. In all subsequent examples we consider the pair (T_n^{(1)}, A_n^{(1)}(0.25)). We simulated data Y_{in} (shown as dots) having logistic distribution with mean f_n(i/n) (shown as dotted line) and standard deviation v_{in}. Point estimators and confidence bands are shown as solid lines.

Figure 1 shows two data vectors Y_n with n = 200 and v_{in} = 0.4. We minimized T_n(Y_n, f) over all f ∈ F_conv([0,1]). In both examples the minimum turned out to be unique, although this is not necessarily the case. This led to a sign vector s_min, and the solid lines represent the functions

t ↦ min{ f(t) : f ∈ F_conv([0,1]), s_min ∈ sign(Y_n, f) },

t ↦ max{ f(t) : f ∈ F_conv([0,1]), s_min ∈ sign(Y_n, f) }.

The regression function f_n was taken to be

f_n(t) = ( (1 − 5t/2) ∨ (5t/3 − 2/3) )²  and

f_n(t) = ( (1 − 5t/2) ∨ (5t/3 − 2/3) )² + 1{2/5 < t < 4/5} sin(5πt)/2,

respectively. The corresponding observed p-value P(τ_n(ξ_n) ≥ τ_n(s_min)) was estimated in 40000 Monte Carlo simulations. In the first example it turned out greater than 0.99. In fact, the distance between estimator and f_n is small in comparison with the noise level v_{in}. In the second example the Monte Carlo p-value was 0.027, so that the nonconvexity of f_n is detected at level 0.05.
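
A Monte Carlo p-value of the kind just reported can be approximated with a few lines of code. The sketch below (added, not from the paper) assumes a statistic s ↦ τ_n(s), for instance the tau_n sketched after the Example in Section 3, and a candidate sign vector obtained from a fitted regression function; all names here are illustrative.

```python
import numpy as np

def sign_vector(y, f_values):
    """One element of sign(Y_n, f) when no residual Y_in - f(t_in) is exactly zero."""
    return np.where(np.asarray(y) - np.asarray(f_values) >= 0, 1.0, -1.0)

def mc_p_value(tau, s_observed, n, sims=40000, rng=None):
    """Monte Carlo estimate of P(tau_n(xi_n) >= tau_n(s_observed)),
    with xi_n uniform on {-1,1}^n and tau the map s -> tau_n(s)."""
    rng = np.random.default_rng(rng)
    observed = tau(s_observed)
    hits = sum(tau(rng.choice([-1.0, 1.0], size=n)) >= observed for _ in range(sims))
    return hits / sims
```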

Figures 2, 3 and 4 depict examples of confidence bands, that is, the envelope functions

t ↦ min{ f(t) : f ∈ C^{(1)}_n(Y_n, 0.1) ∩ F },

t ↦ max{ f(t) : f ∈ C^{(1)}_n(Y_n, 0.1) ∩ F }.

Fig. 1. Point estimators for f_n in F_conv([0,1]).

Precisely, in Figure 2 the parameters are n = 250, v_{in} = 0.5 and

f_n(t) = 1{t ≥ 1/2},    F = F_↑([0,1]).

In Figure 3 we have n = 250, v_{in} = 0.3 and

f_n(t) = ( (1 − 3t) ∨ (3t/2 − 1/2) )²,    F = F_conv([0,1]).

Fig. 2. Envelope of C^{(1)}_n(Y_n, 0.1) ∩ F_↑([0,1]).

Fig. 3. Envelope of C^{(1)}_n(Y_n, 0.1) ∩ F_conv([0,1]).

Fig. 4. Envelopes of C^{(1)}_n(Y_n, 0.1) ∩ F_↑([0,1]) and C^{(1)}_n(Y_n, 0.1) ∩ F_↑([0,1]) ∩ F_cc([0,1]).

Figure 4 depicts heteroscedastic data Y_{in} with n = 200 and

f_n(t) = (3t − 1)⁺ ∧ 1,    v_{in} = ( 1 + f_n(i/n) )/4.

The two plots show the envelopes of C^{(1)}_n(Y_n, 0.1) ∩ F_↑([0,1]) and C^{(1)}_n(Y_n, 0.1) ∩ F_↑([0,1]) ∩ F_cc([0,1]), respectively. Here the additional constraint "f_n ∈ F_cc([0,1])" led to considerably smaller confidence bands.

4. Interval censoring. Let X̃_{1n}, X̃_{2n}, ..., X̃_{nn} be independent, identically distributed random variables with distribution function F_n. Rather than X̃_{in}, one only observes Z_{in} = 1{X̃_{in} ≤ r_{in}}, 1 ≤ i ≤ n, where r_{1n} ≤ r_{2n} ≤ ... ≤ r_{nn} are given censoring times (viewed as fixed). Given a hypothetical distribution function F, let

Z_n(t, F) = 2 n^{-1/2} Σ_{i ≤ nt} ( Z_{in} − F(r_{in}) ), t ∈ [0,1].

Then our test statistic for "F_n = F" is T_n(Z_n(·, F)), where

T_n(h) = n^{-2} Σ_{(s,t) ∈ Δ_n(δ_n)} exp( h(s,t)² / (2(t−s)) )

with Δ_n(δ_n) = Δ_{δ_n} ∩ {1/n, 2/n, ..., 1}² and δ_n → δ. Unfortunately the (1−α)-quantile d_{nα}(F_n) of the distribution of T_n(Z_n(·, F_n)) depends on the unknown function F_n. However, the case F_n(r_{1n}) = F_n(r_{nn}) = 1/2 is the worst case asymptotically. The corresponding quantile is denoted by d_{nα}. We define C_n(Z_n, α) to be the set of all distribution functions F such that T_n(Z_n(·, F)) ≤ d_{nα}.

Proposition 4.1. (a) For any fixed α ∈ (0,1),

lim_{n→∞} d_{nα} = c_α and lim sup_{n→∞} P( T_n(Z_n(·, F_n)) ≥ d_{nα} ) ≤ α.

(b) For any fixed α ∈ (0,1) there is a constant K = K(α) such that

lim_{n→∞} P( ( F(s) − F_n(t) ) ∨ ( F_n(s) − F(t) ) ≥ ( K ρ_n / µ_n([s,t]) )^{1/2} for some F ∈ C_n(Z_n, α) and some s < t ) = 0,

where µ_n(·) = n^{-1} Σ_{i=1}^n 1{ r_{in} ∈ · }.

Part (b) implies consistency of C_n(Z_n, α) in various senses under certain conditions on the sequence of distributions (µ_n)_n. We mention only two simple consequences:

Theorem 4.2. (a) Suppose that F_n = F_o is continuous and that µ_n converges weakly to a probability measure µ such that support(µ) ⊃ support(F_o). Then

sup{ ‖F − F_o‖_R : F ∈ C_n(Z_n, α) } →_p 0.

(b) Suppose that F_n(0) = 0 and F_n ∈ F(β, L, [0,∞)) for some β ∈ (0,1] and L > 0. Further let r_{1n}, r_{2n}, ..., r_{nn} be the order statistic of independent, identically distributed random variables R_1, R_2, ..., R_n with distribution µ having continuous density µ′ on (0,∞). Then

sup{ ‖F − F_n‖_I : F ∈ C_n(Z_n, α) } = O_p( ρ_n^{β/(2β+1)} )

for any compact subset I of { t ≥ 0 : µ′(t) > 0 }.

5. Proofs. In order to verify the finiteness of T(W), let

Ť(W) = sup_{(s,t) ∈ Δ_1} W(s,t)² / ( 2(t−s) log(e/(t−s)) ).

According to Lévy's theorem on W's modulus of continuity, 1 ≤ Ť(W) < ∞ almost surely [cf. Shorack and Wellner (1986), Theorem 14.1.1]. Now the key point is that

E[ 1{Ť(W) ≤ M} T(W) ] ≤ ∫∫_{Δ_δ} E[ 1{ W(s,t)²/(2(t−s)) ≤ M log(e/(t−s)) } exp( W(s,t)²/(2(t−s)) ) ] ds dt
≤ ∫∫_{Δ_δ} ( 1 + 2M log(e/(t−s)) ) ds dt < ∞

for any constant M > 1; see Lemma 5.1. Continuity of L(T(W)) follows from the fact that T is strictly convex on the set { G ∈ C[0,1] : G(0) = 0 }. Indeed, one can write W(t) = B(t) + ξt, 0 ≤ t ≤ 1, with a Brownian bridge B and a standard Gaussian variable ξ such that B and ξ are independent. Thus, conditional on B, the test statistic T(W) is a strictly convex function of ξ, whence continuously distributed. This consideration shows in addition that the support of L(T(W)) is connected.

Lemma 5.1. Let X be a nonnegative random variable such that P(X ≥ r) ≤ 2 exp(−r) for all r ≥ 0 (e.g., X = Z²/2 with Z ∼ N(0,1)). Then for all γ, l > 0,

E[ 1{X ≤ l} exp(γX) ] ≤ 1 + 2l if γ = 1,   E[ 1{X ≤ l} exp(γX) ] ≤ 1 + 2( exp((γ−1)l) − 1 ) / (1 − 1/γ) if γ ≠ 1.

Proof. The expectation of 1{X ≤ l} exp(γX) equals

∫_0^∞ P( X ≤ l and exp(γX) > r ) dr ≤ 1 + ∫_1^{exp(γl)} P( exp(γX) > r ) dr ≤ 1 + 2 ∫_1^{exp(γl)} r^{−1/γ} dr,

which equals 1 + 2l if γ = 1 and 1 + 2( exp((γ−1)l) − 1 ) / (1 − 1/γ) if γ ≠ 1.

The proofs of Theorems 1.1 and 3.3 are based on a lemma on Hölder classes of functions.


Lemma 5.2. For β, L > 0 there is a universal constant K_{β,L} > 0 such that for arbitrary compact intervals I ⊂ R and any f ∈ F(β, L, I) the following hold.

(a) There is an interval J(f) ⊂ I such that

|f| ≥ K_{β,L} ‖f‖_I on J(f),    Leb(J(f)) ≥ K_{β,L} ( ‖f‖_I^{1/β} ∧ Leb(I) ).

(b) If β ≤ 2, then for arbitrary g ∈ F_cc(I) there is an interval J(f, g) ⊂ I such that

|g − f| ≥ |g(x_o) − f(x_o)|/4 on J(f, g),    Leb(J(f, g)) ≥ K_{β,L} ( |g(x_o) − f(x_o)|^{1/β} ∧ Leb(I) ),

where x_o denotes the midpoint of I.

Proof of Lemma 5.2(a). Let x_1 ∈ I with |f(x_1)| = ‖f‖_I, and define γ = ‖f‖_I^{1/β} ∧ Leb(I).

If 0 < β ≤ 1, then |f(x)| ≥ ‖f‖_I − L|x − x_1|^β ≥ ‖f‖_I/2 for any point x in J(f) = [x_1 − (2L)^{-1/β}γ, x_1 + (2L)^{-1/β}γ] ∩ I, where Leb(J(f)) ≥ ( (2L)^{-1/β} ∧ 2^{-1} ) γ.

For β > 1 we use induction on k = k(β). Suppose the assertion is true for (β − 1, L) in place of (β, L). If |f(x)| ≥ ‖f‖_I/2 for all x ∈ J′(f) = [x_1 − γ/2, x_1 + γ/2] ∩ I, the assertion would be true with J(f) = J′(f) and K_{β,L} = 1/2.

Otherwise let x_2 ∈ J′(f) with |f(x_2)| ≤ ‖f‖_I/2. Then

‖f^{(1)}‖_I ≥ |f(x_1) − f(x_2)| / |x_1 − x_2| ≥ ‖f‖_I / γ.

By assumption, since f^{(1)} ∈ F(β−1, L, I), there is an interval J″(f) ⊂ I such that

|f^{(1)}| ≥ K_{β−1,L} ‖f‖_I / γ on J″(f),

Leb(J″(f)) ≥ K_{β−1,L} ( (‖f‖_I/γ)^{1/(β−1)} ∧ Leb(I) ) ≥ K_{β−1,L} γ.

Hence, if a_0 = inf J″(f) and a_i = a_0 + (i/4) Leb(J″(f)), then

|f(a_i) − f(a_{i−1})| ≥ 4^{-1} K_{β−1,L}² ‖f‖_I for 1 ≤ i ≤ 4.

In addition, f is strictly monotone on J″(f) by continuity of f^{(1)}. Hence one easily verifies that |f| ≥ 4^{-1} K_{β−1,L}² ‖f‖_I on [a_0, a_1] or on [a_3, a_4].

Proof of Lemma 5.2(b). At first we consider the special case where I = [−4, 4], f ≡ 0, g(0) = 1 and g is convex–concave on [−4, 4]. Under these assumptions there is an interval J ⊂ [−4, 4] such that

|g| ≥ 1/2 on J and Leb(J) ≥ 1.

Obviously, this is true if g > 1/2 on [−1, 0] or on [0, 1]. Otherwise, let −1 ≤ x_1 < 0 < x_2 ≤ 1 with g(x_1) ∨ g(x_2) ≤ 1/2. Convex-concavity of g implies that

g is concave on [x̃, 4] and convex on [−4, x̃] for some x̃ ∈ [−4, 1], because otherwise g would be convex on [−1, 1]. If x̃ ≤ 0, then the L¹-derivative g^{(1)} of g satisfies

g^{(1)}(x) ≤ g^{(1)}(x_2) ≤ ( g(x_2) − 1 )/x_2 ≤ −1/2 for x_2 ≤ x < 4.

If x̃ > 0, then convexity of g on [−4, x̃] and g(x_1) < 1 together imply that g(x̃) > 1, whence

g^{(1)}(x) ≤ g^{(1)}(x_2) ≤ ( g(x_2) − g(x̃) )/( x_2 − x̃ ) < −1/2 for x_2 ≤ x < 4.

Thus

g(x) ≤ g(x_2) + ∫_{x_2}^{x} g^{(1)}(r) dr ≤ 1/2 − (x − x_2)/2 ≤ −1/2 for 3 ≤ x < 4.

With the help of affine transformations, one can deduce that in the general case for any 0 < γ ≤ Leb(I)/2 there is an interval J(f, g, γ) ⊂ [x_o − γ, x_o + γ] ⊂ I such that

|G| ≥ |g(x_o) − f(x_o)|/2 on J(f, g, γ) and Leb(J(f, g, γ)) ≥ γ/4,

where

G(x) = g(x) − f(x_o) if 0 < β ≤ 1,    G(x) = g(x) − f(x_o) − f^{(1)}(x_o)(x − x_o) if 1 < β ≤ 2,

is also in F_cc(I). But for x ∈ [x_o − γ, x_o + γ],

|G(x) − (g − f)(x)| = |f(x) − f(x_o)| if 0 < β ≤ 1,   |G(x) − (g − f)(x)| = | ∫_{x_o}^{x} ( f^{(1)}(r) − f^{(1)}(x_o) ) dr | if 1 < β ≤ 2,

≤ Lγ^β ≤ |g(x_o) − f(x_o)|/4,

provided that γ ≤ (4L)^{-1/β} |g(x_o) − f(x_o)|^{1/β}.

Proof of Theorem 1.1. Let F, G be continuous functions on [0,1] with L¹-derivatives f, g ∈ C^{(1)}_n(V_n, α), respectively, such that f − g ∈ F(β, L, I_n). Then by (1),

T(n^{1/2}(F − G)/2) ≤ 2^{-1}( T(n^{1/2}(V_n − F)) + T(n^{1/2}(V_n − G)) ) ≤ c_α.

Given any fixed number R ≥ 1, suppose that |f(x) − g(x)| ≥ R ρ_n^{β/(2β+1)} for some x ∈ I_n. According to Lemma 5.2 there exists an interval J = J(f, g, n) ⊂ I_n such that

|f − g| ≥ K R ρ_n^{β/(2β+1)} on J,

Leb(J) ≥ K ( ‖f − g‖_{I_n}^{1/β} ∧ Leb(I_n) ) ≥ K ρ_n^{1/(2β+1)},

where K denotes a generic positive constant depending only on (β, L) but possibly different in various places. If J_1 and J_3 denote the left and right third of J, respectively, then for s ∈ J_1 and t ∈ J_3,

t − s ≥ K ρ_n^{1/(2β+1)},    (F − G)(s,t)² / (2(t − s)) ≥ K R² ρ_n^{2β/(2β+1)} (t − s) ≥ K R² ρ_n.

Thus

T(n^{1/2}(F − G)/2) ≥ ∫∫ 1{s ∈ J_1} 1{t ∈ J_3} exp( K R² n ρ_n ) ds dt ≥ K n^{K R² − 2/(2β+1)}.

For n and R sufficiently large, the latter bound exceeds c_α. In that case ‖f − g‖_{I_n} is necessarily smaller than R ρ_n^{β/(2β+1)}.

Proof of Proposition 2.1. Let G and G′ be independent Gamma-distributed random variables with mean mp and m(1−p), respectively. Then B = G/(G + G′) has the desired Beta distribution, and for p < x < 1,

P(B ≥ x) = P( (1−x)G − xG′ ≥ 0 )
≤ inf_{r > 0} E exp( r(1−x)G − rxG′ )
= inf_{0 < r < 1/(1−x)} exp( −mp log(1 − r(1−x)) − m(1−p) log(1 + rx) )
= exp( −m Λ(x, p) ),

where r_min = (x − p)/(x(1−x)) is the minimizing argument. With κ = p(1−p) and γ = 1 − 2p one can write

Λ(x, p) = ∫_0^{x−p} r / (κ + γr − r²) dr ≥ ∫_0^{x−p} r / (κ + γr) dr
  ≥ κ^{-1}(x−p)²/2 if p ≥ 1/2,
  = κ γ^{-2} H( κ^{-1} γ (x − p) ) if p < 1/2,

where H(y) = y − log(1+y) is strictly increasing in y ≥ 0. It follows easily from the series expansion of exp(·) that H^{-1}(y) ≤ (2y)^{1/2} + y. Thus Λ(x, p) ≤ c implies that

x − p ≤ (2κc)^{1/2} if p ≥ 1/2,    x − p ≤ κ γ^{-1} H^{-1}( κ^{-1} γ² c ) ≤ (2κc)^{1/2} + γc if p < 1/2.

For 0 < x < p the assertions follow from the fact that 1 − B ∼ Beta(m(1−p), mp) and Λ(x, p) = Λ(1−x, 1−p).

Here is a modified version of Lemma VII.9 of Pollard (1984), which is con- venient for our purposes. The proof is essentially the same.


Lemma 5.3 (Chaining). Let S = (S(t))_{t∈T} be a stochastic process on a totally bounded metric space (T, ρ) having continuous sample paths. Let Q be a measurable, nonnegative function on (0,∞)² such that for all η, δ > 0 and s, t ∈ T,

P( |S(s) − S(t)| ≥ ρ(s,t) Q(η, δ) ) ≤ 2 exp(−η) if ρ(s,t) ≥ δ.

Then

P( |S(s) − S(t)| > 12 J(ρ(s,t), a) for some s, t ∈ T with ρ(s,t) ≤ δ ) ≤ 2δ/a

for arbitrary a, δ > 0, where

J(ε, a) = ∫_0^ε Q( log( a D(u)²/u ), u ) du,

D(u) = sup{ #T_o : T_o ⊂ T, ρ(s,t) > u for different s, t ∈ T_o }.

Proof of Proposition 2.2(a). At first it is shown that

T̃_n = max_{(j,k) ∈ D̄_n} [ n Λ( F_n((X_{jn}, X_{kn}]), δ_{jkn} ) / log( δ_{jkn}^{-1} (1 − δ_{jkn})^{-1} ) ] = O_p(1).

It follows from Proposition 2.1 that for η_n > 0,

P( max_{(j,k) ∈ D̄_n} n Λ( F_n((X_{jn}, X_{kn}]), δ_{jkn} ) ≥ η_n ) ≤ (n+2)(n+1) exp(−η_n).

If η_n = 3 log n, the latter bound tends to zero. Thus for arbitrary fixed 0 < γ < 1/2,

max_{(j,k) ∈ D̄_n : δ_{jkn}(1−δ_{jkn}) ≤ n^{-γ}} [ n Λ( F_n((X_{jn}, X_{kn}]), δ_{jkn} ) / log( δ_{jkn}^{-1}(1−δ_{jkn})^{-1} ) ] = O_p(1).

On the other hand,

max_{(j,k) ∈ D̄_n : δ_{jkn}(1−δ_{jkn}) ≥ n^{-γ}} [ n Λ̃( F_n((X_{jn}, X_{kn}]), δ_{jkn} ) / log( δ_{jkn}^{-1}(1−δ_{jkn})^{-1} ) ] = O_p(1),    (5)

where Λ̃(x, p) = (2p(1−p))^{-1}(x − p)². This follows, for instance, from the Chaining Lemma 5.3 applied to the uniform quantile process S(j/(n+1)) = (n+1)^{1/2} F_n(X_{jn}) on T = { j/(n+1) : 0 ≤ j ≤ n+1 }, equipped with ρ(s, t) = Var(B(s,t))^{1/2}. Elementary calculations show that D(u) ≤ 2/u² for 0 < u ≤ 1. Further, one can easily deduce from Proposition 2.1 that

Q(η, δ) = 2η^{1/2} + max( (n+1)^{1/2} δ, 1 )^{-1} η

satisfies the assertion of Lemma 5.3. Then elementary calculations show that

J(ε, a) ≤ K_a ( ε ( log(1/ε) )^{1/2} + n^{-1/2} (log n)² )

for all ε ∈ (0, 1/2] and some constant K_a not depending on n. Alternatively one may deduce (5) from the Hungarian approximation [cf. Shorack and Wellner (1986), Chapter 12.2]. But (5) implies that

max_{(j,k) ∈ D̄_n : δ_{jkn}(1−δ_{jkn}) ≥ n^{-γ}} | logit( F_n((X_{jn}, X_{kn}]) ) − logit( δ_{jkn} ) | →_p 0,

where logit(x) = log( x/(1−x) ). Elementary calculations show that Λ(x, p)/Λ̃(x, p) → 1 as |logit(x) − logit(p)| → 0. Thus one may replace Λ̃ in (5) with Λ and obtains that T̃_n = O_p(1).

Analogously one can show that

Ť(B) = sup_{(s,t) ∈ Δ_1} B(s,t)² / ( 2ρ(s,t)² log( ρ(s,t)^{-2} ) )

is finite almost surely.

Now it follows from Lemma 5.1 that for arbitrary ε, M > 0,

E[ 1{T̃_n ≤ M} n^{-2} Σ_{(j,k) ∈ D_n : δ_{jkn}(1−δ_{jkn}) ≤ ε} exp( n Λ( F_n((X_{jn}, X_{kn}]), δ_{jkn} ) ) ]
≤ n^{-2} Σ_{(j,k) ∈ D_n : δ_{jkn}(1−δ_{jkn}) ≤ ε} ( 1 + 2M log( δ_{jkn}^{-1}(1−δ_{jkn})^{-1} ) )
→ ∫∫_{(s,t) ∈ Δ_δ : ρ(s,t)² ≤ ε} ( 1 + 2M log( ρ(s,t)^{-2} ) ) ds dt,

E[ 1{Ť(B) ≤ M} ∫∫_{(s,t) ∈ Δ_δ : ρ(s,t)² ≤ ε} exp( B(s,t)² / (2ρ(s,t)²) ) ds dt ]
≤ ∫∫_{(s,t) ∈ Δ_δ : ρ(s,t)² ≤ ε} ( 1 + 2M log( ρ(s,t)^{-2} ) ) ds dt.

This bound tends to zero as ε ↓ 0. Moreover, S(·) = S_n(·) converges in distribution to B if it is suitably extended to S_n ∈ C[0,1], whence

n^{-2} Σ_{(j,k) ∈ D_n : δ_{jkn}(1−δ_{jkn}) ≥ ε} exp( n Λ( F_n((X_{jn}, X_{kn}]), δ_{jkn} ) )
→_d ∫∫_{(s,t) ∈ Δ_δ : ρ(s,t)² ≥ ε} exp( B(s,t)² / (2ρ(s,t)²) ) ds dt.

[Here we applied Rubin's extended continuous mapping theorem; see Billingsley (1968), Theorem 5.5.] These two facts together imply the asserted convergence in distribution of T_n(X_n, F_n).

Proof of Proposition 2.2(b). Let K be a generic real constant depending only on (D_n)_n and possibly different in various (in)equalities. Let F be an arbitrary element of C_n(X_n, α). It follows straightforwardly from part (a) that

Λ( F((X_{jn}, X_{kn}]), δ_{jkn} ) ≤ 3ρ_n for all (j, k) ∈ D_n,    (6)

provided that n ≥ n_o(α) ≥ 2. This implies that

| F((X_{jn}, X_{kn}]) − δ_{jkn} | ≤ ( K δ_{jkn} ρ_n )^{1/2} + K ρ_n for all (j, k) ∈ D̄_n.    (7)

It follows from Proposition 2.1 and (6) that (7) holds for D_n in place of D̄_n with K = 6. Then elementary considerations show that D_n can be replaced with D̄_n if K is adjusted properly.

Now let G be another element of C_n(X_n, α) and J ⊂ R an interval with F(J) < G(J). Define

j = j(J) = max{ l : X_{ln} ≤ inf J },    k = k(J) = min{ l : X_{ln} ≥ sup J }.

Then (7) implies that

G(J) ≤ G((X_{jn}, X_{kn}]) ≤ δ_{jkn} + ( K δ_{jkn} ρ_n )^{1/2} + K ρ_n,    (8)

F(J) ≥ F((X_{j+1,n}, X_{k−1,n}]) ≥ δ_{jkn} − 2/n − ( K δ_{jkn} ρ_n )^{1/2} − K ρ_n,    (9)

and one easily deduces from (9) that

δ_{jkn} ≤ 2 F(J) + K ρ_n.    (10)

Now subtracting (9) from (8) and plugging in (10) yields

G(J) − F(J) ≤ ( K F(J) ρ_n )^{1/2} + K ρ_n.

Proof of Theorem 2.3. Let F and G be the distribution functions of f and g, respectively. For arbitrary a, x, y ∈ I with a ≤ x < y, the monotonicity of f and g, together with Proposition 2.2(b), implies that

( g(y) − f(x) ) ∨ ( f(y) − g(x) ) ≤ | G((x,y]) − F((x,y]) | / (y − x)
≤ ( K_o F((x,y]) ρ_n / (y−x)² )^{1/2} + K_o ρ_n / (y−x)
≤ ( K_o f(a) ρ_n / (y−x) )^{1/2} + K_o ρ_n / (y−x),

where we assume throughout that n ≥ n_o(α). If y − x is greater than

κ_n(a) = ( K_o f(a) ρ_n )^{1/3} + ( K_o ρ_n )^{1/2},

then ( K_o f(a) ρ_n / (y−x) )^{1/2} + K_o ρ_n / (y−x) ≤ κ_n(a). Hence d(f, g, I ∩ [a, ∞)) ≤ κ_n(a).

Now suppose in addition that f ∈ F(β, L, I). Then

( g(y) − f(y) ) ∨ ( f(x) − g(x) ) ≤ L(y−x)^β + ( K_o f(x) ρ_n / (y−x) )^{1/2} + K_o ρ_n / (y−x).    (11)

Let x = y − ( f(y) ρ_n )^{1/(2β+1)} − ρ_n^{1/(β+1)}, assuming that this point is also in I. If ( f(y) ρ_n )^{1/(2β+1)} ≥ ρ_n^{1/(β+1)}, which is equivalent to ρ_n ≤ f(y)^{(β+1)/β}, then

f(x) ≤ f(y) + L(y−x)^β ≤ f(y) + L 2^β ( f(y) ρ_n )^{β/(2β+1)} ≤ (1 + L 2^β) f(y),

and (11) yields

g(y) − f(y) ≤ ( L 2^β + ( K_o (1 + L 2^β) )^{1/2} + K_o ) ( f(y) ρ_n )^{β/(2β+1)}.

On the other hand, ( f(y) ρ_n )^{1/(2β+1)} ≤ ρ_n^{1/(β+1)} is equivalent to f(y) ≤ ρ_n^{β/(β+1)} and implies that

f(x) ≤ f(y) + L 2^β ρ_n^{β/(β+1)} ≤ (1 + L 2^β) ρ_n^{β/(β+1)}.

Thus (11) leads to

g(y) − f(y) ≤ ( L 2^β + ( K_o (1 + L 2^β) )^{1/2} + K_o ) ρ_n^{β/(β+1)}.

As for the lower bound, let γ = inf_{y∈I} f(y), and suppose that x + ( γ ρ_n )^{1/(2β+1)} ∈ I. Since f(x) − g(x) ≤ f(x), we may assume that f(x) ≥ K ρ_n^{β/(β+1)} and define y = x + ( f(x) ρ_n / K )^{1/(2β+1)} for some constant K ≥ 1 to be specified later. This definition implies that

ρ_n^{1/(β+1)} ≤ y − x ≤ ( f(x)/K )^{1/β}.

If y ∈ I, one can easily deduce from (11) that

f(x) − g(x) ≤ ( L K^{-β/(2β+1)} + K_o^{1/2} K^{1/(4β+2)} ) ( f(x) ρ_n )^{β/(2β+1)} + K_o ρ_n^{β/(β+1)}.

It remains to be shown that y ∈ I for suitable K = K(β, L). If f(x) ≤ Kγ, then y − x ≤ ( γ ρ_n )^{1/(2β+1)}. Otherwise,

L(z − x)^β ≥ ( 1 − f(z)/f(x) ) f(x) ≥ ( 1 − 1/K ) f(x)

for some z ∈ I, z > x. Thus if K = L + 1, then z − x ≥ ( f(x)/K )^{1/β} ≥ y − x.

Proof of Theorem 2.4. According to Proposition 2.2(b), it suffices to show that m(G_n) → m(F_o) and, in case of (2), m(G_n) = m(F_o) + O(ρ_n^{1/5}), where (G_n)_n is an arbitrary sequence of distribution functions in F_uni with

| G_n(J) − F_o(J) | ≤ ( K_o ρ_n F_o(J) )^{1/2} + K_o ρ_n

for all intervals J ⊂ R and any n > 1. For fixed ε > 0 there are bounded, nondegenerate intervals J_1 ≤ J_2 ≤ J_3 (in a pointwise sense) such that

max_{l=1,3} F_o(J_l)/Leb(J_l) < F_o(J_2)/Leb(J_2),    J_1 ∪ J_2 ∪ J_3 ⊂ [ m(F_o) − ε, m(F_o) + ε ].

It follows from Proposition 2.2(b) that there is an integer n_1 such that

max_{l=1,3} G_n(J_l)/Leb(J_l) < G_n(J_2)/Leb(J_2) if n ≥ n_1.

But these two inequalities for G_n imply that m(G_n) ⊂ J_1 ∪ J_2 ∪ J_3.

Suppose that f_o satisfies the regularity condition (2), where m(F_o) = 0 without loss of generality. We define

J_{n1} = [ −2κ_n, −κ_n ],    J_{n2} = [ −κ_n, κ_n ],    J_{n3} = [ κ_n, 2κ_n ]
