Consistency of Concave Regression with an Application to Current-Status Data

https://doi.org/10.7892/boris.73748

Lutz Dümbgen, University of Bern
Sandra Freitag, University of Kiel
Geurt Jongbloed, Vrije Universiteit Amsterdam

May 2002 (revised March 2004)

Appeared in Mathematical Methods of Statistics 13, pp. 69-81.

Abstract. We consider the problem of nonparametric estimation of a concave regression function $F$. We show that the supremum distance between the least squares estimator and $F$ on a compact interval is typically of order $(\log(n)/n)^{2/5}$. This entails rates of convergence for the estimator's derivative. Moreover, we discuss the impact of additional constraints on $F$ such as monotonicity and pointwise bounds. Then we apply these results to the analysis of current status data, where the distribution function of the event times is assumed to be concave.

AMS 1991 subject classifications: primary 62G05; secondary 62E20

Key words: interval censoring, least squares, monotone density, subgaussian errors, supremum norm

Work supported by Deutsche Forschungsgemeinschaft

1 Introduction

Suppose that we observe $(t_1, Y_1), (t_2, Y_2), \ldots, (t_n, Y_n)$ with fixed numbers $t_1 \le t_2 \le \cdots \le t_n$ and independent random variables $Y_1, Y_2, \ldots, Y_n$. Let

\[ \mathbb{E}(Y_i) = F(t_i) \tag{1} \]

for some unknown regression function $F : J \to \mathbb{R}$, where $J$ is some interval containing the design points $t_i$. In various applications, e.g. in econometrics, the regression function $F$ is known to be concave. Then it is possible to estimate $F$ without further assumptions by the method of least squares. That means, let $\hat F$ be any concave function on $J$ minimizing

\[ q(G) := \sum_{i=1}^n (Y_i - G(t_i))^2 \]

over the set of all concave functions $G$ on $J$. This estimator $\hat F$ exists and is uniquely determined on the set $\{t_1, t_2, \ldots, t_n\}$, because $q(G)$ is strictly convex and coercive in $(G(t_i))_{i=1}^n$. Basic properties of $\hat F$ and various consistency results have been derived.

Hanson and Pledger (1976) prove uniform consistency, whereas Mammen (1991) and Groeneboom et al. (2001) concentrate on pointwise limit theorems. The present paper focuses on the supremum norm of $\hat F - F$ and its derivative. Section 2 contains the main results. In particular, under certain regularity assumptions, the supremum norm of $\hat F - F$ on a bounded interval is of stochastic order $(\log(n)/n)^{2/5}$, while $\hat F' - F'$ converges at rate $(\log(n)/n)^{1/5}$.
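
As an illustration of the least squares problem above, the following minimal sketch computes the values of a concave least squares fit at strictly increasing design points. It uses Hildreth's cyclic projection method (Dykstra's algorithm for half-spaces), which is a swapped-in generic technique and not an algorithm taken from this paper. The function name `concave_ls`, the fixed number of sweeps and the restriction to distinct design points are assumptions made for the example.

```python
def concave_ls(t, y, sweeps=2000):
    """Least squares fit of y by a concave sequence on strictly increasing t.

    Concavity on the design points means non-increasing slopes, i.e. for each
    interior index i a linear constraint a_i . g <= 0 with three nonzero
    coefficients. Hildreth's method cycles through these half-space
    constraints, keeping one nonnegative multiplier per constraint.
    """
    n = len(t)
    g = [float(v) for v in y]
    cons = []
    for i in range(1, n - 1):
        left = 1.0 / (t[i] - t[i - 1])
        right = 1.0 / (t[i + 1] - t[i])
        # (slope to the right of t_i) - (slope to the left of t_i) <= 0
        coef = (left, -(left + right), right)
        cons.append((i, coef, sum(c * c for c in coef)))
    mult = [0.0] * len(cons)
    for _ in range(sweeps):
        for k, (i, coef, norm2) in enumerate(cons):
            idx = (i - 1, i, i + 1)
            # undo the previous correction made for this constraint
            for j, c in zip(idx, coef):
                g[j] += mult[k] * c
            viol = sum(c * g[j] for j, c in zip(idx, coef))
            mult[k] = max(0.0, viol / norm2)
            # project onto the half-space a_i . g <= 0
            for j, c in zip(idx, coef):
                g[j] -= mult[k] * c
    return g
```

For three equidistant points the only constraint is that the middle value lies on or above the chord, so a convex data pattern such as `y = [0, 0, 1]` is replaced by the ordinary least squares line, while data that are already concave are returned unchanged.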

The impact of additional constraints such as isotonicity and pointwise bounds is discussed in Section 3. In Section 4 we consider the special case of current-status data. Here $t_1, \ldots, t_n \in (0,\infty)$ are inspection times, and

\[ Y_i = 1\{X_i \le t_i\} \]

with independent event times $X_1, X_2, \ldots, X_n$ having distribution function $F$ on $[0,\infty]$. If $F$ is assumed to be concave on $[0,\infty)$, then (1) holds, and the main results from Sections 2 and 3 carry over to the least-squares estimator $\hat F$ for the distribution function $F$.

All proofs are deferred to Section 5. Note that the techniques developed here are different from the entropy-based approach of van de Geer (2000). While she uses covering numbers for the set of all potential regression functions, we are using a much smaller class of “caricatures” (piecewise linear functions) for the difference between true and estimated curve.

2 Uniform consistency

We consider a triangular scheme of observations $t_i = t_{n,i}$ and $Y_i = Y_{n,i}$ but suppress the additional subscript $n$ for notational simplicity. Let $M_n$ be the empirical distribution of the design points $t_i$, i.e.

\[ M_n(B) := n^{-1} \sum_{i=1}^n 1\{t_i \in B\} \]

for $B \subset \mathbb{R}$. In this section, we analyze the asymptotic behavior of $\hat F = \hat F_n$ on a fixed compact interval $[a,b] \subset J$ under certain conditions on $M_n$ and the errors

\[ E_i = E_{n,i} := Y_i - F(t_i). \]

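Condition I. There is a constant $C > 0$ such that

\[ \liminf_{n\to\infty} \frac{M_n[a_n, b_n]}{b_n - a_n} \ge C \]

whenever $a \le a_n < b_n \le b$ such that $\liminf_{n\to\infty} n^{1/3}(b_n - a_n) > 0$.

Condition II. For some constant $\sigma > 0$,

\[ \max_{i=1,\ldots,n} \mathbb{E}\exp(\lambda E_i) \le \exp(\sigma^2\lambda^2/2) \quad\text{for all } \lambda \in \mathbb{R}. \]

Condition III. There are constants $\beta \in [1,2]$ and $L$ such that for arbitrary $s, t \in [a,b]$,

\[ |F(s) - F(t)| \le L|s - t| \quad\text{if } \beta = 1, \]

\[ |F'(s) - F'(t)| \le L|s - t|^{\beta-1} \quad\text{if } \beta > 1. \]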

If, for instance, $t_1, \ldots, t_n$ are the order statistics of independent random variables $T_1, \ldots, T_n$ with distribution $Q$ satisfying $Q[a', b'] \ge C(b' - a')$ for $a \le a' < b' \le b$, then Condition I is satisfied almost surely. It is also satisfied with $[a,b] = [0,1]$ in case of regular design points $t_i = i/n$.
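
For the regular design the ratio in Condition I can be inspected numerically. The sketch below is only an illustration for one fixed sample size; the helper name `design_ratio` and the chosen intervals are assumptions made here, and a finite-sample check of course says nothing about the limit itself.

```python
def design_ratio(t, lo, hi):
    """Return M_n[lo, hi] / (hi - lo) for the design points t."""
    n = len(t)
    count = sum(1 for ti in t if lo <= ti <= hi)
    return (count / n) / (hi - lo)

n = 1000
regular = [i / n for i in range(1, n + 1)]  # t_i = i / n

# a fixed interval and a short interval of length n^(-1/3)
ratio = design_ratio(regular, 0.2, 0.5)
short = design_ratio(regular, 0.4, 0.4 + n ** (-1.0 / 3))
print(ratio, short)
```

Both ratios stay close to one, in line with Condition I holding for any constant $C < 1$ in this design.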

Condition II is satisfied if, for instance, the errors $E_i$ are centered Gaussian with standard deviation not greater than $\sigma$.
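
The Gaussian case can be verified directly from the moment generating function. In the following display, $\tau \le \sigma$ denotes the standard deviation of a centered Gaussian error $E_i$; the symbol $\tau$ is introduced here for illustration only.

```latex
\mathbb{E}\exp(\lambda E_i)
  = \exp(\tau^2\lambda^2/2)
  \le \exp(\sigma^2\lambda^2/2)
  \quad\text{for all } \lambda \in \mathbb{R}.
```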

Condition III is always satisfied with $\beta = 1$ and some $L$, provided that $a > \inf(J)$ and $b < \sup(J)$. If $F$ is twice differentiable with $|F''| \le L$, then Condition III holds with $\beta = 2$.

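Theorem 1 Suppose that Conditions I-III are satisfied. Then

\[ \max_{t\in[a,b]} (\hat F - F)(t) = O_p\bigl(\rho_n^{\beta/(2\beta+1)}\bigr), \]

\[ \max_{t\in[a+\delta_n,\,b-\delta_n]} (F - \hat F)(t) = O_p\bigl(\rho_n^{\beta/(2\beta+1)}\bigr), \]

where $\rho_n := \log(n)/n$ and $\delta_n := \rho_n^{1/(2\beta+1)}$.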

As a direct consequence of this theorem we get a result about uniform consistency of the derivative $\hat F'$ in the case $\beta > 1$.

Corollary 1 Suppose that Conditions I-III are satisfied with $\beta > 1$. Then

\[ \max_{t\in[a+\delta_n,\,b-\delta_n]} |\hat F'(t) - F'(t)| = O_p\bigl(\rho_n^{(\beta-1)/(2\beta+1)}\bigr), \tag{2} \]

where $\hat F'$ can be interpreted either as left- or right-sided derivative.

Groeneboom et al. (2001, Theorem 6.3) establish the pointwise limit behavior of the least squares estimator, featuring a rate of $n^{-2/5}$ for the concave function $\hat F$ and a rate of $n^{-1/5}$ for its derivative $\hat F'$ at a fixed point. These rates are established under a smoothness assumption which corresponds to the current situation with $\beta = 2$.
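
To get a feeling for the quantities in Theorem 1 and Corollary 1, the following sketch evaluates $\rho_n$, $\delta_n$ and the two rates for given $n$ and $\beta$. The function name `rates` is an assumption made for this example; the formulas are those stated above.

```python
import math

def rates(n, beta):
    """Return (rho_n, delta_n, rate for F-hat, rate for its derivative)."""
    rho = math.log(n) / n
    delta = rho ** (1.0 / (2 * beta + 1))
    rate_f = rho ** (beta / (2 * beta + 1))
    rate_df = rho ** ((beta - 1) / (2 * beta + 1))
    return rho, delta, rate_f, rate_df

for n in (10**3, 10**5):
    rho, delta, rate_f, rate_df = rates(n, beta=2)
    print(n, round(delta, 4), round(rate_f, 4), round(rate_df, 4))
```

For $\beta = 2$ the exponents are $2/5$ and $1/5$, so the rate for $\hat F$ equals $\delta_n^2$ and the rate for $\hat F'$ equals $\delta_n$; for $\beta = 1$ the derivative exponent is zero.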

The rates derived here are indeed optimal. More precisely, in the special case of Gaussian errors $E_i$ with variance $\sigma^2$ and equidistant design points $t_i = i/n$ one can modify the arguments of Ibragimov and Khasminskii (1980) in order to show that for any nondegenerate interval $[a,b] \subset [0,1]$ and parameters $\beta \in [1,2]$, $L > 0$, there exist strictly positive constants $c(\beta, L)$ and $c'(\beta, L)$ such that

\[ \inf_{\hat F}\ \sup_{F\in\mathcal{F}_{\rm conc}(\beta,L)} \mathbb{P}\Bigl\{ \max_{t\in[a,b]} |\hat F(t) - F(t)| \ge c(\beta,L)\,\rho_n^{\beta/(2\beta+1)} \Bigr\} \to 1 \]

and

\[ \inf_{\hat F}\ \sup_{F\in\mathcal{F}_{\rm conc}(\beta,L)} \mathbb{P}\Bigl\{ \max_{t\in[a,b]} |\hat F'(t) - F'(t)| \ge c'(\beta,L)\,\rho_n^{(\beta-1)/(2\beta+1)} \Bigr\} \to 1 \]

as $n \to \infty$. Here $\mathcal{F}_{\rm conc}(\beta, L)$ stands for the set of all concave functions satisfying Condition III.

3 Additional constraints

In several settings the regression function $F$ is assumed to satisfy additional constraints such as isotonicity or certain pointwise bounds. Then it is natural to impose the same additional restrictions on the estimator $\hat F$. Intuitively one would expect that this improves the estimator, but there seems to be no simple argument for this claim. In terms of rates of convergence there is no improvement: The minimax results mentioned at the end of Section 2 remain valid if the function $F$ is assumed in addition to be isotonic and to satisfy finitely many inequalities of the type $c_o \le F(s_o) \le d_o$.

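Let $\mathcal{F}_{(1)}$ be the set of all concave and isotonic functions on $J$. Furthermore, let $\mathcal{F}_{(2)}$ be the set of all concave functions $G$ on $J$ satisfying the inequalities

\[ v_i \le G(s_i) \le w_i \quad\text{for } 1 \le i \le I \]

for a finite number $I$ of points $s_i \in J$ and numbers $-\infty \le v_i \le w_i \le \infty$. Finally, let $\mathcal{F}_{(3)} := \mathcal{F}_{(1)} \cap \mathcal{F}_{(2)}$. Then we define the restricted LS estimators

\[ \hat F_{(j)} := \mathop{\rm arg\,min}_{G\in\mathcal{F}_{(j)}} q(G), \]

assuming tacitly that the set $\mathcal{F}_{(j)}$ is nonvoid.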

Theorem 2 For a given $j \in \{1,2,3\}$, suppose that $F \in \mathcal{F}_{(j)}$, and let Conditions I–III be satisfied. Then the conclusions of Theorem 1 and Corollary 1 remain true for $\hat F_{(j)}$ in place of $\hat F$.

4 Current status data

A special example for the present setting is the current status model. The basic object of interest is a distribution function $F$ on $[0,\infty]$ modelling a random event time, e.g. the time of onset of a certain disease. Suppose that $X_1, X_2, \ldots$ are event times with distribution function $F$, but we are not able to observe these directly. Instead, given inspection time points $0 < t_1 \le t_2 \le \cdots \le t_n < \infty$, we observe $Y_i = 1\{X_i \le t_i\}$ for $1 \le i \le n$.

The standard current status model and estimators for the distribution function based on such data are understood well by now; see for instance Groeneboom and Wellner (1992).

An intensely studied estimator for $F$ is the nonparametric maximum likelihood estimator (NPMLE) which maximizes

\[ \ell(G) = \sum_{i=1}^n \bigl[ Y_i \log G(t_i) + (1 - Y_i)\log(1 - G(t_i)) \bigr] \]

over the class of all distribution functions $G$ on $[0,\infty]$. This estimator may be chosen to be a step function with jumps only at the design points $t_i$ and, possibly, at infinity. Since the NPMLE solves a so-called generalized isotonic regression problem (see e.g. Robertson et al. 1988, Section 1.5), it coincides with the least squares estimator, i.e. it minimizes $q(\cdot)$ as well.
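
Since the unconstrained NPMLE coincides with the isotonic least squares fit, it can be computed with the pool-adjacent-violators algorithm. The sketch below is a generic implementation of that standard algorithm, not code from this paper; the function name `pava` and the assumption that the indicators are already sorted by (distinct) inspection times with unit weights are choices made for the example.

```python
def pava(y):
    """Isotonic (non-decreasing) least squares fit of the sequence y."""
    blocks = []  # each block is [sum of values, number of values]
    for v in y:
        blocks.append([float(v), 1])
        # merge while the previous block mean exceeds the last block mean
        while len(blocks) > 1 and (
            blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

# current status indicators Y_i, ordered by inspection time
print(pava([0, 1, 0, 1]))
```

The returned values are the fitted $G(t_i)$; each pooled block receives the average of its indicators.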

Now let us assume that the (sub-) distribution function $F$ is concave on $[0,\infty)$. That means, it has a non-increasing density on $[0,\infty)$ and possibly a point mass at $\infty$. Then the LS estimator for the distribution function $F$ is given by $\hat F_{(3)}$ as in the previous section with $J := [0,\infty)$, $I = 1$, $s_1 = 0$ and $[v_1, w_1] = [0,\infty]$. Here we assume without loss of generality that $0 \le \hat F_{(3)} \le 1$, because all values $Y_i$ are bounded from above by one, and $\min(G,1) \in \mathcal{F}_{(3)}$ for any $G \in \mathcal{F}_{(3)}$.

Note also that Condition II is automatically satisfied with $\sigma^2 = 1/4$; see the proof of Hoeffding's (1963) inequality. Thus Conditions I and III together imply the conclusions of Theorem 1 and Corollary 1.
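
The value $\sigma^2 = 1/4$ can be checked numerically for centered indicator variables. The following sketch compares the moment generating function of $Y - p$, with $Y$ Bernoulli($p$), to the bound $\exp(\lambda^2/8)$ on a grid; the function names and the grid are assumptions made for this illustration, and the grid check is no substitute for Hoeffding's argument.

```python
import math

def centered_bernoulli_mgf(p, lam):
    """E exp(lam * (Y - p)) for Y ~ Bernoulli(p)."""
    return p * math.exp(lam * (1 - p)) + (1 - p) * math.exp(-lam * p)

def subgaussian_bound(lam, sigma2=0.25):
    """Right-hand side exp(sigma^2 lam^2 / 2) of Condition II."""
    return math.exp(sigma2 * lam * lam / 2)

# largest ratio mgf / bound over p in {0, 0.05, ..., 1}, lam in [-10, 10]
worst = max(
    centered_bernoulli_mgf(p / 20, lam / 4) / subgaussian_bound(lam / 4)
    for p in range(0, 21)
    for lam in range(-40, 41)
)
print(worst)
```

The ratio never exceeds one (up to rounding), with equality at $\lambda = 0$.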

A final remark. As we just mentioned, without further constraints on $F$, the LS estimator and the NPMLE are identical. With the additional assumption of concavity, we may define the NPMLE $\hat F_{ML}$ as the maximizer of $\ell(G)$ over the class of concave subdistribution functions on $[0,\infty)$. Characterizations, algorithms and consistency results for $\hat F_{ML}$ are given by Dümbgen et al. (2003). We conjecture that the present results for the LS estimator hold for $\hat F_{ML}$ as well. But the subsequent example shows that $\hat F_{(3)} \ne \hat F_{ML}$ in general.

A Counterexample. Suppose that $(t_1, Y_1) = (1, 0)$ and $(t_2, Y_2) = (2, 1)$. First consider the NPMLE, maximizing $\log(1 - G(1)) + \log G(2)$. Given a fixed value $G(2) = \alpha$, this function is maximized by taking $G(1)$ as small as possible under the constraints of concavity and $G(0) \ge 0$, so $G(1) = \alpha/2$. Since the function $\alpha \mapsto \log(1 - \alpha/2) + \log\alpha$ takes its maximum over $[0,1]$ at $\alpha = 1$, we get that $\hat F_{ML}(1) = 1/2$ and $\hat F_{ML}(2) = 1$.

Now consider the LS estimator, minimizing $G(1)^2 + (1 - G(2))^2$. Again, for $G(2) = \alpha$ fixed, this function is minimized by $G(1) = \alpha/2$. Since the function $\alpha \mapsto (\alpha/2)^2 + (1 - \alpha)^2$ attains its minimum at $\alpha = 4/5$, we get that $\hat F_{(3)}(1) = 2/5$ and $\hat F_{(3)}(2) = 4/5$. Hence, $\hat F_{(3)} \ne \hat F_{ML}$.
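
The two one-dimensional optimizations in the counterexample can be double-checked by a grid search over $\alpha = G(2)$. This is a small sanity check only; the grid resolution and the names `log_lik` and `sum_sq` are assumptions made for the example.

```python
import math

grid = [k / 1000 for k in range(1, 1001)]  # alpha = G(2) in (0, 1]

def log_lik(alpha):
    # log(1 - G(1)) + log G(2) with G(1) = alpha / 2
    return math.log(1 - alpha / 2) + math.log(alpha)

def sum_sq(alpha):
    # G(1)^2 + (1 - G(2))^2 with G(1) = alpha / 2
    return (alpha / 2) ** 2 + (1 - alpha) ** 2

alpha_ml = max(grid, key=log_lik)
alpha_ls = min(grid, key=sum_sq)
print(alpha_ml, alpha_ls)
```

The maximizer of the log-likelihood sits at the boundary $\alpha = 1$, whereas the sum of squares is minimized in the interior at $\alpha = 4/5$.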

5 Proofs

Our proof of Theorem 1 is based on directional derivatives of the sum of squared residuals.

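Let $\Delta : \mathbb{R} \to \mathbb{R}$ such that $\hat F + \lambda\Delta$ is concave on $J$ for some $\lambda > 0$. Then the optimality of $\hat F$ implies that

\[ 0 \le \frac{d}{d\lambda}\Big|_{\lambda=0} \sum_{i=1}^n \bigl(Y_i - (\hat F + \lambda\Delta)(t_i)\bigr)^2 = 2\sum_{i=1}^n \Delta(t_i)\bigl(\hat F(t_i) - Y_i\bigr), \]

which is equivalent to

\[ -\sum_{i=1}^n \Delta(t_i)E_i \ge \sum_{i=1}^n \Delta(t_i)(F - \hat F)(t_i). \tag{3} \]

In what follows we apply (3) to a special class of perturbation functions $\Delta$ and write

\[ \|\Delta\|_n := \Bigl(\sum_{i=1}^n \Delta(t_i)^2\Bigr)^{1/2}. \]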
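
The equivalence behind (3) is a direct substitution of $Y_i = F(t_i) + E_i$ into the right-hand side of the derivative formula. Spelled out:

```latex
0 \le \sum_{i=1}^n \Delta(t_i)\bigl(\hat F(t_i) - Y_i\bigr)
  = \sum_{i=1}^n \Delta(t_i)(\hat F - F)(t_i) - \sum_{i=1}^n \Delta(t_i)E_i
\quad\Longleftrightarrow\quad
-\sum_{i=1}^n \Delta(t_i)E_i \ge \sum_{i=1}^n \Delta(t_i)(F - \hat F)(t_i).
```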

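Lemma 1 For an integer $m \ge 0$, let $\mathcal{D}_m$ be the family of all continuous, piecewise linear functions on $\mathbb{R}$ with at most $m$ knots. Then for any fixed $\gamma > 4$,

\[ S_n(m) := \sup_{\Delta\in\mathcal{D}_m} \frac{\bigl|\sum_{i=1}^n \Delta(t_i)E_i\bigr|}{\|\Delta\|_n} \le \gamma\,\sigma\,(m+1)^{1/2}(\log n)^{1/2} \quad\text{for all } m \ge 0 \]

with probability tending to one as $n \to \infty$.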

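Proof of Lemma 1. Condition II implies that

\[ \mathbb{P}\Bigl\{ \Bigl|\sum_{i=1}^n h(t_i)E_i\Bigr| \Big/ \|h\|_n \ge \eta \Bigr\} \le 2\exp\bigl(-\eta^2/(2\sigma^2)\bigr) \tag{4} \]

for any function $h$ with $\|h\|_n > 0$ and arbitrary $\eta \ge 0$. This follows from standard arguments involving Markov's inequality. For $1 \le j \le k \le n$, let

\[ \varphi^{(1)}_{jk}(t) := 1\{t \in [t_j, t_k]\}\,\frac{t - t_j}{t_k - t_j} \quad\text{and}\quad \varphi^{(2)}_{jk}(t) := 1\{t \in [t_j, t_k]\}\,\frac{t_k - t}{t_k - t_j} \]

if $t_j < t_k$. Otherwise let $\varphi^{(1)}_{jk}(t) := 1\{t = t_k\}$ and $\varphi^{(2)}_{jk}(t) := 0$. This defines a collection $\Phi$ of at most $n^2$ different nonzero functions $\varphi^{(e)}_{jk}$. Then (4) implies that for any fixed $\gamma_o > 2$,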

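\[ S_n := \max_{\varphi\in\Phi} \Bigl|\sum_{i=1}^n \varphi(t_i)E_i\Bigr| \Big/ \|\varphi\|_n \le \gamma_o\,\sigma(\log n)^{1/2} \tag{5} \]

with probability tending to one as $n \to \infty$. For let $G_n(\varphi) := \|\varphi\|_n^{-1}\sum_{i=1}^n \varphi(t_i)E_i$. Then, by (4),

\[ \mathbb{P}\bigl\{S_n \ge \gamma_o\sigma(\log n)^{1/2}\bigr\} \le \sum_{\varphi\in\Phi} \mathbb{P}\bigl\{|G_n(\varphi)| \ge \gamma_o\sigma(\log n)^{1/2}\bigr\} \le 2n^2\exp\bigl(-\gamma_o^2\log(n)/2\bigr) \to 0 \quad\text{as } n \to \infty. \]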
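
The "standard arguments" behind (4) can be sketched as follows. Write $Z := \sum_{i=1}^n h(t_i)E_i / \|h\|_n$; the symbol $Z$ is introduced here for illustration only. Independence of the errors and Condition II bound the moment generating function of $Z$, and Markov's inequality with the choice $\lambda = \eta/\sigma^2$ gives the one-sided bound. Applying the same reasoning to $-Z$ produces the factor 2. The last line records why the union bound above vanishes for $\gamma_o > 2$.

```latex
\mathbb{E}\exp(\lambda Z)
  = \prod_{i=1}^n \mathbb{E}\exp\bigl(\lambda h(t_i)E_i/\|h\|_n\bigr)
  \le \exp\Bigl(\frac{\sigma^2\lambda^2}{2}\sum_{i=1}^n \frac{h(t_i)^2}{\|h\|_n^2}\Bigr)
  = \exp(\sigma^2\lambda^2/2),
\\
\mathbb{P}\{Z \ge \eta\}
  \le \inf_{\lambda>0} \exp(-\lambda\eta)\,\mathbb{E}\exp(\lambda Z)
  \le \inf_{\lambda>0} \exp(-\lambda\eta + \sigma^2\lambda^2/2)
  = \exp\bigl(-\eta^2/(2\sigma^2)\bigr),
\\
2n^2\exp\bigl(-\gamma_o^2\log(n)/2\bigr) = 2\,n^{2-\gamma_o^2/2} \to 0
  \quad\text{since } \gamma_o > 2.
```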

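Now for any $\Delta \in \mathcal{D}_m$, there are $m' \le 2m + 2$ disjoint intervals on which $\Delta$ is either linear and nonnegative, or linear and nonpositive. For one such interval $B$ with $M_n(B) > 0$ let $\{t_1, \ldots, t_n\} \cap B = \{t_j, \ldots, t_k\}$. Then

\[ \Delta(t) = \Delta(t_k)\varphi^{(1)}_{jk}(t) + \Delta(t_j)\varphi^{(2)}_{jk}(t) \quad\text{for } t \in [t_j, t_k]. \]

This shows that there are real coefficients $\lambda_1, \ldots, \lambda_{4m+4}$ and functions $\varphi_1, \ldots, \varphi_{4m+4}$ in $\Phi$ such that $\Delta = \sum_{j=1}^{4m+4} \lambda_j\varphi_j$ on $\{t_1, \ldots, t_n\}$, and $\lambda_j\lambda_k\varphi_j\varphi_k \ge 0$ for all pairs $(j, k)$.

Consequently, inequality (5) entails that

\[ \begin{aligned}
\frac{\bigl|\sum_{i=1}^n \Delta(t_i)E_i\bigr|}{\|\Delta\|_n}
 &\le \frac{\sum_{j=1}^{4m+4} |\lambda_j|\,\bigl|\sum_{i=1}^n \varphi_j(t_i)E_i\bigr|}{\bigl(\sum_{j=1}^{4m+4} \lambda_j^2\|\varphi_j\|_n^2\bigr)^{1/2}} \\
 &\le \frac{\sum_{j=1}^{4m+4} |\lambda_j|\,\|\varphi_j\|_n}{\bigl(\sum_{j=1}^{4m+4} \lambda_j^2\|\varphi_j\|_n^2\bigr)^{1/2}}\; S_n \\
 &\le (4m+4)^{1/2} S_n \\
 &\le 2\gamma_o(m+1)^{1/2}\sigma(\log n)^{1/2},
\end{aligned} \]

by the Cauchy-Schwarz inequality. ✷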

The next ingredient for our proof is a claim about differences of concave functions, which is similar to Lemma 5.2 of Dümbgen (1998); for the reader's convenience a proof will be given here.

Lemma 2 Suppose that $F$ satisfies Condition III. There is a constant $K = K(\beta, L) > 0$, depending only on $\beta$ and $L$, with the following property: For any $\epsilon > 0$, let $\delta := K\min(b - a, \epsilon^{1/\beta})$. Then

\[ \sup_{t\in[a,b]} (\hat F - F)(t) \ge \epsilon \quad\text{or}\quad \sup_{t\in[a+\delta,\,b-\delta]} (F - \hat F)(t) \ge \epsilon \]

implies that

\[ \inf_{t\in[c,c+\delta]} (\hat F - F)(t) \ge \epsilon/4 \quad\text{or}\quad \inf_{t\in[c,c+\delta]} (F - \hat F)(t) \ge \epsilon/4 \]

for some $c \in [a, b - \delta]$.

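Proof of Lemma 2. Suppose that $(\hat F - F)(t_o) \ge \epsilon$ for some $t_o \in [a, b]$. Without loss of generality let $t_o \le (a + b)/2$. We define an auxiliary linear function $\tilde F$ via

\[ \tilde F(t) := \begin{cases} F(t_o) & \text{if } \beta = 1, \\ F(t_o) + F'(t_o)(t - t_o) & \text{if } \beta > 1, \end{cases} \]

and note that

\[ |(\tilde F - F)(t)| \le L|t - t_o|^\beta/\beta, \tag{6} \]

by Condition III. Now let $0 < \delta \le (b - a)/8$. Since $\hat F - \tilde F$ is concave, it follows from $(\hat F - \tilde F)(t_o + \delta) \ge \epsilon/2$ that $\hat F - \tilde F \ge \epsilon/2$ on $[t_o, t_o + \delta]$. Otherwise, if $(\hat F - \tilde F)(t_o + \delta) < \epsilon/2$, then the derivative of $\hat F - \tilde F$ is less than or equal to $-\delta^{-1}\epsilon/2$ on $[t_o + \delta, \infty)$. Consequently, for $t \ge t_o + 3\delta$,

\[ (\hat F - \tilde F)(t) \le \epsilon/2 - \delta^{-1}(\epsilon/2)(t - t_o - \delta) \le -\epsilon/2. \]

Thus $\hat F - \tilde F \ge \epsilon/2$ or $\tilde F - \hat F \ge \epsilon/2$ on some interval $J \subset [t_o, t_o + 4\delta]$ with length $\delta$.

Together with (6) this entails that $\hat F - F$ or $F - \hat F$ is not smaller than $\epsilon/2 - L(4\delta)^\beta/\beta \ge \epsilon/4$ on $J$, provided that $\delta \le (\beta/L)^{1/\beta}\,4^{-1-1/\beta}\,\epsilon^{1/\beta}$.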
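
The threshold on $\delta$ in the last step is obtained by solving the inequality $L(4\delta)^\beta/\beta \le \epsilon/4$ for $\delta$. Together with the requirement $\delta \le (b-a)/8$, this is where a choice of the form $\delta = K\min(b - a, \epsilon^{1/\beta})$ comes from:

```latex
\frac{L(4\delta)^\beta}{\beta} \le \frac{\epsilon}{4}
\iff (4\delta)^\beta \le \frac{\beta\,\epsilon}{4L}
\iff \delta \le \Bigl(\frac{\beta}{L}\Bigr)^{1/\beta} 4^{-1-1/\beta}\,\epsilon^{1/\beta}.
```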

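Now suppose that $(F - \hat F)(t_o) \ge \epsilon$ for some $\epsilon > 0$ and $t_o \in [a + \delta, b - \delta]$, where $0 < \delta \le (b - a)/2$. By Condition III and concavity of $\hat F$ there exist numbers $\gamma, \hat\gamma$ such that

\[ F(t) \ge F(t_o) + \gamma(t - t_o) - L|t - t_o|^\beta/\beta, \qquad \hat F(t) \le \hat F(t_o) + \hat\gamma(t - t_o). \]

Thus

\[ (F - \hat F)(t) \ge \epsilon + (\gamma - \hat\gamma)(t - t_o) - L|t - t_o|^\beta/\beta \ge \epsilon - L\delta^\beta/\beta \]

for all $t$ in the interval $[t_o, t_o + \delta]$ or $[t_o - \delta, t_o]$, depending on the sign of $\gamma - \hat\gamma$. Moreover, $\epsilon - L\delta^\beta/\beta \ge \epsilon/4$, provided that $\delta \le (3/4)^{1/\beta}(\beta/L)^{1/\beta}\epsilon^{1/\beta}$. ✷

Finally we have to show that one of our classes $\mathcal{D}_m$ does indeed contain useful perturbation functions $\Delta$. For that purpose we define the set

\[ \mathcal{T} := \{t_1, t_2, \ldots, t_n\} \]

and denote with $\check F$ the unique continuous and piecewise linear function with knots in $\mathcal{T} \cap (t_1, t_n)$ such that $\check F = \hat F$ on $\mathcal{T}$. Thus $\check F$ is one particular LS estimator for $F$.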

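Lemma 3 For $0 < u \le b - a$ let

\[ \widetilde M_n(u) := \min_{c\in[a,b-u]} M_n[c, c + u]. \]

Suppose that $F - \hat F \ge \epsilon > 0$ or $\hat F - F \ge \epsilon$ on some interval $[c, c + \delta] \subset [a, b]$ with length $\delta > 0$. Then there is a function $\Delta \in \mathcal{D}_6$ such that

\[ \check F + \lambda\Delta \ \text{is concave for some } \lambda > 0, \tag{7} \]

\[ \Delta(F - \hat F) \ge \epsilon\Delta^2 \ \text{on } \mathcal{T}, \tag{8} \]

\[ \|\Delta\|_n^2 \ge n\widetilde M_n(\delta/2)/4. \tag{9} \]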

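Proof of Lemma 3. Without loss of generality we assume that $\mathcal{T} \cap [c, c + \delta] \ne \emptyset$. For otherwise, $\widetilde M_n(\delta) \le M_n[c, c + \delta] = 0$, so that $\Delta \equiv 0$ would satisfy (7–9).

We define the auxiliary set

\[ \mathcal{S} := \bigl\{ t \in \mathbb{R} : \check F'(t-) > \check F'(t+) \bigr\} \subset \mathcal{T} \cap (t_1, t_n). \]

Then for any function $\Delta \in \mathcal{D}_6$, requirement (7) is equivalent to

\[ \bigl\{ t \in \mathbb{R} : \Delta'(t-) < \Delta'(t+) \bigr\} \subset \mathcal{S}. \tag{10} \]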

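Case I: $\hat F - F \ge \epsilon$ on $[c, c + \delta]$. Here a function $\Delta \in \mathcal{D}_4$ will do.

Case Ia: $\mathcal{S} \cap (c, c + \delta)$ contains some point $t_o$. We take $\Delta \in \mathcal{D}_3$ with knots $c, t_o, c + \delta$, where $\Delta = 0$ on $(-\infty, c] \cup [c + \delta, \infty)$ and $\Delta(t_o) = -1$. This function $\Delta$ satisfies (10) and (8). Moreover, $\Delta^2 \ge 1/4$ on some subinterval of $[c, c + \delta]$ with length $\delta/2$, whence $\|\Delta\|_n^2 \ge n\widetilde M_n(\delta/2)/4$.

Case Ib: $\mathcal{S} \cap (c, c + \delta) = \emptyset$. Now $F - \check F$ is concave on $[c, c + \delta]$, and $F - \check F \le -\epsilon$ on $[c, c + \delta] \cap \mathcal{T}$. Let $[c_o, d_o] \supset [c, c + \delta]$ be a maximal interval in $[-\infty, \infty]$ such that $F - \check F$ is concave on $[c_o, d_o] \cap \mathcal{T}$. Note that $c_o \in \mathcal{S}$ if $c_o > -\infty$, and $d_o \in \mathcal{S}$ if $d_o < \infty$. One easily verifies that there exists a linear function $\tilde\Delta$ such that $\tilde\Delta \ge F - \check F$ on $[c_o, d_o] \cap \mathcal{T}$ and $\tilde\Delta \le -\epsilon$ on $[c, c + \delta] \cap \mathcal{T}$. Next let $(c_1, d_1) := \{\tilde\Delta < 0\} \cap (c_o, d_o)$. Further let

\[ c_2 := \begin{cases} \max(\mathcal{T} \cap (-\infty, c_1)) & \text{if } c_1 > -\infty \text{ and } \tilde\Delta(c_1) < 0, \\ c_1 & \text{else,} \end{cases} \]

\[ d_2 := \begin{cases} \min(\mathcal{T} \cap (d_1, \infty)) & \text{if } d_1 < \infty \text{ and } \tilde\Delta(d_1) < 0, \\ d_1 & \text{else.} \end{cases} \]

Note that neither $(c_2, c_1)$ nor $(d_1, d_2)$ contains a design point $t_i$. Now let $\Delta \in \mathcal{D}_4$ with knots in $\{c_2, c_1, d_1, d_2\} \cap \mathbb{R}$ such that $\Delta = \tilde\Delta/\epsilon$ on $(c_1, d_1)$ and $\Delta = 0$ on $(-\infty, c_2) \cup (d_2, \infty)$. This function $\Delta$ satisfies (10) and (8). Moreover, $\Delta^2 \ge 1$ on $[c, c + \delta] \cap \mathcal{T}$, so that even $\|\Delta\|_n^2 \ge n\widetilde M_n(\delta)$.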

Figure 1 illustrates the latter construction. For simplicity we only consider $\hat F = \check F$. The upper subplot shows the graphs of $F$ (thin line) and $\hat F$ (thick line), the interval $[c, c + \delta]$ on which $\hat F - F \ge \epsilon$ as well as the auxiliary points $c_o, d_o$. The lower subplot shows the corresponding perturbation function $\Delta$ (thick line) and the scaled difference $(F - \hat F)/\epsilon$ (thin line).

Figure 1: The perturbation function $\Delta$ in Case Ib

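Case II: $F - \hat F \ge \epsilon$ on $[c, c + \delta]$. Let $[c_o, d_o] \supset [c, c + \delta]$ be a maximal interval in $[-\infty, \infty]$ such that $F - \check F \ge \epsilon$ on $[c_o, d_o] \cap \mathcal{T}$. Now we define a function $\Delta \in \mathcal{D}_6$ as follows. At first let $\Delta := 1$ on $[c_o, d_o]$. Suppose that $d_o < \infty$, which implies that $d_o < t_n$. Then let $d_1$ be the largest number in $(d_o, \infty]$ such that $\check F$ is linear on $[d_o, d_1)$. Note that $F - \check F$ is concave on $[d_o, d_1] \cap \mathbb{R}$ and strictly decreasing on $[d_o, d_1] \cap \mathcal{T}$, where we define $F := -\infty$ on $\mathbb{R} \setminus J$. Next let $\Delta$ be linear on $[d_o, d_1] \cap \mathbb{R}$ such that $\Delta(d_o) = 1$ and $\Delta(t_o) = 0$. Here $t_o$ is the supremum of all points $t \in [d_o, d_1] \cap \mathbb{R}$ with $(F - \check F)(t) \ge 0$.

If $d_1$ is finite, then it belongs necessarily to $\mathcal{S}$, and we define

\[ d_2 := \begin{cases} d_1 & \text{if } t_o = d_1, \\ \min(\mathcal{T} \cap (d_1, \infty)) & \text{else.} \end{cases} \]

Then let $\Delta := 0$ on $[d_2, \infty)$, and let $\Delta$ be linear on $[d_1, d_2]$.

With an analogous construction in case of $c_o > -\infty$ we end up with a function $\Delta \in \mathcal{D}_6$ satisfying (10) and (8), while $\|\Delta\|_n^2 \ge nM_n[c, c + \delta] \ge n\widetilde M_n(\delta)$.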

Figure 2 illustrates the latter construction. The upper subplot shows the graphs of $F$ (thin line) and $\hat F = \check F$ (thick line), the interval $[c_o, d_o]$ on which $F - \hat F \ge \epsilon$ as well as the auxiliary points $c_2, c_1, d_1, d_2$. The lower subplot shows the corresponding perturbation function $\Delta$ (thick line) as well as $(F - \hat F)/\epsilon$ (thin line). ✷

Figure 2: The perturbation function $\Delta$ in Case II

Proof of Theorem 1. Suppose that

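\[ \sup_{t\in[a,b]} (\hat F - F)(t) \ge \kappa\delta_n^\beta \quad\text{or}\quad \sup_{t\in[a+\delta_n,\,b-\delta_n]} (F - \hat F)(t) \ge \kappa\delta_n^\beta \]

for some $\kappa > 0$. It follows from Lemma 2 that there is a (random) interval $[c_n, c_n + \delta_n] \subset [a, b]$ on which either $\hat F - F \ge (\kappa/4)\delta_n^\beta$ or $F - \hat F \ge (\kappa/4)\delta_n^\beta$, provided that $n$ is sufficiently large and $\kappa \ge K^{-\beta}$. But then, by the definition of $S_n(6)$ and Lemma 3, there is a (random) function $\Delta_n \in \mathcal{D}_6$ such that

\[ \begin{aligned}
S_n(6) &\ge -\|\Delta_n\|_n^{-1} \sum_{i=1}^n \Delta_n(t_i)E_i \\
 &\overset{(3,7)}{\ge} \|\Delta_n\|_n^{-1} \sum_{i=1}^n \Delta_n(t_i)(F - \hat F)(t_i) \\
 &\overset{(8)}{\ge} (\kappa/4)\,\delta_n^\beta\,\|\Delta_n\|_n \\
 &\overset{(9)}{\ge} (\kappa/4)\,\delta_n^\beta\,\bigl(n\widetilde M_n(\delta_n/2)/4\bigr)^{1/2}.
\end{aligned} \]

Consequently, by Condition I and Lemma 1,

\[ \begin{aligned}
\kappa &\le 4\delta_n^{-\beta}\bigl(n\widetilde M_n(\delta_n/2)/4\bigr)^{-1/2} S_n(6) \\
 &\le 4\delta_n^{-\beta}\bigl((C/8 + o(1))\,n\delta_n\bigr)^{-1/2} S_n(6) \\
 &= O(1)(\log n)^{-1/2} S_n(6) \\
 &= O_p(1).
\end{aligned} \]

✷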
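
The step from the second to the third line of the last display uses only the definitions $\rho_n = \log(n)/n$ and $\delta_n = \rho_n^{1/(2\beta+1)}$. Written out:

```latex
\delta_n^{-\beta}(n\delta_n)^{-1/2}
  = n^{-1/2}\,\delta_n^{-(2\beta+1)/2}
  = n^{-1/2}\,\rho_n^{-1/2}
  = (\log n)^{-1/2}.
```

Combined with $S_n(6) = O_p\bigl((\log n)^{1/2}\bigr)$ from Lemma 1, this gives $\kappa = O_p(1)$.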

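Proof of Corollary 1. Let

\[ \max_{t\in[a+\delta_n/2,\,b-\delta_n/2]} |\hat F(t) - F(t)| = \delta_n^\beta R_n. \]

The proof of Theorem 1 reveals that $R_n = O_p(1)$. By concavity of $\hat F$, for any $t \in [a + \delta_n, b - \delta_n]$ and $\nu_n := \delta_n/2$,

\[ \frac{\hat F(t) - \hat F(t - \nu_n)}{\nu_n} \ge \hat F'(t-) \ge \hat F'(t+) \ge \frac{\hat F(t + \nu_n) - \hat F(t)}{\nu_n}, \]

where $\hat F'(t-)$ and $\hat F'(t+)$ denote the left- and right-sided derivative of $\hat F$, respectively.

Moreover, the definition of $R_n$, concavity of $F$ and Condition III with $\beta > 1$ imply that

\[ \begin{aligned}
\frac{\hat F(t) - \hat F(t - \nu_n)}{\nu_n}
 &\le \frac{F(t) - F(t - \nu_n) + 2\delta_n^\beta R_n}{\nu_n} \\
 &\le F'(t - \nu_n) + 2\delta_n^\beta R_n/\nu_n \\
 &\le F'(t) + L\nu_n^{\beta-1} + 2\delta_n^\beta R_n/\nu_n \\
 &= F'(t) + (2^{1-\beta}L + 4R_n)\,\rho_n^{(\beta-1)/(2\beta+1)}.
\end{aligned} \]

Similarly,

\[ \frac{\hat F(t + \nu_n) - \hat F(t)}{\nu_n} \ge F'(t) - (2^{1-\beta}L + 4R_n)\,\rho_n^{(\beta-1)/(2\beta+1)}. \]

Hence we obtain

\[ |\hat F'(t\pm) - F'(t)| \le (2^{1-\beta}L + 4R_n)\,\rho_n^{(\beta-1)/(2\beta+1)} = O_p\bigl(\rho_n^{(\beta-1)/(2\beta+1)}\bigr). \]

✷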

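Proof of Theorem 2. A close inspection of the proof of Theorem 1 reveals that we only need a surrogate for Lemma 3. Namely let $\mathcal{T}_{(1)} := \mathcal{T}$ and $\mathcal{T}_{(2)} := \mathcal{T}_{(3)} := \mathcal{T} \cup \{s_1, \ldots, s_I\}$. Let $\check F_{(j)}$ be the unique continuous and piecewise linear function with knots in $\mathcal{T}_{(j)} \cap \bigl(\min(\mathcal{T}_{(j)}), \max(\mathcal{T}_{(j)})\bigr)$ such that $\check F_{(j)} = \hat F_{(j)}$ on $\mathcal{T}_{(j)}$. Then we have to show that Lemma 3 remains true with $(\hat F_{(j)}, \check F_{(j)}, \mathcal{T}_{(j)})$ in place of $(\hat F, \check F, \mathcal{T})$, where Condition (7) has to be replaced with

\[ \check F_{(j)} + \lambda\Delta \in \mathcal{F}_{(j)} \quad\text{for some } \lambda > 0. \tag{11} \]

For that purpose we use the same construction of $\Delta$ as in the proof of Lemma 3. In order to guarantee (11), since $\check F_{(j)}$ and $\Delta$ are piecewise linear with $\check F_{(j)} \in \mathcal{F}_{(j)}$, it suffices to verify the following two conditions:

\[ \text{If } j \in \{1,3\}, \text{ then } \Delta'(t+) \ge 0 \text{ whenever } \check F_{(j)}'(t+) = 0. \tag{12} \]

\[ \text{If } j \in \{2,3\}, \text{ then for } 1 \le i \le I, \quad \Delta(s_i) \begin{cases} \ge 0 & \text{if } \check F_{(j)}(s_i) = v_i, \\ \le 0 & \text{if } \check F_{(j)}(s_i) = w_i. \end{cases} \tag{13} \]

Let us start with (12), where $j \in \{1,3\}$. Note first that $\check F_{(j)}'(t+) > 0$ for all $t < \max(\mathcal{S})$, where $\mathcal{S} = \{t : \check F_{(j)}'(t-) > \check F_{(j)}'(t+)\}$. Now we consider Case Ia as defined in the proof of Lemma 3. There the function $\Delta$ satisfies $\Delta'(\cdot+) \ge 0$ on $[t_o, \infty)$, for some $t_o \in \mathcal{S} \cap (c, c + \delta)$, which entails (12).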

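In Case Ib, if $d_1 < \infty$ and $\tilde\Delta(d_1) < 0$, then $d_1 = d_o$ belongs to $\mathcal{S}$ and $\Delta'(t+) \ge 0$ for all $t \ge d_1$. If $d_1 < \infty$ and $\tilde\Delta(d_1) = 0$, then $\Delta'(t+) \ge 0$ for all $t \ge c_1$, and $c_1 > -\infty$ entails that $c_1 \in \mathcal{S}$. Thus (12) is satisfied in case of $d_1 < \infty$. If $d_1 = \infty$ and $\tilde\Delta' < 0$, suppose that $\tilde\Delta' < 0$ was really necessary. That means, there exists a point $r \in (c_o, c) \cap \mathcal{T}_{(j)}$ with $(F - \check F_{(j)})(r) > -\epsilon$. Since $(F - \check F_{(j)})(s) \le -\epsilon$ for some $s \ge c$, this implies that $F'(t+) - \check F_{(j)}'(t+) < 0$ for all $t \ge s$. But $F'(\cdot+) \ge 0$, so that $\check F_{(j)}'(\cdot+) > 0$ everywhere, whence (12) is trivial.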

Now consider Case II. If $\Delta'(t+) < 0$ for some $t < c_o$, then $t$ is strictly smaller than some point within $\mathcal{S}$, by construction of $\Delta$. If $\Delta'(t+) < 0$ for some $t \ge d_o$, then either $t < d_1 \in \mathcal{S}$, or $\check F_{(j)}$ is linear on $[d_o, \infty)$. In the latter case, there exists a point $s \in \mathcal{T}_{(j)} \cap (d_o, \infty)$ such that $(F - \check F_{(j)})(d_o) \ge \epsilon > (F - \check F_{(j)})(s)$. This entails that $(F - \check F_{(j)})'(s+) < 0$. Hence $\check F_{(j)}'(s+) > F'(s+) \ge 0$, so that $\check F_{(j)}'(\cdot+) > 0$ everywhere.

As for Condition (13), note that $\Delta(F - \check F_{(j)}) \ge \epsilon\Delta^2$ on $\mathcal{T}_{(j)}$. In particular, if $j \in \{2,3\}$, then $\Delta(s_i) < 0$ implies that $F(s_i) - \check F_{(j)}(s_i) < 0$, i.e. $\check F_{(j)}(s_i) > F(s_i) \ge v_i$. Similarly, $\Delta(s_i) > 0$ entails that $\check F_{(j)}(s_i) < w_i$. ✷

Acknowledgement. We are grateful to a referee for constructive comments.

References

DÜMBGEN, L. (1998). New goodness-of-fit tests and their application to nonparametric confidence sets. Ann. Statist. 26, 288–314.

DÜMBGEN, L., FREITAG, S. AND JONGBLOED, G. (2003). Estimating a unimodal distribution from interval-censored data. Preprint.

GROENEBOOM, P., JONGBLOED, G. AND WELLNER, J.A. (2001). Estimation of a convex function: characterization and asymptotic theory. Ann. Statist. 29, 1653–1698.

GROENEBOOM, P. AND WELLNER, J.A. (1992). Information bounds and nonparametric maximum likelihood estimation. Birkhäuser, Basel.

HANSON, D.L. AND PLEDGER, G. (1976). Consistency in Concave Regression. Ann. Statist. 4, 1038–1050.

HOEFFDING, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 13–30.

IBRAGIMOV, I.A. AND KHASMINSKII, R.Z. (1980). Estimates of signal, its derivatives, and point of maximum for Gaussian observations. Theory Prob. Appl. 25, 703–716.

MAMMEN, E. (1991). Nonparametric regression under qualitative smoothness assumptions. Ann. Statist. 19, 741–759.

ROBERTSON, T., WRIGHT, F.W. AND DYKSTRA, R.L. (1988). Order restricted statistical inference. Wiley, New York.

VAN DE GEER, S.A. (2000). Empirical Processes in M-Estimation. Cambridge University Press, Cambridge.

Lutz Dümbgen
Department of Mathematics
University of Bern
Sidlerstrasse 5
CH-3012 Bern, Switzerland
duembgen@stat.unibe.ch

Sandra Freitag
Institut für Med. Informatik und Statistik
Universität Kiel
Brunswiker Strasse 10
D-24105 Kiel, Germany
freitag@medinfo.uni-kiel.de

Geurt Jongbloed
Department of Mathematics
Vrije Universiteit
De Boelelaan 1081A
1081 HV Amsterdam, The Netherlands
geurt@cs.vu.nl