

$$\left( n^{-1}(2k_n+1) \right)^{\min\{r/p,\,r/2\}} (\log n)^{r/2}.$$

This concludes the proof for $q = a\sqrt{\log n}$.

Finally, we consider the choice of threshold $q = q(\beta)$. The corresponding assertions follow readily from the proof above, by noting that $q(\beta) \le a\sqrt{\log n}$ for some constant $a$, due to (3.12), and that $\mathbb{P}\{T_{\mathcal{I}}(y_n; f) > q(\beta)\} = O(n^{-r})$ by the choice of $\beta = O(n^{-r})$.

Remark 3.1.5. In the above theorem, we note that the choice of the only tuning parameter $q$ is universal, i.e., completely independent of the (unknown) true regression function. One can easily obtain a lower bound of order $(k_n/n)^{\min\{1/2,\,1/p\}}$ on the best possible rate in terms of $L^p$-loss, $0 < p < \infty$, by standard arguments based on testing many hypotheses and information inequalities (cf. Tsybakov, 2009; Li et al., 2016). Thus, the multiscale change-point segmentation method adapts to the underlying complexity of the truth, and is, up to a log-factor, minimax optimal over the classes $\mathcal{S}_L(k_n)$ for different choices of $k_n$ with $k_n = o(n)$, in particular $k_n \asymp n^{\theta}$, $0 \le \theta < 1$. This includes the case $\theta = 0$, where, by convention, $k_n$ is finite. Moreover, we point out that the choice of threshold $q$ is independent of the specific loss function, but depends on the order $r$ of the moments of the loss.
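For instance, reading off the rate displayed above for $k_n \asymp n^{\theta}$, $0 \le \theta < 1$, the upper bound becomes

$$\left( n^{-1}(2k_n+1) \right)^{\min\{r/p,\,r/2\}} (\log n)^{r/2} \asymp n^{-(1-\theta)\min\{r/p,\,r/2\}} (\log n)^{r/2},$$

while the lower bound of order $(k_n/n)^{\min\{1/2,\,1/p\}} \asymp n^{-(1-\theta)\min\{1/2,\,1/p\}}$ on the $L^p$-loss, raised to the $r$-th power, is $n^{-(1-\theta)\min\{r/2,\,r/p\}}$; the two agree up to the $(\log n)^{r/2}$ factor.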

3.2 Robustness to model misspecification

In practical applications the underlying function in model (2.3) is usually not a precise step function. As a robustness study, we next consider the convergence behavior of the multiscale change-point segmentation methods for more general functions. Using the terminology introduced in Section 2.3, we consider the following approximation spaces.

$$\mathcal{A}^{\gamma} := \left\{ f \in \mathcal{D}([0,1)) : \sup_{k \ge 1} k^{\gamma}\, \Gamma_k(f) < \infty \right\}, \quad \text{for } \gamma > 0,$$

and the subclasses

$$\mathcal{A}^{\gamma}_L := \left\{ f \in \mathcal{D}([0,1)) : \sup_{k \ge 1} k^{\gamma}\, \Gamma_k(f) \le L, \text{ and } \|f\|_{L^{\infty}} \le L \right\}, \quad \text{for } \gamma > 0 \text{ and } L > 0,$$

where $\Gamma_k(f)$ is the approximation error defined by

$$\Gamma_k(f) := \inf\left\{ \|f - g\|_{L^{\infty}} : g \in \mathcal{S}([0,1)),\ \#J(g) \le k \right\}. \qquad (3.14)$$

Note that $\mathcal{A}^{\gamma} = \bigcup_{L>0} \mathcal{A}^{\gamma}_L$. The order $\gamma$ of these spaces (or classes) reflects the speed of approximation of $f$ by step functions as the number of change-points increases. In addition, it is worth noting that if we consider the $L^q$-loss only for a fixed $q$, then we can replace $\|f - g\|_{L^{\infty}}$ by $\|f - g\|_{L^q}$ in the definition (3.14) of the approximation error $\Gamma_k$. This will slightly enlarge the approximation spaces.
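As a numerical illustration (not part of the original development), the approximation error $\Gamma_k(f)$ has a natural discrete surrogate on a grid of $n$ sample points: the best sup-norm approximation by a step function with at most $k$ jumps can be computed by a small dynamic program over block boundaries, taking the midrange on each block. The sketch below assumes this surrogate; the example function and all names are chosen only for illustration.

```python
import numpy as np

def gamma_k(fvals, k):
    """Discrete surrogate of Gamma_k(f): best sup-norm approximation of the
    samples `fvals` by a step function with at most k jumps (k+1 blocks)."""
    n = len(fvals)
    # cost[i, j] = sup-norm error of the best constant (the midrange) on samples i..j-1
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        lo = hi = fvals[i]
        for j in range(i + 1, n + 1):
            lo, hi = min(lo, fvals[j - 1]), max(hi, fvals[j - 1])
            cost[i, j] = 0.5 * (hi - lo)
    # D[m, j] = smallest achievable maximal block error using m blocks for samples 0..j-1
    D = np.full((k + 2, n + 1), np.inf)
    D[0, 0] = 0.0
    for m in range(1, k + 2):
        for j in range(1, n + 1):
            D[m, j] = min(max(D[m - 1, i], cost[i, j]) for i in range(j))
    return min(D[m, n] for m in range(1, k + 2))

# f(x) = sqrt(x) is Hoelder continuous with exponent 1/2, so k^(1/2) * Gamma_k(f)
# should stay bounded as k grows (cf. Example 3.2.3 (i) below).
x = np.linspace(0, 1, 300, endpoint=False)
f = np.sqrt(x)
for k in (1, 2, 4, 8, 16):
    print(k, round(k**0.5 * gamma_k(f, k), 4))
```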

The rates of convergence for these spaces (or classes) are provided in the following theorem.

Theorem 3.2.1. Assume model (2.3) and that Assumption 1 holds with constants $c > 1$ and $\delta > 0$. Let $0 < p, r < \infty$, and let $\hat{f}_n$ be the multiscale change-point segmentation estimator from (2.4) with threshold

$$q = a\sqrt{\log n} \ \text{ for some } a > \delta + \sigma\sqrt{2r+4}, \qquad \text{or} \qquad q = q(\beta) \text{ as in (2.6) with } \beta = O(n^{-r}).$$

Then it holds that

$$\|\hat{f}_n - f\|_{L^p}^r = O\left( n^{-\frac{2\gamma}{2\gamma+1}\min\{1/p,\,1/2\}\, r}\, (\log n)^{\frac{\gamma + (1/2 - 1/p)_+}{2\gamma+1}\, r} \right), \quad \text{a.s.},$$

uniformly for $f \in \mathcal{A}^{\gamma}_L$. Furthermore, the same result also holds in expectation,

$$\mathbb{E}\left[ \|\hat{f}_n - f\|_{L^p}^r \right] = O\left( n^{-\frac{2\gamma}{2\gamma+1}\min\{1/p,\,1/2\}\, r}\, (\log n)^{\frac{\gamma + (1/2 - 1/p)_+}{2\gamma+1}\, r} \right),$$

uniformly for $f \in \mathcal{A}^{\gamma}_L$.
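For orientation, a concrete instance of this rate, obtained by simply plugging values into the exponents above: for the $L^2$-loss ($p = 2$, $r = 2$) and $\gamma = 1$, the bound reads

$$\|\hat{f}_n - f\|_{L^2}^2 = O\left( n^{-2/3} (\log n)^{2/3} \right) \quad \text{a.s.},$$

since $\frac{2\gamma}{2\gamma+1}\min\{1/p,\,1/2\}\, r = \frac{2}{3}$ and $(1/2 - 1/p)_+ = 0$.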

Proof. The idea behind the proof is to first approximate the truth $f$ by a step function $f_{k_n}$ with $O(k_n)$ jumps, and then treat $f_{k_n}$ as the underlying "true" signal in model (2.3) (with an additional approximation error). This allows us to employ techniques similar to those used in the proof of Theorem 3.1.4. To be rigorous, we give a detailed proof as follows.

First, we consider the choice of threshold $q = a\sqrt{\log n}$.

(i) Good noise case. Assume for the moment that the observations $y_n = \{y_i^n\}_{i=0}^{n-1}$ from model (2.3) are close to the truth $f$ in the sense that the event

$$G_n := \left\{ y_n \,:\, \sup_{I \in \mathcal{I}} \left( \frac{1}{\sqrt{n|I|}} \left| \sum_{i/n \in I} \left( y_i^n - f\left(\tfrac{i}{n}\right) \right) \right| - s_I \right) \le a_0 \sqrt{\log n} \right\}$$

holds with $a_0 = \delta + \sigma\sqrt{2r+4}$. Now let

$$k_n := \left\lceil \left( \frac{2L}{a - a_0} \right)^{2/(2\gamma+1)} \left( \frac{n}{\log n} \right)^{1/(2\gamma+1)} \right\rceil.$$
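As a purely numerical illustration of this choice (the values of $L$, $a$, $a_0$, $\gamma$ below are hypothetical and serve only to show the order of magnitude of $k_n$), a short sketch:

```python
import math

def k_n(n, L, a, a0, gamma):
    """k_n = ceil( (2L/(a - a0))^(2/(2*gamma+1)) * (n / log n)^(1/(2*gamma+1)) )."""
    return math.ceil((2 * L / (a - a0)) ** (2 / (2 * gamma + 1))
                     * (n / math.log(n)) ** (1 / (2 * gamma + 1)))

# With this choice the approximation term 2*sqrt(n)*k_n^(-gamma-1/2)*L used below
# is at most (a - a0)*sqrt(log n), so the total stays below a*sqrt(log n).
L, a, a0, gamma = 1.0, 3.0, 2.0, 1.0
for n in (10**3, 10**4, 10**5):
    k = k_n(n, L, a, a0, gamma)
    bias = 2 * math.sqrt(n) * k ** (-gamma - 0.5) * L
    print(n, k, bias <= (a - a0) * math.sqrt(math.log(n)))
```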

Note that $f \in \mathcal{A}^{\gamma}_L$, so for every $n$, by a compactness argument, there exists a step function $\tilde{f}_{k_n} \in \mathcal{S}([0,1))$ with $\#J(\tilde{f}_{k_n}) \le k_n$ such that $\|f - \tilde{f}_{k_n}\|_{L^{\infty}} \le k_n^{-\gamma} L$. By introducing additional change-points at $\{i/k_n\}_{i=1}^{k_n - 1}$ into $\tilde{f}_{k_n}$, one can construct another step function $f_{k_n}$ with $\#J(f_{k_n}) \le 2k_n$ such that its largest segment length is at most $1/k_n$ and $\|f - f_{k_n}\|_{L^{\infty}} \le 2 k_n^{-\gamma} L$. Then

$$T_{\mathcal{I}}(y_n; f_{k_n}) \le \sup_{\substack{I \in \mathcal{I} \\ f_{k_n} \equiv c_I \text{ on } I}} \frac{1}{\sqrt{n|I|}} \left| \sum_{i/n \in I} \left( f\left(\tfrac{i}{n}\right) - c_I \right) \right| + \sup_{I \in \mathcal{I}} \left( \frac{1}{\sqrt{n|I|}} \left| \sum_{i/n \in I} \left( y_i^n - f\left(\tfrac{i}{n}\right) \right) \right| - s_I \right)$$

$$\le 2 n^{1/2} k_n^{-\gamma - 1/2} L + a_0 \sqrt{\log n} \le a \sqrt{\log n}.$$

That is, $f_{k_n}$ satisfies the constraint in (2.4). Thus, by definition, $\#J(\hat{f}_n) \le \#J(f_{k_n}) \le 2 k_n$. Let the intervals $\{I_i\}_{i=0}^{m}$ be the partition of $[0,1)$ induced by $J(\hat{f}_n) \cup J(f_{k_n})$, with $m \le 3 k_n$. Then

$$\|\hat{f}_n - f_{k_n}\|_{L^p}^p = \sum_{i=0}^{m} |\hat{\theta}_i - \theta_i|^p\, |I_i|, \quad \text{where } \hat{f}_n|_{I_i} \equiv \hat{\theta}_i \text{ and } f_{k_n}|_{I_i} \equiv \theta_i.$$

If $|I_i| > c/n$, there is $\tilde{I}_i \in \mathcal{I}$ such that $\tilde{I}_i \subseteq I_i$ and $|\tilde{I}_i| \ge |I_i|/c$. Then,

$$|\tilde{I}_i|^{1/2} \left| \theta - \frac{1}{n|\tilde{I}_i|} \sum_{j/n \in \tilde{I}_i} y_j^n \right| \le (a+\delta) \sqrt{\frac{\log n}{n}} \quad \text{for } \theta = \theta_i \text{ or } \hat{\theta}_i,$$

which, together with $|\tilde{I}_i| \ge |I_i|/c$, implies

$$|I_i|^{1/2}\, |\hat{\theta}_i - \theta_i| \le 2(a+\delta) \sqrt{\frac{c \log n}{n}}.$$

If $|I_i| \le c/n$, then we have, for some $i_0$ and $j_0$,

$$|\hat{\theta}_i| \le |\hat{\theta}_i - y_{i_0}^n| + \left| y_{i_0}^n - f\left(\tfrac{i_0}{n}\right) \right| + \|f\|_{L^{\infty}} \le 2(a+\delta)\sqrt{\log n} + L,$$

and

$$|\theta_i| = \left| f_{k_n}\left(\tfrac{j_0}{n}\right) \right| \le \|f - f_{k_n}\|_{L^{\infty}} + \|f\|_{L^{\infty}} \le (2 k_n^{-\gamma} + 1) L,$$

which lead to

$$|\hat{\theta}_i - \theta_i| \le |\hat{\theta}_i| + |\theta_i| \le 2(a+\delta)\sqrt{\log n} + 2(k_n^{-\gamma} + 1) L.$$

Thus, by combining these two cases, we obtain

$$\|\hat{f}_n - f_{k_n}\|_{L^p}^p \le \sum_{i:\, |I_i| > c/n} \left( 2(a+\delta) \sqrt{\frac{c \log n}{n |I_i|}} \right)^p |I_i| \;+\; \sum_{i:\, |I_i| \le c/n} \left( 2(a+\delta)\sqrt{\log n} + 2(k_n^{-\gamma} + 1) L \right)^p \frac{c}{n}.$$


Then, with a similar argument as for (3.11), we obtain, as $n \to \infty$,

$$\|\hat{f}_n - f_{k_n}\|_{L^p}^p \le 2 \left( 4(a+\delta)^2 \log n \right)^{p/2} \left( (3 k_n + 1)\, \frac{c}{n} \right)^{\min\{1,\,p/2\}} \left( 1 + o(1) \right),$$

which together with a triangle inequality leads to

$$\|\hat{f}_n - f\|_{L^p}^r \le 2^{(2/p+1)r} \left( 4(a+\delta)^2 \log n \right)^{r/2} \left( (3 k_n + 1)\, \frac{c}{n} \right)^{\min\{r/p,\,r/2\}} \left( 1 + o(1) \right). \qquad (3.15)$$

(ii) Rates of convergence. The almost sure rate of convergence is a consequence of (3.15) and the fact that, due to (3.12),

$$\limsup_{n \to \infty} \mathbb{P}\{G_n^c\} \le \limsup_{n \to \infty} \mathbb{P}\left\{ \sup_{I \in \mathcal{I}} \frac{1}{\sqrt{n|I|}} \left| \sum_{i/n \in I} \xi_i^n \right| > (a_0 - \delta) \sqrt{\log n} \right\} = 0.$$

Similar to proof step (iii) of Theorem 3.1.4, we derive from (3.15) that, as $n \to \infty$,

$$\mathbb{E}\left[ \|\hat{f}_n - f\|_{L^p}^r \right] = \mathbb{E}\left[ \|\hat{f}_n - f\|_{L^p}^r ;\, G_n \right] + \mathbb{E}\left[ \|\hat{f}_n - f\|_{L^p}^r ;\, G_n^c \right]$$

$$\le \mathbb{E}\left[ \|\hat{f}_n - f\|_{L^p}^r ;\, G_n \right] + 2^{r/p} n^{r/2}\, \mathbb{P}\{G_n^c\} + \int_{2 n^{p/2}}^{\infty} \mathbb{P}\left\{ \|\hat{f}_n - f\|_{L^p}^p \ge u \right\} \tfrac{r}{p}\, u^{r/p - 1}\, du$$

$$\le O\left( (\log n)^{r/2} \left( n^{-1} k_n \right)^{\min\{r/p,\,r/2\}} \right) + O\left( n^{-r/2} \right) = O\left( (\log n)^{r/2} \left( n^{-1} k_n \right)^{\min\{r/p,\,r/2\}} \right),$$

which shows the rate of convergence in expectation.
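For completeness, substituting the choice $k_n \asymp (n/\log n)^{1/(2\gamma+1)}$ made above into this bound recovers the rate stated in Theorem 3.2.1:

$$(\log n)^{r/2} \left( n^{-1} k_n \right)^{\min\{r/p,\,r/2\}} \asymp (\log n)^{r/2} \left( n^{-\frac{2\gamma}{2\gamma+1}} (\log n)^{-\frac{1}{2\gamma+1}} \right)^{\min\{r/p,\,r/2\}} = n^{-\frac{2\gamma}{2\gamma+1}\min\{1/p,\,1/2\}\, r}\, (\log n)^{\frac{\gamma + (1/2 - 1/p)_+}{2\gamma+1}\, r}.$$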

Lastly, for the choice of threshold $q = q(\beta)$, the proof follows in the same way as above, based on the facts that $q(\beta) \le a\sqrt{\log n}$ for some constant $a$, due to (3.12), and that $\mathbb{P}\{G_n^c\} = O(n^{-r})$ by the choice of $\beta = O(n^{-r})$.

Remark 3.2.2. Similar to Theorem 3.1.4, the above theorem shows that the multiscale change-point segmentation method with a universal threshold automatically adapts to the smoothness of the approximation spaces, in the sense that it attains a faster rate for a larger order $\gamma$. However, unlike in Theorem 3.1.4, we require the constant $a$ in the definition of the threshold $q$ to be strictly greater than $\delta + \sigma\sqrt{2r+4}$; this is necessary since the constant hidden in the $O$ notation tends to infinity as $a \to \delta + \sigma\sqrt{2r+4}$.

Example 3.2.3. (i) (Piecewise) Hölder functions. For $0 < \alpha \le 1$ and $L > 0$, we consider the Hölder function classes

$$H^{\alpha}_L([0,1)) := \left\{ f \in \mathcal{D}([0,1)) : \|f\|_{L^{\infty}} \le L \text{ and } |f(x_1) - f(x_2)| \le L |x_1 - x_2|^{\alpha} \text{ for all } x_1, x_2 \in [0,1) \right\},$$

and the piecewise Hölder function classes with at most $\kappa$ jumps

$$H^{\alpha}_{\kappa,L}([0,1)) := \left\{ f \in \mathcal{D}([0,1)) : \text{there is a partition } \{I_i\}_{i=0}^{l} \text{ of } [0,1), \text{ with } l \le \kappa, \text{ such that } f|_{I_i} \in H^{\alpha}_L(I_i) \text{ for all possible } i \right\}.$$

Obviously, the latter contains the former as a special case when $\kappa = 0$, that is, $H^{\alpha}_{0,L}([0,1)) \equiv H^{\alpha}_L([0,1))$. It is easy to see that $H^{\alpha}_L([0,1)) \subseteq \mathcal{A}^{\alpha}_{L_0}$ with $L_0 \ge L$, and $H^{\alpha}_{\kappa,L}([0,1)) \subseteq \mathcal{A}^{\alpha}_{L_0}$ with $L_0 \ge L(\kappa+1)^{\alpha+1/2}$ (cf. Boysen et al., 2009).

It is known that the fastest possible rate over $H^{\alpha}_L([0,1))$, $0 < \alpha \le 1$, with respect to the $L^p$-loss, $0 < p < \infty$, is at most of order $n^{-\frac{2\alpha}{2\alpha+1}\min\{1/2,\,1/p\}}$ (see, e.g., Ibragimov and Has'minskiĭ, 1981; Ibragimov and Khas'minskiĭ, 1982). Thus, as a consequence of Theorem 3.2.1, the multiscale change-point segmentation method with a universal threshold is simultaneously minimax optimal (up to a log-factor) over $H^{\alpha}_L([0,1))$ and $H^{\alpha}_{\kappa,L}([0,1))$ for every $\kappa \in \mathbb{N}_0$, $0 < \alpha \le 1$ and $L > 0$, that is, it adapts to the smoothness order $\alpha$ of the underlying function.
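A quick numerical check of the inclusion $H^{\alpha}_L([0,1)) \subseteq \mathcal{A}^{\alpha}_{L_0}$ (not part of the original text): for a Hölder function, the piecewise-constant (midrange) approximation on $k+1$ equispaced blocks already has sup-norm error of order $k^{-\alpha}$, so $k^{\alpha}\, \Gamma_k(f)$ stays bounded. The sketch below, with a hypothetical example function, illustrates this.

```python
import numpy as np

alpha, L = 0.5, 1.0
x = np.linspace(0, 1, 2000, endpoint=False)
f = L * x**alpha                      # Hoelder continuous with exponent alpha on [0,1)

def step_error(fvals, k):
    """Sup-norm error of the midrange step approximation on k+1 equispaced blocks,
    i.e. of a step function with at most k jumps."""
    blocks = np.array_split(fvals, k + 1)
    return max(0.5 * (b.max() - b.min()) for b in blocks)

# k^alpha * (approximation error) should stay bounded; this upper-bounds
# k^alpha * Gamma_k(f), which is all the inclusion requires.
for k in (1, 2, 4, 8, 16, 32):
    print(k, round(k**alpha * step_error(f, k), 4))
```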

(ii) Bounded variation functions. Recall that the (total) variation $\|\cdot\|_{\mathrm{TV}}$ of a function $f$ is defined as

$$\|f\|_{\mathrm{TV}} := \sup\left\{ \sum_{i=0}^{m} |f(x_{i+1}) - f(x_i)| : 0 = x_0 < \cdots < x_{m+1} = 1,\ m \in \mathbb{N} \right\}.$$

We introduce the càdlàg bounded variation classes

$$\mathrm{BV}_L([0,1)) := \left\{ f \in \mathcal{D}([0,1)) : \|f\|_{L^{\infty}} \le L \text{ and } \|f\|_{\mathrm{TV}} \le L \right\} \quad \text{for } L > 0.$$

An elementary calculation, together with the Jordan decomposition, implies that $\mathrm{BV}_L([0,1)) \subseteq \mathcal{A}^{1}_{L_0}$ for $L_0 \ge L$.

Since the Hölder class $H^{1}_L([0,1)) \subseteq \mathrm{BV}_L([0,1))$, the best possible rate for $\mathrm{BV}_L([0,1))$ cannot be faster than that for $H^{1}_L([0,1))$, which is of order $n^{-\frac{2}{3}\min\{1/2,\,1/p\}}$. Then, Theorem 3.2.1 implies that the multiscale change-point segmentation method attains the minimax optimal rate (up to a log-factor) over the bounded variation classes $\mathrm{BV}_L([0,1))$ for $L > 0$.

We point out that the convergence rates of the multiscale change-point segmentation methods in the examples above coincide with the rates reported in Boysen et al. (2009) for jump-penalized least squares estimators, while they are faster than the rates reported in Fryzlewicz (2007) for the unbalanced Haar wavelet based estimator, the difference being in log-factors. All these examples concern the approximation spaces $\mathcal{A}^{\gamma}$ for $\gamma \le 1$. Note, however,

3.3 Implications of the convergence rates