
3.3 Estimation of the occurring error terms and probabilities of exception sets

We collect a series of auxiliary results that are needed for the main results of the next section. We start with the probabilities occurring in Corollary 3.2.7, where we can proceed similarly to Gobet et al. [22]:

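Lemma 3.3.1. For any $i=0,\dots,N-1$, $d=1,\dots,D$ it holds:

$$P\big([A^{1,L}_{0,i}]^c\big) \le 2E\left[K^L_{0,i}\exp\left(-\frac{L\Delta_i^{\beta}}{8K^L_{0,i}R^2(1+T)^2}\right)\right],$$

$$P\big([A^{1,L}_{d,i}]^c\big) \le 2E\left[K^L_{d,i}\exp\left(-\frac{L\Delta_i^{\beta+1}}{8K^L_{d,i}R^2(1+T)^2R_0^2}\right)\right],$$

and for $n=2,3,\dots$

$$P\big([A^{n,L}_{0,i}]^c\big) \le 2E\Bigg[\prod_{j=i+1}^{N-1}\Bigg(\mathcal N_2\Bigg(\frac{\Delta_i^{\beta/2}}{12\sqrt{K^L_{0,i}TK^2\Delta_jN}},[\mathcal P_{0,j}]_y,({}_\lambda X^i_{t_j},{}_\lambda X_{t_j})_{\lambda=1,\dots,L}\Bigg)\cdot\prod_{d=1}^{D}\mathcal N_2\Bigg(\frac{\Delta_i^{\beta/2}}{12\sqrt{K^L_{0,i}TK^2\Delta_jND}},[\mathcal P_{d,j}]_z,({}_\lambda X^i_{t_j},{}_\lambda X_{t_j})_{\lambda=1,\dots,L}\Bigg)\Bigg)\cdot K^L_{0,i}\exp\left(-\frac{L\Delta_i^{\beta}}{72K^L_{0,i}R^2(1+T)^2}\right)\Bigg],$$

$$P\big([A^{n,L}_{d,i}]^c\big) \le 2E\Bigg[\prod_{j=i+1}^{N-1}\Bigg(\mathcal N_2\Bigg(\frac{\Delta_i^{(\beta+1)/2}}{12\sqrt{K^L_{d,i}TK^2\Delta_jNR_0^2}},[\mathcal P_{0,j}]_y,({}_\lambda X^i_{t_j},{}_\lambda X_{t_j})_{\lambda=1,\dots,L}\Bigg)\cdot\prod_{e=1}^{D}\mathcal N_2\Bigg(\frac{\Delta_i^{(\beta+1)/2}}{12\sqrt{K^L_{d,i}TK^2\Delta_jNDR_0^2}},[\mathcal P_{e,j}]_z,({}_\lambda X^i_{t_j},{}_\lambda X_{t_j})_{\lambda=1,\dots,L}\Bigg)\Bigg)\cdot K^L_{d,i}\exp\left(-\frac{L\Delta_i^{\beta+1}}{72K^L_{d,i}R^2(1+T)^2R_0^2}\right)\Bigg].$$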

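Proof. The estimate for the first probability is the least complicated one, since the occurring coefficients do not depend on random estimators. For any $i=0,\dots,N-1$ we have

$$\begin{aligned}
P\big([A^{1,L}_{0,i}]^c\big) &= P\left(\frac1L\sum_{\lambda=1}^{L}\big|(\tilde\alpha^{1,L}_{0,i}-\alpha^{1,L}_{0,i})^\top(p^\lambda_{0,i})^\top\big|^2>\Delta_i^{\beta}\right)\\
&= P\left((\tilde\alpha^{1,L}_{0,i}-\alpha^{1,L}_{0,i})^\top\frac1L\sum_{\lambda=1}^{L}(p^\lambda_{0,i})^\top p^\lambda_{0,i}\,(\tilde\alpha^{1,L}_{0,i}-\alpha^{1,L}_{0,i})>\Delta_i^{\beta}\right)\\
&= P\left(\sum_{k=1}^{K^L_{0,i}}\big|\tilde\alpha^{1,L}_{0,i,k}-\alpha^{1,L}_{0,i,k}\big|^2>\Delta_i^{\beta}\right)\\
&= E\left[P^L_i\left(\sum_{k=1}^{K^L_{0,i}}\big|\tilde\alpha^{1,L}_{0,i,k}-\alpha^{1,L}_{0,i,k}\big|^2>\Delta_i^{\beta}\right)\right],
\end{aligned}$$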


where we used $\frac{(B^L_{0,i})^\top B^L_{0,i}}{L}=\mathrm{Id}_{K^L_{0,i}}$. Due to the definition of the coefficients we have:

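$P^L_i$

which are independent of any other occurring random variables. The last equality is true since $p^\lambda_{0,i,k}$ is measurable with respect to $\mathcal F^L_i$, the random variables $(\Delta{}_\lambda W_j,{}_\lambda X_{t_{j+1}})$ and $(\Delta{}_\lambda W_j,{}_\lambda X^i_{t_{j+1}})$, $j=i,\dots,N-1$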

Now, Hoeffding's inequality (see the appendix) yields for any $k=1,\dots,K^L_{0,i}$ the following estimate:

PNL,i


The second probability is slightly more involved: for any $d=1,\dots,D$ and $i=0,\dots,N-1$ the definition of the coefficients yields

$P\big([A^{1,L}_{d,i}]^c\big) = P$

Again, we insert a sequence of i.i.d. Bernoulli random variables denoted by $U_\lambda$ with $P(U_\lambda=1)=\frac12$ and $P(U_\lambda=-1)=\frac12$. This does not change the probability since $(\Delta{}_\lambda W_j,{}_\lambda X_{t_{j+1}})$ and $(\Delta{}_\lambda W_j,{}_\lambda X^i_{t_{j+1}})$ for $j=i,\dots,N-1$, conditioned on $\mathcal F^L_i$, are identically distributed. Hence, we further consider:

PiL


yields, due to the independence of $U_\lambda$, for $k=1,\dots,K^L_{d,i}$

$$E^L_{N,i}[H_{d,\lambda,k}]=0$$

and

$$|H_{d,\lambda,k}|\le|p^\lambda_{d,i,k}|\cdot2R(1+T)R_0\sqrt{\Delta_i}.$$

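Hoeffding's inequality implies for arbitrary $k=1,\dots,K^L_{d,i}$ the following estimates:

$$\begin{aligned}
P^L_{N,i}\left(\left|\frac1L\sum_{\lambda=1}^{L}H_{d,\lambda,k}\right|>\sqrt{\frac{\Delta_i^{\beta+2}}{K^L_{d,i}}}\right)
&\le2\exp\left(-\frac{2L\Delta_i^{\beta+2}}{K^L_{d,i}\,\frac1L\sum_{\lambda=1}^{L}\big|4R(1+T)R_0\sqrt{\Delta_i}\,p^\lambda_{d,i,k}\big|^2}\right)\\
&=2\exp\left(-\frac{L\Delta_i^{\beta+1}}{8K^L_{d,i}R^2(1+T)^2R_0^2\,\frac1L\sum_{\lambda=1}^{L}|p^\lambda_{d,i,k}|^2}\right)\\
&=2\exp\left(-\frac{L\Delta_i^{\beta+1}}{8K^L_{d,i}R^2(1+T)^2R_0^2}\right).
\end{aligned}$$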
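As a numerical sanity check of the simplification in the Hoeffding step above, the following sketch evaluates the general two-sided Hoeffding bound for a sample mean and compares it with the closed form $2\exp\big(-L\Delta_i^{\beta+1}/(8K^L_{d,i}R^2(1+T)^2R_0^2)\big)$. All numerical values and the helper names `hoeffding_mean_bound` and `closed_form` are assumptions made for illustration only; the normalisation $\frac1L\sum_\lambda|p^\lambda_{d,i,k}|^2=1$ is imposed by hand.

```python
import math


def hoeffding_mean_bound(widths, eps):
    """Two-sided Hoeffding bound for P(|mean of centred terms| >= eps).

    widths[l] is the length of the interval that contains the l-th term.
    """
    n = len(widths)
    mean_sq_width = sum(w * w for w in widths) / n
    return 2.0 * math.exp(-2.0 * n * eps ** 2 / mean_sq_width)


def closed_form(L, K, delta, beta, R, T, R0):
    """2 * exp(-L * delta^(beta+1) / (8 * K * R^2 * (1+T)^2 * R0^2))."""
    return 2.0 * math.exp(
        -L * delta ** (beta + 1) / (8.0 * K * R ** 2 * (1 + T) ** 2 * R0 ** 2)
    )


# Hypothetical parameter values, chosen only for illustration.
L, K, delta, beta, R, T, R0 = 2000, 3, 0.1, 1.0, 1.0, 1.0, 1.0

# Basis values p_lambda, rescaled so that (1/L) * sum p_lambda^2 = 1.
raw = [math.sin(0.37 * lam) + 1.5 for lam in range(L)]
scale = math.sqrt(sum(x * x for x in raw) / L)
p = [x / scale for x in raw]

# |H| <= 2 R (1+T) R0 sqrt(delta) |p|, so each term lies in an interval
# of twice that length.
widths = [4.0 * R * (1 + T) * R0 * math.sqrt(delta) * abs(x) for x in p]
eps = math.sqrt(delta ** (beta + 2) / K)

general = hoeffding_mean_bound(widths, eps)
simplified = closed_form(L, K, delta, beta, R, T, R0)
print(general, simplified)
```

Both numbers agree up to rounding, which reflects that the factor $\frac1L\sum_\lambda|p^\lambda_{d,i,k}|^2$ drops out exactly because of the normalisation of the basis.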

Gathering the results yields

$$P\big([A^{1,L}_{d,i}]^c\big)\le2E\left[K^L_{d,i}\exp\left(-\frac{L\Delta_i^{\beta+1}}{8K^L_{d,i}R^2(1+T)^2R_0^2}\right)\right].$$

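For $n=2,3,\dots$ we additionally use methods from nonparametric statistics. For a fixed time index $i=0,\dots,N-1$ in the grid we have:

$$\begin{aligned}
P\big([A^{n,L}_{0,i}]^c\big) &= P\left(\frac1L\sum_{\lambda=1}^{L}\big|(\tilde\alpha^{n,L}_{0,i}-\alpha^{n,L}_{0,i})^\top(p^\lambda_{0,i})^\top\big|^2>\Delta_i^{\beta}\right)\\
&= P\left((\tilde\alpha^{n,L}_{0,i}-\alpha^{n,L}_{0,i})^\top\frac1L\sum_{\lambda=1}^{L}(p^\lambda_{0,i})^\top p^\lambda_{0,i}\,(\tilde\alpha^{n,L}_{0,i}-\alpha^{n,L}_{0,i})>\Delta_i^{\beta}\right)\\
&= P\left(\sum_{k=1}^{K^L_{0,i}}\big|\tilde\alpha^{n,L}_{0,i,k}-\alpha^{n,L}_{0,i,k}\big|^2>\Delta_i^{\beta}\right)\\
&= E\left[P^L_i\left(\sum_{k=1}^{K^L_{0,i}}\big|\tilde\alpha^{n,L}_{0,i,k}-\alpha^{n,L}_{0,i,k}\big|^2>\Delta_i^{\beta}\right)\right].
\end{aligned}$$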

variables $U_\lambda$ cannot be inserted without difficulties as before. The way out of this problem is the introduction of covers: due to $y^{n-1,L}_j\in[\mathcal P_{0,j}]_y$ and $z^{d,n-1,L}_j\in[\mathcal P_{d,j}]_z$ we obtain for any $k=1,\dots,K^L_{0,i}$:

where again $U_\lambda$ are i.i.d. Bernoulli random variables, independent of all other random variables, with the same property as above. The last equality is true since $(\Delta{}_\lambda W_{j-1},{}_\lambda X_{t_j})$ and $(\Delta{}_\lambda W_{j-1},{}_\lambda X^i_{t_j})$ for $j=i+1,\dots,N$ conditioned

Without loss of generality we can assume that elements of $\mathcal G_{0,j}$ are bounded by $C_y$ and elements of $\mathcal G_{d,j}$ are bounded by $C^\pi_z$. Moreover, $\mathcal G_{0,j}$ and $\mathcal G_{d,j}$ depend on $({}_\lambda X^i_{t_j},{}_\lambda X_{t_j})_{\lambda=1,\dots,L}$ but not on $U_\lambda$.

variables ${}_\lambda V$, the Lipschitz property of $f$ and Young's inequality further imply:


where in the last inequality we took the number of all combinations of functions from $\mathcal G_{0,j}$ and $\mathcal G_{d,j}$, $j=i+1,\dots,N-1$ and $d=1,\dots,D$, and weighted them with the maximal probability. Applying Hoeffding's inequality in the usual way to the last conditional probability finally yields

P([An,L0,i ]) 2E

Combining the techniques from the estimate for $P\big([A^{1,L}_{d,i}]^c\big)$ and the statistical tools from the estimate for $P\big([A^{n,L}_{0,i}]^c\big)$, we also obtain an upper bound for $P\big([A^{n,L}_{d,i}]^c\big)$. For fixed $i=0,\dots,N-1$, $d=1,\dots,D$ and

Due to the definition of the coefficients it holds:

PiL

Again, we cannot insert the Bernoulli random variables $U_\lambda$ at once and consequently introduce covers of function classes again: since $y^{n-1,L}_j\in[\mathcal P_{0,j}]_y$ and $z^{e,n-1,L}_j\in[\mathcal P_{e,j}]_z$ ($e=1,\dots,D$), for any $k=1,\dots,K^L_{d,i}$

which are independent of all other random variables. The last equality is due to the conditionally identical distribution of $(\Delta{}_\lambda W_{t_j},{}_\lambda X_{t_{j+1}})$ and $(\Delta{}_\lambda W_{t_j},{}_\lambda X^i_{t_{j+1}})$ for $j=i,\dots,N-1$ given $\mathcal F^L_i$. Again, we introduce covers

Without loss of generality we can again assume that elements of $\mathcal G_{0,j}$ are bounded by $C_y$ and elements of $\mathcal G_{e,j}$ are bounded by $C^\pi_z$. Moreover, $\mathcal G_{0,j}$ and $\mathcal G_{e,j}$ depend on $({}_\lambda X^i_{t_j},{}_\lambda X_{t_j})_{\lambda=1,\dots,L}$, but not on $U_\lambda$. The Cauchy-Schwarz inequality, Young's inequality and the Lipschitz continuity of $f$ yield:

Thus, we can conclude

PiL


Because of the independence of $U_\lambda$ we obtain

$$E^L_{N,i}[H_{d,\lambda,k}]=0$$

and

$$|H_{d,\lambda,k}|\le2R(1+T)R_0\sqrt{\Delta_i}\,|p^\lambda_{d,i,k}|.$$

Hoeffding’s inequality implies the following estimate:

$P^L_{N,i}$

Thus, we finally obtain:

P([An,Ld,i]c) 2E

The probabilities describing the exception set concerning the change from ghost sample to original sample are estimated next:


which can be seen by further conditioning on $U_\lambda$.

We now introduce a cover $\mathcal G$ of the function class $[\mathcal P_{0,j}]_y-y^{n-1}_j$ such that for any $\xi\in[\mathcal P_{0,j}]_y-y^{n-1}_j$ there

We can assume that elements of $\mathcal G$ are bounded by $2C_y$. Recall also that $\mathcal G$ depends on the simulations $({}_\lambda X_{t_j},{}_\lambda X^i_{t_j})_{\lambda=1,\dots,L}$ but not on the Bernoulli random variables $(U_\lambda)_{\lambda=1,\dots,L}$. Moreover, the random variable

Now, let $g\in\mathcal G$ satisfy (3.28). Then, we obtain

and the first and fourth summand of the last expression can be estimated by:

Since on the set we are interested in $A:=$

Thus, combining these results we obtain:

$P^L_{j,i}$

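Hoeffding's inequality implies

$$\begin{aligned}
&P^L_{j,i}\left(\frac{\frac1L\sum_{\lambda=1}^{L}U_\lambda\big\{|g({}_\lambda X^i_{t_j})|^2-|g({}_\lambda X_{t_j})|^2\big\}}{\sqrt{\frac1L\sum_{\lambda=1}^{L}\big(|g({}_\lambda X_{t_j})|^2+|g({}_\lambda X^i_{t_j})|^2\big)}}>\frac13\Delta_j^{\beta/2}\right)\\
&\qquad\le2\exp\left(-\frac{2L\Delta_j^{\beta}\,\frac1L\sum_{\lambda=1}^{L}\big(|g({}_\lambda X_{t_j})|^2+|g({}_\lambda X^i_{t_j})|^2\big)}{36\,\frac1L\sum_{\lambda=1}^{L}\big(|g({}_\lambda X^i_{t_j})|^2-|g({}_\lambda X_{t_j})|^2\big)^2}\right)\\
&\qquad=2\exp\left(-\frac{L\Delta_j^{\beta}\sum_{\lambda=1}^{L}\big(|g({}_\lambda X_{t_j})|^2+|g({}_\lambda X^i_{t_j})|^2\big)}{18\sum_{\lambda=1}^{L}\big(|g({}_\lambda X^i_{t_j})|^2-|g({}_\lambda X_{t_j})|^2\big)^2}\right).
\end{aligned}$$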

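Moreover, because of

$$\sum_{\lambda=1}^{L}\big(|g({}_\lambda X^i_{t_j})|^2-|g({}_\lambda X_{t_j})|^2\big)^2\le\sum_{\lambda=1}^{L}\big(|g({}_\lambda X^i_{t_j})|^4+|g({}_\lambda X_{t_j})|^4\big)\le4C_y^2\sum_{\lambda=1}^{L}\big(|g({}_\lambda X^i_{t_j})|^2+|g({}_\lambda X_{t_j})|^2\big)$$

we obtain

$$2\exp\left(-\frac{L\Delta_j^{\beta}\sum_{\lambda=1}^{L}\big(|g({}_\lambda X_{t_j})|^2+|g({}_\lambda X^i_{t_j})|^2\big)}{18\sum_{\lambda=1}^{L}\big(|g({}_\lambda X^i_{t_j})|^2-|g({}_\lambda X_{t_j})|^2\big)^2}\right)\le2\exp\left(-\frac{L\Delta_j^{\beta}}{72C_y^2}\right).$$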

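Thus, finally we can conclude

$$P\big([A^{n,L}_{y,j,i}]^c\big)\le2\exp\left(-\frac{L\Delta_j^{\beta}}{72C_y^2}\right)E\left[\mathcal N_2\left(\frac{\Delta_j^{\beta/2}}{3\sqrt2},[\mathcal P_{0,j}]_y-y^{n-1}_j,({}_\lambda X_{t_j},{}_\lambda X^i_{t_j})_{\lambda=1,\dots,L}\right)\right].$$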

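For the probability concerning the $Z$-part we can copy large parts of the previous proof, only taking the higher dimension into account:

Lemma 3.3.3. For $n\in\mathbb N$, $i=0,\dots,N-1$ and $j=i,\dots,N-1$ it holds:

$$P\big([A^{n,L}_{z,j,i}]^c\big)\le2\exp\left(-\frac{L\Delta_j^{\beta}}{72D(C^\pi_z)^2}\right)\sum_{d=1}^{D}E\left[\mathcal N_2\left(\frac{\Delta_j^{\beta/2}}{3\sqrt{2D}},[\mathcal P_{d,j}]_z-z^{d,n-1}_j,({}_\lambda X_{t_j},{}_\lambda X^i_{t_j})_{\lambda=1,\dots,L}\right)\right].$$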

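Proof. As before we define $(U_\lambda)_{\lambda=1,\dots,L}$ to be i.i.d. Bernoulli random variables independent of everything else, satisfying $P(U_\lambda=1)=\frac12=P(U_\lambda=-1)$. Furthermore, we define for fixed $i=0,\dots,N-1$ and fixed $j=i,\dots,N-1$

$$B_\lambda={}_\lambda X^i_{t_j}\ \text{ and }\ B_{L+\lambda}={}_\lambda X_{t_j},\quad\text{if }U_\lambda=1,$$

$$B_\lambda={}_\lambda X_{t_j}\ \text{ and }\ B_{L+\lambda}={}_\lambda X^i_{t_j},\quad\text{if }U_\lambda=-1.$$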
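The random swap between ghost and original sample can be sketched as follows. The Gaussian draws are placeholders for ${}_\lambda X^i_{t_j}$ and ${}_\lambda X_{t_j}$, and the function name `symmetrise` is an assumption introduced for this illustration.

```python
import random


def symmetrise(ghost, original, signs):
    """Return (B_1..B_L, B_{L+1}..B_{2L}) built from the two samples.

    For U = 1 the ghost value goes first, for U = -1 the original value.
    """
    first, second = [], []
    for x_ghost, x_orig, u in zip(ghost, original, signs):
        if u == 1:
            first.append(x_ghost)
            second.append(x_orig)
        else:
            first.append(x_orig)
            second.append(x_ghost)
    return first, second


rng = random.Random(1)
L = 6
ghost = [rng.gauss(0.0, 1.0) for _ in range(L)]      # placeholder ghost sample
original = [rng.gauss(0.0, 1.0) for _ in range(L)]   # placeholder original sample
signs = [rng.choice((1, -1)) for _ in range(L)]      # U_lambda with P(+1) = P(-1) = 1/2

first, second = symmetrise(ghost, original, signs)

# For any function g, g(B_lambda) - g(B_{L+lambda}) equals
# U_lambda * (g(ghost) - g(original)); here with g(x) = x^2.
lhs = [b1 * b1 - b2 * b2 for b1, b2 in zip(first, second)]
rhs = [u * (xg * xg - xo * xo) for xg, xo, u in zip(ghost, original, signs)]
print(lhs == rhs)
```

This identity is the reason why the signs $U_\lambda$ appear as multipliers of the differences in the subsequent application of Hoeffding's inequality.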


As in the last proof we obtain

PiL

which again can be seen by further conditioning on $U_\lambda$.

We now introduce $D$ covers in total, that is $\mathcal G_d$, $d=1,\dots,D$, of $[\mathcal P_{d,j}]_z-z^{d,n-1}_j$ such that for any

Since on the set we are interested in, $A:=$

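Hoeffding's inequality yields

$$\begin{aligned}
&P^L_{j,i}\left(\frac{\frac1L\sum_{\lambda=1}^{L}U_\lambda\sum_{d=1}^{D}\big\{|g_d({}_\lambda X^i_{t_j})|^2-|g_d({}_\lambda X_{t_j})|^2\big\}}{\sqrt{\frac1L\sum_{\lambda=1}^{L}\sum_{d=1}^{D}\big(|g_d({}_\lambda X_{t_j})|^2+|g_d({}_\lambda X^i_{t_j})|^2\big)}}>\frac13\Delta_j^{\beta/2}\right)\\
&\qquad\le2\exp\left(-\frac{2L\Delta_j^{\beta}\,\frac1L\sum_{\lambda=1}^{L}\sum_{d=1}^{D}\big(|g_d({}_\lambda X_{t_j})|^2+|g_d({}_\lambda X^i_{t_j})|^2\big)}{36\,\frac1L\sum_{\lambda=1}^{L}\Big(\sum_{d=1}^{D}\big(|g_d({}_\lambda X^i_{t_j})|^2-|g_d({}_\lambda X_{t_j})|^2\big)\Big)^2}\right)\\
&\qquad\le2\exp\left(-\frac{L\Delta_j^{\beta}\sum_{\lambda=1}^{L}\sum_{d=1}^{D}\big(|g_d({}_\lambda X_{t_j})|^2+|g_d({}_\lambda X^i_{t_j})|^2\big)}{18D\sum_{\lambda=1}^{L}\sum_{d=1}^{D}\big(|g_d({}_\lambda X^i_{t_j})|^2-|g_d({}_\lambda X_{t_j})|^2\big)^2}\right).
\end{aligned}$$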

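Moreover, from

$$\sum_{\lambda=1}^{L}\sum_{d=1}^{D}\big(|g_d({}_\lambda X^i_{t_j})|^2-|g_d({}_\lambda X_{t_j})|^2\big)^2\le\sum_{\lambda=1}^{L}\sum_{d=1}^{D}\big(|g_d({}_\lambda X^i_{t_j})|^4+|g_d({}_\lambda X_{t_j})|^4\big)\le4(C^\pi_z)^2\sum_{\lambda=1}^{L}\sum_{d=1}^{D}\big(|g_d({}_\lambda X^i_{t_j})|^2+|g_d({}_\lambda X_{t_j})|^2\big)$$

we can derive

$$2\exp\left(-\frac{L\Delta_j^{\beta}\sum_{\lambda=1}^{L}\sum_{d=1}^{D}\big(|g_d({}_\lambda X_{t_j})|^2+|g_d({}_\lambda X^i_{t_j})|^2\big)}{18D\sum_{\lambda=1}^{L}\sum_{d=1}^{D}\big(|g_d({}_\lambda X^i_{t_j})|^2-|g_d({}_\lambda X_{t_j})|^2\big)^2}\right)\le2\exp\left(-\frac{L\Delta_j^{\beta}}{72D(C^\pi_z)^2}\right).$$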

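Gathering the results we can conclude

$$P\big([A^{n,L}_{z,j,i}]^c\big)\le2\exp\left(-\frac{L\Delta_j^{\beta}}{72D(C^\pi_z)^2}\right)\sum_{d=1}^{D}E\left[\mathcal N_2\left(\frac{\Delta_j^{\beta/2}}{3\sqrt{2D}},[\mathcal P_{d,j}]_z-z^{d,n-1}_j,({}_\lambda X_{t_j},{}_\lambda X^i_{t_j})_{\lambda=1,\dots,L}\right)\right].$$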

As last exception set we consider a probability which does not appear in the Euler-type scheme of Gobet et al. [22] and can therefore be seen as typical for our kind of approximation:

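Lemma 3.3.4. For any $n\in\mathbb N$ and $i=1,\dots,N-1$ it holds

$$P\big([A^{n,L}_{z,\,i-1,\,L,\,i}]^c\big)\le E\Bigg[\sum_{j=i}^{N-1}\Bigg(\mathcal N_2\Bigg(\frac{\Delta_i^{\beta+1}}{12\sqrt{TK^2\Delta_jN}},[\mathcal P_{0,j}]_y,({}_\lambda X^{i-1}_{t_j},{}_\lambda X^i_{t_j})_{\lambda=1,\dots,L}\Bigg)+\sum_{d=1}^{D}\mathcal N_2\Bigg(\frac{\Delta_i^{\beta+1}}{12\sqrt{TK^2\Delta_jND}},[\mathcal P_{d,j}]_z,({}_\lambda X^{i-1}_{t_j},{}_\lambda X^i_{t_j})_{\lambda=1,\dots,L}\Bigg)\Bigg)\Bigg]\cdot2\exp\left(-\frac{L\Delta_i^{2(\beta+1)}}{72R^2(1+T)^2}\right).$$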


Chow and Teicher [13], Corollary 7.3.3 moreover implies

$E^L_{i,i-1}$

Due to the independence of all occurring Brownian increments, we know that for fixed $\xi_{d,j}$, $d=0,\dots,D$, $j=i,\dots,N-1$, (3.30) can be written as

$$G(\Delta{}_\lambda W_0,\dots,\Delta{}_\lambda W_{i-2},\Delta{}_\lambda W_{i-1})$$

for some Borel function $G$. Moreover, since the structure of term (3.31) for fixed $\xi_{d,j}$, $d=0,\dots,D$ and $j=i,\dots,N-1$, is identical, we obtain that this term can be represented as

$$G(\Delta{}_\lambda W_0,\dots,\Delta{}_\lambda W_{i-2},\Delta{}_\lambda W_{i-1}).$$


Hence, conditioned on $\mathcal F^L_{i-1}$, (3.30) and (3.31) are identically distributed. Moreover, $y^n_i({}_\lambda X^{i-1}_{t_i})$ and $y^n_i({}_\lambda X^i_{t_i})$ satisfy this property as well, so that in principle we can proceed similarly to the last proof.

That means, we define again i.i.d. random variables $U_\lambda$, $\lambda=1,\dots,L$, with $P(U_\lambda=1)=\frac12$ and $P(U_\lambda=-1)=\frac12$, which are independent of all other random variables.

Furthermore, we introduce

Since the random variables under the square root, given $\mathcal F^L_{i-1}$, are identically distributed, we obtain the equality

for which the analogous properties and notations are used as in the former proofs. To ease notation we define

Let $g_{0,j}$ and $g_{d,j}$ satisfy (3.32) and (3.33), respectively. Then, as in the last proofs, we obtain

which is again due to the Lipschitz continuity of $f$, Young's inequality and the Cauchy-Schwarz inequality. Consequently, it holds

where we used the definition

$V_\lambda:=$

holds, Hoeffding's inequality implies

$P^L_{N,i-1}$

which is due to

which is the assertion.

We now show that all occurring covering numbers are finite, which implies that the above probabilities, for fixed $|\pi|$ and $K_{d,i}$, $d=0,\dots,D$, $i=0,\dots,N-1$, converge to 0 as $L\to\infty$.

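Lemma 3.3.5. For $n\in\mathbb N$, $\varepsilon<1$, $i=0,\dots,N-1$, $j=i,\dots,N-1$ and $d=1,\dots,D$ it holds:

$$\mathcal N_2\Big(\varepsilon,[\mathcal P_{0,j}]_y-y^n_j,({}_\lambda X_{t_j},{}_\lambda X^i_{t_j})_{\lambda=1,\dots,L}\Big)\le3\left(\frac{8eC_y^2}{\varepsilon^2}\ln\left(\frac{12eC_y^2}{\varepsilon^2}\right)\right)^{K_{0,j}+1},\qquad(3.34)$$

$$\mathcal N_2\Big(\varepsilon,[\mathcal P_{d,j}]_z-z^{d,n}_j,({}_\lambda X_{t_j},{}_\lambda X^i_{t_j})_{\lambda=1,\dots,L}\Big)\le3\left(\frac{8e(C^\pi_z)^2}{\varepsilon^2}\ln\left(\frac{12e(C^\pi_z)^2}{\varepsilon^2}\right)\right)^{K_{d,j}+1}.\qquad(3.35)$$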
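To get a feeling for the size of these bounds, the following sketch evaluates the logarithm of the right-hand side of (3.34) and combines it with the exponential factor $2\exp(-L\Delta_j^\beta/(72C_y^2))$ and the covering radius $\Delta_j^{\beta/2}/(3\sqrt2)$ from the estimate for the $Y$-part above. It then computes how many samples $L$ make the product drop below a given level. The parameter values, the target level and the function names are assumptions for illustration.

```python
import math


def log_covering_bound(eps, C, K):
    """Logarithm of 3 * ((8 e C^2 / eps^2) * ln(12 e C^2 / eps^2)) ** (K + 1)."""
    x = math.e * C ** 2 / eps ** 2
    return math.log(3.0) + (K + 1) * math.log(8.0 * x * math.log(12.0 * x))


def samples_needed(delta_j, beta, C, K, target):
    """Smallest L with 2 * exp(-L * delta_j^beta / (72 C^2)) * bound <= target."""
    eps = delta_j ** (beta / 2.0) / (3.0 * math.sqrt(2.0))
    log_n = log_covering_bound(eps, C, K)
    return math.ceil(
        72.0 * C ** 2 * (log_n + math.log(2.0 / target)) / delta_j ** beta
    )


# Hypothetical values: step size 0.1, beta = 1, truncation level 5,
# two basis functions, target probability 1 percent.
delta_j, beta, C, K, target = 0.1, 1.0, 5.0, 2, 0.01

eps = delta_j ** (beta / 2.0) / (3.0 * math.sqrt(2.0))
L_min = samples_needed(delta_j, beta, C, K, target)
log_prob = (
    math.log(2.0)
    + log_covering_bound(eps, C, K)
    - L_min * delta_j ** beta / (72.0 * C ** 2)
)
print(L_min, math.exp(log_prob))
```

Working with logarithms avoids overflow, since the covering bound grows like a power of $\varepsilon^{-2}$ with exponent $K_{0,j}+1$. The required $L$ grows only logarithmically in the covering bound, but linearly in $C_y^2/\Delta_j^\beta$.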

Proof. We can proceed word for word as in Lemor [33] and give the proof only for the sake of completeness.

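Let $i=0,\dots,N-1$ be fixed. Lemma 9.2 of Györfi et al. [26] then implies for any $j=i,\dots,N-1$:

$$\begin{aligned}
\mathcal N_2\Big(\varepsilon,[\mathcal P_{0,j}]_y-y^n_j,({}_\lambda X_{t_j},{}_\lambda X^i_{t_j})_{\lambda=1,\dots,L}\Big)
&=\mathcal N_2\Big(\varepsilon,[\mathcal P_{0,j}]_y,({}_\lambda X_{t_j},{}_\lambda X^i_{t_j})_{\lambda=1,\dots,L}\Big)\\
&=\mathcal N_2\Big(\varepsilon,[\mathcal P_{0,j}]_y+C_y,({}_\lambda X_{t_j},{}_\lambda X^i_{t_j})_{\lambda=1,\dots,L}\Big)\\
&\le\mathcal M_2\Big(\varepsilon,[\mathcal P_{0,j}]_y+C_y,({}_\lambda X_{t_j},{}_\lambda X^i_{t_j})_{\lambda=1,\dots,L}\Big),
\end{aligned}$$

where $\mathcal M_2\big(\varepsilon,[\mathcal P_{0,j}]_y+C_y,({}_\lambda X_{t_j},{}_\lambda X^i_{t_j})_{\lambda=1,\dots,L}\big)$ denotes the $\varepsilon$-packing number of $[\mathcal P_{0,j}]_y+C_y$ on the random variables $({}_\lambda X_{t_j},{}_\lambda X^i_{t_j})_{\lambda=1,\dots,L}$. For a definition of packing numbers see Appendix A.3.2.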

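Now, $[\mathcal P_{0,j}]_y+C_y$ is a class of nonnegative functions bounded by $2C_y$. Since usually $0<\varepsilon<1<\frac{C_y}{4}$ holds and the Vapnik-Chervonenkis dimension $V_{([\mathcal P_{0,j}]_y+C_y)^+}$ of the subgraphs is larger than 2 (for both terms we refer to Appendix A.3.3), we can apply Theorem 9.4 of Györfi et al. [26] and obtain

$$\begin{aligned}
\mathcal M_2\Big(\varepsilon,[\mathcal P_{0,j}]_y+C_y,({}_\lambda X_{t_j},{}_\lambda X^i_{t_j})_{\lambda=1,\dots,L}\Big)
&\le3\left(\frac{2e(2C_y)^2}{\varepsilon^2}\ln\left(\frac{3e(2C_y)^2}{\varepsilon^2}\right)\right)^{V_{([\mathcal P_{0,j}]_y+C_y)^+}}\\
&=3\left(\frac{8eC_y^2}{\varepsilon^2}\ln\left(\frac{12eC_y^2}{\varepsilon^2}\right)\right)^{V_{([\mathcal P_{0,j}]_y+C_y)^+}}.
\end{aligned}$$

Hence, it is left to bound the exponent. To do so we first show $V_{([\mathcal P_{0,j}]_y+C_y)^+}\le V_{([\mathcal P_{0,j}]_y)^+}$: If $V_{([\mathcal P_{0,j}]_y+C_y)^+}=\infty$ we consider in each case a fixed subset in $\mathbb R^2$: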

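$$\{(z_1,t_1),\dots,(z_m,t_m)\}.$$

For any $m\in\mathbb N$ and any subset $I$ of $\{1,\dots,m\}$ there is in this case, by the definition of the Vapnik-Chervonenkis dimension, a function $g\in[\mathcal P_{0,j}]_y+C_y$ such that

$$g(z_k)\ge t_k\ \text{ for }k\in I\quad\text{and}\quad g(z_k)<t_k\ \text{ for }k\notin I.$$

This implies that for any $m\in\mathbb N$ and any subset $I$ of $\{1,\dots,m\}$ there is $g\in[\mathcal P_{0,j}]_y$ satisfying

$$g(z_k)\ge t_k-C_y\ \text{ for }k\in I\quad\text{and}\quad g(z_k)<t_k-C_y\ \text{ for }k\notin I,$$

and we can conclude $V_{([\mathcal P_{0,j}]_y)^+}=\infty$.

If $V_{([\mathcal P_{0,j}]_y+C_y)^+}=m$ for some $m\in\mathbb N$, the above argumentation yields $V_{([\mathcal P_{0,j}]_y)^+}\ge m$.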

In the next step we show $V_{([\mathcal P_{0,j}]_y)^+}\le V_{(\mathcal P_{0,j})^+}$:

If $V_{([\mathcal P_{0,j}]_y)^+}=\infty$ we argue as follows: We again consider a subset of $\mathbb R^2$

$$\{(z_1,t_1),\dots,(z_m,t_m)\},$$

which is now shattered by $([\mathcal P_{0,j}]_y)^+$, see Definition A.3.4 in the appendix for the meaning of the so-called shatter coefficients. As required, for any $m\in\mathbb N$ and any subset $I$ of $\{1,\dots,m\}$ there is a function $g\in\mathcal P_{0,j}$ satisfying

$$[g]_y(z_k)<t_k\ \text{ for }k\in I\quad\text{and}\quad[g]_y(z_k)\ge t_k\ \text{ for }k\notin I.$$

It is enough to show

$$g(z_k)\le[g]_y(z_k)\ \text{ for }k\in I\quad\text{and}\quad g(z_k)\ge[g]_y(z_k)\ \text{ for }k\notin I,$$

to find an element of $(\mathcal P_{0,j})^+$ which also identifies $I$, and consequently $V_{(\mathcal P_{0,j})^+}=\infty$.

Assume there is $k\in I$ with $g(z_k)>[g]_y(z_k)$. Then necessarily $g(z_k)>C_y$ and hence $[g]_y(z_k)=C_y$ and $t_k>C_y$. Thus, it is impossible to find a function in $[\mathcal P_{0,j}]_y$ identifying the complement of $k$ in $\{1,\dots,m\}$: if there were such a $g'$, it would have to satisfy $[g']_y(z_k)\ge t_k>C_y$, which is a contradiction. Consequently, $g(z_k)\le[g]_y(z_k)$ holds for $k\in I$.

On the other hand, assume the existence of $k\notin I$ satisfying $g(z_k)<[g]_y(z_k)$. Then necessarily $g(z_k)<-C_y$ holds, and moreover $[g]_y(z_k)=-C_y$ and $t_k\le-C_y$ are valid. Hence, it is not possible to find a function $g'\in[\mathcal P_{0,j}]_y$ identifying the set $\{(z_k,t_k)\}$: in that case we would have $[g']_y(z_k)<t_k\le-C_y$, also leading to a contradiction.

If $V_{([\mathcal P_{0,j}]_y)^+}=m$ for some $m\in\mathbb N$ holds, the above argumentation implies $V_{(\mathcal P_{0,j})^+}\ge m$.

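Theorem 9.5 and the remark in the upper part of p. 152 in Györfi et al. [26] now yield $V_{(\mathcal P_{0,j})^+}\le K_{0,j}+1$.

Thus, finally we obtain:

$$\mathcal N_2\Big(\varepsilon,[\mathcal P_{0,j}]_y-y^n_j,({}_\lambda X_{t_j},{}_\lambda X^i_{t_j})_{\lambda=1,\dots,L}\Big)\le3\left(\frac{8eC_y^2}{\varepsilon^2}\ln\left(\frac{12eC_y^2}{\varepsilon^2}\right)\right)^{K_{0,j}+1}.$$

The estimate for the second covering number can be derived analogously, where we now have to consider subgraphs in $\mathbb R^{D+1}$.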

Remark 3.3.6. Estimates (3.34) and (3.35) also hold for the covering numbers occurring in Lemma 3.3.1 and Lemma 3.3.4. For the former this is obvious. The second group of covering numbers, conditioned on $\mathcal F^L_{i-1}$, has the same structural properties as the covering numbers we explicitly examined in the above lemma.

The tower-property of conditional expectations then completes the proof.

We now turn to the error terms, which can be treated analogously to the error terms in Gobet et al. [22]. At first, we consider the terms containing typical projection errors.

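Lemma 3.3.7. For $n\in\mathbb N$, $i=0,\dots,N-1$ and $d=1,\dots,D$ it holds

$$T^{n,L}_{1,i}\le R^2(1+T)^2\frac{E[K^L_{0,i}]}{L}+\inf_{\alpha}E\Big[\big|y^n_i(X_{t_i})-\alpha\cdot p_{0,i}(X_{t_i})\big|^2\Big],$$

$$T^{n,L}_{3,d,i}\le R^2(1+T)^2\frac{E[K^L_{d,i}]}{L\Delta_i}+\frac{1}{\Delta_i}\inf_{\alpha}E\Big[\big|\sqrt{\Delta_i}\,z^{d,n}_i(X_{t_i})-\alpha\cdot p_{d,i}(X_{t_i})\big|^2\Big].$$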
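Both bounds have the familiar form "variance term proportional to (number of basis functions)/(number of samples) plus best approximation error". The following Monte Carlo sketch illustrates this structure in a deliberately simple setting that is not the scheme of this section: a fixed design on $[0,1]$, a piecewise constant basis with $K$ cells, bounded uniform noise, and the error measured in the empirical norm. The target function, all parameter values and all names are assumptions for illustration.

```python
import random


def cell(x, K):
    """Index of the cell of [0, 1) containing x when split into K equal cells."""
    return min(int(x * K), K - 1)


def piecewise_constant_fit(xs, ys, K):
    """Least-squares fit over indicators of K cells, i.e. cell-wise averages."""
    sums, counts = [0.0] * K, [0] * K
    for x, y in zip(xs, ys):
        sums[cell(x, K)] += y
        counts[cell(x, K)] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def target(x):
    """Hypothetical regression function."""
    return x * x


def empirical_sq_error(xs, coeffs, K):
    """Mean squared distance between the fit and the target on the design."""
    return sum((coeffs[cell(x, K)] - target(x)) ** 2 for x in xs) / len(xs)


rng = random.Random(42)
L, K, half_width, repeats = 400, 8, 0.5, 200
xs = [rng.random() for _ in range(L)]  # fixed design points

# Approximation term: best piecewise constant fit to the noise-free target.
best = piecewise_constant_fit(xs, [target(x) for x in xs], K)
approx = empirical_sq_error(xs, best, K)

# Variance term: sigma^2 * K / L, sigma^2 being the variance of the noise.
sigma_sq = half_width ** 2 / 3.0
variance = sigma_sq * K / L

total = 0.0
for _ in range(repeats):
    ys = [target(x) + rng.uniform(-half_width, half_width) for x in xs]
    total += empirical_sq_error(xs, piecewise_constant_fit(xs, ys, K), K)
mean_error = total / repeats
print(mean_error, approx + variance)
```

In this toy setting the averaged error is close to the sum of the two terms; in the lemma the constant $R^2(1+T)^2$ plays the role of the variance bound and $E[K^L_{0,i}]$ that of the number of basis functions.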


which is due to the measurability of the involved random variables and Jensen's inequality.

We now apply a conclusion of Theorem 11.1 in Györfi et al. [26] also used by Lemor [33], see Corollary A.4.2 in the appendix, and obtain

$T^{n,L}_{1,i}\le\sup$

dimensional subspace of $\mathcal P_{0,i}$. The term containing the variance can be estimated using the boundedness of $y^n_i$ as follows:

Analogously, we can derive the estimate for $T^{n,L}_{3,d,i}$:

$T^{n,L}_{3,d,i}=E$

Again, the conclusion of Theorem 11.1 in Györfi et al. [26] yields

$T^{n,L}_{3,d,i}\le\sup$

Here, the variance term can be further bounded by

$\operatorname{Var}$

The last remaining terms yield estimates which already appeared in the lemma above:

Lemma 3.3.8. For any $n\in\mathbb N$, $i=0,\dots,N-1$ and $d=1,\dots,D$ it holds

L = Id. Furthermore, we define

$V:=$

This yields the representation

$\tilde\alpha^{n,L}_{0,i}=$

independent of each other, it holds

$E^L$

Consequently, the off-diagonal entries of the matrix $(V-E^L[V])(V-E^L[V])^\top$ have conditional expectation (with respect to $\mathcal F^L$) equal to 0. This would be wrong if we had not used the ghost sample. For the entries on the main diagonal we obtain, because of the boundedness of $\varphi$ and $f$:

$E^L$

Gathering the results yields:

$T^{n,L}_{2,i}=E$

where the inequality is due to (3.36) and $\operatorname{tr}(A)$ denotes the trace of a matrix $A$. Analogously, we can proceed for $T^{n,L}_{4,d,i}$, where now $B^L_{d,i}=$

L = Id. Moreover, we define

$V:=$