
Here we iteratively applied Lemma 2.1.6 and the last estimate is due to Lemma 2.1.5.

The claim now follows by induction. For $n=1$ it is true by Lemma 2.1.5. Now, suppose it is valid for some $(n-1)\in\mathbb{N}$; then

The first term is finite by the induction hypothesis, the second one can be estimated with the above calculation. For the $Z$-part we can proceed analogously.

2.2 Least-squares Monte Carlo

To get a fully implementable algorithm we have to approximate the conditional expectations by some estimator. In this section we describe a simulation-based least-squares Monte Carlo estimator and prove its convergence. Recall that the least-squares method can be applied to estimate the conditional expectation of a square-integrable random variable, see e.g. Carrière [11] or Longstaff and Schwartz [34]. However, we cannot guarantee that the processes $(Y^n,Z^n)$ are square integrable in general under the measure $P$.

Therefore we cannot apply the least-squares approach directly to $(Y^n,Z^n)$, but work with $(\Psi^0 Y^n,\Psi^0 Z^n)$ instead.

As explained above, our remaining task is to estimate the conditional expectations $Y^n_{t_i} = E[\,\cdots\,|\,\mathcal{F}_{t_i}]$ and $Z^n_{d,t_i} = E[\,\cdots\,|\,\mathcal{F}_{t_i}]$, i.e. conditional expectations of square-integrable random variables of the form $\Psi^0_{t_i}V$.

Consequently, $E[\Psi^0_{t_i}V\,|\,\mathcal{F}_{t_i}]$ is the orthogonal projection of $\Psi^0_{t_i}V$ on the space $L^2(\mathcal{G}_i)$, where $\mathcal{G}_i$ denotes the $\sigma$-field generated by the random variables of the form $\Psi^0_{t_i}v(X_{t_i})$ for deterministic and measurable functions $v$.

We now replace this projection by a projection on a finite-dimensional subspace. To do so, we choose, for each time partition point, $D+1$ sets of basis functions

$\{p_{0,i,1}(\cdot),\ldots,p_{0,i,K_{0,i}}(\cdot)\}$ for the estimation of $Y^n_{t_i}$ and $\{p_{d,i,1}(\cdot),\ldots,p_{d,i,K_{d,i}}(\cdot)\}$ for the estimation of $Z^n_{d,t_i}$.


We assume that
\[ \eta_{d,i,k} := \Psi^0_{t_i}\,p_{d,i,k}(X_{t_i}) \]
satisfy $E[|\eta_{d,i,k}|^2] < \infty$ for every $d=0,\ldots,D$, $i=0,\ldots,N-1$ and $k=1,\ldots,K_{d,i}$, and that the vectors $(\eta_{d,i,1},\ldots,\eta_{d,i,K_{d,i}})$ are linearly independent for every $d=0,\ldots,D$, $i=0,\ldots,N-1$. Now, we define $\Lambda_{d,i} = \operatorname{span}(\eta_{d,i,k})$ and denote by $P_{d,i}$ the orthogonal (in the $L^2$-sense) projection on $\Lambda_{d,i}$. As these spaces are finite-dimensional, there are coefficients $\alpha_{d,i,k}(V)$ such that
\[ P_{d,i}[\Psi^0_{t_i}V] = \sum_{k=1}^{K_{d,i}} \alpha_{d,i,k}(V)\,\Psi^0_{t_i}\,p_{d,i,k}(X_{t_i}). \tag{2.8} \]

The inner-product matrices associated to the chosen bases are
\[ B_{d,i} = \Big(E\big[\eta_{d,i,k}\,\eta_{d,i,l}\big]\Big)_{k,l=1,\ldots,K_{d,i}}. \tag{2.9} \]

Hence, we obtain as coefficients
\[ \alpha_{d,i}(V) = (B_{d,i})^{-1} E[\eta_{d,i}\,V], \tag{2.10} \]
where $\eta_{d,i} = (\eta_{d,i,1},\ldots,\eta_{d,i,K_{d,i}})^\top$ and $\alpha_{d,i}(V) = (\alpha_{d,i,1}(V),\ldots,\alpha_{d,i,K_{d,i}}(V))^\top$. Finally, the corresponding estimator for $E[V\,|\,\mathcal{F}_{t_i}] = E[V\,|\,X_{t_i}]$, given the basis $\{p_{d,i,1}(\cdot),\ldots,p_{d,i,K_{d,i}}(\cdot)\}$, is
\[ \sum_{k=1}^{K_{d,i}} \alpha_{d,i,k}(V)\,p_{d,i,k}(X_{t_i}). \]
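For concreteness, the following sketch (not from the thesis; the data, the basis choice and all names are illustrative assumptions) shows the structure behind (2.8)–(2.10) when the expectations are replaced by sample averages: the weighted basis $\eta_{d,i,k} = \Psi^0_{t_i}p_{d,i,k}(X_{t_i})$ enters the normal equations, while the conditional-expectation estimator is read off in the unweighted basis.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical samples standing in for one time point t_i:
# X ~ the state X_{t_i}, Psi ~ the weight Psi^0_{t_i},
# U ~ the square-integrable random variable to be projected (Psi^0_{t_i} V in the text).
L = 100_000
X = rng.normal(size=L)
Psi = np.exp(0.2 * rng.normal(size=L) - 0.02)
U = Psi * (X**2 + rng.normal(size=L))

# Illustrative polynomial basis p_k(x) = x^(k-1), k = 1, ..., K.
K = 4
p = np.vander(X, K, increasing=True)     # shape (L, K)
eta = Psi[:, None] * p                   # eta_k = Psi^0_{t_i} p_k(X_{t_i}) per sample

# Empirical analogues of (2.9) and of the normal equations behind (2.10).
B = eta.T @ eta / L                      # approximates E[eta_k eta_l]
rhs = eta.T @ U / L                      # approximates E[eta_k U]
alpha = np.linalg.solve(B, rhs)          # coefficients alpha_k

# Projection of U onto span(eta_k) and the induced conditional-expectation estimator.
proj_U = eta @ alpha                     # approximates P_{d,i}[U]
cond_exp_V = p @ alpha                   # sum_k alpha_k p_k(X_{t_i})
```

The same structure reappears below when the expectations in (2.9) and (2.10) are replaced by their simulation-based counterparts.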

Thanks to Theorem 2.1.1 and Corollary 2.1.7 we can apply this machinery for estimating $Y^n_{t_i}$ and $Z^n_{d,t_i}$. As estimators for these quantities we define
\begin{align*}
\widehat Y^n_{t_i} &= (\Psi^0_{t_i})^{-1} P_{0,i}\Big[\Psi^0_{t_N}\phi(X_{t_N}) + \sum_{j=i}^{N-1}\Psi^0_{t_j}\,f(t_j,S_{t_j},\widehat Y^{n-1}_{t_j},\widehat Z^{n-1}_{t_j})\,\Delta_j\Big] = \sum_{k=1}^{K_{0,i}}\alpha^n_{0,i,k}\,p_{0,i,k}(X_{t_i}),\\
\widehat Z^n_{d,t_i} &= (\Psi^0_{t_i})^{-1} P_{d,i}\Big[\frac{\Delta W^h_{d,i}}{\Delta_i}\Big(\Psi^0_{t_N}\phi(X_{t_N}) + \sum_{j=i+1}^{N-1}\Psi^0_{t_j}\,f(t_j,S_{t_j},\widehat Y^{n-1}_{t_j},\widehat Z^{n-1}_{t_j})\,\Delta_j\Big)\Big] = \sum_{k=1}^{K_{d,i}}\alpha^n_{d,i,k}\,p_{d,i,k}(X_{t_i}),
\end{align*}
where
\[ \alpha^n_{0,i} = (B_{0,i})^{-1} E\Big[\eta_{0,i}\Big(\Psi^0_{t_N}\phi(X_{t_N}) + \sum_{j=i}^{N-1}\Psi^0_{t_j}\,f(t_j,S_{t_j},\widehat Y^{n-1}_{t_j},\widehat Z^{n-1}_{t_j})\,\Delta_j\Big)\Big], \tag{2.11} \]
and for $d=1,\ldots,D$
\[ \alpha^n_{d,i} = (B_{d,i})^{-1} E\Big[\eta_{d,i}\,\frac{\Delta W^h_{d,i}}{\Delta_i}\Big(\Psi^0_{t_N}\phi(X_{t_N}) + \sum_{j=i+1}^{N-1}\Psi^0_{t_j}\,f(t_j,S_{t_j},\widehat Y^{n-1}_{t_j},\widehat Z^{n-1}_{t_j})\,\Delta_j\Big)\Big], \tag{2.12} \]
initialized at $(\widehat Y^0,\widehat Z^0) = 0$.
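The weights (2.11)–(2.12) differ only in the lower summation index and in the additional factor $\Delta W^h_{d,i}/\Delta_i$; both regress the same weighted target, namely the terminal value $\Psi^0_{t_N}\phi(X_{t_N})$ plus the accumulated weighted driver. A minimal sketch of assembling these per-path targets is given below; the array layout and the helper name are assumptions for illustration only.

```python
import numpy as np

def picard_targets(Psi, phi_T, f_vals, dt, i):
    """Per-path regression targets entering (2.11)/(2.12) at time index i.

    Psi    : array (L, N+1), weights Psi^0_{t_j} along L simulated paths
    phi_T  : array (L,), terminal values phi(X_{t_N})
    f_vals : array (L, N), driver values f(t_j, S_{t_j}, Yhat^{n-1}_{t_j}, Zhat^{n-1}_{t_j})
    dt     : array (N,), time increments Delta_j
    i      : time index

    Returns the Y-target (sum from j = i) and the Z-target (sum from j = i+1);
    the latter is still to be multiplied by Delta W^h_{d,i} / Delta_i before regressing.
    """
    weighted_terminal = Psi[:, -1] * phi_T
    weighted_driver = Psi[:, :-1] * f_vals * dt       # Psi^0_{t_j} f(...) Delta_j
    y_target = weighted_terminal + weighted_driver[:, i:].sum(axis=1)
    z_target = weighted_terminal + weighted_driver[:, i + 1:].sum(axis=1)
    return y_target, z_target
```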


Remark 2.2.1. Note that Assumption A 7 and Theorem 2.2.2 below guarantee that the weights in (2.11)–(2.12) are finite.

In the following, we analyze the error which results from the approximation of $(\Psi^0_{t_i}Y^n_{t_i},\Psi^0_{t_i}Z^n_{t_i})$ by $(\Psi^0_{t_i}\widehat Y^n_{t_i},\Psi^0_{t_i}\widehat Z^n_{t_i})$. Analogously to Bender and Denk [2] this will be done in terms of the projection errors
\[ |\Psi^0_{t_i}Y^n_{t_i} - P_{0,i}(\Psi^0_{t_i}Y^n_{t_i})| \quad\text{and}\quad |\Psi^0_{t_i}Z^n_{d,t_i} - P_{d,i}(\Psi^0_{t_i}Z^n_{d,t_i})|. \]
We extend their Theorem 11 (which corresponds to the case $h=0$), reflecting the advantage of the Picard-type scheme: the error induced by the approximation of the conditional expectations neither explodes when the number of time steps tends to infinity nor blows up when the number of iterations grows. We simply obtain that the $L^2$-error is bounded by a constant times the worst $L^2$-projection error occurring during the iterations.

Theorem 2.2.2. There is a constant $C$ depending on the data and the bound of $h$ such that
\[ \max_{0\le i\le N} E\big[\,\cdots\,\big] \;\le\; C\,\cdots \]
for sufficiently small $|\pi|$.

Proof. Define $\cdots$. Due to the orthogonality of the projection we also have
\[ E\big[\,\cdots\,\big] \]
and the analogous equation holds for $Z^n_{d,t_i}$. Consequently, we get for any $i=0,\ldots,N$,
\[ \cdots \]
where we used in the last step that nontrivial orthogonal projections have norm 1. In the same way we get


Multiplying these inequalities with the weights $\lambda_i$, satisfying $\lambda_0 = 1$ and $\lambda_i = (1+\Gamma\Delta_{i-1})\lambda_{i-1}$ for some $\Gamma>0$ to be specified later, we obtain
\[ \max_{0\le i\le N}\lambda_i E\big[\,\cdots\,\big] \le \cdots \]
Putting estimates (2.13)–(2.14) together we obtain
\[ \max_{0\le i\le N}\lambda_i E\big[\,\cdots\,\big] \le \cdots \]
Iterating this inequality yields
\[ \max_{0\le i\le N} E\big[\,\cdots\,\big] \le \cdots \]
Hence, the claim finally follows if $|\pi|$ is small enough.

Remark 2.2.3. The proof of the above theorem only made use of the fact that nontrivial orthogonal projections have norm 1. Hence, it holds for orthogonal projections on any, possibly infinite-dimensional, subspaces $\Lambda_{d,i}\subset L^2(\mathcal{G}_i)$.


The final approximation step of our algorithm replaces the expectations in (2.9) and (2.10) by their simulation-based counterparts, i.e. we assume that we have a number of $L \ge \max_{d,i}\{K_{d,i}\}$ independent draws from $(\Delta W^h_i, X_{t_i}, \phi(X_{t_N}), \Psi^0_{t_i}, \eta_{d,i})$, which we denote by $({}_\lambda\Delta W^h_i, {}_\lambda X_{t_i}, \phi({}_\lambda X_{t_N}), {}_\lambda\Psi^0_{t_i}, {}_\lambda\eta_{d,i})$ for $\lambda=1,\ldots,L$.

The column vectors of these copies are denoted by $(\Delta\mathbf{W}^h_i, \mathbf{X}_{t_i}, \boldsymbol{\phi}(X_{t_N}), \boldsymbol{\Psi}^0_{t_i}, \boldsymbol{\eta}_{d,i})$, e.g. $\mathbf{X}_{t_i} = ({}_1X_{t_i},\ldots,{}_LX_{t_i})^\top$. Define

\[ A^L_{d,i} := \frac{1}{\sqrt L}\,\big({}_\lambda\eta_{d,i,k}\big)_{\lambda=1,\ldots,L,\;k=1,\ldots,K_{d,i}}, \qquad d=0,\ldots,D, \]
so that
\[ B^L_{d,i} := (A^L_{d,i})^\top A^L_{d,i} = \frac{1}{L}\Big(\sum_{\lambda=1}^L {}_\lambda\eta_{d,i,k}\,{}_\lambda\eta_{d,i,l}\Big)_{k,l=1,\ldots,K_{d,i}}, \qquad d=0,\ldots,D, \]
are the simulation-based analogues of the matrices $B_{d,i}$. Since the inverses of $B^L_{d,i}$ need not exist, we switch to the pseudo-inverses $(A^L_{d,i})^+$ in order to introduce, in a recursive manner, simulation-based analogues of (2.11)–(2.12) with the help of the least-squares method. In detail we define:

\begin{align*}
\alpha^{0,L}_{d,i} &= 0, \qquad d=0,\ldots,D,\\
{}_\lambda\widehat Y^{n-1}_{t_i} &= \sum_{k=1}^{K_{0,i}} ({}_\lambda\Psi^0_{t_i})^{-1}\,\alpha^{n-1,L}_{0,i,k}\,{}_\lambda\eta_{0,i,k},\\
{}_\lambda\widehat Z^{n-1}_{d,t_i} &= \sum_{k=1}^{K_{d,i}} ({}_\lambda\Psi^0_{t_i})^{-1}\,\alpha^{n-1,L}_{d,i,k}\,{}_\lambda\eta_{d,i,k},\\
\mathbf f(t_i) &= \big(f(t_i,{}_1S_{t_i},{}_1\widehat Y^{n-1}_{t_i},{}_1\widehat Z^{n-1}_{t_i}),\ldots,f(t_i,{}_LS_{t_i},{}_L\widehat Y^{n-1}_{t_i},{}_L\widehat Z^{n-1}_{t_i})\big)^\top,\\
\alpha^{n,L}_{0,i} &= \frac{1}{\sqrt L}\,(A^L_{0,i})^+\Big(\boldsymbol\Psi^0_{t_N}*\boldsymbol\phi(X_{t_N}) + \sum_{j=i}^{N-1}\boldsymbol\Psi^0_{t_j}*\mathbf f(t_j)\,\Delta_j\Big),\\
\alpha^{n,L}_{d,i} &= \frac{1}{\sqrt L}\,(A^L_{d,i})^+\Big(\frac{\Delta\mathbf W^h_{d,i}}{\Delta_i}*\Big(\boldsymbol\Psi^0_{t_N}*\boldsymbol\phi(X_{t_N}) + \sum_{j=i+1}^{N-1}\boldsymbol\Psi^0_{t_j}*\mathbf f(t_j)\,\Delta_j\Big)\Big), \qquad d=1,\ldots,D,
\end{align*}
where $*$ denotes the componentwise multiplication of two vectors and we used the abbreviation $\alpha^{n,L}_{d,i} = (\alpha^{n,L}_{d,i,1},\ldots,\alpha^{n,L}_{d,i,K_{d,i}})^\top$. This enables us to define simulation-based estimators by
\begin{align*}
\widehat Y^{n,L}_{t_i} &= \sum_{k=1}^{K_{0,i}} (\Psi^0_{t_i})^{-1}\alpha^{n,L}_{0,i,k}\,\eta_{0,i,k} = \sum_{k=1}^{K_{0,i}} \alpha^{n,L}_{0,i,k}\,p_{0,i,k}(X_{t_i}),\\
\widehat Z^{n,L}_{d,t_i} &= \sum_{k=1}^{K_{d,i}} (\Psi^0_{t_i})^{-1}\alpha^{n,L}_{d,i,k}\,\eta_{d,i,k} = \sum_{k=1}^{K_{d,i}} \alpha^{n,L}_{d,i,k}\,p_{d,i,k}(X_{t_i}), \qquad d=1,\ldots,D.
\end{align*}

Note that the thus constructed coefficients solve linear least-squares problems, e.g.
\[ \alpha^{n,L}_{0,i} = \operatorname*{arg\,inf}_{\alpha\in\mathbb{R}^{K_{0,i}}} \frac{1}{L}\sum_{\lambda=1}^L \Big|\alpha^\top{}_\lambda\eta_{0,i} - {}_\lambda\Psi^0_{t_N}\phi({}_\lambda X_{t_N}) - \sum_{j=i}^{N-1} {}_\lambda\Psi^0_{t_j}\,f(t_j,{}_\lambda S_{t_j},{}_\lambda\widehat Y^{n-1}_{t_j},{}_\lambda\widehat Z^{n-1}_{t_j})\,\Delta_j\Big|^2, \]
and similarly for $d=1,\ldots,D$.
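The equivalence between the pseudo-inverse formula for $\alpha^{n,L}_{0,i}$ and this least-squares characterization can be checked numerically. The following sketch uses purely synthetic data and is only meant to illustrate the identity, not the algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical simulated design: L paths, K basis functions at one time point.
L, K = 5000, 3
eta = rng.normal(size=(L, K))      # lambda-th row: draw of (eta_{0,i,1}, ..., eta_{0,i,K})
target = rng.normal(size=L)        # lambda-th draw of the weighted regression target

# Definition via the pseudo-inverse: alpha = (1/sqrt(L)) (A^L)^+ target with A^L = eta/sqrt(L).
A = eta / np.sqrt(L)
alpha_pinv = np.linalg.pinv(A) @ target / np.sqrt(L)

# Least-squares characterization: argmin over alpha of (1/L) sum_lambda |alpha^T eta - target|^2.
alpha_lstsq, *_ = np.linalg.lstsq(eta, target, rcond=None)

assert np.allclose(alpha_pinv, alpha_lstsq)
```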

Next, we derive almost sure convergence of the simulation-based estimators, starting with the coefficients.


Lemma 2.2.4. $(\alpha^{n,L}_{0,i},\ldots,\alpha^{n,L}_{D,i})$ converges $P$-almost surely to $(\alpha^n_{0,i},\ldots,\alpha^n_{D,i})$ as $L$ tends to infinity.

Proof. We proceed by induction on $n$. For $n=0$ the claim is true by definition, since $\alpha^{0,L}_{d,i,k} = \alpha^0_{d,i,k} = 0$, $d=0,\ldots,D$, $k=1,\ldots,K_{d,i}$.

Now, we suppose convergence is already proved for some $(n-1)\in\mathbb{N}_0$. We only show the convergence of $\alpha^{n,L}_{d,i,k}$ for some fixed $d=1,\ldots,D$; the arguments for $\alpha^{n,L}_{0,i,k}$ are basically the same.

By the law of large numbers we have

\[ \lim_{L\to\infty} B^L_{d,i} = B_{d,i}, \quad P\text{-a.s.} \tag{2.15} \]

As $B_{d,i}$ is invertible, the same is valid for $B^L_{d,i}$ provided $L$ is large enough. We assume $L$ to satisfy this condition in the following. In particular, $A^L_{d,i}$ then has full rank and its pseudo-inverse can be written as
\[ (A^L_{d,i})^+ = (B^L_{d,i})^{-1}(A^L_{d,i})^\top. \]
Hence, we obtain

\[ \alpha^{n,L}_{d,i} = (B^L_{d,i})^{-1}\Big(\frac{1}{L}\sum_{\lambda=1}^L {}_\lambda\eta_{d,i}\,\frac{{}_\lambda\Delta W^h_{d,i}}{\Delta_i}\Big({}_\lambda\Psi^0_{t_N}\phi({}_\lambda X_{t_N}) + \sum_{j=i+1}^{N-1}{}_\lambda\Psi^0_{t_j}\,f(t_j,{}_\lambda S_{t_j},{}_\lambda\widehat Y^{n-1}_{t_j},{}_\lambda\widehat Z^{n-1}_{t_j})\,\Delta_j\Big)\Big), \]
so that, due to (2.15), we only have to show for all $l=1,\ldots,K_{d,i}$
\[ \frac{1}{L}\sum_{\lambda=1}^L {}_\lambda\eta_{d,i,l}\,\frac{{}_\lambda\Delta W^h_{d,i}}{\Delta_i}\Big({}_\lambda\Psi^0_{t_N}\phi({}_\lambda X_{t_N}) + \sum_{j=i+1}^{N-1}{}_\lambda\Psi^0_{t_j}\,f(t_j,{}_\lambda S_{t_j},{}_\lambda\widehat Y^{n-1}_{t_j},{}_\lambda\widehat Z^{n-1}_{t_j})\,\Delta_j\Big) \]
\[ \longrightarrow\; E\Big[\eta_{d,i,l}\,\frac{\Delta W^h_{d,i}}{\Delta_i}\Big(\Psi^0_{t_N}\phi(X_{t_N}) + \sum_{j=i+1}^{N-1}\Psi^0_{t_j}\,f(t_j,S_{t_j},\widehat Y^{n-1}_{t_j},\widehat Z^{n-1}_{t_j})\,\Delta_j\Big)\Big] \quad P\text{-a.s.} \]

To do so, define
\[ {}_\lambda\widetilde Y^{n-1}_{t_i} = \sum_{k=1}^{K_{0,i}} ({}_\lambda\Psi^0_{t_i})^{-1}\alpha^{n-1}_{0,i,k}\,{}_\lambda\eta_{0,i,k}, \qquad {}_\lambda\widetilde Z^{n-1}_{d,t_i} = \sum_{k=1}^{K_{d,i}} ({}_\lambda\Psi^0_{t_i})^{-1}\alpha^{n-1}_{d,i,k}\,{}_\lambda\eta_{d,i,k}, \tag{2.16} \]
and ${}_\lambda\widetilde Z^{n-1}_{t_i} = ({}_\lambda\widetilde Z^{n-1}_{1,t_i},\ldots,{}_\lambda\widetilde Z^{n-1}_{D,t_i})^\top$. Note that

\[ {}_\lambda\eta_{d,i,l}\,\frac{{}_\lambda\Delta W^h_{d,i}}{\Delta_i}\Big({}_\lambda\Psi^0_{t_N}\phi({}_\lambda X_{t_N}) + \sum_{j=i+1}^{N-1}{}_\lambda\Psi^0_{t_j}\,f(t_j,{}_\lambda S_{t_j},{}_\lambda\widetilde Y^{n-1}_{t_j},{}_\lambda\widetilde Z^{n-1}_{t_j})\,\Delta_j\Big), \qquad \lambda=1,\ldots,L, \]
are independent and identically distributed with the same distribution as

\[ \eta_{d,i,l}\,\frac{\Delta W^h_{d,i}}{\Delta_i}\Big(\Psi^0_{t_N}\phi(X_{t_N}) + \sum_{j=i+1}^{N-1}\Psi^0_{t_j}\,f(t_j,S_{t_j},\widehat Y^{n-1}_{t_j},\widehat Z^{n-1}_{t_j})\,\Delta_j\Big). \]
Moreover, it holds that

\[ E\Big[\eta_{d,i,l}\,\frac{\Delta W^h_{d,i}}{\Delta_i}\Big(\Psi^0_{t_N}\phi(X_{t_N}) + \sum_{j=i+1}^{N-1}\Psi^0_{t_j}\,f(t_j,S_{t_j},\widehat Y^{n-1}_{t_j},\widehat Z^{n-1}_{t_j})\,\Delta_j\Big)\Big] < \infty, \]


where we used Hölder's inequality, the independence of $\Delta W^h_i$ and $X_{t_i}$, the Lipschitz continuity of $f$ and Assumption A 7. Therefore we can apply Kolmogorov's law of large numbers and deduce that



The first factor converges to zero due to the induction hypothesis, the second one to a finite number due to the law of large numbers. Combining (2.15)–(2.18) yields the claim.

Consequently, we obtain the $P$-a.s. convergence of the simulation-based estimators:

Theorem 2.2.5. $(\widehat Y^{n,L}_{t_i},\widehat Z^{n,L}_{t_i})$ converges $P$-almost surely to $(\widehat Y^n_{t_i},\widehat Z^n_{t_i})$ as $L$ tends to infinity.

Hence the claim follows in view of Lemma 2.2.4.

We now summarize the approximation of $(Y_0,Z_0)$ by the modified forward scheme with importance sampling in a least-squares Monte Carlo framework:

The final estimator for $(Y_0,Z_0)$ is $(\widehat Y^{n,L}_{t_0},\widehat Z^{n,L}_{t_0})$. Notice that at time $t_0=0$ the only choice for the projection space is the span of the constant functions, so that the estimator reduces to an average over the simulated paths. It is important to see that the averaging here is over dependent paths, because the weights in the definition of $({}_\lambda\widehat Y^{n-1}_{t_j},{}_\lambda\widehat Z^{n-1}_{t_j})$ depend on the whole collection of sample paths. In the very special case $f=0$, one averages, however, over independent paths and $\widehat Y^{n,L}_{t_0}$ reduces to the usual Monte Carlo estimator for the expectation of $\phi(X_T)$ with importance sampling given by $h$. In the context of option pricing (with $f=0$) in a complete market, $Z_0$ is (up to a linear transformation) the delta of the option. The estimator $\widehat Z^{n,L}_{d,t_0}$ then corresponds to the likelihood ratio delta with importance sampling. For more information on this classical situation we refer to Glasserman [20], Chapter 4.6 for importance sampling, and Chapter 7.3 for the likelihood ratio delta.
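To illustrate this special case $f=0$, here is a minimal importance-sampling sketch for $E[\phi(X_T)]$ in a toy Black–Scholes setting; the dynamics, the constant drift shift $h$ and all parameter values are assumptions chosen for illustration, with the Girsanov weight playing the role of $\Psi^0_{t_N}$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed model: X_T = x0 * exp((r - 0.5*sigma^2) T + sigma W_T), constant drift shift h.
x0, r, sigma, T, strike, h = 100.0, 0.02, 0.2, 1.0, 140.0, 1.5
L = 100_000

W_T_shifted = np.sqrt(T) * rng.normal(size=L) + h * T        # Brownian motion with drift h
X_T = x0 * np.exp((r - 0.5 * sigma**2) * T + sigma * W_T_shifted)
payoff = np.exp(-r * T) * np.maximum(X_T - strike, 0.0)      # out-of-the-money call

# Girsanov weight removing the drift h again (stand-in for Psi^0_{t_N}).
Psi = np.exp(-h * W_T_shifted + 0.5 * h**2 * T)

estimate = np.mean(Psi * payoff)                             # importance-sampled estimator
std_err = np.std(Psi * payoff, ddof=1) / np.sqrt(L)
```

Pushing the simulated paths towards the region where the payoff is nonzero and correcting with the weight is what reduces the variance of the resulting average.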

We now decompose the error into
\[ |Y_0-\widehat Y^{n,L}_{t_0}|^2 + |Z_0-\widehat Z^{n,L}_{t_0}|^2 \le C\Big( |Y_0-Y^n_{t_0}|^2 + |Z_0-Z^n_{t_0}|^2 \;+\; |Y^n_{t_0}-\widehat Y^n_{t_0}|^2 + |Z^n_{t_0}-\widehat Z^n_{t_0}|^2 \;+\; |\widehat Y^n_{t_0}-\widehat Y^{n,L}_{t_0}|^2 + |\widehat Z^n_{t_0}-\widehat Z^{n,L}_{t_0}|^2 \Big). \]


The first term captures the error due to the time discretization and the iteration. We know from the results in Section 3 that this error does not depend on the choice of $h$. In typical situations it is of order 1/2 in the mesh size of the time partition and converges exponentially in the number of iterations, see Corollary 2.1.2. Although the size of this first error term does not depend on $h$, we emphasize that $Y^n_{t_0}$ is the expectation of an expression whose variance changes with $h$. The second term contains the error stemming from the choice of the basis. Obviously, the weights (2.11)–(2.12) in the construction of $(\widehat Y^n_{t_0},\widehat Z^n_{t_0})$ depend on $h$. Hence, for the second error term, the choice of $h$ influences the error term itself and the variances in the computation of the weights. By Theorem 2.2.2 the second term converges to zero when the basis increases in such a way that the projection spaces $\Lambda_{d,i}$ exhaust the space $L^2(\mathcal{G}_i)$. Finally, the third term covers the simulation error. Thanks to Theorem 2.2.5 this error converges to zero almost surely as the number of paths tends to infinity.

To conclude, let us say a few words about what we do and do not do in this chapter: the objective of the importance sampling procedure introduced here is to reduce the third error term in the above decomposition by a judicious choice of $h$. However, nothing is said about how to choose it in practice. For this purpose it would be desirable to have a theoretical criterion. Judging from experience in the context of option pricing, one is rather pessimistic about the ability to establish a general rule covering every setting.

We do not go into further detail here and come back to this task for an important class of BSDEs in Chapter 4, where we also illustrate the success of the importance sampling with several numerical examples.

Chapter 3

$L^2$-convergence for the Picard-type estimator

The scope of this chapter is to establish an $L^2$-convergence theorem for a variant of the Picard-type algorithm of Bender and Denk [2]. Thereby, we have to circumvent the problem that arises out of taking means with respect to dependent random variables; see for this purpose (2.19) and (2.20) with ${}_\lambda\Psi^0_{t_j}\equiv 1$ for $j=0,\ldots,N$. Thus the variance of the estimator cannot be written as a sum of the variances of the individual random variables, and we cannot apply the usual methodology. Nonetheless, such a theorem is highly desirable, since the overall results of Bender and Denk [2] only yield convergence in probability of the final estimator towards the solution of the BSDE at time zero, whereas in the first two approximation steps they can show convergence in the stronger $L^2$-sense.

With the help of concepts from nonparametric statistics we can overcome these difficulties; however, we have to deal with several technical details and a demanding notation. First, we have to make sure that certain processes are bounded.

3.1 Bounded processes

In addition to Assumptions (A 1)–(A 5) we now impose for the whole chapter:

A 8. The functions $f$ and $\phi$ are bounded by some constant $R>0$.

At first sight this seems rather restrictive; however, we can regard this as a first approximation of the original equations in the following sense: defining

\[
f_R(t,x,y,z) := \begin{cases} R, & \text{if } f(t,x,y,z) > R,\\ f(t,x,y,z), & \text{if } -R \le f(t,x,y,z) \le R,\\ -R, & \text{if } f(t,x,y,z) < -R, \end{cases}
\qquad
\phi_R(x) := \begin{cases} R, & \text{if } \phi(x) > R,\\ \phi(x), & \text{if } -R \le \phi(x) \le R,\\ -R, & \text{if } \phi(x) < -R, \end{cases}
\]

we obtain Lipschitz continuous functions bounded by $R$. Thus assuming A 8 is equivalent to considering
\begin{align*}
dS_t &= b(t,S_t)\,dt + \sigma(t,S_t)\,dW_t, & S_0 &= s_0,\\
dY^R_t &= -f_R(t,S_t,Y^R_t,Z^R_t)\,dt + Z^R_t\,dW_t, & Y^R_T &= \phi_R(X_T),
\end{align*}



where we simply truncate the functions $f$ and $\phi$ in absolute value at some large $R$. Theorem 3.3 of Yong and Zhou [44] then implies the unique solvability of this modified equation, and furthermore we can obtain an estimate for the difference between the solution of the original equation and that of the modified one. To avoid an even more complex notation we only mention this point of view and simply accept the additional assumption.
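In code, the truncation defining $f_R$ and $\phi_R$ is just a componentwise clip at $\pm R$; a minimal sketch with hypothetical values:

```python
import numpy as np

R = 10.0  # hypothetical bound from Assumption A 8

def truncate_at_R(values, R):
    """Truncate f or phi in absolute value at R, as in the definition of f_R and phi_R."""
    return np.clip(values, -R, R)

# Example: truncating driver values computed along simulated paths.
f_values = np.array([-25.0, -3.2, 0.0, 7.5, 42.0])
f_R_values = truncate_at_R(f_values, R)   # -> [-10. , -3.2, 0. , 7.5, 10. ]
```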

In the sequel we consider a variant of the Picard-type discretization of BSDEs introduced by Bender and Denk [2].

For a fixed partition, their approximation scheme for the backward part is given for $i=0,\ldots,N$ by
\begin{align*}
Y^n_{t_i} &= E\Big[\phi(X_{t_N}) + \sum_{j=i}^{N-1} f(t_j,S_{t_j},Y^{n-1}_{t_j},Z^{n-1}_{t_j})\,\Delta_j \,\Big|\, \mathcal{F}_{t_i}\Big],\\
Z^n_{d,t_i} &= E\Big[\frac{\Delta W_{d,i}}{\Delta_i}\Big(\phi(X_{t_N}) + \sum_{j=i+1}^{N-1} f(t_j,S_{t_j},Y^{n-1}_{t_j},Z^{n-1}_{t_j})\,\Delta_j\Big) \,\Big|\, \mathcal{F}_{t_i}\Big],
\end{align*}
initialized at $(Y^0,Z^0) = (0,0)$.

In order to obtain bounded discrete processes, we truncate the Brownian increments in the above iteration scheme and end up with the following approximation for the solution of the backward equation, suppressing the dependence on the partition $\pi$:

\begin{align*}
Y^{n,R}_{t_i} &= E\Big[\phi(X_{t_N}) + \sum_{j=i}^{N-1} f(t_j,S_{t_j},Y^{n-1,R}_{t_j},Z^{n-1,R}_{t_j})\,\Delta_j \,\Big|\, \mathcal{F}_{t_i}\Big],\\
Z^{n,R}_{d,t_i} &= E\Big[\frac{[\Delta W_{d,i}]^{w_i}}{\Delta_i}\Big(\phi(X_{t_N}) + \sum_{j=i+1}^{N-1} f(t_j,S_{t_j},Y^{n-1,R}_{t_j},Z^{n-1,R}_{t_j})\,\Delta_j\Big) \,\Big|\, \mathcal{F}_{t_i}\Big],
\end{align*}

which we also initialize with $(Y^{0,R},Z^{0,R}) = (0,0)$ and extend constantly between two points of the time grid.

Here, the mapping $[\cdot]^{w_i}$ truncates at $R'\sqrt{\Delta_i}$ for some $R'>0$, meaning that for $x\in\mathbb{R}$ we have
\[ [x]^{w_i} = \big(-R'\sqrt{\Delta_i} \vee x\big) \wedge R'\sqrt{\Delta_i}. \]
Moreover, for $x\in\mathbb{R}^D$ we also write $[x]^{w_i} = ([x_1]^{w_i},\ldots,[x_D]^{w_i})^\top$.
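A minimal sketch of the truncation $[\cdot]^{w_i}$ applied to simulated Brownian increments; the function name and array shapes are assumptions, and the min/max mirrors the definition above.

```python
import numpy as np

def truncate_increment(dW, dt, R_prime):
    """Componentwise truncation [x]^{w_i} of Brownian increments at R' * sqrt(Delta_i).

    dW      : array of shape (L, D), simulated increments Delta W_i for L paths
    dt      : scalar Delta_i
    R_prime : truncation constant R' > 0
    """
    level = R_prime * np.sqrt(dt)
    return np.minimum(np.maximum(dW, -level), level)   # (-level v x) ^ level, componentwise

# Example: truncating D = 2 dimensional increments on a grid with Delta_i = 0.01.
rng = np.random.default_rng(3)
dW = np.sqrt(0.01) * rng.normal(size=(1000, 2))
dW_trunc = truncate_increment(dW, dt=0.01, R_prime=2.0)
```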

For a fixed partition, we can now show the boundedness of the discretized solution of the BSDE:

Lemma 3.1.1. There is a constant $C_y$ depending only on the bound of $f$ and $\phi$, and a constant $C^\pi_z$ depending additionally on the partition, such that for $i=0,\ldots,N$ and any $n\in\mathbb{N}_0$
\[ |Y^{n,R}_{t_i}| \le C_y \quad\text{and}\quad |Z^{n,R}_{d,t_i}| \le C^\pi_z. \]

Proof. Jensen’s inequality and the boundedness ofφand f imply for anyi=0, . . . ,Nand anyn∈N

|Ytn,Ri | =

¯¯

¯¯E

·

φ(XtN) +

N−1

j=i

f(tj,Stj,Ytn−1,Rj ,Zn−1,Rtj )∆j

¯¯

¯¯Fti

¸¯¯

¯¯

E

·¯¯¯φ(XtN)

¯¯

¯+

N−1

j=i

¯¯

¯f(tj,Stj,Ytn−1,Rj ,Zn−1,Rtj )

¯¯

¯∆j

¯¯

¯¯Fti

¸

E£

R+RT|Fti

¤=R(1+T) =:Cy.


We now turn to the aim of this section. We show that the truncation of the Brownian increments causes an error which converges to zero fast as $R'$ tends to infinity. We separate the proof of this property into different steps to improve clarity:

Proposition 3.1.2. For any $n\in\mathbb{N}$, $\Gamma>0$ and $\lambda_i$, $i=0,\ldots,N-1$, with $\lambda_0=1$ and $\lambda_{i+1} = (1+\Gamma\Delta_i)\lambda_i$ for $i=0,\ldots,N-2$, it holds
\[ \cdots \]

Proof. This can be shown exactly as in Bender and Denk [2], Lemma 7, Step 2.

The second result is devoted to the $Z$-part of the solution and also uses ideas of Gobet et al. [22]:

Proposition 3.1.3. For any $n\in\mathbb{N}$, $\gamma,\Gamma>0$ and $\lambda_i$, $i=0,\ldots,N-1$, with $\lambda_0=1$ and $\lambda_{i+1} = (1+\Gamma\Delta_i)\lambda_i$ for $i=0,\ldots,N-2$, it holds
\[ \cdots \]

Taking squares and expectations and multiplying with $\Delta_i$ yields, together with the inequalities of Young,


We plug in the definitions of the Picard-type scheme and of its modified approximation scheme and introduce the following abbreviation

Multiplying both sides of the inequality with $\lambda_i$ and summing up from $0$ to $N-1$, we obtain via Young's inequality and the Lipschitz property of $f$, for any $\gamma>0$:


Together with Proposition 3.1.2 this yields for the $D$-dimensional process

We are now able to estimate the error in the $n$-th Picard iteration against the error in the first iteration and the error resulting from the truncation of the Brownian increments:

Corollary 3.1.4. For $n\in\mathbb{N}$, $\gamma = 8DK^2(T+1)$, $\Gamma = 4K^2(T+1)(2D\gamma T+1)$ it holds


Proof. Adding the terms in the preceding propositions yields for any $n\in\mathbb{N}$
\[ \max_{0\le i\le N-1}\lambda_i E\big[\,\cdots\,\big] \le \cdots \]
Choosing $\Gamma$ and $\gamma$ as indicated above, we obtain the claim via an iterated application of the above estimate.

It remains to give an upper bound for the error in the first Picard iteration, which we do next:

Lemma 3.1.5. It holds
\[ \max_{0\le i\le N-1}\lambda_i E\big[\,\cdots\,\big] \le \cdots \]

Together with the inequality of Proposition 3.1.3 (see (3.1)–(3.3)) we obtain the following estimate for the $Z$-part of the solution:

To complete the preparations we still need the following lemma, specifying an upper bound for the error resulting from the truncation of the Brownian increments:

Lemma 3.1.6. There is a constant $C$ such that for any $n\in\mathbb{N}$

Proof. For any $n\in\mathbb{N}$ and a generic constant $C$ we can calculate