
A Picard-type Iteration for Backward Stochastic Differential Equations:

Convergence and Importance Sampling

Dissertation zur Erlangung des akademischen Grades Doktor der Naturwissenschaften

am Fachbereich Mathematik und Statistik der Universität Konstanz

vorgelegt von Thilo Moseler

Tag der mündlichen Prüfung: 10.06.2010

Referenten: Prof. Dr. Robert Denk (Universität Konstanz), Prof. Dr. Christian Bender (Universität des Saarlandes, Saarbrücken)

Konstanzer Online-Publikations-System (KOPS) URN: http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-121057

URL: http://kops.ub.uni-konstanz.de/volltexte/2010/12105/


Acknowledgements

I would like to express my gratitude to several people who helped me during the work on this thesis.

First of all, I would like to thank my supervisor Professor Dr. Robert Denk, who gave me the opportunity to study a challenging topic and always supported me in every conceivable way, including many mathematical suggestions.

In the same breath, I want to mention Professor Dr. Christian Bender to whom I am greatly indebted for his continuous support and great hospitality. I enjoyed four very intensive stays in Braunschweig and Saarbrücken, where I had many fruitful discussions with him and his whole research group. Moreover, large parts of this thesis developed during these days.

Moreover, I am very grateful to my fellow students and friends Mario Kaip, Michael Pokojovy and Olaf Weinmann and the other members of the PDE research group. I appreciated the pleasant atmosphere and the permanent cooperativeness. In particular, I would like to highlight their great willingness to discuss any problem despite the fact that our research topics were miles away from each other.

Financial support by the DFG (German Research Foundation) via the research unit 518 ‘Price, Liquidity and Credit Risks: Measurement and Distribution’ is gratefully acknowledged.

Last but not least, I would like to thank my parents Christa and Walter Moseler, my sister Anke, and all my friends for their patience and their constant encouragement during my whole studies.

Konstanz, April 2010 Thilo Moseler



Introduction

The objects of investigation in this thesis are backward stochastic differential equations, BSDEs for short.

More precisely, we aim at numerically solving decoupled forward-backward stochastic differential equations (FBSDEs) driven by a Brownian motion W, of the form

\[
dS_t = b(t, S_t)\, dt + \sigma(t, S_t)\, dW_t, \qquad S_0 = s_0,
\]
\[
dY_t = -f(t, S_t, Y_t, Z_t)\, dt + Z_t\, dW_t, \qquad Y_T = \Phi(S).
\]

The origin of such stochastic equations with terminal condition can be found in Bismut [7] in the early 1970s, where optimal control problems are considered. However, it took until 1990 for Pardoux and Peng [41] to publish their result on existence and uniqueness for a broad class of, in general nonlinear, BSDEs.

Afterwards, a widespread development of the theory of such equations started, driven mainly by the numerous applications in mathematical finance; see the books of El Karoui [17], Ma and Yong [37], and Yong and Zhou [44], as well as the survey article of El Karoui, Peng and Quenez [18].

At first, the numerics of BSDEs could not keep up with the speed of the development of the theory and accelerated only in recent years. The starting point for numerical schemes for FBSDEs was the theoretical Four Step Scheme of Ma, Protter and Yong [35], from which Douglas, Ma and Protter [16] developed an algorithm in 1996 approximating the solution of a parabolic partial differential equation related to the BSDE.

A totally different approach was followed later on by Bally [1] and Chevance [12]. They tried to solve the equation directly with the help of stochastic techniques, using random time partitions under strong regularity assumptions. Unfortunately, their algorithms are hardly implementable. In 2002, Ma et al. [36] suggested a similar approach, in which the Brownian motion in the equation is replaced by a binary random walk.

The trigger for the research on the numerics of FBSDEs was the work of Zhang [45, 46], which established new results about the regularity of the second part Z of the solution without involving the derivatives of the coefficient functions of the BSDE. This allowed for a convergence proof under rather weak assumptions with a deterministic time partition.

In recent years several algorithms were introduced based on these tools. They can be distinguished into categories along different criteria. The first characteristic is the time direction. The algorithms of Bouchard and Touzi [9], Gobet et al. [21, 22]¹ and Zhang [45, 46] work backward in time and are based on a procedure analogous to the Euler-Maruyama scheme for forward SDEs. We therefore talk of Euler-type schemes, whose characteristic is a nesting of conditional expectations backward in time.

Only last year, Zhao, Wang and Peng [47] proposed a θ-scheme for BSDEs, which transfers the ideas of the θ-scheme for forward SDEs and improves the error estimates for Z under rather restrictive assumptions.

In contrast to these algorithms, the Picard-type schemes of Bender and Denk [2] and Labart [31], Chapter III, do not reverse time and thereby avoid nested conditional expectations. However, they have to put up with nestings of Picard iterations, as used in the existence proof of Pardoux and Peng [41].

A second categorization can be made via the type of estimator used to approximate conditional expectations. While Zhao et al. [47] employ a Gauss-Hermite quadrature rule, most fully implementable algorithms apply different Monte Carlo techniques. These in turn are based either on Malliavin calculus, as in the scheme of Bouchard and Touzi [9], on nonparametric regression, as in that of Labart [31], or, most popular, on least-squares Monte Carlo; see Bender and Denk [2], Bender and Zhang [5] and Gobet et al. [21, 22].

¹For [22] we should rather write Lemor et al., since this is the original author order. However, to keep things simpler and more standardized, we use the other notation; we do not want to downweight the contribution of J. Lemor in this way.

The outcomes of these Monte Carlo algorithms, once implemented, are discrete time stochastic processes or, considered at a specific point of the time grid, random variables. Hence, starting the algorithm with different seeds for the simulations, we end up with different realizations of these random variables.

If one focuses on the applications of FBSDEs in mathematical finance, and even more specifically on option pricing, a particularly high empirical variance of the estimators arises especially for out-of-the-money options, or more generally for options containing some rare event feature. In this kind of application, S typically represents the price processes of some underlyings, Y is the price process of the option, Φ is the payoff function, and Z is, in simple cases, a linear transformation of the hedging portfolio.

From the point of view of a practitioner, who is interested in the initial option price Y_0, this variability is clearly annoying, and one wants to reduce it. Using such a Monte Carlo algorithm, one faces an estimator of the form

\[
\widehat{Y}_0 = \frac{1}{L} \sum_{\lambda=1}^{L} \theta_\lambda,
\]

where, to keep things simple, we assume for the moment that θ_λ, λ = 1, ..., L, are independent and identically distributed random variables. Hence,

\[
\mathrm{Var}[\widehat{Y}_0] = \frac{\mathrm{Var}[\theta_1]}{L},
\]

so that one possibility to obtain estimators with lower variance is to increase the number of simulations L. However, this also increases computation time and is therefore not attractive in practice. Instead, a reduction of the numerator also leads to a more stable estimator and is in many cases less costly than a higher number of simulations.

Now, this second possibility is the basic idea of variance reduction methods, which were already applied in special cases in the numerics of BSDEs; see Bender and Denk [2] and Labart [31]. Both schemes use a so-called control variate method to stabilize the estimators.
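The 1/L decay of the estimator variance can be checked empirically. In the following numpy sketch (purely illustrative, not from the thesis), the θ_λ are replaced by standard normal stand-ins for the simulated payoffs:

```python
import numpy as np

def estimator_std(L, reps=2000, seed=0):
    """Empirical standard deviation of the Monte Carlo mean
    (1/L) * sum(theta_1, ..., theta_L) over `reps` independent repetitions,
    with i.i.d. standard normal theta (stand-ins for simulated payoffs)."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((reps, L))
    return theta.mean(axis=1).std()
```

Since Var[θ_1] = 1 here, the empirical standard deviation is close to 1/sqrt(L): quadrupling L only halves it, which is why reducing Var[θ_1] itself is usually the cheaper route.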

The technique we want to apply to BSDEs is the importance sampling approach, which originates from the classical linear option pricing problem. In that context it turns out to be highly efficient for some path-dependent options; see e.g. Glasserman [20]. One popular way to calculate option prices is to simulate paths of the underlying and then average the corresponding discounted payoffs under an equivalent martingale measure Q, i.e. one tries to approximate

\[
E_Q\left[ \Phi(S) B_T^{-1} \right],
\]

where E_Q denotes the expectation under the measure Q and B_t is the price of the risk-free asset. However, if the option under investigation involves a rare event feature, one often ends up with only a few non-zero payoffs, and the Monte Carlo estimator suffers from high empirical variance. The basic idea of importance sampling is to drive more simulated paths of the underlyings into 'interesting' or 'important' regions, e.g. into the money. In doing so, the number of zero payoffs is reduced, and we therefore obtain a more stable estimator.

Mathematically speaking, this drift change is nothing but a change of measure. Hence, adjusting the payoff Φ(S^h), where S^h denotes the price process of the underlying under a new measure Q^h, and the numéraire B^h_t under Q^h by multiplying their product with the stochastic exponential Ψ = dQ^h/dQ yields a Monte Carlo estimator for the initial option price based on a random variable with the same mean:

\[
E_{Q^h}\left[ \Phi(S^h)(B_T^h)^{-1} \right] = E_Q\left[ \Psi\, \Phi(S^h)(B_T^h)^{-1} \right].
\]


At the same time we hope that its variance

\[
\mathrm{Var}_{Q^h}\left[ \Phi(S^h)(B_T^h)^{-1} \right]
= E_Q\left[ \Psi \left( \Phi(S^h)(B_T^h)^{-1} \right)^2 \right]
- E_Q\left[ \Psi\, \Phi(S^h)(B_T^h)^{-1} \right]^2
\]

is smaller than the variance of its original counterpart

\[
\mathrm{Var}_{Q}\left[ \Phi(S)B_T^{-1} \right]
= E_Q\left[ \left( \Phi(S)B_T^{-1} \right)^2 \right]
- E_Q\left[ \Phi(S)B_T^{-1} \right]^2.
\]

This is the delicate feature of this variance reduction technique: choosing the wrong measure, one can be faced with variance blow-ups; therefore the selection of a different drift has to be made very carefully.

The vast existing literature in the context of option pricing in different models reflects the complexity of importance sampling and can be categorized as follows. On the one hand, the optimal selection of a new measure is examined in continuous time; see e.g. the articles of Newton [39], Milstein and Schoenmakers [38] or Guasoni and Robertson [25], who try to find general rules for optimality. On the other hand, authors develop specific methods for special settings in discrete time; see e.g. the articles of Boyle et al. [10], Glasserman et al. [19] or Ökten et al. [40].

The aim of this thesis is twofold. First, we want to introduce importance sampling for BSDEs. This is done in the context of the forward-in-time scheme of Bender and Denk [2]. However, we think that our technique is not limited to this special algorithm and, in principle, can be used in any least-squares Monte Carlo approach for BSDEs. The second concern is to establish an L²-convergence theorem for the original Picard-type algorithm in order to complete the publication of Bender and Denk [2].

The organization of this thesis is as follows: In Chapter 1 we start with the framework, assumptions and definitions which hold throughout this publication. Furthermore, we briefly review the results of Bender and Denk [2], which are generalized later on, and comment on the rather extensive notation used in the sequel.

The second chapter introduces importance sampling to BSDEs and is in large parts already published in Bender and Moseler [4]. More precisely, by a change of measure we parameterize the forward scheme of Bender and Denk [2] to obtain a family of time discretizations for the initial value (Y_0, Z_0) of the solution of the BSDE. That is, for some fixed process h with suitable properties and a time grid π : 0 = t_0 < ... < t_N = T we define discrete time stochastic processes (S^{h,π}, Ψ^{h,π,j}, Y^{h,n,π}, Z^{h,n,π}) by

\[
S^{h,\pi}_{t_{i+1}} = S^{h,\pi}_{t_i}
+ \Bigl( b(t_i, S^{h,\pi}_{t_i}) + \sigma(t_i, S^{h,\pi}_{t_i})\, h_{t_i} \Bigr)(t_{i+1}-t_i)
+ \sigma(t_i, S^{h,\pi}_{t_i})(W_{t_{i+1}}-W_{t_i}),
\]
\[
\Psi^{h,\pi,j}_{t_i} = \exp\Bigl\{ -\sum_{k=j}^{i-1} h_{t_k}^{\top}(W_{t_{k+1}}-W_{t_k})
- \frac{1}{2} \sum_{k=j}^{i-1} |h_{t_k}|^2 (t_{k+1}-t_k) \Bigr\},
\]

and recursively

\[
Y^{h,n,\pi}_{t_i} = E\Bigl[ \Psi^{h,\pi,i}_{t_N}\, \phi(X^{h,\pi}_{t_N})
+ \sum_{j=i}^{N-1} \Psi^{h,\pi,i}_{t_j}\, f\bigl(t_j, S^{h,\pi}_{t_j}, Y^{h,n-1,\pi}_{t_j}, Z^{h,n-1,\pi}_{t_j}\bigr)(t_{j+1}-t_j)
\Bigm| \mathcal{F}_{t_i} \Bigr],
\]
\[
Z^{h,n,\pi}_{t_i} = E\Bigl[ \Bigl( \frac{W_{t_{i+1}}-W_{t_i}}{t_{i+1}-t_i} + h_{t_i} \Bigr)
\Bigl( \Psi^{h,\pi,i}_{t_N}\, \phi(X^{h,\pi}_{t_N})
+ \sum_{j=i+1}^{N-1} \Psi^{h,\pi,i}_{t_j}\, f\bigl(t_j, S^{h,\pi}_{t_j}, Y^{h,n-1,\pi}_{t_j}, Z^{h,n-1,\pi}_{t_j}\bigr)(t_{j+1}-t_j) \Bigr)
\Bigm| \mathcal{F}_{t_i} \Bigr],
\]

starting with (Y^{h,0,π}, Z^{h,0,π}) = (0, 0), where Φ(S) = φ(X_T) for some Markov process (X_t, F_t) which is related to the forward diffusion and X^{h,π}_{t_N} is some approximation of X_T.
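The scheme above can be sketched in code for a strongly simplified special case: one-dimensional, driver f = f(t, s) independent of (Y, Z), so that a single Picard step suffices, a constant drift parameter h, and only Y_0, where the conditional expectation reduces to a plain mean. All names and coefficients below are illustrative, not from the thesis:

```python
import numpy as np

def weighted_forward_y0(b, sigma, f, phi, s0, h, T=1.0, N=50, L=100_000, seed=0):
    """Sketch of the drift-changed forward scheme for Y_0 in the special case
    where the driver f = f(t, s) does not depend on (Y, Z).  One-dimensional,
    equidistant grid; h is a constant drift parameter.  Returns the Monte
    Carlo estimate of Y_0 = E[Psi_T phi(S_T) + sum_j Psi_{t_j} f(t_j, S_{t_j}) dt]."""
    rng = np.random.default_rng(seed)
    dt = T / N
    s = np.full(L, float(s0))
    log_psi = np.zeros(L)           # running log of the weight Psi^{h,pi,0}
    acc = np.zeros(L)               # accumulated weighted driver terms
    for i in range(N):
        t = i * dt
        acc = acc + np.exp(log_psi) * f(t, s) * dt
        dw = rng.standard_normal(L) * np.sqrt(dt)
        # Euler step with the additional drift sigma * h ...
        s = s + (b(t, s) + sigma(t, s) * h) * dt + sigma(t, s) * dw
        # ... compensated by the discrete stochastic exponential
        log_psi = log_psi - h * dw - 0.5 * h**2 * dt
    return np.mean(np.exp(log_psi) * phi(s) + acc)
```

By construction the weights exactly undo the added drift increment by increment, so the estimate is unbiased for the h = 0 value of Y_0; only its variance depends on h.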


A simple but very elegant observation yields an immediate error estimate for this approximation in Corollary 2.1.2, p. 10. We further proceed in the discretization procedure by replacing conditional expectations by least-squares Monte Carlo estimators. In comparison to Bender and Denk [2] and Gobet et al. [22], we thereby face additional technical difficulties, since the time discrete approximation for (Y, Z) is not square-integrable under the original measure. This problem is overcome by exploiting the properties of the density process Ψ^{h,π,0} of the change of measure. In doing so, we can define an appropriate regression basis, and finally we can show convergence of the final estimator, which yields a fully implementable algorithm.

To be more specific, the just mentioned convergence proof is divided into two steps. The first step is devoted to giving, in Theorem 2.2.2, p. 16, an L²-estimate for the error which arises if one replaces conditional expectations by projections on finite-dimensional subspaces. The second stage, Theorem 2.2.5, p. 21, shows almost sure convergence of the final estimator towards this only theoretically feasible approximation under the physical measure P.

Hence, overall we (only) prove convergence in probability of our estimator towards the solution of the BSDE at time zero, though in two out of three steps we are able to derive L²-error estimates. The reason for this shortcoming lies in the fact that the final estimators are not independent, and we average over them to obtain estimators of the next Picard iteration level.

This disadvantage is overcome in Chapter 3 by means of nonparametric statistics. Here, we consider a variant of the Picard-type scheme of Bender and Denk [2] with slightly stronger assumptions. In the discrete time approximation we simply truncate the occurring Brownian increments and analyze this additional approximation error. It turns out to vanish rapidly as the truncation is relaxed more and more; see Theorem 3.1.7, p. 29. Furthermore, the reformulation of the scheme yields bounded approximations for (Y, Z), thereby opening the door to strong statistical tools.

They are used to estimate terms which occur when examining the average L²-error over a number L of Monte Carlo simulations, which in turn are used to obtain discrete versions of the conditional expectations.

The main tool for this purpose is the introduction of a so-called 'ghost sample', that is, a further set of only imaginary Monte Carlo simulations, independent in a suitable sense of the former, actually appearing one.

With the help of these additional random variables we can come back to an average over independent random variables and then apply Hoeffding's inequality for the mean of bounded, independent random variables; see Section 3.3.

Lengthy calculations finally lead to the main theorem (Theorem 3.4.1, p. 71), which gives an upper bound for the L²-error depending on the parameters at our disposal. That is, we obtain an estimate containing the number of time steps, the dimension of the basis spanning the subspace for the approximation of the conditional expectation, and the number of Monte Carlo simulations. We thus establish a rule for how to choose these parameters simultaneously such that convergence of our algorithm is assured at the same time.

Finally, we compare our result to that of Gobet et al. [22]. It turns out that in higher dimensional settings, where hypercubes are used as basis functions, both algorithms reveal the same efficiency. However, for Φ(S) = Φ(S_T) and S one-dimensional, the Euler-type algorithm is slightly more efficient.

Various numerical examples are studied in Chapter 4, where we focus on different aspects of variance reduction and the numerics of BSDEs in general. After outlining our implementation as pseudo MATLAB code, we first test some variance reduction methods stemming from option pricing, also in the context of nonlinear BSDEs.

A first step towards a more general approach for the selection of a new measure inducing variance reduction is made in a further section. We pick up an approach from econometrics and 'translate' it into the BSDE situation. Our main interest lies in the questions of how to choose the new measure and whether we obtain better results than by simply adopting variance reduction techniques from option pricing. Our results for this so-called 'Efficient importance sampling' (EIS) are slightly ambiguous. The technique turns out to be highly efficient for some examples; however, several theoretical and numerical problems remain, which limit the number of cases where this kind of selection approach can be successfully applied.

Finally, we have a look at a potential rival of least-squares Monte Carlo estimators. After a quick review of the theory, we try to use the simplest nonparametric estimator, the so-called Nadaraya-Watson estimator, for the approximation of conditional expectations and report on the numerical and theoretical problems.

The Appendix at the end of the thesis provides several frequently used inequalities and summarizes the tools and results from nonparametric statistics applied in the technical part of Chapter 3.


Contents

Acknowledgements i

Introduction iii

1 Preliminaries 1

1.1 The model and basic assumptions . . . 1

1.2 Notation . . . 3

1.2.1 Function spaces . . . 3

1.2.2 Approximation of stochastic processes . . . 3

2 Importance sampling 7

2.1 Modified forward scheme . . . 7

2.2 Least-squares Monte Carlo . . . 14

3 L²-convergence for the Picard-type estimator 23

3.1 Bounded processes . . . 23

3.2 Projection approach in the case of Markov processes . . . 29

3.3 Estimation of the occurring error terms and probabilities of exception sets . . . 45

3.4 Global estimates . . . 71

3.4.1 Main theorem . . . 71

3.4.2 Simultaneous choice of the parameters . . . 74

3.4.3 Lipschitz continuity of the functions y^n_i and √∆_i z^n_i . . . 75

3.4.4 Error and complexity of the Picard- and Euler-type algorithm . . . 80

3.4.5 Error bound for the error with respect to the distribution ofXti . . . 82

3.4.6 Outlook and perspectives . . . 84

4 Numerical experiments 85

4.1 Implementation of the Picard-algorithm with importance sampling . . . 85

4.2 Variance reduction methods from option pricing . . . 87

4.2.1 Asian call options . . . 87

4.2.2 Lookback options . . . 90

4.2.3 Digital options . . . 92

4.3 First try to a more general approach for effective importance sampling . . . 95

4.3.1 Idea and heuristics . . . 95

4.3.2 Asian call options . . . 98

4.3.3 Lookback options . . . 101

4.3.4 Superhedging . . . 105

4.3.5 Energy derivatives . . . 113

4.3.6 Summary . . . 114

4.4 Nonparametric methods . . . 115



A Appendix 125

A.1 Least-squares problem and singular value decomposition of a matrix . . . 125

A.2 Inequalities . . . 126

A.2.1 Young’s inequality . . . 126

A.2.2 Discrete Gronwall Lemma . . . 126

A.2.3 Hoeffding’s inequality . . . 126

A.3 Definitions and results from nonparametric regression . . . 127

A.3.1 Covering numbers . . . 127

A.3.2 Packing numbers . . . 128

A.3.3 Shatter coefficients and Vapnik-Chervonenkis dimension . . . 128

A.4 Rate of convergence for least-squares estimates . . . 129

Zusammenfassung auf Deutsch 131

Bibliography 137

Index 141


Chapter 1

Preliminaries

1.1 The model and basic assumptions

We investigate numerical solutions of the following decoupled forward-backward stochastic differential equation on a complete probability space (Ω, F, (F_t), P), where the filtration (F_t)_{t∈[0,T]} is the augmentation of the filtration generated by a D-dimensional Brownian motion W, and F = F_T:

\[
dS_t = b(t, S_t)\, dt + \sigma(t, S_t)\, dW_t, \qquad S_0 = s_0,
\]
\[
dY_t = -f(t, S_t, Y_t, Z_t)\, dt + Z_t\, dW_t, \qquad Y_T = \Phi(S).
\]

Here the coefficient functions b : [0,T] × R^M → R^M, σ : [0,T] × R^M → R^{M×D}, f : [0,T] × R^M × R × R^D → R are given. The terminal condition for the BSDE is defined via the functional Φ, which acts on the space of R^M-valued RCLL functions on [0,T] and is Lipschitz continuous in the sup-norm, i.e. there is a constant K such that for all RCLL functions x, x'

\[
|\Phi(x) - \Phi(x')| \le K \sup_{0 \le t \le T} |x(t) - x'(t)|
\]

is satisfied. Recall that a solution of the above equations is a triplet (S, Y, Z) of (F_t)-adapted, square-integrable stochastic processes. We require throughout this thesis the following assumptions, which in particular ensure the existence of a unique solution in the space M[0,T] defined in the next section:

A 1. There is a constant K such that for each (t,s), (t',s') ∈ [0,T] × R^M:

\[
|b(t,s) - b(t',s')| + |\sigma(t,s) - \sigma(t',s')| \le K\bigl( \sqrt{|t-t'|} + |s-s'| \bigr).
\]

A 2. For the same constant K and each (t,s,y,z), (t',s',y',z') ∈ [0,T] × R^M × R × R^D:

\[
|f(t,s,y,z) - f(t',s',y',z')| \le K\bigl( \sqrt{|t-t'|} + |s-s'| + |y-y'| + |z-z'| \bigr).
\]

|t−t0|+|s−s0|+|y−y0|+|z−z0|´ . A 3. There is an M0-dimensional Markov process(Xt,Ft)withStas its first M components such that

E h

sup

0≤t≤T

|Xt|2 i

<

andΦ(S) =φ(XT)for some Lipschitz continuous functionφwith Lipschitz constant K.

A 4. The above constant K satisfies

\[
\sup_{0 \le t \le T} |b(t,0)| + |\sigma(t,0)| + |f(t,0,0,0)| + |\phi(0)| \le K.
\]


For a given, fixed partition π : 0 = t_0 < ... < t_N = T with sup_i |t_{i+1} - t_i| =: |π| < 1, we define ∆_i = t_{i+1} - t_i, and the increments of the Brownian motion are denoted by ∆W_i = W_{t_{i+1}} - W_{t_i}. We add the following structural assumption concerning the time discretization S^π_{t_i} of the solution of the forward equation:

A 5. For every partition π there is a deterministic function u^π : π × R^{M'} × R^D → R^{M'} such that

\[
X^\pi_{t_i} = u^\pi(t_i, X^\pi_{t_{i-1}}, \Delta W_{i-1}), \quad i = 1, \ldots, N, \qquad X^\pi_{t_0} = X_0
\]

satisfies X^π_{m,t_i} = S^π_{m,t_i} for m ≤ M and E[|X^π_{t_N} - X_T|²] → 0 as |π| → 0.

Under Assumption A 5, (X^π_{t_i}, F_{t_i}) is a Markov process under P as well.

Given these assumptions Bender and Denk [2] introduced an approximation scheme, which we now briefly review since it is the starting point for our investigations.

The discretization for the forward equation is not considered further; we can simply apply the existing methods in the literature. However, given the results of Bender and Denk [2], it is by far enough to restrict ourselves to the simplest one, i.e. we can choose the Euler-Maruyama scheme, which reads for the first components of X^π_{t_i} as follows:

\[
S^\pi_{t_{i+1}} = S^\pi_{t_i} + b(t_i, S^\pi_{t_i})\Delta_i + \sigma(t_i, S^\pi_{t_i})\Delta W_i, \quad i = 0, \ldots, N-1, \qquad S^\pi_{t_0} = s_0.
\]
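For concreteness, the Euler-Maruyama step can be written as a minimal one-dimensional simulator (an illustrative sketch; the function name and test coefficients are not from the thesis):

```python
import numpy as np

def euler_maruyama(b, sigma, s0, T, N, L, seed=0):
    """Euler-Maruyama scheme on an equidistant grid:
    S_{t_{i+1}} = S_{t_i} + b(t_i, S_{t_i}) * dt + sigma(t_i, S_{t_i}) * dW_i.
    One-dimensional sketch; returns an (L, N+1) array of simulated paths."""
    rng = np.random.default_rng(seed)
    dt = T / N
    s = np.empty((L, N + 1))
    s[:, 0] = s0
    for i in range(N):
        dw = rng.standard_normal(L) * np.sqrt(dt)
        s[:, i + 1] = s[:, i] + b(i * dt, s[:, i]) * dt + sigma(i * dt, s[:, i]) * dw
    return s
```

For geometric Brownian motion, b(t,s) = μs and sigma(t,s) = σs, the sample mean of S_T is close to s_0 e^{μT}, which provides a quick consistency check of the implementation.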

The approximation scheme for the backward part is now defined recursively for i = 0, ..., N by

\[
Y^{n,\pi}_{t_i} = E\Bigl[ \phi(X^\pi_{t_N})
+ \sum_{j=i}^{N-1} f\bigl(t_j, S^\pi_{t_j}, Y^{n-1,\pi}_{t_j}, Z^{n-1,\pi}_{t_j}\bigr)\Delta_j
\Bigm| \mathcal{F}_{t_i} \Bigr], \tag{1.1}
\]
\[
Z^{n,\pi}_{d,t_i} = E\Bigl[ \frac{\Delta W_{d,i}}{\Delta_i} \Bigl( \phi(X^\pi_{t_N})
+ \sum_{j=i+1}^{N-1} f\bigl(t_j, S^\pi_{t_j}, Y^{n-1,\pi}_{t_j}, Z^{n-1,\pi}_{t_j}\bigr)\Delta_j \Bigr)
\Bigm| \mathcal{F}_{t_i} \Bigr], \quad d = 1, \ldots, D, \tag{1.2}
\]

initialized at (Y^{0,π}, Z^{0,π}) = (0, 0). We apply the convention ∆W_N := 0 and use constant extensions for the approximation, i.e. Y^{n,π}_t := Y^{n,π}_{t_i} and Z^{n,π}_t := Z^{n,π}_{t_i} for t ∈ [t_i, t_{i+1}[. We see that, given the solution of the (n-1)-th iteration, in principle we could calculate the solution at the next iteration level forward in time. For this reason we also talk of a forward scheme. Obviously, nestings of conditional expectations within one Picard iteration are avoided.

Theorem 2 of Bender and Denk [2] gives the convergence of the Picard-type discretization scheme:

Theorem 1.1.1. There is a constant C such that for any n ∈ N

\[
\sup_{0 \le t \le T} E\bigl[ |Y_t - Y^{n,\pi}_t|^2 \bigr]
+ E\Bigl[ \int_0^T |Z_s - Z^{n,\pi}_s|^2 \, ds \Bigr]
\le C\, E\bigl[ |X_T - X^\pi_{t_N}|^2 \bigr] + C|\pi| + C\Bigl( \frac{1}{2} + C|\pi| \Bigr)^n,
\]

where C = K²(T+1)\bigl(4DK²(T+1)DT+1\bigr).

For its proof, results of Bouchard and Touzi [9] and Zhang [46] are used; in particular, the convergence of Bender and Denk's scheme towards that of Bouchard and Touzi is needed. Hence, it is quite natural that, in comparison to these backward schemes and that of Gobet et al. [21, 22], the error estimate of Bender and Denk [2] contains an extra term due to the Picard iterations.

In a further approximation step, Bender and Denk [2] replace the conditional expectations in (1.1) and (1.2), which are actually conditional expectations with respect to the σ-algebra generated by X^π_{t_i}, by orthogonal projections P_{d,i}, d = 0, ..., D, onto D+1 subspaces of L²(σ(X^π_{t_i})) for any i = 0, ..., N, i.e. they define

\[
\widehat{Y}^{n,\pi}_{t_i} = P_{0,i}\Bigl[ \phi(X^\pi_{t_N})
+ \sum_{j=i}^{N-1} f\bigl(t_j, S^\pi_{t_j}, \widehat{Y}^{n-1,\pi}_{t_j}, \widehat{Z}^{n-1,\pi}_{t_j}\bigr)\Delta_j \Bigr],
\]
\[
\widehat{Z}^{n,\pi}_{d,t_i} = P_{d,i}\Bigl[ \frac{\Delta W_{d,i}}{\Delta_i} \Bigl( \phi(X^\pi_{t_N})
+ \sum_{j=i+1}^{N-1} f\bigl(t_j, S^\pi_{t_j}, \widehat{Y}^{n-1,\pi}_{t_j}, \widehat{Z}^{n-1,\pi}_{t_j}\bigr)\Delta_j \Bigr) \Bigr],
\quad d = 1, \ldots, D,
\]

initialized again at (\widehat{Y}^{0,π}, \widehat{Z}^{0,π}) = (0, 0).

At this stage the advantage of the forward approximation scheme reveals itself: Theorem 11 of Bender and Denk [2] specifies the moderate error occurring when approximating (Y^{n,π}_{t_i}, Z^{n,π}_{t_i}) by (Ŷ^{n,π}_{t_i}, Ẑ^{n,π}_{t_i}). In the forward scheme this error is bounded by a constant times the worst projection error occurring during the iterations. Consequently, it does not explode if the mesh grid size tends to zero, as is the case for the backward schemes. For more details, see the discussion in Bender and Denk [2], pp. 1802-1803.

In a final step, Bender and Denk [2] replace the theoretical projections P_{d,i} by simulation based least-squares estimators and show at last, in their Theorem 15, that this estimator converges P-almost surely to the approximation coming from the theoretical projection. Overall, they obtain convergence in probability of their final estimator towards the solution of the FBSDE.
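The projection step just described can be mimicked with a small regression. The following sketch (illustrative, not the thesis's implementation) projects simulated responses onto the span of user-supplied basis functions of the simulated state, which is the core of any least-squares Monte Carlo estimator of a conditional expectation:

```python
import numpy as np

def ls_conditional_expectation(x, response, basis):
    """Least-squares Monte Carlo estimate of E[response | X]:
    regress the simulated response on the basis functions evaluated at the
    simulated x (via numpy's lstsq) and return the fitted values, i.e. the
    estimated conditional expectation at the sample points."""
    A = np.column_stack([g(x) for g in basis])   # design matrix, one column per basis function
    coeff, *_ = np.linalg.lstsq(A, response, rcond=None)
    return A @ coeff
```

For instance, regressing response = X² + noise on the basis {1, x, x²} recovers the conditional expectation x ↦ x² up to Monte Carlo error; richer bases (e.g. indicator functions of hypercubes, as in Gobet et al. [22]) fit into the same template.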

1.2 Notation

1.2.1 Function spaces

As usual in the theory of BSDEs we deal with the following function spaces:

• L²(F) - the space of F-measurable random variables X such that E[|X|²] < ∞,

• L²_F(Ω, C([0,T]), R^d) - the space of (F_t)_{t≥0}-adapted, R^d-valued continuous processes X such that E[sup_{t∈[0,T]} |X_t|²] < ∞,

• L²(0,T; R^d) - the space of (F_t)_{t≥0}-adapted, R^d-valued processes X such that E[∫_0^T |X_t|² dt] < ∞, and

• M[0,T] := L²_F(Ω, C([0,T]), R^n) × L²(0,T; R^n), equipped with the norm

\[
\|(Y(\cdot), Z(\cdot))\|_{M[0,T]} := \left( E\left[ \sup_{t \in [0,T]} |Y_t|^2 \right] + E\left[ \int_0^T |Z_t|^2 \, dt \right] \right)^{1/2}.
\]

Pardoux and Peng [41] showed that in M[0,T] there is a unique solution to BSDEs satisfying the above assumptions.

1.2.2 Approximation of stochastic processes

In order to convey the intuition behind the notation for the different discretizations of the occurring stochastic processes, we introduce them in the following in full detail.

Time discretization

In any approximation scheme we will consider, the first stage of approximation is with respect to time. That is, we introduce a fixed partition π : 0 = t_0 < ... < t_N = T of the interval [0,T] and compute approximations of the solution of the FBSDE at the partition points t_i, i = 0, ..., N. For the solution of the backward part we furthermore use an iterative Picard-type approach and label these iterations with n ∈ N_0. Hence, writing (S^π_{t_i}, Y^{n,π}_{t_i}, Z^{n,π}_{t_i}) indicates the time discretized solution of the FBSDE at time t_i and iteration n, given the partition π. Proceeding this way, we have to introduce increments of the Brownian motion with respect to π, denoted by ∆W_i = W_{t_{i+1}} - W_{t_i}, i.e. we use forward increments.

Chapter 2 introduces a family of FBSDEs which is parameterized by a further stochastic process h, which is chosen once and then held fixed throughout the whole calculations. We thus write (S^{h,π}_{t_i}, Y^{h,n,π}_{t_i}, Z^{h,n,π}_{t_i}) for the time discretized solution of the modified FBSDE at time t_i and iteration n, given the partition π. The choice h ≡ 0 thereby corresponds to the original discretization of Bender and Denk [2]. As our parametrization represents a change of measure, we also have to consider Brownian increments under a further measure and denote them by ∆W^h_i = W^h_{t_{i+1}} - W^h_{t_i} to distinguish them from the former.

In order to ease notation at a later stage, we drop the superindices h and π for the time discrete solution of the FBSDE, i.e. instead of (S^{h,π}_{t_i}, Y^{h,n,π}_{t_i}, Z^{h,n,π}_{t_i}) we simply write (S_{t_i}, Y^n_{t_i}, Z^n_{t_i}). We can justify this imprecision not only by the fewer indices but also because, in the following steps, we do not change the partition and the process h but hold them fixed.

Another variant of the equation with h ≡ 0 is studied in Chapter 3. Here we focus on drivers f which are bounded by some constant R. As a consequence, we will show that, under mild modifications of the scheme of Bender and Denk, our time discrete approximations of the solution of the backward part are bounded. To remind the reader of this property we write (S_{t_i}, Y^{n,R}_{t_i}, Z^{n,R}_{t_i}), suppressing the dependency on the time partition π.

In any setting, there will be Borel functions such that the time discrete approximations of the solution of the backward SDE can be written as functions of a forward Markov process, which contains as first components the (discrete) forward diffusion, while the other components depend on the shape of the terminal condition. We denote this process by X_{t_i}. It turns out that these deterministic functions only depend on the partition point and the number of the Picard iteration, such that in Chapter 2 we will write Y^n_{t_i} = y^n_i(X_{t_i}), Z^n_{t_i} = z^n_i(X_{t_i}). In Chapter 3 we hereby ignore the influence of the bound R and also write Y^{n,R}_{t_i} = y^n_i(X_{t_i}), Z^{n,R}_{t_i} = z^n_i(X_{t_i}). We emphasize that these functions are not the same across chapters, but there is no danger of mixing them up, because within one chapter we only deal with one set of functions.

Projections on finite-dimensional spaces

In Chapter 2, conditional expectations are further replaced by orthogonal projections on finite-dimensional spaces. We indicate this step by a hat, i.e. we write (Ŷ^n_{t_i}, Ẑ^n_{t_i}) for the projection of the time discretized solution of the modified BSDE at time t_i and iteration n, given the partition π, onto a fixed finite-dimensional subspace.

Monte Carlo simulations

The final approximation step of our procedure in Chapter 2 replaces the orthogonal projections on finite-dimensional subspaces by an estimator coming from a simulation based least-squares approach. For this purpose we need L independent Monte Carlo simulations of the occurring forward processes. In full detail, in analogy to Bender and Denk [2], we have to simulate the Brownian increments and the forward Markov process; we denote them in Chapter 2 by ∆_λW^h_i and _λX_{t_i}, respectively, for λ = 1, ..., L and i = 0, ..., N. Thus it is natural to write (_λŶ^n_{t_i}, _λẐ^n_{t_i}) for the resulting estimators of the discretized solution of the backward equation.

Our approach in Chapter 3 passes directly from the time discretization to a simulation based least-squares procedure. For this purpose a whole set of further, only imaginary simulations is required: for each time point t_i in the partition we need extra simulations of the Brownian increments and of the forward Markov process, running until the end of the time horizon of the equation. These new processes are independent, conditionally on the information up to t_i, of the already existing discrete time processes, and at the same time are identically generated. To be able to distinguish these sets of processes, we mark the imaginary ones with bars, i.e. ∆_λW̄_j and _λX̄^i_{t_j} denote these processes at time t_j. The additional superindex i for the discrete Markov process indicates that the additional feature starts at time t_i. We will comment on these so-called 'ghost samples' later on in more detail.

Further notation

A lot of other notation is used in the sequel (see also the index at the end of the thesis); however, it is not helpful to introduce it all here. We will do so at the appropriate places and turn now to a variance reduced version of the algorithm of Bender and Denk [2].


Chapter 2

Importance sampling

The content of this chapter has already been published in Bender and Moseler [4]; we have only supplemented some comments and explanations to further clarify our procedure. The aim of this chapter is to introduce importance sampling for BSDEs. That is, we develop a variance reduction method for BSDEs via a change of measure, whose basic idea is borrowed from option pricing.

2.1 Modified forward scheme

We now explain the starting point for the algorithm developed later on. Consider the following family of decoupled FBSDEs, parameterized by some measurable, bounded and adapted process $h\colon [0,T] \longrightarrow \mathbb{R}^D$:

$$
\begin{aligned}
dS^h_t &= \bigl( b(t, S^h_t) + \sigma(t, S^h_t)\, h_t \bigr)\, dt + \sigma(t, S^h_t)\, dW_t, \\
dY^h_t &= \bigl( -f(t, S^h_t, Y^h_t, Z^h_t) + (Z^h_t)^\top h_t \bigr)\, dt + Z^h_t\, dW_t, \\
S^h_0 &= s_0, \qquad Y^h_T = \phi(X^h_T),
\end{aligned}
$$

where $\top$ denotes matrix transposition. We write $(S, Y, Z) := (S^0, Y^0, Z^0)$ for the solution of the original FBSDE with $h \equiv 0$.

The first observation is that the initial value of the backward part does not depend on $h$. In fact, defining a new measure $Q^h$ by $dQ^h = \Psi^h_T\, dP$, where

$$
\Psi^h_t = \exp\left\{ -\int_0^t h_u^\top\, dW_u - \frac{1}{2} \int_0^t |h_u|^2\, du \right\},
$$

we can apply the Girsanov theorem to deduce that the law of $(S^h, Y^h, Z^h)$ under $Q^h$ is the same as that of $(S, Y, Z)$ under $P$. In particular, the constants $(Y_0, Z_0)$ and $(Y^h_0, Z^h_0)$ coincide. Note, however, that the paths of the processes $(S^h, Y^h, Z^h)$ and $(S, Y, Z)$ differ at later time points. Nonetheless, in many applications, e.g. in option pricing problems, one is mainly interested in estimating $Y_0$. Having the different representations for $Y_0$ at hand, we aim at reducing the variance of Monte Carlo estimators for $Y_0$ by a judicious choice of $h$. This generalizes the importance sampling technique from the calculation of expectations to nonlinear BSDEs.
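For plain expectations, the variance-reduction mechanism behind this change of measure can be shown in a few lines. The following sketch assumes a constant control $h$ and a rare-event payoff; the specific payoff and the choice $h = K/T$ are illustrative only, not a recipe from the thesis. It uses the identity $E[\varphi(W_T)] = E[\Psi_T\, \varphi(W_T + hT)]$ with $\Psi_T = \exp\{-h W_T - \tfrac{1}{2} h^2 T\}$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Crude vs. importance-sampled Monte Carlo for P(W_T > K), a rare event.
T, K, L = 1.0, 3.0, 100_000
phi = lambda x: (x > K).astype(float)   # rare-event payoff

w = rng.normal(scale=np.sqrt(T), size=L)

plain = phi(w)                          # crude Monte Carlo samples

h = K / T                               # push mass toward the event region
weighted = np.exp(-h * w - 0.5 * h**2 * T) * phi(w + h * T)

print(plain.mean(), weighted.mean())    # both are unbiased for P(W_T > K)
print(plain.std(), weighted.std())      # the weighted samples vary far less
```

Both estimators target $P(W_T > K) \approx 0.00135$; the drifted samples hit the event often, and the Girsanov weight corrects the bias, so the sample standard deviation drops by an order of magnitude.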

We now introduce the time-discretized analog of the Picard-type iteration scheme with importance sampling induced by some process $h$. As the choice of $h$ will naturally vary with the partition $\pi$, we assume from now on that the partition $\pi$ is fixed. First we specify the class of processes which we will consider in the sequel.


A 6. The discretized process $h$ is given by
$$
h_{t_i} = \tilde{h}(t_i, \Delta W_0, \ldots, \Delta W_{i-1})
$$
for some bounded deterministic function $\tilde{h}\colon \pi \times \mathbb{R}^D \times \cdots \times \mathbb{R}^D \longrightarrow \mathbb{R}^D$. The bound of $h$ will be denoted by $C_h$.

The modified forward scheme is then given by
$$
\begin{aligned}
\Delta W^{h,\pi}_i &= \Delta W_i + h_{t_i} \Delta_i, \quad i = 0, \ldots, N-1, \qquad \Delta W^{h,\pi}_N = 0, \\
\Psi^{h,\pi,j}_{t_i} &= \exp\left\{ -\sum_{k=j}^{i-1} h_{t_k}^\top \Delta W_k - \frac{1}{2} \sum_{k=j}^{i-1} |h_{t_k}|^2 \Delta_k \right\}, \quad j = 0, \ldots, N-1, \; i = j, \ldots, N, \\
X^{h,\pi}_{t_0} &= X_0, \\
X^{h,\pi}_{t_i} &= u^\pi(t_i, X^{h,\pi}_{t_{i-1}}, \Delta W^{h,\pi}_{i-1}), \quad i = 1, \ldots, N,
\end{aligned}
$$
and, for $i = 0, \ldots, N$, $d = 1, \ldots, D$,
$$
Y^{h,n,\pi}_{t_i} = E\left[ \Psi^{h,\pi,i}_{t_N} \phi(X^{h,\pi}_{t_N}) + \sum_{j=i}^{N-1} \Psi^{h,\pi,i}_{t_j} f(t_j, S^{h,\pi}_{t_j}, Y^{h,n-1,\pi}_{t_j}, Z^{h,n-1,\pi}_{t_j}) \Delta_j \,\middle|\, \mathcal{F}_{t_i} \right], \quad (2.1)
$$
$$
Z^{h,n,\pi}_{d,t_i} = E\left[ \frac{\Delta W^{h,\pi}_{d,i}}{\Delta_i} \left( \Psi^{h,\pi,i}_{t_N} \phi(X^{h,\pi}_{t_N}) + \sum_{j=i+1}^{N-1} \Psi^{h,\pi,i}_{t_j} f(t_j, S^{h,\pi}_{t_j}, Y^{h,n-1,\pi}_{t_j}, Z^{h,n-1,\pi}_{t_j}) \Delta_j \right) \middle|\, \mathcal{F}_{t_i} \right], \quad (2.2)
$$

initialized at $(Y^{h,0,\pi}_{t_i}, Z^{h,0,\pi}_{t_i}) = (0, 0)$. For the special case $h \equiv 0$, we are just back in the forward scheme discussed by Bender and Denk [2]. Note that, by construction, the first $M$ components of $X^{h,\pi}_{t_i}$ coincide with $S^{h,\pi}_{t_i}$ defined via the Euler-Maruyama scheme
$$
\begin{aligned}
S^{h,\pi}_{t_0} &= s_0, \\
S^{h,\pi}_{t_{i+1}} &= S^{h,\pi}_{t_i} + \bigl( b(t_i, S^{h,\pi}_{t_i}) + \sigma(t_i, S^{h,\pi}_{t_i})\, h_{t_i} \bigr) \Delta_i + \sigma(t_i, S^{h,\pi}_{t_i})\, \Delta W_i, \quad i = 0, \ldots, N-1.
\end{aligned}
$$

Defining a new measure $Q^{h,\pi}$ by $dQ^{h,\pi} = \Psi^{h,\pi,0}_{t_N}\, dP$, the Girsanov theorem implies that the process
$$
W^{h,\pi}_t = W_t + \sum_{j=0}^{N-1} h_{t_j} (t_{j+1} \wedge t - t_j \wedge t)
$$
is a Brownian motion under $Q^{h,\pi}$. Consequently, the $\Delta W^{h,\pi}_i$ are Brownian increments under this measure. This implies that $(X^{h,\pi}, \mathcal{F}_{t_i})$ is a Markov process under $Q^{h,\pi}$ and that the transition probabilities of $X^{h,\pi}$ under $Q^{h,\pi}$ are the same as those of $X^\pi$ under $P$.
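The discrete ingredients of the modified forward scheme, the drift-adjusted increments and the Girsanov weights, can be sketched numerically as follows; a minimal one-dimensional illustration assuming an equidistant grid and a constant control (all choices are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

# One-dimensional sketch of
#   dW^{h,pi}_i = dW_i + h_{t_i} * Delta_i   and
#   Psi^{h,pi,j}_{t_i} = exp( -sum_{k=j}^{i-1} h_{t_k} dW_k
#                             - 0.5 * sum_{k=j}^{i-1} h_{t_k}^2 Delta_k ).
N, T = 8, 1.0
delta = T / N                               # equidistant Delta_i
h = 0.5 * np.ones(N)                        # some bounded control h_{t_k}
dW = rng.normal(scale=np.sqrt(delta), size=N)

dW_h = dW + h * delta                       # modified increments

log_incr = -h * dW - 0.5 * h**2 * delta     # per-step log weight
cum = np.concatenate([[0.0], np.cumsum(log_incr)])
# psi[j, i] = Psi^{h,pi,j}_{t_i}; empty sums (j = i) give weight 1.
psi = np.exp(cum[None, :] - cum[:, None])

assert np.allclose(np.diag(psi), 1.0)       # Psi^{h,pi,i}_{t_i} = 1
```

Feeding `dW_h` into the Euler map and weighting payoffs with `psi[i, N]` is exactly the combination that makes the weighted simulation unbiased for the original scheme.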

The following theorem shows that, in this Markovian setting, the conditional expectations in the above iteration scheme actually simplify to regressions on $X^{h,\pi}_{t_i}$. On the one hand this is crucial for the Monte Carlo algorithm described in the next section; on the other hand it also allows us to derive some convergence results for the modified scheme in an elegant way.

Theorem 2.1.1. Under the standing assumptions there are deterministic functions $y^{n,\pi}_i$ and $z^{n,\pi}_i$, not depending on $h$, such that
$$
Y^{h,n,\pi}_{t_i} = y^{n,\pi}_i(X^{h,\pi}_{t_i}), \qquad Z^{h,n,\pi}_{t_i} = z^{n,\pi}_i(X^{h,\pi}_{t_i}).
$$
