
4.3 First try to a more general approach for effective importance sampling

4.3.1 Idea and heuristics

So far, the process $h$ inducing the measure transformation was selected with methods stemming from option pricing and was therefore only adapted to the terminal condition $\varphi$ of the underlying BSDE. These approaches never took the driver $f$ into account and are consequently suspected not to be the most effective choice, since we do not use all the information we have. Hence, a method to choose a process that is also adapted to the features of $f$ is desirable, but does not yet exist. A first idea which tries to tackle this problem is a variation of an algorithm introduced in econometrics. Richard and Zhang [42] propose a procedure which they call efficient importance sampling (EIS). In its most general version this approach is not limited to the econometric applications which they examine, e.g. likelihood functions in different models. Unfortunately, the methodology is theoretically only feasible if $\varphi$, $f$ and the discrete time approximation for $Y_0$ satisfy some positivity conditions which are hardly fulfilled in general. However, as the numerical examples show, we can use the idea in special applications to obtain variance reduction effects, which in certain cases improve the estimators considerably.

At first, we sketch the approach of Richard and Zhang [42] 'translated' to the BSDE situation for a fixed partition $\pi$: As shown by Bender and Denk [2], Theorem 5, the limit $Y^{\infty,\pi}_{t_0}$ of the time-discrete sequence $(Y^{n,\pi}_{t_0})_{n\in\mathbb{N}}$ satisfies
$$
Y^{\infty,\pi}_{t_0} = E\left[ \varphi(X^{\pi}_{t_N}) + \sum_{i=0}^{N-1} f\big(t_i, S^{\pi}_{t_i}, Y^{\infty,\pi}_{t_i}, Z^{\infty,\pi}_{t_i}\big)\,\Delta_i \right]
$$

where $(Y^{\infty,\pi}_{t_i}, Z^{\infty,\pi}_{t_i})$ is the limit of $(Y^{n,\pi}_{t_i}, Z^{n,\pi}_{t_i})_{n\in\mathbb{N}}$. After the usual measure transformation we obtain
$$
Y^{h,\infty}_{t_0} = E\left[ \Psi^{h,0}_{t_N} \left( \varphi(X^{h}_{t_N}) + \sum_{i=0}^{N-1} f\big(t_i, S^{h}_{t_i}, Y^{h,\infty}_{t_i}, Z^{h,\infty}_{t_i}\big)\,\Delta_i \right) \right]
$$

where now $(Y^{h,\infty}_{t_i}, Z^{h,\infty}_{t_i})$ is the limit of $(Y^{h,n}_{t_i}, Z^{h,n}_{t_i})_{n\in\mathbb{N}}$, $i = 0, \dots, N-1$. Please note that, in contrast to the former chapters, we now explicitly denote the dependence of these approximations on $h$. Hence, heuristically, the 'optimal' process $h$ is given by a minimizer of

$$
E\left[ \left( \Psi^{h,0}_{t_N} \left( \varphi(X^{h}_{t_N}) + \sum_{i=0}^{N-1} f\big(t_i, S^{h}_{t_i}, Y^{h,\infty}_{t_i}, Z^{h,\infty}_{t_i}\big)\,\Delta_i \right) - Y^{h,\infty}_{t_0} \right)^{2} \right]
$$

chosen from some suitable class of bounded processes. For the moment we define
$$
\zeta := \Psi^{h,0}_{t_N} \left( \varphi(X^{h}_{t_N}) + \sum_{i=0}^{N-1} f\big(t_i, S^{h}_{t_i}, Y^{h,\infty}_{t_i}, Z^{h,\infty}_{t_i}\big)\,\Delta_i \right).
$$


The first order Taylor approximation of the first factor of the above integrand leads to a simpler expression. Now setting $c := \ln(Y^{h,\infty}_{t_0})$, and thereby ignoring the fact that $c$ also depends on $h$, Richard and Zhang [42] propose to minimize the simulation-based counterpart of the simplified integral, that is, we should look for a minimizer in $(h(\cdot), c)$ of the corresponding sample criterion. A further approximation leads to a sequential procedure: Starting with some pilot process $h^0(\cdot)$ we obtain a sequence of processes $(h^b(\cdot))_{b\in\mathbb{N}}$ via $(h^{b+1}(\cdot), c^{b+1}) = \arg\inf$ of this criterion, and hope that this sequence converges numerically. However, there is neither a proof that a limit exists nor one assuring its uniqueness. Nevertheless, Richard and Zhang [42] claim that in their applications the technique works, and they also report that neglecting the last two factors (4.6) and (4.7) results in a more stable algorithm. As a last simplification we restrict ourselves to the class of constant


processes, i.e. $h_{t_i} \equiv h$ for all $i = 0, \dots, N-1$. Besides the sequential procedure, we also try to tackle problem (4.1) - (4.3) directly, using methods we describe next.

In our simulations we tried out three different algorithms to select an 'optimal' process $h$ based on the above heuristics. In full detail these are:

1. Numerical minimization of expression (4.1) - (4.3), direct simplex method:

We choose 100 random starting points $a_\kappa$, $\kappa = 1, \dots, 100$, in the interval $[-2, 2]$, since we conjecture that $h$ should not be too far from zero, so that it does not dominate the other rather small (in absolute value) parameters in the financial applications. Given these starting points we use the MATLAB function fminsearch, which relies on the Nelder-Mead simplex method, to obtain a minimizer $h(a_\kappa)$ of the objective function (4.1) - (4.3). The other component $c(a_\kappa)$ is not needed. We pick out $h_{opt}$, defined as the $h(a_{\tilde\kappa})$ yielding the lowest function value of (4.1) - (4.3), and finally start the variance reduced least-squares Monte Carlo simulation with it.
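The multi-start search above (random starting points in $[-2,2]$, a simplex run from each, keep the best) can be sketched as follows. The thesis uses MATLAB's fminsearch; here we substitute SciPy's Nelder-Mead implementation, and a toy quadratic stands in for the actual objective (4.1) - (4.3), so all names below are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize

def multistart_nelder_mead(objective, n_starts=100, low=-2.0, high=2.0, seed=0):
    """Minimize a scalar objective h -> J(h) from many random starting
    points and keep the best minimizer found, mirroring variant 1."""
    rng = np.random.default_rng(seed)
    best_h, best_val = None, np.inf
    for a in rng.uniform(low, high, size=n_starts):
        res = minimize(objective, x0=[a], method="Nelder-Mead")
        if res.fun < best_val:
            best_h, best_val = res.x[0], res.fun
    return best_h, best_val

# toy stand-in for the objective (4.1)-(4.3); its minimum sits at h = 0.5
h_opt, val = multistart_nelder_mead(lambda h: (h[0] - 0.5) ** 2 + 1.0)
```

In the actual procedure the lambda would be replaced by the simulation-based criterion, and only the returned `h_opt` is passed on to the variance reduced least-squares Monte Carlo simulation.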

2. Sequential proceeding with simplex method:

As proposed by Richard and Zhang [42] we recursively compute
$$
(h^{b+1}, c^{b+1}) = \operatorname*{arg\,inf}_{h,c} \sum_{\lambda=1}^{L} \Big\{ \Big( -h \sum_{i=0}^{N-1} \Delta_\lambda W_i - \tfrac{1}{2} h^2 \sum_{i=0}^{N-1} \Delta_i \Big) \qquad\qquad (4.8)
$$
$$
\qquad\qquad + \ln\Big( \varphi\big({}_\lambda X^{h^b}_{t_N}\big) + \sum_{i=0}^{N-1} f\big(t_i, {}_\lambda S^{h^b}_{t_i}, {}_\lambda \widehat Y^{h^b, n_{stop}}_{t_i}, {}_\lambda \widehat Z^{h^b, n_{stop}}_{t_i}\big)\,\Delta_i \Big) - c \Big\}^2. \qquad (4.9)
$$
Starting again with 100 random draws ${}_\kappa h^0$, $\kappa = 1, \dots, 100$, in $[-2, 2]$, we compute the sequences $({}_\kappa h^b)_{b\in\mathbb{N}}$ until some termination criterion is satisfied and denote the resulting value by ${}_\kappa h^{b_{stop}}$. Thereby, we again make use of the MATLAB function fminsearch. Finally, we pick out

$$
{}_{\tilde\kappa} h^{b_{stop}} = \operatorname*{arg\,inf}_{{}_\kappa h^{b_{stop}}} \sum_{\lambda=1}^{L} \Big\{ \Big( -{}_\kappa h^{b_{stop}} \sum_{i=0}^{N-1} \Delta_\lambda W_i - \tfrac{1}{2} \big({}_\kappa h^{b_{stop}}\big)^2 \sum_{i=0}^{N-1} \Delta_i \Big) + \ln\Big( \varphi\big({}_\lambda X^{{}_\kappa h^{b_{stop}-1}}_{t_N}\big) + \sum_{i=0}^{N-1} f\big(t_i, {}_\lambda S^{{}_\kappa h^{b_{stop}-1}}_{t_i}, {}_\lambda \widehat Y^{{}_\kappa h^{b_{stop}-1}, n_{stop}}_{t_i}, {}_\lambda \widehat Z^{{}_\kappa h^{b_{stop}-1}, n_{stop}}_{t_i}\big)\,\Delta_i \Big) - c \Big\}^2
$$

to start the variance reduced Monte Carlo simulation.
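The sequential procedure can be sketched as below, again with SciPy's Nelder-Mead in place of MATLAB's fminsearch. The function `simulate` is a hypothetical stand-in for the least-squares Monte Carlo machinery: for each path it must return $\sum_i \Delta_\lambda W_i$, $\sum_i \Delta_i$ and the logarithm of the path functional, so that the minimized criterion mirrors (4.8) - (4.9) for a constant $h$.

```python
import numpy as np
from scipy.optimize import minimize

def eis_objective(params, dW_sum, dt_sum, log_zeta):
    """Least-squares EIS criterion (4.8)-(4.9) for constant h: sum over
    paths of ( -h*sum dW - 0.5*h^2*sum dt + log zeta - c )^2."""
    h, c = params
    resid = -h * dW_sum - 0.5 * h ** 2 * dt_sum + log_zeta - c
    return np.sum(resid ** 2)

def sequential_eis(simulate, h0=0.0, b_stop=50, tol=0.01):
    """Iterate h^{b+1} = arginf of the EIS criterion on paths drawn under
    h^b; `simulate(h)` returns (dW_sum, dt_sum, log_zeta) per path."""
    h, c = h0, 0.0
    for _ in range(b_stop):                     # cap at b_stop = 50 iterations
        dW_sum, dt_sum, log_zeta = simulate(h)
        res = minimize(eis_objective, x0=[h, c],
                       args=(dW_sum, dt_sum, log_zeta), method="Nelder-Mead")
        h_new, c = res.x
        if abs(h_new - h) < tol:                # criterion |h^{b+1} - h^b| < 0.01
            return h_new, c
        h = h_new
    return h, c
```

In the applications below this loop is started from 100 random draws in $[-2,2]$ and the best terminal value is selected as in variant 1.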

3. Sequential proceeding with gradient method:

Given $\Delta_\lambda W_i$, ${}_\lambda X^{h^b}_{t_i}$, ${}_\lambda \widehat Y^{h^b, n_{stop}}_{t_i}$ and ${}_\lambda \widehat Z^{h^b, n_{stop}}_{t_i}$, the objective function
$$
\sum_{\lambda=1}^{L} \Big\{ \Big( -h \sum_{i=0}^{N-1} \Delta_\lambda W_i - \tfrac{1}{2} h^2 \sum_{i=0}^{N-1} \Delta_i \Big) + \ln\Big( \varphi\big({}_\lambda X^{h^b}_{t_N}\big) + \sum_{i=0}^{N-1} f\big(t_i, {}_\lambda S^{h^b}_{t_i}, {}_\lambda \widehat Y^{h^b, n_{stop}}_{t_i}, {}_\lambda \widehat Z^{h^b, n_{stop}}_{t_i}\big)\,\Delta_i \Big) - c \Big\}^2
$$

is differentiable in $h$, $c$, such that algorithms using derivatives with respect to these variables can be applied. We use the MATLAB function fminunc, providing also the gradient of the objective function. This tool is based on the large-scale algorithm, a member of the subspace trust-region methods. Again, we randomly choose 100 initial values ${}_\kappa h^0$, $\kappa = 1, \dots, 100$, in $[-2, 2]$ and proceed analogously to variant 2.
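The analytic gradient handed to the optimizer follows directly from differentiating the criterion in $(h, c)$. A minimal sketch, substituting SciPy's BFGS for fminunc's trust-region method and synthetic stand-in data for the simulated quantities:

```python
import numpy as np
from scipy.optimize import minimize

def eis_obj_and_grad(params, dW_sum, dt_sum, log_zeta):
    """EIS least-squares criterion and its analytic gradient in (h, c);
    the MATLAB code hands the analogous pair to fminunc."""
    h, c = params
    resid = -h * dW_sum - 0.5 * h ** 2 * dt_sum + log_zeta - c
    f = np.sum(resid ** 2)
    dh = np.sum(2.0 * resid * (-dW_sum - h * dt_sum))  # d/dh of each residual
    dc = np.sum(-2.0 * resid)                          # d/dc of each residual
    return f, np.array([dh, dc])

# synthetic per-path data; the true minimizer is (h, c) = (0.6, 1.82)
rng = np.random.default_rng(2)
dW = rng.normal(0.0, 1.0, size=400)
res = minimize(eis_obj_and_grad, x0=[0.0, 0.0], jac=True,
               args=(dW, np.ones(400), 0.6 * dW + 2.0), method="BFGS")
h_opt, c_opt = res.x
```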

There are at least two problems left in this procedure. Most severe at first sight is the restriction imposed by the requirement of positive arguments of the logarithm. We tried two possibilities to circumvent this issue:

(a) Neglecting those paths with
$$
\varphi\big({}_\lambda X^{h^b}_{t_N}\big) + \sum_{i=0}^{N-1} f\big(t_i, {}_\lambda S^{h^b}_{t_i}, {}_\lambda \widehat Y^{h^b,n}_{t_i}, {}_\lambda \widehat Z^{h^b,n}_{t_i}\big)\,\Delta_i \le 0.
$$


(b) Setting the sum to some threshold $C$ (i.e. to some small positive value $C$, e.g. $C = 0.01$), if it drops below:
$$
\ln\Big( \max\Big\{ \varphi\big({}_\lambda X^{h^b}_{t_N}\big) + \sum_{i=0}^{N-1} f\big(t_i, {}_\lambda S^{h^b}_{t_i}, {}_\lambda \widehat Y^{h^b,n}_{t_i}, {}_\lambda \widehat Z^{h^b,n}_{t_i}\big)\,\Delta_i,\; C \Big\} \Big).
$$

The examination of the same Asian option example in the nonlinear Bergman model as in subsection 4.2.1 shows that in this case both possibilities lead to almost the same result in the algorithm using the sequential simplex method. We used 10,000 paths and the same monomial basis as in subsection 4.2.1 to determine the 'optimal' process $h$. In the at-the-money case ($s_0 = K = 100$), there are at most 81 paths with

$$
{}_\lambda \widehat\zeta^{\,b,n} := \varphi\big({}_\lambda X^{h^b}_{t_N}\big) + \sum_{i=0}^{N-1} f\big(t_i, {}_\lambda S^{h^b}_{t_i}, {}_\lambda \widehat Y^{h^b,n}_{t_i}, {}_\lambda \widehat Z^{h^b,n}_{t_i}\big)\,\Delta_i \le 0
$$

and only 4 with $0 < {}_\lambda \widehat\zeta^{\,b,n} \le 0.01$ for 100 randomly chosen starting points in the interval $[-2, 2]$. Proceeding analogously for the out-of-the-money case ($s_0 = 100$, $K = 120$), there are at most 1824 paths with ${}_\lambda \widehat\zeta^{\,b,n} \le 0$ and at most 61 paths with $0 < {}_\lambda \widehat\zeta^{\,b,n} \le 0.01$. The maximum number of such critical paths occurs for both option types at $b = 0$ and always declines considerably after a few iterations in $b$.

The comparison theorem (see e.g. Theorem 2.2 and Corollary 2.2 of El Karoui et al. [18]) implies that in this special case, and of course in any other option pricing setting, $Y_t \ge 0$ $P$-a.s. Hence, one can hope that the numerical approximations also fulfill this condition, and we can conclude that the low number of numerical outcomes with ${}_\lambda \widehat\zeta^{\,b,n} \le 0$ as $b$ grows did not occur only accidentally.

For this reason variant (b) seems advantageous from the theoretical point of view: we penalize choices of $h^b$ which possibly lead to negative estimators for $Y_t$ and favor choices leading to more realistic results.

Hence, in the sequel we only consider this approach. For the sake of completeness we remark that in the Asian option example both variants together with the sequential simplex method lead to almost the same optimal $h$ and consequently to the same variance reduction effect.

The second concern is: Do the direct and the sequential procedures lead to 'convergence' towards an optimal $h$? And if not, what shall we do in that case? At first, we have to say that, contrary to Richard and Zhang [42], our results are partly sensitive to the starting values. In most cases we do not get 'convergence' to the same 'optimal' $h$ using the sequential algorithms. We rather find for any initial value ${}_\kappa h^0$ a sequence $({}_\kappa h^b)_{b\in\mathbb{N}}$ with 2-4 limit points or an even more irregular behavior. Only in about half of the cases can we stop the iterations in the sequential procedure with a desirable termination criterion such as $|{}_\kappa h^{b+1} - {}_\kappa h^b| < 0.01$ or the like. Instead, we simply set $b_{stop} = 50$ to obtain a result in any case. Moreover, ${}_\kappa h^{b_{stop}}$ depends in many cases on the starting value.

However, if we use the direct simplex approach the results concerning convergence are more encouraging: with very few exceptions we get an 'optimal' $h$ which is independent of the starting value $a_\kappa$, though the results of the subsequent least-squares Monte Carlo simulation are not superior to the sequential optimization methods in every case.

4.3.2 Asian call options

We revisit the example of Asian call options in the nonlinear Bergman model of subsection 4.2.1 to get comparable results between the method stemming from option pricing and the EIS approach. The relevant FBSDE hence is

$$
dS^h_t = \big( b S^h_t + \sigma S^h_t h_t \big)\, dt + \sigma S^h_t\, dW_t,
$$
$$
dY^h_t = \left( r Y^h_t + \frac{b-r}{\sigma} Z^h_t - (R-r)\left( Y^h_t - \frac{Z^h_t}{\sigma} \right)^{-} + Z^h_t h_t \right) dt + Z^h_t\, dW_t,
$$
$$
S^h_0 = s_0, \qquad Y^h_T = \left( \frac{1}{T} \int_0^T S^h_t\, dt - K \right)^{+}
$$
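The forward component above is simulated by an Euler scheme under the drift-changed measure, together with the Girsanov weight for a constant $h$. A rough sketch with the example's parameter values (function name, path count and the sign convention for $\Psi$ are our assumptions, not taken from the thesis):

```python
import numpy as np

def simulate_drifted_paths(h, s0=100.0, b=0.06, sigma=0.2, T=1.0, N=20,
                           n_paths=4, seed=0):
    """Euler scheme for dS^h = (b + sigma*h) S^h dt + sigma S^h dW together
    with the Girsanov weight Psi = exp(-h*W_T - 0.5*h^2*T) for constant h;
    also tracks the running average needed for the Asian payoff."""
    rng = np.random.default_rng(seed)
    dt = T / N
    S = np.full(n_paths, s0)
    avg = np.zeros(n_paths)      # approximates (1/T) * integral of S dt
    W_T = np.zeros(n_paths)
    for _ in range(N):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        avg += S * dt / T
        S = S + (b + sigma * h) * S * dt + sigma * S * dW
        W_T += dW
    psi = np.exp(-h * W_T - 0.5 * h ** 2 * T)
    return S, avg, psi
```

For $h = 0$ the weight is identically one and the scheme reduces to the crude simulation.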


(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the Asian call vs. crude least-squares Monte Carlo.)

(a) At the money, direct simplex method. (b) At the money, sequential simplex method.

Figure 4.7: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of nonlinear BSDE and Asian call option with different optimization methods.

with parameters:

b σ R r T s0 K(atm) K(otm)

0.06 0.2 0.15 0.1 1 100 100 120

where atm and otm stand for at the money and out of the money, respectively. Again, we use 20 time steps, 500 up to 50,000 paths, the bivariate monomial function basis $x_1^\alpha \cdot x_2^\beta$ for $\alpha, \beta = 0, \dots, 3$, and repeat the simulations 100 times.
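Such a bivariate monomial regression basis is easy to assemble; a small sketch (the helper name is ours, and with $\alpha, \beta = 0, \dots, 3$ it yields 16 basis functions):

```python
import numpy as np

def monomial_basis(x1, x2, degree=3):
    """Bivariate monomial basis x1^alpha * x2^beta, alpha, beta = 0..degree,
    evaluated columnwise for a least-squares regression design matrix."""
    return np.column_stack([x1 ** a * x2 ** b
                            for a in range(degree + 1)
                            for b in range(degree + 1)])

# two sample points -> design matrix with 2 rows and 16 columns
Phi = monomial_basis(np.array([1.0, 2.0]), np.array([3.0, 4.0]))
```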

For the selection of an 'optimal' process $h$ which hopefully induces variance reduction, we proceed as described in the last subsection and obtain the following results: for the sequential simplex method the algorithm converges independently of the starting value. However, using the sequential gradient method we are faced with the problem of non-convergence for 22 out of 100 different starting values. Moreover, we obtain a different 'optimal' $h$ using different starting points. Applying the above described selection criterion we nevertheless obtain for both sequential approaches a similar 'optimal' $h$:

option type       optimization method          selected 'optimal' h   average variance reduction factor
at the money      direct simplex method         0.38534176            2.5419
at the money      sequential simplex method     0.60091548            3.9518
at the money      sequential gradient method    0.60003201            3.9453
out of the money  direct simplex method         1.05272288            9.3574
out of the money  sequential simplex method     0.81947816            6.5918
out of the money  sequential gradient method    0.81929033            6.5897

Figures 4.7 - 4.9 depict the empirical mean over 100 repetitions of the estimator $\widehat Y^{\,n_{stop},L}_{t_0}$ plus/minus two empirical standard deviations and illustrate the effect of the EIS approach. In comparison to the approach stemming from option pricing we now obtain on average a smaller variance reduction effect. Similar to the former approach is the positive dependence of this effect on the difference $K - s_0$.
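The variance reduction factors reported above are empirical variance ratios over the repeated runs; a minimal sketch with synthetic stand-in estimates (the data below is illustrative, not from the tables):

```python
import numpy as np

def variance_reduction_factor(crude_estimates, is_estimates):
    """Empirical variance-reduction factor over repeated simulation runs:
    Var(crude estimator) / Var(importance-sampled estimator)."""
    return np.var(crude_estimates, ddof=1) / np.var(is_estimates, ddof=1)

# synthetic example: 100 repetitions each, the IS estimator has smaller spread
rng = np.random.default_rng(3)
crude = rng.normal(9.0, 0.4, size=100)
reduced = rng.normal(9.0, 0.2, size=100)
factor = variance_reduction_factor(crude, reduced)
```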


(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the Asian call vs. crude least-squares Monte Carlo.)

(a) At the money, sequential gradient method. (b) Out of the money, direct simplex method.

Figure 4.8: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of nonlinear BSDE and Asian call option with different optimization methods.

(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the Asian call vs. crude least-squares Monte Carlo.)

(a) Out of the money, sequential simplex method. (b) Out of the money, sequential gradient method.

Figure 4.9: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of nonlinear BSDE and Asian call option with different optimization methods.


4.3.3 Lookback options

Furthermore, we applied the EIS approach to the lookback options already considered in subsection 4.2.2.

We examine the linear Black-Scholes model
$$
dS^h_t = \big( b S^h_t + \sigma S^h_t h_t \big)\, dt + \sigma S^h_t\, dW_t,
$$
$$
dY^h_t = \left( r Y^h_t + \frac{b-r}{\sigma} Z^h_t + Z^h_t h_t \right) dt + Z^h_t\, dW_t,
$$
$$
S^h_0 = s_0, \qquad Y^h_T = \left( K - \min_{0 \le t \le T} S^h_t \right)^{+}
$$

as well as the nonlinear Bergman model
$$
dS^h_t = \big( b S^h_t + \sigma S^h_t h_t \big)\, dt + \sigma S^h_t\, dW_t,
$$
$$
dY^h_t = \left( r Y^h_t + \frac{b-r}{\sigma} Z^h_t - (R-r)\left( Y^h_t - \frac{Z^h_t}{\sigma} \right)^{-} + Z^h_t h_t \right) dt + Z^h_t\, dW_t,
$$
$$
S^h_0 = s_0, \qquad Y^h_T = \left( K - \min_{0 \le t \le T} S^h_t \right)^{+}
$$

with parameters:

b σ R r T s0 K(atm) K(otm)

0.05 0.15 0.1 0.05 0.25 95 94 85

We furthermore use 50 time steps, 6,000 up to 80,000 paths and the bivariate monomial function basis $x_1^\alpha \cdot x_2^\beta$ for $\alpha, \beta = 0, \dots, 2$.

The following table as well as Figures 4.10 - 4.15 summarize our findings for 100 repetitions of our procedure:

model      option type       optimization method          selected 'optimal' h   average variance reduction factor
linear     at the money      direct simplex method        -0.45061769            1.7471
linear     at the money      sequential simplex method     3.16755490            0.0086
linear     at the money      sequential gradient method    0.99216669            0.2731
linear     out of the money  direct simplex method        -0.94041427            2.7219
linear     out of the money  sequential simplex method    -2.68862276            9.2794
linear     out of the money  sequential gradient method   -2.68787377            9.2773
nonlinear  at the money      direct simplex method        -0.53656738            1.9036
nonlinear  at the money      sequential simplex method     2.33085690            0.0099
nonlinear  at the money      sequential gradient method    0.33570400            0.6495
nonlinear  out of the money  direct simplex method        -0.94299730            2.7615
nonlinear  out of the money  sequential simplex method    -2.76040545            9.5364
nonlinear  out of the money  sequential gradient method   -2.75413580            9.5238

Also here we are faced with the problem of non-convergence of the sequential optimization methods for the at-the-money option, and again the EIS approach is less efficient than the algorithms coming from option pricing. In fact, we are now even faced with tremendous variance blow-ups in the cases where we applied the sequential optimization methods for at-the-money options.

Clearly, the selected 'optimal' $h$ has a counterintuitive algebraic sign, so that the higher variance of the estimator for the initial option price is not very surprising. More sobering is the observation that in one


(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the lookback option vs. crude least-squares Monte Carlo.)

(a) At the money, direct simplex method. (b) At the money, sequential simplex method.

Figure 4.10: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of linear BSDE and lookback option with EIS and different optimization methods.

(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the lookback option vs. crude least-squares Monte Carlo.)

(a) At the money, sequential gradient method. (b) Out of the money, direct simplex method.

Figure 4.11: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of linear BSDE and lookback option with EIS and different optimization methods.


(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the lookback option vs. crude least-squares Monte Carlo.)

(a) Out of the money, sequential simplex method. (b) Out of the money, sequential gradient method.

Figure 4.12: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of linear BSDE and lookback option with EIS and different optimization methods.

(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the lookback option vs. crude least-squares Monte Carlo.)

(a) At the money, direct simplex method. (b) At the money, sequential simplex method.

Figure 4.13: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of nonlinear BSDE and lookback option with EIS and different optimization methods.


(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the lookback option vs. crude least-squares Monte Carlo.)

(a) At the money, sequential gradient method. (b) Out of the money, direct simplex method.

Figure 4.14: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of nonlinear BSDE and lookback option with EIS and different optimization methods.

(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the lookback option vs. crude least-squares Monte Carlo.)

(a) Out of the money, sequential simplex method. (b) Out of the money, sequential gradient method.

Figure 4.15: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of nonlinear BSDE and lookback option with EIS and different optimization methods.


case the mean is influenced by the change of measure, see Figure 4.13 (b). Unfortunately, we have no explanation for this irregular result, which contradicts the theory.

Quite astonishingly, for the out-of-the-money option both sequential optimization methods lead to convergence, though the limit depends on the starting point.

Overall, these findings are rather deflating, since we now use more information about our models, especially the shape of $f$, but nevertheless get worse results than with methods only relying on the shape of $\varphi$.

However, there are two reasons why we think it is worth examining this approach further despite these shortcomings: First, we only consider the class of constant processes $h_t \equiv h$ and choose an 'optimal' candidate among them. By contrast, Glasserman [20] allows in the Asian option example for a more flexible class of processes: $h_t$ is time-dependent. Hence, it would be desirable to extend the class of considered processes $h_t$, for example to time-dependent ones. First experiments in the Asian option case are not yet very successful, since we are then faced with a high-dimensional optimization which creates more and more numerical difficulties. We again end up with a variance blow-up instead of a variance reduction.

The second reason why we proceed with the research on EIS in the BSDE framework is its very general nature: we do not need tailor-made algorithms for each special example and are able to use the same implementation, only exchanging a few quantities.