
4.3 First try to a more general approach for effective importance sampling

4.3.1 Idea and heuristics

So far, the process $h$ inducing the measure transformation was selected with methods stemming from option pricing and was therefore only adapted to the terminal condition $\varphi$ of the underlying BSDE. These approaches never took the driver $f$ into account and are consequently suspected not to be the most effective choice, since we do not use all the information we have. Hence, a method to choose a process that is also adapted to the features of $f$ is desirable, but does not yet exist. A first idea which tries to tackle this problem is a variation of an algorithm introduced in econometrics. Richard and Zhang [42] propose a procedure which they call efficient importance sampling (EIS). In its most general version this approach is not limited to the econometric applications which they examine, e.g. likelihood functions in different models. Unfortunately, the methodology is theoretically only feasible if $\varphi$, $f$ and the discrete time approximation for $Y_0$ satisfy some positivity conditions which are hardly fulfilled in general. However, as the numerical examples show, we can use the idea in special applications to obtain variance reduction effects, which in certain cases improve the estimators considerably.

At first, we sketch the approach of Richard and Zhang [42] 'translated' to the BSDE situation for a fixed partition $\pi$: As shown by Bender and Denk [2], Theorem 5, the limit $Y^{\infty,\pi}_{t_0}$ of the time-discrete sequence $(Y^{n,\pi}_{t_0})_{n\in\mathbb{N}}$ satisfies
$$
Y^{\infty,\pi}_{t_0} = E\left[ \varphi(X^{\pi}_{t_N}) + \sum_{i=0}^{N-1} f\big(t_i, S^{\pi}_{t_i}, Y^{\infty,\pi}_{t_i}, Z^{\infty,\pi}_{t_i}\big)\,\Delta_i \right]
$$

where $(Y^{\infty,\pi}_{t_i}, Z^{\infty,\pi}_{t_i})$ is the limit of $(Y^{n,\pi}_{t_i}, Z^{n,\pi}_{t_i})_{n\in\mathbb{N}}$. After the usual measure transformation we obtain
$$
Y^{h,\infty}_{t_0} = E\left[ \Psi^{h,0}_{t_N} \left( \varphi(X^{h}_{t_N}) + \sum_{i=0}^{N-1} f\big(t_i, S^{h}_{t_i}, Y^{h,\infty}_{t_i}, Z^{h,\infty}_{t_i}\big)\,\Delta_i \right) \right]
$$

where now $(Y^{h,\infty}_{t_i}, Z^{h,\infty}_{t_i})$ is the limit of $(Y^{h,n}_{t_i}, Z^{h,n}_{t_i})_{n\in\mathbb{N}}$, $i = 0, \dots, N-1$. Please note that, in contrast to the former chapters, we now explicitly denote the dependence of these approximations on $h$. Hence, heuristically, the 'optimal' process $h$ is given by a minimizer of

$$
E\left[ \left( \Psi^{h,0}_{t_N} \left( \varphi(X^{h}_{t_N}) + \sum_{i=0}^{N-1} f\big(t_i, S^{h}_{t_i}, Y^{h,\infty}_{t_i}, Z^{h,\infty}_{t_i}\big)\,\Delta_i \right) - Y^{h,\infty}_{t_0} \right)^{2} \right]
$$

chosen from some suitable class of bounded processes. For the moment we define
$$
\zeta := \Psi^{h,0}_{t_N} \left( \varphi(X^{h}_{t_N}) + \sum_{i=0}^{N-1} f\big(t_i, S^{h}_{t_i}, Y^{h,\infty}_{t_i}, Z^{h,\infty}_{t_i}\big)\,\Delta_i \right).
$$


The first order Taylor approximation of the first factor of the above integrand leads to a simpler expression. Now setting $c := \ln(Y^{h,\infty}_{t_0})$, and thereby ignoring the fact that $c$ also depends on $h$, Richard and Zhang [42] propose to minimize the simulation-based counterpart of the simplified integral, that is, we should look for a minimizer in $(h(\cdot), c)$ of the corresponding sample criterion. A further approximation leads to a sequential procedure: Starting with some pilot process $h^0(\cdot)$ we obtain a sequence of processes $(h^b(\cdot))_{b\in\mathbb{N}}$ via $(h^{b+1}(\cdot), c^{b+1}) = \arg\inf$ of this criterion, and hope that this sequence converges numerically. However, there is neither a proof that a limit exists nor one assuring its uniqueness. Nevertheless, Richard and Zhang [42] claim that in their applications the technique works, and they also report that neglecting the last two factors (4.6) and (4.7) results in a more stable algorithm. As a last simplification we restrict ourselves to the class of constant


processes, i.e. $h_{t_i} \equiv h$ for all $i = 0, \dots, N-1$. Besides the sequential procedure, we also try to tackle problem (4.1) - (4.3) directly, using methods we describe next.

In our simulations we tried out three different algorithms to select an 'optimal' process $h$ based on the above heuristics. In full detail these are:

1. Numerical minimization of expression (4.1) - (4.3), direct simplex method:

We choose 100 random starting points $a_\kappa$, $\kappa = 1, \dots, 100$, in the interval $[-2, 2]$, since we conjecture that $h$ should not be too far from zero, so that it does not dominate the other rather small (in absolute value) parameters in the financial applications. Given these starting points we use the MATLAB function fminsearch, which relies on the Nelder-Mead simplex method, to obtain a minimizer $h(a_\kappa)$ of the objective function (4.1) - (4.3). The other component $c(a_\kappa)$ is not needed. We pick out $h_{opt}$, defined as the $h(a_{\tilde\kappa})$ yielding the lowest function value of (4.1) - (4.3), and finally start the variance reduced least-squares Monte Carlo simulation with it.
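The multi-start search above (random starting points in $[-2,2]$, a simplex run from each, keep the best) can be sketched as follows. The thesis uses MATLAB's fminsearch; here we substitute SciPy's Nelder-Mead implementation, and a toy quadratic stands in for the actual objective (4.1) - (4.3), so all names below are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize

def multistart_nelder_mead(objective, n_starts=100, low=-2.0, high=2.0, seed=0):
    """Minimize a scalar objective h -> J(h) from many random starting
    points and keep the best minimizer found, mirroring variant 1."""
    rng = np.random.default_rng(seed)
    best_h, best_val = None, np.inf
    for a in rng.uniform(low, high, size=n_starts):
        res = minimize(objective, x0=[a], method="Nelder-Mead")
        if res.fun < best_val:
            best_h, best_val = res.x[0], res.fun
    return best_h, best_val

# toy stand-in for the objective (4.1)-(4.3); its minimum sits at h = 0.5
h_opt, val = multistart_nelder_mead(lambda h: (h[0] - 0.5) ** 2 + 1.0)
```

In the actual procedure the lambda would be replaced by the simulation-based criterion, and only the returned `h_opt` is passed on to the variance reduced least-squares Monte Carlo simulation.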

2. Sequential proceeding with simplex method:

As proposed by Richard and Zhang [42] we recursively compute
$$
(h^{b+1}, c^{b+1}) = \operatorname*{arg\,inf}_{h,c} \sum_{\lambda=1}^{L} \Big\{ \Big( -h \sum_{i=0}^{N-1} \Delta_\lambda W_i - \tfrac{1}{2} h^2 \sum_{i=0}^{N-1} \Delta_i \Big) \qquad\qquad (4.8)
$$
$$
\qquad\qquad + \ln\Big( \varphi\big({}_\lambda X^{h^b}_{t_N}\big) + \sum_{i=0}^{N-1} f\big(t_i, {}_\lambda S^{h^b}_{t_i}, {}_\lambda \widehat Y^{h^b, n_{stop}}_{t_i}, {}_\lambda \widehat Z^{h^b, n_{stop}}_{t_i}\big)\,\Delta_i \Big) - c \Big\}^2. \qquad (4.9)
$$
Starting again with 100 random draws ${}_\kappa h^0$, $\kappa = 1, \dots, 100$, in $[-2, 2]$, we compute the sequences $({}_\kappa h^b)_{b\in\mathbb{N}}$ until some termination criterion is satisfied and denote the resulting value by ${}_\kappa h^{b_{stop}}$. Thereby, we again make use of the MATLAB function fminsearch. Finally, we pick out

$$
{}_{\tilde\kappa} h^{b_{stop}} = \operatorname*{arg\,inf}_{{}_\kappa h^{b_{stop}}} \sum_{\lambda=1}^{L} \Big\{ \Big( -{}_\kappa h^{b_{stop}} \sum_{i=0}^{N-1} \Delta_\lambda W_i - \tfrac{1}{2} \big({}_\kappa h^{b_{stop}}\big)^2 \sum_{i=0}^{N-1} \Delta_i \Big) + \ln\Big( \varphi\big({}_\lambda X^{{}_\kappa h^{b_{stop}-1}}_{t_N}\big) + \sum_{i=0}^{N-1} f\big(t_i, {}_\lambda S^{{}_\kappa h^{b_{stop}-1}}_{t_i}, {}_\lambda \widehat Y^{{}_\kappa h^{b_{stop}-1}, n_{stop}}_{t_i}, {}_\lambda \widehat Z^{{}_\kappa h^{b_{stop}-1}, n_{stop}}_{t_i}\big)\,\Delta_i \Big) - c \Big\}^2
$$

to start the variance reduced Monte Carlo simulation.
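The sequential procedure can be sketched as below, again with SciPy's Nelder-Mead in place of MATLAB's fminsearch. The function `simulate` is a hypothetical stand-in for the least-squares Monte Carlo machinery: for each path it must return $\sum_i \Delta_\lambda W_i$, $\sum_i \Delta_i$ and the logarithm of the path functional, so that the minimized criterion mirrors (4.8) - (4.9) for a constant $h$.

```python
import numpy as np
from scipy.optimize import minimize

def eis_objective(params, dW_sum, dt_sum, log_zeta):
    """Least-squares EIS criterion (4.8)-(4.9) for constant h: sum over
    paths of ( -h*sum dW - 0.5*h^2*sum dt + log zeta - c )^2."""
    h, c = params
    resid = -h * dW_sum - 0.5 * h ** 2 * dt_sum + log_zeta - c
    return np.sum(resid ** 2)

def sequential_eis(simulate, h0=0.0, b_stop=50, tol=0.01):
    """Iterate h^{b+1} = arginf of the EIS criterion on paths drawn under
    h^b; `simulate(h)` returns (dW_sum, dt_sum, log_zeta) per path."""
    h, c = h0, 0.0
    for _ in range(b_stop):                     # cap at b_stop = 50 iterations
        dW_sum, dt_sum, log_zeta = simulate(h)
        res = minimize(eis_objective, x0=[h, c],
                       args=(dW_sum, dt_sum, log_zeta), method="Nelder-Mead")
        h_new, c = res.x
        if abs(h_new - h) < tol:                # criterion |h^{b+1} - h^b| < 0.01
            return h_new, c
        h = h_new
    return h, c
```

In the applications below this loop is started from 100 random draws in $[-2,2]$ and the best terminal value is selected as in variant 1.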

3. Sequential proceeding with gradient method:

Given $\Delta_\lambda W_i$, ${}_\lambda X^{h^b}_{t_i}$, ${}_\lambda \widehat Y^{h^b, n_{stop}}_{t_i}$ and ${}_\lambda \widehat Z^{h^b, n_{stop}}_{t_i}$, the objective function
$$
\sum_{\lambda=1}^{L} \Big\{ \Big( -h \sum_{i=0}^{N-1} \Delta_\lambda W_i - \tfrac{1}{2} h^2 \sum_{i=0}^{N-1} \Delta_i \Big) + \ln\Big( \varphi\big({}_\lambda X^{h^b}_{t_N}\big) + \sum_{i=0}^{N-1} f\big(t_i, {}_\lambda S^{h^b}_{t_i}, {}_\lambda \widehat Y^{h^b, n_{stop}}_{t_i}, {}_\lambda \widehat Z^{h^b, n_{stop}}_{t_i}\big)\,\Delta_i \Big) - c \Big\}^2
$$

is differentiable in $h$, $c$, such that algorithms using derivatives with respect to these variables can be applied. We use the MATLAB function fminunc, providing also the gradient of the objective function. This tool is based on the large-scale algorithm, a member of the subspace trust-region methods. Again, we randomly choose 100 initial values ${}_\kappa h^0$, $\kappa = 1, \dots, 100$, in $[-2, 2]$ and proceed analogously to variant 2.
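The analytic gradient handed to the optimizer follows directly from differentiating the criterion in $(h, c)$. A minimal sketch, substituting SciPy's BFGS for fminunc's trust-region method and synthetic stand-in data for the simulated quantities:

```python
import numpy as np
from scipy.optimize import minimize

def eis_obj_and_grad(params, dW_sum, dt_sum, log_zeta):
    """EIS least-squares criterion and its analytic gradient in (h, c);
    the MATLAB code hands the analogous pair to fminunc."""
    h, c = params
    resid = -h * dW_sum - 0.5 * h ** 2 * dt_sum + log_zeta - c
    f = np.sum(resid ** 2)
    dh = np.sum(2.0 * resid * (-dW_sum - h * dt_sum))  # d/dh of each residual
    dc = np.sum(-2.0 * resid)                          # d/dc of each residual
    return f, np.array([dh, dc])

# synthetic per-path data; the true minimizer is (h, c) = (0.6, 1.82)
rng = np.random.default_rng(2)
dW = rng.normal(0.0, 1.0, size=400)
res = minimize(eis_obj_and_grad, x0=[0.0, 0.0], jac=True,
               args=(dW, np.ones(400), 0.6 * dW + 2.0), method="BFGS")
h_opt, c_opt = res.x
```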

There are at least two problems left in this procedure. Most severe at first sight is the restriction imposed by the requirement of positive arguments of the logarithm. We tried two possibilities to circumvent this issue:

(a) Neglecting those paths with
$$
\varphi\big({}_\lambda X^{h^b}_{t_N}\big) + \sum_{i=0}^{N-1} f\big(t_i, {}_\lambda S^{h^b}_{t_i}, {}_\lambda \widehat Y^{h^b,n}_{t_i}, {}_\lambda \widehat Z^{h^b,n}_{t_i}\big)\,\Delta_i \le 0.
$$


(b) Setting the sum to some threshold $C$ (i.e. to some small positive value $C$, e.g. $C = 0.01$), if it drops below:
$$
\ln\Big( \max\Big\{ \varphi\big({}_\lambda X^{h^b}_{t_N}\big) + \sum_{i=0}^{N-1} f\big(t_i, {}_\lambda S^{h^b}_{t_i}, {}_\lambda \widehat Y^{h^b,n}_{t_i}, {}_\lambda \widehat Z^{h^b,n}_{t_i}\big)\,\Delta_i,\; C \Big\} \Big).
$$

The examination of the same Asian option example in the nonlinear Bergman model as in subsection 4.2.1 shows that in this case both possibilities lead to almost the same result in the algorithm using the sequential simplex method. We used 10,000 paths and the same monomial basis as in subsection 4.2.1 to determine the 'optimal' process $h$. In the at-the-money case ($s_0 = K = 100$), there are at most 81 paths with

$$
{}_\lambda \widehat\zeta^{\,b,n} := \varphi\big({}_\lambda X^{h^b}_{t_N}\big) + \sum_{i=0}^{N-1} f\big(t_i, {}_\lambda S^{h^b}_{t_i}, {}_\lambda \widehat Y^{h^b,n}_{t_i}, {}_\lambda \widehat Z^{h^b,n}_{t_i}\big)\,\Delta_i \le 0
$$

and only 4 with $0 < {}_\lambda \widehat\zeta^{\,b,n} \le 0.01$ for 100 randomly chosen starting points in the interval $[-2, 2]$. Proceeding analogously for the out-of-the-money case ($s_0 = 100$, $K = 120$), there are at most 1824 paths with ${}_\lambda \widehat\zeta^{\,b,n} \le 0$ and at most 61 paths with $0 < {}_\lambda \widehat\zeta^{\,b,n} \le 0.01$. The maximum number of such critical paths occurs for both option types at $b = 0$ and always declines considerably after a few iterations in $b$.

The comparison theorem (see e.g. Theorem 2.2 and Corollary 2.2 of El Karoui et al. [18]) implies that in this special case, and of course in any other option pricing setting, $Y_t \ge 0$ $P$-a.s. Hence, one can hope that the numerical approximations also fulfill this condition, and we can conclude that the low number of numerical outcomes with ${}_\lambda \widehat\zeta^{\,b,n} \le 0$ as $b$ grows did not occur only accidentally.

For this reason variant (b) seems advantageous from the theoretical point of view: we penalize choices of $h^b$ which possibly lead to negative estimators for $Y_t$ and favor choices leading to more realistic results.

Hence, in the sequel we only consider this approach. For the sake of completeness we remark that in the Asian option example both variants together with the sequential simplex method lead to almost the same optimal $h$ and consequently to the same variance reduction effect.

The second concern is: Do the direct and the sequential procedures lead to 'convergence' towards an optimal $h$? And if not, what shall we do in that case? At first, we have to say that, contrary to Richard and Zhang [42], our results are partly sensitive to the starting values. In most cases we do not get 'convergence' to the same 'optimal' $h$ using the sequential algorithms. We rather find for any initial value ${}_\kappa h^0$ a sequence $({}_\kappa h^b)_{b\in\mathbb{N}}$ with 2-4 limit points or an even more irregular behavior. Only in about half of the cases can we stop the iterations in the sequential procedure with a desirable termination criterion such as $|{}_\kappa h^{b+1} - {}_\kappa h^b| < 0.01$ or the like. Instead, we simply set $b_{stop} = 50$ to obtain a result in any case. Moreover, ${}_\kappa h^{b_{stop}}$ depends in many cases on the starting value.

However, if we use the direct simplex approach the results concerning convergence are more encouraging: with very few exceptions we get an 'optimal' $h$ which is independent of the starting value $a_\kappa$, though the results of the subsequent least-squares Monte Carlo simulation are not superior to the sequential optimization methods in every case.

4.3.2 Asian call options

We revisit the example of Asian call options in the nonlinear Bergman model of subsection 4.2.1 to get comparable results between the method stemming from option pricing and the EIS approach. The relevant FBSDE hence is

$$
dS^h_t = \big( b S^h_t + \sigma S^h_t h_t \big)\, dt + \sigma S^h_t\, dW_t,
$$
$$
dY^h_t = \left( r Y^h_t + \frac{b-r}{\sigma} Z^h_t - (R-r)\left( Y^h_t - \frac{Z^h_t}{\sigma} \right)^{-} + Z^h_t h_t \right) dt + Z^h_t\, dW_t,
$$
$$
S^h_0 = s_0, \qquad Y^h_T = \left( \frac{1}{T} \int_0^T S^h_t\, dt - K \right)^{+}
$$
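The forward component above is simulated by an Euler scheme under the drift-changed measure, together with the Girsanov weight for a constant $h$. A rough sketch with the example's parameter values (function name, path count and the sign convention for $\Psi$ are our assumptions, not taken from the thesis):

```python
import numpy as np

def simulate_drifted_paths(h, s0=100.0, b=0.06, sigma=0.2, T=1.0, N=20,
                           n_paths=4, seed=0):
    """Euler scheme for dS^h = (b + sigma*h) S^h dt + sigma S^h dW together
    with the Girsanov weight Psi = exp(-h*W_T - 0.5*h^2*T) for constant h;
    also tracks the running average needed for the Asian payoff."""
    rng = np.random.default_rng(seed)
    dt = T / N
    S = np.full(n_paths, s0)
    avg = np.zeros(n_paths)      # approximates (1/T) * integral of S dt
    W_T = np.zeros(n_paths)
    for _ in range(N):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        avg += S * dt / T
        S = S + (b + sigma * h) * S * dt + sigma * S * dW
        W_T += dW
    psi = np.exp(-h * W_T - 0.5 * h ** 2 * T)
    return S, avg, psi
```

For $h = 0$ the weight is identically one and the scheme reduces to the crude simulation.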


(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the Asian call vs. crude least-squares Monte Carlo.)

(a) At the money, direct simplex method. (b) At the money, sequential simplex method.

Figure 4.7: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of nonlinear BSDE and Asian call option with different optimization methods.

with parameters:

b σ R r T s0 K(atm) K(otm)

0.06 0.2 0.15 0.1 1 100 100 120

where atm and otm stand for at the money and out of the money, respectively. Again, we use 20 time steps, 500 up to 50,000 paths, the bivariate monomial function basis $x_1^\alpha \cdot x_2^\beta$ for $\alpha, \beta = 0, \dots, 3$, and repeat the simulations 100 times.
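Such a bivariate monomial regression basis is easy to assemble; a small sketch (the helper name is ours, and with $\alpha, \beta = 0, \dots, 3$ it yields 16 basis functions):

```python
import numpy as np

def monomial_basis(x1, x2, degree=3):
    """Bivariate monomial basis x1^alpha * x2^beta, alpha, beta = 0..degree,
    evaluated columnwise for a least-squares regression design matrix."""
    return np.column_stack([x1 ** a * x2 ** b
                            for a in range(degree + 1)
                            for b in range(degree + 1)])

# two sample points -> design matrix with 2 rows and 16 columns
Phi = monomial_basis(np.array([1.0, 2.0]), np.array([3.0, 4.0]))
```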

For the selection of an 'optimal' process $h$ which hopefully induces variance reduction, we proceed as described in the last subsection and obtain the following results: for the sequential simplex method the algorithm converges independently of the starting value. However, using the sequential gradient method we are faced with the problem of non-convergence for 22 out of 100 different starting values. Moreover, we obtain a different 'optimal' $h$ using different starting points. Applying the above described selection criterion we nevertheless obtain for both sequential approaches a similar 'optimal' $h$:

option type       optimization method          selected 'optimal' h   average variance reduction factor
at the money      direct simplex method         0.38534176            2.5419
at the money      sequential simplex method     0.60091548            3.9518
at the money      sequential gradient method    0.60003201            3.9453
out of the money  direct simplex method         1.05272288            9.3574
out of the money  sequential simplex method     0.81947816            6.5918
out of the money  sequential gradient method    0.81929033            6.5897

Figures 4.7 - 4.9 depict the empirical mean over 100 repetitions of the estimator $\widehat Y^{\,n_{stop},L}_{t_0}$ plus/minus two empirical standard deviations and illustrate the effect of the EIS approach. In comparison to the approach stemming from option pricing we now obtain on average a smaller variance reduction effect. Similar to the former approach is the positive dependence of this effect on the difference $K - s_0$.
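The variance reduction factors reported above are empirical variance ratios over the repeated runs; a minimal sketch with synthetic stand-in estimates (the data below is illustrative, not from the tables):

```python
import numpy as np

def variance_reduction_factor(crude_estimates, is_estimates):
    """Empirical variance-reduction factor over repeated simulation runs:
    Var(crude estimator) / Var(importance-sampled estimator)."""
    return np.var(crude_estimates, ddof=1) / np.var(is_estimates, ddof=1)

# synthetic example: 100 repetitions each, the IS estimator has smaller spread
rng = np.random.default_rng(3)
crude = rng.normal(9.0, 0.4, size=100)
reduced = rng.normal(9.0, 0.2, size=100)
factor = variance_reduction_factor(crude, reduced)
```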


(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the Asian call vs. crude least-squares Monte Carlo.)

(a) At the money, sequential gradient method. (b) Out of the money, direct simplex method.

Figure 4.8: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of nonlinear BSDE and Asian call option with different optimization methods.

(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the Asian call vs. crude least-squares Monte Carlo.)

(a) Out of the money, sequential simplex method. (b) Out of the money, sequential gradient method.

Figure 4.9: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of nonlinear BSDE and Asian call option with different optimization methods.


4.3.3 Lookback options

Furthermore, we applied the EIS approach to the lookback options already considered in subsection 4.2.2.

We examine the linear Black-Scholes model
$$
dS^h_t = \big( b S^h_t + \sigma S^h_t h_t \big)\, dt + \sigma S^h_t\, dW_t,
$$
$$
dY^h_t = \left( r Y^h_t + \frac{b-r}{\sigma} Z^h_t + Z^h_t h_t \right) dt + Z^h_t\, dW_t,
$$
$$
S^h_0 = s_0, \qquad Y^h_T = \left( K - \min_{0 \le t \le T} S^h_t \right)^{+}
$$

as well as the nonlinear Bergman model
$$
dS^h_t = \big( b S^h_t + \sigma S^h_t h_t \big)\, dt + \sigma S^h_t\, dW_t,
$$
$$
dY^h_t = \left( r Y^h_t + \frac{b-r}{\sigma} Z^h_t - (R-r)\left( Y^h_t - \frac{Z^h_t}{\sigma} \right)^{-} + Z^h_t h_t \right) dt + Z^h_t\, dW_t,
$$
$$
S^h_0 = s_0, \qquad Y^h_T = \left( K - \min_{0 \le t \le T} S^h_t \right)^{+}
$$

with parameters:

b σ R r T s0 K(atm) K(otm)

0.05 0.15 0.1 0.05 0.25 95 94 85

We furthermore use 50 time steps, 6,000 up to 80,000 paths and the bivariate monomial function basis $x_1^\alpha \cdot x_2^\beta$ for $\alpha, \beta = 0, \dots, 2$.

The following table as well as Figures 4.10 - 4.15 summarize our findings for 100 repetitions of our procedure:

model      option type       optimization method          selected 'optimal' h   average variance reduction factor
linear     at the money      direct simplex method        -0.45061769            1.7471
linear     at the money      sequential simplex method     3.16755490            0.0086
linear     at the money      sequential gradient method    0.99216669            0.2731
linear     out of the money  direct simplex method        -0.94041427            2.7219
linear     out of the money  sequential simplex method    -2.68862276            9.2794
linear     out of the money  sequential gradient method   -2.68787377            9.2773
nonlinear  at the money      direct simplex method        -0.53656738            1.9036
nonlinear  at the money      sequential simplex method     2.33085690            0.0099
nonlinear  at the money      sequential gradient method    0.33570400            0.6495
nonlinear  out of the money  direct simplex method        -0.94299730            2.7615
nonlinear  out of the money  sequential simplex method    -2.76040545            9.5364
nonlinear  out of the money  sequential gradient method   -2.75413580            9.5238

Also here we are faced with the problem of non-convergence of the sequential optimization methods for the at-the-money option, and again the EIS approach is less efficient than the algorithms coming from option pricing. In fact, we are now even faced with tremendous variance blow-ups in the cases where we applied the sequential optimization methods for at-the-money options.

Clearly, the selected 'optimal' $h$ has a counterintuitive algebraic sign, so that the higher variance of the estimator for the initial option price is not very surprising. More sobering is the observation that in one


(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the lookback option vs. crude least-squares Monte Carlo.)

(a) At the money, direct simplex method. (b) At the money, sequential simplex method.

Figure 4.10: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of linear BSDE and lookback option with EIS and different optimization methods.

(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the lookback option vs. crude least-squares Monte Carlo.)

(a) At the money, sequential gradient method. (b) Out of the money, direct simplex method.

Figure 4.11: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of linear BSDE and lookback option with EIS and different optimization methods.


(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the lookback option vs. crude least-squares Monte Carlo.)

(a) Out of the money, sequential simplex method. (b) Out of the money, sequential gradient method.

Figure 4.12: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of linear BSDE and lookback option with EIS and different optimization methods.

(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the lookback option vs. crude least-squares Monte Carlo.)

(a) At the money, direct simplex method. (b) At the money, sequential simplex method.

Figure 4.13: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of nonlinear BSDE and lookback option with EIS and different optimization methods.


(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the lookback option vs. crude least-squares Monte Carlo.)

(a) At the money, sequential gradient method. (b) Out of the money, direct simplex method.

Figure 4.14: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of nonlinear BSDE and lookback option with EIS and different optimization methods.

(Plots: mean of $Y_0$ plus/minus two empirical standard deviations as a function of the number of paths; importance sampling for the lookback option vs. crude least-squares Monte Carlo.)

(a) Out of the money, sequential simplex method. (b) Out of the money, sequential gradient method.

Figure 4.15: Convergence of $\widehat Y^{\,n_{stop},L}_{t_0}$ in the case of nonlinear BSDE and lookback option with EIS and different optimization methods.


case the mean is influenced by the change of measure, see Figure 4.13 (b). Unfortunately, we have no explanation for this irregular result, which contradicts the theory.

Quite astonishingly, for the out-of-the-money option both sequential optimization methods lead to convergence, though the limit depends on the starting point.

Overall, these findings are rather deflating, since we now use more information about our models, especially the shape of $f$, but nevertheless get worse results than with methods only relying on the shape of $\varphi$.

However, there are two reasons why we think it is worth examining this approach further despite these shortcomings: First, we only consider the class of constant processes $h_t \equiv h$ and choose an 'optimal' candidate among them. By contrast, Glasserman [20] allows in the Asian option example for a more flexible class of processes: $h_t$ is time-dependent. Hence, it would be desirable to extend the class of considered processes $h_t$, for example to time-dependent ones. First experiments in the Asian option case are not yet very successful, since we are then faced with a high-dimensional optimization which creates more and more numerical difficulties. We again end up with a variance blow-up instead of a variance reduction.

The second reason why we proceed with the research on EIS in the BSDE framework is its very general nature: we do not need tailor-made algorithms for each special example and are able to use the same implementation, only exchanging a few quantities.