
General goodness of ISPO-PingPong

In the previous section we showed that the optimality gap of our heuristic on real-world instances amounts to 0.19% on average and to 0.69% in the worst case, depending on the setting. Now we apply the heuristic to a small example to show that we cannot guarantee such small gaps in general. As in our accompanying example we consider only two branches B = {1, 2} and two sizes S = {S, L}. Only one lot-type is allowed for supplying the two branches, i.e. κ = 1. We consider all lot-types with at least one item per size, i.e. vmin = 1, and at most two items per size, vmax = 2. We only allow lot-types with cardinality 3. This yields the lot-types (1, 2) and (2, 1). The set of multiplicities is given by M = {1, 2, 3}. We set the lower bound I = 0 and the upper bound I = 20. In this example we assume that pick costs and lot-opening costs are zero. We consider four sales periods including the sellout period, kmax = 3. The discounting factor is also set to zero, ρ = 0. Moreover we set the fixed and variable mark-down costs to zero, i.e. µv = µf = 0. We are given four prices including the salvage value: P = {0, 1, 2, 3} with π0 = 10, π1 = 9, π2 = 8 and π3 = 4. We number the price trajectories t according to the following table:

index  t
1      (0, 0, 0, 3)
2      (0, 0, 1, 3)
3      (0, 0, 2, 3)
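The lot-type enumeration described above can be sketched programmatically. This is a minimal illustration, assuming the rule stated in the text (per-size counts between vmin and vmax, fixed total cardinality); the function name `feasible_lot_types` is ours, not from the thesis:

```python
from itertools import product

def feasible_lot_types(num_sizes, vmin, vmax, cardinality):
    """Enumerate all lot-types with per-size counts in [vmin, vmax]
    whose counts sum to the prescribed cardinality."""
    return [lot for lot in product(range(vmin, vmax + 1), repeat=num_sizes)
            if sum(lot) == cardinality]

# Two sizes (S, L), vmin = 1, vmax = 2, cardinality 3:
print(feasible_lot_types(2, 1, 2, 3))  # [(1, 2), (2, 1)]
```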

For simplicity’s sake we consider only one scenario with probability one. The demands per period k and price index p are given as stated in the following tables.


We state the single supply revenues computed by Algorithm 6 in the table below. In the last column we give the additional revenue for the demand-exceeding numbers of items according to Observation 2. Because there are no mark-down costs, the additional revenue always amounts to πkmax − ap = 4 − 5 = −1.

At the beginning of ISPO-PingPong a price trajectory for each scenario is fixed.

In this example we first choose the “first” price trajectory 1. For this trajectory the best supply in terms of lot-types is computed. Because κ = 1 we can only choose one lot-type, either (1, 2) or (2, 1), for supply.

According to the single supply revenues, an optimal supply for Price trajectory 1 is given by choosing Lot-type (1, 2) in Multiplicity 1 for Branch 1 and in Multiplicity 2 for Branch 2. This yields a revenue of 27.

The next step is to reject the fixed price trajectory and, if possible, to choose one which is optimal for the given supply. Otherwise the heuristic converges.

Adding up the corresponding single supply revenues shows that no other price trajectory is better for the given supply. For the price trajectories 2 and 3 the revenues for the given supply also amount to 27. The approach converges with objective function value 27.

An alternative would be to start with the “best” price trajectory – that is, the price trajectory t which yields for our scenario e the highest objective function value of the single supply relaxation SLDP-CB({e → t}). This is Price trajectory 3. The best revenue in terms of single supplies for this trajectory is 51, given by supply (4, 5) for Branch 1 and (3, 1) for Branch 2. Price trajectory 1 yields just a revenue of 30 with supply (1, 2) for Branch 1 and (2, 1) for Branch 2. The revenue for Trajectory 2 amounts to 50 and results from a delivery of (4, 3) for Branch 1 and (3, 1) for Branch 2.

That means we start with Price trajectory 3 and compute the best supply in terms of κ = 1 chosen lot-type. The optimal supply for this trajectory is given by supplying Lot-type (1, 2) three times to both branches. The resulting revenue amounts to 42.

Now we have to reject the current price trajectory again and to compute the best trajectory in terms of the current supply. According to the single supply revenues this is again Price trajectory 3. For Trajectory 1 the revenue amounts to 18 and for Trajectory 2 it amounts to 38.5. That means the approach converges with a revenue of 42.

Solving ISPO for this example to optimality yields an optimal solution value of 46.50, which results from supplying Lot-type (2, 1) three times to Branch 1 and two times to Branch 2. Price trajectory 1 is optimal.

For this example neither the “best” trajectory nor the “first” trajectory – which is also the “worst” – as start trajectory leads to a solution with a small gap. The gaps amount to 41.94% for the “first” and 9.68% for the “best” trajectory.

Thus, in general we cannot assume such a good performance as on our instances.

We want to state some special properties of our real-world instances: For our instances the range between the lower and the upper bound as a rule amounts to at most about 10% of the midpoint between the two bounds, see Remark 1. In this example it is much higher, namely 200%. Moreover, in the real-world case the deviation among the demands – with mostly zero or one sold item – is smaller.

Although at this point we cannot give a general performance guarantee for ISPO-PingPong, the gaps on our real-world instances are small enough to justify practical use.

9.5 Conclusion of the chapter

We presented the Branch&Bound solver ISPO-BAB which solves the Integrated Size and Price Optimization Problem for all tested instances to optimality. We branch on maps “scenario to price trajectory”. Dual bounds are obtained by extensions of the wait-and-see solution from stochastic programming.

For practical use at our industrial partner we propose the heuristic ISPO-PingPong.

The principle is to alternate size and price optimization until convergence. This is possible because of the special structure of ISPO – the reversible recourse. We show that ISPO-PingPong yields, for the tested real-world instances, solutions with an average optimality gap of 0.19% – the maximum gap amounts to 0.69% – in 12.77 minutes on average.

Chapter 10

DISPO in practical application – real-world experiments

The collaboration with our industrial partner gave us the opportunity to test the practical relevance of DISPO in a real-world field study.

With real-world experiments we want to verify that DISPO performs better than the method currently in use at the partner, i.e. the LDP together with a manual determination of mark-downs. For that reason the DISPO-team performed a so-called single-blind experiment in which test and control branches compete against each other.

We give some basics about statistical experiments in Section 10.1 before we apply them in Section 10.2 to our field studies.

During the cooperation, field studies only in terms of price optimization were also performed. In Section 10.3 we outline the main results. We show how heavily mark-downs can directly affect the number of sales and that price optimization can also increase the realized revenue.

Because performing a field study is expensive in terms of work and money, we estimated the potential improvement a change from LDP to ISPO-based supply would bring. The results – which we outline in Section 10.4 – convinced our partner and the DISPO-team to perform a five-month field study.

With DISPO we could finally increase the realized revenue by more than 1.5 percentage points. Moreover, by regarding the field study as a statistical experiment we can make a statement about the significance of the result. We present the details in Section 10.5.

10.1 Performing statistical experiments

With real-world experiments we want to find out whether DISPO – or price optimization as a part of DISPO – performs better than the methods currently implemented at our partner. Moreover we want the results to be statistically significant.

In order to apply statistical methods later on, we introduce some statistical basics in this section. We start with a classification of blind experiments in Subsection 10.1.1. We explain the term statistical significance in Subsection 10.1.2. To make a point about statistical significance we have to perform a test of significance. We mention the most common tests in Subsection 10.1.3 before we focus on the Wilcoxon signed-rank test in Subsection 10.1.4.


In this section we are mainly guided by [FPP07], [Raj06] and [Kan06].

10.1.1 Blind experiments

A blind experiment is a statistical experiment where not all people involved are informed about certain aspects, in order to avoid bias.

Blind experiments are typically applied in medical tests. The group of subjects is divided into a test and a control group, where the test group gets the medication and the control group just a placebo. If one wants to examine the effect of a medication, it is usual that the subjects are not informed about which group they are in. Otherwise the subjects might be affected by this information. If all other involved persons – except the subjects – have full information about the categorization, we talk about a single-blind experiment.

In some cases it is useful that the researchers also do not know about the category of the tested persons, since otherwise they might treat the subjects accordingly. In this case we talk about a double-blind experiment.

10.1.2 Statistical significance

If our new method performs worse or better than the method currently in use, we want to state how big the role of chance for this result was. If the probability that the result could be caused by pure chance is not small enough, we would not make general statements about a better or worse performance. The so-called null hypothesis says that the method leads to no differences (or to no better/worse performance); an alternative hypothesis says that it does. A test is statistically significant if the probability that its outcome is the result of chance is smaller than a predefined significance level. A common choice for the significance level is 5%. To make a point about statistical significance, a so-called test statistic is computed. A test statistic is defined as a measurement of the difference between the data and the statement of the null hypothesis, [FPP07]. The test statistic follows a test distribution. If the probability to obtain the test statistic under the test distribution – the so-called p-value – is smaller than the significance level, we call the result statistically significant: we reject the null hypothesis and rely on the alternative hypothesis. Rejecting the null hypothesis does not mean that the alternative hypothesis is true. It is only evidence that the result is not caused by chance.

10.1.3 Statistical tests in general

To establish statistical significance there are several statistical tests. There are tests for related samples and tests for unrelated samples. A sample is called unrelated if groups of different individuals are compared. For related samples we compare groups whose individuals are related pairwise to an individual from the other group. Which test can be applied also depends on the kind of data: Are the observations nominal, ordinal or given by a distribution? Parametric tests assume a specific distribution while non-parametric tests do not.

We distinguish between two-sided and one-sided tests. A two-sided test considers both a better and a worse performance of the test sample simultaneously. The null hypothesis says that both methods perform the same way, the alternative hypothesis that they do not. With a one-sided test we are only interested in a better/worse performance of the “new method”. The null hypothesis says that it performs not better/not worse, the alternative that it does. Because in our case we are interested in a worse or better performance, we focus on one-sided tests in the following.

A common approach for two unrelated normally distributed samples is Student’s t-test. The t-test computes the difference between the means of the two samples and compares it with the corresponding standard error to determine if the two samples arise from the same distribution. For related normally distributed samples the t-test can be applied in a similar way. For further information see for example [FPP07].

In the case that no distribution for the observations can be assumed (but also for observations from a specific distribution), non-parametric tests can be applied. The tests for non-parametric ordinal data assign ranks to the observations. As test statistics, rank-sums are computed. The test distribution is the distribution of the rank-sums.

For unrelated samples the Mann-Whitney test is commonly used. At first all observations are ordered increasingly and ranks are assigned in terms of the ordering. Then, by summing up all ranks of one sample, the rank sum – the test statistic – is computed.

To determine the role of chance one computes the p-value as the probability to get the observed rank-sum (or, for one-sided tests, the observed rank-sum or a higher/lower one) among all other possible rank-sums.

For related samples there is a similar approach named Wilcoxon signed-rank test.

10.1.4 Wilcoxon signed-rank test

In order to certify statistical significance for two related ordinal samples, the Wilcoxon signed-rank test is very common. The test is named after Frank Wilcoxon, who presented it together with the rank sum test for non-paired observations – also called the Mann-Whitney test – in [Wil45]. The Wilcoxon signed-rank test is an alternative to Student’s t-test if no normal distribution can be assumed. It yields a statement about the symmetric distribution of the pair differences around the median.

The test can be performed as a one-sided or a two-sided test. For our purposes only the one-sided test is relevant. Therefore we formulate the Wilcoxon signed-rank test as a one-sided test.

It is checked whether the differences of the ordered paired observations (test − control) are distributed symmetrically around or right of the median x̃, or symmetrically around or left of the median x̃. Thus, the null hypothesis in the first case is

H0: x̃ ≤ 0, (10.1)

and the alternative hypothesis

H1: x̃ > 0. (10.2)

In the second case, the null-hypothesis is

H0: x̃ ≥ 0, (10.3)

and the alternative hypothesis

H1: x̃ < 0. (10.4)

In the first case the null hypothesis is equivalent to the statement that the distributions of the paired observations for the two samples are identical or that the distribution of the test sample is shifted to the left; in the second case, that they are identical or that the distribution of the test sample is shifted to the right.

We now describe the different steps for performing the Wilcoxon signed-rank test. We illustrate them with the following small example.

We assume our observations for the test and the control sample are given as stated in the subsequent table.

observation  1     2     3
test         0.45  0.58  0.63
control      0.52  0.55  0.46

1. Computing the differences of the paired observations

We compute the differences of the observations for each test-control pair. This yields:

observation  1      2     3
difference   −0.07  0.03  0.17

2. Ordering the absolute values increasingly and assigning ranks

The next step is to order the absolute values of the differences from above increasingly.

Additionally we store the sign of each difference: It says whether the value of the observation from the test sample was higher or lower than the related observation from the control sample. The observations get ranks according to the ordering.

observation  2     1     3
abs. diff.   0.03  0.07  0.17
sign         +     −     +
rank         1     2     3

If absolute differences for test-control pairs are equal we talk about ties. If, for example, the three signed differences −0.03, 0.03 and 0.07 were observed, there would be no obvious ranking: rank 1 or rank 2 could be assigned to either of the first two differences. In this case it is common to assign the mean of the ranks the observations would occupy. In our case the first and the second observation get rank (1 + 2)/2 = 1.5 while the third observation gets rank 3.
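The tie handling described above can be sketched as follows. This is a minimal illustration with a function name (`signed_ranks`) of our own choosing; it averages the ranks over tied absolute differences and keeps each difference’s sign:

```python
def signed_ranks(differences):
    """Rank |differences| increasingly, averaging ranks over ties,
    and pair each rank with the sign of its difference."""
    order = sorted(differences, key=abs)
    ranks = []
    for d in differences:
        # positions (1-based) that this absolute value occupies in the ordering
        positions = [i + 1 for i, x in enumerate(order) if abs(x) == abs(d)]
        mean_rank = sum(positions) / len(positions)  # mean rank for ties
        ranks.append((mean_rank, 1 if d > 0 else -1))
    return ranks

# Tie example from the text: -0.03, 0.03 and 0.07
print(signed_ranks([-0.03, 0.03, 0.07]))  # [(1.5, -1), (1.5, 1), (3.0, 1)]
```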

3. Computing the test-statistic – the rank-sum

Now the rank sum as test statistic is computed by adding up all ranks whose associated difference has a positive sign. If we want to check whether the test distribution is shifted to the right of the control distribution (as alternative hypothesis), then with n test-control pairs the null hypothesis is equivalent to the case that the rank sum is near n(n+1)/4 or lower.

If we are interested in a left shift of the test distribution, the null hypothesis is equivalent to the case that the rank sum is near n(n+1)/4 or higher. (The sum of all ranks amounts to 1 + 2 + ... + n = n(n+1)/2 according to the Gaussian sum formula.)

In our example the rank sum is 1 + 3 = 4.
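Steps 1 to 3 can be sketched for the example observations (tie-free, so plain ranks suffice; the variable names are ours):

```python
test    = [0.45, 0.58, 0.63]
control = [0.52, 0.55, 0.46]

# Step 1: differences of the paired observations (test - control)
diffs = [t - c for t, c in zip(test, control)]  # approx. [-0.07, 0.03, 0.17]

# Step 2: order by absolute value; position in the ordering is the rank
ranked = sorted(diffs, key=abs)                 # approx. [0.03, -0.07, 0.17]

# Step 3: sum the ranks of the positive differences -> test statistic
W = sum(rank for rank, d in enumerate(ranked, start=1) if d > 0)
print(W)  # 1 + 3 = 4
```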

4. p-value

With the rank-sum as test statistic and the distribution of the rank-sums as test distribution we are now able to compute the p-values for both one-sided tests.

In the first case – the test of a shift to the right for the test distribution – this is the probability of getting the observed rank-sum or a higher one by chance. In the case of checking a shift to the left we compute the probability of getting the observed rank-sum or a smaller one by chance.

For our small example this is easily done. We just have to compute the relative frequency of the observed rank sum and higher/smaller rank-sums among all possible rank-sums.

In the following we denote the probability of getting a rank-sum of k or higher for a sample of test-control pairs of size n by Pn(X ≥ k), and the probability of getting a rank-sum of k or smaller by Pn(X ≤ k).

We consider all possible assignments of ranks and signs to our observations:

1  2  3  rank-sum
+  +  +  6
+  +  −  3
+  −  +  4
−  +  +  5
+  −  −  1
−  +  −  2
−  −  +  3
−  −  −  0

Our observed rank-sum for the example amounts to 4. The probability to get a rank-sum of 4 or higher by chance is P3(X ≥ 4) = 3/8 = 37.50%. The probability of obtaining a rank-sum of 4 or lower by chance is P3(X ≤ 4) = 6/8 = 75%.
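The exhaustive enumeration of sign assignments can be reproduced programmatically; a small sketch (the function name `exact_p_values` is ours):

```python
from itertools import product

def exact_p_values(n, observed):
    """Return (P(X >= observed), P(X <= observed)), where X is the rank-sum
    of the positively signed ranks over all 2^n equally likely sign vectors."""
    rank_sums = [sum(r for r, s in zip(range(1, n + 1), signs) if s == +1)
                 for signs in product([+1, -1], repeat=n)]
    total = len(rank_sums)  # 2**n sign assignments
    p_ge = sum(1 for w in rank_sums if w >= observed) / total
    p_le = sum(1 for w in rank_sums if w <= observed) / total
    return p_ge, p_le

# n = 3 pairs, observed rank-sum 4:
print(exact_p_values(3, 4))  # (0.375, 0.75)
```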

In this example we cannot reject either null hypothesis because both probabilities are greater than our significance level of 5%. Indeed, the probability of randomly observing higher values for the test sample is smaller than the probability of randomly observing lower values, but neither result is significant: the probabilities that the results are caused by chance are too high.

For higher numbers of observations we can exploit the fact that the test statistic of the Wilcoxon signed-rank test can be approximated by the normal distribution, see for example [Mon10].
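The normal approximation can be sketched as follows. This is a simplified version without tie or continuity corrections, and the function name `wilcoxon_p_right` is ours; it uses the standard moments of the signed-rank statistic under the null hypothesis, mean n(n+1)/4 and variance n(n+1)(2n+1)/24:

```python
from math import erf, sqrt

def wilcoxon_p_right(w, n):
    """Approximate one-sided p-value P(W >= w) for the signed-rank
    statistic W of n pairs via the normal distribution."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (w - mean) / sqrt(var)
    return 1 - 0.5 * (1 + erf(z / sqrt(2)))  # 1 - Phi(z)

# For n = 3 and W = 4 the approximation is rough (exact value: 0.375),
# but it illustrates the idea for larger n:
print(wilcoxon_p_right(4, 3))
```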

10.2 Performing our field-studies