
5.4 Analysis

5.4.1 A simple special case

Consider the case of $n = 3$ routines executing on $m = 2$ worker machines. We will attempt to compute $\Pr(Z \le t)$, where $Z$ is a random variable denoting the time of successful completion of a parallel step under eager scheduling.

The eager scheduling algorithm starts by placing Routine 1, which has runtime $x_1$ with probability $f_{I_1}(x_1)$, on Machine 1. This routine will finish, assuming Machine 1 does not fail first, at some unknown time $x_1/c_1 = x_1$. The scheduler will also assign Routine 2 (having runtime $x_2$ with probability $f_{I_2}(x_2)$) to Machine 2, where it will finish at time $x_2/c_2$ (again assuming that no fault occurs on Machine 2).

Now two cases must be distinguished, depending on whether $x_1$ is smaller or larger than $x_2/c_2$ (an overview of the following case distinction is shown in Figure 5.1).²

1. $x_1 < x_2/c_2$
2. $x_1 > x_2/c_2$


Figure 5.1: Overview of possible cases for eager scheduling of three routines on two machines (P1, P2). Arrows indicate scheduling steps, grayed boxes indicate eagerly scheduled routines, and crossed-out cases do not appear for $c_2 > c_1$.


After the first routine has finished, the scheduler will assign Routine 3 to the first machine asking for work.

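Routine 3 has runtime $x_3$ with probability $f_{I_3}(x_3)$. And again, the machine that has been assigned Routine 3 will become idle either before or after the other machine. Hence there are four cases so far:

1. $x_1 < x_2/c_2$ and $x_1 + x_3 < x_2/c_2$
2. $x_1 < x_2/c_2$ and $x_1 + x_3 > x_2/c_2$
3. $x_1 > x_2/c_2$ and $x_1 > x_2/c_2 + x_3/c_2$
4. $x_1 > x_2/c_2$ and $x_1 < x_2/c_2 + x_3/c_2$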

Note how adding one routine generates an additional inequality that bounds the runtime of the new routine by a linear combination of the runtimes of the previously scheduled routines. The bound is either from above or from below, and the corresponding lower or upper bound is either 0 or $\max\{c_1, c_2\}\,t = c_2 t$.

Now the “eager” part of eager scheduling comes into play. The first idle machine will be assigned a non-completed routine, which can be any of the three, but which is uniquely determined by the relative lengths of the three routines—as long as there are no faults. This extends the previous cases as follows:

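1. $x_1 < x_2/c_2$ and $x_1 + x_3 < x_2/c_2$, Routine 2 is eagerly scheduled on Machine 1,
2. $x_1 < x_2/c_2$ and $x_1 + x_3 > x_2/c_2$, Routine 3 is eagerly scheduled on Machine 2,
3. $x_1 > x_2/c_2$ and $x_1 > x_2/c_2 + x_3/c_2$, Routine 1 is eagerly scheduled on Machine 2,
4. $x_1 > x_2/c_2$ and $x_1 < x_2/c_2 + x_3/c_2$, Routine 3 is eagerly scheduled on Machine 1.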
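The case distinction can be sketched as a small Python function. This is only an illustration for the fault-free situation: the function name and return format are hypothetical, $c_1 = 1$ is assumed, ties are ignored, and the inequalities for Cases 1 to 3 are inferred from the scheduling rule and from Case 4.

```python
def classify_four_cases(x1, x2, x3, c2):
    """Return (case, eagerly scheduled routine, machine) for a fault-free run.

    Assumes Machine 1 has speed c1 = 1 and Machine 2 has speed c2.
    Ties (equalities) are not handled specially.
    """
    t2 = x2 / c2  # time at which Routine 2 finishes on Machine 2
    if x1 < t2:
        # Machine 1 becomes idle first and receives Routine 3.
        if x1 + x3 < t2:
            return 1, 2, 1  # Machine 1 idle again: Routine 2 is re-issued there
        return 2, 3, 2      # Machine 2 idle next: Routine 3 is re-issued there
    # Machine 2 becomes idle first and receives Routine 3.
    if x1 > t2 + x3 / c2:
        return 3, 1, 2      # Machine 2 idle again: Routine 1 is re-issued there
    return 4, 3, 1          # Machine 1 idle next: Routine 3 is re-issued there
```

For instance, with runtimes (2, 1, 4) and $c_2 = 2$, Routine 2 finishes at 0.5, Routine 1 at 2, and Routine 3 on Machine 2 at 2.5, so the function reports Case 4 with Routine 3 eagerly scheduled on Machine 1.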

It depends on the actual lengths of the routines and on $c_2$ whether the eagerly scheduled routine terminates before or after its first instance; both cases are possible. Hence we now have eight cases:

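1. $x_1 < x_2/c_2$ and $x_1 + x_3 < x_2/c_2$, Routine 2 is eagerly scheduled on Machine 1, and $x_1 + x_3 + x_2 < x_2/c_2$ (impossible for $c_2 > c_1$),
2. $x_1 < x_2/c_2$ and $x_1 + x_3 < x_2/c_2$, Routine 2 is eagerly scheduled on Machine 1, and $x_1 + x_3 + x_2 > x_2/c_2$,
3. $x_1 < x_2/c_2$ and $x_1 + x_3 > x_2/c_2$, Routine 3 is eagerly scheduled on Machine 2, and $x_1 + x_3 < x_2/c_2 + x_3/c_2$,
4. $x_1 < x_2/c_2$ and $x_1 + x_3 > x_2/c_2$, Routine 3 is eagerly scheduled on Machine 2, and $x_1 + x_3 > x_2/c_2 + x_3/c_2$,
5. $x_1 > x_2/c_2$ and $x_1 > x_2/c_2 + x_3/c_2$, Routine 1 is eagerly scheduled on Machine 2, and $x_1 < x_2/c_2 + x_3/c_2 + x_1/c_2$,
6. $x_1 > x_2/c_2$ and $x_1 > x_2/c_2 + x_3/c_2$, Routine 1 is eagerly scheduled on Machine 2, and $x_1 > x_2/c_2 + x_3/c_2 + x_1/c_2$,
7. $x_1 > x_2/c_2$ and $x_1 < x_2/c_2 + x_3/c_2$, Routine 3 is eagerly scheduled on Machine 1, and $x_1 + x_3 < x_2/c_2 + x_3/c_2$ (also impossible for $c_2 > c_1$),
8. $x_1 > x_2/c_2$ and $x_1 < x_2/c_2 + x_3/c_2$, Routine 3 is eagerly scheduled on Machine 1, and $x_1 + x_3 > x_2/c_2 + x_3/c_2$.

² The case of equality, $x_1 = x_2/c_2$, has probability 0 for any distributions with continuous densities, which we have assumed above. In a discrete density case, ties can be broken arbitrarily.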


In case a fault occurs, the other machine has to complete all unfinished routines. So conceptually, the above schedules can be extended by appending to each machine all routines that have not been scheduled on it. The schedule finishes when all routines have been completed. Since a processor can (potentially) be assigned all three routines, it can fail during the execution of the first, second, or third routine, or only after all three routines have been completed. The number of routines that a given machine $j$ survives is denoted by $s_j$. Of course, at least one processor must survive until the schedule is completed. Hence, for each of the eight cases shown above, there are a number of subcases that enumerate the possible fault combinations and determine their respective termination times. Also note that after the first eager scheduling step has occurred, the relative execution times of tasks on the two machines are of no consequence, since these redundant assignments are only executed if one machine has failed.

As an example, consider the last case from the eight cases shown above. Conceptually, Routine 2 is additionally scheduled on Machine 1, and Routine 1 is added to Machine 2, to compensate for a potential failing of the other machine already during its very first routine. Table 5.2 gives an overview of the termination times for this case with all pertaining combinations of faults. For the other cases, the termination times in this table would be different.

Table 5.2: Successful termination times of the various subcases of Case 8. Columns indicate the number $s_1$ of routines that Machine 1 survives, rows indicate $s_2$.

Let us now begin to compute $\Pr(Z \le t)$. We are going to look at all possible combinations of $x_1$, $x_2$, and $x_3$. For such a combination, we compute the probability of its occurrence. Given such a combination, we consider the fault scenarios $(s_1, s_2)$ that can occur when the routines are executed. Hence, the law of total probability lets us start with the formulation ($\vec{x} = (x_1, x_2, x_3)$ and $\vec{s} = (s_1, s_2)$):

where $S$ is the set of fault combinations, $\Pr(\vec{x}, \vec{s})$ is the probability that a certain fault scenario occurs for a given combination of $x_i$, and $h$ is a function that tests whether the execution of the three routines with the given runtimes succeeds before time $t$ under a given fault scenario $\vec{s}$. Of course, $c_2$ is an implicit parameter of $h$.
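Based on these definitions, one plausible shape of the formulation is sketched below. It is a reconstruction under assumptions, not a quotation: the independence of the three runtimes (so that their densities multiply), the integration range $[0, \infty)$, and the argument list of $h$ are assumed here.

```latex
\Pr(Z \le t) =
  \int_0^{\infty} \int_0^{\infty} \int_0^{\infty}
    f_{I_1}(x_1)\, f_{I_2}(x_2)\, f_{I_3}(x_3)
    \sum_{\vec{s} \in S} \Pr(\vec{x}, \vec{s})\; h(\vec{x}, \vec{s}, t)
  \,\mathrm{d}x_3 \,\mathrm{d}x_2 \,\mathrm{d}x_1
```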

Since the function $h$ basically requires the implementation of an eager scheduling algorithm, we want to break this down into simpler functions. To do so, we take advantage of the case distinctions introduced above.

The important point to note here is that each of the eight cases defined above determines a subset of routine combinations such that all combinations in this set have the same behavior; namely, their scheduling order on the two given machines is the same. Moreover, as we have noted above, each additional routine introduces one additional inequality that bounds its value and can be used directly as a limit for the corresponding integral. The other bound is either 0 or infinity, where we will refine infinity as an upper bound later.

The situation is complicated by the inequality introduced by the first eager scheduling step. This inequality cannot be directly mapped onto an integral. However, we can express this inequality by means of the Heaviside function: $a < b \Leftrightarrow H(b - a) = 1$, where $H(x) = 1 \Leftrightarrow x \ge 0$ and $0$ otherwise.
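The effect of such an indicator inside an integrand can be illustrated numerically. The sketch below is a toy example, not part of the analysis: it uses the convention $H(0) = 1$ stated above, and the helper names and the midpoint rule are choices made here. It integrates $H(b - a)$ over the unit square with fixed rectangular limits, which recovers the area of the region $a < b$ (close to 0.5).

```python
def heaviside(x):
    """Heaviside step with H(0) = 1, as defined in the text."""
    return 1 if x >= 0 else 0


def masked_area(n=200):
    """Midpoint-rule estimate of the area of {(a, b) in [0,1]^2 : a < b}.

    The inequality a < b is not used as an integration limit; it enters
    only through the factor H(b - a) in the integrand.
    """
    step = 1.0 / n
    total = 0
    for i in range(n):
        a = (i + 0.5) * step
        for j in range(n):
            b = (j + 0.5) * step
            total += heaviside(b - a)
    return total * step * step
```

The estimate differs from 0.5 only by the grid cells on the diagonal, which is the discrete counterpart of ties having probability 0 in the continuous case.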

This allows us to refine the above expression for $\Pr(Z \le t)$ as follows (for Case 8 as an example):

$\Pr(Z \le t) = \dots$

and similarly for the other seven subcases. As can be seen from Table 5.2, the set of fault scenarios $S$ is just the set $\{(0,3), (1,2), (1,3), (2,1), (2,2), (3,0), (3,1), (3,2), (3,3)\}$; all other fault scenarios do not result in a successful completion of the parallel step.
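For orientation, the contribution of Case 8 can be sketched as follows. The sketch assumes that Case 8 is characterized by $x_1 > x_2/c_2$, $x_1 < x_2/c_2 + x_3/c_2$, and an eagerly scheduled copy of Routine 3 on Machine 1 that finishes after its first instance, i.e. $x_1 + x_3 > (x_2 + x_3)/c_2$. The first two inequalities become integration limits and the third enters through $H$; the exact form in the original may differ.

```latex
\int_0^{\infty} f_{I_1}(x_1)
\int_0^{c_2 x_1} f_{I_2}(x_2)
\int_{c_2 x_1 - x_2}^{\infty} f_{I_3}(x_3)\,
  H\!\left(x_1 + x_3 - \frac{x_2 + x_3}{c_2}\right)
  \sum_{\vec{s} \in S} \Pr(\vec{x}, \vec{s})\; h(\vec{x}, \vec{s}, t)
\,\mathrm{d}x_3 \,\mathrm{d}x_2 \,\mathrm{d}x_1
```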

Instead of using $\infty$ as upper limit, $c_2 t$ is also sufficient, since no routine longer than this has any chance of being completed before $t$, even if it runs alone on the faster of the two workers. The feasibility test function $h$ merely compares the termination time for this case and fault scenario to the actual time $t$. At this point, it is straightforward to write down the complete expression for the runtime distribution for this example. For a complete, rather lengthy, expression, please refer to [134].
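The comparison performed by $h$ can be sketched in Python. This is a minimal illustration, not the implementation behind [134]: the names `termination_time`, `h`, and `CASE8_ORDERS` are hypothetical, speeds are given as $(c_1, c_2)$ with $c_1 = 1$, each machine is assumed to run its routines in the fixed conceptual order of Case 8 (Machine 1: Routines 1, 3, 2; Machine 2: Routines 2, 3, 1), and the step is taken to terminate as soon as every routine has been completed by at least one machine.

```python
from typing import Optional, Sequence, Tuple

# Conceptual per-machine execution orders for Case 8 (assumption, see lead-in).
CASE8_ORDERS: Tuple[Tuple[int, ...], ...] = ((1, 3, 2), (2, 3, 1))


def termination_time(
    x: Sequence[float],
    speeds: Sequence[float],
    survived: Sequence[int],
    orders: Sequence[Sequence[int]] = CASE8_ORDERS,
) -> Optional[float]:
    """Time at which all routines are completed, or None if some never is.

    x[i-1] is the runtime of Routine i on a machine of speed 1;
    survived[j] is the number s_j of routines that machine j completes
    before it fails.
    """
    earliest = {}
    for order, speed, s in zip(orders, speeds, survived):
        clock = 0.0
        for routine in order[:s]:  # only the routines this machine survives
            clock += x[routine - 1] / speed
            earliest[routine] = min(clock, earliest.get(routine, float("inf")))
    if len(earliest) < len(x):
        return None  # at least one routine was completed by nobody
    return max(earliest.values())


def h(x, speeds, survived, t, orders=CASE8_ORDERS) -> int:
    """Feasibility test: 1 if the step completes by time t, else 0."""
    finish = termination_time(x, speeds, survived, orders)
    return int(finish is not None and finish <= t)
```

With runtimes (2, 1, 4) and speeds (1, 2), which satisfy the Case 8 inequalities assumed here, the fault-free scenario (3, 3) terminates at 2.5, while scenario (0, 3), in which Machine 2 has to do everything, terminates at 3.5.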