
5.4.4 Solution for m machines and routines with fixed runtimes

Unfortunately, the solution derived in Section 5.4.3 is computationally very expensive. It is difficult to compute the probability distribution in reasonable time with such general assumptions as have been made above.

However, it might be more feasible to do so if the assumptions are more restricted. In this section, fixed runtimes of routines are assumed, i.e., routine $i$ has a precisely given runtime $a_i$ (on a machine of relative speed one).

These restrictions allow a more straightforward solution, which can be generalized to $m$ machines right away. Much of the notation introduced in the previous section will be used here, with obvious generalizations from two machines to $m$ machines where necessary. To avoid some purely technical special cases in the following derivation, we will assume that $n > m$.

The proof will use the notion of a state:

Definition 5.12 (State $q$ of an execution). A state $q = (A, S, R) \in Q$ represents the progress of the execution of $n$ routines on $m$ machines; $Q$ is the set of all possible states. Such a state consists of

– an assignment $A$ similar to Definition 5.1, but each element extended by the absolute time of completion:
  $A = [(\text{machine } j, \text{routine } i, \text{absolute completion time } o), \ldots]$,

– a fault scenario $S$ as in Definition 5.8, extended to $m$ machines,

– a tuple $R$ of length $n$ that represents the state of each individual routine; an element of $R$ can be either waiting, started, or finished.

Note that $S_j$ is the random variable corresponding to the lifetime of machine $j$, while $s_j$ is the number of routines that machine $j$ survives in a given state. Again we need some utility functions to conveniently formulate the definitions and proofs.
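To make the state representation concrete, the following minimal Python sketch shows one possible encoding of such a state and of the initial state used later in Definition 5.19; the class and field names are illustrative assumptions of the sketch, not part of the thesis notation.

from dataclasses import dataclass, field
from typing import List, Tuple

WAITING, STARTED, FINISHED = "waiting", "started", "finished"

@dataclass
class State:
    # A: assignments ordered by absolute completion time, stored as
    #    (machine j, routine i, absolute completion time o) triples
    A: List[Tuple[int, int, float]] = field(default_factory=list)
    # S: fault scenario; S[j-1] is the number of routines machine j survives
    S: Tuple[int, ...] = ()
    # R: per-routine status, each entry one of waiting / started / finished
    R: Tuple[str, ...] = ()

# Initial state q0 for n routines on m machines (cf. Definition 5.19):
n, m = 5, 2
q0 = State(A=[], S=(n,) * m, R=(WAITING,) * n)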

Definition 5.13 (The done predicate). Given a state $q = (A, S, R)$, $\mathit{done}(q)$ holds if and only if every routine is finished, i.e.,

$\mathit{done}(q) \;\Leftrightarrow\; \forall u \in \{1, \ldots, n\} : R(u) = \mathit{finished}.$

Definition 5.14 (The number of operational machines in a state). For a state $q = (A, S, R)$, $\mathit{operational}(S)$ denotes the number of machines that are still operational, i.e., the number of machines that have not failed in the fault scenario $S$.

Definition 5.15 (The next routine to be scheduled). Given a state $q = (A, S, R)$ that is not yet done, the next routine to be scheduled is given by $\mathit{candidate}(R)$, the first routine that still has to be executed according to $R$.

The assignments here are ordered according to absolute completion time. An ordered concatenation operator allows this order to be maintained when assignments are concatenated, and only assignments ordered according to completion time are used in this proof.

Definition 5.16 (Ordered concatenation of assignments). Given an assignment sequence $A$ ordered by completion time, the ordered concatenation of $A$ with a single assignment $(j, i, c)$, written $A \star (j, i, c)$, is the sequence obtained by inserting $(j, i, c)$ into $A$ at the position determined by its completion time $c$, so that the result is again ordered by completion time.
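As an illustration, a minimal Python sketch of this ordered insertion follows; the list-of-tuples encoding and the function name are assumptions of the sketch, not the thesis notation.

import bisect
from typing import List, Tuple

Assignment = Tuple[int, int, float]  # (machine j, routine i, completion time c)

def ordered_concat(A: List[Assignment], a: Assignment) -> List[Assignment]:
    """Insert assignment a into A at the position given by its completion
    time, so that the result stays ordered by completion time."""
    result = list(A)
    pos = bisect.bisect_right([c for (_, _, c) in result], a[2])
    result.insert(pos, a)
    return result

A = [(1, 1, 2.0), (2, 2, 3.5)]
print(ordered_concat(A, (1, 3, 3.0)))  # the new entry lands between the two existing ones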

Determining the first machine that becomes idle in any given state $q$ is slightly more complicated, because faults and the startup phase of the scheduling algorithm have to be taken into account.

Definition 5.17 (Idle information $(j, l, i, o)$). Given a state $q = (A, S, R)$, $j$ (the first machine to become idle), its index $l$ in $A$, the routine $i$ it has been executing, and its absolute completion time $o$ are defined as follows:

If $|\{u : R(u) = \mathit{waiting}\}| > n - m$ (not all machines have been assigned routines in this parallel step, and no routine has actually finished yet),

$j = l = |\{u : R(u) = \mathit{started}\}| + 1$ and $o = i = 0$,

and else (all machines have been assigned routines, and some routine actually finishes),

$l = \mathit{idleindex}(A, S, J)$, and $(j, i, o)$ is the $l$-th element of $A$,

where $\mathit{idleindex}$ yields the position of the first element of $A$ whose machine $j$ has no routine assigned to it later in $A$ and satisfies $S(j) \geq J(j)$. Here $J$ is a tuple of length $m$, representing the number of routines that have been assigned to each machine, and

$(J(k; l))(u) \stackrel{\mathrm{def}}{=} \begin{cases} J(u) & \text{if } u \neq k, \\ l & \text{else.} \end{cases}$


Lemma 5.10 (The idle information $(j, l, i, o)$ is correct). Given a state $q = (A, S, R)$, the idle information $(j, l, i, o)$ as defined in Definition 5.17 correctly describes $j$, the next machine to become idle, its index $l$ in $A$, the routine $i$ that it has executed (if any), and the absolute completion time when this machine becomes idle. $i = 0$ and $o = 0$ indicate that not all machines have been assigned a routine in this parallel step (it corresponds to the start phase of the parallel step).

Proof. While there are more than $n - m$ routines waiting, less than $m$ machines have been assigned a routine. This happens at the beginning of a parallel step, when no routine is finished yet. Hence the next idle machine is given by the number of routines started so far, plus 1.

Otherwise, the first machine to become idle is the machine that appears first in the assignment without any other routines being assigned to it later, where faults have to be taken into account. Faults are represented by the number of routines a machine survives, and the $m$-tuple $J$ counts the number of routines assigned to each machine. If $S(j) \geq J(j)$, machine $j$ has survived all previous routines and the current routine, so if it is the last routine assigned to this machine, then the machine is idle and still working. The first machine for which this holds is the first idle machine (since assignment $A$ is ordered by completion time).

The value of the idleindex function is infinite if and only if all machines fail before they have completed their final routine. Assignments have to ensure that at least one machine survives until the completion of a parallel step.
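The following Python sketch mirrors this case distinction for computing the idle information; the encoding (1-based machine numbers, lists of tuples) and the helper names are assumptions of the sketch rather than the thesis notation.

from typing import List, Optional, Tuple

WAITING, STARTED = "waiting", "started"
Assignment = Tuple[int, int, float]  # (machine j, routine i, completion time o)

def idle_info(A: List[Assignment], S: Tuple[int, ...], R: Tuple[str, ...],
              m: int, n: int) -> Optional[Tuple[int, int, int, float]]:
    """Return (j, l, i, o): first machine j to become idle, its index l in A
    (1-based), the routine i it has executed, and its completion time o."""
    if sum(1 for r in R if r == WAITING) > n - m:
        # Startup phase of a parallel step: some machine has not been assigned
        # a routine yet, so the next unused machine is idle immediately.
        j = l = sum(1 for r in R if r == STARTED) + 1
        return (j, l, 0, 0.0)
    J = [0] * (m + 1)        # J[j]: number of routines assigned to machine j
    last = {}                # machine j -> index of its last assignment in A
    for pos, (j, _, _) in enumerate(A, start=1):
        J[j] += 1
        last[j] = pos
    for pos, (j, i, o) in enumerate(A, start=1):
        # First entry whose machine has no later assignment and survives all
        # routines assigned to it (S(j) >= J(j)) marks the first idle machine.
        if last[j] == pos and S[j - 1] >= J[j]:
            return (j, pos, i, o)
    return None              # all machines failed: idleindex would be infinite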

We can now construct a function that “executes” the scheduling of a routine by mapping a state $q$ to two succeeding states $q_1$ and $q_2$, where $q_1$ represents the normal progress of the computation and $q_2$ represents the event that the machine on which the routine is scheduled fails during the execution of this routine.

Definition 5.18 (Eager scheduling on states). Given a state $q = (A, S, R)$, the function $\mathit{es} : Q \to 2^Q$ is defined as follows. If $\mathit{done}(q)$ holds, $\mathit{es}(q) = \{q\}$. Otherwise, let $(j, l, i, o)$ be the idle information of state $q$ as defined by Definition 5.17, let $R'$ be $R$ with the completed routine $i$ (if any) marked as finished ($l$ is the completion index to be used later), and let $i_{\mathrm{cand}} = \mathit{candidate}(R')$ be the next routine to be scheduled. Then $q_1$ extends $A$ by the assignment $(j, i_{\mathrm{cand}}, o + a_{i_{\mathrm{cand}}}/c_j)$ via the ordered concatenation of Definition 5.16 and marks $i_{\mathrm{cand}}$ as started in $R'$, and if $\mathit{operational}(S) > 1$, $q_2$ is defined in the same way, except that $S$ is additionally marked to indicate that machine $j$ does not survive the execution of $i_{\mathrm{cand}}$; in this case $\mathit{es}(q) = \{q_1, q_2\}$, and $\mathit{es}(q) = \{q_1\}$ otherwise.

The function $\mathit{es}$ generates all possible execution scenarios by either letting a machine survive the assignment of a new routine ($q_1$), or by assuming that it crashes during this assignment ($q_2$). This is proven in the following lemmas.

Lemma 5.11 ($\mathit{es}$ reflects an eager scheduling step). Applying $\mathit{es}$ to a state $q$ yields exactly the states that can result from one scheduling step of eager scheduling in state $q$.

Proof. Consider a state $q = (A, S, R)$. If in this state all routines are done, the algorithm terminates.

If there is at least one routine that is not done, the algorithm selects the first machine that is or becomes idle and assigns this routine to it. The completion time of this new routine is its execution time divided by the relative machine speed, plus the time at which the machine becomes idle (which can be $0$).

For each assignment of a routine, there are two cases (as long as there are at least two operational machines): either the machine survives the execution of this routine, or it does not. In both cases, the assignment is added to $A$, but in the latter case, $S$ is marked to indicate that this machine does not survive the execution of this routine. Note that the number of operational machines is not allowed to fall below 1. Therefore, the idleindex function will never be $\infty$.
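A compact Python sketch of a single $\mathit{es}$ step follows. It only shows the two-branch structure (survive vs. crash); the concrete encodings, in particular how a failure is recorded in $S$ and how the operational machines are counted, are assumptions of the sketch and not taken from the thesis.

from typing import List, Tuple

def es_step(A: List[Tuple[int, int, float]],  # assignments (machine j, routine i, completion o)
            S: List[int],                     # S[j-1]: number of routines machine j survives
            R: List[str],                     # per-routine status
            j: int, o: float, i_cand: int,    # idle machine, time it becomes idle, routine to schedule
            a: List[float], c: List[float],   # routine runtimes a_i and machine speeds c_j
            n: int) -> List[tuple]:
    completion = o + a[i_cand - 1] / c[j - 1]      # runtime scaled by relative machine speed
    # For brevity the new assignment is appended; Definition 5.16 would insert
    # it at the position given by its completion time.
    A1 = A + [(j, i_cand, completion)]             # the assignment is added in both cases
    R1 = list(R); R1[i_cand - 1] = "started"
    q1 = (A1, list(S), R1)                         # case 1: machine j survives this routine
    operational = sum(1 for s in S if s == n)      # machines not yet marked as failing (assumed encoding)
    if operational > 1:
        S2 = list(S)
        S2[j - 1] = sum(1 for (jj, _, _) in A if jj == j)  # survives its earlier routines only
        q2 = (A1, S2, list(R1))                    # case 2: machine j crashes during i_cand
        return [q1, q2]
    return [q1]                                    # the last operational machine must survive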

Definition 5.19 (Set of all executions $Q_{es}$). $Q_{es} = \lim_{k \to \infty} es^k(q_0)$, where the application of $es$ to a set of states is defined per element, $es(Q') = \bigcup_{q \in Q'} es(q)$ for $Q' \in 2^Q$, and the initial state is $q_0 = ([\,], (n, \ldots, n), (\mathit{waiting}, \ldots, \mathit{waiting}))$.

Lemma 5.12 ($Q_{es}$ is finite). $Q_{es}$ is finite, and only a finite number of steps are necessary to generate it, i.e., there is a $k_0$ such that $\forall k \geq k_0 : es^k(q_0) = Q_{es}$.

Proof. We have to show that for any state that is not done, there are only a finite number of successors. Think of the repeated application of $es$ as a tree. Obviously, the tree is of finite degree.

First note that a state for which $\mathit{done}$ holds is a fixpoint of $es$. Suppose there is an infinite path $(q_l)$ in the tree with $q_{l+1} \in es(q_l)$ and $\mathit{done}(q_l) = \mathit{false}$. However, every application of $es$ reduces the number of routines that are not started or not finished by one (since at least one machine must not fail), and this number cannot fall below zero. Contradiction.

Therefore, by König’s Lemma, the tree is finite, and $Q_{es}$ is finite. And since any state $q$ for which $\mathit{done}(q)$ holds is a fixpoint of $es$, a finite number of steps suffice to generate $Q_{es}$.
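The construction of $Q_{es}$ can be sketched as a simple fixpoint iteration in Python; the generic function below takes $es$ as a parameter (for example an implementation along the lines of the earlier sketches), and the toy example at the end is purely illustrative.

from typing import Callable, FrozenSet, Hashable, Set

def generate_Q_es(q0: Hashable, es: Callable[[Hashable], Set[Hashable]]) -> FrozenSet[Hashable]:
    """Iterate es element-wise on sets of states until the set stabilizes,
    i.e. compute the limit of es^k({q0}) after finitely many steps."""
    Q = {q0}
    while True:
        Q_next: Set[Hashable] = set()
        for q in Q:
            Q_next |= set(es(q))   # application of es to a set, per element
        if Q_next == Q:            # fixpoint reached: only done states remain
            return frozenset(Q)
        Q = Q_next

# Toy demonstration with a trivial "es": a state is a counter that counts down.
toy_es = lambda k: {k} if k == 0 else {k - 1}
print(generate_Q_es(3, toy_es))    # frozenset({0})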

Lemma 5.13 ($Q_{es}$ reflects eager scheduling). Every state in $Q_{es}$ corresponds to a possible execution of eager scheduling with the given parameters, and no other executions are possible.

Proof. Follows immediately from Lemma 5.11 and Lemma 5.12: every state in $Q_{es}$ corresponds to an actual execution of eager scheduling, and there are no other executions possible (by Lemma 5.11), since ordered eager scheduling behaves deterministically modulo faults, which are accounted for.

We finally have to compute the probability of any $q \in Q_{es}$ actually happening. Unlike in the proofs in Section 5.4.2 and Section 5.4.3, the only probabilistic element here is the machine faults. The occurring faults are described by the survival parameter $S$ in a state $q = (A, S, R)$. The probability of each machine $j = 1, \ldots, m$ behaving as prescribed by $q$ is given below.

Lemma 5.14 ($p_{f_j}$ is correct). The probability of a state as defined by Definition 5.20 is correct.

Proof. Analogous to Lemma 5.10.

Hence the final theorem of this section can be formulated as follows.

Theorem 5.3 (Runtime distribution with fixed routine execution times). For $n$ routines with fixed execution times $a_i$, $i = 1, \ldots, n$, on a processor of speed $1$, and $m < n$ processors of relative speed $c_j$ and lifetime $S_j$, $j = 1, \ldots, m$, the runtime distribution of the successful completion time $\Pr(Z \leq t)$ of eager scheduling is

$\Pr(Z \leq t) \;=\; \sum_{q = (A, S, R) \in Q_{es}} H\bigl(t - \mathit{ctf}(A, S)\bigr) \prod_{j=1}^{m} p_{f_j},$


where $\mathit{ctf}$ is the completion time under faults as generalized from Definition 5.11.

Proof. By Lemma 5.13, $Q_{es}$ is the set of all possible successful executions of eager scheduling with the given parameters. By Definition 5.20 and Lemma 5.14, the probability of such a state occurring is $\prod_{j=1}^{m} p_{f_j}$ (using the independence assumption of machine failures). The state will be successfully completed before or at time $t$ only if $H(t - \mathit{ctf}(A, S)) \neq 0$.
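A brief Python sketch of the resulting computation is given below. The completion time $\mathit{ctf}$ and the per-machine probabilities $p_{f_j}$ are passed in as placeholder callables, since their concrete definitions (Definitions 5.11 and 5.20) are not repeated here; all names are illustrative assumptions of the sketch.

from math import prod
from typing import Callable, Iterable

def heaviside(x: float) -> float:
    """H(x): 1 for x >= 0, 0 otherwise (used as an indicator below)."""
    return 1.0 if x >= 0 else 0.0

def completion_time_cdf(Q_es: Iterable,                       # all successful executions
                        ctf: Callable[[object], float],        # completion time under faults of a state
                        p_f: Callable[[object, int], float],   # probability that machine j behaves as in q
                        m: int, t: float) -> float:
    """Pr(Z <= t): sum, over all states that complete by time t, of the
    probability that every machine behaves as the state prescribes
    (machine failures are assumed independent)."""
    return sum(
        heaviside(t - ctf(q)) * prod(p_f(q, j) for j in range(1, m + 1))
        for q in Q_es
    )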