Theory of Parallel and Distributed Systems (WS2016/17)

(1)

Chapter 2: Sorting with a PRAM

Walter Unger

Lehrstuhl für Informatik 1

12:00, 30 January 2017

(2)

Contents I

1 Sorting
  Simple Sorting Algorithm
  Improved Algorithm

2 Introduction to Optimal Sorting
  Lower Bound
  Batcher's Sorting Algorithm
  Sorting

3 Algorithm of Cole
  Idea

(3)

Very simple Algorithm (Idea)

[Figure: the 16 input elements 34, 12, 14, 56, 23, 67, 49, 27, 61, 52, 57, 59, 26, 41, 33, 22. For each element s_i a row of bits records the outcomes of the pairwise comparisons, and the row sum gives the position of s_i in the sorted order, e.g. 12 → position 1, 14 → position 2, 34 → position 8, 67 → position 16.]

(4)

Very simple Sorting Algorithm

Idea: Compute the position of each element.

Compare all elements pairwise and count the number of smaller elements.

Use n² processors.

Program: SimpleSort
Input: s_1, …, s_n.
for all P_{i,j} where 1 ≤ i,j ≤ n do in parallel
    if s_i > s_j then P_{i,j}(1) → R_{i,j} else P_{i,j}(0) → R_{i,j}
for all i where 1 ≤ i ≤ n do in parallel
    for all P_{i,j} where 1 ≤ j ≤ n do in parallel
        the processors P_{i,j} compute q_i = Σ_{l=1}^{n} R_{i,l}.
        P_i(s_i) → R_{q_i+1}.

Complexity: T(n) = O(log n) and P(n) = n².
Efficiency: O(n log n) / (n² · O(log n)) = O(1/n).

Model: CREW.
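As a sanity check, the two parallel loops of SimpleSort can be simulated sequentially; this is a Python sketch (the function name and the tie-breaking by index are our additions, not part of the slide):

```python
# A sequential sketch of SimpleSort: each (i, j) pair would run on its own
# processor P_{i,j}; here the two parallel loops become ordinary loops.
def simple_sort(s):
    n = len(s)
    # Step 1: R[i][j] = 1 iff s[i] > s[j] (one comparison per processor).
    R = [[1 if s[i] > s[j] else 0 for j in range(n)] for i in range(n)]
    # Break ties among equal keys by index so every position is distinct.
    for i in range(n):
        for j in range(i):
            if s[i] == s[j]:
                R[i][j] = 1
    # Step 2: q_i = sum_l R[i][l] counts the smaller elements, i.e. it is
    # the final position of s[i] (a parallel sum on the PRAM).
    result = [None] * n
    for i in range(n):
        q = sum(R[i])
        result[q] = s[i]
    return result

print(simple_sort([34, 12, 14, 56, 23]))  # → [12, 14, 23, 34, 56]
```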

(5)

Improved Algorithm for CREW

Work with P(n) processors (P(n) ≤ n).

Split the input into blocks of size O(n/P(n)).   [O(1)]
Sort all blocks in parallel.   [O(n/P(n) · log(n/P(n)))]
Merge the blocks pairwise in parallel.   [O(n/P(n) + log n) · O(log P(n))]

Complexity: T(n) = O(n/P(n) · log n + log² n).

Efficiency:
Eff(n) = O(n log n) / (O(P(n)) · O(n/P(n) · log n + log² n))
       = O(n log n) / O(n · log n + P(n) · log² n)
This is O(1) for P(n) ≤ n/log n.
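The block scheme can be simulated sequentially; a Python sketch (all names and the block layout are ours, and `heapq.merge` stands in for the parallel merging step):

```python
from heapq import merge  # standard-library merge of two sorted iterables

# Sequential sketch: p simulated processors, each sorting one block of size
# ~n/p, then O(log p) rounds of pairwise merging. On a real PRAM every block
# sort and every merge within a round runs in parallel.
def block_sort(s, p):
    if not s:
        return []
    size = -(-len(s) // p)  # ceil(n / p): block length O(n / p)
    blocks = [sorted(s[i:i + size]) for i in range(0, len(s), size)]
    # Pairwise merging: each round halves the number of blocks.
    while len(blocks) > 1:
        blocks = [list(merge(blocks[i], blocks[i + 1])) if i + 1 < len(blocks)
                  else blocks[i]
                  for i in range(0, len(blocks), 2)]
    return blocks[0]

print(block_sort([5, 2, 8, 1, 9, 3, 7, 4], p=4))  # → [1, 2, 3, 4, 5, 7, 8, 9]
```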

(6)

Improved Algorithm EREW

Exchange the merge algorithm.

Recall: T_Merging(EREW)(n) = O(n/P(n) + log n · log P(n)).

T(n) = O(n/P(n) · log(n/P(n))) + O(n/P(n) · log P(n) + log n · log² P(n))
T(n) = O((n/P(n) + log² n) · log n)

Efficiency:
Eff(n) = O(n log n) / O(P(n) · (n/P(n) + log² n) · log n)
This is O(1) if P(n) < n/log² n.

(7)

Lower Bound

Theorem:

For any parallel sorting algorithm Srt with P_Srt(n) = O(n) it holds that:

T_Srt(n) = Ω(log n).

Proof:

The sequential lower bound for sorting is Θ(n log n):

one needs Ω(n log n) comparisons.

In each parallel step at most O(n) comparisons are possible.

Thus with fewer than Ω(log n) steps we would contradict the sequential lower bound.

Situation at this point:

An inefficient algorithm with T(n) = O(log n) and P(n) = n².
A nearly efficient algorithm with T(n) = O(log² n) and P(n) = o(n).

(8)

Basic Operation for Sorting

Identify the basic operation for sorting.

Assume: the sorting keys are s_1, …, s_n.
Program: compare_exchange(i, j)
    if s_i > s_j then exchange s_i ↔ s_j

Symbolic view (Batcher): a comparator with inputs x, y and outputs min(x,y), max(x,y).

Basic building block for sorting networks.

Basis for odd-even merge.

From this we build the optimal algorithm of Cole.

(9)

Odd-even Merge (Definition)

Input: sequence S = (s_1, s_2, …, s_n) (w.l.o.g. n even).

Let Odd(S) [Even(S)] be the elements of S with odd [even] index.

Let S' = (s'_1, s'_2, …, s'_n) be a second sequence.

Then we define: interleave(S, S') = (s_1, s'_1, s_2, s'_2, …, s_n, s'_n).

T_interleave(n) = O(1) with P_interleave(n) = O(n).

(10)

Odd-even Merge (Definition)

Program: odd_even(S)
    for all i where 1 < i < n and i even do in parallel
        compare_exchange(i, i+1).

[Figure: the inputs s_1, …, s_16 pass to the outputs r_1, …, r_16; one comparator connects each even position i to position i+1.]

T_compare_exchange(n) = O(1) with P_compare_exchange(n) = O(n).

(11)

Odd-even Merge (Definition)

Program: join1(S, S')
    odd_even(interleave(S, S'))

[Figure: the interleaved inputs s_1, …, s_16 pass through one layer of comparators to the outputs r_1, …, r_16.]

T_join1(n) = O(1) with P_join1(n) = O(n).

(12)

Sorting with Merging

Program: odd_even_merge(S, S')
    if |S| = |S'| = 1 then merge with compare_exchange.
    S_odd = odd_even_merge(Odd(S), Odd(S')).
    S_even = odd_even_merge(Even(S), Even(S')).
    return join1(S_odd, S_even).

T_odd_even_merge(n) = O(log n) with P_odd_even_merge(n) = O(n).

Theorem:

The algorithm odd_even_merge merges two already sorted sequences into one.

Proof follows.
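The recursion can be simulated sequentially; a Python sketch assuming both inputs are sorted and of equal power-of-two length (the function names follow the slides, the sequential simulation is ours):

```python
# Sequential sketch of the recursive odd-even merge; interleave, odd_even
# and join1 implement the slide definitions.
def interleave(S, S2):
    out = []
    for a, b in zip(S, S2):
        out += [a, b]
    return out

def odd_even(S):
    # compare_exchange(i, i+1) for even i with 1 < i < n (1-based),
    # i.e. 0-based pairs (1,2), (3,4), ...
    S = list(S)
    for i in range(1, len(S) - 1, 2):
        if S[i] > S[i + 1]:
            S[i], S[i + 1] = S[i + 1], S[i]
    return S

def join1(S, S2):
    return odd_even(interleave(S, S2))

def odd_even_merge(S, S2):
    if len(S) == 1:                      # base case: one compare_exchange
        return [min(S[0], S2[0]), max(S[0], S2[0])]
    s_odd = odd_even_merge(S[0::2], S2[0::2])   # odd-indexed (1-based)
    s_even = odd_even_merge(S[1::2], S2[1::2])  # even-indexed (1-based)
    return join1(s_odd, s_even)

print(odd_even_merge([1, 4, 6, 9], [2, 3, 7, 8]))  # → [1, 2, 3, 4, 6, 7, 8, 9]
```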

(13)

Sorting Networks

Theorem:

There exists a sorting algorithm with T(n) = O(log² n) and P(n) = n.

Proof: use divide and conquer, with merging of depth O(log n).

Theorem:

There exists a sorting network of size O(n log² n).

Proof: all calls to the compare_exchange operation are independent of the input (oblivious algorithm).

(14)

The 0-1 Principle

Theorem:

If a sorting networkX, resp. sorting algorithm is correct for all 0-1 inputs, then it is also correct for any input.

Proof (by contradiction):

Let f be a non-decreasing function: s_i ≤ s_j ⇒ f(s_i) ≤ f(s_j).

If X sorts the sequence (a_1, a_2, …, a_n) to (b_1, b_2, …, b_n), then on input (f(a_1), f(a_2), …, f(a_n)) the output of X is (f(b_1), f(b_2), …, f(b_n)), since min and max commute with f.

Assume b_i > b_{i+1}. If f(b_i) ≠ f(b_{i+1}), then f(b_i) > f(b_{i+1}) in the "sorted" sequence (f(b_1), f(b_2), …, f(b_n)), i.e., sorting errors are preserved under the function f.

Now choose f: f(b_j) = 0 for b_j < b_i and f(b_j) = 1 otherwise.

Then the sequence (f(b_1), f(b_2), …, f(b_n)) is not sorted, because f(b_i) = 1 and f(b_{i+1}) = 0.

This is a contradiction.

(15)

Correctness of the Merging

Theorem:

The algorithm odd_even_merge merges two sorted sequences into a single one.

Proof:

By the 0-1 principle it suffices to consider 0-1 inputs.

S has the form S = 0^p 1^(m−p) for some p with 0 ≤ p ≤ m.
S' has the form S' = 0^q 1^(m'−q) for some q with 0 ≤ q ≤ m'.

Thus the sequence S_odd has the form 0^(⌈p/2⌉+⌈q/2⌉) 1…, and S_even has the form 0^(⌊p/2⌋+⌊q/2⌋) 1….

Define: d = ⌈p/2⌉ + ⌈q/2⌉ − (⌊p/2⌋ + ⌊q/2⌋)

Depending on d we consider three cases: d = 0, d = 1 and d = 2.

(16)

Correctness of the Merging

If d = 0: then p and q are even.
The interleave step of join1 has the form:
interleave(S_odd, S_even) = (00)^((p+q)/2) 1^(m+m'−p−q)
The resulting sequence is already sorted.
The compare_exchange step keeps the order.

If d = 1: then (w.l.o.g.) p is odd and q is even.
The interleave step of join1 has the form:
interleave(S_odd, S_even) = (00)^⌊(p+q)/2⌋ 0 1^(m+m'−p−q)
The resulting sequence is already sorted.

If d = 2: then p and q are odd.
The interleave step of join1 has the form:
interleave(S_odd, S_even) = 0^(p+q−1) 1 0 1^(m+m'−p−q−1)
The compare_exchange step exchanges the single 1 at the even position p+q with the 0 at position p+q+1.

(17)

Testing the Correctness of a Network

Corollary:

The correctness of a merge network may be tested in time O(n²).

Proof: test all inputs of the form (0^p 1^(m−p), 0^q 1^(m'−q)).

Theorem:

The test for correctness of a sorting network is NP-hard.

Proof: Literature.
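The corollary can be illustrated on a small comparator network; a Python sketch (the representation of a network as a list of wire pairs and all names are ours; the three comparators below form the classic 4-wire odd-even merge):

```python
from itertools import product

# A merging network is a fixed list of comparators (i, j) with i < j; by the
# corollary it is correct iff it merges every input (0^p 1^(m-p), 0^q 1^(m'-q)).
def run_network(comparators, values):
    v = list(values)
    for i, j in comparators:              # compare_exchange on wires i, j
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

def merges_all_01_inputs(comparators, m, m2):
    # Only (m+1)*(m2+1) inputs are needed, giving an O(n^2) test overall.
    for p, q in product(range(m + 1), range(m2 + 1)):
        inp = [0] * p + [1] * (m - p) + [0] * q + [1] * (m2 - q)
        if run_network(comparators, inp) != sorted(inp):
            return False
    return True

merge_2x2 = [(0, 2), (1, 3), (1, 2)]      # odd-even merge of two sorted pairs
print(merges_all_01_inputs(merge_2x2, 2, 2))  # → True
```

Dropping the final comparator (1, 2) makes the test fail on the input (0, 1, 0, 1), as expected.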

(18)

Situation

Aim: a fast optimal algorithm.

So far: T(n) = log² n with P(n) = O(n).

So far: two nested loops, one for merging and one for sorting.

Idea: make the inner loop faster, i.e., do the merging in O(1).

Problem: with no further information we need Θ(log n) steps.

Idea: compute this additional information during the sorting.

Choose as additional information suitable splitting points for merging.

I.e., choose positions which split the blocks to be merged into blocks of constant size.

Problem: how to compute these points?

The solution is the basis for the algorithm of Cole.

(19)

The Merging-Tree, a View

[Figure: the merging tree, with processors on one axis and time on the other.]

(20)

Idea

Before merging two sequences we merge two sub-sequences.

Choose as sub-sequence every k-th element of the original sequence.

These sub-sequences are used as a crutch/support for the final merging.

I.e., these sub-sequences serve as a kind of "preview".

Using these crutch points we will be able to do the merging in O(1) time.

The total running time will be O(log n).

The additional effort should be at most O(1).

(21)

The Merging-Tree, a View

[Figure: each processor starts with 256 elements; over time it repeatedly sends samples of growing size (4, 16, 64, then 256 elements) up the tree, holding 4, 16, 64 and finally all 256 elements of the merged result.]

(22)

Definition

Let J and K be two sorted sequences.

Note: without additional information we cannot merge J and K in O(1) time with O(n) processors.

Let L be a third sequence, which in the following will be called a good sampler for J and K.

Informally: |L| < |J| and the elements of L are evenly spread over J.

For a < b we say c is between a and b iff a < c ≤ b.

The rank of e in S is rng(e,S) = |{x ∈ S | x < e}|.

Notation: Rng_{A,B} is the function Rng_{A,B}: A → N^|A| with Rng_{A,B}(e) = rng(e,B) for all e ∈ A.

Rng_{A,B} is called the rank between A and B.

Depending on the context, Rng_{A,B} may also be viewed as an array with |A| elements.

(23)

Good Sampler

Recall: rng(e,S) = |{x ∈ S | x < e}| and Rng_{A,B}: A → N^|A| with Rng_{A,B}(e) = rng(e,B).

Definition:

We call L a good sampler of J iff:

L and J are sorted.

Between any k+1 succeeding elements of {−∞} ∪ L ∪ {+∞} there are at most 2·k+1 elements of J.

Example:

Let S be a sorted sequence.

Let S1 be the sequence consisting of every fourth element of S.

Then S1 is a good sampler of S.

Let S2 be the sequence consisting of every second element of S.

Then S1 is a good sampler of S2.

Example (k=1): 1,2,3,4.
Example (k=3): 1,2,3,4,5,6,7,8,9,10.
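The definition can be checked mechanically; a Python sketch of such a checker (all names are ours, and the demo tests the every-second-element relationships, which satisfy the stated bound under the a < x ≤ b convention):

```python
def is_good_sampler(L, J):
    """Slide definition: L and J sorted, and between any k+1 succeeding
    elements of {-inf} ∪ L ∪ {+inf} (counting a < x <= b) there are at
    most 2k+1 elements of J."""
    if list(L) != sorted(L) or list(J) != sorted(J):
        return False
    ext = [float("-inf")] + list(L) + [float("inf")]
    for k in range(1, len(ext)):
        for start in range(len(ext) - k):
            a, b = ext[start], ext[start + k]   # window of k+1 elements
            if sum(1 for x in J if a < x <= b) > 2 * k + 1:
                return False
    return True

S = list(range(1, 17))
S1, S2 = S[3::4], S[1::2]        # every fourth / every second element of S
print(is_good_sampler(S2, S))    # every second element samples S → True
print(is_good_sampler(S1, S2))   # S1 is every second element of S2 → True
```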

(24)

Merging using a Good Sampler

Let J, K and L be sorted sequences.

Let L be a good sampler of both J and K.

Let L = (l_1, l_2, …, l_s), with l_0 = −∞ and l_{s+1} = +∞.

Program: merge_with_help(J, K, L)
    for all i where 1 ≤ i ≤ s+1 do in parallel
        assign J_i = {x ∈ J | l_{i−1} < x ≤ l_i}.
        assign K_i = {x ∈ K | l_{i−1} < x ≤ l_i}.
        assign res_i = merge(J_i, K_i).
    return (res_1, res_2, …, res_{s+1}).

Situation:

[Figure: the splitters l_1, …, l_8 cut the sequences into blocks K_1, …, K_9 (and likewise for J) which are merged independently.]

(25)

Merging using a Good Sampler (Example)

K = (1,4,6,9,11,12,13,16,19,20)
J = (2,3,7,8,10,14,15,17,18,21)
L = (5,10,12,17)

Then we have:

i   K_i       J_i          merge(K_i, J_i)
1   (1,4)     (2,3)        (1,2,3,4)
2   (6,9)     (7,8,10)     (6,7,8,9,10)
3   (11,12)   ∅            (11,12)
4   (13,16)   (14,15,17)   (13,14,15,16,17)
5   (19,20)   (18,21)      (18,19,20,21)

Result: (1,2,3,4,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21)
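A sequential Python sketch of merge_with_help that reproduces the example above (the ±∞ sentinels and the function layout are ours; on a PRAM each block i is merged by its own group of processors in O(1) once the Rng arrays are known):

```python
from heapq import merge  # standard-library merge of two sorted iterables

def merge_with_help(J, K, L):
    # The splitters l_0 = -inf, l_1, ..., l_s, l_{s+1} = +inf cut J and K
    # into blocks; each block pair is merged independently.
    bounds = [float("-inf")] + list(L) + [float("inf")]
    result = []
    for i in range(1, len(bounds)):
        lo, hi = bounds[i - 1], bounds[i]
        Ji = [x for x in J if lo < x <= hi]
        Ki = [x for x in K if lo < x <= hi]
        result += list(merge(Ji, Ki))
    return result

K = [1, 4, 6, 9, 11, 12, 13, 16, 19, 20]
J = [2, 3, 7, 8, 10, 14, 15, 17, 18, 21]
L = [5, 10, 12, 17]
print(merge_with_help(J, K, L))  # → the slide's result (1, 2, 3, 4, 6, …, 21)
```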

(26)

Merging with good sampler (running time)

Lemma:

Let L be a good sampler for K and J.

If Rng_{L,J}, Rng_{L,K}, Rng_{K,L} and Rng_{J,L} are known, then we have:

T_merge_with_help(J,K,L) = O(1) with P_merge_with_help(J,K,L) = O(|J|+|K|).

Proof:

The same way as for the merging introduced in the last chapter.

Each processor uses Rng_{L,J} resp. Rng_{L,K} to know the area of its input sequences to read.

Each processor uses Rng_{J,L} and Rng_{K,L} to know the area of its output sequence to write.

(27)

Properties of Good Samplers

Lemma:

If X is a good sampler for X' and Y is a good sampler for Y', then merge(X,Y) is a good sampler for X' [resp. Y'].

Proof:

Consider X as a good sampler for X'.

Any additional element only makes the good sampler "better".

Note:

merge(X,Y) is not necessarily a good sampler for merge(X',Y'):

X = (2,7) and X' = (2,5,6,7).
Y = (1,8) and Y' = (1,3,4,8).
merge(X,Y) = (1,2,7,8) and merge(X',Y') = (1,2,3,4,5,6,7,8).

There are 5 elements between 2 and 7.

(28)

Properties of Good Samplers

Lemma:

Let X be a good sampler for X' and let Y be a good sampler for Y'. Then there are at most 2·r+2 elements of merge(X',Y') between r successive elements of merge(X,Y).

Proof:

W.l.o.g. X and Y contain the elements −∞ and +∞.

Let (e_1, e_2, …, e_r) be successive elements of merge(X,Y).

W.l.o.g. let e_1 ∈ X.

Consider now two cases: e_r ∈ X and e_r ∈ Y.

Let in the following

x = |X ∩ {e_1, e_2, …, e_r}|  and  y = |Y ∩ {e_1, e_2, …, e_r}|.

(29)

Properties of Good Samplers

Recall: (e_1, …, e_r) are successive elements of merge(X,Y), x = |X ∩ {e_1, …, e_r}|, y = |Y ∩ {e_1, …, e_r}|, and e_1 ∈ X.

Lemma:

Let X be a good sampler for X' and let Y be a good sampler for Y'. Then there are at most 2·r+2 elements of merge(X',Y') between r successive elements of merge(X,Y).

Proof: W.l.o.g. let e_1 ∈ X. Case e_r ∈ X:

Between e_1 and e_r there are at most 2(x−1)+1 elements of X'.

Between e_1 and e_r there are at most 2(y+1)+1 elements of Y', because they lie between y+2 elements of Y.

Thus we get: 2(x−1)+1 + 2(y+1)+1 = 2·r+2.

Example with x = 3 and y = 2: e_1 ∈ X, e_2 ∈ Y, e_3 ∈ X, e_4 ∈ Y, e_5 ∈ X, with neighbouring elements a, b ∈ Y outside the window.

(30)

Properties of Good Samplers

Recall: (e_1, …, e_r) are successive elements of merge(X,Y), x = |X ∩ {e_1, …, e_r}|, y = |Y ∩ {e_1, …, e_r}|, and e_1 ∈ X.

Lemma:

Let X be a good sampler for X' and let Y be a good sampler for Y'. Then there are at most 2·r+2 elements of merge(X',Y') between r successive elements of merge(X,Y).

Proof: W.l.o.g. let e_1 ∈ X. Case e_r ∈ Y:

Add e_0 ∈ Y with e_0 < e_1 to the good sampler.

Add e_{r+1} ∈ X with e_r < e_{r+1} to the good sampler.

The elements from X' between (e_1, …, e_r) lie between x+1 elements from X.

The elements from Y' between (e_1, …, e_r) lie between y+1 elements from Y.

Thus we get: 2x+1 + 2y+1 = 2·r+2.

Example with x = 2 and y = 2: e_1 ∈ X, e_2 ∈ Y, e_3 ∈ X, e_4 ∈ Y, with e_0 ∈ Y and e_5 ∈ X added.

(31)

Properties of good sampler

Recall: there are at most 2·r+2 elements of merge(X',Y') between r successive elements of merge(X,Y).

Definition:

Let reduce(X) be the operation which chooses every fourth element of X.

Lemma:

If X is a good sampler for X' and Y is a good sampler for Y', then reduce(merge(X,Y)) is a good sampler for reduce(merge(X',Y')).

Proof:

Consider k+1 successive elements (e_1, e_2, …, e_{k+1}) of reduce(merge(X,Y)).

At most 4k+1 elements of merge(X,Y) lie between e_1 and e_{k+1}, including e_1 and e_{k+1}.

At most 8k+4 elements of merge(X',Y') lie between these 4k+1 elements.

Thus at most 2k+1 elements of reduce(merge(X',Y')) lie between (e_1, e_2, …, e_{k+1}).

(32)

Overview to the Algorithm of Cole

We start with an explanation using a complete binary tree.

The leaves contain the elements to be sorted.

An interior node v "cares" about as many elements as there are leaves below v.

A node v receives from its sons already sorted sequences.

The "length" of these sequences doubles each time.

Node v receives the sequences X_1, X_2, …, X_r and Y_1, Y_2, …, Y_r.
Node v sends to its father the sequences Z_1, Z_2, …, Z_r, Z_{r+1}.
Node v updates an internal help-sequence val_v.

It holds: |X_1| = |Y_1| = |Z_1| = 1.
It holds: |X_i| = 2·|X_{i−1}|, |Y_i| = 2·|Y_{i−1}| and |Z_i| = 2·|Z_{i−1}|.

(33)

One basic Operation of an interior Node v

Receives from its sons the two sequences X and Y.
Computes: val_v = merge_with_help(X, Y, val_v).

Sends to its father reduce(val_v) until v has sorted all received sequences.

Sends to its father every second element of val_v, once v is done with sorting.

Sends to its father val_v itself, two steps after v finished sorting.

Example:

Step  Left     Right    val_v             Father
1     7        8        7,8               ∅
2     3,7      5,8      3,5,7,8           8
3     1,3,4,7  2,5,6,8  1,2,3,4,5,6,7,8   4,8
4     1,3,4,7  2,5,6,8  1,2,3,4,5,6,7,8   2,4,6,8
5     1,3,4,7  2,5,6,8  1,2,3,4,5,6,7,8   1,2,3,4,5,6,7,8

(34)

Basic Operation of an interior Node v

Receives from its sons the two sequences X and Y.
Computes: val_v = merge_with_help(X, Y, val_v).

Sends to its father reduce(val_v) until v has sorted all received sequences.

Sends to its father every second element of val_v, once v is done with sorting.

Sends to its father val_v itself, two steps after v finished sorting.

Thus we get the following pattern:

receives:  X_1  X_2  X_3  X_4  …  X_r
sends:          Z_1  Z_2  …  Z_r  Z_{r+1}  Z_{r+2}

If a node x is finished after t steps, then the father of x is finished after t+3 steps.

Thus we get a running time of 3·log n.

(35)

Invariant

Invariant:

Each X_i is a good sampler of X_{i+1}.
Each Y_i is a good sampler of Y_{i+1}.
Each Z_i is a good sampler of Z_{i+1}.
Each X_i is half as big as X_{i+1}.
Each Y_i is half as big as Y_{i+1}.
Each Z_i is half as big as Z_{i+1}.

|X_1| = |Y_1| = |Z_1| = 1.

(36)

Situation

The running time is O(log n).

Each inner node v needs |val_v| many processors.

We still have to prove that the total number of processors is O(n).

The PRAM model has to be verified.

Important: the computation of the values Rng_{X,Y} has to be shown.

These values will also be transmitted and updated in the following.

(37)

Computing the Ranks

In each step we compute: merge_with_help(X_{i+1}, Y_{i+1}, merge(X_i, Y_i)).

Using the lemma from above we have: merge(X_i, Y_i) is a good sampler of X_{i+1} and Y_{i+1}.

Let L = merge(X_i, Y_i), J = X_{i+1} and K = Y_{i+1}.

We have to compute: Rng_{L,J}, Rng_{L,K}, Rng_{J,L} and Rng_{K,L}.

Invariant:

Let S_1, S_2, …, S_p be a sequence of sequences at node v. Then node v also knows Rng_{S_{i+1},S_i} for 1 ≤ i < p.

Furthermore, for each sequence S the rank Rng_{S,S} is known.

(38)

Computing the Ranks

Lemma:

Let S = (b_1, b_2, …, b_k) be a sorted sequence. Then we may compute the rank of an element a in S in time O(1) using k processors.

Proof:

Program: rng1(a, S)
    for all P_i where 0 ≤ i ≤ k do in parallel
        if b_i < a ≤ b_{i+1} then return i
(with the sentinels b_0 = −∞ and b_{k+1} = +∞)

Note: the program has no write conflicts.

Note: it could be changed to avoid read conflicts.
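A sequential Python sketch of rng1 with the sentinels made explicit (the sentinel convention and the function layout are ours):

```python
def rng1(a, S):
    # Sentinels b_0 = -inf and b_{k+1} = +inf make the intervals cover
    # every possible value of a.
    b = [float("-inf")] + list(S) + [float("inf")]
    # On a PRAM, each processor P_i checks one interval in parallel; the
    # single processor with b_i < a <= b_{i+1} writes i, which equals the
    # number of elements of S smaller than a.
    for i in range(len(b) - 1):
        if b[i] < a <= b[i + 1]:
            return i

print(rng1(10, [2, 4, 6, 9, 11]))  # → 4
```

Exactly one interval matches, so on a PRAM there are no write conflicts, as the slide notes.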

(39)

Computing the Ranks

Recall: we have rng(a,S).

Lemma:

Let S_1, S_2, S be sorted sequences with S = merge(S_1, S_2) and S_1 ∩ S_2 = ∅.

Then we may compute Rng_{S_1,S_2} and Rng_{S_2,S_1} in time O(1) using O(|S|) processors.

Proof:

We know Rng_{S,S}, Rng_{S_1,S_1} and Rng_{S_2,S_2}.

Furthermore we have: rng(a, S_2) = rng(a, merge(S_1, S_2)) − rng(a, S_1).

The claim follows directly.
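The rank identity in the proof can be checked directly; a Python sketch with our own small example sequences:

```python
def rng(a, S):
    return sum(1 for x in S if x < a)   # rank as defined on the slides

# Check the identity rng(a, S2) = rng(a, merge(S1, S2)) - rng(a, S1)
# for disjoint sorted S1, S2; merge(S1, S2) is just their sorted union here.
S1 = [2, 5, 9]
S2 = [1, 4, 7, 8]
S = sorted(S1 + S2)
for a in S:
    assert rng(a, S2) == rng(a, S) - rng(a, S1)
print("identity holds for all elements of S")
```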

(40)

Computing the Ranks

Recall: we have rng(a,S), Rng_{S_1,S_2} and Rng_{S_2,S_1}.

Lemma:

Let X be a good sampler of X'. Let Y be a good sampler of Y'. Let U = merge(X, Y).

Assume Rng_{X',X} and Rng_{Y',Y} are known.

Then we may compute Rng_{X',U}, Rng_{Y',U}, Rng_{U,X'} and Rng_{U,Y'} in time O(1) using O(|X|+|Y|) processors.

Proof:

First we compute Rng_{X',U} and Rng_{Y',U}.
Then we compute Rng_{X,X'} and Rng_{Y,Y'}.
Finally we compute Rng_{U,X'} and Rng_{U,Y'}.

(41)

Computing the Ranks (Rng_{X',U})

Recall: we have rng(a,S), Rng_{S_1,S_2} and Rng_{S_2,S_1}.

Let X = (a_1, a_2, …, a_k).

Let w.l.o.g. a_0 = −∞ and a_{k+1} = +∞.

Using the good sampler X we split X' into X'_1, X'_2, …, X'_k, X'_{k+1}. Note: Rng_{X',X} is known.

The splitting may be done in time O(1) using O(|X|) processors.

Let U_i be the sequence of elements of Y which lie between a_{i−1} and a_i.

Thus we get:

Program: Rng_{X',U}
    for all i where 1 ≤ i ≤ k+1 do in parallel
        for all x ∈ X'_i do
            rng(x, U) = rng(a_{i−1}, U) + rng(x, U_i)

Running time O(1) using Σ_{i=1}^{k+1} |U_i| processors.

(42)

Computing the Ranks (Rng_{X,X'})

Recall: we have rng(a,S), Rng_{S_1,S_2} and Rng_{S_2,S_1}.

Let a_i ∈ X.

Let a' be the minimal element of X'_{i+1}.

The rank of a_i in X' is the same as the rank of a' in X'. This rank is already known.

Thus it may be computed in time O(1) using one processor.

(43)

Computing the Ranks (Rng_{U,X'})

Recall: we have rng(a,S), Rng_{S_1,S_2} and Rng_{S_2,S_1}.

Note: Rng_{U,X'} consists of Rng_{X,X'} and Rng_{Y,X'}. Rng_{X,X'} is already known.

Still to compute: Rng_{Y,X'}.

Rng_{Y,X} may be computed using the previous lemma.

We then compute rng(a, X') using rng(a, X) and Rng_{X,X'}.

Thus we compute Rng_{U,X'} with O(|U|) processors in time O(1).

(44)

Computing the Ranks

Recall: we have rng(a,S), Rng_{S_1,S_2} and Rng_{S_2,S_1}.

Consider the step merge_with_help(J = X_{i+1}, K = Y_{i+1}, L = merge(X_i, Y_i)):

Using the invariant we know: Rng_{J,X_i} and Rng_{K,Y_i}.

Using the above considerations we may compute: Rng_{L,J}, Rng_{L,K}, Rng_{J,L} and Rng_{K,L}.

Still to be computed: Rng_{reduce(merge(X_{i+1},Y_{i+1})), reduce(merge(X_i,Y_i))}.

Known: Rng_{X_{i+1}, merge(X_i,Y_i)} and Rng_{Y_{i+1}, merge(X_i,Y_i)}.

It is now easy to compute: Rng_{X_{i+1}, reduce(merge(X_i,Y_i))} and Rng_{Y_{i+1}, reduce(merge(X_i,Y_i))}.

Also easy to compute: Rng_{merge(X_{i+1},Y_{i+1}), reduce(merge(X_i,Y_i))}.

(45)

Algorithm of Cole

Theorem:

We may sort n values on a CREW PRAM using O(n) processors in time O(log n).

Proof: discussed before.

Theorem:

We may sort n values on an EREW PRAM using O(n) processors in time O(log n).

Proof: see literature.

Theorem:

There exists a sorting network with O(n) processors and depth O(log n).

Proof: see literature.

(46)

Literature

A. Gibbons, W. Rytter:
Efficient Parallel Algorithms. Cambridge University Press, 1990.
Chapter 5.

(47)

Questions

Explain the motivation behind parallel systems.

Explain the ideas of the different sorting algorithms.

Explain the different running times of these sorting algorithms.

Explain the different efficiencies of these sorting algorithms.

Explain the idea of the algorithm of Cole.

Explain the running time of the algorithm of Cole.

Explain the number of processors used in the algorithm of Cole.

(48)

Legend

: Not relevant
: Background that is used implicitly
: Idea of the proof or approach
: Structure of the proof or approach
: Complete knowledge
