Theory of Parallel and Distributed Systems (WS2016/17)

(1)

Chapter 2: Sorting with a PRAM

Walter Unger

Lehrstuhl für Informatik 1

12:00, 30 January 2017

(2)

Contents I

1 Sorting
  Simple Sorting Algorithm
  Improved Algorithm

2 Introduction to Optimal Sorting
  Lower Bound
  Batcher's Sorting Algorithm
  Sorting

3 Algorithm of Cole
  Idea

(3)

Very simple Algorithm (Idea)

[Figure: the 16 input elements 34, 12, 14, 56, 23, 67, 49, 27, 61, 52, 57, 59, 26, 41, 33, 22. For each element s_i a row of bits records the outcomes of the pairwise comparisons, and the row sum gives the position of s_i in the sorted order, e.g. 12 → position 1, 14 → position 2, 34 → position 8, 67 → position 16.]

(4)

Very simple Sorting Algorithm

Idea: Compute the position of each element.

Compare all elements pairwise and count the number of smaller elements.

Use n² processors.

Program: SimpleSort
Input: s_1, …, s_n.
for all P_{i,j} where 1 ≤ i,j ≤ n do in parallel
    if s_i > s_j then P_{i,j}(1) → R_{i,j} else P_{i,j}(0) → R_{i,j}
for all i where 1 ≤ i ≤ n do in parallel
    for all P_{i,j} where 1 ≤ j ≤ n do in parallel
        the processors P_{i,j} compute q_i = Σ_{l=1}^{n} R_{i,l}.
        P_i(s_i) → R_{q_i+1}.

Complexity: T(n) = O(log n) and P(n) = n².
Efficiency: O(n log n) / (n² · O(log n)) = O(1/n).

Model: CREW.
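As a sanity check, the two parallel loops of SimpleSort can be simulated sequentially; this is a Python sketch (the function name and the tie-breaking by index are our additions, not part of the slide):

```python
# A sequential sketch of SimpleSort: each (i, j) pair would run on its own
# processor P_{i,j}; here the two parallel loops become ordinary loops.
def simple_sort(s):
    n = len(s)
    # Step 1: R[i][j] = 1 iff s[i] > s[j] (one comparison per processor).
    R = [[1 if s[i] > s[j] else 0 for j in range(n)] for i in range(n)]
    # Break ties among equal keys by index so every position is distinct.
    for i in range(n):
        for j in range(i):
            if s[i] == s[j]:
                R[i][j] = 1
    # Step 2: q_i = sum_l R[i][l] counts the smaller elements, i.e. it is
    # the final position of s[i] (a parallel sum on the PRAM).
    result = [None] * n
    for i in range(n):
        q = sum(R[i])
        result[q] = s[i]
    return result

print(simple_sort([34, 12, 14, 56, 23]))  # → [12, 14, 23, 34, 56]
```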

(5)

Improved Algorithm for CREW

Work with P(n) processors (P(n) ≤ n).

Split the input into blocks of size O(n/P(n)).   [O(1)]
Sort all blocks in parallel.   [O(n/P(n) · log(n/P(n)))]
Merge the blocks pairwise in parallel.   [O(n/P(n) + log n) · O(log P(n))]

Complexity: T(n) = O(n/P(n) · log n + log² n).

Efficiency:
Eff(n) = O(n log n) / (O(P(n)) · O(n/P(n) · log n + log² n))
       = O(n log n) / O(n · log n + P(n) · log² n)
This is O(1) for P(n) ≤ n/log n.
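The block scheme can be simulated sequentially; a Python sketch (all names and the block layout are ours, and `heapq.merge` stands in for the parallel merging step):

```python
from heapq import merge  # standard-library merge of two sorted iterables

# Sequential sketch: p simulated processors, each sorting one block of size
# ~n/p, then O(log p) rounds of pairwise merging. On a real PRAM every block
# sort and every merge within a round runs in parallel.
def block_sort(s, p):
    if not s:
        return []
    size = -(-len(s) // p)  # ceil(n / p): block length O(n / p)
    blocks = [sorted(s[i:i + size]) for i in range(0, len(s), size)]
    # Pairwise merging: each round halves the number of blocks.
    while len(blocks) > 1:
        blocks = [list(merge(blocks[i], blocks[i + 1])) if i + 1 < len(blocks)
                  else blocks[i]
                  for i in range(0, len(blocks), 2)]
    return blocks[0]

print(block_sort([5, 2, 8, 1, 9, 3, 7, 4], p=4))  # → [1, 2, 3, 4, 5, 7, 8, 9]
```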

(6)

Improved Algorithm EREW

Exchange the merge algorithm.

Recall: T_Merging(EREW)(n) = O(n/P(n) + log n · log P(n)).

T(n) = O(n/P(n) · log(n/P(n))) + O(n/P(n) · log P(n) + log n · log² P(n))
T(n) = O((n/P(n) + log² n) · log n)

Efficiency:
Eff(n) = O(n log n) / O(P(n) · (n/P(n) + log² n) · log n)
This is O(1) if P(n) < n/log² n.

(7)

Lower Bound

Theorem:

For any parallel sorting algorithm Srt with P_Srt(n) = O(n) it holds that:

T_Srt(n) = Ω(log n).

Proof:

The sequential lower bound for sorting is Θ(n log n):

one needs Ω(n log n) comparisons.

In each parallel step at most O(n) comparisons are possible.

Thus with fewer than Ω(log n) steps we would contradict the sequential lower bound.

Situation at this point:

An inefficient algorithm with T(n) = O(log n) and P(n) = n².
A nearly efficient algorithm with T(n) = O(log² n) and P(n) = o(n).

(8)

Basic Operation for Sorting

Identify the basic operation for sorting.

Assume: the sorting keys are s_1, …, s_n.
Program: compare_exchange(i, j)
    if s_i > s_j then exchange s_i ↔ s_j

Symbolic view (Batcher): a comparator with inputs x, y and outputs min(x,y), max(x,y).

Basic building block for sorting networks.

Basis for odd-even merge.

From this we build the optimal algorithm of Cole.

(9)

Odd-even Merge (Definition)

Input: sequence S = (s_1, s_2, …, s_n) (w.l.o.g. n even).

Let Odd(S) [Even(S)] be the elements of S with odd [even] index.

Let S' = (s'_1, s'_2, …, s'_n) be a second sequence.

Then we define: interleave(S, S') = (s_1, s'_1, s_2, s'_2, …, s_n, s'_n).

T_interleave(n) = O(1) with P_interleave(n) = O(n).

(10)

Odd-even Merge (Definition)

Program: odd_even(S)
    for all i where 1 < i < n and i even do in parallel
        compare_exchange(i, i+1).

[Figure: the inputs s_1, …, s_16 pass to the outputs r_1, …, r_16; one comparator connects each even position i to position i+1.]

T_compare_exchange(n) = O(1) with P_compare_exchange(n) = O(n).

(11)

Odd-even Merge (Definition)

Program: join1(S, S')
    odd_even(interleave(S, S'))

[Figure: the interleaved inputs s_1, …, s_16 pass through one layer of comparators to the outputs r_1, …, r_16.]

T_join1(n) = O(1) with P_join1(n) = O(n).

(12)

Sorting with Merging

Program: odd_even_merge(S, S')
    if |S| = |S'| = 1 then merge with compare_exchange.
    S_odd = odd_even_merge(Odd(S), Odd(S')).
    S_even = odd_even_merge(Even(S), Even(S')).
    return join1(S_odd, S_even).

T_odd_even_merge(n) = O(log n) with P_odd_even_merge(n) = O(n).

Theorem:

The algorithm odd_even_merge merges two already sorted sequences into one.

Proof follows.
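The recursion can be simulated sequentially; a Python sketch assuming both inputs are sorted and of equal power-of-two length (the function names follow the slides, the sequential simulation is ours):

```python
# Sequential sketch of the recursive odd-even merge; interleave, odd_even
# and join1 implement the slide definitions.
def interleave(S, S2):
    out = []
    for a, b in zip(S, S2):
        out += [a, b]
    return out

def odd_even(S):
    # compare_exchange(i, i+1) for even i with 1 < i < n (1-based),
    # i.e. 0-based pairs (1,2), (3,4), ...
    S = list(S)
    for i in range(1, len(S) - 1, 2):
        if S[i] > S[i + 1]:
            S[i], S[i + 1] = S[i + 1], S[i]
    return S

def join1(S, S2):
    return odd_even(interleave(S, S2))

def odd_even_merge(S, S2):
    if len(S) == 1:                      # base case: one compare_exchange
        return [min(S[0], S2[0]), max(S[0], S2[0])]
    s_odd = odd_even_merge(S[0::2], S2[0::2])   # odd-indexed (1-based)
    s_even = odd_even_merge(S[1::2], S2[1::2])  # even-indexed (1-based)
    return join1(s_odd, s_even)

print(odd_even_merge([1, 4, 6, 9], [2, 3, 7, 8]))  # → [1, 2, 3, 4, 6, 7, 8, 9]
```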

(13)

Sorting Networks

Theorem:

There exists a sorting algorithm with T(n) = O(log² n) and P(n) = n.

Proof: use divide and conquer, with merging of depth O(log n).

Theorem:

There exists a sorting network of size O(n log² n).

Proof: all calls to the compare_exchange operation are independent of the input (oblivious algorithm).

(14)

The 0-1 Principle

Theorem:

If a sorting networkX, resp. sorting algorithm is correct for all 0-1 inputs, then it is also correct for any input.

Proof (by contradiction):

Let f be a non-decreasing function: s_i ≤ s_j ⇒ f(s_i) ≤ f(s_j).

If X sorts the sequence (a_1, a_2, …, a_n) to (b_1, b_2, …, b_n), then on input (f(a_1), f(a_2), …, f(a_n)) the output of X is (f(b_1), f(b_2), …, f(b_n)), since min and max commute with f.

Assume b_i > b_{i+1}. If f(b_i) ≠ f(b_{i+1}), then f(b_i) > f(b_{i+1}) in the "sorted" sequence (f(b_1), f(b_2), …, f(b_n)), i.e., sorting errors are preserved under the function f.

Now choose f: f(b_j) = 0 for b_j < b_i and f(b_j) = 1 otherwise.

Then the sequence (f(b_1), f(b_2), …, f(b_n)) is not sorted, because f(b_i) = 1 and f(b_{i+1}) = 0.

This is a contradiction.

(15)

Correctness of the Merging

Theorem:

The algorithm odd_even_merge merges two sorted sequences into a single one.

Proof:

By the 0-1 principle it suffices to consider 0-1 inputs.

S has the form S = 0^p 1^(m−p) for some p with 0 ≤ p ≤ m.
S' has the form S' = 0^q 1^(m'−q) for some q with 0 ≤ q ≤ m'.

Thus the sequence S_odd has the form 0^(⌈p/2⌉+⌈q/2⌉) 1…, and S_even has the form 0^(⌊p/2⌋+⌊q/2⌋) 1….

Define: d = ⌈p/2⌉ + ⌈q/2⌉ − (⌊p/2⌋ + ⌊q/2⌋)

Depending on d we consider three cases: d = 0, d = 1 and d = 2.

(16)

Correctness of the Merging

If d = 0: then p and q are even.
The interleave step of join1 has the form:
interleave(S_odd, S_even) = (00)^((p+q)/2) 1^(m+m'−p−q)
The resulting sequence is already sorted.
The compare_exchange step keeps the order.

If d = 1: then (w.l.o.g.) p is odd and q is even.
The interleave step of join1 has the form:
interleave(S_odd, S_even) = (00)^⌊(p+q)/2⌋ 0 1^(m+m'−p−q)
The resulting sequence is already sorted.

If d = 2: then p and q are odd.
The interleave step of join1 has the form:
interleave(S_odd, S_even) = 0^(p+q−1) 1 0 1^(m+m'−p−q−1)
The compare_exchange step exchanges the single 1 at the even position p+q with the 0 at position p+q+1.

(17)

Testing the Correctness of a Network

Corollary:

The correctness of a merge network may be tested in time O(n²).

Proof: test all inputs of the form (0^p 1^(m−p), 0^q 1^(m'−q)).

Theorem:

The test for correctness of a sorting network is NP-hard.

Proof: Literature.
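The corollary can be illustrated on a small comparator network; a Python sketch (the representation of a network as a list of wire pairs and all names are ours; the three comparators below form the classic 4-wire odd-even merge):

```python
from itertools import product

# A merging network is a fixed list of comparators (i, j) with i < j; by the
# corollary it is correct iff it merges every input (0^p 1^(m-p), 0^q 1^(m'-q)).
def run_network(comparators, values):
    v = list(values)
    for i, j in comparators:              # compare_exchange on wires i, j
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

def merges_all_01_inputs(comparators, m, m2):
    # Only (m+1)*(m2+1) inputs are needed, giving an O(n^2) test overall.
    for p, q in product(range(m + 1), range(m2 + 1)):
        inp = [0] * p + [1] * (m - p) + [0] * q + [1] * (m2 - q)
        if run_network(comparators, inp) != sorted(inp):
            return False
    return True

merge_2x2 = [(0, 2), (1, 3), (1, 2)]      # odd-even merge of two sorted pairs
print(merges_all_01_inputs(merge_2x2, 2, 2))  # → True
```

Dropping the final comparator (1, 2) makes the test fail on the input (0, 1, 0, 1), as expected.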

(18)

Situation

Aim: a fast optimal algorithm.

So far: T(n) = log² n with P(n) = O(n).

So far: two nested loops, one for merging and one for sorting.

Idea: make the inner loop faster, i.e., do the merging in O(1).

Problem: with no further information we need Θ(log n) steps.

Idea: compute this additional information during the sorting.

Choose as additional information suitable splitting points for merging.

I.e., choose positions which split the blocks to be merged into blocks of constant size.

Problem: how to compute these points?

The solution is the basis for the algorithm of Cole.

(19)

The Merging-Tree, a View

[Figure: the merging tree, with processors on one axis and time on the other.]

(20)

Idea

Before merging two sequences we merge two sub-sequences.

Choose as sub-sequence every k-th element of the original sequence.

These sub-sequences are used as a crutch/support for the final merging.

I.e., these sub-sequences serve as a kind of "preview".

Using these crutch points we will be able to do the merging in O(1) time.

The total running time will be O(log n).

The additional effort should be at most O(1).

(21)

The Merging-Tree, a View

[Figure: each processor starts with 256 elements; over time it repeatedly sends samples of growing size (4, 16, 64, then 256 elements) up the tree, holding 4, 16, 64 and finally all 256 elements of the merged result.]

(22)

Definition

Let J and K be two sorted sequences.

Note: without additional information we cannot merge J and K in O(1) time with O(n) processors.

Let L be a third sequence, which in the following will be called a good sampler for J and K.

Informally: |L| < |J| and the elements of L are evenly spread over J.

For a < b we say c is between a and b iff a < c ≤ b.

The rank of e in S is rng(e,S) = |{x ∈ S | x < e}|.

Notation: Rng_{A,B} is the function Rng_{A,B}: A → N^|A| with Rng_{A,B}(e) = rng(e,B) for all e ∈ A.

Rng_{A,B} is called the rank between A and B.

Depending on the context, Rng_{A,B} may also be viewed as an array with |A| elements.

(23)

Good Sampler

Recall: rng(e,S) = |{x ∈ S | x < e}| and Rng_{A,B}: A → N^|A| with Rng_{A,B}(e) = rng(e,B).

Definition:

We call L a good sampler of J iff:

L and J are sorted.

Between any k+1 succeeding elements of {−∞} ∪ L ∪ {+∞} there are at most 2·k+1 elements of J.

Example:

Let S be a sorted sequence.

Let S1 be the sequence consisting of every fourth element of S.

Then S1 is a good sampler of S.

Let S2 be the sequence consisting of every second element of S.

Then S1 is a good sampler of S2.

Example (k=1): 1,2,3,4.
Example (k=3): 1,2,3,4,5,6,7,8,9,10.
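The definition can be checked mechanically; a Python sketch of such a checker (all names are ours, and the demo tests the every-second-element relationships, which satisfy the stated bound under the a < x ≤ b convention):

```python
def is_good_sampler(L, J):
    """Slide definition: L and J sorted, and between any k+1 succeeding
    elements of {-inf} ∪ L ∪ {+inf} (counting a < x <= b) there are at
    most 2k+1 elements of J."""
    if list(L) != sorted(L) or list(J) != sorted(J):
        return False
    ext = [float("-inf")] + list(L) + [float("inf")]
    for k in range(1, len(ext)):
        for start in range(len(ext) - k):
            a, b = ext[start], ext[start + k]   # window of k+1 elements
            if sum(1 for x in J if a < x <= b) > 2 * k + 1:
                return False
    return True

S = list(range(1, 17))
S1, S2 = S[3::4], S[1::2]        # every fourth / every second element of S
print(is_good_sampler(S2, S))    # every second element samples S → True
print(is_good_sampler(S1, S2))   # S1 is every second element of S2 → True
```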

(24)

Merging using a Good Sampler

Let J, K and L be sorted sequences.

Let L be a good sampler of both J and K.

Let L = (l_1, l_2, …, l_s), with l_0 = −∞ and l_{s+1} = +∞.

Program: merge_with_help(J, K, L)
    for all i where 1 ≤ i ≤ s+1 do in parallel
        assign J_i = {x ∈ J | l_{i−1} < x ≤ l_i}.
        assign K_i = {x ∈ K | l_{i−1} < x ≤ l_i}.
        assign res_i = merge(J_i, K_i).
    return (res_1, res_2, …, res_{s+1}).

Situation:

[Figure: the splitters l_1, …, l_8 cut the sequences into blocks K_1, …, K_9 (and likewise for J) which are merged independently.]

(25)

Merging using a Good Sampler (Example)

K = (1,4,6,9,11,12,13,16,19,20)
J = (2,3,7,8,10,14,15,17,18,21)
L = (5,10,12,17)

Then we have:

i   K_i       J_i          merge(K_i, J_i)
1   (1,4)     (2,3)        (1,2,3,4)
2   (6,9)     (7,8,10)     (6,7,8,9,10)
3   (11,12)   ∅            (11,12)
4   (13,16)   (14,15,17)   (13,14,15,16,17)
5   (19,20)   (18,21)      (18,19,20,21)

Result: (1,2,3,4,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21)
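A sequential Python sketch of merge_with_help that reproduces the example above (the ±∞ sentinels and the function layout are ours; on a PRAM each block i is merged by its own group of processors in O(1) once the Rng arrays are known):

```python
from heapq import merge  # standard-library merge of two sorted iterables

def merge_with_help(J, K, L):
    # The splitters l_0 = -inf, l_1, ..., l_s, l_{s+1} = +inf cut J and K
    # into blocks; each block pair is merged independently.
    bounds = [float("-inf")] + list(L) + [float("inf")]
    result = []
    for i in range(1, len(bounds)):
        lo, hi = bounds[i - 1], bounds[i]
        Ji = [x for x in J if lo < x <= hi]
        Ki = [x for x in K if lo < x <= hi]
        result += list(merge(Ji, Ki))
    return result

K = [1, 4, 6, 9, 11, 12, 13, 16, 19, 20]
J = [2, 3, 7, 8, 10, 14, 15, 17, 18, 21]
L = [5, 10, 12, 17]
print(merge_with_help(J, K, L))  # → the slide's result (1, 2, 3, 4, 6, …, 21)
```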

(26)

Merging with good sampler (running time)

Lemma:

Let L be a good sampler for K and J.

If Rng_{L,J}, Rng_{L,K}, Rng_{K,L} and Rng_{J,L} are known, then we have:

T_merge_with_help(J,K,L) = O(1) with P_merge_with_help(J,K,L) = O(|J|+|K|).

Proof:

The same way as for the merging introduced in the last chapter.

Each processor uses Rng_{L,J} resp. Rng_{L,K} to know the area of its input sequences to read.

Each processor uses Rng_{J,L} and Rng_{K,L} to know the area of its output sequence to write.

(27)

Properties of Good Samplers

Lemma:

If X is a good sampler for X' and Y is a good sampler for Y', then merge(X,Y) is a good sampler for X' [resp. Y'].

Proof:

Consider X as a good sampler for X'.

Any additional element only makes the good sampler "better".

Note:

merge(X,Y) is not necessarily a good sampler for merge(X',Y'):

X = (2,7) and X' = (2,5,6,7).
Y = (1,8) and Y' = (1,3,4,8).
merge(X,Y) = (1,2,7,8) and merge(X',Y') = (1,2,3,4,5,6,7,8).

There are 5 elements between 2 and 7.

(28)

Properties of Good Samplers

Lemma:

Let X be a good sampler for X' and let Y be a good sampler for Y'. Then there are at most 2·r+2 elements of merge(X',Y') between r successive elements of merge(X,Y).

Proof:

W.l.o.g. X and Y contain the elements −∞ and +∞.

Let (e_1, e_2, …, e_r) be successive elements of merge(X,Y).

W.l.o.g. let e_1 ∈ X.

Consider now two cases: e_r ∈ X and e_r ∈ Y.

Let in the following

x = |X ∩ {e_1, e_2, …, e_r}|  and  y = |Y ∩ {e_1, e_2, …, e_r}|.

(29)

Properties of Good Samplers

Recall: (e_1, …, e_r) are successive elements of merge(X,Y), x = |X ∩ {e_1, …, e_r}|, y = |Y ∩ {e_1, …, e_r}|, and e_1 ∈ X.

Lemma:

Let X be a good sampler for X' and let Y be a good sampler for Y'. Then there are at most 2·r+2 elements of merge(X',Y') between r successive elements of merge(X,Y).

Proof: W.l.o.g. let e_1 ∈ X. Case e_r ∈ X:

Between e_1 and e_r there are at most 2(x−1)+1 elements of X'.

Between e_1 and e_r there are at most 2(y+1)+1 elements of Y', because they lie between y+2 elements of Y.

Thus we get: 2(x−1)+1 + 2(y+1)+1 = 2·r+2.

Example with x = 3 and y = 2: e_1 ∈ X, e_2 ∈ Y, e_3 ∈ X, e_4 ∈ Y, e_5 ∈ X, with neighbouring elements a, b ∈ Y outside the window.

(30)

Properties of Good Samplers

Recall: (e_1, …, e_r) are successive elements of merge(X,Y), x = |X ∩ {e_1, …, e_r}|, y = |Y ∩ {e_1, …, e_r}|, and e_1 ∈ X.

Lemma:

Let X be a good sampler for X' and let Y be a good sampler for Y'. Then there are at most 2·r+2 elements of merge(X',Y') between r successive elements of merge(X,Y).

Proof: W.l.o.g. let e_1 ∈ X. Case e_r ∈ Y:

Add e_0 ∈ Y with e_0 < e_1 to the good sampler.

Add e_{r+1} ∈ X with e_r < e_{r+1} to the good sampler.

The elements from X' between (e_1, …, e_r) lie between x+1 elements from X.

The elements from Y' between (e_1, …, e_r) lie between y+1 elements from Y.

Thus we get: 2x+1 + 2y+1 = 2·r+2.

Example with x = 2 and y = 2: e_1 ∈ X, e_2 ∈ Y, e_3 ∈ X, e_4 ∈ Y, with e_0 ∈ Y and e_5 ∈ X added.

(31)

Properties of good sampler

Recall: there are at most 2·r+2 elements of merge(X',Y') between r successive elements of merge(X,Y).

Definition:

Let reduce(X) be the operation which chooses every fourth element of X.

Lemma:

If X is a good sampler for X' and Y is a good sampler for Y', then reduce(merge(X,Y)) is a good sampler for reduce(merge(X',Y')).

Proof:

Consider k+1 successive elements (e_1, e_2, …, e_{k+1}) of reduce(merge(X,Y)).

At most 4k+1 elements of merge(X,Y) lie between e_1 and e_{k+1}, including e_1 and e_{k+1}.

At most 8k+4 elements of merge(X',Y') lie between these 4k+1 elements.

Thus at most 2k+1 elements of reduce(merge(X',Y')) lie between (e_1, e_2, …, e_{k+1}).

(32)

Overview to the Algorithm of Cole

We start with an explanation using a complete binary tree.

The leaves contain the elements to be sorted.

An interior node v "cares" about as many elements as there are leaves below v.

A node v receives from its sons already sorted sequences.

The "length" of these sequences doubles each time.

Node v receives the sequences X_1, X_2, …, X_r and Y_1, Y_2, …, Y_r.
Node v sends to its father the sequences Z_1, Z_2, …, Z_r, Z_{r+1}.
Node v updates an internal help-sequence val_v.

It holds: |X_1| = |Y_1| = |Z_1| = 1.
It holds: |X_i| = 2·|X_{i−1}|, |Y_i| = 2·|Y_{i−1}| and |Z_i| = 2·|Z_{i−1}|.

(33)

One basic Operation of an interior Node v

Receives from its sons the two sequences X and Y.
Computes: val_v = merge_with_help(X, Y, val_v).

Sends to its father reduce(val_v) until v has sorted all received sequences.

Sends to its father every second element of val_v, once v is done with sorting.

Sends to its father val_v itself, two steps after v finished sorting.

Example:

Step  Left     Right    val_v             Father
1     7        8        7,8               ∅
2     3,7      5,8      3,5,7,8           8
3     1,3,4,7  2,5,6,8  1,2,3,4,5,6,7,8   4,8
4     1,3,4,7  2,5,6,8  1,2,3,4,5,6,7,8   2,4,6,8
5     1,3,4,7  2,5,6,8  1,2,3,4,5,6,7,8   1,2,3,4,5,6,7,8

(34)

Basic Operation of an interior Node v

Receives from its sons the two sequences X and Y.
Computes: val_v = merge_with_help(X, Y, val_v).

Sends to its father reduce(val_v) until v has sorted all received sequences.

Sends to its father every second element of val_v, once v is done with sorting.

Sends to its father val_v itself, two steps after v finished sorting.

Thus we get the following pattern:

receives:  X_1  X_2  X_3  X_4  …  X_r
sends:          Z_1  Z_2  …  Z_r  Z_{r+1}  Z_{r+2}

If a node x is finished after t steps, then the father of x is finished after t+3 steps.

Thus we get a running time of 3·log n.

(35)

Invariant

Invariant:

Each X_i is a good sampler of X_{i+1}.
Each Y_i is a good sampler of Y_{i+1}.
Each Z_i is a good sampler of Z_{i+1}.
Each X_i is half as big as X_{i+1}.
Each Y_i is half as big as Y_{i+1}.
Each Z_i is half as big as Z_{i+1}.

|X_1| = |Y_1| = |Z_1| = 1.

(36)

Situation

The running time is O(log n).

Each inner node v needs |val_v| many processors.

We still have to prove that the total number of processors is O(n).

The PRAM model has to be verified.

Important: the computation of the values Rng_{X,Y} has to be shown.

These values will also be transmitted and updated in the following.

(37)

Computing the Ranks

In each step we compute: merge_with_help(X_{i+1}, Y_{i+1}, merge(X_i, Y_i)).

Using the lemma from above we have: merge(X_i, Y_i) is a good sampler of X_{i+1} and Y_{i+1}.

Let L = merge(X_i, Y_i), J = X_{i+1} and K = Y_{i+1}.

We have to compute: Rng_{L,J}, Rng_{L,K}, Rng_{J,L} and Rng_{K,L}.

Invariant:

Let S_1, S_2, …, S_p be a sequence of sequences at node v. Then node v also knows Rng_{S_{i+1},S_i} for 1 ≤ i < p.

Furthermore, for each sequence S the rank Rng_{S,S} is known.

(38)

Computing the Ranks

Lemma:

Let S = (b_1, b_2, …, b_k) be a sorted sequence. Then we may compute the rank of an element a in S in time O(1) using k processors.

Proof:

Program: rng1(a, S)
    for all P_i where 0 ≤ i ≤ k do in parallel
        if b_i < a ≤ b_{i+1} then return i
(with the sentinels b_0 = −∞ and b_{k+1} = +∞)

Note: the program has no write conflicts.

Note: it could be changed to avoid read conflicts.
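A sequential Python sketch of rng1 with the sentinels made explicit (the sentinel convention and the function layout are ours):

```python
def rng1(a, S):
    # Sentinels b_0 = -inf and b_{k+1} = +inf make the intervals cover
    # every possible value of a.
    b = [float("-inf")] + list(S) + [float("inf")]
    # On a PRAM, each processor P_i checks one interval in parallel; the
    # single processor with b_i < a <= b_{i+1} writes i, which equals the
    # number of elements of S smaller than a.
    for i in range(len(b) - 1):
        if b[i] < a <= b[i + 1]:
            return i

print(rng1(10, [2, 4, 6, 9, 11]))  # → 4
```

Exactly one interval matches, so on a PRAM there are no write conflicts, as the slide notes.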

(39)

Computing the Ranks

Recall: we have rng(a,S).

Lemma:

Let S_1, S_2, S be sorted sequences with S = merge(S_1, S_2) and S_1 ∩ S_2 = ∅.

Then we may compute Rng_{S_1,S_2} and Rng_{S_2,S_1} in time O(1) using O(|S|) processors.

Proof:

We know Rng_{S,S}, Rng_{S_1,S_1} and Rng_{S_2,S_2}.

Furthermore we have: rng(a, S_2) = rng(a, merge(S_1, S_2)) − rng(a, S_1).

The claim follows directly.
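The rank identity in the proof can be checked directly; a Python sketch with our own small example sequences:

```python
def rng(a, S):
    return sum(1 for x in S if x < a)   # rank as defined on the slides

# Check the identity rng(a, S2) = rng(a, merge(S1, S2)) - rng(a, S1)
# for disjoint sorted S1, S2; merge(S1, S2) is just their sorted union here.
S1 = [2, 5, 9]
S2 = [1, 4, 7, 8]
S = sorted(S1 + S2)
for a in S:
    assert rng(a, S2) == rng(a, S) - rng(a, S1)
print("identity holds for all elements of S")
```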

(40)

Computing the Ranks

Recall: we have rng(a,S), Rng_{S_1,S_2} and Rng_{S_2,S_1}.

Lemma:

Let X be a good sampler of X'. Let Y be a good sampler of Y'. Let U = merge(X, Y).

Assume Rng_{X',X} and Rng_{Y',Y} are known.

Then we may compute Rng_{X',U}, Rng_{Y',U}, Rng_{U,X'} and Rng_{U,Y'} in time O(1) using O(|X|+|Y|) processors.

Proof:

First we compute Rng_{X',U} and Rng_{Y',U}.
Then we compute Rng_{X,X'} and Rng_{Y,Y'}.
Finally we compute Rng_{U,X'} and Rng_{U,Y'}.

(41)

Computing the Ranks (Rng_{X',U})

Recall: we have rng(a,S), Rng_{S_1,S_2} and Rng_{S_2,S_1}.

Let X = (a_1, a_2, …, a_k).

Let w.l.o.g. a_0 = −∞ and a_{k+1} = +∞.

Using the good sampler X we split X' into X'_1, X'_2, …, X'_k, X'_{k+1}. Note: Rng_{X',X} is known.

The splitting may be done in time O(1) using O(|X|) processors.

Let U_i be the sequence of elements of Y which lie between a_{i−1} and a_i.

Thus we get:

Program: Rng_{X',U}
    for all i where 1 ≤ i ≤ k+1 do in parallel
        for all x ∈ X'_i do
            rng(x, U) = rng(a_{i−1}, U) + rng(x, U_i)

Running time O(1) using Σ_{i=1}^{k+1} |U_i| processors.

(42)

Computing the Ranks (Rng_{X,X'})

Recall: we have rng(a,S), Rng_{S_1,S_2} and Rng_{S_2,S_1}.

Let a_i ∈ X.

Let a' be the minimal element of X'_{i+1}.

The rank of a_i in X' is the same as the rank of a' in X'. This rank is already known.

Thus it may be computed in time O(1) using one processor.

(43)

Computing the Ranks (Rng_{U,X'})

Recall: we have rng(a,S), Rng_{S_1,S_2} and Rng_{S_2,S_1}.

Note: Rng_{U,X'} consists of Rng_{X,X'} and Rng_{Y,X'}. Rng_{X,X'} is already known.

Still to compute: Rng_{Y,X'}.

Rng_{Y,X} may be computed using the previous lemma.

We then compute rng(a, X') using rng(a, X) and Rng_{X,X'}.

Thus we compute Rng_{U,X'} with O(|U|) processors in time O(1).

(44)

Computing the Ranks

Recall: we have rng(a,S), Rng_{S_1,S_2} and Rng_{S_2,S_1}.

Consider the step merge_with_help(J = X_{i+1}, K = Y_{i+1}, L = merge(X_i, Y_i)):

Using the invariant we know: Rng_{J,X_i} and Rng_{K,Y_i}.

Using the above considerations we may compute: Rng_{L,J}, Rng_{L,K}, Rng_{J,L} and Rng_{K,L}.

Still to be computed: Rng_{reduce(merge(X_{i+1},Y_{i+1})), reduce(merge(X_i,Y_i))}.

Known: Rng_{X_{i+1}, merge(X_i,Y_i)} and Rng_{Y_{i+1}, merge(X_i,Y_i)}.

It is now easy to compute: Rng_{X_{i+1}, reduce(merge(X_i,Y_i))} and Rng_{Y_{i+1}, reduce(merge(X_i,Y_i))}.

Also easy to compute: Rng_{merge(X_{i+1},Y_{i+1}), reduce(merge(X_i,Y_i))}.

(45)

Algorithm of Cole

Theorem:

We may sort n values on a CREW PRAM using O(n) processors in time O(log n).

Proof: discussed before.

Theorem:

We may sort n values on an EREW PRAM using O(n) processors in time O(log n).

Proof: see literature.

Theorem:

There exists a sorting network with O(n) processors and depth O(log n).

Proof: see literature.

(46)

Literature

A. Gibbons, W. Rytter:
Efficient Parallel Algorithms. Cambridge University Press, 1990.
Chapter 5.

(47)

Questions

Explain the motivation behind parallel systems.

Explain the ideas of the different sorting algorithms.

Explain the different running times of these sorting algorithms.

Explain the different efficiencies of these sorting algorithms.

Explain the idea of the algorithm of Cole.

Explain the running time of the algorithm of Cole.

Explain the number of processors used in the algorithm of Cole.

(48)

Legend

: Not relevant
: Background that is used implicitly
: Idea of the proof or approach
: Structure of the proof or approach
: Complete knowledge
