Theory of Parallel and Distributed Systems (WS2016/17)


Chapter 6: Routing II

Walter Unger

Lehrstuhl für Informatik 1

12:01, 30 January 2017

Contents

1 Introduction: Situation, The Model, Lower Bound, Proof, Application

2 Consistent Hashing: Introduction, Statements

3 Chord Network: Introduction, Statements

4 Randomized Oblivious Routing: Introduction

5 Path Selection: Path Selection on the Hypercube (Valiant’s Trick), Analyzing a random routing problem

6 Packet Scheduling for the Hypercube: The algorithm, Proof

7 Packet scheduling for general networks: The algorithm, Proof

Situation

Current situation:

Every permutation could be routed on permutation networks and meshes.

The number of steps was proportional to the diameter.

The algorithm was centralized: it needed global knowledge about the sources and destinations of all packets.

We now want to devise local-control algorithms.

Each node decides on the next step using only local information.

The Model

The network is modelled by a graph $G = (V,E)$.

A routing problem $R$ on $G$ is defined by a finite set of packets, each of which comes with a source and a destination node.

We assume that time proceeds in synchronous steps:

Before the first step, each packet is placed at its source.

In each step, each edge can forward at most one packet in each direction.

The routing is completed as soon as all packets have reached their destination.

The number of steps $T$ taken by an algorithm to deliver all packets is referred to as the routing time.

Oblivious routing

Here: algorithms that belong to the class of oblivious routing algorithms:

The path of each packet depends only on the source and the destination of this packet but not on the sources and destinations of other packets.

One specifies a path system $W$ with a path $P_{u,v}$ from $u$ to $v$ for every possible source-destination pair $(u,v) \in V^2$, $u \ne v$.

Every packet with source $u$ and destination $v$ is sent along the path $P_{u,v}$.

Example: bit-fixing paths on the hypercube
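As a minimal sketch of this example, the following Python function computes a bit-fixing path on the $d$-dimensional hypercube. Nodes are represented as integers in $0,\dots,2^d-1$. The function name and the order in which bits are corrected (highest dimension first) are illustrative assumptions; any fixed order yields an oblivious path system.

```python
def bit_fixing_path(source: int, dest: int, d: int) -> list[int]:
    """Return the node sequence of a bit-fixing path from source to dest.

    Assumption: bits are corrected from the highest dimension (d-1)
    down to dimension 0.
    """
    path = [source]
    current = source
    for i in reversed(range(d)):
        if (current ^ dest) >> i & 1:   # bit i still differs from the destination
            current ^= 1 << i           # traverse the edge of dimension i
            path.append(current)
    return path
```

The path depends only on `source` and `dest`, which is exactly the oblivious property.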


Lower Bound by Borodin and Hopcroft

Theorem

Let $G = (V,E)$ be any graph and let $W$ be any path system with a path $P_{u,v}$ from $u$ to $v$ for every $(u,v) \in V^2$, $u \ne v$.

Let $n$ denote the number of nodes and $\Delta$ the maximum degree of $G$. There exists a permutation $\pi: V \to V$ and an edge $e^* \in E$ such that at least

\[ \sqrt{n/(2\Delta^2)} = \Omega(\sqrt{n}/\Delta) \]

of the paths selected by $\pi$ from $W$ contain the edge $e^*$.

This is very bad news about deterministic oblivious routing:

The time complexity for permutation routing under this paradigm is lower bounded by $\Omega(\sqrt{n}/\Delta)$, which is polynomial in $n$.

Even a small diameter, say logarithmic in $n$, does not help.

Proof of the lower bound by Borodin and Hopcroft

Definition

For $v \in V$, let $W_v = \{P_{v,u} \mid u \in V \setminus \{v\}\}$.

For a positive number $t$, a node $v \in V$, and an edge $e \in E$, we say that $e$ is $t$-popular for $v$ if at least $t$ paths from $W_v$ contain $e$.

Outline of the proof:

First, we prove a lemma showing that, for any given node $v \in V$, there are “many” edges that are “quite popular” for $v$.

Then we use the lemma to show that there is an edge $e^*$ that is “quite popular” for “many” nodes, that is, $e^*$ is $t$-popular for $t$ different nodes, for $t = \Omega(\sqrt{n}/\Delta)$.

Given this, we will be able to construct a permutation $\pi$ such that $t$ of the paths selected by $\pi$ contain $e^*$, which proves the lower bound.

Proof of the lower bound by Borodin and Hopcroft

Definition

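For $t > 0$, we define a 0-1 matrix $A(t)$:

The matrix has $n$ rows and $|E|$ columns.

For $v \in V$ and $e \in E$, define
\[ A_{v,e}(t) = \begin{cases} 1 & \text{if $e$ is $t$-popular for $v$,} \\ 0 & \text{otherwise.} \end{cases} \]

For $v \in V$, let $A_v(t) = \sum_{e \in E} A_{v,e}(t)$ denote the row sum of $v$. For $e \in E$, let $A_e(t) = \sum_{v \in V} A_{v,e}(t)$ denote the column sum of $e$.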

One Lemma for the Proof of the lower bound

Lemma

For all $v \in V$ and $t \le (n-1)/\Delta$: $A_v(t) \ge \frac{n}{2\Delta t}$.

Proof of lemma:

Let $Q \subseteq V$ be the set of nodes from which there is a path to $v$ that contains only edges that are $t$-popular for $v$.

Let $L = V - Q$ and $B = E \cap (L \times Q)$, that is, $B$ is the set of those edges connecting a node in $L$ with a node in $Q$.

It holds that

$|B| \cdot (t-1) \ge |L|$ because, for each node $u \in L$, the path $P_{v,u}$ leads through at least one edge in $B$, and these edges are not $t$-popular, so each of them can be contained in at most $t-1$ paths from $W_v$;

$|B| \le \Delta |Q|$ as each node in $Q$ has at most $\Delta$ incident edges.

Proof of the lemma

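$|B| \cdot (t-1) \ge |L|$ and $|B| \le \Delta |Q|$

Combining the two inequalities, we obtain
\[ \Delta |Q| (t-1) \ge |L| = n - |Q|, \]
which implies (using $\Delta \ge 1$)
\[ \Delta |Q| t \ge n \]
and, hence,
\[ |Q| \ge \frac{n}{\Delta t}. \]

Next we will show $|Q| \le 2A_v(t)$, which completes the proof of the lemma as it implies
\[ A_v(t) \ge \frac{|Q|}{2} \ge \frac{n}{2\Delta t}. \]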

Proof of the lemma

Show: $|Q| \le 2A_v(t)$

Let $E'$ denote the set of edges that are $t$-popular for $v$. To complete the proof of the lemma, we have to show $|Q| \le 2|E'| = 2A_v(t)$.

At first, we observe that $t \le (n-1)/\Delta$ implies $E' \ne \emptyset$.

This is because $v$ has at most $\Delta$ incident edges and $W_v$ contains $n-1$ paths, so that at least one of the edges incident to $v$ is contained in at least $(n-1)/\Delta \ge t$ paths from $W_v$.

Therefore, there is at least one edge that is $t$-popular for $v$.

Given that $E'$ is non-empty, each node in $Q$ is incident to an edge in $E'$. Consequently, $|Q| \le 2|E'|$ as each of the edges in $E'$ is incident to at most two nodes from $Q$.

Proof of the lower bound by Borodin and Hopcroft

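Show: $\exists e^*$: $e^*$ is $t$-popular for $t$ different nodes, for $t = \Omega(\sqrt{n}/\Delta)$.

Our next goal is to show that there exists an edge $e^*$ that is $t$-popular for $t$ nodes, where $t = \Omega(\sqrt{n}/\Delta)$.

We observe that
\[ \sum_{e \in E} A_e(t) = \sum_{e \in E} \sum_{v \in V} A_{v,e}(t) = \sum_{v \in V} \sum_{e \in E} A_{v,e}(t) = \sum_{v \in V} A_v(t) \ge \frac{n^2}{2\Delta t}, \]
where the inequality follows from the lemma.

By the pigeonhole principle, there has to exist an edge $e^* \in E$ such that
\[ A_{e^*}(t) \ge \left\lceil \frac{n^2}{|E| \cdot 2\Delta t} \right\rceil \ge \left\lceil \frac{n}{2\Delta^2 t} \right\rceil, \]
where the first step uses that $A_{e^*}(t)$ is an integer and the last step follows from $|E| \le \Delta n$.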

Proof of the lower bound by Borodin and Hopcroft

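Show: $\exists e^*$: $e^*$ is $t$-popular for $t$ different nodes, for $t = \Omega(\sqrt{n}/\Delta)$.

We have $A_{e^*}(t) \ge \left\lceil \frac{n}{2\Delta^2 t} \right\rceil$.

Next we choose $t$ such that the bound $\frac{n}{2\Delta^2 t}$ equals $t$, that is, we set $t = \sqrt{n}/(\sqrt{2}\Delta)$.

Observe that $t = \sqrt{n}/(\sqrt{2}\Delta)$ implies $t \le (n-1)/\Delta$, for any $n \ge 2$, so that the assumption about $t$ that we made in the lemma is satisfied.

For this choice of $t$, our analysis gives
\[ A_{e^*}(t) \ge \left\lceil \frac{n}{2\Delta^2 \cdot \sqrt{n}/(\sqrt{2}\Delta)} \right\rceil = \left\lceil \frac{\sqrt{2}\Delta \, n}{2\Delta^2 \sqrt{n}} \right\rceil = \left\lceil \frac{\sqrt{n}}{\sqrt{2}\Delta} \right\rceil. \]

Hence $A_{e^*}(t) \ge \lceil t \rceil$, that is, $e^*$ is $\lceil t \rceil$-popular for $\lceil t \rceil$ nodes, where $t = \sqrt{n}/(\sqrt{2}\Delta)$.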

Proof of the lower bound by Borodin and Hopcroft

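Construct a permutation $\pi$ such that $\lceil t \rceil$ of the paths selected by $\pi$ contain $e^*$.

Finally, we construct a permutation $\pi$ such that $\lceil t \rceil$ of the paths selected by $\pi$ contain $e^*$:

Let $V'$ denote a set of $\lceil t \rceil$ nodes for which $e^*$ is $\lceil t \rceil$-popular.

W.l.o.g., $V' = \{1, \dots, \lceil t \rceil\}$.

For every $v \in V'$, there exists a subset $U_v \subseteq V$ of cardinality $\lceil t \rceil$ such that, for every $u \in U_v$, the path $P_{v,u}$ contains $e^*$.

For $v = 1$ to $\lceil t \rceil$, set $\pi(v) = u$ where $u$ is chosen arbitrarily from $U_v \setminus \{\pi(1), \dots, \pi(v-1)\}$. This set is non-empty because $|U_v| = \lceil t \rceil > v-1$.

For $v = \lceil t \rceil + 1$ to $n$, set $\pi(v) = u$ where $u$ is chosen arbitrarily from $V \setminus \{\pi(1), \dots, \pi(v-1)\}$.

By our construction, $\pi$ and $e^*$ satisfy the properties described in the theorem.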
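The greedy construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: nodes are numbered $1,\dots,n$, the sets $U_v$ are passed in as a dictionary `U` (a hypothetical input format), and the “arbitrary” choice is resolved by taking the smallest free node.

```python
def build_permutation(n: int, U: dict[int, set[int]]) -> dict[int, int]:
    """Greedily build a permutation pi on the nodes 1..n.

    U maps each node v in {1, ..., len(U)} to its set U_v.
    Assumption: |U_v| >= len(U), so a free target always exists.
    """
    pi: dict[int, int] = {}
    used: set[int] = set()
    for v in range(1, n + 1):
        # nodes in V' pick from U_v, all remaining nodes pick from V
        candidates = U[v] if v in U else set(range(1, n + 1))
        u = min(candidates - used)   # "arbitrary" choice: the smallest free node
        pi[v] = u
        used.add(u)
    return pi
```

For every $v$ with an entry in `U`, the chosen target lies in $U_v$, so the path $P_{v,\pi(v)}$ contains $e^*$.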


Application to the hypercube and Goal

For the $d$-dimensional hypercube with $n = 2^d$ nodes, the lower bound of Borodin and Hopcroft implies a lower bound of $\Omega(\sqrt{n}/\log n)$ for permutation routing.

There is a permutation $\pi$ such that $\Omega(\sqrt{n})$ paths contain the same edge when using bit-fixing paths on the hypercube.

Consequently, when using bit-fixing paths, the time complexity for permutation routing is $\Omega(\sqrt{n})$.
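The text does not name the bad permutation. A standard candidate is the “transpose” permutation, which swaps the upper and lower half of the address bits; it is used here purely as an illustrative assumption. The sketch below counts, for each directed edge, how many bit-fixing paths (highest dimension first, also an assumption) use it. All sources whose lower half is zero are routed through node 0, so at least $2^{d/2-1} = \sqrt{n}/2$ paths share one edge.

```python
from collections import Counter

def bit_fixing_edges(source, dest, d):
    """Directed edges of the bit-fixing path (highest dimension first)."""
    edges, current = [], source
    for i in reversed(range(d)):
        if (current ^ dest) >> i & 1:
            nxt = current ^ (1 << i)
            edges.append((current, nxt))
            current = nxt
    return edges

def transpose(x, d):
    """Swap the upper and lower d/2 bits of x (d is assumed to be even)."""
    half = d // 2
    low = x & ((1 << half) - 1)
    high = x >> half
    return (low << half) | high

def max_congestion(d):
    """Largest number of paths of the transpose permutation on one edge."""
    load = Counter()
    for x in range(1 << d):
        load.update(bit_fixing_edges(x, transpose(x, d), d))
    return max(load.values())
```

Calling `max_congestion(d)` for growing even `d` shows the congestion growing with $\sqrt{n}$, while the diameter is only $d = \log n$.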

Our goal is to devise a distributed permutation routing algorithm with time complexity $O(\log n)$.

This will take some time.


Outline of the approach

We build a dynamic system of storage devices supporting the addition and removal of storage devices using dynamic hashing:

Devices are mapped i.u.r. (independently, uniformly at random) to the ring $[0,1)$, that is, each device $i$ gets assigned a random address $a(i) \in [0,1)$.

Data objects are mapped to the ring using a random hash function $h: U \to [0,1)$, that is, object $x$ is mapped to position $h(x)$.

Data object $x$ is stored on the device found next to $h(x)$ in clockwise direction on the ring.

We assume an idealistic hash function, that is, the hash values are real numbers chosen i.u.r. from $[0,1)$.

Definition of successors

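Let $V$ be the set of storage devices at some point of time, and let $n = |V|$.

For address $A \in [0,1)$, define
\[ \mathrm{succ}(A) = \begin{cases} \arg\min\{a(i) \ge A \mid i \in V\} & \text{if } \exists i \in V: a(i) \in [A,1), \\ \arg\min\{a(i) \ge 0 \mid i \in V\} & \text{otherwise,} \end{cases} \]
\[ \mathrm{pred}(A) = \begin{cases} \arg\max\{a(i) < A \mid i \in V\} & \text{if } \exists i \in V: a(i) \in [0,A), \\ \arg\max\{a(i) < 1 \mid i \in V\} & \text{otherwise.} \end{cases} \]

Object $x \in U$ is mapped to device $\mathrm{succ}(h(x))$.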
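A minimal Python sketch of this scheme is given below. It is an illustration, not the lecture's implementation: the class and method names are hypothetical, and instead of truly random addresses it derives both device addresses and object positions from SHA-256, which only approximates the idealistic hash function assumed in the text.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Devices and objects live on the ring [0, 1); objects go to succ(h(x))."""

    def __init__(self):
        self._addresses = []   # sorted device addresses a(i)
        self._devices = {}     # address -> device name

    @staticmethod
    def _to_unit_interval(key: str) -> float:
        # Assumption: SHA-256 stands in for the idealistic random hash function.
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add_device(self, name: str) -> None:
        a = self._to_unit_interval("device:" + name)
        bisect.insort(self._addresses, a)
        self._devices[a] = name

    def remove_device(self, name: str) -> None:
        a = self._to_unit_interval("device:" + name)
        self._addresses.remove(a)
        del self._devices[a]

    def succ(self, A: float) -> str:
        """First device at or after address A, wrapping around the ring."""
        idx = bisect.bisect_left(self._addresses, A)
        if idx == len(self._addresses):
            idx = 0
        return self._devices[self._addresses[idx]]

    def lookup(self, obj: str) -> str:
        return self.succ(self._to_unit_interval(obj))
```

Removing a device only relocates the objects that were stored on it; all other objects keep their device, which is the property that makes the scheme suitable for dynamic systems.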


Quality of the load balancing

The quality of the load balancing depends on the distribution of the sizes of the ring segments for which the storage devices are responsible.

Definition (weight of a device)

For device $i \in V$, define the weight of device $i$ by
\[ W_i = \begin{cases} a(i) - a(\mathrm{pred}(a(i))) & \text{if } a(\mathrm{pred}(a(i))) < a(i), \\ 1 - \bigl(a(\mathrm{pred}(a(i))) - a(i)\bigr) & \text{otherwise.} \end{cases} \]

Let $W = \max_{i \in [n]} W_i$.

Ideally, we would have $W = W_0 = W_1 = \dots = W_{n-1} = \frac{1}{n}$. We will show that $W = O\left(\frac{\log n}{n}\right)$, w.h.p.

The term “w.h.p.” abbreviates “with high probability” and means with probability at least $1 - n^{-\alpha}$, for any constant $\alpha > 0$.

Quality of the load balancing

Lemma

Let $T \subseteq [0,1)$ and $t = |T|$ the mass (length) of $T$. Suppose that $M$ points are chosen i.u.r. from $[0,1)$. The probability that none of these points is from $T$ is at most $e^{-tM}$.

Proof:

\[ \Pr[\text{no point in } T] = (1-t)^M = \left((1-t)^{1/t}\right)^{tM} \le e^{-tM}, \]
as, for every $x \ge 1$ (here $x = 1/t$), it holds that $\left(1 - \frac{1}{x}\right)^x \le \frac{1}{e}$.

Quality of the load balancing

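$\Pr[\text{no point in } T] \le e^{-tM}$.

Theorem

$W = O\left(\frac{\log n}{n}\right)$, w.h.p.

Proof:

Fix $j \in V$. Suppose $j$’s address $a(j)$ is fixed arbitrarily.

A necessary condition for the event $W_j \ge t$, $t \in [0,1)$, is that none of the addresses of the other $n-1$ devices falls into the interval from $a(j) - t$ to $a(j)$.

Hence, for any $\alpha > 0$ and $n \ge 2$ (using $(n-1)/n \ge 1/2$),
\[ \Pr\left[W_j \ge \frac{2(\alpha+1)\ln n}{n}\right] \le e^{-2(\alpha+1)\frac{\ln n}{n}(n-1)} \le e^{-(\alpha+1)\ln n} = n^{-(\alpha+1)} \]
and, hence,
\[ \Pr\left[W \ge \frac{2(\alpha+1)\ln n}{n}\right] \le \sum_{j \in V} \Pr\left[W_j \ge \frac{2(\alpha+1)\ln n}{n}\right] \le n^{-\alpha}. \]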

Improved quality of the load balancing

We have $W = O\left(\frac{\log n}{n}\right)$, w.h.p.

In order to improve the load balancing, we use $k$ virtual nodes for each device. Let $V'$ denote the set of $kn$ “virtual” nodes.

Each of these nodes gets an address from $[0,1)$ chosen i.u.r.

For address $A \in [0,1)$, re-define
\[ \mathrm{succ}(A) = \begin{cases} \arg\min\{a(i) \ge A \mid i \in V'\} & \text{if } \exists i \in V': a(i) \in [A,1), \\ \arg\min\{a(i) \ge 0 \mid i \in V'\} & \text{otherwise.} \end{cases} \]

Object $x \in U$ is mapped to node $\mathrm{succ}(h(x))$ and stored on the device to which this node belongs.

Let $W_i$ denote the weight of device $i$, i.e., the sum of the lengths of the intervals corresponding to $i$’s nodes, and $W = \max_{i \in [n]} W_i$.
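To get a feeling for the effect of virtual nodes, the following sketch draws $kn$ random addresses and sums up the interval lengths per device. The function name, the seed handling, and the use of Python's pseudo-random generator in place of i.u.r. addresses are assumptions made for illustration.

```python
import random

def device_weights(n, k, seed=0):
    """Weights W_0, ..., W_{n-1} for n devices with k virtual nodes each.

    Assumes n * k >= 2 so that every node has a distinct predecessor.
    """
    rng = random.Random(seed)
    # one (address, device) pair per virtual node, sorted along the ring
    nodes = sorted((rng.random(), dev) for dev in range(n) for _ in range(k))
    weights = [0.0] * n
    for idx, (addr, dev) in enumerate(nodes):
        pred_addr = nodes[idx - 1][0]            # index -1 wraps to the last node
        weights[dev] += (addr - pred_addr) % 1.0  # clockwise interval length
    return weights
```

Comparing `n * max(device_weights(n, 1))` with `n * max(device_weights(n, k))` for $k \approx \log n$ typically shows the maximum weight moving closer to the ideal value $1/n$; this is an empirical observation, not a proof.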


Improved quality of the load balancing

Theorem

For any $k \ge 1$, $W = \frac{1}{n} \cdot O\left(1 + \frac{\log n}{k}\right)$, w.h.p.

Corollary

If $k \ge \log n$ then $W = O\left(\frac{1}{n}\right)$, w.h.p.

Proof of the Theorem:

Consider device $j$ and suppose the addresses of the $k$ nodes of this device are fixed arbitrarily.

For any $t \in [0,1)$, we want to upper-bound $\Pr[W_j \ge t]$.

Improved quality of the load balancing

Exact condition:

The event $W_j \ge t$ happens if and only if there are $k$ intervals left of the $k$ addresses of $j$’s nodes so that

these intervals have a total length of $t$, and

none of the other $k(n-1)$ nodes have an address that falls into these intervals.

In order to be able to enumerate all possibilities for choosing these $k$ intervals, we look at a slightly stronger necessary condition for the event $W_j \ge t$.

Necessary condition:

If the event $W_j \ge t$ happens then there are $k$ intervals left of the $k$ addresses of $j$’s nodes so that

the length of each of these intervals is a multiple of $\frac{1}{kn}$,

these intervals have a total length of $t'$, where $t'$ is the largest multiple of $\frac{1}{kn}$ such that $t' \le t - \frac{1}{n}$, and

none of the other $k(n-1)$ nodes have an address that falls into these intervals.

Improved quality of the load balancing

The number of possibilities to choose these intervals corresponds to the number of possibilities to choose $k$ non-negative integers $q_1, \dots, q_k$ such that $\sum_{i=1}^{k} q_i = q$, for $q = t'kn$.

The $q_i$’s can be encoded bijectively into binary strings with $k-1$ ones and $q$ zeros.

Thus, the number of possibilities to choose the $q_i$’s and, hence, the intervals is at most
\[ \binom{q+k-1}{k-1} \le \binom{q+k}{k} \le \left(\frac{e(q+k)}{k}\right)^k. \]

Now $q + k = t'kn + k \le \left(t - \frac{1}{n}\right)kn + k = tkn$, so that this number is at most
\[ \left(\frac{etkn}{k}\right)^k = (etn)^k. \]

Improved quality of the load balancing

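Once the intervals are fixed, the probability that these intervals with a total length of $t' \ge t - \frac{2}{n}$ are not hit by any of the other $k(n-1)$ addresses is at most
\[ e^{-t'k(n-1)} \le e^{-\left(t - \frac{2}{n}\right)k(n-1)}, \]
which follows analogously to the lemma bounding $\Pr[\text{no point in } T]$.

This gives
\[ \Pr[W_j \ge t] \le (etn)^k \cdot e^{-\left(t - \frac{2}{n}\right)k(n-1)}. \]
Now choose $t = \frac{\beta}{n}$, where the value for $\beta$ will be specified later.

This gives $etn = e\beta$ and
\[ \left(t - \frac{2}{n}\right)(n-1) \ge \left(t - \frac{2}{n}\right)\frac{n}{2} = \frac{\beta}{2} - 1, \]
assuming $n \ge 2$.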

Improved quality of the load balancing

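Consequently,
\[ \Pr\left[W_j \ge \frac{\beta}{n}\right] \le \left(e\beta \cdot e^{-\beta/2+1}\right)^k = \left(e\beta \cdot e^{-\beta/2+1} \cdot \left(\frac{4}{3}\right)^{\beta}\right)^k \left(\frac{3}{4}\right)^{\beta k}. \]

Now observe that $e\beta \cdot e^{-\beta/2+1} \cdot \left(\frac{4}{3}\right)^{\beta}$ decreases exponentially in $\beta$ since $e^{1/2} > \frac{4}{3}$. For $\beta \ge 25$, this term is less than 1. Consequently,
\[ \Pr\left[W_j \ge \frac{\beta}{n}\right] \le \left(\frac{3}{4}\right)^{\beta k} \le \left(\frac{3}{4}\right)^{(\alpha+1)\log_{4/3} n} = n^{-(\alpha+1)}, \]
for $\beta \ge \frac{(\alpha+1)\log_{4/3} n}{k}$. It follows that $\Pr\left[W \ge \frac{\beta}{n}\right] \le n^{-\alpha}$, for $\beta = O\left(1 + \frac{\log n}{k}\right)$, which proves the theorem.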

Overlay network

Now we connect the nodes from the consistent hashing scheme by an overlay network called Chord running on top of the Internet.

Each node holds a so-called finger table, i.e., a table with the IP addresses of only a few other nodes.

We say that node $v$ has a link to node $u$ if $u$’s IP address is stored in the finger table of $v$. (For practical purposes it might be useful that nodes store not only the addresses of outgoing links but also those of incoming links.)

The Chord network allows devices to enter and leave the system dynamically and supports the efficient search for data objects.

Definition of the Chord edges

Let $V$ denote the set of (virtual) nodes at some point of time.

The set of links (directed edges) is defined by
\[ E = \{ \underbrace{(v, \mathrm{succ}(a(v) + 2^{-i}))}_{=:\, e(v,i)} \mid v \in V, i \in \mathbb{N} \}. \]

The parameter $i$ is called the index of the link.

Observe that the set of links is finite. For $v \in V$, let $d(v)$ denote the smallest integer such that
\[ \forall i \in \mathbb{N}, i \ge d(v): \; e(v,i) = (v, \mathrm{succ}(a(v))). \]
The outdegree of $v$ is at most $d(v)$. Let $D = \max\{d(v) \mid v \in V\}$.

Upper-bounding the outdegree

Theorem

$D = O(\log n)$, w.h.p., where $n = |V|$.

Proof:

Consider any node $v \in V$.

Let $\ell(v)$ denote the length of the interval (ring segment) from $a(v)$ to $a(\mathrm{succ}(a(v)))$.

All edges $e(v,i)$ with $i \ge d(v)$ point to $\mathrm{succ}(a(v))$.

In particular, it holds that $2^{-d(v)} \le \ell(v)$, which gives
\[ d(v) = \left\lceil \log \frac{1}{\ell(v)} \right\rceil. \]

Upper-bounding the outdegree

Fix the address $a(v)$ on the ring arbitrarily.

For any $\beta \in [0,1]$,
\[ \Pr[\ell(v) \le \beta] \le (n-1)\beta \le n\beta, \]
because $\ell(v) \le \beta$ holds only if at least one of the other $n-1$ nodes falls into the interval $[a(v), a(v) + \beta)$ which, for each of these nodes, happens with probability $\beta$.

Upper-bounding the outdegree

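Now let $\alpha > 0$ be chosen arbitrarily. We obtain
\[ \Pr[d(v) \ge (\alpha+3)\log n] \le \Pr\left[\left\lceil \log \frac{1}{\ell(v)} \right\rceil \ge (\alpha+3)\log n\right] \le \Pr\left[\log \frac{1}{\ell(v)} \ge (\alpha+2)\log n\right] \]
\[ = \Pr\left[\ell(v) \le n^{-(\alpha+2)}\right] \le n \cdot n^{-(\alpha+2)} = n^{-(\alpha+1)}. \]
Hence, by a union bound over all $n$ nodes, the probability that there exists a node $v \in V$ for which $d(v) \ge (\alpha+3)\log n$ is at most $n^{-\alpha}$.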

Routing in Chord

Suppose a node $v$ (or the device corresponding to $v$) wants to access a data object $x$.

The object can be found by applying the following routing algorithm:

First, $v$ checks whether $\mathrm{succ}(h(x)) = v$. If yes, then stop.

Otherwise, $v$ sends a message along the outgoing link with smallest index such that the link does not overleap $h(x)$ on the ring $[0,1)$.

The node receiving the message continues the routing in the same fashion recursively until the node holding $x$ is found.

The number of hops needed for finding an object is at most $D$ and, thus, $O(\log n)$, w.h.p., because the index of the outgoing links is increasing with every hop on the routing path.
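The routing rule can be sketched as a small simulation. This is one possible reading of the rule, stated as an assumption: a link with index $i$ “does not overleap” $h(x)$ if the jump point $a(v) + 2^{-i}$ lies at or before $h(x)$ in clockwise direction. Nodes are identified by their position in a sorted address list, and `succ` is evaluated with global knowledge, which a real node would replace by its finger table. The constant `MAX_INDEX` is a hypothetical cap on the link index.

```python
import bisect
import random

MAX_INDEX = 64  # assumption: links with a larger index all point to the successor

def succ(addrs, A):
    """Index of the first node at or after address A on the ring."""
    idx = bisect.bisect_left(addrs, A % 1.0)
    return idx % len(addrs)

def route(addrs, start, key):
    """Route from node index `start` to the node responsible for `key`.

    Returns (index of the responsible node, number of hops).
    """
    owner = succ(addrs, key)
    current, hops = start, 0
    while current != owner:
        dist = (key - addrs[current]) % 1.0   # clockwise distance to the key
        # smallest index i whose jump 2^-i does not overleap the key
        i = next((j for j in range(1, MAX_INDEX + 1) if 2.0 ** -j <= dist), None)
        if i is None:
            current = (current + 1) % len(addrs)   # fall back to the successor link
        else:
            current = succ(addrs, addrs[current] + 2.0 ** -i)
        hops += 1
    return current, hops
```

Each hop that does not reach the responsible node more than halves the clockwise distance to the key, so the chosen link index grows from hop to hop, in line with the argument above.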


Oblivious Randomized Routing

Definition

One specifies a path system $W$ containing a set of paths $W_{u,v}$ from $u$ to $v$ together with a probability distribution $D_{u,v}: W_{u,v} \to [0,1]$, for every possible source-destination pair $(u,v) \in V^2$.

For each packet with source $u$ and destination $v$, one chooses a path $P \in W_{u,v}$ independently at random with probability $D_{u,v}(P)$ and forwards the packet along $P$ to its destination.

Example:

For any two nodes $u,v \in V$, $u \ne v$, one specifies two alternative paths, that is, $|W_{u,v}| = 2$.

Let $D_{u,v}$ denote the uniform distribution on $W_{u,v}$.

When sending a packet from $u$ to $v$, choose $P \in W_{u,v}$ with probability $D_{u,v}(P) = 1/2$.

Packet Scheduling Problem and Scheduling Policies

Definition (Packet Scheduling Problem)

Input: a collection of paths $\mathcal{P}$, one for each packet.

Task: One needs to specify which packet should be forwarded along which edge in which time step.

We will address the packet scheduling problem by describing a scheduling policy specifying which packet can go first and which packets have to wait if two or more packets are contending for the same edge.

Examples:

FCFS (first-come-first-serve)

FTG (farthest-to-go)

Random Rank (as defined later)

A scheduling policy is called greedy if a packet $p$ has to wait in a step $t$ before using the next edge $e$ on its path only because there is another packet $p'$ using $e$ in this step.

We say that $p$ is delayed by $p'$ at edge $e$ in time step $t$.

Congestion and Dilation

Definition (Dilation)

The dilation $D$ of a path collection $\mathcal{P}$ is the length (number of edges) of the longest path in $\mathcal{P}$.

Definition (Congestion)

The congestion $C$ of a path collection $\mathcal{P}$ is the maximum number of paths from $\mathcal{P}$ that share the same edge (in the same direction).

In the following, we assume that every undirected edge is replaced by two edges in opposite directions.

For a (directed) edge $e \in E$, $C(e)$ denotes the number of paths from $\mathcal{P}$ that contain $e$.

The congestion is thus defined by $C = \max_{e \in E} C(e)$.
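Both quantities are easy to compute for a concrete path collection. The sketch below assumes that each path is given as a sequence of nodes, so that consecutive nodes form a directed edge; the function name is illustrative.

```python
from collections import Counter

def congestion_and_dilation(paths):
    """Return (C, D) for a collection of paths given as node sequences."""
    load = Counter()   # C(e) for every directed edge e
    dilation = 0
    for path in paths:
        edges = list(zip(path, path[1:]))
        load.update(edges)
        dilation = max(dilation, len(edges))
    congestion = max(load.values(), default=0)
    return congestion, dilation
```

For example, the paths `[0, 1, 2, 3]`, `[4, 1, 2, 5]` and `[6, 1, 2]` all use the directed edge `(1, 2)`, while `[2, 1]` uses the opposite direction and is counted separately.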


Trivial bounds on the routing time

Observation (Lower Bound)

The routing time needed by any scheduling policy is at least $\max\{C,D\} = \Omega(C+D)$ because

there is a packet which has path length $D$ and thus needs at least $D$ steps to reach its destination, and

there is an edge that needs to forward at least $C$ packets, which takes at least $C$ steps.

Observation (Upper Bound)

The routing time needed by any greedy scheduling policy is at most $C \cdot D$ steps because each packet can be delayed for at most $C-1$ steps on each edge on its routing path.

Valiant’s trick

We study permutation routing on the $d$-dimensional hypercube with $n = 2^d$ nodes.

For each packet $p$ with source node $s_p$ and destination $d_p$, we pick a node $v_p$ independently, uniformly at random from $V$.

The packet is routed first along a bit-fixing path from $s_p$ to $v_p$. Then the packet is routed along a bit-fixing path from $v_p$ to $d_p$. The node $v_p$ is thus used as intermediate destination.

Valiant’s trick

To simplify our analysis, we assume that Valiant’s routing algorithm proceeds in two phases:

Phase 1: All packets are routed from their source nodes to their intermediate destinations.

Phase 2: All packets are routed from their intermediate destinations to their final destinations.

Valiant’s trick reduces a “worst-case permutation routing problem” to two “random routing problems”: one with randomly picked destination nodes (phase 1) and one with randomly picked source nodes (phase 2).

Observe that Valiant’s trick follows the paradigm of randomized oblivious routing.
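Path selection with Valiant’s trick can be sketched as follows. The helper `bit_fixing_path` corrects bits from the highest dimension downwards, which is an assumption about the bit order; the random generator `rng` is passed in explicitly so that the choice of the intermediate destination is reproducible.

```python
import random

def bit_fixing_path(source, dest, d):
    """Bit-fixing path as a node sequence (highest dimension first)."""
    path, current = [source], source
    for i in reversed(range(d)):
        if (current ^ dest) >> i & 1:
            current ^= 1 << i
            path.append(current)
    return path

def valiant_path(source, dest, d, rng):
    """Two-phase path: source -> random intermediate node -> dest."""
    intermediate = rng.randrange(1 << d)               # chosen i.u.r. from V
    phase1 = bit_fixing_path(source, intermediate, d)
    phase2 = bit_fixing_path(intermediate, dest, d)
    return phase1 + phase2[1:]                         # do not repeat the intermediate node
```

The path of a packet depends only on its own source, destination and random choice, so the scheme is oblivious; its length is at most $2d$.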


Analyzing a random routing problem

In the following, we present an analysis of phase 1.

The same result can be shown for phase 2.

Lemma

The congestion $C$ in phase 1 (phase 2) is $O(\log n / \log\log n)$, w.h.p.

Proof of the lemma

Let $e$ be an edge of dimension $i$, i.e., an edge that flips the $i$-th bit.

Let $IN(e)$ be the set of nodes from which $e$ is reachable by a bit-fixing path. It holds that $|IN(e)| = 2^{d-i-1}$.

Let $OUT(e)$ be the set of nodes that are reachable from $e$ by a bit-fixing path. It holds that $|OUT(e)| = 2^i$.

Fix any node $v \in IN(e)$. The path of the packet starting at $v$ contains $e$ only if the packet’s intermediate destination is in $OUT(e)$.

As intermediate destinations are picked uniformly at random,
\[ \Pr[v\text{’s packet traverses } e] \le \frac{|OUT(e)|}{n} = \frac{2^i}{2^d} = 2^{i-d}. \]

Proof of the lemma

For a subset $X \subseteq IN(e)$, let $A(X,e)$ denote the event that the paths of all packets starting from $X$ contain $e$.

Let $C(e)$ be a random variable describing the congestion at edge $e$, i.e., $C(e)$ is the number of paths containing $e$.

Let $k$ be any natural number.

\[ \Pr[C(e) \ge k] = \Pr[\exists X \subseteq IN(e), |X| = k: A(X,e)] \le \sum_{X \subseteq IN(e), |X| = k} \Pr[A(X,e)] \quad \text{(union bound)} \]
\[ \le \sum_{X \subseteq IN(e), |X| = k} \left(2^{i-d}\right)^k = \binom{|IN(e)|}{k} \left(2^{i-d}\right)^k. \]

Proof of lemma

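Binomial coefficients can be estimated by
\[ \left(\frac{a}{b}\right)^b \le \binom{a}{b} \le \left(\frac{e \cdot a}{b}\right)^b, \]
where $e = 2.71\ldots$ is Euler’s number.

This gives
\[ \Pr[C(e) \ge k] \le \left(\frac{e|IN(e)|}{k}\right)^k \left(2^{i-d}\right)^k = \left(\frac{e \, 2^{d-i-1}}{k}\right)^k \left(2^{i-d}\right)^k = \left(\frac{e}{2k}\right)^k. \]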

Proof of lemma

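The congestion is defined to be $C = \max\{C(e) \mid e \in E\}$.

\[ \Pr[C \ge k] = \Pr[\exists e \in E: C(e) \ge k] \le \sum_{e \in E} \Pr[C(e) \ge k] \le |E| \left(\frac{e}{2k}\right)^k \le n^2 \left(\frac{1}{2}\right)^k. \]

The last bound follows from $|E| \le dn \le n^2$ and $\frac{e}{2k} \le \frac{1}{2}$, where we assume $k \ge 3$.

Now we choose $k$ such that $\Pr[C \ge k] \le n^{-\alpha}$, for constant $\alpha > 0$.

In particular, we set $k = \lceil (\alpha+2)\log n \rceil \ge 3$, which gives
\[ \Pr[C \ge k] \le n^2 \, 2^{-(\alpha+2)\log n} = n^2 \, n^{-(\alpha+2)} = n^{-\alpha}, \]
which shows $C = O(\log n)$, w.h.p.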

Proof of lemma

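We have shown $C = O(\log n)$, w.h.p., which is slightly weaker than the bound in the lemma.

In order to show $C = O(\log n / \log\log n)$, w.h.p., we need to choose $k$ in a more clever way.

We set
\[ k = \max\left\{ \frac{e}{2}\sqrt{d}, \; 2(\alpha+2)\frac{d}{\log d} \right\} = O\left(\frac{\log n}{\log\log n}\right), \]
which gives
\[ \Pr[C \ge k] \le n^2 \left(\frac{e}{2k}\right)^k \le n^2 \left(\frac{1}{\sqrt{d}}\right)^k \le n^2 \left(\frac{1}{\sqrt{d}}\right)^{\frac{2}{\log d}(\alpha+2)d} = n^2 \left(\frac{1}{2}\right)^{(\alpha+2)d} = n^2 \cdot n^{-(\alpha+2)} = n^{-\alpha}. \]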

Congestion of h-relations

Definition ($h$-to-$h$ routing problem)

An $h$-relation is a routing problem in which every node is the source of $h$ packets and the destination of $h$ packets.

Observe that a “1-relation” is a “permutation routing problem”.

Lemma

Suppose we use Valiant’s trick for routing an arbitrary $h$-relation on the hypercube. The congestion $C$ is $O(\log n + h)$, w.h.p.

Proof: Exercise

Scheduling on the hypercube

We study the problem of forwarding packets along prespecified paths on the $d$-dimensional hypercube.

Theorem

Suppose we are given a set of packets, each of which comes with a bit-fixing path along which it should be sent from its source to its destination.

Let $C$ denote the congestion of the paths.

There is a distributed, randomized scheduling protocol that delivers all packets in time $O(C + \log n)$, w.h.p.

Combining this result with Valiant’s trick gives:

Corollary

There is a distributed algorithm that routes any $h$-relation in time $O(h + \log n)$, w.h.p., on the hypercube.

Randomized scheduling policy

The random rank protocol:

Let $R$ denote a sufficiently large integer whose value will be specified later.

Every packet $p$ is assigned, independently and uniformly at random, a rank $r(p) \in [R]$.

In addition, every packet is assumed to have a unique integer identification number (id).

If two or more packets contend for the same edge in a step, then the one with smallest rank is forwarded and the others have to wait.

In case of equal ranks, packets with smaller ids are preferred.
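The protocol can be sketched as a small synchronous simulation. The assumptions are: paths are given as node sequences, the packet id is its position in the list `paths`, and $[R]$ is taken to be $\{0, \dots, R-1\}$. The function name and the `seed` parameter are illustrative.

```python
import random
from collections import defaultdict

def random_rank_schedule(paths, R, seed=0):
    """Simulate the random rank protocol; return the number of steps T."""
    rng = random.Random(seed)
    ranks = [rng.randrange(R) for _ in paths]   # r(p), chosen i.u.r. from [R]
    pos = [0] * len(paths)                      # current position on each path
    steps = 0
    while any(pos[p] < len(paths[p]) - 1 for p in range(len(paths))):
        # collect the packets contending for each directed edge in this step
        contenders = defaultdict(list)
        for p, path in enumerate(paths):
            if pos[p] < len(path) - 1:
                contenders[(path[pos[p]], path[pos[p] + 1])].append(p)
        # every edge forwards the packet with the smallest (rank, id)
        for packets in contenders.values():
            winner = min(packets, key=lambda p: (ranks[p], p))
            pos[winner] += 1
        steps += 1
    return steps
```

Since the policy is greedy, the resulting routing time always lies between $\max\{C, D\}$ and $C \cdot D$ for paths that use each edge at most once.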


Delay sequence analysis

Our analysis uses the following witness structure.

Definition (delay sequence)

A delay sequence $DS$ of length $s$ consists of

1 a delay path $P = (e(1), \dots, e(L))$, $1 \le L \le d$, with edges of increasing dimension (like a bit-fixing path in reverse order);

2 $s$ numbers $\ell_1, \dots, \ell_s \in \{1, \dots, L\}$ with $\ell_1 \le \ell_2 \le \dots \le \ell_s$;

3 $s+1$ distinct delay packets $p_0, p_1, \dots, p_s$ such that, for $1 \le i \le s$, edge $e(\ell_i)$ is contained in the paths of packet $p_{i-1}$ and packet $p_i$;

4 $s+1$ numbers $k_0, k_1, \dots, k_s \in [R]$ with $k_0 \ge k_1 \ge \dots \ge k_s$.

Definition (active delay sequence)

$DS$ is called active if $r(p_i) = k_i$, for $0 \le i \le s$.

Delay sequence analysis

Lemma

If the random rank protocol needs $T > d$ steps, then there exists an active $DS$ of length at least $T - d$.

Proof:

Consider any packet arriving at its destination in step $T$. As $T > d$, this packet must have been delayed for at least one step. We call this packet $p_0$.

We follow the path of $p_0$ backwards from its destination until we reach an edge where it has been delayed by a packet that we call $p_1$.

Now we follow $p_1$ backwards through time until we reach a time step where this packet has been delayed before by another packet that we call $p_2$ (possibly at the same edge).

Next we follow packet $p_2$ and so on until we reach a packet $p_s$, $s \ge 1$, that was not delayed before. We follow this packet back to its source.

Our tour backwards through time covers $T$ steps and we observed $s$ delays. Let $L$ denote the number of edges on the recorded path.

Delay sequence analysis

From this tour backwards through time, we can now construct an active $DS$ as follows.

1 The path that we have recorded by this process, in reverse order, gives us the delay path $P = (e(1), \dots, e(L))$.

2 The packets $p_0, \dots, p_s$ are defined to be the delay packets. By our construction, these packets are distinct.

3 For $1 \le i \le s$, we choose $\ell_i \in \{1, \dots, L\}$ so that $e(\ell_i)$ is the edge on which $p_{i-1}$ was delayed by $p_i$.

4 Observe that the path of $p_{i-1}$ and the path of packet $p_i$ traverse edge $e(\ell_i)$, and $\ell_1 \le \ell_2 \le \dots \le \ell_s$.

5 For $0 \le i \le s$, we set $k_i = r(p_i)$. Observe that this gives $k_0 \ge k_1 \ge \dots \ge k_s$, as packet $p_{i-1}$ is delayed by packet $p_i$ and the protocol prefers packets with smaller rank.

This ends the proof of the lemma.

Delay sequence analysis

Now we bound the probability that there exists an active $DS$. Our analysis begins with counting delay sequences.

Lemma

The number of delay sequences of length $s$ is at most
\[ n^2 \cdot \binom{d-1+s}{s} \cdot C^{s+1} \cdot \binom{R+s}{s+1}. \]

Proof:

1) Counting delay paths:

The number of ways to choose a delay path is at most $n(n-1) \le n^2$, as this path corresponds to a bit-fixing path (in reverse order) that is determined by specifying the first and the last node on the path.

Delay sequence analysis

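The number of delay sequences of length $s$ is at most $n^2 \cdot \binom{d-1+s}{s} \cdot C^{s+1} \cdot \binom{R+s}{s+1}$.

2) Counting the ways to choose the $\ell_i$’s and the $k_i$’s:

How many ways are there to choose the integers $\ell_1, \dots, \ell_s$ such that $1 \le \ell_1 \le \ell_2 \le \dots \le \ell_s \le d$?

These integers can be encoded into a binary string as follows:
\[ 0^{\ell_1 - 1} \, 1 \, 0^{\ell_2 - \ell_1} \, 1 \, 0^{\ell_3 - \ell_2} \, 1 \cdots 1 \, 0^{\ell_s - \ell_{s-1}} \, 1 \, 0^{d - \ell_s}. \]

Observe that this string contains $s$ ones and the number of zeros in this string is
\[ \ell_1 - 1 + \left( \sum_{i=2}^{s} (\ell_i - \ell_{i-1}) \right) + d - \ell_s = d - 1. \]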

Delay sequence analysis

Consequently, there is a one-to-one mapping between the $\ell_i$’s and the binary strings with $d-1$ zeros and $s$ ones. Hence, the number of ways to choose the $\ell_i$’s corresponds to the number of such strings, which is
\[ \binom{d-1+s}{s}. \]

Analogously, the number of ways to choose $k_0, \dots, k_s \in [R]$ such that $k_0 \ge k_1 \ge \dots \ge k_s$ is equal to the number of binary strings consisting of $R-1$ zeros and $s+1$ ones, which is
\[ \binom{R+s}{s+1}. \]
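The counting argument can be checked by brute force for small parameters. The sketch below enumerates all non-decreasing sequences $\ell_1 \le \dots \le \ell_s$ in $\{1, \dots, d\}$ and encodes each one into the binary string from the proof; the function names are illustrative.

```python
from itertools import combinations_with_replacement
from math import comb

def count_nondecreasing(s, d):
    """Number of sequences 1 <= l_1 <= ... <= l_s <= d, by enumeration."""
    return sum(1 for _ in combinations_with_replacement(range(1, d + 1), s))

def encode(ells, d):
    """Encode l_1 <= ... <= l_s as a binary string with d-1 zeros and s ones."""
    parts, prev = [], 1
    for l in ells:
        parts.append("0" * (l - prev) + "1")   # gap to the previous value, then a one
        prev = l
    parts.append("0" * (d - prev))             # trailing zeros up to d
    return "".join(parts)
```

For $s = 3$ and $d = 4$, the enumeration agrees with `comb(d - 1 + s, s)`, and all encodings are distinct strings of length $d - 1 + s$.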

Delay sequence analysis

3) Counting the ways to choose delay packets:

Now suppose that the delay path $P$ and the $\ell_i$’s are fixed.

Then, for each delay packet, we know an edge that is contained in its path: In particular, we know that packet $p_i$, for $1 \le i \le s$, uses edge $e(\ell_i)$ and packet $p_0$ uses edge $e(\ell_1)$.

How many possibilities are there to choose a packet whose path leads through a known edge? At most $C$, since each edge is contained in the paths of at most $C$ packets.

Hence, there are at most $C$ possibilities to choose $p_i$ and, hence, at most $C^{s+1}$ possibilities to choose all delay packets $p_0, \dots, p_s$.

Delay sequence analysis

Lemma

The probability that a given $DS$ of length $s$ is active is $R^{-(s+1)}$.

Proof:

For every delay packet $p_i$, the probability that the packet’s rank is $k_i$ is $1/R$ because ranks are chosen uniformly at random from $[R]$.

Thus, the probability that all $s+1$ delay packets have the prescribed ranks is $R^{-(s+1)}$ because the ranks of different packets are chosen independently.

Delay sequence analysis

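By the first lemma, if the algorithm needs $T \ge d+s$ steps, then there exists an active delay sequence of length at least $s$.

Cutting this sequence after packet $p_s$ gives an active delay sequence of length exactly $s$.

Let $\mathcal{DS}(s)$ denote the set of delay sequences of length $s$. It holds that
\[ \Pr[T \ge d+s] \le \Pr[\exists DS \in \mathcal{DS}(s): DS \text{ is active}] \le \sum_{DS \in \mathcal{DS}(s)} \Pr[DS \text{ is active}] \]
\[ = \sum_{DS \in \mathcal{DS}(s)} R^{-(s+1)} \quad \text{(by the lemma on the probability of being active)} \]
\[ \le n^2 \cdot \binom{d-1+s}{s} \cdot C^{s+1} \cdot \binom{R+s}{s+1} \cdot R^{-(s+1)} \quad \text{(by the counting lemma).} \]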

Delay sequence analysis

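We have so far:
\[ \Pr[T \ge d+s] \le n^2 \cdot \binom{d-1+s}{s} \cdot C^{s+1} \cdot \binom{R+s}{s+1} \cdot R^{-(s+1)}. \]

Using $\binom{a}{b} \le 2^a$ and $\binom{a}{b} \le \left(\frac{ea}{b}\right)^b$ to upper-bound the binomial coefficients, we obtain
\[ \Pr[T \ge d+s] \le n^2 \cdot 2^{d-1+s} \cdot C^{s+1} \cdot \left(\frac{e(R+s)}{s+1}\right)^{s+1} \cdot R^{-(s+1)} \le n^3 \cdot \left(\frac{2eC(R+s)}{(s+1)R}\right)^{s+1}, \]
where the last step uses $2^d = n$.

Choosing $R \ge s$ yields $R + s \le 2R$ and, hence,
\[ \Pr[T \ge d+s] \le n^3 \cdot \left(\frac{4eC}{s+1}\right)^{s+1}. \]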

Delay sequence analysis

We have so far:
\[ \Pr[T \ge d+s] \le n^3 \cdot \left(\frac{4eC}{s+1}\right)^{s+1}. \]

Let us choose $s = \lceil \max\{8eC, (\alpha+3)\log n\} \rceil - 1 = O(C + \log n)$.

This gives
\[ \Pr[T \ge d+s] \le n^3 \cdot \left(\frac{1}{2}\right)^{s+1} \le n^3 \cdot \left(\frac{1}{2}\right)^{(\alpha+3)\log n} = n^{-\alpha}. \]

Hence, with probability at least $1 - n^{-\alpha}$, the random rank protocol needs at most $d + s - 1 = O(C + \log n)$ steps.

This ends the proof of the theorem.

Scheduling along shortest paths in general networks

Consider any $n$-node network $G = (V,E)$. (Recall that edges are assumed to be directed. In order to represent an undirected network, one replaces each edge by two directed edges in opposite directions.) We study the problem of forwarding packets along arbitrary shortest paths in $G$.

Theorem

Suppose we are given a set of $N \ge n$ packets, each of which comes with a shortest path in $G$ along which it should be sent from its source to its destination.

Let $C$ and $D$ denote the congestion and the dilation of the paths, respectively.

There is a distributed, randomized scheduling protocol that delivers all packets in time $O(C + D + \log N)$, w.h.p.

Randomized scheduling policy with increasing ranks

The growing rank protocol:

Let $R$ denote a sufficiently large integer that is a multiple of $D$.

Every packet $p$ is assigned, independently and uniformly at random, a rank $r(p) \in [R]$.

Whenever the packet moves along an edge, its rank is increased by the value $R/D$.

In addition, every packet is assumed to have a unique integer identification number (id).

If two or more packets contend for the same edge in a step, then the one with smallest rank is forwarded and the others have to wait.

In case of equal ranks, packets with smaller ids are preferred.

Randomized scheduling policy

Observation

As the initial rank is at most $R-1$ and the rank of a packet is increased at most $D$ times by $R/D$, the final rank of a packet is at most $2R-1$.

Let $r_e(p) \in [2R]$ denote the rank of packet $p$ in those time steps in which $p$ contends for being forwarded along edge $e$.
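The rank evolution of a single packet can be traced with a few lines of Python. The sketch assumes that $[R]$ denotes $\{0, \dots, R-1\}$ and that the packet's path has at most $D$ edges; the function name is illustrative.

```python
import random

def rank_trajectory(path_length, R, D, rng):
    """Ranks of one packet before each edge of its path and at its destination."""
    assert R % D == 0 and path_length <= D
    rank = rng.randrange(R)        # initial rank, chosen i.u.r. from [R]
    ranks = [rank]
    for _ in range(path_length):
        rank += R // D             # the rank grows by R/D with every traversed edge
        ranks.append(rank)
    return ranks
```

The entry `ranks[j]` plays the role of $r_e(p)$ for the $(j+1)$-th edge $e$ on the path, and the last entry never exceeds $2R-1$.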


Delay sequence analysis

We adapt the definition of a delay sequence as follows.

Definition (delay sequence)

A delay sequence $DS$ of length $s$ consists of

1 a delay path $P = (e(1), \dots, e(L))$, for $L \le 2D$, with edges in reverse direction, that is, $(e(L), \dots, e(1))$ is a path in $G$;

2 $s$ numbers $\ell_1, \dots, \ell_s \in \{1, \dots, L\}$ with $\ell_1 \le \ell_2 \le \dots \le \ell_s$;

3 $s+1$ distinct delay packets $p_0, p_1, \dots, p_s$ such that, for $1 \le i \le s$, edge $e(\ell_i)$ is contained in the paths of packet $p_{i-1}$ and packet $p_i$;

4 $s+1$ numbers $k_0, k_1, \dots, k_s \in [2R]$ with $k_0 \ge k_1 \ge \dots \ge k_s$.

Definition (active delay sequence)

$DS$ is active if $r_{e(\ell_i)}(p_i) = k_i$, for $1 \le i \le s$, and $r_{e(\ell_1)}(p_0) = k_0$.

Delay sequence analysis

Lemma

If the growing rank protocol needs $T > 2D$ steps, then there exists an active $DS$ of length at least $T - 2D$.

Proof:

Consider any packet arriving at its destination in step $T$. As $T > 2D$, this packet must have been delayed for at least one step. We call this packet $p_0$. We follow the path of $p_0$ backwards through time from its destination until we reach an edge where it has been delayed by a packet that we call $p_1$.

Now we follow $p_1$ backwards through time until we reach a time step where this packet has been delayed before by another packet that we call $p_2$, and so on,

until we reach a packet $p_s$, for some $s \ge 1$, that was not delayed before. We follow this packet back to its source.

Delay sequence analysis

From this tour backwards through time, we can now construct an active DS as follows.

The path that we have recorded by this process in reverse order gives us the delay path $P = (e(1), \ldots, e(L))$.

The packets $p_0, \ldots, p_s$ are defined to be the delay packets.

For $1 \le i \le s$, we choose $\ell_i \in \{1, \ldots, L\}$ so that $e(\ell_i)$ is the edge on which $p_{i-1}$ was delayed by $p_i$.

For $1 \le i \le s$, we set $k_i = r_{e(\ell_i)}(p_i)$, and $k_0 = r_{e(\ell_1)}(p_0)$.
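The tour backwards through time can be sketched as a loop over the steps $T, T-1, \ldots, 1$. The trace format is an assumption made for this example: `moves[(p, t)]` is the edge that packet `p` crossed in step `t`, and `delays[(p, t)] = (e, q)` records that `p` waited at edge `e` in step `t` because `q` was forwarded. The sketch further assumes that a followed packet either moves or is delayed in every step, and it omits the ranks $k_i$.

```python
from typing import Dict, List, Tuple

Edge = str
Moves = Dict[Tuple[int, int], Edge]               # (packet id, step) -> edge crossed
Delays = Dict[Tuple[int, int], Tuple[Edge, int]]  # (packet id, step) -> (edge, delaying packet)


def backward_tour(T: int, p0: int, moves: Moves, delays: Delays):
    """Walk backwards from step T, starting with packet p0.

    Returns (path, ells, packets): the delay path e(1..L), the 1-based
    positions l_1..l_s, and the ids of the delay packets p_0..p_s.
    """
    path: List[Edge] = []
    ells: List[int] = []
    packets: List[int] = [p0]
    current = p0
    for t in range(T, 0, -1):
        if (current, t) in delays:
            # `current` waited at the edge it crossed later, which is the
            # most recently recorded edge of the delay path.
            _edge, delayer = delays[(current, t)]
            ells.append(len(path))
            packets.append(delayer)
            current = delayer
        else:
            path.append(moves[(current, t)])
    return path, ells, packets
```

In this sketch every step either appends an edge to the path or switches to a new delay packet, so the lengths of `path` and `ells` add up to `T`.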

Exercise: Show that the packets $p_0, \ldots, p_s$ are distinct, that is, no packet appears more than once in the delay sequence.⁵

⁵ This is the only part of the analysis where we need to assume that the paths of the packets are shortest paths in $G$.

Observe that $k_0 \ge k_1 \ge \cdots \ge k_s$ as the ranks of the delay packets do not increase on our tour. More specifically:

whenever we switch from packet $p_i$ to packet $p_{i+1}$ on our tour, the rank of $p_{i+1}$ is not larger than the rank of packet $p_i$ because $p_{i+1}$ delays $p_i$ and the protocol prefers packets with smaller rank, and

whenever we add an edge to the delay path and follow this edge, the rank of the currently observed packet is decreased (by $R/D$) as we follow the packet backwards in time.

It only remains to prove $L \le 2D$ and $s \ge T - 2D$.

The final rank of $p_0$ is at most $2R-1$.

During our tour backwards through time, the sequence of observed ranks is not increasing.

In particular, whenever we add an edge to the delay path, the rank of the packet that we follow is decreased by $R/D$.

Hence, the rank of packet $p_s$ at its source is at most $2R-1-L(R/D)$.

As ranks are non-negative, we obtain $(2R-1) - L(R/D) \ge 0$, which gives $L \le (2R-1)/(R/D) \le 2D$.

Finally, every step of the tour either adds an edge to the delay path or is a step in which the observed packet is delayed, so $T = L+s$, which implies $s = T-L \ge T-2D$.

Now we bound the probability that there exists an active DS. Our analysis begins with counting delay sequences.

Lemma

The number of delay sequences of length $s$ is at most

\[
\binom{2D-1+s}{s} \binom{2R+s}{s+1} N C^s .
\]

Proof:

Analogously to the analysis for the hypercube, the number of ways to choose the $\ell_i$'s and the $k_i$'s can be bounded by

\[
\binom{2D-1+s}{s} \binom{2R+s}{s+1} .
\]

Now we assume that the $\ell_i$'s are fixed and we count the number of ways to choose the delay packets and the edges on the delay path:

There are $N$ possibilities to choose packet $p_0$.

Once $p_0$ is fixed, we can construct the first part of the delay path from edge $e(1)$ up to edge $e(\ell_1)$ by following the path of $p_0$ backwards from its destination.

Now, as the path of $p_1$ contains $e(\ell_1)$, there are at most $C$ possibilities to choose $p_1$.

Once $p_1$ is fixed, we can determine the delay path up to $e(\ell_2)$.

As the path of $p_2$ contains $e(\ell_2)$, there are again at most $C$ possibilities to choose $p_2$, and so on until packet $p_s$.

Thus, the number of possibilities to choose the delay packets and to construct the delay path is at most $N C^s$.
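The counting bound from the lemma is easy to evaluate exactly. The helper below is an illustrative sketch; its name and parameter order are assumptions, while the symbols $s$, $D$, $R$, $N$ and $C$ are the ones used above.

```python
from math import comb


def delay_sequence_count_bound(s: int, D: int, R: int, N: int, C: int) -> int:
    """Upper bound on the number of delay sequences of length s.

    First factor: choices of l_1 <= ... <= l_s in {1, ..., 2D}.
    Second factor: choices of k_0 >= ... >= k_s in [2R].
    N * C**s: choices of the delay packets, which also fix the delay path.
    """
    return comb(2 * D - 1 + s, s) * comb(2 * R + s, s + 1) * N * C ** s
```

Python integers are unbounded, so the value stays exact even for large parameters.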

Lemma

The probability that a DS of length $s$ is active is at most $R^{-(s+1)}$.

Proof:

Suppose $e(\ell_i)$ is the $j$th edge on the path of packet $p_i$.

The rank of $p_i$ at edge $e(\ell_i)$ is equal to $k_i$ if its initial rank is equal to $k_i' = k_i - (j-1) \cdot R/D$, which happens with probability $1/R$ if $k_i' \in [R]$, and with probability 0 otherwise.

That is, the probability that the rank of $p_i$ at edge $e(\ell_i)$ is equal to $k_i$ is at most $1/R$.

Consequently, the probability that all $s+1$ delay packets have the prescribed rank is at most $R^{-(s+1)}$.
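The single-packet step of this proof can be verified by enumeration. The sketch assumes that the initial rank is drawn uniformly from $\{0, \ldots, R-1\}$ and grows by $R/D$ per traversed edge; the function name is illustrative.

```python
from fractions import Fraction


def prob_rank_equals(k: int, j: int, R: int, D: int) -> Fraction:
    """Probability that a packet has rank k at the j-th edge of its path.

    Counts the initial ranks r0 in {0, ..., R-1} with r0 + (j-1)*R/D == k.
    """
    growth = Fraction(R, D) * (j - 1)
    hits = sum(1 for r0 in range(R) if r0 + growth == k)
    return Fraction(hits, R)
```

For every choice of `k` and `j` at most one initial rank matches, which is exactly the $1/R$ bound used above.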

Now we proceed analogously to the analysis for the hypercube.

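\[
\begin{aligned}
\Pr[T \ge 2D+s] &\le \Pr[\exists\, DS \in \mathcal{DS}(s) : DS \text{ is active}] \\
&\le \sum_{DS \in \mathcal{DS}(s)} \Pr[DS \text{ is active}] \\
&\le \binom{2D-1+s}{s} \binom{2R+s}{s+1} N C^s R^{-(s+1)} \\
&\le 2^{2D-1+s} \left( \frac{e(2R+s)}{s+1} \right)^{s+1} N C^s R^{-(s+1)} \\
&\le 2^{2D} \left( \frac{6Ce}{s+1} \right)^{s+1} N ,
\end{aligned}
\]

where the last inequality assumes $R \ge s$.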
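The last three expressions of this chain can be compared numerically for small parameters. This is only a sanity check under the assumptions $R \ge s$ and $C \ge 1$, not a proof; the function name is illustrative, and floating-point values are used for the two bounds that involve $e$.

```python
from fractions import Fraction
from math import comb, e


def chain_values(s: int, D: int, R: int, N: int, C: int):
    """Return the binomial bound, the intermediate bound and the final bound."""
    exact = Fraction(
        comb(2 * D - 1 + s, s) * comb(2 * R + s, s + 1) * N * C ** s,
        R ** (s + 1),
    )
    middle = (
        2 ** (2 * D - 1 + s)
        * (e * (2 * R + s) / (s + 1)) ** (s + 1)
        * N * C ** s / R ** (s + 1)
    )
    final = 2 ** (2 * D) * (6 * C * e / (s + 1)) ** (s + 1) * N
    return float(exact), middle, final
```

The three returned values should be non-decreasing whenever the stated assumptions hold.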

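Finally, we set $s = \lceil \max\{12eC, (\alpha+1)\log N + 2D\} \rceil - 1 = O(C + D + \log N)$.

As $s+1 \ge 12eC$, we have $6Ce/(s+1) \le 1/2$. This gives

\[
\Pr[T \ge 2D+s] \le 2^{2D} N \left( \frac{1}{2} \right)^{s+1} \le 2^{2D} N \left( \frac{1}{2} \right)^{(\alpha+1)\log N + 2D} \le N^{-\alpha} \le n^{-\alpha},
\]

using $n \le N$.

Hence, with probability at least $1 - n^{-\alpha}$, the growing rank protocol needs at most $2D+s-1 = O(C+D+\log N)$ steps.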
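The choice of $s$ and the resulting failure bound can be evaluated for concrete values. The sketch assumes that $\log$ denotes the logarithm to base 2, which is what makes $(1/2)^{(\alpha+1)\log N} = N^{-(\alpha+1)}$ hold; the function names are illustrative.

```python
from math import ceil, e, log2


def choose_s(C: int, D: int, N: int, alpha: float) -> int:
    """s = ceil(max{12eC, (alpha+1) log N + 2D}) - 1, with log taken to base 2."""
    return ceil(max(12 * e * C, (alpha + 1) * log2(N) + 2 * D)) - 1


def failure_bound(C: int, D: int, N: int, alpha: float) -> float:
    """Evaluate 2^(2D) * (6Ce/(s+1))^(s+1) * N for the chosen s."""
    s = choose_s(C, D, N, alpha)
    return 2 ** (2 * D) * (6 * C * e / (s + 1)) ** (s + 1) * N
```

For example, with $C = 4$, $D = 5$, $N = 1024$ and $\alpha = 1$ the first term of the maximum dominates, and the bound is far below $N^{-\alpha}$.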

Literature

B. Vöcking. Theory of Distributed Systems, Lecture Summer 2012

D. Peleg. Distributed Computing: A Locality-Sensitive Approach, Society for Industrial and Applied Mathematics (SIAM), 2000

H. Attiya, J. Welch. Distributed Computing: Fundamentals, Simulations and Advanced Topics, John Wiley and Sons, 2004

F. T. Leighton. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann Publishers, 1991

J. Kleinberg, E. Tardos. Algorithm Design, Addison Wesley Pearson, 2005

J. F. Kurose, K. W. Ross. Computer Networking: A Top-Down Approach Featuring the Internet, Addison Wesley Longman, 1999

N. Nisan, T. Roughgarden, E. Tardos, V. Vazirani. Algorithmic Game Theory, Cambridge University Press, 2007

Legend

Not relevant

Basics that are used implicitly

Idea of the proof or of the approach

Structure of the proof or of the approach

Complete knowledge