Data Management

Academic year: 2021


Wolf-Tilo Balke Sascha Tönnies

Institut für Informationssysteme

Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

Peer-to-Peer Data Management

9. Distributed Hash Table Algorithms

9.1 Review
9.2 Pastry
9.3 Symphony
9.4 Viceroy
9.5 CAN
9.6 Summary

VDBMS und P2P – Wolf Tilo Balke - Institut für Informationssysteme - TU Braunschweig

9.1 DHT Basics

Objects need unique key
Key is hashed to integer value
Huge key space, e.g. 2^128
Key space partitioned
Each peer gets its key range

DHT Goals
Efficient routing to the responsible peer
Efficient routing table maintenance

[Figure: the key "Purple Rain" is mapped by a hash function (e.g. SHA-1) to 2313 and therefore belongs to the peer with key range 2207 - 2905. Key ranges shown on the ring: 611 - 709, 1008 - 1621, 1622 - 2010, 2011 - 2206, 2207 - 2905, 2906 - 3484, 3485 - 610]
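The mapping from an object key to the responsible peer can be sketched as follows. This is a minimal sketch under stated assumptions: the toy key space of 4096 values, the names `key_id` and `responsible_peer`, and the convention that a peer is identified by the upper bound of its key range are illustrative choices, not part of the slides.

```python
import bisect
import hashlib

KEY_SPACE = 4096  # toy key space (assumption); real DHTs use e.g. 2^128

# Upper bounds of the key ranges in the figure, used here as peer IDs (assumption).
PEER_IDS = [610, 709, 1621, 2010, 2206, 2905, 3484]


def key_id(name: str) -> int:
    """Hash an object name with SHA-1 and fold it into the toy key space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % KEY_SPACE


def responsible_peer(key: int, peer_ids: list[int]) -> int:
    """Return the first peer ID at or after the key, wrapping around the ring."""
    i = bisect.bisect_left(peer_ids, key)
    return peer_ids[i % len(peer_ids)]
```

With the value from the figure, `responsible_peer(2313, PEER_IDS)` selects the peer whose range ends at 2905, and a key such as 3500 wraps around to the peer whose range ends at 610.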

9.1 DHT Design Space

• Minimal routing table
Peer state O(1), Avg. path length O(n)
Brittle network

• Maximal routing table
Peer state O(n), Path length O(1)
Very inefficient routing table maintenance

[Figures: the partitioned key ring from 9.1 DHT Basics, shown once for each of the two routing table designs]

9.1 DHT Routing Tables

• Usual routing table
Peer state O(log n), Path length O(log n)
Compromise between routing efficiency and maintenance efficiency

[Figure: the partitioned key ring from 9.1 DHT Basics with a routing table of logarithmic size]
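The trade-off between peer state and path length can be made concrete with a small calculation. The sketch below assumes a fully populated ring of n = 2^m peers and uses Chord-style links at power-of-two distances as one possible O(log n) routing table; both are assumptions for illustration, since the slides do not fix a particular table layout here.

```python
def hops_successor_only(distance: int) -> int:
    """O(1) state: only a successor link, so every ring position is one hop."""
    return distance


def hops_power_of_two_links(distance: int) -> int:
    """O(log n) state: greedily take the largest jump that does not overshoot.

    Each greedy jump clears one set bit of the remaining distance.
    """
    return bin(distance).count("1")


def hops_full_table(distance: int) -> int:
    """O(n) state: every peer is known, so any other peer is one hop away."""
    return 1 if distance else 0


def average_hops(hop_fn, n: int) -> float:
    """Average hop count from one peer to all other peers on a ring of n peers."""
    return sum(hop_fn(d) for d in range(1, n)) / (n - 1)


if __name__ == "__main__":
    n = 1024
    for fn in (hops_successor_only, hops_power_of_two_links, hops_full_table):
        print(fn.__name__, average_hops(fn, n))
```

For n = 1024 the averages are about n/2 for the minimal table, about (log2 n)/2 for the logarithmic table, and exactly 1 for the maximal table.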

9.2 Pastry Basics

128 bit circular id space

Routing table elements
Leaf set: key space proximity
Routing table: long distance links
Neighborhood set: network proximity

Basic routing
If (target key in key space proximity)
    Use direct leaf set link
else
    Use link from routing table to resolve next digit of target key

9.2 Pastry: Leaf sets

• Each node maintains IP addresses of the nodes with the L/2 numerically closest larger and the L/2 numerically closest smaller nodeIds.

routing efficiency/robustness
fault detection (keep-alive)
application-specific local coordination

9.2 Pastry: Routing table

L nodes in leaf set
log_{2^b} N rows (actually log_{2^b} 2^128 = 128/b)
2^b columns
L network neighbors
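The table dimensions follow directly from the digit width b. The sketch below computes them; the function names and the example value b = 4 (hexadecimal digits, as in the nodeIds shown in the routing figures) are assumptions for illustration.

```python
ID_BITS = 128  # length of a Pastry nodeId in bits


def table_shape(b: int) -> tuple[int, int]:
    """Maximum number of rows (128/b) and number of columns (2^b)."""
    return ID_BITS // b, 2 ** b


def populated_rows(n_nodes: int, b: int) -> int:
    """Smallest number of rows r with (2^b)^r >= N, i.e. ceil(log_{2^b} N)."""
    rows, capacity = 0, 1
    while capacity < n_nodes:
        capacity *= 2 ** b
        rows += 1
    return rows


if __name__ == "__main__":
    print(table_shape(4))
    print(populated_rows(10 ** 6, 4))
```

With b = 4 the table has at most 32 rows of 16 columns, but in a network of one million nodes only about 5 rows are expected to contain entries.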

9.2 Pastry: Routing

• log_{2^b} N steps
• O(log N) state

[Figure: Route(d46a1c) issued at node 65a1fc is forwarded via d13da3, d4213f and d462ba to d467c4; node d471f1 is also shown on the ring]

9.2 Pastry: Routing procedure

If (destination D is within range of our leaf set)
    forward to numerically closest member
else
    let l = length of shared prefix
    let d = value of l-th digit in D’s address
    if (R_l^d exists)
        forward to R_l^d
    else
        forward to a known node* that
        (a) shares at least as long a prefix
        (b) is numerically closer than this node

*from LeafSet, RoutingTable, or NetworkNeighbors
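The routing procedure can be sketched in Python as follows. This is an illustrative sketch, not Pastry's reference implementation: nodeIds are short hexadecimal strings (b = 4), the routing table is a hypothetical dictionary keyed by (row, digit), and wrap-around of the circular id space is ignored for brevity.

```python
def val(node_id: str) -> int:
    """Numeric value of a hexadecimal nodeId."""
    return int(node_id, 16)


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(node_id, key, leaf_set, routing_table, neighbors=()):
    """Return the nodeId to forward to, or node_id if this node keeps the message."""
    if key == node_id:
        return node_id
    here = abs(val(node_id) - val(key))
    # Case 1: the key lies within the range covered by the leaf set.
    if leaf_set:
        ids = [val(n) for n in leaf_set] + [val(node_id)]
        if min(ids) <= val(key) <= max(ids):
            members = list(leaf_set) + [node_id]
            return min(members, key=lambda n: abs(val(n) - val(key)))
    # Case 2: use the routing table entry that resolves the next digit.
    l = shared_prefix_len(node_id, key)
    entry = routing_table.get((l, key[l]))
    if entry is not None:
        return entry
    # Case 3 (rare): any known node with an equally long prefix that is closer.
    known = set(leaf_set) | set(routing_table.values()) | set(neighbors)
    for n in sorted(known):
        if shared_prefix_len(n, key) >= l and abs(val(n) - val(key)) < here:
            return n
    return node_id
```

Using the nodeIds from the routing figure, `next_hop("65a1fc", "d46a1c", [], {(0, "d"): "d13da3"})` resolves the first digit and forwards to d13da3. The leaf set passed in the second case of the test below is a made-up example.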

9.2 Pastry: Routing Properties

O(log N) routing table size
2^b * log_{2^b} N + 2l
O(log N) message forwarding steps

Network stability:
guaranteed unless L/2 simultaneous failures of nodes with adjacent nodeIds

Number of routing hops:
No failures: < log_{2^b} N average, 128/b + 1 max
During failure recovery: O(N) worst case, average case much better

9.2 Pastry: Node addition

New node: X = d46a1c

[Figure: the join of the new node X = d46a1c starts at node A = 65a1fc with Route(d46a1c); the message passes d13da3, d4213f and d462ba and arrives at Z = d467c4; node d471f1 is also shown on the ring]


9.2 Routing table maintenance

Leaf set

Copy from neighbor

Extend by sending request to right/left boundary leaf link

Routing table

Collect routing tables from peers encountered during network entry

Works because peers encountered share same prefix

Can be incomplete

Network neighbor set

Probe nodes from collected routing tables

Request neighborhood sets for known nearby nodes


9.2 Pastry: Locality properties

• Assumption: scalar proximity metric

e.g. ping/RTT delay, # IP hops, geographical distance
a node can probe distance to any other node

• Proximity invariant:

Each routing table entry refers to a node close to the local node (in the proximity space), among all nodes with the appropriate nodeId prefix.


9.2 Pastry: Geometric Routing in proximity space

VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig

[Figure: Route(d46a1c) from 65a1fc via d13da3, d4213f and d462ba to d467c4, drawn once in the NodeId space and once in the proximity space]

Network distance for each routing step is exponentially increasing (entry in row l is chosen from a set of nodes of size N/2^(bl))

Distance increases monotonically (message takes larger and larger strides)

9.2 Pastry: Locality properties

• Each routing step is local, but there is no guarantee of globally shortest path

• Nevertheless, simulations show:

Expected distance traveled by a message in the proximity space is within a small constant of the minimum

• Among k nodes with nodeIds closest to the key, message likely to reach the node closest to the source node first


9.2 Pastry: Node addition details

• New node X contacts nearby node A

• A routes a “join” message with key X, which arrives at Z, the node whose nodeId is closest to X

• X obtains its leaf set from Z and the i’th row of its routing table from the i’th node on the path from A to Z

• X informs any nodes that need to be aware of its arrival
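The state transfer during a join can be sketched as follows. The data layout (a list of dictionaries with the hypothetical fields `id`, `rows` and `leaf_set`) and the example entries are assumptions for illustration. Rows are counted from 0 here, so the node at position i on the path shares at least i digits with X and its row i is therefore also valid for X.

```python
def build_join_state(path):
    """Assemble the initial state of a new node X from the nodes on the join path.

    path: nodes encountered from A to Z, in order. Each node is a dict with
          'id', 'rows' (list of routing table rows) and 'leaf_set'.
    Returns (routing_rows, leaf_set) for X.
    """
    routing_rows = []
    for i, node in enumerate(path):
        rows = node["rows"]
        # Row i of the i-th node; an empty row if that node has none.
        routing_rows.append(dict(rows[i]) if i < len(rows) else {})
    # The leaf set comes from Z, the node numerically closest to X.
    leaf_set = list(path[-1]["leaf_set"])
    return routing_rows, leaf_set


# Made-up example data using nodeIds from the slides.
example_path = [
    {"id": "65a1fc", "rows": [{"d": "d13da3"}], "leaf_set": []},
    {"id": "d13da3", "rows": [{"6": "65a1fc"}, {"4": "d4213f"}], "leaf_set": []},
    {"id": "d467c4", "rows": [{}, {}, {"7": "d471f1"}],
     "leaf_set": ["d462ba", "d471f1"]},
]
example_rows, example_leaves = build_join_state(example_path)
```

The resulting table can be incomplete, which matches the remark on routing table maintenance that collected tables need not be full.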


9.2 Node departure/failure

• Leaf set repair (eager – all the time):

Leaf set members exchange keep-alive messages
request set from furthest live node in set

• Routing table repair (lazy – upon failure):

get table from peers in the same row, if not found – from higher rows

• Neighborhood set repair (eager)


9.2 Pastry: Average # of hops

[Figure: average number of hops versus number of nodes (1,000 to 100,000) for Pastry, compared with Log(N); L=16, 100k random queries]

9.2 Pastry distance vs IP distance

[Figure: distance traveled by Pastry message versus distance between source and destination; Mean = 1.59; GATech top., .5M hosts, 60K nodes, 20K random messages]

9.2 Pastry Summary

• Usual DHT scalability

Peer state O(log N)

Avg. path length O(log N)

• Very robust

Different routes possible

Lazy routing table update sufficient

• Network proximity aware

No IP network detours


9.3 Symphony

Symphony DHT

Map the nodes and keys to the ring
Link every node with its successor and predecessor
Add k random links with probability proportional to 1/(d·log N), where d is the distance on the ring
Lookup time O(log^2 N)
If k = log N, lookup time O(log N)
Easy to insert and remove nodes (perform periodical refreshes for the links)
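Drawing a link distance with probability proportional to 1/d can be done by inverse transform sampling. The sketch below assumes a unit-length ring, distances restricted to [1/n, 1], and the natural logarithm as normalization (the integral of 1/x over that interval is ln n); under these assumptions the inverse of the cumulative distribution is x = n^(u-1) for uniform u. Function names are illustrative.

```python
import random


def draw_link_distance(n: int, rng: random.Random) -> float:
    """Draw x in [1/n, 1) with density p(x) = 1 / (x ln n)."""
    u = rng.random()          # uniform in [0, 1)
    return n ** (u - 1.0)     # inverse of the CDF F(x) = 1 + ln(x) / ln(n)


def long_link_targets(position: float, n: int, k: int, rng: random.Random):
    """Ring positions (in [0, 1)) that k long links of a node would point at."""
    return [(position + draw_link_distance(n, rng)) % 1.0 for _ in range(k)]


if __name__ == "__main__":
    rng = random.Random(42)
    print(long_link_targets(0.3, 1024, 4, rng))
```

In a running system each drawn position would be resolved to the node that manages it; the sketch only produces the target positions. Half of the drawn distances fall below n^(-1/2), which reflects the bias towards short links.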


9.3 Symphony in a Nutshell

Nodes arranged in a unit circle (perimeter = 1)
Arrival --> Node chooses position along circle uniformly at random
Each node has 1 short link (next node on circle) and k long links

Fault Tolerance:
No backups for long links! Only short links are fortified for fault tolerance.

Adaptation of Small World Idea: [Kleinberg00]
Long links chosen from a probability distribution function: p(x) = 1/(x log n) where n = #nodes.

Simple greedy routing:
“Forward along that link that minimizes the absolute distance to the destination.”
Average lookup latency = O((log^2 n) / k) hops

[Figure: nodes on the circle with their short links and long links; a callout asks how n is known]
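Greedy routing over short and long links can be simulated in a few lines. The sketch below rests on several assumptions that the slides leave open: every node keeps both a successor and a predecessor link, long link distances are drawn as n^(u-1) for uniform u, a drawn point is managed by the first node at or after it, and routing targets a node rather than a key.

```python
import bisect
import random


def ring_dist(a: float, b: float) -> float:
    """Absolute distance between two positions on the unit circle."""
    d = abs(a - b)
    return min(d, 1.0 - d)


def build_ring(n: int, k: int, rng: random.Random):
    """Place n nodes at random and give each 2 short links and up to k long links."""
    pos = sorted(rng.random() for _ in range(n))
    links = []
    for i, p in enumerate(pos):
        out = {(i + 1) % n, (i - 1) % n}                 # short links
        for _ in range(k):
            target = (p + n ** (rng.random() - 1.0)) % 1.0
            j = bisect.bisect_left(pos, target) % n      # node managing the point
            if j != i:
                out.add(j)
        links.append(sorted(out))
    return pos, links


def greedy_route(src: int, dst: int, pos, links) -> int:
    """Hop count when always forwarding along the link closest to the destination."""
    hops, cur = 0, src
    while cur != dst:
        cur = min(links[cur], key=lambda j: ring_dist(pos[j], pos[dst]))
        hops += 1
    return hops


if __name__ == "__main__":
    rng = random.Random(1)
    pos, links = build_ring(256, 4, rng)
    hops = [greedy_route(0, d, pos, links) for d in range(1, 256)]
    print(sum(hops) / len(hops))
```

The loop always terminates because one of the two short links is strictly closer to the destination than the current node, so the chosen link is at least that good.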

9.3 Network Size Estimation Protocol

Problem: What is the current value of n, the total number of nodes?

x = Length of arc
1/x = Estimate of n

3 arcs are enough.
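The estimate can be sketched as follows. The generalization from one arc to several, s divided by the total length of s arcs, is an assumption about what "3 arcs" refers to (for example a node's own arc and those of its two neighbors); the function names are illustrative.

```python
import random


def arcs(positions):
    """Arc lengths between consecutive nodes on the unit circle."""
    pos = sorted(positions)
    return [(pos[(i + 1) % len(pos)] - p) % 1.0 for i, p in enumerate(pos)]


def estimate_n(arc_lengths) -> float:
    """Estimate the number of nodes from s observed arcs as s / (sum of lengths)."""
    return len(arc_lengths) / sum(arc_lengths)


if __name__ == "__main__":
    rng = random.Random(7)
    a = arcs([rng.random() for _ in range(1000)])
    # Estimate seen by one node that combines three consecutive arcs.
    print(estimate_n(a[0:3]))
```

A single arc gives a very noisy estimate because arc lengths vary widely when positions are chosen at random; combining a few arcs reduces the variance.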

9.3 Step 0: Symphony

Symphony: “Draw from the PDF k times”
p(x) = 1 / (x log n)

[Figure: probability distribution over the distance to the long distance neighbor; axis from 0 to 1 with marks at ¼ and ½]

9.3 Step 1: Step-Symphony

Step-Symphony: Draw from the discretized PDF k times
p(x) = 1 / (x log n)

[Figure: discretized probability distribution over the distance to the long distance neighbor; axis from 0 to 1 with marks at ¼ and ½]

9.3 Step 2: Divide PDF into log n Equal Bins

Step-Partitioned-Symphony: Draw exactly once from each of k bins

[Figure: probability distribution over the distance to the long distance neighbor, divided into bins; axis from 0 to 1 with marks at ¼ and ½]

9.3 Step 3: Discrete PDF

Chord: Draw exactly once from each of log n bins
Each bin is essentially a point.

[Figure: discrete probability distribution over the distance to the long distance neighbor; axis from 0 to 1 with marks at ¼ and ½]

9.3 Two Optimizations

Bi-directional Routing

Exploit both outgoing and incoming links!

Route to the neighbor that minimizes absolute distance to destination

Reduces avg latency by 25-30%

1-Lookahead

List of neighbor’s neighbors
Reduces avg. latency by 40%
Also applicable to other DHTs

9.3 Symphony: Summary (1)

Distributed Hashing in a Small World

Like Chord:

– Overlay structure: ring
– Key ID space partitioning

Unlike Chord:

– Routing Table

• Two short links for immediate neighbors

• k long distance links for jumping

• Long distance links are built in a probabilistic way

• Peers are selected using a Probability Distribution Function (pdf)

• Exploit the characteristics of a small-world network

– Dynamically estimate the current system size

9.3 Symphony: Summary (2)

• Each node has k = O(1) long distance links

– Lookup:

• Expected path length: O((log^2 N)/k) hops

– Join & leave

• Expected: O(log^2 N) messages

• Comparing with Chord:

– Discard the strong requirements on the routing table (finger table)

– Rely on the small world to reach the destination.

9.4 Viceroy network

• Arrange nodes and keys on a ring
As usual

• Assign to each node a level value
chosen uniformly from the set {1,…,log n}
estimate n by taking the inverse of the distance between the node and its successor
easy to update

• Create a ring of nodes within the same level

9.4 Downward links

For peer with key x at level i
Direct successor peer on level i+1
Long link to peer x+2^i on level i+1

9.4 Upward links

For each peer with key x at level i

Predecessor link on level i-1

Long link to peer at x-2^i on level i-1

9.4 Butterfly links

Each node x at level i has two downward links to level i+1
a left link to the first node of level i+1 after position x on the ring
a right link to the first node of level i+1 after position x + (½)^i
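Level selection and the two downward butterfly links can be sketched as follows. The sketch assumes a unit-length ring, the binary logarithm, distinct node positions, and a hypothetical dictionary `nodes_by_level` that maps each level to the positions of its nodes.

```python
import bisect
import math
import random


def choose_level(own_pos: float, successor_pos: float, rng: random.Random) -> int:
    """Pick a level uniformly from {1, ..., log n}, with n estimated locally."""
    n_est = 1.0 / ((successor_pos - own_pos) % 1.0)   # inverse successor distance
    max_level = max(1, int(math.log2(n_est)))
    return rng.randint(1, max_level)


def first_after(point: float, sorted_positions):
    """First position strictly after point, wrapping around the ring."""
    i = bisect.bisect_right(sorted_positions, point)
    return sorted_positions[i % len(sorted_positions)]


def down_links(x: float, i: int, nodes_by_level):
    """Left and right downward links of the node at position x on level i."""
    below = sorted(nodes_by_level.get(i + 1, []))
    if not below:
        return None, None
    left = first_after(x, below)
    right = first_after((x + 0.5 ** i) % 1.0, below)
    return left, right


if __name__ == "__main__":
    levels = {1: [0.10, 0.60], 2: [0.20, 0.55, 0.70]}
    print(down_links(0.10, 1, levels))
```

Because the right link spans a distance of (½)^i, links on the upper levels cover large parts of the ring and links on the lower levels cover small parts, which is what emulates the butterfly network.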

9.4 Viceroy

Emulating the butterfly network

Logarithmic path lengths between any two nodes in the network

Constant degree per node

[Figure: butterfly network with level 1 to level 4 over the positions 000, 001, 010, 011, 100, 101, 110, 111]

9.4 Viceroy Summary

• Scalability: Optimal peer state

Peer state O(1)

Avg. path length O(log N)

• Complex algorithm

• Network proximity not taken into account


9.5 CAN: Overview

Early and successful algorithm
Simple & elegant
Intuitive to understand and implement
Many improvements and optimizations exist
Sylvia Ratnasamy et al. in 2001

Main responsibilities:
CAN is a distributed system that maps keys onto values
Keys hashed into d-dimensional space

Interface:

insert(key, value)

retrieve(key)


9.5 CAN

Virtual d-dimensional Cartesian coordinate system on a d-torus

Example: 2-d [0,1]x[0,1]

Dynamically partitioned among all nodes

Pair (K,V) is stored by mapping key K to a point P in the space using a uniform hash function and storing (K,V) at the node in the zone containing P

Retrieve entry (K,V) by applying the same hash function to map K to P and retrieving the entry from the node in the zone containing P

If P is not contained in the zone of the requesting node or its neighboring zones, route the request to the neighbor node in the zone nearest P
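The insert and retrieve operations can be sketched for the 2-d case as follows. This toy version finds the owning zone by a central lookup instead of routing, uses 4 bytes of a SHA-1 digest per coordinate, and assumes four fixed quadrant zones named A to D; all of these are illustrative assumptions.

```python
import hashlib


def key_to_point(key: str, d: int = 2):
    """Map a key to a point in [0,1)^d using 4 bytes of SHA-1 per coordinate."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()   # 20 bytes, so d <= 5
    return tuple(
        int.from_bytes(digest[4 * i:4 * i + 4], "big") / 2 ** 32 for i in range(d)
    )


def owner_of(point, zones):
    """Node whose zone (half-open box per dimension) contains the point."""
    for node, box in zones.items():
        if all(lo <= c < hi for c, (lo, hi) in zip(point, box)):
            return node
    raise LookupError("no zone contains the point")


ZONES = {  # four equal quadrants of [0,1) x [0,1)
    "A": ((0.0, 0.5), (0.0, 0.5)),
    "B": ((0.5, 1.0), (0.0, 0.5)),
    "C": ((0.0, 0.5), (0.5, 1.0)),
    "D": ((0.5, 1.0), (0.5, 1.0)),
}
STORE = {node: {} for node in ZONES}


def insert(key: str, value) -> str:
    """Store (key, value) at the node owning the key's point; return that node."""
    node = owner_of(key_to_point(key), ZONES)
    STORE[node][key] = value
    return node


def retrieve(key: str):
    """Apply the same hash function and read from the owning node."""
    return STORE[owner_of(key_to_point(key), ZONES)].get(key)
```

Because insert and retrieve apply the same hash function, both arrive at the same zone without any directory of keys.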

9.5 CAN

[Figure: state of the system at time t, showing peers, resources and zones]

In this 2-dimensional space a key is mapped to a point (x,y)

9.5 CAN: Routing

d-dimensional space with n zones
2 zones are neighbours if their coordinate spans overlap along d-1 dimensions and abut along the remaining one

Algorithm:
Choose the neighbor nearest to the destination

[Figure: a query Q(x,y) is forwarded from a peer, zone by zone, towards the zone that contains the key at (x,y)]
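Greedy CAN routing can be sketched for the simplest case of equally sized zones. The sketch assumes that the zones form a regular grid with `side` zones per dimension on a torus, identifies each zone by its integer grid coordinates, and measures nearness as the sum of per-dimension torus distances; real CAN zones can differ in size.

```python
def torus_delta(a: int, b: int, side: int) -> int:
    """Distance between two grid coordinates when the axis wraps around."""
    d = abs(a - b)
    return min(d, side - d)


def zone_distance(z, target, side: int) -> int:
    """Sum of the per-dimension torus distances between two zones."""
    return sum(torus_delta(a, b, side) for a, b in zip(z, target))


def neighbors(z, side: int):
    """Zones that abut z: one step forward or backward in exactly one dimension."""
    out = []
    for dim in range(len(z)):
        for step in (-1, 1):
            n = list(z)
            n[dim] = (n[dim] + step) % side
            out.append(tuple(n))
    return out


def route(src, dst, side: int):
    """Forward greedily to the neighbor nearest to the destination zone."""
    path = [src]
    while path[-1] != dst:
        nxt = min(neighbors(path[-1], side),
                  key=lambda z: zone_distance(z, dst, side))
        path.append(nxt)
    return path


if __name__ == "__main__":
    print(route((0, 0), (3, 2), side=4))
```

On an 8 x 8 grid (n = 64, d = 2) the average path length over all destinations is 4 hops, which grows with n^(1/d) as stated in the CAN summary.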

9.5 CAN: Construction - Basic Idea

[Figures: a new node joins the CAN with the help of a bootstrap node]

9.5 CAN: Construction

1) Discover some node “I” already in CAN (via the bootstrap node)
2) Pick random point (x,y) in space
3) I routes to (x,y), discovers node J
4) Split J’s zone in half… new node owns one half
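The zone split at the end of a join can be sketched as follows. The sketch covers only picking the random point and splitting the zone: discovering I and routing to J are replaced by a direct lookup, the split runs along the longer side of the zone (an assumption, a fixed ordering of dimensions is another option), and the handover of stored (K,V) pairs is omitted.

```python
import random


def contains(zone, point) -> bool:
    """True if the point lies in the half-open 2-d zone ((x0, x1), (y0, y1))."""
    return all(lo <= c < hi for c, (lo, hi) in zip(point, zone))


def split(zone):
    """Split a zone in half along its longer side; return both halves."""
    (x0, x1), (y0, y1) = zone
    if (x1 - x0) >= (y1 - y0):
        mid = (x0 + x1) / 2
        return ((x0, mid), (y0, y1)), ((mid, x1), (y0, y1))
    mid = (y0 + y1) / 2
    return ((x0, x1), (y0, mid)), ((x0, x1), (mid, y1))


def join(zones, new_node, rng: random.Random):
    """Let new_node join: pick a point, find its owner J, split J's zone."""
    point = (rng.random(), rng.random())
    owner = next(n for n, z in zones.items() if contains(z, point))
    kept, given = split(zones[owner])
    zones[owner] = kept
    zones[new_node] = given
    return owner


def area(zone) -> float:
    (x0, x1), (y0, y1) = zone
    return (x1 - x0) * (y1 - y0)


if __name__ == "__main__":
    zones = {"I": ((0.0, 1.0), (0.0, 1.0))}
    rng = random.Random(0)
    for name in ("J", "K", "L"):
        join(zones, name, rng)
    print(zones)
```

Since the point is chosen uniformly at random, large zones are more likely to be split, which tends to even out zone sizes over time.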

9.5 CAN-Improvement: Multiple Realities

• Build several CAN-networks

• Each network is called a reality

• Routing
Jump between realities
Choose reality in which distance is shortest

9.5 CAN-Improvement: Multiple Dimensions


9.5 CAN: Multiple Dimensions vs. Multiple Realities

More dimensions: shorter paths
More realities: more robustness
Trade-off?

9.5 CAN: Summary

• Inferior scalability

Peer state O(d)

Avg. path length O(d N^(1/d))

• Useful for spatial data!

9.6 Spectrum of DHT Protocols

Protocol    #links                      latency
CAN         O(d)                        O(d N^(1/d))
Chord       O(log N)                    O(log N)
Viceroy     O(1)                        O(log N)
Pastry      O((2^b - 1)(log_2 N)/b)     O((log N)/b)
Symphony    O(k)                        O((log^2 N)/k)

Topology classes shown next to the table: Deterministic Topology, Partly Randomized Topology, Completely Randomized Topology
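To get a feeling for the growth rates in the table, the expressions can be evaluated for concrete parameters. The sketch below drops all hidden constants, so the numbers only indicate orders of magnitude and are not measured link counts or latencies; the parameter values d = 2, b = 4 and k = 4 are arbitrary assumptions.

```python
import math


def spectrum(n: int, d: int = 2, b: int = 4, k: int = 4):
    """Evaluate the (#links, latency) expressions of the table, constants dropped."""
    lg = math.log2(n)
    return {
        "CAN": (d, d * n ** (1 / d)),
        "Chord": (lg, lg),
        "Viceroy": (1, lg),
        "Pastry": ((2 ** b - 1) * lg / b, lg / b),
        "Symphony": (k, lg ** 2 / k),
    }


if __name__ == "__main__":
    for protocol, (links, latency) in spectrum(2 ** 15).items():
        print(f"{protocol:9s} links ~ {links:8.2f}   latency ~ {latency:8.2f}")
```

The evaluation illustrates the trade-off: protocols with constant state (CAN, Viceroy, Symphony) pay with longer paths or more complex maintenance, while Pastry buys short paths with a larger routing table.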

9.6 Latency vs State Maintenance

[Figure: average latency versus # TCP connections for Viceroy, CAN, Pastry, Chord and Symphony; network size: n = 2^15 nodes]
