Peer-to-Peer
Data Management
Hans-Dieter Ehrich
Institut für Informationssysteme
Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de
8. Structured P2P Networks
The transparencies of this chapter are based on the package
"Structured Peer-to-Peer Networks" by
Wolf-Tilo Balke and Wolf Siberski, 31.10.2007
● Original slides partially provided by
► K. Wehrle, S. Götz, S. Rieche (University of Tübingen)
8. Structured P2P Networks
1. Distributed Management and Retrieval of Data
   1. Comparison of strategies for data retrieval
   2. Central server
   3. Flooding search
   4. Distributed indexing
   5. Comparison of lookup concepts
2. Fundamentals of Distributed Hash Tables
   1. Distributed management of data
   2. Addressing in Distributed Hash Tables
   3. Routing
   4. Data Storage
3. DHT Mechanisms
   1. Node Arrival
   2. Node Failure / Departure
4. DHT Interfaces
5. Example: Chord
Distributed Management and Retrieval of Data
● Essential challenge in (most) Peer-to-Peer systems?
► Locating a data item distributed among the participating systems
  Where shall the item be stored by the provider?
  How does a requester find the actual location of an item?
► Scalability: keep the complexity for communication and storage scalable
► Robustness and resilience in case of faults and frequent changes
[Figure: a distributed system of nodes (e.g. peer-to-peer.info at 7.31.10.25, planet-lab.org at 86.8.10.18, berkeley.edu at 89.11.20.15, 12.5.7.31, 95.7.6.10). A provider holding data item "D" asks "Where to place D?", a requester asks "Where can I find D?"]
Comparison of Strategies for Data Retrieval
● Strategies to store and retrieve data items in distributed systems
► Central server
► Flooding search
► Distributed indexing
Approach I: Central Server
● Simple strategy: Central Server
► Server stores information about locations
  1. Node A (provider) tells the server that it stores item D
  2. Node B (requester) asks server S for the location of D
  3. Server S tells B that node A stores item D
  4. Node B requests item D from node A ("Transmission: D")
Approach I: Central Server
● Advantages
► Search complexity of O(1) – "just ask the server"
► Complex and fuzzy queries are possible
► Simple and fast
● Problems
► No scalability
  O(N) node state in the server
  O(N) network and system load on the server
► Single point of failure or attack (also for lawsuits ;-)
► Non-linearly increasing implementation and maintenance cost
  (in particular for achieving high availability and scalability)
► Central server not suitable for systems with massive numbers of users
● But overall, …
► Best principle for small and simple applications!
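As an illustration of this strategy, here is a minimal sketch of a central index server (plain Python; the class, node names, and addresses are made up for the example, not part of the original material):

```python
# Central-server strategy: one index server maps item identifiers to the
# addresses of the providing nodes. Names and addresses are illustrative.

class IndexServer:
    def __init__(self):
        self.locations = {}                      # item -> provider address: O(N) state

    def publish(self, item, provider):           # provider A announces "I have D"
        self.locations[item] = provider

    def lookup(self, item):                      # requester B asks "Where is D?"
        return self.locations.get(item)          # O(1) search, answered by the server

server = IndexServer()
server.publish("D", "node-A.example.org")        # node A registers item D
print(server.lookup("D"))                        # -> node-A.example.org ("A stores D")
```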
Approach II: Flooding Search
● Fully distributed approach
► Central systems are vulnerable and do not scale
► Unstructured Peer-to-Peer systems follow the opposite approach
► No information on the location of content is kept anywhere
► Content is only stored on the node providing it
● Retrieval of data
► No routing information for content
► Necessity to ask as many systems as possible / necessary
► Approaches
  Flooding: high traffic load on the network, does not scale
  Highest-degree search: quick search through large areas – a large number of messages is needed for unique identification
[Figure: node B floods the query for item D to its neighbours; node A answers "I store D" and transmits D to B.]
Fully Decentralized Approach: Flooding Search
● No information about the location of data in the intermediate systems
► Necessity for broad search
  1. Node B (requester) asks neighboring nodes for item D
  2. Nodes forward the request to further nodes (breadth-first search / flooding)
  3. Node A (provider of item D) sends D to requesting node B
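A minimal sketch of such a breadth-first flooding search (Python; the overlay topology, TTL value, and node names are illustrative assumptions):

```python
from collections import deque

# Flooding search with a TTL: the requester asks its neighbours, which
# forward the query breadth-first. Topology, TTL, and names are assumptions.

NEIGHBOURS = {"B": ["X", "Y"], "X": ["A", "Y"], "Y": ["Z"], "Z": [], "A": []}
HAS_ITEM = {"A": {"D"}}              # only the providing node stores the item

def flood_search(start, item, ttl=3):
    queue = deque([(start, ttl)])
    visited = {start}
    while queue:
        node, t = queue.popleft()
        if item in HAS_ITEM.get(node, set()):
            return node                          # provider found
        if t == 0:
            continue                             # query dies: possible false negative
        for n in NEIGHBOURS.get(node, []):
            if n not in visited:
                visited.add(n)
                queue.append((n, t - 1))
    return None

print(flood_search("B", "D"))        # -> 'A': node A provides item D and sends it to B
```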
Motivation Distributed Indexing – I
● Communication overhead vs. node state
[Figure: plot of communication overhead against per-node state.
 Flooding: O(1) node state but O(N) communication overhead – bottlenecks: communication overhead, false negatives.
 Central server: O(1) communication overhead but O(N) node state – bottlenecks: memory, CPU, network, availability.
 Open question: is there a scalable solution between both extremes, around O(log N) / O(log N)?]
Motivation Distributed Indexing – II
● Communication overhead vs. node state
[Figure: the same plot, with the Distributed Hash Table filling the gap between both extremes:
 scalability O(log N) for both communication overhead and node state, no false negatives,
 resistant against changes (failures, attacks, short-time users).]
Distributed Indexing
● Goal is scalable complexity for
► Communication effort: O(log N) hops
► Node state: O(log N) routing entries
[Figure: ring of nodes with hashed identifiers (611, 709, 1008, 1622, 2011, 2207, 2906, 3485);
 a query for H("my data") = 3107 is routed in O(log N) steps to the node storing the data;
 each node stores O(log N) routing entries to other nodes.]
Distributed Indexing
● Approach of distributed indexing schemes
► Data and nodes are mapped into the same address space
► Intermediate nodes maintain routing information to target nodes
  Efficient forwarding to the "destination" (content – not location)
  Definitive statement about the existence of content
● Problems
► Maintenance of routing information required
► Fuzzy queries not directly supported (e.g., wildcard searches)
[Figure: the same ring, shown twice – the query for H("my data") = 3107 is forwarded via intermediate nodes to the responsible node.]
8. Structured P2P Networks
► Next section: 2. Fundamentals of Distributed Hash Tables
Fundamentals of Distributed Hash Tables I
● Characteristics of hash tables
► Basic idea: keys are mapped via a common function to smaller fingerprints (hashes)
  Every hash value defines a position in an array (bucket)
  Keys mapped onto the same hash are put into the same bucket
  Look-up works by hashing the query and searching the respective bucket
► Hash function
  A poor choice leads to clustering, i.e. the probability of keys mapping to the same hash bucket (collision) is high and performance degrades
  Good choices are easy to compute, result in few collisions, and show a uniform distribution of hash values
► Hash tables provide constant-time O(1) lookup on average, regardless of the number of items in the table
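The bucket idea above can be sketched in a few lines of Python (illustrative only; the hash function, bucket count, and names are arbitrary choices, not from the slides):

```python
# Bucket-based hash table, illustrating the look-up idea above.

class BucketHashTable:
    def __init__(self, num_buckets=16):
        # each bucket collects the (key, value) pairs whose keys collide
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # map the key to a fingerprint, then to a bucket position
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                      # overwrite an existing key
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        # look-up: hash the query and search only the respective bucket
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

table = BucketHashTable()
table.put("my data", "134.2.11.68")
print(table.get("my data"))                   # -> 134.2.11.68
```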
Fundamentals of Distributed Hash Tables II
● Challenges for designing Distributed Hash Tables
► Desired characteristics
  Flexibility
  Reliability
  Scalability
► Equal distribution of content among nodes
  Crucial for efficient lookup of content
► Permanent adaptation to faults, joins, and exits of nodes
  Assignment of responsibilities to new nodes
  Re-assignment and re-distribution of responsibilities in case of node failure or departure
Distributed Management of Data
Sequence of operations
1. Mapping of nodes and data into the same address space
► Peers and content are addressed using flat identifiers (IDs)
► Common address space for data and nodes
► Nodes are responsible for data in certain parts of the address space
► Association of data to nodes may change since nodes may disappear
2. Storing / looking up data in the DHT
► Search for data = routing to the responsible node
  Responsible node not necessarily known in advance
  Deterministic statement about availability of data
Addressing in Distributed Hash Tables
● Step 1: Mapping of content/nodes into a linear space
► Usually: 0, …, 2^m − 1, with 2^m ≫ number of objects to be stored
► Mapping of data and nodes into the address space (with a hash function)
  E.g., Hash(String) mod 2^m: H("my data") → 2313
► Association of parts of the address space to DHT nodes
[Figure: circular address space 0 … 2^m − 1, often viewed as a ring, partitioned among nodes into ranges such as 611–709, 1008–1621, 1622–2010, 2011–2206, 2207–2905, 2906–3484, 3485–610; H(Node X) = 2906, H(Node Y) = 3485 with range (3485–610); data item "D" with H("D") = 3107.]
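A small sketch of this mapping, assuming SHA-1 as the hash function and a deliberately small m = 16 for readability (real systems use much larger spaces, e.g. 2^160):

```python
import hashlib

M = 16                                # address space 0 ... 2^m - 1 (small m for readability)

def h(s: str) -> int:
    """Map a string (data key or node address) into the common address space."""
    digest = hashlib.sha1(s.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

# data items and nodes are mapped into the same identifier space
data_id = h("my data")                # some position in 0 ... 65535
node_id = h("134.2.11.68:4711")       # node ID derived from its address
```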
Association of Address Space with Nodes
● Each node is responsible for a part of the value range
► Often with redundancy (overlapping of parts)
► Continuous adaptation
► Real (underlay) and logical (overlay) topology are (mostly) uncorrelated
[Figure: logical view of the Distributed Hash Table (ring of nodes 611, 709, 1008, 1622, 2011, 2207, 2906, 3485) and its mapping onto the real topology. Node 3485 is responsible for data items in the range 2907 to 3485 (in case of a Chord-DHT).]
Step 2: Routing to a Data Item
● Step 2: Locating the data (content-based routing)
● Goal: small and scalable effort
► O(1) with a centralized hash table
  But: management of a centralized hash table is very costly (server!)
► Minimum overhead with distributed hash tables
  O(log N): DHT hops to locate an object
  O(log N): number of keys and routing entries per node (N = # nodes)
Step 2: Routing to a Data Item
● Routing to a K/V-pair
► Start the lookup at an arbitrary node of the DHT
► Route to the requested data item (key)
  Key = H("my data") = 3107, Value = pointer to the location of the data, e.g. (3107, (ip, port))
[Figure: starting at an arbitrary initial node, the query for H("my data") = 3107 is routed along the ring (611, 709, 1008, 1622, 2011, 2207, 2906, 3485) to node 3485, which manages keys 2907–3485.]
Step 2: Routing to a Data Item
● Getting the content
► The K/V-pair is delivered to the requester
► The requester analyzes the K/V-tuple
  (and downloads the data from the actual location – in case of indirect storage)
[Figure: node 3485 sends (3107, (ip, port)) to the requester. In case of indirect storage, the requester then issues Get_Data(ip, port) to the actual location after learning it.]
Association of Data with IDs – Direct Storage
● How is content stored on the nodes?
► Example: H("my data") = 3107 is mapped into the DHT address space
● Direct storage
► Content is stored on the node responsible for H("my data")
  Inflexible for large content – o.k. for small amounts of data (< 1 KB)
[Figure: H_SHA-1("D") = 3107; item D itself (provided by 134.2.11.68) is stored on the responsible node of the ring.]
Association of Data with IDs – Indirect Storage
● Indirect storage
► Nodes in a DHT store tuples like (key, value)
  Key = Hash("my data") → 2313
  Value is often the real storage address of the content:
  (IP, Port) = (134.2.11.140, 4711)
► More flexible, but one more step to reach the content
[Figure: H_SHA-1("D") = 3107; the responsible node only stores the pointer "Item D: 134.2.11.68", while item D itself resides on 134.2.11.68.]
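To make the difference concrete, a minimal sketch of direct vs. indirect storage on the responsible node (Python; the class names are illustrative, the addresses follow the slide examples):

```python
# Direct vs. indirect storage on the node responsible for a key.

class DirectStorageNode:
    def __init__(self):
        self.store = {}
    def put(self, key, data):
        self.store[key] = data               # the content itself lives on this node
    def get(self, key):
        return self.store.get(key)

class IndirectStorageNode:
    def __init__(self):
        self.index = {}
    def put(self, key, ip, port):
        self.index[key] = (ip, port)         # only a pointer to the real location
    def get(self, key):
        return self.index.get(key)           # the requester then fetches from (ip, port)

node = IndirectStorageNode()
node.put(3107, "134.2.11.140", 4711)
print(node.get(3107))                        # -> ('134.2.11.140', 4711)
```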
8. Structured P2P Networks
► Next section: 3. DHT Mechanisms
Node Arrival
● Joining of a new node
  1. Calculation of the node ID
  2. New node contacts the DHT via an arbitrary node
  3. Assignment of a particular hash range
  4. Copying of the K/V-pairs of that hash range (usually with redundancy)
  5. Binding into the routing environment
[Figure: a new node with ID 3256 (134.2.11.68) joins the ring of nodes 611, 709, 1008, 1622, 2011, 2207, 2906, 3485.]
Node Failure / Departure
● Failure of a node
► Use of redundant K/V-pairs (if a node fails)
► Use of redundant / alternative routing paths
► A key/value pair is usually still retrievable if at least one copy remains
● Departure of a node
► Partitioning of its hash range to neighbor nodes
► Copying of K/V-pairs to the corresponding nodes
► Unbinding from the routing environment
8. Structured P2P Networks
► Next section: 4. DHT Interfaces
DHT Interfaces
● Generic interface of distributed hash tables
► Provisioning of information
  Publish(key, value)
► Requesting of information (search for content)
  Lookup(key)
► Reply
  value
● DHT approaches are interchangeable (with respect to the interface)
[Figure: a distributed application calls Put(Key, Value) and Get(Key) → Value on the Distributed Hash Table (CAN, Chord, Pastry, Tapestry, …), which spans Node 1, Node 2, Node 3, …, Node N.]
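From the application's point of view the interface can be sketched as follows (Python; the in-memory dictionary is only a stand-in for whichever concrete DHT sits underneath):

```python
# Generic DHT interface as seen by a distributed application. Any concrete
# DHT (CAN, Chord, Pastry, Tapestry, ...) could implement the same
# publish/lookup (put/get) interface.

class DHT:
    def __init__(self):
        self._table = {}                         # placeholder for the distributed store

    def publish(self, key, value):               # a.k.a. Put(Key, Value)
        self._table[key] = value

    def lookup(self, key):                       # a.k.a. Get(Key) -> Value
        return self._table.get(key)

dht = DHT()
dht.publish(3107, ("134.2.11.140", 4711))        # value: pointer to the content
print(dht.lookup(3107))                          # -> ('134.2.11.140', 4711)
```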
Comparison: DHT vs. DNS
● Comparison DHT vs. DNS
► Traditional name services follow a fixed mapping
  DNS maps a logical node name to an IP address
► DHTs offer a flat / generic mapping of addresses
  Not bound to particular applications or services
  The "value" in (key, value) may be
  o an address
  o a document
  o or other data …
Comparison: DHT vs. DNS
Domain Name System
► Mapping: symbolic name → IP address
► Is built on a hierarchical structure with root servers
► Names refer to administrative domains
► Specialized to search for computer names and services

Distributed Hash Table
► Mapping: key → value; can easily realize DNS
► Does not need a special server
► Does not require a special name space
► Can find data that are located independently of particular computers
Conclusions
● Properties of DHTs
► Use of routing information for efficient search for content
► Keys are evenly distributed across the nodes of the DHT
  No bottlenecks
  A continuous increase in the number of stored keys is admissible
  Failure of nodes can be tolerated
  Survival of attacks possible
► Self-organizing system
► Simple and efficient realization
► Supporting a wide spectrum of applications
  Flat (hash) key without semantic meaning
  Value depends on the application
Next …
● Specific examples of Distributed Hash Tables
► Chord (UC Berkeley, MIT)
► Pastry (Microsoft Research, Rice University)
► CAN (UC Berkeley, ICSI)
► P-Grid (EPFL Lausanne)
● … and there are plenty of others: Kademlia, Symphony, Viceroy, …
8. Structured P2P Networks
► Next section: 5. Example: Chord
Chord
Ion Stoica, Robert Morris, David Karger,
M. Frans Kaashoek, Hari Balakrishnan (2001)
Chord: Overview
● Early and successful DHT algorithm (Ion Stoica et al., 2001)
● Simple & elegant
► easy to understand and implement
► many improvements and optimizations exist
● Main responsibilities:
► Routing
  Flat logical address space: l-bit identifiers instead of IP addresses
  Efficient routing in large systems: log(N) hops, with N total nodes
► Self-organization
  Handles node arrival, departure, and failure
Chord: Topology
● Hash-table storage
► put(key, value) inserts data into Chord
► value = get(key) retrieves data from Chord
● Identifiers
► Derived from a hash function
  E.g. SHA-1, 160-bit output → 0 ≤ identifier < 2^160
► Key associated with a data item
  E.g. key = sha1(value)
► ID associated with a host
  E.g. id = sha1(IP address, port)
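A short sketch of how such identifiers could be computed (Python, using SHA-1 as on the slide; the concrete input strings are assumptions):

```python
import hashlib

M = 160                                  # SHA-1 yields 160-bit identifiers

def chord_id(data: bytes) -> int:
    """Map arbitrary bytes into the identifier space 0 ... 2^160 - 1."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % (2 ** M)

key = chord_id(b"my data")               # key associated with a data item
node_id = chord_id(b"134.2.11.68:4711")  # ID associated with a host (IP address, port)
```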
Chord: Topology
● Keys and IDs live on a ring, i.e., all arithmetic is modulo 2^160
● (key, value) pairs are managed by the clockwise next node: the successor
[Figure: Chord ring with identifier space 0…7; nodes 0, 1, 3; keys 1, 2, 6;
 successor(1) = 1, successor(2) = 3, successor(6) = 0.]
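The successor relation can be sketched directly on the small example ring above (Python; the node set {0, 1, 3} and m = 3 come from the figure):

```python
# Successor relation on the example ring: identifier space 0...7 (m = 3),
# nodes {0, 1, 3}, keys {1, 2, 6} as in the figure above.

M = 3
NODES = sorted([0, 1, 3])

def successor(key: int) -> int:
    key %= 2 ** M
    for n in NODES:
        if n >= key:              # first node clockwise from the key
            return n
    return NODES[0]               # wrap around the ring

assert successor(1) == 1
assert successor(2) == 3
assert successor(6) == 0          # wraps around: node 0 manages key 6
```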
Chord: Topology
● Topology determined by the links between nodes
► Link: knowledge about another node
► Stored in a routing table on each node
● Simplest topology: circular linked list
► Each node has a link to its clockwise next node
[Figure: ring of identifiers 0…7 with nodes 0, 1, 3, each linking to its clockwise successor.]
Chord: Routing
● Primitive routing:
► Forward the query for key x until successor(x) is found
► Return the result to the source of the query
● Pros:
► Simple
► Little node state
● Cons:
► Poor lookup efficiency:
  N/2 hops on average, i.e. O(N) (with N nodes)
► Node failure breaks the circle
[Figure: the query "Key 6?" is forwarded hop by hop around the ring of nodes 0, 1, 3 until node 0 = successor(6) is reached.]
Chord: Routing
● Advanced routing:
► Store links to the z next neighbors
► Forward queries for key k to the farthest known predecessor of k
► For z = N: fully meshed routing system
  Lookup efficiency: O(1)
  Per-node state: O(N)
► Still poor scalability
● Scalable routing:
► Linear routing progress scales poorly
► A mix of short- and long-distance links is required:
  Accurate routing in the node's vicinity
  Fast routing progress over large distances
  Bounded number of links per node
Chord: Routing
● Chord's routing table: the finger table
► Stores log(N) links per node
► Covers exponentially increasing distances:
  Node n: entry i points to successor(n + 2^i) (the i-th finger)
[Figure: ring with identifiers 0…7 and nodes 0, 1, 3. Finger tables (start = n + 2^i, entry = successor(start)):
 Node 0: starts 1, 2, 4 → successors 1, 3, 0; keys: 6
 Node 1: starts 2, 3, 5 → successors 3, 3, 0; keys: 1
 Node 3: starts 4, 5, 7 → successors 0, 0, 0; keys: 2]
Chord: Routing
● Chord's routing algorithm:
► Each node n forwards the query for key k clockwise
  To the farthest finger preceding k
  Until n = predecessor(k) and successor(n) = successor(k)
  Then successor(n) is returned to the source of the query
[Figure: lookup(44) in a Chord ring with 6-bit identifiers (nodes 4, 7, 13, 14, 16, 19, 23, 26, 30, 33, 37, 39, 42, 45, 49, 52, 54, 56, 60, 63).
 Finger tables shown as (i, 2^i, target n + 2^i, link = successor(target)):
 Node 52: (0,1,53,54) (1,2,54,54) (2,4,56,56) (3,8,60,60) (4,16,4,4) (5,32,20,23)
 Node 23: (0,1,24,26) (1,2,25,26) (2,4,27,30) (3,8,31,33) (4,16,39,39) (5,32,55,56)
 Node 39: (0,1,40,42) (1,2,41,42) (2,4,43,45) (3,8,47,49) (4,16,55,56) (5,32,7,7)
 Node 42: (0,1,43,45) (1,2,44,45) (2,4,46,49) (3,8,50,52) (4,16,58,60) (5,32,10,13)
 The query for key 44 hops along the farthest fingers preceding 44: 52 → 23 → 39 → 42; node 42's successor 45 is responsible, so lookup(44) = 45.]
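A simplified simulation of this finger-table routing on the 6-bit example ring above (Python; global knowledge of all node IDs is assumed for brevity, which real Chord nodes do not have – they only know their fingers and successors):

```python
# Simplified simulation of Chord's finger-table routing on the 6-bit example
# ring above. Keys exactly equal to a node ID are not special-cased here.

M = 6
NODES = sorted([4, 7, 13, 14, 16, 19, 23, 26, 30, 33, 37, 39, 42, 45, 49,
                52, 54, 56, 60, 63])

def successor(x):
    """First node clockwise from identifier x (x itself counts)."""
    x %= 2 ** M
    return next((n for n in NODES if n >= x), NODES[0])

def node_successor(n):
    """Next node strictly after node n on the ring."""
    return next((m for m in NODES if m > n), NODES[0])

def in_open(x, a, b):        # x in the ring interval (a, b)
    return (a < x < b) if a < b else (x > a or x < b)

def in_half_open(x, a, b):   # x in the ring interval (a, b]
    return (a < x <= b) if a < b else (x > a or x <= b)

def fingers(n):
    """Finger i of node n points to successor(n + 2^i)."""
    return [successor(n + 2 ** i) for i in range(M)]

def lookup(start, key):
    n, hops = start, [start]
    # stop when the key falls into (n, successor(n)] -> successor(n) is responsible
    while not in_half_open(key, n, node_successor(n)):
        # forward to the farthest finger that precedes the key
        n = next((f for f in reversed(fingers(n)) if in_open(f, n, key)),
                 node_successor(n))
        hops.append(n)
    return node_successor(n), hops

print(lookup(52, 44))        # -> (45, [52, 23, 39, 42]), as in the figure
```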
Chord: Self-Organization
● Handle the changing network environment
► Failure of nodes
► Network failures
► Arrival of new nodes
► Departure of participating nodes
● Maintain a consistent system state for routing
► Keep routing information up to date
  Routing correctness depends on correct successor information
  Routing efficiency depends on correct finger tables
► Failure tolerance is required for all operations
Chord: Failure Tolerance: Storage
● Layered design
► The Chord DHT is mainly responsible for routing
► Data storage is managed by the application
  persistence
  consistency
  fairness
● Chord's soft-state approach:
► Nodes delete (key, value) pairs after a timeout
► Applications need to refresh (key, value) pairs periodically
► Worst case: data unavailable for one refresh interval after a node failure
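A minimal sketch of the soft-state idea (Python; the timeout value and names are assumptions, not Chord constants):

```python
import time

# Soft-state storage: (key, value) pairs expire unless the application
# re-publishes them before the timeout elapses.

REFRESH_INTERVAL = 60.0               # seconds (illustrative value)

class SoftStateStore:
    def __init__(self):
        self._data = {}               # key -> (value, expiry timestamp)

    def put(self, key, value):
        self._data[key] = (value, time.time() + REFRESH_INTERVAL)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.time() > expires:     # stale pair: the node deletes it
            del self._data[key]
            return None
        return value

# The application must re-publish (put) each pair before REFRESH_INTERVAL
# elapses; after a node failure, data may be unavailable for up to one
# refresh interval until the next re-publish reaches a live node.
```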
Chord: Failure Tolerance: Routing
● Finger failures during routing
► The query cannot be forwarded to the failed finger
► Forward to the previous finger instead (do not overshoot the destination node)
► Trigger the repair mechanism: replace the finger with its successor
● Active finger maintenance
► Periodically check the liveness of fingers
► Replace them with correct nodes on failures
► Trade-off: maintenance traffic vs. correctness & timeliness
[Figure: the 6-bit example ring again; a query towards key 44 when a finger has failed falls back to a previous finger (region around nodes 42, 45, 49).]
Chord: Failure Tolerance: Routing
● Successor failure during routing
► The last step of routing may return a failed node to the source of the query → all queries for that successor fail
► Therefore, store n successors in a successor list
  If successor[0] fails → use successor[1], etc.
  Routing fails only if n consecutive nodes fail simultaneously
● Active maintenance of the successor list
► Periodic checks, similar to finger table maintenance
► Crucial for correct routing
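A sketch of the successor-list failover (Python; is_alive stands in for a real liveness check, and the node IDs are made-up examples):

```python
# Successor-list failover: keep several successors and skip failed ones.

FAILED = {54}                          # assume the immediate successor has failed

def is_alive(node_id):
    return node_id not in FAILED       # placeholder for a real ping / health check

def next_hop(successor_list):
    """Return the first live successor; fail only if all of them are down."""
    for s in successor_list:
        if is_alive(s):
            return s
    raise RuntimeError("all successors failed simultaneously")

print(next_hop([54, 56, 60]))          # -> 56: successor[0] failed, use successor[1]
```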
Chord: Node Arrival
● New node picks an ID
● Contacts an existing node
● Constructs its finger table via standard routing / lookup()
● Retrieves its (key, value) pairs from its successor
[Figure: node 6 joins the example ring with nodes 0, 1, 3. Its finger table: starts 7, 0, 2 → successors 0, 0, 3. The finger tables of nodes 0, 1, 3 are as in the earlier figure.]
Chord: Node Arrival
● Examples for choosing new node IDs
► random ID: equal distribution assumed but not guaranteed
► hash of IP address & port
► placement of new nodes based on
  load on existing nodes
  geographic location, etc.
● Retrieval of existing node IDs (entry points)
► Controlled flooding
► DNS aliases
► Published through the web
► etc.
[Figure: the joining node picks ID = rand() = 6 and resolves an entry point via DNS (entrypoint.chord.org → 182.84.10.23).]
Chord: Node Arrival
● Construction of the finger table
► Iterate over the finger table rows
► For each row: query the entry point for the successor of the row's start value
► Standard Chord routing is performed by the entry point
● Construction of the successor list
► Add the immediate successor from the finger table
► Request the successor list from that successor
[Figure: new node 6 asks the entry point for succ(7), succ(0), succ(2) and obtains 0, 0, 3; successor lists shown: (0, 1) and (1, 3).]
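A small sketch of how a joining node could fill its finger table via an entry point (Python; the join helper and the hard-coded ring answers are illustrative, with the numbers taken from the node-6 example above):

```python
# Sketch: a joining node fills its finger table by asking an entry point
# for successor(start) of each row.

M = 3

def join(new_id, resolve_successor):
    """resolve_successor(key) -> successor(key), answered via the entry point."""
    finger_table = []
    for i in range(M):
        start = (new_id + 2 ** i) % (2 ** M)
        finger_table.append((start, resolve_successor(start)))
    immediate_successor = finger_table[0][1]   # successor list starts here;
    return finger_table, immediate_successor   # the rest is requested from it

# Answers the entry point would return for the ring {0, 1, 3}:
answers = {7: 0, 0: 0, 2: 3}                   # succ(7)=0, succ(0)=0, succ(2)=3
table, succ = join(6, lambda k: answers[k])
print(table)    # -> [(7, 0), (0, 0), (2, 3)]  as in the figure
print(succ)     # -> 0
```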
Chord: Node Departure
● Deliberate node departure
► clean shutdown instead of failure
● For simplicity: treat departure as failure
► the system is already failure tolerant
► soft state: automatic state restoration
► but state is lost briefly
► and invalid finger table entries reduce routing efficiency
● For efficiency: handle departure explicitly
► notification of the successor, predecessor, and nodes at finger distances by the departing node
► copy (key, value) pairs before shutdown
Chord: Summary
● Complexity
► Messages per lookup: O(log N)
► Memory per node: O(log N)
► Messages per management action (join/leave/fail): O(log² N)
● Advantages
► Theoretical models and proofs about complexity
► Simple & flexible
● Disadvantages
► No notion of node proximity, and no proximity-based routing optimizations
► Chord rings may become disjoint in realistic settings
● Many improvements published
► e.g. proximity, bi-directional links, load balancing, etc.
The Architectures of 1st and 2nd Gen. P2P

Client-Server
1. The server is the central entity and only provider of service and content; the network is managed by the server
2. Server as the higher-performance system
3. Clients as the lower-performance systems
Example: WWW

Peer-to-Peer
1. Resources are shared between the peers
2. Resources can be accessed directly from other peers
3. A peer is both provider and requestor (servent concept)

Unstructured P2P – Centralized P2P
1. All features of Peer-to-Peer included
2. A central entity is necessary to provide the service
3. The central entity is some kind of index/group database
Example: Napster

Unstructured P2P – Pure P2P
1. All features of Peer-to-Peer included
2. Any terminal entity can be removed without loss of functionality
3. No central entities
Examples: Gnutella 0.4, Freenet

Unstructured P2P – Hybrid P2P
1. All features of Peer-to-Peer included
2. Any terminal entity can be removed without loss of functionality
3. Dynamic central entities
Examples: Gnutella 0.6, JXTA

Structured P2P – DHT-Based
1. All features of Peer-to-Peer included
2. Any terminal entity can be removed without loss of functionality
3. No central entities
4. Connections in the overlay are "fixed"
Examples: Chord, CAN
Reminder: Distributed Indexing
● Communication overhead vs. node state
[Figure: the plot of communication overhead against per-node state once more.
 Flooding: O(N) communication overhead, O(1) node state – bottlenecks: communication overhead, false negatives.
 Central server: O(1) communication overhead, O(N) node state – bottlenecks: memory, CPU, network, availability.
 The Distributed Hash Table lies between both extremes: scalability O(log N), no false negatives, resistant against changes (failures, attacks, short-time users).]