Wolf-Tilo Balke Sascha Tönnies
Institut für Informationssysteme
Technische Universität Braunschweig
Peer-to-Peer
Data Management
1. Reliability in Distributed Hash Tables
2. Storage Load Balancing in Distributed Hash Tables
1. Power of Two Choices
2. Virtual Server
3. Content Distribution
1. Swarming
2. BitTorrent
11. Content Distribution
11.1 “Stabilize” Function
• The Stabilize Function corrects inconsistent connections
• Remember:
– Periodically done by each node n
– n asks its successor for its predecessor p
– n checks if p equals n
– n also periodically refreshes random finger x
• by (re)locating successor
• Successor-List to find new successor
– If successor is not reachable, use next node in successor-list
– Start stabilize function
11.1 Reliability of Data in Chord
• Original
– No reliability of data
• Recommendation
– Use of Successor-List
– The reliability of data is an application task
– Replicate inserted data to the next f other nodes
– Chord informs application of arriving or failing nodes
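The recommendation above can be sketched as follows; the ring size m, the replication factor f, and the node IDs are illustrative assumptions, not part of the Chord specification:

```python
import hashlib

def chord_id(key: str, m: int = 8) -> int:
    """Map a key onto the identifier circle of size 2^m via SHA-1."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def successor(nodes: list, kid: int):
    """First node ID >= the key ID, wrapping around the ring."""
    for n in nodes:
        if n >= kid:
            return n
    return nodes[0]

def replica_nodes(nodes: list, key: str, f: int = 3) -> list:
    """The responsible node plus its next f successors store the item,
    so the data survives up to f node failures."""
    nodes = sorted(nodes)
    first = nodes.index(successor(nodes, chord_id(key)))
    return [nodes[(first + i) % len(nodes)] for i in range(f + 1)]
```

The application would call `replica_nodes` on every insert and re-replicate when Chord reports an arriving or failing node.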
11.1 Properties
• Advantages
– After failure of a node its successor has the data already stored
• Disadvantages
– Node stores f intervals
• More data load
– After breakdown of a node
• Find new successor
• Replicate data to next node
– More message overhead at breakdown
11.1 Multiple Nodes in One Interval
• Fixed positive number f
– Indicates how many nodes have to act within one interval at least
• Procedure
– First node takes a random position
– A new node is assigned to any existing node
– Node is announced to all other nodes in same interval
11.1 Multiple Nodes in One Interval
• Effects of algorithm
– Reliability of data
– Better load balancing
– Higher security
11.1 Reliability of Data
• Insertion
– Copy of documents
• Always necessary for replication
– Less additional expenses
• Nodes have only to store pointers to nodes from the same interval
– Nodes store only data of one interval
11.1 Reliability of Data
• Reliability
– Failure: no copy of data needed
• Data are already stored within same interval
– Use stabilization procedure to correct fingers
• As in original Chord
11.1 Properties
• Advantages
– Failure: no copy of data needed
– Rebuild intervals with neighbors only if critical
– Requests can be answered by f different nodes
• Disadvantages
– Fewer intervals than in original Chord
• Solution: Virtual Servers
11.1 Fault Tolerance
• Replication
– Each data item is replicated K times
– K replicas are stored on different nodes
• Redundancy
– Each data item is split into M fragments
• K redundant fragments are computed
– Use of an "erasure code" (see e.g. V. Pless: Introduction to the Theory of Error-Correcting Codes. Wiley-Interscience, 1998)
• Any M fragments allow to reconstruct the original data
– For each fragment we compute its key
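For K = 1 a byte-wise XOR parity already realizes such an erasure code: any M of the M+1 fragments reconstruct the data (real systems use e.g. Reed-Solomon codes for larger K). A minimal sketch under that assumption:

```python
import hashlib

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split_with_parity(data: bytes, m: int):
    """Split data into m equal-sized fragments plus one XOR parity
    fragment (a K = 1 erasure code)."""
    size = -(-len(data) // m)  # ceiling division
    frags = [data[i*size:(i+1)*size].ljust(size, b"\0") for i in range(m)]
    parity = frags[0]
    for f in frags[1:]:
        parity = xor_bytes(parity, f)
    return frags + [parity]

def fragment_keys(frags):
    """Each fragment gets its own key, here the SHA-1 of its content."""
    return [hashlib.sha1(f).hexdigest() for f in frags]

def reconstruct(frags, orig_len: int) -> bytes:
    """Recover the data from any m of the m+1 fragments (one may be None)."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if missing:
        acc = None
        for f in frags:
            if f is not None:
                acc = f if acc is None else xor_bytes(acc, f)
        frags = list(frags)
        frags[missing[0]] = acc
    return b"".join(frags[:-1])[:orig_len]
```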
11.2 Storage Load Balancing in DHT
• Suitable hash function (easy to compute, few collisions)
• Standard assumption 1: uniform key distribution
– Every node has equal load
– No load balancing is needed
• Standard assumption 2: equal distribution
– Nodes across address space
– Data across nodes
• But is this assumption justifiable?
– Analysis of distribution of data using simulation
11.2 Storage Load Balancing in DHT
• Analysis of distribution of data
• Example
– Parameters
• 4,096 nodes
• 500,000 documents
– Optimum
• ~122 documents per node
• No optimal distribution in Chord without load balancing
Optimal distribution of documents across nodes
11.2 Storage Load Balancing in DHT
• Number of nodes without storing any document
– Parameters
• 4,096 nodes
• 100,000 to 1,000,000 documents
– Some nodes without any load
• Why is the load unbalanced?
• We need load balancing to keep the complexity of DHT management low
11.2 Definitions
• Definitions
– System with N nodes
– The load is optimally balanced if
• the load of each node is around 1/N of the total load
– A node is overloaded (heavy)
• Node has a significantly higher load compared to the optimal distribution of load.
– Else the node is light
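The heavy/light distinction can be made concrete as below; the factor-2 threshold is an assumption of this sketch, since the definition only requires a "significantly higher" load:

```python
def classify_nodes(loads: dict, factor: float = 2.0) -> dict:
    """Label a node heavy if its load exceeds `factor` times the
    optimal share (total load / N), otherwise light."""
    optimal = sum(loads.values()) / len(loads)
    return {node: ("heavy" if load > factor * optimal else "light")
            for node, load in loads.items()}
```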
11.2 Load Balancing Algorithms
• Problem
– Significant difference in the load of nodes
• Several techniques to ensure an equal data distribution
– Power of Two Choices (Byers et al., 2003)
– Virtual Servers (Rao et al., 2003)
– Thermal-Dissipation-based Approach (Rieche et al., 2004)
– A Simple Address-Space and Item Balancing (Karger et al., 2004)
– …
11.2 Overview
• Algorithms
– Power of Two Choices (Byers et al., 2003)
– Virtual Servers (Rao et al., 2003)
• John Byers, Jeffrey Considine, and Michael Mitzenmacher: “Simple Load Balancing for Distributed Hash Tables” in Second International Workshop on Peer-to-Peer Systems (IPTPS), Berkeley, CA, USA, 2003.
11.2 Power of Two Choices
• Idea
– One hash function for all nodes
• h0
– Multiple hash functions for data
• h1, h2, h3, …hd
• Two options
– Data is stored at one node only
– Data is stored at one node & other nodes store a pointer
11.2 Power of Two Choices
• Inserting Data
– Results of all hash functions are calculated
• h1(x), h2(x), h3(x), …hd(x)
– Data is stored on the retrieved node with the lowest load
– Alternative: other nodes store a pointer
– The owner of the item has to insert the document periodically
• Prevent removal of data after a timeout (soft state)
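A sketch of the insertion step; the hash family (salted SHA-1), the number of choices d, and the lookup/load callbacks are assumptions of this sketch:

```python
import hashlib

def h(i: int, key: str, m: int = 16) -> int:
    """Hash family h1..hd, derived here by salting SHA-1 with i."""
    digest = hashlib.sha1(f"{i}:{key}".encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def insert(key: str, node_for_id, load_of, d: int = 4):
    """Compute all d candidate positions h1(x)..hd(x) and store the
    item on the least-loaded responsible node."""
    candidates = {node_for_id(h(i, key)) for i in range(1, d + 1)}
    return min(candidates, key=load_of)
```

In the pointer variant, the remaining d-1 candidate nodes would additionally store a reference to the chosen node.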
11.2 Power of Two Choices
• Retrieving
– Without pointers
• Results of all hash functions are calculated
• Request all of the possible nodes in parallel
• One node will answer
– With pointers
• Request only one of the possible nodes.
• Node can forward the request directly to the final node
11.2 Power of Two Choices
• Advantages
– Simple
• Disadvantages
– Message overhead when inserting data
– With pointers
• Additional administration of pointers leads to even more load
– Without pointers
• Parallel requests to all candidate nodes cause additional traffic
11.2 Overview
• Algorithms
– Power of Two Choices (Byers et al., 2003)
– Virtual Servers (Rao et al., 2003)
• Ananth Rao, Karthik Lakshminarayanan, Sonesh Surana, Richard Karp, and Ion Stoica: “Load Balancing in Structured P2P Systems” in Second International Workshop on Peer-to-Peer Systems (IPTPS), Berkeley, CA, USA, 2003.
11.2 Virtual Server
• Each node is responsible for several intervals
– "Virtual server"
• Example
– Chord
11.2 Rules
• Rules for transferring a virtual server
– From heavy node to light node
1. The transfer of a virtual server must not make the receiving node heavy
2. The virtual server chosen is the lightest one that makes the heavy node light
3. If no virtual server's transfer can make the node light, the heaviest virtual server of this node is transferred
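The three rules can be sketched as a selection function; using a single `light_threshold` for both "light" and "not heavy" is a simplifying assumption:

```python
def pick_virtual_server(vs_loads, sender_total, receiver_load, light_threshold):
    """Choose which virtual server a heavy node transfers to a light node.
    vs_loads: loads of the heavy node's virtual servers."""
    # Rule 1: the receiver must not become heavy through the transfer.
    movable = [v for v in vs_loads if receiver_load + v < light_threshold]
    if not movable:
        return None
    # Rule 2: prefer the lightest virtual server that makes the sender light.
    sufficient = [v for v in movable if sender_total - v < light_threshold]
    if sufficient:
        return min(sufficient)
    # Rule 3: otherwise transfer the heaviest movable virtual server.
    return max(movable)
```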
11.2 Virtual Server
• Each node is responsible for several intervals
– log (n) virtual servers
• Load balancing
– Different possibilities to change servers
• One-to-one
• One-to-many
• Many-to-many
– Transferring an interval corresponds to a node leaving and a node joining the Chord ring
11.2 Scheme 1: One-to-One
• One-to-One
– Light node picks a random ID
– Contacts the node x responsible for it
– Accepts load if x is heavy
11.2 Scheme 2: One-to-Many
• One-to-Many
– Light nodes report their load information to directories
– Heavy node H gets this information by contacting a directory
– H contacts the light node which can accept the excess load
11.2 Scheme 3: Many-to-Many
• Many-to-Many
– Many heavy and light nodes rendezvous at each step
– Directories periodically compute the transfer schedule and report it back to the nodes, which then do the actual transfer
11.2 Virtual Server
• Advantages
– Easy shifting of load
• Whole Virtual Servers are shifted
• Disadvantages
– Increased administrative and message overhead
• Maintenance of all Finger-Tables
– Much load is shifted
[Rao 2003]
11.2 Simulation
• Scenario
– 4,096 nodes (comparison with other measurements)
– 100,000 to 1,000,000 documents
• Chord
– m = 22 bits
– Consequently, 2^22 = 4,194,304 possible nodes and documents
• Hash function
– SHA-1 (mod 2^m)
– random
• Analysis
11.2 Results
• Without load balancing
+ Simple
+ Original algorithm
– Bad load balancing
• Power of Two Choices
+ Simple
+ Lower load
– Some nodes without load
• Virtual Servers
+ No nodes without load
– Higher maximum load than Power of Two Choices
11.3 Content Distribution
• Sometimes large amounts of data have to be distributed over networks
– Software updates, video on demand, etc.
• Early approaches: Napster/Gnutella/Fasttrack
– Download whole file from one peer
– If download fails: repeat search, resume download from alternative source
• Issues
– No load distribution
– Poor performance due to asymmetric uplink/downlink bandwidth (ADSL)
– Low reliability (except for small files)
11.3 Swarming Approach
• Idea: Chunks
– Split large files into small chunks
– Identify/protect chunks via hash values
• Parallelization
– Download different chunks from different sources
– Utilize upload capacity of multiple sources
(Figure: a file split into chunks with hashes 0x9A3C, 0x7C23, 0x194F, 0xDE6A, each available from multiple sources)
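The chunking idea in code; the 256 KB chunk size is an assumption here, matching the typical BitTorrent piece size mentioned later:

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # 256 KB pieces

def make_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split a file into fixed-size chunks and compute a SHA-1 hash per
    chunk, so each chunk can be fetched from any source and verified
    independently of the others."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [(hashlib.sha1(c).hexdigest(), c) for c in chunks]
```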
11.3 Swarming Properties
• Advantages
– Peer failures: no loss of files, only of chunks
– Increased throughput
• Strategies
– Chunk selection
• Avoid scarcity
• Best overall availability?
– Fairness
• Free-Riding
• Bandwidth allocation
• Systems
– BitTorrent
– Microsoft Avalanche
11.3 BitTorrent Overview
• Bittorrent or BitTorrent
– Torrent = big stream
– Author: Bram Cohen, 2003
– Only for file distribution, no search features
• Designed for
– Content providers
– Flash crowds
• Central components
– Web server for search
11.3 BitTorrent
• Definitions
– Peers
– Torrent
• Contains metadata about the files
• Contains the address of a tracker
– Specification of backup trackers possible
– Swarm
• All peers sharing a torrent are called a swarm
– Tracker
• Keeps track of which peers are in a swarm
• Coordinates communication between the peers
11.3 BitTorrent – Joining a Torrent
• Peers are divided into:
– Seeds: have the entire file
– Leechers: still downloading
• Joining a torrent:
1. Obtain the torrent file (e.g. from a website)
2. Contact the tracker
3. Obtain a peer list (contains seeds & leechers)
(Figure: a new leecher joins via website and tracker, then requests data from seeds and leechers)
• Download sub-pieces in parallel
• Verify pieces using hashes
11.3 BitTorrent – Exchanging Data
• Advertise received pieces to the entire peer list
• Look for the rarest pieces
11.3 Torrent
• A Torrent file
– Passive component
– Files are typically fragmented into 256 KB pieces
– Typically hosted on a web server
• Metadata file structure
– Describes the files in the torrent
• URL of tracker
• File name
• File length
• Piece length
• SHA-1 hashes of pieces
– Allow peers to verify integrity
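The metadata structure can be sketched as a plain dictionary (on disk it is bencoded); the field names follow the BitTorrent specification, while the values and the tracker URL are illustrative assumptions:

```python
torrent = {
    "announce": "http://tracker.example.org:6969/announce",  # tracker URL
    "info": {
        "name": "lecture.pdf",      # file name (illustrative)
        "length": 1_048_576,        # file length in bytes
        "piece length": 262_144,    # 256 KB pieces
        # concatenated 20-byte SHA-1 hashes, one per piece:
        "pieces": bytes(20) * 4,
    },
}

def piece_hash(meta: dict, index: int) -> bytes:
    """Extract the SHA-1 hash of piece `index` for integrity checks."""
    return meta["info"]["pieces"][index * 20:(index + 1) * 20]
```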
11.3 Tracker
• Peer cache
– IP, port, peer id
• State information
– Completed
– Downloading
– Clients report status periodically to the tracker
• Returns random list
– 50 random leechers/seeds
– Client first contacts 20-40 of them
11.3 Tracker-less approaches
• Tracker issues
– Single point of failure
– Scalability
• Pirate Bay tracker nearly overloaded (> 5 million peers)
• Decentralized tracker
– Replace with DHT (Kademlia)
– Does not tackle distributed search
– Currently not widely used
11.3 Chunk Selection
• Which chunk next?
1. Strict Priority
– Finish active chunks
2. Rarest First
– Improves availability of rare chunks
– Delays download of common chunks
3. Random First Chunk
– Get first chunk quickly (rarest chunk probably slow to get)
4. Endgame Mode
– Send requests for last sub-chunks to all known peers
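Strategies 1 and 2 combined in a sketch; the data structures (a set of completed chunks, per-peer chunk sets) are assumptions of this sketch:

```python
from collections import Counter

def next_chunk(have: set, active: set, peers_have: list):
    """Pick the next chunk: finish active chunks first (strict
    priority), otherwise take the rarest chunk among the known peers."""
    unfinished = active - have
    if unfinished:
        return min(unfinished)  # strict priority: any active chunk
    available = set().union(*peers_have) - have if peers_have else set()
    if not available:
        return None
    counts = Counter(c for p in peers_have for c in p)
    return min(available, key=lambda c: (counts[c], c))  # rarest first
```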
11.3 Game Theory
• Basic Ideas of Game Theory
– Studies situations where players choose different actions in an attempt to maximize their returns
– Studies the ways in which strategic interactions among rational players produce outcomes with respect to the players’ preferences
– The outcomes might not have been intended by any of them
– Game theory offers a general theory of strategic behavior
– Described in mathematical form
• Plays an important role in
– Modern economics
– Decision theory
– Multi-agent systems
11.3 Game Theory
• Developed to explain the optimal strategy in two-person interactions.
– von Neumann and Morgenstern
• Initially: zero-sum games
– John Nash
• Works in game theory and differential geometry
– Nonzero-sum games
– Nash equilibrium
• 1994 Nobel Prize in Economics
– Shared with Harsanyi and Selten
11.3 Definitions
• Games
– Situations are treated as games.
• Rules
– The rules of the game state who can do what
– And when they can do it
• Player's Strategies
– Plan for actions in each possible situation in the game
• Player's Payoffs
– The amount that the player wins or loses in a particular situation
• Dominant Strategy
– A player's best strategy doesn’t depend on what the other players do
11.3 Prisoner's Dilemma
• Famous example of game theory
• A and B are arrested by the police
– They are questioned in separate cells
• Unable to communicate with each other.
– They know how it works
• If they both resist interrogation and proclaim their mutual innocence, they will get off with a three year sentence for robbery.
• If one of them confesses to the entire string of robberies and the other does not, the confessor will be rewarded with a light, one year sentence and the other will get a severe eight year sentence.
                       B confesses                    B does not confess
A confesses            4 years each                   1 year for A, 8 years for B
A does not confess     8 years for A, 1 year for B    3 years each
11.3 A’s Decision Tree
• There are two cases to consider
– If B confesses:
• A confesses → 4 years in prison (best strategy)
• A does not confess → 8 years in prison
– If B does not confess:
• A confesses → 1 year in prison (best strategy)
• A does not confess → 3 years in prison
• The dominant strategy for A is to confess
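The decision-tree argument as code, using the payoff table above (years in prison for A; lower is better):

```python
YEARS_A = {  # (A's action, B's action) -> years in prison for A
    ("confess", "confess"): 4,
    ("confess", "not confess"): 1,
    ("not confess", "confess"): 8,
    ("not confess", "not confess"): 3,
}

def dominant_strategy(payoffs: dict):
    """Return A's action that is optimal no matter what B does, or None."""
    a_actions = {a for a, _ in payoffs}
    b_actions = {b for _, b in payoffs}
    for a in a_actions:
        if all(payoffs[a, b] == min(payoffs[a2, b] for a2 in a_actions)
               for b in b_actions):
            return a
    return None
```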
11.3 Repeated Games
• A repeated game
– A game that the same players play more than once
– Differs from one-shot games because players' current actions can depend on the past behavior of the other players
– Cooperation is encouraged
• Book recommendation
– “Thinking Strategically” by A. Dixit and B. Nalebuff
11.3 Tit for Tat
• Tit for tat
– Highly effective strategy
– An agent using this strategy will initially cooperate
– Then respond in kind to an opponent's previous action
– If the opponent previously was cooperative, the agent is cooperative.
– If not, the agent is not.
• Dependent on four conditions
– Unless provoked, the agent will always cooperate
– If provoked, the agent will retaliate
– The agent is quick to forgive
– The agent must have a good chance of competing with the opponent more than once
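A sketch of tit for tat in an iterated game ("C" = cooperate, "D" = defect); the strategy interface is an assumption of this sketch:

```python
def tit_for_tat(opponent_history):
    """Cooperate on the first move, then mirror the opponent's last move."""
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def play(rounds, strategy_a, strategy_b):
    """Iterate the game; each strategy only sees the opponent's moves."""
    a_hist, b_hist = [], []
    for _ in range(rounds):
        move_a = strategy_a(b_hist)
        move_b = strategy_b(a_hist)
        a_hist.append(move_a)
        b_hist.append(move_b)
    return a_hist, b_hist
```

Against a defector, tit for tat cooperates once and then retaliates; against itself, it cooperates forever.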
11.3 Choking
• Choking
– Temporary refusal to upload
– Downloading occurs as normal
– Connection is kept open
• No new connection setup costs
• No restart of TCP congestion control
• Choking mechanism
– Ensures that nodes cooperate
– Eliminates the free-rider problem
– Cooperation means uploading sub-pieces that you have to your peers
• Based on game-theoretic concepts
– Tit-for-tat strategy in repeated games
11.3 Unchoking
• Periodically calculate data-receiving rates
• Upload to (unchoke) the fastest downloaders
• Optimistic Unchoking
– Each BitTorrent peer has a single “optimistic unchoke” which is uploaded regardless of the current download rate from it
11.3 Choking Details
• BitTorrent Details
– A peer always unchokes a fixed number of its peers
• Default of 4
– Choking decision based on current download rates
• Evaluated on a rolling 20-second average
– Choking evaluation performed every 10 seconds
• Prevents wastage of resources by rapidly choking/unchoking peers
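The unchoking rule sketched; the rolling-average computation is omitted, and the optimistic peer is passed in explicitly here (BitTorrent rotates it randomly):

```python
def unchoked_peers(rates: dict, optimistic, slots: int = 4):
    """Unchoke the `slots` peers with the best 20-second average
    download rate, plus one optimistic unchoke regardless of its rate."""
    best = sorted(rates, key=rates.get, reverse=True)[:slots]
    return set(best) | {optimistic}
```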
11.3 Anti-Snubbing
• Choking policy
– When over a minute has gone by without receiving a single sub-piece from a particular peer, do not upload to it except as an optimistic unchoke
• Problem
– A peer might find itself being simultaneously choked by all its peers that it was just downloading from
– Download will lag until optimistic unchoke finds better peers
• Solution
– When snubbed, use more than one optimistic unchoke to find better peers faster
11.3 Choking for Seeds
• Open issue: upload-only choking
– Once its download is complete, a peer has no download rates to use for comparison, nor any need for them
– The question is: which nodes to upload to?
• Policy
– Upload to those with the best upload rate.
• Advantages
– Ensures that pieces get replicated faster
– Peers that have good upload rates are probably not being served by others
11.3 BitTorrent Summary
• Optimized file transfer system
– No file search, no fancy GUI, etc.
• Very effective
– High throughput & scalability
– Nearly perfect utilization of bandwidth
– Fairness and load distribution not optimal, but good enough
• Commercially successful
– Distribution of Red Hat Linux images
– BBC evaluates the distribution of TV content (not in real-time)
• But: relies on a centralized tracker
11.3 Swarming Summary
• Solves the problem of efficient file distribution
– Scalable
– Handles flash crowds
• Areas for optimization
– Incentive models
– Tracker-less approaches
– Further endgame improvements
• Next step: content streaming
– Real-time constraints
– Chunk order