Wolf-Tilo Balke Christoph Lofi
Institut für Informationssysteme
Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de
Distributed Data Management
8.0 Content Distribution
8.1 Swarming
8.2 BitTorrent
8.3 Anonymous P2P
8.4 Tornado Codes
Distributed Data Management – Wolf-Tilo Balke – Christoph Lofi – IfIS – TU Braunschweig 2
8.0 Content Provisioning
• Sometimes large amounts of data have to be distributed over networks
– Software updates, video on demand, etc.
• Early approaches: Napster, Gnutella, FastTrack, Kazaa, …
– Use P2P network to locate a node offering the requested content
– Download whole file from a single peer
• If download fails: repeat search, resume download from alternative source
8.1 Swarming
• Issues
– Poor performance due to asymmetric uplink/downlink bandwidth
• Most common home network connection technology: ADSL
– Asymmetric Digital Subscriber Line
– e.g. ADSL2+: 16,000 kbit/s download, 1,024 kbit/s upload
– No load distribution
• Popular files may have extremely low download speed due to congestion of the offering node
– Low reliability (except for small files)
• Connection glitches may severely hamper the download
• Frequent reconnects and resumes necessary
• Idea: Chunks
– Split large files into small chunks
– Assign hash values to chunks
• Identification: simple and deterministic labeling of chunks
• Transfer Protection: download chunk and compute hash
– Compare computed hash with hash provided by offering peer
– If comparison fails, a transfer error occurred ⇒ reload chunk
[Figure: the original file split into four chunks, labeled by their hash values 0x9A3C, 0x7C23, 0x194F, 0xDE6A]
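The chunking idea can be sketched in a few lines of Python. This is a minimal illustration, not any particular client's code: it assumes fixed-size chunks and SHA-1 as the hash function (the choice BitTorrent makes, see 8.2), and the names `split_into_chunks`, `chunk_hashes`, `verify_chunk` and the tiny `CHUNK_SIZE` are hypothetical.

```python
import hashlib
from typing import List

CHUNK_SIZE = 4  # hypothetical and tiny for readability; real systems use KB to MB

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Deterministically split data into consecutive chunks (last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def chunk_hashes(chunks: List[bytes]) -> List[str]:
    """Label every chunk with the hex SHA-1 digest of its content."""
    return [hashlib.sha1(chunk).hexdigest() for chunk in chunks]

def verify_chunk(received: bytes, expected_hash: str) -> bool:
    """Transfer protection: recompute the hash and compare it with the advertised one."""
    return hashlib.sha1(received).hexdigest() == expected_hash

data = b"large file contents"
chunks = split_into_chunks(data)
hashes = chunk_hashes(chunks)              # published by the offering peer
corrupted = b"X" + chunks[0][1:]           # simulated transfer error
print(len(chunks))                         # 5
print(verify_chunk(chunks[0], hashes[0]))  # True
print(verify_chunk(corrupted, hashes[0]))  # False -> reload the chunk
```

Because the labeling is deterministic, every peer that splits the same file computes the same chunk list and hashes, so chunks can be identified and checked independently of the peer that sent them.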
• Parallelization
– Locate the swarm of all peers hosting the required file (and thus the required chunks)
• Download different chunks from different sources simultaneously
– Utilize upload capacity of multiple sources
• Overall download speed may thus exceed upload capabilities of individual sources
• Usually, upload capacity is the bottleneck due to asymmetrical connections
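A back-of-the-envelope calculation with the ADSL2+ figures quoted above shows why parallel sources help. It assumes every source devotes its full upload capacity to one downloader and ignores protocol overhead; the helper names are hypothetical.

```python
import math

def aggregate_rate(n_sources: int, download_kbps: float, upload_kbps: float) -> float:
    """Download rate when n sources each upload at full speed, capped by the downlink."""
    return min(download_kbps, n_sources * upload_kbps)

def sources_needed(download_kbps: float, upload_kbps: float) -> int:
    """Smallest number of sources whose combined upload saturates the downlink."""
    return math.ceil(download_kbps / upload_kbps)

# ADSL2+ figures: 16,000 kbit/s down, 1,024 kbit/s up
print(aggregate_rate(1, 16_000, 1_024))  # 1024 (single source: its upload is the bottleneck)
print(aggregate_rate(8, 16_000, 1_024))  # 8192
print(sources_needed(16_000, 1_024))     # 16
```

Under these assumptions a single source delivers only 1/16 of the possible download speed, while about 16 sources together can saturate the downlink.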
• So-called swarming strategy
– Download chunks in parallel from a swarm of peers
• Swarming Advantages
– Peer failures: no loss of files, only chunks
• Discard unfinished chunks and download them anew
• No complicated resume mechanism necessary
– Increased throughput
• Download chunks in parallel from different sources
• Swarming Issues
– Chunk selection: in which order should chunks be requested from which peer?
• Avoid scarcity
• Best overall availability?
– Fairness: how can the protocol ensure fair usage of bandwidth?
• Avoid free-riding: all peers should contribute to the network
• Bandwidth allocation: single peers should not be overwhelmed by requests while others are idling
• Systems implementing swarming
– BitTorrent
– Avalanche
• BitTorrent
– Torrent = big stream
– Author: Bram Cohen, 2001
– Protocol for swarming file distribution, no search features
• Design goal
– Quickly distribute large content in a decentralized way
8.2 BitTorrent
• Implements swarming strategy for content distribution
– Especially suited for flash crowds, i.e. content which is high in demand for a short period of time
• Central components
– Web server for search (torrent site)
• “Classic” web server maintains list of available content (so-called “torrents”)
• Provides search functionalities
• Content is represented as a torrent file containing required metadata
– e.g. address of the tracker
– Tracker for peer coordination
• A tracker is a centralized service maintaining the peer swarms
– i.e. which peers participate in which torrents (which chunks a peer has is exchanged among the peers themselves)
• Workflow for download
– A user uses a torrent site to obtain a torrent file
• Torrent file contains content metadata
– The user's node connects to the responsible tracker
• Tracker URL and content identifier (“info hash”) are provided by the torrent file
– Node registers itself with the tracker for the corresponding torrent
• i.e. joins the swarm
• Node obtains a list of peers offering the torrent
– Contact some swarm peers
• Obtain a list of available chunks
– Download chunks from peers
• Each offered file is split into chunks of 32 KB to 4 MB each
– Each chunk is identified by a 160-bit SHA-1 hash
• With respect to a certain torrent, each peer can fulfill one of the following roles
– Seeders
• Have all chunks of the torrent and are actively seeding (uploading) those chunks to the swarm
– Leechers
• Do not have all chunks of a torrent
• Download missing chunks from other leechers or seeders
• Upload chunks to other leechers
– There is no download-only role
• Torrent File metadata structure
– Describes the files in the torrent
• URL of tracker
• File name
• File length
• Piece length
• SHA-1 hashes of pieces
– Allow peers to verify integrity
• Creation date
– An info hash is computed as the SHA-1 hash of the info section of the torrent file (file name, length, piece length, piece hashes)
• This hash uniquely identifies a torrent
[Figure: example torrent file, highlighting the fields being hashed into the InfoHash]
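Computing an info hash can be sketched as follows. The sketch assumes the convention of the BitTorrent specification that the info hash is the SHA-1 digest of the bencoded info dictionary; the minimal `bencode` helper is our own illustration (not a library function), and the tracker URL and file values are made up.

```python
import hashlib

def bencode(value) -> bytes:
    """Minimal bencoding of int, str/bytes, list and dict values (our own helper)."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

def info_hash(info: dict) -> str:
    """SHA-1 over the bencoded info dictionary, as a 40-digit hex string."""
    return hashlib.sha1(bencode(info)).hexdigest()

# Hypothetical torrent with two pieces of 256 KiB each
piece_hashes = hashlib.sha1(b"piece 0").digest() + hashlib.sha1(b"piece 1").digest()
info = {"name": "example.iso", "length": 524288,
        "piece length": 262144, "pieces": piece_hashes}
torrent = {"announce": "http://tracker.example/announce", "info": info}
print(info_hash(torrent["info"]))
```

Since only the info section is hashed, changing the tracker URL in `announce` leaves the info hash, and therefore the identity of the torrent, unchanged.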
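[Figure: download sequence between torrent site, tracker, new leecher peer, and other swarm peers: the new leecher obtains the torrent file from the torrent site, registers with the tracker and receives the swarm peer list, requests chunk information from other swarm peers, and then requests chunks, which the peers send]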
• Which chunk next?
– Priority for active chunks
• Finish active chunks
– Rarest First
• Improves availability of rare chunks
• Delays download of common chunks
– Random First Chunk
• Get first chunk quickly (rarest chunk probably slow to get)
– Endgame Mode
• Send requests for last chunks to all known peers
• End of download not stalled by slow peers
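The first three policies can be combined into one selection function. This is a simplified sketch under assumptions, not BitTorrent's reference implementation: it assumes we know which pieces every connected peer has (`peer_pieces`), uses a hypothetical helper name `next_piece`, and leaves out endgame mode, which changes how requests are sent rather than which piece is chosen.

```python
import random
from collections import Counter
from typing import Dict, Optional, Set

def next_piece(have: Set[int], active: Set[int], peer_pieces: Dict[str, Set[int]],
               rng: random.Random) -> Optional[int]:
    """Pick the next piece to request (hypothetical helper).

    1. Priority for active pieces: finish partially downloaded pieces first.
    2. Random first piece: while no piece is complete, pick an available piece at random.
    3. Rarest first: otherwise pick the missing piece held by the fewest peers.
    """
    available = set().union(*peer_pieces.values()) - have
    if not available:
        return None
    unfinished = active & available
    if unfinished:
        return min(unfinished)
    if not have:
        return rng.choice(sorted(available))
    counts = Counter(p for pieces in peer_pieces.values() for p in pieces if p in available)
    return min(available, key=lambda p: (counts[p], p))  # ties broken by piece index

peers = {"p1": {0, 1, 2}, "p2": {0, 1}, "p3": {0, 3}}
print(next_piece({0}, set(), peers, random.Random(7)))  # 2 (held by one peer only)
print(next_piece({0}, {1}, peers, random.Random(7)))    # 1 (already active)
```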
• Basic Ideas of Game Theory
– Game theory offers a general theory of strategic behavior
• Described in mathematical form
– Situations in which players may choose different actions to maximize their returns
– Situations in which strategic interactions among rational players produce outcomes with respect to the players' preferences
– The outcomes might not have been intended by any of them
• Plays an important role in
– Modern economics
– Decision theory
– Multi-agent systems
Game Theory
• Early game theory tried to explain the optimal strategy in two-person interactions
• von Neumann and Morgenstern, 1944
– Initially: zero-sum games
– Expected utility hypothesis
• Players will rationally decide for the option with the highest expected utility
• John Nash
– Works in game theory and differential geometry
• Non-zero-sum games
• Nash equilibrium 1950
– Strategic equilibrium in which no player gains any advantage by changing only their own strategy (while knowing the opponents' strategies)
– 1994 Nobel Prize in Economics
• Harsanyi and Selten
– Incomplete information
– Also 1994 Nobel Prize in Economics
• Games
– Situations are treated as games.
• Rules
– The rules of the game state which actions and decisions are possible
• Player's Strategies
– Plan for actions in each possible situation in the game
• Player's Payoffs
– A player's expected gain or loss when winning or losing in a particular situation
• Dominant Strategy
– A player's best strategy does not depend on what the other players do
• Famous example: Prisoner's Dilemma
• A and B are arrested by the police during a robbery
– They are interrogated in separate cells
• Unable to communicate with each other
– The following conditions are known
• If they both resist interrogation and proclaim their mutual innocence, they both will get off with a three-year sentence for robbery
• If one of them confesses to the entire string of robberies and the other does not, the confessor will be rewarded with a light, one-year sentence, while the other will get a severe eight-year sentence
• If they both confess, the judge will sentence both to a moderate four years in prison
• Prisoner's Dilemma
– Possible outcomes

                      B: Confession                  B: No Confession
  A: Confession       4 years each                   1 year for A, 8 years for B
  A: No Confession    8 years for A, 1 year for B    3 years each
• Decision Tree of A
• The dominant strategy for A is to confess
– No matter what B does, confessing is the better choice
• Nash equilibrium: both A and B will confess
– Confessing is also the dominant strategy of B
[Figure: decision tree of A. If B confesses: A confesses ⇒ 4 years, A does not confess ⇒ 8 years. If B does not confess: A confesses ⇒ 1 year, A does not confess ⇒ 3 years. In both branches, confessing is A's best strategy.]
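The reasoning about dominant strategies and the Nash equilibrium can be checked mechanically. The sketch below encodes the sentence table (years in prison, so fewer is better), tests each strategy for strict dominance and each strategy pair for being a Nash equilibrium; the names `YEARS`, `dominant_strategy` and `is_nash` are our own.

```python
from itertools import product

STRATEGIES = ("confess", "silent")
# YEARS[(choice of A, choice of B)] = (years for A, years for B); fewer is better
YEARS = {
    ("confess", "confess"): (4, 4),
    ("confess", "silent"): (1, 8),
    ("silent", "confess"): (8, 1),
    ("silent", "silent"): (3, 3),
}

def years(player: int, own: str, other: str) -> int:
    """Sentence of `player` (0 = A, 1 = B) given its own and the other player's choice."""
    profile = (own, other) if player == 0 else (other, own)
    return YEARS[profile][player]

def dominant_strategy(player: int):
    """A strategy that is strictly better against every choice of the other player."""
    for s in STRATEGIES:
        if all(years(player, s, o) < years(player, alt, o)
               for o in STRATEGIES for alt in STRATEGIES if alt != s):
            return s
    return None

def is_nash(a: str, b: str) -> bool:
    """No player can reduce their own sentence by changing only their own choice."""
    return (all(years(0, a, b) <= years(0, alt, b) for alt in STRATEGIES) and
            all(years(1, b, a) <= years(1, alt, a) for alt in STRATEGIES))

print(dominant_strategy(0), dominant_strategy(1))                 # confess confess
print([p for p in product(STRATEGIES, repeat=2) if is_nash(*p)])  # [('confess', 'confess')]
```

Note that (silent, silent) would give both players a shorter sentence, yet it is not an equilibrium, which is exactly the dilemma.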
• A repeated game
– Game that the same players play more than once
– Differs from one-shot games because people's current actions can depend on the past behavior of other players
– Cooperation is encouraged
• Book recommendation
– “Thinking Strategically” by A. Dixit and B. Nalebuff
• German translation: “Spieltheorie für Einsteiger”
• We can employ game theory for designing a swarming protocol
– Swarming “Game” Decisions
• While seeding, who should get the chunks?
• While leeching, from whom to download chunks?
– Goal
• Available resources should be optimally used
• Free-riding should be prevented
• Possible strategy in repeated games: Tit for Tat
– A player using this strategy will initially cooperate
• Player will adapt to the opponent
– If the opponent previously was cooperative, the agent is cooperative
– If not, the agent is not
• Depends on four conditions
– Unless provoked, the agent will always cooperate
– If provoked, the agent will retaliate
– The agent is quick to forgive
– The agent must have a good chance of playing against the opponent more than once
• Get-to-know each other
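The effect of tit-for-tat in a repeated game can be illustrated with a small simulation of the iterated prisoner's dilemma above. Assumptions: moves are "C" (cooperate, i.e. stay silent) and "D" (defect, i.e. confess), the payoffs are the prison sentences from the table (fewer years is better), and `always_defect` is a hypothetical free-riding opponent.

```python
from typing import Callable, List, Tuple

# Years in prison for (my move, opponent's move); fewer is better.
YEARS = {("C", "C"): 3, ("C", "D"): 8, ("D", "C"): 1, ("D", "D"): 4}

Strategy = Callable[[List[str]], str]  # receives the opponent's past moves

def tit_for_tat(opponent_history: List[str]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"

def always_defect(opponent_history: List[str]) -> str:
    return "D"

def play(s1: Strategy, s2: Strategy, rounds: int) -> Tuple[List[str], List[str], int, int]:
    """Play the repeated game; return both move histories and total years."""
    h1: List[str] = []
    h2: List[str] = []
    total1 = total2 = 0
    for _ in range(rounds):
        m1, m2 = s1(h2), s2(h1)
        h1.append(m1)
        h2.append(m2)
        total1 += YEARS[(m1, m2)]
        total2 += YEARS[(m2, m1)]
    return h1, h2, total1, total2

print(play(tit_for_tat, always_defect, 4))
# (['C', 'D', 'D', 'D'], ['D', 'D', 'D', 'D'], 20, 13)
print(play(tit_for_tat, tit_for_tat, 4))
# (['C', 'C', 'C', 'C'], ['C', 'C', 'C', 'C'], 12, 12)
```

Against a defector, tit-for-tat is exploited only in the first round and retaliates afterwards; against another cooperative player it sustains mutual cooperation for the whole game.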
• BitTorrent uses a so-called choking mechanism for distributing chunks
• Base idea
– Prefer uploading chunks to peers which also offer chunks for download
• i.e. aim for bi-directional communication channels
– Bi-directional communication will benefit the whole swarm most
• Tit-for-tat
– Punish peers which seem to be free-riding
• i.e. who only download but provide no upload
• Choking as leecher
– Open a bi-directional transfer to another leecher
• Mutually exchange missing chunks
– If a peer does not upload any chunks for more than a minute, choke it
• Temporarily refuse to upload
• Downloading continues as usual
• TCP connection is kept open
– No setup costs
– TCP congestion control
• Using only this choking mechanism may endanger the health of the swarm
– New leechers will automatically be choked because they cannot offer upload
– Two nodes which would have a good connection won't use it because neither node takes the initiative
– A node may be choked by all other nodes due to unlucky circumstances / network weaknesses
• Solution: Optimistic Unchoking
– Randomly initiate a new connection to a currently unconnected leecher in the swarm and start uploading
• “Take initiative”
• Hope for a good cooperation
– e.g. that this new node provides a high upload rate in the bi-directional transfer
• Allows finding better peers
• Allows new peers to integrate themselves in the swarm
– Other peers voluntarily start uploading to them
– Randomly unchoke some currently choked connections
• “Quick to forgive”
• Helps locked-out peers to return to the swarm
– If a leecher is currently choked by all its peers, it initiates even more unchoking connections
• “Anti-Snubbing”
• Open issue: upload-only choking
– Once a download is complete, no bi-directional transfer connections are required anymore by that peer
• Peer becomes a seeder
• Which nodes to upload to?
• Seeding Policy
– Upload to those peers with the best upload / download ratio
– Probable Advantages
• Ensures that chunks are replicated faster within the swarm
• Leechers that have a good upload rate are probably not being served by others
• Download chunks in parallel
– Look for the rarest pieces
– Verify each chunk by checking hash, download again if hash fails
• Advertise received pieces to all connected peers
[Figure: “I have” messages advertising received pieces between a seed and leechers A, B, and C]
• Periodically calculate data-receiving rates
• Upload to (unchoke) the fastest downloaders
• Optimistic unchoking
– Periodically select a peer at random and upload to it
– Continuously look for the fastest partners
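One round of the choking decision described above can be sketched as follows. This is a simplified illustration with assumed parameters: `regular_slots` unchoked peers are chosen by their measured upload rate to us (tit-for-tat), plus one optimistic unchoke chosen at random among the remaining peers. The helper name, rates and peer names are made up, and real clients differ in details such as the timing of rounds.

```python
import random
from typing import Dict, Set

def choose_unchoked(rates: Dict[str, float], regular_slots: int,
                    rng: random.Random) -> Set[str]:
    """One choking round (hypothetical helper).

    rates: recently measured rate at which each interested peer uploads to us.
    """
    # Tit-for-tat: reciprocate to the peers that currently give us the most.
    ranked = sorted(rates, key=lambda peer: rates[peer], reverse=True)
    unchoked = set(ranked[:regular_slots])
    # Optimistic unchoke: give one randomly chosen choked peer a chance.
    choked = sorted(set(rates) - unchoked)
    if choked:
        unchoked.add(rng.choice(choked))
    return unchoked

rates = {"A": 120.0, "B": 80.0, "C": 5.0, "D": 0.0, "E": 60.0}
print(sorted(choose_unchoked(rates, regular_slots=3, rng=random.Random(1))))
```

Calling this periodically means a peer that stops uploading falls out of the top slots and gets choked, while the random extra slot lets new or previously choked peers prove themselves.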
• Remember: BitTorrent uses centralized trackers for managing peer lists
• Tracker Issues
– Single point of failure and attack
– Scalability
• The Pirate Bay tracker nearly overloaded (>5 million peers)
• Solution: Decentralized Tracker
– Replace with DHT (Kademlia)
– Does not tackle distributed search
• Kademlia DHT Tracker
– Each torrent is identified by its infohash
– All BitTorrent nodes using a compatible client may join the DHT tracker
• Not part of the core BitTorrent protocol
• Client authors usually also provide bootstrapping nodes for the DHT tracker
• The DHT takes over the tracker's responsibility
– DHT Key-Value pairs
• Key: info hash
• Value: swarm peer listing
• Kademlia (as a generic DHT protocol)
– Kademlia also uses a 160-bit SHA-1 identifier space for hashing data and nodes
• Similar to Pastry, but uses a more sophisticated routing mechanism (based on the XOR distance between identifiers) requiring less maintenance
– Each key-value pair is stored redundantly on multiple nodes
• Usually on the k nodes closest to the infohash
• Nodes storing a certain key frequently synchronize their data with the other responsible peers
• Peer arrivals and departures can be tolerated without data loss
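Which nodes store the peer list of a torrent can be sketched with Kademlia's XOR distance: the key-value pair goes to the k node IDs closest to the infohash. The sketch assumes 160-bit SHA-1 identifiers, derives node IDs from made-up names, and uses a hypothetical `k_closest` helper over a global node list; a real Kademlia node finds these nodes through iterative lookups instead.

```python
import hashlib
from typing import List

def sha1_id(name: str) -> int:
    """160-bit identifier derived with SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two identifiers."""
    return a ^ b

def k_closest(key: int, node_ids: List[int], k: int) -> List[int]:
    """The k nodes responsible for `key` (global view, for illustration only)."""
    return sorted(node_ids, key=lambda n: xor_distance(n, key))[:k]

nodes = [sha1_id(f"node-{i}") for i in range(50)]
infohash = sha1_id("example torrent")
responsible = k_closest(infohash, nodes, k=8)
print(len(responsible))  # 8
```

Storing the value on k nodes rather than one is what lets the DHT tracker tolerate peer departures: as long as one of the k nodes survives, the peer list can still be found.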
• BitTorrent DHT: a new node joins
– Obtain the torrent infohash
• e.g. from torrent file or from a magnet link
– Contact a bootstrap node and join the DHT tracker
• Take over responsibility for a certain range of torrents
– i.e. host some (redundant) peer listings for some torrent swarms
– New node announces itself
• i.e. contacts some nodes hosting the peer lists for the required torrent
• The new node is added to the respective peer listing
• The node obtains the full peer list
– No central authority / tracker is required
• Peer Exchange (PEX) and Multi-Tracker
– To further increase the performance and resilience of a torrent, multiple trackers can be used
• Easiest case
– Torrent is registered with multiple trackers which are all explicitly specified in the torrent file
• More complex solution: Peer Exchange
– Start connecting to any one tracker
– Ask other connected peers in the swarm for additional peers and / or trackers
• PEX Example
– Obtain a torrent and connect to tracker
• e.g. the official Ubuntu torrent providing two trackers
• Obtain peer lists
– Additionally, connect to the DHT tracker
• Obtain even more peers
– Peers of the same torrent that are trackerless or use the DHT and another tracker
– Start peer exchange
• Again, obtain more peers
– e.g. receive peers not using the DHT from a DHT node which also uses, e.g., the openbittorrent tracker
Client Example: µTorrent
• Peer cache
– IP, port, peer id
• State information
– Completed
– Downloading
– Clients report status periodically to tracker
• Tracker returns a random list of peers
– 50 random leechers/seeds
– Client first contacts 20-40 of them, and more if some do not respond
Tracker
• Info Hashes
• Recently, magnet links have become quite popular
– Magnet links define a URI scheme for any content located in a P2P network
• May encode all data necessary to identify or find content, e.g.
– Protocol, name, size, protocol-specific metadata, etc.
Magnet Links
• Example magnet link for BitTorrent
– magnet:?xt=urn:btih:3e16157f0879eb43e9e51f45d485feff90a77283&dn=Ubuntu+10.04+LTS+x32&tr=http%3A%2F%2Ftracker.openbittorrent.com%2Fannounce
– Other protocols might include search keywords, web sources, bootstrap nodes, etc.
[Figure: parts of the magnet link: xt = eXact Topic (here: BitTorrent InfoHash), dn = Display Name, tr = TRacker URL]
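The fields of the example magnet link can be extracted with Python's standard `urllib.parse` module. The helper `parse_magnet` is our own illustration and only handles the three BitTorrent parameters shown above (`xt`, `dn`, `tr`).

```python
from urllib.parse import parse_qs, urlsplit

def parse_magnet(uri: str) -> dict:
    """Split a BitTorrent magnet link into infohash, display name and trackers."""
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet link")
    params = parse_qs(parts.query)  # decodes %XX escapes and '+' as space
    xt = params["xt"][0]
    prefix = "urn:btih:"
    if not xt.startswith(prefix):
        raise ValueError("exact topic is not a BitTorrent infohash")
    return {
        "info_hash": xt[len(prefix):],
        "display_name": params.get("dn", [""])[0],
        "trackers": params.get("tr", []),
    }

link = ("magnet:?xt=urn:btih:3e16157f0879eb43e9e51f45d485feff90a77283"
        "&dn=Ubuntu+10.04+LTS+x32"
        "&tr=http%3A%2F%2Ftracker.openbittorrent.com%2Fannounce")
print(parse_magnet(link))
```

With the infohash alone a client can already join the DHT tracker; the `tr` parameter is only an optional hint where a classic tracker can be found.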
• BitTorrent provides absolutely no anonymity
– The peer list of a torrent contains all participating nodes
• IP address, port, sometimes upload/download ratio, etc.
– Thus, it is very easy to identify all people downloading from / uploading to a torrent
• User behavior can be tracked quite easily by simply introducing a spy node into the swarm
– Privacy implications
• Also, no “pure” downloading is possible in BitTorrent, as download-only nodes will be choked
– Possible legal implications for copyrighted content
8.3 Anonymous P2P
• Real anonymity is hard to achieve
– Nodes need to communicate and contact each other
• Identities (addresses) must be known
• However, the number of nodes knowing a node's identity can be limited to only trusted nodes
– So-called dark nets
• Base idea
– Only connect to a few trusted “friend” nodes
– Never communicate directly with a non-friend
– Friends forward any message anonymously to their friends
• If the network is designed correctly, most parts should be reachable via friend-of-a-friend routing
• Two notable systems
– Freenet: pure P2P network using small-world properties and anonymous routing
– OneSwarm: BitTorrent extension based on friend-to-friend-routing
• Friend-To-Friend Routing
– e.g. B passes a message from A to C
• C does not know that message originated from A
• A does not know that B passes message to C
[Figure: friend-to-friend routing between requester A, intermediate node B, and provider C]
– If the request's time-to-live expires or a node does not have neighbors to forward the request to, a backtracking “request failed” message is sent
– If the request is successful, the file is sent back via the routing nodes and each node saves the file and adds the sending node’s address to its local routing table
• i.e., frequently requested files are replicated
– If the routing table is full, the least recently used (LRU) entry is evicted
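The routing-table update and LRU eviction can be sketched with an ordered dictionary. Assumptions: keys are integers, each entry maps a key to the node that supplied the data, and the capacity is an arbitrarily small number; the class name `RoutingTable` is hypothetical and not taken from Freenet's code.

```python
from collections import OrderedDict
from typing import Optional

class RoutingTable:
    """Fixed-capacity key -> node table with least-recently-used eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[int, str]" = OrderedDict()

    def add(self, key: int, node: str) -> None:
        """Record that `node` supplied the data for `key`."""
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = node
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

    def lookup(self, key: int) -> Optional[str]:
        node = self.entries.get(key)
        if node is not None:
            self.entries.move_to_end(key)  # mark as recently used
        return node

table = RoutingTable(capacity=2)
table.add(6, "C")
table.add(15, "D")
table.lookup(6)                     # 6 becomes the most recently used entry
table.add(9, "F")                   # table is full: the entry for 15 is evicted
print(list(table.entries.items()))  # [(6, 'C'), (9, 'F')]
```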
• Example of Freenet Routing
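[Figure: A requests key = 9 from B. B's routing table: key 6 → C, key 15 → D. C's routing table: empty. D's routing table: key 9 → F, key 1 → E. B first forwards the request to C, which answers “Sorry!”; B then forwards it to D, which forwards it to F. F holds key 9, and the data is returned along the path F → D → B → A.]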
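The routing example can be replayed as a depth-first search with backtracking. This is a simplified sketch under assumptions, not Freenet's actual implementation: keys are integers, a node tries its routing-table entries in order of closeness to the requested key (absolute difference), the tables are those from the figure, A's single entry pointing to B is hypothetical, and caching along the path is left out.

```python
from typing import Dict, List, Optional, Set

# Routing tables from the figure: key -> neighbour (A's entry is hypothetical)
ROUTING: Dict[str, Dict[int, str]] = {
    "A": {0: "B"},
    "B": {6: "C", 15: "D"},
    "C": {},
    "D": {9: "F", 1: "E"},
    "E": {},
    "F": {},
}
STORE: Dict[str, Set[int]] = {"F": {9}}  # F holds the data for key 9

def request(node: str, key: int, ttl: int, visited: Set[str]) -> Optional[List[str]]:
    """Depth-first routing with backtracking; returns the path to the data or None."""
    visited.add(node)
    if key in STORE.get(node, set()):
        return [node]
    if ttl == 0:
        return None  # time-to-live expired: "request failed"
    # Try neighbours whose routing-table key is closest to the requested key first.
    for _, neighbour in sorted(ROUTING[node].items(), key=lambda kv: abs(kv[0] - key)):
        if neighbour in visited:
            continue
        path = request(neighbour, key, ttl - 1, visited)
        if path is not None:
            return [node] + path
    return None  # no (more) neighbours: backtrack

print(request("A", key=9, ttl=5, visited=set()))  # ['A', 'B', 'D', 'F']
```

On success, each node on the returned path would then cache the data and add the source to its routing table, as described above.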
• When transferring content, various failures may occur
– Transmission failures (e.g., packet loss)
– Noisy channels
– Storage failures (e.g. hardware breakdown, churn)
• Error correcting codes can help to combat failures
• Basic idea:
– Encode information of length 𝑛 in (𝑛 + 𝑘) symbols
• 𝑘-symbol redundancy!
– The information can be recovered from any 𝑛 of the (𝑛 + 𝑘) symbols
• Examples
– Checksums detect and correct errors in noisy channels
– RAID-5 storage systems (parity bits)
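The RAID-5 example is the simplest erasure code of this kind: one parity block (k = 1) that is the XOR of the n data blocks, so any single lost block can be rebuilt from the remaining n. A minimal sketch, assuming equal-length blocks; the function names are our own.

```python
from functools import reduce
from typing import List, Optional

def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def encode(data_blocks: List[bytes]) -> List[bytes]:
    """Append one parity block: n data symbols become n + 1 stored symbols."""
    return data_blocks + [xor_blocks(data_blocks)]

def recover(stored: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing block (marked None) and return the data blocks."""
    missing = [i for i, block in enumerate(stored) if block is None]
    if len(missing) > 1:
        raise ValueError("a single parity block can only repair one erasure")
    if missing:
        present = [block for block in stored if block is not None]
        stored = list(stored)
        stored[missing[0]] = xor_blocks(present)  # XOR of all others = missing block
    return stored[:-1]

stripe = encode([b"555", b"629", b"abc"])
damaged = [stripe[0], None, stripe[2], stripe[3]]
print(recover(damaged))  # [b'555', b'629', b'abc']
```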
8.4 Erasure Codes
• Thought Experiment: Err-mail
– Err-mail works just like e-mail, except
• About half of all the mail gets lost
• Messages longer than 5 characters are illegal
• Sending a message is very expensive (similar to air-mail)
• “Alice wants to send her telephone number (555629) to Bob”
• Naïve approach
– Split into two packets (555, 629) and send them separately
– Chances are, one of them gets lost
– Even repeated sending does not help much: Bob will probably receive redundant packets
– Acknowledgment messages from Bob are an option, but expensive
• Alice devises the following scheme
– She breaks her telephone number up into two parts
• 𝑎 = 555, 𝑏 = 629
• Sends 2 messages – "𝐴 = 555" and "𝐵 = 629" – to Bob.
– She constructs a linear function $f(n) = a + (b - a)(n - 1)$
• in this case $f(n) = 555 + 74(n - 1)$, so that $f(1) = 555$ and $f(2) = 629$
– She computes the values 𝑓(3), 𝑓(4), and 𝑓(5), and then transmits three redundant messages
• "𝐶 = 703", "𝐷 = 777" and "𝐸 = 851".
• Bob knows that the form of $f(n)$ is $f(n) = a + (b - a)(n - 1)$, where $a$ and $b$ are the two parts of the telephone number
• Now suppose Bob receives "𝐷 = 777" and "𝐸 = 851"
• Bob can reconstruct Alice's phone number by computing the values of $a$ and $b$ from the values $f(4)$ and $f(5)$
– $b - a = f(5) - f(4) = 74$, hence $a = f(4) - 3 \cdot 74 = 555$ and $b = a + 74 = 629$
• Bob can perform this procedure using any two err-mails, so the erasure code in this example has a rate of 40% (2 of 5 messages)
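Alice's encoding and Bob's reconstruction can be written out directly. The sketch follows the function $f(n) = a + (b - a)(n - 1)$ from the slides; the function names are our own, and `Fraction` keeps the slope computation exact.

```python
from fractions import Fraction
from typing import Dict, Tuple

def encode(a: int, b: int, count: int = 5) -> Dict[int, int]:
    """Messages f(1), ..., f(count) with f(n) = a + (b - a)(n - 1)."""
    return {n: a + (b - a) * (n - 1) for n in range(1, count + 1)}

def decode(received: Dict[int, int]) -> Tuple[int, int]:
    """Recover (a, b) from any two received messages (n, f(n))."""
    (n1, f1), (n2, f2) = list(received.items())[:2]
    slope = Fraction(f2 - f1, n2 - n1)  # equals b - a
    a = f1 - slope * (n1 - 1)
    b = a + slope
    return int(a), int(b)

messages = encode(555, 629)
print(messages)                  # {1: 555, 2: 629, 3: 703, 4: 777, 5: 851}
print(decode({4: 777, 5: 851}))  # (555, 629)
```

Any pair of the five err-mails works, which is the defining property of the code: the two message parts survive the loss of any three messages.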
• Tornado Codes are an important class of erasure codes for practical applications
• Characteristics
– Easy coding/decoding: linear codes with explicit construction
– Fast coding/decoding: each check bit depends on only a few message bits
• M. Luby, M. Mitzenmacher, M. A. Shokrollahi, D. A. Spielman, V. Stemann: Practical Loss-Resilient Codes. ACM Symposium on the Theory of Computing, 1997
• J. W. Byers, M. Luby, M. Mitzenmacher: Accessing Multiple Mirror Sites in Parallel: Using Tornado Codes to Speed Up Downloads. INFOCOM 1999
• Scenario
– Application sends a real-time data stream of symbols
– Network experiences unpredictable losses of at most a fraction 𝑝 of the symbols
– We know the positions of the lost symbols (packet indexes)
• Insurance policy
– Let 𝑛 be the block length
– Instead of sending 𝑛 message symbols, place (1 − 𝑝)𝑛 message symbols in each block
– Fill block to length 𝑛 with 𝑝𝑛 redundant symbols
• Scheme provides optimal loss protection if the message symbols can be recovered from any set of (1 − 𝑝)𝑛 symbols in the block
• Forward Error Correction
– Interleave message bits and check bits in a stream
[Figure: block of length n consisting of (1 − p)n message symbols and pn redundant symbols]
• Properties of a good code
– There should be “few” check bits
– Linear time encoding
• Average degree on the left should be a small constant
– Easy error detection/decoding
• Each set of message bits should influence many check bits
• Existence of unshared neighbors
• Tornado code model: Bipartite Graph
• Each message bit is used in only a few check bits
– Low degree bipartite graph
– Check bits are computed as linear combinations of message bits (usually XOR)
8.4 Tornado Codes
[Figure: bipartite graph with message bits on the left and check bits on the right, e.g. c6 = m3 ⊕ m7]
• Properties
– Expansion: every small subset of 𝑘 ≤ 𝛼𝑛 nodes on the left has many (≥ 𝑏𝑘) neighbors on the right
– Low degree – not technically part of the definition, but typically assumed
[Figure: a set of k ≤ αn bits on the left with at least bk neighboring bits on the right]
• Important parameters: size (𝑛), degree (𝑑), expansion (𝑏)
• Randomized constructions
– A random 𝑑-regular graph is an expander with high probability
– Construct by choosing 𝑑 random perfect matchings
• Perfect matching: every node on the left side gets exactly one edge, and every node on the right side receives exactly one edge
• Repeat d times: every node on the left side has d edges to the right side
– Time consuming and cannot be stored compactly
• Explicit constructions
– Cayley graphs, Ramanujan graphs, etc.
– Typical technique – start with a small expander, apply operations to increase its size
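The randomized construction via perfect matchings can be sketched as follows, assuming both sides have the same number of nodes (as for p = 0.5), so that a random permutation is a random perfect matching. Repeated matchings may create parallel edges, which this simple sketch does not remove; the function name is hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

def random_regular_bipartite(size: int, d: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Union of d random perfect matchings between two sides of `size` nodes."""
    edges: List[Tuple[int, int]] = []
    for _ in range(d):
        right = list(range(size))
        rng.shuffle(right)  # a random permutation is a random perfect matching
        edges.extend((left, right[left]) for left in range(size))
    return edges

edges = random_regular_bipartite(size=8, d=3, rng=random.Random(42))
left_deg = Counter(l for l, _ in edges)
right_deg = Counter(r for _, r in edges)
print(set(left_deg.values()), set(right_deg.values()))  # {3} {3}
```

Every matching adds exactly one edge per node on each side, so after d rounds the graph is d-regular; the expansion property then holds only with high probability, which is why explicit constructions remain of interest.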
• We will use d-regular bipartite graphs with (1 − 𝑝)𝑛 nodes on the left and 𝑝𝑛 nodes on the right (e.g., 𝑝 = 0.5)
• We will need expansion 𝑏 > 𝑑/2
[Figure: bipartite graph between message bits m1, …, m(1−p)n and check bits c1, …, cpn, with node degrees d and 2d]
• Encoding
– Why is it linear time?
[Figure: each check bit c1, …, cpn computes the sum modulo 2 of its neighboring message bits m1, …, m(1−p)n]
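The question can be answered by counting operations. A short derivation, assuming every message bit has degree $d$ as in the d-regular construction above, with $N(j)$ denoting the set of message bits adjacent to check bit $c_j$ (our notation):

```latex
c_j = \bigoplus_{i \in N(j)} m_i \quad (j = 1, \dots, pn), \qquad
\#\mathrm{XOR} \;\le\; \sum_{j=1}^{pn} |N(j)| \;=\; |E| \;=\; d\,(1-p)\,n \;=\; O(n) \ \text{for constant } d
```

Each edge of the graph is used exactly once, so the encoding time grows linearly with the block length as long as the degree stays a small constant.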
• Decoding
– Assume that all the check bits are intact
– Find a check bit such that only one of its neighbors is erased (an unshared neighbor)
– Fix the erased bit, and repeat
• Decoding
– Need to ensure that we can always find such a check bit
– “Unshared neighbors” property
• Consider the set of corrupted message bits and their neighbors
• If this set is small, at least one message bit has an unshared neighbor
– Can we always find unshared neighbors?
• Theorem: Expander graphs give us this property if 𝑏 > 𝑑/2
[Figure: a corrupted message bit with an unshared neighbor among the check bits]
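The decoding loop can be sketched as a so-called peeling decoder. Assumptions, as on the slide: all check bits arrive intact, erased message bits are marked `None`, and each check bit is the XOR of its neighbors. The small graph and bit values are made up for illustration and are not an expander; they only show the mechanics.

```python
from typing import Dict, List, Optional

def encode(message: List[int], graph: Dict[int, List[int]]) -> Dict[int, int]:
    """Check bit j = XOR of the message bits listed in graph[j]."""
    checks = {}
    for j, neighbours in graph.items():
        value = 0
        for i in neighbours:
            value ^= message[i]
        checks[j] = value
    return checks

def peel(received: List[Optional[int]], checks: Dict[int, int],
         graph: Dict[int, List[int]]) -> List[Optional[int]]:
    """Repeatedly use a check bit with exactly one erased neighbour to recover it."""
    bits = list(received)
    progress = True
    while progress and None in bits:
        progress = False
        for j, neighbours in graph.items():
            erased = [i for i in neighbours if bits[i] is None]
            if len(erased) == 1:  # check bit j is an unshared neighbour of that bit
                value = checks[j]
                for i in neighbours:
                    if i != erased[0]:
                        value ^= bits[i]
                bits[erased[0]] = value
                progress = True
    return bits  # may still contain None if decoding got stuck

message = [1, 0, 1, 1, 0, 1]
graph = {0: [0, 1], 1: [1, 2, 3], 2: [3, 4], 3: [4, 5], 4: [0, 5]}
checks = encode(message, graph)
received = [1, None, 1, 1, None, 1]     # message bits m1 and m4 erased
print(peel(received, checks, graph))    # [1, 0, 1, 1, 0, 1]
```

If every check bit touching the erased set has at least two erased neighbours, the loop stops with `None` entries left, which is exactly the situation the expansion condition b > d/2 rules out for small erasure sets.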
• Cascading
– Use another bipartite graph to construct another level of check bits for the check bits
– Final level is encoded by some other code, e.g., Reed-Solomon
[Figure: cascade of levels with 𝑘, 𝑝𝑘, 𝑝²𝑘, … bits]
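With level sizes $k, pk, p^2k, \dots$ as in the figure, the total length follows from a geometric series. The short derivation below assumes the cascade is continued indefinitely (in practice it stops at a final level encoded with, e.g., Reed-Solomon); it matches the insurance-policy scenario, where a block of length $n$ carries $(1-p)n$ message symbols:

```latex
n \;=\; k + pk + p^2 k + \dots \;=\; \sum_{i \ge 0} p^i k \;=\; \frac{k}{1-p}
\quad\Longrightarrow\quad k = (1-p)\,n, \qquad n - k \;=\; \frac{pk}{1-p} \;=\; pn
```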
• Swarming & BitTorrent
– Segment a file into multiple chunks
– Download chunks from multiple peers in parallel
• Seeder and leecher peers form a swarm
• Increased throughput
• Faster dissemination of new content
– i.e. for countering flash crowds
– Main question: which chunks should be downloaded / uploaded to best benefit the whole swarm?
• Prevent free-riding
• Discourage parasitic behavior
• Reward cooperation
Content Provisioning
• Solution: use concepts from game theory
– Tit-for-tat strategy
• Encourages strong bi-directional links among leechers
– Choking
• If a node in a bidirectional pipe is not cooperative (i.e., provides no upload bandwidth), choke it by refusing further uploads to that peer
– Optimistic Unchoke
• Randomly unchoke some choked connections
• Take initiative and voluntarily upload to an unconnected node
– May discover better and more reliable partners
• Erasure Codes
– Help securing message transfers and data storage
– Base idea
• Split payload into chunks
• Interleave payload chunks with redundant chunks which can be used to reconstruct the message in case of failures
– Tornado Code
• Popular erasure code implementation
• Uses linear functions to encode and decode messages
• Load Balancing & Data Durability
– Data caching, replication, etc.
• P2P and Databases
– Building database link systems on top of P2P
• Toward cloud storage