Prof. Dr. Wolf-Tilo Balke
Institut für Informationssysteme
Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de
Distributed Data Management
Network Models (2nd Part)
• 7.4 Scale-Free Networks
• 7.5 Comparing Graphs
• 7.6 Models in P2P Content Distribution
• 8.1 Swarming
• 8.2 BitTorrent
• 8.3 Anonymous P2P
Network Models and Content Provisioning
• In 1999, Albert-László Barabási (Univ. of Notre Dame) crawled
parts of the WWW to investigate its actual structure
– The node degree is power-law distributed
• i.e., the probability that a node in the network is connected to $k$ other nodes is $P(k) \sim k^{-\gamma}$
– (usually with $2 < \gamma \leq 3$)
– Most nodes have a small degree of around 1 to 2
– Few nodes have an extremely high node degree
– High-degree vertices are called ‘hubs’
• Albert-László Barabási. “Linked: How Everything Is Connected to Everything Else and What It Means for Business, Science, and Everyday Life”. Plume.
2003. ISBN 978-0452284395
7.4 Scale-Free Networks
• Definition: Graphs with a power-law node degree distribution form ‘scale-free’ networks
– Also called power-law networks
• What kind of network model can generate this more realistic degree distribution?
– Barabási–Albert model builds a certain subset of scale-free networks
• Albert-László Barabási & Réka Albert. “Emergence of scaling in random networks”. Science, 1999. doi:10.1126/science.286.5439.509
• Barabási–Albert model: Basic Idea
– In its simplest form denoted as $g_{ba}^{n,m}$
• $n$ is the number of nodes in the graph
• $m$ is the number of edges added per time step
– The total number of edges is thus roughly $n \cdot m$
– Start with any initial graph of size $n_0$
• $n_0 \geq 2$ and the degree of every node is $\deg(v) \geq 1$
• Often, just 𝑚 connected nodes are used as default initial network
– If the initial network is not connected, the resulting network cannot be guaranteed to be connected
– The Barabási–Albert graph is constructed iteratively by adding new nodes one by one until the target size 𝑛 is reached
• Each added node represents one time step in a simulated network growth
– i.e., discrete-time modeling
• Each new node is connected to 𝒎 existing nodes
7.4 Barabási–Albert Graphs
– New edges are not added randomly, but favor higher-degree nodes
• “The rich get richer“
• Preferential attachment to higher-degree nodes
– The higher the degree of a possible target node, the higher the probability that the new node will attach to it
– Preferential attachment defines the probability $\Pi(v)$ for vertex $v$ to get an edge to a new node
• In general, $\Pi(v)$ is proportional to the node degree, i.e., $\Pi(v) \sim \deg(v)$
• Most common definition is $\Pi(v) = \frac{\deg(v)}{\sum_{w \in V} \deg(w)}$
• Example: $g_{ba}^{5,1}$
– $t = 0$: initial graph with two connected nodes $v_1$ and $v_2$
– $t = 1 - \varepsilon$: add new node $v_3$
• Probability for connecting any old node $v$ to $v_3$ is given by $\Pi(v) = \frac{\deg(v)}{\sum_{w \in V} \deg(w)}$
• Here: $\Pi(v_1) = \frac{1}{2}$, $\Pi(v_2) = \frac{1}{2}$
• Random decision steered by preferential attachment, e.g., connect to $v_1$
– $t = 1$: $v_3$ is attached to $v_1$
• Example: $g_{ba}^{5,1}$ (continued)
– $t = 2 - \varepsilon$: add new node $v_4$
• Evaluate preferential attachment: $\Pi(v_1) = \frac{2}{4} = \frac{1}{2}$, $\Pi(v_2) = \frac{1}{4}$, $\Pi(v_3) = \frac{1}{4}$
• e.g., connect to $v_1$
– $t = 3 - \varepsilon$: add new node $v_5$
• Evaluate preferential attachment: $\Pi(v_1) = \frac{3}{6} = \frac{1}{2}$, $\Pi(v_2) = \frac{1}{6}$, $\Pi(v_3) = \frac{1}{6}$, $\Pi(v_4) = \frac{1}{6}$
• e.g., connect to $v_1$
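The growth process walked through above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the choice of a path of max(m, 2) nodes as the initial network, and the rejection loop that enforces m distinct targets (which only approximates strict degree-proportional sampling for m > 1) are assumptions of this sketch.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes, attaching each new node to m existing nodes.

    Returns a list of undirected edges (u, v). The initial network is a path
    of max(m, 2) connected nodes (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n0 = max(m, 2)
    edges = [(i, i + 1) for i in range(n0 - 1)]
    # Every node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list picks node v with probability deg(v) / sum(deg(w)).
    endpoints = [v for edge in edges for v in edge]
    for new in range(n0, n):
        targets = set()
        while len(targets) < m:          # redraw until m distinct targets
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges

def degrees(edges):
    """Map each node to its degree."""
    return Counter(v for edge in edges for v in edge)

# Same setting as the worked example: five nodes, one edge per new node.
example_edges = barabasi_albert(n=5, m=1, seed=42)
example_degrees = degrees(example_edges)
```

Keeping one list entry per edge endpoint avoids recomputing the degree sum at every step, which is why the draw is a plain uniform choice.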
• Comparing Barabási–Albert Graphs (nodes colored by degree)
– 𝑛 = 50, ~50 edges
– 𝑛 = 100, ~100 edges
– 𝑛 = 100, ~150 edges
• Histogram of node degrees
– Single sample
– 100 nodes
– 300 edges
• Random
– Generally lower degree
• Small World
– Homogeneous degree
• Scale-Free
– Power-law
[Figure: histogram of node degrees (axis: number of nodes) for Barabási (pa = 0.5), Watts-Strogatz (p = 0.05), and Random; pa is a dampening factor for decreasing the strength of preferential attachment]
• Node degree for larger Barabási–Albert graphs
– 200k nodes
– 400k edges
– Logarithmic scale (axis: relative frequency)
• Histogram of clustering coefficients (𝐶)
– Same sample
• Random
– Low 𝐶
• Small World
– Homogeneous high 𝐶
• Scale-Free
– Also power-law
– Lower than SW
[Figure: histogram of clustering coefficients (axis: number of nodes) for Barabási (pa = 0.5), Watts-Strogatz (p = 0.05), and Random]
• Important property of scale-free networks is robustness against random failures
– Removing a random vertex 𝑣 will likely hit a low-degree node
• Expected damage to network is small
– A failing high-degree node can severely damage a network
• Better fail-safety necessary for high-degree node to ensure overall robustness
• Thus, scale-free networks are very sensitive to targeted attacks
– If a malevolent attacker explicitly targets the highest-degree nodes, the network can easily decompose
• Note: random graphs are not particularly resilient against random failures, but also not particularly prone to attacks
– Most vertices more or less have the same degree
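The contrast between random failures and targeted attacks can be illustrated on a toy hub-and-spoke network. The sketch below is an assumption-laden illustration: the helper name `largest_component` and the star graph are made up for this example, and the size of the largest connected component is used here as a simple stand-in for "damage to the network".

```python
from collections import defaultdict

def largest_component(edges, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    adj = defaultdict(set)
    nodes = set()
    for u, v in edges:
        nodes.update((u, v))
        if u not in removed and v not in removed:
            adj[u].add(v)
            adj[v].add(u)
    nodes -= set(removed)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search over the surviving part of the graph.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best

# Toy network: node 0 is the hub, nodes 1..9 are low-degree leaves.
star = [(0, i) for i in range(1, 10)]
after_random_failure = largest_component(star, removed={7})  # a leaf fails
after_attack = largest_component(star, removed={0})          # the hub is attacked
```

Losing a leaf leaves the remaining nine nodes connected, while losing the hub leaves only isolated nodes.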
• Random Graph: 50 nodes, 50 edges
– Color by degree
7.5 Comparing Graphs
Property Value
Connected No
Diameter (conn.) 9
Avg. Path Length 4.39
#Clusters 6
Largest Cluster 39
k-connectedness 0
Avg. Cluster Coeff. 0.033
Avg. Degree 2
• Watts-Strogatz Graph: 50 nodes, 50 edges (𝑝 = 0.05)
Property Value
Connected No
Diameter (conn.) 35
Avg. Path Length 12.73
#Clusters 2
Largest Cluster 38
k-connectedness 0
Avg. Cluster Coeff. 0
Avg. Degree 2
• Barabási-Albert Graph: 50 nodes, 49 edges (𝑝𝑎 = 0.8)
Property Value
Connected Yes
Diameter 12
Avg. Path Length 5.14
k-connectedness 1
Avg. Cluster Coeff. 0
Avg. Degree 1.96
• Random Graph: 50 nodes, 100 edges
Property Value
Connected No
Diameter (conn.) 6
Avg. Path Length 2.88
#Clusters 2
Largest Cluster 49
k-connectedness 0
Avg. Cluster Coeff. 0.058
Avg. Degree 4
• Watts-Strogatz Graph: 50 nodes, 100 edges (𝑝 = 0.05)
Property Value
Connected Yes
Diameter (conn.) 10
Avg. Path Length 4.6
k-connectedness 2
Avg. Cluster Coeff. 0.43
Avg. Degree 4
• Barabási-Albert Graph: 50 nodes, 98 edges (𝑝𝑎 = 0.8)
Property Value
Connected Yes
Diameter 4
Avg. Path Length 2.55
k-connectedness 2
Avg. Cluster Coeff. 0.23
Avg. Degree 3.92
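Two of the properties in these comparison tables are easy to recompute by hand. The sketch below shows the average degree (every undirected edge adds one to the degree of two nodes) and an average clustering coefficient. The function names are illustrative, and counting nodes with fewer than two neighbours as 0 is a common convention that is assumed here rather than taken from the slides.

```python
from itertools import combinations

def average_degree(num_nodes, num_edges):
    # Every undirected edge contributes to the degree of two nodes.
    return 2 * num_edges / num_nodes

def average_clustering(adj):
    """adj maps each node to the set of its neighbours (undirected graph)."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # assumption: such nodes contribute a coefficient of 0
        # Count the edges that actually exist among the neighbours of `node`.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
```

With 50 nodes, 49 edges give an average degree of 1.96 and 98 edges give 3.92, which matches the Barabási-Albert tables. A tree such as `path` contains no triangles, which is consistent with the clustering coefficient of 0 reported for the 49-edge graph.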
• What do real Peer-To-Peer Networks look like?
• Depends on the used protocols
– Some P2P networks, e.g., Freenet, deliberately evolve into a small world with a high clustering coefficient and a small diameter
– Analogously, some protocols, e.g., Gnutella, implicitly generate a scale-free degree distribution
• Implied by bootstrapping and Ping-Pong
7.6 Models in P2P
• What should Peer-to-Peer networks look like?
• It depends…
• If it should be navigable in a decentralized fashion,
– Make it a small-world and implement Kleinberg‘s routing algorithm (or a variant, e.g., Symphony)
• If the peer-to-peer network could be under attack
– also make it a small-world, where most vertices have the same (low) degree
• If it is a peer-to-peer network in a small and secure context, e.g., a company intranet,
– Make it a scale-free network
• This makes it possible to buy only a small number of high-bandwidth servers, which will work as 'hubs' of the network
• Sometimes large amounts of data have to be distributed over networks
– Software updates, video on demand, etc.
• Early approaches: Napster, Gnutella, FastTrack, KaZaA, …
– Use P2P network to locate a
node offering the requested content
– Download whole file from a single peer
• If download fails: repeat search, resume download from
8.1 Swarming
• Issues
– Poor performance due to asymmetric uplink/downlink bandwidth
• Most common home network connection technology: ADSL
– Asymmetric Digital Subscriber Line
– e.g., ADSL2+: 16,000 kbit/s download, 1,024 kbit/s upload
– No load distribution
• Popular files may have extremely low download speed due to congestion of the offering node
– Low reliability (except for small files)
• Connection glitches may severely hamper download
• Frequent re-connects and resumes necessary
• Idea: Chunks
– Split large files into small chunks
– Assign hash values to chunks
• Identification: simple and deterministic labeling of chunks
• Transfer Protection: download chunk and compute hash
– Compare computed hash with hash provided by offering peer
– If comparison fails, a transfer error occurred ⇒ reload chunk
[Figure: original file split into chunks]
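The chunk idea can be sketched with Python's standard `hashlib`. The function names are illustrative, and SHA-1 is used here only as an example hash function; the chunk size in the usage lines is deliberately tiny for readability.

```python
import hashlib

def split_into_chunks(data: bytes, chunk_size: int):
    """Cut the file content into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def chunk_hashes(chunks):
    """One hash per chunk; the list index serves as the chunk label."""
    return [hashlib.sha1(chunk).digest() for chunk in chunks]

def verify_chunk(chunk: bytes, expected_hash: bytes) -> bool:
    """Recompute the hash of a downloaded chunk and compare it to the offered one."""
    return hashlib.sha1(chunk).digest() == expected_hash

original = b"abcdefghij"
chunks = split_into_chunks(original, chunk_size=4)
hashes = chunk_hashes(chunks)
ok = verify_chunk(chunks[1], hashes[1])          # intact transfer
corrupted = verify_chunk(b"efgX", hashes[1])     # transfer error: reload chunk
```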
• Parallelization
– Locate the swarm of all peers hosting the required file (and thus the required chunks)
• Download different chunks from different sources simultaneously
– Utilize upload capacity of multiple sources
• Overall download speed may thus exceed upload capabilities of individual sources
• Usually, upload capacity is the bottleneck due to asymmetrical connections
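As a rough worked example with the ADSL2+ figures quoted earlier (16,000 kbit/s downlink, 1,024 kbit/s uplink), and under the simplifying assumption that every source devotes its full uplink to a single downloader, the number of sources needed to saturate one downlink is about:

```latex
\left\lceil \frac{16000\ \text{kbit/s}}{1024\ \text{kbit/s}} \right\rceil = \lceil 15.625 \rceil = 16
```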
• So called swarming strategy
– Download chunks in parallel from a swarm of peers
• Swarming Advantages
– Peer failures: no loss of files, only chunks
• Discard unfinished chunks and download them again
• No complicated resume mechanism necessary
– Increased throughput
• Download chunks in parallel from different sources
• Swarming Issues
– Chunk selection: in which order should chunks be requested from which peer
• Avoid scarcity
• Best overall availability?
– Fairness: how can the protocol ensure fair usage of bandwidth
• Avoid free-riding: all peers should contribute to the network
• Bandwidth allocation: single peers should not be overwhelmed by requests while others are idling
• Systems implementing swarming
– BitTorrent
– Avalanche
• BitTorrent
– Torrent = big stream
– Author: Bram Cohen, 2001
– Protocol for swarming file distribution, no search features
• Design goal
– Distribute large content quickly and in a decentralized fashion
8.2 BitTorrent
• Implements swarming strategy for content distribution
– Especially suited for flash crowds, i.e. content which is high in demand for a short period of time
• Central components
– Web server for search (torrent site)
• “Classic” web server maintains list of available content (so called “torrents”)
• Provides search functionalities
• Content is represented as a torrent file containing required meta data
– e.g. address of the tracker
– Tracker for peer coordination
• A tracker is a centralized service maintaining the peer swarms
– i.e. which peers have which chunks of which torrents
• Workflow for download
– A user uses a torrent site to obtain a torrent file
• Torrent file contains content meta data
– The user’s node connects to the responsible tracker
• Tracker URL and content identifiers (“info hash”) are provided by torrent file
– Node registers itself with the tracker and the corresponding torrent
• i.e. joins the swarm
• Node obtains a list of all peers offering the torrent
– Contact some swarm peers
• Obtain a list of available chunks
• Each offered file is split into chunks of between 32 KB and 4 MB each
– Each chunk is identified by a 160-bit SHA-1 hash
• With respect to a certain torrent, each peer can fulfill one of the following roles
– Seeders
• Have all chunks of the torrent and are actively seeding (uploading) those chunks to the swarm
– Leechers
• Do not have all chunks of a torrent
• Download missing chunks from other leechers or seeders
• Upload chunks to other leechers
– There is no download-only role
• Torrent File metadata structure
– Describes the files in the torrent
• URL of tracker
• File name
• File length
• Piece length
• SHA-1 hashes of pieces
– Allow peers to verify integrity
• Creation date
– An info hash is created from some fields of the torrent file
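In the BitTorrent specification, the torrent file is a bencoded dictionary and the info hash is the SHA-1 digest of its bencoded `info` entry. The sketch below shows this with a minimal hand-written bencoder; the tracker URL, file name, and piece data are made-up placeholders, and only a single-file torrent with one piece is modeled.

```python
import hashlib

def bencode(value) -> bytes:
    """Minimal bencoder for ints, strings/bytes, lists, and dicts."""
    if isinstance(value, int):
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode() if isinstance(k, str) else k, v) for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

def info_hash(torrent: dict) -> str:
    """SHA-1 over the bencoded 'info' dictionary only."""
    return hashlib.sha1(bencode(torrent["info"])).hexdigest()

piece = b"example chunk data"                      # placeholder content
torrent = {
    "announce": "http://tracker.example.org/announce",  # hypothetical tracker
    "creation date": 0,
    "info": {
        "name": "example.bin",
        "length": len(piece),
        "piece length": 32 * 1024,
        "pieces": hashlib.sha1(piece).digest(),    # concatenated piece hashes
    },
}
torrent_id = info_hash(torrent)
```

Because the tracker URL and creation date lie outside `info`, the same content announced on a different tracker keeps the same info hash.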
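[Figure: torrent file structure, marking the fields being hashed in the info hash]
[Figure: message sequence between torrent site, tracker, new leecher peer, and other swarm peers]
– New leecher obtains the torrent file from the torrent site
– New leecher registers with the tracker; the tracker returns the swarm peer list
– New leecher requests info from other swarm peers; the peers return chunk information
– New leecher requests a chunk; a peer sends the chunk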
• Which chunk next?
– Priority Actives
• Finish active chunks
– Rarest First
• Improves availability of rare chunks
• Delays download of common chunks
– Random First Piece
• Get first chunk quickly (rarest chunk probably slow to get)
– Endgame Mode
• Send requests for last chunks to all known peers
• End of download not stalled by slow peers
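The rarest-first rule can be sketched as a small selection function. The data layout (a set of missing chunk indices and a mapping from peer to the set of chunks it holds) and the deterministic tie-breaking are assumptions of this sketch; real clients typically randomize among equally rare chunks.

```python
from collections import Counter

def rarest_first(missing, peer_chunks):
    """Pick the missing chunk held by the fewest peers, plus one peer offering it.

    missing: set of chunk indices still needed.
    peer_chunks: dict mapping peer name -> set of chunk indices that peer has.
    Returns (chunk, peer), or None if no peer offers a missing chunk.
    """
    availability = Counter()
    for have in peer_chunks.values():
        availability.update(have & missing)
    if not availability:
        return None
    # Fewest holders first; ties broken by lowest chunk index (assumption).
    chunk = min(availability, key=lambda c: (availability[c], c))
    peer = min(p for p, have in peer_chunks.items() if chunk in have)
    return chunk, peer

peers = {"A": {0, 1, 2}, "B": {1, 2}, "C": {2, 3}}
choice = rarest_first({1, 2, 3}, peers)
```

Here chunk 3 is held by a single peer, so it is requested before the more common chunks 1 and 2.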
8.2 BitTorrent: piece selection
• Basic Ideas of Game Theory
– Game theory offers a general theory of strategic behavior
• Described in mathematical form
– Situations in which players may choose different actions to maximize their returns
– Situations in which strategic interactions among rational players produce outcomes with respect to the players’ preferences
– The outcomes might not have been intended by any of them
• Plays an important role in
– Modern economics
– Decision theory
– Multi-agent systems
Game Theory
• Early game theory tries to explain the optimal strategy in two-person interactions.
• von Neumann and Morgenstern, 1944
– Initially: zero-sum games
– Expected utility hypothesis
• Players will rationally decide for the option with the highest expected outcome
• John Nash
– Worked in game theory and differential geometry
• Non-zero-sum games
• Nash equilibrium 1950
– Strategic equilibrium in which no player gains any advantage by unilaterally changing strategy (while knowing the opponents' strategies)
– 1994 Nobel Prize in Economics
• Harsanyi and Selten
– Incomplete information
– Also 1994 Nobel Prize in Economics
• Games
– Situations are treated as games.
• Rules
– The rules of the game state which actions and decisions are possible
• Player's Strategies
– Plan for actions in each possible situation in the game
• Player's Payoffs
– A player’s expected gain or loss when winning or losing in a particular situation
• Dominant Strategy
– A strategy that is a player's best choice regardless of what the other players do
• Famous example: Prisoner's Dilemma
• A and B are arrested by the police during a robbery
– They are interrogated in separate cells
• Unable to communicate with each other
– Following conditions are known
• If they both resist interrogation and proclaim their mutual innocence, they both will get off with a three-year sentence for robbery
• If one of them confesses to the entire string of robberies and the other does not, the confessor will be rewarded with a light one-year sentence, while the other will get a severe eight-year sentence
• If they both confess, then the judge will sentence both to a moderate four years in prison
• Prisoner's Dilemma
– Possible outcomes
A \ B | B - Confession | B - No Confession
A - Confession | 4 years each | 1 year for A, 8 years for B
A - No Confession | 8 years for A, 1 year for B | 3 years each
• Decision Tree of A
– If B confesses
• A: Confess ⇒ 4 years in prison (best strategy)
• A: Does not confess ⇒ 8 years in prison
– If B does not confess
• A: Confess ⇒ 1 year in prison (best strategy)
• A: Does not confess ⇒ 3 years in prison
• The dominant strategy for A is to confess
– No matter what B does, confessing is the better choice
• Nash equilibrium: both A and B will confess
– Also, the dominant strategy of B is to confess
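The reasoning about the dominant strategy and the Nash equilibrium can be checked mechanically from the sentence lengths given in the example. The names `YEARS`, `best_response_of_a`, and so on are illustrative; fewer years in prison is treated as the better outcome.

```python
# (A's action, B's action) -> (years for A, years for B)
YEARS = {
    ("confess", "confess"): (4, 4),
    ("confess", "silent"): (1, 8),
    ("silent", "confess"): (8, 1),
    ("silent", "silent"): (3, 3),
}
ACTIONS = ("confess", "silent")

def best_response_of_a(b_action):
    """A's action that minimizes A's own sentence, given B's action."""
    return min(ACTIONS, key=lambda a: YEARS[(a, b_action)][0])

def best_response_of_b(a_action):
    """B's action that minimizes B's own sentence, given A's action."""
    return min(ACTIONS, key=lambda b: YEARS[(a_action, b)][1])

def nash_equilibria():
    """Action pairs in which each action is a best response to the other."""
    return [(a, b) for a in ACTIONS for b in ACTIONS
            if best_response_of_a(b) == a and best_response_of_b(a) == b]

# A strategy is dominant if it is the best response to every opponent action.
a_dominant = {best_response_of_a(b) for b in ACTIONS}
equilibria = nash_equilibria()
```

Both prisoners end up with four years each, although mutual silence would have cost them only three.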
• A repeated game
– Game that the same players play more than once
– Differs from one-shot games because players' current actions can depend on the past behavior of other players
– Cooperation is encouraged
• Book recommendation
– “Thinking Strategically” by A. Dixit and B. Nalebuff
• German translation: “Spieltheorie für Einsteiger”
Game theory for designing a swarming protocol
• Swarming “Game” Decisions
– While seeding, who should get the chunks?
– While leeching, from whom to download chunks?
• Goal
– Available resources should be optimally used
– Free-riding should be prevented
• Possible strategy in repeated games: Tit for Tat
– A player using this strategy will initially cooperate
• Player will adapt to opponent
– If the opponent previously was cooperative, the agent is cooperative.
– If not, the agent is not.
• Depends on four conditions
– Unless provoked, the agent will always cooperate
– If provoked, the agent will retaliate
– The agent is quick to forgive
– The agent must have a good chance of competing against the opponent more than once
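Tit for Tat in a repeated game can be sketched as a strategy function that only looks at the opponent's previous move. The encoding ("C" for cooperate, "D" for defect) and the helper names are assumptions of this sketch.

```python
def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    """A free-rider: never cooperates."""
    return "D"

def play(strategy_a, strategy_b, rounds):
    """Run a repeated game and return both players' move histories."""
    hist_a, hist_b = [], []
    for _ in range(rounds):
        move_a = strategy_a(hist_b)
        move_b = strategy_b(hist_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
    return hist_a, hist_b

cooperative_game = play(tit_for_tat, tit_for_tat, rounds=4)
exploited_game = play(tit_for_tat, always_defect, rounds=4)
```

Against itself the strategy cooperates throughout; against a permanent defector it is exploited once and retaliates from the second round on.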
BitTorrent uses a so-called choking mechanism for distributing chunks
• Basic idea
– Prefer uploading chunks to peers
which also offered chunks for download
• i.e. aim for bi-directional communication channels
– Bi-directional communication will benefit the whole swarm most
• Tit-for-tat
– Punish peers which seem to be free-riding
• i.e. who only download but provide no upload
• Choking as leecher
– Open a bi-directional transfer to another leecher
• Mutually exchange missing chunks
– If a peer does not upload any chunks for more than a minute, choke it:
• Temporarily refuse to upload
• Downloading continues as usual
• TCP Connection is kept open
– No Setup costs
– TCP congestion control
• Using only this choking mechanism may endanger the health of the swarm
– New leechers will automatically be choked because they cannot offer upload
– Two nodes which would have a good connection won’t use it as no node takes initiative
– A node may be choked by all other nodes due to unlucky circumstances / network weaknesses
• Solution: Optimistic Unchoking
– Randomly initiate a new connection to a currently unconnected leecher in the swarm and start uploading
• “Take initiative”
• Hope for a good cooperation
– e.g. that this new node provides a high upload rate in the bi-directional transfer
• Allows finding better peers
• Allows new peers to integrate themselves in the swarm
– Other peers voluntarily start uploading to them
– Randomly unchoke some currently choked connections
• “Quick to forgive”
• Helps locked-out peers to return to the swarm
– If a leecher is currently choked by all its peers, it initiates even more unchoking connections
• “Anti-Snubbing”
• Open issue: upload-only choking
– Once a download is complete, no bi-directional transfer connections are required anymore by that peer
• Peer becomes a seeder
• Which nodes to upload to?
• Seeding Policy
– Upload to those peers with the best upload / download ratio
– Probable Advantages
• Ensures that chunks are replicated faster within the swarm
• Leechers that have good upload rates are probably not being served by others
– But: Upload / Download ratio hard to determine
• Central bookkeeping? Bookkeeping own ratio? What about cheating?
• Download chunks in parallel
– Look for the rarest pieces
– Verify each chunk by checking hash, download again if hash fails
• Advertise received pieces to all connected peers
[Figure: leecher A announces a received piece (“I have!”) to the seed and to leecher B]
• Periodically calculate data-receiving rates
• Upload to (unchoke) the fastest downloaders
• Optimistic unchoking
– Periodically select a peer at random and upload to it
– Continuously look for the fastest partners
[Figure: leecher A exchanging chunks with the seed, leecher B, and leecher C]
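The periodic unchoke decision described above can be sketched as follows. The function name, the number of regular upload slots, and the single optimistic slot are assumptions chosen for illustration; the rates are whatever the client measured as data received from each peer in the last interval.

```python
import random

def choose_unchoked(download_rates, regular_slots=3, rng=random):
    """Return the set of peers to upload to in the next interval.

    download_rates: dict mapping peer -> rate at which we recently received
    data from that peer. The fastest peers fill the regular slots; one extra
    peer is picked at random from the rest (optimistic unchoke).
    """
    by_rate = sorted(download_rates, key=lambda p: (-download_rates[p], p))
    unchoked = set(by_rate[:regular_slots])
    choked = [p for p in by_rate if p not in unchoked]
    if choked:
        unchoked.add(rng.choice(choked))  # gives new or unlucky peers a chance
    return unchoked

rates = {"a": 50, "b": 10, "c": 30, "d": 0, "e": 5}
selected = choose_unchoked(rates, regular_slots=2, rng=random.Random(0))
```

The random slot is what lets a peer with a measured rate of zero, such as a newcomer without any chunks, receive its first data.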
• Remember: BitTorrent uses centralized trackers for managing peer lists
• Tracker Issues
– Single point of failure and attack
– Scalability
• The Pirate Bay tracker was nearly overloaded (>5 million peers)
• Solution: Decentralized Tracker
– Replace with DHT (Kademlia)
– Does not tackle distributed search
8.2 Decentralized tracker
• As a generic DHT protocol:
– Kademlia also uses a ring addressed by 160-bit SHA-1 hashes, onto which both data and nodes are hashed
• Similar to Pastry, but uses a more sophisticated routing (tree-based) mechanism requiring less maintenance
– Each key-value pair is stored redundantly on multiple nodes
– Every node maintains information about files and keywords close to itself
– The closeness between two objects is measured as their bitwise XOR interpreted as an integer
• distance(a, b) = a XOR b
Kademlia
[Figure: Kademlia binary tree]
Kademlia node state
For each $i$ ($0 \leq i < 160$), every node keeps a list of nodes at a distance between $2^i$ and $2^{i+1}$ from itself. Each list is called a k-bucket. The list is sorted by time last seen, with the least recently seen node at the head and the most recently seen node at the tail. The value of $k$ is chosen so that any given set of $k$ nodes is unlikely to fail within an hour. The list is updated whenever a node receives a message.
$k$ = system-wide replication parameter, usually 20
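The XOR metric and the bucket rule translate directly into integer arithmetic. In this sketch the helper names are illustrative, IDs are plain Python integers, and deriving a node ID by hashing a name with SHA-1 is just one assumed way to obtain a 160-bit identifier.

```python
import hashlib

def node_id(name: str) -> int:
    """Derive a 160-bit identifier from a name (assumption of this sketch)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR interpreted as an integer."""
    return a ^ b

def bucket_index(own_id: int, other_id: int) -> int:
    """Index i of the k-bucket with 2**i <= distance < 2**(i + 1)."""
    d = distance(own_id, other_id)
    if d == 0:
        raise ValueError("a node does not store itself in a bucket")
    return d.bit_length() - 1

def k_closest(target: int, known_ids, k: int = 20):
    """The k known IDs closest to the target under the XOR metric."""
    return sorted(known_ids, key=lambda n: distance(n, target))[:k]

d = distance(0b1010, 0b0110)          # 0b1100
i = bucket_index(0b1010, 0b0110)      # highest differing bit position
nearest = k_closest(8, [1, 9, 12, 10], k=2)
```

The bucket index is the position of the highest bit in which the two IDs differ, so nodes sharing a long ID prefix with the local node land in low-numbered buckets.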
• Kademlia DHT Tracker
– Each torrent is identified by its infohash
– All BitTorrent nodes using a compatible client may join the DHT tracker
• Not part of the core BitTorrent protocol
• Authors of client usually also provide bootstrapping nodes to the DHT tracker
• The DHT takes over the tracker's responsibility
– DHT Key-Value pairs
• Key: infohash
• Value: swarm peer listing
• Peer Exchange (PEX) and Multi-Tracker
– To further increase the performance and resilience of a torrent, multiple trackers can be used
• Easiest case
– Torrent is registered with multiple trackers which are all explicitly specified in the torrent file
• More complex solution: Peer Exchange
– Start connecting to any one tracker
– Ask other connected peers in the swarm for additional peers and / or trackers
• PEX Example
– Obtain a torrent and connect to tracker
• e.g., the official Ubuntu torrent, which provides two trackers
• Obtain peer lists
– Additionally, connect to DHT tracker
• Obtain even more peers
– Peers that use the same torrent but are trackerless, or that use DHT and another tracker
– Start peer exchange
• Again, obtain more peers
– e.g., receive peers not using DHT from a DHT node which is also using, e.g., the openbittorrent tracker
• BitTorrent provides absolutely no anonymity
– The peer list of a torrent contains all participating nodes
• IP address, port, sometimes upload/download ratio, etc.
– Thus, it is very easy to identify all people downloading / uploading to a torrent
• User behavior can be tracked quite easily by simply introducing a spy node into the swarm
– Privacy implication
• Also, no “pure” downloading is feasible in BitTorrent as download-only nodes will be choked
– Possibly legal implication for copyrighted content
8.3 Anonymous P2P
• Real anonymity is hard to achieve
– Nodes need to communicate and contact each other
• Identities (addresses) must be known
• However, the number of nodes knowing a node's identity can be limited to only trusted nodes
– So-called dark nets
• Basic idea
– Only connect to a few trusted “friend” nodes
– Never communicate directly with a non-friend
– Friends forward any message anonymously to their friends
• If network is designed correctly, most parts should be reachable via friend-of-a-friend routing
• Two notable systems
– Freenet: Pure P2P network using small-world properties and anonymous routing
– OneSwarm: BitTorrent extension based on friend-to-friend-routing
• Friend-To-Friend Routing
– e.g. B passes a message from A to C
• C does not know that message originated from A
• A does not know that B passes message to C