Prof. Dr. Wolf-Tilo Balke
Institut für Informationssysteme
Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de
Distributed Data Management
7.0 Introduction
7.1 Graph Model Basics
7.2 Random Graph Models
7.3 Small-World Graph Models
7.4 Scale-Free Graph Models
7.5 Network Examples
7.6 Network Models in P2P
Distributed Data Management – Profr. Dr. Wolf-Tilo Balke – IfIS – TU Braunschweig 2
7.0 Network Models
• Basic motivation for this lecture
– Can we show that a given P2P network really has some desired properties?
– How can a P2P network be designed so that it will, with high probability, show those desired properties?
• Large P2P networks are hard to evaluate
– In the production phase, usually no global view of the network is available
– In the design phase, a large number of peers is not available
7.0 Network Models
• Desirable System properties for P2P
– Decentralized and self-organized network
• No single point of failure or central bottleneck
• Maintaining the network (joining, leaving, publishing new content) should be performed without any central authority or global view
– Scalability
• The network should scale to any (possibly large) number of nodes
– The structure of the network supports searching and retrieving information efficiently
• Obvious demand in information exchange systems
– Reliability despite dynamic changes
• Network should be robust with respect to network and node failures
Book: P2P Systems and applications, pp 57-77
7.0 Network Models
• To examine the properties of a P2P network, good models are needed
• In this lecture, we focus on graph models for unstructured P2P networks
– Allows easy statistical analysis of network properties
– Peers are represented by vertices in a graph
– Entries in routing tables are represented by edges of the graph
– Peers are ego-centered and do not have global knowledge about all other peers and the data stored at those peers
• More complex P2P network protocols require dynamic simulation of networks to evaluate properties
7.0 Network Models
• Outline for this lecture:
• Network graph basics
– How can P2P networks be represented as graphs?
– Which properties can network graphs have?
– What are desirable properties for a P2P graph?
• Network models
– Many different network models have been studied during the last 60 years
• Some of them are useful to evaluate or design P2P networks
7.0 Network Models
– Random Networks
• Simple network model to represent pure P2P networks like Gnutella
– Small-World networks
• Naturally occurring networks showing very desirable properties which can be exploited by P2P systems
– Scale-Free networks
• “Naturally” occurring networks in large infrastructures, e.g. the Internet or power grids
7.0 Network Models
• A directed graph 𝑮 is defined as 𝐺 = (𝑽, 𝑬)
– 𝑽: a set of nodes or vertices
– 𝑬: a set of directed edges between elements of 𝑽
• 𝑬 ⊆ 𝑽 × 𝑽
• For P2P networks, 𝑽 represents the set of peers
– |𝑉| = 𝑛
• 𝑬 represents all directed links in the P2P overlay network
– i.e. the union of entries in the routing tables of all peers
– If later examples use undirected links, it is assumed that directed links in both directions exist
– $|E| = m$
7.1 Graph Theory
• Node outdegree of a node $v$ is denoted $\deg^+(v)$
– i.e., the number of vertices $w$ it is connected to by an edge $(v, w)$
• $\deg^+(v) = |N(v)| = |\{w \in V \mid (v, w) \in E\}|$
• Node indegree of a node $v$ is denoted $\deg^-(v)$
– i.e., the number of vertices $w$ that are connected to $v$ by an edge $(w, v)$
• $\deg^-(v) = |\{w \in V \mid (w, v) \in E\}|$
• Node degree of a node $v$ is denoted $\deg(v)$
– $\deg(v) = \deg^+(v) + \deg^-(v)$
– For undirected graphs, only the node degree is defined
• no in- or outdegree
• Neighbor set of a node $v$ is denoted $N(v)$
– $N(v) = \{w \in V \mid (v, w) \in E\}$
– For every neighbor $w \in N(v)$ there exists an edge $(v, w) \in E$
7.1 Graph Theory
• Example:
– $V = \{1, 2, 3, 4, 5\}$
– $E = \{(1,5), (5,4), (4,5), (2,4), (2,1), (2,3)\}$
– $N(2) = \{1, 3, 4\}$
– $N(4) = \{5\}$
– $\deg^+(2) = 3$, $\deg^-(2) = 0$
– $\deg^+(4) = 1$, $\deg^-(4) = 2$
7.1 Graph Theory
[Figure: the example graph with nodes 1 to 5]
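The degree and neighbor definitions can be checked against the example graph with a few lines of Python. This is only a minimal sketch; the function names (`neighbors`, `out_degree`, `in_degree`, `degree`) are illustrative and not part of the lecture.

```python
# Example graph from the slide: directed edges as (source, target) pairs.
V = {1, 2, 3, 4, 5}
E = {(1, 5), (5, 4), (4, 5), (2, 4), (2, 1), (2, 3)}


def neighbors(v, edges):
    """N(v): all vertices w for which an edge (v, w) exists."""
    return {w for (u, w) in edges if u == v}


def out_degree(v, edges):
    """deg+(v) = |N(v)|."""
    return len(neighbors(v, edges))


def in_degree(v, edges):
    """deg-(v): number of edges (w, v) pointing to v."""
    return sum(1 for (_, w) in edges if w == v)


def degree(v, edges):
    """deg(v) = deg+(v) + deg-(v)."""
    return out_degree(v, edges) + in_degree(v, edges)


if __name__ == "__main__":
    for v in sorted(V):
        print(v, sorted(neighbors(v, E)), out_degree(v, E), in_degree(v, E))
```

Running the script lists the neighbor set, outdegree, and indegree of every node, which can be compared with the values stated in the example.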
• Path 𝑷(𝒗, 𝒘)
– A path $P(v, w)$ is a sequence of vertices $(v_0, v_1, \ldots, v_k)$ with $v_0 = v$ and $v_k = w$ and $(v_i, v_{i+1}) \in E$ for all $0 \le i \le k - 1$
– The path length |𝑃(𝑣, 𝑤)| is defined as the number of edges in path P
– The distance 𝑑(𝑣, 𝑤) is defined as the shortest path length of any path between 𝑣 and 𝑤
7.1 Graph Theory
[Figure: a path between $v$ and $w$ with length 6 and a shortest path between $v$ and $w$ with length 4; thus, the distance is $d(v, w) = 4$]
• Metrics describing whole graphs:
• Connectedness
– A graph is connected if there is a path from any node to any other node
• k-Connectedness
– A graph is k-connected if the removal of any 𝑘 − 1 nodes still leaves the graph connected
• Bisection width 𝒃𝒔𝒘(𝑮)
– The bisection width of a graph 𝑮 is the minimal number of edges which must be removed to split the graph into two equally-sized unconnected subgraphs
• Represents the minimal cohesion of the graph
7.1 Graph Theory
• Graph diameter 𝒅(𝑮)
– Represents the maximum extent (path length) of a graph
– The diameter of a graph is the maximal distance of any pair of vertices
• $d(G) = \max\{d(v, w) \mid v, w \in V\}$
• Average path length $d_{avg}(G)$
– The sum of all distances between each pair of nodes divided by the number of all pairs of nodes in a connected graph
• $d_{avg}(G) = \frac{\sum_{(i,j) \in V \times V} d(i, j)}{n (n - 1)}$
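Distance, diameter, and average path length can be computed with one breadth-first search per node. The following is a minimal sketch, assuming the graph is given as an adjacency dictionary that contains every node as a key and has at least two nodes; the function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest path lengths d(source, w) to all reachable nodes w."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter_and_avg_path_length(adj):
    """Return (d(G), d_avg(G)) for a connected graph with n >= 2 nodes."""
    n = len(adj)
    total, longest = 0, 0
    for v in adj:
        dist = bfs_distances(adj, v)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        total += sum(dist.values())          # sum over all ordered pairs (v, w)
        longest = max(longest, max(dist.values()))
    return longest, total / (n * (n - 1))


# Undirected path 0 - 1 - 2 - 3, stored with links in both directions.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

if __name__ == "__main__":
    print(diameter_and_avg_path_length(path_graph))
```

One BFS per node costs O(n · (n + m)) in total, which is acceptable for the graph sizes used in the examples of this lecture.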
7.1 Graph Theory
• Graph outdegree $\deg^+(G)$
– The average outdegree of all nodes of 𝑮
• Graph indegree $\deg^-(G)$
– The average indegree of all nodes of 𝑮
• For undirected graphs, there is just the degree $\deg(G)$
– The average degree of all nodes
7.1 Graph Theory
• The clustering coefficient $C(v)$ of vertex $v$ in a directed graph is given by
– The number of links between the vertices within its neighborhood divided by the number of links that could possibly exist between them
• The number of neighbors of $v$ is $\deg^+(v)$
• The maximum number of connections between all neighboring nodes is $\deg^+(v) (\deg^+(v) - 1)$
– i.e. each neighbor connected with each other neighbor
– Describes how densely the neighbors of a vertex are interconnected
7.1 Graph Theory
– If $e(N(v))$ denotes the actual number of connections that neighbors of $v$ have with each other, the clustering coefficient is $C(v) = \frac{e(N(v))}{\deg^+(v) (\deg^+(v) - 1)}$
• Wasserman, S., and Faust, K. (1994). Social Network Analysis: Methods and Applications. Cambridge: Cambridge University Press.
7.1 Graph Theory
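[Figure: three example neighborhoods of a vertex $v$]
– $C(v) = \frac{4 \cdot 2}{4 \cdot 3} \approx 0.67$
• Numerator: links between neighbors of $v$
• Denominator: maximum number of neighbor links (4 neighbors having at most 3 links each)
– $C(v) = \frac{3 \cdot 2}{3 \cdot 2} = 1$
– $C(v) = \frac{0}{3 \cdot 2} = 0$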
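The clustering coefficient formula can be sketched in Python as follows. The code assumes a directed edge set in which an undirected link is stored as two directed edges, as stated earlier in the lecture; the helper `undirected` and the two small test graphs are illustrative and only mimic the first two cases of the figure.

```python
def undirected(pairs):
    """Store every undirected link as two directed edges."""
    return {(a, b) for a, b in pairs} | {(b, a) for a, b in pairs}


def clustering_coefficient(v, edges):
    """C(v) = e(N(v)) / (deg+(v) * (deg+(v) - 1))."""
    nbrs = {w for (u, w) in edges if u == v and w != v}
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumption: C(v) is set to 0 if fewer than 2 neighbors exist
    links = sum(1 for (a, b) in edges if a in nbrs and b in nbrs and a != b)
    return links / (k * (k - 1))


# v = 0 with four neighbors that form a ring 1-2-3-4-1 (4 undirected links).
ring_case = undirected([(0, 1), (0, 2), (0, 3), (0, 4),
                        (1, 2), (2, 3), (3, 4), (4, 1)])
# v = 0 with three neighbors that are all connected to each other.
clique_case = undirected([(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (1, 3)])

if __name__ == "__main__":
    print(clustering_coefficient(0, ring_case))
    print(clustering_coefficient(0, clique_case))
```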
• Which properties should a good P2P graph have?
– Connectedness
• Each node should be reachable
• If not, some information is not accessible to all peers
– k-Connectedness with large k
• Removing nodes should not immediately disconnect a graph
– Low diameter 𝒅(𝑮)
• Low diameters are necessary to ensure reachability and reduce message load
– Low diameter → smaller TTL possible when flooding
7.1 Graph Theory
– Low average path length $d_{avg}(G)$
• Most messages should quickly reach their target
– Low average node degrees 𝒅𝒆𝒈(𝑮)
• The higher the node degree is, the more node states must be stored at nodes
• Increases size of routing tables
– High average cluster coefficient
• Densely connected neighborhoods increase the failure-resilience of networks
• Distributed routing possible
– See later: Kleinberg Model
7.1 Graph Theory
• Random graphs provide the easiest model for any network
– Simple underlying assumptions
– Analyzable with statistical methods
• First family of network models studied (1950s)
– Multiple models for generating a random graph have been developed
– Most prominent generation models are
• the Erdős-Rényi random graph
• the Gilbert random graph
7.2 Random Graphs
• A random graph is usually denoted as $g_{n,m}$
– Random graph with 𝑛 nodes and 𝑚 edges
– For simplicity, we just consider undirected graphs
• Basic idea for constructing a random graph
– Graph construction starts with 𝑛 vertices without any connections
– 𝑚 edges are added one by one between the vertices using some random system
7.2 Random Graphs
• Pure peer-to-peer networks like Gnutella 0.4 can be modeled by a random graph
– Peers choose their neighbors more or less randomly
• Random bootstrapping, random Ping-Pong
• Unfortunately, “real” Gnutella 0.4 networks are usually not really random
– Bootstrapping is not random
• Use of special bootstrap nodes or bootstrap caches
– Ping-Pong strengthens connectedness of neighborhood and favors “strong” nodes
– Nodes prefer more popular and stronger nodes
• See later: scale-free networks
7.2 Random Graphs
• The behavior of random graphs is often studied for cases where the number of vertices diverges to infinity, i.e. 𝒏 → ∞
– In context of P2P, think of scalability!
– While the number 𝑚 of edges could be fixed, it is usually assumed that 𝑚 grows with 𝑛
• e.g. new nodes in a P2P network will also lead to new connections
• Fixed 𝑚 would quickly lead to mostly unconnected graphs
• Thus, usually 𝒎 is a function of 𝒏
7.2 Random Graphs
• Erdős-Rényi graphs are the most popular family of random graphs (1959)
– There are two predominant models which are equivalent for large graphs
• $g_{n,m}$ models
– Based on randomly selecting an instance of all graphs with 𝑛 nodes and 𝑚 edges
• $g_{n,p}$ models
– Each possible edge has a certain probability 𝑝 to be added to a graph or not
– Also known as Gilbert graphs (1959)
7.2 Erdős-Rényi Graphs
• Constructing $g_{n,m}$ graphs
– Let $G_{n,m}$ be the set of all labeled graphs with 𝑛 nodes and 𝑚 edges
• Labeled graphs: nodes are identifiable
– Unlabeled random graphs only consider the “shape” of graphs
• The number of all such graphs is given by the binomial coefficient $|G_{n,m}| = \binom{N}{m} = \binom{\binom{n}{2}}{m}$
– The number of possible edges between 𝑛 nodes is $N = \binom{n}{2}$
– For generating an instance $g_{n,m}$, any instance of $G_{n,m}$ is selected with equal probability
• Erdős, P.; Rényi, A. (1959). "On Random Graphs. I.". Publicationes Mathematicae 6: 290-297
7.2 Erdős-Rényi Graphs
• Example: Constructing $g_{3,2}$ graphs
– There are 3 possible $g_{3,2}$ in $G_{3,2}$
• Each graph is selected with the probability $\frac{1}{3}$
7.2 Erdős-Rényi Graphs
[Figure: the three labeled graphs with nodes 1, 2, 3 and two edges each]
• The $g_{n,m}$ model of random graphs is not suitable for actually generating large random graphs
– Extremely high number of possible graphs for given 𝑛 and 𝑚
7.2 Erdős-Rényi Graphs
#Edges \ #Nodes 1 2 3 4 5 6 7 8 9 10
1 0 1 3 6 10 15 21 28 36 45
2 0 0 3 15 45 105 210 378 630 990
3 0 0 1 20 120 455 1330 3276 7140 14190
4 0 0 0 15 210 1365 5985 20475 58905 148995
5 0 0 0 6 252 3003 20349 98280 376992 1221759
6 0 0 0 1 210 5005 54264 376740 1947792 8145060
7 0 0 0 0 120 6435 116280 1184040 8347680 45379620
8 0 0 0 0 45 6435 203490 3108105 30260340 215553195
9 0 0 0 0 10 5005 293930 6906900 94143280 886163135
10 0 0 0 0 1 3003 352716 13123110 254186856 3190187286
11 0 0 0 0 0 1365 352716 21474180 600805296 10150595910
12 0 0 0 0 0 455 293930 30421755 1251677700 28760021745
13 0 0 0 0 0 105 203490 37442160 2310789600 73006209045
14 0 0 0 0 0 15 116280 40116600 3796297200 166871334960
15 0 0 0 0 0 1 54264 37442160 5567902560 344867425584
16 0 0 0 0 0 0 20349 30421755 7307872110 646626422970
Table: number of labeled graphs $|G_{n,m}|$; rows give the number of edges 𝑚 (first column), columns the number of nodes 𝑛
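The table entries follow directly from the binomial coefficient, and a single $g_{n,m}$ instance can be drawn without enumerating all graphs by sampling 𝑚 distinct edges from the possible ones. This is a minimal sketch using only the Python standard library; the function names `count_graphs` and `sample_gnm` are illustrative.

```python
import math
import random
from itertools import combinations


def count_graphs(n, m):
    """|G_{n,m}|: number of labeled undirected graphs with n nodes and m edges."""
    return math.comb(math.comb(n, 2), m)


def sample_gnm(n, m, rng=random):
    """Draw one g_{n,m} instance uniformly: m distinct edges out of all possible ones."""
    possible_edges = list(combinations(range(n), 2))
    return rng.sample(possible_edges, m)


if __name__ == "__main__":
    print(count_graphs(3, 2))   # the g_{3,2} example
    print(count_graphs(10, 3))  # table entry for 10 nodes and 3 edges
    print(sample_gnm(5, 4, random.Random(42)))
```

Choosing an 𝑚-subset of the edge set uniformly is equivalent to selecting one element of $G_{n,m}$ with equal probability, because every labeled graph corresponds to exactly one such subset.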
• For generative models: use the probabilistic $g_{n,p}$ model of Erdős-Rényi random graphs
– So-called Gilbert graphs
• Gilbert, E.N. (1959). "Random Graphs". Annals of Mathematical Statistics 30: 1141-1144.
– Number of nodes 𝒏 is fixed
– Each possible edge in 𝑉 × 𝑉 has the fixed probability 𝒑 to be added to the graph
• i.e. the underlying assumption is that adding an edge is fully independent of all existing edges
• Larger 𝑝 will generate graphs with more edges, smaller 𝑝 will generate graphs with fewer edges
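The Gilbert model translates almost literally into code: every possible edge is tested independently against the probability 𝑝. The sketch below assumes undirected graphs without self-loops; `sample_gnp` is an illustrative name.

```python
import random
from itertools import combinations


def sample_gnp(n, p, rng=random):
    """Gilbert model: each of the n*(n-1)/2 possible edges is added with probability p."""
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


if __name__ == "__main__":
    n, p = 150, 1 / 150
    edges = sample_gnp(n, p, random.Random(7))
    expected = n * (n - 1) / 2 * p
    print(len(edges), "edges drawn, expected about", expected)
```

The number of edges varies from run to run; only its expectation is fixed by 𝑛 and 𝑝.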
7.2 Gilbert Model
• Both models $g_{n,m}$ and $g_{n,p}$ behave asymptotically equivalently for large 𝑛
– Expected number of edges is $m = \binom{n}{2} p$ for large 𝑛
• The law of large numbers will guarantee equivalence for $p n^2 \to \infty$
– Thus, for large $p n^2$, statements about properties can be made like
• “Property P holds for most graphs in $g_{n,p}$” ⇔ “Property P holds for most graphs in $g_{n,m}$ with $m = \binom{n}{2} p$”
7.2 Gilbert Model
• Randomly generated graphs can be used to approximate properties of large random P2P networks
– Many basic properties of random graphs have been established by Erdős & Rényi (1960) using large $g_{n,p}$, $n \to \infty$
• Asymptotic observations
– Many properties are directly dependent on the probability 𝒑 (or the number of edges 𝑚)
• Graphs show several phase transitions depending on the node/edge ratio
• Each phase transition has a threshold at which certain properties suddenly become extremely probable
• Before or after the threshold, the probability of a property $P$ is either $\mathbb{P}(P) \to 0$ or $\mathbb{P}(P) \to 1$ for $n \to \infty$
7.2 Random Graph Properties
• Predicting connected components
– For $n \cdot p < 1$, a $g_{n,p}$ graph will rarely have any connected components larger than $O(\log n)$
• The graph is mainly unconnected, each of its components is very small
• e.g. for a graph $g_{n,m}$ with 150 nodes, this threshold is roughly around 74 edges
7.2 Random Graph Properties
• Example Graphs
– Statistical prediction: most components will be of logarithmic size with respect to the number of nodes (i.e. will be small)
• Giant Connected Component
– For $n \cdot p = 1$, a graph $g_{n,p}$ will very probably have a giant connected component of size in $O(n^{2/3})$
• e.g. for a graph $g_{n,m}$ with 150 nodes, giant components should be observable for 75 edges and more
– Surprisingly, the giant component will appear when the average node degree is 1!
– For $n \cdot p > 1$, all other components will be of size $O(\log n)$
7.2 Random Graph Properties
• Example Graphs (giant component appears)
– Statistical prediction: for 𝑚 = 75 (𝑛 ∗ 𝑝 = 1), there is a largest component of size ≈ 28
7.2 Random Graph Properties
[Figure: $g_{150,m=300} \equiv g_{150,p=0.0268}$ and $g_{150,m=150} \equiv g_{150,p=0.0134}$]
• Example Graphs (other components diminish)
– Statistical prediction: no other component will be large
• Connectedness
– For $p < \frac{\ln n}{n}$, the graph will almost surely contain isolated vertices and will thus be disconnected
– For $p > \frac{\ln n}{n}$, the graph will almost surely be connected
• e.g. for a graph $g_{n,m}$ with 150 nodes, this threshold is around 374 edges
7.2 Random Graph Properties
[Figure: $g_{150,374} \equiv g_{150,p=0.033}$]
• Example Graphs (connectedness)
– Statistical prediction: for $p = 0.033 = \frac{\ln n}{n}$, the graph is almost surely connected
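The edge counts quoted for 150 nodes can be reproduced from the two thresholds $n \cdot p = 1$ and $p = \frac{\ln n}{n}$ via $m = \binom{n}{2} p$. The sketch below also includes a small union-find helper for measuring component sizes of a generated edge list; all names are illustrative, and the helper makes no statement about what a particular random sample will look like.

```python
import math
from collections import Counter


def threshold_edges(n):
    """Edge counts m = C(n,2) * p at the thresholds n*p = 1 and p = ln(n)/n."""
    pairs = n * (n - 1) / 2
    return {
        "giant_component": pairs * (1 / n),
        "connectedness": pairs * math.log(n) / n,
    }


def component_sizes(n, edges):
    """Sizes of all connected components (largest first), using union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    sizes = Counter(find(v) for v in range(n))
    return sorted(sizes.values(), reverse=True)


if __name__ == "__main__":
    print(threshold_edges(150))
    print(component_sizes(5, [(0, 1), (1, 2), (3, 4)]))
```

Feeding edge lists of growing size into `component_sizes` is one way to observe the phase transitions experimentally.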
• Degree Distribution
– The node degree of large random graphs can be modeled with Poisson distribution
• Let $\lambda = (n - 1) \cdot p$ be a constant
• Then the probability distribution of the node degrees $k = 0, 1, 2, 3, 4, \ldots$ can be approximated for $n \to \infty$ as the Poisson density $\mathbb{P}(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$
7.2 Random Graph Properties
• This degree distribution falls faster than an exponential distribution in the degree $k$, hence it is not a power-law distribution
• For larger $\lambda$, it behaves approximately like a normal distribution
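The quality of the Poisson approximation can be checked numerically. In a $g_{n,p}$ graph each node has $n - 1$ possible links, so its exact degree distribution is binomial with parameters $n - 1$ and $p$; the sketch below compares it with the Poisson density for $\lambda = (n - 1) \cdot p$. The function names are illustrative.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) = lam^k * e^(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def binomial_pmf(k, trials, p):
    """Exact probability that a node has degree k among `trials` possible links."""
    return math.comb(trials, k) * p ** k * (1 - p) ** (trials - k)


if __name__ == "__main__":
    n, p = 150, 1 / 150
    lam = (n - 1) * p
    for k in range(6):
        print(k, round(binomial_pmf(k, n - 1, p), 4), round(poisson_pmf(k, lam), 4))
```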
7.2 Random Graph Properties
• Degree Distribution for $g_{150,p}$ with $p = \frac{1}{150}$ and $\lambda = 1$
– 69 edges
7.2 Random Graph Properties
[Figure: estimated vs. measured degree distribution]
• Degree Distribution for $g_{150,p}$ with $p = \frac{2}{150}$ and $\lambda = 2$
– 142 edges
7.2 Random Graph Properties
[Figure: estimated vs. measured degree distribution]
• Diameter
– If $g$ is connected, the expected diameter of $g_{n,m}$ is in $O(\log n)$ with high probability
• i.e. the diameter of a connected random graph grows only logarithmically
• $g_{n,p}$ is almost surely connected for $p \ge \frac{\ln n}{n}$ for $n \to \infty$
– or: $g_{n,m}$ is almost surely connected for $m \ge \binom{n}{2} \frac{\ln n}{n}$
7.2 Random Graph Properties
[Figure: example random graph with $d(G) = 7$]
• Clustering Coefficient
– The clustering coefficient of a random graph $g_{n,p}$ is with high probability asymptotically equal to 𝒑 for 𝑛 → ∞
• This is a rather low clustering coefficient
7.2 Random Graph Properties
[Figure: two random graphs with nodes colored by $C$: $g_{100,p=0.03}$ with $C_{avg} \approx 0.0273$ and $g_{100,p=0.06}$ with $C_{avg} \approx 0.0563$]
• Observation: “Real and natural” networks are not random, but have some inherent structure
– Many “naturally” occurring networks are very robust and efficient
• Social network among people
• Neural networks
• Power lines, the Internet, streets, etc.
• …
– What properties do real-life networks have?
• Why are they stable and efficient?
7.3 Small-World Graphs
• First real networks to be studied: Social Networks among people
– “Six degrees of Separation”
• First mentioned 1929 by the Hungarian star author Karinthy Frigyes in his short story “Chains”
– Claim: all 1½ billion people in the world (sic.) know Frigyes via at most five acquaintances
» Friend-of-a-friend connections
– Motivated by two examples
7.3 Small-World Graphs
– Example 1: some 1929 Nobel Prize laureate…
• …knows King Gustav of Sweden…
• … who passionately plays tennis and knows a famous tennis champion…
• … who is a friend of Frigyes
– Example 2: unknown factory worker at a Ford factory
• Knows his boss, who knows Ford personally, who knows the director of the media house Hearst Publications, who knows the writer Árpád Pásztor, who is a friend of Frigyes
7.3 Small-World Graphs
• This idea was scientifically examined in 1967: Sociologist Stanley Milgram, Yale University
– Persons chosen at random in Kansas and Nebraska were asked to deliver a letter to a certain stock broker in Boston, MA
– This was the only information about the target person
– Constraint: The letter can only be given to persons one knows on a first name basis (acquaintances)
• 1967: No internet, transportation really expensive and cumbersome, close local communities
7.3 Small-World Graphs
S. Milgram (1933 - 1984)
• Letters used in the Milgram experiment
7.3 Small-World Graphs
• Those letters that reached the target person were passed on via only 6 intermediaries on average
– “6 degrees of separation“
– This was far less than originally assumed!
– Thus, social graphs were coined “Small-World Graphs”
• The original experiment was later criticized
– Only 50 persons took part in the original experiment
– Only 5% of letters were actually received by the target person
• But,…
– One letter was received within only 4 days
– The small world effect was experimentally observed in a vast variety of other sciences
7.3 Small-World Graphs
• Interesting trivia “Six Degrees of Kevin Bacon“
– Kevin Bacon once claimed that he's worked with everybody in Hollywood or someone who's worked with them
– College students built a party game out of that statement based on Milgram’s ideas
• Basic idea:
– Link actors via a minimum number of movies to actor Kevin Bacon
– e.g., Val Kilmer was in “Top Gun” with Tom Cruise, and Tom Cruise was in “A Few Good Men” with Kevin Bacon
– Only approximately 12% of all actors cannot be linked to Bacon
> try: http://oracleofbacon.org/
7.3 Small-World Graphs
J. Leskovec, E. Horvitz. Worldwide Buzz: Planetary-Scale Views on an Instant-Messaging Network. Proc. International WWW Conference, 2008.
7.3 Small-World Graphs
Data June 2006
– 4.5 TB of compressed data.
– 245 million users logged in.
– 180 million users engaged in conversations.
– More than 30 billion conversations.
– More than 255 billion exchanged messages.
• Communication graph
– Edge if the users exchanged at least 1 message
– 180 million people
– 1.3 billion edges
– 30 billion conversations
7.3 Small-World Graphs
Average path length 6.6
90% of the people can be reached in <8 hops
• However, it took a while until such naturally occurring networks were formally understood
– Erdős–Rényi random graphs are bad models for natural networks
• Natural networks often show “hubs”
– There are some nodes with very high node degree
– Node degree better described by a power-law distribution than a Poisson distribution
• Natural networks often show a very high degree of local clustering
– High average cluster coefficients
– e.g. by local communities, friend cliques, co-worker networks, local transportation networks, etc…
• Natural networks often have a low average path length
7.3 Small-World Graphs
• First models for natural graphs were proposed by Duncan Watts and Steven Strogatz in 1998
– Watts, D.J.; Strogatz, S.H. (1998). “Collective dynamics of ‘small-world’ networks.” Nature 393 (6684): 440–442. doi:10.1038/30918
– They examined three real-world networks
• The simple neural “brain” network of the roundworm (nematode) Caenorhabditis Elegans
– A natural network
• Power grid networks
– A man-made network
• Collaboration networks between movie actors
– Semi-natural network
7.3 Small-World Graphs
• Watts and Strogatz mainly examined the average path length and the cluster coefficient
– Comparison with equally sized random graphs
• Similar node and edge number
– Result:
• Real networks have a much higher degree of local clustering (10x to 1000x higher) than random graphs
• Average path length is more or less similar
7.3 Small-World Graphs
• “Definition” of small-world graphs
– “A small world network is a network with a dense local structure and a diameter comparable to a random graph with same numbers of nodes and edges.”
• Additionally: “The node degree is homogeneous”
• Watts and Strogatz also proposed the first generative model for a certain class of small-world graphs
– So called Watts-Strogatz graphs
• There are other small-world classes
7.3 Watts-Strogatz Graphs
• Properties of Watts-Strogatz graphs
– Low average path length
– High average clustering coefficients
– Homogeneously distributed node degrees
• Good model for e.g. social or neural networks and most other “natural” networks
• Not a good model for most man-made grid-like networks
– Those show power-law distributed node degrees
– By definition, these are not small-world graphs
– e.g. Internet, airline routes, train lines, etc.
– Watts-Strogatz graphs are between random and scale-free networks
7.3 Watts-Strogatz Graphs
• The generative model (Watts-Strogatz model)
– Graph is denoted as $g\_ws_{n,k,p}$
• 𝒏 is the number of nodes (integer)
• 𝒌 is the neighborhood degree (integer)
• 𝒑 is the rewire probability (float in [0, 1])
– Build a ring of 𝒏 vertices and connect each vertex with its 𝒌 clockwise neighbors on the ring
– Draw a random number between 0 and 1 for each edge
• Rewire each edge with probability 𝑝: if the random number is larger than 𝑝, do nothing; otherwise rewire
– Rewiring: keep the source vertex of the edge fixed, and choose a new target vertex uniformly at random from all other vertices
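The generative steps can be sketched as follows. Two details are assumptions not fixed by the description above: a new target is only chosen among vertices that are not yet neighbors of the source (to avoid self-loops and duplicate links), and $n > 2k$ is required so that the initial ring has no duplicate links. The function name `watts_strogatz` is illustrative.

```python
import random


def watts_strogatz(n, k, p, rng=random):
    """Return an undirected Watts-Strogatz graph as {node: set of neighbors}."""
    if n <= 2 * k:
        raise ValueError("this sketch assumes n > 2k")
    adj = {v: set() for v in range(n)}
    # Ring lattice: connect each vertex with its k clockwise neighbors.
    lattice = [(v, (v + step) % n) for v in range(n) for step in range(1, k + 1)]
    for u, w in lattice:
        adj[u].add(w)
        adj[w].add(u)
    # Rewire each lattice edge with probability p, keeping its source vertex u.
    for u, w in lattice:
        if rng.random() < p:
            candidates = [x for x in range(n) if x != u and x not in adj[u]]
            if candidates:  # assumption: skip rewiring if u is linked to everyone
                x = rng.choice(candidates)
                adj[u].discard(w)
                adj[w].discard(u)
                adj[u].add(x)
                adj[x].add(u)
    return adj


if __name__ == "__main__":
    g = watts_strogatz(50, 3, 0.05, random.Random(3))
    print(sum(len(nbrs) for nbrs in g.values()) // 2, "edges")
```

With 𝑛 = 50 and 𝑘 = 3 the sketch always produces 150 edges, since rewiring moves edges but never adds or removes them.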
7.3 Watts-Strogatz Graphs
– For $p = 0$, the resulting network is totally regular, with a clustering coefficient approaching $\frac{3}{4}$ for large $k$; the diameter is in $O(n)$
– For $p = 1$, the resulting network is a kind of random graph (regular random graph) with a diameter in $O(\log n)$
7.3 Watts-Strogatz Graphs
[Figure: regular ring lattice with k = 2]
• Comparing Watts-Strogatz Graphs
– 𝑛 = 50, 𝑚 = 150 ≡ 𝑘 = 3
– coloring by cluster coefficient
7.3 Watts-Strogatz Graphs
[Figure: Erdős-Rényi graph and Watts-Strogatz graph with 𝑝 = 0.0]
7.3 Watts-Strogatz Graphs
[Figure: Watts-Strogatz graphs with 𝑝 = 0.05 and 𝑝 = 0.1]
• Histogram of cluster coefficients
– Single sample
• Random
– Generally lower coefficient
• Small World
– Homogeneous, higher coefficient
7.3 Watts-Strogatz Graphs
[Figure: histogram of cluster coefficients for p = 0.00, 0.02, 0.05, 0.10 and a random graph; y-axis: number of nodes (0 to 50)]
• Histogram of node degrees
– Same sample
• Random
– Homogeneous degree
– Higher variance
• Small World
– Homogeneous degree
– Low variance
7.3 Watts-Strogatz Graphs
[Figure: histogram of node degrees for p = 0.00, 0.02, 0.05, 0.10 and a random graph; x-axis: node degree (1.5 to 10), y-axis: number of nodes (0 to 50)]
• Investigating clustering coefficients and average path lengths in dependence of 𝒑
– For a graph with 5000 nodes
– Normalized by the clustering coefficient and the path length at 𝑝 = 0
• Clustering coefficient is still high for small 𝑝, but the average path length decreases extremely fast due to ‘short cuts’
7.3 Watts-Strogatz Graphs
• The Watts-Strogatz model explains how a small- world graph can be constructed
– i.e. “How can locally densely connected graphs with shortcuts be constructed?”
• But: navigating a small-world can be very difficult!
– Assume “Six Degrees of Separation” was true: route a message to any arbitrary person
• All people on earth would be reachable by just six acquaintances
– But which ones?
• Random navigation or flooding won’t help
– Exponentially many possibilities!
• Solution: Use clues and heuristics to quickly route the message into the correct neighborhood!
7.3 Kleinberg Navigability Model
• Challenging question: how can we find short paths in a distributed fashion in a small-world?
– “Why should arbitrary pairs of strangers be able to find short chains of acquaintances that link them together?”
• J.M. Kleinberg, “Navigation in a Small-World”, Nature, 2000
– Some routing information is necessary
• Enough but not too much information!
7.3 Kleinberg Navigability Model
– Nodes see local parts of the network (neighborhood)
• i.e., they route the letter in a decentralized fashion
• In social networks additional information (same profession, address, hobbies, etc.) is used to decide which neighbor is ‘closest’ to the recipient
– Milgram showed that the first steps of the letter were the geographically largest, while later steps were closing in on the target area
7.3 Kleinberg Navigability Model
• A decentralized routing algorithm can be modeled as follows
– Let every node 𝑣 have a position 𝑃𝑜𝑠(𝑣) on a toroidal grid in a d-dimensional space
• 𝑃𝑜𝑠(𝑣) = (𝑥1, 𝑥2, … , 𝑥𝑑) with all 𝑥𝑖 being integers
– 𝑃𝑜𝑠(𝑣) is a 𝑑-dimensional vector
– 𝑥𝑖(𝑣) is the position of 𝑣 in dimension 𝑖
– Every node knows some basic information of the underlying grid structure
• i.e. its own position in the grid, its neighbors, and the target node
– no global knowledge, only local information
7.3 Kleinberg Navigability Model
– Each node $v$ hands the message (i.e., letter) to the one of its neighbors that is closest to the target $t$
– The distance measure $d_M(v, w)$ is given by the Manhattan distance
• i.e., the sum over the absolute differences $d_M(v, w) = \sum_i |x_i(v) - x_i(w)|$
• Let the routing algorithm take place on the following network model
– Start with a $d$-dimensional grid
– Add random edges between vertices $v$ and $w$ with a probability of $P(v, w) \sim d_M(v, w)^{-\alpha}$
• inverse $\alpha$-th power distribution
7.3 Kleinberg Navigability Models
• Node $u$ is connected to all its neighbors ($a$, $b$, $c$, and $d$) and has a long-range link to some randomly chosen node $v$ with a probability proportional to $dist(u, v)^{-\alpha}$
– The higher the distance, the lower the link probability
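The network model and the greedy routing rule can be sketched for the 2-dimensional case. The sketch assumes a toroidal grid of `side × side` nodes, a wrap-around Manhattan distance, and exactly one long-range link per node; the function names are illustrative, and the tiny grid size is chosen only to keep the example fast.

```python
import random


def torus_distance(a, b, side):
    """Manhattan distance on a toroidal grid (wrap-around in every dimension)."""
    return sum(min(abs(x - y), side - abs(x - y)) for x, y in zip(a, b))


def build_long_links(side, alpha, rng=random):
    """Give every node one long-range link v with P(u, v) ~ dist(u, v)^(-alpha)."""
    nodes = [(x, y) for x in range(side) for y in range(side)]
    links = {}
    for u in nodes:
        others = [v for v in nodes if v != u]
        weights = [torus_distance(u, v, side) ** -alpha for v in others]
        links[u] = rng.choices(others, weights=weights, k=1)[0]
    return links


def greedy_route(source, target, side, links):
    """Forward to the known contact closest to the target; return the hop count."""
    current, hops = source, 0
    while current != target:
        x, y = current
        contacts = [((x + 1) % side, y), ((x - 1) % side, y),
                    (x, (y + 1) % side), (x, (y - 1) % side),
                    links[current]]
        current = min(contacts, key=lambda c: torus_distance(c, target, side))
        hops += 1
    return hops


if __name__ == "__main__":
    side = 10
    links = build_long_links(side, alpha=2, rng=random.Random(5))
    print(greedy_route((0, 0), (5, 5), side, links))
```

Because at least one grid neighbor is always one step closer to the target, the greedy route terminates and never needs more hops than the grid distance between source and target; long-range links can only shorten it.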
7.3 Kleinberg Navigability Model
• Theorem: The routing algorithm will find ‘short’ paths if and only if $\alpha = d$
– ‘short’ means that the expected path lengths are polylogarithmic in $n$, i.e., in $O(\log^2 n)$
– Simulation results for the greedy routing algorithm on a 2-dimensional toroidal grid with 20,000 × 20,000 nodes (averages over 1000 runs)
7.3 Kleinberg Navigability Model
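• Idea behind the proof: for any $\alpha < d$, the random edges are spread almost uniformly over the grid; short paths exist, but the links give greedy routing too few clues about which edge leads towards the target
• For $\alpha > d$, the random edges mostly connect nodes that are already close to each other, so there are too few long-range shortcuts
– The routing will degenerate into a slow walk along the grid edges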
• Kleinberg small-worlds thus provide a way of building a peer-to-peer overlay network allowing for a simple, greedy, and distributed routing protocol
– But: How are nodes mapped to 𝑑-dimensional space such that the distance measurement is meaningful?
7.3 Kleinberg Navigability Model
• Small-World and random graphs show homogeneous node degree distributions
– For small-world, the distribution looks similar to a normal distribution with 𝜇 = 2𝑘 for non-extreme 𝑝
• The actual model is more complicated
• 𝑘 is the number of neighbors of the initial ring
– Random graphs are Poisson distributed
• For larger 𝑚, will also approximate a normal distribution
• But many (especially artificial) real-life networks show extreme node degree distributions
– e.g. strong hub-topologies
7.4 Scale-Free Networks
• In 1999, Albert-László Barabási (Univ. of Notre Dame) crawled parts of the WWW to investigate its actual structure
– The node degree is power-law distributed
• i.e., the probability that a node in the network is connected to $k$ other nodes is $P(k) \sim k^{-\gamma}$
– (usually with $2 < \gamma \le 3$)
– Most nodes have a small degree of around 1 to 2
– Few nodes have an extremely high node degree
– High-degree vertices are called ‘hubs’
• Albert-László Barabási. “Linked: How Everything Is Connected to Everything Else and What It Means for Business, Science, and Everyday Life”. Plume. 2003. ISBN 978-0452284395
7.4 Scale-Free Networks
• Definition: Graphs with a power-law node degree distribution form ‘scale-free’ networks
– Also called power-law networks
• What kind of network model can generate this more realistic degree distribution?
– Barabási–Albert model builds a certain subset of scale-free networks
• Albert-László Barabási & Réka Albert."Emergence of scaling in random networks". Science, 1999 doi:10.1126/science.286.5439.509.
7.4 Scale-Free Networks
• Barabási–Albert model: Basic Idea
– In its simplest form denoted as $g\_ba_{n,m}$
• 𝑛 is the number of nodes in the graph
• 𝑚 is the number of edges added per time step
– The total number of edges is thus approximately 𝑛 ∗ 𝑚
– Start with any initial graph of size $n_0$
• $n_0 \ge 2$ and degree of any node $\deg(v) \ge 1$
• Often, just 𝑚 connected nodes are used as default initial network
– If initial network is not connected, the result network cannot be guaranteed to be connected
– Barabási–Albert graph is constructed iteratively by adding new nodes one by one until target size 𝑛 is reached
• Represents one time step in a simulated network growth
– i.e. Discrete Time Modeling
• Add nodes until target size 𝑛 is reached
• Each new node is connected to 𝒎 existing nodes
7.4 Barabási–Albert Graphs
– New edges are not added randomly, but favor higher-degree nodes
• “The rich get richer“
• Preferential attachment to higher-degree nodes
– The higher the degree of a possible target node, the higher the probability that the new node will attach to it
– Preferential attachment defines the probability $\Pi(v)$ for vertex 𝑣 to get an edge to a new node
• In general, it is proportional to the node degree, i.e. $\Pi(v) \sim \deg(v)$
• Most common definition is $\Pi(v) = \frac{\deg(v)}{\sum_{w \in V} \deg(w)}$
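The growth process with preferential attachment can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `barabasi_albert`, the adjacency-dict representation, and the initial graph (a path of 𝑚 + 1 nodes, chosen so that it is connected and every node has degree ≥ 1) are assumptions of this sketch.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes; each new node attaches to m existing nodes.

    Returns an adjacency dict {node: set(neighbors)}.
    Assumption: the initial graph is a path of m + 1 nodes.
    """
    if m < 1 or n < m + 1:
        raise ValueError("need m >= 1 and n >= m + 1")
    rng = random.Random(seed)
    adj = {v: set() for v in range(m + 1)}
    # 'stubs' lists every node once per incident edge, so a uniform draw
    # from it picks node v with probability deg(v) / sum of all degrees.
    stubs = []
    for v in range(m):
        adj[v].add(v + 1)
        adj[v + 1].add(v)
        stubs += [v, v + 1]
    for new in range(m + 1, n):
        targets = set()
        # Redrawing duplicates keeps the graph free of multi-edges; for
        # m > 1 this deviates slightly from strictly proportional sampling.
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs += [new, t]
    return adj

graph = barabasi_albert(n=50, m=1, seed=1)
edge_count = sum(len(nb) for nb in graph.values()) // 2
max_degree = max(len(nb) for nb in graph.values())
print(edge_count, max_degree)
```

With this initial graph, the number of edges is 𝑚 ∗ (𝑛 − 𝑚), i.e. 49 for 𝑛 = 50 and 𝑚 = 1. The maximum degree depends on the random seed.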
• Example: $g^{ba}_{5,1}$
– 𝑡 = 0: initial graph with 𝑣1 and 𝑣2 connected by one edge
– 𝑡 = 1 − 𝜀: add new node 𝑣3
• Probability for connecting any old node 𝑣 to 𝑣3 is given by $\Pi(v) = \frac{\deg(v)}{\sum_{w \in V} \deg(w)}$
• $\Pi(v_1) = \frac{1}{2}$, $\Pi(v_2) = \frac{1}{2}$
• Random decision steered by preferential attachment, e.g., connect to 𝑣1
– 𝑡 = 1: graph with edges 𝑣1–𝑣2 and 𝑣1–𝑣3
– 𝑡 = 2 − 𝜀: add new node 𝑣4
• Evaluate preferential attachment: $\Pi(v_1) = \frac{1}{2}$, $\Pi(v_2) = \frac{1}{4}$, $\Pi(v_3) = \frac{1}{4}$
• e.g. connect to 𝑣1
– 𝑡 = 3 − 𝜀: add new node 𝑣5
• Evaluate preferential attachment: $\Pi(v_1) = \frac{1}{2}$, $\Pi(v_2) = \frac{1}{6}$, $\Pi(v_3) = \frac{1}{6}$, $\Pi(v_4) = \frac{1}{6}$
• e.g. connect to 𝑣1
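The attachment probabilities of this example can be re-computed step by step. The sketch below assumes, as in the example, that the initial graph is the single edge between 𝑣1 and 𝑣2 and that every new node happens to connect to 𝑣1; the helper name `attachment_probabilities` is made up for this illustration.

```python
from fractions import Fraction

def attachment_probabilities(degrees):
    """Pi(v) = deg(v) / sum of all degrees, as exact fractions."""
    total = sum(degrees.values())
    return {v: Fraction(d, total) for v, d in degrees.items()}

degrees = {"v1": 1, "v2": 1}        # initial graph: one edge v1 - v2
history = {}
for new in ("v3", "v4", "v5"):
    probs = attachment_probabilities(degrees)
    history[new] = probs
    print(new, {v: str(p) for v, p in probs.items()})
    degrees["v1"] += 1              # the example always connects to v1
    degrees[new] = 1
```

In every step 𝑣1 holds half of all edge endpoints, so its probability stays at 1/2 while the remaining half is split among a growing number of degree-1 nodes.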
• Comparing Erdős-Rényi and Barabási–Albert graphs (figures, coloring by node degree)
– 𝑛 = 50, ~50 edges
– 𝑛 = 100, ~100 edges
– 𝑛 = 100, ~150 edges
• Histogram of node degrees
– Single sample
– 100 nodes
– 300 edges
• Random
– Generally lower degree
• Small World
– Homogeneous degree
• Scale-Free
– Power-law
– Hubs visible
[Figure: histogram of node degree (x-axis) against number of nodes (y-axis) for Barabási (pa = 0.5), Watts-Strogatz (p = 0.05), and Random]
– pa: dampening factor for decreasing the strength of preferential attachment
• Node degree for larger Barabási–Albert graphs
– 200k nodes
– 400k edges
– Logarithmic scale
[Figure: relative frequency (y-axis) against degree (x-axis)]
• Histogram of cluster coefficients (𝐶)
– Same sample
• Random
– Low 𝐶
• Small World
– Homogeneous high 𝐶
• Scale-Free
– Also power-law
– Lower than small world
[Figure: histogram of cluster coefficient (x-axis) against number of nodes (y-axis) for Barabási (pa = 0.5), Watts-Strogatz (p = 0.05), and Random]
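As a reminder of what the histogram counts, the following sketch computes the local cluster coefficient of a node as the fraction of pairs of its neighbors that are themselves connected. Returning 0 for nodes with fewer than two neighbors is a convention assumed here; the two small example graphs are made up for illustration.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0                  # convention assumed in this sketch
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local cluster coefficients over all nodes."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

# Hypothetical examples: a triangle a-b-c with an extra leaf d, and a star.
triangle_with_tail = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"},
}
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}

print(average_clustering(triangle_with_tail))
print(average_clustering(star))
```

The star has no triangles at all, so its average cluster coefficient is 0. The same holds for any tree, for example a graph grown with one edge per new node.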
• An important property of scale-free networks is robustness against random failures
– Removing a random vertex 𝑣 will likely hit a low-degree node
• Expected damage to the network is small
– A failing high-degree node can severely damage the network
• Better fail-safety is necessary for high-degree nodes to ensure overall robustness
• Thus, scale-free networks are very sensitive to targeted attacks
– If a malevolent attacker explicitly targets the highest-degree nodes, the network can easily decompose
• Note: random graphs are not especially resilient against random failures, but also not particularly prone to attacks
– Most vertices have more or less the same degree
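The difference between a random failure and a targeted attack can be made concrete on a tiny hub-dominated graph. The sketch below is only an illustration under stated assumptions: the graph (two hubs `h` and `g` with 8 and 4 leaves) is hypothetical, and the size of the largest remaining connected component is used as the damage measure.

```python
def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component after deleting 'removed'."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:                # depth-first search from 'start'
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

def hub_graph():
    """Hypothetical graph: hub h with 8 leaves, hub g with 4 leaves, edge h-g."""
    edges = [("h", f"a{i}") for i in range(8)]
    edges += [("h", "g")] + [("g", f"b{i}") for i in range(4)]
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

graph = hub_graph()
after_leaf = largest_component(graph, {"a0"})        # typical random failure
after_hub = largest_component(graph, {"h"})          # attack on the main hub
after_attack = largest_component(graph, {"h", "g"})  # attack on both hubs
mean_random = sum(largest_component(graph, {v}) for v in graph) / len(graph)
print(after_leaf, after_hub, after_attack, round(mean_random, 2))
```

Since 12 of the 14 nodes are leaves, a uniformly random failure leaves almost the whole graph connected on average, while removing the two hubs leaves only isolated nodes.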
7.5 Comparing Graphs
• Random Graph: 50 nodes, 50 edges
– Color by degree
Property | Value
Connected | No
Diameter (conn.) | 9
Avg. Path Length | 4.39
#Clusters | 6
Largest Cluster | 39
k-connectedness | 0
Avg. Cluster Coeff. | 0.033
Avg. Degree | 2
• Watts-Strogatz Graph: 50 nodes, 50 edges, 𝑝 = 0.05
Property | Value
Connected | No
Diameter (conn.) | 35
Avg. Path Length | 12.73
#Clusters | 2
Largest Cluster | 38
k-connectedness | 0
Avg. Cluster Coeff. | 0
Avg. Degree | 2
• Barabási-Albert Graph: 50 nodes, 49 edges, 𝑝𝑎 = 0.8
Property | Value
Connected | Yes
Diameter | 12
Avg. Path Length | 5.14
k-connectedness | 1
Avg. Cluster Coeff. | 0
Avg. Degree | 1.96
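Several of the properties listed in these tables can be computed with plain breadth-first search. The following is a minimal sketch, not the tool used to produce the tables; the function names are made up, and for disconnected graphs it is assumed that diameter and average path length are taken over reachable node pairs only.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from 'source' to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def graph_properties(adj):
    """Connectedness, diameter, average path length, and average degree."""
    n = len(adj)
    edges = sum(len(nb) for nb in adj.values()) // 2
    pair_lengths = []               # distances of all reachable ordered pairs
    connected = True
    for v in adj:
        dist = bfs_distances(adj, v)
        connected = connected and len(dist) == n
        pair_lengths += [d for w, d in dist.items() if w != v]
    avg = sum(pair_lengths) / len(pair_lengths) if pair_lengths else 0.0
    return {
        "connected": connected,
        "diameter": max(pair_lengths, default=0),
        "avg_path_length": avg,
        "avg_degree": 2 * edges / n,
    }

# Hypothetical example: a path 0 - 1 - 2 - 3 - 4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
props = graph_properties(path)
print(props)
```

The average degree is always 2 ∗ (number of edges) / (number of nodes), which gives 2 ∗ 49 / 50 = 1.96 for the Barabási-Albert graph above and 2 for the two graphs with 50 edges.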