Distributed Data Management


Prof. Dr. Wolf-Tilo Balke

Institut für Informationssysteme

Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de


7.0 Introduction

7.1 Graph Model Basics

7.2 Random Graph Models

7.3 Small-World Graph Models

7.4 Scale-Free Graph Models

7.5 Network Examples

7.6 Network Models in P2P


7.0 Network Models


• Basic motivation for this lecture

– Can we show that a given P2P network really has some desired properties?

– How can a P2P network be designed so that it will, with high probability, show those desired properties?

• Large P2P networks are hard to evaluate

– In the production phase, usually no global view of the network is available

– In the design phase, no large number of peers is available


• Desirable system properties for P2P

– Decentralized and self-organized network

• No single point of failure or central bottleneck

• Maintaining the network (joining, leaving, publishing new content) should be performed without any central authority or global view

– Scalability

• The network should scale to any (possibly large) number of nodes

– The structure of the network supports searching and retrieving information efficiently

• Obvious demand in information exchange systems

– Reliability despite dynamic changes

• Network should be robust with respect to network and node failures

Book: P2P Systems and applications, pp 57-77


• To examine the properties of a P2P network, good models are needed

• In this lecture, we focus on graph models for unstructured P2P networks

– Allows easy statistical analysis of network properties

– Peers are represented by vertices in a graph

– Entries in routing tables are represented by edges of the graph

– Peers are ego-centered and do not have global knowledge about all other peers and the data stored at those peers

• More complex P2P network protocols require dynamic simulation of networks to evaluate properties


• Outline for this lecture:

• Network graph basics

– How can P2P networks be represented as graphs?

– Which properties can network graphs have?

– What are desirable properties for a P2P graph?

• Network models

– Many different network models have been studied during the last 60 years

• Some of them are useful to evaluate or design P2P networks


– Random Networks

• Simple network model to represent pure P2P networks like Gnutella

– Small-World networks

• Naturally occurring networks showing very desirable properties which can be exploited by P2P systems

– Scale-Free networks

• “Naturally” occurring networks in large infrastructures, e.g. the Internet or power grids


• A directed graph $G$ is defined as a pair $G = (V, E)$

– $V$: a set of nodes or vertices

– 𝑬: a set of directed edges between elements of 𝑽

• 𝑬 ⊆ 𝑽 × 𝑽

• For P2P networks, 𝑽 represents the set of peers

– |𝑉| = 𝑛

• 𝑬 represents all directed links in the P2P overlay network

– i.e. the union of the routing table entries of all peers

– If later examples use undirected links, it is assumed that directed links in both directions exist

– $|E| = m$

7.1 Graph Theory


• Node outdegree of a node $v$ is denoted $\deg^+(v)$

– i.e., the number of vertices $w$ it is connected to by an edge $(v, w)$

• $\deg^+(v) = |N(v)| = |\{w \in V \mid (v, w) \in E\}|$

• Node indegree of a node $v$ is denoted $\deg^-(v)$

– i.e., the number of vertices $w$ that are connected to $v$ by an edge $(w, v)$

• $\deg^-(v) = |\{w \in V \mid (w, v) \in E\}|$

• Node degree of a node $v$ is denoted $\deg(v)$

– $\deg(v) = \deg^+(v) + \deg^-(v)$

– For undirected graphs, only the node degree is defined

• no in- or outdegree

• Neighbor set of a node $v$ is denoted $N(v)$

– $N(v) = \{w \in V \mid (v, w) \in E\}$

– For every neighbor $w \in N(v)$ there exists an edge $(v, w) \in E$


• Example:

– $V = \{1, 2, 3, 4, 5\}$

– $E = \{(1,5), (5,4), (4,5), (2,4), (2,1), (2,3)\}$

– $N(2) = \{1, 3, 4\}$

– $N(4) = \{5\}$

– $\deg^+(2) = 3$, $\deg^-(2) = 0$

– $\deg^+(4) = 1$, $\deg^-(4) = 2$
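The degree and neighborhood definitions can be checked mechanically on the example graph. The following is a minimal Python sketch; the function names (`neighbors`, `out_degree`, `in_degree`, `degree`) are illustrative choices and not part of the lecture.

```python
# Example graph from the slide, stored as a set of directed edges (v, w).
V = {1, 2, 3, 4, 5}
E = {(1, 5), (5, 4), (4, 5), (2, 4), (2, 1), (2, 3)}


def neighbors(v, edges):
    """N(v): all w for which a directed edge (v, w) exists."""
    return {w for (u, w) in edges if u == v}


def out_degree(v, edges):
    """deg+(v): number of edges leaving v, i.e. |N(v)|."""
    return len(neighbors(v, edges))


def in_degree(v, edges):
    """deg-(v): number of edges entering v."""
    return sum(1 for (_, w) in edges if w == v)


def degree(v, edges):
    """deg(v) = deg+(v) + deg-(v)."""
    return out_degree(v, edges) + in_degree(v, edges)


print(neighbors(2, E), out_degree(2, E), in_degree(2, E))
print(neighbors(4, E), out_degree(4, E), in_degree(4, E))
```

Scanning the whole edge set per query is fine for small examples; an adjacency map would be the usual choice for larger graphs.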


• Path 𝑷(𝒗, 𝒘)

– A path $P(v, w)$ is a sequence of vertices $(v_0, v_1, \dots, v_k)$ with $v_0 = v$, $v_k = w$, and $(v_i, v_{i+1}) \in E$ for all $0 \le i \le k - 1$

– The path length |𝑃(𝑣, 𝑤)| is defined as the number of edges in path P

– The distance $d(v, w)$ is defined as the length of a shortest path between $v$ and $w$

[Figure: a path between v and w with length 6 and a shortest path between v and w with length 4; thus, the distance between v and w is 4]

• Metrics describing whole graphs:

• Connectedness

– A graph is connected if there is a path from any node to any other node

• k-Connectedness

– A graph is k-connected if the removal of any $k - 1$ nodes still leaves the graph connected

• Bisection width 𝒃𝒔𝒘(𝑮)

– Bisection width of a graph $G$ is the minimal number of edges which must be removed to split the graph into two equally-sized unconnected subgraphs

• Represents the minimal cohesion of the graph


• Graph diameter 𝒅(𝑮)

– Represents the maximum extent (path length) of a graph

– The diameter of a graph is the maximal distance of any pair of vertices

• $d(G) = \max\{d(v, w) \mid v, w \in V\}$

• Average path length $d_{avg}(G)$

– The sum of all distances between each pair of nodes divided by the number of all pairs of nodes in a connected graph

• $d_{avg}(G) = \frac{\sum_{(i,j) \in V \times V} d(i, j)}{n (n - 1)}$

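Distance, diameter, and average path length can all be derived from breadth-first search. The sketch below assumes an unweighted graph stored as a dictionary of adjacency sets with at least two nodes; the function names and the small path graph are hypothetical illustrations.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest path lengths d(source, w) for every w reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def diameter_and_avg_path_length(adj):
    """Return (d(G), d_avg(G)) for a connected graph given as adjacency sets."""
    n = len(adj)
    longest, total = 0, 0
    for v in adj:
        dist = bfs_distances(adj, v)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        longest = max(longest, max(dist.values()))
        total += sum(dist.values())  # d(v, v) = 0 adds nothing to the sum
    return longest, total / (n * (n - 1))


# Hypothetical example: undirected path 0 - 1 - 2 - 3 (links in both directions).
path_graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
d, d_avg = diameter_and_avg_path_length(path_graph)
print(d, d_avg)
```

Running one BFS per node costs O(n · (n + m)) overall, which is acceptable for the graph sizes used in the examples of this lecture.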

• Graph outdegree $\deg^+(G)$

– The average outdegree of all nodes of $G$

• Graph indegree $\deg^-(G)$

– The average indegree of all nodes of $G$

• For undirected graphs, there is just the degree $\deg(G)$

– The average degree of all nodes


• The clustering coefficient 𝐂(𝐯) of vertex 𝑣 in a directed graph is given by

– The number of links between the vertices within its neighborhood divided by the number of links that could possibly exist between them

• The number of neighbors of $v$ is $\deg^+(v)$

• The maximum number of connections between all neighboring nodes is $\deg^+(v)\,(\deg^+(v) - 1)$

– i.e. each neighbor connected with each other neighbor

– Describes how densely the neighbors of a vertex are interconnected


– If $e(N(v))$ denotes the actual number of connections that neighbors of $v$ have with each other, the clustering coefficient is $C(v) = \frac{e(N(v))}{\deg^+(v)\,(\deg^+(v) - 1)}$

• Wasserman, S., and Faust, K. (1994). Social Network Analysis: Methods and Applications. Cambridge: Cambridge University Press.
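The clustering coefficient formula translates directly into code. The sketch below works on a set of directed edges; returning 0 for vertices with fewer than two neighbors is an assumption made here, since the formula is undefined in that case, and the helper `undirected` is a hypothetical convenience.

```python
def clustering_coefficient(v, edges):
    """C(v) = e(N(v)) / (deg+(v) * (deg+(v) - 1)) for a set of directed edges."""
    nbrs = {w for (u, w) in edges if u == v and w != v}
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumption: treat the undefined case as 0
    # Count directed links that run between two different neighbors of v.
    links = sum(1 for (a, b) in edges if a in nbrs and b in nbrs and a != b)
    return links / (k * (k - 1))


def undirected(pairs):
    """Hypothetical helper: expand undirected links into both directed edges."""
    return {(a, b) for a, b in pairs} | {(b, a) for a, b in pairs}


# Vertex 0 has 4 neighbors that are joined by 4 undirected links (8 directed).
ring = undirected([(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3), (3, 4), (4, 1)])
# Vertex 0 has 3 neighbors that are all linked to each other.
clique = undirected([(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (1, 3)])
# Vertex 0 has 3 neighbors without any links among them.
star = undirected([(0, 1), (0, 2), (0, 3)])
print(clustering_coefficient(0, ring))
```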

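[Figure: three example neighborhoods of a vertex v]

– $C(v) = \frac{4 \cdot 2}{4 \cdot 3} \approx 0.67$ (numerator: links between neighbors of v; denominator: maximum number of neighbor links, 4 neighbors having at most 3 links each)

– $C(v) = \frac{3 \cdot 2}{3 \cdot 2} = 1$

– $C(v) = \frac{0}{3 \cdot 2} = 0$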

• Which properties should a good P2P graph have?

– Connectedness

• Each node should be reachable

• If not, some information is not accessible to all peers

– k-Connectedness with large k

• Removing nodes should not immediately disconnect a graph

– Low diameter 𝒅(𝑮)

• Low diameters are necessary to ensure reachability and reduce message load

– Low diameter → smaller TTL sufficient when flooding


– Low average path length $d_{avg}(G)$

• Most messages should quickly reach their target

– Low average node degrees 𝒅𝒆𝒈(𝑮)

• The higher the node degree is, the more node states must be stored at nodes

• Increases size of routing tables

– High average cluster coefficient

• Densely connected neighborhoods increase the failure-resilience of networks

• Distributed routing possible

– See later: Kleinberg Model


• Random graphs provide the easiest model for any network

– Simple underlying assumptions

– Analyzable with statistical methods

• First family of network models studied (1950s)

– Multiple models for generating a random graph have been developed

– Most prominent generation models are

• the Erdős-Rényi random graph

• the Gilbert random graph

7.2 Random Graphs


• A random graph is usually denoted as $g_{n,m}$

– Random graph with 𝑛 nodes and 𝑚 edges

– For simplicity, we just consider undirected graphs

• Basic idea for constructing a random graph

– Graph construction starts with 𝑛 vertices without any connections

– 𝑚 edges are added one by one between the vertices using some random system


• Pure peer-to-peer networks like Gnutella 0.4 can be modeled by a random graph

– Peers choose their neighbors more or less randomly

• Random bootstrapping, random Ping-Pong

• Unfortunately, “real” Gnutella 0.4 networks are usually not really random

– Bootstrapping is not random

• Use of special bootstrap nodes or bootstrap caches

– Ping-Pong strengthens connectedness of neighborhood and favors “strong” nodes

– Nodes prefer more popular and stronger nodes

• See later: scale-free networks


• The behavior of random graphs is often studied for cases where the number of vertices diverges to infinity, i.e. 𝒏 → ∞

– In context of P2P, think of scalability!

– While the number 𝑚 of edges could be fixed, it is usually assumed that 𝑚 grows with 𝑛

• e.g. new nodes in a P2P network will also lead to new connections

• Fixed 𝑚 would quickly lead to mostly unconnected graphs

• Thus, usually 𝒎 is a function of 𝒏


• Erdős-Rényi graphs are the most popular family of random graphs (1959)

– There are two predominant models which are equivalent for large graphs

• $g_{n,m}$ models

– Based on randomly selecting an instance of all graphs with $n$ nodes and $m$ edges

• $g_{n,p}$ models

– Each possible edge has a certain probability 𝑝 to be added to a graph or not

– Also known as Gilbert graphs (1959)

7.2 Erdős-Rényi Graphs


• Constructing $g_{n,m}$ graphs

– Let $G_{n,m}$ be the set of all labeled graphs with $n$ nodes and $m$ edges

• Labeled graphs: nodes are identifiable

– Unlabeled random graphs only consider the “shape” of graphs

• The number of all such graphs is given by the binomial coefficient $|G_{n,m}| = \binom{N}{m} = \binom{\binom{n}{2}}{m}$

– The number of possible edges between $n$ nodes is $N = \binom{n}{2}$

– For generating an instance $g_{n,m}$, any instance of $G_{n,m}$ is selected with equal probability

• Erdős, P.; Rényi, A. (1959). "On Random Graphs. I.". Publicationes Mathematicae 6: 290-297


• Example: Constructing $g_{3,2}$ graphs

– There are 3 possible $g_{3,2}$ in $G_{3,2}$

• Each graph is selected with the probability $\frac{1}{3}$

[Figure: the three labeled graphs on the nodes 1, 2, 3 with two edges each]

• The $g_{n,m}$ model of random graphs is not suitable for actually generating large random graphs

– Extremely high number of possible graphs for given 𝑛 and 𝑚

Number of labeled graphs $|G_{n,m}|$ by number of nodes $n$ (columns) and number of edges $m$ (rows):

m \ n  1  2  3   4    5     6       7         8           9            10
1      0  1  3   6   10    15      21        28          36            45
2      0  0  3  15   45   105     210       378         630           990
3      0  0  1  20  120   455    1330      3276        7140         14190
4      0  0  0  15  210  1365    5985     20475       58905        148995
5      0  0  0   6  252  3003   20349     98280      376992       1221759
6      0  0  0   1  210  5005   54264    376740     1947792       8145060
7      0  0  0   0  120  6435  116280   1184040     8347680      45379620
8      0  0  0   0   45  6435  203490   3108105    30260340     215553195
9      0  0  0   0   10  5005  293930   6906900    94143280     886163135
10     0  0  0   0    1  3003  352716  13123110   254186856    3190187286
11     0  0  0   0    0  1365  352716  21474180   600805296   10150595910
12     0  0  0   0    0   455  293930  30421755  1251677700   28760021745
13     0  0  0   0    0   105  203490  37442160  2310789600   73006209045
14     0  0  0   0    0    15  116280  40116600  3796297200  166871334960
15     0  0  0   0    0     1   54264  37442160  5567902560  344867425584
16     0  0  0   0    0     0   20349  30421755  7307872110  646626422970
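The counts of labeled graphs follow directly from the binomial coefficient of the number of possible edges. A minimal Python sketch, assuming Python 3.8+ for `math.comb`; the function name `num_labeled_graphs` is an illustrative choice.

```python
from math import comb


def num_labeled_graphs(n, m):
    """|G_{n,m}| = C(N, m), where N = C(n, 2) is the number of possible edges."""
    return comb(comb(n, 2), m)


# The three possible g_{3,2} graphs, and two larger counts for comparison.
print(num_labeled_graphs(3, 2))
print(num_labeled_graphs(10, 3))
print(num_labeled_graphs(10, 5))
```

Already for 10 nodes the numbers grow into the billions, which is why selecting one instance uniformly from an enumerated $G_{n,m}$ is impractical.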

• For generative models: use probabilistic $g_{n,p}$ model of Erdős-Rényi random graphs

– So-called Gilbert graphs

• Gilbert, E.N. (1959). "Random Graphs". Annals of Mathematical Statistics 30: 1141- 1144.

– Number of nodes 𝒏 is fixed

– Each possible edge in 𝑉 × 𝑉 has the fixed probability 𝒑 to be added to the graph

• i.e. underlying assumption is that adding an edge is fully independent of all existing edges

• Larger $p$ will generate graphs with more edges, smaller $p$ will generate graphs with fewer edges
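The Gilbert construction is a single pass over all possible edges. The sketch below assumes an undirected graph (each unordered pair is considered once, following the earlier simplification to undirected graphs); the function name `gilbert_graph` and the `seed` parameter are illustrative.

```python
import random
from itertools import combinations


def gilbert_graph(n, p, seed=None):
    """Sample an undirected g_{n,p}: keep each of the C(n, 2) edges with probability p."""
    rng = random.Random(seed)
    # Every edge decision is independent of all other edges.
    return {(u, v) for u, v in combinations(range(n), 2) if rng.random() < p}


sample = gilbert_graph(150, 1 / 150, seed=42)
print(len(sample))  # number of edges varies around the expected value C(150, 2) / 150
```

The loop takes O(n²) time, which is fine for a few thousand nodes; for very sparse, very large graphs one would usually skip ahead between accepted edges instead.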

7.2 Gilbert Model


• Both models $g_{n,m}$ and $g_{n,p}$ behave asymptotically equivalently for large $n$

– Expected number of edges is $m = \binom{n}{2} p$ for large $n$

• The law of large numbers will guarantee equivalence for $p n^2 \to \infty$

– Thus, for large $p n^2$, statements about properties can be made like

• “Property P holds for most graphs in $g_{n,p}$” ⇔ “Property P holds for most graphs in $g_{n,m}$ with $m = \binom{n}{2} p$”


• Randomly generated graphs can be used to approximate properties of large random P2P networks

– Many basic properties of random graphs were established by Erdős & Rényi in 1960 using large $g_{n,p}$, $n \to \infty$

• Asymptotic observations

– Many properties are directly dependent on the probability $p$ (or the number of edges $m$)

• Graphs show several phase transitions depending on the node/edge ratio

• Each phase transition has a threshold at which certain properties suddenly become extremely probable

• Before or after the threshold, the probability of a property $P$ is either $\mathbb{P}(P) \to 0$ or $\mathbb{P}(P) \to 1$ for $n \to \infty$

7.2 Random Graph Properties


• Predicting connected components

– For $n \cdot p < 1$, a $g_{n,p}$ graph will rarely have any connected components larger than $O(\log n)$

• The graph is mainly unconnected, each of its components is very small

• e.g. for a graph $g_{n,m}$ with 150 nodes, this threshold is roughly around 74 edges

• Example Graphs

– Statistical prediction: most components will be of logarithmic size with respect to the number of nodes (i.e. will be small)

• Giant Connected Component

– For $n \cdot p = 1$, a graph $g_{n,p}$ will very probably have a giant connected component of size in $O(n^{2/3})$

• e.g. for a graph $g_{n,m}$ with 150 nodes, giant components should be observable for 75 edges and more

– Surprisingly, the giant component will appear when the average node degree is 1!

– For $n \cdot p > 1$, all other components will be of size $O(\log n)$


• Example Graphs (giant component appears)

– Statistical prediction: for 𝑚 = 75 (𝑛 ∗ 𝑝 = 1), there is a largest component of size ≈ 28

• Example Graphs (other components diminish)

– Statistical prediction: no other component will be large

[Figure: example graphs $g_{150,m=150} \equiv g_{150,p=0.0134}$ and $g_{150,m=300} \equiv g_{150,p=0.0268}$]

• Connectedness

– For $p < \frac{\ln n}{n}$, the graph will almost surely contain isolated vertices and will thus be disconnected

– For $p > \frac{\ln n}{n}$, the graph will almost surely be connected

• e.g. for a graph $g_{n,m}$ with 150 nodes, this threshold is around 374 edges
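The edge counts quoted for 150 nodes can be reproduced by converting each probability threshold into an expected number of edges via $m = \binom{n}{2} p$. A small Python sketch; the function names are illustrative and not taken from the lecture.

```python
from math import comb, log


def expected_edges(n, p):
    """Expected number of edges m = C(n, 2) * p of a g_{n,p} graph."""
    return comb(n, 2) * p


def giant_component_threshold(n):
    """p with n * p = 1, where the giant component starts to appear."""
    return 1.0 / n


def connectivity_threshold(n):
    """p = ln(n) / n, above which the graph is almost surely connected."""
    return log(n) / n


n = 150
m_giant = expected_edges(n, giant_component_threshold(n))
m_connected = expected_edges(n, connectivity_threshold(n))
print(round(m_giant, 1), round(m_connected, 1))
```

For $n = 150$ this gives about 74.5 edges for the giant component and a little over 373 edges for connectedness, in line with the rounded values used in the examples.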

• Example Graphs (connectedness)

– Statistical prediction: for $p = 0.033 \approx \frac{\ln n}{n}$, the graph is almost surely connected

[Figure: example graph $g_{150,374} \equiv g_{150,p=0.033}$]

• Degree Distribution

– The node degree of large random graphs can be modeled with a Poisson distribution

• Let $\lambda$ be a constant $\lambda = (n - 1) \cdot p$.

• Then the probability distribution of the node degrees $k = 0, 1, 2, 3, 4, \dots$ can be approximated for $n \to \infty$ as the Poisson density $\mathbb{P}(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$


• This degree distribution falls faster than an exponential distribution in the degree $k$, hence it is not a power-law distribution

• For larger $\lambda$, it behaves approximately like a normal distribution
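The Poisson approximation of the degree distribution can be evaluated directly. The sketch below computes the density and the expected number of nodes per degree for a $g_{n,p}$ graph; the function names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) = lam^k * e^(-lam) / k!"""
    return lam ** k * exp(-lam) / factorial(k)


def expected_degree_histogram(n, p, max_degree):
    """Expected number of nodes with degree 0..max_degree in a g_{n,p} graph."""
    lam = (n - 1) * p
    return [n * poisson_pmf(k, lam) for k in range(max_degree + 1)]


# Estimated histogram for 150 nodes and p = 1/150 (lambda close to 1).
for k, count in enumerate(expected_degree_histogram(150, 1 / 150, 5)):
    print(k, round(count, 1))
```

Such an estimate is what a measured degree histogram of a sampled graph would be compared against.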


• Degree Distribution for $g_{150,p=1/150}$ and $\lambda = 1$

– 69 edges

[Figure: estimated vs. measured degree distribution]

• Degree Distribution for $g_{150,p=2/150}$ and $\lambda = 2$

– 142 edges

[Figure: estimated vs. measured degree distribution]

• Diameter

– If $g$ is connected, the expected diameter of $g_{n,m}$ is in $O(\log n)$ with high probability

• i.e. the diameter of a connected random graph grows only logarithmically

• $g_{n,p}$ is almost surely connected for $p \ge \frac{\ln n}{n}$ for $n \to \infty$

– or: $g_{n,m}$ is almost surely connected for $m \ge \binom{n}{2} \frac{\ln n}{n}$

[Figure: example graph with diameter $d(G) = 7$]

• Clustering Coefficient

– The clustering coefficient of a random graph $g_{n,p}$ is with high probability asymptotically equal to $p$ for $n \to \infty$

• This is a rather low clustering coefficient

[Figure: two example graphs with nodes colored by $C$: $g_{100,p=0.03}$ with $C_{avg} \approx 0.0273$ and $g_{100,p=0.06}$ with $C_{avg} \approx 0.0563$]

• Observation: “Real and natural” networks are not random, but have some inherent structure

– Many “naturally” occurring networks are very robust and efficient

• Social network among people

• Neural networks

• Power lines, the Internet, streets, etc.

• …

– What properties do real-life networks have?

• Why are they stable and efficient?

7.3 Small-World Graphs


• First real networks to be studied: Social Networks among people

– “Six degrees of Separation”

• First mentioned 1929 by the Hungarian star author Karinthy Frigyes in his short story “Chains”

– Claim: all 1½ billion people in the world (sic.) know Frigyes via at most five acquaintances

» Friend-of-a-friend connections

– Motivated by two examples


– Example 1: some 1929 Nobel Prize laureate,

• …knows King Gustav of Sweden…

• … who passionately plays tennis and knows a famous tennis champion…

• … who is a friend of Frigyes

– Example 2: an unknown worker at a Ford factory

• Knows his boss, who knows Ford personally, who knows the director of the media house Hearst Publications, who knows the writer Árpád Pásztor, who is a friend of Frigyes


• This idea was scientifically examined in 1967:

Sociologist Stanley Milgram, Yale University

– Persons chosen at random in Kansas and Nebraska were asked to deliver a letter to a certain stock broker in Boston, MA

– This was the only information about the target person

– Constraint: The letter can only be given to persons one knows on a first name basis (acquaintances)

• 1967: No internet, transportation really expensive and cumbersome, close local communities

[Photo: S. Milgram (1933-1984)]

• Letters used in the Milgram experiment


• Those letters that reached the target person were passed on via only about 6 intermediaries on average

– “6 degrees of separation”

– This was far less than originally assumed!

– Thus, social graphs were termed “Small-World Graphs”

• The original experiment was later criticized

– Only 50 persons took part in the original experiment

– Only 5% of letters were actually received by the target person

• But,…

– One letter was received within only 4 days

– The small world effect was experimentally observed in a vast variety of other sciences


• Interesting trivia: “Six Degrees of Kevin Bacon”

– Kevin Bacon once claimed that he's worked with everybody in Hollywood or someone who's worked with them

– College students built a party game out of that statement based on Milgram’s ideas

• Basic idea:

– Link actors via a minimum number of movies to actor Kevin Bacon

– e.g., Val Kilmer was in “Top Gun” with Tom Cruise, and Tom Cruise was in “A Few Good Men” with Kevin Bacon

– Only approximately 12% of all actors cannot be linked to Bacon

> try: http://oracleofbacon.org/


J. Leskovec, E. Horvitz. Worldwide Buzz: Planetary-Scale Views on an Instant-Messaging Network. Proc. International WWW Conference, 2008.

Data from June 2006

– 4.5 TB of compressed data.

– 245 million users logged in.

– 180 million users engaged in conversations.

– More than 30 billion conversations.

– More than 255 billion exchanged messages.


• Communication graph

– Edge if the users exchanged at least 1 message

– 180 million people

– 1.3 billion edges

– 30 billion conversations


Average path length 6.6

90% of the people can be reached in <8 hops


• However, it took a while until such naturally occurring networks were formally understood

– Erdős–Rényi random graphs are bad models for natural networks

• Natural networks often show “hubs”

– There are some nodes with very high node degree

– Node degree better described by a power-law distribution than a Poisson distribution

• Natural networks often show a very high degree of local clustering

– High average cluster coefficients

– e.g. by local communities, friend cliques, co-worker networks, local transportation networks, etc…

• Natural networks often have a low average path length


• First models for natural graphs were proposed by Duncan Watts and Steven Strogatz in 1998

– Watts, D.J.; Strogatz, S.H. (1998). "Collective dynamics of 'small-world' networks.“ Nature 393 (6684): 409–10. doi:10.1038/30918

– They examined three real-world networks

• The simple neural “brain” network of the roundworm (nematode) Caenorhabditis elegans

– A natural network

• Power grid networks

– A man-made network

• Collaboration networks between movie actors

– Semi-natural network


• Watts and Strogatz mainly examined the average path length and the cluster coefficient

– Comparison with equally sized random graphs

• Similar node and edge number

– Result:

• Real networks have a much higher degree of local clustering (10x to 1000x higher) than random graphs

• Average path length is more or less similar


• “Definition” of small-world graphs

– “A small world network is a network with a dense local structure and a diameter comparable to a random graph with same numbers of nodes and edges.”

• Additionally: “The node degree is homogeneous”

• Watts and Strogatz also proposed the first generative model for a certain class of small-world graphs

– So called Watts-Strogatz graphs

• There are other small-world classes

7.3 Watts-Strogatz Graphs


• Properties of Watts-Strogatz graphs

– Low average path length

– High average clustering coefficients

– Homogeneously distributed node degrees

• Good model for e.g. social or neural networks and most other “natural” networks

• Not a good model for most man-made grid-like networks

– Those show power-law distributed node degrees

– By definition, these are not small-world graphs

– e.g. internet, airline routes, train lines, etc.

– Watts-Strogatz graphs are between random and scale-free networks


• The generative model (Watts-Strogatz model)

– Graph is denoted as $g\_ws_{n,k,p}$

• $n$ is the number of nodes (integer)

• $k$ is the neighborhood degree (integer)

• $p$ is the rewire probability (float in $[0, 1]$)

– Build a ring of 𝒏 vertices and connect each vertex with its 𝒌 clockwise neighbors on the ring

– Draw a random number between 0 and 1 for each edge

• Rewire each edge with probability 𝑝: if random number is larger than 𝑝, do nothing. Else rewire.

– Rewiring: keep the source vertex of the edge fixed, and choose a new target vertex uniformly at random from all other vertices
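The generative steps above can be sketched in a few lines of Python. The sketch assumes $n > 2k$ (so the ring has no duplicate links) and, as an additional assumption not stated in the model description, avoids self-loops and duplicate edges when choosing a new target; the function name `watts_strogatz` is illustrative.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Sketch of g_ws(n, k, p): ring lattice plus random rewiring of edge targets."""
    rng = random.Random(seed)
    # Ring: connect every vertex to its k clockwise neighbors (n * k edges in total).
    ring = {(v, (v + j) % n) for v in range(n) for j in range(1, k + 1)}
    edges = set(ring)
    for (v, w) in sorted(ring):
        if rng.random() < p:  # rewire this edge with probability p
            # Assumption: skip targets that would create a self-loop or a duplicate edge.
            candidates = [u for u in range(n)
                          if u != v and (v, u) not in edges and (u, v) not in edges]
            if candidates:
                edges.remove((v, w))
                edges.add((v, rng.choice(candidates)))  # source vertex stays fixed
    return edges


lattice = watts_strogatz(50, 3, 0.0, seed=1)   # p = 0: regular ring, 150 edges
small_world = watts_strogatz(50, 3, 0.05, seed=1)
print(len(lattice), len(small_world))
```

Because every rewired edge replaces exactly one existing edge, the number of edges stays at $n \cdot k$ for every $p$, which makes graphs with different $p$ directly comparable.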


– For $p = 0$, the resulting network is totally regular, with a clustering coefficient approaching $\frac{3}{4}$ for large $k$; the diameter is in $O(n)$

– For $p = 1$, the resulting network is a kind of random graph (regular random graph) with a diameter in $O(\log n)$

[Figure: example ring for $k = 2$]

• Comparing Watts-Strogatz Graphs

– 𝑛 = 50, 𝑚 = 150 ≡ 𝑘 = 3

– coloring by cluster coefficient

[Figure: Erdős-Rényi graph and Watts-Strogatz graphs with $p = 0.0$, $p = 0.05$, and $p = 0.1$]

• Histogram of cluster coefficients

– Single sample

• Random

– Generally lower coefficient

• Small World

– Homogeneous, higher coefficient

[Figure: histogram of cluster coefficients; y-axis: number of nodes (0 to 50); series: p=0.00, p=0.02, p=0.05, p=0.10, random]

• Histogram of node degrees

– Same sample

• Random

– Homogeneous degree

– Higher variance

• Small World

– Homogeneous degree

– Low variance

[Figure: histogram of node degrees; x-axis: node degree (1.5 to 10); y-axis: number of nodes (0 to 50); series: p=0.00, p=0.02, p=0.05, p=0.10, random]

• Investigating clustering coefficients and average path lengths as a function of $p$

– For a graph with 5000 nodes

– Normalized by the clustering coefficient and the path length at 𝑝 = 0

• Clustering coefficient is still high for small $p$, but the average path length decreases extremely fast due to ‘shortcuts’


• The Watts-Strogatz model explains how a small-world graph can be constructed

– i.e. “How can locally densely connected graphs with shortcuts be constructed?”

• But: navigating a small-world can be very difficult!

– Assume “Six Degrees of Separation” was true: route a message to any arbitrary person

• All people on earth would be reachable by just six acquaintances

– But which ones?

• Random navigation or flooding won’t help

– Exponentially many possibilities!

• Solution: Use clues and heuristics to quickly route the message into the correct neighborhood!

7.3 Kleinberg Navigability Model


• Challenging question: how can we find short paths in a distributed fashion in a small-world?

– “Why should arbitrary pairs of strangers be able to find short chains of acquaintances that link them together?”

• J.M. Kleinberg, “Navigation in a Small-World”, Nature, 2000

– Some routing information is necessary

• Enough but not too much information!


– Nodes see local parts of the network (neighborhood)

• i.e., they route the letter in a decentralized fashion

• In social networks additional information (same profession, address, hobbies, etc.) is used to decide which neighbor is ‘closest’ to the recipient

– Milgram showed that the first steps of the letter were the geographically largest, while later steps were closing in on the target area


• A decentralized routing algorithm can be modeled as follows

– Let every node $v$ have a position $Pos(v)$ on a toroidal grid in a $d$-dimensional space

• $Pos(v) = (x_1, x_2, \dots, x_d)$ with all $x_i$ being integers

– $Pos(v)$ is a $d$-dimensional vector

– $x_i(v)$ is the position of $v$ in dimension $i$

– Every node knows some basic information of the underlying grid structure

• i.e. its own position in the grid, its neighbors, and the target node

– no global knowledge, only local information


– Each node $v$ hands the message (i.e., letter) to the one of its neighbors that is closest to the target $t$

– The distance measure $d_M(v, w)$ is given by the Manhattan Distance

• i.e. by the sum over the absolute differences, $d_M(v, w) = \sum_i |x_i(v) - x_i(w)|$

• Let the routing algorithm take place on the following network model

– Start with a $d$-dimensional grid

– Add random edges between vertices $v$ and $w$ with a probability of $P(v, w) \sim d_M(v, w)^{-\alpha}$

• inverse $\alpha^{th}$-power distribution
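The greedy routing rule and the distance-dependent long-range links can be sketched as follows. This is a minimal Python illustration, not Kleinberg's original construction: it assumes one long-range link per node, a torus with the same side length `size` in every dimension, and a Manhattan distance that wraps around the torus; all function names are illustrative.

```python
import random
from itertools import product


def torus_distance(a, b, size):
    """Manhattan distance d_M on a toroidal grid (assumption: wrap-around per dimension)."""
    return sum(min(abs(x - y), size - abs(x - y)) for x, y in zip(a, b))


def grid_neighbors(v, size):
    """The direct grid neighbors of v (one step up or down in each dimension)."""
    result = []
    for i in range(len(v)):
        for step in (-1, 1):
            w = list(v)
            w[i] = (w[i] + step) % size
            result.append(tuple(w))
    return result


def long_range_link(v, size, alpha, rng):
    """Pick one target w with probability proportional to d_M(v, w)^(-alpha)."""
    others = [w for w in product(range(size), repeat=len(v)) if w != v]
    weights = [torus_distance(v, w, size) ** (-alpha) for w in others]
    return rng.choices(others, weights=weights, k=1)[0]


def greedy_route(source, target, size, links):
    """Always forward to the known neighbor that is closest to the target."""
    path = [source]
    while path[-1] != target:
        v = path[-1]
        options = grid_neighbors(v, size) + links.get(v, [])
        path.append(min(options, key=lambda w: torus_distance(w, target, size)))
    return path


rng = random.Random(0)
size, alpha = 8, 2  # 2-dimensional 8 x 8 torus with alpha = d = 2
links = {v: [long_range_link(v, size, alpha, rng)]
         for v in product(range(size), repeat=2)}
route = greedy_route((0, 0), (4, 4), size, links)
print(len(route) - 1, "hops")
```

Since some grid neighbor is always one step closer to the target, every hop strictly reduces the distance and the routing terminates; long-range links can only shorten the route.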


• Node $u$ is connected to all its neighbors ($a$, $b$, $c$, and $d$) and has a long-range link to some randomly chosen node $v$ with a probability proportional to $dist(u, v)^{-\alpha}$

– The higher the distance, the lower the link probability


• Theorem: The routing algorithm will find ‘short’ paths if and only if $\alpha = d$

– ‘short’ means that the lengths of the paths found are in $O(\log n)$

– Simulation results for the greedy routing algorithm on a 2-dimensional toroidal grid with 20,000 × 20,000 nodes (averages over 1000 runs)


• Idea behind the proof is that for any 𝛼 < 𝑑 there are too few random edges to form shortcuts

• For 𝛼 > 𝑑 there are too many random edges, and hence too many choices to which the message could be passed on

– The routing will degenerate into a random walk

• Kleinberg small-worlds thus provide a way of building a peer-to-peer overlay network allowing for a simple, greedy, and distributed routing protocol

– But: How are nodes mapped to 𝑑-dimensional space such that the distance measurement is meaningful?


• Small-World and random graphs show homogeneous node degree distributions

– For small-world, distribution looks similar to a normal distribution with $\mu = 2k$ for non-extreme $p$

• The actual model is more complicated

• 𝑘 is the number of neighbors of the initial ring

– Node degrees of random graphs are Poisson distributed

• For larger 𝑚, will also approximate a normal distribution

• But many (especially artificial) real-life networks show extreme node degree distributions

– e.g. strong hub-topologies

7.4 Scale-Free Networks


• In 1999, Albert-László Barabási (Univ. of Notre Dame) crawled parts of the WWW to investigate its actual structure

– The node degree is power-law distributed

• i.e., the probability that a node in the network is connected to $k$ other nodes is $P(k) \sim k^{-\gamma}$

– (usually with $2 < \gamma \le 3$)

– Most nodes have a small degree of around 1 to 2

– Few nodes have an extremely high node degree

– High-degree vertices are called ‘hubs’

• Albert-László Barabási. “Linked: How Everything Is Connected to Everything Else and What It Means for Business, Science, and Everyday Life”. Plume. 2003. ISBN 978-0452284395


• Definition: Graphs with a power-law node degree distribution form ‘scale-free’ networks

– Also called power-law networks

• What kind of network model can generate this more realistic degree distribution?

– Barabási–Albert model builds a certain subset of scale-free networks

• Albert-László Barabási & Réka Albert. "Emergence of scaling in random networks". Science, 1999. doi:10.1126/science.286.5439.509.


• Barabási–Albert model: Basic Idea

– In its simplest form denoted as $g\_ba_{n,m}$

• $n$ is the number of nodes in the graph

• $m$ is the number of edges added per time step

– The total number of edges is thus about $n \cdot m$

– Start with any initial graph of size $n_0$

• $n_0 \ge 2$ and degree of any node $\deg(v) \ge 1$

• Often, just 𝑚 connected nodes are used as default initial network

– If initial network is not connected, the result network cannot be guaranteed to be connected

– Barabási–Albert graph is constructed iteratively by adding new nodes one by one until target size $n$ is reached

• Each added node represents one time step in a simulated network growth

– i.e. Discrete Time Modeling

• Each new node is connected to $m$ existing nodes

7.4 Barabási–Albert Graphs


– New edges are not added randomly, but favor higher-degree nodes

• “The rich get richer“

• Preferential attachment to higher-degree nodes

– The higher the degree of a possible target node, the higher the probability that the new node will attach to it

– Preferential attachment defines the probability $\Pi(v)$ for vertex $v$ to get an edge to a new node

• In general, it is proportional to the node degree, i.e. $\Pi(v) \sim \deg(v)$

• Most common definition is $\Pi(v) = \frac{\deg(v)}{\sum_{w \in V} \deg(w)}$

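Growth with preferential attachment can be sketched in Python as follows. Two choices here are assumptions for illustration only: the initial network is a simple path over the first $\max(m, 2)$ nodes, and the $m$ targets of a new node are drawn without repetition by re-weighting the remaining candidates. The function name `barabasi_albert` is likewise illustrative.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Sketch of g_ba(n, m): grow a graph by degree-proportional attachment."""
    rng = random.Random(seed)
    # Assumption: initial connected network is a path over the first max(m, 2) nodes.
    n0 = max(m, 2)
    edges = [(i, i + 1) for i in range(n0 - 1)]
    for new in range(n0, n):  # one new node per time step
        degree = {}
        for u, w in edges:
            degree[u] = degree.get(u, 0) + 1
            degree[w] = degree.get(w, 0) + 1
        targets = set()
        while len(targets) < m:
            nodes = [v for v in degree if v not in targets]
            weights = [degree[v] for v in nodes]  # Pi(v) proportional to deg(v)
            targets.add(rng.choices(nodes, weights=weights, k=1)[0])
        edges.extend((new, t) for t in sorted(targets))
    return edges


small = barabasi_albert(5, 1, seed=0)     # corresponds to an instance of g_ba(5, 1)
larger = barabasi_albert(50, 2, seed=1)
print(small)
print(len(larger), "edges")
```

Recomputing all degrees in every step keeps the sketch short; an implementation for large graphs would maintain the degrees incrementally.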

• Example: $g\_ba_{5,1}$

– $t = 0$: initial graph with the two connected nodes $v_1$ and $v_2$

– $t = 1 - \varepsilon$: add new node $v_3$

• Probability for connecting any old node $v$ to $v_3$ is given by $\Pi(v) = \frac{\deg(v)}{\sum_{w \in V} \deg(w)}$

• Here: $\Pi(v_1) = \frac{1}{2}$, $\Pi(v_2) = \frac{1}{2}$

• Random decision steered by preferential attachment

• e.g., connect to $v_1$

– $t = 1$: $v_3$ is connected to $v_1$

• Example: $g\_ba_{5,1}$ (continued)

– $t = 2 - \varepsilon$: add new node $v_4$

• Evaluate preferential attachment: $\Pi(v_1) = \frac{1}{2}$, $\Pi(v_2) = \frac{1}{4}$, $\Pi(v_3) = \frac{1}{4}$

• e.g. connect to $v_1$

– $t = 3 - \varepsilon$: add new node $v_5$

• Evaluate preferential attachment: $\Pi(v_1) = \frac{1}{2}$, $\Pi(v_2) = \frac{1}{6}$, $\Pi(v_3) = \frac{1}{6}$, $\Pi(v_4) = \frac{1}{6}$

• e.g. connect to $v_1$

• Comparing Barabási–Albert Graphs

– 𝑛 = 50, ~50 edges

– coloring by node degree

– $n = 100$, ~100 edges

– $n = 100$, ~150 edges

[Figure: Erdős-Rényi graph vs. Barabási–Albert graphs]

• Histogram of node degrees

– Single sample

– 100 nodes

– 300 edges

• Random

– Generally lower degree

• Small World

– Homogeneous degree

• Scale-Free

– Power-law

– Hubs visible

[Figure: histogram of node degrees; x-axis: node degree (3 to 27); y-axis: number of nodes (0 to 70); series: Barabási (pa=0.5), Watts-Strogatz (p=0.05), Random; pa is a dampening factor for decreasing the strength of preferential attachment]

• Node degree for larger Barabási–Albert graphs

– 200k nodes

– 400k edges

– Logarithmic scale

[Figure: degree distribution; x-axis: degree; y-axis: relative frequency]

• Histogram of cluster coefficients (𝐶 )

– Same sample

• Random

– Low 𝐶

• Small World

– Homogeneous high 𝐶

• Scale-Free

– Also power-law

– Lower than SW

[Figure: histogram of cluster coefficients; x-axis: cluster coefficient (0.05 to 0.95); y-axis: number of nodes (0 to 70); series: Barabási (pa=0.5), Watts-Strogatz (p=0.05), Random]

• Important property of scale-free networks is robustness against random failures

– Removing a random vertex 𝑣 will likely hit a low-degree node

• Expected damage to network is small

– A failing high-degree node can severely damage a network

• Better fail-safety necessary for high-degree nodes to ensure overall robustness

• Thus, scale-free networks are very sensitive to attacks

– If a malevolent attacker explicitly targets the highest-degree nodes, the network can easily decompose

• Note: random graphs are not resilient against random failures, but also not particularly prone to attacks

– Most vertices more or less have the same degree


• Random Graph: 50 nodes, 50 edges

– Color by degree

7.5 Comparing Graphs

Property Value

Connected No

Diameter (conn.) 9

Avg. Path Length 4.39

#Clusters 6

Largest Cluster 39

k-connectedness 0

Avg. Cluster Coeff. 0.033

Avg. Degree 2


• Watts-Strogatz Graph: 50 nodes, 50 edges


Connected No

Diameter (conn.) 35

#Clusters 2

Largest Cluster 38

k-connectedness 0

Avg. Cluster Coeff. 0

Avg. Degree 2

𝑝 = 0.05


• Barabási-Albert Graph: 50 nodes, 49 edges


Connected Yes

Diameter 12

k-connectedness 1

Avg. Cluster Coeff. 0

Avg. Degree 1.96

𝑝𝑎 = 0.8