Distributed Data Management

Academic year: 2021

Christoph Lofi

Institut für Informationssysteme

Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

Distributed Data Management

(2)

7.0 Introduction

7.1 Graph Model Basics

7.2 Random Graph Models

7.3 Small-World Graph Models

7.4 Scale-Free Graph Models

7.5 Network Examples

7.6 Network Models in P2P

Distributed Data Management – Christoph Lofi – IfIS – TU Braunschweig 2

7.0 Network Models

(3)

• Basic motivation for this lecture

– Can we show that a given P2P network really has some desired properties?

– How can a P2P network be designed so that it will, with high probability, show those desired properties?

• Large P2P networks are hard to evaluate

– In the production phase, usually no global view of the network is available

– In the design phase, a large number of peers is not yet available

7.0 Network Models

(4)

• Desirable System properties for P2P

Decentralized and a self-organized network

• No single point of failure or central bottleneck

• Maintaining the network (joining, leaving, publishing new content) should be performed without any central authority or global view

Scalability

• The network should scale to any (possibly large) number of nodes

– The structure of the network supports searching and retrieving information efficiently

• Obvious demand in information exchange systems

Reliability despite dynamic changes

• The network should be robust with respect to network and node failures

Book: P2P Systems and applications, pp 57-77

7.0 Network Models

(5)

• To examine the properties of a P2P network, good models are needed

• In this lecture, we focus on graph models for unstructured P2P networks

– Allows easy statistical analysis of network properties

– Peers are represented by vertices in a graph

– Entries in routing tables are represented by edges of the graph

– Peers are ego-centered and do not have global knowledge about all other peers and the data stored at those peers

• More complex P2P network protocols require dynamic simulation of networks to evaluate properties

7.0 Network Models

(6)

Outline for this lecture:

Network graph basics

– How can P2P networks be represented as graphs?

– Which properties can network graphs have?

– What are desirable properties for a P2P graph?

Network models

– Many different network models have been studied during the last 60 years

• Some of them are useful to evaluate or design P2P networks

7.0 Network Models

(7)

Random Networks

• Simple network model to represent pure P2P networks like Gnutella

Small-World networks

• Naturally occurring networks showing very desirable properties which can be exploited by P2P systems

Scale-Free networks

• “Naturally” occurring networks in large infrastructures, e.g. the Internet or power grids

7.0 Network Models

(8)

• A directed graph 𝑮 is defined as a pair 𝐺 = (𝑽, 𝑬)

– 𝑽: a set of nodes (vertices)

– 𝑬: a set of directed edges between elements of 𝑽

• 𝑬 ⊆ 𝑽 × 𝑽

• For P2P networks, 𝑽 represents the set of peers

– |𝑉| = 𝑛

• 𝑬 represents all directed links in the P2P overlay network

– i.e. the union of the entries in the routing tables of all peers

– If later examples use undirected links, it is assumed that directed links in both directions exist

– |𝐸| = 𝑚

7.1 Graph Theory

(9)

Node outdegree of a node $v$ is denoted $\deg^+(v)$

– i.e., the number of vertices $w$ it is connected to by an edge $(v, w)$

• $\deg^+(v) = |N(v)| = |\{w \in V \mid (v, w) \in E\}|$

Node indegree of a node $v$ is denoted $\deg^-(v)$

– i.e., the number of vertices $w$ that are connected to $v$ by an edge $(w, v)$

• $\deg^-(v) = |\{w \in V \mid (w, v) \in E\}|$

Node degree of a node $v$ is denoted $\deg(v)$

– $\deg(v) = \deg^+(v) + \deg^-(v)$

– For undirected graphs, only the node degree is defined

• no in- or outdegree

Neighbor set of a node $v$ is denoted $N(v)$

– $N(v) = \{w \in V \mid (v, w) \in E\}$

– For every neighbor $w \in N(v)$ there exists an edge $(v, w) \in E$

7.1 Graph Theory

(10)

Example:

– $V = \{1, 2, 3, 4, 5\}$

– $E = \{(1,5), (5,4), (4,5), (2,4), (2,1), (2,3)\}$

– $N(2) = \{1, 3, 4\}$

– $N(4) = \{5\}$

– $\deg^+(2) = 3$, $\deg^-(2) = 0$

– $\deg^+(4) = 1$, $\deg^-(4) = 2$
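The definitions above can be checked mechanically on the example graph. The following is a minimal Python sketch; the function names (`neighbors`, `out_degree`, `in_degree`, `degree`) are chosen here for illustration and do not come from the lecture.

```python
# Example graph from the slide: directed edges as (source, target) pairs.
V = {1, 2, 3, 4, 5}
E = {(1, 5), (5, 4), (4, 5), (2, 4), (2, 1), (2, 3)}


def neighbors(v, edges):
    """N(v): all vertices w for which an edge (v, w) exists."""
    return {w for (u, w) in edges if u == v}


def out_degree(v, edges):
    """deg+(v): number of outgoing edges, i.e. |N(v)|."""
    return len(neighbors(v, edges))


def in_degree(v, edges):
    """deg-(v): number of edges (w, v) pointing at v."""
    return sum(1 for (_, w) in edges if w == v)


def degree(v, edges):
    """deg(v) = deg+(v) + deg-(v)."""
    return out_degree(v, edges) + in_degree(v, edges)


for v in sorted(V):
    print(v, sorted(neighbors(v, E)), out_degree(v, E), in_degree(v, E))
```

Storing the edges as a set of pairs mirrors the definition $E \subseteq V \times V$ directly; for large graphs an adjacency list would be the more practical choice.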

7.1 Graph Theory

[Figure: the example graph with nodes 1 to 5]

(11)

Path 𝑷(𝒗, 𝒘)

– A path $P(v, w)$ is a sequence of vertices $(v_0, v_1, \dots, v_k)$ with $v_0 = v$, $v_k = w$, and $(v_i, v_{i+1}) \in E$ for all $0 \le i \le k - 1$

– The path length $|P(v, w)|$ is defined as the number of edges in path $P$

– The distance $d(v, w)$ is defined as the length of a shortest path between $v$ and $w$

7.1 Graph Theory

[Figure: a path between $v$ and $w$ with length 6, and a shortest path between $v$ and $w$ with length 4; thus the distance is $d(v, w) = 4$]

(12)

• Metrics describing whole graphs:

Connectedness

– A graph is connected if there is a path from any node to any other node

k-Connectedness

– A graph is k-connected if the removal of any $k - 1$ nodes still leaves the graph connected

Bisection width $bsw(G)$

– The bisection width of a graph $G$ is the minimal number of edges which must be removed to split the graph into two equally-sized unconnected subgraphs

• Represents the minimal cohesion of the graph

7.1 Graph Theory

(13)

Graph diameter 𝒅(𝑮)

– Represents the maximum extent (path length) of a graph

– The diameter of a graph is the maximal distance of any pair of vertices

• $d(G) = \max\{\, d(v, w) : v, w \in V \,\}$

Average path length $d_{avg}(G)$

– The sum of all distances between each pair of nodes divided by the number of all pairs of nodes in a connected graph

• $d_{avg}(G) = \frac{1}{n (n - 1)} \sum_{(i, j) \in V \times V} d(i, j)$
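Both metrics follow from the pairwise distances, which a breadth-first search yields for unweighted graphs. The sketch below is one possible way to compute them; the adjacency-set representation and the 6-node ring are assumptions made for this illustration.

```python
from collections import deque


def distances_from(source, adj):
    """Breadth-first search: d(source, w) for every reachable vertex w."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter_and_average_path_length(adj):
    """Return (d(G), d_avg(G)) for a connected graph given as adjacency sets."""
    n = len(adj)
    longest, total = 0, 0
    for v in adj:
        dist = distances_from(v, adj)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        longest = max(longest, max(dist.values()))
        total += sum(dist.values())  # d(v, v) = 0 contributes nothing
    return longest, total / (n * (n - 1))


# Hypothetical example: an undirected ring of 6 nodes.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(diameter_and_average_path_length(ring))
```

Running one BFS per vertex costs $O(n \cdot (n + m))$ overall, which is fine for the graph sizes used in the examples of this lecture.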

7.1 Graph Theory

(14)

Graph outdegree $\deg^+(G)$

– The average outdegree of all nodes of 𝑮

Graph indegree $\deg^-(G)$

– The average indegree of all nodes of 𝑮

• For undirected graphs, there is just the degree $\deg(G)$

– The average degree of all nodes

7.1 Graph Theory

(15)

• The clustering coefficient 𝐂(𝐯) of vertex 𝑣 in a directed graph is given by

– The number of links between the vertices within its neighborhood divided by the number of links that could possibly exist between them

• The number of neighbors of $v$ is $\deg^+(v)$

• The maximum number of connections between all neighboring nodes is $\deg^+(v) \, (\deg^+(v) - 1)$

– i.e. each neighbor connected with each other neighbor

– Describes how densely the neighbors of a vertex are interconnected

7.1 Graph Theory

(16)

– If $e(N(v))$ denotes the actual number of connections that neighbors of $v$ have with each other, the clustering coefficient is $C(v) = \frac{e(N(v))}{\deg^+(v)\,(\deg^+(v) - 1)}$

• Wasserman, S., and Faust, K. (1994). Social Network Analysis: Methods and Applications. Cambridge: Cambridge University Press.

7.1 Graph Theory

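[Figure: three example neighborhoods of a vertex $v$, each annotated with the number of links between the neighbors of $v$ (numerator) and the maximum number of neighbor links (denominator); e.g. 4 neighbors having at most 3 links each give $4 \cdot 3$]

– $C(v) = \frac{4 \cdot 2}{4 \cdot 3} \approx 0.67$

– $C(v) = \frac{3 \cdot 2}{3 \cdot 2} = 1$

– $C(v) = \frac{0}{3 \cdot 2} = 0$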
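The clustering coefficient can be computed directly from the formula $C(v) = e(N(v)) / (\deg^+(v)(\deg^+(v) - 1))$. The following Python sketch assumes a directed graph stored as adjacency sets, with every undirected link stored in both directions; the small example graph is hypothetical and only chosen so that it reproduces the value $\frac{4 \cdot 2}{4 \cdot 3}$. Returning 0 for vertices with fewer than two neighbors is a convention picked here, since the formula is undefined in that case.

```python
def clustering_coefficient(v, adj):
    """C(v): directed links among the neighbors of v, divided by the maximum possible."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention chosen for this sketch
    # Count every directed edge (a, b) where both endpoints are neighbors of v.
    links = sum(1 for a in nbrs for b in adj[a] if b in nbrs and b != a)
    return links / (k * (k - 1))


# Hypothetical example: v has four neighbors a, b, c, d that form a cycle
# a-b-c-d-a, i.e. 4 undirected links = 4 * 2 directed links among neighbors.
adj = {
    "v": {"a", "b", "c", "d"},
    "a": {"v", "b", "d"},
    "b": {"v", "a", "c"},
    "c": {"v", "b", "d"},
    "d": {"v", "c", "a"},
}
print(clustering_coefficient("v", adj))
```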

(17)

• Which properties should a good P2P graph have?

Connectedness

• Each node should be reachable

• If not, some information is not accessible to all peers

k-Connectedness with large k

• Removing nodes should not immediately disconnect a graph

Low diameter 𝒅(𝑮)

• Low diameters are necessary to ensure reachability and reduce message load

– Low diameter → a smaller TTL is sufficient when flooding

7.1 Graph Theory

(18)

Low average path length 𝒅 𝒂𝒗𝒈 (𝑮)

• Most messages should quickly reach their target

Low average node degrees 𝒅𝒆𝒈(𝑮)

• The higher the node degree is, the more node states must be stored at nodes

• Increases size of routing tables

High average cluster coefficient

• Densely connected neighborhoods increase the failure-resilience of networks

• Distributed routing possible

– See later: Kleinberg Model

7.1 Graph Theory

(19)

Random graphs provide the easiest model for any network

Simple underlying assumptions

– Analyzable with statistical methods

• First family of network models studied (1950s)

– Multiple models for generating a random graph have been developed

– Most prominent generation models are

• the Erdős-Rényi random graph

• the Gilbert random graph

7.2 Random Graphs

(20)

• A random graph is usually denoted as $g_{n,m}$

– Random graph with 𝑛 nodes and 𝑚 edges

– For simplicity, we just consider undirected graphs

Basic idea for constructing a random graph

– Graph construction starts with 𝑛 vertices without any connections

– 𝑚 edges are added one by one between the vertices using some random system

7.2 Random Graphs

(21)

Pure peer-to-peer networks like Gnutella 0.4 can be modeled by a random graph

– Peers choose their neighbors more or less randomly

• Random bootstrapping, random Ping-Pong

• Unfortunately, “real” Gnutella 0.4 networks are usually not really random

– Bootstrapping is not random

• Use of special bootstrap nodes or bootstrap caches

– Ping-Pong strengthens connectedness of neighborhood and favors “strong” nodes

– Nodes prefer more popular and stronger nodes

• See later: scale-free networks

7.2 Random Graphs

(22)

• The behavior of random graphs is often studied for cases where the number of vertices diverges to infinity, i.e. 𝒏 → ∞

– In context of P2P, think of scalability!

– While the number 𝑚 of edges could be fixed, it is usually assumed that 𝑚 grows with 𝑛

• e.g. new nodes in a P2P network will also lead to new connections

• Fixed 𝑚 would quickly lead to mostly unconnected graphs

• Thus, usually 𝒎 is a function of 𝒏

7.2 Random Graphs

(23)

Erdős-Rényi graphs are the most popular family of random graphs (1959)

– There are two predominant models which are equivalent for large graphs

• 𝒈 𝒏,𝒎 models

– Based on randomly selecting an instance of all graphs with 𝑛 nodes and 𝑚 edges

• 𝒈 𝒏,𝒑 models

– Each possible edge has a certain probability 𝑝 to be added to a graph or not

– Also known as Gilbert graphs (1959)

7.2 Erdős-Rényi Graphs

(24)

Constructing 𝒈 𝒏,𝒎 graphs

– Let $G_{n,m}$ be the set of all labeled graphs with $n$ nodes and $m$ edges

• Labeled graphs: nodes are identifiable

– Unlabeled random graphs only consider the “shape” of graphs

• The number of all such graphs is given by the binomial coefficient $|G_{n,m}| = \binom{N}{m} = \binom{\binom{n}{2}}{m}$

– The number of possible edges between $n$ nodes is $N = \binom{n}{2}$

– For generating an instance $g_{n,m}$, any instance of $G_{n,m}$ is selected with equal probability

• Erdős, P.; Rényi, A. (1959). "On Random Graphs. I.". Publicationes Mathematicae 6: 290-297

7.2 Erdős-Rényi Graphs

(25)

Example: Constructing $g_{3,2}$ graphs

– There are 3 possible $g_{3,2}$ in $G_{3,2}$

• Each graph is selected with probability $\frac{1}{3}$
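The counting formula and the uniform selection can be illustrated in a few lines of Python. This is only a sketch: the function names are made up here, and drawing $m$ distinct edges uniformly from all $\binom{n}{2}$ possible edges is used as an equivalent way of picking one element of $G_{n,m}$ with equal probability.

```python
import math
import random
from itertools import combinations


def count_labeled_graphs(n, m):
    """|G_{n,m}| = C(C(n, 2), m): choose m of the C(n, 2) possible edges."""
    return math.comb(math.comb(n, 2), m)


def sample_gnm(n, m, rng):
    """Pick one labeled graph with n nodes and m edges uniformly at random."""
    possible_edges = list(combinations(range(1, n + 1), 2))
    return set(rng.sample(possible_edges, m))


print(count_labeled_graphs(3, 2))
print(sample_gnm(3, 2, random.Random(0)))
```

Sampling the edge set directly avoids enumerating all graphs in $G_{n,m}$, whose number grows extremely fast with $n$ and $m$.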

7.2 Erdős-Rényi Graphs

[Figure: the three possible graphs with nodes 1, 2, 3 and two edges]

(26)

• The 𝒈 𝒏,𝒎 model of random graphs is not suitable for actually generating large random graphs

– Extremely high number of possible graphs for given 𝑛 and 𝑚

7.2 Erdős-Rényi Graphs

Number of possible graphs $|G_{n,m}|$ (rows: #Edges $m$, columns: #Nodes $n$)

m   1  2  3   4    5     6       7         8           9            10
1   0  1  3   6   10    15      21        28          36            45
2   0  0  3  15   45   105     210       378         630           990
3   0  0  1  20  120   455    1330      3276        7140         14190
4   0  0  0  15  210  1365    5985     20475       58905        148995
5   0  0  0   6  252  3003   20349     98280      376992       1221759
6   0  0  0   1  210  5005   54264    376740     1947792       8145060
7   0  0  0   0  120  6435  116280   1184040     8347680      45379620
8   0  0  0   0   45  6435  203490   3108105    30260340     215553195
9   0  0  0   0   10  5005  293930   6906900    94143280     886163135
10  0  0  0   0    1  3003  352716  13123110   254186856    3190187286
11  0  0  0   0    0  1365  352716  21474180   600805296   10150595910
12  0  0  0   0    0   455  293930  30421755  1251677700   28760021745
13  0  0  0   0    0   105  203490  37442160  2310789600   73006209045
14  0  0  0   0    0    15  116280  40116600  3796297200  166871334960
15  0  0  0   0    0     1   54264  37442160  5567902560  344867425584
16  0  0  0   0    0     0   20349  30421755  7307872110  646626422970

(27)

For generative models: use the probabilistic $g_{n,p}$ model of Erdős-Rényi random graphs

– So-called Gilbert graphs

• Gilbert, E.N. (1959). "Random Graphs". Annals of Mathematical Statistics 30: 1141-1144.

– Number of nodes $n$ is fixed

– Each possible edge in 𝑉 × 𝑉 has the fixed probability $p$ to be added to the graph

• i.e. the underlying assumption is that adding an edge is fully independent of all existing edges

• Larger $p$ will generate graphs with more edges, smaller $p$ will generate graphs with fewer edges
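The Gilbert construction translates almost literally into code: every possible edge is included independently with probability $p$. The sketch below assumes an undirected graph (each unordered pair is considered once, as in the rest of this section); the function name `sample_gnp` is chosen for illustration.

```python
import random
from itertools import combinations


def sample_gnp(n, p, rng):
    """Gilbert graph: include each of the C(n, 2) possible edges with probability p."""
    return {edge for edge in combinations(range(n), 2) if rng.random() < p}


rng = random.Random(42)
sparse = sample_gnp(150, 1 / 150, rng)
dense = sample_gnp(150, 0.1, rng)
print(len(sparse), len(dense))
```

Unlike in the $g_{n,m}$ model, the number of edges is itself random here; only its expectation $\binom{n}{2} p$ is fixed.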

7.2 Gilbert Model

(28)

• Both models $g_{n,m}$ and $g_{n,p}$ are asymptotically equivalent for large $n$

– The expected number of edges is $m = \binom{n}{2} p$

• The law of large numbers guarantees equivalence for $p n^2 \to \infty$

– Thus, for large $p n^2$, statements about properties can be made like

• “Property P holds for most graphs in $g_{n,p}$” ⇔ “Property P holds for most graphs in $g_{n,m}$ with $m = \binom{n}{2} p$”

7.2 Gilbert Model

(29)

• Randomly generated graphs can be used to approximate properties of large random P2P networks

– Many basic properties of random graphs were established by Erdős & Rényi in 1960 using large $g_{n,p}$, $n \to \infty$

Asymptotic observations

– Many properties depend directly on the probability $p$ (or the number of edges $m$)

• Graphs show several phase transitions depending on the node/edge ratio

• Each phase transition has a threshold at which certain properties suddenly become extremely probable

• Before or after the threshold, the probability of a property $P$ is either $\mathbb{P}(P) \to 0$ or $\mathbb{P}(P) \to 1$ for $n \to \infty$

7.2 Random Graph Properties

(30)

• Predicting connected components

– For $n \cdot p < 1$, a $g_{n,p}$ graph will rarely have any connected components larger than $O(\log n)$

• The graph is mainly unconnected, each of its components is very small

• e.g. for a graph $g_{n,m}$ with 150 nodes, this threshold is at roughly 74 edges

7.2 Random Graph Properties

(31)

7.2 Random Graph Properties

Example Graphs

– Statistical prediction: most components will be of logarithmic size with respect to the number of nodes (i.e. will be small)

(32)

Giant Connected Component

– For $n \cdot p = 1$, a graph $g_{n,p}$ will very probably have a giant connected component of size in $O(n^{2/3})$

• e.g. for a graph $g_{n,m}$ with 150 nodes, giant components should be observable for 75 edges and more

– Surprisingly, the giant component will appear when the average node degree is 1!

– For $n \cdot p > 1$, all other components will be of size $O(\log n)$

7.2 Random Graph Properties

(33)

• Example Graphs (giant component appears)

– Statistical prediction: for 𝑚 = 75 (𝑛 ∗ 𝑝 = 1), there is a largest component of size ≈ 28

7.2 Random Graph Properties

(34)

7.2 Random Graph Properties

• Example Graphs (other components diminish)

– Statistical prediction: no other component will be large

[Figure: example graphs $g_{150,\,m=150} \equiv g_{150,\,p=0.0134}$ and $g_{150,\,m=300} \equiv g_{150,\,p=0.0268}$]

(35)

Connectedness

– For $p < \frac{\ln n}{n}$, the graph will almost surely contain isolated vertices and will thus be disconnected

– For $p > \frac{\ln n}{n}$, the graph will almost surely be connected

• e.g. for a graph $g_{n,m}$ with 150 nodes, this threshold is at around 374 edges
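The edge counts quoted for the 150-node examples follow from converting each threshold probability into an expected number of edges via $m = \binom{n}{2} p$. A minimal Python sketch of that arithmetic (the helper name `expected_edges` is an assumption of this example):

```python
import math


def expected_edges(n, p):
    """Expected number of edges of g_{n,p}: C(n, 2) * p."""
    return math.comb(n, 2) * p


n = 150
giant_component_threshold = expected_edges(n, 1 / n)          # n * p = 1
connectivity_threshold = expected_edges(n, math.log(n) / n)   # p = ln(n) / n

print(giant_component_threshold)
print(connectivity_threshold)
```

For $n = 150$ the first value is 74.5, matching the "roughly 74" and "75 edges and more" statements, and the second lies between 373 and 374, matching the connectivity threshold of around 374 edges.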

7.2 Random Graph Properties

(36)

7.2 Random Graph Properties

• Example Graphs (connectedness)

– Statistical prediction: for $p = 0.033 \approx \frac{\ln n}{n}$, the graph is almost surely connected

[Figure: example graph $g_{150,\,374} \equiv g_{150,\,p=0.033}$]

(37)

Degree Distribution

– The node degree of large random graphs can be modeled with a Poisson distribution

• Let $\lambda$ be a constant $\lambda = (n - 1) \cdot p$.

• Then the probability distribution of the node degrees $k = 0, 1, 2, 3, 4, \dots$ can be approximated for $n \to \infty$ by the Poisson density $\mathbb{P}(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$

7.2 Random Graph Properties

(38)

• This degree distribution falls faster than an exponential distribution in the degree, hence it is not a power-law distribution

• For larger $\lambda$, it behaves approximately like a normal distribution
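To get a feeling for the Poisson approximation, the estimated number of nodes per degree can be tabulated for a concrete graph size. The sketch below evaluates $n \cdot \mathbb{P}(X = k)$ with $\lambda = (n - 1) p$; the parameter choice $n = 150$, $p = 1/150$ follows the example graphs of this section, while the function name is made up here.

```python
import math


def poisson_pmf(k, lam):
    """Poisson density P(X = k) = lam^k * e^(-lam) / k!."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


n, p = 150, 1 / 150
lam = (n - 1) * p

# Estimated number of nodes having degree k in g_{n,p}.
for k in range(6):
    print(k, round(n * poisson_pmf(k, lam), 1))
```

The rapid decay of the printed values for growing $k$ illustrates why hubs with very high degree are practically absent in random graphs.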

7.2 Random Graph Properties

(39)

• Degree distribution for $g_{150,\,p = 1/150}$ and $\lambda = 1$

– 69 edges

7.2 Random Graph Properties

[Figure: estimated vs. measured degree distribution]

(40)

• Degree distribution for $g_{150,\,p = 2/150}$ and $\lambda = 2$

– 142 edges

7.2 Random Graph Properties

[Figure: estimated vs. measured degree distribution]

(41)

Diameter

– If $g$ is connected, the expected diameter of $g_{n,m}$ is in $O(\log n)$ with high probability

• i.e. the diameter of a connected random graph grows only logarithmically

• $g_{n,p}$ is almost surely connected for $p \ge \frac{\ln n}{n}$ for $n \to \infty$

– or: $g_{n,m}$ is almost surely connected for $m \ge \binom{n}{2} \frac{\ln n}{n}$

7.2 Random Graph Properties

[Figure: example graph $g_{75,\,p = \ln(76)/75}$ with diameter $d(G) = 7$]

(42)

Clustering Coefficient

– The clustering coefficient of a random graph 𝑔 𝑛,𝑝 is with high probability asymptotically equal to 𝒑 for 𝑛 → ∞

• This is a rather low clustering coefficient

7.2 Random Graph Properties

[Figure: two example graphs with nodes colored by $C$: $g_{100,\,p=0.03}$ with $C_{avg} \approx 0.0273$, and $g_{100,\,p=0.06}$ with $C_{avg} \approx 0.0563$]

(43)

• Observation: “Real and natural” networks are not random, but have some inherent structure

– Many “naturally” occurring networks are very robust and efficient

• Social network among people

• Neural networks

• Power lines, the Internet, streets, etc.

• …

– What properties do real-life networks have?

• Why are they stable and efficient?

7.3 Small-World Graphs

(44)

• First real networks to be studied: Social Networks among people

“Six degrees of Separation”

• First mentioned 1929 by the Hungarian star author Karinthy Frigyes in his short story “Chains”

– Claim: all ½ billion people in the world (sic) know Frigyes via at most five acquaintances

» Friend-of-a-friend connections

– Motivated by two examples

7.3 Small-World Graphs

(45)

– Example 1: some 1929 Nobel Prize laureate…

• …knows King Gustav of Sweden…

• … who passionately plays tennis and knows a famous tennis champion…

• … who is a friend of Frigyes

– Example 2: an unknown factory worker at a Ford plant

• Knows his boss, who knows Ford personally, who knows the director of the media house Hearst Publications, who knows the writer Árpád Pásztor, who is a friend of Frigyes

7.3 Small-World Graphs

(46)

• This idea was scientifically examined in 1967:

Sociologist Stanley Milgram, Yale University

– Persons chosen at random in Kansas and Nebraska were asked to deliver a letter to a certain stock broker in Cambridge, MA

– This was the only information about the target person

Constraint: The letter can only be given to persons one knows on a first name basis (acquaintances)

• 1967: No internet, transportation really expensive and cumbersome, close local communities

7.3 Small-World Graphs

S. Milgram (1933 - 1984)

(47)

• Letters used in the Milgram experiment

7.3 Small-World Graphs

(48)

• Those letters that reached the target person were only passed on over 6 mediators on average

– “6 degrees of separation“

– This was far less than originally assumed!

– Thus, social graphs were coined “Small-World Graphs”

• The original experiment was later criticized

– Only 50 persons took part in the original experiment

– Only 5% of letters were actually received by the target person

But,…

– One letter was received within only 4 days

– The small-world effect was experimentally observed in a vast variety of other sciences

7.3 Small-World Graphs

(49)

• Interesting trivia “Six Degrees of Kevin Bacon“

– Kevin Bacon once claimed that he's worked with everybody in Hollywood or someone who's worked with them

– College students built a party game out of that statement based on Milgram’s ideas

• Basic idea:

– Link actors via a minimum number of movies to actor Kevin Bacon

– e.g., Val Kilmer was in “Top Gun” with Tom Cruise, and Tom Cruise was in “A Few Good Men” with Kevin Bacon

– Only approximately 12% of all actors cannot be linked to Bacon

> try: http://oracleofbacon.org/

7.3 Small-World Graphs

(50)

• However, it took a while until such naturally occurring networks were formally understood

– Erdős–Rényi random graphs are bad models for natural networks

• Natural networks often show “hubs”

– There are some nodes with a very high node degree

– Node degree better described by a power-law distribution than a Poisson distribution

• Natural networks often show a very high degree of local clustering

– High average cluster coefficients

– e.g. by local communities, friend cliques, co-worker networks, local transportation networks, etc…

• Natural networks often have a low average path length

7.3 Small-World Graphs

(51)

• First models for natural graphs were proposed by Duncan Watts and Steven Strogatz in 1998

– Watts, D.J.; Strogatz, S.H. (1998). "Collective dynamics of 'small-world' networks". Nature 393 (6684): 440–442. doi:10.1038/30918

– They examined three real-world networks

• The simple neural “brain” network of the roundworm (nematode) Caenorhabditis Elegans

– A natural network

• Power grid networks

– A man-made network

• Collaboration networks between movie actors

– Semi-natural network

7.3 Small-World Graphs

(52)

• Watts and Strogatz mainly examined the average path length and the cluster coefficient

– Comparison with equally sized random graphs

• Similar node and edge number

– Result:

• Real networks have a much higher degree of local clustering (10x to 1000x higher) than random graphs

• Average path length is more or less similar

7.3 Small-World Graphs

(53)

• “Definition” of small-world graphs

– “A small world network is a network with a dense local structure and a diameter comparable to a random graph with same numbers of nodes and edges.”

• Additionally: “The node degree is homogenous”

• Watts and Strogatz also proposed the first generative model for a certain class of small-world graphs

– So called Watts-Strogatz graphs

• There are other small-world classes

7.3 Watts-Strogatz Graphs

(54)

• Properties of Watts-Strogatz graphs

Low average path length

High average clustering coefficients

Homogenously distributed node degrees

• Good model for e.g. social or neural networks and most other “natural” networks

• Not a good model for most man-made grid-like networks

– Those show power-law distributed node degrees

– By definition, these are not small-world graphs

– e.g. the Internet, airline routes, train lines, etc.

– Watts-Strogatz graphs are between random and scale-free networks

7.3 Watts-Strogatz Graphs

(55)

• The generative model (Watts-Strogatz model)

– Graph is denoted as $g^{ws}_{n,k,p}$

• $n$ is the number of nodes (integer)

• $k$ is the neighborhood degree (integer)

• $p$ is the rewire probability (float in $[0, 1]$)

– Build a ring of $n$ vertices and connect each vertex with its $k$ clockwise neighbors on the ring

– Draw a random number between 0 and 1 for each edge

• Rewire each edge with probability $p$: if the random number is larger than $p$, do nothing; otherwise rewire

• Rewiring: keep the source vertex of the edge fixed, and choose a new target vertex uniformly at random from all other vertices
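The construction steps above can be sketched in Python as follows. This is an illustrative implementation, not the one used for the lecture's figures: the function name is made up, the graph is stored as undirected adjacency sets, and, as an additional assumption not stated on the slide, a rewired edge never targets a vertex that is already adjacent to the source (this avoids duplicate edges and requires $n > 2k$).

```python
import random


def watts_strogatz(n, k, p, rng):
    """Ring of n vertices, each linked to its k clockwise neighbors, then rewired."""
    adj = {v: set() for v in range(n)}
    ring_edges = [(v, (v + step) % n) for v in range(n) for step in range(1, k + 1)]
    for v, w in ring_edges:
        adj[v].add(w)
        adj[w].add(v)

    for v, w in ring_edges:
        if rng.random() >= p:
            continue  # random number larger than p: keep the edge
        # Keep source v fixed; pick a new target among the vertices that are
        # neither v itself nor already adjacent to v (assumption of this sketch).
        candidates = [u for u in range(n) if u != v and u not in adj[v]]
        if not candidates:
            continue
        new_target = rng.choice(candidates)
        adj[v].discard(w)
        adj[w].discard(v)
        adj[v].add(new_target)
        adj[new_target].add(v)
    return adj


graph = watts_strogatz(50, 3, 0.05, random.Random(7))
print(sum(len(nbrs) for nbrs in graph.values()) // 2)  # number of edges
```

With $n = 50$ and $k = 3$ the graph always has $n \cdot k = 150$ edges, since rewiring moves edges but never adds or removes them.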

7.3 Watts-Strogatz Graphs

(56)

– For $p = 0$, the resulting network is totally regular, with a clustering coefficient approaching $\frac{3}{4}$ for large $k$; the diameter is in $O(n)$

– For $p = 1$, the resulting network is a kind of random graph (regular random graph) with a diameter in $O(\log n)$

7.3 Watts-Strogatz Graphs

[Figure: Watts-Strogatz graphs for $k = 2$ with increasing randomness, from $p = 0$ to $p = 1$]

(57)

• Comparing Watts-Strogatz Graphs

– 𝑛 = 50, 𝑚 = 150 ≡ 𝑘 = 3

– coloring by cluster coefficient

7.3 Watts-Strogatz Graphs

(58)

• Comparing Watts-Strogatz Graphs

– 𝑛 = 50, 𝑚 = 150 ≡ 𝑘 = 3

7.3 Watts-Strogatz Graphs

[Figure: Watts-Strogatz graphs with $p = 0.01$ and with $p = 0.03$]

(59)

• Comparing Watts-Strogatz Graphs

– 𝑛 = 50, 𝑚 = 150 ≡ 𝑘 = 3

7.3 Watts-Strogatz Graphs

(60)

• Histogram of cluster coefficients

– Single sample

Random

– Generally lower coefficient

Small World

– Homogeneous, higher coefficient

7.3 Watts-Strogatz Graphs

[Figure: histogram of cluster coefficients for $p = 0.00$, $p = 0.02$, $p = 0.05$, $p = 0.10$, and a random graph; x-axis: cluster coefficient (0.05 to 0.65), y-axis: number of nodes (0 to 50)]

(61)

• Histogram of node degrees

– Same sample

Random

– Homogeneous degree

– Higher variance

Small World

– Homogeneous degree

– Low variance

7.3 Watts-Strogatz Graphs

[Figure: histogram of node degrees for $p = 0.00$, $p = 0.02$, $p = 0.05$, $p = 0.10$, and a random graph; y-axis: number of nodes (0 to 50)]

(62)

• Investigating clustering coefficients and average path lengths in dependence of 𝒑

– For a graph with 5000 nodes

– Normalized by the clustering coefficient and the path length at 𝑝 = 0

• The clustering coefficient is still high for small $p$, but the average path length decreases extremely fast due to ‘shortcuts’

7.3 Watts-Strogatz Graphs

(63)

• The Watts-Strogatz model explains how a small- world graph can be constructed

– i.e. “How can locally densely connected graphs with shortcuts be constructed?”

• But: navigating a small-world can be very difficult!

– Assume “Six Degrees of Separation” was true: route a message to an arbitrary person

• All people on earth would be reachable via just six acquaintances

– But which ones?

• Random navigation or flooding won’t help

– Exponentially many possibilities!

• Solution: Use clues and heuristics to quickly route the message into the correct neighborhood!

7.3 Kleinberg Navigability Model

(64)

• Challenging question: how can we find short

paths in a distributed fashion in a small-world?

– “Why should arbitrary pairs of strangers be able to find short chains of acquaintances that link them together?”

• J.M. Kleinberg, “Navigation in a Small-World”, Nature, 2000

– Some routing information is necessary

• Enough but not too much information!

7.3 Kleinberg Navigability Model

(65)

– Nodes see local parts of the network (neighborhood)

• i.e., they route the letter in a decentralized fashion

• In social networks additional information (same profession, address, hobbies, etc.) is used to decide which neighbor is ‘closest‘ to the recipient

– Milgram showed that the first steps of the letter were the geographically largest, while later steps were closing in on the target area

7.3 Kleinberg Navigability Model

(66)

• A decentralized routing algorithm can be modeled as follows

– Let every node $v$ have a position $Pos(v)$ on a toroidal grid in a $d$-dimensional space

• $Pos(v) = (x_1, x_2, \dots, x_d)$ with all $x_i$ being integers

– $Pos(v)$ is a $d$-dimensional vector

– $x_i(v)$ is the position of $v$ in dimension $i$

– Every node knows some basic information about the underlying grid structure

• i.e. its own position in the grid, its neighbors, and the target node

– no global knowledge, only local information

7.3 Kleinberg Navigability Model

(67)

– Each node $v$ hands the message (i.e., letter) to the one of its neighbors that is closest to the target $t$

– The distance measure $d_M(v, w)$ is given by the Manhattan distance

• i.e. the sum over the absolute differences: $d_M(v, w) = \sum_{i=1}^{d} |x_i(v) - x_i(w)|$

• Let the routing algorithm take place on the following network model

– Start with a $d$-dimensional grid

– Add random edges between vertices $v$ and $w$ with a probability of $P(v, w) \sim d_M(v, w)^{-\alpha}$

• inverse $\alpha$-th power distribution

7.3 Kleinberg Navigability Models

(68)

• Node $u$ is connected to all its neighbors ($a$, $b$, $c$, and $d$) and has a long-range link to some randomly chosen node $v$ with a probability proportional to $dist(u, v)^{-\alpha}$

– The higher the distance, the lower the link probability
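The network model and the greedy routing rule can be sketched together in Python for a small 2-dimensional torus. Everything below is an illustration under assumptions: the function names are invented, each node gets exactly one long-range link, distances wrap around the torus, and the grid is tiny compared to the simulations referenced in this lecture.

```python
import random


def torus_distance(a, b, side):
    """Manhattan distance on a toroidal grid with `side` nodes per dimension."""
    return sum(min(abs(x - y), side - abs(x - y)) for x, y in zip(a, b))


def build_kleinberg_grid(side, alpha, rng):
    """2-d torus; every node gets its 4 grid neighbors plus one long-range link."""
    nodes = [(x, y) for x in range(side) for y in range(side)]
    contacts = {}
    for u in nodes:
        x, y = u
        local = {((x + 1) % side, y), ((x - 1) % side, y),
                 (x, (y + 1) % side), (x, (y - 1) % side)}
        others = [v for v in nodes if v != u]
        # Link probability proportional to dist(u, v)^(-alpha).
        weights = [torus_distance(u, v, side) ** -alpha for v in others]
        long_range = rng.choices(others, weights=weights, k=1)[0]
        contacts[u] = local | {long_range}
    return contacts


def greedy_route(source, target, contacts, side):
    """Always forward to the known contact that is closest to the target."""
    path = [source]
    current = source
    while current != target:
        current = min(contacts[current],
                      key=lambda v: torus_distance(v, target, side))
        path.append(current)
    return path


side = 10
contacts = build_kleinberg_grid(side, alpha=2.0, rng=random.Random(3))
route = greedy_route((0, 0), (5, 5), contacts, side)
print(len(route) - 1, "hops")
```

Greedy routing always terminates here because one of the four grid neighbors is strictly closer to the target in every step; the long-range links can only shorten the route.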

7.3 Kleinberg Navigability Model

(69)

Theorem: The routing algorithm will find ‘short‘ paths, if and only if $\alpha = d$

– ‘short‘ means that the expected path length between arbitrary nodes is polylogarithmic in $n$ (Kleinberg shows $O(\log^2 n)$)

– Simulation results for the greedy routing algorithm on a 2-dimensional toroidal grid with 20,000 × 20,000 nodes (averages over 1000 runs)

7.3 Kleinberg Navigability Model

(70)

• Idea behind the proof: for any $\alpha < d$, the random edges are distributed too uniformly, so they provide no usable clue for closing in on the target

• For $\alpha > d$, the random edges are too short, so there are too few long-range shortcuts

– In both cases, greedy routing can no longer find short paths

Kleinberg small-worlds thus provide a way of building a peer-to-peer overlay network allowing for a simple, greedy, and distributed routing protocol

But: How are nodes mapped to 𝑑-dimensional space such that the distance measurement is meaningful?

7.3 Kleinberg Navigability Model

(71)

Small-World and random graphs show homogenous node degree distributions

– For small-world graphs, the distribution looks similar to a normal distribution with $\mu = 2k$ for non-extreme $p$

• The actual model is more complicated

• $k$ is the number of neighbors of the initial ring

– Node degrees of random graphs are Poisson distributed

• For larger $m$, this will also approximate a normal distribution

• But many (especially artificial) real-life networks show extreme node degree distributions

– e.g. strong hub-topologies

7.4 Scale-Free Networks

(72)

• In 1999, Albert-László Barabási (Univ. of Notre Dame) crawled parts of the WWW to investigate its actual structure

– The node degree is power-law distributed

• i.e., the probability that a node in the network is connected to $k$ other nodes is $P(k) \sim k^{-\gamma}$

– (usually with $2 < \gamma \le 3$)

– Most nodes have a small degree of around 1 to 2

– Few nodes have an extremely high node degree

– High-degree vertices are called ‘hubs‘

• Albert-László Barabási. “Linked: How Everything Is Connected to Everything Else and What It Means for Business, Science, and Everyday Life”. Plume. 2003. ISBN 978-0452284395

7.4 Scale-Free Networks

(73)

Definition: Graphs with a power-law node degree distribution form ‘scale-free’ networks

– Also called power-law networks

• What kind of network model can generate this more realistic degree distribution?

– Barabási–Albert model builds a certain subset of scale-free networks

• Albert-László Barabási & Réka Albert."Emergence of scaling in random networks". Science, 1999 doi:10.1126/science.286.5439.509.

7.4 Scale-Free Networks

(74)

• Barabási–Albert model: Basic Idea

– In its simplest form denoted as $g^{ba}_{n,m}$

• $n$ is the number of nodes in the graph

• $m$ is the number of edges added per time step

– The total number of edges is thus approximately $n \cdot m$

– Start with any initial graph of size $n_0$

• $n_0 \ge 2$ and the degree of every node is $\deg(v) \ge 1$

• Often, just $m$ connected nodes are used as default initial network

– If the initial network is not connected, the resulting network cannot be guaranteed to be connected

– Barabási–Albert graph is constructed iteratively by adding new nodes one by one until target size 𝑛 is reached

• Represents one time step in a simulated network growth

– i.e. Discrete Time Modeling

• Add nodes until target size 𝑛 is reached

Each new node is connected to 𝒎 existing nodes


7.4 Barabási–Albert Graphs

(75)

• New edges are not added randomly, but favor higher-degree nodes

– “The rich get richer”

• Preferential attachment to higher-degree nodes

– The higher the degree of a possible target node, the higher the probability that the new node will attach to it

• Preferential attachment defines the probability $\Pi(v)$ for vertex 𝑣 to get an edge to a new node

– In general, it is proportional to the node degree, i.e., $\Pi(v) \sim \deg(v)$

– The most common definition is $\Pi(v) = \frac{\deg(v)}{\sum_{w \in V} \deg(w)}$
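The growth process with preferential attachment can be sketched in a few lines of Python. This is only an illustrative sketch, not the generator used for the lecture's figures. The function name `barabasi_albert`, the choice of a path over max(𝑚, 2) nodes as the initial graph, and the re-drawing of duplicate targets are assumptions. Drawing uniformly from a list that contains every node once per incident edge gives each node a chance proportional to its degree.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow a graph with n nodes; each new node attaches to m existing nodes.

    Assumption: the initial graph is a path over n0 = max(m, 2) nodes,
    which is connected and gives every node a degree of at least 1.
    """
    rng = random.Random(seed)
    n0 = max(m, 2)
    edges = [(i, i + 1) for i in range(n0 - 1)]

    # Every node appears once per incident edge, so a uniform draw from
    # this list selects node v with probability deg(v) / sum of degrees.
    endpoints = [v for edge in edges for v in edge]

    for new_node in range(n0, n):
        # Assumption: duplicates are re-drawn so that the new node gets
        # m distinct neighbors (a common simplification).
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
    return edges


edges_m1 = barabasi_albert(50, 1, seed=1)
edges_m2 = barabasi_albert(50, 2, seed=1)
print(len(edges_m1), len(edges_m2))
```

With this initial graph, the number of edges is the one edge of the initial path plus 𝑚 edges for each of the 48 added nodes.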

7.4 Barabási–Albert Graphs

(76)

• Example: $g_{ba}(5, 1)$

– 𝒕 = 𝟎: initial graph with the two connected nodes $v_1$ and $v_2$

– 𝒕 = 𝟏 − 𝜺: add new node $v_3$

• The probability for connecting any old node 𝑣 to $v_3$ is given by $\Pi(v) = \frac{\deg(v)}{\sum_{w \in V} \deg(w)}$

• Here, $\Pi(v_1) = \frac{1}{2}$ and $\Pi(v_2) = \frac{1}{2}$

• Random decision steered by preferential attachment, e.g., connect to $v_1$

– 𝒕 = 𝟏: graph with the nodes $v_1$, $v_2$, and $v_3$

7.4 Barabási–Albert Graphs

(77)

• Example: $g_{ba}(5, 1)$

– 𝒕 = 𝟐 − 𝜺: add new node $v_4$

• Evaluate preferential attachment: $\Pi(v_1) = \frac{1}{2}$, $\Pi(v_2) = \frac{1}{4}$, $\Pi(v_3) = \frac{1}{4}$

• e.g., connect to $v_1$

– 𝒕 = 𝟑 − 𝜺: add new node $v_5$

• Evaluate preferential attachment: $\Pi(v_1) = \frac{1}{2}$, $\Pi(v_2) = \frac{1}{6}$, $\Pi(v_3) = \frac{1}{6}$, $\Pi(v_4) = \frac{1}{6}$

• e.g., connect to $v_2$

7.4 Barabási–Albert Graphs
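The attachment probabilities of this example can be re-computed with exact fractions. The sketch below assumes the edge choices named in the example (the third node attaches to $v_1$, and so does the fourth); the helper name `attachment_probabilities` is introduced here for illustration only.

```python
from fractions import Fraction


def attachment_probabilities(edges):
    """Return deg(v) / (sum of all degrees) for every node in the edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    total = sum(degree.values())
    return {node: Fraction(d, total) for node, d in degree.items()}


# t = 0: initial graph with the single edge v1 - v2
edges = [("v1", "v2")]
p_before_v3 = attachment_probabilities(edges)

# t = 1: v3 was attached to v1
edges.append(("v3", "v1"))
p_before_v4 = attachment_probabilities(edges)

# t = 2: v4 was attached to v1 as well
edges.append(("v4", "v1"))
p_before_v5 = attachment_probabilities(edges)

for step in (p_before_v3, p_before_v4, p_before_v5):
    print({node: str(p) for node, p in step.items()})
```

The hub $v_1$ keeps a probability of one half in every step, while the share of each remaining node shrinks as more nodes join.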

(78)

• Comparing Barabási–Albert Graphs

– 𝑛 = 50, ~50 edges

– coloring by node degree

7.4 Barabási–Albert Graphs

[Figure: Erdős–Rényi graph and Barabási–Albert graph]

(79)

• Comparing Barabási–Albert Graphs

– 𝑛 = 100, ~100 edges

7.4 Barabási–Albert Graphs

(80)

• Comparing Barabási–Albert Graphs

– 𝑛 = 100, ~150 edges

7.4 Barabási–Albert Graphs

[Figure: Erdős–Rényi graph and Barabási–Albert graph]

(81)

• Histogram of node degrees

– Single sample, 100 nodes, 300 edges

• Random

– Generally lower degree

• Small World

– Homogeneous degree

• Scale-Free

– Power-law

– Hubs visible

7.4 Barabási–Albert Graphs

[Figure: histogram of the number of nodes (y-axis, 0 to 70) per node degree (x-axis, 3 to 27) for Barabási (pa = 0.5), Watts-Strogatz (p = 0.05), and Random; pa is a dampening factor for decreasing the strength of preferential attachment]

(82)

• Node degree for larger Barabási–Albert graphs

– 200k nodes

– 400k edges

– Logarithmic scale

7.4 Barabási–Albert Graphs

[Figure: relative frequency (y-axis) over node degree (x-axis)]

(83)

• Histogram of cluster coefficients (𝐶)

– Same sample

• Random

– Low 𝐶

• Small World

– Homogeneous high 𝐶

• Scale-Free

– Also power-law

– Lower than SW

7.4 Barabási–Albert Graphs

[Figure: histogram of the number of nodes (y-axis, 0 to 70) per cluster coefficient (x-axis, 0.05 to 0.95) for Barabási (pa = 0.5), Watts-Strogatz (p = 0.05), and Random]

(84)

• An important property of scale-free networks is robustness against random failures

– Removing a random vertex 𝑣 will likely hit a low-degree node

• Expected damage to the network is small

– A failing high-degree node can severely damage a network

• Better fail-safety is necessary for high-degree nodes to ensure overall robustness

• Thus, scale-free networks are very sensitive to attacks

– If a malicious attacker explicitly targets the highest-degree nodes, the network can easily decompose

• Note: random graphs are not resilient against random failures, but also not particularly prone to attacks

– Most vertices have more or less the same degree
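The difference between a random failure and a targeted attack can be illustrated on an extreme hub topology. The sketch below uses a toy star graph (one hub, nine leaves) rather than a real scale-free network; the graph, the helper name `largest_component`, and the choice of node 5 as the "randomly" failing node are assumptions for illustration.

```python
from collections import deque


def largest_component(adjacency, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over the surviving nodes
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


# Toy hub topology: node 0 is the hub, nodes 1..9 are leaves
hub_graph = {0: set(range(1, 10))}
for leaf in range(1, 10):
    hub_graph[leaf] = {0}

# A failure that hits a low-degree node (the likely case for a random failure)
after_failure = largest_component(hub_graph, removed={5})

# A targeted attack removes the highest-degree node
hub = max(hub_graph, key=lambda v: len(hub_graph[v]))
after_attack = largest_component(hub_graph, removed={hub})

print(after_failure, after_attack)
```

Losing a leaf leaves the remaining nine nodes connected, whereas losing the hub splits the network into isolated nodes.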


7.4 Scale-Free Networks

(85)

• Example: Airline Networks (Ryanair)

7.4 Scale-Free Networks

(86)

• Example: Airline Networks (Ryanair)


7.4 Scale-Free Networks

(87)

• Example: Internet (2009)

• Measured by CAIDA skitter monitor in London

– ca. 535k nodes and 600k links

7.4 Scale-Free Networks

(88)

• Example: Internet (2005)

• From http://www.opte.org

– Try full size!

7.4 Scale-Free Networks

(89)

• Example: Internet

– By geographic location

7.4 Scale-Free Networks

(90)

• :-)


7.4 Scale-Free Networks

(91)

Random Graph: 50 nodes, 50 edges

Color by degree

7.5 Comparing Graphs

Property              Value
Connected             No
Diameter (conn.)      9
Avg. Path Length      4.39
#Clusters             6
Largest Cluster       39
k-connectedness       0
Avg. Cluster Coeff.   0.033
Avg. Degree           2

(92)

Watts-Strogatz Graph: 50 nodes, 50 edges (𝑝 = 0.05)

7.5 Comparing Graphs

Property              Value
Connected             No
Diameter (conn.)      35
Avg. Path Length      12.73
#Clusters             2
Largest Cluster       38
k-connectedness       0
Avg. Cluster Coeff.   0
Avg. Degree           2

(93)

Barabási-Albert Graph: 50 nodes, 49 edges (𝑝𝑎 = 0.8)

7.5 Comparing Graphs

Property              Value
Connected             Yes
Diameter              12
Avg. Path Length      5.14
k-connectedness       1
Avg. Cluster Coeff.   0
Avg. Degree           1.96

(94)

Random Graph: 50 nodes, 100 edges

7.5 Comparing Graphs

Property              Value
Connected             No
Diameter (conn.)      6
Avg. Path Length      2.88
#Clusters             2
Largest Cluster       49
k-connectedness       0
Avg. Cluster Coeff.   0.058
Avg. Degree           4

(95)

Watts-Strogatz Graph: 50 nodes, 100 edges (𝑝 = 0.05)

7.5 Comparing Graphs

Property              Value
Connected             Yes
Diameter (conn.)      10
Avg. Path Length      4.6
k-connectedness       2
Avg. Cluster Coeff.   0.43
Avg. Degree           4

(96)

Barabási-Albert Graph: 50 nodes, 98 edges (𝑝𝑎 = 0.8)

7.5 Comparing Graphs

Property              Value
Connected             Yes
Diameter              4
Avg. Path Length      2.55
k-connectedness       2
Avg. Cluster Coeff.   0.23
Avg. Degree           3.92
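Two of the properties in these comparisons are easy to re-compute. The sketch below derives the average degree from the node and edge counts (every edge contributes to the degree of two nodes) and computes an average cluster coefficient. It assumes the common local definition (the fraction of a node's neighbor pairs that are themselves connected, with 0 for nodes of degree below 2), which may differ in detail from the definition used earlier in the lecture. The function names and the small test graph are illustrative.

```python
def average_degree(num_nodes, num_edges):
    """Each undirected edge adds 1 to the degree of both of its endpoints."""
    return 2 * num_edges / num_nodes


def average_clustering(adjacency):
    """Mean local cluster coefficient over all nodes.

    Assumption: nodes with fewer than two neighbors count as 0.
    Node labels must be comparable (e.g. integers).
    """
    total = 0.0
    for neighbors in adjacency.values():
        k = len(neighbors)
        if k < 2:
            continue
        # Count the edges that exist between pairs of neighbors
        links = sum(
            1 for u in neighbors for w in neighbors
            if u < w and w in adjacency[u]
        )
        total += 2 * links / (k * (k - 1))
    return total / len(adjacency)


# Average degree of the two Barabási-Albert samples above
print(round(average_degree(50, 49), 2), round(average_degree(50, 98), 2))

# Small test graph: triangle 0-1-2 plus a pendant node 3 attached to node 0
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(average_clustering(graph))
```

For the test graph, node 0 contributes 1/3, nodes 1 and 2 contribute 1 each, and node 3 contributes 0.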

(97)

• What do real Peer-To-Peer Networks look like?

• Depends on the used protocols

– Some P2P networks, e.g., Freenet, evolve on their own into a small-world network with a high clustering coefficient and a small diameter

– Analogously, some protocols, e.g., Gnutella, will implicitly generate a scale-free degree distribution

• Implied by bootstrapping and Ping-Pong

7.6 Models in P2P

(98)

• Freenet converges to a small-world network under medium load

• This is achieved by routing table updates

– Every file is associated with a key (by a hash function)

– A file will then be stored at some node with a similar key

– At each peer, each request is forwarded to the node in its routing table having the closest key to the requested one

– If the request’s time-to-live expires or a node does not have neighbors to forward the request to, a backtracking ‘request failed’ message is sent

– If the request is successful, the file is sent back via the routing nodes, and each node saves the file and adds the sending node’s address to its local routing table

• i.e., frequently requested files are replicated

– If the routing table is full, the least recently used (LRU) entry is evicted
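The routing table behavior described above (forward to the entry with the closest key, evict the least recently used entry when full) can be sketched as follows. This is a simplified illustration, not Freenet's actual implementation. The class name `RoutingTable`, the use of the absolute difference of integer keys as the closeness measure, and the capacity of 2 are assumptions. The example entries mirror peer B's table from the routing example (key 6 points to C, key 15 points to D).

```python
from collections import OrderedDict


class RoutingTable:
    """Key-to-neighbor table with closest-key lookup and LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # least recently used entry comes first

    def add(self, key, neighbor):
        self.entries[key] = neighbor
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry

    def closest(self, key):
        """Neighbor whose stored key is numerically closest to `key`."""
        if not self.entries:
            return None
        # Assumption: closeness is the absolute difference of integer keys
        best = min(self.entries, key=lambda stored: abs(stored - key))
        self.entries.move_to_end(best)  # mark the entry as recently used
        return self.entries[best]


table_b = RoutingTable(capacity=2)
table_b.add(6, "C")
table_b.add(15, "D")

# A request for key 9 is forwarded to the neighbor with the closest key
next_hop = table_b.closest(9)

# After a successful request, the sender of the file is remembered;
# the table is full, so the least recently used entry is evicted.
table_b.add(9, "F")
remaining_keys = list(table_b.entries)

print(next_hop, remaining_keys)
```

Because the lookup for key 9 just used the entry for key 6, the entry for key 15 is the least recently used one and is dropped when the new entry arrives.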


7.6 Models in P2P

(99)

• Example of Freenet Routing

7.6 Models in P2P

[Figure: peers A to F routing a request for key = 9; “9?” labels mark forwarded requests, and “Sorry!” marks a failed request]

B’s routing table
Key   Pointer
6     C
15    D

D’s routing table
Key   Pointer
9     F
1     E

(100)

• What should Peer-to-Peer networks look like?

– It depends…

• If it should be navigable in a decentralized fashion,

– Make it a small-world network and implement Kleinberg’s routing algorithm (or a variant, e.g., Symphony)

• If the peer-to-peer network could be under attack,

– Also make it a small-world network, where most vertices have the same (low) degree

• If it is a peer-to-peer network in a small and secure context, e.g., an intranet in a company,

– Make it a scale-free network

• This makes it possible to buy only a small number of servers with high bandwidth, which will work as ‘hubs’ of the network


7.6 Models in P2P

(101)

• The network structure of a peer-to-peer system influences:

– average necessary number of hops (path length)

– possibility of greedy, decentralized routing algorithms

– stability against random failures

– sensitivity to attacks

– redundancy of routing table entries (edges)

– many other properties of the system that build on this network

• Important measures of a network structure are:

– the average path length

– the clustering coefficient

– the degree distribution

• Influence the edge generation rules such that a network structure arises that shows the desired properties

7.6 Models in P2P

(102)

Next Lecture

• Content Distribution

– Swarming

– BitTorrent

• Error Correction

• Privacy

– Dark Nets
