Distributed Data Management

Academic year: 2021

(1)

Jan-Christoph Kalo, Stephan Mennicke

Institut für Informationssysteme

Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

Distributed Data Management

(2)

5.0 Introduction and History

5.1 The First Generation

– Centralized P2P
– Pure P2P

5.2 The Second Generation

– Hybrid P2P

5 Unstructured P2P Networks

(3)

• Peer To Peer (P2P) Systems

– P2P systems were popularized in 1999 by Napster for sharing MP3s

– Base Problem: How can resources easily be shared within a highly volatile and decentralized network of independent and autonomous peers (nodes)?

• There is a (potentially) large number of peers

• Peers may join or leave the network any time

• Only rudimentary features necessary

5.0 Peer-To-Peer Systems

(4)

• What can be shared?

Information

• File & document sharing

Bandwidth

• Load balancing

• Shared bandwidth

Storage space

• DAS, NAS, SAN

• Storage networks

Computing Power

5.0 Peer-To-Peer Systems

(5)

• What is a P2P network?

– A virtual overlay network for sharing resources

• Virtual and physical networks are logically independent
• Mostly IP-based

– Usually decentralized and self-organizing

– Peers can transfer data directly without intermediate servers

5.0 What is Peer-To-Peer?

(6)

“Virtual” signaling network established via TCP connections between the peers

• Characteristics of the overlay topology:

– Completely independent from the physical network
– Separate addressing and routing scheme

– No relation between physical network edges and overlay network edges

– Overlay network can be seen as graph

• Peers as nodes

5.0 Overlay Networks

(7)

5.0 Overlay Networks

(8)

• The topology of the overlay network may show different properties

– May be centralized or decentralized

– May use strict structures or may be unstructured
– May be flat or organized in hierarchies

– We will use these properties later to classify P2P systems!

• In this lecture only unstructured networks

5.0 Overlay Networks

(9)

• P2P technology was enabled by various technological and social developments

– Performance increase of home users' personal computers

• When P2P systems became established in 1999, the average computing performance of a home PC was comparable to that of high-end servers of the late 80s

– General availability of high-speed Internet

• In 1999, DSL connections were introduced
• Flat-rate models gained momentum

5.0 Towards P2P

(10)

• Late 1960s: Establishment of the ARPANET

– “Advanced Research Projects Agency Network”

• Based on the concept of the "Intergalactic Computer Network" of Prof. J.C.R. Licklider

– Funded by DARPA

• Share computing resources and documents between US research facilities

The rumor that ARPANET was built in order to control the military after a nuclear war is NOT true!

– Most popular applications

• Email (1971), FTP (1974), and Telnet (1969) → client/server model

– Central steering committee to organize the network
– Later became "the internet"

5.0 Towards P2P

(11)

1979: Development of the UseNet protocol

– Newsgroup application to organize content

– Newsgroup server network exhibits some P2P characteristics

• No central server but a server network (compare to super-peer-networks)

• Clients only communicate with a server which may reroute the requests to other servers

– Different groups for different content

• Initially, only text messages

• Later: infamous BIN groups usually distributing copyrighted music, software or movies

5.0 Towards P2P

(12)

• ~1990 rush of the general public to join the Internet

– The WWW is invented at CERN by Tim Berners-Lee
– Centrally hosted, interlinked websites are state of the art

• Illegal file sharing using warez sites…

5.0 Towards P2P

(13)

Northeastern University, Boston, June 1999

– Shawn Fanning (19) and Sean Parker (20) invent Napster

• Problem: Both liked to share music and software (for free…)

• But: warez sites, UseNet binary groups, and IRC bots were very painful to use

→ Bad search, broken links, tiny retention caches, low bandwidth, etc.

• Idea: establish a system offering powerful search capabilities, no broken links, and performance which increases with the number of users!

5.0 Towards P2P

(14)

Basic Idea of Napster

– Users store music on their home PCs

– Users connect to the Napster server and provide a list of all songs they currently have

– Users can query the Napster server for any song

• Result: a list of all users currently possessing that song

– User can download the song directly from another user

Peer-to-Peer!

5.0 Towards P2P

(15)

5.0 Towards P2P

(16)

• Napster Inc. initially aimed at being a market place for digital music

– Like iTunes today

– Napster tried multiple times to establish usage agreements with record labels, but failed

→ No legal business model for selling single songs was possible

• Labels felt threatened by Napster’s fast growth

• Negotiations were stopped by the labels

5.0 Towards P2P

(17)

• December 1999:

RIAA files a lawsuit against Napster Inc.

• Target of the RIAA: the central lookup server of Napster

• February 2001:

– 2.79 billion files exchanged via the Napster network per month

• July 2001: Napster Inc. is convicted

– Napster has to stop the operation of the Napster server
– The Napster network breaks down
– BUT: a number of promising successors is already available

5.0 Towards P2P

(18)

May 2002:

Bertelsmann tries to buy the Napster assets for $85 million

– American courts block the transaction and force Napster to liquidate all assets

• Roxio buys the logo and name in the bankruptcy auction

– Roxio owned an iTunes-like store called "pressplay" which was rebranded with the Napster corporate design
– Launch of the new Napster in October 2003 as a centralized paid-subscription service

→ Not very successful, because it launched shortly after iTunes and without hardware support

– Sold in 2008 to Best Buy for $121M

• Rhapsody purchased Napster in 2011

– Rhapsody is a music streaming service which later changed its name to Napster

5.0 Towards P2P

(19)

• Generally, the RIAA lawsuit is considered a big failure

– Napster could have become an early iTunes if labels had cooperated

– The lawsuit gave birth to even more "dangerous" software: e.g., Gnutella

• Open source
• Fully decentralized

→ A Gnutella network cannot be shut down
→ No company to sue, no servers to disconnect, …

– Due to the publicity, P2P piracy became even stronger after Napster was convicted

5.0 Towards P2P

(20)

• The “hot” years for P2P have been 1999-2008

• In 2006, nearly 70% of all network traffic was attributed to P2P traffic

– Nowadays, P2P traffic declines in favor of video streaming and social networks...

5.0 P2P Development

(21)

• 2013:

– Netflix, Hulu, and YouTube account for 50% of US Internet downstream traffic

• File sharing via P2P is below 5.0%

• In the UK, P2P is still around 20%

5.0 P2P Development

(22)

• 2013 [Figure: traffic statistics]

5.0 P2P Development

(23)

• 2013 [Figure: traffic statistics]

5.0 P2P Development

(24)

• First-generation peer-to-peer networks tried simple paradigms to build the network

– Centralized directory model: all content is listed in a central directory, whose server is also used as a central point of connection

– Pure peer-to-peer model: there is no central authority; peers only connect to their neighbors in the network

5.1 The First Generation

(25)

• Centralized directory model

– Index service provided centrally by a coordinating entity
– Search requests are issued to the coordinating entity

• Returns a list of peers having the desired files available for download

– Requesting peer obtains the respective files directly from the peer offering them

• Characteristics

– Lookup of existing documents can be guaranteed
– Index service as single point of failure

• Representative: Napster

5.1 Centralized Directories
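The one-hop lookup of the centralized directory model fits in a few lines of code. The following TypeScript sketch is illustrative only (class and method names are ours, not Napster's protocol): the server merely maps content to the peers currently offering it, while the download itself happens directly between peers.

```typescript
// Minimal sketch of a centralized directory (Napster-style).
// Illustrative only; not the real Napster protocol.
type PeerId = string;

class CentralDirectory {
  // song title -> set of peers currently offering it
  private index = new Map<string, Set<PeerId>>();

  // Called when a peer connects and announces its shared files.
  register(peer: PeerId, songs: string[]): void {
    for (const song of songs) {
      if (!this.index.has(song)) this.index.set(song, new Set());
      this.index.get(song)!.add(peer);
    }
  }

  // Called when a peer disconnects; its entries must be dropped.
  unregister(peer: PeerId): void {
    for (const peers of this.index.values()) peers.delete(peer);
  }

  // One-hop lookup: returns all peers offering the song.
  lookup(song: string): PeerId[] {
    return [...(this.index.get(song) ?? [])];
  }
}

// Usage: the server only answers queries; the transfer itself
// happens directly between the two peers.
const dir = new CentralDirectory();
dir.register("peerY", ["Prince - Purple Rain"]);
console.log(dir.lookup("Prince - Purple Rain")); // ["peerY"]
```

Note how the whole index lives on one entity: this is what makes lookup complete and fast, and at the same time what made Napster legally and technically attackable.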

(26)

5.1 Example: Napster

[Figure: a central database holds the index of all shared files. Peer X and Peer Y each share several MP3 files on their clients. Peer X queries the server for "Prince, purple rain" and is pointed to Peer Y.]

(27)

• All peers are connected to a central entity

– Central entity is necessary to provide network services

• Joining the network: central server is also the bootstrap-server

• Central entity can be established as a server farm, but remains a single entry point (and single point of failure)

• All signaling connections are directed to central entity

– Central entity is some kind of index/group database
– Central entity has a lookup/routing table

• Peers establish connections between each other on demand to exchange user data

5.1 Centralized P2P

(28)

• Peer ↔ Central Entity: special P2P protocol, e.g., the Napster protocol

– Registering/logging on to the overlay
– Finding content
– Updating shared content information
– Updating the routing tables

• Peer ↔ Peer: HTTP

– Exchanging the actual content

5.1 Protocols Used

(29)

5.1 Centralized Topology

[Figure: centralized topology. Legend: peer; TCP connection between two peers; connection between router and peer; connections between routers (core).]

(30)

5.1 Centralized Topology

• SETI@Home:

– An Internet-based public volunteer computing project employing the BOINC software platform at the University of California, Berkeley, in the United States
– SETI (Search for Extraterrestrial Intelligence) is a scientific area whose goal is to detect intelligent life outside the Earth

(31)

5.1 Centralized Topology

(32)

• SETI@Home:

– Released to the public on May 17, 1999.

– Not P2P in its design, but it illustrates how many computers can solve a large divisible problem.

http://setiathome.berkeley.edu/kiosk/ (map showing transmissions of data)

5.1 Centralized Topology

(33)

5.1 Centralized Topology

(34)

• Analysis of over 160 TB of data.

• Over 6 million volunteers have run SETI@home during its more than 10-year history.

• SETI@home is one of the largest supercomputers on our planet, currently averaging 3.5 Peta-FLOPS actual performance.

5.1 Centralized Topology

(35)

• It uses Berkeley Open Infrastructure for Network Computing (BOINC)

– A high-performance distributed computing platform.

– Software that can use unused CPU and GPU cycles on a computer to do scientific computing.

– Source: https://boinc.berkeley.edu/

5.1 Centralized Topology

(36)

• BOINC, scientific applications using it:

– Climate modeling / global warming studies (climatePrediction.net)
– HIV, malaria, and cancer drug research (World Community Grid)
– Particle physics (LHC@home), gravity waves (Einstein@home), protein structure (Rosetta@home)
– More than 50 distributed projects use BOINC

5.1 Centralized Topology

(37)

The PeerJS library

– P2P data, video, and audio calls using WebRTC.

• The WebRTC initiative is a project supported by Google, Mozilla, and Opera, amongst others.

• Provides browsers and mobile applications with Real-Time Communications (RTC) capabilities via simple APIs.

5.1 Developer perspective

(38)

5.1 Developer perspective
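From the developer perspective, a minimal PeerJS sketch might look as follows. This is a sketch under assumptions: it uses the default public PeerJS broker for signaling, and "remote-peer-id" is a placeholder for an ID obtained out of band.

```typescript
// Minimal PeerJS sketch: one data connection between two browser peers.
// Assumes the default public PeerJS broker; "remote-peer-id" is a placeholder.
import Peer from "peerjs";

const peer = new Peer(); // the broker assigns this peer a random ID

peer.on("open", (id) => {
  console.log("registered with broker as", id);
  // Open a direct WebRTC data channel to another peer.
  const conn = peer.connect("remote-peer-id");
  conn.on("open", () => conn.send("hello from " + id));
});

// Handle connections initiated by other peers.
peer.on("connection", (conn) => {
  conn.on("data", (data) => console.log("received:", data));
});
```

Note that the broker only mediates signaling; once the WebRTC channel is established, data flows directly between the peers, mirroring the P2P principle of this chapter.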

(39)

• Search Request

– User sends out a music file request and Napster searches its central database

5.1 Wrap-Up

(40)

• Search Response

– The Napster Server sends back a list of peers that share the file

5.1 Wrap-Up

(41)

• File Download

– The requesting user downloads the file directly from the computer of another Napster user via HTTP

5.1 Wrap-Up

(42)

• Advantages

– Fast and complete lookup (one-hop lookup)
– Central managing/trust authority
– Easy bootstrapping

• Disadvantages

– Single point of failure: a bottleneck, and easily attackable
– Central server in control of all peers

• Usage

– Application areas: file sharing, VoIP (SIP, H.323)

5.1 Discussion: Centralized P2P

(43)

• March 2000: Nullsoft releases Gnutella for free

– Nullsoft planned to release the source code under the GPL license a couple of days later

• Developed by Justin Frankel and Tom Pepper

– Nullsoft’s mother company AOL cancels the distribution and further development of Gnutella a day after its release

• AOL merged with Time Warner shortly after buying Nullsoft for $100M

– The Gnutella protocol is reverse-engineered and distributed under the GPL license

– Many compatible clients and various forks of Gnutella are developed

– Became extremely popular after Napster had to shut down

5.1 Gnutella

(44)

• August 2001

– Users adapt very fast to the breakdown of Napster
– Already 3.05 billion files exchanged per month via the Gnutella network

• 2001

– Invention of structured and hybrid P2P networks

→ Gnutella scaled badly, new network paradigms were necessary
→ e.g., KaZaA, which quickly gained popularity

• August 2002

– Amount of exchanged data in KaZaA (FastTrack) decreases, caused by a high number of defective files

→ weak hash keys for identifying files provoked collisions

5.1 Gnutella

(45)

• May 2003

– BitTorrent is released
– BitTorrent quickly becomes the most popular file-sharing protocol

• Middle of 2003

– Beyond the exchange of content, new concepts are developed to use P2P also for other applications
– Skype, a Voice over P2P application, is developed

• 2005

– Major efforts are made to increase the reliability of P2P searches, also in mobile networks, …

5.1 Gnutella

(46)

– In 2005, eBay buys Skype for communication between bidders and sellers for $2.6 Billion
– In 2009, an investor group buys 65% of Skype for $1.9 Billion
– Plans were to turn Skype into an independent company again in 2010

• But instead it was bought by Microsoft for $8.5B

– 13% of all international phone calls are handled by Skype in 2010

5.1 Skype

(47)

• Which protocols are used?

– Traffic measured between 2002 and 2004 in the Abilene backbone

5.1 Gnutella

(48)

• The base idea of Gnutella was to avoid the weaknesses of Napster

• Result:

– Fully decentralized, unstructured, and flat P2P network
– Initially: All peers are equal!

• Gnutella 0.4
• Thus called a pure P2P system

5.1 Gnutella

Short Break!


(50)

• Pure P2P systems have the following characteristics

– Decentralized

• There is no central authority
• "Any" peer can be removed without loss of functionality

– Unstructured

• Overlay network is constructed randomly, without any structure
• All peers are equal

5.1 Pure P2P


(51)

5.1 Pure P2P: Graphs

[Figure: sample graph of a pure P2P overlay showing a major component and separate sub-networks.]

(52)

• To query pure networks, a Flooding Request Model is used

– Search request is passed on to neighbors.

– Neighbors forward the message to their respective neighbors

– When a node can answer the request, a result notification is sent to the requesting node
– The requesting peer can then establish a direct connection to any peer that sent a result notification

5.1 Pure P2P: Flooding

(53)

• Request flooding relies on message forwarding

– If a peer receives a request, it usually forwards the request to all its neighbors

• Forwarding every message to all neighbors in an uncontrolled fashion will soon overload the network

→ One node could spam the whole network
→ Messages can be caught in unbounded cycles

→ Restrictions needed!

– Each message has a maximum time-to-live (TTL) and a hop counter

• The hop counter is initially set to 0

• Each forwarded message will have the hop counter increased by 1

• A message with TTL=hop counter is not forwarded and dies

• TTL thus limits the maximum distance a message can travel

→ Prevents spamming the whole network

5.1 Pure P2P: Flooding

(54)

– Every message which is forwarded is cached by the forwarding peer for a short time

– Message cache is used to prevent message cycles

• Don’t forward a message which you already forwarded!

5.1 Pure P2P: Flooding

(55)

• Response messages are routed back to the original requester using the same message trail

– Use message caches to perform back-tracking

• For each forwarded message stored in the cache, also store the node from which the message was received
• If a response message is received, look up the respective request message in the cache

→ Forward the response to the node from which the request was received

5.1 Pure P2P: Flooding
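The three mechanisms just described (TTL with hop counter, message cache, backward routing) fit into a compact simulation. The following TypeScript sketch is illustrative, not the Gnutella wire protocol: peers call each other directly instead of exchanging messages over TCP, and all names are ours.

```typescript
// Sketch of flooding with TTL, hop counter, message cache and backward routing.
interface QueryMsg {
  id: string;    // globally unique message ID (GUID)
  ttl: number;   // maximum number of hops the message may travel
  hops: number;  // hops travelled so far
  query: string; // search keyword
}

class FloodingPeer {
  neighbors = new Map<string, FloodingPeer>();
  private seenFrom = new Map<string, string>(); // msg ID -> previous hop

  constructor(public id: string, public files: Set<string>) {}

  issueQuery(query: string, ttl: number): void {
    // crypto.randomUUID() is available in browsers and Node 19+.
    this.receiveQuery({ id: crypto.randomUUID(), ttl, hops: 0, query }, "self");
  }

  receiveQuery(msg: QueryMsg, from: string): void {
    if (this.seenFrom.has(msg.id)) return; // already seen: drop (avoids cycles)
    this.seenFrom.set(msg.id, from);       // remember trail for backward routing
    if (this.files.has(msg.query)) this.hitBack(msg.id, this.id);
    const hops = msg.hops + 1;             // increase the hop counter ...
    if (hops >= msg.ttl) return;           // ... and kill the message at TTL
    for (const [nid, n] of this.neighbors) {
      if (nid !== from) n.receiveQuery({ ...msg, hops }, this.id);
    }
  }

  // Query-hits travel back along the cached message trail.
  private hitBack(msgId: string, provider: string): void {
    const prev = this.seenFrom.get(msgId);
    if (prev === "self") {
      console.log(`${this.id}: hit! download from ${provider} via HTTP`);
    } else if (prev !== undefined) {
      this.neighbors.get(prev)?.hitBack(msgId, provider);
    }
  }
}

// Usage: a chain A - B - C where C offers the file; TTL 3 suffices to reach C.
const a = new FloodingPeer("A", new Set());
const b = new FloodingPeer("B", new Set());
const c = new FloodingPeer("C", new Set(["xyz.mp3"]));
a.neighbors.set("B", b); b.neighbors.set("A", a);
b.neighbors.set("C", c); c.neighbors.set("B", b);
a.issueQuery("xyz.mp3", 3); // A: hit! download from C via HTTP
```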

(56)

• Effects of the flooding technique

– Fully decentralized, no central lookup server

→ No single point of failure or control

– Unreliable lookup (no guarantees)

• Time-to-live limits the maximum distance a query can travel
• Query may be restricted to a sub-network not containing the desired results

– System doesn't scale well

• Number of messages increases drastically with the number of peers
• Peers with low-bandwidth connections may become bottlenecks

5.1 Pure P2P: Flooding

(57)

5.1 Pure P2P: Flooding

[Figure: the requesting peer broadcasts Query[XYZ, TTL = 3, …] to its neighbors.]

(58)

5.1 Pure P2P: Flooding

[Figure: a first Query-Hit is returned; the query is forwarded with TTL = 2.]

(59)

5.1 Pure P2P: Flooding

[Figure: a second Query-Hit is returned; the query is forwarded with TTL = 1.]

(60)

5.1 Pure P2P: Flooding

[Figure: third and fourth Query-Hits are returned; at TTL = 0 there is no further broadcast.]

(61)

5.1 Pure P2P: Flooding

[Figure: the requesting peer establishes an HTTP connection to a responding peer.]

(62)

5.1 Pure P2P: Flooding

[Figure: via the HTTP connection, Get[XYZ, …] requests the file and the data is downloaded.]

(63)

• How can a new node join a pure P2P network?

– No central server
– Network is volatile

→ Bootstrapping necessary

– Usually not part of the protocol specification

• Implemented by client

– Necessary to know at least one active participant of the network

• Otherwise no participation at the overlay possible for a new node

5.1 Pure P2P: Bootstrapping

(64)

• The address of an active node can be retrieved by different means

– Bootstrap cache

• Try to establish a connection to any node known from the last user session

– Stable nodes

• Connect to a "well-known host" which is usually always in the network

– Bootstrap server

• Ask a bootstrap server to provide a valid address of at least one active node
• Realizations:

→ FIFO of all node addresses which recently used this bootstrap server (a node which just connected is assumed to be still active)
→ Random pick of addresses which recently connected via this server to the overlay

5.1 Pure P2P: Bootstrapping

(65)

– Broadcast on the IP layer

• Use multicast channels
• Use IP broadcasting

→ limited to the local network

– Bootstrap lists

• Maintain a list of potential bootstrap servers outside the network, e.g., on a website

→ Used by most file-sharing clients

5.1 Pure P2P: Bootstrapping
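A bootstrap server of the kind described above can be sketched in a few lines. The TypeScript below illustrates the FIFO and random-pick realizations with invented names; a real client would combine this with its peer cache from the last session and well-known hosts.

```typescript
// Sketch of a bootstrap server: keeps a FIFO of recently connected node
// addresses (a node that just connected is assumed to still be active).
class BootstrapServer {
  private recent: string[] = []; // FIFO of node addresses
  constructor(private capacity = 1000) {}

  // Every node that connects is remembered as a bootstrap candidate.
  nodeConnected(address: string): void {
    this.recent.push(address);
    if (this.recent.length > this.capacity) this.recent.shift(); // drop oldest
  }

  // Hand out the most recently seen addresses, or a random pick
  // (the naive shuffle below is fine for a sketch).
  getCandidates(n = 5, random = false): string[] {
    if (!random) return this.recent.slice(-n);
    return [...this.recent].sort(() => Math.random() - 0.5).slice(0, n);
  }
}

// Usage: a joining client tries these candidates in order.
const bs = new BootstrapServer();
bs.nodeConnected("203.0.113.7:6346");
bs.nodeConnected("198.51.100.2:6346");
console.log(bs.getCandidates());
```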

(66)

• Bootstrapping

– Via bootstrap-server (host list from a web server)
– Via peer-cache (from previous sessions)
– Via well-known host

• Routing

– Completely decentralized
– Reactive protocol: routes to content providers are only established on demand, no content announcements
– Requests: flooding (limited by TTL and GUID)
– Responses: routed (backward routing with help of GUID)

• Content transfer connections (temporary)

– Based on HTTP
– Out-of-band transmission

5.1 Pure P2P: Summary

(67)

• How is pure P2P implemented in Gnutella 0.4?

• Application-level, peer-to-peer protocol over point-to-point TCP

– Router Service

• Flood incoming requests (regard TTL!)
• Route responses for other peers (regard the GUID of the message)
• Keep-alive messages (PING/PONG), content responses (QUERYHIT)

– Lookup Service

• Initialize query requests
• Initialize keep-alive requests

– Download Service

• Establish direct connection for download

5.1 Gnutella 0.4

[Figure: Gnutella 0.4 overlay topology; peers (G) connected by TCP connections.]

(68)

– Connect to at least one active peer

• Address received from bootstrap

– Explore your neighborhood

• PING/PONG protocol

– Submit a Query with a list of keywords to your neighbors

• Neighbors forward the query

– Receive QUERYHIT messages

• Select the most promising QUERYHIT message

– Connect to the providing peer for file transfer

5.1 Gnutella 0.4

(69)

• Ping-Pong Messages

– Each "Ping" message is answered by a "Pong" message

– Keep-Alive Ping-Pong

• Simple messages with TTL 1 sent to neighbors
• Tests if a neighbor is offline / disconnected / overloaded

– Exploration Ping-Pong

• Used to explore and gather information about a node's neighborhood
• Higher TTL
• Pings are forwarded, Pongs are returned carrying information on other peers

→ Uptime, bandwidth, number of shared files, etc.

• Store information about neighboring nodes in a peer cache

5.1 Gnutella 0.4
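The two Ping-Pong roles differ only in the TTL and in what is done with the returned Pongs. A small illustrative sketch follows; the class and method names, the TTL value 7, and the stubbed-out transport are our assumptions, not part of the Gnutella specification.

```typescript
// Sketch of keep-alive vs. exploration pings and the peer cache they feed.
interface PongInfo {
  address: string;     // IP:port of the answering peer
  sharedFiles: number; // statistics carried by the Pong
  sharedKBytes: number;
}

class PingingPeer {
  // Addresses and statistics of peers discovered via Pongs.
  peerCache = new Map<string, PongInfo>();

  // Keep-alive: TTL 1, answered by the direct neighbor only.
  // No Pong within a timeout means offline/disconnected/overloaded.
  keepAlive(neighbor: string): void {
    this.sendPing(neighbor, 1);
  }

  // Exploration: higher TTL, the Ping is flooded and every reached
  // peer answers with a Pong describing itself.
  explore(neighbor: string, ttl = 7): void {
    this.sendPing(neighbor, ttl);
  }

  // Called for every incoming Pong; fills the cache used for
  // reconnects and the next bootstrap.
  onPong(info: PongInfo): void {
    this.peerCache.set(info.address, info);
  }

  private sendPing(neighbor: string, ttl: number): void {
    // Stub: a real client would serialize a PING message (see the
    // header layout on the following slides) onto the TCP connection.
    console.log(`PING -> ${neighbor} (TTL=${ttl})`);
  }
}
```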

(70)

• Exploiting the Ping-Pong cache

– Use stable peers in the next bootstrap process
– Boost your connectivity

• Add additional direct links to strong remote neighbors

– Compensate for direct neighbor failures

• Just reconnect to a remote neighbor

• Ping-Pong protocols use a lot of bandwidth and are avoided in most modern protocols

5.1 Gnutella 0.4

(71)

5.1 Gnutella 0.4

Measurements taken at the LKN in May 2002

(72)

5.1 Gnutella 0.4

General Header Structure:

MESSAGE HEADER: 23 bytes

GnodeID (16 bytes) | Function (1 byte) | TTL (1 byte) | Hops (1 byte) | Payload Length (4 bytes)

– GnodeID: unique 128-bit ID of the host
– Function: describes the message type (e.g., ping/pong, search, …)
– TTL (Time-To-Live): number of nodes a message may pass before it is killed
– Payload: describes the parameters of the message (e.g., IDs, keywords, …)
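Packing this 23-byte header is straightforward. The sketch below follows the field sizes from the slide; treating the payload length as little-endian is an assumption of this sketch, not something the slide specifies.

```typescript
// Packing the 23-byte Gnutella 0.4 message header described above.
function packHeader(
  gnodeId: Uint8Array, // 16-byte unique host ID
  func: number,        // message type, e.g. 0x00 PING, 0x01 PONG, 0x80 QUERY
  ttl: number,
  hops: number,
  payloadLength: number
): Uint8Array {
  if (gnodeId.length !== 16) throw new Error("GnodeID must be 16 bytes");
  const buf = new Uint8Array(23);
  buf.set(gnodeId, 0); // bytes 0..15: GnodeID
  buf[16] = func;      // byte 16: function / message type
  buf[17] = ttl;       // byte 17: time-to-live
  buf[18] = hops;      // byte 18: hop counter
  // bytes 19..22: payload length (little-endian assumed here)
  new DataView(buf.buffer).setUint32(19, payloadLength, true);
  return buf;
}

// Example: a PING (function 0x00) carries no payload.
const header = packHeader(new Uint8Array(16), 0x00, 7, 0, 0);
console.log(header.length); // 23
```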

(73)

5.1 Gnutella 0.4

Message payloads (following the general header):

– PING (Function: 0x00): no payload

– PONG (Function: 0x01): Port (2 bytes) | IP Address (4 bytes) | Nb. of shared files (4 bytes) | Nb. of KBytes shared (4 bytes)

– QUERY (Function: 0x80): Minimum Speed (2 bytes) | Search Criteria (n bytes)

– QUERY HIT (Function: 0x81): Nb. of Hits (1 byte) | Port (2 bytes) | IP Address (4 bytes) | Speed (1 byte) | Result Set (n bytes) | GnodeID (16 bytes)

• Result Set entries: File Index (4 bytes) | File Name (n bytes)

(74)

5.1 Gnutella 0.4

• Basic Routing Principle: "Enhanced" Flooding

– Flooding: received PINGs and QUERYs must be forwarded to all connected Gnodes
– PINGs or QUERYs with the same Function ID and GnodeID as previous messages are destroyed (avoids loops)
– Save the origin of received PINGs and QUERYs
– Increase Hops by 1; if Hops equals TTL, kill the message
– PONG and QUERY HIT are forwarded to the origin of the according PING or QUERY

(75)

5.1 Gnutella 0.4: Ping-Pong

[Figure: Gnode 2000 (IP 002) establishes a connection to Gnode 4000 (IP 004) in a network with Gnodes 1000 (IP 001) and 3000 (IP 003). Message sequence: Gnutella Connect, Gnutella OK, then PINGs are flooded and answered by PONGs carrying the addresses of the reached nodes (PONG/IP:004, PONG/IP:001, PONG/IP:003).]

(76)

5.1 Gnutella 0.4: Ping-Pong

[Figure: sample Gnutella 0.4 network of eight peers and the corresponding message sequence chart: Gnutella Connect/OK handshakes, followed by flooded PINGs and PONGs routed back.]

(77)

• Disadvantages

– High signaling traffic due to flooding
– Low-bandwidth nodes may become bottlenecks
– No search guarantees
– Overlay topology not optimal

• no complete view available
• no coordinator

– If not adapted to the physical structure, the network load is sub-optimal

• zigzag routes
• loops

• Advantages

– No single point of failure
– Can be adapted to the physical network
– Can provide anonymity

• routing anonymously, direct connection for transfer

• Application areas

– File sharing (Freenet, Gnutella, GNUnet)
– Context-based routing systems

5.1 Pure P2P: Discussion

(78)

• A major problem of pure P2P systems is the limited scalability

– Main reason: random network layout

→ Possibly degenerated network with high diameter and potentially small bisection width
→ Especially, weak nodes may easily become bottlenecks
→ Request messages often don't reach their intended destinations (TTL too short) or clog the whole network (TTL too long)

• Also, in reality, not all peers are equal

– e.g., weak modem peers constantly going on and off

5.2 Hybrid P2P

(79)

• Idea of, e.g., Gnutella 0.6: take advantage of "stronger" peers, minimize the damage "weak" peers can do

– Strong peers are promoted to super peers or ultra peers

• Have high uptime
• Possess high-bandwidth, low-latency network connections

• High computational power

• High storage capacity

5.2 Hybrid P2P

(80)

• Hybrid P2P uses a hierarchical network layout

– Super peers form a pure P2P network among themselves
– All other peers (leaf peers) attach directly to one super peer
– The super-peer network acts as a distributed file index

• Super peers request file lists from their leaf peers
• i.e., each super peer "knows" what is offered by its leaves

– Queries are distributed in the super-peer subnet only
– Combination of Pure and Central P2P!

5.2 Hybrid P2P

(81)

• Network characteristics, compared to pure P2P

– Hub-based network

→ Reduces the signaling load without reducing the reliability

– Election process to select and assign super peers

• Voluntarily or by statistics

– Super peers: high node degree (degree >> 20, depending on network size)
– Leaf nodes: connected to one or more super peers (degree < 7)

5.2 Hybrid P2P


(82)

5.2 Hybrid P2P

[Figure: sample hybrid P2P graph with super peers and leaf nodes; hub connections form a second hierarchy; a major component and separate sub-networks are visible.]

(83)

• Bootstrapping:

– Via bootstrap-server (hosted list from a web server)

• Contains super-peer addresses

– Via peer-cache (from previous sessions)

• Registration of each leaf node at the super peer it connects to

– e.g., it announces its shared files to the super peer
– Super peer updates its routing tables

→ Table containing which file is shared by which node

• Super peer may perform some load balancing

– Hand a peer over to another super peer if super peers are unbalanced
– Suggest a node to be promoted to a super peer

5.2 Hybrid P2P

(84)

• Routing

– Partly decentralized

• Leaf nodes send requests to a super peer
• The super peer distributes the request in the super-peer layer
• If a super peer has information about a matching file shared by one of its leaf nodes, it sends this information back to the requesting leaf node (backward routing)

– Hybrid protocol (reactive and proactive)

• Routes to content providers are only established on demand; content announcements go from leaf nodes to their super peers

– Routing within the super-peer layer is equal to pure P2P

5.2 Hybrid P2P

(85)

• Signaling connections (stable, as long as neighbors do not change):

– Based on TCP
– Keep-alive Ping-Pong
– Content search

• Content transfer connections (temporary):

– Based on HTTP
– Out-of-band transmission (directly between leaf nodes)

• Out-of-band ≡ not using signaling routes

5.2 Hybrid P2P

(86)

• Query Requests

– Leaf node sends the request to its super peer
– A super peer receiving a request looks up in its routing tables whether the content is offered by one of its leaf nodes

• If yes, a response message is returned to the request sender with information on the node offering the content

→ Back-track routing of responses is similar to pure P2P

– Additionally, the super peer forwards the request to the super-peer network per flooding

• Flooding similar to pure P2P, but messages remain in the super-peer network (i.e., TTL, hop counters, message caches, etc.)
• No query communication with leaves necessary due to the routing tables

5.2 Hybrid P2P: Routing
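The combination of proactive leaf announcements and reactive flooding in the super-peer layer can be sketched as follows. This is illustrative TypeScript, not the Gnutella 0.6 protocol; returning hits through the recursion is a simplification of the backward routing described above.

```typescript
// Sketch of hybrid (super-peer) query handling: check the local leaf
// routing table first, then flood the super-peer layer only.
class SuperPeer {
  leafIndex = new Map<string, Set<string>>(); // file -> leaf node IDs
  superNeighbors: SuperPeer[] = [];
  private seen = new Set<string>();           // message cache (GUIDs)

  constructor(public id: string) {}

  // Proactive part: leaves announce their shared files on registration.
  registerLeaf(leafId: string, files: string[]): void {
    for (const f of files) {
      if (!this.leafIndex.has(f)) this.leafIndex.set(f, new Set());
      this.leafIndex.get(f)!.add(leafId);
    }
  }

  // Reactive part: queries are flooded in the super-peer layer only;
  // leaves are never asked, their content is already in the index.
  query(msgId: string, file: string, ttl: number): string[] {
    if (this.seen.has(msgId) || ttl === 0) return [];
    this.seen.add(msgId);
    const hits = [...(this.leafIndex.get(file) ?? [])];
    for (const sp of this.superNeighbors) {
      hits.push(...sp.query(msgId, file, ttl - 1));
    }
    return hits;
  }
}

// Usage: two super peers; the file lives at a leaf of the second one.
const s1 = new SuperPeer("S1");
const s2 = new SuperPeer("S2");
s1.superNeighbors.push(s2);
s2.superNeighbors.push(s1); // cycles are harmless: the message cache stops them
s2.registerLeaf("L7", ["xyz.mp3"]);
console.log(s1.query("q-001", "xyz.mp3", 3)); // ["L7"]
```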

(87)

5.2 Hybrid P2P: Flooding

[Figure, slides 87-92: step-by-step query flooding in a hybrid network of leaf nodes and ultrapeers; a leaf node sends its query to an ultrapeer, the query is flooded within the ultrapeer layer only, and hits are routed back to the requesting leaf.]

(93)

5.2 Hybrid P2P: Ping-Pong

[Figure: sample Gnutella 0.6 network (super peers S1-S3, leaf nodes L1-L7) and the corresponding message sequence chart: Gnutella Connect/OK, PING/PONG exchanges, a QUERY flooded through the super-peer layer, and QUERYHIT responses routed back to the requesting leaf.]

(94)

5.2 Gnutella 0.6: Topology

[Figure: abstract network structure of a part of the Gnutella network (222 nodes) and the corresponding geographical view; the numbers depict the node numbers from the abstract view. Measured on 01.08.2002.]

(95)

• Disadvantages

– Still high signaling traffic because of decentralization
– No definitive statement possible whether content is unavailable or merely not found
– Overlay topology not optimal, as

• no complete view is available
• there is no coordinator

– Difficult to adapt fully to the physical network because of the hub structure

• Advantages

– No single point of failure
– Can provide anonymity

• Application areas

– File sharing (Gnutella, eDonkey, Kazaa)
– Context-based routing (see chapter about mobility)

5.2 Hybrid P2P: Discussion

(96)

Summary

• Client-Server

1. The server is the central entity and only provider of service and content; the network is managed by the server
2. The server is the higher-performance system
3. Clients are the lower-performance systems

Example: WWW

• Peer-to-Peer

1. Resources are shared between the peers
2. Resources can be accessed directly from other peers
3. A peer is both provider and requestor (servent concept)

P2P networks are classified as Unstructured P2P (Centralized, Pure, Hybrid) and Structured P2P (Pure, Hybrid)

• Centralized P2P

1. All features of Peer-to-Peer included
2. A central entity is necessary to provide the service
3. The central entity is some kind of index/group database

Example: Napster

• Pure P2P

1. All features of Peer-to-Peer included
2. Any terminal entity can be removed without loss of functionality
3. No central entities

Examples: Gnutella 0.4, Freenet

• Hybrid P2P

1. All features of Peer-to-Peer included
2. Any terminal entity can be removed without loss of functionality
3. Dynamic central entities

Examples: Gnutella 0.6, JXTA

(97)

Structured Peer-To-Peer Systems

– The Third Generation
– Distributed Hash Tables
– CAN, Chord, …

Next Lecture
