Peer-to-Peer
Data Management
Hans-Dieter Ehrich
Institut für Informationssysteme
Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de
7. Unstructured P2P Networks
The transparencies of this chapter are based on the package
Unstructured Peer-to-Peer Networks by
Wolf-Tilo Balke and Wolf Siberski 24.10.2007
● Original slides partially provided by
► Rüdiger Schollmeier
► Jörg Eberspächer
7. Unstructured P2P Networks
● History
● Overlay Network Characteristics
● Network Types
► Centralized P2P
► Pure P2P
► Hybrid P2P
Review: Driving Forces of P2P – File Sharing
● Sharing of otherwise unused resources
► Storage
► Bandwidth
► (Processing)
● No central control
► No single point of failure
► No administrative efforts
► Difficult to attack with judicial means
How It All Began: From Arpanet to Peer-to-Peer
1. How It All Began: From Arpanet to Peer-to-Peer
2. The Napster Story
3. Gnutella and its Relatives: Fully Decentralized Architectures
[Figure: most relevant P2P applications in the year 2001: 1) Freenet, 2) Buzzpad, 3) WuWu]
From ARPANET to Peer-to-Peer
● Late 1960s: Establishment of the ARPANET
► Goal: share computing resources and documents between US research facilities
► The logical network matched the physical network to a large extent
► Applications: FTP and Telnet, client/server model
► Central steering committee to organize the network
● 1979: Development of the Usenet protocol
► Newsgroup application to organize content
► Newsgroup server network exhibits P2P characteristics
Self-organizing approach to add and remove newsgroup servers
Fully distributed content replication
► Still a client/server application with respect to endpoints
● ~1990: Rush of the general public to join the Internet
► Applications following the client/server approach: WWW, email, streaming
► Straightforward model to administrate and control the content distribution
The Napster Story
● May 1999: Disruption of the Internet community, first generation of P2P
► Introduction of Napster
► Users not only consume and download content but also offer and provide content to other participants
► Users establish a virtual network, entirely independent of the physical network and of administrative authorities or restrictions
► Basis: UDP and TCP connections between the peers
● December 1999: RIAA files a lawsuit against Napster Inc.
► Target of the RIAA: the central lookup server of Napster
● February 2001: 2.79 billion files exchanged via the Napster network per month
● July 2001: Napster Inc. is convicted
► Napster has to stop the operation of the Napster server
► Napster network breaks down
► BUT: a number of promising successors are already available
Centralized Directory Model
● Centralized Directory Model
► The index service is provided centrally by a coordinating entity.
► Search request is issued to the coordinating entity, which delivers a list of peers having the desired files available for download.
► Requesting peer obtains the respective files directly from the peer offering them.
● Characteristics
► Lookup of existing documents can be guaranteed.
► Index service is a “Single Point of Failure”.
► Centralized P2P system
● Representative: Napster
Centralized Directory Model - Example
[Figure: Mr. Müller and Mr. Arayama each share several MP3 files on their clients. A central database holds the index of all shared files. The request "? Prince purple rain ?" is sent to the central database, which answers "! Prince purple rain ! @ Mr. Arayama". The request "? Prince purple rain ?" is then sent directly to Mr. Arayama, who answers "! Prince purple rain !".]
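The centralized directory model can be illustrated with a minimal sketch. The class `CentralIndex`, its method names, and the substring matching are assumptions made for illustration only; they are not taken from the Napster protocol.

```python
from collections import defaultdict


class CentralIndex:
    """Toy central directory: maps file names to the peers sharing them."""

    def __init__(self):
        self._peers_by_file = defaultdict(set)

    def register(self, peer, filenames):
        # A peer pushes its list of shared files to the index.
        for name in filenames:
            self._peers_by_file[name.lower()].add(peer)

    def unregister(self, peer):
        # Drop a departing peer from every entry.
        for peers in self._peers_by_file.values():
            peers.discard(peer)

    def search(self, keyword):
        # Return {file name: sorted list of peers} for matching file names.
        keyword = keyword.lower()
        return {
            name: sorted(peers)
            for name, peers in self._peers_by_file.items()
            if keyword in name and peers
        }


index = CentralIndex()
index.register("mueller", ["Prince - Purple Rain.mp3"])
index.register("arayama", ["Prince - Purple Rain.mp3", "Other Song.mp3"])
hits = index.search("purple rain")
```

The index only returns peer names; the download itself would happen directly between the peers, which is why the lookup is complete but the index is a single point of failure.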
Gnutella
● March 2000: Nullsoft releases Gnutella as an open source project
► Major developer: Gene Kan
► In addition to servent functionality, the peers also take over routing tasks
► Becomes extremely popular after Napster has to shut down
● October 2000: Introduction of hierarchical routing layers
► Gnutella 0.6: Ultrapeer concept
► Increases the scalability significantly
● A variety of similar fully decentralized P2P protocols followed soon:
► Audiogalaxy
► FastTrack/KaZaA
► iMesh
► Freenet
Flooded Request Model
● Flooded Request Model:
► Atomic P2P system
Without central coordination authority (all peers are equal).
Search request is passed on to neighbors.
If they cannot answer the request, they pass it on to other nodes until a predetermined search depth (TTL = time-to-live) is reached.
When the requested file has been located, positive search results are sent to the requesting entity.
The requesting peer can then download the desired file directly from the entity which is offering it.
● Characteristics:
► Fully decentralized, no central lookup server
No single point of failure or control
► Unreliable lookup (no guarantees)
► System does not scale
► Pure P2P system
● Representative: Gnutella 0.4
Flooded Request Model - Example
[Figure: Mr. Müller is searching for Prince. There is no central database; the request (?) is passed from neighbor to neighbor until it reaches Mr. Arayama, who serves Prince. Source: http://www.gnutelliums.com/]
Super-peer based Flooding
● Super-peer based Flooding:
► Hierarchical P2P system
Super-peers (Ultrapeers) form a pure P2P subnet
All other peers (leaf nodes) attach directly to one super-peer
Super-peer network acts as distributed file index
o Super-peers request file lists from their leaf peers
Queries are distributed in the super-peer subnet only
► Combination of Pure and Central P2P
● Characteristics:
► Fully decentralized, no central lookup server
No single point of failure or control
► System scales much better
► Unreliable lookup (but more success due to smaller network)
► Hybrid P2P system
● Representative: Gnutella 0.6
Super-peer based Flooding - Example
[Figure: Mr. Müller is searching for Prince. The request (?) is flooded in the super-peer subnet until it reaches the super-peer of Mr. Arayama, who serves Prince. Source: http://www.gnutelliums.com/]
Gnutella and its Relatives: The Story Goes on
● August 2001
► Users adapt very fast to the breakdown of Napster
► Already 3.05 billion files exchanged per month via the Gnutella network
● Year 2001
► Invention of structured P2P networks (regular instead of random network graph)
● August 2002
► Amount of exchanged data in KaZaA decreases, caused by a high number of corrupted files (reason: weak hash keys to identify files)
► eDonkey and Gnutella regain popularity
● May 2003
► BitTorrent is released
► Soon causes the majority of the observed traffic, due to its efficiency
● Middle of 2003
► Beyond the exchange of content, new concepts are developed to use P2P for other applications as well
► Skype, a Voice over P2P application, is developed
● Today:
► Major efforts are made to increase the reliability of P2P searches, also in mobile networks, …
► In 2005 eBay buys Skype to use the paradigm for the communication between bidders and sellers
1st and 2nd Generations of P2P
Client-Server
1. Server is the central entity and only provider of service and content. Network managed by the server
2. Server as the higher performance system
3. Clients as the lower performance system
Example: WWW
Peer-to-Peer
1. Resources are shared between the peers
2. Resources can be accessed directly from other peers
3. Peer is provider and requestor (servent concept)
Unstructured P2P: Centralized P2P, Pure P2P, Hybrid P2P
Structured P2P: Pure P2P, Hybrid P2P
Centralized P2P (unstructured)
1. All features of Peer-to-Peer included
2. Central entity is necessary to provide the service
3. Central entity is some kind of index/group database
Example: Napster
Pure P2P (unstructured)
1. All features of Peer-to-Peer included
2. Any terminal entity can be removed without loss of functionality
3. No central entities
Examples: Gnutella 0.4, Freenet
Hybrid P2P (unstructured)
1. All features of Peer-to-Peer included
2. Any terminal entity can be removed without loss of functionality
3. Dynamic central entities
Examples: Gnutella 0.6, JXTA
[Figure labels: 1st Gen., 2nd Gen.]
Unstructured P2P Networks
● History
● Overlay Network Characteristics
● Network Types
► Central P2P: Napster
► Pure P2P: Gnutella 0.4
► Hybrid P2P: Gnutella 0.6
Overlay Networks
● “Virtual” signaling network established via TCP connections between the peers
● Characteristics of the overlay topology:
► completely independent from physical network
► Separate addressing and routing scheme
► No relation between physical network edges and overlay network edges
► May include hierarchies (hub network) (e.g. rendezvous peers in JXTA)
► May include centralized elements (star network) (lookup server in Napster)
► May be a completely randomized network (Gnutella 0.4) (randomly meshed network)
► Overlay network can be seen as graph
Peers as nodes
Conceptual connections as edges
General Characteristics of 1st and 2nd Gen. P2P
1st and 2nd Generation P2P systems are overlay architectures, with the following characteristics:
► TCP/IP based
► Decentralized and self organizing (with possible centralized elements)
► Content:
Distributed “randomly” on the network, with several replicas (due to popularity)
Content stays at provider peer
Content transfer:
o Out of band, i.e. on separate connections and not via signaling connections
o Mostly via HTTP
► Employ distributed shared resources (data storage, bandwidth)
► Generally two kinds of requests:
Content requests: to find content in the overlay
Keep-alive requests: stay connected in the overlay
► Initially developed for file-sharing
► Various realizations exist
Basic Routing Behavior
● Request messages:
► Include a hop counter, a GUID (Globally Unique Identifier) and a TTL (Time-To-Live) in the header
► TTL determines how many hops a message may be forwarded
► Are flooded in the overlay network
Every node forwards every incoming message to all neighbors except the neighbor it received the message from
Exceptions: see below
► Request messages terminate if
A message of the same type with the same GUID is received more than once (loop!)
Hop counter = TTL
● Response messages:
► Include a hop counter, a GUID and a TTL (Time-To-Live) in the header
► GUID is the same as that of the initiating request message
► Are routed back to the requestor along the same path on which the request message reached the responding peer
Every peer has to store the GUID of each request for a certain amount of time
No flooding, to save resources
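The routing behavior described here can be sketched as a small simulation. This is only an illustration under simplifying assumptions: the overlay is a static adjacency dictionary, a single request (one GUID) is simulated, and messages are processed breadth-first; the function name `flood_query` and the sample graph are made up for this example.

```python
from collections import deque


def flood_query(graph, source, providers, ttl):
    """Flood one request from `source`; return reverse paths and message count."""
    # came_from doubles as the per-peer "seen GUID" table: a peer that has
    # already stored the request's GUID drops any further copy (loop avoidance).
    came_from = {source: None}
    queue = deque([(source, 0)])  # (peer, hop counter)
    hits = {}
    messages = 0  # every transmitted copy, including dropped duplicates
    while queue:
        peer, hops = queue.popleft()
        if peer in providers and peer != source:
            # The response is routed back hop by hop along the stored entries.
            path = [peer]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            hits[peer] = path
        if hops == ttl:
            continue  # hop counter reached the TTL: do not forward
        for neighbor in graph[peer]:
            if neighbor == came_from[peer]:
                continue  # never send back to the neighbor it came from
            messages += 1
            if neighbor not in came_from:
                came_from[neighbor] = peer
                queue.append((neighbor, hops + 1))
    return hits, messages


graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B", "F"],
    "E": ["C"],
    "F": ["D"],
}
hits, messages = flood_query(graph, "A", providers={"E", "F"}, ttl=2)
```

With a TTL of 2 only provider E is found although F also offers the content, which illustrates why lookups in a flooded overlay give no guarantees.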
Basic Bootstrapping
● Mostly not part of the protocol specification
● A new node needs to know at least one active participant of the network
● Otherwise the new node cannot participate in the overlay
● The address (TCP) of an active node can be retrieved by different means:
► Bootstrap cache: try to connect, one after another, to nodes seen in a previous session
► Bootstrap server:
Connect to a “well known host”, which almost always participates
Ask a bootstrap server to provide a valid address of at least one active node
Realizations:
o FIFO of all node addresses which recently used this bootstrap server (a node which just connected is assumed to be still active)
o Random pick of addresses which recently connected to the overlay via this server (+ no loops, - may be outdated)
► Broadcast on the IP layer
Use multicast channels
Use IP broadcasting (- limited to local network)
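The bootstrap cache and the two bootstrap-server realizations can be sketched as follows. All names (`BootstrapServer`, `bootstrap`, the `is_alive` callback) and the example addresses are hypothetical; real clients would attempt actual TCP connections instead of calling a predicate.

```python
from collections import deque
import random


class BootstrapServer:
    """Remembers the most recently seen peer addresses in a bounded FIFO."""

    def __init__(self, capacity=100):
        self._recent = deque(maxlen=capacity)

    def announce(self, address):
        # A node that just connected is assumed to be still active.
        if address in self._recent:
            self._recent.remove(address)
        self._recent.append(address)

    def newest(self, count=1):
        # FIFO realization: hand out the most recently announced addresses.
        return list(self._recent)[-count:][::-1]

    def random_pick(self, count=1, rng=random):
        # Random realization: spreads new nodes, but entries may be outdated.
        return rng.sample(list(self._recent), min(count, len(self._recent)))


def bootstrap(peer_cache, server, is_alive):
    """Try cached peers first, then fall back to the bootstrap server."""
    for address in list(peer_cache) + server.newest(count=5):
        if is_alive(address):
            return address
    return None


server = BootstrapServer(capacity=3)
for addr in ["10.0.0.1:6346", "10.0.0.2:6346", "10.0.0.3:6346", "10.0.0.4:6346"]:
    server.announce(addr)
entry = bootstrap(["10.0.0.9:6346"], server, lambda a: a.startswith("10.0.0.3"))
```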
Unstructured P2P Networks
● History
● Overlay Network Characteristics
● Network Types
► Central P2P: Napster
1. Basic Characteristics
2. Signaling Characteristics
3. Discussion
► Pure P2P: Gnutella 0.4
► Hybrid P2P: Gnutella 0.6
Definition of centralized P2P
● All peers are connected to a central entity
● Peers establish connections between each other on demand to exchange user data (e.g. MP3-compressed data)
● Central entity is necessary to provide network services
● Central entity is some kind of index/group database
● Central entity is lookup/routing table
Basic Characteristics of centralized P2P
● Bootstrapping: bootstrap server = central server
● Central entity can be established as a server farm, but with one single entry point = single point of failure (SPOF)
● All signaling connections are directed to the central entity
● Peer ↔ central entity: P2P protocol, e.g. Napster protocol
► To find content
► To log on to the overlay
► To register
► To update the routing tables
► To update shared content information
● Peer ↔ Peer: HTTP
► To exchange content/data
Topology of Centralized P2P
[Figure legend: servent; connection between 2 servents (TCP); connection between router & servent; connection between routers (core)]
Napster: How Does It Work
● Application-level, client-server protocol over point-to-point TCP
● Participants:
► Napster Hosts/peers
► Client Service
Login
Data-requests
Download-requests
► P2P Service
Data-transfer
► Napster Indexserver
Pure Server
● Five steps:
► Connect to Napster Server
► Upload your list of files (push) to server
► Query Indexserver with a list of keywords to search the full list with
► Select “best” of correct answers
► Connect to providing host/peer
[Figure: Napster hosts connected to the central Napster index server; data transfer takes place directly between Napster hosts]
Napster Message Structure
General Header Structure:
HEADER (4 bytes): <Payload Length> (2 bytes) | <Function> (2 bytes)
PAYLOAD
► <Function> describes the message type (e.g. login, search, …)
► PAYLOAD describes parameters of the message (e.g. IDs, keywords, …)
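The 4-byte header can be packed and parsed as in the following sketch. The little-endian byte order and the helper names `pack_message` and `unpack_message` are assumptions for illustration; the function code 0x02 (LOGIN) and the sample payload are the ones used in this chapter.

```python
import struct

# Assumption: both header fields are unsigned 16-bit values, little-endian.
HEADER = struct.Struct("<HH")

LOGIN = 0x02  # function code of the LOGIN message


def pack_message(function, payload):
    """Prefix `payload` (bytes) with <Payload Length> and <Function>."""
    return HEADER.pack(len(payload), function) + payload


def unpack_message(data):
    """Split a raw message into (function, payload)."""
    length, function = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return function, payload


payload = b'lkn 54332 6699 "nap v0.8" 9'
raw = pack_message(LOGIN, payload)
```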
Napster: Initialization
Napster host (IP: 001, Nick: LKN) ↔ central Napster index server (client/server service)
1: LOGIN (Function: 0x02)
Format: <Nick> <Password> <Port> <Client-Info> <Link-type>
Example: lkn 54332 6699 „nap v0.8“ 9
2: LOGIN ACK (Function: 0x03)
3: NOTIFICATION OF SHARED FILE (Function: 0x64)
Format: „<Filename>“ <MD5> <Size> <Bitrate> <Freq> <Time>
Example: „band - song.mp3“ 3f3a3... 5674544 128 44100 342
Napster: File Request Procedure
Napster host (IP: 002, Nick: MIT) ↔ central Napster index server
1: SEARCH (Function: 0xC8)
Format: [FILENAME CONTAINS „Search Criteria“] [MAX_RESULTS <Max>] [LINESPEED <Compare> <Link-Type>] [BITRATE <Compare> “<Bitrate>”] [FREQ <Compare> “<Freq>”]
Example: FILENAME CONTAINS „song“ MAX_RESULTS 100 LINESPEED „AT LEAST“ 6 BITRATE „AT LEAST“ „128“ FREQ „EQUAL TO“ „44100“
2: SEARCH RESPONSE (Function: 0xC9)
Format: „<Filename>“ <MD5> <Size> <Bitrate> <Freq> <Time> <Nick> <IP> <Link-Type>
Summary of Napster Signaling
Sample message sequence chart for one Napster server with one requesting peer (Req) and one providing peer (Prov):
Peer → Server: Login: [0x24|0x02|…]
Server → Peer: Login Ack: [0x00|0x03|…]
Peer → Server: Notif: [0x46|0x64|…] (three messages)
Req → Server: Search: [0x7E|0xC8|…]
Server → Req: Response: [0xC4|0xC9|…] (two messages)
Req → Prov: HTTP: GET[Filename]
Prov → Req: OK[data]
Napster: Wrap-Up I
1. Search Request
The user sends out a music file request and Napster searches its central database.
Napster: Wrap-Up II
2. Search Response
The Napster server sends back a list of peers that share the file.
Napster: Wrap-Up III
3. File Download
The requesting user downloads the file directly from the computer of another Napster user via HTTP.
Centralized P2P: Discussion
● Disadvantages
► Single point of failure, easily attackable
► Bottleneck
► Potential of congestion
► Central server in control of all peers
● Advantages
► Fast and complete lookup (one hop lookup)
► Central managing/trust authority
► No keep alive necessary, beyond content updates
● Application areas
► File Sharing
► VoIP (SIP, H.323)
► Conceptually: „Social Web‟ applications (eBay, YouTube, del.icio.us, etc.)
● Systems
► BitTorrent, Audiogalaxy, WinMX
Unstructured P2P Networks
● History
● Overlay Network Characteristics
● Network Types
► Central P2P: Napster
► Pure P2P: Gnutella 0.4
1. Basic Characteristics
2. Signaling Characteristics
3. Discussion
► Hybrid P2P: Gnutella 0.6
Definition of Pure P2P
● Any terminal entity can be removed without loss of functionality
● No central entities employed in the overlay
● Peers establish connections between each other randomly
► To route request and response messages
► To insert request messages into the overlay
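Model of Pure P2P Networks
Degree distribution:
$$p(d) = \begin{cases} c \cdot d^{-1.4}, & 0 < d \le 7 \\ 0, & \text{in any other case} \end{cases} \qquad \text{with } c = \left( \sum_{d} \frac{p(d)}{c} \right)^{-1}$$
average: $\bar{d} = 2.2$, $\operatorname{var}(d) = 1.63$
Corresponding sample graph:
[Figure: sample graph with one major component and separate sub networks]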
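As a quick numerical check of the degree distribution, the normalization constant and the average degree can be computed directly. This is a sketch that assumes integer degrees 1 to 7 and uses the fact that p(d)/c equals d^(-1.4), so c is the reciprocal of the sum of d^(-1.4); the function name is made up for this example.

```python
def degree_distribution(exponent=1.4, max_degree=7):
    """Return {d: p(d)} for p(d) = c * d**(-exponent) on d = 1..max_degree."""
    weights = {d: d ** -exponent for d in range(1, max_degree + 1)}
    c = 1.0 / sum(weights.values())  # normalization constant
    return {d: c * w for d, w in weights.items()}


p = degree_distribution()
average_degree = sum(d * prob for d, prob in p.items())
```

Under these assumptions the average degree comes out close to the value of 2.2 stated for the model.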
Basic Characteristics of Pure P2P
● Bootstrapping:
► Via bootstrap server (host list from a web server)
► Via peer cache (from previous sessions)
► Via well-known host
► No registration
● Routing:
► Completely decentralized
► Reactive protocol: routes to content providers are only established on demand, no content announcements
► Requests: flooding (limited by TTL and GUID)
► Responses: routed (backward routing with help of GUID)
● Signaling connections (stable, as long as neighbors do not change):
► Based on TCP
► Keep-alive
► Content search
● Content transfer connections (temporary):
► Based on HTTP
► Out of band transmission
Topology of Pure P2P
[Figure legend: servent; connection between 2 servents (TCP); connection between router & servent; connection between routers (core)]
Gnutella 0.4: How Does It Work
● Application-level, peer-to-peer protocol over point-to-point TCP
● Participants:
► Gnutella peers/servents
► Router Service
Flood incoming requests (regard TTL!)
o Keep alive
o Content
Route responses for other peers (regard GUID of message)
o Keep alive (PING/PONG)
o Content (QUERY/QUERYHIT)
Data-requests
Download-requests
► Lookup Service
Initialize Data requests
Initialize keep alive requests
► “Server”-Service
Serve Data-requests (HTTP)
● Five steps:
► Connect to at least one active peer (address received from bootstrap)
► Explore your neighborhood (PING/PONG)
► Submit Query with a list of keywords to your neighbors (they forward it)
► Select “best” of correct answers (which we receive after a while)
► Connect to providing host/peer
The Gnutella Network
[Figure: snapshot of the Gnutella network; nodes (G) are peers/servents, edges are TCP connections. Measurements taken at the LKN in May 2002]
Gnutella Message Structure
General Header Structure:
MESSAGEHEADER (23 Bytes): GnodeID (16 Bytes) | Function (1 Byte) | TTL (1 Byte) | Hops (1 Byte) | Payload Length (4 Bytes)
PAYLOAD
• GnodeID: unique 128-bit ID of any host
• Function: describes the message type (e.g. login, search, …)
• TTL (Time-To-Live): number of servents a message may pass before it is killed
• Hops: number of servents a message has already passed
• Payload: describes parameters of the message (e.g. IDs, keywords, …)
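The 23-byte message header can be expressed with a fixed struct layout, as in this sketch. The little-endian encoding of the payload length and the helper names are assumptions for illustration; the field sizes and the function codes (0x00, 0x01, 0x80, 0x81) are those listed in this chapter.

```python
import struct

# GnodeID (16 bytes), Function (1), TTL (1), Hops (1), Payload Length (4).
# Assumption: the payload length is encoded little-endian.
HEADER = struct.Struct("<16sBBBI")

PING, PONG, QUERY, QUERY_HIT = 0x00, 0x01, 0x80, 0x81


def pack_header(gnode_id, function, ttl, hops, payload_length):
    """Build the 23-byte message header."""
    return HEADER.pack(gnode_id, function, ttl, hops, payload_length)


def parse_header(data):
    """Decode the first 23 bytes of a message into a dictionary."""
    gnode_id, function, ttl, hops, length = HEADER.unpack_from(data)
    return {
        "gnode_id": gnode_id,
        "function": function,
        "ttl": ttl,
        "hops": hops,
        "payload_length": length,
    }


gnode_id = bytes(range(16))
header = pack_header(gnode_id, QUERY, ttl=7, hops=0, payload_length=12)
fields = parse_header(header)
```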
Gnutella Messages
PING (Function: 0x00)
No payload
PONG (Function: 0x01)
Port (2 Bytes) | IP Address (4 Bytes) | Nb. of shared Files (4 Bytes) | Nb. of Kbytes shared (4 Bytes)
QUERY (Function: 0x80)
Minimum Speed (2 Bytes) | Search Criteria (n Bytes)
QUERY HIT (Function: 0x81)
Nb. of Hits (1 Byte) | Port (2 Bytes) | IP Address (4 Bytes) | Speed (1 Byte) | Result Set (n Bytes) | GnodeID (16 Bytes)
Result Set entry: File Index (4 Bytes) | File Name (n Bytes)
Gnutella Routing
• Flooding: received PINGs and QUERYs must be forwarded to all connected Gnodes
• PINGs or QUERYs with the same FUNCTION ID and GNODE ID as previous messages are destroyed (avoid loops)
• Save the origin of received PINGs and QUERYs
• Increase Hops by 1
• If Hops equals TTL, kill the message
• PONG and QUERY HIT are forwarded to the origin of the corresponding PING or QUERY
• Basic routing principle: „Enhanced“ flooding
Gnutella Connection Setup
[Figure: four Gnodes: ID 1000 (IP: 001), ID 2000 (IP: 002), ID 3000 (IP: 003), ID 4000 (IP: 004). Gnode 2000 establishes a connection to Gnode 4000.]
Numbered messages in the figure:
17 Gnutella Connect
18 Gnutella OK
19 PING
20 PONG/IP:004
21 PING
22 PING
23 PONG/IP:001
24 PONG/IP:003
25 PING
26 PING
27 PONG/IP:001
28 PONG/IP:003
Summary of the Signaling in Gnutella 0.4
[Figure: sample Gnutella 0.4 network with peers 1 to 8]
[Figure: sample message sequence chart according to the sample network, for Peer1 to Peer8: connection setup (Gnu-Con, OK) followed by PING and PONG messages]
[Figures for Wrap-Up I to VI: requesting servent and other servents; legend: Query […], Query-Hit […], Requested Data]
Gnutella Wrap-Up I
• Broadcast Query[XYZ, TTL = 3, …]
Gnutella Wrap-Up II
• 1. Query-Hit
• Broadcast Query[XYZ, TTL = 2, …]
Gnutella Wrap-Up III
• 2. Query-Hit
• Broadcast Query[XYZ, TTL = 1, …]
Gnutella Wrap-Up IV
• 3. + 4. Query-Hit
• [TTL = 0] no further broadcast
Gnutella Wrap-Up V
• Establish HTTP connection
Gnutella Wrap-Up VI
• HTTP connection: Get[XYZ, …, …]
• Download data
Discussion
● Disadvantages
► High signaling traffic, because of decentralization
► Modem nodes may become bottlenecks
► Overlay topology not optimal, as
no complete view available,
no coordinator
► If the overlay is not adapted to the physical structure, delay and total network load increase
Zigzag routes
Loops
● Advantages
► No single point of failure
► Can be adapted to physical network
► Can provide anonymity
► Can be adapted to special interest groups
● Application areas
► File-sharing
► Context based routing (see chapter about mobility)
● Systems
► Freenet, Gnutella, GNUnet
Unstructured P2P Networks
● History
● Overlay Network Characteristics
● Network Types
► Central P2P: Napster
► Pure P2P: Gnutella 0.4
► Hybrid P2P: Gnutella 0.6
1. Basic Characteristics
2. Signaling Characteristics
3. Discussion
Definition of Hybrid P2P
● Main characteristic, compared to pure P2P: introduction of another dynamic hierarchical layer
● Hub based network
● Reduces the signaling load without reducing the reliability
● Election process to select and assign Superpeers
● Superpeers: high degree (degree >> 20, depending on network size)
● Leafnodes: connected to one or more Superpeers (degree < 7)
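Model of Hybrid P2P Networks
Degree distribution:
$$p(d) = \begin{cases} c \cdot d^{-1.4}, & 1 < d \le 7 \\ c \cdot 1^{-1.4} - 0.05, & d = 1 \\ c \cdot 0.05, & d = 20 \\ 0, & \text{in any other case} \end{cases} \qquad \text{with } c = \left( \sum_{d} \frac{p(d)}{c} \right)^{-1}$$
average: $\bar{d} = 2.8$, $\operatorname{var}(d) = 3.55$
Corresponding sample graph:
[Figure: sample graph with Superpeers and leafnodes, showing the major component, separate sub networks, and hub connections (2nd hierarchy)]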
Basic Characteristics of Hybrid P2P
● Bootstrapping:
► Via bootstrap-server (host list from a web server)
► Via peer-cache (from previous sessions)
► Via well-known host
► Registration of each leafnode at the Superpeer it connects to, i.e. it announces its shared files to the Superpeer
● Routing:
► Partly decentralized
Leafnodes send request to a Superpeer
Superpeer distributes this request in the Superpeer layer
If a Superpeer has information about a matching file shared by one of its leafnodes, it sends this information back to the requesting leafnode (backward routing)
► Hybrid protocol (reactive and proactive): routes to content providers are only established on demand;
content announcements from leafnodes to their Superpeers
► Routing within Superpeer layer equal to Pure P2P
● Signaling connections (stable, as long as neighbors do not change):
► Based on TCP
► Keep-alive
► Content search
● Content transfer connections (temporary):
► Based on HTTP
► Out of band transmission (directly between leafnodes)
Gnutella 0.6 Network Organization
New connection/network setup
► Upon connection to the network via a Superpeer, each node is a leafnode
► It announces its shared content to the Superpeer it connected to
► The Superpeer thus updates its routing tables
► An election mechanism decides which node becomes a Superpeer or a leafnode (depending on capabilities such as storage and processing power, network connection, the uptime of a node, …), if
Too many nodes are connected to one Superpeer
A Superpeer leaves the network
Too few nodes are connected to a Superpeer
Gnutella 0.6 Routing
● Content requests:
► Leafnode sends request to Superpeer
► Superpeer looks up in its routing tables whether the content is offered by one of its leafnodes. In this case the request is forwarded to this node.
► Additionally the Superpeer increases the hop counter and forwards this request to the Superpeers it is connected to.
► To enable backward routing, the peer has to store the GUID of the message together with the information from which peer it received the request in the previous hop
► If a Superpeer receives such a request from another Superpeer, this request is handled the same way as if it had been received from one of its leafnodes
► After the hop counter of the request reaches the TTL value it is not forwarded any further (prevents cycles)
● Content responses:
► If a leafnode receives a request, it double-checks whether it shares the file (should be the case, as long as the routing tables of the Superpeer are correct)
► In case of success, the leafnode sends a content reply back to the requesting peer, by sending it back to that node (Superpeer) it received the message from (backward routing)
► Hop by hop the message can thus be routed back to the requesting node
● Content exchange:
► Directly between the leafnodes, via HTTP connections
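The routing within the Superpeer layer can be sketched as follows. This is a simplified illustration, not the Gnutella 0.6 implementation: the Superpeer answers directly from its routing table instead of forwarding the request to the leafnode for a double-check, backward routing is reduced to returning the result list, and the names `SuperPeer`, `link`, and `route_query` are made up for this example.

```python
from collections import deque


class SuperPeer:
    def __init__(self, name):
        self.name = name
        self.neighbors = []  # other Superpeers
        self.index = {}      # file name -> set of leafnode names

    def announce(self, leaf, filenames):
        # A leafnode registers its shared files (content announcement).
        for name in filenames:
            self.index.setdefault(name, set()).add(leaf)


def link(a, b):
    """Connect two Superpeers in both directions."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def route_query(entry, filename, ttl):
    """Flood a leafnode's request through the Superpeer layer only."""
    seen = {entry.name}          # Superpeers that already handled this request
    queue = deque([(entry, 0)])  # (Superpeer, hop counter)
    hits = []
    while queue:
        sp, hops = queue.popleft()
        hits.extend((sp.name, leaf) for leaf in sorted(sp.index.get(filename, ())))
        if hops == ttl:
            continue  # hop counter reached the TTL value: stop forwarding
        for neighbor in sp.neighbors:
            if neighbor.name not in seen:
                seen.add(neighbor.name)
                queue.append((neighbor, hops + 1))
    return hits


s1, s2, s3 = SuperPeer("S1"), SuperPeer("S2"), SuperPeer("S3")
link(s1, s2)
link(s2, s3)
s1.announce("L1", ["a.mp3"])
s2.announce("L4", ["song.mp3"])
s3.announce("L6", ["song.mp3"])
near = route_query(s1, "song.mp3", ttl=1)
far = route_query(s1, "song.mp3", ttl=2)
```

Leafnodes never see the flooded request in this sketch; only the three Superpeers exchange signaling messages, which is where the reduction in signaling load comes from.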
Topology of Hybrid P2P
[Figure, left: abstract network structure of a part of the Gnutella network (222 nodes), measured on 01.08.2002; labeled nodes: 3, 7, 18, 39, 43, 100, 116, 118. The geographical view is given by the figure on the right.]
[Figure, right: geographical view of a part of the Gnutella network (222 nodes), measured on 01.08.2002; the numbers depict the node numbers from the abstract view on the left.]
• Virtual network not matched to physical network. See path from node 118 to node 18.
• Superpeer (hub) structure clearly visible in abstract view
Gnutella 0.6 Messages
● Content requests and responses
► QUERY (defined as in Gnutella 0.4)
► QUERY_HIT (defined as in Gnutella 0.4)
● Keep alive:
► PING (defined as in Gnutella 0.4)
► PONG (defined as in Gnutella 0.4)
● Announcement of shared content:
► ROUTE_TABLE_UPDATE (0x30), Reset variant (0x0): to clear the routing table and to set a new routing table for one leafnode
► ROUTE_TABLE_UPDATE (0x30), Patch variant(0x1): to update and set a new routing table with a certain number of entries (e.g. new shared files)
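Reset variant layout (byte offsets): Variant (byte 0) | Table_Length (bytes 1 to 4) | Infinity (byte 5)
Patch variant layout (byte offsets): Variant (byte 0) | Seq_No (byte 1) | Seq_Size (byte 2) | Compressor (byte 3) | Entry_Bits (byte 4) | DATA (bytes 5 to n+4)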
Summary of the Signaling in Gnutella 0.6
[Figure: sample Gnutella 0.6 network with Superpeers S1, S2, S3 and leafnodes L1 to L7]
[Figure: sample message sequence chart according to the sample network: connection setup (Gnu-Con, OK), route table update (RTU), PING and PONG, then QUERY and QUHIT messages]
Gnutella 0.6: How Does It Work I to V
[Figures: sequence of diagrams showing leafnodes and ultrapeers]
Discussion
● Disadvantages
► Still high signaling traffic, because of decentralization
► No definitive statement possible whether content is not available or just not found
► Overlay topology not optimal, as
no complete view available,
no coordinator
► If the overlay is not adapted to the physical structure, delay and total network load increase
Zigzag routes
Loops
► Difficult to adapt to physical network completely because of hub structure
● Advantages
► No single point of failure
► Can provide anonymity
► Can be adapted to special interest groups
● Application areas
► File-sharing
► Context based routing (see chapter about mobility)
● Systems
► Gnutella, eDonkey, Kazaa
Topology combinations
● Each approach comes with a different set of advantages/disadvantages
► Suitability depends on application context
● Combination of approaches
► Use different techniques for different application aspects
► Example: Skype
Centralized P2P for Login/Account Mgmt.
o Routed by super-nodes if necessary
Attempts to establish direct Voice over IP connections
Hybrid P2P to route through firewall, between NATs, etc.
Figure from Salman A. Baset and Henning G. Schulzrinne: An Analysis of the Skype Peer-to-Peer Internet Telephony Protocol, INFOCOM 2006
Outlook
● Structured Networks
► Distributed Hash Table (DHT) Basics
► DHT Algorithms
► DHT Dynamics