
Wolf-Tilo Balke Christoph Lofi

Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

Distributed Data Management

5.0 Introduction and History
5.1 The First Generation
  Centralized P2P
  Pure P2P
5.2 The Second Generation
  Hybrid P2P


5 Unstructured P2P Networks

Peer To Peer (P2P) Systems

P2P systems were popularized in 1999 by Napster, a system for sharing MP3s

Base Problem: How can resources easily be shared within a highly volatile and decentralized network of independent and autonomous peers (nodes)?

There is a (potentially) large number of peers

Peers may join or leave the network any time

Only rudimentary features necessary


5.0 Peer-To-Peer Systems

What can be shared?

Information: file & document sharing

Bandwidth: load balancing, shared bandwidth

Storage space: DAS, NAS, SAN, storage networks

Computing power: high-performance computing


5.0 Peer-To-Peer Systems

What is a P2P network?

A virtual overlay network for sharing resources

Virtual and physical network are logically independent

Mostly IP based

Usually decentralized and self-organizing

Peers can transfer data directly without intermediate servers


5.0 What is Peer-To-Peer?

“Virtual” signaling network established via TCP connections between the peers

Characteristics of the overlay topology:

Completely independent from the physical network

Separate addressing and routing scheme

No relation between physical network edges and overlay network edges

Overlay network can be seen as graph

Peers as nodes

Conceptual connections as edges


5.0 Overlay Networks


The topology of the overlay network may show different properties

May be centralized or decentralized

May use strict structures or may be unstructured

May be flat or organized in hierarchies

We will use these properties later to classify P2P systems!

In this lecture only unstructured networks

Next lecture: structured networks


5.0 Overlay Networks

P2P technology was enabled by various technological and social developments

Performance increase of home users' personal computers

When P2P systems became established in 1999, the average computing performance of a home PC was comparable to that of high-end servers of the late 80s

General availability of high-speed Internet

In 1999, DSL connections were introduced

Flat rate models gained momentum


5.0 Towards P2P

Late 1960s: Establishment of the ARPANET ("Advanced Research Projects Agency Network")

Based on the concept of the "Intergalactic Computer Network" of Prof. J.C.R. Licklider

Funded by DARPA

Share computing resources and documents between US research facilities

The rumor that ARPANET was built in order to control the military after a nuclear war is NOT true!

Most popular applications

Email (1971), FTP (1974), and Telnet (1969) – client/server model

Central steering committee to organize the network

Later became "the Internet"


5.0 Towards P2P

1979: Development of the UseNet protocol

Newsgroup application to organize content

Newsgroup server network exhibits some P2P characteristics

No central server but a server network (compare to super-peer-networks)

Clients only communicate with a server which may reroute the requests to other servers

Different groups for different content

Initially, only text messages

Later: infamous binary groups, usually distributing copyrighted music, software, or movies

5.0 Towards P2P

~1990 rush of the general public to join the Internet

The WWW is invented at CERN by Tim Berners-Lee

Centrally hosted, interlinked websites are state of the art

Illegal file sharing using warez sites…

5.0 Towards P2P


Northeastern University, Boston, June 1999

Shawn Fanning (19) and Sean Parker (20) invent Napster

Problem: Both liked to share music and software (for free…)

But: warez sites, UseNet binary groups, and IRC bots were very painful to use

Bad search, broken links, tiny retention caches, low bandwidth, etc.

Idea: establish a system offering powerful search capabilities, no broken links, and performance which increases with the number of users!


5.0 Towards P2P

Basic Idea of Napster

Users store music on their home PCs

Users connect to the Napster server and provide a list of all songs they currently have

Users can query the Napster server for any song

Result: a list of all users currently possessing that song

The user can download the song directly from another user

Peer-to-Peer!


5.0 Towards P2P


Napster Inc. initially aimed at being a market place for digital music

Like iTunes today

Napster tried multiple times to establish usage agreements with record labels, but they failed

No legal business model for selling single songs possible

Labels felt threatened by Napster’s fast growth

Negotiations were stopped by the labels


5.0 Towards P2P

December 1999:

RIAA files a lawsuit against Napster Inc.

Target of the RIAA: the central lookup server of Napster

February 2001:

2.79 billion files exchanged via the Napster network per month

July 2001: Napster Inc. is convicted

Napster has to stop the operation of the Napster server

The Napster network breaks down

BUT: Already a number of promising successors available


5.0 Towards P2P

May 2002:

Bertelsmann tries to buy Napster's assets for $85 million

An American court blocks the transaction and forces Napster to liquidate all assets

Roxio buys the logo and name in the bankruptcy auction

Roxio owned an iTunes-like store called "pressplay", which was rebranded under the Napster name

Launch of new Napster in October 2003 as a centralized paid-subscription service

Not very successful because it launched shortly after iTunes and without hardware support


5.0 Towards P2P


Generally, the RIAA lawsuit is considered a big failure

Napster could have become an early iTunes, if labels had cooperated

The lawsuit gave birth to even more dangerous software, e.g., Gnutella

Open source

Fully decentralized

A Gnutella network cannot be shut down

No company to sue, no servers to disconnect, …

P2P piracy became even stronger after Napster was convicted due to publicity


5.0 Towards P2P

The “hot” years for P2P have been 1999-2008

In 2006, nearly 70% of all network traffic was attributed to P2P traffic

Nowadays, P2P traffic declines in favor of video streaming and social networks...


5.0 P2P Development

First generation peer-to-peer networks tried simple paradigms to build the network

Centralized directory model:

all content listed in a central directory whose server also is used as a central point of connection

Pure peer-to-peer model:

there is no central authority, and peers only connect to neighbors in the network


5.1 The First Generation

Centralized directory model

Index service provided centrally by a coordinating entity

Search requests are issued to the coordinating entity

Returns a list of peers having the desired files available for download

Requesting peer obtains respective files directly from the peer offering them

Characteristics

Lookup of existing documents can be guaranteed

Index service as single point of failure

Representative: Napster
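To make the centralized directory model concrete, here is a minimal sketch (Python; class and method names are hypothetical, Napster's actual binary protocol is shown further below): the index server only maps peers to their announced files, while the download itself happens peer-to-peer.

class CentralIndex:
    """Sketch of a centralized P2P directory (Napster-style index server)."""

    def __init__(self):
        self.files_by_peer = {}   # peer address -> list of shared filenames

    def register(self, peer_address, filenames):
        """A peer logs in and announces the files it currently shares."""
        self.files_by_peer[peer_address] = list(filenames)

    def unregister(self, peer_address):
        """A peer leaves the network; its entries disappear from the index."""
        self.files_by_peer.pop(peer_address, None)

    def search(self, query):
        """Return (peer, filename) pairs whose name contains every keyword."""
        keywords = query.lower().split()
        return [(peer, name)
                for peer, names in self.files_by_peer.items()
                for name in names
                if all(k in name.lower() for k in keywords)]

# Usage: peers register, a requester queries the server and then downloads
# the file directly from one of the returned peers (that part is P2P).
server = CentralIndex()
server.register("192.0.2.10:6699", ["Prince - Purple Rain.mp3"])
server.register("192.0.2.20:6699", ["Other Artist - Song.mp3"])
print(server.search("prince purple rain"))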


5.1 Centralized Directories

5.1 Example: Napster

Figure: Napster example. A central database holds an index of all shared files. Peer X and Peer Y each share several MP3 files on their clients. A peer queries the index for "Prince – Purple Rain", the server answers with "@ Peer Y", and the file is then downloaded directly from Peer Y.

All peers are connected to a central entity

The central entity is necessary to provide network services

Joining the network: the central server is also the bootstrap server

The central entity can be established as a server farm, but there is one single entry point (also a single point of failure)

All signaling connections are directed to the central entity

The central entity is some kind of index/group database

The central entity has a lookup/routing table

Peers establish connections between each other on demand to exchange user data

5.1 Centralized P2P


Peer ↔ central entity: special P2P protocol, e.g., the Napster protocol

– Registering/logging on to the overlay
– Finding content
– Updating shared content information
– Updating the routing tables

Peer ↔ Peer: HTTP

– Exchanging the actual content

5.1 Protocols Used


5.1 Centralized Topology

Figure: centralized topology. Legend: peer; connection between two peers (TCP); connection between router & peer; connection between routers (core).

Application-level, client-server protocol over point-to-point TCP

Participants: Napster hosts/peers and the Napster index server (a pure server)

Client/server service: login, data requests, download requests

P2P service: data transfer


5.1 How Does Napster Work?

Figure: Napster hosts connect to the central Napster index server for signaling; data transfer happens directly between the Napster hosts.

General Header Structure

5.1 Napster Messages

HEADER (4 bytes): <Payload Length> (2 bytes), <Function> (2 bytes)
PAYLOAD: parameters of the message (e.g., IDs, keywords, …)

<Function> describes the message type (e.g., login, search, …)
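As an illustration of this framing, a minimal sketch in Python; the field layout follows the slide, and little-endian byte order is an assumption here (the slide does not state it).

import struct

def pack_message(function, payload=b""):
    """Build a Napster-style message: 2-byte payload length,
    2-byte function code, then the payload (little-endian assumed)."""
    return struct.pack("<HH", len(payload), function) + payload

def unpack_message(data):
    """Split a received message into (function, payload)."""
    length, function = struct.unpack("<HH", data[:4])
    return function, data[4:4 + length]

# Example: a LOGIN message (function 0x02) with a space-separated payload,
# as in the initialization example below.
login = pack_message(0x02, b'lkn 54332 6699 "nap v0.8" 9')
print(unpack_message(login))   # (2, b'lkn 54332 6699 "nap v0.8" 9')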

5.1 Napster Initialization

1: LOGIN (Function: 0x02)
Payload: <Nick> <Password> <Port> <Client-Info> <Link-Type>
Example: lkn 54332 6699 "nap v0.8" 9

2: LOGIN ACK (Function: 0x03)

3: NOTIFICATION OF SHARED FILE (Function: 0x64)
Payload: "<Filename>" <MD5> <Size> <Bitrate> <Freq> <Time>
Example: "band - song.mp3" 3f3a3... 5674544 128 44100 342

Figure: a Napster host (IP 001, nick "LKN") performs this exchange with the central Napster index server (client/server service).


5.1 Napster: File Requests

1: SEARCH (Function: 0xC8)
Payload: [FILENAME CONTAINS Search Criteria] [MAX_RESULTS Max] [LINESPEED <Compare> Link-Type] [BITRATE <Compare> Bitrate] [FREQ <Compare> Freq]
Example: FILENAME CONTAINS song MAX_RESULTS 100 LINESPEED AT_LEAST 6 BITRATE AT_LEAST 128 FREQ EQUAL_TO 44100

2: SEARCH RESPONSE (Function: 0xC9)
Payload: Filename <MD5> <Size> <Bitrate> <Freq> <Time> <Nick> <IP> <Link-Type>

Figure: message exchange between a Napster host (IP 002, nick "MIT") and the central Napster index server.


Sample message sequence chart for a Napster server with one requesting and one providing peer

5.1 Napster Signaling

Figure: message sequence chart. The requesting peer and the providing peer each log in to the Napster server (Login [0x24|0x02|…], Login Ack [0x00|0x03|…]) and announce their shared files (Notification [0x46|0x64|…]). The requesting peer sends a search (Search [0x7E|0xC8|…]) and receives responses (Response [0xC4|0xC9|…]). It then contacts the providing peer directly via HTTP (GET [Filename]) and downloads the data (OK [data]).

5.1 Wrap-Up

Search request: the user sends out a music file request and Napster searches its central database

Search response: the Napster server sends back a list of peers that share the file

File download: the requesting user downloads the file directly from the computer of another Napster user via HTTP

Advantages

Fast and complete lookup (one-hop lookup)

Central managing/trust authority

Easy bootstrapping

Disadvantages

The single point of failure is a bottleneck and makes the system easily attackable

Central server in control of all peers

Usage

Application areas: file sharing, VoIP (SIP, H.323)

Systems: Skype, Audiogalaxy, WinMX

5.1 Discussion: Centralized P2P

March 2000: Nullsoft releases Gnutella for free

Nullsoft planned to release the source code under GPL license a couple of days later

Developed by Justin Frankel and Tom Pepper

Nullsoft's parent company AOL cancels the distribution and further development of Gnutella a day after its release

AOL had merged with Time Warner shortly after buying Nullsoft for $100M

The Gnutella protocol is reverse engineered and distributed under GPL license

Many compatible clients and various forks of Gnutella are developed

Became extremely popular after Napster had to shut down

5.1 Gnutella


August 2001

Users adapt very fast to the breakdown of Napster

Already 3.05 billion files exchanged per month via the Gnutella network

2001

Invention of structured and hybrid P2P networks

Gnutella scaled badly, new network paradigms necessary

e.g., KaZaA, which quickly gains popularity

August 2002

Amount of exchanged data in KaZaA (FastTrack) decreases, caused by a high number of defective files

Weak hash keys to identify files provoked file collisions

eDonkey and Gnutella regain popularity


5.1 Gnutella

May 2003

BitTorrent is released

BitTorrent quickly becomes most popular file sharing protocol

Middle of 2003

Beyond the exchange of content, new concepts are developed to use P2P also for other applications

Skype, a voice-over-P2P application, is developed

2005:

Major efforts are made to increase the reliability of P2P searches, also in mobile networks, …


5.1 Gnutella

In 2005, eBay buys Skype for $2.6 billion to use the paradigm for communication between bidders and sellers

In 2009, an investor group buys 65% of Skype for $1.9 billion

Plans are to turn Skype into a company of its own again in 2010

13% of all international phone calls are handled by Skype in 2010


5.1 Skype

Which protocols are used?

Traffic measured between 2002 and 2004 in the Abilene backbone


5.1 Gnutella

The base idea of Gnutella was to avoid the weaknesses of Napster

Result:

Fully decentralized, unstructured and flat P2P network

Initially: All peers are equal!

Gnutella 0.4

Thus called a pure P2P system


5.1 Gnutella

Pure P2P systems have the following characteristics

Decentralized

There is no central authority

Any peer can be removed without loss of functionality

Unstructured

Overlay network is constructed randomly without any structure

All peers are equal


5.1 Pure P2P

Figure: pure P2P topology. Legend: peer; connection between two peers (TCP); connection between router & peer; connection between routers (core).

5.1 Pure P2P: Graphs

Figure: sample graph of a pure P2P overlay, showing a major component and separate sub-networks.

To query pure networks, a Flooded Request Model is used

Search request is passed on to neighbors.

Neighbors forward the message to their respective neighbors

When a node can answer the request, a result notification is sent to the requesting node

The requesting peer can then establish a direct connection to any peer which sent a result notification


5.1 Pure P2P: Flooding

Request flooding relies on message forwarding

If a peer receives a request, it usually forwards the request to all its neighbors

Forwarding every message to all neighbors in an uncontrolled fashion will soon overload the network

One node could spam the whole network

Messages can be caught in infinite cycles

Restrictions needed!

Each message has a maximum time-to-live (TTL) and a hop counter

The hop counter is initially set to 0

Each forwarded message has its hop counter increased by 1

A message with TTL = hop counter is not forwarded and dies

TTL thus limits the maximum distance a message can travel

Prevents spamming the whole network
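A minimal sketch (Python, with illustrative names) of the TTL- and hop-limited forwarding described above:

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Query:
    guid: str       # globally unique message id
    keywords: str   # search criteria
    ttl: int        # maximum distance (in hops) the message may travel
    hops: int = 0   # hops already travelled

class Peer:
    """Hypothetical peer: just enough state to demonstrate flooding."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []

    def receive(self, query):
        print(f"{self.name} received {query.guid} (hops={query.hops})")
        # ... check local shared files, possibly send a result notification ...
        self.forward(query)

    def forward(self, query):
        if query.hops >= query.ttl:
            return  # TTL reached: the message dies here
        aged = replace(query, hops=query.hops + 1)
        for neighbor in self.neighbors:
            neighbor.receive(aged)

# Tiny chain A - B - C: with TTL=2 the query reaches C but travels no further.
# Note: without a message cache (next slide) this would loop on cyclic topologies.
a, b, c = Peer("A"), Peer("B"), Peer("C")
a.neighbors, b.neighbors = [b], [c]
a.forward(Query(guid="q1", keywords="purple rain", ttl=2))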


5.1 Pure P2P: Flooding

Every message which is forwarded is cached by the forwarding peer for a short time

Message cache is used to prevent message cycles

Don’t forward a message which you already forwarded!


5.1 Pure P2P: Flooding

Response messages are routed back to the original requester using the same message trail

Use message caches to perform back-tracking

For each forwarded message stored in the cache, also store the node from which the message was received

If a response message is received, look up the respective request message in cache

Forward response to the node which sent in the request
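A minimal sketch (Python; the cache structure and names are assumptions, not part of the Gnutella wire format) of how a forwarding peer could combine the message cache for duplicate suppression with backward routing of responses:

class ForwardingState:
    """Per-peer routing state: which messages were already seen,
    and from which neighbor each request arrived (for backward routing).
    In practice, entries are expired again after a short time."""

    def __init__(self):
        self.seen = set()     # message ids already forwarded
        self.came_from = {}   # message id -> neighbor that sent the request

    def handle_request(self, msg_id, sender, neighbors):
        """Return the neighbors the request should be forwarded to,
        or an empty list if it is a duplicate."""
        if msg_id in self.seen:
            return []          # already forwarded: break the cycle
        self.seen.add(msg_id)
        self.came_from[msg_id] = sender
        return [n for n in neighbors if n != sender]

    def handle_response(self, msg_id):
        """Look up where the matching request came from;
        the response is sent back along that edge."""
        return self.came_from.get(msg_id)

# Usage: peer B sits between A (the requester) and C, D.
state = ForwardingState()
print(state.handle_request("q1", "A", ["A", "C", "D"]))   # ['C', 'D']
print(state.handle_request("q1", "C", ["A", "C", "D"]))   # [] (duplicate)
print(state.handle_response("q1"))                         # 'A'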

5.1 Pure P2P: Flooding

Effects of flooding technique

Fully decentralized, no central lookup server

No single point of failure or control

Unreliable lookup (no guarantees)

Time-to-live limits the maximum distance a query can travel

Query may be restricted to a sub-network not containing the desired results

System doesn't scale well

The number of messages increases drastically with the number of peers

Peers with low-bandwidth connections are rendered useless

5.1 Pure P2P: Flooding


5.1 Pure P2P: Flooding

Figure (example sequence): the requesting peer broadcasts Query[XYZ, TTL = 3, …] to its neighbors. Each neighbor forwards the query with a decreased TTL (TTL = 2, then TTL = 1); peers holding matching content return Query-Hit messages along the reverse path (1st, 2nd, then 3rd and 4th Query-Hit). At TTL = 0 the query is no longer broadcast. Finally, the requesting peer establishes a direct HTTP connection to a responding peer (Get[XYZ, …]) and downloads the requested data.


How can a new node join a pure P2P network?

No central server

The network is volatile

Bootstrapping necessary

Usually not part of the protocol specification

Implemented by client

Necessary to know at least one active participant of the network

Otherwise, participation in the overlay is not possible for a new node


5.1 Pure P2P: Bootstrapping

The address of an active node can be retrieved by different means

Bootstrap cache

Try to establish a connection to any node known from the last user session

Stable nodes

Connect to a "well-known host" which is usually always in the network

Bootstrap server

Ask a bootstrap server to provide a valid address of at least one active node

Realizations:

FIFO of all node-addresses which recently used this bootstrap (a node which just connected is assumed to be still active)

Random pick of addresses which recently connected via this server to the overlay

No loops, but addresses may be outdated
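A minimal sketch (Python; names are illustrative) of the FIFO-based bootstrap server realization described above: each node that connects is remembered, and newcomers receive the most recently seen addresses on the assumption that they are still active.

from collections import deque

class BootstrapServer:
    """Keeps a FIFO of node addresses that recently contacted this server."""

    def __init__(self, max_entries=1000):
        self.recent = deque(maxlen=max_entries)

    def node_connected(self, address):
        """Called whenever a node uses this bootstrap server."""
        if address in self.recent:
            self.recent.remove(address)
        self.recent.append(address)

    def get_bootstrap_candidates(self, count=5):
        """Return the most recently seen addresses; a node that just
        connected is assumed to be still active (but may be outdated)."""
        return list(self.recent)[-count:]

# Usage
server = BootstrapServer()
for addr in ["198.51.100.1:6346", "198.51.100.2:6346", "198.51.100.3:6346"]:
    server.node_connected(addr)
print(server.get_bootstrap_candidates(2))   # the two most recent addresses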


5.1 Pure P2P: Bootstrapping

Broadcast on the IP layer

Use multicast channels

Use IP broadcasting limited to local network

Bootstrap lists

Maintain a list of potential bootstrap servers outside the network, e.g. on a website

Used by most file sharing clients


5.1 Pure P2P: Bootstrapping

Bootstrapping

Via bootstrap server (host list from a web server)
Via peer cache (from previous sessions)
Via well-known host

Routing

Completely decentralized

Reactive protocol: routes to content providers are only established on demand, no content announcements

Requests: flooding (limited by TTL and GUID)

Responses: routed (Backward routing with help of GUID)

Content transfer connections (temporary)

Based on HTTP

Out of band transmission


5.1 Pure P2P: Summary

How is pure P2P implemented in Gnutella 0.4?

Application-level, peer-to-peer protocol over point-to-point TCP

Router service

Flood incoming requests – regard the TTL!

Route responses for other peers – regard the GUID of the message

Keep-alive messages (PING/PONG)

Content responses (QUERYHIT)

Lookup service

Initialize query requests

Initialize keep-alive requests

Download service

Establish direct connection for download

5.1 Gnutella 0.4

Figure: Gnutella 0.4 overlay – peers (G) connected by TCP connections.

Five steps

1. Connect to at least one active peer (address received from bootstrap)
2. Explore your neighborhood (PING/PONG protocol)
3. Submit a Query with a list of keywords to your neighbors (neighbors forward the query)
4. Receive QUERYHIT messages and select the most promising one
5. Connect to the providing peer for file transfer

5.1 Gnutella 0.4


Ping-Pong Messages

Each "Ping" message is answered by a "Pong" message

Keep-alive Ping-Pong

Simple messages with TTL 1, sent to neighbors

Tests whether a neighbor is offline / disconnected / overloaded

Exploration Ping-Pong

Used to explore and gather information of a node’s neighborhood

Higher TTL

Pings are forwarded, Pongs are returned carrying information on other peers

Uptime, Bandwidth, number of shared files, etc.

Store information about neighboring nodes in a peer cache
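A minimal sketch (Python; field names are illustrative – the actual PONG payload is listed with the message formats below) of such a peer cache filled from exploration Pongs:

import time

class PeerCache:
    """Stores information about peers learned from exploration Pongs."""

    def __init__(self):
        self.entries = {}   # address -> info dict

    def update_from_pong(self, address, shared_files, shared_kbytes):
        self.entries[address] = {
            "shared_files": shared_files,
            "shared_kbytes": shared_kbytes,
            "last_seen": time.time(),
        }

    def best_candidates(self, count=3):
        """Peers sharing the most data are assumed to be strong/stable and
        are good candidates for the next bootstrap or for extra direct links."""
        ranked = sorted(self.entries.items(),
                        key=lambda item: item[1]["shared_kbytes"],
                        reverse=True)
        return [address for address, _ in ranked[:count]]

# Usage: fill the cache from (simulated) exploration Pongs
cache = PeerCache()
cache.update_from_pong("198.51.100.7:6346", shared_files=120, shared_kbytes=500_000)
cache.update_from_pong("198.51.100.9:6346", shared_files=3, shared_kbytes=12_000)
print(cache.best_candidates(1))   # the strongest known remote peer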


5.1 Gnutella 0.4

Exploiting the Ping-Pong cache

Use stable peers in the next bootstrap process

Boost your connectivity

Add additional direct links to strong remote neighbors

Compensate for direct neighbor failures

Just reconnect to a remote neighbor

Ping-Pong protocols use a lot of bandwidth and are avoided in most modern protocols


5.1 Gnutella 0.4

Figure: Gnutella 0.4 traffic measurements taken at the LKN in May 2002.

General header structure

MESSAGE HEADER (23 bytes): GnodeID (16 bytes), Function (1 byte), TTL (1 byte), Hops (1 byte), Payload Length (4 bytes)

Function: describes the message type (e.g., ping/pong, search, …)
Payload: describes the parameters of the message (e.g., IDs, keywords, …)

GnodeID: unique 128-bit ID of the host
TTL (Time-To-Live): number of nodes a message may pass before it is killed
Hops: number of nodes a message has already passed
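A minimal sketch (Python) of packing and unpacking this 23-byte header; the field order follows the layout above, and little-endian encoding of the payload length is an assumption, as the slide does not state the byte order.

import struct

HEADER_FORMAT = "<16sBBBI"   # GnodeID, Function, TTL, Hops, Payload Length
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)   # 23 bytes

def pack_header(gnode_id, function, ttl, hops, payload_length):
    """Build the 23-byte Gnutella 0.4 message header."""
    return struct.pack(HEADER_FORMAT, gnode_id, function, ttl, hops, payload_length)

def unpack_header(data):
    gnode_id, function, ttl, hops, payload_length = struct.unpack(
        HEADER_FORMAT, data[:HEADER_SIZE])
    return {"gnode_id": gnode_id, "function": function,
            "ttl": ttl, "hops": hops, "payload_length": payload_length}

# Example: a PING (function 0x00) with no payload
header = pack_header(gnode_id=b"\x01" * 16, function=0x00,
                     ttl=7, hops=0, payload_length=0)
print(len(header), unpack_header(header))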

5.1 Gnutella 0.4

PING (Function: 0x00)
No payload

PONG (Function: 0x01)
Port (2 bytes), IP Address (4 bytes), Nb. of shared Files (4 bytes), Nb. of Kbytes shared (4 bytes)

QUERY (Function: 0x80)
Minimum Speed (2 bytes), Search Criteria (n bytes)

QUERY HIT (Function: 0x81)
Nb. of Hits (1 byte), Port (2 bytes), IP Address (4 bytes), Speed (1 byte), Result Set (n bytes; per result: File Index (4 bytes), File Name (n bytes)), GnodeID (16 bytes)

5.1 Gnutella 0.4

Flooding: received PINGs and QUERYs must be forwarded to all connected Gnodes

PINGs or QUERYs with the same FUNCTION ID and GNODE ID as previous messages are destroyed (avoid loops)

Save the origin of received PINGs and QUERYs

Increase Hops by 1

If Hops equals TTL, kill the message

PONG and QUERY HIT are forwarded to the origin of the corresponding PING or QUERY

Basic routing principle: "enhanced" flooding


5.1 Gnutella 0.4: Ping-Pong

Figure: example with four Gnodes (ID 1000/IP 001, ID 2000/IP 002, ID 3000/IP 003, ID 4000/IP 004). Gnode 2000 establishes a connection to Gnode 4000 (Gnutella Connect / Gnutella OK); the peers then exchange PING and PONG messages, the PONGs carrying the IP addresses of the other Gnodes (e.g., PONG/IP:004, PONG/IP:001, PONG/IP:003).

Figure: sample Gnutella 0.4 network of eight peers and the corresponding message sequence chart (Gnu-Con/OK handshakes followed by forwarded PINGs and back-routed PONGs).

Disadvantages

High signaling traffic due to flooding

Low-bandwidth nodes may become bottlenecks

Overlay topology not optimal

no complete view available

no coordinator

If not adapted to the physical structure, the network load is sub-optimal

Zigzag routes

Loops

Advantages

No single point of failure

Can be adapted to the physical network

Can provide anonymity

Routing anonymous, direct connection for transfer

Application areas

File sharing (Freenet, Gnutella, GNUnet)

Context-based routing systems


5.1 Pure P2P: Discussion

A major problem of pure P2P systems is the limited scalability

Main reason

Random network layout

Possibly degenerated network with high diameter and potentially small bisection width

Especially, weak nodes may easily become bottlenecks

Request messages often don't reach their intended destinations (TTL too short) or clog the whole network (TTL too long)

Also, in reality, not all peers are equal

Weak modem peers keep going on and off

Stable high-performance server directly in university backbone


5.2 Hybrid P2P

Idea of, e.g. Gnutella 0.6:

Take advantage of “stronger” peers, minimize damage “weak” peers can do

Strong peers are promoted to super peers or ultra peers

Have high uptime

Possess a high-bandwidth, low-latency network connection

High computational power

High storage capacity

5.2 Hybrid P2P

Hybrid P2P uses a hierarchical network layout

Super-peers form a pure P2P network among themselves

All other peers (Leaf peers) directly attach to one super-peer

Super-peer network acts as distributed file index

Super-peers request file lists from their leaf peers

i.e., each super-peer "knows" what is offered by its leaf peers

Queries are distributed in the super-peer subnet only

Combination of pure and centralized P2P!

5.2 Hybrid P2P


Network characteristics, compared to pure P2P

Hub-based network

Reduces the signaling load without reducing the reliability

Election process to select and assign super peers

Voluntarily or by statistics

Super-peers: high node degree (degree >> 20, depending on network size)

Leaf nodes: connected to one or more super-peers (degree < 7)


5.2 Hybrid P2P

Figure: sample graph of a hybrid P2P overlay (super-peers and leaf nodes), showing a major component, separate sub-networks, and hub connections (2nd hierarchy level).

Bootstrapping:

Via bootstrap-server (hosted list from a web server)

Contains super peer addresses

Via peer-cache (from previous sessions)

Registration of each leaf node at the super peer it connects to

e.g. it announces its shared files to the super peer

Super-peer updates its routing tables

Table contains which file is shared by which node

Super-peer may perform some load balancing

Hand peer over to another super peer if super peers are unbalanced

Suggest a node to be promoted to a super peer
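A minimal sketch (Python; structure and names are illustrative) of the per-super-peer routing table built from such leaf announcements:

class SuperPeer:
    """Keeps a routing table: which leaf node shares which files."""

    def __init__(self, name):
        self.name = name
        self.leaf_files = {}   # leaf address -> set of shared filenames

    def register_leaf(self, leaf_address, shared_files):
        """Called when a leaf node connects and announces its shared files."""
        self.leaf_files[leaf_address] = set(shared_files)

    def update_leaf(self, leaf_address, added=(), removed=()):
        """Incremental announcement of newly shared / no longer shared files."""
        files = self.leaf_files.setdefault(leaf_address, set())
        files.update(added)
        files.difference_update(removed)

    def local_hits(self, keywords):
        """Find leaf nodes offering files that match all keywords."""
        hits = []
        for leaf, files in self.leaf_files.items():
            for name in files:
                if all(k.lower() in name.lower() for k in keywords):
                    hits.append((leaf, name))
        return hits

# Usage: a leaf announces its files, the super-peer can answer queries locally
sp = SuperPeer("S1")
sp.register_leaf("leaf-A", ["band - song.mp3", "talk.ogg"])
sp.update_leaf("leaf-A", added=["band - live.mp3"], removed=["talk.ogg"])
print(sp.local_hits(["band", "song"]))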


5.2 Hybrid P2P

Routing

Partly decentralized

Leaf nodes send request to a Super peer

Super peer distributes this request in the Super peer layer

If a Super peer has information about a matching file shared by one of its leaf nodes, it sends this information back to the requesting leaf node (backward routing)

Hybrid protocol (reactive and proactive)

Routes to content providers are only established on demand; content announcements go from leaf nodes to their super-peers

Routing within the super-peer layer is equal to pure P2P


5.2 Hybrid P2P

Signaling connections (stable, as long as neighbors do not change):

Based on TCP

Keep-alive Ping-Pong

Content search

Content transfer connections (temporary):

Based on HTTP

Out of band transmission (directly between leaf nodes)

Out-of-band ≡ not using the signaling routes


5.2 Hybrid P2P

Query Requests

Leafnode sends request to super peer

A super peer receiving a request looks up in its routing tables whether content is offered by one of its leaf nodes

If yes, response message is returned to request sender with information on the node offering the content

Back-track-routing of responses is similar to pure P2P

Additionally, the super-peer forwards the request to the super-peer network via flooding

Flooding is similar to pure P2P, but messages remain in the super-peer network (i.e., TTL, hop counters, message caches, etc.)

No query communication with leaf nodes is necessary, thanks to the routing tables
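Continuing the illustrative SuperPeer sketch above (still not the Gnutella 0.6 wire protocol), a super-peer could handle a leaf query roughly as follows; local_hits() is the index lookup defined in that sketch.

class SuperPeerRouter:
    """Illustrative query handling at a super-peer."""

    def __init__(self, index, neighbor_super_peers):
        self.index = index                     # local file index (see SuperPeer above)
        self.neighbors = neighbor_super_peers  # directly connected super-peers
        self.seen = set()                      # message cache, as in pure P2P

    def handle_leaf_query(self, query_id, keywords):
        """A query arriving from one of our own leaf nodes."""
        self.seen.add(query_id)
        # 1. Answer from the local routing table (files announced by our leaves).
        hits = self.index.local_hits(keywords)
        # 2. Flood the query in the super-peer layer only; leaf nodes are never
        #    queried, their content is already known from announcements.
        for neighbor in self.neighbors:
            hits.extend(neighbor.handle_remote_query(query_id, keywords))
        # In the real protocol the remote hits arrive asynchronously as
        # back-routed QUERYHIT messages; here they are collected directly.
        return hits

    def handle_remote_query(self, query_id, keywords):
        """A query arriving from another super-peer (duplicates are dropped;
        TTL handling and further forwarding are omitted for brevity)."""
        if query_id in self.seen:
            return []
        self.seen.add(query_id)
        return self.index.local_hits(keywords)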


5.2 Hybrid P2P: Routing


5.2 Hybrid P2P: Flooding

Figure (example sequence): a leaf node sends its query to its ultrapeer; the query is flooded step by step through the ultrapeer layer only, and hits are routed back to the requesting leaf node.


5.2 Hybrid P2P: Ping-Pong

Figure: sample Gnutella 0.6 network (super-peers S1–S3 with leaf nodes L1–L7) and the corresponding message sequence chart: connection setup (Gnu-Con / OK), route table updates (RTU), PING/PONG exchanges within the super-peer layer, QUERY messages flooded among the super-peers, and QUERYHIT (QUHIT) responses routed back to the requesting leaf node.

5.2 Gnutella 0.6: Topology

Figures: abstract network structure of a part of the Gnutella network (222 nodes) and the corresponding geographical view; the numbers depict node numbers from the abstract view (measured on 01.08.2002).

• The virtual network is not matched to the physical network – see the path from node 118 to node 18.
• The super-peer (hub) structure is clearly visible in the abstract view.

Content requests and responses
– QUERY (as defined in Gnutella 0.4)
– QUERY_HIT (as defined in Gnutella 0.4)

Keep alive
– PING (as defined in Gnutella 0.4)
– PONG (as defined in Gnutella 0.4)

Announcement of shared content

– ROUTE_TABLE_UPDATE (0x30), Reset variant (0x0): to clear the routing table and to set a new routing table for one leafnode

– ROUTE_TABLE_UPDATE (0x30), Patch variant(0x1): to update and set a new routing table with a certain number of entries (e.g. new shared files)

5.2 Gnutella 0.6: Messages


ROUTE_TABLE_UPDATE, Reset variant – byte layout:
byte 0: Variant, bytes 1–4: Table_Length, byte 5: Infinity

ROUTE_TABLE_UPDATE, Patch variant – byte layout:
byte 0: Variant, byte 1: Seq_No, byte 2: Seq_Size, byte 3: Compressor, byte 4: Entry_Bits, bytes 5–(n+4): DATA
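A minimal sketch (Python) of packing these two variants according to the byte layout above; little-endian encoding of the multi-byte Table_Length field is an assumption.

import struct

RESET_VARIANT = 0x0
PATCH_VARIANT = 0x1

def pack_reset(table_length, infinity):
    """ROUTE_TABLE_UPDATE, Reset variant:
    Variant (1 byte), Table_Length (4 bytes), Infinity (1 byte)."""
    return struct.pack("<BIB", RESET_VARIANT, table_length, infinity)

def pack_patch(seq_no, seq_size, compressor, entry_bits, data):
    """ROUTE_TABLE_UPDATE, Patch variant:
    Variant, Seq_No, Seq_Size, Compressor, Entry_Bits (1 byte each), then DATA."""
    return struct.pack("<BBBBB", PATCH_VARIANT, seq_no, seq_size,
                       compressor, entry_bits) + data

# Example: reset to a 65536-entry table, then send one (uncompressed) patch chunk
print(pack_reset(table_length=65536, infinity=7).hex())
print(pack_patch(seq_no=1, seq_size=1, compressor=0, entry_bits=8,
                 data=b"\x00" * 8).hex())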

Disadvantages

Still high signaling traffic because of decentralization

No definitive statement possible if content is not available or not found

Overlay topology not optimal, as

no complete view available

no coordinator

Difficult to adapt completely to the physical network because of the hub structure

Advantages

No single point of failure

Can provide anonymity

Application areas

File-sharing (Gnutella, eDonkey, Kazaa)

Context based routing (see chapter about mobility)


5.2 Hybrid P2P: Discussion


Summary

Client-Server
1. The server is the central entity and only provider of service and content → the network is managed by the server
2. The server is the higher-performance system
3. Clients are the lower-performance systems
Example: WWW

Peer-to-Peer
1. Resources are shared between the peers
2. Resources can be accessed directly from other peers
3. A peer is both provider and requestor (servent concept)

P2P systems are classified into unstructured P2P (centralized, pure, hybrid) and structured P2P (pure, hybrid); structured P2P is covered in the next lecture.

Unstructured P2P – Centralized P2P (1st generation)
1. All features of peer-to-peer included
2. A central entity is necessary to provide the service
3. The central entity is some kind of index/group database
Example: Napster

Unstructured P2P – Pure P2P (1st generation)
1. All features of peer-to-peer included
2. Any terminal entity can be removed without loss of functionality
3. No central entities
Examples: Gnutella 0.4, Freenet

Unstructured P2P – Hybrid P2P (2nd generation)
1. All features of peer-to-peer included
2. Any terminal entity can be removed without loss of functionality
3. Dynamic central entities
Examples: Gnutella 0.6, JXTA

Structured Peer-to-Peer Systems – The Third Generation

Distributed Hash Tables: CAN, Chord, …

Next Lecture
