Peer-to-Peer
Data Management
Hans-Dieter Ehrich
Institut für Informationssysteme
Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de
6. Peer-to-Peer Basics
The transparencies of this chapter are based on the package
Characteristics and Applications of Peer-to-Peer Infrastructures
by
Wolf-Tilo Balke and Wolf Siberski
24.10.2007
Overview
1. Status Quo: Networks (Over)Filled with Peer-to-Peer Traffic
2. Driving Forces Behind Peer-to-Peer
3. Applications and Classification of P2P
4. What is shared?
5. Markets and Revenue Generation
6. Where is P2P technology reasonable?
[Figure: Most relevant P2P applications in the year 2001: 1) Freenet, 2) Buzzpad, 3) WuWu]
What is P2P?
P2P systems are overlay architectures with the following characteristics:
► Two logically separate networks
► Mostly IP based
► Decentralized and self-organizing
► Employ distributed shared resources (computing power and data storage)
► Initially developed for file sharing
► Various realizations
► Common basis for signaling: IP (TCP and UDP)
► Common basis for data transmission: HTTP or special protocols based directly on IP
► Use flooding in the overlay to a certain extent
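The last characteristic, flooding in the overlay, can be illustrated with a minimal sketch. The overlay graph, the peer names and the hop limit (`ttl`) are assumptions made for this example; the slides do not prescribe a concrete flooding scheme.

```python
from collections import deque

def flood(overlay, start, ttl):
    """Return the set of peers a query reaches when flooded with a hop limit."""
    reached = {start}
    queue = deque([(start, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # the hop limit stops further forwarding
        for neighbor in overlay[peer]:
            if neighbor not in reached:  # each peer handles a query only once
                reached.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return reached

# Hypothetical overlay: peer name -> overlay neighbors
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

for ttl in (1, 2, 3):
    print(ttl, sorted(flood(overlay, "A", ttl)))
```

A small hop limit keeps the signaling traffic bounded, at the price that distant peers are never asked.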
Impacts of P2P
● Rising flow sizes (60 kbyte -> 2 Gbyte)
● 30%-60% of the traffic in the Abilene backbone is caused by P2P applications.
● 70% of the traffic in the German Research Network (DFN) is caused by P2P applications.
● T-Online observes an increasing symmetry at the access level.
● LRZ (Munich Network Center) observes an increasing symmetry between the US and Europe.
Impacts of P2P at the Abilene Backbone
[Figure: Traffic portions in % per week, 18.02.2002 to 18.08.2004; series: Unidentified, Data_Transfers, File_Sharing. File_Sharing covers only signaling; Unidentified and Data_Transfers are possible data transfers.]
Core of Internet2 infrastructure, connecting 190 US universities and research centers
● Unidentified + data_transfers + file_sharing cause 90% of the traffic
● Unidentified traffic and data_transfers increased significantly
► Part of the P2P traffic is hidden (port hopping, …)
► Some P2P applications use port 80 and so appear as data_transfers
Impacts of P2P at the Abilene Backbone
● P2P traffic amount (only signaling)
► Is still high (~50 TByte per week)
► Has become a constant part of the traffic (since end of 2002)
● Slumps are assumed to be caused by
► Port closures (firewalls, NATs)
► Verdicts (Napster case, …)
● Data transfers are presumably caused to a large extent by P2P apps
[Figure: Traffic in TByte per week (0 to 300), 18.02.2002 to 18.08.2004; series: Unidentified, Data_Transfers, File_Sharing]
Reason for These Experiences
Driving Forces Behind Peer-to-Peer
Development of the terminal capabilities:
● 1992:
► Average hard disk size: ~0.3 Gbyte
► Average processing power (clock frequency) of personal computers: ~100 MHz
● 2002:
► Average hard disk size: 100 Gbyte
● 2007:
► Average processing power (clock frequency) of personal computers: ~3 GHz
Personal computers have capabilities comparable to servers in the 1990s
Driving Forces Behind Peer-to-Peer
Development of the communication networks:
● Early 1990s: private users start to connect to the Internet via dial-up modems (up to 56 kbps by the late 1990s)
● 1997/1998
► First broadband connections for residential users become available
► Cable modems with up to 10 Mbps
● 1999
► Introduction of DSL and ADSL connections
► Data rates of up to 8.5 Mbps via common telephone connections become available
► The deregulation of the telephone market shows first effects with significantly reduced tariffs, due to increased competition on the last mile
Bandwidth is plentiful and cheap!
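To make the effect of these access rates concrete, the following sketch estimates how long a single 2 Gbyte flow (the upper flow size named earlier in this chapter) would take at the rates mentioned on this slide. It assumes decimal units and ignores protocol overhead, so the results are only rough lower bounds.

```python
def transfer_time_seconds(size_bytes, rate_bits_per_second):
    """Ideal transfer time, ignoring protocol overhead and congestion."""
    return size_bytes * 8 / rate_bits_per_second

FLOW_SIZE_BYTES = 2 * 10**9  # 2 Gbyte, decimal units assumed

# Access rates taken from the slide, in bits per second
LINKS = {
    "56 kbps modem": 56_000,
    "8.5 Mbps DSL": 8_500_000,
    "10 Mbps cable modem": 10_000_000,
}

for name, rate in LINKS.items():
    hours = transfer_time_seconds(FLOW_SIZE_BYTES, rate) / 3600
    print(f"{name}: {hours:.2f} h")
```

Under these assumptions the modem needs more than three days, while the broadband links finish in roughly half an hour.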
Development of P2P Applications
[Figure: Data volumes in % per week, 18.02.2002 to 18.08.2004; series: Freenet, Direct Connect++, Carracho, Blubster, Neo-Modus, FastTrack, WinMX, Shoutcast, Audiogalaxy, eDonkey2000, Hotline, Gnutella, BitTorrent. Labeled areas: BitTorrent, FastTrack, Gnutella, eDonkey, Shoutcast.]
Traffic portions of the different P2P applications and protocols in the traffic measured per week in the Abilene backbone from 18.02.2002 until 18.10.2004
Applications and Classification of P2P
● Abstract definition of the peer-to-peer paradigm
► [A peer-to-peer system is] a self-organizing system of equal, autonomous entities (peers) [which] aims for the shared usage of distributed resources in a networked environment avoiding central services.
► Andy Oram (ed.). Peer-to-Peer: Harnessing the Power of Disruptive Technologies. O'Reilly, 2001.
No clear distinction
Some cases even misleading
Applications and Classification of P2P
Conventional Classification of P2P
File Sharing (Napster, Gnutella, Freenet)
Grid Computing (SETI@home)
Instant Messaging (ICQ, AIM)
Collaboration (Groove Workspace)
Classification by Means of Shared Resources
Information
Files
Bandwidth
Storage space
Processor cycles
P2P Applications Can Be Classified by Shared Resources
What is shared?
1. Information
► File Sharing and Document Management
► Presence Information
► Collaboration
2. Bandwidth
► Increased Load Balancing
► Shared Use of Bandwidth
3. Storage Space
► DAS, NAS, SAN
► P2P Storage Networks
4. Processor Cycles
► High Performance Computing
Information (1/5)
● File sharing
► Classical application of P2P systems
Users offer files (music, videos, etc.) for free download
The application provides a unified view
Napster, Gnutella & Co
► First large-scale occurrence of digital copyright infringement
Strong reactions from industry, e.g. Recording Industry Association of America (RIAA)
Information (2/5)
● Distribution of Software/Updates
► Basic idea: distribute software updates or patches in a P2P fashion
Obviously used for obtaining updates for P2P client software (Gnutella & Co)
But also for a wide variety of other software distributions
► Prominent examples
Patches for the game 'World of Warcraft' by Blizzard Entertainment
Linux company Lindows distributes its Linspire (prev. LindowsOS) via P2P
► Technology used
Today mostly BitTorrent (block-based file swarming)
Microsoft's Avalanche (file swarming with network coding)
Information (3/5)
● Document Management
► Usually centrally organized
► But
A large portion of the documents created in a company is distributed among desktop PCs without a central repository having any knowledge of their existence.
► Solution
P2P networks which create a connected repository from the local data on the individual peers.
Indexing and categorization of data by each peer on the basis of individually selected criteria.
Self-organized aggregation of information from areas of knowledge.
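The solution outlined on this slide can be sketched as follows: every peer indexes only its own documents, and a query collects the hits of all peers into one connected view. The `Peer` class, the keyword-based index and the sample documents are assumptions made for illustration; real systems let each peer choose its own indexing criteria.

```python
from collections import defaultdict

class Peer:
    """A desktop PC that indexes only its own local documents."""

    def __init__(self, name, documents):
        self.name = name
        self.index = defaultdict(set)  # keyword -> titles of local documents
        for title, text in documents.items():
            for word in text.lower().split():
                self.index[word].add(title)

    def search(self, keyword):
        return self.index.get(keyword.lower(), set())

def search_network(peers, keyword):
    """Aggregate the local hits of all peers; no central repository is involved."""
    results = {}
    for peer in peers:
        hits = peer.search(keyword)
        if hits:
            results[peer.name] = sorted(hits)
    return results

# Hypothetical peers and documents
peers = [
    Peer("desk1", {"report.txt": "quarterly sales report"}),
    Peer("desk2", {"notes.txt": "sales meeting notes", "todo.txt": "buy coffee"}),
]
print(search_network(peers, "sales"))
```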
Information (4/5)
● Presence Information
► Important role in the self-organization of P2P networks and in scenarios related to omnipresent computers and information availability (ubiquitous computing).
► Provides information about which peers and which resources are available in the network.
● Example: Instant Messaging Systems
► P2P application which essentially uses presence information.
► Peers pass on information via the network on whether or not they are available for communication.
http://www.trillian.cc/
Information (5/5)
● Collaboration
► Members of working groups can communicate synchronously, conduct joint online meetings and edit shared documents.
● Groupware:
► Offers functions like instant messaging, file sharing, notification, co-browsing, whiteboards, voice conferences and databases with real-time synchronization.
► Client/server-based groupware has to be set up and administered on the server for each working group.
► P2P groupware avoids additional administrative tasks and central data management:
All of the data created is stored on each peer and is synchronized automatically.
Users can set up shared working environments for virtual teams (so-called shared spaces).
Users can invite other users to work in these teams.
Bandwidth (1/4)
● Typical Centralized Approach
► Files are held on the server of an information provider.
► Files are transferred from there to the requesting client.
► Spontaneous increases in demand exert a negative influence on the
[Figure: Unicast: a single sender serves all receivers via routers]
Bandwidth (2/4)
● Increased Load Balancing
► Achieve increased load balancing by taking advantage of transmission routes which are not being fully exploited.
● Peer-to-Peer Unicast:
► Initial requests for files have to be served by a central server.
► Further requests can be automatically forwarded to peers within the network which have already received and replicated these files.
► Sample application: Skype
[Figure: Peer-to-peer unicast: the sender serves the first receivers via routers; each receiver also acts as a sender for further receivers]
Bandwidth (3/4)
● Increased Load Balancing
► Achieve increased load balancing by taking advantage of transmission routes which are not being fully exploited.
● Information Channel Approach:
[Figure: Information channel approach: new content is passed on between peers over info channels]
Bandwidth (4/4)
● Shared Use of Bandwidth
► P2P approaches also facilitate the shared use of the bandwidth provided by the information providers.
● Segmentation Approach:
[Figure: Segmentation approach: a document (Doc) is split into Part 1, Part 2 and Part 3, and peers exchange the parts until each holds the complete Doc]
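The segmentation approach can be sketched in a few lines: a document is cut into parts, the parts are collected from different peers, and the result is checked before use. The part size, the peer names and the SHA-256 integrity check are assumptions of this example, not details given on the slide.

```python
import hashlib

def split_into_parts(data: bytes, part_size: int) -> list[bytes]:
    """Cut a document into consecutive parts of at most part_size bytes."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]

def reassemble(parts_by_index: dict[int, bytes], expected_sha256: str) -> bytes:
    """Join the parts in index order and verify the complete document."""
    data = b"".join(parts_by_index[i] for i in sorted(parts_by_index))
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("reassembled document does not match the expected hash")
    return data

doc = b"peer-to-peer data management"
parts = split_into_parts(doc, 10)

# Hypothetical situation: three peers each hold a different part
holdings = {
    "peer_a": {0: parts[0]},
    "peer_b": {1: parts[1]},
    "peer_c": {2: parts[2]},
}

collected = {}
for held_parts in holdings.values():
    collected.update(held_parts)  # one download per peer, can run in parallel

restored = reassemble(collected, hashlib.sha256(doc).hexdigest())
print(restored == doc)
```

Because each part comes from a different peer, no single upload link has to carry the whole document.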
P2P Storage Networks (1/5)
Centralized Design Concepts Used to Store Data in a Company
Direct Attached Storage (DAS)
Network Attached Storage (NAS)
Storage Area Networks (SAN)
Disadvantages:
Inefficient use of the available storage.
Additional load on the company network.
Necessity for specially trained personnel.
Additional backup solutions.
P2P Storage Networks (2/5)
● A P2P Storage Network is a cluster of computers, formed on the basis of existing networks, which share all storage available in the network
► Examples: PAST, Pasta, OceanStore
● Organization:
► Each peer receives a public/private key pair
► The public key is used to create an unambiguous identification number for each peer (with the aid of a hash function)
► Each peer must make available some of its own storage, or pay a fee
► Corresponding to its contribution, each peer is assigned a maximum volume of data which can be added to the network
► A file is assigned an unambiguous identification number (hash function from the name or the content and the public key of the owner)
► Storing the file and searching for it in the network takes place in the manner described for the document routing model
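The organization rules above can be sketched as follows. The choice of SHA-256, the 32-bit ID space, the placeholder key bytes and the one-to-one quota ratio are assumptions made for illustration; the systems named on the slide each define their own details.

```python
import hashlib

ID_BITS = 32  # assumed size of the identifier space

def peer_id(public_key: bytes) -> int:
    """Derive a peer's identification number from its public key."""
    digest = hashlib.sha256(public_key).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)

def file_id(name: str, owner_public_key: bytes) -> int:
    """Derive a file's identification number from its name and the owner's public key."""
    digest = hashlib.sha256(name.encode("utf-8") + owner_public_key).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)

def may_store(contributed_bytes: int, stored_bytes: int, new_file_bytes: int) -> bool:
    """Quota check: a peer may add at most as much data as it contributes (assumed ratio 1:1)."""
    return stored_bytes + new_file_bytes <= contributed_bytes

owner_key = b"hypothetical public key bytes"
print(peer_id(owner_key))
print(file_id("thesis.pdf", owner_key))
print(may_store(contributed_bytes=1_000, stored_bytes=800, new_file_bytes=300))
```

Hashing spreads peers and files over the same ID space, which is what lets the document routing model place a file on a peer with a nearby ID.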
P2P Storage Networks (3/5)
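● Buildup
[Figure: A joining peer hashes its key to obtain ID 1, announces itself ("Hello ???") and learns its neighbors 3, 4 and 25; the resulting network consists of the peers with IDs 1, 3, 4, 8, 10, 17 and 25]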
P2P Storage Networks (4/5)
● Store Documents
[Figure: Network of peers with IDs 1, 3, 4, 8, 10, 17 and 25 and their neighbor lists; a document hashed to ID 11 is forwarded from peer to peer for storage]
P2P Storage Networks (5/5)
● Retrieve Documents
[Figure: The same network; a request for document ID 11 issued by requestor 1 is forwarded from peer to peer]
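Storing and retrieving by ID can be sketched with greedy forwarding: each peer passes the request to the neighbor whose ID is numerically closest to the document ID, and the peer that has no closer neighbor is responsible for the document. The peer IDs and the document ID 11 come from the figures; the neighbor table below and the "numerically closest" rule are assumptions of this sketch, since the exact lists cannot be recovered from the slides.

```python
# Hypothetical, symmetric neighbor table for the peers shown in the figures
NEIGHBORS = {
    1: [3, 4, 25],
    3: [1, 4, 8],
    4: [1, 3, 10],
    8: [3, 10, 17],
    10: [4, 8, 17],
    17: [8, 10, 25],
    25: [1, 17],
}

def route(key, start, neighbors):
    """Forward greedily towards the peer whose ID is closest to key; return the path."""
    path = [start]
    current = start
    while True:
        best = min(neighbors[current], key=lambda n: abs(n - key))
        if abs(best - key) >= abs(current - key):
            return path  # no neighbor is closer: current peer is responsible
        current = best
        path.append(current)

storage = {peer: {} for peer in NEIGHBORS}

def store(key, value, start):
    peer = route(key, start, NEIGHBORS)[-1]
    storage[peer][key] = value
    return peer

def retrieve(key, start):
    peer = route(key, start, NEIGHBORS)[-1]
    return storage[peer].get(key)

print(route(11, 1, NEIGHBORS))      # path of a store request issued by peer 1
print(store(11, "document", 1))     # peer that ends up holding document ID 11
print(retrieve(11, 25))             # a request from peer 25 reaches the same peer
```

Greedy forwarding of this kind can stop at a local optimum in badly connected overlays, which is why real document routing systems maintain structured neighbor tables.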
Processor Cycles
● Increasing Requirements for High Performance Computing
► e.g. in the fields of bio-informatics, logistics or the financial sector
● Available Computing Power of the Networked Entities often Unused
Using P2P Applications to Bundle Processor Cycles:
► Forming a cluster of independent, networked computers in which the individual computer is transparent and all networked nodes are combined into a single logical computer
► Achieve computing power which even the most expensive supercomputers can scarcely provide
► “Grid Computing”
● Examples:
► Popular example: SETI@home
Calculations during the idle processor cycles of participating peers
► Advanced vision of grid computing: Globus Toolkit
Standardized middleware for grid applications
Note: The core of SETI@home is a classical client/server application
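Bundling processor cycles can be sketched as splitting one large job into independent work units that are handed to whichever node is free. In this illustration a thread pool stands in for the participating peers and a sum of squares stands in for the real computation; both are assumptions, and the central distribution of units mirrors the client/server core noted above.

```python
from concurrent.futures import ThreadPoolExecutor

def make_work_units(start, stop, unit_size):
    """Split the range [start, stop) into independent work units."""
    return [(lo, min(lo + unit_size, stop)) for lo in range(start, stop, unit_size)]

def process_unit(unit):
    """Stand-in for the real computation done in a peer's idle cycles."""
    lo, hi = unit
    return sum(n * n for n in range(lo, hi))

def run_on_peers(units, peer_count):
    """Hand the units to peer_count workers and combine the partial results."""
    with ThreadPoolExecutor(max_workers=peer_count) as pool:
        return sum(pool.map(process_unit, units))

units = make_work_units(0, 1000, 100)
total = run_on_peers(units, peer_count=4)
print(len(units), total)
```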
Financial Motivation in P2P Systems
● P2P applications often lack revenue generation
► Needed? Usually barter structures are instantiated (principle of reciprocity)
● Revenue model of P2P
► Currently only indirect revenues (e.g. ads, cross-selling)
► Viable direct business models are sought
● Key Questions
► Who are the players?
► What open issues are to be solved?
► How can parties recover their costs and earn a margin of profit?
P2P Business Applications need Revenue Creation
Instant Messaging
► Direct message exchange
► At least two interaction partners
► Services like AIM have to provide infrastructure for about 200 million users
Grid Computing
► Offering of computing resources
Digital Content Sharing
► Exchange of content
► Additional functionality connected with content
Collaboration
► Work or play in ad hoc groups
► Support regarding coordination and cooperation
Application Style vs. Service Style
P2P Application Style
► Packaged solutions (e.g. Lotus Instant Messaging, Groove)
► Set of common definitions (e.g. .NET, Gnutella)
P2P Service Style
► Services based on P2P interaction model
► No once-bought-used-forever model
P2P Interaction Styles
[Figure: Roles in a P2P interaction]
Providing Interaction Partner: provides the Object of Interaction
Receiving Interaction Partner: receives the Object of Interaction
Legal Owner of the Object: owns the rights of the Object of Interaction
Mediating Service: facilitates the interaction
Business Models vs. Revenue Models
Business Model:
Totality of processes and arrangements that define a company's approach to commercial markets in order to sell services and/or goods and generate profits.
Revenue Model:
Includes all arrangements that permit the participants in business interactions to charge fees which are covered by one or several other participants in order to cover costs and add a margin to create profit.
The revenue model is part of a business model
Revenue Models
Indirect Revenue Models
Product is free of charge
Gain received from third party
Realisations:
Advertisement
Affiliate Model
Bundling
Direct Revenue Models
Receipts come directly from customer
Realisations:
Sales
Transaction fees
Subscription
Requirements of a Revenue Model
● Differentiated Charging
► Charge according to criteria of usage
► Prerequisite for efficient revenue models
Intense usage leads to high charges
● Allocation Effectiveness
► Revenue stream to the appropriate receiver
Party that has incurred the cost receives revenue
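The two requirements can be made concrete with a small sketch. All prices, usage figures and party names are invented for illustration (amounts in cents), and splitting revenue in proportion to incurred cost is just one possible reading of allocation effectiveness.

```python
def flat_charge(monthly_fee, usage_units):
    """Undifferentiated charging: the fee ignores how much the service is used."""
    return monthly_fee

def usage_charge(price_per_unit, usage_units):
    """Differentiated charging: intense usage leads to high charges."""
    return price_per_unit * usage_units

def allocate(revenue, costs_by_party):
    """Route revenue to the parties in proportion to the costs they incurred (assumed rule)."""
    total_cost = sum(costs_by_party.values())
    return {party: revenue * cost / total_cost for party, cost in costs_by_party.items()}

# Hypothetical users: a light and a heavy user of the same service
for units in (10, 1000):
    print(units, flat_charge(500, units), usage_charge(2, units))

print(allocate(900, {"mediator": 100, "provider": 200}))
```

With the flat fee both users pay the same amount, while the usage-based charge separates them, which is the sense in which only differentiated charging is called efficient here.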
Revenue models for… Instant Messaging (1/3)
● Features of Instant Messaging
► Text and/or voice message exchange between peers
► Services like “Buddy list” and other functionalities
Services don't have to be central
Object: message
Owner/Provider: peer (sender of message)
Receiver: peer
Mediator: instant messaging service
Revenue models for… Instant Messaging (2/3)
● Not P2P from technological point of view
● Communication between peers
Topologies
Server provides service | Server only lists buddies | Pure P2P topology
Communication self-governed by peers
Message exchange and buddy list service decentralised
No server involved
Revenue models for… Instant Messaging (3/3)
● Revenue model for IM provided in application style
► License fees
► Optional professional services
● Revenue model for IM provided in service style
► Subscription fees: undifferentiated, not efficient
► Fees per log on: not very efficient, hard to realise in pure topology
► Usage-dependent fees: efficient, only problem-free in C/S topology
Revenue models for… Digital Content Sharing (1/3)
● Features of digital content sharing
► Prominent example: Exchange of entertainment media files
► But: Sharing of any content possible, in particular for decentralised knowledge management
► Streaming of content
► Catalogue service
Object: digital content
Owner: provider or third party
Provider: peer
Receiver: peer
Mediator: digital content sharing service (not necessarily)
Revenue models for… Digital Content Sharing (2/3)
● Revenue model for DCS as application style
► License fees
► Consulting services
● Revenue model for DCS as service style
a) Legal owner is not identical with provider
► Membership/Subscription fees
► Fees per log on
► Matchmaking fees
Legally problematic: the legal owner of the rights to the exchanged files is not a participant in the transaction and therefore cannot cover his costs
Revenue models for… Digital Content Sharing (3/3)
● Revenue model for DCS as service style
b) Legal owner is not identical with provider, but the owner receives compensation
► Billing step implemented into content exchange
► Mediator acts as aggregating middleman
► P2P distribution: no clear economic value for owners
c) Legal owner is identical with provider
► Differentiated charging, and owner is compensated
► Providers don't sell the object but limited rights to its usage
In all cases: an additional content protection scheme is needed to enforce payment!
Revenue models for… Grid Computing (1/3)
● Features of grid computing
► Utilization of distributed computing resources
► Often C/S-based, not true P2P from a technological point of view, but complex problems are solved by more or less independent peers
► Pure understanding: Peers can provide and demand resources
Object: computing resources
Owner/Provider: peer providing resources
Receiver: party using the computing resources
Mediator: management of resource provision, often a central server application
Revenue models for… Grid Computing (2/3)
● Revenue model for GC as application style
Enterprise software sale
► License fees
► Professional services
● Revenue model for GC as service style
Public internet exchange or cross-company
a) Compensating the Mediator
► Management of Grid on behalf of a third party: Cost of mediating service + margin has to be charged
► Management of Grid by Receiver: Business utilisation has to cover the cost
Revenue models for… Grid Computing (3/3)
● Revenue model for GC as service style
b) Compensating the Provider
► Often: providing for free or for a part of the results
► Desirable: monetary reimbursement
Pay-per-use model feasible from technical point of view
Problem: high transactional costs for payment
Highly efficient methods for micropayment needed
Problem: Financial incentives may be incapable of attracting providers
No problems regarding allocation effectiveness and efficiency, but
Problem of micropayment
Problem of sufficient business value
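The micropayment problem is essentially arithmetic: a fixed cost per payment dwarfs the tiny price of a single work unit. The figures below (2 cents per work unit, 25 cents per payment transaction) are invented for illustration, and batching several units into one payment is shown only as one conceivable mitigation.

```python
def overhead_ratio(units_per_payment, price_per_unit, fee_per_payment):
    """Transaction cost relative to the amount actually paid to the provider."""
    return fee_per_payment / (units_per_payment * price_per_unit)

def min_units_per_payment(price_per_unit, fee_per_payment, max_overhead_percent):
    """Smallest batch size that keeps the overhead at or below the given percentage."""
    numerator = fee_per_payment * 100
    denominator = price_per_unit * max_overhead_percent
    return -(-numerator // denominator)  # integer ceiling division

PRICE_PER_UNIT = 2    # cents per work unit (hypothetical)
FEE_PER_PAYMENT = 25  # cents per payment transaction (hypothetical)

print(overhead_ratio(1, PRICE_PER_UNIT, FEE_PER_PAYMENT))    # paying for every unit
print(overhead_ratio(100, PRICE_PER_UNIT, FEE_PER_PAYMENT))  # paying per 100 units
print(min_units_per_payment(PRICE_PER_UNIT, FEE_PER_PAYMENT, 5))
```

Under these assumed figures, paying for every unit costs far more in fees than the provider earns, so an accounting instance that aggregates payments is needed.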
Revenue models for… Collaboration (1/2)
● Features of Collaboration
► Providing functions beyond email and workflow
► Supporting standard groupware applications
► P2P adds flexibility, e.g. ad hoc working groups
► Here: groupware applications used in business context with defined and authenticated members
Object: message or document
Owner/Provider: partner in workgroup
Receiver: partner in workgroup
Mediator: collaboration server (not necessarily)
Revenue models for… Collaboration (2/2)
● Revenue model for Collaboration as application style
► Licensing models
► High demand for professional services
● Revenue model for Collaboration as service style
hosted as a service
► Undifferentiated: not efficient
► Fees for buddy list/catalogue service etc.: not very efficient, hard to realise in the pure topology
► Transaction-based fees (e.g. transferred data): efficient, only problem-free in C/S topology
► Further consideration: user- or group-based bills possible
Discussion (1/3)
● Revenue models for P2P application style are not different from those for traditional application style
● Differentiated charging difficult for IM and Collaboration Groupware
► Providers Infrastructure
● Allocation effectiveness difficult for DCS
► DCS affects copyrights belonging to third party
● GC suffers from overhead of micropayments
Accounting centre required
P2P
Strategies for different parties to increase their revenue?
Discussion (2/3) – How to increase Revenue?
Instant messaging
► Bundling with interactive agents
► Providing location-based services
► Multiple service levels (example: Skype/SkypeOut)
► …
Digital content sharing
Try to own communities
Bundling digital content with other goods (example: concert tickets)
…
What further possibilities are conceivable?
Discussion (3/3) – How to increase Revenue?
Grid computing
► Bundling is no solution and micropayment may not be feasible
► Barter-like structures: provide information goods as reimbursement
► …
Collaboration
Bundling similar to IM
Multiple service levels
…
Build ‘Closed Communities’: P2P technology, but strong access control
To Peer-to-Peer or not to Peer-to-Peer
● Often Discussed Problem: Where is P2P Really Needed?
► Multiple classification systems have been designed to judge how suitable a P2P solution might be for a particular problem
● E.g. in the Form of Decision Trees
Conclusions
► Based on characteristics of a wide range of P2P systems including budget, resource relevance, trust, rate of system change, criticality
► “…the characteristics that motivate a P2P solution are limited budget, high relevance of the resource, high trust between nodes, a low rate of system change, and a low criticality of the solution. We believe that the limited budget requirement is the most important motivator.”
► M. Roussopoulos, M. Baker, D. Rosenthal, T. Giuli, P. Maniatis and J. Mogul: “2 P2P or not 2 P2P?”, IPTPS 2004
http://www.springerlink.com/content/bvx594yud8rd2gfp/
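The five characteristics from the quotation can be turned into a simple checklist. This is only an illustrative sketch: it is not the decision tree of the cited paper, and the rule that all five characteristics must hold is an assumption chosen as the strictest reading of the quotation. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ProblemProfile:
    limited_budget: bool
    high_resource_relevance: bool
    high_trust_between_nodes: bool
    low_rate_of_system_change: bool
    low_criticality: bool

def motivating_characteristics(profile):
    """List which of the five quoted characteristics hold for a problem."""
    checks = {
        "limited budget": profile.limited_budget,
        "high relevance of the resource": profile.high_resource_relevance,
        "high trust between nodes": profile.high_trust_between_nodes,
        "low rate of system change": profile.low_rate_of_system_change,
        "low criticality": profile.low_criticality,
    }
    return [name for name, present in checks.items() if present]

def suggests_p2p(profile):
    """Assumed rule: a P2P solution is suggested only if all five characteristics hold."""
    return len(motivating_characteristics(profile)) == 5

# Hypothetical example: a critical application with a generous budget
profile = ProblemProfile(False, True, True, True, False)
print(motivating_characteristics(profile))
print(suggests_p2p(profile))
```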