Introduction to Peer-to-Peer Networks

The term Peer-to-Peer (P2P) generally describes a communication approach in distributed systems. Researchers have proposed many definitions of Peer-to-Peer. The IETF proposed a definition covering the common aspects of P2P [CI09]:

We consider a system to be P2P if the elements that form the system share their resources in order to provide the service the system has been designed to provide. The elements in the system both provide services to other elements and request services from other elements.

P2P systems are designed to locate and share resources. The type of resource may vary from processor power, memory, and storage to bandwidth, or even a combination of all of them. Devices participating in a P2P system are called nodes or peers.

No node acts purely as a server or purely as a client; each node is a service provider and a requester at the same time. There is no distinction between clients and servers [CI09]; this concept is often referred to as the servent concept [S+01].

The first larger-scale P2P implementations were file sharing applications such as Napster [Inc02], in which the information about where data objects were stored was managed centrally, whereas the actual objects were stored on the nodes. This server-based management makes the approach vulnerable: shutting down the management server makes the network unusable. Kazaa [Smi03] was designed to distribute media data among users and was completely decentralized. More P2P applications have followed; prominent fields are instant messaging, telephony, distributed computing, and live video streaming. A successful application for instant messaging and telephony is Skype [Tho06]; a widely used open source application for distributed computing is SETI@home [KWA+01]; and the Huazhong University of Science and Technology developed the video streaming application PPLive.

Structured and Unstructured P2P Networks. Two types of P2P overlay networks are the subject of active research and are used in commercial products: structured and unstructured [SW05].

Unstructured P2P Networks. A P2P network is considered unstructured when its links do not form a predefined topology but may be chosen randomly. One advantage is that a new peer can easily join the network without establishing a set of mandatory links. The search process in an unstructured P2P overlay is performed by (partially) flooding the network, which we call exhaustive search in the remainder of this thesis. One obvious disadvantage is the often unsatisfactory hit rate: the search process creates high network traffic but offers little or no guarantee of finding the object, and short of a byte-for-byte comparison, it is not possible to determine whether two objects are the same. If a node searches for widely distributed information, the search is very likely to return several hits; a query for data located at only a few peers may fail.
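The (partial) flooding described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the mechanism of any particular system: the overlay is built by picking random links, the flood is limited by a hop counter (`ttl`), and all names (`build_random_overlay`, `flood_search`, `holders`) are hypothetical.

```python
import random
from collections import deque


def build_random_overlay(num_peers, links_per_peer, rng):
    """Connect each peer to a few randomly chosen peers (no fixed topology)."""
    neighbors = {p: set() for p in range(num_peers)}
    for p in range(num_peers):
        others = [q for q in range(num_peers) if q != p]
        for q in rng.sample(others, links_per_peer):
            neighbors[p].add(q)
            neighbors[q].add(p)  # links are bidirectional
    return neighbors


def flood_search(neighbors, holders, start, ttl):
    """Flood a query from `start`, forwarding it at most `ttl` hops.

    Returns (hits, messages): the reached peers that hold the object
    and the number of query messages that were sent.
    """
    visited = {start}
    hits = {start} if start in holders else set()
    messages = 0
    frontier = deque([(start, 0)])
    while frontier:
        peer, hops = frontier.popleft()
        if hops == ttl:
            continue  # hop limit reached, do not forward further
        for nxt in neighbors[peer]:
            messages += 1  # every forwarded copy costs one message
            if nxt in visited:
                continue  # duplicate query, dropped by the receiver
            visited.add(nxt)
            if nxt in holders:
                hits.add(nxt)
            frontier.append((nxt, hops + 1))
    return hits, messages


if __name__ == "__main__":
    rng = random.Random(1)
    overlay = build_random_overlay(500, 3, rng)
    popular = set(rng.sample(range(500), 100))  # object stored on many peers
    rare = set(rng.sample(range(500), 2))       # object stored on two peers
    for label, holders in (("popular", popular), ("rare", rare)):
        hits, messages = flood_search(overlay, holders, start=0, ttl=3)
        print(label, "hits:", len(hits), "messages:", messages)
```

With a small hop limit, the popular object is typically found several times while the rare one is often missed, even though both queries cost the same number of messages.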

However, many P2P overlay networks are unstructured and scale to several million users. The reason might be that unstructured networks are very suitable for human-friendly keyword searches. Note that the search is not limited to simple keywords; any query that can be evaluated locally on a peer is possible.

Structured P2P Networks. A structured P2P overlay network creates a predefined pattern of node connections, which needs to be actively maintained. Compared to unstructured networks, node arrivals, departures, and failures cause considerable maintenance signaling traffic. Nevertheless, structured P2P overlay networks allow fast key-lookup mechanisms using distributed hash tables (DHTs). They hash peer and object identifiers and distribute the hash buckets among the peers. A DHT-specific routing algorithm defines how peers can route through the overlay when they want to retrieve a certain object. Typically, the number of messages needed to locate an object in a DHT grows logarithmically with the number of peers in the system.
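To make the hashing and routing steps concrete, the following sketch places peers and objects on a shared identifier ring and routes a lookup over shortcut links. The text does not prescribe a routing algorithm; the sketch assumes a ring with links at power-of-two distances, in the style of Chord, as one possible illustration. The names `Ring`, `ident`, and `BITS` are hypothetical, and the 16-bit identifier space is chosen only to keep the numbers small. For simplicity, the simulation uses global knowledge to decide when the lookup has arrived, which a real peer would not have.

```python
import hashlib
from bisect import bisect_left

BITS = 16          # toy identifier space; real DHTs use far larger ones
SPACE = 1 << BITS


def ident(name):
    """Hash a peer or object name into the shared identifier space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % SPACE


class Ring:
    def __init__(self, peer_names):
        # Sorted peer identifiers; colliding hashes are merged in this toy model.
        self.ids = sorted({ident(n) for n in peer_names})

    def successor(self, point):
        """First peer identifier at or after `point`, wrapping around the ring."""
        i = bisect_left(self.ids, point % SPACE)
        return self.ids[i % len(self.ids)]

    def fingers(self, peer):
        """Shortcut links of `peer`: the peers responsible for peer + 2^k."""
        return [self.successor(peer + (1 << k)) for k in range(BITS)]

    def lookup(self, start, key):
        """Route from peer `start` to the peer responsible for `key`.

        Returns (owner, hops). The owner is computed up front only to
        detect arrival; each hop uses just the current peer's links.
        """
        owner = self.successor(key)
        current, hops = start, 0
        while current != owner:
            limit = (owner - current) % SPACE
            # Take the link that gets closest to the owner without passing it.
            current = max(
                (f for f in self.fingers(current)
                 if 0 < (f - current) % SPACE <= limit),
                key=lambda f: (f - current) % SPACE,
            )
            hops += 1
        return owner, hops


if __name__ == "__main__":
    for n in (10, 100, 1000):
        ring = Ring([f"peer-{i}" for i in range(n)])
        keys = [ident(f"object-{j}") for j in range(200)]
        hops = [ring.lookup(ring.ids[0], k)[1] for k in keys]
        print(n, "peers, average hops:", sum(hops) / len(hops))
```

In this model each hop at least halves the remaining distance on the ring, so a lookup never needs more than `BITS` hops, and the average hop count grows slowly as peers are added.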

Thus, DHTs are very efficient for simple key-value lookups (for which they have been designed). Because objects are addressed with their unique names, searching in a DHT is difficult to make efficient [YDRC06, LLH+03, RV03a]. Furthermore, DHTs require the use of (globally) unique object identifiers, typically SHA-1 hashes, which are not very suitable for human users to type.

The hash uniquely determines the location of a peer in the overlay and its neighbors, as well as the placement of content on peers. Current DHTs assume that there is some out-of-band mechanism for mapping more human-friendly names to object identifiers, but none of them goes into detail about such mappings.

The key feature of DHTs is the fast lookup process and the high query hit rate; even rare information can be reliably addressed. The query success rate is independent of the number of objects available and is therefore equally high for popular and rare objects. DHTs have strict rules about how the overlay is formed and where content should be placed in the network. The research world has seen several examples of DHTs [KK03, DZD+03, MNR02, RD01b, I. 01, ZKJ01, RFH+01a, Pla99, MBR03]. They build unique network topologies, which determine large parts of the runtime behavior. For example, a topology may be inspired by social networks, as in Symphony [MBR03]. However, there are many other overlay networks with unique topologies and characteristic runtime behavior.

Search and Lookup. As discussed above, unstructured and structured networks have different strengths and weaknesses. The terms ‘Search’ and ‘Lookup’ are easily confused because both deal with locating information, but they describe completely different approaches.

Exhaustive Search. In P2P networks, ‘search’, or often ‘exhaustive search’, describes a mechanism for searching full-text information that may be distributed among several peers in the P2P network. Exhaustive search usually refers to a probabilistic search method, i.e., only a fraction of the available peers receive and process the query; otherwise, the immense message load caused by a search query would seriously degrade the performance of the overlay network. There have been several implementations of search algorithms. Gnutella used decentralized flooding of queries [SW05]. Kazaa’s method has been found to be very efficient and robust in practice [LKR05]. The BubbleStorm network [TKLB07] is a fully decentralized network based on random graphs and is able to provide efficient exhaustive search with tunable success rates. Irrespective of the search mechanism, the actual content transfer happens directly between the two peers.
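One way to see how a probabilistic search can have a tunable success rate is a simple rendezvous model. The model is an assumption made here for illustration and is not taken from any of the cited systems: an object is replicated on `data_copies` peers chosen at random, a query is delivered to `query_copies` peers chosen at random, and the search succeeds if the two sets share at least one peer. Under this assumption the success probability is approximately 1 - exp(-data_copies * query_copies / num_peers). All function and parameter names are hypothetical.

```python
import math
import random


def rendezvous_probability(num_peers, data_copies, query_copies):
    """Approximate chance that a query reaches at least one replica."""
    return 1.0 - math.exp(-data_copies * query_copies / num_peers)


def simulate(num_peers, data_copies, query_copies, trials, rng):
    """Estimate the same probability by repeated random placement."""
    peers = range(num_peers)
    met = 0
    for _ in range(trials):
        replicas = set(rng.sample(peers, data_copies))
        queried = rng.sample(peers, query_copies)
        if any(p in replicas for p in queried):
            met += 1
    return met / trials


if __name__ == "__main__":
    rng = random.Random(42)
    n = 10_000
    for copies in (50, 100, 200):
        estimate = simulate(n, copies, copies, 2000, rng)
        model = rendezvous_probability(n, copies, copies)
        print(copies, "copies each: simulated", round(estimate, 3),
              "model", round(model, 3))
```

In this model, increasing the number of contacted peers raises the success rate at the cost of more messages, while every query still reaches only a small fraction of the network.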

Key Lookup. ‘Key lookup’, or simply ‘lookup’, usually refers to the lookup mechanism used in DHTs. DHTs are very efficient for simple key-value lookups (for which they have been designed), but full-text search is still an ongoing challenge and usually imposes high signaling traffic on the network. Because content is placed using hash functions, real search queries are hardly feasible in DHTs. For example, it is not feasible to query a DHT for all objects whose names begin with ‘Foo’. This would usually require asking every peer whether it has any matching objects.
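The ‘Foo’ example can be sketched in a few lines. The placement rule below (SHA-1 hash modulo the number of peers) is a deliberately simplified assumption and not the placement scheme of any real DHT; `object_id`, `responsible_peer`, and `prefix_query` are hypothetical names. The point is only that names sharing a prefix hash to unrelated identifiers, so a prefix query cannot be routed to a single peer.

```python
import hashlib


def object_id(name):
    """Globally unique object identifier: the SHA-1 hash of the name."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


def responsible_peer(name, num_peers):
    """Toy placement: map the hash onto one of `num_peers` buckets."""
    return int(object_id(name), 16) % num_peers


def prefix_query(stored, prefix):
    """Answer a prefix query by inspecting the local names of every peer.

    Returns (matches, contacted): the matching names and the number of
    peers that had to be asked.
    """
    matches = []
    contacted = 0
    for names in stored.values():
        contacted += 1
        matches.extend(n for n in names if n.startswith(prefix))
    return sorted(matches), contacted


if __name__ == "__main__":
    num_peers = 8
    names = ["Foo", "Foobar", "Food", "Football", "Bar"]
    stored = {p: [] for p in range(num_peers)}
    for name in names:
        stored[responsible_peer(name, num_peers)].append(name)
        print(name, object_id(name)[:8], "-> peer", responsible_peer(name, num_peers))
    print(prefix_query(stored, "Foo"))
```

An exact lookup of a known name touches one bucket, whereas the prefix query has to contact all of them.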

Structured networks, on the other hand, offer very efficient means of looking up known objects, but their ability to perform widely spanning searches is limited.