
For SOUP to be feasible, it first and foremost must not introduce excessive bandwidth consumption. The traffic that originally flowed towards a sophisticated centralized infrastructure has to be absorbed by the participants themselves (in particular by the mirrors of data), as they form and maintain the SOUP overlay. In particular, SOUP must show that it keeps two kinds of overhead within reasonable bounds.

(i) First, the overhead to maintain the overlay itself must remain low. Recall that while maintaining the overlay, some nodes act as bootstrapping nodes or relay DHT requests for mobile nodes.

(ii) Second, the overhead induced by the selection of mirrors must remain manageable.

Recall that most nodes select around seven other nodes as their replicas. The communication overhead must be limited both at the mirrors and at the selecting node itself.

Other participants may request data from the former, while the latter must not change its mirrors very often, since each change requires retransmitting all of its data to the new mirror(s).

[Plot: bandwidth used (KB/s) over time (s); annotations mark a node join or leave and gateway traffic (new mobile connected).]

Figure 11.1: The control overhead introduced by SOUP is low.

11.2.1 Overlay Overhead

The bandwidth consumption of the DHT at the bootstrapping node is shown in Figure 11.1.

The figure depicts the thirty minutes of the deployment in which the bootstrapping node, the most utilized node in the deployment, received the most requests.

Recall that the bootstrapping node has three roles: it boots new nodes into the DHT ring, it acts as a gateway for all mobile nodes, and it participates as a regular DHT node, which makes it responsible for queries targeting a certain range of the key space.

Overall, the overlay overhead is very low. Only upon join and leave operations (i.e., when some DHT entries are shifted between nodes) is the network interface utilized at around 20-40 KB/s.

Lookups, in contrast, have no visible impact.

As a result, the cost of relaying for a mobile node (i.e., forwarding lookups and their results) is low as well. Only during the join procedure of a mobile node, which requires several DHT operations, does relaying consume a noticeable amount of bandwidth, which nonetheless remains very low.
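To illustrate why join and leave operations, unlike lookups, shift stored entries, the following minimal sketch models a DHT ring with consistent hashing. It is an illustrative assumption, not SOUP's actual DHT implementation: node identifiers and keys are hashed with SHA-1 onto a reduced 32-bit ring, each key is stored at its successor node, and the names `ring_position`, `responsible_node` and `keys_moved_on_join` are hypothetical. When a node joins, only the keys between its predecessor and itself change owners; all other entries stay where they are.

```python
import bisect
import hashlib

RING_BITS = 32  # hypothetical, reduced key-space size for readability


def ring_position(name: str) -> int:
    """Map a node identifier or data key onto the ring (SHA-1 is an assumption)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


def responsible_node(key: str, nodes: list[str]) -> str:
    """Return the key's successor on the ring, i.e., the node storing the key."""
    ring = sorted((ring_position(n), n) for n in nodes)
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect_left(positions, ring_position(key))
    # Wrap around to the first node if the key lies after the last node.
    return ring[idx % len(ring)][1]


def keys_moved_on_join(keys: list[str], nodes: list[str], newcomer: str) -> list[str]:
    """List the keys whose responsible node changes when `newcomer` joins."""
    before = {k: responsible_node(k, nodes) for k in keys}
    after = {k: responsible_node(k, nodes + [newcomer]) for k in keys}
    return [k for k in keys if before[k] != after[k]]


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(10)]
    keys = [f"profile-{i}" for i in range(1000)]
    moved = keys_moved_on_join(keys, nodes, "node-new")
    # Every moved key is now stored at the newcomer; all other keys stay put.
    assert all(responsible_node(k, nodes + ["node-new"]) == "node-new" for k in moved)
    print(f"{len(moved)} of {len(keys)} keys move to the joining node")
```

A lookup only reads the ring and moves no entries, which is consistent with lookups being invisible in Figure 11.1.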

151 11.2 Bandwidth Consumption

Figure 11.2: The communication overhead of SOUP remains manageable.

11.2.2 Mirroring Overhead

The traffic introduced by SOUP itself is also manageable. Figure 11.2 shows the most bandwidth-intensive 20-minute period observed for any user during data collection.

Messaging and simple profile requests consume little bandwidth and are hardly distinguishable from an idle link. Even a more intensive activity such as skipping through a photo album does not exhaust a regular user's bandwidth, as she takes her time to view the pictures.

Note that this kind of traffic, i.e., the traffic a user generates by consuming content, is approximately the same as in centralized OSNs, since the user needs to download the data in those systems as well. As shown above, the overhead SOUP introduces to look up the location of a data item of interest is low.

Only when producing or mirroring content does SOUP generate additional traffic: produced data has to be distributed to the mirrors as well, and the uplink of a user who acts as a mirror might be utilized by others requesting the data she stores. Consequently, most of the traffic in Figure 11.2 is generated by creating a photo album and distributing it to the user's mirrors. As in centralized OSNs, mobile users on a data plan should delay such uploads until they can access a WiFi link.
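The additional upload traffic at a producing node can be estimated with simple arithmetic. The sketch below assumes, for illustration only, that the producer uploads every new item directly to each of its mirrors (no forwarding among mirrors), so the upload volume grows linearly with the number of mirrors. The function names and the album in the example are hypothetical, not measurements from the deployment; the mirror count of seven follows the typical value mentioned earlier.

```python
def producer_upload_kb(item_sizes_kb: list[float], num_mirrors: int) -> float:
    """Upload volume (KB) for distributing new items to all mirrors.

    Assumes each item is sent once, directly by the producer, to every mirror.
    """
    if num_mirrors < 0:
        raise ValueError("num_mirrors must be non-negative")
    return sum(item_sizes_kb) * num_mirrors


def upload_seconds(volume_kb: float, uplink_kb_per_s: float) -> float:
    """Time needed to push `volume_kb` over an uplink of the given capacity."""
    return volume_kb / uplink_kb_per_s


if __name__ == "__main__":
    album = [80.0] * 30  # hypothetical album: 30 photos of 80 KB each
    volume = producer_upload_kb(album, num_mirrors=7)
    print(volume)  # 16800.0 (KB in total)
    print(upload_seconds(volume, uplink_kb_per_s=100.0))  # 168.0 (s on a 100 KB/s uplink)
```

Because the volume scales with the mirror count, a single album upload costs several times its own size, which is why deferring such uploads to a WiFi link pays off on mobile devices.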

[Plot: difference to previous round (0-5) per selection round (0-15); annotation: one random node added each round.]

Figure 11.3: SOUP incurs little variance in mirror sets.

11.2.3 Stability of Mirror Sets

It is further important for SOUP to offer stable mirror sets, because every time a user changes her mirrors, she needs to transmit all her data to the new mirror(s). If her mirror set does not fluctuate, in contrast, she only needs to transmit the much smaller updates to her data.

Figure 11.3 shows that, overall, the mirror sets remain stable and differ little between selection rounds. After the initial rounds, most mirror changes are the addition of a random node, as described in Chapter 8; beyond that, only a few nodes change further mirrors in each round. As a consequence, a user's complete data rarely has to be retransmitted, and the communication overhead remains modest.

Note also that during the initial mirror selections, many users might not have uploaded much data to the OSN yet: they have just joined the network and are first interested in building an audience for their data. Since users quickly find a well-suited set of mirrors, costly retransmissions of large multimedia collections are therefore unlikely to occur often.
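The trade-off between a fluctuating and a stable mirror set can be made concrete. The sketch below assumes, as described above, that a newly added mirror receives the user's complete data while a retained mirror only receives updates. The difference metric counts mirrors added relative to the previous round, which is one plausible reading of the y-axis of Figure 11.3 rather than a confirmed definition; all names and numbers are hypothetical.

```python
def mirror_set_difference(previous: set[str], current: set[str]) -> int:
    """Number of mirrors in the current round that were not mirrors before."""
    return len(current - previous)


def round_transfer_kb(previous: set[str], current: set[str],
                      total_data_kb: float, update_kb: float) -> float:
    """Data a user sends in one round (assumed model).

    New mirrors receive a full copy of the user's data; mirrors kept from
    the previous round receive only the latest updates.
    """
    new_mirrors = current - previous
    kept_mirrors = current & previous
    return len(new_mirrors) * total_data_kb + len(kept_mirrors) * update_kb


if __name__ == "__main__":
    prev = {"a", "b", "c", "d", "e", "f", "g"}  # hypothetical seven mirrors
    cur = {"a", "b", "c", "d", "e", "f", "h"}   # one mirror replaced
    print(mirror_set_difference(prev, cur))           # 1
    print(round_transfer_kb(prev, cur, 10000.0, 50.0))  # 10300.0
```

With a stable set, the per-round cost collapses to the update traffic, whereas every replaced mirror adds a full copy of the user's data.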


[Plot: CDF (0-1) over item size in KB, log scale from 10^-2 to 10^6.]

Figure 11.4: The CDF of item sizes in the collected dataset. Most items are text items and therefore relatively small. Only 1% of all collected items are larger than 1 MB.

11.2.4 Stress-testing SOUP

With regard to bandwidth consumption, the real-world deployment did not push SOUP to its limits. SOUP was therefore stress-tested in a separate experiment: to emulate a higher load, one particular mirror was chosen to store a large amount of user data, which was then requested at much higher rates than during the deployment.

To obtain this large amount of user data, the complete Facebook and Google+ profiles of 20 users, including private data, were collected. These profiles offer details beyond a crawler's results, which often lack major parts of a user's data (e.g., photos on Facebook are not publicly available by default) [36].

The average profile size was approximately 10 MB. The largest profile contained hundreds of photos in 27 photo albums and one video, consuming 60 MB of disk space in total. At the same time, many profiles are small, i.e., people do not upload much multimedia content. This coincides with observations that only a few users in OSNs have a high degree and are therefore encouraged to share their data [156].

Overall, the data contained 2035 unique data items, for which Figure 11.4 shows the CDF of item sizes. More than 35% of all items are smaller than 10 KB, and 93%, including most images, are smaller than 100 KB, while large items are rare. These findings largely agree with those in [36]. In total, the data amounts to 206 MB. Note that even if some profiles in the OSN happen to be much larger, this would mainly affect the storage required at the mirrors, as long as the item sizes remain small.
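The percentages above are points on the empirical CDF of item sizes. A minimal sketch of how such fractions can be computed from a list of item sizes follows; the function names and the toy sizes are hypothetical and do not reproduce the collected dataset.

```python
import bisect


def fraction_below(sizes_kb: list[float], threshold_kb: float) -> float:
    """Empirical CDF just below the threshold: share of items strictly smaller."""
    if not sizes_kb:
        raise ValueError("need at least one item size")
    ordered = sorted(sizes_kb)
    return bisect.bisect_left(ordered, threshold_kb) / len(ordered)


def empirical_cdf(sizes_kb: list[float]) -> list[tuple[float, float]]:
    """(size, share of items <= size) pairs, suitable for plotting a CDF."""
    ordered = sorted(sizes_kb)
    n = len(ordered)
    # bisect_right counts all items equal to `size`, so ties get the same value.
    return [(size, bisect.bisect_right(ordered, size) / n) for size in ordered]


if __name__ == "__main__":
    toy_sizes = [2, 5, 8, 40, 90, 120, 1500, 3, 7, 60]  # hypothetical sizes in KB
    print(fraction_below(toy_sizes, 10))   # 0.5
    print(fraction_below(toy_sizes, 100))  # 0.8
```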

[Plot: bandwidth used (KB/s, 0-1000) over time (s, 0-300) at request rates of 1, 10, and 20 req/s.]

Figure 11.5: Bandwidth consumption at high request rates.

One mirror was then selected to host all this data. Recall that these 20 profiles amount to three times as much data as 90% of SOUP nodes will have to store (see Section 9.2.2).

Afterwards, the mirror received requests for text, photo, and video data according to the per-type request probabilities described in [181].

As shown in Figure 11.5, the average bandwidth consumption stays well below 600 KB/s, even when the mirror has to handle 20 requests per second. With increasing request frequency, large items are hit more often, which causes the spikes in the measurements.

At high loads, however, a request might time out once a mirror becomes overloaded or limits its bandwidth, which may happen especially to nodes mirroring popular data. Note that unlike other approaches, in which users are stuck with a static or heteronomous set of mirrors [42–46], SOUP will adapt to this situation.
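The request workload of the stress test can be reproduced with a small load generator that draws the data type of each request from a fixed distribution. In the sketch below, the type weights are placeholders: the actual request probabilities from [181] are not reproduced here, and the names `TYPE_WEIGHTS` and `request_schedule` are hypothetical. The generator issues a constant number of requests per second, so raising the rate from 1 to 20 req/s proportionally increases how often rarely requested but large items (such as videos) are drawn, consistent with the explanation of the spikes given above.

```python
import random

# Placeholder weights (hypothetical); substitute the per-type request
# probabilities reported in [181].
TYPE_WEIGHTS = {"text": 0.7, "photo": 0.25, "video": 0.05}


def request_schedule(rate_per_s: int, duration_s: int,
                     weights: dict[str, float],
                     seed: int = 0) -> list[tuple[float, str]]:
    """Generate (send_time_s, data_type) pairs at a constant request rate.

    Requests are evenly spaced within each second; data types are drawn
    independently according to `weights`.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    types = list(weights)
    probs = [weights[t] for t in types]
    schedule = []
    for second in range(duration_s):
        for i in range(rate_per_s):
            send_time = second + i / rate_per_s
            data_type = rng.choices(types, weights=probs, k=1)[0]
            schedule.append((send_time, data_type))
    return schedule


if __name__ == "__main__":
    for rate in (1, 10, 20):  # the rates shown in Figure 11.5
        sched = request_schedule(rate, 300, TYPE_WEIGHTS)
        videos = sum(1 for _, t in sched if t == "video")
        print(f"{rate} req/s: {len(sched)} requests, {videos} for videos")
```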