Universität Stuttgart


Fakultät Informatik, Elektrotechnik und Informationstechnik

Diploma Thesis No. 2555

Protocol to Acquire and Cache Large Data in Sensor Networks

Harald Weinschrott

Degree program: Software Engineering

Examiner: Prof. Dr. Kurt Rothermel

Supervisors: Andreas Lachenmann, Daniel Minder

Started: 13 November 2006

Completed: 15 May 2007

CR classification: C.2.1, C.2.2, C.2.4, C.3

Institut für Parallele und Verteilte Systeme, Abteilung Verteilte Systeme

Universitätsstraße 38, D-70569 Stuttgart


Abstract

In this diploma thesis a protocol to acquire and cache large data objects in sensor networks is developed. This work is motivated by the increasingly hybrid character of these networks which, in conjunction with application adaptation, leads to the need to exchange code modules or other data objects on the nodes of the network.

The special character of this protocol is based on request-driven data transfer from within the network, combined with a caching mechanism to reduce network load.

No related work is known with the special nature of this work supporting a high number of large objects.

The protocol developed in this thesis provides a discovery mechanism based on expanding ring search, a sophisticated caching mechanism which covers node selection and a proper caching policy and, finally, a transport mechanism to transfer the objects. In the evaluation, different aspects are studied separately, but the overall behavior of the protocol is investigated as well. The combined approach of this thesis achieves a balanced network load and an overall reduction of energy consumption in the network with only little overhead.

Summary

The goal of this diploma thesis is the development of a protocol for requesting and transferring large data objects in sensor networks. The need for this protocol stems from the increasingly hybrid character of these networks, which, as a result of application adaptation, creates the need to exchange code modules. The special character of this protocol results from the behavior of the network nodes, which actively request objects and initiate their transfer, and from the caching mechanism that reduces network load. No other approach meets these requirements while supporting a large number of large objects.

The protocol developed here provides a mechanism for discovering objects and a caching mechanism which covers node selection as well as a caching policy. In addition, the protocol provides a transport component to transfer the objects. The evaluation examines individual aspects separately, but also the overall behavior of the protocol. The combination of mechanisms in this thesis achieves an evenly loaded network and a reduction of energy consumption in the network with only little overhead.


List of Tables

2.1. Properties of microcontrollers [33]

2.2. Memory size of sensor nodes [33]

2.3. Sensor node radio modules [33]

5.1. Parameters for node selection

8.1. Caching characteristics

A.1. Node coordinates for topology with 100 nodes


List of Figures

2.1. Computer network classification

2.2. Sensor nodes: Mica2 [1], Mica2Dot [2], and Telos [3]

2.3. TinyOS system architecture

5.1. Node selection along the route of data transfer

5.2. Node selection with extended along the route approach

5.3. Node selection with random approach

5.4. Node selection with simple hopcount approach

5.5. Node selection with advanced hopcount approach

5.6. Source discovery in fixed-size n-hop neighborhood

5.7. Source discovery with expanding ring search

5.8. Source discovery with the equal-distance approach

5.9. Source discovery with the tree search approach

5.10. Counter-based request propagation

5.11. Search area overlap signaling with concurrent requests

5.12. Search area overlap with concurrent requests for same object

5.13. Reply propagation via broadcast

5.14. Reply propagation via unicast

5.15. Transport problem division

5.16. Transport tree with multiple requesters

6.1. Component model of PACLD

6.2. Complete discovery and data transport handling at the requester

6.3. Source Discovery at requester

6.4. Source Discovery at node within the search area

6.5. Data Transport at requester

6.6. Data Transport at source

6.7. Counter-based broadcast propagation mechanism

6.8. Implicit-ACK unicast propagation mechanism

7.1. Crossbow Mica2 component diagram

7.2. Cache structure in flash memory

8.1. Random topology of 100 nodes (left) and 50 nodes (right)

8.2. Line topology

8.3. Cost per discovery

8.4. Time per discovery

8.5. Percentage of Hits

8.6. Average distance to source


8.7. Deviation of discovery results

8.8. Varying the start value of the ERS

8.9. Discovery with varying cache size

8.10. Discovery with varying network density

8.11. Average distance to source with non-random object distribution

8.12. Average time per transfer

8.13. Average cost per transfer

8.14. Object throughput

8.15. Average distance on the route

8.16. 100 nodes

8.17. 80 nodes

8.18. 65 nodes

8.19. 50 nodes

8.20. Cache fill state at different points in time

8.21. Development of the average distance to source

8.22. Varying network strain with caching

8.23. Varying network strain without caching

8.24. Comparison of average distance to source

8.25. Average transmission overhead per data transfer

8.26. Average energy consumption

8.27. Distribution of energy consumption


Listings

2.1. NesC interface definition

2.2. NesC module

2.3. NesC configuration

2.4. NesC concurrency

6.1. Algorithm for evict selection

6.2. Algorithm for value computation

7.1. Interface IPacld

7.2. Interface IDiscovery

7.3. Interface ITransport

7.4. Interface ILookup

7.5. Interface ICache

7.6. Interface IBCast

7.7. Broadcast control information

7.8. Interface IUCast

7.9. Unicast control information


1. Introduction

This introduction gives a short motivation for the problem addressed in the diploma thesis "Protocol to Acquire and Cache Large Data in Sensor Networks". Then, the problem is described in detail and, afterwards, the main goals of this thesis are listed. Finally, the chapter closes with an outline of the remainder of this document.

1.1. Motivation

Today wireless sensor networks are an advanced and active research field. Since the development of highly integrated computing devices is still in progress, the decline in price of these small sensor nodes makes a large number of applications feasible.

A lot of research work has been done concerning the basic challenges of sensor networks. The limited energy supply is the most challenging problem when dealing with sensor networks. However, limited computing resources, i.e., processing power and memory, and unreliable low-bandwidth wireless communication links amplify these basic challenges.

In order to ease application development for wireless sensor networks, research has focused on the development of middleware systems which cope with the basic challenges. Freed from these challenges, the application programmer can concentrate on the tasks of the sensor network and leave demands such as a long network lifetime to the middleware system. Since such middleware systems introduce some overhead, special care has to be taken to minimize it.

One way for a middleware system to support a wide range of applications is adaptation. When the application is adapted to environmental conditions and changing requirements, the need for different code modules on the sensor nodes can arise.

Due to the fact that sensor networks are often embedded into a physical environment, manual code updates are often impossible. Furthermore, it is expensive to maintain an administrative view outside the network whereon update decisions can be made because this would require frequent transmissions of status information through the network. This is especially of importance in large-scale sensor networks. Therefore, the ability of the sensor nodes to make update decisions, based on their local state or based on information about their neighborhood, is promising.

The possibility of local update decisions on the nodes of the network leads to the need for protocols to acquire large data objects, i.e., code modules, in wireless sensor networks. The development of such a protocol is the subject of this thesis. In the following, this protocol is referred to as PACLD (Protocol to Acquire and Cache Large Data). In the next section, the problem is described in more detail, and a motivation is given based on the TinyCubus [28] middleware system.

1.2. Problem Description

As motivated in the section above, protocols to acquire large data in sensor networks are needed. The special requirements for this thesis are based on the TinyCubus [28] project. TinyCubus is a framework for TinyOS-based [16] sensor networks which reduces complexity in sensor networks. It provides selection and adaptation of system and application components. Furthermore, it supports cross-layer cooperation of these components and their efficient dissemination and installation.

Since different nodes can run different software components in the course of adaptation, specific needs for data objects can arise on the different nodes in a sensor network running TinyCubus. In TinyCubus, there are initially two types of data objects which may be requested by the nodes in the network. First, there are the software components. Second, there are optimized memory layout descriptions [22] for each combination of components.

In contrast to other dissemination protocols, not all of the nodes have to receive the same data objects. Objects may be changed at the receiver in order to be installed on the sensor node and, therefore, they are not available for propagation to other requesters. Moreover, data transfer is initiated at the nodes, and not outside the network.

This thesis focuses on a protocol which integrates multiple properties. In addition to the code acquisition component, a component for caching has to be developed so that not every request has to be sent to the base station of the network. This component has to select dedicated caching nodes which provide access to some data objects. With these caches, further requests can then be answered within the network.

However, replication mechanisms are not in the scope of this thesis.

In addition to these requirements, sensor network challenges such as energy efficiency and scalability have to be considered. For a more detailed discussion of these challenges see Section 2.4. For a complete presentation of the system model, which is the basis for this protocol development, see Chapter 4.

1.3. Goals

This section defines the main goals of the protocol which is the subject of this diploma thesis. PACLD is intended to reliably retrieve large amounts of data in sensor networks, considering energy efficiency as an important design constraint. The main requirements for this protocol are listed in the following:

• Data source discovery

• Reliable data transfer

• Caching

In addition to these requirements, the following design principles are important:

• Resource-efficiency

• Scalability

• Robustness

For a detailed discussion on how to achieve these goals see Chapter 5, which presents the design space.

1.4. Outline

The remainder of this document is organized as follows. After the introduction, Chapter 2 explains the principles and characteristics of wireless sensor networks. Challenges are named, and strategies to cope with them are explained.

Chapter 3 presents related work and shows the differences to this thesis. It closes by pointing out the contribution of this thesis.

To define the problem and the requirements, Chapter 4 presents the system model. This is the basis for the design decisions studied in Chapter 5, which presents the design space.

Chapter 6 explains the architecture and design of the protocol and presents the components which build up the protocol and their behavior. The design is the basis for the implementation of the protocol, which is documented in Chapter 7.

The protocol is evaluated using the implementation. Procedure and results of the evaluation are documented in Chapter 8. Finally, Chapter 9 gives a summary of the document and shows possible extensions and enhancements of the protocol developed in this diploma thesis.


2. Principles of Sensor Networks

This chapter shows a classification of computer networks and the location of wireless sensor networks within this classification. Then, various fields of application are presented. To implement these applications, sensor network specific challenges have to be handled. These are listed and explained afterwards. This chapter is closed by a short introduction of a widely used programming model of wireless sensor networks which is the base for the implementation of PACLD.

2.1. Network Classification

This section presents a classification of computer networks (see Figure 2.1) according to [41]. The first criterion to categorize networks is the type of network interface, i.e., whether it is wired or wireless. Wireless links between nodes in a network normally imply a lower bandwidth than wired links. Furthermore, wireless links are more error-prone, which results in frequent link interruptions.

Figure 2.1: Computer network classification

A second criterion is the dependency on infrastructure. Without this dependency (ad-hoc), easy and fast network establishment is possible. However, without any infrastructure, network protocols for these networks are more complex.

Finally as a third criterion, this classification considers the frequency of topology changes in the network. Frequent topology changes require specialized network protocols, e.g., new routing protocols.

Wireless sensor networks [6] belong to the class of networks with wireless links, without infrastructure, and with limited topology changes compared to mobile ad-hoc networks (MANETs). MANETs are similar to sensor networks. However, topology changes are more frequent in MANETs because of the mobility of their nodes. Another difference is the number of nodes in a network, which is larger by orders of magnitude in sensor networks. Furthermore, sensor nodes are much more prone to failure, which is one consequence of their tighter resource restrictions. This thesis focuses on sensor networks. A more detailed discussion of the challenges imposed by this type of network can be found in Section 2.4.

2.2. Applications

There are various applications for sensor networks, and with the development of sensor network technology, the scope and complexity of applications widen further. In the following, fields of application are presented and briefly explained. Based on this presentation, this section identifies the main characteristics of sensor network applications.

The first field in this listing of applications is the monitoring of ecological phenomena. The sensor network is placed in the environment where these phenomena occur. Sensor data is acquired locally according to the application needs and, afterwards, processed and transmitted to a base station. Examples for this field are ZebraNet [18], a project to monitor animal behavior, and Redwood [44], a project to monitor the environment of a tree.

Another field of application is assistance in military operations. Because sensor networks can easily be deployed and are able to organize themselves, a sensor network can also assist in various tasks in combat areas. One example is enemy activity detection in a combat field (e.g., SOSUS [31]).

A promising field of application is medical care. Sensor networks can assist disabled persons in their daily life or they can be used to monitor the state of health of patients (e.g. CodeBlue [27]).

Disaster relief is a field of application for sensor networks, because in case of disaster, available infrastructure may be damaged. Therefore, a technology is necessary which can be rapidly deployed. Sensor networks can help to coordinate steps in rescue operations where no infrastructure is available.

When a sensor network is equipped with sensors as well as with actuators, new fields of application arise where the network can react independently on conditions. This may lead to the intelligent house where for example the system darkens the windows according to the light conditions.


Figure 2.2: Sensor nodes: Mica2 [1], Mica2Dot [2], and Telos [3]

In addition to these applications, many specific engineering solutions can be built by including a sensor network as part of the system. Examples are Cartalk 2000 [35], a driving support system based on inter-vehicle communication, and Sustainable Bridges [5], a project which monitors the state of bridges with the support of sensor network technology.

Common to these applications are the following characteristics:

• Rapid Deployment: The sensor network has to be set-up without administrative overhead.

• Inaccessible Terrain: Physical access on the sensor network after deployment is hardly possible.

• Self-organization: Without administrative access, self-organization of the nodes is necessary.

• Fault Tolerance: The system has to deal with frequent node and link failures.

• Cooperation of Nodes: Cooperation of the nodes is needed to solve a global task of the whole network.

• In-Network Data Processing: To avoid high communication costs, data has to be partially processed in the network.

With this model of application requirements, the next section identifies low-level challenges which have to be handled, in order to satisfy the requirements.

2.3. Hardware Resources

The nodes (see Figure 2.2) in a sensor network are small computing devices with networking capability and a variety of different sensors. The following sections describe the characteristics of the main hardware components of sensor nodes. Table 2.1, Table 2.2, and Table 2.3 show capabilities of some sensor nodes.

2.3.1. Processing Unit

Since the processing unit of a sensor node is clocked at low frequency of only a few MHz, it has only limited throughput. Therefore, algorithms supposed to be executed on a sensor node have to reflect this scarce processing capacity.

Microcontroller      ATmega163   ATmega128   TI MSP430
Active power (mW)    15          33          3
Sleep power (µW)     45          75          15
Wakeup time (µs)     36          180         6

Table 2.1: Properties of microcontrollers [33]

In addition to this capacity constraint, the processing units of sensor nodes are mainly RISC (Reduced Instruction Set Computer) processors with limited word length compared to the processing units of personal computers. With their reduced instruction set, floating-point operations are not supported and, e.g., division operations have to be implemented in software, which makes them expensive. Most processing units support different states with different levels of energy consumption (see Table 2.1).
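The effect of these power states can be illustrated with a small calculation. The following Python sketch estimates the average power of a microcontroller that alternates between the active and the sleep state, using the values from Table 2.1. The function name, the duty cycle of 1%, and the decision to ignore wakeup costs are assumptions made for illustration only; they are not part of PACLD.

```python
# Power figures taken from Table 2.1; everything else is an illustrative assumption.
ACTIVE_MW = {"ATmega163": 15.0, "ATmega128": 33.0, "TI MSP430": 3.0}
SLEEP_UW = {"ATmega163": 45.0, "ATmega128": 75.0, "TI MSP430": 15.0}


def average_power_mw(mcu: str, duty_cycle: float) -> float:
    """Average power in mW if the MCU is active for `duty_cycle` of the time.

    The energy spent during wakeup is ignored, which is only a reasonable
    simplification when sleep periods are long compared to the wakeup time.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    sleep_mw = SLEEP_UW[mcu] / 1000.0  # convert microwatts to milliwatts
    return duty_cycle * ACTIVE_MW[mcu] + (1.0 - duty_cycle) * sleep_mw
```

For example, `average_power_mw("ATmega128", 0.01)` yields about 0.40 mW, compared to 33 mW for a permanently active processor. Under these assumptions, keeping the processing unit asleep most of the time dominates the energy balance.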

2.3.2. Memory

A sensor node is equipped with different types of memory. See Table 2.2 for typical memory sizes. First, there is the data memory, with a very limited capacity of only a few kilobytes. Second, there is the program memory, with a capacity which is an order of magnitude higher than that of the data memory. Third, there is the external flash memory with even higher capacity; however, it has only a limited lifetime, determined by the maximum number of write accesses.

Sensor node            Dot (2000)   Mica2 (2002)   Telos (2004)
Data memory (KB)       1            4              10
Program memory (KB)    16           128            48
External flash (KB)    32           512            1024

Table 2.2: Memory size of sensor nodes [33]

Access to the external flash memory is page-based: reading and writing are done in blocks of the memory’s page size. With increasing capacity of these different memory types, the access costs rise. On the one hand, access times increase and, on the other hand, the energy consumption for data access increases (see [22] for details). This leads to the strategy to implement algorithms with low memory usage and to reduce memory


2.3.3. Network Interface

As a fundamental component of sensor nodes, the network interface can be used to access and configure the sensor nodes from outside the network. Furthermore, the network interface is used to transport the sensor data to the user, and it allows coordination between the nodes to achieve their common goals.

Radio                           TR1000   CC1000   CC2420
Receive power (mW)              9        29       38
Transmit power at 0 dBm (mW)    36       42       35
Data rate (kbps)                10       38.4     250

Table 2.3: Sensor node radio modules [33]

Sensor nodes are equipped with a low-range and low-bandwidth wireless interface. Table 2.3 shows typical bandwidth values. To cope with these restrictions, the amount of data communicated over the links has to be reduced.
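To give a feeling for these data rates, the following Python sketch computes a lower bound for the time needed to send one data object over a single link, using the data rates from Table 2.3. The object size in the example and the function name are assumptions for illustration; headers, retransmissions, and medium access delays are ignored, so real transfer times are longer.

```python
# Data rates taken from Table 2.3 (1 kbps is interpreted as 1000 bit/s).
DATA_RATE_KBPS = {"TR1000": 10.0, "CC1000": 38.4, "CC2420": 250.0}


def raw_transfer_time_s(object_bytes: int, radio: str) -> float:
    """Lower bound in seconds for sending `object_bytes` over one hop.

    Only the payload bits are counted; protocol overhead is not modeled.
    """
    payload_bits = object_bytes * 8
    return payload_bits / (DATA_RATE_KBPS[radio] * 1000.0)
```

Under these assumptions, a hypothetical object of 32 KB (32768 bytes) occupies a CC1000 link for about 6.8 s and a TR1000 link for about 26.2 s per hop, which indicates why avoiding redundant transfers of large objects matters.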

2.4. Challenges

Due to the specific characteristics of sensor networks, many challenges have to be handled which have higher importance compared to other network types. The following sections explain in detail the challenges which are of importance for this thesis and propose strategies to handle them.

2.4.1. Energy Constraints

Sensor nodes consume little power compared to PDAs or personal computers. However, because they are deployed in environments without infrastructure and are expected to operate for a long time, energy remains a highly challenging problem. Table 2.1 and Table 2.3 show typical values for the power consumption of sensor node components.

The power supply of sensor nodes is, in most cases, a battery. There are alternatives, such as solar cells or fuel cells, but they either do not deliver enough energy for the nodes or cannot be deployed in every environment. With nodes restricted to battery power and without the option of replacing batteries, the lifetime of the nodes depends heavily on reducing energy consumption. Research projections in the field of battery technology indicate that battery capacity will advance only slowly in the future.

There are various power consumers embedded in a sensor node sharing the same power supply: sensors, processing unit, memory, network interface, and others. However, as this thesis covers data transfer in sensor networks, only the relevant consumers are covered in the following.


One power consumer which has to be considered in almost all software development for sensor networks is the network interface. Depending on the node hardware, sending may consume an order of magnitude more energy than receiving. Furthermore, the required sending power grows roughly with the third power of the transmission range (see [13] for more advanced estimations). These characteristics lead to the strategy of multi-hop communication over short-distance links, and to the utilization of the benefits of broadcast communication.
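The argument for multi-hop communication can be made concrete with a simplified model. The following Python sketch compares the relative transmission energy needed to cover a distance in one hop with the energy needed when the distance is split into several equal hops, assuming that sending power grows with the third power of the hop length. The function name and the model itself are illustrative assumptions: receive energy, fixed per-packet costs, and retransmissions are not modeled, so the real benefit is smaller.

```python
def relative_transmit_energy(distance: float, hops: int, exponent: float = 3.0) -> float:
    """Relative energy to cover `distance` with `hops` hops of equal length.

    Each hop is assumed to cost (hop length) ** exponent in arbitrary units.
    """
    if hops < 1:
        raise ValueError("at least one hop is required")
    hop_length = distance / hops
    return hops * hop_length ** exponent
```

In this model, covering 100 units in one hop costs 1,000,000 units of energy, while four hops of 25 units cost 62,500 in total, i.e., a reduction by the factor hops² = 16.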

As this thesis analyzes the usage of caching mechanisms, the external flash memory has to be considered as a power consumer. An important characteristic is the difference in energy consumption of write and read accesses. A write access on the external flash memory consumes much more energy than a read access (see [22] for details).

2.4.2. Wireless Link Properties

In addition to the low-range and low-bandwidth characteristic of the wireless interfaces embedded in sensor nodes, the wireless communication is prone to failure. On the one hand, frequent link failures may lead to frequent topology changes in the network. On the other hand, high bit error rates on the links have to be considered in the protocol design.

The broadcast characteristic of the medium can lead to the broadcast storm problem [30] where redundancy, contention, and collisions can reduce the efficiency of broadcast communication, if this problem is not considered in network protocol design. Nevertheless, the broadcast medium property can be used to cheaply transfer data from a sender to all nodes in its neighborhood.

The hidden terminal problem is another issue in wireless communication. With a CSMA MAC, multiple nodes which are not in transmission range of each other may send at the same time. This, however, may lead to collisions at nodes within transmission range of at least two of these senders.
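The situation can be sketched for a given node placement. The following Python function lists all pairs of senders that are both within transmission range of a receiver but out of range of each other, so that carrier sensing cannot prevent them from colliding at the receiver. The unit-disk range model, the function name, and the node coordinates in the example are assumptions for illustration; real radio links are less regular.

```python
from itertools import combinations
from math import dist


def hidden_terminal_pairs(positions, tx_range, receiver):
    """Return sender pairs that both reach `receiver` but cannot hear each other.

    `positions` maps node names to (x, y) coordinates; a link exists if the
    Euclidean distance is at most `tx_range` (idealized unit-disk model).
    """
    def in_range(a, b):
        return dist(positions[a], positions[b]) <= tx_range

    senders = sorted(n for n in positions if n != receiver and in_range(n, receiver))
    return [(a, b) for a, b in combinations(senders, 2) if not in_range(a, b)]
```

With nodes A, B, and C placed on a line at x = 0, 1, and 2 and a transmission range of 1.2, the call `hidden_terminal_pairs({"A": (0, 0), "B": (1, 0), "C": (2, 0)}, 1.2, "B")` reports the pair `("A", "C")`: both reach B, but neither senses the other.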

Another property of wireless communication in sensor networks can be the different signal strengths received by two nodes. This can lead to asymmetric or even unidirectional links where communication is possible in one direction but impossible in the other. Schemes such as a three-phase handshake can be used to cope with this property by ensuring bi-directionality of the links.

2.4.3. Scalability

Since sensor networks can consist of a very large number of nodes, scalability is an important challenge in software development for these networks. There are scenarios with thousands or even more nodes cooperating in a single network.


There are multiple dimensions of scalability which are of interest in network protocol design for sensor networks:

1. Number of nodes: Does the protocol scale with increasing number of nodes?

2. Node density: Does the protocol work with increasing node density?

3. Number of messages: Does the protocol work with increasing request frequency?

The priorities of these dimensions of scalability in this thesis are given by their order in this enumeration.

In a network with a large number of nodes, the state of the network can hardly be maintained at every single node because of the limited resources of these nodes. Therefore, to achieve scalability, one option is to use local information in protocol design. Local in this context means an n-hop neighborhood with a fixed value for n. Another option is to use reduced information about distant nodes.
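The notion of an n-hop neighborhood can be sketched with a breadth-first search that stops after n hops. The following Python function is an offline illustration on a known topology; the adjacency representation and the function name are assumptions, and a sensor node would of course learn its neighborhood through message exchange rather than from a global graph.

```python
from collections import deque


def n_hop_neighborhood(adjacency, start, n):
    """Return {node: hop distance} for all nodes within `n` hops of `start`.

    `adjacency` maps each node to an iterable of its direct neighbors.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == n:
            continue  # do not expand beyond the n-hop boundary
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops
```

On a line of five nodes 0-1-2-3-4, the 2-hop neighborhood of node 0 contains the nodes 0, 1, and 2. The size of the result depends on n and the node density, but not on the total number of nodes, which is why local information scales.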

A network with high node density can benefit from the redundancy. However, since this redundancy can lead to increased processing overhead, schemes, as for example suppression, have to be considered and implemented.

Due to the fact that queries to the protocol developed in this thesis are rather infrequent, the lowest priority can be assigned to the number of messages scalability dimension.

2.5. Programming Model

2.5.1. Programming Language: nesC

One widely used programming language in sensor networks is nesC (networked embedded system C), the implementation language of the operating system TinyOS. To meet the domain-specific needs of sensor networks, nesC [14] provides a programming model with support for event-driven execution, a concurrency model, and component-oriented application design. This section explains the properties of nesC which are important implementation details and influence the low-level software design.

Design and Properties

The nesC language is an extended subset of the programming language C [19]. C offers low-level hardware access features which are necessary in sensor networks because of the deep interaction of applications with the limited hardware resources.

Since a main goal of nesC is to reduce runtime errors, it prohibits some features of the programming language C, namely dynamic memory allocation and function pointers. This makes nesC a static language which can be analyzed more easily at compile-time.

In addition to the language features provided by C, nesC offers support for the event-driven characteristic of sensor network applications. This is realized by a component model which allows building components with bidirectional interfaces. With this, the event flow can be modeled.

Furthermore, a simple concurrency model is provided to satisfy the needs for asynchronous tasks, i.e., concurrent data processing and event arrival. The concurrency model defines tasks and atomic sections. However, nesC does not support multithreading or multitasking. Tasks are processed sequentially, and atomic sections cannot be interrupted by interrupt handling procedures.

The next section presents the nesC language constructs which realize these concepts and shows how they can be used.

Language

This section presents the realization of the programming language concepts which extend the C programming language. First, the component model is presented. Then, the concurrency model is presented.

Component Model The nesC language uses the concept of interfaces as the fundamental service description of a component. Interfaces are bidirectional and consist of command and event functions. Listing 2.1 shows a sample interface definition with a command doJob and an event jobFinished.

interface JobInterface {
    command void doJob(uint32_t jobNum);
    event void jobFinished();
}

Listing 2.1: NesC interface definition

module JobberM {
    provides interface JobInterface;
    uses interface Timer;
}
implementation {
    // Command of the provided interface: implemented by this module.
    command void JobInterface.doJob(uint32_t jobNum) {
        // ...
    }
    // Event of the used interface: handled by this module.
    event result_t Timer.fired() {
        // ...
        return SUCCESS;
    }
}

Listing 2.2: NesC module

There are two types of components in nesC. First, there are modules, which consist of two parts: module specification and module implementation. Listing 2.2 shows a sample module written in nesC. A module is specified by the list of interfaces it provides and the list of interfaces it uses. Providing an interface means implementing the command functions, whereas using an interface means implementing the event functions of that interface. The module implements the interfaces specified in the module specification.

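configuration JobberC {
    provides interface JobInterface;
}
implementation {
    // OtherModule stands for any component that provides Timer.
    components JobberM, OtherModule;

    // Export the interface provided by JobberM.
    JobInterface = JobberM.JobInterface;

    // Wire the Timer interface used by JobberM to its provider.
    JobberM.Timer -> OtherModule.Timer;
}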

Listing 2.3: NesC configuration

The second type of component in nesC is the configuration (see Listing 2.3). Configurations specify the wiring of the modules which build up the application. Interfaces provided by a component are connected to interfaces used by other components. This concept allows the development of simple isolated services which are then connected to form more complex services or applications.

Concurrency Functions in nesC can be marked with the keyword task. Such a function is not executed directly when it is posted. Instead, the post returns immediately, and the task is enqueued into the list of tasks, which are processed sequentially when the system is idle.

module JobberM {
    provides interface JobInterface;
    uses interface Timer;
}
implementation {
    uint16_t count1, count2;

    command void JobInterface.doJob(uint32_t jobNum) {
        // ...
    }
    // Task: runs later, after the event handler that posted it has returned.
    task void doIt() {
        // ...
    }
    event result_t Timer.fired() {
        // Both counters are updated without interruption.
        atomic {
            count1 = count1 + 1;
            count2 = count2 + 1;
        }
        post doIt();
        return SUCCESS;
    }
}

Listing 2.4: NesC concurrency

Since events can interrupt the program flow, nesC defines the atomic keyword which can be used to mark a section which must not be interrupted by events. Listing 2.4 shows a sample module which applies the concurrency concepts introduced in this section.

2.5.2. Operating System: TinyOS

TinyOS is an operating system framework for sensor nodes. Since PACLD has to be implemented as a TinyOS component, this section presents the main concepts of TinyOS that are essential for the implementation.

Due to the fact that the design of the nesC programming language is influenced by the features of TinyOS, the nesC concepts presented in the previous section can be found in TinyOS as a base for its main features. These features are: component-based architecture, simple event-based concurrency model, and split-phase operations.

Figure 2.3: TinyOS system architecture

Figure 2.3 shows the system architecture of TinyOS. A scheduler handles assignment of the processing unit to the components. The components are statically wired as described in the previous section. This architecture allows compile-time configuration by selecting only the operating system components necessary for an application. Additionally, the inter-component communication via events makes it easy to build event-driven systems, which are a main characteristic of sensor networks.


With the concept of events and commands, as presented in the context of nesC in the previous section, TinyOS supports split-phase operations. A command is called, time-consuming processing is executed asynchronously in a task and, finally, the completion of the task is signaled with an event. The split-phase concept is motivated by the need to reduce the processing time of commands and events in order to increase the responsiveness of the system to interrupts.
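The split-phase pattern can be illustrated outside of nesC. The following Python sketch is only an analogy under stated assumptions: the names Scheduler, Jobber, do_job, and on_done are hypothetical, the callback stands in for a nesC event, and the doubling is a placeholder for the time-consuming work.

```python
from collections import deque


class Scheduler:
    """FIFO task queue; each task runs to completion before the next starts."""

    def __init__(self):
        self.tasks = deque()

    def post(self, task):
        # Enqueue only; the caller continues immediately (like `post` in nesC).
        self.tasks.append(task)

    def run(self):
        while self.tasks:
            self.tasks.popleft()()


class Jobber:
    def __init__(self, scheduler, on_done):
        self.scheduler = scheduler
        self.on_done = on_done  # callback standing in for the completion event

    def do_job(self, job_num):
        # Phase 1: the command defers the work to a task and returns.
        self.scheduler.post(lambda: self._work(job_num))
        return True

    def _work(self, job_num):
        result = job_num * 2  # placeholder for the time-consuming part
        # Phase 2: completion is signaled to the caller.
        self.on_done(job_num, result)


log = []
sched = Scheduler()
jobber = Jobber(sched, lambda num, res: log.append((num, res)))
jobber.do_job(21)
pending = list(log)  # snapshot taken before any task has run
sched.run()
```

The snapshot `pending` is still empty because the command returned before the work was executed; the result only appears in `log` once the scheduler has processed the task.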

2.6. Summary

In this chapter the properties of sensor networks were presented. First, main characteristics such as being wireless and ad-hoc and having infrequent topology changes were identified in a classification of computer networks. Then, different fields of applications were introduced, and common properties and requirements of these applications were identified.

Afterwards, the hardware properties of the sensor nodes were characterized, namely the processing unit, the different memory types, and the network interface. Then, the energy constraints, wireless link properties, and scalability issues were discussed as main challenges in sensor networks, and strategies were proposed to circumvent them.

Finally, a widely used programming model of sensor networks was introduced. This model is built by the programming language nesC and the operating system TinyOS.


3. Related Work

PACLD, as a protocol to request and transfer data objects in sensor networks, follows the approach of requester-initiated interaction and caching along the route. No other work is known with this approach. However, the following sections present research topics which relate to some aspects of the PACLD approach. First, there are dissemination protocols which are intended to deliver data or code objects to a set of nodes in the network. Second, data centric storage approaches attempt to store data in the network in order to reduce transfer costs and search costs for specific objects. Third, caching approaches are relevant since PACLD implements a caching mechanism. Fourth, peer-to-peer network research topics and search protocols are related to PACLD in their common need to search for objects in the network.

3.1. Dissemination Protocols

The field of dissemination protocols has a wide range from protocols which are intended to perform network reprogramming to protocols which are used to send data to a base station or to send data or queries from the base station to certain nodes in the network.

Deluge [17] is a dissemination protocol for TinyOS-based networks. It epidemically and reliably propagates large code images to the complete network and uses suppression mechanisms to reduce the number of messages needed for this task. This approach does not meet the requirement for systems which can handle and benefit from multiple objects and groups of receivers. However, its concepts of how to propagate data reliably and efficiently are also relevant for this thesis.

Similar to Deluge, the Firecracker protocol [24] aims at propagating data objects to the complete network. However, it introduces a hierarchy-based propagation model. First, data is routed to dedicated nodes in the network and, afterwards, simultaneous broadcast propagation is started from these nodes. It introduces caching along the route to increase the number of broadcasters. However, Firecracker is limited to dissemination to the complete network.

MOAP [40], MNP [21], and Impala [25] also support dissemination of data or code objects to the complete sensor network. While MNP, like Deluge, supports pipelining, this is not the case for MOAP. With pipelining, a large object is divided into segments, and a node can start propagating as soon as it has received a complete segment. This leads to parallel data transfers and, therefore, can increase overall throughput.
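The possible gain of pipelining can be estimated with a strongly idealized model. The sketch below assumes that every segment needs one time unit per hop and that all hops can transmit in parallel; both are assumptions made for illustration and are not taken from the cited protocols. In a shared wireless channel, interference between nearby hops reduces the gain.

```python
def transfer_time(segments, hops, pipelined):
    """Time units to move an object of `segments` segments over `hops` hops.

    Idealized: one time unit per segment and hop, no losses, no interference.
    """
    if pipelined:
        # The first segment needs `hops` units; each further segment
        # arrives one unit after its predecessor.
        return segments + hops - 1
    # Without pipelining, every hop forwards the complete object
    # before the next hop starts.
    return segments * hops
```

For example, an object of 10 segments sent over 5 hops takes 50 time units without pipelining and 14 time units with pipelining in this model.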

An extension of Deluge is Aqueduct [32]. It extends Deluge in order to support het-


scope selection function allows handling multiple objects. Since not all nodes in the network need a specific object, Aqueduct establishes routes to efficiently propagate data objects to regions where the object is required. This avoids flooding to the complete network, but at the same time guarantees eventual delivery to all requesting nodes in the network. Aqueduct also supports caching of fixed-size object segments in a small circular buffer in the external flash memory.

Melete [46] supports concurrent applications in a heterogeneous network. In contrast to the dissemination protocols presented above, Melete introduces reactive object retrieval where nodes request objects on demand. This approach makes Melete most comparable to PACLD. However, PACLD supports larger objects and implements a more sophisticated caching mechanism.

Other schemes to disseminate data in a network are, e.g., multicast schemes (Vlm2 [39]). However, due to dynamics in object demands, multicast is not an appropriate solution for dissemination of code objects, since this would require repeated multicasts to satisfy the demands of ever changing requester groups for a specific object.

3.2. Data Centric Storage

Many approaches in the field of data centric storage make assumptions about network properties which are more restrictive than the assumptions of PACLD. One example is the dependency on location information.

GHT [34] is a geographic hash table. It maps keys to geographic locations and stores the data objects corresponding to a key at the node nearest to that location. A data object is retrieved by sending a query to the location which corresponds to its key. This results in linear cost for the search of an object, but at the price of a dependency on location information.
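The key-to-location mapping can be sketched as follows. This is a minimal illustration and not the GHT implementation: the hash function (SHA-256), the rectangular deployment area, and the names key_to_location and home_node are assumptions, and the geographic routing of the query is omitted.

```python
import hashlib
import math


def key_to_location(key, width, height):
    """Map a key to a point in a width x height area (hypothetical hashing)."""
    digest = hashlib.sha256(key.encode()).digest()
    x = int.from_bytes(digest[:4], "big") / 2**32 * width
    y = int.from_bytes(digest[4:8], "big") / 2**32 * height
    return x, y


def home_node(key, nodes, width, height):
    """Return the ID of the node nearest to the key's location.

    `nodes` maps node IDs to (x, y) positions, i.e., location information
    must be available, which is exactly the dependency named above.
    """
    target = key_to_location(key, width, height)
    return min(nodes, key=lambda node_id: math.dist(nodes[node_id], target))
```

Both storing and querying an object evaluate the same function, so a requester reaches the storing node without searching the network.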

Another approach in this field is pathDCS [12] which, in contrast to GHT, does not rely on location information. pathDCS maps keys to path descriptions. These path descriptions are based on a few dedicated nodes which periodically send beacons. Each node stores its parent towards the sender of the beacon. With these trees, rooted at the dedicated nodes, a path can be defined as a starting point, i.e., one of these dedicated nodes, followed by a series of directions towards others of these dedicated nodes.

Common to both GHT and pathDCS is the need to store the data objects at special nodes in the network in order to be accessed. However, this implies proactive data replication, which is not practicable in case of a high number of large objects.

Sheng et al. [38] propose placement of special storage nodes in the sensor network in order to reduce overall access cost. This approach is not fully applicable to the system model definition of PACLD. However, it proves that the location of the base station within the network affects the overall performance.


3.3. Caching Schemes

Caching mechanisms are widely used in many fields of computer science. Web caches reduce access time and overall network traffic. Caches in a memory hierarchy reduce access time on data stored in slow memory components. Both, the cache placement problem and the caching policy, are relevant topics for this thesis.

Concerning the first topic, Krishnan et al. propose a scheme for the placement of transparent web caches in [20]. However, this scheme cannot be transferred to sensor networks because these are unstructured and request patterns are different.

A research project concerning data caching on the route in mobile ad-hoc networks is discussed in [42]. Each node in the network stores its nearest neighbor to every data object in the network. Establishment and deletion of a cache are signaled to every node in the neighborhood which, in turn, updates its nearest neighbor lists and, in case of change, propagates this change. This approach differs from PACLD in the way of handling information about cache locations. Storing nearest neighbor lists introduces an overhead in memory consumption and a message overhead for maintaining these lists. This overhead can hardly be accepted for sensor networks, whose resources are even scarcer than those of mobile ad-hoc networks.

Many caching policies have been proposed in the field of web caching algorithms. LRU (least recently used) considers the time of the last reference on an object, whereas LFU (least frequently used) considers the number of references on an object while it resides in the cache. Additionally, schemes are proposed which do not consider the reference history of an object, e.g., [45] considers only the object size and evicts large objects first.
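The three criteria mentioned above differ only in the attribute used to pick the eviction victim. The following sketch assumes a hypothetical cache directory that records the time of the last reference, the reference count, and the size of each object; the field names are chosen for illustration.

```python
def lru_victim(cache):
    """Least recently used: evict the object with the oldest last reference."""
    return min(cache, key=lambda name: cache[name]["last_ref"])


def lfu_victim(cache):
    """Least frequently used: evict the object with the fewest references."""
    return min(cache, key=lambda name: cache[name]["refs"])


def size_victim(cache):
    """Size-based policy: evict the largest object first."""
    return max(cache, key=lambda name: cache[name]["size"])


# Hypothetical cache state: last reference time, reference count, size in KB.
cache = {
    "a": {"last_ref": 10, "refs": 5, "size": 2},
    "b": {"last_ref": 50, "refs": 1, "size": 4},
    "c": {"last_ref": 30, "refs": 3, "size": 9},
}
```

For this state, LRU evicts "a", LFU evicts "b", and the size-based policy evicts "c", which shows that the policies can disagree on the same cache content.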

Other policies consider both recency and frequency of references, e.g., LRV (lowest relative value) [26], LNC-R-W3 (least normalized cost replacement) [37], and LUV (least unified value) [7]. The caching policy implemented in PACLD is similar to the LUV policy where an object is assigned a value that corresponds to the retrieval cost normalized by the likelihood of being referenced again.

Yet another class of caching policies are the self-optimizing caching policies. Members of this class are, e.g., ARC (adaptive replacement cache) [29] and UMC (universal mobile caching) [36]. However, the sensor network characteristics on which this thesis is based call for a simple policy with low resource requirements.

3.4. Search Protocols

Protocols to search nodes or objects within a network have been developed for different network types. In the following, schemes are presented from the field of peer-to-peer networks, mobile ad-hoc networks, and sensor networks.


In the Freenet system [11], search is performed based on keys which correspond to object hashes. This key is sent in a request to the neighbor node which is known to be nearest to the location of the corresponding object. When a node receives such a request, it checks whether it stores the object or not. In case of success, the reply is propagated upstream to the requester. In case of failure, the request is propagated to the neighbor expected to be nearest to the object. If a node receives the request for a second time, it reports failure upstream, and the upstream node propagates the request to the second nearest node, and so on. This search mechanism is not practicable for sensor networks since it cannot benefit from the broadcast characteristic of the medium and, furthermore, the search may find only a distant source.

The rumor routing [8] approach relies on random walk search where a request is propagated to a random neighbor and, therefore, similar to Freenet, the search may not find the nearest source.

In contrast to the linear search pattern of Freenet, Gnutella [4] sends a request, with a time-to-live (TTL) specified as a hop value, to all nodes it is connected to. Each node receiving a request checks if it can respond to the request or if it needs to propagate the request with decremented TTL value to all of the nodes it is connected to, except the node the request was received from. A request is discarded when the TTL value is zero.

An extension of the search scheme implemented by Gnutella is the expanding ring search [15, 9, 10], where the search is restarted with strictly larger TTL values in case of search failure. The search scheme implemented in PACLD is basically expanding ring search. However, this thesis considers access costs on the objects returned in the search process and, additionally, PACLD benefits from its knowledge of the base station as a default source.
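TTL-limited flooding and expanding ring search can be sketched on an abstract graph. The code below is a simplified model and not the PACLD discovery mechanism: the network is an adjacency dictionary, message loss is ignored, the TTL grows by one per round, and the names flood and expanding_ring_search are hypothetical.

```python
from collections import deque


def flood(graph, start, sources, ttl):
    """Return {source: hops} for all sources reached within `ttl` hops."""
    seen = {start: 0}
    queue = deque([start])
    found = {}
    while queue:
        node = queue.popleft()
        hops = seen[node]
        if node in sources and node != start:
            found[node] = hops
            continue  # a node that can answer does not propagate the request
        if hops == ttl:
            continue  # TTL exhausted: the request is discarded
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen[neighbor] = hops + 1
                queue.append(neighbor)
    return found


def expanding_ring_search(graph, start, sources, max_ttl):
    """Restart the flood with a strictly larger TTL until a source answers."""
    for ttl in range(1, max_ttl + 1):
        found = flood(graph, start, sources, ttl)
        if found:
            return ttl, found
    return None, {}


# Line topology A - B - C - D - E used as a small example.
line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
```

In the line topology, a requester at A with the only source at D needs three rounds; a nearby source is found in the first round without disturbing distant nodes, which is the motivation for starting with small rings.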

3.5. Summary

This chapter presented related work to PACLD. Different research fields were identified, and their relation to this thesis was illustrated. These fields were dissemination protocols, data centric storage, caching, and search algorithms in computer networks.

The projects presented as examples for these fields were briefly introduced and their differences from PACLD were stated. However, similarities to some projects were recognized.

In the field of dissemination protocols, differences were found in the number of supported objects, in the number of groups in the network, and in the initiator of data transfer. The project identified as most similar to this thesis is Melete [46].

In the field of data centric storage, differences were found in the assumptions of the projects which, in general, are more restrictive than in this thesis. Moreover, the character of these projects to proactively store data does not suit PACLD.


Afterwards, projects in the field of caching were presented. Cache placement techniques were shown to rely on different request patterns or network structures, or to introduce high maintenance overhead. However, the variety of caching policies showed the adequacy of reference-history-based policies, which are the base for the caching policy implemented in this thesis.

Finally, different search algorithms were presented from the field of peer-to-peer networks, of mobile ad-hoc networks, and of sensor networks. Strong similarities to projects implementing the expanding ring search were found, while other algorithms were not suitable for PACLD.


4. System Model

This chapter covers the system model of PACLD. The model is based on the requirements of the TinyCubus [28] research project. In the following sections, different aspects of this model are explained and specified. However, due to the variety of different applications with different requirements, no exact system model can be provided.

4.1. Environment Model

The sensor network is built up of $n$ nodes which are connected. Out of these $n$ nodes, $r$ nodes are gateway nodes, i.e., base stations. The number of base stations is restricted by the following condition:

$$r \ll n \qquad (r > 0) \tag{4.1}$$

It is assumed that each node knows a route to one base station. No assumptions are made on the availability of location information, topology characteristics, network density, communication reliability, time synchronization, and information about the neighborhood.

Due to the characteristic of PACLD as a support component, only a fraction of the node resources may be used. This implies energy efficiency, low memory consumption, and a restriction of the available external flash memory to 100 KB.

4.2. Object Model

The object model assumes a high number of different objects without any object semantics. Each object is named with a flat naming system, where names are shorter than the maximum message payload size. The object size ranges from zero to 10 KB. The number of small objects is higher than the number of large objects. The average object size is assumed to be about 3 KB. The base stations are assumed to be source nodes for every object in the network.

4.3. Query Model

The query model is strongly related to group dynamics. Therefore, the overall characteristic of requests can be described as bursty at low frequency.

This results in correlations of requests for a specific object. First, a temporal correlation of requests is assumed: the probability of node $n$ requesting object $o$ at time $t$ is correlated with the probability of node $m$ ($n \neq m$) requesting the same object at the same time. Second, a spatial correlation of requests is assumed: nodes which are geographically close have a higher probability of requesting the same object.

4.4. Summary

This chapter presented the system model of PACLD which is motivated by the requirements of the TinyCubus project. First, assumptions about characteristics of the environment were specified. These are few and mainly concern the existence of base stations in the network which hold all objects. The second part was about the object model which expects many large objects. Third, the query model was described as infrequent and bursty with requests having temporal and spatial correlation. With the system model as a base, the next chapter discusses the design space.


5. Design Space

This chapter introduces the design space of PACLD and discusses the main principles of the protocol design. Fundamental design decisions are motivated as the base for the protocol design which is presented in Chapter 6. This chapter is subdivided into the three main fields of the design space: caching, source discovery, and data transport.

PACLD is a protocol to reliably transport large data objects in a sensor network. Along the route of data transport, the caching mechanism selects nodes where a copy of the transported object is created. These caches can then satisfy other requesters which perform a source discovery. In the following, the origin of a request is named requester, and every node which has a copy of the requested data object is named source.

5.1. Caching

PACLD is a protocol to acquire and cache large data objects in sensor networks. Its caching mechanism is intended to reduce access time and access costs on data objects. Additionally, reducing the load on the base station and its surrounding nodes is a goal of this mechanism. The data caches are realized as autonomous caches. Prefetching of data objects is not used because requests for data objects are not predictable, and transferring data objects to nodes in the network is expensive and wasteful if the data object is never requested. In contrast, caching data objects along the route of data transport is cheaper because transfer costs are not wasted. The same argument applies for replication mechanisms. Therefore, PACLD does not implement replication.

In the following, the caching decision mechanism is discussed, which covers topics such as node selection. Then, various caching policies are discussed. The discussion about caching mechanisms serves as a base for the considerations about a proper transport mechanism.

5.1.1. Decision

The caching decision has to cover several aspects. First, there is the problem of selecting a memory type for the cache which has high importance for the design due to the different characteristics of the different memory types. Second, there is the question whether to cache only complete data objects or also partial objects. Third, the node selection is a challenging problem where nodes have to be selected which need to cache a data object. Fourth, inter-cache communication mechanisms can be discussed as a potential optimization tool. The topic of cache decision is not only of importance for the distribution of objects in the network, but also for the path length of data transport. Therefore, the maximum distance between two caches affects the performance of data transfer and needs to be limited (see Section 5.3 for details).

Memory Type Selection

The different characteristics of the memory types in sensor nodes make them differently suited as a resource for caches. Due to the large size of the data objects which are to be cached, the memory size is a hard criterion in memory type selection. As shown in Section 2.3, only the external flash memory is large enough to hold data objects of the size defined in the system model. Therefore, the protocol design has to reflect the high energy consumption of the external memory's write and read accesses, and the possible benefit of the fast buffers of the flash.

Partial Objects

The second topic concerning the caching decision considers the question whether to cache only complete data objects or also object parts. With caching of partial objects, a better utilization of the scarce memory is possible. However, the administrative overhead for caching increases. Additionally, the source discovery problem becomes more challenging, because the problem of locating one data object turns into the problem of locating multiple object parts. Therefore, only caching of complete data objects is proposed. However, due to aborts of data transfer, the establishment process of a copy can be aborted which, in turn, leads to partially cached objects. Strategies to handle this challenge are discussed below.

Node Selection

As the most important design decision in this section, the node selection mechanism affects the performance of the caching mechanism. On the one hand, selecting every node to cache a data object implies a high overhead which outweighs the benefits of caching. On the other hand, selecting no node disables data caching and, therefore, also every benefit of caching. These considerations motivate the need for a node selection mechanism in PACLD. In the following, different approaches are presented and discussed.

Candidate Selection A first criterion to classify the approaches of node selection is the restriction of possible cache nodes to nodes which are active participants in the transfer of a data object. The advantage of such a restriction is that these nodes have to be active anyway. This keeps open the possibility of putting nodes which do not participate in data transfer into sleep mode. This restriction leads to the cache-along-the-route approach depicted in Figure 5.1. In this example scenario, data is transferred from one source to one requester. Only the nodes along the route are candidates for caches. As a special case, the requester has to store the data object by default and is therefore not considered as a candidate in the following.


Figure 5.1: Node selection along the route of data transfer

In contrast to the simple cache-along-the-route approach, the set of candidate nodes can be extended by the set of nodes overhearing data transfer. This leads to the extended cache-along-the-route approach. Further extension of the candidate set is not considered because it leads to prefetching and replication mechanisms, which are not in the scope of this thesis.

The extended cache-along-the-route approach is depicted in Figure 5.2. Nodes in radio range of the transferring nodes are also candidates for node selection. The advantage of the extended approach is that no additional transmission overhead is needed to reach all candidate nodes. Additionally, with a wider range of candidate locations, load balancing of the network is possible. In case of link or node failure, route changes can cause invalid partial data objects in caches. Therefore, the redundancy introduced by the extended approach can better satisfy the cache establishment procedure where, in the simple cache-along-the-route approach, cache establishment for complete data objects can be unlikely in case of frequent route changes.

However, these considerations lead to the challenge of partially cached objects. First, caches along the route can remain with partial objects due to aborted transfers. Second, caches which overhear data transfer are likely to result in fragmented objects. In both cases, strategies are needed to solve this problem.

On the one hand, evicting these partial objects also eliminates any benefits which could be gained through a completely cached object. On the other hand, completing the partially transferred objects in the cache can introduce additional transfer overhead.

Figure 5.2: Node selection with extended along the route approach

Therefore, a threshold is proposed which allows completing partial objects depending on their percentage of completeness. In such cases, the node with the partial object can continue overhearing data transfer or it can act as requester to complete the data object in its cache. When acting as requester, it introduces an additional transfer overhead to complete the partial object. However, nodes with partial objects can be expected to be near other caches or near the route of data transfer which reduces these additional transfer costs.
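The threshold rule for partial objects can be written as a small decision function. This is a sketch under assumptions: completeness is measured as the fraction of received bytes, and the default threshold of 0.5 is an arbitrary illustrative value, not a value proposed in this thesis.

```python
def handle_partial_object(received, total, threshold=0.5):
    """Decide how a cache treats an object after the transfer has ended.

    Returns "keep" for a complete object, "complete" if the missing part
    should be acquired (by overhearing or by acting as requester), and
    "evict" if too little of the object is present.
    """
    if received >= total:
        return "keep"
    completeness = received / total
    return "complete" if completeness >= threshold else "evict"
```

A higher threshold evicts more partial objects and saves completion traffic; a lower threshold preserves more of the transfer effort already spent.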

Centralized vs. Distributed After discussing the candidate selection, a second criterion can be discussed which allows classifying the node selection approaches. This criterion is the type of algorithm used: distributed algorithms or centralized algorithms.

Distributed algorithms allow handling scalability issues more adequately. However, to achieve this scalability, a distributed algorithm must not rely on global input.

Due to the scalability considerations above, a centralized algorithm in a large sensor network can only be used for special tasks. Transferred to the node selection problem, a centralized algorithm can be used at the base station to compute node selection parameters to select a proper strategy for data transfers originating at the base station. This is possible because of the more powerful hardware resources of the base stations. Data transfer of a certain object, originating at the base station, is more likely when the data object is not well distributed in the network. In this case, the centralized algorithm can force more nodes to establish caches.

In order to simplify the implementation of these two approaches, a two-phase concept is proposed. In the first phase, parameters for the second phase are selected. In case of a distributed algorithm, the parameters are derived from local information. In case of centralized algorithms, these parameters are provided by the central node, i.e., the base station. In the second phase, the node selection is performed according to these parameters. This approach allows node selection based on central knowledge, where this knowledge is available; otherwise, the local algorithm does the node selection independently.

Figure 5.3: Node selection with random approach

Parameters In the following, various parameters are discussed as a base for the node selection algorithm. Table 5.1 lists these parameters and gives a short overview on their advantages and disadvantages.

Parameter      Advantage                        Disadvantage
random         cheap                            clustered distribution
node ID        cheap                            clustered distribution
hopcount       flexibility, good distribution   little transfer costs
demand based   high expected benefit            hardly distinguishing
energy level   increased network lifetime       only for exclusion
connectivity   increased expected benefit       only for exclusion

Table 5.1: Parameters for node selection

A simple criterion for node selection can be a random value. Given a threshold, each candidate node selects a random number. A value above the threshold initiates caching for that node. The random approach is depicted in Figure 5.3. The advantage of this approach is that no information has to be transmitted. Therefore, the cost for node selection is minimal.

However, this approach leads to irregular distances between caches for the same data object. Since neighboring caches achieve only limited benefit compared to a single cache, this approach does not use resources sparingly.
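The random approach and its irregular cache distances can be reproduced with a few lines. The sketch assumes a single route whose nodes are numbered by their hop distance from the source; the names random_select, cache_positions, and gaps are hypothetical, and a seeded generator is used only to make runs repeatable.

```python
import random


def random_select(threshold, rng):
    """A candidate caches if its random value lies above the threshold."""
    return rng.random() > threshold


def cache_positions(route_length, threshold, seed):
    """Hop positions (1..route_length) of the nodes which decide to cache."""
    rng = random.Random(seed)
    return [hop for hop in range(1, route_length + 1)
            if random_select(threshold, rng)]


def gaps(positions):
    """Distances between consecutive caches along the route."""
    return [b - a for a, b in zip(positions, positions[1:])]
```

Evaluating `gaps(cache_positions(20, 0.7, seed))` for several seeds typically yields gaps of one hop next to much larger gaps, i.e., the clustering listed as a disadvantage in Table 5.1.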

Another approach is to use the node identifier (ID) as a decision parameter. With random placement of nodes and without any knowledge about the placement of nodes according to their node ID, this approach corresponds to the previous random approach.

However, with information about placement, this approach can be optimized. This is especially important for a centralized algorithm running at the base station. Since this mechanism is restricted to scenarios in which the locations of the sensor nodes are known at the base station, it does not provide additional benefit for this thesis compared to the random approach.

Figure 5.4: Node selection with simple hopcount approach

Another criterion as a base for node selection is the hopcount of a data packet. At the source, the hopcount is initialized with a value of zero. In order to achieve a better distribution, the initial hop value at the base station can also be set to a random value. Each time the packet is forwarded, the hopcount is increased. Each node can compute a function which maps the hopcount to one of the values true or false. According to this value, the candidate node initiates caching. By applying different functions, different cache placement patterns can be realized. With an additional parameter, a selector of the mapping function, a centralized algorithm can express its results and inject them into the network at the source node. The advantage of the hopcount mech-

route of data transfer. However, combined with the extended cache-along-the-route approach, the cache distribution is clustered. This is shown in Figure 5.4. In contrast, Figure 5.5 shows the same scenario with an advanced hopcount node selection mechanism. This modification is based on a different node selection function, e.g., random node selection, for the nodes which only overhear the data transfer in contrast to the nodes actively participating in data transfer.

Figure 5.5: Node selection with advanced hopcount approach

In addition to the goal of evenly distributing caches over the network, a demand-driven placement can be considered. This can be realized with proper mapping functions for the hopcount approach. Since demand is indicated by requests in the source discovery (see Section 5.2), a proper mapping function has to consider request destination and request propagation. Mapping functions which result in equidistant caches along the route can be cheaply computed. Functions with increasing distance between caches reflect the different load on nodes depending on their distance to the base station.
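Two such mapping functions can be sketched as follows. The concrete functions are assumptions chosen for illustration: a fixed spacing of three hops for equidistant caches and powers of two for increasing distances. The selector table stands in for the additional parameter injected at the source node.

```python
def equal_distance(hopcount, spacing=3):
    """True at every `spacing`-th hop: equidistant caches along the route."""
    return hopcount > 0 and hopcount % spacing == 0


def increasing_distance(hopcount):
    """True at hops 1, 2, 4, 8, ...: the gaps grow with the distance."""
    return hopcount > 0 and (hopcount & (hopcount - 1)) == 0


# Hypothetical selector values carried in the packet header.
MAPPINGS = {0: equal_distance, 1: increasing_distance}


def caching_hops(route_length, selector, initial_hopcount=0):
    """Hop positions counted from the source at which a cache is established.

    `initial_hopcount` models a non-zero (e.g., random) start value
    chosen at the source to shift the placement pattern.
    """
    mapping = MAPPINGS[selector]
    return [hop for hop in range(1, route_length + 1)
            if mapping(initial_hopcount + hop)]
```

On a route of ten hops, selector 0 places caches at hops 3, 6, and 9, whereas selector 1 places them at hops 1, 2, 4, and 8, i.e., densely near the source and sparsely far away from it. An initial hopcount of 1 shifts the equidistant pattern to hops 2, 5, and 8.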

Since demand can hardly be predicted, the bursty characteristic of requests, as defined in the system model (see Chapter 4), can be used as a simple hint to select caching nodes. This leads to the approach of installing caches at nodes with high request occurrence. The advantage of this mechanism is the fact that benefit can be expected sooner than at nodes with lower request occurrence. However, this approach requires acquiring the request frequency (see Section 5.2). Furthermore, this approach can only be used as an extension mechanism because most nodes are expected to receive only one request for a certain data object in the acquisition period. Therefore, nodes with a request counter value of one have to apply another node selection algorithm.
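The combination of a request counter with a fallback algorithm can be sketched as a small class. The class name DemandSelector, the per-object counter, and the fallback signature are assumptions for illustration; resetting the counters at the end of an acquisition period is omitted.

```python
from collections import Counter


class DemandSelector:
    """Demand-based node selection with a fallback for rarely seen objects."""

    def __init__(self, fallback):
        self.requests = Counter()  # requests observed per object name
        self.fallback = fallback   # e.g., a hopcount mapping function

    def on_request(self, obj):
        self.requests[obj] += 1

    def should_cache(self, obj, hopcount):
        # More than one request in the acquisition period indicates demand.
        if self.requests[obj] > 1:
            return True
        # Otherwise the counter does not distinguish nodes; defer the decision.
        return self.fallback(hopcount)
```

With a fallback such as `lambda hop: hop % 3 == 0`, a node caches an object it has seen requested twice regardless of its position, and otherwise behaves like a plain hopcount-based selection.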


Further parameters for node selection can be metrics on the status of the node's resources. A node with a low energy level cannot bring benefit to the network by caching a data object because this node can be expected to fail soon. Another metric can be the connectivity of the node. If a node has only poor connectivity, cache establishment is more error-prone and, therefore, more expensive. As a third metric, the expected benefit of caching a data object has to be compared with the expected loss of benefit of another already cached object which would have to be evicted. For a detailed overview of replacement strategies see Section 5.1.2.

Inter-Cache Communication

One problem, amplified by the different optimization approaches based on the local state of the nodes, is the clustering of caches. Clustering is wasteful because the requests of nodes within a fixed n-hop range of the cluster can be satisfied at the same cost by a single node of the cluster. To circumvent this problem, an inter-cache communication mechanism can be considered as a means of coordination. Since clustering results from local decisions, neighborhood communication can help to reduce cluster formation. A simple mechanism can be based on advertisements.

A node that decides to cache a data object communicates this decision to all of its neighbors, together with the expected benefit of caching. A node receiving an advertisement can decide to skip its own caching process or to send its own advertisement. Special care has to be taken to avoid cascading skipping of caching decisions. Since these advertisements are infrequent, the energy consumed by the message overhead is much lower than that of establishing an additional cache in a cluster.
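The reaction to a received advertisement can be sketched as a comparison of expected benefits. The message fields and the tie-break on the node identifier are assumptions made for illustration; the text does not prescribe a message format or a decision rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advertisement:
    node_id: int             # sender of the advertisement (hypothetical field)
    object_id: int           # data object the sender decided to cache
    expected_benefit: float  # sender's benefit estimate for this cache


def keeps_caching(own: Advertisement, received: Advertisement) -> bool:
    """Return True if the node continues its own caching process."""
    if own.object_id != received.object_id:
        return True  # advertisements for other objects are irrelevant
    # Compare (benefit, node id) so that exactly one of two neighbors
    # with equal benefit yields.
    own_key = (own.expected_benefit, own.node_id)
    received_key = (received.expected_benefit, received.node_id)
    return own_key > received_key
```

With this rule, exactly one of two neighboring nodes keeps its cache for a given object. It does not by itself rule out cascading skips, in which a node yields to a neighbor that later yields to a third node outside the first node's range, so an additional safeguard would still be needed.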

Results

The following results summarize the discussion of the caching decision. First, the cache uses the external flash memory. Second, only complete objects are cached. However, a strategy to complete partial objects is proposed. Third, the node selection is based on the nodes along the route of data transfer. Nodes are selected from this group based on their hopcount to the source. Finally, a simple inter-cache communication mechanism is used to limit the formation of cache clusters.

5.1.2. Replacement Strategies

The constrained memory resources, compared to the number and size of data objects in the network, motivate the need for replacement strategies for the caches in PACLD.

Moreover, the node selection is based on a benefit estimation which is also part of the caching strategy. Therefore, this section focuses both on decision criteria for evicting a data object from the cache in order to free memory for another object and on criteria for estimating the benefit of caching a data object.

Definition First, the terms cache hit and cache miss have to be defined. Source discovery requests information about the availability of data objects in the cache, but only