
Figure 3.12: An architecture including the operating system independent DSM system

its spatial dimensions, is usually specified as the number of hops (forwarding operations) needed to reach the nodes in the network. An arbitrary network size, together with the unreliable characteristics of the links between the nodes, results in unpredictable transmission delays. Thus, for scalability reasons, it is crucial to specify the start and the end of a data access operation in order to avoid access conflicts.

Figure 3.13 presents the flow of an example write operation in a DSM system with replicas. The diagram takes the delays into account and presents the different points in time that indicate the progress and the state of the operation. The operation is initiated at t_start; the owner of the master copy of the data item (or, for short, the owner) receives the request at t_req and starts the replica update mechanism. Due to the different delays on the paths, the replica holders receive the update request at different points in time, i.e., between t_update and t_update_max. The replica holders reply with acknowledgments that again arrive at the owner of the data item at different points in time, i.e., between t_ack and t_ack_max. In the presented diagram the owner of the master copy uses a request timeout (t_timeout) to specify the time it waits for the acknowledgment messages. After this time period elapses at t_ready, the owner replies to the issuer node with the result of the complete operation. This reply arrives at the issuer at t_end.
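
To make the timing quantities of Figure 3.13 concrete, the following C sketch models the owner-side view of such a write operation; the structure, the field names, and the reply rule are illustrative assumptions, not the actual tinyDSM implementation.

    #include <stdbool.h>
    #include <stdint.h>

    /* Owner-side view of one write operation (names are illustrative). */
    typedef struct {
        uint32_t t_req;        /* local time when the request reached the owner */
        uint32_t ack_timeout;  /* how long the owner waits for acknowledgments  */
        uint8_t  acks_seen;    /* acknowledgments received from replica holders */
        uint8_t  replica_cnt;  /* number of replica holders that were updated   */
    } write_op_t;

    /* The owner replies to the issuer at t_ready, i.e., once the timeout
     * has elapsed or every replica holder has already acknowledged. */
    bool owner_may_reply(const write_op_t *op, uint32_t now)
    {
        bool timed_out = (now - op->t_req) >= op->ack_timeout;
        bool all_acked = op->acks_seen >= op->replica_cnt;
        return timed_out || all_acked;
    }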

The fundamental kind of conflict is the write-write conflict. In this case the set of conflicting operations can be extended to include all those that modify the state of the data, e.g., the migration of the master copy of the data item. These operations, if applied in parallel, may cause serialization issues and result in inconsistencies in the global view on the data item. An example of such a conflict is a write operation issued from multiple distant nodes at about the same time (see Figure 3.14). These requests are delivered with some delay, since they may require different numbers of hops to reach the owner.

In the example presented in Figure 3.14, the request that arrives at the owner first is executed, and the other two issuers receive replies that their requests failed. This already indicates the first problem in the case of write-write conflicts, i.e., it is necessary to specify what to do with write requests that are received by the owner of the data while another write operation is being executed. In general there are three options: the received requests can either be regarded as failed, be executed in parallel, or be queued for execution after the current access request is processed.
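
As a hedged illustration, these three options can be enumerated as a policy switch on the owner, as in the following C fragment; the handler names are placeholders for whatever the owner would actually do.

    /* Hypothetical handlers, assumed to be provided elsewhere. */
    void reply_failed(void);        /* tell the issuer the request failed      */
    void execute_in_parallel(void); /* run it alongside the ongoing write      */
    void enqueue_request(void);     /* run it after the current write finishes */

    typedef enum {
        ON_CONFLICT_FAIL,
        ON_CONFLICT_PARALLEL,
        ON_CONFLICT_QUEUE
    } conflict_policy_t;

    /* Called when a write request arrives while another write is in progress. */
    void handle_concurrent_write(conflict_policy_t policy)
    {
        switch (policy) {
        case ON_CONFLICT_FAIL:     reply_failed();        break;
        case ON_CONFLICT_PARALLEL: execute_in_parallel(); break;
        case ON_CONFLICT_QUEUE:    enqueue_request();     break;
        }
    }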

Figure 3.13: The flow of a write operation in a system with replicas

This is also related to the request ordering problem. In an ideal shared memory system the data access operations shall be ordered and executed in the order they were issued, thus based on the t_start of each request. However, this solution requires global time synchronization and sending the t_start in each request. Additionally, in the WSN environment with its unpredictable delays, the requirement to order the requests regarding their initialization time induces another issue, i.e., the handling of requests issued before the requests that were already executed. In the ideal case the requests that were already executed shall be invalidated, the older request shall be executed, followed by the execution of all the invalidated requests. Such a solution is very expensive and rather not affordable in the WSN environment. Another solution is to execute the missed request and inform all the replica holders about that missing value, but this requires them to store an access history to check which read accesses have to be invalidated, making this solution even more expensive than the previous one. A more practical option is to regard this outdated request as failed, and thus to request the issuer to retry the operation.
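
A minimal sketch of this last option, assuming each request carried its issue time t_start: a request older than the last executed write is simply rejected and the issuer is asked to retry (timestamp wrap-around is ignored here, and the names are illustrative).

    #include <stdbool.h>
    #include <stdint.h>

    /* Returns true if the request may be executed; false means the issuer
     * should retry, because a request with a newer t_start has already
     * been executed on this data item. */
    bool request_is_still_valid(uint32_t t_start_of_request,
                                uint32_t t_start_of_last_executed_write)
    {
        return t_start_of_request >= t_start_of_last_executed_write;
    }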

Usually, none of the issuing nodes is aware that another node has issued a request as well. Global synchronization of requests at the issuer level, e.g., by informing all other nodes about the intention, would require a huge communication overhead, and scalability would suffer.

Figure 3.14: Write-write access conflict example flow

Thus, it is reasonable to define the start of a write as the point in time at which the access request arrives at the owner of the master copy (t_req). In such a setting, requests under delivery are not regarded as started yet.

In a DSM system with replicated data it is also important to cope with write-read conflicts. As shown in Figure 3.13, the new value of the data item is available at the owner for reading not before t_req, i.e., t_start_delay later. And since the owner is not aware that a node has issued a write request until that request arrives, read operations between t_start and t_req return an incorrect value if the operation initialization time is regarded as the operation start.

Additionally, at any replica holder the new value is not available before t_update, thus t_start_delay2 after t_start. This causes an additional problem, because regarding t_req as the operation start also does not protect against incorrect read operations from replicas in the time between t_req and t_update (or even t_update_max). A currently written data item can be read after the start of the write operation, but before the actual update of the accessed replica. This issue touches the memory consistency model features and realization problems that are further investigated in Chapter 5. Generalizing, however, in order to be sure that the read value is the most recent one at any time, it would be necessary to use t_req as the operation start and to issue the read requests directly to the owner of the data item, i.e., to abandon the use of replicas for reading.

Figure 3.15: A time flow of master-read and any-read operations

This would result in central-server memory management, and all the advantages of the data replication would be lost.
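
The window described above can be illustrated with a small, assumption-laden sketch in which every copy carries a version counter: between t_req and t_update the replica still holds a version older than the master copy, so an any-read served from it would return a stale value.

    #include <stdbool.h>
    #include <stdint.h>

    /* One copy of a shared data item (illustrative layout). */
    typedef struct {
        uint32_t version;  /* incremented by every completed write */
        int16_t  value;
    } copy_t;

    /* True while the replica has not yet applied the update that the
     * master copy already holds, i.e., in the t_req..t_update window. */
    bool replica_read_is_stale(const copy_t *replica, const copy_t *master)
    {
        return replica->version < master->version;
    }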

From the perspective of an ideal operation span, a write operation shall be regarded as completed as soon as all the replicas are up to date. However, due to temporary node unavailability, and in order to reduce the operation costs, the result of the operation can also be expressed as whether the replication goal was achieved (see Section 3.2). The owner of the master copy of the shared data item responds to the node that issued the write request as soon as it knows whether the replication goal was achieved, and this point in time represents the completion of the write operation. The write operation is also completed after it is regarded as failed, i.e., when it does not actually change the state of the data.
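
A hedged sketch of this completion rule, assuming the replication goal of Section 3.2 can be expressed as a minimum number of acknowledged replica updates; the names and the enum are illustrative.

    #include <stdint.h>

    typedef enum { WRITE_SUCCEEDED, WRITE_FAILED } write_result_t;

    /* Evaluated by the owner at t_ready, when the acknowledgment timeout
     * has elapsed: the write is reported as successful to the issuer if
     * enough replica holders acknowledged the new value. */
    write_result_t write_result(uint8_t acks_received, uint8_t replication_goal)
    {
        return (acks_received >= replication_goal) ? WRITE_SUCCEEDED
                                                   : WRITE_FAILED;
    }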

The definition of the span in the case of a read operation is much simpler, but still not trivial. The operation request is issued at t_start and is handled until t_end by the owner of the master copy (master-read) or by any of the replica holders (any-read), as presented in Figure 3.15, or it is sent into the network for processing, e.g., in order to trigger voting (quorum-read), as shown in Figure 3.16.

For the first two types of read operation (master-read and any-read) the request can even be handled locally if a replica (or the master copy) of the data item is available on the requesting node. In such a case the differences between t_start and t_req, as well as between t_req and t_end, can be neglected. In the case of the quorum-read (see Figure 3.16), the requests are delivered to the replica holders at different points in time (between t_req and t_req_max) and the answers from the individual replica holders arrive at the requesting node between t_ans and t_ans_max. The t_timeout is defined to specify the maximum processing time of the read operation. If an answer arrives at the request issuer after that period, it is not considered for the result.
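
The quorum-read result selection could look roughly like the following sketch: only answers that arrived before t_timeout are collected, and one possible selection rule, assumed here for illustration, is to take the newest version among them; the types and the rule itself are not the actual voting scheme.

    #include <stddef.h>
    #include <stdint.h>

    /* One answer from a replica holder (illustrative layout). */
    typedef struct {
        uint32_t version;
        int16_t  value;
    } read_answer_t;

    /* Picks the answer with the highest version among the n answers that
     * arrived before the timeout; returns NULL if none arrived in time. */
    const read_answer_t *quorum_read_result(const read_answer_t *answers, size_t n)
    {
        const read_answer_t *best = NULL;
        for (size_t i = 0; i < n; i++) {
            if (best == NULL || answers[i].version > best->version)
                best = &answers[i];
        }
        return best;
    }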

The span of the read operation on the nodes processing the read request is also clearly defined by its function, i.e., a version of the data item is read from the replica and delivered to the requester, which completes the operation. The choice of the version to include in the answer is determined by the way the start of the read operation is defined.

Similar to the write operation, it can be either t_start or t_req.

Figure 3.16: A time flow of a quorum-read operation

The first option requires the read requests to be timestamped and assumes the availability of mechanisms for time synchronization, as well as of the historical values of the data item. The processing nodes answer with the value of the data item that was correct at the request initialization time. Regarding t_req as the start of the read operation simplifies the system, i.e., the replica holders store only the most recent value of the data item and always use this value to answer the read requests. Additionally, t_start does not need to be transmitted together with the read request.
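
The difference between the two options can be sketched as follows, under the assumption that stored values carry timestamps: with t_start semantics the replica holder needs a stored history and synchronized time, with t_req semantics it simply answers with its most recent value. All names are illustrative.

    #include <stdint.h>

    /* One stored version of the data item (illustrative layout). */
    typedef struct {
        uint32_t t;      /* time at which this value became current */
        int16_t  value;
    } sample_t;

    /* t_start semantics: return the value that was current at t_start.
     * The history is assumed to be ordered by time and to contain at
     * least the initial value of the data item. */
    int16_t read_as_of_t_start(const sample_t *history, unsigned n, uint32_t t_start)
    {
        int16_t v = history[0].value;
        for (unsigned i = 1; i < n && history[i].t <= t_start; i++)
            v = history[i].value;
        return v;
    }

    /* t_req semantics: the replica holder answers with its latest value. */
    int16_t read_as_of_t_req(const sample_t *latest)
    {
        return latest->value;
    }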

The tinyDSM Middleware

This chapter introduces the tinyDSM middleware [170] that was designed within this work to provide a practical framework for the proof of concept. The term practical means that the main aim of the middleware is to be useful for a number of applications in the wireless sensor network area, rather than to remain a theoretical deliberation.

It should also be easy to use and efficient. Thus, this chapter starts with a discussion of the main features the middleware should provide, i.e., the way of providing the distributed shared memory abstraction and of supporting the application developer, taking into consideration the specific environment it has to work within: the wireless sensor network.

After discussing the set of functions the middleware provides (the shared memory abstraction; compile-time event definition and runtime detection; data replication to improve the robustness; compile-time definition of the system behaviour using policies) and its interfaces, the details of the realization of the middleware are provided, together with the methodology for using it.

The detailed description of the middleware provided here allows tinyDSM to be used further on as a fundamental building block for the implementation of the consistency models, without the need to explain again how it works.