Reliable Multicast - Related Work - A middleware for cooperating mobile embedded systems

4.4 Related Work

4.4.3 Reliable Multicast

In this sub-section we present related work on the reliable and timely transmission of multicast messages. Our main interest is whether and how each work addresses the problem of achieving reliable and timely transmission in spite of varying loss rates. Gossip-based protocols, such as (Sun and Sturman 2000, Kermarrec et al. 2003), are intended to provide probabilistic properties in large-scale systems and are hence outside the scope of this discussion.

The first point we are interested in is how message losses are tolerated. In this respect, works can be distinguished as follows: (i) they do not consider message losses at all; (ii) they tolerate message losses using a static redundancy approach; or (iii) they tolerate message losses using a dynamic redundancy approach. Dynamic redundancy in the context of communication protocols means detecting message losses by way of acknowledgements and retransmitting the affected messages. For our environment, dynamic redundancy appears to be the most appropriate solution for two reasons: the number of retransmissions is determined by the actual, not the worst-case, number of message losses, and it allows detecting situations in which the number of retransmissions was not sufficient. Both reasons owe their particular importance to the fact that tight worst-case bounds on the number of message losses cannot be assumed for wireless media. The efficiency of the acknowledgement mechanism is a key factor in dynamic redundancy approaches. It will therefore be important to consider whether the acknowledgement schemes of the discussed works are suited to the particular polling-based communication structure of the IEEE 802.11 Standard.

The second point we are interested in is whether the protocols explicitly allow handling the inherent trade-off between reliability and timeliness. We examine whether they provide the means to relax reliability requirements in order to achieve shorter delays. In the following sub-section, which deals with atomicity, we will then consider whether the protocols provide agreement in case a message with a reduced number of retransmissions is not received by all group members. This point is important because, as long as agreement and total order are guaranteed, common views and hence consistent actions can be achieved even when multicasts are lost. For those protocols that combine ordering and reliability in a single protocol, we will explain in the section at hand as much of the ordering mechanism as is necessary to understand how reliability is achieved.

Reliable communication channels. Some protocols are designed under the assumption that communication between any pair of connected stations is reliable; that is, communication channels are either assumed to be stable or to exhibit crash failure semantics (Birman et al. 1991, Ezhilchelvan et al. 1995, Chockler et al. 1998). Consequently, these protocols do not deal with omission failures. They are designed to tackle other problems such as agreement in case of sender crashes, ordering semantics, etc. (Cristian et al. 1985) achieve reliable communication through spatial redundancy, which ensures that the sub-graph consisting of correct processes and communication links is always connected. So, they do not address omission failures of the communication links either. (Kopetz and Grünsteidl 1993, Kopetz 1997) use spatial redundancy as well, but additionally take message losses into account.

Static time redundancy. Static redundancy can be used to achieve reliable communication if there is a known bound on the number of omission failures. For example, in TTP (Grünsteidl and Kopetz 1991, Kopetz and Grünsteidl 1993, Kopetz 1997) each message is transmitted a fixed number of times on physically redundant channels. Transmitting each message r + 1 times allows tolerating up to r consecutive message losses. (Bar-Joseph et al. 2000) use forward error correction (FEC) techniques to tolerate up to r losses out of k + r consecutive messages. For each sequence of k original messages, r redundant messages are constructed in such a way that the receiver can reconstruct the original messages from any subset of size k of the k + r messages sent. The performance of static redundancy protocols is determined by the worst-case number of message losses. Since a very large worst-case bound, which significantly exceeds the average number of losses, must be assumed in a wireless network, adopting this approach would result in poor average performance of the protocol. Furthermore, neither protocol allows choosing a number of transmissions that is smaller than required by the worst-case number of message losses.
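The reconstruction property of FEC can be illustrated with its simplest special case, a single XOR parity block (r = 1). This is only a sketch of the general idea, not the coding scheme of (Bar-Joseph et al. 2000); the function names and the use of equal-length byte strings are assumptions made for illustration.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(messages: List[bytes]) -> List[bytes]:
    """Append one parity block (r = 1) to k equal-length messages."""
    parity = reduce(xor_bytes, messages)
    return messages + [parity]


def decode(received: List[Optional[bytes]]) -> List[bytes]:
    """Recover the k originals if at most one of the k + 1 blocks is lost.

    Lost blocks are marked as None (hypothetical convention).
    """
    missing = [i for i, block in enumerate(received) if block is None]
    if len(missing) > 1:
        raise ValueError("more losses than the code can tolerate (r = 1)")
    repaired = list(received)
    if missing:
        # XOR of all surviving blocks equals the single lost block.
        present = [block for block in received if block is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity block
```

Note that the sender always transmits k + 1 blocks, whether or not a loss occurs, which is the fixed overhead that the paragraph above attributes to static redundancy.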

Dynamic time redundancy. The xAMP (Verissimo et al. 1991, Rodrigues and Verissimo 1992) is based on a synchronous system model with a bounded number of message omission failures. It uses a positive acknowledgement scheme to detect message losses. Each receiver sends an individual positive acknowledgement for each message it receives. This means that the number of acknowledgement messages is proportional to the number of group members, which results in a significant overhead. Furthermore, this mechanism is not well suited for use in the CFP, since each receiver would have to wait until the AP polls it before sending the acknowledgement message. Therefore, after each multicast, the AP would have to poll each station for the transmission of its acknowledgement message. Thus, in the best (fault-free) case, at least 2n + 2 messages are needed for a single multicast with n intended recipients (the first polling message, the broadcast message itself, n polling messages for the acknowledgements, and the n acknowledgements). The protocol does not ensure agreement if the number of retries is not sufficient to ensure that all stations receive the message. Therefore, if agreement is required, the retry limit must be set to the omission degree. In our architecture, this is not required, since the atomic multicast protocol achieves agreement even if a retry limit smaller than the omission degree is chosen.
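The best-case message count can be spelled out term by term. The function below is a small illustrative sketch; its name and the variable names are not taken from xAMP.

```python
def best_case_messages(n: int) -> int:
    """Messages for one fault-free multicast to n recipients under polling."""
    poll_sender = 1   # AP polls the sending station
    broadcast = 1     # the multicast message itself
    ack_polls = n     # AP polls each recipient for its acknowledgement
    acks = n          # one positive acknowledgement per recipient
    return poll_sender + broadcast + ack_polls + acks
```

For a group with 10 intended recipients this gives 22 messages for a single fault-free multicast.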

There are many reliable multicast protocols developed for asynchronous systems that employ dynamic redundancy to tolerate an a priori unknown number of message losses and guarantee delivery of multicasts as long as sender and receiver are correct and connected.

(Inoue et al. 1998) enhance the reliability of multicasts in wireless networks using a representative acknowledgement scheme to reduce the number of acknowledgements. They subdivide the group into subgroups, each of which is assigned a representative. The representative is responsible for sending positive or negative acknowledgements when polled by the sender after a broadcast.

The protocols presented in (Peterson et al. 1989, Melliar-Smith et al. 1990, Amir et al. 1992) exploit the partial causal order (Lamport 1978) to detect message losses. Stations piggyback positive acknowledgements on their broadcasts. Together with a message, all causal predecessors of that message are acknowledged, so this scheme is a kind of cumulative acknowledgement scheme. Stations detect that they have missed a message by detecting gaps in the ordering graph; that is, when they receive a message but have not yet received the causal predecessors of that message. In this case, they send negative acknowledgements to explicitly request the retransmission of the messages they missed. Since our protocols are not intended to provide causally ordered delivery, transferring and maintaining the necessary context information would introduce an unnecessary overhead.

There are several protocols that impose a logical ring structure on the multicast group (Chang and Maxemchuk 1984, Cristian and Mishra 1995, Jia et al. 1996, Mishra et al. 1997, Mishra et al. 2002). According to this structure, the group members take turns in assuming the role of a central sequencer (or token site). The sequencer is in charge of assigning global sequence numbers (a.k.a. ordinals) to the multicast messages. Although the ring was primarily introduced to order multicasts and to distribute the load associated with being the sequencer equally among the group members, all of these protocols exploit the structure to realize an efficient acknowledgement mechanism. The following main ideas contribute to this objective: (i) Rotating the right to order messages (i.e. the token) is used as an implicit acknowledgement mechanism. By assuming the role of the sequencer, a station acknowledges the multicasts that have been ordered so far. (ii) Global sequence numbers are used to detect message losses. If the difference between the sequence numbers of two consecutively received multicasts is greater than 1, the receiving station knows that it must have lost some messages, and it also knows the global sequence numbers of these messages. (iii) Global sequence numbers can be used as cumulative acknowledgements. By announcing the highest in-order global sequence number it has received, a station acknowledges all messages up to that sequence number. The different protocols add specific optimizations to these basic mechanisms. For example, in RMP (Jia et al. 1996), stations include a so-called safe parameter in their messages to support a fast assessment of the stability of a message (a message is stable once it has been received by all group members). In the protocol suggested in (Mishra et al. 2002), the sequencer sends a list with selective acknowledgements for all messages that are not deemed stable at that moment. For each such message a bit vector ack is included, where ack[i] = true if the i-th member on the ring has acknowledged the message. This idea is similar to the way stations transmit acknowledgements in our protocol.

Protocols with a rotating central sequencer have turned out to perform well in settings where the ring is stable most of the time. Their drawback is that a ring reformation is required when the (implicit) token is lost or the sequencer crashes. They are not well suited to a wireless environment, since message losses are frequent and may trigger a ring reformation if the token is affected. Furthermore, dynamic changes in the topology due to locomotion of the systems may result in two successive stations in the ring no longer being connected, which would trigger a ring reformation too. Thus, ring reformations are quite likely and impair the efficiency and the predictability of the protocol.
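Loss detection by sequence-number gaps, cumulative acknowledgement, and the stability test on an acknowledgement bit vector can be sketched as follows. This is a minimal illustration and not code from any of the cited protocols; the class name Receiver, the function is_stable, and the convention that global sequence numbers start at 1 are assumptions.

```python
from typing import List, Set


class Receiver:
    """Tracks received global sequence numbers at one station."""

    def __init__(self) -> None:
        self.received: Set[int] = set()
        self.highest_in_order = 0  # assumption: sequence numbers start at 1

    def on_multicast(self, seq: int) -> List[int]:
        """Record seq and return the sequence numbers known to be missing."""
        self.received.add(seq)
        # Advance the in-order mark over every contiguous number received.
        while self.highest_in_order + 1 in self.received:
            self.highest_in_order += 1
        highest_seen = max(self.received)
        return [s for s in range(self.highest_in_order + 1, highest_seen)
                if s not in self.received]

    def cumulative_ack(self) -> int:
        """Acknowledges every message up to and including this number."""
        return self.highest_in_order


def is_stable(ack: List[bool]) -> bool:
    """A message is stable once every ring member has acknowledged it."""
    return all(ack)
```

A station that receives messages 1, 2 and then 5 learns from the gap that 3 and 4 were lost, while its cumulative acknowledgement remains at 2 until the missing messages arrive.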

(Kaashoek and Tanenbaum 1991) also adopt a central sequencer approach, but in their protocol the role is fixed to a certain station. A station that wants to multicast a message sends it to the sequencer, which associates a global sequence number with that message and multicasts it to the group. We adopted that communication structure with the AP acting as the central sequencer since it fits the structure of the underlying network very well:

• In a BSS (cell) of an 802.11 Standard infrastructure network all frames have to be routed through the AP anyway.

• Routing frames through the AP ensures that all stations in the BSS are within range of the broadcaster.

• Since the centralized communication structure of the PCF already implies the assumption of a stable central station, we can best profit from that fact by making this station the sequencer as well. In particular, this makes dealing with station crashes easier and allows achieving a more predictable timing behavior in this case.

In this structure, two kinds of message losses have to be considered: (a) the loss of request messages transmitted from the sending station to the sequencer, and (b) the loss of broadcast messages from the sequencer to the group. Their protocol uses implicit positive acknowledgements to detect the first kind of message loss. The sequencer acknowledges reception of a message by broadcasting it (message identifiers allow recognizing the message). To deal with the second kind of message loss, negative as well as positive acknowledgements are employed. A station detects that it has lost some multicast message if there is a gap between the global sequence numbers of the messages it receives (as described above). In this case, it sends an explicit negative acknowledgement for the missed messages to the sequencer to request a copy of those messages. Since in this approach the sequencer must store the messages it has broadcast for the purpose of later retransmission, there must be some means for the sequencer to learn that it no longer needs to store a message in its buffer. To this end, the protocol uses cumulative positive acknowledgements. A station includes in each request message it sends the highest in-order sequence number it has received so far. Similar to the safe parameter mentioned above, the sequencer determines the minimum of all sg_i, where sg_i is the last acknowledged sequence number the sequencer received from station s_i. Each message with a sequence number not greater than that minimum can be safely purged from the sequencer's buffer. While we adopt their approach to dealing with the first kind of message loss (a), we use another approach for the second kind (b). Instead of explicit negative acknowledgement messages we use piggybacking, since each explicit message requires polling and adds to the overhead of the protocol. Moreover, we use selective acknowledgements instead of cumulative ones, since the former are better suited to environments with a large number of message losses.
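The buffer management based on cumulative acknowledgements, as described for the fixed-sequencer protocol, can be sketched as follows. The class and method names are hypothetical and the sketch ignores membership changes and timing; it only shows how the minimum over the last acknowledged sequence numbers of all stations determines which messages may be purged.

```python
from typing import Dict, Hashable, Iterable, Optional


class SequencerBuffer:
    """Retransmission buffer of a fixed central sequencer (illustrative)."""

    def __init__(self, stations: Iterable[Hashable]) -> None:
        self.buffer: Dict[int, object] = {}               # seq -> message
        self.last_acked = {s: 0 for s in stations}        # per-station ack
        self.next_seq = 1  # assumption: sequence numbers start at 1

    def broadcast(self, message: object) -> int:
        """Assign the next global sequence number and keep a copy."""
        seq = self.next_seq
        self.buffer[seq] = message
        self.next_seq += 1
        return seq

    def on_request(self, station: Hashable, highest_in_order: int) -> None:
        """Process the cumulative ack carried in a station's request."""
        # Cumulative acknowledgements never move backwards.
        self.last_acked[station] = max(self.last_acked[station],
                                       highest_in_order)
        safe = min(self.last_acked.values())
        # Every message up to the minimum has reached all stations.
        for seq in [s for s in self.buffer if s <= safe]:
            del self.buffer[seq]

    def retransmit(self, seq: int) -> Optional[object]:
        """Return the stored copy for a negative ack, or None if purged."""
        return self.buffer.get(seq)
```

A single station that lags behind holds the minimum down and therefore keeps messages in the buffer, which is one reason cumulative acknowledgements fit less well when losses are frequent.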

The protocols described above, which deal with unbounded message omission failures, are designed for asynchronous environments and implement eventual termination semantics; that is, they ensure that messages sent by a correct station are eventually delivered by each correct intended recipient as long as both are connected. Since timeliness is not an issue here, none of the protocols provides an explicit parameter to bound the number of retransmissions in order to improve timeliness. There typically is an implicit bound in the sense that a station that continuously fails to acknowledge a message will sooner or later be considered as having crashed or being disconnected from the group and will be removed from the membership. But even if this means that message delays will not grow arbitrarily due to message losses, there is no way to bound message delays independently of the notion of group membership.

RTCAST (Abdelzaher et al. 1996) is based on a synchronous model with a finite but not explicitly bounded number of omission failures. Retransmissions are not handled in the multicast protocol. Instead, it assumes that a bounded number of retransmissions may be performed on a lower layer to reduce the probability of message losses on the multicast layer. The basic idea for dealing with message losses is to maintain agreement among the group members. To achieve this, each station that detects that it has missed a message takes itself out of the group. So, this can be considered a model with reliable links and stations observing pseudo receive omission failures. Similar to the protocols discussed above, the drawback of this approach in the context of our environment is that there is no distinction between not receiving a message and being excluded from the group. So, either the number of retransmissions must be chosen quite high, even if messages do not have high reliability requirements, or stations will be excluded from the group frequently. The protocol therefore does not allow deciding the reliability/timeliness trade-off for messages independently of deciding when a station is considered to be disconnected.
