
This thesis introduced the first gracefully degrading replication algorithm, Aurora, which relaxes Linearizability only when a single leader is not available in the system. In such runs, consensus cannot be solved, so preserving Linearizability would mean blocking. Eventual Linearizability prevents blocking by relaxing consistency only in these cases. Aurora can be used to increase the consistency of existing weak-consistency solutions for Web-scale systems without reducing availability.

It is often necessary to offer applications the possibility of specifying different consistency degrees for different operations. Some operations may always require Linearizability, whereas others might be better off with Eventual Linearizability. The thesis shows that there are fundamental trade-offs in combining Linearizability and Eventual Linearizability. In particular, strong operations can only be completed using a stronger failure detector than the one needed to solve Consensus.
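To make the distinction concrete, the following is a minimal sketch (not the thesis's actual interface; all names are hypothetical) of how a replicated object could let each operation declare the consistency degree it requires:

from enum import Enum, auto

class Consistency(Enum):
    STRONG = auto()  # Linearizability: may block while no single leader exists
    WEAK = auto()    # Eventual Linearizability: always completes, may be reordered later

class ReplicatedCounter:
    """Toy client-side view of a replicated object whose operations
    declare the consistency degree they require."""

    def __init__(self) -> None:
        self.committed = 0   # state agreed on through the stable leader
        self.tentative = 0   # state including not-yet-agreed operations

    def increment(self, level: Consistency) -> int:
        if level is Consistency.STRONG:
            # A strong operation returns only once its position in the total
            # order is agreed on; under partitions this call could block.
            self.committed += 1
            self.tentative = self.committed
            return self.committed
        # A weak operation returns immediately on tentative state and is
        # reconciled with the agreed order once a leader is available again.
        self.tentative += 1
        return self.tentative

# An audit-critical update could use STRONG, a best-effort one WEAK.
counter = ReplicatedCounter()
counter.increment(Consistency.STRONG)
counter.increment(Consistency.WEAK)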

A first investigation of the applicability of Eventual Linearizability to practical Web-scale applications, such as crawling, appears in [SJ10]. These applications often partition their large workload over a large number of processors using master-worker schemes. Using Eventual Linearizability has the potential to be significantly more advantageous than using Linearizability in systems where partitions are not very rare.

Resultant publications

• Marco Serafini, Dan Dobre, Matthias Majuntke, Peter Bokor and Neeraj Suri, Eventually Linearizable Shared Objects, in Proc. of ACM Symp. on Principles of Distributed Computing (PODC), 2010.

An open question is showing the necessity of some of the requirements of the proposed algorithms. There are two issues that are particularly interesting in this sense.

The Scrooge protocol shows that 2f + 2b replicas are sufficient for a BFT algorithm to be eventually fast in the presence of faulty (unresponsive) replicas.

It still remains to show a lower bound matching this upper bound, that is, a proof that 2f + 2b replicas are the minimum needed to be eventually fast.
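As a purely illustrative aid (a hypothetical helper that simply restates the bound from the thesis), the replication cost can be read as a function of f, the number of Byzantine replicas, and b, the number of unresponsive replicas tolerated while remaining eventually fast:

def scrooge_replicas(f: int, b: int) -> int:
    """Number of replicas shown sufficient for Scrooge to be eventually
    fast with up to f Byzantine and b unresponsive replicas; whether
    this is also necessary is the open lower-bound question."""
    return 2 * f + 2 * b

# For example, tolerating one Byzantine and one unresponsive replica:
assert scrooge_replicas(1, 1) == 4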

Similarly, this thesis shows that a ♦S failure detector is sufficient to implement eventual linearizability. The open question is whether it is also necessary, that is, whether ♦S is the weakest failure detector that implements eventual linearizability.

6.2.2 Understanding Byzantine Faults

The design of BFT algorithms has reached a very mature state, which includes the design of BFT versions of existing Web-scale services such as HDFS [CKL+09]. Assuming that the problem of designing efficient BFT systems can be solved, the main question that remains is whether using these systems is worth their additional complexity. In other words, it is not yet clear whether non-silent faults appearing in practice are best modeled using the Byzantine fault model [SJR09]. While the Byzantine fault model is attractive due to its generality, there are still a number of unresolved issues that are preliminary to the use of BFT.

First, many non-silent faults are caused by hardware malfunctions rather than malicious activity [Con02; Bor05; PWB07; SG07]. These faults do not require the use of cryptographic techniques. In fact, more efficient coding techniques can be used to detect errors induced by hardware faults. These not only offer performance advantages, but also reduce the administrative complexity of setting up cryptographic algorithms, for example generating and sharing secret keys.
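As a simple illustration of the difference in mechanism and administrative cost, the following sketch uses Python's standard library, with CRC-32 from zlib as the error-detecting code and HMAC-SHA-256 as the cryptographic authenticator; the key and payload are made up:

import hashlib
import hmac
import zlib

message = b"replica state transfer payload"   # hypothetical payload

# Error-detecting code: sufficient against accidental corruption from
# hardware faults, cheap to compute, and needing no key management.
crc_tag = zlib.crc32(message)

# Cryptographic authenticator: needed only when the sender itself may lie,
# and requiring secret keys to be generated and distributed beforehand.
secret_key = b"shared-secret-between-replicas"  # hypothetical pre-shared key
mac_tag = hmac.new(secret_key, message, hashlib.sha256).digest()

# A receiver detecting a bit flip caused by faulty hardware:
corrupted = message[:-1] + b"X"
assert zlib.crc32(corrupted) != crc_tag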

Second, bugs and other software faults tend to be correlated, and may violate the assumption of failure independence. Design diversity has not been proven to be effective and in many cases is not an option [LPS01]. Also, design diversity is unlikely to protect the system from configuration and maintenance faults, which often also have a correlated nature [SJR09].

Third, using the Byzantine fault model for malicious intrusions still leaves many security issues open. Protecting confidentiality requires a very large number of nodes [YMV+03] unless, as this thesis shows, trusted components are used. Even with trusted components, however, confidentiality requires specific network topologies to prevent data leaks. It is not clear whether mandating the use of such topologies is realistic in data centers. Another issue is that BFT systems, like any other distributed system, are vulnerable to denial-of-service attacks. Although some solutions have been proposed to mitigate this problem, such as [ACKL08; CWA+09], these again require the use of specific network topologies.

Overall, the application of BFT is limited by the lack of clear evidence that these faults occur in practice.

Research topics In the field of BFT replication, there are two main open issues that need to be solved to make a good case for the usefulness of this approach, also in the context of Web-scale systems. The first is establishing whether the Byzantine faults that appear in practical systems can be tolerated using BFT. The second and perhaps more important issue is whether it is possible to design algorithms that tolerate arbitrary but accidental (hardware) faults and that are more efficient and cheaper than BFT algorithms.

6.2.3 Applications of Eventual Linearizability

Evaluating the practical impact of Eventual Linearizability is also an open issue. Eventual Linearizability is useful for systems where weak consistency is acceptable, but only as a last resort. These are systems where availability is of paramount importance, but the loss of consistency must be limited.

Follow-up work has already made some evaluations related to the application of eventual linearizability in highly available master-worker schemes [SJ10].

There are also other examples where eventual linearizability can be useful.

Consider for example a bidding system. High availability is crucial to ensure that users can always place their bids, especially when the end of an auction is approaching. However, it is desirable that bidders can base their decisions on the most up-to-date information available. This is particularly true if the actual value of the bid is a function of the current state of the system. Another example where Eventual Linearizability is beneficial could be a flight booking system. Ensuring high availability is essential to avoid keeping seats unsold. Relaxing consistency to preserve high availability might result in overbookings, which are tolerated by the application anyway. However, in the normal case, up-to-date real-time information should be provided to each retailer and customer. A final example could be a social networking application. In order to increase data locality, user accounts can be spread over a wide-area system. Each user might have friends worldwide who want to observe updates to its profile and to comment on them. It is desirable that, when multiple users are concurrently commenting on a friend's picture or on a status update, they observe a real-time flow of comments. Degradation of consistency, however, is preferable to unavailability, which can demotivate the user from interacting with the system.
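As an illustration of the bidding scenario, the following sketch (all class and method names are hypothetical, not an interface from the thesis) shows a store that serves the agreed state while a leader is reachable and falls back to possibly stale local state otherwise, so that bids are never blocked:

class EventuallyLinearizableStore:
    """Toy stand-in for a store offering Eventual Linearizability:
    reads reflect the latest agreed state while a leader is reachable,
    and possibly stale local state otherwise."""

    def __init__(self) -> None:
        self.leader_reachable = True   # becomes False under partitions
        self.agreed_highest_bid = 100  # state ordered through the leader
        self.local_highest_bid = 100   # local view, may lag behind

    def read_highest_bid(self) -> int:
        if self.leader_reachable:
            return self.agreed_highest_bid  # up to date in the common case
        return self.local_highest_bid       # stale but always available

    def place_bid(self, amount: int) -> None:
        # Bids are always accepted so users are never blocked near the end
        # of an auction; conflicting bids placed during a partition are
        # reconciled once a single leader is available again.
        self.local_highest_bid = max(self.local_highest_bid, amount)
        if self.leader_reachable:
            self.agreed_highest_bid = self.local_highest_bid

store = EventuallyLinearizableStore()
store.place_bid(store.read_highest_bid() + 10)  # bid based on the freshest visible state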

Existing weakly consistent solutions have some limitations which are a direct consequence of their consistency semantics. They either always degrade to causal consistency, as in Dynamo [DHJ+07], or have stronger consistency but become unavailable if the replica identified as the master becomes unavailable, as for example in PNUTS [CRS+08]. Eventual Linearizability provides strong consistency most of the time, and degrades consistency only when implementing Linearizability would mean blocking the system. It offers the same advantages as eventually consistent systems in terms of availability, but it only allows divergence when necessary.

Research topics Evaluating Eventual Linearizability in practical application scenarios could be done along multiple dimensions. Existing work shows some initial results indicating that it can be beneficial in highly available master-worker schemes when the likelihood of partitions or timing failures under strong consistency is not negligible [SJ10]. Eventual Linearizability also seems to be attractive for other applications, such as bidding, retail or social networking applications, where the loss of consistency is acceptable but not desirable and should be minimized. This claim must be validated by considering some specific use cases.


Appendix A Scrooge

This section presents additional results related to the Scrooge protocol. Section A.1 shows the correctness of the algorithm. Section A.2 presents an extension to Scrooge enabling garbage collection and shows it correct.

A.1 Correctness of the Scrooge Protocol

This section proves the correctness of the simplified Scrooge protocol. The next two sections describe the full version of Scrooge by extending the simplified version, and prove that the introduced modifications preserve correctness. As is customary, it is first proven that the protocol never violates certain invariant properties (safety), and then that it eventually achieves some useful results (liveness).