
Chapter 6

Conclusions and Future Research

The thesis of this dissertation is that novel fault-tolerant replication algorithms are needed that fully adhere to the needs of Web-scale systems. In particular, the dissertation focuses on two main open issues. The first is reducing the performance and replication costs of tolerating worst-case failures, which are unlikely in general but do appear in very large-scale systems. The second is improving the efficiency of replication by increasing its availability, which has a positive impact on both latency and throughput, while keeping the same degree of consistency whenever possible. In this chapter we summarize the contributions of this thesis and indicate some directions for future research.


6.1 Overall Thesis Contributions

This section reviews the main contributions of this thesis and refers to the papers that have resulted from the thesis’ work.

6.1.1 Low-Cost and Fast BFT

There has been a great deal of work on efficient and cheap BFT algorithms. The Scrooge algorithm represents a fresh look at existing lower bounds on the tradeoff between fast agreement and replication costs. The main idea is that a fast algorithm need not always be fast: by admitting a minor performance degradation upon failure events, Scrooge establishes a new upper bound on the replication cost of fast agreement. This bound is 2f + 2b replicas, where f is the overall number of tolerated faults (both crashes and Byzantine faults) and b ≤ f is the number of tolerated Byzantine faults.

The existing lower bound for achieving fast agreement even in runs where a backup replica fails is 3f + 2b − 1. This is f + b − 2 replicas more than the lower bound for Byzantine agreement, which is 2f + b + 1. Scrooge thus shows for the first time that the additional cost of being fast in the presence of faulty replicas is b − 1, that is, a function only of the number of tolerated Byzantine faults. This makes Scrooge attractive in systems, like most Web-scale systems, where Byzantine faults are very rare.
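
To make these bounds concrete, the following Python snippet (a purely illustrative sketch of the arithmetic above, not part of any of the protocols) tabulates the three replica counts for a few sample values of f and b:

    # Replica-cost bounds discussed above (illustrative only).
    def byzantine_agreement_min(f, b):
        return 2 * f + b + 1      # lower bound for Byzantine agreement

    def always_fast_min(f, b):
        return 3 * f + 2 * b - 1  # lower bound to stay fast despite a faulty backup

    def scrooge_cost(f, b):
        return 2 * f + 2 * b      # Scrooge's upper bound

    for f, b in [(1, 1), (2, 1), (3, 2)]:
        print(f"f={f}, b={b}: agreement >= {byzantine_agreement_min(f, b)}, "
              f"always fast >= {always_fast_min(f, b)}, Scrooge = {scrooge_cost(f, b)}")

For f = 3 and b = 2, for instance, Scrooge needs 10 replicas: only b − 1 = 1 more than the 9 required for Byzantine agreement, whereas an always-fast protocol would need at least 12.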

Experimental evaluation shows that Scrooge performs as well as Zyzzyva and Zyzzyva5 in fault-free runs, and that it performs like Zyzzyva5, and better than Zyzzyva, in runs with faults. These properties are achieved with strictly fewer replicas than Zyzzyva5. Scrooge also greatly outperforms Zyzzyva on read-only workloads in the presence of faults.

The reduction of replication costs is particularly critical for Web-scale systems, which might include a large number of BFT clusters. A small reduction in the cost of a single cluster results in a significant reduction of hardware and energy costs when the number of clusters is high.

Resultant publications

• Marco Serafini and Neeraj Suri, Reducing the Costs of Large-Scale BFT Replication, in Proc. of Large-Scale Distributed Systems and Middleware (LADIS), 2008.

• Marco Serafini, Peter Bokor, Dan Dobre, Matthias Majuntke and Neeraj Suri, Scrooge: Reducing the Costs of Fast Byzantine Replication in Presence of Unresponsive Replicas, in Proc. of IEEE Int’l. Conf. on Dependable Systems and Networks (DSN-DCCS), 2010.


6.1.2 Fail-Heterogeneous Architectures

This thesis shows for the first time that trusted components can be used to reduce the replication costs of BFT even in general asynchronous systems. Relaxing the synchrony requirements compared to prior work on using trusted components is fundamental to enable the use of these components in Web-scale systems. The potential of this approach has been confirmed by the interest it has attracted. Immediately after HeterTrust, related and independently developed work appeared. The A2M protocol aims at reducing replication costs by using an attested append-only memory [CMSK07]. This results in a symmetric failure mode for Byzantine processes, which resembles the hybrid fault model defined in [TP88]. Work on TRINC showed that such memory can be implemented using only a monotonically increasing counter associated with a key [LDLM09]. These papers propose using specialized hardware components that are deeply integrated into the processors’ hardware. The fail-heterogeneous fault model does not impose such a restriction and is thus more generic. Trusted coordinators can be external processes with restricted software functionalities (related only to agreement) running on commodity hardware. However, the verification of trustworthiness is more complex for software processes than for hardware components.
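
The counter-based attestation behind TRINC can be conveyed with a minimal Python sketch. This is our illustration, not the TRINC interface: the component binds every attested message to a fresh, strictly increasing counter value, so a Byzantine process can never obtain attestations for two different messages under the same counter value, which rules out equivocation.

    import hmac, hashlib

    class TrustedCounter:
        """Minimal sketch of a TRINC-like trusted component: a monotonic
        counter bound to a secret key. Illustrative, not the real TRINC API."""

        def __init__(self, key: bytes):
            self._key = key        # shared with verifiers in this toy model
            self._counter = 0      # monotonically increasing, never reused

        def attest(self, message: bytes) -> tuple[int, bytes]:
            # Each counter value is consumed exactly once, so two different
            # messages can never carry the same counter value.
            self._counter += 1
            payload = self._counter.to_bytes(8, "big") + message
            tag = hmac.new(self._key, payload, hashlib.sha256).digest()
            return self._counter, tag

        def verify(self, counter: int, message: bytes, tag: bytes) -> bool:
            payload = counter.to_bytes(8, "big") + message
            expected = hmac.new(self._key, payload, hashlib.sha256).digest()
            return hmac.compare_digest(expected, tag)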

The fail-heterogeneous architecture is also innovative in its use of trusted components as filters. While other work focuses only on integrity, HeterTrust shows that trusted components can also be used to preserve data confidentiality. Filtering in HeterTrust serves not only confidentiality but also tolerance to DoS attacks. Later work has shown that such filtering can also be done without assuming trusted components [ACKL08; CWA+09], at the cost of degraded performance in “good” runs where no fault appears.
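
The filtering role can be sketched as follows (our illustration under simplifying assumptions; the function names and the quorum parameter are hypothetical, not HeterTrust’s actual interface). Requests are authenticated before they reach the replicated execution tier, and a reply is released to the client only if enough replicas produced it:

    from collections import Counter

    def filter_request(request, is_authenticated):
        # DoS tolerance: unauthenticated requests never reach the
        # (more expensive) replicated execution tier.
        return request if is_authenticated(request) else None

    def release_reply(replica_replies, quorum):
        # Confidentiality: a reply reaches the client only if at least
        # `quorum` replicas produced identical bytes, so a single
        # compromised replica cannot leak arbitrary data to clients.
        if not replica_replies:
            return None
        reply, count = Counter(replica_replies).most_common(1)[0]
        return reply if count >= quorum else None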

Resultant publication

• Marco Serafini and Neeraj Suri, The Fail-Heterogeneous Architectural Model, in Proc. of the IEEE Int’l Symp. on Reliable Distributed Systems (SRDS), 2007.

6.1.3 Eventual Linearizability and Gracefully Degrading Implementations

Eventual Linearizability is a natural way of specifying gracefully degrading shared objects. In normal runs, these objects must respect Linearizability, the standard correctness condition. Whenever consistency deviates from Linearizability, it must eventually converge back to it.

This thesis introduced the first gracefully degrading replication algorithm, Aurora, which relaxes Linearizability only when a single leader is not available in the system. In such runs, consensus cannot be solved, and thus preserving Linearizability would mean blocking. Eventual Linearizability prevents blocking by relaxing consistency only in these cases. Aurora can be used to increase the consistency of existing weak-consistency solutions for Web-scale systems without reducing availability.
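
The graceful-degradation behaviour can be summarized in a small sketch (our illustration of the idea, not Aurora’s actual protocol; class and method names are hypothetical). While no leader is available, operations are executed speculatively in a tentative local order; once consensus becomes live again, they are replayed in the single agreed order, converging back to Linearizability:

    class EventuallyLinearizableObject:
        """Toy model of graceful degradation: speculative execution while
        consensus is unavailable, convergence once it is re-established."""

        def __init__(self, apply_fn, initial_state):
            self.apply_fn = apply_fn
            self.state = initial_state
            self.tentative_log = []   # operations executed speculatively

        def invoke_weak(self, op, consensus_available):
            if not consensus_available:
                # Degraded case: the result may be revised later,
                # but the client is never blocked.
                self.tentative_log.append(op)
            self.state = self.apply_fn(self.state, op)
            return self.state

        def converge(self, agreed_order, base_state):
            # Once a leader is elected, replay all operations in the
            # single agreed order; all replicas converge to the same
            # linearizable history.
            self.state = base_state
            for op in agreed_order:
                self.state = self.apply_fn(self.state, op)
            self.tentative_log.clear()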

It is often necessary to offer applications the possibility of specifying different consistency degrees for different operations. Some operations may always require Linearizability, whereas others might be better off with Eventual Linearizability. The thesis shows that there are fundamental tradeoffs in combining Linearizability and Eventual Linearizability. In particular, strong operations can only be completed using a failure detector stronger than the one needed to solve Consensus.
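
The tradeoff itself concerns failure detectors and cannot be captured in a few lines of code, but the observable difference can (a hypothetical helper built on the sketch above): strong operations may block while no leader is available, whereas weak operations always complete.

    import time

    def invoke(obj, op, strong, consensus_available):
        # `consensus_available` is a callable probing whether a single
        # leader is currently available (hypothetical, for illustration).
        if strong:
            # Strong (linearizable) operations must wait for consensus
            # and may therefore block during leader unavailability.
            while not consensus_available():
                time.sleep(0.01)  # a real system would wait on an event
            return obj.invoke_weak(op, consensus_available=True)
        # Weak operations never block; their results may be reordered
        # before convergence.
        return obj.invoke_weak(op, consensus_available())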

A first investigation of the applicability of Eventual Linearizability to practical Web-scale applications, such as crawling, appears in [SJ10]. These applications often partition their large workload over a large number of processors using master-worker schemes. Using Eventual Linearizability has the potential to be significantly more advantageous than using Linearizability in systems where partitions are not very rare.

Resultant publications

• Marco Serafini, Dan Dobre, Matthias Majuntke, Peter Bokor and Neeraj Suri, Eventually Linearizable Shared Objects, in Proc. of ACM Symp. on Principles of Distributed Computing (PODC), 2010.