Failure detectors and communication efficiency in the crash and general omisión failure models
Abstract
Consensus is one of the fundamental problems in fault tolerant distributed systems. In addition to the importance of the problem itself, consensus can be a way to solve many other problems in distributed systems, so it is considered a key topic in the Distributed Computing area. Although many solutions have been proposed to solve consensus in synchronous systems, [Fischer, Lynch, and Paterson, 1985] presented an impossibility result, namely Fischer-Lynch-Paterson or FLP, that states that it is impossible to reach consensus in asynchronous systems where even one process may crash. In order to circumvent FLP, [Chandra and Toueg, 1996] proposed the unreliable failure detector abstraction, which has been widely studied in several systems, especially those where processes can only fail by crashing. Failure detectors offer a modular approach that allows other applications such as consensus to use them as a building block. Additionally, the failure detector abstraction allows to encapsulate the synchrony assumptions of the system, so that applications which make use of failure detectors can be designed as if they run in pure asynchronous systems. In this work we show that failure detectors can also be applied to the general omission failure model, in which processes may fail by crashing and by omitting messages either when sending or receiving. As a practical example, we propose a solution to a security area problem called Secure Multiparty Computation by using failure detectors for general omission. In the context of failure detectors in the crash failure model we also study communication efficiency, a performance measure achieved when there are only n links that carry messages forever, being n the number of processes. We improve this measure by defining communication optimality, in which only c links are needed, being c the number of correct processes. In this regard, we propose some communication-optimal implementations of the eventually perfect failure detector class <>P. Finally, we propose a communication-efficient implementation of a failure detector for the general omission failure model. In this case, we define communication efficiency as a linear number of links carrying messages forever.