Consensus in the Presence of Partial Synchrony 289 1. Introduction 1.1 BACKGROUND. The role of synchronism in distributed computing has recently received considerable attention [ 1, 4, lo]. One method of comparing two models with differing amounts or types of synchronism is to examine a specific problem in both models. Because of its fundamental role in distributed computing, the problem chosen is often that of reaching agreement. (See [8] for a survey; see also [6], [ 111, [ 121, and [ 181 for example.) One version of this problem considers a pN, which communicate by sending messages collection of N processors, pI, . . . , to one another. Initially each processor pi has a value Vi drawn from some domain Vof values, and the correct processors must all decide on the same value; moreover, if the initial values are all the same, say v, then v must be the common decision. In addition, the consensus protocol should operate correctly if some of the proces- sors are faulty, for example, if they crash (fail-stop faults), fail to send or receive messages when they should (omission faults), or send erroneous messages (Byzan- tine faults). Fix a particular type of fault. Given assumptions about the synchronism of the message system and the processors, one can characterize the model by its resiliency, the maximum number of faults that can be tolerated in any protocol in the given model. For example, it might be assumed that there is a fixed upper bound A on the time for messages to be delivered (communication is synchronous) and a fixed upper bound + on the rate at which one processor’s clock can run faster than another’s (processors are synchronous), and that these bounds are known a priori and can be “built into” the protocol. In this case N-resilient consensus protocols exist for Byzantine failures with authentication [3, 151 and, therefore, also for fail- stop and omission failures; in other words, any number of faults can be tolerated. For Byzantine faults without authentication, t-resilient consensus is possible iff Nr 3t + 1 [14, 151. Recent work has shown that the existence of both bounds A and @ is necessary to achieve any resiliency, even under the weakest type of faults. Dolev et al. [4], building on earlier work of Fischer et al. [lo], prove that if either a fixed upper bound A on message delivery time does not exist (communication is asynchronous) or a fixed upper bound 9 on relative processor speeds does not exist (processors are asynchronous), then there is no consensus protocol resilient to even one fail- stop fault. In this paper we define and study practically motivated models that lie between the completely synchronous and completely asynchronous cases. 1.2 PARTIALLY SYNCHRONOUS COMMUNICATION. We lirstconsiderthe case in which processors are completely synchronous (i.e., + = 1) and communication lies “between” synchronous and asynchronous. There are at least two natural ways in which communication might be partially synchronous. One reasonable situation could be that an upper bound A on message delivery time exists, but we do not know what it is a priori. On the one hand, the impossibility results of [4] and [lo] do not apply since communication is, in fact, synchronous. On the other hand, participating processors in the known consensus protocols need to know A in order to know how long to wait during each round of message exchange. Of course, it is possible to pick some arbitrary A to use in designing the protocol, and say that, whenever a message takes longer than this A, then either the sender or the receiver is considered to be faulty. This is not an acceptable solution to the problem since, if we picked A too small, all the processors
Consensus in the Presence of Partial Synchrony Page 1 Page 3