Consensus in the Presence of Partial Synchrony

Cynthia Dwork, Nancy Lynch and Larry StockMeyer

Consensus in the Presence of Partial Synchrony CYNTHIA DWORK AND NANCY LYNCH .Massachusetts Institute of Technology, Cambridge, Massachusetts AND LARRY STOCKMEYER IBM Almaden Research Center, San Jose, California Abstract. The concept of partial synchrony in a distributed system is introduced. Partial synchrony lies between the cases of a synchronous system and an asynchronous system. In a synchronous system, there is a known fixed upper bound A on the time required for a message to be sent from one processor to another and a known fixed upper bound % on the relative speeds of different processors. In an asynchronous system no fixed upper bounds A and @ exist. In one version of partial synchrony, fixed bounds A and Cp exist, but they are not known a priori. The problem is to design protocols that work correctly in the partially synchronous system regardless of the actual values of the bounds A and Cp. In another version of partial synchrony, the bounds are known, but are only guaranteed to hold starting at some unknown time T, and protocols must be designed to work correctly regardless of when time T occurs. Fault-tolerant consensus protocols are given for various cases of partial synchrony and various fault models. Lower bounds that show in most cases that our protocols are optimal with respect to the number of faults tolerated are also given. Our consensus protocols for partially synchronous processors use new protocols for fault-tolerant “distributed clocks” that allow partially synchronous processors to reach some approximately common notion of time. Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems- distributed applications; distributed databases; network operating systems; C.4 [Computer Systems Organization]: Performance of Systems-reliability, availability, and serviceability; H.2.4 [Database Management]: Systems-distributed systems General Terms: Algorithms, Performance, Reliability, Theory, Verification Additional Key Words and Phrases: Agreement problem, Byzantine Generals problem, commit problem, consensus problem, distributed clock, distributed computing, fault tolerance, partially synchronous system A preliminary version of this paper appears in Proceedings of the 3rd ACM Symposium on Principles of Distributed Computing (Vancouver, B.C., Canada, Aug. 27-29). ACM, New York, 1984, pp. 103- 118. The work of C. Dwork was supported by a Bantrell postdoctoral Fellowship. The work of N. Lynch was supported in part by the Defense Advance Research Projects Agency under contract N00014-83-K- 0125, the National Science Foundation under grants DCR 83-02391 and MCS 83-06854, the Office of Army Research under Contract DAAG29-84-K-0058, and the Office of Naval Research under contract NOOO14-85-K-0168. Authors’ addresses: C. Dwork and L. Stockmeyer, Department K53/802, IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95 120; N. Lynch, Laboratory for Computer Science, Massachu- setts Institute of Technology, 545 Technology Square, Cambridge, MA 02 139. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. 0 1988 ACM 0004-541 l/88/0400-0288 $01.50 Journal of the Association for Computing Machinery, Vol. 35, No. 2, April 1988, pp. 288-323.

Consensus in the Presence of Partial Synchrony - Page 1 Consensus in the Presence of Partial Synchrony Page 2