290 C. DWORK ET AL. could soon be considered faulty, and by definition the decisions of faulty processors do not have to be consistent with the decision of any other processor. What we would like is a protocol that does not have A “built in.” Such a protocol would operate correctly whenever it is executed in a system where some fixed upper bound A exists. It should also be mentioned that we do not assume any probability distribution on message transmission time that would allow A to be estimated by doing experiments. Another situation could be that we know A, but the message system is sometimes unreliable, delivering messages late or not at all. As noted above, we do not want to consider a late or lost message as a processor fault. However, without any further constraint on the message system, this “unreliable” message system is at least as bad as a completely asynchronous one, and the impossibility results of [4] apply. Therefore, we impose an additional constraint: For each execution there is a global stabilization time (GST), unknown to the processors, such that the message system respects the upper bound A from time GST onward. This constraint might at first seem too strong: In realistic situations, the upper bound cannot reasonably be expected to hold forever after GST, but perhaps only for a limited time. However, any good solution to the consensus problem in this model would have an upper bound L on the amount of time after GST required for consensus to be reached; in this case it is not really necessary that the bound A hold forever after time GST, but only up to time GST + L. We find it technically convenient to avoid explicit mention of the interval length L in the model, but will instead present the appropriate upper bounds on time for each of our algorithms. Instead of requiring that the consensus problem be solvable in the GST model, we might think of separating the correctness conditions into safety and termination properties. The safety conditions are that no two correct processors should ever reach disagreement, and that no correct processor should ever make a decision that is contrary to the specified validity conditions. The termination property is just that each correct processor should eventually make a decision. Then we might require an algorithm to satisfy the safety conditions no matter how asynchronously the message system behaves, that is, even if A does not hold eventually. On the other hand, we might only require termination in case A holds eventually. It is easy to see that these safety and termination conditions are equivalent to our GST condition: If an algorithm solves the consensus problem when A holds from time GST onward, then that algorithm cannot possibly violate a safety property even if the message system is completely asynchronous. This is because safety violations must occur at some finite point in time, and there would be some continuation of the violating execution in which A eventually holds. Thus, the condition that A holds from some time GST onward provides a second reasonable definition for partial communication synchrony. Once again, it is not clear how we could apply previously known consensus protocols to this model. For example, the same argument as for the case of the unknown bound shows that we cannot treat lost or delayed messages in the same way as processor faults. For succinctness, we say that communication is partially synchronous if one of these two situations holds: A exists but is not known, or A is known and has to hold from some unknown point on. Our results determine precisely, for four interesting fault models, the maxi- mum resiliency possible in cases where communication is partially synchronous. For fail-stop or omission faults we show that t-resilient consensus is possible iff N 2 2t + 1. For Byzantine faults with authentication, we show that t-resilient consensus is possible iff N 2 3t + 1. Also, for Byzantine faults without authenti-
Consensus in the Presence of Partial Synchrony Page 2 Page 4