TheStellar Consensus Protocol 7 ill-behaved well-behaved failed Byzantine, correct including blocked divergent correct crashed Fig. 5. Venndiagramofnodefailures behaved node may be compromised, its owner may have maliciously modified the soft- ware, or it may have crashed. The goal of Byzantine agreement is to ensure that well-behaved nodes externalize the same values despite the presence of such ill-behaved nodes. There are two parts to this goal. First, we would like to prevent nodes from diverging and externalizing different values for the same slot. Second, we would like to ensure nodes can actually externalize values, as opposed to getting blocked in some dead-end state from which consensus is no longer possible. We introduce the following two terms for these prop- erties: Definition (safety). A set of nodes in an FBAS enjoy safety if no two of them ever externalize different values for the same slot. Definition (liveness). A node in an FBAS enjoys liveness if it can externalize new values without the participation of any failed (including ill-behaved) nodes. Wecall well-behaved nodes that enjoy both safety and liveness correct. Nodes that are not correct have failed. All ill-behaved nodes have failed, but a well-behaved node can fail, too, by waiting indefinitely for messages from ill-behaved nodes, or, worse, by having its state poisoned by incorrect messages from ill-behaved nodes. Figure 5 illustrates the possible kinds of node failure. To the left are Byzantine failures, meaning the ill-behaved nodes. To the right are two kinds of well-behaved but failed nodes. Nodes that lack liveness are termed blocked, while those that lack safety are termed divergent. An attack violating safety is strictly more powerful than one violating only liveness, so we classify divergent nodes as a subset of blocked ones. Ourdefinition of liveness is weak in that it says a node can externalize new values, not that it will. Hence, it admits a state of perpetual preemption in which consensus remains forever possible, yet the network continually thwarts it by delaying or re- ordering critical messages in just the wrong way. Perpetual preemption is inevitable in a purely asynchronous, deterministic system that survives node failure [Fischer et al. 1985]. Fortunately, preemption is transient. It does not indicate node failure, be- cause the system can recover at any time. Protocols can mitigate the problem through randomness [Ben-Or 1983; Bracha and Toueg 1985] or through realistic assumptions about message latency [Dwork et al. 1988]. Latency assumptions are more practical when one would like to limit execution time or avoid the trusted dealers often re- quired by more efficient Randomized algorithms [?]. Of course, only termination and not safety should depend upon message timing. 4. OPTIMALRESILIENCE Whether or not nodes enjoy safety and liveness depends on several factors: what quo- rum slices they have chosen, which nodes are ill-behaved, and of course the concrete consensus protocol and network behavior. As is common for asynchronous systems, we assume the network eventually delivers messages between well-behaved nodes, but can otherwise arbitrarily delay or reorder messages.
The Stellar Consensus Protocol Page 7 Page 9