skip to main content


Title: Vote Them Out: Detecting and Eliminating Byzantine Peers
Byzantine Fault Tolerant (BFT) protocols are designed to ensure correctness and eventual progress in the face of misbehaving nodes [1]. However, this does not prevent negative effects an adversary may have on performance: a faulty node may significantly affect the latency and throughput of the system without being detected. This is especially true in speculative protocols optimized for the best-case where a single leader can force the protocol into the worst case [3]. Systems like Aardvark [2] that are designed to maximize worst-case performance tolerate byzantine behavior without necessarily detecting who the perpetrator is. By forcing regular view changes, for example, they mitigate the effects of leaders who deliberately delay dissemination of messages, even if this behavior would be difficult to prove to a third party. Byzantine faults, by definition, can be difficult to detect. An error of 'commission', such as a message with a mismatching digest, can be proven. Errors of 'omission', such as delaying or failing to relay a message, as a rule cannot be proven, and the node responsible for these types of omission faults may not appear faulty to all observers. Nevertheless, we observe that they can reliably be detected. Designing protocols that detect and eject nodes is challenging for two reasons. First, some behaviors are observed by a subset of honest nodes and cannot be objectively proven to a third party. Second, any mechanism capable of ejecting nodes could be subverted by Byzantine nodes to eject honest nodes. This paper presents the Protocol for Ejecting All Corrupted Hosts (Peach, a mechanism for detecting and ejecting faulty nodes in Byzantine fault tolerant (BFT) protocols. Nodes submit votes to a trusted configuration manager that replaces faulty nodes once a threshold of votes are received. We implement Peach for two BFT protocol variants, a traditional pbft-style three-phase protocol and a speculative protocol, and evaluate its ability to respond to Byzantine behavior. This work makes the following contributions: (1) We present and prove a necessary and sufficient constraint on cluster membership guaranteeing that any nodes causing performance degradation via acts of omission will be detected. (2) We present an agreement protocol, PEACHes, in which replicas pass votes about their subjective local observations of possible omissions to a TTP. (3) We show how the separation of detection and effectuation allows fine-grained detection of malicious behavior that is compatible and easily integrated with existing systems. (4) We present DecentBFT, an extension of BFT-Smart to which we added a speculative fast path (similar to Zyzzva) and integrated PEACHes. (5) We show DecentBFT rapidly detects and mitigates a variety of performance attacks that would have gone undetected by the state of the art.  more » « less
Award ID(s):
1750060
NSF-PAR ID:
10249862
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
SoCC '19: Proceedings of the ACM Symposium on Cloud Computing
Page Range / eLocation ID:
480 to 480
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The performance of partially synchronous BFT-based consensus protocols is highly dependent on the primary node. All participant nodes in the network are blocked until they receive a proposal from the primary node to begin the consensus process. Therefore, an honest but slack node (with limited bandwidth) can adversely affect the performance when selected as primary. Hermes decreases protocol dependency on the primary node and minimizes transmission delay induced by the slack primary while keeping low message complexity and latency with high scalability. Hermes achieves these performance improvements by relaxing strong BFT agreement (safety) guarantees only for a specific type of Byzantine faults (also called equivocated faults). Interestingly, we show that in Hermes equivocating by a Byzantine primary is expensive and ineffective. Therefore, the safety of Hermes is comparable to the general BFT consensus. We deployed and tested Hermes on 190 Amazon EC2 instances. In these tests, Hermes's performance was comparable to the state-of-the-art BFT protocol for blockchains (when the network size is large) in the absence of slack nodes. Whereas, in the presence of slack nodes, Hermes outperforms the state-of-the-art BFT protocol significantly in terms of throughput and latency. 
    more » « less
  2. Secret sharing is an essential tool for many distributed applications, including distributed key generation and multiparty computation. For many practical applications, we would like to tolerate network churn, meaning participants can dynamically enter and leave the pool of protocol participants as they please. Such protocols, called Dynamic-committee Proactive Secret Sharing (DPSS) have recently been studied; however, existing DPSS protocols do not gracefully handle faults: the presence of even one unexpectedly slow node can often slow down the whole protocol by a factor of O(n). In this work, we explore optimally fault-tolerant asynchronous DPSS that is not slowed down by crash faults and even handles byzantine faults while maintaining the same performance. We first introduce the first high-threshold DPSS, which offers favorable characteristics relative to prior non-synchronous works in the presence of faults while simultaneously supporting higher privacy thresholds. We then batch-amortize this scheme along with a parallel non-high-threshold scheme which achieves optimal bandwidth characteristics. We implement our schemes and demonstrate that they can compete with prior work in best-case performance while outperforming it in non-optimal settings. 
    more » « less
  3. null (Ed.)
    Many aspects of blockchain-based decentralized finance can be understood as an extension of classical distributed computing. In this paper, we trace the evolution of two interrelated notions: failure and fault-tolerance. In classical distributed computing, a failure to complete a multi-party protocol is typically attributed to hardware malfunctions. A fault-tolerant protocol is one that responds to such failures by rolling the system back to an earlier consistent state. In the presence of Byzantine failures, a failure may be the result of an attack, and a fault-tolerant protocol is one that ensures that attackers will be punished and victims compensated. In modern decentralized finance however, failure to complete a protocol can be considered a legitimate option, not a transgression. A fault-tolerant protocol is one that ensures that the party offering the option cannot renege, and the party purchasing the option provides fair compensation (in the form of a fee) to the offering party. We sketch the evolution of such protocols, starting with two-phase commit, and finishing with timed hashlocked smart contracts. 
    more » « less
  4. This paper studies Byzantine reliable broadcast (BRB) under asynchronous networks, and improves the state-of-the-art protocols from the following aspects. Near-optimal communication cost: We propose two new BRB protocols for n nodes and input message M that has communication cost O(n|M| +n^2 log n), which is near-optimal due to the lower bound of Ω(n|M| +n^2). The first BRB protocol assumes threshold signature but is easy to understand, while the second BRB protocol is error-free but less intuitive. Improved computation: We propose a new construction that improves the computation cost of the state-of-the-art BRB by avoiding the expensive online error correction on the input message, while achieving the same communication cost. Balanced communication: We propose a technique named balanced multicast that can balance the communication cost for BRB protocols where the broadcaster needs to multicast the message M while other nodes only needs to multicast coded fragments of size O(|M|/n + log n). The balanced multicast technique can be applied to many existing BRB protocols as well as all our new constructions in this paper, and can make every node incur about the same communication cost. Finally, we present a lower bound to show the near optimality of our protocol in terms of communication cost at each node. 
    more » « less
  5. Jurdziński, T ; Schmid, S (Ed.)
    In the multiparty equality problem, each of the n nodes starts with a k-bit input. If there is a mismatch between the inputs, then at least one node must be able to detect it. The cost of a multiparty equality protocol is the total number of bits sent in the protocol. We consider the problem of minimizing this communication cost under the local broadcast model for the case where the underlying communication graph is undirected. In the local broadcast model of communication, a message sent by a node is received identically by all of its neighbors. This is in contrast to the classical point-to-point communication model, where a message sent by a node to one of its neighbors is received only by its intended recipient. Under point-to-point communication, there exists a simple protocol which is competitive within a factor 2 of the lower bound [1]. In this protocol, a rooted spanning tree is fixed and each node sends its entire input to its parent in the tree. On receiving a value from its child, a node compares it against its own input to check if the two values match. Ignoring lower order additive terms, a more complicated protocol comes within a factor 4/3 of the lower bound and is tight for certain classes of graphs [1]. Tight results, ignoring lower order terms, are also known for complete graphs [2, 9]. We study the multiparty equality problem under the local broadcast model. Recently, our work has shown that the connectivity requirements for Byzantine consensus are lower in the local broadcast model as compared to the classical model [7, 8]. In this work, 1. we identify a lower bound for the multiparty equality problem in this model. 2. we first identify simple protocols, wherein nodes are restricted to either transmit their entire input or not transmit anything at all, and find that these can cost Ω(logn) times the lower bound using existing example for the set cover problem [12]. 3. we then design a protocol to solve the problem within a constant factor of the lower bound. 
    more » « less