skip to main content

Title: Vote Them Out: Detecting and Eliminating Byzantine Peers
Byzantine Fault Tolerant (BFT) protocols are designed to ensure correctness and eventual progress in the face of misbehaving nodes [1]. However, this does not prevent negative effects an adversary may have on performance: a faulty node may significantly affect the latency and throughput of the system without being detected. This is especially true in speculative protocols optimized for the best-case where a single leader can force the protocol into the worst case [3]. Systems like Aardvark [2] that are designed to maximize worst-case performance tolerate byzantine behavior without necessarily detecting who the perpetrator is. By forcing regular view changes, for example, they mitigate the effects of leaders who deliberately delay dissemination of messages, even if this behavior would be difficult to prove to a third party. Byzantine faults, by definition, can be difficult to detect. An error of 'commission', such as a message with a mismatching digest, can be proven. Errors of 'omission', such as delaying or failing to relay a message, as a rule cannot be proven, and the node responsible for these types of omission faults may not appear faulty to all observers. Nevertheless, we observe that they can reliably be detected. Designing protocols that detect and eject nodes more » is challenging for two reasons. First, some behaviors are observed by a subset of honest nodes and cannot be objectively proven to a third party. Second, any mechanism capable of ejecting nodes could be subverted by Byzantine nodes to eject honest nodes. This paper presents the Protocol for Ejecting All Corrupted Hosts (Peach, a mechanism for detecting and ejecting faulty nodes in Byzantine fault tolerant (BFT) protocols. Nodes submit votes to a trusted configuration manager that replaces faulty nodes once a threshold of votes are received. We implement Peach for two BFT protocol variants, a traditional pbft-style three-phase protocol and a speculative protocol, and evaluate its ability to respond to Byzantine behavior. This work makes the following contributions: (1) We present and prove a necessary and sufficient constraint on cluster membership guaranteeing that any nodes causing performance degradation via acts of omission will be detected. (2) We present an agreement protocol, PEACHes, in which replicas pass votes about their subjective local observations of possible omissions to a TTP. (3) We show how the separation of detection and effectuation allows fine-grained detection of malicious behavior that is compatible and easily integrated with existing systems. (4) We present DecentBFT, an extension of BFT-Smart to which we added a speculative fast path (similar to Zyzzva) and integrated PEACHes. (5) We show DecentBFT rapidly detects and mitigates a variety of performance attacks that would have gone undetected by the state of the art. « less
; ; ; ; ;
Award ID(s):
Publication Date:
Journal Name:
SoCC '19: Proceedings of the ACM Symposium on Cloud Computing
Page Range or eLocation-ID:
480 to 480
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper shows how to use bounded-time recovery (BTR) to defend distributed systems against non-crash faults and attacks. Unlike many existing fault-tolerance techniques, BTR does not attempt to completely mask all symptoms of a fault; instead, it ensures that the system returns to the correct behavior within a bounded amount of time. This weaker guarantee is sufficient, e.g., for many cyber-physical systems, where physical properties - such as inertia and thermal capacity - prevent quick state changes and thus limit the damage that can result from a brief period of undefined behavior. We present an algorithm called REBOUND that canmore »provide BTR for the Byzantine fault model. REBOUND works by detecting faults and then reconfiguring the system to exclude the faulty nodes. This supports very fine-grained responses to faults: for instance, the system can move or replace existing tasks, or drop less critical tasks entirely to conserve resources. REBOUND can take useful actions even when a majority of the nodes is compromised, and it requires less redundancy than full fault-tolerance.« less
  2. Jurdziński, T ; Schmid, S (Ed.)
    In the multiparty equality problem, each of the n nodes starts with a k-bit input. If there is a mismatch between the inputs, then at least one node must be able to detect it. The cost of a multiparty equality protocol is the total number of bits sent in the protocol. We consider the problem of minimizing this communication cost under the local broadcast model for the case where the underlying communication graph is undirected. In the local broadcast model of communication, a message sent by a node is received identically by all of its neighbors. This is in contrastmore »to the classical point-to-point communication model, where a message sent by a node to one of its neighbors is received only by its intended recipient. Under point-to-point communication, there exists a simple protocol which is competitive within a factor 2 of the lower bound [1]. In this protocol, a rooted spanning tree is fixed and each node sends its entire input to its parent in the tree. On receiving a value from its child, a node compares it against its own input to check if the two values match. Ignoring lower order additive terms, a more complicated protocol comes within a factor 4/3 of the lower bound and is tight for certain classes of graphs [1]. Tight results, ignoring lower order terms, are also known for complete graphs [2, 9]. We study the multiparty equality problem under the local broadcast model. Recently, our work has shown that the connectivity requirements for Byzantine consensus are lower in the local broadcast model as compared to the classical model [7, 8]. In this work, 1. we identify a lower bound for the multiparty equality problem in this model. 2. we first identify simple protocols, wherein nodes are restricted to either transmit their entire input or not transmit anything at all, and find that these can cost Ω(logn) times the lower bound using existing example for the set cover problem [12]. 3. we then design a protocol to solve the problem within a constant factor of the lower bound.« less
  3. Many aspects of blockchain-based decentralized finance can be understood as an extension of classical distributed computing. In this paper, we trace the evolution of two interrelated notions: failure and fault-tolerance. In classical distributed computing, a failure to complete a multi-party protocol is typically attributed to hardware malfunctions. A fault-tolerant protocol is one that responds to such failures by rolling the system back to an earlier consistent state. In the presence of Byzantine failures, a failure may be the result of an attack, and a fault-tolerant protocol is one that ensures that attackers will be punished and victims compensated. In modernmore »decentralized finance however, failure to complete a protocol can be considered a legitimate option, not a transgression. A fault-tolerant protocol is one that ensures that the party offering the option cannot renege, and the party purchasing the option provides fair compensation (in the form of a fee) to the offering party. We sketch the evolution of such protocols, starting with two-phase commit, and finishing with timed hashlocked smart contracts.« less
  4. In a key-agreement protocol whose security is proven in the random oracle model (ROM), the parties and the eavesdropper can make bounded number of queries to a shared random function (an “oracle”). Such protocol are the alternative to key-agreement protocols whose security is based on “public-key assumptions”, assumptions that being more structured are presumingly more vulnerable to attacks. Barak and Mahmoody [Crypto ’09] (following Impagliazzo and Rudich [STOC ’89]) have shown the ROM key-agreement protocols can only guarantee limited secrecy: the key of any `l-query protocol can be revealed by an O(l^2 )-query adversary, a bound that matches the gapmore »obtained by the Merkle’s Puzzles two-message protocol of Merkle [CACM ’78]. While this quadratic gap might not seem like much, if the honest parties are willing to work “hard enough” and given continuousness improvement in common hash functions evaluation time, this gap yields a good enough advantage (assuming the security of the protocol holds when initiating the random function with a fixed hash function). In this work we consider the communication complexity of ROM key-agreement protocols. In Merkle’s Puzzles, the honest parties need to exchange Ω(l) bits (ignoring logarithmic factors) to obtain secrecy against an eavesdropper that makes roughly l^2 queries, which makes the protocol unrealizable in many settings. We show that for protocols with certain natural properties, such high communication is unavoidable. Specifically, this is the case if the honest parties’ queries are independent and uniformly random, or alternatively if the protocol uses non-adaptive queries and has only two rounds. Since two-round key-agreement protocol are equivalent to public-key encryption scheme (seeing the first message as the public-key), the latter result bounds the public-key and encryption size of public-key encryption scheme whose security is proven in the ROM.« less
  5. This paper studies Byzantine reliable broadcast (BRB) under asynchronous networks, and improves the state-of-the-art protocols from the following aspects. Near-optimal communication cost: We propose two new BRB protocols for n nodes and input message M that has communication cost O(n|M| +n^2 log n), which is near-optimal due to the lower bound of Ω(n|M| +n^2). The first BRB protocol assumes threshold signature but is easy to understand, while the second BRB protocol is error-free but less intuitive. Improved computation: We propose a new construction that improves the computation cost of the state-of-the-art BRB by avoiding the expensive online error correction onmore »the input message, while achieving the same communication cost. Balanced communication: We propose a technique named balanced multicast that can balance the communication cost for BRB protocols where the broadcaster needs to multicast the message M while other nodes only needs to multicast coded fragments of size O(|M|/n + log n). The balanced multicast technique can be applied to many existing BRB protocols as well as all our new constructions in this paper, and can make every node incur about the same communication cost. Finally, we present a lower bound to show the near optimality of our protocol in terms of communication cost at each node.« less