skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Churn-tolerant Leader Election Protocols
Classical leader election protocols typically assume complete and correct knowledge of underlying membership lists at all participating nodes. Yet many edge and IoT settings are dynamic, with nodes joining, leaving, and failing continuously—a phenomenon called churn. This implies that in any membership protocol, a given node’s membership list may have entries that are missing (e.g., false positive detections, or newly joined nodes whose information has not spread yet) or stale (e.g., failed nodes that are undetected)—these would render classical election protocols incorrect. We present a family of four leader election protocols that are churn-tolerant (or c-tolerant). The key ideas are to: i) involve the minimum number of nodes necessary to achieve safety; ii) use optimism so that decisions are made faster when churn is low; iii) incorporate a preference for electing healthier nodes as leaders. We prove the correctness and safety of our c-tolerant protocols and show their message complexity is optimal. We present experimental results from both a trace- driven simulation as well as our implementation atop Raspberry Pi devices, including a comparison against Zookeeper.  more » « less
Award ID(s):
1908888
PAR ID:
10472722
Author(s) / Creator(s):
Publisher / Repository:
IEEE
Date Published:
Journal Name:
Proceedings 43rd IEEE International Conference on Distributed Computing Systems
Subject(s) / Keyword(s):
Leader Election, Membership, Edge Computing, Churn
Format(s):
Medium: X
Location:
Hong Kong
Sponsoring Org:
National Science Foundation
More Like this
  1. Gilbert, Seth (Ed.)
    This paper concerns designing distributed algorithms that are singularly optimal, i.e., algorithms that are simultaneously time and message optimal, for the fundamental leader election problem in asynchronous networks. Kutten et al. (JACM 2015) presented a singularly near optimal randomized leader election algorithm for general synchronous networks that ran in O(D) time and used O(m log n) messages (where D, m, and n are the network’s diameter, number of edges and number of nodes, respectively) with high probability. Both bounds are near optimal (up to a logarithmic factor), since Ω(D) and Ω(m) are the respective lower bounds for time and messages for leader election even for synchronous networks and even for (Monte-Carlo) randomized algorithms. On the other hand, for general asynchronous networks, leader election algorithms are only known that are either time or message optimal, but not both. Kutten et al. (DISC 2020) presented a randomized asynchronous leader election algorithm that is singularly near optimal for complete networks, but left open the problem for general networks. This paper shows that singularly near optimal (up to polylogarithmic factors) bounds can be achieved for general asynchronous networks. We present a randomized singularly near optimal leader election algorithm that runs in O(D + log² n) time and O(m log² n) messages with high probability. Our result is the first known distributed leader election algorithm for asynchronous networks that is near optimal with respect to both time and message complexity and improves over a long line of results including the classical results of Gallager et al. (ACM TOPLAS, 1983), Peleg (JPDC, 1989), and Awerbuch (STOC, 89). 
    more » « less
  2. Böhme, Rainer; Kiffer, Lucianna (Ed.)
    We consider the problem of secret leader election with accountability. Secret leader election protocols counter adaptive adversaries by keeping the identities of elected leaders secret until they choose to reveal themselves, but in existing protocols this means it is impossible to determine who was elected leader if they fail to act. This opens the door to undetectable withholding attacks, where leaders fail to act in order to slow the protocol or bias future elections in their favor. We formally define accountability (in weak and strong variants) for secret leader election protocols. We present three paradigms for adding accountability, using delay-based cryptography, enforced key revelation, or threshold committees, all of which ensure that after some time delay the result of the election becomes public. The paradigm can be chosen to balance trust assumptions, protocol efficiency, and the length of the delay before leaders are revealed. Along the way, we introduce several new cryptographic tools including re-randomizable timed commitments and timed VRFs. 
    more » « less
  3. Blockchains operating at the global scale demand high-performance byzantine fault-tolerant (BFT) consensus protocols. Most classic PBFT-like protocols suffer from an issue known as the leader bottleneck, which severely limits their throughput and resource utilization. Recently, Directed Acyclic Graph, or DAG-based protocols, have emerged as a promising approach for eliminating the leader bottleneck and achieving better performance. They attain higher throughput by separating data dissemination and block ordering. However, their safety and liveness logic is also significantly more elaborate. So far, most DAG-based protocols have only enjoyed on-paper security proofs, and it is not clear how to construct formal proofs of these protocols efficiently. We introduce LiDO-DAG, a concurrent object model that abstracts the common logic of these protocols. LiDO-DAG is constructed by combining a DAG abstraction and LiDO, a recently proposed abstraction for leader-based consensus. To demonstrate that our framework enables rapid validation of new DAG-based protocol designs, we implemented LiDO-DAG in Coq and applied it to three recent DAG-based protocols, including Narwhal, Bullshark, and Sailfish. Our framework readily yields mechanized safety and liveness proofs for all three protocols, which are also the first mechanized liveness proofs of any DAG-based protocol. Our framework has also revealed an optimization for Sailfish that improves its worst-case latency. 
    more » « less
  4. null (Ed.)
    Byzantine Fault Tolerant (BFT) protocols are designed to ensure correctness and eventual progress in the face of misbehaving nodes [1]. However, this does not prevent negative effects an adversary may have on performance: a faulty node may significantly affect the latency and throughput of the system without being detected. This is especially true in speculative protocols optimized for the best-case where a single leader can force the protocol into the worst case [3]. Systems like Aardvark [2] that are designed to maximize worst-case performance tolerate byzantine behavior without necessarily detecting who the perpetrator is. By forcing regular view changes, for example, they mitigate the effects of leaders who deliberately delay dissemination of messages, even if this behavior would be difficult to prove to a third party. Byzantine faults, by definition, can be difficult to detect. An error of 'commission', such as a message with a mismatching digest, can be proven. Errors of 'omission', such as delaying or failing to relay a message, as a rule cannot be proven, and the node responsible for these types of omission faults may not appear faulty to all observers. Nevertheless, we observe that they can reliably be detected. Designing protocols that detect and eject nodes is challenging for two reasons. First, some behaviors are observed by a subset of honest nodes and cannot be objectively proven to a third party. Second, any mechanism capable of ejecting nodes could be subverted by Byzantine nodes to eject honest nodes. This paper presents the Protocol for Ejecting All Corrupted Hosts (Peach, a mechanism for detecting and ejecting faulty nodes in Byzantine fault tolerant (BFT) protocols. Nodes submit votes to a trusted configuration manager that replaces faulty nodes once a threshold of votes are received. We implement Peach for two BFT protocol variants, a traditional pbft-style three-phase protocol and a speculative protocol, and evaluate its ability to respond to Byzantine behavior. This work makes the following contributions: (1) We present and prove a necessary and sufficient constraint on cluster membership guaranteeing that any nodes causing performance degradation via acts of omission will be detected. (2) We present an agreement protocol, PEACHes, in which replicas pass votes about their subjective local observations of possible omissions to a TTP. (3) We show how the separation of detection and effectuation allows fine-grained detection of malicious behavior that is compatible and easily integrated with existing systems. (4) We present DecentBFT, an extension of BFT-Smart to which we added a speculative fast path (similar to Zyzzva) and integrated PEACHes. (5) We show DecentBFT rapidly detects and mitigates a variety of performance attacks that would have gone undetected by the state of the art. 
    more » « less
  5. Distributed data management systems use state Machine Replication (SMR) to provide fault tolerance. The SMR algorithm enables Byzantine Fault-Tolerant (BFT) protocols to guarantee safety and liveness despite the malicious failure of nodes. However, SMR does not prevent the adversarial manipulation of the order of transactions, where the order assigned by a malicious leader differs from the order in that transactions are received from clients. Whileorder-fairnesshas been recently studied in a few protocols, such protocols rely on synchronized clocks, suffer from liveness issues, or incur significant performance overhead. This paper presentsRashnu, a high-performance fair ordering protocol. Rashnu is motivated by the fact that fair ordering among two transactions is needed only when both transactions access a shared resource. Based on this observation, we define the notion ofdata-dependent order fairnesswhere replicas capture only the order of data-dependent transactions and the leader uses these orders to propose a dependency graph that represents fair ordering among transactions. Replicas then execute transactions using the dependency graph, resulting in the parallel execution of independent transactions. We implemented a prototype of Rashnu where our experimental evaluation reveals the low overhead of providing order-fairness in Rashnu. 
    more » « less