Classical leader election protocols typically assume that every participating node has complete and correct knowledge of the underlying membership list. Yet many edge and IoT settings are dynamic, with nodes joining, leaving, and failing continuously, a phenomenon called churn. Consequently, in any membership protocol, a given node's membership list may have entries that are missing (e.g., due to false-positive failure detections, or newly joined nodes whose information has not yet spread) or stale (e.g., failed nodes that have not yet been detected), and such discrepancies render classical election protocols incorrect. We present a family of four leader election protocols that are churn-tolerant (or c-tolerant). The key ideas are to: (i) involve the minimum number of nodes necessary to achieve safety; (ii) use optimism so that decisions are made faster when churn is low; and (iii) incorporate a preference for electing healthier nodes as leaders. We prove the correctness and safety of our c-tolerant protocols and show that their message complexity is optimal. We present experimental results from both a trace-driven simulation and our implementation atop Raspberry Pi devices, including a comparison against ZooKeeper.
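To make the churn problem concrete, the sketch below has each node pick a leader purely from its own membership list, preferring the healthiest member (idea iii). It is not one of the four c-tolerant protocols; the `Member` type, the numeric `health` score, and the lowest-id tie-break are hypothetical choices for illustration. It shows why a local rule alone is unsafe: a stale entry for a failed node makes two nodes choose different leaders.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Member:
    node_id: int
    health: float  # hypothetical health score in [0, 1]; higher means healthier


def preferred_leader(view: Iterable[Member]) -> Optional[int]:
    """Return the healthiest member of a local membership view.

    Ties are broken by the lowest node_id so every node applies the same rule.
    Returns None for an empty view.
    """
    members = list(view)
    if not members:
        return None
    best = max(members, key=lambda m: (m.health, -m.node_id))
    return best.node_id


# Node A still lists node 3, which has actually failed (a stale entry).
view_a = [Member(1, 0.6), Member(2, 0.8), Member(3, 0.9)]
# Node B has already removed node 3 from its list.
view_b = [Member(1, 0.6), Member(2, 0.8)]

print(preferred_leader(view_a))  # 3: the failed node still looks healthiest to A
print(preferred_leader(view_b))  # 2: B disagrees, so an agreement step is needed
```

Because the two views diverge, any safe protocol must involve enough other nodes to reconcile these local choices; how few nodes suffice is exactly what the protocols above are designed to minimize.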
DPaxos: Managing Data Closer to Users for Low-Latency and Mobile Applications
In this paper, we propose Dynamic Paxos (DPaxos), a Paxos-based consensus protocol to manage access to partitioned data across globally distributed datacenters and edge nodes. DPaxos is intended to implement a State Machine Replication component in data management systems for the edge. DPaxos targets the unique opportunities of utilizing edge computing resources to support emerging applications with stringent mobility and real-time requirements, such as Augmented and Virtual Reality and vehicular applications. The main objective of DPaxos is to reduce the latency of serving user requests, recovering from failures, and reacting to mobility. DPaxos achieves these objectives through a few changes to the traditional Paxos protocol. Most notably, DPaxos proposes a dynamic allocation of the quorums (i.e., groups of nodes) needed for Paxos Leader Election. Leader Election quorums in DPaxos are smaller than in traditional Paxos and expand only in the presence of conflicts.
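Smaller quorums are safe only if a new leader's election quorum still meets every quorum that may have replicated a value. The sketch below checks that intersection requirement in its simplest, flexible-quorum form (|Q1| + |Q2| > n, assumed here as background); DPaxos's own dynamic allocation and conflict-driven expansion are not reproduced, and the function names are hypothetical.

```python
from itertools import combinations


def min_election_quorum(n: int, replication_quorum: int) -> int:
    """Smallest election quorum size satisfying |Q1| + |Q2| > n."""
    return n - replication_quorum + 1


def always_intersect(n: int, q1: int, q2: int) -> bool:
    """Brute force: does every q1-subset of n nodes share a node with every q2-subset?"""
    nodes = range(n)
    return all(set(a) & set(b)
               for a in combinations(nodes, q1)
               for b in combinations(nodes, q2))


# Five nodes replicating with quorums of 2 need election quorums of 4.
print(min_election_quorum(5, 2))   # 4
print(always_intersect(5, 4, 2))   # True
print(always_intersect(5, 3, 2))   # False, e.g. {0, 1, 2} and {3, 4}
```

The trade-off is visible directly: shrinking the quorum used on the common path forces a larger quorum on the rarer path, which is why keeping election quorums small unless a conflict appears is attractive.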
- Award ID(s):
- 1703560
- NSF-PAR ID:
- 10091231
- Date Published:
- Journal Name:
- SIGMOD
- Page Range / eLocation ID:
- 1221 to 1236
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More like this
-
In this paper, we examine the Paxos protocol and demonstrate how the discrete numbering of ballots can be leveraged to weaken the conditions for learning. Specifically, we define the notion of consecutive ballots and use it to define Consecutive Quorums. Consecutive Quorums weaken the learning criterion so that a learner does not need matching accept messages sent in the same ballot from a majority of acceptors to learn a value. We prove that this modification preserves the original safety and liveness guarantees of Paxos. We define Consecutive Paxos, which encapsulates the properties of discrete consecutive ballots. To establish the correctness of these results, in addition to a paper proof, we formally verify a State Machine Replication library built on top of an optimized version of Multi-Paxos modified to reflect Consecutive Paxos.
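For reference, the classical criterion that Consecutive Quorums relax can be sketched as follows: a learner learns a value only when a majority of acceptors sent matching accepts in the same ballot. This is the baseline rule only; the consecutive-ballot condition itself is not reproduced, and the tuple layout `(acceptor_id, ballot, value)` is a hypothetical encoding.

```python
from collections import defaultdict
from typing import Iterable, Optional, Set, Tuple

Accept = Tuple[str, int, str]  # hypothetical encoding: (acceptor_id, ballot, value)


def learned_value(accepts: Iterable[Accept], num_acceptors: int) -> Optional[str]:
    """Classical Paxos learning: a majority of acceptors accepted the same value in the same ballot."""
    votes: dict = defaultdict(set)  # (ballot, value) -> set of acceptor ids
    for acceptor, ballot, value in accepts:
        votes[(ballot, value)].add(acceptor)
    majority = num_acceptors // 2 + 1
    for (_ballot, value), acceptors in votes.items():
        if len(acceptors) >= majority:
            return value
    return None


# Two of three acceptors agree in ballot 2, so "x" is learned.
print(learned_value([("a1", 1, "x"), ("a2", 2, "x"), ("a3", 2, "x")], 3))  # x
# Matching value, but split across ballots 1 and 2: the classical rule learns nothing.
print(learned_value([("a1", 1, "x"), ("a2", 2, "x")], 3))  # None
```

The second case is the kind of split the abstract targets: matching accepts that do not all come from one ballot. The precise conditions under which Consecutive Quorums allow learning in such cases are defined in the paper and are not assumed here.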
-
Finding the right abstraction is critical for reasoning about complex systems such as distributed protocols like Paxos and Raft. Despite a recent abundance of impressive verification work in this area, we argue that the ways past efforts model distributed state are not ideal for protocol-level reasoning: they either hide important details or leak too much complexity from the network. As evidence, we observe that nearly all of them avoid the complex but important issue of reconfiguration. Reconfiguration's primary challenge lies in how it interacts with a protocol's core safety invariants. To handle this increased complexity, we introduce the Adore model, whose novel abstract state hides network-level communications while capturing dependencies between committed and uncommitted states, as well as metadata like election quorums. It includes first-class support for a generic reconfiguration command that can be instantiated with a variety of implementations. Under this model, the subtle interactions between reconfiguration and the core protocol become clear, and with this insight we completed the first mechanized proof of safety of a reconfigurable consensus protocol.
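One concrete way reconfiguration touches core safety is quorum overlap: if a majority of the old configuration can be disjoint from a majority of the new one, two leaders can be elected or two values committed. The brute-force check below illustrates that condition for majority quorums, in the spirit of Raft's single-server membership changes; it is a hypothetical illustration, not the Adore model or its reconfiguration command.

```python
from itertools import combinations
from typing import List, Set


def majorities(config: Set[int]) -> List[Set[int]]:
    """All minimal majority quorums of a configuration."""
    k = len(config) // 2 + 1
    return [set(c) for c in combinations(sorted(config), k)]


def majorities_always_overlap(old: Set[int], new: Set[int]) -> bool:
    """True if every majority of `old` shares at least one server with every majority of `new`."""
    return all(a & b for a in majorities(old) for b in majorities(new))


print(majorities_always_overlap({1, 2, 3}, {1, 2, 3, 4}))     # True: one server added
print(majorities_always_overlap({1, 2, 3, 4}, {1, 2, 3}))     # True: one server removed
print(majorities_always_overlap({1, 2, 3}, {1, 2, 3, 4, 5}))  # False: {1, 2} vs {3, 4, 5}
```

Changing one server at a time preserves overlap, while adding two servers at once does not; protocols that allow arbitrary changes need extra machinery (such as joint configurations), which is part of what makes reconfiguration hard to verify.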
-
Fault-tolerant coordination services have been widely used in distributed applications in cloud environments. Recent years have witnessed the emergence of time-sensitive applications deployed in edge computing environments, which introduces both challenges and opportunities for coordination services. On one hand, coordination services must recover from failures in a timely manner. On the other hand, edge computing employs local networked platforms that can be exploited to achieve timely recovery. In this work, we first identify the limitations of the leader election and recovery protocols underlying Apache ZooKeeper, the prevailing open-source coordination service. To reduce recovery latency from leader failures, we then design RT-ZooKeeper with a set of novel features, including a fast-convergence election protocol, a quorum channel notification mechanism, and a distributed epoch persistence protocol. We have implemented RT-ZooKeeper based on ZooKeeper version 3.5.8. Empirical evaluation shows that RT-ZooKeeper achieves a 91% reduction in maximum recovery latency compared to ZooKeeper. Furthermore, a case study demonstrates that fast failure recovery in RT-ZooKeeper can benefit a common messaging service such as Kafka in terms of message latency.
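As background for the election protocol being improved, ZooKeeper's Fast Leader Election has each server start by voting for itself and switch to any received vote that proposes a leader with a higher epoch, then a higher last zxid, then a higher server id. The sketch below mirrors only that ordering; weights, quorum verification, timeouts, and the notification loop are omitted, the full-exchange loop is a simplifying assumption, and RT-ZooKeeper's fast-convergence protocol is not reproduced.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Vote:
    epoch: int      # epoch of the proposed leader
    zxid: int       # last logged transaction id of the proposed leader
    server_id: int  # id of the proposed leader


def supersedes(new: Vote, cur: Vote) -> bool:
    """Lexicographic order on (epoch, zxid, server_id)."""
    return (new.epoch, new.zxid, new.server_id) > (cur.epoch, cur.zxid, cur.server_id)


def run_full_exchange(initial: List[Vote]) -> List[Vote]:
    """Each server adopts any superseding vote it sees.

    Assumes a reliable network in which every vote reaches every server.
    """
    votes = list(initial)
    changed = True
    while changed:
        changed = False
        for i in range(len(votes)):
            for other in votes:
                if supersedes(other, votes[i]):
                    votes[i] = other
                    changed = True
    return votes


# Servers 2 and 3 have the same (most recent) zxid; the higher id breaks the tie.
final = run_full_exchange([Vote(1, 100, 1), Vote(1, 105, 2), Vote(1, 105, 3)])
print({v.server_id for v in final})  # {3}
```

Under real failures, votes arrive late or not at all and servers wait on timeouts before settling, which is where the recovery latency the paper targets comes from.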
-
Böhme, Rainer; Kiffer, Lucianna (Eds.)
We consider the problem of secret leader election with accountability. Secret leader election protocols counter adaptive adversaries by keeping the identities of elected leaders secret until they choose to reveal themselves, but in existing protocols this means it is impossible to determine who was elected leader if they fail to act. This opens the door to undetectable withholding attacks, where leaders fail to act in order to slow the protocol or bias future elections in their favor. We formally define accountability (in weak and strong variants) for secret leader election protocols. We present three paradigms for adding accountability, using delay-based cryptography, enforced key revelation, or threshold committees, all of which ensure that after some time delay the result of the election becomes public. The paradigm can be chosen to balance trust assumptions, protocol efficiency, and the length of the delay before leaders are revealed. Along the way, we introduce several new cryptographic tools, including re-randomizable timed commitments and timed VRFs.
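The withholding problem is easiest to see in a bare commit-and-reveal election. In the hypothetical sketch below, each participant publishes a SHA-256 commitment to a secret; a participant is eligible in a round if a hash of its secret and the round number falls below a threshold, and anyone can verify a claim once the secret is revealed. There is no time-lock, VRF, or committee here, so an eligible participant who never reveals stays undetected; closing that gap is what the three accountability paradigms do. The hash construction assumes a high-entropy secret and is not one of the paper's tools.

```python
import hashlib
import secrets


def commit(secret: bytes) -> str:
    """Hash commitment; hiding relies on the secret having high entropy."""
    return hashlib.sha256(b"commit|" + secret).hexdigest()


def is_elected(secret: bytes, round_no: int, threshold: int) -> bool:
    """Eligible if H(secret, round) interpreted as an integer is below the threshold."""
    digest = hashlib.sha256(b"elect|" + secret + round_no.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < threshold


def verify_claim(commitment: str, revealed: bytes, round_no: int, threshold: int) -> bool:
    """Anyone can check a revealed secret against the published commitment."""
    return commit(revealed) == commitment and is_elected(revealed, round_no, threshold)


# Roughly a 1-in-4 chance of eligibility per round with this threshold.
secret = secrets.token_bytes(32)
published = commit(secret)
threshold = 2**256 // 4
if is_elected(secret, 1, threshold):
    # Only the winner knows it was elected; nothing forces it to reveal.
    print(verify_claim(published, secret, 1, threshold))  # True
```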