skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Adore: atomic distributed objects with certified reconfiguration
Finding the right abstraction is critical for reasoning about complex systems such as distributed protocols like Paxos and Raft. Despite a recent abundance of impressive verification work in this area, we claim the ways that past efforts model distributed state are not ideal for protocol-level reasoning: they either hide important details, or leak too much complexity from the network. As evidence we observe that nearly all of them avoid the complex, but important issue of reconfiguration. Reconfiguration's primary challenge lies in how it interacts with a protocol's core safety invariants. To handle this increased complexity, we introduce the Adore model, whose novel abstract state hides network-level communications while capturing dependencies between committed and uncommitted states, as well as metadata like election quorums. It includes first-class support for a generic reconfiguration command that can be instantiated with a variety of implementations. Under this model, the subtle interactions between reconfiguration and the core protocol become clear, and with this insight we completed the first mechanized proof of safety of a reconfigurable consensus protocol.  more » « less
Award ID(s):
2019285 2118851 1945541
PAR ID:
10359232
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation
Page Range / eLocation ID:
379 to 394
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a formal, machine checked TLA+ safety proof of MongoRaftReconfig, a distributed dynamic reconfiguration protocol. MongoRaftReconfig was designed for and implemented in MongoDB, a distributed database whose replication protocol is derived from the Raft consensus algorithm. We present an inductive invariant for MongoRaftReconfig that is formalized in TLA+ and formally proved using the TLA+ proof system (TLAPS). We also present a formal TLAPS proof of two key safety properties of MongoRaftReconfig, LeaderCompleteness and StateMachineSafety. To our knowledge, these are the first machine checked inductive invariant and safety proof of a dynamic reconfiguration protocol for a Raft based replication system. 
    more » « less
  2. Public random beacons publish random numbers at regular intervals, which anyone can obtain and verify. The design of public distributed random beacons has been an exciting research direction with significant implications for blockchains, voting, and beyond. Distributed random beacons, in addition to being bias-resistant and unpredictable, also need to have low communication overhead and latency, high resilience to faults, and ease of reconfigurability. Existing synchronous random beacon protocols sacrifice one or more of these properties. In this work, we design an efficient unpredictable synchronous random beacon protocol, OptRand, with quadratic (in the number $$n$$ of system nodes) communication complexity per beacon output. First, we innovate by employing a novel combination of bilinear pairing based publicly verifiable secret-sharing and non-interactive zero-knowledge proofs to build a linear (in $$n$$) sized publicly verifiable random sharing. Second, we develop a state machine replication protocol with linear-sized inputs that is also optimistically responsive, i.e., it can progress responsively at actual network speed during optimistic conditions, despite the synchrony assumption, and thus incur low latency. In addition, we present an efficient reconfiguration mechanism for OptRand that allows nodes to leave and join the system. Our experiments show our protocols perform significantly better compared to state-of-the-art protocols under optimistic conditions and on par with state-of-the-art protocols in the normal case. We are also the first to implement a reconfiguration mechanism for distributed beacons and demonstrate that our protocol continues to be live during reconfigurations. 
    more » « less
  3. Despite recent advances, guaranteeing the correctness of large-scale distributed applications without compromising performance remains a challenging problem. Network and node failures are inevitable and, for some applications, careful control over how they are handled is essential. Unfortunately, existing approaches either completely hide these failures behind an atomic state machine replication (SMR) interface, or expose all of the network-level details, sacrificing atomicity. We propose a novel, compositional, atomic distributed object (ADO) model for strongly consistent distributed systems that combines the best of both options. The object-oriented API abstracts over protocol-specific details and decouples high-level correctness reasoning from implementation choices. At the same time, it intentionally exposes an abstract view of certain key distributed failure cases, thus allowing for more fine-grained control over them than SMR-like models. We demonstrate that proving properties even of composite distributed systems can be straightforward with our Coq verification framework, Advert, thanks to the ADO model. We also show that a variety of common protocols including multi-Paxos and Chain Replication refine the ADO semantics, which allows one to freely choose among them for an application's implementation without modifying ADO-level correctness proofs. 
    more » « less
  4. Distributed protocols have long been formulated in terms of their safety and liveness properties. Much recent work has focused on automatically verifying the safety properties of distributed protocols, but doing so for liveness properties has remained a challenging, unsolved problem. We present LVR, the first framework that can mostly automatically verify liveness properties for distributed protocols. Our key insight is that most liveness properties for distributed protocols can be reduced to a set of safety properties with the help of ranking functions. Such ranking functions for practical distributed protocols have certain properties that make them straightforward to synthesize, contrary to conventional wisdom. We prove that verifying a liveness property can then be reduced to a simpler problem of verifying a set of safety properties, namely that the ranking function is strictly decreasing and nonnegative for any protocol state transition, and there is no deadlock. LVR automatically synthesizes ranking functions by formulating a parameterized function of integer protocol variables, statically analyzing the lower and upper bounds of the variables as well as how much they can change on each state transition, then feeding the constraints to an SMT solver to determine the coefficients of the ranking function. It then uses an off-the-shelf verification tool to find inductive invariants to verify safety properties for both ranking functions and deadlock freedom. We show that LVR can mostly automatically verify the liveness properties of several distributed protocols, including various versions of Paxos, with limited user guidance. 
    more » « less
  5. Achieving consensus is a challenging and ubiquitous problem in distributed systems that is only made harder by the introduction of malicious byzantine servers. While significant effort has been devoted to the benign and byzantine failure models individually, no prior work has considered the mechanized verification of both in a generic way. We claim this is due to the lack of an appropriate abstraction that is capable of representing both benign and byzantine consensus without either losing too much detail or becoming impractically complex. We build on recent work on the atomic distributed object model to fill this void with a novel abstraction called AdoB. In addition to revealing important insights into the essence of consensus, this abstraction has practical benefits for easing distributed system verification. As a case study, we proved safety and liveness properties for AdoB in Coq, which are the first such mechanized proofs to handle benign and byzantine consensus in a unified manner. We also demonstrate that AdoB faithfully models real consensus protocols by proving it is refined by standard network-level specifications of Fast Paxos and a variant of Jolteon. 
    more » « less