Title: Strong and Efficient Consistency with Consistency-aware Durability
We introduce consistency-aware durability, or Cad, a new approach to durability in distributed storage that enables strong consistency while delivering high performance. We demonstrate the efficacy of this approach by designing cross-client monotonic reads, a novel and strong consistency property that provides monotonic reads across failures and sessions in leader-based systems; such a property can be particularly beneficial in geo-distributed and edge-computing scenarios. We build Orca, a modified version of ZooKeeper that implements Cad and cross-client monotonic reads. We experimentally show that Orca provides strong consistency while closely matching the performance of weakly consistent ZooKeeper. Compared to strongly consistent ZooKeeper, Orca provides significantly higher throughput (1.8–3.3×) and notably reduces latency, sometimes by an order of magnitude in geo-distributed settings. We also implement Cad in Redis and show that the performance benefits are similar to those of Cad’s implementation in ZooKeeper.
Award ID(s):
1838733
PAR ID:
10299723
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ACM transactions on storage
Volume:
17
Issue:
1
ISSN:
1553-3077
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We introduce consistency-aware durability or CAD, a new approach to durability in distributed storage that enables strong consistency while delivering high performance. We demonstrate the efficacy of this approach by designing cross-client monotonic reads, a novel and strong consistency property that provides monotonic reads across failures and sessions in leader-based systems. We build ORCA, a modified version of ZooKeeper that implements CAD and cross-client monotonic reads. We experimentally show that ORCA provides strong consistency while closely matching the performance of weakly consistent ZooKeeper. Compared to strongly consistent ZooKeeper, ORCA provides significantly higher throughput (1.8 – 3.3×), and notably reduces latency, sometimes by an order of magnitude in geo-distributed settings. 
  2. We introduce consistency-aware durability or CAD, a new approach to durability in distributed storage that enables strong consistency while delivering high performance. We demonstrate the efficacy of this approach by designing cross-client monotonic reads, a novel and strong consistency property that provides monotonic reads across failures and sessions in leader-based systems. We build ORCA, a modified version of ZooKeeper that implements CAD and cross-client monotonic reads. We experimentally show that ORCA provides strong consistency while closely matching the performance of weakly consistent ZooKeeper. Compared to strongly consistent ZooKeeper, ORCA provides significantly higher throughput (1.8 – 3.3×), and notably reduces latency, sometimes by an order of magnitude in geo-distributed settings.
  3. Many distributed applications rely on the strong guarantees of sequential consistency to ensure program correctness. Replication systems or frameworks that support such applications typically implement sequential consistency by employing voting schemes among replicas. However, such schemes suffer dramatic performance loss when deployed globally due to increased long-haul message latency between replicas in separate data centers. One approach to overcome this challenge involves deploying distinct instances of a service in each geographic cluster, then loosely coupling those services. Unfortunately, the consistency guarantees of the individual replication system instances do not compose when coupled this way, sacrificing overall sequential consistency. We propose an alternative approach, the consistent, propagatable partition tree (CoPPar Tree), a data structure that spans multiple data centers and data partitions, and that realizes sequential consistency using divide-and-conquer. By leveraging the geospatial affinity of data used in global services, CoPPar Tree can localize reads and writes in a sequentially consistent manner, improving the overall performance of a sequentially consistent service deployed at global scale. Our work allows clients to access local data and fully run SMR protocols locally without additional overhead. We implemented CoPPar Tree by enhancing ZooKeeper with an extension called ZooTree, which can be deployed without changing existing ZooKeeper clusters, and which achieves a speedup of 100× for reads and up to 10× for writes over prior work.
  4. The FuzzyLog is a partially ordered shared log abstraction. Distributed applications can concurrently append to the partial order and play it back. FuzzyLog applications obtain the benefits of an underlying shared log (extracting strong consistency, durability, and failure atomicity in simple ways) without suffering from its drawbacks. By exposing a partial order, the FuzzyLog enables three key capabilities for applications: linear scaling for throughput and capacity (without sacrificing atomicity), weaker consistency guarantees, and tolerance to network partitions. We present Dapple, a distributed implementation of the FuzzyLog abstraction that stores the partial order compactly and supports efficient appends / playback via a new ordering protocol. We implement several data structures and applications over the FuzzyLog, including several map variants as well as a ZooKeeper implementation. Our evaluation shows that these applications are compact, fast, and flexible: they retain the simplicity (100s of lines of code) and strong semantics (durability and failure atomicity) of a shared log design while exploiting the partial order of the FuzzyLog for linear scalability, flexible consistency guarantees (e.g., causal+ consistency), and network partition tolerance. On a 6-node Dapple deployment, our FuzzyLog-based ZooKeeper supports 3M/sec single-key writes, and 150K/sec atomic cross-shard renames.
  5. The FuzzyLog is a partially ordered shared log abstraction. Distributed applications can concurrently append to the partial order and play it back. FuzzyLog applications obtain the benefits of an underlying shared log - extracting strong consistency, durability, and failure atomicity in simple ways - without suffering from its drawbacks. By exposing a partial order, the FuzzyLog enables three key capabilities for applications: linear scaling for throughput and capacity (without sacrificing atomicity), weaker consistency guarantees, and tolerance to network partitions. We present Dapple, a distributed implementation of the FuzzyLog abstraction that stores the partial order compactly and supports efficient appends / playback via a new ordering protocol. We implement several data structures and applications over the FuzzyLog, including several map variants as well as a ZooKeeper implementation. Our evaluation shows that these applications are compact, fast, and flexible: they retain the simplicity (100s of lines of code) and strong semantics (durability and failure atomicity) of a shared log design while exploiting the partial order of the FuzzyLog for linear scalability, flexible consistency guarantees (e.g., causal+ consistency), and network partition tolerance. On a 6-node Dapple deployment, our FuzzyLog-based ZooKeeper supports 3M/sec single-key writes, and 150K/sec atomic cross-shard renames. 