skip to main content


Search for: All records

Award ID contains: 1910613

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. File systems that store metadata on a single machine or via a shared-disk abstraction face scalability challenges, especially in contexts demanding the management of billions of files. Recent work has shown that employing shared-nothing, distributed database system (DDBMS) for metadata storage can alleviate these scalability challenges without compromising on high availability guarantees. However, for low-scale deployments -- where metadata can fit in memory on a single machine -- these DDBMS-based systems typically perform an order of magnitude worse than systems that store metadata in memory on a single machine. This has limited the impact of these distributed database approaches, since they are only currently applicable to file systems of extreme scale. This paper describes FileScale, a three-tier architecture that incorporates a DDBMS as part of a comprehensive approach to file system metadata management. In contrast to previous approaches, FileScale performs comparably to the single-machine architecture at a small scale, while enabling linear scalability as the file system metadata increases. 
    more » « less
    Free, publicly-accessible full text available October 30, 2024
  2. Many globally distributed data stores need to replicate data across large geographic distances. Since synchronously replicating data across such distances is slow, those systems with high consistency requirements often geo-partition data and direct all linearizable requests to the primary region of the accessed data. This significantly improves performance for workloads where most transactions access data close to where they originate from. However, supporting serializable multi-geo-partition transactions is a challenge, and they often degrade the performance of the whole system. This becomes even more challenging when they conflict with single-partition requests, where optimistic protocols lead to high numbers of aborts, and pessimistic protocols lead to high numbers of distributed deadlocks. In this paper, we describe the design of concurrency control and deadlock resolution protocols, built within a practical, complete implementation of a geographically replicated database system called Detock, that enables processing strictly-serializable multi-region transactions with near-zero performance degradation at extremely high conflict and order of magnitude higher throughput relative to state-of-the art geo-replication approaches, while improving latency by up to a factor of 5.

     
    more » « less
    Free, publicly-accessible full text available June 13, 2024
  3. Asynchronously replicated primary-backup databases are commonly deployed to improve availability and offload read-only transactions. To both apply replicated writes from the primary and serve read-only transactions, the backups implement a cloned concurrency control protocol. The protocol ensures read-only transactions always return a snapshot of state that previously existed on the primary. This compels the backup to exactly copy the commit order resulting from the primary's concurrency control. Existing cloned concurrency control protocols guarantee this by limiting the backup's parallelism. As a result, the primary's concurrency control executes some workloads with more parallelism than these protocols. In this paper, we prove that this parallelism gap leads to unbounded replication lag, where writes can take arbitrarily long to replicate to the backup and which has led to catastrophic failures in production systems. We then design C5, the first cloned concurrency protocol to provide bounded replication lag. We implement two versions of C5: Our evaluation in MyRocks, a widely deployed database, demonstrates C5 provides bounded replication lag. Our evaluation in Cicada, a recent in-memory database, demonstrates C5 keeps up with even the fastest of primaries.

     
    more » « less
  4. null (Ed.)
    BullFrog is a relational DBMS that supports single-step schema migrations --- even those that are backwards incompatible --- without downtime, and without need for advanced warning. When a schema migration is submitted, BullFrog initiates a logical switch to the new schema, but physically migrates affected data lazily, as it is accessed by incoming transactions. BullFrog's internal concurrency control algorithms and data structures enable concurrent processing of schema migration operations with post-migration transactions, while ensuring exactly-once migration of all old data into the physical layout required by the new schema. BullFrog is implemented as an open source extension to PostgreSQL. Experiments using this prototype over a TPC-C based workload (supplemented to include schema migrations) show that BullFrog can achieve zero-downtime migration to non-trivial new schemas with near-invisible impact on transaction throughput and latency. 
    more » « less
  5. AnyLog is a decentralized platform for data publishing, sharing, and querying IoT (Internet of Things) data that enables an unlimited number of independent participants to publish and access the contents of IoT datasets stored across the participants. AnyLog provides decentralized publishing and querying functionality over structured data in an analogous fashion to how the world wide web (WWW) enables decentralized publishing and accessing of unstructured data. However, AnyLog differs from the traditional WWW in the way that it provides incentives and financial reward for performing tasks that are critical to the well-being of the system as a whole, including contribution, integration, storing, and processing of data, as well as protecting the confidentiality, integrity, and availability of that data. Another difference is how Anylog enforces good behavior by the participants through a collection of methods, including blockchain, secure enclaves, and state channels. 
    more » « less