Search for: All records

Creators/Authors contains: "Sen, Siddhartha"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Variance in predictions across different trained models is a significant, under-explored source of error in fair binary classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions. We: 1) Define a metric called self-consistency, derived from variance, which we use as a proxy for measuring and reducing arbitrariness; 2) Develop an ensembling algorithm that abstains from classification when a prediction would be arbitrary; 3) Conduct the largest to-date empirical study of the role of variance (vis-a-vis self-consistency and arbitrariness) in fair binary classification; and, 4) Release a toolkit that makes the US Home Mortgage Disclosure Act (HMDA) datasets easily usable for future research. Altogether, our experiments reveal shocking insights about the reliability of conclusions on benchmark datasets. Most fair binary classification benchmarks are close-to-fair when taking into account the amount of arbitrariness present in predictions -- before we even try to apply any fairness interventions. This finding calls into question the practical utility of common algorithmic fairness methods, and in turn suggests that we should reconsider how we choose to measure fairness in binary classification. 
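
A minimal sketch of the abstention idea above, assuming NumPy arrays and a scikit-learn base model: train B classifiers on bootstrap resamples, score each test example by the fraction of model pairs whose predictions agree (a proxy for self-consistency), and abstain whenever that fraction falls below a threshold. The function names, the choice of logistic regression, and the values of B and tau are illustrative; this is not the authors' released toolkit.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

def self_consistency(votes):
    """Fraction of model pairs that agree on each example.

    votes: (B, N) array of 0/1 predictions from B bootstrap-trained models.
    """
    B = votes.shape[0]
    ones = votes.sum(axis=0)                    # models predicting 1, per example
    zeros = B - ones
    agreeing_pairs = ones * (ones - 1) / 2 + zeros * (zeros - 1) / 2
    return agreeing_pairs / (B * (B - 1) / 2)

def ensemble_with_abstention(X_train, y_train, X_test, B=50, tau=0.75, base_model=None):
    """Train B models on bootstrap resamples; abstain where predictions look arbitrary."""
    base_model = base_model or LogisticRegression(max_iter=1000)
    rng = np.random.default_rng(0)
    votes = np.empty((B, len(X_test)), dtype=int)
    for b in range(B):
        idx = rng.integers(0, len(X_train), size=len(X_train))   # bootstrap resample
        votes[b] = clone(base_model).fit(X_train[idx], y_train[idx]).predict(X_test)
    sc = self_consistency(votes)
    majority = (votes.mean(axis=0) >= 0.5).astype(int)
    return np.where(sc >= tau, majority, -1)     # -1 marks an abstention
```

In this toy, -1 marks abstentions; a real pipeline would route those examples to some fallback decision process rather than emitting an effectively arbitrary label.
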
  2. Strictly serializable datastores greatly simplify application development. However, existing techniques pay unnecessary costs for naturally consistent transactions, which arrive at servers in an order that is already strictly serializable. We exploit this natural arrival order by executing transactions with minimal costs while optimistically assuming they are naturally consistent, and then leverage a timestamp-based technique to efficiently verify if the execution is indeed consistent. In the process of this design, we identify a fundamental pitfall in relying on timestamps to provide strict serializability and name it the timestamp-inversion pitfall. We show that timestamp inversion has affected several existing systems. We present Natural Concurrency Control (NCC), a new concurrency control technique that guarantees strict serializability and ensures minimal costs—i.e., one-round latency, lock-free, and non-blocking execution—in the common case by leveraging natural consistency. NCC is enabled by three components: non-blocking execution, decoupled response management, and timestamp-based consistency checking. NCC avoids the timestamp-inversion pitfall with response timing control and proposes two optimization techniques, asynchrony-aware timestamps and smart retry, to reduce false aborts. Moreover, NCC designs a specialized protocol for read-only transactions, which is the first to achieve optimal best-case performance while guaranteeing strict serializability without relying on synchronized clocks. Our evaluation shows NCC outperforms state-of-the-art strictly serializable solutions by an order of magnitude on many workloads. 
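
As a single-process caricature of the execute-first, check-timestamps-later idea: transactions arrive with client-supplied timestamps, execute immediately without locks, and a plain timestamp-ordering check afterwards decides whether the optimistic execution was consistent, aborting on an inversion. This stands in for NCC's distributed validation, response timing control, and retry optimizations; the class and method names below are invented for the illustration.

```python
class ToyStore:
    """Execute first, check timestamps afterwards (illustration only, not NCC)."""

    def __init__(self):
        self.vals = {}   # key -> value
        self.wts = {}    # key -> timestamp of the last committed write
        self.rts = {}    # key -> largest timestamp that has read the key

    def run(self, ts, reads, writes):
        """Run one transaction with a client-supplied timestamp `ts`.

        Reads execute immediately, lock-free, in arrival order. The check
        below then verifies that committing at `ts` does not contradict what
        has already committed; if it does (a timestamp inversion relative to
        the arrival order), the transaction aborts and the caller retries.
        """
        result = {k: self.vals.get(k) for k in reads}        # optimistic execution
        ok = all(ts > self.wts.get(k, 0) for k in reads) and \
             all(ts > self.wts.get(k, 0) and ts > self.rts.get(k, 0) for k in writes)
        if not ok:
            return None                                      # abort: caller retries
        for k in reads:
            self.rts[k] = max(self.rts.get(k, 0), ts)
        for k, v in writes.items():
            self.vals[k] = v
            self.wts[k] = ts
        return result
```

Naturally consistent transactions pass this check on the first try, which is the common case the abstract exploits; only arrival orders that conflict with the timestamp order pay the cost of a retry.
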
  3. This paper presents Rolis, a new speedy and fault-tolerant replicated multi-core transactional database system. Rolis's aim is to mask the high cost of replication by ensuring that cores are always doing useful work and not waiting for each other or for other replicas. Rolis achieves this by not mixing the multi-core concurrency control with multi-machine replication, as is traditionally done by systems that use Paxos to replicate the transaction commit protocol. Instead, Rolis takes an "execute-replicate-replay" approach. Rolis first speculatively executes the transaction on the leader machine, and then replicates the per-thread transaction log to the followers using a novel protocol that leverages independent Paxos instances to avoid coordination, while still allowing followers to safely replay. The execution, replication, and replay are carefully designed to be scalable and have nearly zero coordination overhead across cores. Our evaluation shows that Rolis can achieve 1.03M TPS (transactions per second) on the TPC-C workload, using a 3-replica setup where each server has 32 cores. This throughput result is orders of magnitude higher than traditional software approaches we tested (e.g., 2PL), and is comparable to state-of-the-art, fault-tolerant, in-memory storage systems built using kernel bypass and advanced networking hardware, even though Rolis runs on commodity machines. 
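
The execute-replicate-replay split can be pictured with a toy: leader worker threads append (commit timestamp, write set) records to their own logs, and a follower applies those logs in commit-timestamp order. In the sketch below, "replication" is just handing Python lists to a follower and replay is a heap merge rather than the concurrent, watermark-based replay the abstract describes; commit timestamps are assumed unique and increasing within each thread, and all names are invented.

```python
import heapq

class ThreadLog:
    """Per-worker-thread log on the leader: (commit_ts, write_set) records.

    In Rolis each such log would be shipped through its own Paxos instance;
    here "replication" is just passing the list to the follower.
    """
    def __init__(self):
        self.records = []

    def append(self, commit_ts, write_set):
        self.records.append((commit_ts, dict(write_set)))

def replay_on_follower(thread_logs):
    """Apply the per-thread logs on a follower in global commit-timestamp order.

    The heap merge keeps the invariant visible (the follower applies writes in
    commit order) while standing in for the concurrent replay of the real system.
    """
    state = {}
    merged = heapq.merge(*(log.records for log in thread_logs), key=lambda rec: rec[0])
    for _, write_set in merged:
        state.update(write_set)
    return state

# Example: two leader threads committed transactions independently.
log_a, log_b = ThreadLog(), ThreadLog()
log_a.append(1, {"x": 10})
log_b.append(2, {"x": 20, "y": 5})
log_a.append(3, {"y": 7})
assert replay_on_follower([log_a, log_b]) == {"x": 20, "y": 7}
```
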
  4. Replicated state machines are linearizable, fault-tolerant groups of replicas that are coordinated using a consensus algorithm. Copilot replication is the first 1-slowdown-tolerant consensus protocol: it delivers normal latency despite the slowdown of any 1 replica. Copilot uses two distinguished replicas—the pilot and copilot—to proactively add redundancy to all stages of processing a client's command. Copilot uses dependencies and deduplication to resolve potentially differing orderings proposed by the pilots. To avoid dependencies leading to either pilot being able to slow down the group, Copilot uses fast takeovers that allow a fast pilot to complete the ongoing work of a slow pilot. Copilot includes two optimizations—ping-pong batching and null dependency elimination—that improve its performance when there are 0 and 1 slow pilots, respectively. Our evaluation of Copilot shows its performance is lower than but competitive with Multi-Paxos and EPaxos when no replicas are slow. When a replica is slow, Copilot is the only protocol that avoids high latencies.
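
The client-visible part of Copilot's proactive redundancy can be caricatured as follows: the client tags each command with an id, hands it to both pilots, and keeps whichever reply arrives first, while deduplication by command id keeps the command from taking effect twice. The sketch below shows only that behavior; there is no consensus, dependency tracking, or fast takeover in it, both pilots front one shared in-memory state, and all names are invented.

```python
import concurrent.futures

class ToyPilot:
    """Stand-in for a Copilot pilot (no consensus, dependencies, or takeovers here)."""

    def __init__(self, name, state, applied):
        self.name = name
        self.state = state      # shared dict standing in for the replicated state
        self.applied = applied  # shared set of command ids already applied

    def propose(self, cmd_id, key, value):
        # Deduplication: if the other pilot already applied this command id,
        # applying it again would be a no-op for this idempotent toy command.
        if cmd_id not in self.applied:
            self.applied.add(cmd_id)
            self.state[key] = value
        return self.name, self.state[key]

def submit(pilot, copilot, cmd_id, key, value):
    """Send the command to both pilots and keep whichever reply arrives first."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(p.propose, cmd_id, key, value) for p in (pilot, copilot)]
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()

state, applied = {}, set()
pilot, copilot = ToyPilot("pilot", state, applied), ToyPilot("copilot", state, applied)
print(submit(pilot, copilot, cmd_id=1, key="x", value=42))  # answered by whichever pilot is faster
```
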
  5. Read-only transactions are critical for consistently reading data spread across a distributed storage system, but have worse performance than simple, non-transactional reads. We identify three properties of simple reads that are necessary for read-only transactions to be performance-optimal, i.e., come as close as possible to simple reads. We demonstrate a fundamental tradeoff in the design of read-only transactions by proving that performance optimality is impossible to achieve with strict serializability, the strongest consistency. Guided by this result, we present PORT, a performance-optimal design with the strongest consistency to date. Central to PORT are version clocks, a specialized logical clock that concisely captures the necessary ordering constraints. We show the generality of PORT with two applications. Scylla-PORT provides process-ordered serializability with simple writes and shows performance comparable to its non-transactional base system. Eiger-PORT provides causal consistency with write transactions and significantly improves the performance of its transactional base system.
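
A rough picture of what a version clock buys a read-only transaction: writes tag each value with a per-shard logical version, and a read-only transaction fetches, in one round and without locks or blocking, the newest value at or below a single snapshot version from every shard it touches. The sketch below illustrates that multiversion-snapshot reading; it is not PORT's protocol, how the client derives the snapshot version is left out, and the hash-based partitioning is just for the example.

```python
class Shard:
    """One storage shard keeping multiversioned values tagged by a logical version clock."""

    def __init__(self):
        self.clock = 0
        self.versions = {}   # key -> list of (version, value), appended in version order

    def write(self, key, value):
        self.clock += 1                          # writes advance the shard's version clock
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read_at(self, key, snapshot):
        """Return the newest value written at or below the snapshot version."""
        older = [(v, val) for v, val in self.versions.get(key, []) if v <= snapshot]
        return max(older)[1] if older else None

def read_only_txn(shards, keys, snapshot):
    """One round, no locks, no blocking: read every key at a common snapshot version."""
    return {k: shards[hash(k) % len(shards)].read_at(k, snapshot) for k in keys}
```

Because every shard answers against the same snapshot version, the reads are mutually consistent without any coordination beyond the single request round, which is the property the abstract calls performance-optimal.
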