This paper presents Pesto, a high-performance Byzantine Fault Tolerant (BFT) database that offers full SQL compatibility. Pesto intentionally forgoes the use of State Machine Replication (SMR); SMR-based designs offer poor performance due to the several round trips required to order transactions. Pesto, instead, allows for replicas to remain inconsistent, and only synchronizes on demand to ensure that the database remain serializable in the presence of concurrent transactions and malicious actors. On TPC-C, Pesto matches the throughput of Peloton and Postgres, two unreplicated SQL database systems, while increasing throughput by 2.3x compared to classic SMR-based BFT-architectures, and reducing latency by 2.7x to 3.9x. Pesto's leaderless design minimizes the impact of replica failures and ensures robust performance.
more »
« less
This content will become publicly available on October 12, 2026
Pesto: Cooking up High Performance BFT Queries
This paper presents Pesto, a high-performance Byzantine Fault Tolerant (BFT) database that offers full SQL compatibility. Pesto intentionally forgoes the use of State Machine Replication (SMR); SMR-based designs offer poor performance due to the several round trips required to order transactions. Pesto, instead, allows for replicas to remain inconsistent, and only synchronizes on demand to ensure that the database remain serializable in the presence of concurrent transactions and malicious actors. On TPC-C, Pesto matches the throughput of Peloton and Postgres, two unreplicated SQL database systems, while increasing throughput by 2.3x compared to classic SMR-based BFT-architectures, and reducing latency by 2.7x to 3.9x. Pesto's leaderless design minimizes the impact of replica failures and ensures robust performance.
more »
« less
- Award ID(s):
- 2106954
- PAR ID:
- 10644399
- Publisher / Repository:
- Association for Computing Machinery New York, NY, United States
- Date Published:
- Format(s):
- Medium: X
- Location:
- Seoul, Republic of Korea
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Distributed data management systems use state Machine Replication (SMR) to provide fault tolerance. The SMR algorithm enables Byzantine Fault-Tolerant (BFT) protocols to guarantee safety and liveness despite the malicious failure of nodes. However, SMR does not prevent the adversarial manipulation of the order of transactions, where the order assigned by a malicious leader differs from the order in that transactions are received from clients. Whileorder-fairnesshas been recently studied in a few protocols, such protocols rely on synchronized clocks, suffer from liveness issues, or incur significant performance overhead. This paper presentsRashnu, a high-performance fair ordering protocol. Rashnu is motivated by the fact that fair ordering among two transactions is needed only when both transactions access a shared resource. Based on this observation, we define the notion ofdata-dependent order fairnesswhere replicas capture only the order of data-dependent transactions and the leader uses these orders to propose a dependency graph that represents fair ordering among transactions. Replicas then execute transactions using the dependency graph, resulting in the parallel execution of independent transactions. We implemented a prototype of Rashnu where our experimental evaluation reveals the low overhead of providing order-fairness in Rashnu.more » « less
-
Real-time embedded systems perform many important functions in the modern world. A standard way to tolerate faults in these systems is with Byzantine fault-tolerant (BFT) state machine replication (SMR), in which multiple replicas execute the same software and their outputs are compared by the actuators. Unfortunately, traditional BFT SMR protocols areslow, requiring replicas to exchange sensor data back and forth over multiple rounds in order to reach agreement before each execution. The state of the art in reducing the latency of BFT SMR iseager execution, in which replicas execute on data from different sensors simultaneously on different processor cores. However, this technique results in 3–5× higher computation overheads compared to traditional BFT SMR systems, significantly limiting schedulability. We presentCrossTalk, a new BFT SMR protocol that leverages the prevalence of redundant switched networks in embedded systems to reduce latency without added computation. The key idea is to use specific algorithms to move messages between redundant network planes (which many systems already possess) as the messages travel from the sensors to the replicas. As a result,CrossTalkcan ensure agreementautomaticallyin the network, avoiding the need for any communication between replicas. Our evaluation shows thatCrossTalkimproves schedulability by 2.13–4.24× over the state of the art. Moreover, in a NASA simulation of a real spaceflight mission,CrossTalktolerates more faults than the state of the art while using nearly 3× less processor time.more » « less
-
The performance of partially synchronous BFT-based consensus protocols is highly dependent on the primary node. All participant nodes in the network are blocked until they receive a proposal from the primary node to begin the consensus process. Therefore, an honest but slack node (with limited bandwidth) can adversely affect the performance when selected as primary. Hermes decreases protocol dependency on the primary node and minimizes transmission delay induced by the slack primary while keeping low message complexity and latency with high scalability. Hermes achieves these performance improvements by relaxing strong BFT agreement (safety) guarantees only for a specific type of Byzantine faults (also called equivocated faults). Interestingly, we show that in Hermes equivocating by a Byzantine primary is expensive and ineffective. Therefore, the safety of Hermes is comparable to the general BFT consensus. We deployed and tested Hermes on 190 Amazon EC2 instances. In these tests, Hermes's performance was comparable to the state-of-the-art BFT protocol for blockchains (when the network size is large) in the absence of slack nodes. Whereas, in the presence of slack nodes, Hermes outperforms the state-of-the-art BFT protocol significantly in terms of throughput and latency.more » « less
-
Relational database applications are notoriously difficult to test and debug. Concurrent execution of database transactions may violate complex structural invariants that constraint how changes to the contents of one (shared) table affect the contents of another. Simplifying the underlying concurrency model is one way to ameliorate the difficulty of understanding how concurrent accesses and updates can affect database state with respect to these sophisticated properties. Enforcing serializable execution of all transactions achieves this simplification, but it comes at a significant price in performance, especially at scale, where database state is often replicated to improve latency and availability. To address these challenges, this paper presents a novel testing framework for detecting serializability violations in (SQL) database-backed Java applications executing on weakly-consistent storage systems. We manifest our approach in a tool, CLOTHO, that combines a static analyzer and model checker to generate abstract executions, discover serializability violations in these executions, and translate them back into concrete test inputs suitable for deployment in a test environment. To the best of our knowledge, CLOTHO, is the first automated test generation facility for identifying serializability anomalies of Java applications intended to operate in geo-replicated distributed environments. An experimental evaluation on a set of industry-standard benchmarks demonstrates the utility of our approach.more » « less
An official website of the United States government
