skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: CFT-Forensics: High-Performance Byzantine Accountability for Crash Fault Tolerant Protocols
Crash fault tolerant (CFT) consensus algorithms are commonly used in scenarios where system components are trusted - e.g., enterprise settings and government infrastructure. However, CFT consensus can be broken by even a single corrupt node. A desirable property in the face of such potential Byzantine faults is accountability: if a corrupt node breaks the protocol and affects consensus safety, it should be possible to identify the culpable components with cryptographic integrity from the node states. Today, the best-known protocol for providing accountability to CFT protocols is called PeerReview; it essentially records a signed transcript of all messages sent during the CFT protocol. Because PeerReview is agnostic to the underlying CFT protocol, it incurs high communication and storage overhead. We propose CFT-Forensics, an accountability framework for CFT protocols. We show that for a special family of forensics-compliant CFT protocols (which includes widely-used CFT protocols like Raft and multi-Paxos), CFT-Forensics gives provable accountability guarantees. Under realistic deployment settings, we show theoretically that CFT-Forensics operates at a fraction of the cost of PeerReview. We subsequently instantiate CFT-Forensics for Raft, and implement Raft-Forensics as an extension to the popular nuRaft library. In extensive experiments, we demonstrate that Raft-Forensics adds low overhead to vanilla Raft. With 256 byte messages, Raft-Forensics achieves a peak throughput 87.8% of vanilla Raft at 46% higher latency (+44 ms). We finally integrate Raft-Forensics into the open-source central bank digital currency OpenCBDC, and show that in wide-area network experiments, Raft-Forensics achieves 97.8% of the throughput of Raft, with 14.5% higher latency (+326 ms).  more » « less
Award ID(s):
2325477
PAR ID:
10618319
Author(s) / Creator(s):
; ; ; ; ; ;
Editor(s):
Böhme, Rainer; Kiffer, Lucianna
Publisher / Repository:
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Date Published:
Volume:
316
ISSN:
1868-8969
ISBN:
978-3-95977-345-4
Page Range / eLocation ID:
3:1-3:25
Subject(s) / Keyword(s):
CFT Protocols forensics blockchain Security and privacy → Distributed systems security Networks → Security protocols
Format(s):
Medium: X Size: 25 pages; 1661047 bytes Other: application/pdf
Size(s):
25 pages 1661047 bytes
Right(s):
Creative Commons Attribution 4.0 International license; info:eu-repo/semantics/openAccess
Sponsoring Org:
National Science Foundation
More Like this
  1. The performance of partially synchronous BFT-based consensus protocols is highly dependent on the primary node. All participant nodes in the network are blocked until they receive a proposal from the primary node to begin the consensus process. Therefore, an honest but slack node (with limited bandwidth) can adversely affect the performance when selected as primary. Hermes decreases protocol dependency on the primary node and minimizes transmission delay induced by the slack primary while keeping low message complexity and latency with high scalability. Hermes achieves these performance improvements by relaxing strong BFT agreement (safety) guarantees only for a specific type of Byzantine faults (also called equivocated faults). Interestingly, we show that in Hermes equivocating by a Byzantine primary is expensive and ineffective. Therefore, the safety of Hermes is comparable to the general BFT consensus. We deployed and tested Hermes on 190 Amazon EC2 instances. In these tests, Hermes's performance was comparable to the state-of-the-art BFT protocol for blockchains (when the network size is large) in the absence of slack nodes. Whereas, in the presence of slack nodes, Hermes outperforms the state-of-the-art BFT protocol significantly in terms of throughput and latency. 
    more » « less
  2. Replication is essential for fault-tolerance. However, in in-memory systems, it is a source of high overhead. Remote direct memory access (RDMA) is attractive to create redundant copies of data, since it is low-latency and has no CPU overhead at the target. However, existing approaches still result in redundant data copying and active receivers. To ensure atomic data transfers, receivers check and apply only fully received messages. Tailwind is a zero-copy recovery-log replication protocol for scale-out in-memory databases. Tailwind is the first replication protocol that eliminates all CPU-driven data copying and fully bypasses target server CPUs, thus leaving backups idle. Tailwind ensures all writes are atomic by leveraging a protocol that detects incomplete RDMA transfers. Tailwind substantially improves replication throughput and response latency compared with conventional RPC-based replication. In symmetric systems where servers both serve requests and act as replicas, Tailwind also improves normal-case throughput by freeing server CPU resources for request processing. We implemented and evaluated Tailwind on RAMCloud, a low-latency in-memory storage system. Experiments show Tailwind improves RAMCloud's normal-case request processing throughput by 1.7x. It also cuts down writes median and 99th percentile latencies by 2x and 3x respectively. 
    more » « less
  3. Asynchronous verifiable secret sharing (AVSS) protocols protect a secret that is distributed among N parties. Dual-threshold AVSS protocols guarantee consensus in the presence of T Byzantine failures and privacy if fewer than P parties attempt to reconstruct the secret. In this work, we construct a dual-threshold AVSS protocol that is optimal along several dimensions. First, it is a high-threshold AVSS scheme, meaning that it is a dual-threshold AVSS with optimal parameters T < N/3 and P < N - T. Second, it has O(N^2) message complexity, and for large secrets it achieves the optimal O(N) communication overhead, without the need for a public key infrastructure or trusted setup. While these properties have been achieved individually before, to our knowledge this is the first protocol that is achieves all of the above simultaneously. The core component of our construction is a high-threshold AVSS scheme for small secrets based on polynomial commitments that achieves O(N^2 log(N)) communication overhead, as compared to prior schemes that require O(N^3) overhead with T 
    more » « less
  4. Security analyses for consensus protocols in blockchain research have primarily focused on the synchronous model, where point-to-point communication delays are upper bounded by a known finite constant. These models are unrealistic in noisy settings, where messages may be lost (i.e. incur infinite delay). In this work, we study the impact of message losses on the security of the proof-of-work longest-chain protocol. We introduce a new communication model to capture the impact of message loss called the 0-∞ model, and derive a region of tolerable adversarial power under which the consensus protocol is secure. The guarantees are derived as a simple bound for the probability that a transaction violates desired security properties. Specifically, we show that this violation probability decays almost exponentially in the security parameter. Our approach involves constructing combinatorial objects from blocktrees, and identifying random variables associated with them that are amenable to analysis. This approach improves existing bounds and extends the known regime for tolerable adversarial threshold in settings where messages may be lost. 
    more » « less
  5. Blockchains operating at the global scale demand high-performance byzantine fault-tolerant (BFT) consensus protocols. Most classic PBFT-like protocols suffer from an issue known as the leader bottleneck, which severely limits their throughput and resource utilization. Recently, Directed Acyclic Graph, or DAG-based protocols, have emerged as a promising approach for eliminating the leader bottleneck and achieving better performance. They attain higher throughput by separating data dissemination and block ordering. However, their safety and liveness logic is also significantly more elaborate. So far, most DAG-based protocols have only enjoyed on-paper security proofs, and it is not clear how to construct formal proofs of these protocols efficiently. We introduce LiDO-DAG, a concurrent object model that abstracts the common logic of these protocols. LiDO-DAG is constructed by combining a DAG abstraction and LiDO, a recently proposed abstraction for leader-based consensus. To demonstrate that our framework enables rapid validation of new DAG-based protocol designs, we implemented LiDO-DAG in Coq and applied it to three recent DAG-based protocols, including Narwhal, Bullshark, and Sailfish. Our framework readily yields mechanized safety and liveness proofs for all three protocols, which are also the first mechanized liveness proofs of any DAG-based protocol. Our framework has also revealed an optimization for Sailfish that improves its worst-case latency. 
    more » « less