In the Zero-Knowledge Proof (ZKP) of a disjunctive statement, P and V agree on B fan-in 2 circuits C0, . . . , CB−1 over a field F; each circuit has n_in inputs, n_× multiplications, and one output. P’s goal is to demonstrate the knowledge of a witness (id ∈ [B], w ∈ F^n_in ), s.t. Cid (w) = 0 where neither w nor id is revealed. Disjunctive statements are effective, for example, in implementing ZKP based on sequential execution of CPU steps. This paper studies ZKP (of knowledge) protocols over disjunctive statements based on Vector OLE. Denoting by λ the statistical security parameter and let ρ \in^\Delta max{log |F|, λ}, the previous state-of-the-art protocol Robin (Yang et al. CCS’23) required (n_in +3n_×) log |F|+O(ρB) bits of communication with O(1) rounds, and Mac'n'Cheese (Baum et al. CRYPTO’21) required (n_in +n_×) log |F|+2n×ρ+O(ρ logB) bits of communication with O(logB) rounds, both in the VOLE-hybrid model. Our novel protocol LogRobin++ achieves the same functionality at the cost of (n_in+n_×) log |F|+O(ρ logB) bits of communication with O(1) rounds in the VOLE-hybrid model. Crucially, LogRobin++ takes advantage of two new techniques – (1) an O(logB)-overhead approach to prove in ZK that an IT-MAC commitment vector contains a zero; and (2) the realization of VOLE-based ZK over a disjunctive statement, where P commits only to w and multiplication outputs of Cid (w) (as opposed to prior work where P commits to w and all three wires that are associated with each multiplication gate). We implemented LogRobin++ over Boolean (i.e., F2) and arithmetic (i.e., F_2^61−1) fields. In our experiments, including the cost of generating VOLE correlations, LogRobin++ achieved up to 170× optimization over Robin in communication, resulting in up to 7× (resp. 3×) wall-clock time improvements in a WAN-like (resp. LAN-like) setting.
more »
« less
This content will become publicly available on December 2, 2025
Tight ZK CPU: Batched ZK Branching with Cost Proportional to Evaluated Instruction
We explore Zero-Knowledge Proofs (ZKPs) of statements expressed as programs written in high-level languages, e.g., C or assembly. At the core of executing such programs in ZK is the repeated evaluation of a CPU step, achieved by branching over the CPU’s instruction set. This approach is general and covers traversal-execution of a program’s control flow graph (CFG): here CPU instructions are straight-line program fragments (of various sizes) associated with the CFG nodes. This highlights the usefulness of ZK CPUs with a large number of instructions of varying sizes. We formalize and design an efficient tight ZK CPU, where the cost (both computation and communication, for each party) of each step depends only on the instruction taken. This qualitatively improves over state of the art, where cost scales with the size of the largest CPU instruction (largest CFG node). Our technique is formalized in the standard commit-and-prove paradigm, so our results are compatible with a variety of (interactive and non-interactive) general-purpose ZK. We implemented an interactive tight arithmetic (over F261−1) ZK CPU based on Vector Oblivious Linear Evaluation (VOLE) and compared it to the state-of-the-art non-tight VOLE-based ZK CPU Batchman (Yang et al. CCS’23). In our experiments, under the same hardware configuration, we achieve comparable performance when instructions are of the same size and a 5-18× improvement when instructions are of varied size. Our VOLE-based tight ZK CPU (over F261−1) can execute 100K (resp. 450K) multiplication gates per second in a WAN-like (resp. LAN-like) setting. It requires ≤ 102 Bytes per multiplication gate. Our basic building block, ZK Unbalanced Read-Only Memory, may be of independent interest.
more »
« less
- PAR ID:
- 10578744
- Publisher / Repository:
- ACM
- Date Published:
- ISBN:
- 9798400706363
- Page Range / eLocation ID:
- 3095 to 3109
- Format(s):
- Medium: X
- Location:
- Salt Lake City UT USA
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Cas Cremers and Engin Kirda (Ed.)Vector Oblivious Linear Evaluation (VOLE) supports fast and scalable interactive Zero-Knowledge (ZK) proofs. Despite recent improvements to VOLE-based ZK, compiling proof statements to a control-flow oblivious form (e.g., a circuit) continues to lead to expensive proofs. One useful setting where this inefficiency stands out is when the statement is a disjunction of clauses $$\mathcal{L}_1 \lor \cdots \lor \mathcal{L}_B$$. Typically, ZK requires paying the price to handle all $$B$$ branches. Prior works have shown how to avoid this price in communication, but not in computation. Our main result, $$\mathsf{Batchman}$$, is asymptotically and concretely efficient VOLE-based ZK for batched disjunctions, i.e. statements containing $$R$$ repetitions of the same disjunction. This is crucial for, e.g., emulating CPU steps in ZK. Our prover and verifier complexity is only $$\bigO(RB+R|\C|+B|\C|)$$, where $$|\C|$$ is the maximum circuit size of the $$B$$ branches. Prior works' computation scales in $$RB|\C|$$. For non-batched disjunctions, we also construct a VOLE-based ZK protocol, $$\mathsf{Robin}$$, which is (only) communication efficient. For small fields and for statistical security parameter $$\lambda$$, this protocol's communication improves over the previous state of the art ($$\mathsf{Mac'n'Cheese}$$, Baum et al., CRYPTO'21) by up to factor $$\lambda$$. Our implementation outperforms prior state of the art. E.g., we achieve up to $$6\times$$ improvement over $$\mathsf{Mac'n'Cheese}$$ (Boolean, single disjunction), and for arithmetic batched disjunctions our experiments show we improve over $$\mathsf{QuickSilver}$$ (Yang et al., CCS'21) by up to $$70\times$$ and over $$\mathsf{AntMan}$$ (Weng et al., CCS'22) by up to $$36\times$$.more » « less
-
Cas Cremers and Engin Kirda (Ed.)In MPC, we usually represent programs as circuits. This is a poor fit for programs that use complex control flow, as it is costly to compile control flow to circuits. This motivated prior work to emulate CPUs inside MPC. Emulated CPUs can run complex programs, but they introduce high overhead due to the need to evaluate not just the program, but also the machinery of the CPU, including fetching, decoding, and executing instructions, accessing RAM, etc. Thus, both circuits and CPU emulation seem a poor fit for general MPC. The former cannot scale to arbitrary programs; the latter incurs high per-operation overhead. We propose \emph{variable instruction set architectures} (VISAs), an approach that inherits the best features of both circuits and CPU emulation. Unlike a CPU, a VISA machine repeatedly executes entire program \emph{fragments}, not individual instructions. By considering larger building blocks, we avoid most of the machinery associated with CPU emulation: we directly handle each fragment as a circuit. We instantiated a VISA machine via garbled circuits (GC), yielding constant-round 2PC for arbitrary assembly programs. We use improved branching (Stacked Garbling, Heath and Kolesnikov, Crypto 2020) and recent Garbled RAM (GRAM) (Heath et al., Eurocrypt 2022). Composing these securely and efficiently is intricate, and is one of our main contributions. We implemented our approach and ran it on common programs, including Dijkstra's and Knuth-Morris-Pratt. Our 2PC VISA machine executes assembly instructions at $300$Hz to $4000$Hz, depending on the target program. We significantly outperform the state-of-the-art CPU-based approach (Wang et al., ESORICS 2016, whose tool we re-benchmarked on our setup). We run in constant rounds, use $$6\times$$ less bandwidth, and run more than $$40\times$$ faster on a low-latency network. With $50$ms (resp. $100$ms) latency, we are $$898\times$$ (resp. $$1585\times$$) faster on the same setup. While our focus is MPC, the VISA model also benefits CPU-emulation-based Zero-Knowledge proof compilers, such as ZEE and EZEE (Heath et al., Oakland'21 and Yang et al., EuroS\&P'22).more » « less
-
null (Ed.)Zero-knowledge (ZK) proofs with an optimal memory footprint have attracted a lot of attention, because such protocols can easily prove very large computation with a small memory requirement. Such ZK protocol only needs O(M) memory for both parties, where M is the memory required to verify the statement in the clear. In this paper, we propose several new ZK protocols in this setting, which improve the concrete efficiency and, at the same time, enable sublinear amortized communication for circuits with some notion of relaxed uniformity. 1. In the circuit-based model, where the computation is represented as a circuit over a field, our ZK protocol achieves a communication complexity of 1 field element per non-linear gate for any field size while keeping the computation very cheap. We implemented our protocol, which shows extremely high efficiency and affordability. Compared to the previous best-known implementation, we achieve 6×–7× improvement in computation and 3×– 7× improvement in communication. When running on intro-level AWS instances, our protocol only needs one US dollar to prove one trillion AND gates (or 2.5 US dollars for one trillion multiplication gates over a 61-bit field). 2. In the setting where part of the computation can be represented as a set of polynomials, we can achieve communication sublinear to the polynomial size: the communication only depends on the input size and the highest degree of all polynomials, independent of the number of polynomials and the number of multiplications in the polynomials. Using the improved ZK protocol, we can prove matrix multiplication with communication proportional to the input size, rather than the number of multiplications. Proving the multiplication of two 1024 × 1024 matrices, our implementation, with one thread and 1 GB of memory, only needs 10 seconds and communicates 25 MB, 35× faster than the state-of-the-art protocol Virgo that would need more than 140 GB of memory for the same task.more » « less
-
Even though heavily researched, a full formal model of the x86-64 instruction set is still not available. We present libLISA, a tool for automated discovery and analysis of the ISA of a CPU. This produces the most extensive formal x86-64 model to date, with over 118000 different instruction groups. The process requires as little human specification as possible: specifically, we do not rely on a human-written (dis)assembler to dictate which instructions are executable on a given CPU, or what their in- and outputs are. The generated model is CPU-specific: behavior that is undefined is synthesized for the current machine. Producing models for five different x86-64 machines, we mutually compare them, discover undocumented instructions, and generate instruction sequences that are CPU-specific. Experimental evaluation shows that we enumerate virtually all instructions within scope, that the instructions' semantics are correct w.r.t. existing work, and that we improve existing work by exposing bugs in their handwritten models.more » « less
An official website of the United States government
