We explore Zero-Knowledge Proofs (ZKPs) of statements expressed as programs written in high-level languages, e.g., C or assembly. At the core of executing such programs in ZK is the repeated evaluation of a CPU step, achieved by branching over the CPU’s instruction set. This approach is general and covers traversal-execution of a program’s control flow graph (CFG): here CPU instructions are straight-line program fragments (of various sizes) associated with the CFG nodes. This highlights the usefulness of ZK CPUs with a large number of instructions of varying sizes. We formalize and design an efficient tight ZK CPU, where the cost (both computation and communication, for each party) of each step depends only on the instruction taken. This qualitatively improves over the state of the art, where cost scales with the size of the largest CPU instruction (largest CFG node). Our technique is formalized in the standard commit-and-prove paradigm, so our results are compatible with a variety of (interactive and non-interactive) general-purpose ZK. We implemented an interactive tight arithmetic (over $\mathbb{F}_{2^{61}-1}$) ZK CPU based on Vector Oblivious Linear Evaluation (VOLE) and compared it to the state-of-the-art non-tight VOLE-based ZK CPU Batchman (Yang et al., CCS'23). In our experiments, under the same hardware configuration, we achieve comparable performance when instructions are of the same size and a 5-18× improvement when instructions are of varied size. Our VOLE-based tight ZK CPU (over $\mathbb{F}_{2^{61}-1}$) can execute 100K (resp. 450K) multiplication gates per second in a WAN-like (resp. LAN-like) setting. It requires ≤ 102 bytes per multiplication gate. Our basic building block, ZK Unbalanced Read-Only Memory, may be of independent interest.
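The VOLE-based instantiation mentioned in this abstract rests on a simple algebraic correlation: the prover holds a value and a MAC on it, the verifier holds a key and a global offset Δ, and the MAC equals key + value·Δ in the field. The sketch below is a minimal illustration of only that relation over $\mathbb{F}_{2^{61}-1}$, not the paper's protocol; the function and variable names are assumptions, and in a real protocol the correlation is produced by a VOLE functionality so that neither party ever sees the other's secrets.

```python
# Minimal sketch of the VOLE-style "IT-MAC" correlation that VOLE-based ZK
# builds on, over F_p with p = 2^61 - 1. Illustrative only: in practice the
# two sides of the correlation are produced obliviously, not by one function.
import secrets

P = 2**61 - 1  # prime modulus of the arithmetic field used in the paper

def commit(x: int, delta: int) -> tuple[int, int]:
    """Produce the two sides of the correlation for value x:
    the verifier's key k and the prover's MAC m = k + x*delta (mod P)."""
    k = secrets.randbelow(P)
    m = (k + x * delta) % P
    return k, m

def check(x: int, m: int, k: int, delta: int) -> bool:
    """Verifier's consistency check once x is opened."""
    return m % P == (k + x * delta) % P

delta = secrets.randbelow(P)     # verifier's global MAC key
k1, m1 = commit(7, delta)
k2, m2 = commit(35, delta)

# The correlation is linear, so commitments to x and y yield a commitment
# to x + y with no interaction at all:
assert check((7 + 35) % P, (m1 + m2) % P, (k1 + k2) % P, delta)
```

Because the relation is linear in the committed value, additions of committed values are free, which is part of why VOLE-based systems report cost per multiplication gate.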
Towards Generic MPC Compilers via Variable Instruction Set Architectures (VISAs)
In MPC, we usually represent programs as circuits. This is a poor fit for programs that use complex control flow, as it is costly to compile control flow to circuits. This motivated prior work to emulate CPUs inside MPC. Emulated CPUs can run complex programs, but they introduce high overhead due to the need to evaluate not just the program, but also the machinery of the CPU, including fetching, decoding, and executing instructions, accessing RAM, etc. Thus, both circuits and CPU emulation seem a poor fit for general MPC. The former cannot scale to arbitrary programs; the latter incurs high per-operation overhead. We propose \emph{variable instruction set architectures} (VISAs), an approach that inherits the best features of both circuits and CPU emulation. Unlike a CPU, a VISA machine repeatedly executes entire program \emph{fragments}, not individual instructions. By considering larger building blocks, we avoid most of the machinery associated with CPU emulation: we directly handle each fragment as a circuit. We instantiated a VISA machine via garbled circuits (GC), yielding constant-round 2PC for arbitrary assembly programs. We use improved branching (Stacked Garbling, Heath and Kolesnikov, Crypto 2020) and recent Garbled RAM (GRAM) (Heath et al., Eurocrypt 2022). Composing these securely and efficiently is intricate, and is one of our main contributions. We implemented our approach and ran it on common programs, including Dijkstra's and Knuth-Morris-Pratt. Our 2PC VISA machine executes assembly instructions at $300$Hz to $4000$Hz, depending on the target program. We significantly outperform the state-of-the-art CPU-based approach (Wang et al., ESORICS 2016, whose tool we re-benchmarked on our setup). We run in constant rounds, use $6\times$ less bandwidth, and run more than $40\times$ faster on a low-latency network. With $50$ms (resp. $100$ms) latency, we are $898\times$ (resp. $1585\times$) faster on the same setup. While our focus is MPC, the VISA model also benefits CPU-emulation-based Zero-Knowledge proof compilers, such as ZEE and EZEE (Heath et al., Oakland'21 and Yang et al., EuroS\&P'22).
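As the abstract describes, a VISA machine's basic step is an entire straight-line fragment (one per CFG node) rather than a single instruction. The following plaintext sketch only illustrates that dispatch loop; the fragment contents and register names are invented for the example, and none of the MPC machinery (garbled circuits, Stacked Garbling for hiding which fragment ran, GRAM for memory) appears here.

```python
# Plaintext sketch of the VISA execution model: one "step" of the machine
# runs an entire program fragment (a straight-line block, one per CFG node),
# which updates the registers and names its successor fragment.
# Fragment contents and register names are illustrative assumptions.

FRAGMENTS = {
    "entry":     lambda r: (dict(r, i=0, acc=0), "loop_head"),
    "loop_head": lambda r: (r, "loop_body" if r["i"] < r["n"] else "exit"),
    "loop_body": lambda r: (dict(r, acc=r["acc"] + r["i"], i=r["i"] + 1), "loop_head"),
    "exit":      lambda r: (r, None),
}

def run(n: int) -> int:
    regs, pc = {"n": n}, "entry"
    while pc is not None:
        regs, pc = FRAGMENTS[pc](regs)   # one VISA step = one whole fragment
    return regs["acc"]

assert run(5) == 0 + 1 + 2 + 3 + 4
```

In the paper's 2PC instantiation, each fragment is handled directly as a circuit, branching over the candidate fragments uses Stacked Garbling, and memory accesses go through GRAM.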
- PAR ID: 10466232
- Editor(s): Cas Cremers and Engin Kirda
- Journal Name: ACM CCS 2023
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
We introduce tri-state circuits (TSCs). TSCs form a natural model of computation that, to our knowledge, has not been considered by theorists. The model captures a surprising combination of simplicity and power. TSCs are simple in that they allow only three wire values (0, 1, and undefined – Z) and three types of fan-in-two gates; they are powerful in that their statically placed gates fire (execute) eagerly as their inputs become defined, implying orders of execution that depend on the input (a minimal sketch of this eager firing rule appears after this list). This behavior is sufficient to efficiently evaluate RAM programs. We construct a TSC that emulates T steps of any RAM program and that has only O(T · log³ T · log log T) gates. Contrast this with the reduction from RAM to Boolean circuits, where the best approach scans all of memory on each access, incurring quadratic cost. We connect TSCs with Garbled Circuits (GC). TSCs capture the power of garbling far better than Boolean circuits, offering a more expressive model of computation that leaves per-gate cost essentially unchanged. As an important application, we construct authenticated Garbled RAM (GRAM), enabling constant-round maliciously secure 2PC of RAM programs. Let λ denote the security parameter. We extend authenticated garbling to TSCs; by simply plugging in our TSC-based RAM, we obtain authenticated GRAM running at cost O(T · log³ T · log log T · λ), outperforming all prior work, including prior semi-honest GRAM. We also give semi-honest garbling of TSCs from a one-way function (OWF). This yields OWF-based GRAM at cost O(T · log³ T · log log T · λ), outperforming the best prior OWF-based GRAM by more than a factor of λ.
-
Multi-Party Computation (MPC) is an important technique used to enable computation over confidential data from several sources. The public cloud provides a unique opportunity to enable MPC in a low-latency environment. Field Programmable Gate Array (FPGA) hardware adoption allows for both MPC acceleration and utilization of the low-latency, high-bandwidth communication networks that substantially improve the performance of MPC applications. In this work, we show how designing arithmetic and Boolean Multi-Party Computation gates for FPGAs in the cloud improves on current MPC offerings and eases their use in applications such as machine learning. We base our FPGA MPC on the Secret Sharing MPC protocol first designed by Araki et al., and we also provide a comparison with approaches utilizing Garbled Circuits for MPC. We show that Secret Sharing MPC makes better use of cloud resources, specifically FPGA acceleration, than Garbled Circuits, and uses at least 10× fewer compute resources than the original CPU-based design.
-
Multi-Party Computation (MPC) is a technique enabling data from several sources to be used in a secure computation that reveals only the result while protecting the original data, facilitating shared utilization of data sets gathered by different entities. The presence of Field Programmable Gate Array (FPGA) hardware in datacenters can provide accelerated computing as well as low-latency, high-bandwidth communication that bolsters the performance of MPC and lowers the barrier to using MPC for many applications. In this work, we propose a Secret Sharing FPGA design based on the protocol described by Araki et al. (a minimal sketch of this style of sharing appears after this list). We compare our hardware design to the original authors' software implementations of Secret Sharing and to work accelerating MPC protocols based on Garbled Circuits with FPGAs. Our conclusion is that Secret Sharing in the datacenter is competitive and, when implemented on FPGA hardware, uses at least 10× fewer compute resources than the original CPU-based work.
-
Garbled Circuits (GC) is a technique for ensuring the privacy of inputs from users and is particularly well suited for FPGA implementations in the cloud, where data analytics is frequently run. Secure Function Evaluation, such as that enabled by GC, is orders of magnitude slower than processing in the clear. We present our best implementation of GC on Amazon Web Services (AWS), which performs garbling on Amazon's FPGA-enabled F1 instances. In this paper we present the largest problems garbled to date on FPGA instances, including problems represented by over four million gates. Our implementation speeds up garbling by 20× over software across a range of circuit sizes.
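For the tri-state circuits entry above, here is a minimal sketch of the evaluation rule that abstract describes: wires hold 0, 1, or the undefined value Z, and statically placed gates fire eagerly as soon as their inputs become defined, so which gates fire, and in what order, depends on the input. The concrete gate types used below (XOR, AND) and the class interface are placeholders for illustration; the paper defines its own three gate types with different semantics.

```python
# Sketch of eager, input-dependent firing in a tri-state circuit:
# wires hold 0, 1, or Z (undefined); a gate executes as soon as both of its
# inputs are defined and its output is still undefined.
Z = None  # undefined wire value

class TSC:
    def __init__(self, num_wires):
        self.wires = [Z] * num_wires
        self.gates = []  # (op, in0, in1, out); gate placement is static

    def add_gate(self, op, in0, in1, out):
        self.gates.append((op, in0, in1, out))

    def set_input(self, wire, value):
        self.wires[wire] = value
        self._propagate()

    def _propagate(self):
        # Fire any gate whose inputs are both defined; repeat until no change.
        progress = True
        while progress:
            progress = False
            for op, a, b, out in self.gates:
                if self.wires[out] is Z and self.wires[a] is not Z and self.wires[b] is not Z:
                    self.wires[out] = op(self.wires[a], self.wires[b])
                    progress = True

circuit = TSC(num_wires=5)
circuit.add_gate(lambda x, y: x ^ y, 0, 1, 3)   # wire3 = wire0 XOR wire1
circuit.add_gate(lambda x, y: x & y, 3, 2, 4)   # wire4 = wire3 AND wire2
circuit.set_input(0, 1)   # nothing fires yet: wire1 is still undefined
circuit.set_input(1, 0)   # XOR gate fires eagerly, wire3 becomes 1
circuit.set_input(2, 1)   # AND gate fires, wire4 becomes 1
assert circuit.wires[4] == 1
```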
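For the two Secret Sharing MPC entries above, which build on the three-party protocol of Araki et al., the following is a minimal sketch of replicated XOR sharing, the style of sharing that protocol family uses: a secret splits into three random shares that XOR to the secret, each party holds two of them, and linear (XOR) gates are computed locally with no communication. The exact share layout in Araki et al. and their multiplication subprotocol differ in detail and are not shown; the 32-bit word size and function names are assumptions for the example.

```python
# Sketch of 3-party replicated XOR sharing: x = x1 ^ x2 ^ x3, and party i
# holds the pair (x_i, x_{i+1}). Illustrative only; multiplication and the
# exact Araki et al. share format are not shown.
import secrets

def share(x: int):
    """Split x into three random 32-bit shares; return each party's pair."""
    x1, x2 = secrets.randbits(32), secrets.randbits(32)
    x3 = x ^ x1 ^ x2
    return [(x1, x2), (x2, x3), (x3, x1)]

def reconstruct(parties):
    """XOR the three distinct shares (first component of each party's pair)."""
    return parties[0][0] ^ parties[1][0] ^ parties[2][0]

def xor_gate(parties_x, parties_y):
    """Linear (XOR) gates are computed locally, with no communication."""
    return [(a0 ^ b0, a1 ^ b1) for (a0, a1), (b0, b1) in zip(parties_x, parties_y)]

x_sh, y_sh = share(0xDEADBEEF), share(0x12345678)
assert reconstruct(x_sh) == 0xDEADBEEF
assert reconstruct(xor_gate(x_sh, y_sh)) == 0xDEADBEEF ^ 0x12345678
```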