skip to main content


Title: Secret Sharing MPC on FPGAs in the Datacenter
Multi-Party Computation (MPC) is a technique enabling data from several sources to be used in a secure computation revealing only the result while protecting the original data, facilitating shared utilization of data sets gathered by different entities. The presence of Field Programmable Gate Array (FPGA) hardware in datacenters can provide accelerated computing as well as low latency, high bandwidth communication that bolsters the performance of MPC and lowers the barrier to using MPC for many applications. In this work, we propose a Secret Sharing FPGA design based on the protocol described by Araki et al. We compare our hardware design to the original authors' software implementations of Secret Sharing and to work accelerating MPC protocols based on Garbled Circuits with FPGAs. Our conclusion is that Secret Sharing in the datacenter is competitive and when implemented on FPGA hardware was able to use at least 10x fewer computer resources than the original work using CPUs.  more » « less
Award ID(s):
1931714 1718135 1739000
NSF-PAR ID:
10195467
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
International Conference on Field Programmable Logic and Applications
ISSN:
1946-147X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Multi-Party Computation (MPC) is an important technique used to enable computation over confidential data from several sources. The public cloud provides a unique opportunity to enable MPC in a low latency environment. Field Programmable Gate Array (FPGA) hardware adoption allows for both MPC acceleration and utilization of low latency, high bandwidth communication networks that substantially improve the performance of MPC applications. In this work, we show how designing arithmetic and Boolean Multi-Party Computation gates for FPGAs in a cloud provide improvements to current MPC offerings and ease their use in applications such as machine learning. We focus on the usage of Secret Sharing MPC first designed by Araki et al to design our FPGA MPC while also providing a comparison with those utilizing Garbled Circuits for MPC. We show that Secret Sharing MPC provides a better usage of cloud resources, specifically FPGA acceleration, than Garbled Circuits and is able to use at least a 10x less computer resources as compared to the original design using CPUs. 
    more » « less
  2. Canteaut, Anne ; Standaert, Francois-Xavier (Ed.)
    Secure multi-party computation (MPC) allows multiple par-ties to perform secure joint computations on their private inputs. To-day, applications for MPC are growing with thousands of parties wish-ing to build federated machine learning models or trusted setups for blockchains. To address such scenarios we propose a suite of novel MPC protocols that maximize throughput when run with large numbers of parties. In particular, our protocols have both communication and computation complexity that decrease with the number of parties. Our protocols build on prior protocols based on packed secret-sharing, introducing new techniques to build more efficient computation for general circuits. Specifically, we introduce a new approach for handling linear attacks that arise in protocols using packed secret-sharing and we propose a method for unpacking shared multiplication triples without increasingthe asymptotic costs. Compared with prior work, we avoid the log|C|overhead required when generically compiling circuits of size |C| for use in a SIMD computation, and we improve over folklore “committee-based” solutions by a factor of O(s), the statistical security parameter. In practice, our protocol is up to 10X faster than any known construction, under a reasonable set of parameters. 
    more » « less
  3. Canteaut, Anne ; Standaert, Francois-Xavier (Ed.)
    Secure multi-party computation (MPC) allows multiple par-ties to perform secure joint computations on their private inputs. To-day, applications for MPC are growing with thousands of parties wish-ing to build federated machine learning models or trusted setups for blockchains. To address such scenarios we propose a suite of novel MPC protocols that maximize throughput when run with large numbers of parties. In particular, our protocols have both communication and computation complexity that decrease with the number of parties. Our protocols build on prior protocols based on packed secret-sharing, introducing new techniques to build more efficient computation for general circuits. Specifically, we introduce a new approach for handling linear attacks that arise in protocols using packed secret-sharing and we propose a method for unpacking shared multiplication triples without increasingthe asymptotic costs. Compared with prior work, we avoid the log|C|overhead required when generically compiling circuits of size |C| for use in a SIMD computation, and we improve over folklore “committee-based” solutions by a factor of O(s), the statistical security parameter. In practice, our protocol is up to 10X faster than any known construction, under a reasonable set of parameters. 
    more » « less
  4. Applications today rely on cloud databases for storing and querying time-series data. While outsourcing storage is convenient, this data is often sensitive, making data breaches a serious concern. We present Waldo, a time-series database with rich functionality and strong security guarantees: Waldo supports multi-predicate filtering, protects data contents as well as query filter values and search access patterns, and provides malicious security in the 3-party honest-majority setting. In contrast, prior systems such as Timecrypt and Zeph have limited functionality and security: (1) these systems can only filter on time, and (2) they reveal the queried time interval to the server. Oblivious RAM (ORAM) and generic multiparty computation (MPC) are natural choices for eliminating leakage from prior work, but both of these are prohibitively expensive in our setting due to the number of roundtrips and bandwidth overhead, respectively. To minimize both, Waldo builds on top of function secret sharing, enabling Waldo to evaluate predicates non-interactively. We develop new techniques for applying function secret sharing to the encrypted database setting where there are malicious servers, secret inputs, and chained predicates. With 32-core machines, Waldo runs a query with 8 range predicates over 2 18 records in 3.03s, compared to 12.88s or an MPC baseline and 16.56s for an ORAM baseline. Compared to Waldo, the MPC baseline uses 9−82× more bandwidth between servers (for different numbers of records), while the ORAM baseline uses 20−152× more bandwidth between the client and server(s) (for different numbers of predicates). 
    more » « less
  5. Katzenbeisser, Stefan ; Schaumont, Patrick (Ed.)
    LowMC is a parameterizable block cipher developed for use in Multi-Party Computation (MPC) and Fully Homomorphic Encryption (FHE). In these applications, linear operations are much less expensive in terms of resource utilization compared to the non-linear operations due to their low multiplicative complexity. In this work, we implemented two versions of LowMC -- unrolled and lightweight. Both implementations are realized using RTL VHDL. To the best of our knowledge, we report the first lightweight implementation of LowMC and the first implementation protected against side-channel analysis (SCA). For the SCA protection, we used a hybrid 2/3 shares Threshold Implementation (TI) approach, and for the evaluation, the Test Vector Leakage Assessment (TVLA) method, also known as the T-test. Our unprotected implementations show information leakage at 10K traces, and after protection, they could successfully pass the T-test for 1 million traces. The Xilinx Vivado is used for the synthesis, implementation, functional verification, timing analysis, and programming of the FPGA. The target FPGA family is Artix-7, selected due to its widespread use in multiple applications. Based on our results, the numbers of LUTs are 867 and 3,328 for the lightweight and the unrolled architecture with unrolling factor U = 16, respectively. It takes 14.21 μs for the lightweight architecture and 1.29 μs for the unrolled design with U = 16 to generate one 128-bit block of the ciphertext. The fully unrolled architecture beats the best previous implementation by Kales et al. in terms of the number of LUTs by a factor of 4.5. However, this advantage comes at the cost of having 2.9 higher latency. 
    more » « less