skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Friday, September 13 until 2:00 AM ET on Saturday, September 14 due to maintenance. We apologize for the inconvenience.


Title: An FPGA-based Programmable Vector Engine for Fast Fully Homomorphic Encryption over the Torus
This paper describes an FPGA-based vector engine to accelerate the bootstrapping procedure of Fast Fully Homomorphic Encryption over the Torus (TFHE), a popular and high-performance fully homomorphic encryption scheme. Most TFHE bootstraping comprises many matrix-vector operations that are implemented using Torus polynomials, which are not efficiently implemented on today's standard arithmetic hardware. Our implementation achieves linear performance scaling with up to 16 vector lanes. Future work will switch to an FFT-based polynomial multiplication scheme and switch to larger FPGA parts to accommodate more vector lanes.  more » « less
Award ID(s):
1763795 1545097 1528181
NSF-PAR ID:
10282639
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
SPSL: Secure and Private Systems for Machine Learning (ISCA Workshop)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Fully Homomorphic Encryption over the Torus (TFHE) allows arbitrary computations to happen directly on ciphertexts using homomorphic logic gates. However, each TFHE gate on state-of-the-art hardware platforms such as GPUs and FPGAs is extremely slow (> 0.2ms). Moreover, even the latest FPGA-based TFHE accelerator cannot achieve high energy efficiency, since it frequently invokes expensive double-precision floating point FFT and IFFT kernels. In this paper, we propose a fast and energy-efficient accelerator, MATCHA, to process TFHE gates. MATCHA supports aggressive bootstrapping key unrolling to accelerate TFHE gates without decryption errors by approximate multiplication-less integer FFTs and IFFTs, and a pipelined datapath. Compared to prior accelerators, MATCHA improves the TFHE gate processing throughput by 2.3x, and the throughput per Watt by 6.3x. 
    more » « less
  2. null (Ed.)
    Because of the lack of expertise, to gain benefits from their data, average users have to upload their private data to cloud servers they may not trust. Due to legal or privacy constraints, most users are willing to contribute only their encrypted data, and lack interests or resources to join deep neural network (DNN) training in cloud. To train a DNN on encrypted data in a completely non-interactive way, a recent work proposes a fully homomorphic encryption (FHE)-based technique implementing all activations by \textit{Brakerski-Gentry-Vaikuntanathan} (BGV)-based lookup tables. However, such inefficient lookup-table-based activations significantly prolong private training latency of DNNs. In this paper, we propose, Glyph, an FHE-based technique to fast and accurately train DNNs on encrypted data by switching between TFHE (Fast Fully Homomorphic Encryption over the Torus) and BGV cryptosystems. Glyph uses logic-operation-friendly TFHE to implement nonlinear activations, while adopts vectorial-arithmetic-friendly BGV to perform multiply-accumulations (MACs). Glyph further applies transfer learning on DNN training to improve test accuracy and reduce the number of MACs between ciphertext and ciphertext in convolutional layers. Our experimental results show Glyph obtains state-of-the-art accuracy, and reduces training latency by 69%~99% over prior FHE-based privacy-preserving techniques on encrypted datasets. 
    more » « less
  3. This paper presents \textit{OFHE}, an electro-optical accelerator designed to process Discretized TFHE (DTFHE) operations, which encrypt multi-bit messages and support homomorphic multiplications, lookup table operations and full-domain functional bootstrappings. While DTFHE is more efficient and versatile than other fully homomorphic encryption schemes, it requires 32-, 64-, and 128-bit polynomial multiplications, which can be time-consuming. Existing TFHE accelerators are not easily upgradable to support DTFHE operations due to limited datapaths, a lack of datapath bit-width reconfigurability, and power inefficiencies when processing FFT and inverse FFT (IFFT) kernels. Compared to prior TFHE accelerators, OFHE addresses these challenges by improving the DTFHE operation latency by 8.7\%, the DTFHE operation throughput by $57\%$, and the DTFHE operation throughput per Watt by $94\%$. 
    more » « less
  4. We consider an architecture of confidential cloud-based control synthesis based on Homomorphic Encryption (HE). Our study is motivated by the recent surge of data-driven control such as deep reinforcement learning, whose heavy computational requirements often necessitate an outsourcing to the third party server. To achieve more flexibility than Partially Homomorphic Encryption (PHE) and less computational overhead than Fully Homomorphic Encryption (FHE), we consider a Reinforcement Learning (RL) architecture over Leveled Homomorphic Encryption (LHE). We first show that the impact of the encryption noise under the Cheon-Kim-Kim-Song (CKKS) encryption scheme on the convergence of the model-based tabular Value Iteration (VI) can be analytically bounded. We also consider secure implementations of TD(0), SARSA(0) and Z-learning algorithms over the CKKS scheme, where we numerically demonstrate that the effects of the encryption noise on these algorithms are also minimal. 
    more » « less
  5. Homomorphic encryption is a promising technology for enabling various privacy-preserving applications such as secure biomarker search. However, current implementations are not practical due to large performance overheads. A homomorphic encryption scheme has recently been proposed that allows bitwise comparison without the computationally-intensive multiplication and bootstrapping operations. Even so, this scheme still suffers from memory-bound performance bottleneck due to large ciphertext expansion. In this work, we propose HEGA, a near-data processing architecture that leverages this scheme with 3D-stacked memory to accelerate privacy-preserving biomarker search. We observe that homomorphic encryption-based search, like other emerging applications, can greatly benefit from the large throughput, capacity, and energy savings of 3D-stacked memory-based near-data processing architectures. Our near-data acceleration solution can speed up biomarker search by 6.3× with 5.7× energy savings compared to an 8-core Intel Xeon processor. 
    more » « less