Title: MATCHA: a fast and energy-efficient accelerator for fully homomorphic encryption over the torus
Fully Homomorphic Encryption over the Torus (TFHE) allows arbitrary computations to be performed directly on ciphertexts using homomorphic logic gates. However, each TFHE gate is extremely slow (> 0.2 ms) even on state-of-the-art hardware platforms such as GPUs and FPGAs. Moreover, even the latest FPGA-based TFHE accelerator cannot achieve high energy efficiency, because it frequently invokes expensive double-precision floating-point FFT and IFFT kernels. In this paper, we propose MATCHA, a fast and energy-efficient accelerator for processing TFHE gates. Using approximate multiplication-less integer FFTs and IFFTs and a pipelined datapath, MATCHA supports aggressive bootstrapping key unrolling, which accelerates TFHE gates without introducing decryption errors. Compared to prior accelerators, MATCHA improves TFHE gate processing throughput by 2.3× and throughput per Watt by 6.3×.
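One common technique illustrates the idea of a "multiplication-less" integer FFT. Each twiddle factor is rounded to a short sum of signed powers of two, so every constant multiplication becomes a few shifts and additions. This is only a sketch under stated assumptions. The abstract does not say which approximation MATCHA uses. The helper names (`pow2_terms`, `approx_value`, `shift_add_mul`) and the 16-bit fixed-point precision `FRAC_BITS` are hypothetical choices made for this example.

```python
import math

# Hypothetical illustration of shift-and-add twiddle multiplication; not MATCHA's datapath.
FRAC_BITS = 16  # assumed fixed-point precision of the approximated twiddle factors


def pow2_terms(c: float, max_terms: int) -> list[tuple[int, int]]:
    """Greedily approximate c in [-1, 1] as sum(sign * 2**-k) with 0 <= k <= FRAC_BITS.

    Returns (sign, k) pairs; each pair costs one shift and one add or subtract.
    """
    terms: list[tuple[int, int]] = []
    residual = c
    for _ in range(max_terms):
        if residual == 0.0:
            break
        # Power of two nearest |residual| on a log scale, clamped to the fixed-point range.
        k = min(max(round(-math.log2(abs(residual))), 0), FRAC_BITS)
        sign = 1 if residual > 0 else -1
        terms.append((sign, k))
        residual -= sign * 2.0 ** -k
    return terms


def approx_value(terms: list[tuple[int, int]]) -> float:
    """Real value represented by a list of (sign, k) terms."""
    return sum(sign * 2.0 ** -k for sign, k in terms)


def shift_add_mul(x: int, terms: list[tuple[int, int]]) -> int:
    """Compute floor(x * approx_value(terms)) using only shifts and additions."""
    acc = 0
    for sign, k in terms:
        shifted = x << (FRAC_BITS - k)  # x * 2**(FRAC_BITS - k) without a multiplier
        acc = acc + shifted if sign > 0 else acc - shifted
    return acc >> FRAC_BITS  # arithmetic right shift back to integer scale (floors)


if __name__ == "__main__":
    w = math.cos(math.pi / 8)  # real part of the twiddle factor exp(-2*pi*i/16)
    terms = pow2_terms(w, max_terms=4)
    print(terms)                                # [(1, 0), (-1, 4), (-1, 6), (1, 9)]
    print(abs(approx_value(terms) - w))         # approximation error, about 5.1e-05
    print(shift_add_mul(1000, terms), math.floor(1000 * w))  # 923 923
```

A complex twiddle factor needs two such approximations, one for its real part and one for its imaginary part. Using fewer terms saves adders but increases the approximation error, so the term count is a precision/cost trade-off.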
Award ID(s): 1908992
NSF-PAR ID: 10353599
Journal Name: ACM/IEEE Design Automation Conference
Page Range / eLocation ID: 235 to 240
Sponsoring Org: National Science Foundation
More Like this
  1. This paper presents OFHE, an electro-optical accelerator designed to process Discretized TFHE (DTFHE) operations, which encrypt multi-bit messages and support homomorphic multiplications, lookup table operations, and full-domain functional bootstrapping. Although DTFHE is more efficient and versatile than other fully homomorphic encryption schemes, it requires 32-, 64-, and 128-bit polynomial multiplications, which can be time-consuming. Existing TFHE accelerators cannot easily be upgraded to support DTFHE operations because of their limited datapaths, their lack of datapath bit-width reconfigurability, and their power inefficiency when processing FFT and inverse FFT (IFFT) kernels. OFHE addresses these challenges. Compared to prior TFHE accelerators, it improves DTFHE operation latency by 8.7%, DTFHE operation throughput by 57%, and DTFHE operation throughput per Watt by 94%.
  2. This paper describes an FPGA-based vector engine that accelerates the bootstrapping procedure of Fast Fully Homomorphic Encryption over the Torus (TFHE), a popular, high-performance fully homomorphic encryption scheme. TFHE bootstrapping consists mostly of matrix-vector operations on Torus polynomials, which today's standard arithmetic hardware does not implement efficiently. Our implementation achieves linear performance scaling with up to 16 vector lanes. Future work will switch to an FFT-based polynomial multiplication scheme and to larger FPGA parts to accommodate more vector lanes.
  3. Although Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art inference accuracy in various intelligent applications, each CNN inference involves millions of expensive floating-point multiply-accumulate (MAC) operations. To process CNN inferences energy-efficiently, prior work proposes an electro-optical accelerator that processes power-of-2 quantized CNNs using electro-optical ripple-carry adders and optical binary shifters. That accelerator also uses SRAM registers to store intermediate data. However, the electro-optical ripple-carry adders and SRAMs seriously limit its operating frequency and inference throughput, because of the adders' long critical path and the SRAMs' long access latency. In this paper, we propose LightBulb, a photonic nonvolatile memory (NVM)-based accelerator that processes binarized CNNs with high-frequency photonic XNOR gates and popcount units. LightBulb also adopts photonic racetrack memory as input/output registers to achieve a high operating frequency. Compared to prior electro-optical accelerators, on average, LightBulb improves CNN inference throughput by 17× to 173× and inference throughput per Watt by 17.5× to 660×.
  4. Lacking the expertise to benefit from their own data, average users have to upload their private data to cloud servers they may not trust. Because of legal or privacy constraints, most users are willing to contribute only encrypted data, and they lack the interest or resources to participate in deep neural network (DNN) training in the cloud. A recent work trains a DNN on encrypted data in a completely non-interactive way. It proposes a fully homomorphic encryption (FHE)-based technique that implements all activations with Brakerski-Gentry-Vaikuntanathan (BGV)-based lookup tables. However, such inefficient lookup-table-based activations significantly prolong the private training latency of DNNs. In this paper, we propose Glyph, an FHE-based technique that trains DNNs on encrypted data quickly and accurately by switching between the TFHE (Fast Fully Homomorphic Encryption over the Torus) and BGV cryptosystems. Glyph uses logic-operation-friendly TFHE to implement nonlinear activations and vectorial-arithmetic-friendly BGV to perform multiply-accumulate operations (MACs). Glyph further applies transfer learning to DNN training to improve test accuracy and to reduce the number of ciphertext-ciphertext MACs in convolutional layers. Our experimental results show that Glyph obtains state-of-the-art accuracy and reduces training latency by 69% to 99% over prior FHE-based privacy-preserving techniques on encrypted datasets.
  5. High-fidelity single- and two-qubit gates are essential building blocks for a fault-tolerant quantum computer. While there has been much progress in suppressing single-qubit gate errors in superconducting qubit systems, two-qubit gates still suffer from error rates that are orders of magnitude higher. One limiting factor is the residual ZZ-interaction, which originates from a coupling between computational states and higher-energy states. This interaction is usually viewed as a nuisance. Here, we experimentally demonstrate that it can be exploited to produce a universal set of fast single-qubit gates and two-qubit entangling gates in a coupled transmon qubit system. To implement arbitrary single-qubit rotations, we design a new protocol called the two-axis gate, based on a three-part composite pulse. It rotates a single qubit independently of the state of the other qubit despite the strong ZZ-coupling. Randomized benchmarking measurements show single-qubit gate fidelities as high as 99.1%. We then demonstrate both a CZ gate and a CNOT gate. Because the system has a strong ZZ-interaction, a CZ gate can be achieved by letting the system evolve freely for a gate time t_g = 53.8 ns. To design the CNOT gate, we use an analytical microwave pulse shape based on the SWIPHT protocol for realizing fast, low-leakage gates. From quantum process tomography, we obtain fidelities of 94.6% and 97.8% for the CNOT and CZ gates, respectively.
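Related items 1 and 2 both center on multiplying Torus polynomials. In TFHE these polynomials are taken modulo X^N + 1, so the product is negacyclic: any term that wraps past X^(N-1) comes back with its sign flipped. TFHE multiplies an integer polynomial by a torus polynomial. Storing each torus coefficient as a q-bit integer c, which stands for c / 2**q, turns this into plain arithmetic mod 2**q. The schoolbook sketch below is an O(N^2) reference that FFT-based accelerators compute faster. The function name `negacyclic_mul` and the `q_bits` parameter are hypothetical, and 32 bits is only an assumed default.

```python
def negacyclic_mul(a: list[int], b: list[int], q_bits: int = 32) -> list[int]:
    """Schoolbook product of a and b in Z_q[X]/(X^N + 1), with q = 2**q_bits.

    Hypothetical reference helper. With q_bits=32, a result coefficient c can be
    read as the torus value c / 2**32 in [0, 1).
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("both polynomials must have N coefficients")
    mask = (1 << q_bits) - 1
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] += ai * bj
            else:
                # X^N = -1, so X^(i+j) = -X^(i+j-N): the wrapped term flips sign
                out[k - n] -= ai * bj
    # Reduce mod 2**q_bits; & on a negative Python int yields the two's-complement residue
    return [c & mask for c in out]


if __name__ == "__main__":
    # N = 4: (1 + X^3) * X = X + X^4 = -1 + X
    print(negacyclic_mul([1, 0, 0, 1], [0, 1, 0, 0]))  # [4294967295, 1, 0, 0]
```

The `q_bits` parameter lets the same reference check the 32-, 64-, and 128-bit multiplications that item 1 mentions.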
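Item 3's LightBulb processes binarized CNNs with XNOR gates and popcount units. When weights and activations are restricted to -1 and +1 and encoded as bits, a dot product reduces to counting the positions where the signs agree: dot = 2 * popcount(XNOR(a, b)) - n. The sketch below shows only this arithmetic in software, with hypothetical helper names `pack` and `xnor_popcount_dot`. It makes no claim about LightBulb's photonic implementation.

```python
def pack(values: list[int]) -> int:
    """Pack a {-1, +1} vector into an int: bit i is 1 when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v not in (-1, 1):
            raise ValueError("binarized values must be -1 or +1")
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a: int, b: int, n: int) -> int:
    """Dot product of two packed n-element {-1, +1} vectors via XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a ^ b) & mask          # XNOR: bit set wherever the two signs agree
    matches = bin(agree).count("1")  # popcount
    # Each agreement contributes +1 and each disagreement -1: matches - (n - matches)
    return 2 * matches - n


if __name__ == "__main__":
    x = [1, -1, 1, 1]
    w = [1, 1, -1, 1]
    print(xnor_popcount_dot(pack(x), pack(w), len(x)))  # 0
    print(sum(p * q for p, q in zip(x, w)))             # 0, same result using multiplies
```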