The use of multi-output look-up tables (LUTs) is a widely adopted approach in contemporary commercial field-programmable gate arrays (FPGAs). Larger LUT configurations (e.g., six-input LUTs) can be partitioned into smaller LUTs (e.g., two five-input LUTs, maintaining a total input count of less than six). This capability of generating a second output from a larger LUT is not only crucial for reducing logic cell count and enhancing the utilization efficiency of logic resources—thus conserving area—but also plays a key role in optimizing system-level delays and energy consumption. In this paper, we propose an efficient multi-output LUT mapping technique, incorporating several highly efficient technology mapping algorithms, which focus on optimizing the mapping from an interconnection perspective as alternatives to directly merging smaller LUTs. These algorithms include a side-fanout insertion algorithm, and a runtime multi-output cut generation algorithm. The proposed methods improve mapping efficiency and enhance performance. The benchmarking results demonstrate that the dual-output mapping algorithms achieve LUT area reductions of up to 35% and 6%, compared to the state-of-the-art ABC six-input, single-output LUT mapping technique and previous work focusing on dual-output LUT mapping techniques that optimize cut generation parameters. Moreover, FPGA system-level simulations also show that area, delay, and energy can all be optimized based on this multi-output mapping technique.
more »
« less
Garbled Circuit Lookup Tables with Logarithmic Number of Ciphertexts
Garbled Circuit (GC) is a basic technique for practical secure computation. GC handles Boolean circuits; it consumes significant network bandwidth to transmit encoded gate truth tables, each of which scales with the computational security parameter κ. GC optimizations that reduce bandwidth consumption are valuable. It is natural to consider a generalization of Boolean two-input one-output gates (represented by 4-row one-column lookup tables, LUTs) to arbitrary N-row m-column LUTs. Known techniques for this do not scale, with na¨ıve size-O(Nmκ) garbled LUT being the most practical approach in many scenarios. Our novel garbling scheme – logrow – implements GC LUTs while sending only a logarithmic in N number of ciphertexts! Specifically, let n = ⌈log2 N⌉. We allow the GC parties to evaluate a LUT for (n−1)κ+nmκ+Nm bits of communication. logrow is compatible with modern GC advances, e.g. half gates and free XOR. Our work improves state-of-the-art GC handling of several interesting applications, such as privacypreserving machine learning, floating-point arithmetic, and DFA evaluation.
more »
« less
- PAR ID:
- 10526049
- Publisher / Repository:
- Springer
- Date Published:
- ISBN:
- 978-3-031-58740-5
- Subject(s) / Keyword(s):
- Multiparty Computation, Garbled Circuits, Lookup Tables
- Format(s):
- Medium: X
- Location:
- Zurich, Sqitzerland
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Anna Lysyanskaya, Helena Handschuh (Ed.)We introduce tri-state circuits (TSCs). TSCs form a natural model of computation that, to our knowledge, has not been considered by theorists. The model captures a surprising combination of simplicity and power. TSCs are simple in that they allow only three wire values (0, 1, and undefined – Z) and three types of fan-in two gates; they are powerful in that their statically placed gates fire (execute) eagerly as their inputs become defined, implying orders of execution that depend on input. This behavior is sufficient to efficiently evaluate RAM programs. We construct a TSC that emulates T steps of any RAM program and that has only O(T · log3 T · log log T) gates. Contrast this with the reduction from RAM to Boolean circuits, where the best approach scans all of memory on each access, incurring quadratic cost. We connect TSCs with Garbled Circuits (GC). TSCs capture the power of garbling far better than Boolean Circuits, offering a more expressive model of computation that leaves per-gate cost essentially unchanged. As an important application, we construct authenticated Garbled RAM (GRAM), enabling constant-round maliciously-secure 2PC of RAM programs. Let λ denote the security parameter.We extend authenticated garbling to TSCs; by simply plugging in our TSC-based RAM, we obtain authenticated GRAM running at cost O(T · log3 T · log log T · λ), outperforming all prior work, including prior semi-honest GRAM. We also give semi-honest garbling of TSCs from a one-way function (OWF). This yields OWF-based GRAM at cost O(T ·log3 T ·log log T ·λ), outperforming the best prior OWF-based GRAM by more than factor λ.more » « less
-
In this work, we propose LUT-Lock, a novel Look-Up-Table-based netlist obfuscation algorithm, for protecting the intellectual property that is mapped to an FPGA bitstream or an ASIC netlist. We, first, illustrate the effectiveness of several key features that make the LUT-based obfuscation more resilient against SAT attacks and then we embed the proposed key features into our proposed LUT-Lock algorithm. We illustrate that LUT-Lock maximizes the resiliency of the LUT-based obfuscation against SAT attacks by forcing a near exponential increase in the execution time of a SAT solver with respect to the number of obfuscated gates. Hence, by adopting LUT-Lock algorithm, SAT attack execution time could be made unreasonably long by increasing the number of utilized LUTs.more » « less
-
Garbled Circuits (GC) is a technique for ensuring the privacy of inputs from users and is particularly well suited for FPGA implementations in the cloud where data analytics is frequently run. Secure Function Evaluation, such as that enabled by GC, is orders of magnitude slower than processing in the clear. We present our best implementation of GC on Amazon Web Services (AWS) that implements garbling on Amazon's FPGA enabled F1 instances. In this paper we present the largest problems garbled to date on FPGA instances, which includes problems that are represented by over four million gates. Our implementation speeds up garbling 20 times over software over a range of different circuit sizes.more » « less
-
For the flexibility of implementing any given Boolean function(s), the FPGA uses re-configurable building blocks called LUTs. The price for this reconfigurability is a large number of registers and multiplexers required to construct the FPGA. While researchers have been working on complex LUT structures to reduce the area and power for several years, most of these implementations come at the cost of performance penalty. This paper demonstrates simultaneous improvement in area, power, and performance in an FPGA by using special logic cells called Threshold Logic Cells (TLCs) (also known as binary perceptrons). The TLCs are capable of implementing a complex threshold function, which if implemented using conventional gates would require several levels of logic gates. The TLCs only require 7 SRAM cells and are significantly faster than the conventional LUTs. The implementation of the proposed FPGA architecture has been done using 28nm FDSOI standard cells and has been evaluated using ISCAS-85, ISCAS-89, and a few large industrial designs. Experiments demonstrate that the proposed architecture can be used to get an average reduction of 18.1% in configuration registers, 18.1% reduction in multiplexer count, 12.3% in Basic Logic Element (BLE) area, 16.3% in BLE power, 5.9% improvement in operating frequency, with a slight reduction in track count, routing area and routing power. The improvements are also demonstrated on the physically designed version of the architecture.more » « less
An official website of the United States government

