Garbled Circuit (GC) is a basic technique for practical secure computation. GC handles Boolean circuits; it consumes significant network bandwidth to transmit encoded gate truth tables, each of which scales with the computational security parameter κ. GC optimizations that reduce bandwidth consumption are valuable. It is natural to consider a generalization of Boolean two-input one-output gates (represented by 4-row one-column lookup tables, LUTs) to arbitrary N-row m-column LUTs. Known techniques for this do not scale, with na¨ıve size-O(Nmκ) garbled LUT being the most practical approach in many scenarios. Our novel garbling scheme – logrow – implements GC LUTs while sending only a logarithmic in N number of ciphertexts! Specifically, let n = ⌈log2 N⌉. We allow the GC parties to evaluate a LUT for (n−1)κ+nmκ+Nm bits of communication. logrow is compatible with modern GC advances, e.g. half gates and free XOR. Our work improves state-of-the-art GC handling of several interesting applications, such as privacypreserving machine learning, floating-point arithmetic, and DFA evaluation.
more »
« less
This content will become publicly available on May 1, 2026
An Efficient Multi-Output LUT Mapping Technique for Field-Programmable Gate Arrays
The use of multi-output look-up tables (LUTs) is a widely adopted approach in contemporary commercial field-programmable gate arrays (FPGAs). Larger LUT configurations (e.g., six-input LUTs) can be partitioned into smaller LUTs (e.g., two five-input LUTs, maintaining a total input count of less than six). This capability of generating a second output from a larger LUT is not only crucial for reducing logic cell count and enhancing the utilization efficiency of logic resources—thus conserving area—but also plays a key role in optimizing system-level delays and energy consumption. In this paper, we propose an efficient multi-output LUT mapping technique, incorporating several highly efficient technology mapping algorithms, which focus on optimizing the mapping from an interconnection perspective as alternatives to directly merging smaller LUTs. These algorithms include a side-fanout insertion algorithm, and a runtime multi-output cut generation algorithm. The proposed methods improve mapping efficiency and enhance performance. The benchmarking results demonstrate that the dual-output mapping algorithms achieve LUT area reductions of up to 35% and 6%, compared to the state-of-the-art ABC six-input, single-output LUT mapping technique and previous work focusing on dual-output LUT mapping techniques that optimize cut generation parameters. Moreover, FPGA system-level simulations also show that area, delay, and energy can all be optimized based on this multi-output mapping technique.
more »
« less
- Award ID(s):
- 2219753
- PAR ID:
- 10599101
- Publisher / Repository:
- MDPI
- Date Published:
- Journal Name:
- Electronics
- Volume:
- 14
- Issue:
- 9
- ISSN:
- 2079-9292
- Page Range / eLocation ID:
- 1782
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
For the flexibility of implementing any given Boolean function(s), the FPGA uses re-configurable building blocks called LUTs. The price for this reconfigurability is a large number of registers and multiplexers required to construct the FPGA. While researchers have been working on complex LUT structures to reduce the area and power for several years, most of these implementations come at the cost of performance penalty. This paper demonstrates simultaneous improvement in area, power, and performance in an FPGA by using special logic cells called Threshold Logic Cells (TLCs) (also known as binary perceptrons). The TLCs are capable of implementing a complex threshold function, which if implemented using conventional gates would require several levels of logic gates. The TLCs only require 7 SRAM cells and are significantly faster than the conventional LUTs. The implementation of the proposed FPGA architecture has been done using 28nm FDSOI standard cells and has been evaluated using ISCAS-85, ISCAS-89, and a few large industrial designs. Experiments demonstrate that the proposed architecture can be used to get an average reduction of 18.1% in configuration registers, 18.1% reduction in multiplexer count, 12.3% in Basic Logic Element (BLE) area, 16.3% in BLE power, 5.9% improvement in operating frequency, with a slight reduction in track count, routing area and routing power. The improvements are also demonstrated on the physically designed version of the architecture.more » « less
-
The HSC-FPGA offers an intriguing feasible architecture for the next generation of configurable fabrics, which allows embracing the advantages of both CMOS and beyond-CMOS technologies without requiring significant modification to the routing structure, programming paradigms, and synthesis tool-chain of the commercial FPGAs. In the HSC-FPGA, the intrinsic characteristics of magnetic random access memory (MRAM)-look-up table (LUT) circuits are used to implement sequential logic, while combinational logic circuits are implemented by static random access memory (SRAM)-LUTs. Fabric-level simulation results for the developed HSC-FPGA show that it can achieve at least 18%, 70%, and 15% reduction in terms of area, standby power, and read power consumption, respectively, for various ISCAS-89 and ITC-99 benchmark circuits compared to conventional SRAM-based FPGAs. The power consumption values can be further decreased by the power-gating allowed by the non-volatility feature of MRAM-LUTs. Moreover, the benefits of increased heterogeneity for reconfigurable computing is extended along realizing probabilistic computing paradigms within a fabric, which is enabled by probabilistic spin logic devices. The cooperating strengths of technology-heterogeneity and heterogeneity in computing paradigm in the proposed HSC-FPGA are leveraged to develop energy-efficient and reliability-aware training and evaluation circuits for deep belief networks with memristive crossbar arrays and p-bit based probabilistic neurons.more » « less
-
Quantum algorithms will likely play a key role in future high-performance-computing (HPC) environments. These algorithms are typically expressed as quantum circuits composed of arbitrary gates or as unitary matrices. Executing these on physical devices, however, requires translation to device-compatible circuits, in a process called quantum compilation or circuit synthesis, since these devices support a limited number of native gates. Moreover, these devices typically have specific qubit topologies, which constrain how and where gates can be applied. Consequently, logical qubits in input circuits and unitaries may need to be mapped to and routed between physical qubits. Furthermore, current Noisy Intermediate-Scale Quantum (NISQ) devices present additional constraints. They are vulnerable to errors during gate application and their short decoherence times lead to qubits rapidly succumbing to accumulated noise and possibly corrupting computations. Therefore, circuits synthesized for NISQ devices need to minimize gates and execution times. The problem of synthesizing device-compatible circuits, while optimizing for low gate count and short execution times, can be shown to be computationally intractable using analytical methods. Therefore, interest has grown towards heuristics-based synthesis techniques, which are able to produce approximations of the desired algorithm, while optimizing depth and gate-count. In this work, we investigate using genetic algorithms (GA)—a proven gradient-free optimization technique based on natural selection—for circuit synthesis. In particular, we formulate the quantum synthesis problem as a multi-objective optimization (MOO) problem, with the objectives of minimizing the approximation error, number of multi-qubit gates, and circuit depth. We also employ fuzzy logic for runtime parameter adaptation of GA to enhance search efficiency and solution quality in our proposed method.more » « less
-
In this paper, we develop a 6-input fracturable non-volatile Clockless LUT (C-LUT) using spin Hall effect (SHE)-based Magnetic Tunnel Junctions (MTJs) and provide a detailed comparison between the SHE-MTJ-based C-LUT and Spin Transfer Torque (STT)-MTJ-based C-LUT. The proposed C-LUT offers an attractive alternative for implementing combinational logic as well as sequential logic versus previous spin-based LUT designs in the literature. Foremost, C-LUT eliminates the sense amplifier typically employed by using a differential polarity dual MTJ design, as opposed to a static reference resistance MTJ. This realizes a much wider read margin and the Monte Carlo simulation of the proposed fracturable C-LUT indicates no read and write errors in the presence of a variety of process variations scenarios involving MOS transistors as well as MTJs. Additionally, simulation results indicate that the proposed C-LUT reduces the standby power dissipation by $5.4$-fold compared to the SRAM-based LUT. Furthermore, the proposed SHE-MTJ-based C-LUT reduces the area by 1.3-fold and 2-fold compared to the SRAM-based LUT and the STT-MTJ-based C-LUT, respectively.more » « less
An official website of the United States government
