Title: An Asynchronous FPGA THx2 Programmable Cell for Mitigating Side-Channel Attacks
One approach to mitigating side-channel attacks (SCAs) is to use clockless, asynchronous digital logic. To simplify this process, we propose a unique asynchronous FPGA based on a new THx2 programmable threshold cell. At a minimum, FPGAs require a programmable logic cell that can implement a complete set of logic so that it can be connected through the programmable interconnect network to form any digital system. To meet that criterion, we take advantage of CMOS transistors to implement a programmable THx2 threshold cell capable of performing both TH12 and TH22 asynchronous operations. Our complete sixteen-transistor FPGA cell includes eight transistors to implement the base THx2 threshold operation, three transistors to switch between the TH12 and TH22 modes, and five memory cell transistors for mode storage. Our unique minimal-transistor, programmable THx2 implementation enables formation of a complete set of asynchronous threshold gates and a complete set of standard combinational logic functions. The symmetric nature of the FPGA cell, with respect to the number of transistors (eight NMOS and eight PMOS), makes it ideal for a four-row by four-column transistor grid with a nearly square, easily arrayable layout. Our THx2 cell is highly compact and suitable for implementing a clockless, asynchronous FPGA.
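The two modes named in the abstract can be illustrated with a small behavioral sketch. It assumes the usual Null Convention Logic reading of a THmn gate: the output is asserted once at least m of the n inputs are asserted, and it is deasserted only when all inputs are deasserted. The class name `THx2Cell`, the `th22_mode` flag and its encoding, and the initial output value are hypothetical choices made for illustration; this is a logic-level model, not the authors' transistor-level circuit.

```python
from dataclasses import dataclass

# Transistor budget as quoted in the abstract.
BASE_TRANSISTORS = 8     # base THx2 threshold operation
MODE_TRANSISTORS = 3     # switching between TH12 and TH22
MEMORY_TRANSISTORS = 5   # mode storage
TOTAL_TRANSISTORS = BASE_TRANSISTORS + MODE_TRANSISTORS + MEMORY_TRANSISTORS


@dataclass
class THx2Cell:
    """Hypothetical behavioral model of a programmable 2-input threshold cell."""

    th22_mode: bool  # assumed encoding of the stored mode bit: True = TH22, False = TH12
    out: int = 0     # assumed to start deasserted

    def step(self, a: int, b: int) -> int:
        """Apply one pair of input values (0 or 1) and return the output."""
        threshold = 2 if self.th22_mode else 1
        asserted = a + b
        if asserted >= threshold:
            self.out = 1   # enough inputs asserted: set the output
        elif asserted == 0:
            self.out = 0   # all inputs deasserted: reset the output
        # Otherwise the output holds its previous value (hysteresis).
        return self.out
```

Under these assumptions, a cell in TH12 mode never enters the hold branch and behaves like a 2-input OR, while a cell in TH22 mode keeps its output asserted after one input drops, which is the behavior of a 2-input Muller C-element. The constants sum to the sixteen transistors stated in the abstract, matching the four-by-four grid.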
Award ID(s):
1916722
NSF-PAR ID:
10280569
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2020 IEEE 63rd International Midwest Symposium on Circuits and Systems (MWSCAS)
Page Range / eLocation ID:
840 to 843
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A Muller C-Element is a digital circuit component used in most asynchronous circuits and systems. In Null Convention Logic, the Muller C-Elements make up the subset of THmn threshold gates where the threshold, m, and the input bit-width, n, are equal. This paper presents a new Efficient Muller C-Element implementation, EMC, that is especially suitable for Null Convention Logic applications with high input bit-widths, and it is much faster and smaller than standard implementations. It has a two-transistor switching delay that is independent of the input bit-width, n, and exhibits low noise and static power consumption. It is suitable for all Muller C-Element applications, especially those like Null Convention Logic register feedback circuits that can have large input bit-widths. To reduce static power consumption, it uses active resistors that are only turned “ON” when necessary. Two output stages are presented to implement the required Muller C-Element digital hysteresis: a standard, semi-static cross-coupled inverter version and a differential sense-amplifier option. For large values of n, our circuit requires approximately half as many transistors as combining smaller Null Convention Logic THmn semi-static threshold gates. We have successfully simulated up to n = 1024 at a 65 nm node.
  2. Arbitrary-precision integer multiplication is the core kernel of many applications, including scientific computing and cryptographic algorithms. Existing accelerators for arbitrary-precision integer multiplication include CPUs, GPUs, FPGAs, and ASICs. To leverage the hardware's intrinsic low-bit function units (32/64-bit), arbitrary-precision integer multiplication can be computed using Karatsuba or schoolbook decomposition, which splits the two large operands into several small operands and generates a set of low-bit multiplications that can be processed either spatially or sequentially on the low-bit function units, e.g., CPU vector instructions, GPU CUDA cores, and FPGA digital signal processing (DSP) blocks. Among these accelerators, reconfigurable computing platforms such as FPGA accelerators promise to provide both good energy efficiency and flexibility. We implement the state-of-the-art (SOTA) FPGA accelerator and compare it with the SOTA libraries on CPUs and GPUs. Surprisingly, we find that the FPGA has the lowest energy efficiency, i.e., 0.29x that of the CPU and 0.17x that of the GPU at the same fabrication generation. Therefore, key questions arise: Where do the energy efficiency gains of CPUs and GPUs come from? Can reconfigurable computing do better? If so, how? We first identify that the biggest energy efficiency gains of the CPUs and GPUs come from their dedicated vector units, i.e., vector instruction units in CPUs and CUDA cores in GPUs. An FPGA uses DSPs and lookup tables (LUTs) to compose the needed computation, which incurs overhead compared to using vector units directly. New reconfigurable computing, e.g., “FPGA+vector units”, is a novel and feasible solution to improve energy efficiency.
    In this paper, we propose to map arbitrary-precision integer multiplication onto such an “FPGA+vector units” platform, i.e., the AMD/Xilinx Versal ACAP architecture, a heterogeneous reconfigurable computing platform that features 400 AI engine tensor cores (AIE) running at 1 GHz, FPGA programmable logic (PL), and a general-purpose CPU, fabricated with TSMC 7nm technology. Designing on Versal ACAP poses several challenges, and we propose AIM: Arbitrary-precision Integer Multiplication on Versal ACAP to automate and optimize the design. The AIM accelerator is composed of AIEs, PL, and CPU. The AIM framework includes analytical models to guide design space exploration and AIM automatic code generation to facilitate system design and on-board design verification. We deploy the AIM framework on three different applications, including large integer multiplication (LIM), RSA, and Mandelbrot, on the AMD/Xilinx Versal ACAP VCK190 evaluation board. Our experimental results show that, compared to existing accelerators, AIM achieves up to 12.6x and 2.1x energy efficiency gains over the Intel Xeon Ice Lake 6346 CPU and NVIDIA A5000 GPU, respectively, making reconfigurable computing the most energy-efficient of the compared platforms.
  3. With the deployment of artificial intelligence (AI) algorithms in a large variety of applications, there is an increasing need for high-performance computing capabilities. As a result, different hardware platforms have been utilized for acceleration purposes. Among these hardware-based accelerators, field-programmable gate arrays (FPGAs) have gained a lot of attention due to their re-programmable characteristics, which provide customized control logic and computing operators. For example, FPGAs have recently been adopted for on-demand cloud services by leading cloud providers like Amazon and Microsoft, providing acceleration for various compute-intensive tasks. While the co-residency of multiple tenants on a cloud FPGA chip increases the efficiency of resource utilization, it also creates unique attack surfaces that are under-explored. In this paper, we exploit the vulnerability associated with the shared power distribution network on cloud FPGAs. We present a stealthy power attack that can be remotely launched by a malicious tenant, shutting down the entire chip and resulting in denial-of-service for other co-located benign tenants. Specifically, we propose stealthy-shutdown: a well-timed power attack that can be implemented in two steps: (1) an attacker monitors the real-time FPGA power consumption detected by ring-oscillator-based voltage sensors, and (2) when capturing high power-consuming moments, i.e., when the power consumption by other tenants is above a certain threshold, the attacker injects a well-timed power load to shut down the FPGA system. Note that in the proposed attack strategy, the power load injected by the attacker only accounts for a small portion of the overall power consumption; therefore, the attack remains stealthy to the cloud FPGA operator. We successfully implement and validate the proposed attack on three FPGA evaluation kits running real-world applications.
    The proposed attack results in a stealthy shutdown, demonstrating severe security concerns of co-tenancy on cloud FPGAs. We also offer two countermeasures that can mitigate such power attacks.
  4. Multicellular systems, from bacterial biofilms to human organs, form interfaces (or boundaries) between different cell collectives to spatially organize versatile functions [1,2]. The evolution of sufficiently descriptive genetic toolkits probably triggered the explosion of complex multicellular life and patterning [3,4]. Synthetic biology aims to engineer multicellular systems for practical applications and to serve as a build-to-understand methodology for natural systems [5–8]. However, our ability to engineer multicellular interface patterns [2,9] is still very limited, as synthetic cell–cell adhesion toolkits and suitable patterning algorithms are underdeveloped [5,7,10–13]. Here we introduce a synthetic cell–cell adhesin logic with swarming bacteria and establish the precise engineering, predictive modelling and algorithmic programming of multicellular interface patterns. We demonstrate interface generation through a swarming adhesion mechanism, quantitative control over interface geometry and adhesion-mediated analogues of developmental organizers and morphogen fields. Using tiling and four-colour-mapping concepts, we identify algorithms for creating universal target patterns. This synthetic 4-bit adhesion logic advances practical applications such as human-readable molecular diagnostics, spatial fluid control on biological surfaces and programmable self-growing materials [5–8,14]. Notably, a minimal set of just four adhesins represents 4 bits of information that suffice to program universal tessellation patterns, implying a low critical threshold for the evolution and engineering of complex multicellular systems [3,5].
  5. Analytical database systems are typically designed to use a column-first data layout to access only the desired fields. On the other hand, storing data row-first works well for accessing, inserting, or updating entire rows. Transforming rows to columns at runtime is expensive; hence, many analytical systems ingest data in row-first form and transform it to columns in the background to facilitate future analytical queries. How would this design change if we could always efficiently access only the desired set of columns? To address this question, we present a radically new approach to data transformation from rows to columns. We build upon recent advancements in embedded platforms with re-programmable logic to design native in-memory access on rows and columns. Our approach, termed Relational Memory (RM), relies on an FPGA-based accelerator that sits between the CPU and main memory and transparently transforms base data to any group of columns with minimal overhead at runtime. This design allows accessing any group of columns as if it already exists in memory. We implement and deploy RM in real hardware, and we show that we can access the desired columns up to 1.63× faster compared to a row-wise layout, while matching the performance of pure columnar access for low projectivity, and outperforming it by up to 2.23× as projectivity (and tuple reconstruction cost) increases. Overall, RM allows the CPU to access the optimal data layout, radically reducing unnecessary data movement without high data transformation costs, thus simplifying software complexity and physical design while accelerating query execution.