Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Free, publicly-accessible full text available December 7, 2025
-
Field programmable gate array (FPGA) is widely used in the acceleration of deep learning applications because of its reconfigurability, flexibility, and fast time-to-market. However, conventional FPGA suffers from the trade-off between chip area and reconfiguration latency, making efficient FPGA accelerations that require switching between multiple configurations still elusive. Here, we propose a ferroelectric field-effect transistor (FeFET)–based context-switching FPGA supporting dynamic reconfiguration to break this trade-off, enabling loading of arbitrary configuration without interrupting the active configuration execution. Leveraging the intrinsic structure and nonvolatility of FeFETs, compact FPGA primitives are proposed and experimentally verified. The evaluation results show our design shows a 63.0%/74.7% reduction in a look-up table (LUT)/connection block (CB) area and 82.7%/53.6% reduction in CB/switch box power consumption with a minimal penalty in the critical path delay (9.6%). Besides, our design yields significant time savings by 78.7 and 20.3% on average for context-switching and dynamic reconfiguration applications, respectively.more » « less
-
RRAM-based in-memory computing (IMC) effectively accelerates deep neural networks (DNNs) and other machine learning algorithms. On the other hand, in the presence of RRAM device variations and lower precision, the mapping of DNNs to RRAM-based IMC suffers from severe accuracy loss. In this work, we propose a novel hybrid IMC architecture that integrates an RRAM-based IMC macro with a digital SRAM macro using a programmable shifter to compensate for the RRAM variations and recover the accuracy. The digital SRAM macro consists of a small SRAM memory array and an array of multiply-and-accumulate (MAC) units. The non-ideal output from the RRAM macro, due to device and circuit non-idealities, is compensated by adding the precise output from the SRAM macro. In addition, the programmable shifter allows for different scales of compensation by shifting the SRAM macro output relative to the RRAM macro output. On the algorithm side, we develop a framework for the training of DNNs to support the hybrid IMC architecture through ensemble learning. The proposed framework performs quantization (weights and activations), pruning, RRAM IMC-aware training, and employs ensemble learning through different compensation scales by utilizing the programmable shifter. Finally, we design a silicon prototype of the proposed hybrid IMC architecture in the 65nm SUNY process to demonstrate its efficacy. Experimental evaluation of the hybrid IMC architecture shows that the SRAM compensation allows for a realistic IMC architecture with multi-level RRAM cells (MLC) even though they suffer from high variations. The hybrid IMC architecture achieves up to 21.9%, 12.65%, and 6.52% improvement in post-mapping accuracy over state-of-the-art techniques, at minimal overhead, for ResNet-20 on CIFAR-10, VGG-16 on CIFAR-10, and ResNet-18 on ImageNet, respectively.more » « less
An official website of the United States government
