skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Friday, December 13 until 2:00 AM ET on Saturday, December 14 due to maintenance. We apologize for the inconvenience.


Search for: All records

Creators/Authors contains: "Hu, Xiaobo"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available June 1, 2025
  2. Deep random forest (DRF), which combines deep learning and random forest, exhibits comparable accuracy, interpretability, low memory and computational overhead to deep neural networks (DNNs) in edge intelligence tasks. However, efficient DRF accelerator is lagging behind its DNN counterparts. The key to DRF acceleration lies in realizing the branch-split operation at decision nodes. In this work, we propose implementing DRF through associative searches realized with ferroelectric analog content addressable memory (ACAM). Utilizing only two ferroelectric field effect transistors (FeFETs), the ultra-compact ACAM cell performs energy-efficient branch-split operations by storing decision boundaries as analog polarization states in FeFETs. The DRF accelerator architecture and its model mapping to ACAM arrays are presented. The functionality, characteristics, and scalability of the FeFET ACAM DRF and its robustness against FeFET device non-idealities are validated in experiments and simulations. Evaluations show that the FeFET ACAM DRF accelerator achieves ∼106×/10× and ∼106×/2.5× improvements in energy and latency, respectively, compared to other DRF hardware implementations on state-of-the-art CPU/ReRAM.

     
    more » « less
    Free, publicly-accessible full text available June 7, 2025
  3. Field programmable gate array (FPGA) is widely used in the acceleration of deep learning applications because of its reconfigurability, flexibility, and fast time-to-market. However, conventional FPGA suffers from the trade-off between chip area and reconfiguration latency, making efficient FPGA accelerations that require switching between multiple configurations still elusive. Here, we propose a ferroelectric field-effect transistor (FeFET)–based context-switching FPGA supporting dynamic reconfiguration to break this trade-off, enabling loading of arbitrary configuration without interrupting the active configuration execution. Leveraging the intrinsic structure and nonvolatility of FeFETs, compact FPGA primitives are proposed and experimentally verified. The evaluation results show our design shows a 63.0%/74.7% reduction in a look-up table (LUT)/connection block (CB) area and 82.7%/53.6% reduction in CB/switch box power consumption with a minimal penalty in the critical path delay (9.6%). Besides, our design yields significant time savings by 78.7 and 20.3% on average for context-switching and dynamic reconfiguration applications, respectively.

     
    more » « less
    Free, publicly-accessible full text available January 19, 2025
  4. Abstract Hyperdimensional computing (HDC) is a brain-inspired computational framework that relies on long hypervectors (HVs) for learning. In HDC, computational operations consist of simple manipulations of hypervectors and can be incredibly memory-intensive. In-memory computing (IMC) can greatly improve the efficiency of HDC by reducing data movement in the system. Most existing IMC implementations of HDC are limited to binary precision which inhibits the ability to match software-equivalent accuracies. Moreover, memory arrays used in IMC are restricted in size and cannot immediately support the direct associative search of large binary HVs (a ubiquitous operation, often over 10,000+ dimensions) required to achieve acceptable accuracies. We present a multi-bit IMC system for HDC using ferroelectric field-effect transistors (FeFETs) that simultaneously achieves software-equivalent-accuracies, reduces the dimensionality of the HDC system, and improves energy consumption by 826x and latency by 30x when compared to a GPU baseline. Furthermore, for the first time, we experimentally demonstrate multi-bit, array-level content-addressable memory (CAM) operations with FeFETs. We also present a scalable and efficient architecture based on CAMs which supports the associative search of large HVs. Furthermore, we study the effects of device, circuit, and architectural-level non-idealities on application-level accuracy with HDC. 
    more » « less