NSF PAR Search | NSF Public Access Repository

CLAIRE: Composable Chiplet Libraries for AI Inference

https://doi.org/10.23919/DATE64628.2025.10992960

Nalla, Pragnya Sudershan; Haque, Emad; Liu, Yaotian; Sapatnekar, Sachin S; Zhang, Jeff; Chakrabarti, Chaitali; Cao, Yu (March 2025, IEEE)

Free, publicly-accessible full text available March 31, 2026
Plasticity in inhibitory networks improves pattern separation in early olfactory processing

https://doi.org/10.1038/s42003-025-07879-2

Joshi, Shruti; Haney, Seth; Wang, Zhenyu; Locatelli, Fernando; Lei, Hong; Cao, Yu; Smith, Brian; Bazhenov, Maxim (April 2025, Communications Biology)

Distinguishing between nectar and non-nectar odors is challenging for animals due to shared compounds and varying ratios in complex mixtures. Changes in nectar production throughout the day and over the animal’s lifetime add to the complexity. The honeybee olfactory system, containing fewer than 1000 principal neurons in the early olfactory relay, the antennal lobe (AL), must learn to associate diverse volatile blends with rewards. Previous studies identified plasticity in the AL circuits, but its role in odor learning remains poorly understood. Using a biophysical computational model, tuned by in vivo electrophysiological data, and live imaging of the honeybee’s AL, we explored the neural mechanisms of plasticity in the AL. Our findings revealed that when trained with a set of rewarded and unrewarded odors, the AL inhibitory network suppresses responses to shared chemical compounds while enhancing responses to distinct compounds. This results in improved pattern separation and a more concise neural code. Our calcium imaging data support these predictions. Analysis of a graph convolutional neural network performing an odor categorization task revealed a similar mechanism for contrast enhancement. Our study provides insights into how inhibitory plasticity in the early olfactory network reshapes the coding for efficient learning of complex odors.
RA-BNN: Constructing a Robust & Accurate Binary Neural Network Using a Novel Network Growth Mechanism to Defend Against BFA

https://doi.org/10.1109/CCWC62904.2025.10903977

Rakin, Adnan Siraj; Yang, Li; Li, Jingtao; Yao, Fan; Chakrabarti, Chaitali; Cao, Yu; Seo, Jae-sun; Fan, Deliang (January 2025, IEEE)

The adversarial bit-flip attack (BFA), a powerful adversarial weight attack demonstrated on real computer systems, has shown enormous success in compromising Deep Neural Network (DNN) performance with minimal model parameter perturbation, using rowhammer-based bit flips in main memory. In this work, we demonstrate for the first time how to defeat adversarial bit-flip attacks by developing a Robust and Accurate Binary Neural Network (RA-BNN) that adopts a complete BNN (i.e., both weights and activations are binary). Prior works have demonstrated that binary or clustered weights can intrinsically improve a network's robustness against BFA; in this work, we further reveal that binary activations improve this robustness even more. However, with aggressive binary representations for both weights and activations, the complete BNN suffers from poor clean (i.e., no-attack) inference accuracy. To counter this, we propose RA-Growth, an efficient two-stage growing method for constructing a complete BNN that is simultaneously robust and accurate. It selectively grows the channel size of each BNN layer based on trainable channel-wise binary mask learning with a Gumbel-Sigmoid function. The wider binary network (i.e., RA-BNN) has dual benefits: it recovers clean inference accuracy and achieves significantly higher resistance against BFA. Our evaluation on the CIFAR-10 dataset shows that the proposed RA-BNN can improve the resistance to BFA by up to 100×. On ImageNet, with a sufficiently large number of bit flips (e.g., 5,000), the baseline BNN accuracy drops from 51.9% to 4.3%, while our RA-BNN accuracy drops only from 60.9% to 37.1%, making it the best-performing defense.
Free, publicly-accessible full text available January 6, 2026
A 16nm Heterogeneous Accelerator for Energy-Efficient Sparse and Dense AI Computing

https://doi.org/10.1145/3665314.3670824

Raveendran_Nair, Gopikrishnan; Jiang, Fengyang; Zhang, Jeff; Cao, Yu (August 2024, ACM)

Full Text Available
HISIM: Analytical Performance Modeling and Design Space Exploration of 2.5D/3D Integration for AI Computing

https://doi.org/10.1109/TCAD.2025.3531348

Wang, Zhenyu; Nalla, Pragnya Sudershan; Sun, Jingbo; Goksoy, A Alper; Mandal, Sumit K; Seo, Jae-sun; Chhabria, Vidya A; Zhang, Jeff; Chakrabarti, Chaitali; Ogras, Umit Y; et al (January 2025, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Free, publicly-accessible full text available January 1, 2026
A 65-nm RRAM Compute-in-Memory Macro for Genome Processing

https://doi.org/10.1109/JSSC.2024.3396429

Zhang, Fan; Sridharan, Amitesh; He, Wangxin; Yeo, Injune; Liehr, Maximilian; Zhang, Wei; Cady, Nathaniel; Cao, Yu; Seo, Jae-Sun; Fan, Deliang (July 2024, IEEE Journal of Solid-State Circuits)

This work presents the first resistive random access memory (RRAM)-based compute-in-memory (CIM) macro design tailored for genome processing. We analyze and demonstrate two key types of genome processing applications using our developed CIM chip prototype: the state-of-the-art (SOTA) Burrows–Wheeler transform (BWT)-based DNA short-read alignment and alignment-free mRNA quantification. Our CIM macro is designed and optimized to support the major functions essential to these algorithms, e.g., parallel XNOR operations, count, addition, and parallel bit-wise AND operations. The proposed CIM macro prototype is fabricated with monolithic integration of HfO2 RRAM and 65-nm CMOS, achieving 2.07 TOPS/W (tera-operations per second per watt) and 2.12 G suffixes/J (suffixes per joule) at 1.0 V, which is the most energy-efficient solution to date for genome processing.
Full Text Available
A 65nm RRAM Compute-in-Memory Macro for Genome Sequencing Alignment

https://doi.org/10.1109/ESSCIRC59616.2023.10268783

Zhang, Fan; He, Wangxin; Yeo, Injune; Liehr, Maximilian; Cady, Nathaniel; Cao, Yu; Seo, Jae-Sun; Fan, Deliang (September 2023, IEEE European Solid State Circuits Conference (ESSCIRC))
A 65nm RRAM Compute-in-Memory Macro for Genome Sequencing Alignment

Zhang, Fan; He, Wangxin; Yeo, Injune; Liehr, Maximilian; Cady, Nathaniel; Cao, Yu; Seo, Jae-sun; Fan, Deliang (September 2023, Proceedings of ESSCIRC)

In genomic analysis, the major computational bottleneck is the memory- and compute-intensive alignment of DNA short reads, owing to the memory-wall challenge. This work presents the first Resistive RAM (RRAM)-based Compute-in-Memory (CIM) macro design for accelerating state-of-the-art BWT-based genome sequencing alignment. Our design supports all the core instructions required by the alignment algorithm, i.e., XNOR-based match, count, and addition. The proposed CIM macro, implemented by integrating HfO2 RRAM and 65nm CMOS, demonstrates the best energy efficiency to date, with 2.07 TOPS/W and 2.12G suffixes/J at 1.0V.
Full Text Available
Algorithm-hardware Co-optimization for Energy-efficient Drone Detection on Resource-constrained FPGA

https://doi.org/10.1145/3583074

Suh, Han-Sok; Meng, Jian; Nguyen, Ty; Kumar, Vijay; Cao, Yu; Seo, Jae-Sun (June 2023, ACM Transactions on Reconfigurable Technology and Systems)

Convolutional neural network (CNN)-based object detection has achieved very high accuracy; e.g., single-shot multi-box detectors (SSDs) can efficiently detect and localize various objects in an input image. However, they require a high amount of computation and memory storage, which makes it difficult to perform efficient inference on resource-constrained hardware devices such as drones or unmanned aerial vehicles (UAVs). Drone/UAV detection is an important task for applications including surveillance, defense, and multi-drone self-localization and formation control. In this article, we designed and co-optimized an algorithm and hardware for energy-efficient drone detection on resource-constrained FPGA devices. We trained an SSD object detection algorithm with a custom drone dataset. For inference, we employed low-precision quantization and adapted the width of the SSD CNN model. To improve throughput, we used dual-data-rate operations for DSPs to effectively double the throughput with limited DSP counts. For different SSD algorithm models, we analyzed accuracy, measured as mean average precision (mAP), and evaluated the corresponding FPGA hardware utilization, DRAM communication, and throughput optimization. We evaluated the FPGA hardware on a custom drone dataset, Pascal VOC, and COCO2017. Our proposed design achieves a high mAP of 88.42% on the multi-drone dataset, with a high energy efficiency of 79 GOPS/W and throughput of 158 GOPS using the Xilinx Zynq ZU3EG FPGA device on the Open Vision Computer version 3 (OVC3) platform. Our design achieves 1.1 to 8.7× higher energy efficiency than prior works that used the same Pascal VOC dataset and the same FPGA device, at a low power consumption of 2.54 W. For the COCO dataset, our MobileNet-V1 implementation achieved an mAP of 16.8 and an energy efficiency of 4.9 FPS/W, which is ∼1.9× higher than prior FPGA works or other commercial hardware platforms.
Full Text Available
Learning Optimal Flows for Non-Equilibrium Importance Sampling

Cao, Yu; Vanden-Eijnden, Eric (December 2022, Advances in Neural Information Processing Systems 35 (NeurIPS 2022))

Full Text Available
