NSF PAR Search | NSF Public Access Repository

Closest Neighbors are Harmful for Lightweight Masked Auto-encoders

Meng, Jian; Hasssan, Ahmed; Yang, Li; Fan, Deliang; Shin, Jinwoo; Seo, Jae-sun (June 2025, IEEE/CVF Conference on Computer Vision and Pattern Recognition)

Free, publicly-accessible full text available June 11, 2026
Quant-NeRF: Efficient End-to-End Quantization of Neural Radiance Fields with Low-Precision 3D Gaussian Representation

https://doi.org/10.1109/ICASSP49660.2025.10889510

Hasssan, Ahmed; Anupreetham, Anupreetham; Meng, Jian; Seo, Jae-sun (April 2025, IEEE)

Free, publicly-accessible full text available April 6, 2026
BBS: Bi-Directional Bit-Level Sparsity for Deep Learning Acceleration

https://doi.org/10.1109/MICRO61859.2024.00048

Chen, Yuzong; Meng, Jian; Seo, Jae-sun; Abdelfattah, Mohamed S (November 2024, IEEE)

Full Text Available
RA-BNN: Constructing a Robust & Accurate Binary Neural Network Using a Novel Network Growth Mechanism to Defend Against BFA

https://doi.org/10.1109/CCWC62904.2025.10903977

Rakin, Adnan Siraj; Yang, Li; Li, Jingtao; Yao, Fan; Chakrabarti, Chaitali; Cao, Yu; Seo, Jae-sun; Fan, Deliang (January 2025, IEEE)

The adversarial bit-flip attack (BFA), a powerful adversarial weight attack demonstrated in real computer systems, has shown enormous success in compromising Deep Neural Network (DNN) performance with a minimal amount of model parameter perturbation through rowhammer-based bit-flips in main memory. In this work, we demonstrate for the first time a defense against adversarial bit-flip attacks by developing a Robust and Accurate Binary Neural Network (RA-BNN) that adopts a complete BNN (i.e., both weights and activations are binary). Prior works have demonstrated that binary or clustered weights can intrinsically improve a network's robustness against BFA; in this work, we further reveal that binary activations improve this robustness even more. However, with aggressive binary representations for both weights and activations, the complete BNN suffers from poor clean (i.e., no attack) inference accuracy. To counter this, we propose RA-Growth, an efficient two-stage growing method for constructing a complete BNN that is simultaneously robust and accurate. It selectively grows the channel size of each BNN layer based on trainable channel-wise binary mask learning with a Gumbel-Sigmoid function. The wider binary network (i.e., RA-BNN) has dual benefits: it recovers clean inference accuracy and offers significantly higher resistance against BFA. Our evaluation on the CIFAR-10 dataset shows that the proposed RA-BNN can improve the resistance to BFA by up to 100×. On ImageNet, with a sufficiently large number of bit-flips (e.g., 5,000), the baseline BNN accuracy drops from 51.9% to 4.3%, while our RA-BNN accuracy only drops from 60.9% to 37.1%, making it the best-performing defense.
Free, publicly-accessible full text available January 6, 2026
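
The RA-BNN abstract above mentions channel-wise binary mask learning with a Gumbel-Sigmoid function. As a rough illustration of that general technique only (not the authors' implementation), the sketch below draws a relaxed binary mask from per-channel logits by adding logistic noise before a temperature-scaled sigmoid. The function names, the temperature `tau`, the 0.5 cutoff, and the seed are all assumptions made for this example; the training loop and gradient handling are omitted.

```python
import math
import random


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gumbel_sigmoid(logit, tau, rng):
    """Relaxed Bernoulli sample in [0, 1] for one channel (hypothetical helper)."""
    # Clamp u away from 0 and 1 so the logs below stay finite.
    u = min(max(rng.random(), 1e-9), 1.0 - 1e-9)
    # The difference of two Gumbel samples is logistic noise.
    noise = math.log(u) - math.log(1.0 - u)
    return _sigmoid((logit + noise) / tau)


def channel_mask(logits, tau=1.0, seed=0):
    """Return (soft, hard) masks; hard[i] == 1 means channel i is kept."""
    rng = random.Random(seed)
    soft = [gumbel_sigmoid(l, tau, rng) for l in logits]
    hard = [1 if s > 0.5 else 0 for s in soft]  # assumed 0.5 cutoff
    return soft, hard


soft, hard = channel_mask([50.0, -50.0, 0.0])
print(hard[:2])
```

With strongly positive or negative logits the sampled mask is effectively deterministic, while logits near zero produce a channel that is kept or dropped at random; lowering `tau` pushes the soft values closer to 0 or 1.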
SpQuant-SNN: ultra-low precision membrane potential with sparse activations unlock the potential of on-device spiking neural networks applications

https://doi.org/10.3389/fnins.2024.1440000

Hasssan, Ahmed; Meng, Jian; Anupreetham, Anupreetham; Seo, Jae-sun (September 2024, Frontiers in Neuroscience)

Spiking neural networks (SNNs) have received increasing attention due to their high biological plausibility and energy efficiency. The binary spike-based information propagation enables efficient sparse computation in event-based and static computer vision applications. However, the weight precision and especially the membrane potential precision remain high (e.g., 32 bits) in state-of-the-art SNN algorithms. Each neuron in an SNN stores the membrane potential over time and typically updates its value in every time step. Such frequent read/write operations on the high-precision membrane potential incur storage and memory access overhead, which undermines the compatibility of SNNs with resource-constrained hardware. To resolve this inefficiency, prior works have explored time step reduction and low-precision representation of the membrane potential at a limited scale and reported significant accuracy drops. Furthermore, while recent advances in on-device AI present pruning and quantization optimization with different architectures and datasets, simultaneous pruning with quantization is highly under-explored in SNNs. In this work, we present SpQuant-SNN, a fully-quantized spiking neural network with ultra-low precision weights, membrane potential, and high spatial-channel sparsity, enabling end-to-end low precision with significantly fewer operations. First, we propose an integer-only quantization scheme for the membrane potential with a stacked surrogate gradient function, a simple-yet-effective method that enables smooth training of quantized SNNs. Second, we implement spatial-channel pruning with a membrane potential prior to reduce the layer-wise computational complexity and floating-point operations (FLOPs) in SNNs. Finally, to further improve the accuracy of the low-precision and sparse SNN, we propose a self-adaptive learnable potential threshold for SNN training. Equipped with high biological adaptiveness, minimal computations, and memory utilization, SpQuant-SNN achieves state-of-the-art performance across multiple SNN models for both event-based and static image datasets, including both image classification and object detection tasks. Compared to the SOTA baseline, the proposed SpQuant-SNN achieves up to 13× memory reduction and >4.7× FLOPs reduction with ~1.8% accuracy degradation for both classification and object detection tasks.
Full Text Available
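
The SpQuant-SNN abstract above describes keeping the membrane potential in low-precision integer form. The following is a minimal sketch of what an integer-only leaky integrate-and-fire update could look like; it is not the paper's scheme. The leak implemented as a right shift, the saturation to a signed `bits`-wide range, the reset to zero, and all names and default values are assumptions chosen for illustration.

```python
def lif_step_int(v, x, threshold, bits=4, leak_shift=1):
    """One integer-only neuron update (hypothetical helper).

    v: current membrane potential (int), x: integer input current.
    Returns (new_potential, spike).
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    # Leak by subtracting a shifted copy of v, then integrate the input.
    v = v - (v >> leak_shift) + x
    # Saturate to the signed range representable with `bits` bits.
    v = max(lo, min(hi, v))
    spike = 1 if v >= threshold else 0
    if spike:
        v = 0  # assumed hard reset after firing
    return v, spike


def run(inputs, threshold=5, bits=4):
    """Feed a sequence of integer inputs to one neuron and collect spikes."""
    v, spikes = 0, []
    for x in inputs:
        v, s = lif_step_int(v, x, threshold, bits)
        spikes.append(s)
    return spikes


print(run([3, 3, 3, 0, 7]))  # prints [0, 1, 0, 0, 1]
```

Because every operation is an integer add, shift, compare, or clamp, the state per neuron fits in `bits` bits, which is the kind of storage saving the abstract refers to.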
Spiking Neural Network with Learnable Threshold for Event-based Classification and Object Detection

https://doi.org/10.1109/IJCNN60899.2024.10650320

Hasssan, Ahmed; Meng, Jian; Seo, Jae-Sun (June 2024, IEEE International Joint Conference on Neural Networks)

Full Text Available
IM-SNN: Memory-Efficient Spiking Neural Network with Low-Precision Membrane Potentials and Weights

Hasssan, Ahmed; Meng, Jian; Anupreetham, Anupreetham; Seo, Jae-sun (August 2024, IEEE/ACM International Conference on Neuromorphic Systems (ICONS))

Full Text Available
HISIM: Analytical Performance Modeling and Design Space Exploration of 2.5D/3D Integration for AI Computing

https://doi.org/10.1109/TCAD.2025.3531348

Wang, Zhenyu; Nalla, Pragnya Sudershan; Sun, Jingbo; Goksoy, A Alper; Mandal, Sumit K; Seo, Jae-sun; Chhabria, Vidya A; Zhang, Jeff; Chakrabarti, Chaitali; Ogras, Umit Y; et al (January 2025, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Free, publicly-accessible full text available January 1, 2026
H4H: Hybrid Convolution-Transformer Architecture Search for NPU-CIM Heterogeneous Systems for AR/VR Applications

Zhao, Yiwei; Li, Ziyun; Khwa, Win-San; Sun, Xiaoyu; Zhang, Sai Qian; Sarwar, Syed Shakib; Stangherlin, Kleber Hugo; Lu, Yi-Lun; Gomez, Jorge Tomas; Seo, Jae-sun; et al (January 2025, Proceedings of the ASPDAC Asia and South Pacific Design Automation Conference)

Low-latency and low-power edge AI is crucial for Virtual Reality and Augmented Reality applications. Recent advances demonstrate that hybrid models, combining convolution layers (CNN) and transformers (ViT), often achieve a superior accuracy/performance tradeoff on various computer vision and machine learning (ML) tasks. However, hybrid ML models can present system challenges for latency and energy efficiency due to their diverse nature in dataflow and memory access patterns. In this work, we leverage architecture heterogeneity from Neural Processing Units (NPU) and Compute-In-Memory (CIM) and explore diverse execution schemas to efficiently execute these hybrid models. We introduce H4H-NAS, a two-stage Neural Architecture Search (NAS) framework to automate the design of efficient hybrid CNN/ViT models for heterogeneous edge systems featuring both NPU and CIM. We propose a two-phase incremental supernet training in our NAS framework to resolve gradient conflicts between sampled subnets caused by different types of blocks in a hybrid model search space. Our H4H-NAS approach is also powered by a performance estimator built with NPU performance results measured on real silicon, and CIM performance based on industry IPs. H4H-NAS searches hybrid CNN-ViT models with fine granularity and achieves significant (up to 1.34%) top-1 accuracy improvement on ImageNet. Moreover, results from our algorithm/hardware co-design reveal up to 56.08% overall latency and 41.72% energy improvements by introducing heterogeneous computing over baseline solutions. Overall, our framework guides the design of hybrid network architectures and system architectures for NPU+CIM heterogeneous systems.
Free, publicly-accessible full text available January 20, 2026
SP-IMC: A Sparsity Aware In-Memory-Computing Macro in 28nm CMOS with Configurable Sparse Representation for Highly Sparse DNN Workloads

https://doi.org/10.1109/CICC60959.2024.10529009

Sridharan, Amitesh; Zhang, Fan; Seo, Jae-Sun; Fan, Deliang (April 2024, IEEE)

Deep neural networks (DNNs) have experienced unprecedented success in a variety of cognitive tasks, which has motivated their deployment on edge devices. DNNs consist largely of multiply-and-accumulate (MAC) operations and are both data and compute intensive. In-memory computing (IMC) methodologies have shown significant energy efficiency and throughput benefits for DNN workloads by reducing data movement and eliminating memory reads. Weight pruning in DNNs can further improve the energy/throughput of DNN hardware through reduced storage and compute. Unlike their ASIC counterparts, recent IMC works [1]–[3], [6] have not explored such sparse compression techniques to enable storage benefits and compute skipping. A recent work [4] attempted to exploit this by compressing weights using a binary map and a custom compression format. This is sub-optimal because the implementation requires a complex routing mechanism (butterfly routing), requires additional compute to decode compressed weights, and has limited flexibility in supporting different sparse encodings. Fig. 1 illustrates our motivations, the challenges of implementing weight compression in digital IMC designs, and the need for a new methodology to enable sparse compute directly on compressed weights. In this work, we present a novel sparsity-integrated IMC (SP-IMC) macro in 28nm CMOS which, for the first time, utilizes three popular sparse compression formats, i.e., coordinate representation (COO), run length encoding (RL), and N:M sparsity [7], all along the matrix column direction with tunable precisions. SP-IMC stores and directly processes the sparse compressed weights in the macro, achieving higher storage density, fewer re-write operations to the macro, and higher overall energy efficiency.
Full Text Available
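
The SP-IMC abstract above names three sparse formats applied along a weight matrix column: coordinate representation (COO), run length encoding, and N:M sparsity. The sketch below shows one common software interpretation of each for a single column, plus a dot product computed directly on the COO form. It says nothing about the macro's actual bit layout; the helper names and the (zero-run, value) pair encoding are assumptions for illustration.

```python
def to_coo(col):
    """COO: list of (row_index, value) for each nonzero weight."""
    return [(i, w) for i, w in enumerate(col) if w != 0]


def to_run_length(col):
    """Run length: list of (zeros_before, value); trailing zeros are implied."""
    out, zeros = [], 0
    for w in col:
        if w == 0:
            zeros += 1
        else:
            out.append((zeros, w))
            zeros = 0
    return out


def from_run_length(pairs, length):
    """Rebuild the dense column, padding implied trailing zeros."""
    col = []
    for zeros, w in pairs:
        col.extend([0] * zeros)
        col.append(w)
    col.extend([0] * (length - len(col)))
    return col


def satisfies_n_m(col, n, m):
    """True if every group of m consecutive weights has at most n nonzeros."""
    return all(
        sum(1 for w in col[i:i + m] if w != 0) <= n
        for i in range(0, len(col), m)
    )


def sparse_dot(coo, activations):
    """Multiply-accumulate that touches only the stored nonzero weights."""
    return sum(w * activations[i] for i, w in coo)


column = [0, 0, 5, 0, 3, 0, 0, 0]
print(to_coo(column))         # prints [(2, 5), (4, 3)]
print(to_run_length(column))  # prints [(2, 5), (1, 3)]
print(sparse_dot(to_coo(column), [1, 2, 3, 4, 5, 6, 7, 8]))  # prints 30
```

Real hardware would bound the width of the index and run-length fields, so long zero runs would need an escape or padding entry; that detail is left out here.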
