NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Closest Neighbors are Harmful for Lightweight Masked Auto-encoders

Meng, Jian; Hasssan, Ahmed; Yang, Li; Fan, Deliang; Shin, Jinwoo; Seo, Jae-sun (June 2025, IEEE/CVF Conference on Computer Vision and Pattern Recognition)

Free, publicly accessible full text available June 11, 2026
Quant-NeRF: Efficient End-to-End Quantization of Neural Radiance Fields with Low-Precision 3D Gaussian Representation

https://doi.org/10.1109/ICASSP49660.2025.10889510

Hasssan, Ahmed; Anupreetham, Anupreetham; Meng, Jian; Seo, Jae-sun (April 2025, IEEE)

Free, publicly accessible full text available April 6, 2026
BBS: Bi-Directional Bit-Level Sparsity for Deep Learning Acceleration

https://doi.org/10.1109/MICRO61859.2024.00048

Chen, Yuzong; Meng, Jian; Seo, Jae-sun; Abdelfattah, Mohamed S (November 2024, IEEE)

Full Text Available
SpQuant-SNN: ultra-low precision membrane potential with sparse activations unlock the potential of on-device spiking neural networks applications

https://doi.org/10.3389/fnins.2024.1440000

Hasssan, Ahmed; Meng, Jian; Anupreetham, Anupreetham; Seo, Jae-sun (September 2024, Frontiers in Neuroscience)

Spiking neural networks (SNNs) have received increasing attention due to their high biological plausibility and energy efficiency. Binary spike-based information propagation enables efficient sparse computation in event-based and static computer vision applications. However, in state-of-the-art SNN algorithms the weights, and especially the membrane potentials, remain high-precision values (e.g., 32 bits). Each neuron in an SNN stores its membrane potential over time and typically updates it at every time step. Such frequent reads and writes of high-precision membrane potentials incur storage and memory-access overhead, which undermines the compatibility of SNNs with resource-constrained hardware. To resolve this inefficiency, prior works have explored time-step reduction and low-precision membrane potential representation at a limited scale and reported significant accuracy drops. Furthermore, while recent advances in on-device AI apply pruning and quantization across different architectures and datasets, simultaneous pruning and quantization remains largely under-explored in SNNs. In this work, we present SpQuant-SNN, a fully quantized spiking neural network with ultra-low-precision weights and membrane potentials and high spatial-channel sparsity, enabling end-to-end low-precision SNN computation with significantly fewer operations. First, we propose an integer-only quantization scheme for the membrane potential with a stacked surrogate gradient function, a simple yet effective method that enables smooth training of quantized SNNs. Second, we implement spatial-channel pruning with a membrane potential prior to reduce the layer-wise computational complexity and the floating-point operations (FLOPs) of SNNs. Finally, to further improve the accuracy of low-precision, sparse SNNs, we propose a self-adaptive learnable potential threshold for SNN training.
With high biological adaptiveness and minimal computation and memory usage, SpQuant-SNN achieves state-of-the-art performance across multiple SNN models on both event-based and static image datasets, for both image classification and object detection tasks. Compared to the SOTA baseline, SpQuant-SNN achieves up to 13× memory reduction and >4.7× FLOPs reduction with ~1.8% accuracy degradation on both classification and object detection tasks.
Full Text Available
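The SpQuant-SNN abstract above centers on storing and updating membrane potentials as low-precision integers. As a rough illustration only, here is a minimal sketch of an integer-only leaky integrate-and-fire (LIF) neuron whose potential is saturated to a signed 4-bit range. The shift-based leak, hard reset, fixed threshold, and the function names `int_lif_step` and `run_neuron` are assumptions for illustration; this is not the paper's quantization scheme, its stacked surrogate gradient, or its learnable threshold.

```python
def int_lif_step(u: int, x: int, threshold: int,
                 bits: int = 4, leak_shift: int = 1) -> tuple[int, int]:
    """One time step of a hypothetical integer-only LIF neuron.

    u: stored membrane potential, a signed `bits`-bit integer
    x: integer input current for this step (e.g., a weighted spike sum)
    threshold: integer firing threshold (fixed here; SpQuant-SNN learns it)
    leak_shift: leak computed as u - (u >> leak_shift), which approximately
        multiplies u by (1 - 2**-leak_shift) using only a shift and a subtract
    Returns the new stored potential and the output spike (0 or 1).
    """
    qmin, qmax = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    u = u - (u >> leak_shift) + x        # integer leak, then integrate input
    u = max(qmin, min(qmax, u))          # saturate to the low-precision range
    spike = 1 if u >= threshold else 0
    if spike:
        u = 0                            # hard reset after a spike (assumed)
    return u, spike


def run_neuron(inputs: list[int], threshold: int, bits: int = 4) -> list[int]:
    """Simulate one neuron over an input sequence and return its spike train."""
    u, spikes = 0, []
    for x in inputs:
        u, s = int_lif_step(u, x, threshold, bits)
        spikes.append(s)
    return spikes


print(run_neuron([2, 2, 2, 2, 0, 5], threshold=3))  # [0, 1, 0, 1, 0, 1]
```

With `bits=4`, each stored potential occupies 4 bits instead of a 32-bit value, and every update uses only integer adds, shifts, and comparisons, which is the kind of read/write saving the abstract motivates.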
Spiking Neural Network with Learnable Threshold for Event-based Classification and Object Detection

https://doi.org/10.1109/IJCNN60899.2024.10650320

Hasssan, Ahmed; Meng, Jian; Seo, Jae-Sun (June 2024, International Joint Conference on Neural Networks)

Full Text Available
IM-SNN: Memory-Efficient Spiking Neural Network with Low-Precision Membrane Potentials and Weights

Hasssan, Ahmed; Meng, Jian; Anupreetham, Anupreetham; Seo, Jae-sun (August 2024, IEEE/ACM International Conference on Neuromorphic Systems (ICONS))

Full Text Available
Slimmed Asymmetrical Contrastive Learning and Cross Distillation for Lightweight Model Training

Meng, Jian; Yang, Li; Lee, Kyungmin; Shin, Jinwoo; Fan, Deliang; Seo, Jae-sun (December 2023, Thirty-seventh Conference on Neural Information Processing Systems)

Contrastive learning (CL) has been widely investigated with various learning mechanisms and achieves a strong capability to learn data representations in a self-supervised manner from unlabeled data. A common practice in this line of work is to employ large encoders to achieve performance comparable to the supervised learning counterpart. Despite the success of label-free training, current contrastive learning algorithms fail to achieve good performance with lightweight (compact) models, e.g., MobileNet, while the requirement for heavy encoders impedes energy-efficient computation, especially for resource-constrained AI applications. Motivated by this, we propose a new self-supervised CL scheme, named SACL-XD, consisting of two technical components, Slimmed Asymmetrical Contrastive Learning (SACL) and Cross-Distillation (XD), which collectively enable efficient CL with compact models. While relevant prior works employed a strong pre-trained model as the teacher for unsupervised knowledge distillation to a lightweight encoder, our proposed method trains CL models from scratch and outperforms them even without such an expensive requirement. Compared to the SoTA lightweight CL training (distillation) algorithms, SACL-XD achieves a 1.79% ImageNet-1K accuracy improvement on MobileNet-V3 with a 64× reduction in training FLOPs. Code is available at https://github.com/mengjian0502/SACL-XD.
Full Text Available
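The SACL-XD abstract does not spell out its SACL or XD objectives, so as a generic point of reference, here is a minimal sketch of the standard InfoNCE contrastive loss that many CL methods build on. It is not SACL-XD; the function names and the temperature of 0.5 are illustrative assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(z1: list[list[float]], z2: list[list[float]],
             temperature: float = 0.5) -> float:
    """Generic InfoNCE loss over a batch.

    z1[i] and z2[i] are embeddings of two augmented views of sample i
    (the positive pair); every z2[j] with j != i acts as a negative.
    """
    n = len(z1)
    total = 0.0
    for i in range(n):
        logits = [cosine(z1[i], z2[j]) / temperature for j in range(n)]
        m = max(logits)                                    # numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]               # -log softmax of positive
    return total / n


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(round(aligned, 3), round(shuffled, 3))  # 0.127 2.127
```

The loss is low when each view's embedding is closest to its own positive and high when positives are mismatched, which is the signal a contrastive encoder is trained to minimize.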
Algorithm-hardware Co-optimization for Energy-efficient Drone Detection on Resource-constrained FPGA

https://doi.org/10.1145/3583074

Suh, Han-Sok; Meng, Jian; Nguyen, Ty; Kumar, Vijay; Cao, Yu; Seo, Jae-Sun (June 2023, ACM Transactions on Reconfigurable Technology and Systems)

Convolutional neural network (CNN)-based object detection has achieved very high accuracy; e.g., single-shot multi-box detectors (SSDs) can efficiently detect and localize various objects in an input image. However, they require a large amount of computation and memory storage, which makes efficient inference difficult on resource-constrained hardware devices such as drones or unmanned aerial vehicles (UAVs). Drone/UAV detection is an important task for applications including surveillance, defense, and multi-drone self-localization and formation control. In this article, we designed and co-optimized an algorithm and hardware for energy-efficient drone detection on resource-constrained FPGA devices. We trained an SSD object detection algorithm with a custom drone dataset. For inference, we employed low-precision quantization and adapted the width of the SSD CNN model. To improve throughput, we use dual-data-rate operations for DSPs to effectively double the throughput with limited DSP counts. For different SSD algorithm models, we analyze accuracy, or mean average precision (mAP), and evaluate the corresponding FPGA hardware utilization, DRAM communication, and throughput optimization. We evaluated the FPGA hardware on a custom drone dataset, Pascal VOC, and COCO2017. Our proposed design achieves a high mAP of 88.42% on the multi-drone dataset, with a high energy efficiency of 79 GOPS/W and a throughput of 158 GOPS using the Xilinx Zynq ZU3EG FPGA device on the Open Vision Computer version 3 (OVC3) platform. Our design achieves 1.1× to 8.7× higher energy efficiency than prior works that used the same Pascal VOC dataset, using the same FPGA device at a low power consumption of 2.54 W. For the COCO dataset, our MobileNet-V1 implementation achieved an mAP of 16.8 and an energy efficiency of 4.9 FPS/W, which is ∼1.9× higher than prior FPGA works or other commercial hardware platforms.
Full Text Available
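The drone-detection abstract mentions low-precision quantization for inference without stating the scheme. For orientation, here is a minimal sketch of generic per-tensor symmetric uniform quantization with round-to-nearest; the function names, per-tensor scaling, and 4-bit example are illustrative assumptions, not the paper's FPGA datapath.

```python
def quantize_symmetric(weights: list[float], bits: int = 8) -> tuple[list[int], float]:
    """Per-tensor symmetric uniform quantization to signed `bits`-bit integers."""
    qmax = (1 << (bits - 1)) - 1                       # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0     # real value of one integer step
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map integers back to real values, e.g. to measure quantization error."""
    return [v * scale for v in q]


w = [0.5, -1.0, 0.25, 0.1]
q, s = quantize_symmetric(w, bits=4)
err = max(abs(a - b) for a, b in zip(w, dequantize(q, s)))
print(q, err <= s / 2 + 1e-12)
```

Storing `q` as 4-bit integers plus one floating-point `scale` per tensor keeps the rounding error within half a step, while letting the hardware multiply narrow integers instead of 32-bit values.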
Dot-product computation and logistic regression with 2D hexagonal-Boron Nitride (h-BN) memristor arrays

https://doi.org/10.1088/2053-1583/acdfe1

Afshari, Sahra; Radhakrishnan, Sritharini; Xie, Jing; Musisi-Nkambwe, Mirembe; Meng, Jian; He, Wangxin; Seo, Jae-sun; Sanchez Esqueda, Ivan (July 2023, 2D Materials)

This work reports on the hardware implementation of analog dot-product operations on arrays of 2D hexagonal boron nitride (h-BN) memristors. This extends beyond previous work that studied isolated device characteristics toward the application of analog neural network accelerators based on 2D memristor arrays. Wafer-level fabrication of the memristor arrays is enabled by large-area transfer of CVD-grown few-layer (8-layer) h-BN films. Individual devices achieve an on/off ratio of >10, low-voltage operation (V_set/V_reset ~0.5 V), good endurance (>6,000 programming steps), and good retention (>10⁴ s). The dot-product operation shows excellent linearity and repeatability, with low read energy consumption (~200 aJ to 20 fJ per operation) and minimal error and deviation over various measurement cycles. Moreover, we present the implementation of a stochastic logistic regression algorithm in 2D h-BN memristor hardware for the classification of noisy images. The promising resistive switching characteristics, dot-product computation performance, and successful demonstration of logistic regression in h-BN memristors signify an important step toward the integration of 2D materials into next-generation neuromorphic computing systems.
Full Text Available
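The h-BN abstract above reports analog dot-products computed on memristor arrays. In an ideal crossbar, each device contributes a current G·V (Ohm's law) and each column wire sums those currents (Kirchhoff's current law). Here is a minimal, noise-free sketch; the differential-pair mapping of signed weights, the 1 µS to 100 µS conductance range, and all function names are illustrative assumptions, not details from the paper, and real devices add nonlinearity, variation, and limited conductance levels.

```python
def weights_to_conductances(W: list[list[float]], g_min: float, g_max: float):
    """Map a signed weight matrix W (inputs x outputs) onto a differential pair
    of conductance matrices (g_pos, g_neg) with values in [g_min, g_max]."""
    w_max = max(abs(w) for row in W for w in row) or 1.0
    scale = (g_max - g_min) / w_max                    # siemens per unit weight
    g_pos = [[g_min + scale * max(w, 0.0) for w in row] for row in W]
    g_neg = [[g_min + scale * max(-w, 0.0) for w in row] for row in W]
    return g_pos, g_neg, scale


def column_currents(G: list[list[float]], V: list[float]) -> list[float]:
    """Ideal crossbar read: I_j = sum_i G[i][j] * V[i]."""
    return [sum(G[i][j] * V[i] for i in range(len(G))) for j in range(len(G[0]))]


def analog_dot_product(W: list[list[float]], V: list[float],
                       g_min: float = 1e-6, g_max: float = 1e-4) -> list[float]:
    """Estimate W^T V from the difference of two column-current readouts."""
    g_pos, g_neg, scale = weights_to_conductances(W, g_min, g_max)
    i_pos, i_neg = column_currents(g_pos, V), column_currents(g_neg, V)
    return [(p - n) / scale for p, n in zip(i_pos, i_neg)]


print(analog_dot_product([[1.0, -1.0], [0.5, 2.0]], [1.0, 2.0]))  # approximately [2.0, 3.0]
```

Subtracting the two column currents cancels the shared `g_min` offset, so the result depends only on the signed weights; a logistic-regression readout would then apply a sigmoid to these sums.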
PRIVE: Efficient RRAM Programming with Chip Verification for RRAM-based In-Memory Computing Acceleration

https://doi.org/10.23919/DATE56975.2023.10137266

He, Wangxin; Meng, Jian; Gonugondla, Sujan Kumar; Yu, Shimeng; Shanbhag, Naresh R; Seo, Jae-sun (April 2023, Design, Automation & Test in Europe Conference & Exhibition (DATE))

Full Text Available
