NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Trimming Down Large Spiking Vision Transformers Via Heterogeneous Quantization Search

https://doi.org/10.1109/ASAP65064.2025.00016

Xu, Boxun; Song, Yufei; Li, Peng (July 2025, IEEE)

Free, publicly-accessible full text available July 28, 2026
Bishop: Sparsified Bundling Spiking Transformers on Heterogeneous Cores with Error-constrained Pruning

https://doi.org/10.1145/3695053.3731063

Xu, Boxun; Yin, Yuxuan; Iyer, Vikram; Li, Peng (June 2025, International Symposium on Computer Architecture (ISCA))

Free, publicly-accessible full text available June 20, 2026
Backpropagation-based learning with local derivative approximation and memory replay in biologically plausible neural systems

https://doi.org/10.1016/j.neucom.2025.129804

Boone, Richard; Li, Peng (June 2025, Neurocomputing)

Free, publicly-accessible full text available June 1, 2026
HoSNNs: Adversarially-Robust Homeostatic Spiking Neural Networks with Adaptive Firing Thresholds

Geng, Hejia; Li, Peng (March 2025, Transactions on Machine Learning Research)

Free, publicly-accessible full text available March 1, 2026
Spiking Transformer Hardware Accelerators in 3D Integration

Xu, Boxun; Hwang, Junyoung; Vanna-iampikul, Pruek; Lim, Sung_Kyu; Li, Peng (October 2024, IEEE/ACM International Conference on Computer-Aided Design (ICCAD ’24))

Spiking neural networks (SNNs) are powerful models of spatiotemporal computation and are well suited for deployment on resource-constrained edge devices and neuromorphic hardware due to their low power consumption. Leveraging attention mechanisms similar to those found in their artificial neural network counterparts, recently emerged spiking transformers have showcased promising performance and efficiency by capitalizing on the binary nature of spiking operations. Recognizing the current lack of dedicated hardware support for spiking transformers, this paper presents the first work on 3D spiking transformer hardware architecture and design methodology. We present an architecture and physical design co-optimization approach tailored specifically for spiking transformers. Through memory-on-logic and logic-on-logic stacking enabled by 3D integration, we demonstrate significant energy and delay improvements compared to conventional 2D CMOS integration.
more » « less
Full Text Available
Systolic Array Acceleration of Spiking Neural Networks with Application-Independent Split-Time Temporal Coding

Lee, Jeong-Jun; Li, Peng (August 2024, 2024 ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED))

Spiking Neural Networks (SNNs) are brain-inspired computing models with event-driven based low-power operations and unique temporal dynamics. However, spatial and temporal dynamics in SNNs pose a significant overhead in accelerating neural computations and limit the computing capabilities of neuromorphic accelerators. Especially, unstructured sparsity emergent in both space and time, i.e., across neurons and time points, and iterative computations across time points cause a primary bottleneck in data movement. In this work, we propose a novel technique and architecture that allow the exploitation of temporal information compression with structured sparsity and parallelism across time, and significantly improves data movement on a systolic array. We split a full range of temporal domain into several time windows (TWs) where a TW packs multiple time points, and encode the temporal information in each TW with Split-Time Temporal coding (STT) by limiting the number of spikes within a TW up to one. STT enables sparsification and structurization of irregular firing activities and dramatically reduces computational overhead while delivering competitive classification accuracy without a huge drop. To further improve the data reuse, we propose an Integration Through Time (ITT) technique that processes integration steps across different TWs in parallel with a systolic array. The proposed architecture with STT and ITT offers an application-independent solution for spike-based models across various types of layers and networks. The proposed architecture delivers 97X latency and 78X energy efficiency improvements on average over a conventional SNN baseline on different benchmarks.
more » « less
Full Text Available
Composing recurrent spiking neural networks using locally-recurrent motifs and risk-mitigating architectural optimization

https://doi.org/10.3389/fnins.2024.1412559

Zhang, Wenrui; Geng, Hejia; Li, Peng (June 2024, Frontiers in Neuroscience)

In neural circuits, recurrent connectivity plays a crucial role in network function and stability. However, existing recurrent spiking neural networks (RSNNs) are often constructed by random connections without optimization. While RSNNs can produce rich dynamics that are critical for memory formation and learning, systemic architectural optimization of RSNNs is still an open challenge. We aim to enable systematic design of large RSNNs via a new scalable RSNN architecture and automated architectural optimization. We compose RSNNs based on a layer architecture called Sparsely-Connected Recurrent Motif Layer (SC-ML) that consists of multiple small recurrent motifs wired together by sparse lateral connections. The small size of the motifs and sparse inter-motif connectivity leads to an RSNN architecture scalable to large network sizes. We further propose a method called Hybrid Risk-Mitigating Architectural Search (HRMAS) to systematically optimize the topology of the proposed recurrent motifs and SC-ML layer architecture. HRMAS is an alternating two-step optimization process by which we mitigate the risk of network instability and performance degradation caused by architectural change by introducing a novel biologically-inspired “self-repairing” mechanism through intrinsic plasticity. The intrinsic plasticity is introduced to the second step of each HRMAS iteration and acts as unsupervised fast self-adaptation to structural and synaptic weight modifications introduced by the first step during the RSNN architectural “evolution.” We demonstrate that the proposed automatic architecture optimization leads to significant performance gains over existing manually designed RSNNs: we achieve 96.44% on TI46-Alpha, 94.66% on N-TIDIGITS, 90.28% on DVS-Gesture, and 98.72% on N-MNIST. To the best of the authors' knowledge, this is the first work to perform systematic architecture optimization on RSNNs.
more » « less
Full Text Available
Parallel Time Batching: Systolic-Array Acceleration of Sparse Spiking Neural Computation

https://doi.org/10.1109/HPCA53966.2022.00031

Lee, Jeong-Jun; Zhang, Wenrui; Li, Peng (April 2022, 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA))

Spiking Neural Networks (SNNs) are brain- inspired computing models incorporating unique temporal dynamics and event-driven processing. Rich dynamics in both space and time offer great challenges and opportunities for efficient processing of sparse spatiotemporal data compared with conventional artificial neural networks (ANNs). Specifically, the additional overheads for handling the added temporal dimension limit the computational capabilities of neuromorphic accelerators. Iterative processing at every time-point with sparse inputs in a temporally sequential manner not only degrades the utilization of the systolic array but also intensifies data movement.In this work, we propose a novel technique and architecture that significantly improve utilization and data movement while efficiently handling temporal sparsity of SNNs on systolic arrays. Unlike time-sequential processing in conventional SNN accelerators, we pack multiple time points into a single time window (TW) and process the computations induced by active synaptic inputs falling under several TWs in parallel, leading to the proposed parallel time batching. It allows weight reuse across multiple time points and enhances the utilization of the systolic array with reduced idling of processing elements, overcoming the irregularity of sparse firing activities. We optimize the granularity of time-domain processing, i.e., the TW size, which significantly impacts the data reuse and utilization. We further boost the utilization efficiency by simultaneously scheduling non-overlapping sparse spiking activities onto the array. The proposed architectures offer a unifying solution for general spiking neural networks with commonly exhibited temporal sparsity, a key challenge in hardware acceleration, delivering 248X energy-delay product (EDP) improvement on average compared to an SNN baseline for accelerating various networks. Compared to ANN based accelerators, our approach improves EDP by 47X on the CIFAR10 dataset.
more » « less
Full Text Available
Spiking Neural Networks with Laterally-Inhibited Self-Recurrent Units

Zheng, Wenrui; Li, Peng (July 2021, 2021 International Joint Conference on Neural Networks)
null (Ed.)
In biological brains, recurrent connections play a crucial role in cortical computation, modulation of network dynamics, and communication. However, in recurrent spiking neural networks (SNNs), recurrence is mostly constructed by random connections. How excitatory and inhibitory recurrent connections affect network responses and what kinds of connectivity benefit learning performance is still obscure. In this work, we propose a novel recurrent structure called the Laterally-Inhibited Self-Recurrent Unit (LISR), which consists of one excitatory neuron with a self-recurrent connection wired together with an inhibitory neuron through excitatory and inhibitory synapses. The self-recurrent connection of the excitatory neuron mitigates the information loss caused by the firing-and-resetting mechanism and maintains the long-term neuronal memory. The lateral inhibition from the inhibitory neuron to the corresponding excitatory neuron, on the one hand, adjusts the firing activity of the latter. On the other hand, it plays as a forget gate to clear the memory of the excitatory neuron. Based on speech and image datasets commonly used in neuromorphic computing, RSNNs based on the proposed LISR improve performance significantly by up to 9.26% over feedforward SNNs trained by a state-of-the-art backpropagation method with similar computational costs.
more » « less
Full Text Available
Systolic-Array Spiking Neural Accelerators with Dynamic Heterogeneous Voltage Regulation

Lee, Jeong-Jun; Chen, Jianhao; Zhang, Wenrui; Li, Peng (July 2021, The 2021 International Joint Conference on Neural Networks (IJCNN))
null (Ed.)
Spiking neural networks (SNNs) have emerged as a new generation of neural networks, presenting a brain-inspired event-driven model with advantages in spatiotemporal information processing. Due to the need for high power consumption of compute-intensive neural accelerators, adequate power delivery network (PDN) design is a key requirement to ensure power efficiency and integrity. However, PDN design for SNN accelerators has not been extensively studied despite its great potential benefit in energy efficiency. In this paper, we present the first study on dynamic heterogeneous voltage regulation (HVR) for spiking neural accelerators to maximize system energy efficiency while ensuring power integrity. We propose a novel sparse-workload-aware dynamic PDN control policy, which enables high energy efficiency of sparse spiking computation on a systolic array. By exploring sparse inputs and all-or-none nature of spiking computations for PDN control, we explore different types of PDNs to accelerate spiking convolutional neural networks (S-CNNs) trained with the dynamic vision sensor (DVS) gesture dataset. Furthermore, we demonstrate various power gating schemes to further optimize the proposed PDN architecture, which leads to a more than a three-fold reduction in total energy overhead for spiking neural computations on systolic array-based accelerators.
more » « less
Full Text Available

« Prev Next »

Search for: All records