ReSpike: residual frames-based hybrid spiking neural networks for efficient action recognition

Xiao, Shiting (ORCID: 0009-0002-7592-9466); Li, Yuhang (ORCID: 0000-0002-6444-7253); Kim, Youngeun (ORCID: 0000-0002-3542-7720); Lee, Donghyun (ORCID: 0000-0002-2927-8255); Panda, Priyadarshini (ORCID: 0000-0002-4167-6782)

doi:10.1088/2634-4386/adb070

Abstract

Spiking Neural Networks (SNNs) have emerged as a compelling, energy-efficient alternative to traditional Artificial Neural Networks (ANNs) for static image tasks such as image classification and segmentation. In the more complex domain of video classification, however, SNN-based methods fall considerably short of ANN-based benchmarks because of the difficulty of processing dense RGB frames. To bridge this gap, we propose ReSpike, a hybrid framework that combines the strengths of ANNs and SNNs to tackle action recognition with high accuracy and low energy cost. ReSpike partitions video clips into RGB Key Frames, which primarily capture spatial information, and event-like Residual Frames, which emphasize temporal dynamics. It then uses an ANN to process the spatial features and an SNN to model the temporal features. In addition, we propose a multi-scale cross-attention mechanism for effective feature fusion. Compared to state-of-the-art SNN baselines, the ReSpike hybrid architecture delivers significant performance improvements (e.g., more than 30% absolute accuracy improvement on both the HMDB-51 and UCF-101 datasets). ReSpike is also the first SNN method capable of scaling to the large-scale Kinetics-400 benchmark. Furthermore, ReSpike achieves performance comparable to prior ANN approaches while offering a better accuracy-energy trade-off.
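The Key Frame / Residual Frame split described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a clip stored as an array of shape (T, H, W, C), takes the frame at a chosen index as the Key Frame, and defines each Residual Frame as the difference between consecutive frames. The function names `split_clip` and `to_events`, the `key_index` parameter, and the binarization `threshold` are hypothetical illustrations, not ReSpike's actual preprocessing pipeline.

```python
import numpy as np


def split_clip(clip: np.ndarray, key_index: int = 0):
    """Split a clip of shape (T, H, W, C) into one RGB key frame and T-1 residual frames.

    Assumptions (hypothetical, not taken from the paper): the key frame is the
    frame at `key_index`, and each residual frame is F_{t+1} - F_t.
    """
    if clip.ndim != 4 or clip.shape[0] < 2:
        raise ValueError("expected a clip of shape (T, H, W, C) with T >= 2")
    clip = clip.astype(np.float32)
    key_frame = clip[key_index]        # dense spatial content, intended for the ANN branch
    residuals = np.diff(clip, axis=0)  # frame-to-frame changes, shape (T-1, H, W, C)
    return key_frame, residuals


def to_events(residuals: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Binarize residual magnitudes into event-like {0, 1} maps (hypothetical threshold)."""
    magnitude = np.abs(residuals).mean(axis=-1)  # average over color channels -> (T-1, H, W)
    return (magnitude > threshold).astype(np.uint8)


# Usage: a toy 4-frame clip in which only pixel (0, 0) brightens at each step.
clip = np.zeros((4, 2, 2, 3), dtype=np.float32)
for t in range(4):
    clip[t, 0, 0, :] = 0.2 * t

key, res = split_clip(clip)
events = to_events(res)
print(key.shape, res.shape, events.shape)  # (2, 2, 3) (3, 2, 2, 3) (3, 2, 2)
print(events[:, 0, 0])                     # [1 1 1]
```

Differencing cancels static content, so in this toy clip only the changing pixel produces events. The residual stream is therefore sparse and event-like, whereas the key frame stays dense; in the abstract's framing, these two streams are what the SNN and ANN branches respectively consume.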
