Building accurate and efficient deep neural network (DNN) models that let intelligent sensing systems process data locally is essential. Spiking neural networks (SNNs) have gained significant popularity in recent years because they are more biologically plausible and energy-efficient than DNNs. However, SNNs usually achieve lower accuracy than DNNs. In this paper, we propose to use SNNs for image sensing applications and introduce a DNN-SNN knowledge distillation algorithm to reduce the accuracy gap between DNNs and SNNs. Our DNN-SNN knowledge distillation improves the accuracy of an SNN by transferring knowledge from a DNN to an SNN. To transfer the knowledge more effectively, the algorithm creates two learning paths from the DNN to the SNN: one between the output layers and one between intermediate layers. DNNs propagate real-valued activations between neurons, while SNNs communicate with 1-bit spikes; to bridge this gap, we use a decoder that converts spikes into real numbers. The algorithm also creates a learning path from the SNN back to the DNN, which adapts the DNN to the SNN by letting the DNN learn from the SNN. Our SNN models are deployed on Loihi, a chip specialized for SNN models. On the MNIST dataset, SNN models trained with DNN-SNN knowledge distillation achieve higher accuracy than SNN models trained on GPUs with other training algorithms, while consuming far less energy per image.
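To make the two learning paths concrete, here is a minimal PyTorch sketch of what such a distillation objective could look like, assuming a rate decoder that averages spikes over time. The function names (decode_spikes, distillation_loss), the loss weights, and the rate-based decoder are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a two-path DNN-to-SNN distillation objective.
import torch
import torch.nn.functional as F

def decode_spikes(spike_train):
    """Decode a binary spike train of shape (T, B, D) into real-valued
    activations by averaging over the T time steps (rate decoding)."""
    return spike_train.float().mean(dim=0)

def distillation_loss(dnn_logits, dnn_feat, snn_out_spikes, snn_feat_spikes,
                      labels, temperature=4.0, alpha=0.5, beta=0.5):
    snn_logits = decode_spikes(snn_out_spikes)   # output-layer path
    snn_feat = decode_spikes(snn_feat_spikes)    # intermediate-layer path

    # Path 1: match softened output distributions of the DNN teacher and SNN student.
    kd_out = F.kl_div(
        F.log_softmax(snn_logits / temperature, dim=1),
        F.softmax(dnn_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    # Path 2: match decoded intermediate features against the DNN's features.
    kd_feat = F.mse_loss(snn_feat, dnn_feat.detach())

    # Ordinary cross-entropy on the ground-truth labels.
    ce = F.cross_entropy(snn_logits, labels)

    return ce + alpha * kd_out + beta * kd_feat
```

The temperature softens the teacher's output distribution so the spiking student can match inter-class similarities rather than only the hard labels.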
Addressing Spectral Bias of Deep Neural Networks by Multi-Grade Deep Learning
Deep neural networks (DNNs) have showcased remarkable precision in approximating smooth functions. However, they suffer from spectral bias: DNNs typically prioritize learning the lower-frequency components of a function and struggle to capture its high-frequency features effectively. This paper addresses this issue. Note that a function having only low-frequency components may be well represented by a shallow neural network (SNN), a network with only a few layers. Observing that the composition of low-frequency functions can effectively approximate a high-frequency function, we propose to learn a function containing high-frequency components by composing several SNNs, each of which learns certain low-frequency information from the given data. We implement this idea with the multi-grade deep learning (MGDL) model, a recently introduced model that trains a DNN incrementally, grade by grade: the current grade learns, from the residue of the previous grade, only an SNN (with trainable parameters) composed with the SNNs (with fixed parameters) trained in the preceding grades, which serve as features. We apply MGDL to synthetic, manifold, colored-image, and MNIST datasets, all characterized by the presence of high-frequency features. Our study reveals that MGDL excels at representing functions containing high-frequency information. Specifically, the neural network learned in each grade adeptly captures some low-frequency information, allowing its composition with the SNNs learned in the previous grades to represent the high-frequency features effectively. Our experimental results underscore the efficacy of MGDL in addressing the spectral bias inherent in DNNs. By leveraging MGDL, we offer insights into overcoming the spectral bias limitation of DNNs, thereby enhancing the performance and applicability of deep learning models in tasks requiring the representation of high-frequency information. This study confirms that the proposed method offers a promising solution to the spectral bias of DNNs. The code is available on GitHub: Addressing Spectral Bias via MGDL.
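As a concrete illustration of the grade-by-grade idea, the sketch below trains each grade on the residue left by the previous grades and passes its frozen hidden features on to the next grade. The class and function names (Grade, train_mgdl), the two-layer architecture per grade, and the hyperparameters are illustrative assumptions, not the released implementation.

```python
# Minimal multi-grade deep learning (MGDL) sketch under assumed shapes:
# x: (N, d) inputs, y: (N, m) targets.
import torch
import torch.nn as nn

class Grade(nn.Module):
    """One grade: a shallow net producing both a prediction of the current
    residue and a hidden feature vector passed on to the next grade."""
    def __init__(self, in_dim, hidden, out_dim):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh())
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, z):
        h = self.hidden(z)
        return self.head(h), h

def train_mgdl(x, y, num_grades=3, hidden=64, epochs=2000, lr=1e-3):
    grades, prediction = [], torch.zeros_like(y)
    features = x                                  # grade-1 features are the raw inputs
    for _ in range(num_grades):
        residue = y - prediction                  # what earlier grades have not explained
        grade = Grade(features.shape[1], hidden, y.shape[1])
        opt = torch.optim.Adam(grade.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            pred, _ = grade(features)
            loss = ((pred - residue) ** 2).mean()
            loss.backward()
            opt.step()
        # Freeze this grade; its hidden output becomes the next grade's input features.
        for p in grade.parameters():
            p.requires_grad_(False)
        with torch.no_grad():
            pred, features = grade(features)
        prediction = prediction + pred            # superposition of all grades so far
        grades.append(grade)
    return grades, prediction
```

A quick sanity check in the spirit of the synthetic experiments above is to fit a 1-D target such as sin(2πx) + 0.5 sin(50πx) and track how the residue shrinks grade by grade.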
- Award ID(s): 2208386
- PAR ID: 10627171
- Publisher / Repository: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
- Date Published:
- Page Range / eLocation ID: 1-25
- Format(s): Medium: X
- Location: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
- Sponsoring Org: National Science Foundation
More Like this
We studied the use of deep neural networks (DNNs) in the numerical solution of the oscillatory Fredholm integral equation of the second kind. It is known that the solution of the equation exhibits certain oscillatory behaviors due to the oscillation of the kernel. It was pointed out recently that standard DNNs favor low-frequency functions and, as a result, often produce poor approximations of functions containing high-frequency components. We addressed this issue in this study. We first developed a numerical method for solving the equation with a DNN as the approximate solution, designing a numerical quadrature tailored to computing oscillatory integrals involving DNNs. We proved that the error of the DNN approximate solution of the equation is bounded by the training loss and the quadrature error. We then proposed a multi-grade deep learning (MGDL) model to overcome the spectral bias issue of neural networks. Numerical experiments demonstrate that the MGDL model is effective in extracting multiscale information of the oscillatory solution and in overcoming the spectral bias issue from which a standard DNN model suffers.
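For readers who want a starting point, here is an illustrative collocation-style sketch of training a DNN to satisfy a Fredholm equation of the second kind, u(x) − ∫₀¹ K(x,t) u(t) dt = f(x). The composite trapezoidal rule, the test kernel, and the right-hand side below are placeholder assumptions standing in for the oscillatory-integral quadrature and problem data developed in the paper.

```python
# Collocation residual for a Fredholm equation of the second kind, with the
# integral replaced by a fixed quadrature rule (trapezoid, as a placeholder).
import torch
import torch.nn as nn

def fredholm_residual(u_net, kernel, f, x_colloc, t_quad, w_quad):
    """Residual u(x) - sum_j w_j K(x, t_j) u(t_j) - f(x) at the collocation points."""
    u_x = u_net(x_colloc)                               # (n, 1)
    u_t = u_net(t_quad)                                 # (m, 1)
    K = kernel(x_colloc, t_quad.T)                      # (n, m) by broadcasting
    integral = (K * (w_quad.T * u_t.T)).sum(dim=1, keepdim=True)
    return u_x - integral - f(x_colloc)

# Assumed oscillatory test kernel and right-hand side, purely for illustration.
kernel = lambda x, t: torch.cos(40.0 * (x - t))
f = lambda x: torch.sin(40.0 * x)

u_net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                      nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
x_colloc = torch.linspace(0, 1, 128).unsqueeze(1)
t_quad = torch.linspace(0, 1, 256).unsqueeze(1)
w_quad = torch.full_like(t_quad, 1.0 / 255)             # trapezoid weights, h = 1/255
w_quad[0] *= 0.5; w_quad[-1] *= 0.5                     # halve the endpoint weights

opt = torch.optim.Adam(u_net.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = fredholm_residual(u_net, kernel, f, x_colloc, t_quad, w_quad).pow(2).mean()
    loss.backward()
    opt.step()
```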
Deep learning requires solving a nonconvex optimization problem of a large size to learn a deep neural network (DNN). The current deep learning model is of a single grade; that is, it trains a DNN end-to-end by solving a single nonconvex optimization problem. When the layer number of the neural network is large, it is computationally challenging to carry out such a task efficiently. The complexity of the task comes from learning all weight matrices and bias vectors from one single nonconvex optimization problem of a large size. Inspired by the human education process, which arranges learning in grades, we propose a multi-grade learning model: instead of solving one single optimization problem of a large size, we successively solve a number of optimization problems of small sizes, organized in grades, to learn a shallow neural network (a network having a few hidden layers) for each grade. Specifically, the current grade learns the leftover from the previous grade. In each grade, we learn a shallow neural network stacked on top of the neural network learned in the previous grades, whose parameters remain unchanged in training of the current and future grades. By dividing the task of learning a DNN into learning several shallow neural networks, one can alleviate the severity of the nonconvexity of the original large-size optimization problem. When all grades of learning are completed, the final neural network learned is a stair-shape neural network, which is the superposition of the networks learned from all grades. Such a model enables us to learn a DNN much more effectively and efficiently. Moreover, multi-grade learning naturally leads to adaptive learning. We prove that, in the context of function approximation, if the neural network generated by a new grade is nontrivial, its optimal error is strictly smaller than the optimal error of the previous grade. Furthermore, we provide numerical examples which confirm that the proposed multi-grade model significantly outperforms the standard single-grade model and is much more robust to noise. They include three proof-of-concept examples; classification on the benchmark data sets MNIST and Fashion-MNIST with two noise rates, which amounts to finding classifiers that are functions of 784 dimensions; and numerical solutions of the one-dimensional Helmholtz equation.
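To complement the training-time picture, the short sketch below shows how the final stair-shape network could be evaluated: the prediction is the superposition of each grade's output, with each grade consuming the frozen hidden features produced by the one before it. It assumes each trained grade returns a (prediction, hidden features) pair, as in the MGDL sketch earlier on this page; the function name is illustrative.

```python
def stair_shape_forward(grades, x):
    """Evaluate the stair-shape network: sum of per-grade predictions,
    where grade i reads the hidden features emitted by grade i-1."""
    features, total = x, None
    for grade in grades:
        pred, features = grade(features)       # hidden features feed the next grade
        total = pred if total is None else total + pred
    return total
```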
We present a new backpropagation-based training algorithm for discrete-time spiking neural networks (SNNs). Inspired by recent deep learning algorithms for binarized neural networks, binary activation with a straight-through gradient estimator is used to model the leaky integrate-and-fire spiking neuron, overcoming the difficulty of training SNNs with backpropagation. Two SNN training algorithms are proposed: (1) SNN with discontinuous integration, which is suitable for rate-coded input spikes, and (2) SNN with continuous integration, which is more general and can handle input spikes carrying temporal information. Neuromorphic hardware designed in 28 nm CMOS exploits spike sparsity and demonstrates high classification accuracy (>98% on MNIST) and low energy consumption (51.4–773 nJ/image).
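Below is a minimal sketch of the binary activation with a straight-through gradient estimator described above, applied to a discrete-time leaky integrate-and-fire neuron in PyTorch. The surrogate-gradient window, the unit threshold, and the decay constant are illustrative assumptions rather than the exact formulation used in the paper or on the 28 nm chip.

```python
import torch

class SpikeSTE(torch.autograd.Function):
    """Binary spike activation with a straight-through gradient estimator."""
    @staticmethod
    def forward(ctx, membrane):
        ctx.save_for_backward(membrane)
        return (membrane >= 1.0).float()          # 1-bit spike when v crosses threshold 1.0
    @staticmethod
    def backward(ctx, grad_output):
        (membrane,) = ctx.saved_tensors
        # Pass the gradient straight through where the membrane is near the threshold.
        return grad_output * ((membrane - 1.0).abs() <= 0.5).float()

def lif_step(v, input_current, decay=0.9):
    """One discrete-time step of a leaky integrate-and-fire neuron."""
    v = decay * v + input_current                 # leaky integration
    spike = SpikeSTE.apply(v)
    v = v - spike                                 # reset by subtracting the (unit) threshold
    return v, spike
```

Unrolling lif_step over the input time steps and backpropagating through the surrogate gradient is what allows such an SNN to be trained end-to-end with standard optimizers.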
Spiking neural networks (SNNs) are quickly gaining traction as a viable alternative to deep neural networks (DNNs). Compared to DNNs, SNNs are computationally more powerful and energy efficient. The design metrics (synaptic weights, membrane threshold, etc.) chosen for such SNN architectures are often proprietary and constitute confidential intellectual property (IP). Our study indicates that SNN architectures implemented using conventional analog neurons are susceptible to side-channel attacks (SCAs). Unlike conventional SCAs, which aim to leak private keys from cryptographic implementations, SCANN (SCA of spiking neural networks) can reveal the sensitive IP implemented within the SNN through the power side channel. We demonstrate eight unique SCANN attacks using a common analog neuron (the axon hillock neuron) as the test case; we chose this particular model since it is biologically plausible and hence a good fit for SNNs. Simulation results indicate that different synaptic weights, neurons per layer, neuron membrane thresholds, and neuron capacitor sizes (the building blocks of an SNN) yield distinct power and spike timing signatures, making them vulnerable to SCAs. We show that an adversary can use templates (built from foundry-calibrated simulations or by fabricating known design parameters in test chips) and analysis to identify the specifications of the implemented SNN.
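As a rough illustration of the template idea described above (not the paper's actual attack code), the snippet below matches an observed power trace against a library of candidate traces, one per hypothesized set of design parameters, using Pearson correlation. The function name and the synthetic-trace assumption are hypothetical; the paper builds its templates from foundry-calibrated simulations or test chips.

```python
import numpy as np

def best_matching_template(observed_trace, templates):
    """templates: dict mapping candidate design parameters -> simulated power trace
    (all traces assumed to be 1-D arrays of equal length)."""
    best_params, best_score = None, -np.inf
    for params, template in templates.items():
        score = np.corrcoef(observed_trace, template)[0, 1]   # Pearson correlation
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```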
