Title: SSGD: Sparsity-Promoting Stochastic Gradient Descent Algorithm for Unbiased DNN Pruning
While deep neural networks (DNNs) have achieved state-of-the-art results in many fields, they are typically over-parameterized. Parameter redundancy, in turn, leads to inefficiency. Sparse signal recovery (SSR) techniques, on the other hand, find compact solutions to over-complete linear problems. Therefore, a logical step is to draw the connection between SSR and DNNs. In this paper, we explore the application of iterative reweighting methods popular in SSR to learning efficient DNNs. By efficient, we mean sparse networks that require less computation and storage than the original, dense network. We propose a reweighting framework to learn sparse connections within a given architecture without biasing the optimization process, by utilizing the affine scaling transformation strategy. The resulting algorithm, referred to as Sparsity-promoting Stochastic Gradient Descent (SSGD), has simple gradient-based updates which can be easily implemented in existing deep learning libraries. We demonstrate the sparsification ability of SSGD on image classification tasks and show that it outperforms existing methods on the MNIST and CIFAR-10 datasets.
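To make the "simple gradient-based updates" concrete, here is a minimal sketch of an iteratively reweighted SGD step in this spirit; the log-sum-style penalty gradient and all hyperparameter names are illustrative assumptions, not the paper's exact affine-scaled update.

```python
import numpy as np

def ssgd_step(w, grad, lr=1e-2, lam=1e-4, eps=1e-8):
    """One hypothetical sparsity-promoting SGD step.

    The reweighted term lam * w / (w**2 + eps) is the gradient of a
    log-sum diversity measure, a standard SSR reweighting choice: it
    shrinks small weights aggressively while leaving large weights
    nearly untouched, so large weights are not uniformly penalized.
    """
    penalty_grad = lam * w / (w ** 2 + eps)
    return w - lr * (grad + penalty_grad)
```

Because the extra term is just another gradient, such a step drops into any existing SGD loop, consistent with the abstract's claim that the updates are easy to implement in deep learning libraries.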
Award ID(s):
1838897
NSF-PAR ID:
10351923
Author(s) / Creator(s):
Date Published:
Journal Name:
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Page Range / eLocation ID:
5410 to 5414
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Optimization problems with group-sparse regularization are ubiquitous in various popular downstream applications, such as feature selection and compression for Deep Neural Networks (DNNs). Nonetheless, the existing methods in the literature do not perform particularly well when such regularization is used in combination with a stochastic loss function. In particular, it is challenging to design a computationally efficient algorithm that has a convergence guarantee and can compute group-sparse solutions. Recently, a half-space stochastic projected gradient ({\tt HSPG}) method was proposed that partly addressed these challenges. This paper presents a substantially enhanced version of {\tt HSPG}, which we call {\tt AdaHSPG+}, that makes two notable advances. First, {\tt AdaHSPG+} is shown to have a stronger convergence result under significantly looser assumptions than those required by {\tt HSPG}. This improvement is achieved by integrating variance-reduction techniques with a new adaptive strategy for iteratively predicting the support of a solution. Second, {\tt AdaHSPG+} requires significantly less parameter tuning than {\tt HSPG}, making it more practical and user-friendly. This advance is achieved by designing automatic and adaptive strategies for choosing the type of step employed at each iteration and for updating key hyperparameters. The numerical effectiveness of the proposed {\tt AdaHSPG+} algorithm is demonstrated on both convex and non-convex benchmark problems. The source code is available at \url{https://github.com/tianyic/adahspg}.
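For context on the problem class (not on {\tt HSPG} or {\tt AdaHSPG+} themselves), here is a minimal sketch of the group-lasso proximal step, block soft-thresholding, which produces exactly the kind of group-sparse solutions these methods target; the function and argument names are illustrative.

```python
import numpy as np

def prox_group_l2(w, groups, step, lam):
    """Proximal operator of step * lam * sum_g ||w_g||_2.

    Each group is shrunk toward zero and removed entirely once its
    norm drops below step * lam, yielding group-wise sparsity.
    """
    w = w.copy()
    for g in groups:                      # g: index array for one group
        norm = np.linalg.norm(w[g])
        w[g] *= max(0.0, 1.0 - step * lam / (norm + 1e-12))
    return w
```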
  2. Existing adversarial algorithms for Deep Reinforcement Learning (DRL) have largely focused on identifying an optimal time to attack a DRL agent. However, little work has explored injecting efficient adversarial perturbations into DRL environments. We propose a suite of novel DRL adversarial attacks, called ACADIA, representing AttaCks Against Deep reInforcement leArning. ACADIA provides a set of efficient and robust perturbation-based adversarial attacks to disturb the DRL agent's decision-making, based on novel combinations of momentum, the ADAM optimizer, Root Mean Square Propagation (RMSProp), and initial randomization. DRL attacks with this integration of techniques have not been studied in existing Deep Neural Network (DNN) and DRL research. We consider two well-known DRL algorithms, Deep Q-Network (DQN) and Proximal Policy Optimization (PPO), on Atari games and MuJoCo, where both targeted and non-targeted attacks are considered with and without state-of-the-art DRL defenses (RADIAL and ATLA). Our results demonstrate that the proposed ACADIA outperforms existing gradient-based counterparts under a wide range of experimental settings. ACADIA is nine times faster than the state-of-the-art Carlini & Wagner (CW) method, with better performance under DRL defenses.
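As a rough, hypothetical illustration of two of the named ingredients, momentum accumulation and initial randomization, consider the following perturbation loop; it is not ACADIA itself, and grad_fn, the L1 normalization, and the step schedule are assumptions.

```python
import numpy as np

def momentum_perturb(obs, grad_fn, eps=0.03, steps=10, mu=0.9):
    """Hypothetical momentum-based observation attack (not ACADIA itself).

    grad_fn(x) is assumed to return the gradient of the attacker's loss
    with respect to the observation x (e.g., a loss that degrades the
    agent's chosen action).
    """
    delta = np.random.uniform(-eps, eps, obs.shape)       # initial randomization
    g = np.zeros_like(obs)                                # momentum buffer
    alpha = eps / steps                                   # per-step size
    for _ in range(steps):
        grad = grad_fn(obs + delta)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)  # L1-normalized momentum
        delta = np.clip(delta + alpha * np.sign(g), -eps, eps)
    return obs + delta
```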
  3. In this paper, a novel way of deriving proportionate adaptive filters is proposed, based on diversity measure minimization using the iterative reweighting techniques well known in the sparse signal recovery (SSR) area. The resulting least mean square (LMS)-type and normalized LMS (NLMS)-type sparse adaptive filtering algorithms can incorporate various diversity measures that have proved effective in SSR. Furthermore, by setting the regularization coefficient of the diversity measure term to zero in the resulting algorithms, Sparsity-promoting LMS (SLMS) and Sparsity-promoting NLMS (SNLMS) are introduced, which exploit, but do not strictly enforce, the sparsity of the system response if it already exists. Moreover, unlike most existing proportionate algorithms that design the step-size control factors based on heuristics, our SSR-based framework leads to designing the factors in a more systematic way. Simulation results are presented to demonstrate the convergence behavior of the derived algorithms for systems with different sparsity levels.
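Below is a minimal sketch of a proportionate LMS-style update of this general kind; the magnitude-based gain rule shown is a common illustrative stand-in, not the paper's reweighting-derived step-size control.

```python
import numpy as np

def proportionate_lms_step(w, x, d, mu=0.5, eps=1e-6):
    """One hypothetical proportionate LMS update.

    The diagonal gain vector assigns larger step sizes to larger taps,
    so active (large) coefficients adapt faster on sparse systems; the
    gain rule here is illustrative, not the SSR-derived one.
    """
    e = d - w @ x                 # a-priori output error
    g = np.abs(w) + eps           # per-tap gains from current magnitudes
    g /= g.sum()                  # normalize so gains sum to one
    return w + mu * e * g * x, e
```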
  4. Deep neural networks (DNNs) are known for extracting useful information from large amounts of data. However, the representations learned in DNNs are typically hard to interpret, especially in dense layers. One crucial issue with classical DNN models such as the multilayer perceptron (MLP) is that neurons in the same layer are conditionally independent of each other, which makes co-training and the emergence of higher modularity difficult. In contrast to DNNs, biological neurons in mammalian brains display substantial dependency patterns. Specifically, biological neural networks encode representations through so-called neuronal assemblies: groups of neurons interconnected by strong synaptic interactions and sharing joint semantic content. The resulting population coding is essential for human cognitive and mnemonic processes. Here, we propose a novel Biologically Enhanced Artificial Neuronal assembly (BEAN) regularization to model neuronal correlations and dependencies, inspired by cell assembly theory from neuroscience. Experimental results show that BEAN enables the formation of interpretable neuronal functional clusters and consequently promotes a sparse, memory- and computation-efficient network without loss of model performance. Moreover, our few-shot learning experiments demonstrate that BEAN can also enhance the generalizability of the model when training samples are extremely limited.
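As a loose illustration only (an assumption, not the paper's BEAN term), a dependency-promoting regularizer could reward pairwise activation correlation within a layer:

```python
import numpy as np

def assembly_penalty(acts, lam=1e-3):
    """Illustrative dependency-promoting penalty (not the BEAN formula).

    acts: (batch, neurons) hidden activations. Added to the training
    loss, the negative sign rewards strong pairwise activation
    correlation, nudging neurons to form tightly coupled "assemblies".
    """
    a = acts - acts.mean(axis=0, keepdims=True)
    cov = a.T @ a / max(acts.shape[0] - 1, 1)
    std = np.sqrt(np.diag(cov)) + 1e-12
    corr = cov / np.outer(std, std)
    off = corr[~np.eye(corr.shape[0], dtype=bool)]   # off-diagonal entries
    return -lam * np.abs(off).mean()
```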
  5. Deep Neural Networks (DNNs) need to be both efficient and robust for practical use. Quantization and structure simplification are promising ways to adapt DNNs to mobile devices, and adversarial training is one of the most successful methods for training robust DNNs. In this work, we aim to realize both advantages by applying a convergent relaxation quantization algorithm, Binary-Relax (BR), to an adversarially trained robust model, the ResNets Ensemble via Feynman-Kac Formalism (EnResNet). We discover that higher-precision quantization, such as ternary (tnn) or 4-bit, produces sparse DNNs. However, this sparsity is unstructured under adversarial training. To address the problems that adversarial training jeopardizes DNNs' accuracy on clean images and breaks the structure of sparsity, we design a trade-off loss function that helps DNNs preserve natural accuracy and improve channel sparsity. With our newly designed trade-off loss function, we achieve both goals with no reduction of resistance under weak attacks and only a minor reduction of resistance under strong adversarial attacks. Together with our model and algorithm selections and loss-function design, we provide an integrated approach to producing robust DNNs with high efficiency and accuracy. Furthermore, we provide a missing benchmark on the robustness of quantized models.
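A minimal sketch of a trade-off objective of this general shape, assuming a convex blend of clean and adversarial losses plus a group penalty over per-channel weight vectors; the blending form, names, and coefficients are all illustrative:

```python
import numpy as np

def tradeoff_loss(nat_loss, adv_loss, channel_weights, alpha=0.5, lam=1e-4):
    """Hypothetical trade-off objective.

    Blends the clean-image loss and the adversarial loss to preserve
    natural accuracy, and adds a group-lasso term over per-channel
    weight vectors to push sparsity toward a structured (channel)
    pattern rather than scattered zeros.
    """
    group = sum(np.linalg.norm(w) for w in channel_weights)
    return alpha * nat_loss + (1.0 - alpha) * adv_loss + lam * group
```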