Title: Nonconvex Regularization for Network Slimming: Compressing CNNs Even More
In the last decade, convolutional neural networks (CNNs) have become the dominant models for various computer vision tasks, but they cannot be deployed on low-memory devices due to their high memory requirements and computational costs. One popular, straightforward approach to compressing CNNs is network slimming, which imposes an ℓ1 penalty on the channel-associated scaling factors in the batch normalization layers during training. In this way, channels with low scaling factors are identified as insignificant and are pruned from the models. In this paper, we propose replacing the ℓ1 penalty with the ℓp and transformed ℓ1 (Tℓ1) penalties, since these nonconvex penalties have outperformed ℓ1 in yielding sparser satisfactory solutions in various compressed sensing problems. In our numerical experiments, we demonstrate network slimming with ℓp and Tℓ1 penalties on VGGNet and DenseNet trained on CIFAR-10/100. The results show that the nonconvex penalties compress CNNs better than ℓ1. In addition, Tℓ1 preserves the model accuracy after channel pruning, and ℓ1/2 and ℓ3/4 yield compressed models with accuracies similar to ℓ1 after retraining.
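The penalties involved can be sketched in a few lines of plain Python. This is an illustrative toy, not the paper's training code: the transformed ℓ1 penalty uses the standard form ρ_a(x) = (a + 1)|x| / (a + |x|), and the parameter `a`, the scaling factors, and the pruning threshold below are all invented for illustration.

```python
# Toy sketch of the penalties used in network slimming (not the paper's code).
# `gammas` stands in for the batch-normalization scaling factors of one layer.

def l1_penalty(gammas):
    """Standard l1 penalty: sum of absolute scaling factors."""
    return sum(abs(g) for g in gammas)

def tl1_penalty(gammas, a=1.0):
    """Transformed l1 penalty: rho_a(x) = (a + 1)|x| / (a + |x|).
    It saturates toward a + 1 for large |x|, mimicking l0."""
    return sum((a + 1) * abs(g) / (a + abs(g)) for g in gammas)

def prune_channels(gammas, threshold=0.01):
    """Channels whose scaling factor falls below the threshold are pruned."""
    return [i for i, g in enumerate(gammas) if abs(g) >= threshold]

# Invented scaling factors: two near-zero channels that pruning should remove.
gammas = [0.9, 0.005, 0.4, 0.001, 0.7]
print(l1_penalty(gammas))          # ~2.006
print(tl1_penalty(gammas, a=1.0))
print(prune_channels(gammas))      # [0, 2, 4]
```

During training, one of these penalty values (weighted by a regularization constant) would be added to the loss; after training, the surviving channel indices determine the slimmed architecture.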
Award ID(s):
1854434 1952644
PAR ID:
10249356
Editor(s):
Bebis, G.
Date Published:
Journal Name:
International Symposium on Visual Computing, Lecture Notes in Computer Science
Volume:
12509
Page Range / eLocation ID:
39-53
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  2. Tabacu, Lucia (Ed.)
    Convolutional neural networks (CNNs) have recently been hugely successful, with superior accuracy and performance in various imaging applications such as classification, object detection, and segmentation. However, a highly accurate CNN model requires millions of parameters to be trained and utilized, and even a slight increase in performance can require significantly more parameters from added layers and/or more filters per layer. In practice, many of these weight parameters turn out to be redundant and extraneous, so the original, dense model can be replaced by a compressed version obtained by imposing inter- and intra-group sparsity on the layer weights during training. In this paper, we propose a nonconvex family of sparse group lasso that blends nonconvex regularization (e.g., transformed ℓ1, ℓ1 − ℓ2, and ℓ0), which induces sparsity on the individual weights, with ℓ2,1 regularization on the output channels of a layer. We apply variable splitting to the proposed regularization to develop an algorithm that consists of two steps per iteration: gradient descent and thresholding. Numerical experiments on various CNN architectures showcase the effectiveness of the nonconvex family of sparse group lasso in network sparsification, with test accuracy on par with the current state of the art.
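The two-step iteration (gradient descent, then thresholding) can be illustrated in the simplest convex case, ℓ1, where the thresholding step is the familiar soft-thresholding operator. This proximal-gradient sketch is a hypothetical illustration, not the paper's algorithm: the least-squares loss, step size, and penalty strength are all made up.

```python
# Toy two-step iteration: gradient descent on a smooth loss, then thresholding.
# Shown for l1 via proximal gradient on the loss 0.5 * ||w - b||^2.

def soft_threshold(x, t):
    """Shrink x toward zero by t; the thresholding step for the l1 penalty."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def prox_gradient(b, lam=0.5, step=0.5, iters=200):
    w = [0.0] * len(b)
    for _ in range(iters):
        # Step 1: gradient descent on the smooth loss; the gradient is (w - b).
        w = [wi - step * (wi - bi) for wi, bi in zip(w, b)]
        # Step 2: thresholding (the proximal step for the sparsity penalty).
        w = [soft_threshold(wi, step * lam) for wi in w]
    return w

b = [3.0, 0.2, -1.5, 0.05]
w = prox_gradient(b)
print(w)  # small entries of b are driven exactly to zero
```

Swapping soft thresholding for the proximal map of a nonconvex penalty (e.g., hard thresholding for ℓ0) gives the same algorithmic skeleton with a different sparsifying step.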
  3. This paper investigates the application of the ℓp quasi-norm, where 0 < p < 1, in contexts characterized by photon-limited signals, such as medical imaging and night vision. In these settings, low-photon-count images have typically been modeled using Poisson statistics, and in related algorithms the ℓ1 norm is commonly employed as a regularizer to promote sparsity in the reconstruction. However, recent research suggests that using the ℓp quasi-norm may yield lower reconstruction error. In this paper, we investigate the use of negative binomial statistics, which generalize Poisson models, in conjunction with the ℓp quasi-norm for recovering sparse signals in low-photon-count imaging settings.
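A tiny numeric illustration (all values invented) of why the ℓp quasi-norm with 0 < p < 1 favors sparser solutions than the ℓ1 norm:

```python
# Two vectors with the same l1 norm: one sparse, one spread out.
# Under l1 they cost the same; under lp with p < 1 the sparse one is cheaper.

def lp_power(x, p):
    """Sum of |x_i|^p, the quantity minimized in lp-regularized recovery."""
    return sum(abs(xi) ** p for xi in x)

sparse = [2.0, 0.0, 0.0, 0.0]   # one large coefficient
diffuse = [0.5, 0.5, 0.5, 0.5]  # same l1 norm, spread out

print(lp_power(sparse, 1.0), lp_power(diffuse, 1.0))  # 2.0 2.0
print(lp_power(sparse, 0.5), lp_power(diffuse, 0.5))  # sparse is now cheaper
```

Because |x|^p grows steeply near zero, many small coefficients are penalized more than one large one, which is the mechanism behind the sparser reconstructions mentioned above.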
  4. We study sparsification of convolutional neural networks (CNNs) by a relaxed variable splitting method with ℓ0 and transformed-ℓ1 (Tℓ1) penalties, with application to complex curves such as texts written in different fonts and words written with trembling hands, simulating those of Parkinson's disease patients. The CNN contains three convolutional layers, each followed by max pooling, and finally a fully connected layer, which contains the largest number of network weights. With the ℓ0 penalty, we achieved over 99% test accuracy in distinguishing shaky vs. regular fonts or handwriting, with more than 86% of the weights in the fully connected layer being zero. Comparable sparsity and test accuracy are also reached with a proper choice of Tℓ1 penalty.
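For the ℓ0 penalty, the thresholding step in such splitting methods is hard thresholding. The sketch below shows only that operator on made-up weights, using the threshold level sqrt(2λ/β) that arises from the ℓ0 proximal map; the rest of the method (the loss coupling and update schedule) is omitted here.

```python
import math

def hard_threshold(w, lam, beta):
    """l0 proximal map: keep w_i only if |w_i| exceeds sqrt(2 * lam / beta)."""
    t = math.sqrt(2.0 * lam / beta)
    return [wi if abs(wi) > t else 0.0 for wi in w]

# Invented weights; with lam=0.005 and beta=1.0 the threshold is 0.1.
weights = [0.8, -0.02, 0.003, -1.1, 0.05]
u = hard_threshold(weights, lam=0.005, beta=1.0)
sparsity = sum(1 for x in u if x == 0.0) / len(u)
print(u)         # small weights are set exactly to zero
print(sparsity)  # fraction of zeroed weights
```

Unlike soft thresholding, surviving weights are kept at full magnitude, which is why ℓ0-type penalties can reach high sparsity without shrinking the important weights.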
  5. A popular method for flexible function estimation in nonparametric models is the smoothing spline. When applying the smoothing spline method, the nonparametric function is estimated via penalized least squares, where the penalty imposes a soft constraint on the function to be estimated. The specification of the penalty functional is usually based on a set of assumptions about the function. Choosing a reasonable penalty function is the key to the success of the smoothing spline method. In practice, there may exist multiple sets of widely accepted assumptions, leading to different penalties, which then yield different estimates. We refer to this problem as the problem of ambiguous penalties. Neglecting the underlying ambiguity and proceeding to the model with one of the candidate penalties may produce misleading results. In this article, we adopt a Bayesian perspective and propose a fully Bayesian approach that takes into consideration all the penalties as well as the ambiguity in choosing them. We also propose a sampling algorithm for drawing samples from the posterior distribution. Data analysis based on simulated and real‐world examples is used to demonstrate the efficiency of our proposed method. 
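A discrete analogue of penalized least squares can be sketched in plain Python. This is an illustration, not the article's Bayesian method: the penalty matrix below takes second differences, and swapping in a different D is exactly the kind of penalty choice the article treats as ambiguous. All numbers are made up.

```python
# Discrete penalized least squares: minimize ||y - f||^2 + lam * ||D f||^2,
# solved in closed form via (I + lam * D^T D) f = y.

def second_diff_matrix(n):
    """(n-2) x n second-difference matrix D with rows [1, -2, 1]."""
    D = [[0.0] * n for _ in range(n - 2)]
    for i in range(n - 2):
        D[i][i], D[i][i + 1], D[i][i + 2] = 1.0, -2.0, 1.0
    return D

def solve(A, b):
    """Naive Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def penalized_fit(y, lam):
    """Solve (I + lam * D^T D) f = y for the penalized estimate f."""
    n = len(y)
    D = second_diff_matrix(n)
    A = [[(1.0 if i == j else 0.0) +
          lam * sum(D[k][i] * D[k][j] for k in range(n - 2))
          for j in range(n)] for i in range(n)]
    return solve(A, y)

y = [0.0, 1.2, 0.7, 2.1, 1.9]         # invented noisy observations
rough = penalized_fit(y, lam=0.01)    # small penalty: fit stays close to y
smooth = penalized_fit(y, lam=100.0)  # large penalty: nearly a straight line
```

The article's point is that several such penalty matrices may be equally defensible; its Bayesian approach averages over them rather than committing to one.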