Title: Learning Sparse Neural Networks via ℓ0 and Tℓ1 by a Relaxed Variable Splitting Method with Application to Multi-scale Curve Classification
We study sparsification of convolutional neural networks (CNN) by a relaxed variable splitting method with ℓ0 and transformed-ℓ1 (Tℓ1) penalties, with application to classifying complex curves such as text written in different fonts and words written with trembling hands simulating those of Parkinson's disease patients. The CNN contains three convolutional layers, each followed by max pooling, and finally a fully connected layer, which contains the largest number of network weights. With the ℓ0 penalty, we achieve over 99% test accuracy in distinguishing shaky from regular fonts or handwriting, with more than 86% of the weights in the fully connected layer being zero. Comparable sparsity and test accuracy are also reached with a proper choice of Tℓ1 penalty.
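As a rough illustration of the relaxed variable splitting idea above, the sketch below alternates a gradient step on the network weights with a thresholding step on an auxiliary variable, using hard thresholding for the ℓ0 penalty; the Tℓ1 penalty ρ_a(x) = (a+1)|x|/(a+|x|) is included only as a function, since its thresholding formula is omitted here. The split objective f(w) + λ‖u‖_0 + (β/2)‖w − u‖², the step size, and the parameter names are illustrative assumptions rather than the paper's exact formulation.

```python
# Sketch of one relaxed-variable-splitting iteration for
#   f(w) + lam * ||u||_0 + (beta / 2) * ||w - u||^2
# grad_f, lr, lam, and beta are placeholder choices, not the paper's settings.
import numpy as np

def tl1_penalty(w, a=1.0):
    """Transformed-l1 penalty rho_a(x) = (a + 1)|x| / (a + |x|), summed over entries."""
    return np.sum((a + 1.0) * np.abs(w) / (a + np.abs(w)))

def rvsm_step(w, u, grad_f, lr=1e-2, lam=1e-3, beta=1.0):
    # Gradient-descent step on the relaxed objective in w, with u held fixed.
    w = w - lr * (grad_f(w) + beta * (w - u))
    # Thresholding step in u: the proximal map of (lam / beta) * ||.||_0 at w,
    # i.e. hard thresholding with threshold sqrt(2 * lam / beta).
    u = np.where(np.abs(w) > np.sqrt(2.0 * lam / beta), w, 0.0)
    return w, u
```

In this sketch, iterating the two steps sparsifies u while the quadratic coupling keeps w close to u.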
Award ID(s):
1632935
PAR ID:
10112757
Author(s) / Creator(s):
Date Published:
Journal Name:
6th World Congress on Global Optimization
Page Range / eLocation ID:
800-809
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Tabacu, Lucia (Ed.)
    Convolutional neural networks (CNN) have recently been hugely successful, with superior accuracy and performance in various imaging applications such as classification, object detection, and segmentation. However, a highly accurate CNN model requires millions of parameters to be trained and utilized, and even a slight increase in performance requires significantly more parameters from adding more layers and/or increasing the number of filters per layer. In practice, many of these weight parameters turn out to be redundant, so the original, dense model can be replaced by a compressed version obtained by imposing inter- and intra-group sparsity on the layer weights during training. In this paper, we propose a nonconvex family of sparse group lasso that blends nonconvex regularization (e.g., transformed ℓ1, ℓ1 − ℓ2, and ℓ0), which induces sparsity on the individual weights, with ℓ2,1 regularization on the output channels of a layer. We apply variable splitting to the proposed regularization to develop an algorithm that consists of two steps per iteration: gradient descent and thresholding. Numerical experiments on various CNN architectures showcase the effectiveness of the nonconvex family of sparse group lasso in network sparsification, with test accuracy on par with the current state of the art.
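As a rough sketch of the two steps per iteration mentioned above (gradient descent and thresholding), the code below pairs an elementwise hard-thresholding step (standing in for the ℓ0 term) with channel-wise group soft thresholding for the ℓ2,1 term on a convolutional weight tensor. The way the two proximal maps are composed, the regularization weights lam_w and lam_g, and the learning rate are assumptions for illustration, not the paper's exact algorithm.

```python
# Illustrative gradient-descent + thresholding iteration for a sparse group
# lasso style regularizer on a conv weight tensor of shape
# (out_channels, in_channels, k, k). Parameter names and values are placeholders.
import numpy as np

def sgl_threshold(w, lam_w=1e-3, lam_g=1e-2):
    # Elementwise hard thresholding (proximal map of lam_w * ||w||_0).
    u = np.where(np.abs(w) > np.sqrt(2.0 * lam_w), w, 0.0)
    # Group soft thresholding (proximal map of lam_g * sum_c ||u_c||_2),
    # with one group per output channel.
    norms = np.sqrt((u ** 2).reshape(u.shape[0], -1).sum(axis=1))
    scale = np.maximum(0.0, 1.0 - lam_g / np.maximum(norms, 1e-12))
    return u * scale.reshape(-1, 1, 1, 1)

def train_step(w, grad_f, lr=1e-2, **reg):
    w = w - lr * grad_f(w)          # gradient-descent step on the data loss
    return sgl_threshold(w, **reg)  # thresholding step enforcing group sparsity
```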
  2. We trained convolutional neural networks (CNNs) to suppress off-axis scattering in the short-time Fourier transform (STFT) domain. Our training data were point-target responses from simulated anechoic cysts. We used random neural architecture search to build CNN models with variable input formulations, layer sizes, and training hyperparameters. Our results showed that CNNs were easier to train, as they required fewer network weights to match the performance of fully connected networks (FCNs). The best CNN models achieved comparable phantom CNRs with two to three orders of magnitude fewer weights.
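A toy sketch of the random architecture search loop described above: sample CNN hyperparameters from a search space, train and score each candidate, and keep the best. The search space, the STFT input formulations, and the train_and_score callback are placeholder assumptions, not the study's actual ranges or training pipeline.

```python
# Minimal random architecture search: draw configurations at random, evaluate
# each with a user-supplied training routine, and return the best one.
import random

SEARCH_SPACE = {
    "num_conv_layers": [2, 3, 4],
    "filters": [8, 16, 32, 64],
    "kernel_size": [3, 5, 7],
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "input_format": ["real_imag", "mag_phase"],  # hypothetical STFT formulations
}

def sample_config():
    return {name: random.choice(choices) for name, choices in SEARCH_SPACE.items()}

def random_search(train_and_score, n_trials=50):
    """train_and_score(config) -> validation metric, higher is better."""
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config()
        score = train_and_score(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```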
  3. Sparsification of neural networks is one of the effective complexity-reduction methods for improving efficiency and generalizability, and binarized activation offers additional computational savings at inference. Due to the vanishing-gradient issue in training networks with binarized activation, a coarse gradient (a.k.a. the straight-through estimator) is adopted in practice. In this paper, we study the problem of coarse gradient descent (CGD) learning of a one-hidden-layer convolutional neural network (CNN) with binarized activation function and sparse weights. It is known that when the input data are Gaussian distributed, a no-overlap one-hidden-layer CNN with ReLU activation and general weights can be learned by GD in polynomial time with high probability in regression problems with ground truth. We propose a relaxed variable splitting method integrating thresholding and coarse gradient descent; the sparsity in the network weights is realized through thresholding during the CGD training process. We prove that under thresholding of ℓ1, ℓ0, and transformed-ℓ1 penalties, a no-overlap binary-activation CNN can be learned with high probability, and the iterative weights converge to a global limit which is a transformation of the true weights under a novel sparsifying operation. We also derive explicit error estimates of the sparse weights from the true weights.
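The snippet below is a generic PyTorch illustration of the coarse (straight-through) gradient with a binarized activation, followed by an ℓ0-style hard-thresholding step on the weights; the predictor, loss, threshold, and step size are simplified assumptions and not the paper's one-hidden-layer model or its analysis.

```python
import torch

class BinaryAct(torch.autograd.Function):
    """Binarized (0/1) activation with a straight-through (coarse) gradient."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()
    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Coarse gradient: pass the incoming gradient through where |x| <= 1.
        return grad_out * (x.abs() <= 1).float()

def cgd_step(w, x, y, lr=1e-2, lam=1e-3):
    """One coarse-gradient-descent step followed by hard thresholding of the weights."""
    pred = BinaryAct.apply(x @ w).mean(dim=1)  # simple one-hidden-layer style predictor
    loss = ((pred - y) ** 2).mean()
    (grad,) = torch.autograd.grad(loss, w)
    with torch.no_grad():
        w = w - lr * grad                                   # coarse gradient descent
        w = torch.where(w.abs() > (2.0 * lam) ** 0.5,
                        w, torch.zeros_like(w))             # l0-style hard thresholding
    return w.requires_grad_(True), loss.item()

# Toy usage with random data (shapes: x is (n, d), w is (d, m), y is (n,)).
x, y = torch.randn(64, 16), torch.randint(0, 2, (64,)).float()
w = torch.randn(16, 8, requires_grad=True)
w, loss = cgd_step(w, x, y)
```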
  4. Quantum computing (QC) has opened the door to advancements in machine learning (ML) tasks that are currently implemented in the classical domain. Convolutional neural networks (CNNs) are classical ML architectures that exploit data locality and possess a simpler structure than fully connected multi-layer perceptrons (MLPs) without compromising classification accuracy. However, the concept of preserving data locality is usually overlooked in existing quantum counterparts of CNNs, particularly for extracting multiple features from multidimensional data. In this paper, we present a multidimensional quantum convolutional classifier (MQCC) that performs multidimensional and multifeature quantum convolution with average and Euclidean pooling, thus adapting the CNN structure to a variational quantum algorithm (VQA). The experimental work was conducted on multidimensional data to validate the correctness and demonstrate the scalability of the proposed method, using both noisy and noise-free quantum simulations. We evaluated the MQCC model against reported work on state-of-the-art quantum simulators from IBM Quantum and Xanadu, using a variety of standard ML datasets. The experimental results show favorable characteristics of the proposed techniques compared with existing work with respect to a number of quantitative metrics, such as the number of training parameters, cross-entropy loss, classification accuracy, circuit depth, and quantum gate count.
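As a loose illustration only (not the authors' MQCC), the PennyLane sketch below encodes a small feature vector and applies local, parameterized two-qubit blocks in a sliding, convolution-like pattern before measuring one qubit as a classification readout; the embedding, gate choices, and readout are assumptions made for the sake of a runnable toy.

```python
# Toy convolution-style variational circuit in PennyLane; all design choices
# here are illustrative and unrelated to the MQCC construction in the paper.
import pennylane as qml
from pennylane import numpy as np

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

def conv_block(params, wires):
    # Parameterized local block acting on a pair of neighboring qubits.
    qml.RY(params[0], wires=wires[0])
    qml.RY(params[1], wires=wires[1])
    qml.CNOT(wires=wires)

@qml.qnode(dev)
def circuit(features, params):
    qml.AngleEmbedding(features, wires=range(n_qubits))  # encode classical data
    for i in range(n_qubits - 1):                        # sliding "convolution"
        conv_block(params[i], wires=[i, i + 1])
    return qml.expval(qml.PauliZ(0))                     # single-qubit readout

features = np.random.uniform(0, np.pi, n_qubits)
params = np.random.uniform(0, 2 * np.pi, (n_qubits - 1, 2))
print(circuit(features, params))
```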