Reducing model redundancy is important for deploying complex deep learning models on resource-limited or time-sensitive devices. Directly regularizing or modifying weight values makes the pruning procedure less robust and more sensitive to the choice of hyperparameters, and it also requires prior knowledge to tune different hyperparameters for different models. To build a more general and easy-to-use pruning method, we propose AutoPrune, which prunes the network by optimizing a set of trainable auxiliary parameters instead of the original weights. Instability and noise during training of the auxiliary parameters do not directly affect the weight values, which makes the pruning process more robust to noise and less sensitive to hyperparameters. Moreover, we design gradient update rules for the auxiliary parameters to keep them consistent with the pruning task. Our method can automatically eliminate network redundancy with recoverability, reducing both the prior knowledge required to design thresholding functions and the time spent on trial and error. We evaluate our method with LeNet and a VGG-like network on the MNIST and CIFAR-10 datasets, and with AlexNet, ResNet and MobileNet on ImageNet to establish the scalability of our work. Results show that our model achieves state-of-the-art sparsity, e.g. 7% and 23% FLOPs and 310x and 75x compression ratios for LeNet5 and the VGG-like structure, respectively, without accuracy drop, and 200M and 100M FLOPs for MobileNet V2 with accuracy of 73.32% and 66.83%, respectively.
This content will become publicly available on April 6, 2026
SBL Algorithms for the Multiple Measurement Vector Problem: New Modeling and Inference Methods
This paper introduces new and practically relevant non-Gaussian priors for the Sparse Bayesian Learning (SBL) framework applied to the Multiple Measurement Vector (MMV) problem. We extend the Gaussian Scale Mixture (GSM) framework to model prior distributions for row vectors, exploring the use of shared and different hyperparameters across different measurements. We propose Expectation Maximization (EM) based algorithms to estimate the parameters of the prior density along with the hyperparameters. To promote sparsity more effectively in a non-Gaussian setting, we show the importance of incorporating learning of the parameters of the mixing density. Such an approach effectively utilizes the common support notion in the MMV problem and promotes sparsity without explicitly imposing a sparsity-promoting prior, indicating the methods’ robustness to model mismatches. Numerical simulations are provided to compare the proposed approaches with the existing SBL algorithm for the MMV problem.
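For orientation, the baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of the standard Gaussian-prior EM iteration for SBL in the MMV setting, not the non-Gaussian GSM algorithms proposed in the paper. The function name `msbl_em`, the known noise variance `sigma2`, and the fixed iteration count are assumptions made for this example.

```python
import numpy as np

def msbl_em(A, Y, sigma2, n_iter=300):
    """Baseline Gaussian-prior SBL for the MMV model Y = A X + noise (sketch).

    A: (M, N) measurement matrix; Y: (M, L) measurements.
    sigma2: noise variance, assumed known here for simplicity.
    Returns the row-wise hyperparameters gamma (N,) and posterior mean Mu (N, L).
    """
    M, N = A.shape
    L = Y.shape[1]
    gamma = np.ones(N)  # one variance per row of X, shared by all L columns
    for _ in range(n_iter):
        # E-step: Gaussian posterior of X under the current gamma.
        AG = A * gamma                          # A @ diag(gamma), shape (M, N)
        Sigma_y = sigma2 * np.eye(M) + AG @ A.T
        K = np.linalg.solve(Sigma_y, AG).T      # Gamma A^T Sigma_y^{-1}, (N, M)
        Mu = K @ Y
        # Diagonal of Sigma = Gamma - Gamma A^T Sigma_y^{-1} A Gamma.
        Sigma_diag = gamma - np.sum(K * AG.T, axis=1)
        # M-step: average row power plus posterior variance;
        # the clip only guards against tiny negative rounding errors.
        gamma = np.maximum(np.sum(Mu**2, axis=1) / L + Sigma_diag, 0.0)
    return gamma, Mu
```

Because each row of X shares a single `gamma[i]` across all columns, rows outside the common support are driven toward zero together, which is the common support notion the abstract refers to. The estimated support can be read off as the indices with the largest `gamma` values.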
- PAR ID: 10599163
- Publisher / Repository: IEEE
- Date Published:
- ISBN: 979-8-3503-6874-1
- Page Range / eLocation ID: 1 to 5
- Subject(s) / Keyword(s): simultaneous sparse approximation, sparse Bayesian learning, Gaussian Scale Mixture, Laplace distribution
- Format(s): Medium: X
- Location: Hyderabad, India
- Sponsoring Org: National Science Foundation
More Like this
-
Sparse Bayesian Learning (SBL) is a popular sparse signal recovery method, and various algorithms exist under the SBL paradigm. In this paper, we introduce a novel re-parameterization that allows the iterations of existing algorithms to be viewed as special cases of a unified and general mapping function. Furthermore, the re-parameterization enables an interesting beamforming interpretation that lends insight into all the considered algorithms. Utilizing the abstraction allowed by the general mapping viewpoint, we introduce a novel neural network architecture for learning improved iterative update rules under the SBL framework. The modular design of the architecture makes the model independent of the size of the measurement matrix and provides a unique opportunity to test generalization across different measurement matrices. We show that the network, when trained on a particular parameterized dictionary, generalizes in ways not previously possible: across measurement matrices of different type and dimension, and across different numbers of snapshots. Our numerical results showcase the generalization capability of our network in terms of mean square error and probability of support recovery across sparsity levels, signal-to-noise ratios, numbers of snapshots, and measurement matrices of different sizes.
-
Humans can learn complex functional relationships between variables from small amounts of data. In doing so, they draw on prior expectations about the form of these relationships. In three experiments, we show that people learn to adjust these expectations through experience, learning about the likely forms of the functions they will encounter. Previous work has used Gaussian processes (a statistical framework that extends Bayesian nonparametric approaches to regression) to model human function learning. We build on this work, modeling the process of learning to learn functions as a form of hierarchical Bayesian inference about the Gaussian process hyperparameters.
-
Deep learning is a promising approach to early DRV (Design Rule Violation) prediction. However, non-deterministic parallel routing hampers model training and degrades prediction accuracy. In this work, we propose a stochastic approach, called LGC-Net, to solve this problem. In this approach, we develop two new techniques, a Gaussian random field layer and a focal likelihood loss function, to seamlessly integrate the Log Gaussian Cox process with deep learning. This approach provides not only statistical regression results but also classification results at different thresholds without retraining. Experimental results with noisy training data on industrial designs demonstrate that LGC-Net achieves significantly better accuracy in DRV density prediction than prior art.
-
Large pretrained transformer models have revolutionized modern AI applications with their state-of-the-art performance in natural language processing (NLP). However, their substantial parameter count poses challenges for real-world deployment. To address this, researchers often reduce model size by pruning parameters based on their magnitude or sensitivity. Previous research has demonstrated the limitations of magnitude pruning, especially in the context of transfer learning for modern NLP tasks. In this paper, we introduce a new magnitude-based pruning algorithm called mixture Gaussian prior pruning (MGPP), which employs a mixture Gaussian prior for regularization. MGPP prunes non-expressive weights under the guidance of the mixture Gaussian prior, aiming to retain the model’s expressive capability. Extensive evaluations across various NLP tasks, including natural language understanding, question answering, and natural language generation, demonstrate the superiority of MGPP over existing pruning methods, particularly in high sparsity settings. Additionally, we provide a theoretical justification for the consistency of the sparse transformer, shedding light on the effectiveness of the proposed pruning method.
