skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: An Adaptive Empirical Bayesian Method for Sparse Deep Learning
We propose a novel adaptive empirical Bayesian (AEB) method for sparse deep learning, where the sparsity is ensured via a class of self-adaptive spike-and-slab priors. The proposed method works by alternatively sampling from an adaptive hierarchical posterior distribution using stochastic gradient Markov Chain Monte Carlo (MCMC) and smoothly optimizing the hyperparameters using stochastic approximation (SA). We further prove the convergence of the proposed method to the asymptotically correct distribution under mild conditions. Empirical applications of the proposed method lead to the state-of-the-art performance on MNIST and Fashion MNIST with shallow convolutional neural networks (CNN) and the state-of-the-art compression performance on CIFAR10 with Residual Networks. The proposed method also improves resistance to adversarial attacks.  more » « less
Award ID(s):
1736364 1555072 1821233
PAR ID:
10188426
Author(s) / Creator(s):
Date Published:
Journal Name:
Advances in neural information processing systems
Volume:
32
Issue:
1
ISSN:
1049-5258
Page Range / eLocation ID:
8794
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    This paper presents a novel framework for training convolutional neural networks (CNNs) to quantify the impact of gradual and abrupt uncertainties in the form of adversarial attacks. Uncertainty quantification is achieved by combining the CNN with a Gaussian process (GP) classifier algorithm. The variance of the GP quantifies the impact on the uncertainties and especially their effect on the object classification tasks. Learning from uncertainty provides the proposed CNN-GP framework with flexibility, reliability and robustness to adversarial attacks. The proposed approach includes training the network under noisy conditions. This is accomplished by comparing predictions with classification labels via the Kullback-Leibler divergence, Wasserstein distance and maximum correntropy. The network performance is tested on the classical MNIST, Fashion-MNIST, CIFAR10 and CIFAR 100 datasets. Further tests on robustness to both black-box and white-box attacks are also carried out for MNIST. The results show that the testing accuracy improves for networks that backpropogate uncertainty as compared to methods that do not quantify the impact of uncertainties. A comparison with a state-of-art Monte Carlo dropout method is also presented and the outperformance of the CNN-GP framework with respect to reliability and computational efficiency is demonstrated. 
    more » « less
  2. State-of-the-art subspace clustering methods are based on the self-expressive model, which represents each data point as a linear combination of other data points. However, such methods are designed for a finite sample dataset and lack the ability to generalize to out-of-sample data. Moreover, since the number of self-expressive coefficients grows quadratically with the number of data points, their ability to handle large-scale datasets is often limited. In this paper, we propose a novel framework for subspace clustering, termed Self-Expressive Network (SENet), which employs a properly designed neural network to learn a self-expressive representation of the data. We show that our SENet can not only learn the self-expressive coefficients with desired properties on the training data, but also handle out-of-sample data. Besides, we show that SENet can also be leveraged to perform subspace clustering on large-scale datasets. Extensive experiments conducted on synthetic data and real world benchmark data validate the effectiveness of the proposed method. In particular, SENet yields highly competitive performance on MNIST, Fashion MNIST and Extended MNIST and state-of-the-art performance on CIFAR-10. 
    more » « less
  3. Meila, Marina; Zhang, Tong (Ed.)
    Stochastic Gradient Descent (SGD) is a popular tool in training large-scale machine learning models. Its performance, however, is highly variable, depending crucially on the choice of the step sizes. Accordingly, a variety of strategies for tuning the step sizes have been proposed, ranging from coordinate-wise approaches (a.k.a. “adaptive” step sizes) to sophisticated heuristics to change the step size in each iteration. In this paper, we study two step size schedules whose power has been repeatedly confirmed in practice: the exponential and the cosine step sizes. For the first time, we provide theoretical support for them proving convergence rates for smooth non-convex functions, with and without the Polyak-Łojasiewicz (PL) condition. Moreover, we show the surprising property that these two strategies are adaptive to the noise level in the stochastic gradients of PL functions. That is, contrary to polynomial step sizes, they achieve almost optimal performance without needing to know the noise level nor tuning their hyperparameters based on it. Finally, we conduct a fair and comprehensive empirical evaluation of real-world datasets with deep learning architectures. Results show that, even if only requiring at most two hyperparameters to tune, these two strategies best or match the performance of various finely-tuned state-of-the-art strategies. 
    more » « less
  4. null (Ed.)
    Abstract Deep neural networks (DNNs) have achieved state-of-the-art performance in many important domains, including medical diagnosis, security, and autonomous driving. In domains where safety is highly critical, an erroneous decision can result in serious consequences. While a perfect prediction accuracy is not always achievable, recent work on Bayesian deep networks shows that it is possible to know when DNNs are more likely to make mistakes. Knowing what DNNs do not know is desirable to increase the safety of deep learning technology in sensitive applications; Bayesian neural networks attempt to address this challenge. Traditional approaches are computationally intractable and do not scale well to large, complex neural network architectures. In this paper, we develop a theoretical framework to approximate Bayesian inference for DNNs by imposing a Bernoulli distribution on the model weights. This method called Monte Carlo DropConnect (MC-DropConnect) gives us a tool to represent the model uncertainty with little change in the overall model structure or computational cost. We extensively validate the proposed algorithm on multiple network architectures and datasets for classification and semantic segmentation tasks. We also propose new metrics to quantify uncertainty estimates. This enables an objective comparison between MC-DropConnect and prior approaches. Our empirical results demonstrate that the proposed framework yields significant improvement in both prediction accuracy and uncertainty estimation quality compared to the state of the art. 
    more » « less
  5. null (Ed.)
    Abstract The problem of localizing a moving target arises in various forms in wireless sensor networks. Deploying multiple sensing receivers and using the time-difference-of-arrival (TDOA) of the target’s emitted signal is widely considered an effective localization technique. Traditionally, TDOA-based algorithms adopt a centralized approach where all measurements are sent to a predefined reference node for position estimation. More recently, distributed TDOA-based localization algorithms have been shown to improve the robustness of these estimates. For target models governed by highly stochastic processes, the method of nonlinear filtering and state estimation must be carefully considered. In this work, a distributed TDOA-based particle filter algorithm is proposed for localizing a moving target modeled by a discrete-time correlated random walk (DCRW). We present a method for using data collected by the particle filter to estimate the unknown probability distributions of the target’s movement model, and then apply the distribution estimates to recursively update the particle filter’s propagation model. The performance of the distributed approach is evaluated through numerical simulation, and we show the benefit of using a particle filter with online model learning by comparing it with the non-adaptive approach. 
    more » « less