Title: Proximal Mapping for Deep Regularization
Underpinning the success of deep learning are effective regularizations that allow a variety of priors in data to be modeled, for example robustness to adversarial perturbations and correlations between multiple modalities. However, most regularizers are specified in terms of hidden layer outputs, which are not themselves optimization variables. In contrast to prevalent methods that optimize them indirectly through model weights, we propose inserting proximal mapping as a new layer in the deep network, which directly and explicitly produces well-regularized hidden layer outputs. The resulting technique is shown to be well connected to kernel warping and dropout, and novel algorithms are developed for robust temporal learning and multiview modeling, both outperforming state-of-the-art methods.
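As a rough illustration of what a proximal mapping computes, the sketch below evaluates prox(z) = argmin_h 0.5*||h - z||^2 + lam*R(h) for a hidden-layer output z. It assumes R is the L1 norm, chosen only because its proximal map has a well-known closed form (soft thresholding); the abstract does not state which regularizers the paper uses, and the names `prox_l1` and `objective` are hypothetical.

```python
from typing import List


def prox_l1(z: List[float], lam: float) -> List[float]:
    """Proximal map of lam * ||h||_1 at z (soft thresholding).

    Assumption for illustration: R is the L1 norm. This is not
    taken from the paper, which may use other regularizers.
    """
    out = []
    for v in z:
        if v > lam:
            out.append(v - lam)   # shrink positive entries toward zero
        elif v < -lam:
            out.append(v + lam)   # shrink negative entries toward zero
        else:
            out.append(0.0)       # small entries are set exactly to zero
    return out


def objective(h: List[float], z: List[float], lam: float) -> float:
    """The quantity the proximal map minimizes over h."""
    fit = 0.5 * sum((a - b) ** 2 for a, b in zip(h, z))
    reg = lam * sum(abs(a) for a in h)
    return fit + reg


# Hypothetical hidden-layer output and regularization strength.
z = [1.5, -0.25, 0.75, -2.0]
lam = 0.5
h = prox_l1(z, lam)
print(h)  # [1.0, 0.0, 0.25, -1.5]
```

Used as a layer, such a map takes the hidden output z and returns the regularized output h directly, rather than relying on a penalty that reaches z only through the model weights.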
Award ID(s):
1910146
PAR ID:
10299609
Journal Name:
Advances in neural information processing systems
ISSN:
1049-5258
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The nonlinearity of activation functions used in deep learning models is crucial for the success of predictive models. Several simple nonlinear functions, including the Rectified Linear Unit (ReLU) and Leaky-ReLU (L-ReLU), are commonly used in neural networks to impose nonlinearity. In practice, these functions remarkably enhance model accuracy. However, there is limited insight into how nonlinearity in neural networks affects their performance. Here, we investigate the performance of neural network models as a function of nonlinearity, using ReLU and L-ReLU activation functions in the context of different model architectures and data domains. We use entropy as a measure of randomness to quantify the effects of nonlinearity in different architecture shapes on the performance of neural networks. We show that the ReLU nonlinearity is the better choice of activation function mostly when the network has a sufficient number of parameters. However, we found that image classification models with transfer learning seem to perform well with L-ReLU in fully connected layers. We show that the entropy of hidden layer outputs in neural networks can fairly represent the fluctuations in information loss as a function of nonlinearity. Furthermore, we investigate the entropy profile of shallow neural networks as a way of representing their hidden layer dynamics.
  2. We consider the problem of estimating the input and hidden variables of a stochastic multi-layer neural network (NN) from an observation of the output. The hidden variables in each layer are represented as matrices with statistical interactions along both rows and columns. This problem applies to matrix imputation, signal recovery via deep generative prior models, multi-task and mixed regression, and learning certain classes of two-layer NNs. We extend a recently developed algorithm, multi-layer vector approximate message passing, to this matrix-valued inference problem. It is shown that the performance of the proposed multi-layer matrix vector approximate message passing algorithm can be exactly predicted in a certain random large-system limit, where the dimensions N × d of the unknown quantities grow as N → ∞ with d fixed. In the two-layer neural-network learning problem, this scaling corresponds to the case where the number of input features and the number of training samples grow to infinity while the number of hidden nodes stays fixed. The analysis enables a precise prediction of the parameter and test error of the learning.
  3. Background: The expanding usage of complex machine learning methods such as deep learning has led to an explosion in human activity recognition, particularly applied to health. However, complex models that handle private and sometimes protected data raise concerns about the potential leak of identifiable data. In this work, we focus on the case of a deep network model trained on images of individual faces. Materials and methods: A previously published deep learning model, trained to estimate gaze from full-face image sequences, was stress tested for personal information leakage by a white-box inference attack. Full-face video recordings taken from 493 individuals undergoing an eye-tracking-based evaluation of neurological function were used. Outputs, gradients, intermediate layer outputs, loss, and labels were used as inputs for a deep network with an added support vector machine emission layer to recognize membership in the training data. Results: The inference attack method and associated mathematical analysis indicate that there is a low likelihood of unintended memorization of facial features in the deep learning model. Conclusions: This study shows that the named model preserves the integrity of training data with reasonable confidence. The same process can be implemented in similar conditions for different models.
  4. The increasing uncertainties caused by the high penetration of stochastic renewable generation resources pose a significant threat to power system voltage stability. To address this issue, this paper proposes a surrogate model enabled by probabilistic deep kernel learning to extract the hidden relationship between uncertain sources, i.e., wind power and loads, and load margin for probabilistic load margin assessment (PLMA). Unlike other deep learning approaches, kernel SHAP provides sensitivity analysis as well as interpretability of how the inputs influence the outputs. This allows identifying the critical factors that affect load margin so that corrective control can be initiated for stability enhancement. Numerical results carried out on the IEEE 118-bus power system demonstrate the accuracy and efficiency of the proposed data-driven PLMA scheme.
  5. We propose a novel family of connectionist models based on kernel machines and consider the problem of learning layer by layer a compositional hypothesis class (i.e., a feedforward, multilayer architecture) in a supervised setting. In terms of the models, we present a principled method to “kernelize” (partly or completely) any neural network (NN). With this method, we obtain a counterpart of any given NN that is powered by kernel machines instead of neurons. In terms of learning, when learning a feedforward deep architecture in a supervised setting, one needs to train all the components simultaneously using backpropagation (BP) since there are no explicit targets for the hidden layers (Rumelhart, Hinton, & Williams, 1986). We consider without loss of generality the two-layer case and present a general framework that explicitly characterizes a target for the hidden layer that is optimal for minimizing the objective function of the network. This characterization then makes possible a purely greedy training scheme that learns one layer at a time, starting from the input layer. We provide instantiations of the abstract framework under certain architectures and objective functions. Based on these instantiations, we present a layer-wise training algorithm for an l-layer feedforward network for classification, where l≥2 can be arbitrary. This algorithm can be given an intuitive geometric interpretation that makes the learning dynamics transparent. Empirical results are provided to complement our theory. We show that the kernelized networks, trained layer-wise, compare favorably with classical kernel machines as well as other connectionist models trained by BP. We also visualize the inner workings of the greedy kernelized models to validate our claim on the transparency of the layer-wise algorithm. 