Normalization techniques have become a basic component in modern convolutional neural networks (ConvNets). In particular, many recent works demonstrate that promoting the orthogonality of the weights helps train deep models and improves robustness. For ConvNets, most existing methods are based on penalizing or normalizing weight matrices derived from concatenating or flattening the convolutional kernels. These methods often destroy or ignore the benign convolutional structure of the kernels; therefore, they are often expensive or impractical for deep ConvNets. In contrast, we introduce a simple and efficient "Convolutional Normalization" (ConvNorm) method that fully exploits the convolutional structure in the Fourier domain and serves as a plug-and-play module that can be conveniently incorporated into any ConvNet. Our method is inspired by recent work on preconditioning methods for convolutional sparse coding and can effectively promote each layer's channel-wise isometry. Furthermore, we show that ConvNorm can reduce the layerwise spectral norm of the weight matrices and hence improve the Lipschitzness of the network, leading to easier training and improved robustness for deep ConvNets. Applied to classification under noise corruption and to generative adversarial networks (GANs), ConvNorm improves the robustness of common ConvNets such as ResNet and the performance of GANs.
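The link between the Fourier domain and the layerwise spectral norm mentioned above can be illustrated in the simplest possible setting. The sketch below is not ConvNorm itself: it assumes a single-channel, one-dimensional circular convolution, for which the singular values of the convolution operator are the magnitudes of the discrete Fourier transform of the zero-padded kernel. The function names (`dft`, `circular_conv_spectral_norm`, `circular_conv`) and the example kernel are hypothetical, chosen only for illustration.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def circular_conv_spectral_norm(kernel, n):
    """Spectral norm of length-n circular convolution with `kernel`.

    The operator is a circulant matrix, so its singular values are the
    magnitudes of the DFT of the kernel zero-padded to length n.
    """
    padded = list(kernel) + [0.0] * (n - len(kernel))
    return max(abs(c) for c in dft(padded))


def circular_conv(kernel, x):
    """Circular convolution of `kernel` with the signal `x`."""
    n = len(x)
    return [
        sum(kernel[j] * x[(i - j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


kernel = [0.5, 0.25, 0.25]  # illustrative kernel, not taken from the paper
norm = circular_conv_spectral_norm(kernel, n=8)

# The spectral norm bounds how much the layer can amplify any input.
x = [1.0, -2.0, 0.5, 3.0, 0.0, -1.0, 2.0, 0.25]
y = circular_conv(kernel, x)
gain = math.sqrt(sum(v * v for v in y)) / math.sqrt(sum(v * v for v in x))
```

Under these assumptions, `gain` never exceeds `norm` for any input, which is the sense in which a smaller spectral norm gives a smaller Lipschitz constant for the layer.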
Training Faster by Separating Modes of Variation in Batch-normalized Models
Batch Normalization (BN) is essential to effectively train state-of-the-art deep Convolutional Neural Networks (CNNs). It normalizes the layer outputs during training using the statistics of each mini-batch. BN accelerates training by allowing large learning rates to be used safely, and it alleviates the need for careful initialization of the parameters. In this work, we study BN from the viewpoint of Fisher kernels that arise from generative probability models. We show that, assuming the samples within a mini-batch are drawn from the same probability density function, BN is identical to the Fisher vector of a Gaussian distribution. That means the batch-normalizing transform can be explained in terms of kernels that naturally emerge from the probability density function that models the generative process of the underlying data distribution. Consequently, it promises higher discrimination power for the batch-normalized mini-batch. However, given the rectifying non-linearities employed in CNN architectures, the distribution of the layer outputs is asymmetric. Therefore, in order for BN to fully benefit from the aforementioned properties, we propose approximating the underlying data distribution not with one, but with a mixture of Gaussian densities. Deriving the Fisher vector for a Gaussian Mixture Model (GMM) reveals that batch normalization can be improved by independently normalizing with respect …
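As a minimal sketch of the standard batch-normalizing transform described above (not the mixture-based variant the abstract proposes), the following normalizes each feature of a mini-batch using that mini-batch's own mean and variance. The function name `batch_normalize`, the stability constant `eps`, and the example data are assumptions for illustration; the learnable scale and shift parameters used in practice are omitted.

```python
import math


def batch_normalize(batch, eps=1e-5):
    """Normalize each feature of a mini-batch to roughly zero mean, unit variance.

    `batch` is a list of samples, each a list of feature values.
    `eps` is an assumed small constant added for numerical stability.
    """
    n = len(batch)
    dims = len(batch[0])
    means = [sum(x[j] for x in batch) / n for j in range(dims)]
    # Biased (population) variance computed over the mini-batch.
    variances = [
        sum((x[j] - means[j]) ** 2 for x in batch) / n for j in range(dims)
    ]
    return [
        [(x[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(dims)]
        for x in batch
    ]


# Two features on very different scales end up on a comparable scale.
mini_batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
normalized = batch_normalize(mini_batch)
```

A single mean and variance per feature corresponds to modeling the mini-batch with one Gaussian, which is the assumption the abstract proposes to relax.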
- Award ID(s):
- 1741431
- Publication Date:
- NSF-PAR ID:
- 10111646
- Journal Name:
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- Page Range or eLocation-ID:
- 1 to 1
- ISSN:
- 0162-8828
- Sponsoring Org:
- National Science Foundation
More Like this
-
Model confidence, or uncertainty, is critical in autonomous systems because it is directly tied to the safety and trustworthiness of the system. Quantifying the uncertainty in the output decisions of deep neural networks (DNNs) is a challenging problem. The Bayesian framework enables the estimation of the predictive uncertainty by introducing probability distributions over the (unknown) network weights; however, the propagation of these high-dimensional distributions through multiple layers and non-linear transformations is mathematically intractable. In this work, we propose an extended variational inference (eVI) framework for convolutional neural networks (CNNs) based on tensor Normal distributions (TNDs) defined over convolutional kernels. Our proposed eVI framework propagates the first two moments (mean and covariance) of these TNDs through all layers of the CNN. We employ first-order Taylor series linearization to approximate the mean and covariance passing through the non-linear activations. The uncertainty in the output decision is given by the propagated covariance of the predictive distribution. Furthermore, we show, through extensive simulations on the MNIST and CIFAR-10 datasets, that the CNN becomes more robust to Gaussian noise and adversarial attacks.
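The first-order Taylor linearization step mentioned above can be sketched generically. This is not the authors' eVI implementation and does not involve tensor Normal distributions: it assumes a plain vector-valued input with a mean and covariance, and an elementwise ReLU activation f, for which the approximation is mean_out ≈ f(mean) and cov_out ≈ J cov Jᵀ with J = diag(f'(mean)). The function names and example values are hypothetical.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def relu_grad(x):
    # Derivative of ReLU; the value at exactly 0 is set to 0 by convention here.
    return 1.0 if x > 0.0 else 0.0


def propagate_moments(mean, cov):
    """Propagate (mean, cov) through an elementwise ReLU by linearization.

    Because the activation is elementwise, its Jacobian at `mean` is diagonal,
    so J cov J^T reduces to scaling entry (i, j) by grads[i] * grads[j].
    """
    grads = [relu_grad(m) for m in mean]
    mean_out = [relu(m) for m in mean]
    n = len(mean)
    cov_out = [
        [grads[i] * cov[i][j] * grads[j] for j in range(n)] for i in range(n)
    ]
    return mean_out, cov_out


mean = [1.5, -0.5]
cov = [[0.2, 0.05], [0.05, 0.1]]
mean_out, cov_out = propagate_moments(mean, cov)
```

In this sketch the unit with a negative mean is linearized as inactive, so its variance and cross-covariances are dropped. This shows how coarse a first-order approximation can be near the ReLU kink.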
-
This work presents a novel deep learning architecture called BNU-Net for cardiac segmentation based on short-axis MRI images. Its name is derived from the Batch Normalized (BN) U-Net architecture for medical image segmentation. New generations of deep neural networks (NNs) are called convolutional NNs (CNNs). CNNs like U-Net have been widely used for image classification tasks. CNNs are supervised models that are trained to learn hierarchies of features automatically and to perform classification robustly. Our architecture consists of an encoding path for feature extraction and a decoding path that enables precise localization. We compare this approach with a parallel approach named U-Net. Both BNU-Net and U-Net are cardiac segmentation approaches: BNU-Net applies batch normalization to the output of each convolutional layer and uses an exponential linear unit (ELU) as the activation function, whereas U-Net does not apply batch normalization and is based on Rectified Linear Units (ReLU). The presented work (i) facilitates various image preprocessing techniques, which include affine transformations and elastic deformations, and (ii) segments the preprocessed images using the new deep learning architecture. We evaluate our approach on a dataset containing 805 MRI images from 45 patients. The experimental results reveal that our …
-
Batch normalization (BN) is a popular and ubiquitous method in deep learning that has been shown to decrease training time and improve the generalization performance of neural networks. Despite its success, BN is not theoretically well understood. It is not suitable for use with very small mini-batch sizes or online learning. In this paper, we propose a new method called Batch Normalization Preconditioning (BNP). Instead of applying normalization explicitly through a batch normalization layer as is done in BN, BNP applies normalization by conditioning the parameter gradients directly during training. This is designed to improve the conditioning of the Hessian matrix of the loss function and hence convergence during training. One benefit is that BNP is not constrained by the mini-batch size and works in the online learning setting. Furthermore, its connection to BN provides theoretical insights into how BN improves training and how BN is applied to special architectures such as convolutional neural networks. For a theoretical foundation, we also present a novel convergence theory, based on the Hessian condition number, for a locally convex but not strongly convex loss, which is applicable to networks with a scale-invariant property.