

Title: Generalized Batch Normalization: Towards Accelerating Deep Neural Networks
Utilizing recently introduced concepts from statistics and quantitative risk management, we present a general variant of Batch Normalization (BN) that offers accelerated convergence of neural network training compared to conventional BN. In general, we show that the mean and standard deviation are not always the most appropriate choices for the centering and scaling steps within the BN transformation, particularly if ReLU follows the normalization step. We present a Generalized Batch Normalization (GBN) transformation, which can utilize a variety of alternative deviation measures for scaling and alternative statistics for centering, choices which naturally arise from the theory of generalized deviation measures and risk theory in general. When used in conjunction with the ReLU non-linearity, the underlying risk theory suggests natural, arguably optimal choices for the deviation measure and statistic. Utilizing the suggested deviation measure and statistic, we show experimentally that training is accelerated more than with conventional BN, often with an improved error rate as well. Overall, we propose a more flexible BN transformation supported by a complementary theoretical framework that can potentially guide design choices.
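The abstract does not spell out a particular implementation, but the core idea of swapping out the centering statistic and deviation measure can be illustrated with a short sketch. Below is a minimal PyTorch sketch, assuming the caller supplies a per-feature centering function and deviation function; the `median_center` / `mean_abs_deviation` pair is only one illustrative choice and is not necessarily the measure the paper recommends, and running statistics for inference are omitted.

```python
import torch
import torch.nn as nn


def mean_center(x, dim):
    return x.mean(dim=dim, keepdim=True)


def median_center(x, dim):
    return x.median(dim=dim, keepdim=True).values


def std_deviation(x, center, dim):
    return (x - center).pow(2).mean(dim=dim, keepdim=True).sqrt()


def mean_abs_deviation(x, center, dim):
    return (x - center).abs().mean(dim=dim, keepdim=True)


class GeneralizedBatchNorm1d(nn.Module):
    """Normalize features with a pluggable centering statistic and
    deviation measure, then apply the usual learnable affine transform.
    With mean/std this reduces to conventional BN in training mode."""

    def __init__(self, num_features, center_fn=mean_center,
                 deviation_fn=std_deviation, eps=1e-5):
        super().__init__()
        self.center_fn = center_fn
        self.deviation_fn = deviation_fn
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):                     # x: (batch, num_features)
        c = self.center_fn(x, dim=0)          # centering statistic per feature
        d = self.deviation_fn(x, c, dim=0)    # deviation measure per feature
        x_hat = (x - c) / (d + self.eps)
        return self.gamma * x_hat + self.beta


# Example: center by the median and scale by mean absolute deviation,
# followed by the ReLU non-linearity discussed in the abstract.
gbn = GeneralizedBatchNorm1d(64, center_fn=median_center,
                             deviation_fn=mean_abs_deviation)
out = torch.relu(gbn(torch.randn(32, 64)))
```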
Award ID(s):
1747783
NSF-PAR ID:
10084435
Author(s) / Creator(s):
Date Published:
Journal Name:
The 33rd AAAI Conference on Artificial Intelligence (AAAI 2019)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Batch Normalization (BN) (Ioffe and Szegedy 2015) normalizes the features of an input image via the statistics of a batch of images, and hence BN introduces noise into the gradient of the training loss. Previous works indicate that this noise is important for the optimization and generalization of deep neural networks, but too much noise harms network performance. In our paper, we offer a new point of view: the self-attention mechanism can help regulate the noise by enhancing instance-specific information, yielding a better regularization effect. We therefore propose an attention-based BN called Instance Enhancement Batch Normalization (IEBN) that recalibrates the information of each channel by a simple linear transformation. IEBN is effective at regulating the batch noise and stabilizing network training to improve generalization, even in the presence of two kinds of noise attacks during training. Finally, IEBN outperforms BN with only a slight increase in parameters in image classification tasks across different network structures and benchmark datasets.
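A rough sketch of how an instance-specific channel gate could be attached to a BN layer is given below. The sigmoid gate driven by global average pooling, and the `scale`/`bias` parameters, are assumptions for illustration rather than the exact IEBN formulation.

```python
import torch
import torch.nn as nn


class InstanceEnhancedBN2d(nn.Module):
    """Batch normalization followed by an instance-specific channel gate.
    The gate is a per-channel linear transform of the instance's own
    channel mean, squashed by a sigmoid, so each sample re-scales its
    channels based on its own content rather than on batch statistics."""

    def __init__(self, num_channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_channels)
        # Per-channel linear transformation applied to the pooled statistic
        # (illustrative initialization, not the paper's).
        self.scale = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.bias = nn.Parameter(torch.ones(1, num_channels, 1, 1))

    def forward(self, x):                       # x: (N, C, H, W)
        y = self.bn(x)
        # Instance-specific channel descriptor via global average pooling.
        pooled = x.mean(dim=(2, 3), keepdim=True)
        gate = torch.sigmoid(self.scale * pooled + self.bias)
        return y * gate


feat = torch.randn(8, 32, 16, 16)
out = InstanceEnhancedBN2d(32)(feat)
```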
  2. Batch Normalization (BN) is essential to effectively train state-of-the-art deep Convolutional Neural Networks (CNN). It normalizes the layer outputs during training using the statistics of each mini-batch. BN accelerates the training procedure by allowing the safe use of large learning rates and alleviates the need for careful initialization of the parameters. In this work, we study BN from the viewpoint of Fisher kernels that arise from generative probability models. We show that, assuming the samples within a mini-batch are drawn from the same probability density function, BN is identical to the Fisher vector of a Gaussian distribution. This means the batch normalizing transform can be explained in terms of kernels that naturally emerge from the probability density function that models the generative process of the underlying data distribution. Consequently, it promises higher discrimination power for the batch-normalized mini-batch. However, given the rectifying non-linearities employed in CNN architectures, the distribution of the layer outputs is asymmetric. Therefore, in order for BN to fully benefit from the aforementioned properties, we propose approximating the underlying data distribution not with a single Gaussian, but with a mixture of Gaussian densities. Deriving the Fisher vector for a Gaussian Mixture Model (GMM) reveals that batch normalization can be improved by independently normalizing with respect to the statistics of disentangled sub-populations. We refer to our proposed soft piecewise version of batch normalization as Mixture Normalization (MN). Through an extensive set of experiments on CIFAR-10 and CIFAR-100, using both a 5-layer deep CNN and the modern Inception-V3 architecture, we show that mixture normalization reduces the required number of gradient updates to reach the maximum test accuracy of the batch-normalized model by ∼31%-47% across a variety of training scenarios. Replacing even a few BN modules with MN in the 48-layer deep Inception-V3 architecture is sufficient to obtain not only considerable training acceleration but also better final test accuracy. We show that similar observations hold for 40- and 100-layer deep DenseNet architectures as well. We complement our study by evaluating the application of mixture normalization to Generative Adversarial Networks (GANs), where "mode collapse" hinders the training process. We solely replace a few batch normalization layers in the generator with our proposed mixture normalization. Our experiments using Deep Convolutional GAN (DCGAN) on CIFAR-10 show that the mixture-normalized DCGAN not only provides an acceleration of ∼58% but also reaches a lower (better) "Fréchet Inception Distance" (FID) of 33.35, compared to 37.56 for its batch-normalized counterpart.
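A toy NumPy/scikit-learn sketch of the "soft piecewise" normalization idea follows: fit a small GMM to a batch of activations, normalize against each component's statistics, and blend the results by the posterior responsibilities. The function name `mixture_normalize`, the two-component choice, and the omission of learnable affine parameters are simplifications for illustration, not the paper's exact procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def mixture_normalize(acts, n_components=2, eps=1e-5):
    """Soft piecewise normalization of a 1-D batch of activations.

    Fit a small GMM to the activations, normalize each value with
    respect to every component's mean/std, and blend the results using
    the posterior responsibilities, so each sub-population is whitened
    by its own statistics.
    """
    x = acts.reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components).fit(x)
    resp = gmm.predict_proba(x)                       # (N, K) responsibilities
    means = gmm.means_.ravel()                        # (K,)
    stds = np.sqrt(gmm.covariances_.ravel())          # (K,) for 1-D data
    per_component = (x - means) / (stds + eps)        # normalize per component
    return (resp * per_component).sum(axis=1)         # responsibility-weighted blend


# A bimodal batch, as might arise after a rectifying non-linearity.
batch = np.concatenate([np.random.normal(0.0, 0.3, 500),
                        np.random.normal(3.0, 1.0, 500)])
normalized = mixture_normalize(batch)
```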
  3. Batch normalization (BN) is a popular and ubiquitous method in deep learning that has been shown to decrease training time and improve the generalization performance of neural networks. Despite its success, BN is not theoretically well understood. It is not suitable for use with very small mini-batch sizes or online learning. In this paper, we propose a new method called Batch Normalization Preconditioning (BNP). Instead of applying normalization explicitly through a batch normalization layer as is done in BN, BNP applies normalization by conditioning the parameter gradients directly during training. This is designed to improve the Hessian matrix of the loss function and hence convergence during training. One benefit is that BNP is not constrained by the mini-batch size and works in the online learning setting. Furthermore, its connection to BN provides theoretical insights into how BN improves training and how BN should be applied to special architectures such as convolutional neural networks. For a theoretical foundation, we also present a novel Hessian condition number-based convergence theory for a locally convex but not strongly convex loss, which is applicable to networks with a scale-invariant property.
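The following toy PyTorch snippet illustrates the general idea of preconditioning parameter gradients with mini-batch statistics instead of inserting a normalization layer. The specific rule used here (dividing each weight-gradient column by the variance of the corresponding input feature) is a hypothetical stand-in, not the preconditioner derived in the paper.

```python
import torch
import torch.nn as nn

# Toy illustration: rescale the weight gradients of a linear layer by
# per-feature input statistics before the optimizer step, rather than
# normalizing the layer's outputs explicitly.

layer = nn.Linear(16, 4)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)

x = torch.randn(64, 16)                    # mini-batch of inputs
target = torch.randn(64, 4)

loss = ((layer(x) - target) ** 2).mean()
loss.backward()

with torch.no_grad():
    var = x.var(dim=0, unbiased=False) + 1e-5   # per-feature input variance
    layer.weight.grad /= var                     # precondition the gradients
opt.step()
opt.zero_grad()
```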
  4. This work presents a novel deep learning architecture called BNU-Net for the purpose of cardiac segmentation based on short-axis MRI images. Its name is derived from the Batch Normalized (BN) U-Net architecture for medical image segmentation. Convolutional neural networks (CNN), a newer generation of deep neural networks (NN), such as U-Net have been widely used for image classification tasks. CNNs are supervised models which are trained to learn hierarchies of features automatically and robustly perform classification. Our architecture consists of an encoding path for feature extraction and a decoding path that enables precise localization. We compare this approach with a parallel approach named U-Net. Both BNU-Net and U-Net are cardiac segmentation approaches: BNU-Net applies batch normalization to the output of each convolutional layer and uses the exponential linear unit (ELU) as its activation function, whereas U-Net does not apply batch normalization and is based on Rectified Linear Units (ReLU). The presented work (i) applies various image preprocessing techniques, which include affine transformations and elastic deformations, and (ii) segments the preprocessed images using the new deep learning architecture. We evaluate our approach on a dataset containing 805 MRI images from 45 patients. The experimental results reveal that our approach achieves comparable or better performance than other state-of-the-art approaches in terms of the Dice coefficient and the average perpendicular distance. Index Terms: Magnetic Resonance Imaging; Batch Normalization; Exponential Linear Units
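A minimal PyTorch sketch of the Conv → BN → ELU building block described above is shown below; the channel counts, kernel size, and two-convolution structure are illustrative assumptions rather than the exact BNU-Net configuration.

```python
import torch
import torch.nn as nn


def bn_elu_conv_block(in_ch, out_ch):
    """Two convolutions, each followed by batch normalization and ELU,
    as used along the encoding/decoding paths of a U-Net-style model."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ELU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ELU(inplace=True),
    )


# One encoder stage: feature extraction followed by downsampling.
encoder_stage = nn.Sequential(bn_elu_conv_block(1, 64), nn.MaxPool2d(2))
out = encoder_stage(torch.randn(1, 1, 128, 128))   # -> (1, 64, 64, 64)
```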