Channel Normalization in Convolutional Neural Networks avoids Vanishing Gradients

Dai, Z.; Heckel, R.

Citation Details

Normalization layers are widely used in deep neural networks to stabilize training. In this paper, we consider the training of convolutional neural networks with gradient descent on a single training example. This optimization problem arises in recent approaches for solving inverse problems such as the deep image prior or the deep decoder. We show that for this setup, channel normalization, which centers and normalizes each channel individually, avoids vanishing gradients, whereas without normalization, gradients vanish which prevents efficient optimization. This effect prevails in deep single-channel linear convolutional networks, and we show that without channel normalization, gradient descent takes at least exponentially many steps to come close to an optimum. Contrary, with channel normalization, the gradients remain bounded, thus avoiding exploding gradients. more »

Award ID(s):: 1816986

PAR ID:: 10100189

Author(s) / Creator(s):: Dai, Z.; Heckel, R.

Date Published:: 2019-04-01

Journal Name:: International Conference on Machine Learning, Workshop Deep Phenomena

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this