Taming the Noisy Gradient: Train Deep Neural Networks with Small Batch Sizes

Zhang, Yikai; Qu, Hui; Chen, Chao; Metaxas, Dimitris

doi:10.24963/ijcai.2019/604

Citation Details

Deep learning architectures typically have millions of parameters, which causes memory issues when training deep neural networks with stochastic gradient descent type methods using large batch sizes. However, training with small batch sizes tends to produce low-quality solutions due to the large variance of the stochastic gradients. In this paper, we tackle this problem by proposing a new framework for training deep neural networks with small batches and noisy gradients. During optimization, our method iteratively applies a proximal-type regularizer to make the loss function strongly convex. This regularizer stabilizes the gradient, leading to better training performance. We prove that our algorithm achieves a convergence rate comparable to that of vanilla SGD even with small batch sizes. Our framework is simple to implement and can potentially be combined with many existing optimization algorithms. Empirical results show that our method outperforms SGD and Adam when the batch size is small. Our implementation is available at https://github.com/huiqu18/TRAlgorithm.
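To make the idea in the abstract concrete, here is a minimal NumPy sketch of small-batch SGD with an iteratively re-centered proximal term. It is not the authors' implementation (see the repository linked above for that). It assumes the regularizer takes the common quadratic form (lam / 2) * ||w - anchor||^2 and that the anchor is reset to the current weights at the start of each outer round. The toy least-squares loss and the names `proximal_sgd`, `minibatch_grad`, `lam`, `outer_steps`, and `inner_steps` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def minibatch_grad(w, X, y, idx):
    """Gradient of the least-squares loss 0.5 * mean((Xw - y)^2) on rows idx."""
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)


def mse(w, X, y):
    """Full-data least-squares loss, used only to monitor progress."""
    return 0.5 * np.mean((X @ w - y) ** 2)


def proximal_sgd(X, y, lam=1.0, lr=0.05, outer_steps=20,
                 inner_steps=50, batch_size=2, seed=0):
    """Small-batch SGD on loss(w) + (lam / 2) * ||w - anchor||^2.

    Assumption (not taken from the paper): the anchor is reset to the
    current weights at the start of every outer round, so each inner
    problem is strongly convex around the latest iterate.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(outer_steps):
        anchor = w.copy()  # re-center the proximal term
        for _ in range(inner_steps):
            idx = rng.integers(0, n, size=batch_size)  # tiny, noisy batch
            # Noisy loss gradient plus the gradient of the proximal term.
            g = minibatch_grad(w, X, y, idx) + lam * (w - anchor)
            w = w - lr * g
    return w


# Toy usage on synthetic linear-regression data.
data_rng = np.random.default_rng(1)
X_demo = data_rng.normal(size=(200, 5))
w_true = data_rng.normal(size=5)
y_demo = X_demo @ w_true + 0.1 * data_rng.normal(size=200)
w_hat = proximal_sgd(X_demo, y_demo)
print("loss at zero init:", mse(np.zeros(5), X_demo, y_demo))
print("loss after proximal SGD:", mse(w_hat, X_demo, y_demo))
```

In this sketch the extra term `lam * (w - anchor)` pulls each noisy step back toward the most recent anchor, which is one plausible reading of how a proximal regularizer could damp gradient variance. The actual schedule and regularizer used in the paper may differ.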

Award ID(s): 1855759

PAR ID: 10108626

Author(s) / Creator(s): Zhang, Yikai; Qu, Hui; Chen, Chao; Metaxas, Dimitris

Date Published: 2019-08-01

Journal Name: The Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI)

Page Range / eLocation ID: 4348 to 4354

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.24963/ijcai.2019/604