Title: MomentumRNN: Integrating Momentum into Recurrent Neural Networks
Abstract:
Designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural nets (RNNs), we establish a connection between the hidden state dynamics in an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called MomentumRNNs. We theoretically prove and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient issue in training RNNs. We study the momentum long short-term memory (MomentumLSTM) and verify its advantages in convergence speed and accuracy over its LSTM counterpart across a variety of benchmarks. We also demonstrate that MomentumRNN is applicable to many types of recurrent cells, including those in the state-of-the-art orthogonal RNNs. Finally, we show that other advanced momentum-based optimization methods, such as Adam and Nesterov accelerated gradients with a restart, can be easily incorporated into the MomentumRNN framework for designing new recurrent cells with even better performance.
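To make the construction concrete, here is a minimal PyTorch sketch of a momentum-augmented recurrent cell, assuming the update v_t = mu * v_{t-1} + s * (U x_t), h_t = tanh(W h_{t-1} + v_t) suggested by the gradient-descent view of the hidden-state dynamics; the class name, hyperparameter values, and toy dimensions are illustrative, not the authors' reference implementation.

    import torch
    import torch.nn as nn

    class MomentumRNNCell(nn.Module):
        """Sketch of a momentum-augmented vanilla RNN cell (illustrative, not the official code)."""

        def __init__(self, input_size, hidden_size, mu=0.6, step=0.6):
            super().__init__()
            self.U = nn.Linear(input_size, hidden_size, bias=False)  # input-to-hidden map
            self.W = nn.Linear(hidden_size, hidden_size, bias=True)  # hidden-to-hidden map
            self.mu, self.step = mu, step                            # momentum and step size

        def forward(self, x_t, h_prev, v_prev):
            # The accumulator v_t plays the role of the velocity in heavy-ball gradient descent.
            v_t = self.mu * v_prev + self.step * self.U(x_t)
            h_t = torch.tanh(self.W(h_prev) + v_t)
            return h_t, v_t

    # Unroll over a toy sequence of shape (batch=2, time=5, features=3).
    cell = MomentumRNNCell(input_size=3, hidden_size=8)
    x = torch.randn(2, 5, 3)
    h, v = torch.zeros(2, 8), torch.zeros(2, 8)
    for t in range(x.size(1)):
        h, v = cell(x[:, t], h, v)
    print(h.shape)  # torch.Size([2, 8])

Setting mu = 0 and step = 1 reduces this sketch to a standard tanh RNN cell, which is the sense in which momentum acts as a drop-in modification of existing recurrent cells such as the LSTM.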
Award ID(s):
1911094 1838177 1730574
PAR ID:
10205628
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Advances in neural information processing systems
ISSN:
1049-5258
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Several variants of recurrent neural networks (RNNs) with orthogonal or unitary recurrent matrices have recently been developed to mitigate the vanishing/exploding gradient problem and to model long-term dependencies of sequences. However, with the eigenvalues of the recurrent matrix on the unit circle, the recurrent state retains all input information, which may unnecessarily consume model capacity. In this paper, we address this issue by proposing an architecture that expands upon an orthogonal/unitary RNN with a state that is generated by a recurrent matrix with eigenvalues in the unit disc. Any input to this state dissipates in time and is replaced with new inputs, simulating short-term memory. A gradient descent algorithm is derived for learning such a recurrent matrix. The resulting method, called the Eigenvalue Normalized RNN (ENRNN), is shown to be highly competitive in several experiments.
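The short-term-memory effect described above, where eigenvalues inside the unit disc cause past inputs to dissipate, can be seen directly in a linear recurrence. The sketch below is illustrative only: it rescales the whole spectrum uniformly and is not the paper's learning algorithm, and the radius rho and matrix size are arbitrary choices.

    import torch

    def shrink_spectrum(W, rho=0.9):
        """Rescale W so all of its eigenvalues lie inside the disc of radius rho < 1."""
        eigvals, eigvecs = torch.linalg.eig(W)              # complex eigendecomposition
        scale = rho / eigvals.abs().max()
        W_scaled = eigvecs @ torch.diag(eigvals * scale) @ torch.linalg.inv(eigvecs)
        return W_scaled.real

    torch.manual_seed(0)
    W = shrink_spectrum(torch.randn(16, 16))

    # Inject a single impulse and watch the state norm decay with no further input.
    h = torch.randn(16)
    for t in range(1, 31):
        h = W @ h
        if t % 10 == 0:
            print(f"t={t:2d}  ||h|| = {h.norm().item():.4f}")

With an orthogonal or unitary recurrent matrix the state norm would stay constant instead; the eigenvalue-normalized matrix trades that retention for a state that forgets old inputs.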
  2. Neural population activity is theorized to reflect an underlying dynamical structure. This structure can be accurately captured using state space models with explicit dynamics, such as those based on recurrent neural networks (RNNs). However, using recurrence to explicitly model dynamics necessitates sequential processing of data, slowing real-time applications such as brain-computer interfaces. Here we introduce the Neural Data Transformer (NDT), a non-recurrent alternative. We test the NDT’s ability to capture autonomous dynamical systems by applying it to synthetic datasets with known dynamics and data from monkey motor cortex during a reaching task well-modeled by RNNs. The NDT models these datasets as well as state-of-the-art recurrent models. Further, its non-recurrence enables 3.9 ms inference, well within the loop time of real-time applications and more than 6 times faster than recurrent baselines on the monkey reaching dataset. These results suggest that an explicit dynamics model is not necessary to model autonomous neural population dynamics. Code: https://github.com/snel-repo/neural-data-transformers
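The architectural point, replacing recurrence with attention so that every time bin is processed in parallel, can be sketched with a single masked self-attention layer over fake binned spike counts. This toy is a stand-in, not the NDT from the linked repository (which stacks Transformer encoder layers with its own masking scheme); all sizes and names below are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    T, N, D = 50, 32, 64                                 # time bins, neurons, model width (arbitrary)
    spikes = torch.poisson(torch.full((1, T, N), 2.0))   # fake binned spike counts, shape (batch, T, N)

    embed = nn.Linear(N, D)                              # project spike counts into model space
    attn = nn.MultiheadAttention(embed_dim=D, num_heads=4, batch_first=True)
    readout = nn.Linear(D, N)                            # per-neuron firing-rate predictions

    x = embed(spikes)
    # Causal mask (True = blocked): each bin attends only to itself and earlier bins.
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    out, _ = attn(x, x, x, attn_mask=mask)               # all T bins handled in one parallel call
    rates = F.softplus(readout(out))
    print(rates.shape)                                   # torch.Size([1, 50, 32])

An RNN would need T sequential steps here; the attention layer touches all bins at once, which is the source of the inference-time advantage reported above.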
  3. The study of deep neural networks (DNNs) in the infinite-width limit, via the so-called neural tangent kernel (NTK) approach, has provided new insights into the dynamics of learning, generalization, and the impact of initialization. One key DNN architecture remains to be kernelized, namely, the recurrent neural network (RNN). In this paper we introduce and study the Recurrent Neural Tangent Kernel (RNTK), which provides new insights into the behavior of overparametrized RNNs. A key property of the RNTK that should greatly benefit practitioners is its ability to compare inputs of different lengths. To this end, we characterize how the RNTK weights different time steps to form its output under different initialization parameters and nonlinearity choices. Experiments on a synthetic data set and 56 real-world data sets demonstrate that the RNTK offers significant performance gains over other kernels, including standard NTKs.
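The RNTK itself is an analytic, infinite-width kernel with its own recursion, which is not reproduced here. As a rough intuition for why an RNN-derived kernel can compare inputs of different lengths, the sketch below computes the finite-width empirical NTK of a tiny RNN: gradients of the scalar output live in the shared parameter space regardless of sequence length, so their inner product is always defined. Everything here (model, sizes, sequence lengths) is an illustrative assumption, not the RNTK.

    import torch
    import torch.nn as nn

    class TinyRNN(nn.Module):
        """Small RNN with a scalar readout, used only to form an empirical NTK."""
        def __init__(self, d_in=3, d_h=16):
            super().__init__()
            self.rnn = nn.RNN(d_in, d_h, batch_first=True)
            self.out = nn.Linear(d_h, 1)

        def forward(self, x):                            # x: (1, T, d_in) for any length T
            _, h_last = self.rnn(x)
            return self.out(h_last[-1]).squeeze()

    def param_grad(model, x):
        """Gradient of the scalar output w.r.t. all parameters, flattened into one vector."""
        grads = torch.autograd.grad(model(x), list(model.parameters()))
        return torch.cat([g.reshape(-1) for g in grads])

    torch.manual_seed(0)
    model = TinyRNN()
    x_short, x_long = torch.randn(1, 5, 3), torch.randn(1, 20, 3)   # lengths 5 and 20

    # Empirical NTK entry between two inputs of different lengths.
    k = param_grad(model, x_short) @ param_grad(model, x_long)
    print(k.item())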
  4. Recurrent neural networks (RNNs) have been successfully used on a wide range of sequential data problems. A well-known difficulty in using RNNs is the vanishing or exploding gradient problem. Recently, several RNN architectures have tried to mitigate this issue by maintaining an orthogonal or unitary recurrent weight matrix. One such architecture is the scaled Cayley orthogonal recurrent neural network (scoRNN), which parameterizes the orthogonal recurrent weight matrix through a scaled Cayley transform. This parametrization contains a diagonal scaling matrix consisting of ±1 entries that cannot be optimized by gradient descent. Thus the scaling matrix is fixed before training, and a hyperparameter is introduced to tune the matrix for each particular task. In this paper, we develop a unitary RNN architecture based on a complex scaled Cayley transform. Unlike the real orthogonal case, the transformation uses a diagonal scaling matrix consisting of entries on the complex unit circle, which can be optimized using gradient descent and no longer requires tuning a hyperparameter. We also provide an analysis of a potential issue with the modReLU activation function, which is used in our work and in several other unitary RNNs. In the experiments conducted, the scaled Cayley unitary recurrent neural network (scuRNN) achieves comparable or better results than scoRNN and other unitary RNNs without fixing the scaling matrix.
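For reference, the scaled Cayley parametrization named in the abstract can be sketched in a few lines, assuming the form W = (I + A)^{-1}(I - A)D with A skew-Hermitian and D diagonal with entries on the complex unit circle. The sketch only constructs W and checks that it is unitary; it is not the scuRNN training procedure.

    import math
    import torch

    torch.manual_seed(0)
    n = 6

    # Skew-Hermitian parameter (A^H = -A), so its Cayley transform is unitary.
    M = torch.randn(n, n, dtype=torch.complex64)
    A = (M - M.conj().T) / 2

    # Diagonal scaling matrix with entries on the complex unit circle (trainable in scuRNN).
    theta = torch.rand(n) * 2 * math.pi
    D = torch.diag(torch.polar(torch.ones(n), theta))

    I = torch.eye(n, dtype=torch.complex64)
    W = torch.linalg.solve(I + A, I - A) @ D             # scaled Cayley transform

    # The recurrent matrix is unitary up to floating-point error.
    err = (W @ W.conj().T - I).abs().max()
    print(f"max |W W^H - I| = {err.item():.2e}")

In the real-valued scoRNN case, D is restricted to ±1 entries and therefore cannot be trained by gradient descent, which is the limitation the complex unit-circle version above removes.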