Orthogonal Gated Recurrent Unit With Neumann-Cayley Transformation

Zadorozhnyy, Vasily; Mucllari, Edison; Pospisil, Cole; Nguyen, Duc; Ye, Qiang

doi:10.1162/neco_a_01710

This content will become publicly available on November 19, 2025

In recent years, using orthogonal matrices has been shown to be a promising approach to improving the training, stability, and convergence of recurrent neural networks (RNNs), particularly by controlling gradients. While gated recurrent unit (GRU) and long short-term memory (LSTM) architectures address the vanishing gradient problem by using a variety of gates and memory cells, they are still prone to the exploding gradient problem. In this work, we analyze the gradients in GRU and propose the use of orthogonal matrices to prevent exploding gradient problems and enhance long-term memory. We study where to use orthogonal matrices and propose a Neumann series–based scaled Cayley transformation for training orthogonal matrices in GRU, which we call Neumann-Cayley orthogonal GRU (NC-GRU). We present detailed experiments of our model on several synthetic and real-world tasks, which show that NC-GRU significantly outperforms GRU and several other RNNs.
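
The abstract names a Neumann series–based scaled Cayley transformation but does not spell it out. As a rough illustration only, the sketch below assumes the common form of the scaled Cayley transform, W = (I + A)^{-1}(I - A)D with A skew-symmetric and D a diagonal matrix of ±1 entries, and replaces the matrix inverse with a truncated Neumann series, (I + A)^{-1} ≈ Σ_k (-A)^k, which converges when the spectral norm of A is below 1. The function name `scaled_cayley_neumann`, the `order` parameter, and the test matrices are hypothetical and are not taken from the paper; the authors' actual formulation and training procedure may differ.

```python
import numpy as np


def scaled_cayley_neumann(A, D, order=20):
    """Approximate W = (I + A)^{-1} (I - A) D with a truncated Neumann series.

    Assumption: A is skew-symmetric with spectral norm below 1, so that
    (I + A)^{-1} = sum_{k >= 0} (-A)^k converges.
    """
    n = A.shape[0]
    identity = np.eye(n)
    inv_approx = np.zeros((n, n))
    term = np.eye(n)  # holds (-A)^k, starting at k = 0
    for _ in range(order + 1):
        inv_approx += term
        term = term @ (-A)
    return inv_approx @ (identity - A) @ D


# Hypothetical test data: a small skew-symmetric A and a +/-1 diagonal D.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
S = M - M.T                          # skew-symmetric by construction
A = 0.3 * S / np.linalg.norm(S, 2)   # rescale so the spectral norm is 0.3
D = np.diag([1.0, 1.0, -1.0, -1.0])

W = scaled_cayley_neumann(A, D)

# Reference value using an exact linear solve instead of the series.
W_exact = np.linalg.solve(np.eye(4) + A, (np.eye(4) - A) @ D)

# Distance of W from orthogonality, measured in the Frobenius norm.
orth_error = np.linalg.norm(W.T @ W - np.eye(4))
```

In this sketch, `W` should agree closely with `W_exact` and `orth_error` should be near zero, because the exact scaled Cayley transform of a skew-symmetric matrix is orthogonal and the truncated series is accurate when the norm of A is small. A larger norm would need a higher `order`, and a norm of 1 or more breaks the series entirely.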

Award ID(s): 2208314, 2327113, 2534947, 2516126, 2151802

PAR ID: 10616223

Author(s) / Creator(s): Zadorozhnyy, Vasily; Mucllari, Edison; Pospisil, Cole; Nguyen, Duc; Ye, Qiang

Publisher / Repository: Massachusetts Institute of Technology

Date Published: 2024-11-19

Journal Name: Neural Computation

Volume: 36

Issue: 12

ISSN: 0899-7667

Page Range / eLocation ID: 2651 to 2676

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Journal Article:
https://doi.org/10.1162/neco_a_01710
