

Title: ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs

Residual neural networks can be viewed as the forward Euler discretization of an Ordinary Differential Equation (ODE) with a unit time step. This has recently motivated researchers to explore other discretization approaches and train ODE-based networks. However, an important challenge of neural ODEs is their prohibitive memory cost during gradient backpropagation. Recently, a method proposed in arXiv:1806.07366 claimed that this memory overhead can be reduced from O(LNt), where Nt is the number of time steps and L is the depth of the network, down to O(L) by solving the forward ODE backwards in time. However, we will show that this approach may lead to several problems: (i) it may be numerically unstable for ReLU/non-ReLU activations and general convolution operators, and (ii) the proposed optimize-then-discretize approach may lead to divergent training due to inconsistent gradients for small time step sizes. We discuss the underlying problems, and to address them we propose ANODE, a neural ODE framework which avoids the numerical instability problems noted above. ANODE has a memory footprint of O(L) + O(Nt), with the same computational cost as the reverse ODE solve. We furthermore discuss a memory-efficient algorithm which can further reduce this footprint, at the cost of additional computation. We show results on CIFAR-10/100 datasets using ResNet and SqueezeNext neural networks.
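As a rough illustration of the discretize-then-optimize gradient with checkpointing behind the O(L) + O(Nt) trade-off, here is a minimal NumPy sketch. It is not the paper's implementation: the dynamics f, its vector-Jacobian products, and the checkpoint interval are illustrative choices. Forward Euler stores only every `every`-th state, and the backward pass recomputes the states inside each segment before backpropagating through it.

import numpy as np

def f(z, W):
    # illustrative ODE right-hand side: dz/dt = tanh(W z)
    return np.tanh(W @ z)

def f_vjp(z, W, a):
    # vector-Jacobian products a^T df/dz and a^T df/dW at (z, W)
    s = 1.0 - np.tanh(W @ z) ** 2
    return W.T @ (a * s), np.outer(a * s, z)

def forward(z0, W, Nt, h, every):
    # forward Euler, storing only every `every`-th state as a checkpoint
    z, ckpts = z0.copy(), {0: z0.copy()}
    for k in range(Nt):
        z = z + h * f(z, W)
        if (k + 1) % every == 0:
            ckpts[k + 1] = z.copy()
    return z, ckpts

def backward(W, Nt, h, ckpts, every, a):
    # discretize-then-optimize: recompute each segment from its checkpoint,
    # then backpropagate through the recomputed Euler steps
    gW = np.zeros_like(W)
    for start in range(((Nt - 1) // every) * every, -1, -every):
        end = min(start + every, Nt)
        z, seg = ckpts[start].copy(), [ckpts[start].copy()]
        for k in range(start, end):
            z = z + h * f(z, W)
            seg.append(z.copy())
        for k in range(end - 1, start - 1, -1):
            az, aW = f_vjp(seg[k - start], W, a)
            gW += h * aW
            a = a + h * az
    return a, gW   # dL/dz0 and dL/dW

# usage: loss L = 0.5 ||z(T)||^2, so dL/dz(T) = z(T)
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 4))
zT, ckpts = forward(rng.standard_normal(4), W, Nt=20, h=0.05, every=5)
a0, gW = backward(W, 20, 0.05, ckpts, 5, zT)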

 
Award ID(s):
1817048
NSF-PAR ID:
10322883
Journal Name:
International Joint Conferences on Artificial Intelligence
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Optical neural networks (ONNs), implemented on an array of cascaded Mach–Zehnder interferometers (MZIs), have recently been proposed as a possible replacement for conventional deep learning hardware. They potentially offer higher energy efficiency and computational speed when compared to their electronic counterparts. By utilizing tunable phase shifters, one can adjust the output of each MZI to enable emulation of arbitrary matrix–vector multiplication. These phase shifters are central to the programmability of ONNs, but they require a large footprint and are relatively slow. Here we propose an ONN architecture that utilizes parity–time (PT) symmetric couplers as its building blocks. Instead of modulating phase, gain–loss contrasts across the array are adjusted as a means to train the network. We demonstrate that PT symmetric ONNs (PT-ONNs) are adequately expressive by performing the digit-recognition task on the Modified National Institute of Standards and Technology dataset. Compared to conventional ONNs, the PT-ONN achieves a comparable accuracy (67% versus 71%) while circumventing the problems associated with changing phase. Our approach may lead to new and alternative avenues for fast training in chip-scale ONNs.
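    A minimal sketch of the underlying idea, assuming nothing about the paper's exact device model: an N-mode transformation is composed from 2x2 blocks on adjacent waveguides, with MZI phases (conventional ONN) or gain-loss contrast (PT-ONN) as the trainable knobs. The mzi and pt_coupler parameterizations below are illustrative textbook forms, not the authors' architecture.

    import numpy as np
    from scipy.linalg import expm

    def mzi(theta, phi):
        # 2x2 Mach-Zehnder unitary: two 50:50 couplers around phase shifters
        bs = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)
        ps = lambda p: np.diag([np.exp(1j * p), 1.0])
        return bs @ ps(theta) @ bs @ ps(phi)

    def pt_coupler(gain, kappa):
        # PT-symmetric coupler: balanced gain/loss +/- gain, coupling kappa;
        # the trainable knob is `gain`, not a phase (non-unitary in general)
        H = np.array([[1j * gain, kappa], [kappa, -1j * gain]])
        return expm(1j * H)

    def embed(t, n, m):
        # place a 2x2 block on adjacent modes (m, m+1) of an n-mode identity
        T = np.eye(n, dtype=complex)
        T[m:m + 2, m:m + 2] = t
        return T

    # tiny 4-mode brick-wall mesh; swap mzi(...) for pt_coupler(...) to
    # train gain-loss contrast instead of phase
    n, rng = 4, np.random.default_rng(0)
    U = np.eye(n, dtype=complex)
    for layer in range(n):
        for m in range(layer % 2, n - 1, 2):
            U = embed(mzi(*rng.uniform(0, 2 * np.pi, 2)), n, m) @ U

    y = U @ rng.standard_normal(n)   # the optical matrix-vector product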

     
  2. A computationally efficient "one-shot" approach with a low memory footprint is presented for unsteady design optimization. The proposed technique is based on a novel and unique approach that combines "local-in-time" and fixed-point iteration methods to advance the unconverged primal and adjoint solutions forward and backward in time to evaluate the sensitivity of the globally time-integrated objective function. This is in some ways similar to the "piggyback" iterations where primal and adjoint solutions are evaluated at a frozen design. During each cycle, the primal, adjoint, and design update problems are solved to advance the optimization problem. This new coupled approach is shown to provide significant savings in the memory footprint while reducing the computational cost of primal and adjoint evaluations per design cycle. The method is first applied to an inverse design problem for the unsteady lid-driven cavity. Following this, vortex suppression and mean drag reduction for a circular cylinder in cross-flow is considered. Both of these objectives are achieved by optimizing the rotational speeds for steady or periodically oscillating excitations. For all cases presented in this work, the proposed technique is shown to provide significant reductions in memory as well as computational time. It is also shown that the unsteady design optimization problem converges to the same optimal solution obtained using a conventional approach. 
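    The cycle structure can be sketched on a toy problem (a scalar linear recurrence standing in for the unsteady primal equations; the model, objective, and step size are invented for illustration, and a single sweep is exact for this explicit recurrence, whereas the actual method advances unconverged local-in-time iterates):

    import numpy as np

    Nt, d, lr = 20, 0.8, 1e-4    # time steps, initial design, design step
    u = np.zeros(Nt + 1)         # primal states: u_{k+1} = d*u_k + 1
    lam = np.zeros(Nt + 1)       # adjoint states

    for cycle in range(300):
        # primal sweep forward in time at the current design
        for k in range(Nt):
            u[k + 1] = d * u[k] + 1.0
        # adjoint sweep backward in time (piggyback: design frozen)
        lam[Nt] = u[Nt]
        for k in range(Nt - 1, 0, -1):
            lam[k] = u[k] + d * lam[k + 1]
        # sensitivity of J = sum_k 0.5*u_k^2 w.r.t. d, then design update
        d -= lr * np.sum(lam[1:] * u[:-1])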
  3. It has been observed that residual networks can be viewed as the explicit Euler discretization of an Ordinary Differential Equation (ODE). This observation motivated the introduction of so-called Neural ODEs, which allow more general discretization schemes with adaptive time stepping. Here, we propose ANODEV2, which is an extension of this approach that allows evolution of the neural network parameters, in a coupled ODE-based formulation. The Neural ODE method introduced earlier is in fact a special case of this new framework. We present the formulation of ANODEV2, derive optimality conditions, and implement the coupled framework in PyTorch. We present empirical results using several different configurations of ANODEV2, testing them on multiple models on CIFAR-10. We report results showing that this coupled ODE-based framework is indeed trainable, and that it achieves higher accuracy, as compared to the baseline models as well as the recently-proposed Neural ODE approach. 
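    The coupled formulation is easy to picture with forward Euler on both ODEs (a sketch with invented dynamics f and g, not ANODEV2's derived optimality system):

    import numpy as np

    def f(z, W):
        return np.tanh(W @ z)    # activation ODE: dz/dt = f(z, W(t))

    def g(W, alpha):
        return -alpha * W        # parameter-evolution ODE: dW/dt = g(W)

    d, Nt, h, alpha = 4, 20, 0.05, 0.5
    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((d, d))
    z = rng.standard_normal(d)
    for _ in range(Nt):
        z, W = z + h * f(z, W), W + h * g(W, alpha)
    # g == 0 freezes W and recovers the plain Neural ODE as a special case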
  4. We consider a class of nonsmooth convex composite optimization problems, where the objective function is given by the sum of a continuously differentiable convex term and a potentially non-differentiable convex regularizer. In [1], the authors introduced the proximal augmented Lagrangian method and derived the resulting continuous-time primal-dual dynamics that converge to the optimal solution. In this paper, we extend these dynamics from continuous to discrete time via the forward Euler discretization. We prove explicit bounds on the exponential convergence rates of our proposed algorithm with a sufficiently small step size. Since a larger step size can improve the convergence speed, we further develop a linear matrix inequality (LMI) condition which can be numerically solved to provide rate certificates with general step size choices. In addition, we prove that a large range of step size values can guarantee exponential convergence. We close the paper by demonstrating the performance of the proposed algorithm via computational experiments. 
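    For a LASSO-type instance, forward Euler on primal-dual dynamics of this proximal-augmented-Lagrangian flavor looks roughly as follows (the problem data, smoothing parameter mu, and step size alpha are illustrative; the paper's LMI certificates govern admissible step sizes):

    import numpy as np

    def soft(v, t):
        # prox of t*||.||_1 (soft-thresholding)
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
    lam_reg, mu, alpha = 0.1, 1.0, 1e-3   # minimize 0.5||Ax-b||^2 + lam_reg*||x||_1

    x, y = np.zeros(10), np.zeros(10)
    for _ in range(20000):
        v = x + mu * y
        p = soft(v, mu * lam_reg)                           # prox of mu*g at v
        x = x - alpha * (A.T @ (A @ x - b) + (v - p) / mu)  # primal descent
        y = y + alpha * ((v - p) - mu * y)                  # dual ascent
    # at a stationary point: 0 is in grad f(x) + lam_reg * subdiff ||x||_1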