Title: ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs
Residual neural networks can be viewed as the forward Euler discretization of an Ordinary Differential Equation (ODE) with a unit time step. This has recently motivated researchers to explore other discretization approaches and to train ODE-based networks. However, an important challenge of neural ODEs is their prohibitive memory cost during gradient backpropagation. A method recently proposed in arXiv:1806.07366 claims that this memory overhead can be reduced from O(L·Nt), where Nt is the number of time steps and L is the depth of the network, down to O(L) by solving the forward ODE backwards in time. However, we show that this approach may lead to several problems: (i) it may be numerically unstable for ReLU/non-ReLU activations and general convolution operators, and (ii) the proposed optimize-then-discretize approach may lead to divergent training due to inconsistent gradients for small time step sizes. We discuss the underlying problems and, to address them, we propose ANODE, a neural ODE framework which avoids the numerical instability problems noted above. ANODE has a memory footprint of O(L) + O(Nt), with the same computational cost as the reverse ODE solve. Furthermore, we discuss a memory-efficient algorithm which can further reduce this footprint at the cost of additional computation. We show results on the CIFAR-10/100 datasets using ResNet and SqueezeNext neural networks.
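To make the memory/accuracy trade-off concrete, the following is a minimal PyTorch sketch (ours, not the authors' ANODE code) of a forward-Euler neural ODE block in which autograd keeps every intermediate state, so the gradient is exact for the chosen discretization (discretize-then-optimize) at the price of O(Nt) stored activations per block:

    import torch
    import torch.nn as nn

    class ODEFunc(nn.Module):
        """Toy right-hand side f(t, z); a stand-in for a ResNet conv block."""
        def __init__(self, dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        def forward(self, t, z):
            return self.net(z)

    def euler_solve(f, z0, t0=0.0, t1=1.0, n_steps=8):
        """Forward Euler with the full computation graph kept in memory.
        Autograd records all n_steps updates, so backward() differentiates the
        discrete trajectory itself; no backward-in-time ODE solve is needed."""
        h = (t1 - t0) / n_steps
        z = z0
        for k in range(n_steps):
            z = z + h * f(t0 + k * h, z)  # z_{k+1} = z_k + h * f(t_k, z_k)
        return z

    f = ODEFunc(dim=16)
    loss = euler_solve(f, torch.randn(4, 16)).pow(2).sum()
    loss.backward()  # exact gradients for this solver, stable for any activation

A checkpointing scheme in the spirit of ANODE would store only the input of each ODE block (O(L)) plus one block's trajectory (O(Nt)), recomputing the forward activations within a block during the backward sweep; utilities such as torch.utils.checkpoint implement this compute-for-memory trade.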
Award ID(s):
1817048
PAR ID:
10322883
Author(s) / Creator(s):
Date Published:
Journal Name:
International Joint Conferences on Artificial Intelligence
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. It has been observed that residual networks can be viewed as the explicit Euler discretization of an Ordinary Differential Equation (ODE). This observation motivated the introduction of so-called Neural ODEs, which allow more general discretization schemes with adaptive time stepping. Here, we propose ANODEV2, an extension of this approach that allows the neural network parameters themselves to evolve, in a coupled ODE-based formulation. The Neural ODE method introduced earlier is in fact a special case of this new framework. We present the formulation of ANODEV2, derive optimality conditions, and implement the coupled framework in PyTorch. We present empirical results using several different configurations of ANODEV2, testing multiple models on CIFAR-10. We report results showing that this coupled ODE-based framework is indeed trainable and that it achieves higher accuracy than both the baseline models and the recently proposed Neural ODE approach.
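    As a rough illustration of the coupled formulation only (the toy parameter dynamics below are invented for the sketch; ANODEV2's are part of the learned model), both the activations z and the parameters theta can be advanced by a single forward-Euler sweep:

        import torch

        def coupled_euler(f, q, z0, theta0, t1=1.0, n_steps=8):
            """Jointly evolve activations and parameters:
               dz/dt = f(z, theta),  dtheta/dt = q(theta).
            Setting q = 0 freezes theta and recovers a plain Neural-ODE layer,
            which is why the Neural ODE is a special case of the coupled form."""
            h = t1 / n_steps
            z, theta = z0, theta0
            for _ in range(n_steps):
                z, theta = z + h * f(z, theta), theta + h * q(theta)
            return z, theta

        d = 16
        f = lambda z, th: torch.tanh(z @ th)  # activation dynamics driven by theta(t)
        q = lambda th: -0.1 * th              # hypothetical parameter dynamics
        theta0 = (torch.randn(d, d) / d ** 0.5).requires_grad_(True)
        zT, _ = coupled_euler(f, q, torch.randn(4, d), theta0)
        zT.pow(2).sum().backward()            # gradients reach theta0 through both ODEs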
  2. We introduce a general differentiable solver for time-dependent deformation problems with contact and friction. Our approach uses a finite element discretization with a high-order time integrator coupled with the recently proposed incremental potential contact method for handling contact and friction forces to solve ODE- and PDE-constrained optimization problems on scenes with complex geometry. It supports static and dynamic problems and differentiation with respect to all physical parameters involved in the physical problem description, which include shape, material parameters, friction parameters, and initial conditions. Our analytically derived adjoint formulation is efficient, with a small overhead (typically less than 10% for nonlinear problems) over the forward simulation, and shares many similarities with the forward problem, allowing the reuse of large parts of existing forward simulator code. We implement our approach on top of the open-source PolyFEM library and demonstrate the applicability of our solver to shape design, initial condition optimization, and material estimation on both simulated results and physical validations. 
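    The "analytically derived adjoint formulation" mentioned above is, at its core, the standard discrete-adjoint recursion; the following self-contained sketch (a toy damped oscillator with stiffness p, not PolyFEM's solver) shows why the backward sweep costs roughly as much as the forward one: each step needs one stored state plus one linearized (vector-Jacobian) product.

        import torch

        def step(z, p):
            """One explicit time step z_{k+1} = g(z_k, p): a damped oscillator with
            stiffness p, standing in for the nonlinear FE solve of the real code."""
            h = 0.01
            return torch.stack((z[0] + h * z[1],
                                z[1] + h * (-p * z[0] - 0.1 * z[1])))

        def adjoint_gradient(z0, p_val, n_steps):
            """Forward sweep stores the trajectory; the backward sweep runs
            lam_k = (dg/dz_k)^T lam_{k+1} and accumulates
            dJ/dp = sum_k (dg/dp)^T lam_{k+1}, for the objective J = |z_T|^2."""
            states = [z0]
            with torch.no_grad():
                for _ in range(n_steps):
                    states.append(step(states[-1], torch.tensor(p_val)))
            lam = 2.0 * states[-1]                      # dJ/dz_T
            dJdp = torch.zeros(())
            for k in range(n_steps - 1, -1, -1):
                zk = states[k].clone().requires_grad_(True)
                p = torch.tensor(p_val, requires_grad=True)
                gz, gp = torch.autograd.grad(step(zk, p), (zk, p), grad_outputs=lam)
                dJdp, lam = dJdp + gp, gz
            return states[-1].pow(2).sum().item(), dJdp.item()

        J, dJdp = adjoint_gradient(torch.tensor([1.0, 0.0]), 4.0, n_steps=200)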
  3. The neural Ordinary Differential Equation (ODE) model has shown success in learning complex continuous-time processes from observations on discrete time stamps. In this work, we consider the modeling and forecasting of time series data that are non-stationary and may have sharp changes like spikes. We propose an RNN-based model, called RNN-ODE-Adap, that uses a neural ODE to represent the time development of the hidden states, and we adaptively select time steps based on the steepness of changes in the data over time, so as to train the model more efficiently for "spike-like" time series. Theoretically, RNN-ODE-Adap provably yields a consistent estimate of the intensity function for Hawkes-type time series data. We also provide an approximation analysis of the RNN-ODE model showing the benefit of adaptive steps. The proposed model is demonstrated to achieve higher prediction accuracy with reduced computational cost on simulated dynamical-system data, point-process data, and a real electrocardiography dataset.
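    A hypothetical, much-simplified version of steepness-based step selection (the paper's actual criterion may differ) is to place a time knot whenever the accumulated variation of the signal since the last knot exceeds a tolerance, so spikes receive fine steps and flat stretches coarse ones:

        import numpy as np

        def adaptive_knots(t, x, tol=0.5):
            """Return indices into t that are dense where x changes fast:
            keep a knot once the total variation since the last knot exceeds tol."""
            knots, acc = [0], 0.0
            for i in range(1, len(x)):
                acc += abs(x[i] - x[i - 1])
                if acc >= tol:
                    knots.append(i)
                    acc = 0.0
            if knots[-1] != len(x) - 1:
                knots.append(len(x) - 1)      # always keep the final time stamp
            return np.asarray(knots)

        t = np.linspace(0.0, 1.0, 500)
        x = np.exp(-((t - 0.5) / 0.01) ** 2)  # a single sharp spike at t = 0.5
        idx = adaptive_knots(t, x, tol=0.2)   # knots cluster around the spike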
  4.
    A computationally efficient one-shot approach with a low memory footprint is presented for unsteady optimization. The proposed technique is based on a novel approach that combines local-in-time and fixed-point iteration methods to advance the unconverged primal and adjoint solutions forward and backward in time to evaluate the sensitivity of the globally time-integrated objective function. This is in some ways similar to piggyback iterations, in which primal and adjoint solutions are evaluated at a frozen design. During each cycle, the primal, adjoint, and design update problems are solved to advance the optimization problem. This coupled approach is shown to provide significant savings in the memory footprint while reducing the computational cost of primal and adjoint evaluations per design cycle. The method is first applied to an inverse design problem for the unsteady lid-driven cavity. Following this, vortex suppression and mean drag reduction for a circular cylinder in crossflow are considered. Both objectives are achieved by optimizing the rotational speeds for steady or periodically oscillating excitations. For all cases presented in this work, the proposed technique is shown to provide significant reductions in memory as well as computational time. It is also shown that the unsteady optimization problem converges to the same optimal solution obtained using a conventional approach.
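    The per-cycle structure can be illustrated on a scalar toy problem (invented here; it is not from the paper): minimize J(z, u) = (z - 1)^2 + 0.1 u^2 subject to the fixed point z = 0.5 z + u, advancing the primal, adjoint, and design variables by one step each per cycle instead of converging the primal and adjoint before every design update:

        def one_shot(u=0.0, n_cycles=400, lr=0.05):
            """One-shot / piggyback sketch for min_u J(z, u) s.t. z = G(z, u),
            with G(z, u) = 0.5 * z + u and J = (z - 1)^2 + 0.1 * u^2."""
            z = lam = 0.0
            for _ in range(n_cycles):
                z = 0.5 * z + u                    # one primal step:  z   <- G(z, u)
                lam = 0.5 * lam + 2.0 * (z - 1.0)  # one adjoint step: lam <- G_z*lam + J_z
                u -= lr * (lam + 0.2 * u)          # one design step:  u   <- u - lr*dJ/du
            return z, u

        z_opt, u_opt = one_shot()  # converges to z ~ 0.976, u ~ 0.488

    Despite never fully converging the primal or adjoint within a cycle, the iteration reaches the same optimum (u = 4/8.2 ≈ 0.488) that a fully converged conventional approach finds for this toy problem, mirroring the paper's observation, while never storing more than one primal and one adjoint state.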