Title: ANODEV2: A Coupled Neural ODE Framework
It has been observed that residual networks can be viewed as the explicit Euler discretization of an Ordinary Differential Equation (ODE). This observation motivated the introduction of so-called Neural ODEs, which allow more general discretization schemes with adaptive time stepping. Here, we propose ANODEV2, an extension of this approach that allows the neural network parameters to evolve as well, in a coupled ODE-based formulation. The Neural ODE method introduced earlier is in fact a special case of this new framework. We present the formulation of ANODEV2, derive optimality conditions, and implement the coupled framework in PyTorch. We present empirical results using several different configurations of ANODEV2, testing multiple models on CIFAR-10. We report results showing that this coupled ODE-based framework is indeed trainable, and that it achieves higher accuracy than both the baseline models and the recently proposed Neural ODE approach.
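To make the coupled formulation concrete, below is a minimal sketch (not the authors' released code) of explicit-Euler integration of a coupled state/parameter system in PyTorch. The parameter-evolution network `g`, the convolutional form of `f`, and all dimensions are illustrative assumptions; setting `g` to zero recovers the standard Neural ODE with constant parameters, matching the claim that Neural ODEs are a special case.

```python
# A minimal sketch of the coupled-ODE idea: both the activations z and the
# parameters theta evolve in pseudo-time. g is a hypothetical learned
# parameter-evolution map, chosen here purely for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoupledODEBlock(nn.Module):
    def __init__(self, channels, n_steps=4, dt=0.25):
        super().__init__()
        self.n_steps, self.dt = n_steps, dt
        # theta(0): initial convolution weights, evolved over pseudo-time.
        self.theta0 = nn.Parameter(torch.randn(channels, channels, 3, 3) * 0.01)
        numel = self.theta0.numel()
        # g: parameter-evolution network (a simple linear map here).
        self.g = nn.Linear(numel, numel)

    def forward(self, z):
        theta = self.theta0
        for _ in range(self.n_steps):
            # Explicit Euler step for the coupled system:
            #   z_{k+1}     = z_k     + dt * f(z_k, theta_k)
            #   theta_{k+1} = theta_k + dt * g(theta_k)
            dz = torch.relu(F.conv2d(z, theta, padding=1))
            dtheta = self.g(theta.flatten()).view_as(theta)
            z = z + self.dt * dz
            theta = theta + self.dt * dtheta
        return z

block = CoupledODEBlock(channels=8)
print(block(torch.randn(2, 8, 16, 16)).shape)  # torch.Size([2, 8, 16, 16])
```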
Award ID(s):
1817048
NSF-PAR ID:
10322901
Author(s) / Creator(s):
Date Published:
Journal Name:
Advances in neural information processing systems
ISSN:
1049-5258
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Residual neural networks can be viewed as the forward Euler discretization of an Ordinary Differential Equation (ODE) with a unit time step. This has recently motivated researchers to explore other discretization approaches and to train ODE-based networks. However, an important challenge of neural ODEs is their prohibitive memory cost during gradient backpropagation. Recently, a method proposed in arXiv:1806.07366 claimed that this memory overhead can be reduced from O(L·Nt), where Nt is the number of time steps and L is the depth of the network, down to O(L) by solving the forward ODE backwards in time. However, we will show that this approach may lead to several problems: (i) it may be numerically unstable for ReLU/non-ReLU activations and general convolution operators, and (ii) the proposed optimize-then-discretize approach may lead to divergent training due to inconsistent gradients for small time step sizes. We discuss the underlying problems, and to address them we propose ANODE, a neural ODE framework that avoids the numerical instabilities noted above. ANODE has a memory footprint of O(L) + O(Nt), with the same computational cost as the reverse ODE solve. We furthermore discuss a memory-efficient algorithm that can further reduce this footprint, at the cost of additional computation. We show results on the CIFAR-10/100 datasets using ResNet and SqueezeNext neural networks.
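A rough sketch of the compute-for-memory trade-off described above, using PyTorch's stock `torch.utils.checkpoint` rather than the paper's own implementation: only the block input is stored, and the intermediate Euler steps are recomputed during the backward pass, so the footprint does not grow with the number of time steps kept live per block.

```python
# Sketch (assumptions, not the paper's code): an Euler-discretized residual/ODE
# block whose intermediate activations are recomputed during backprop.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class EulerODEBlock(nn.Module):
    def __init__(self, channels, n_steps=8, dt=1.0 / 8):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.n_steps, self.dt = n_steps, dt

    def _integrate(self, z):
        # Forward Euler: z_{k+1} = z_k + dt * f(z_k), weights shared across steps.
        for _ in range(self.n_steps):
            z = z + self.dt * self.f(z)
        return z

    def forward(self, z):
        # Checkpointing drops the per-step activations and recomputes them in
        # the backward pass, avoiding the O(L * Nt) memory blow-up.
        return checkpoint(self._integrate, z, use_reentrant=False)

block = EulerODEBlock(16)
out = block(torch.randn(4, 16, 8, 8, requires_grad=True))
out.sum().backward()
```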

  2. We present rectified flow, a surprisingly simple approach to learning (neural) ordinary differential equation (ODE) models to transport between two empirically observed distributions π0 and π1, hence providing a unified solution to generative modeling and domain transfer, among various other tasks involving distribution transport. The idea of rectified flow is to learn the ODE to follow the straight paths connecting the points drawn from π0 and π1 as much as possible. This is achieved by solving a straightforward nonlinear least squares optimization problem, which can be easily scaled to large models without introducing extra parameters beyond standard supervised learning. The straight paths are special and preferred because they are the shortest paths between two points, and can be simulated exactly without time discretization, hence yielding computationally efficient models. We show that the procedure of learning a rectified flow from data, called rectification, turns an arbitrary coupling of π0 and π1 into a new deterministic coupling with provably non-increasing convex transport costs. In addition, recursively applying rectification allows us to obtain a sequence of flows with increasingly straight paths, which can be simulated accurately with coarse time discretization in the inference phase. In empirical studies, we show that rectified flow performs superbly on image generation, image-to-image translation, and domain adaptation. In particular, on image generation and translation, our method yields nearly straight flows that give high quality results even with a single Euler discretization step.
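The nonlinear least squares objective is simple enough to sketch in a few lines. In the toy sketch below, the 2-D Gaussian distributions and the MLP velocity field are illustrative assumptions; the objective itself (regress the velocity onto the straight-line displacement X1 − X0 at a random point on the path) is the one described above.

```python
# Minimal rectified-flow training sketch on toy 2-D distributions.
import torch
import torch.nn as nn

v = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2))  # v(x, t)
opt = torch.optim.Adam(v.parameters(), lr=1e-3)

for step in range(1000):
    x0 = torch.randn(256, 2)            # samples from pi_0
    x1 = torch.randn(256, 2) + 4.0      # samples from pi_1 (shifted Gaussian)
    t = torch.rand(256, 1)
    xt = t * x1 + (1 - t) * x0          # point on the straight path
    # Least squares: match v(xt, t) to the path's constant velocity x1 - x0.
    loss = ((v(torch.cat([xt, t], dim=1)) - (x1 - x0)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling: integrate dx/dt = v(x, t) from t=0 to t=1; one Euler step shown,
# which is accurate when the learned flow is nearly straight.
x = torch.randn(64, 2)
x = x + v(torch.cat([x, torch.zeros(64, 1)], dim=1))
```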
  3. Delay-Differential Equations (DDEs) are the most common representation for systems with delay. However, the DDE representation has limitations. In network models with delay, the delayed channels are typically low-dimensional, and accounting for this heterogeneity is challenging in the DDE framework. In addition, DDEs cannot be used to model difference equations. In this paper, we examine alternative representations for networked systems with delay and provide formulae for conversion between representations. First, we examine the Differential-Difference (DDF) formulation, which allows us to represent the low-dimensional nature of delayed information. Next, we consider the coupled ODE-PDE framework and extend this to the recently developed Partial-Integral Equation (PIE) representation. The PIE framework has the advantage that it allows the H∞-optimal estimation and control problems to be solved efficiently using the recently developed software package PIETOOLS. In each case, we consider a very general class of networks, specifically accounting for four sources of delay: state delay, input delay, output delay, and process delay. Finally, we use a scalable network model of temperature control to show that the use of the DDF/PIE formulation allows for optimal control of a network with 40 users, 80 states, 40 delays, 40 inputs, and 40 disturbances.
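PIETOOLS itself is a MATLAB package, so the snippet below is only a toy Python illustration of the DDF observation: for dynamics x'(t) = A x(t) + B r(t − τ) with a delayed channel r = C x, only the low-dimensional channel r needs a history buffer, not the full state. The matrices and parameters are made-up values for demonstration.

```python
# Toy Euler simulation of a system with one low-dimensional delayed channel.
import numpy as np

A = np.array([[-1.0, 0.5], [0.0, -2.0]])  # 2-D state dynamics
B = np.array([[0.3], [0.1]])              # feedback gain of the delayed channel
C = np.array([[1.0, 0.0]])                # 1-D delayed channel r = C x
tau, dt, T = 0.5, 0.01, 5.0
n_delay = int(tau / dt)

x = np.array([1.0, 0.0])
# DDF viewpoint: buffer only the scalar channel r, not the full 2-D state.
history = [(C @ x).item()] * n_delay
for _ in range(int(T / dt)):
    r_delayed = history.pop(0)
    x = x + dt * (A @ x + (B * r_delayed).ravel())
    history.append((C @ x).item())
print(x)
```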
  4. Agent‐based models (ABMs) are increasing in popularity as tools to simulate and explore many biological systems. Successes in simulation lead to deeper investigations, from designing systems to optimizing performance. The typically stochastic, rule‐based structure of ABMs, however, does not lend itself to the analytic and numerical optimization techniques that traditional dynamical systems models admit. The goal of this work is to illustrate a technique for approximating ABMs with a partial differential equation (PDE) system in order to design management strategies on the ABM. We propose a surrogate modeling approach, using differential equations that admit direct means of determining optimal controls, with a particular focus on environmental heterogeneity in the ABM. We implement this program with both PDE and ordinary differential equation (ODE) approximations on the well‐known rabbits-and-grass ABM, in which a pest population consumes a resource. The control problem addressed is the reduction of this pest population through an optimal control formulation. After fitting the ODE and PDE models to ABM simulation data in the absence of control, we compute optimal controls using the ODE and PDE models, which we then apply to the ABM. The results show promise for approximating ABMs with differential equations in this context.
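A hedged sketch of the surrogate-control idea: a hypothetical logistic pest ODE stands in for the fitted surrogate, and the control problem is solved by direct transcription (discretize the control, minimize the cost numerically). The model form, parameters, and cost functional are illustrative assumptions, not the paper's fitted models.

```python
# Surrogate ODE x' = r x (1 - x/K) - u(t) x with removal control u, solved by
# direct minimization of a pest-plus-effort cost over a piecewise-constant u.
import numpy as np
from scipy.optimize import minimize

r, K, c, dt, N = 0.8, 100.0, 0.05, 0.1, 50  # growth, capacity, control cost

def simulate(u):
    # Forward Euler on the ODE surrogate of the ABM's pest population.
    x, xs = 20.0, []
    for k in range(N):
        x = x + dt * (r * x * (1 - x / K) - u[k] * x)
        xs.append(x)
    return np.array(xs)

def cost(u):
    # Penalize the pest population and the control effort.
    return dt * np.sum(simulate(u) + c * u ** 2)

res = minimize(cost, x0=np.zeros(N), bounds=[(0.0, 1.0)] * N)
print("final pest level:", simulate(res.x)[-1])
```

The optimized control `res.x` would then be applied back to the ABM, mirroring the paper's workflow of computing controls on the surrogate and evaluating them on the original agent-based model.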

  5. Learning multi-agent system dynamics has been extensively studied for various real-world applications, such as molecular dynamics in biology, multi-body systems in physics, and particle dynamics in materials science. Most existing models learn the dynamics of a single system from observed historical data and predict its future trajectory. In practice, however, we might observe multiple systems generated across different environments, which differ in latent exogenous factors such as temperature and gravity. One simple solution is to learn multiple environment-specific models, but this fails to exploit the potential commonalities among the dynamics across environments and performs poorly where per-environment data is sparse or limited. Here, we present GG-ODE (Generalized Graph Ordinary Differential Equations), a machine learning framework for learning continuous multi-agent system dynamics across environments. Our model learns system dynamics using neural ordinary differential equations (ODEs) parameterized by Graph Neural Networks (GNNs) to capture the continuous interactions among agents. We achieve generalization by assuming that the dynamics across different environments are governed by common physical laws, captured via a learned, shared ODE function. The distinct latent exogenous factors learned for each environment are incorporated into the ODE function to account for their differences. To improve model performance, we additionally design two regularization losses that (1) enforce orthogonality between the learned initial states and exogenous factors via mutual information minimization, and (2) reduce the temporal variance of the learned exogenous factors within the same system via contrastive learning. Experiments over various physical simulations show that our model can accurately predict system dynamics, especially over long horizons, and can generalize well to new systems with few observations.
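A simplified sketch of the core architectural idea: a single GNN-parameterized ODE function shared across environments, conditioned on a learned per-environment latent. All dimensions, the dense message-passing form, and the embedding-based environment latents are assumptions chosen for illustration; the paper's exact parameterization and regularizers differ.

```python
# Shared graph ODE function dh/dt = f_GNN(h, adj; c_env), rolled out with Euler.
import torch
import torch.nn as nn

class GraphODEFunc(nn.Module):
    def __init__(self, dim, latent_dim, n_envs):
        super().__init__()
        self.env_latents = nn.Embedding(n_envs, latent_dim)  # exogenous factors
        self.msg = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.upd = nn.Linear(dim + dim + latent_dim, dim)

    def forward(self, h, adj, env_id):
        # Message passing: aggregate pairwise messages over graph edges, then
        # update each agent conditioned on its environment's latent factor.
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        msgs = (adj.unsqueeze(-1) * self.msg(pairs)).sum(dim=1)
        c = self.env_latents(env_id).expand(n, -1)
        return self.upd(torch.cat([h, msgs, c], dim=-1))

func = GraphODEFunc(dim=16, latent_dim=4, n_envs=3)
h = torch.randn(5, 16)                      # states of 5 agents
adj = (torch.rand(5, 5) > 0.5).float()      # toy interaction graph
dt = 0.1
for _ in range(10):                         # Euler rollout of dh/dt = func(h)
    h = h + dt * func(h, adj, torch.tensor(0))
print(h.shape)  # torch.Size([5, 16])
```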