NSF PAR Search | NSF Public Access Repository

ANODEv2: A Coupled Neural ODE Framework

Zhang, Tianjun and (April 2022, Advances in neural information processing systems)

It has been observed that residual networks can be viewed as the explicit Euler discretization of an Ordinary Differential Equation (ODE). This observation motivated the introduction of so-called Neural ODEs, which allow more general discretization schemes with adaptive time stepping. Here, we propose ANODEv2, an extension of this approach that allows the neural network parameters to evolve in a coupled ODE-based formulation. The Neural ODE method introduced earlier is a special case of this new framework. We present the formulation of ANODEv2, derive optimality conditions, and implement the coupled framework in PyTorch. We present empirical results for several configurations of ANODEv2, tested on multiple models on CIFAR-10. The results show that this coupled ODE-based framework is trainable and that it achieves higher accuracy than both the baseline models and the recently proposed Neural ODE approach.
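The correspondence between a residual network and explicit Euler can be illustrated with a minimal sketch. It is not taken from the paper: the scalar `layer` function, the parameter values in `thetas`, and the `steps_per_layer` argument are all hypothetical choices made for illustration. A residual update x + f(x, theta) is one explicit Euler step of dx/dt = f(x, theta) with step size 1, and taking several smaller steps per layer is one example of a more general discretization.

```python
import math

def layer(x, theta):
    # Hypothetical scalar "layer" f(x, theta); stands in for a real network block.
    return math.tanh(theta * x)

def resnet_forward(x, thetas):
    # Standard residual update: x_{k+1} = x_k + f(x_k, theta_k).
    for theta in thetas:
        x = x + layer(x, theta)
    return x

def euler_forward(x, thetas, steps_per_layer=1):
    # Explicit Euler on dx/dt = f(x, theta(t)), with theta held constant
    # over each unit time interval (one interval per layer).
    h = 1.0 / steps_per_layer
    for theta in thetas:
        for _ in range(steps_per_layer):
            x = x + h * layer(x, theta)
    return x

thetas = [0.3, -0.2, 0.7]   # illustrative parameters, not from the paper
x0 = 0.5
resnet_out = resnet_forward(x0, thetas)
euler_unit_out = euler_forward(x0, thetas, steps_per_layer=1)
euler_fine_out = euler_forward(x0, thetas, steps_per_layer=4)
```

With `steps_per_layer=1` the Euler solve reproduces the residual network output; larger values refine the time discretization while reusing the same parameters. The sketch keeps the parameters fixed in time within each layer, whereas the abstract describes ANODEv2 as additionally letting them evolve through a coupled ODE.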
Full Text Available
Hardware Accelerator Integration Tradeoffs for High-Performance Computing: A Case Study of GEMM Acceleration in N-Body Methods

https://doi.org/10.1109/TPDS.2021.3056045

Asri, Mochamad; Malhotra, Dhairya; Wang, Jiajun; Biros, George; John, Lizy K; Gerstlauer, Andreas (January 2021, IEEE Transactions on Parallel and Distributed Systems)

Full Text Available
Fast Approximation of the Gauss-Newton Hessian Matrix for the Multilayer Perceptron

https://doi.org/10.1137/19M129961X

Chen, Chao; Reiz, Severin; Yu, Chenhan D.; Bungartz, Hans-Joachim; Biros, George (January 2021, SIAM Journal on Matrix Analysis and Applications)

Full Text Available
RCHOL: Randomized Cholesky Factorization for Solving SDD Linear Systems

https://doi.org/10.1137/20M1380624

Chen, Chao; Liang, Tianyu; Biros, George (January 2021, SIAM Journal on Scientific Computing)

Full Text Available
Distributed O(N) Linear Solver for Dense Symmetric Hierarchical Semi-Separable Matrices

https://doi.org/10.1109/MCSoC.2019.00008

Yu, Chenhan D.; Reiz, Severin; Biros, George (October 2019, 2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC))

Full Text Available
ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs

https://doi.org/10.24963/ijcai.2019/103

Gholaminejad, Amir; Keutzer, Kurt; Biros, George (August 2019, International Joint Conferences on Artificial Intelligence)

Residual neural networks can be viewed as the forward Euler discretization of an Ordinary Differential Equation (ODE) with a unit time step. This has recently motivated researchers to explore other discretization approaches and to train ODE-based networks. However, an important challenge of neural ODEs is their prohibitive memory cost during gradient backpropagation. Recently, a method proposed in arXiv:1806.07366 claimed that this memory overhead can be reduced from O(LNt), where Nt is the number of time steps, down to O(L) by solving the forward ODE backwards in time, where L is the depth of the network. However, we show that this approach may lead to several problems: (i) it may be numerically unstable for ReLU/non-ReLU activations and general convolution operators, and (ii) the proposed optimize-then-discretize approach may lead to divergent training due to inconsistent gradients for small time step sizes. We discuss the underlying problems and, to address them, propose ANODE, a neural ODE framework that avoids the numerical instability problems noted above. ANODE has a memory footprint of O(L) + O(Nt), with the same computational cost as reversing the ODE solve. Furthermore, we discuss a memory-efficient algorithm that can further reduce this footprint at the price of additional computational cost. We show results on the CIFAR-10/100 datasets using ResNet and SqueezeNext neural networks.
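Why recovering activations by solving the forward ODE backwards in time can fail may be seen in a toy sketch. It is not the paper's experiment: the linear dynamics dz/dt = -lam * z, the value of `lam`, the step size, and the function names are all assumptions chosen for illustration. The state is integrated forward with explicit Euler, and the input is then reconstructed by stepping the same ODE backwards from the final state. For comparison, the input is also simply read back from a stored trajectory, which is in the spirit of trading memory for exactness.

```python
def f(z, lam=5.0):
    # Toy contracting dynamics dz/dt = -lam * z (assumed for illustration only).
    return -lam * z

def forward_euler(z0, n_steps, h):
    # Integrate forward in time and keep every intermediate state.
    traj = [z0]
    for _ in range(n_steps):
        traj.append(traj[-1] + h * f(traj[-1]))
    return traj

def reverse_euler(z_end, n_steps, h):
    # Attempt to recover the input by stepping the same ODE backwards in time.
    z = z_end
    for _ in range(n_steps):
        z = z - h * f(z)
    return z

z0, n_steps, h = 1.0, 10, 0.1
traj = forward_euler(z0, n_steps, h)

z0_reversed = reverse_euler(traj[-1], n_steps, h)  # reconstructed input
z0_stored = traj[0]                                # input read from storage

reverse_error = abs(z0_reversed - z0)
stored_error = abs(z0_stored - z0)
```

In this setting each forward step multiplies the state by (1 - h * lam) and each reverse step by (1 + h * lam), so the round trip does not return to the original input and `reverse_error` is large, while the stored value is exact. Any gradient computed from the reconstructed activations would inherit that error.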
Full Text Available