Residual neural networks can be viewed as the forward Euler discretization of an Ordinary Differential Equation (ODE) with a unit time step. This has recently motivated researchers to explore other discretization approaches and train ODE-based networks. However, an important challenge of neural ODEs is their prohibitive memory cost during gradient backpropagation. A method recently proposed in arXiv:1806.07366 claimed that this memory overhead can be reduced from O(LN_t), where N_t is the number of time steps and L is the depth of the network, down to O(L) by solving the forward ODE backwards in time. However, we will show that this approach may lead to several problems: (i) it may be numerically unstable for ReLU/non-ReLU activations and general convolution operators, and (ii) the proposed optimize-then-discretize approach may lead to divergent training due to inconsistent gradients for small time step sizes. We discuss the underlying problems, and to address them we propose ANODE, a neural ODE framework which avoids the numerical instability issues noted above. ANODE has a memory footprint of O(L) + O(N_t), with the same computational cost as the reverse ODE solve. Furthermore, we discuss a memory-efficient algorithm which can further reduce this footprint at the cost of additional computation. We show results on the CIFAR-10/100 datasets using ResNet and SqueezeNext neural networks.
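The forward-Euler view above can be made concrete in a few lines. The sketch below (plain NumPy; the tanh activation, weight matrix, and step counts are illustrative assumptions, not the paper's architecture) shows that a standard residual block is exactly one forward Euler step of size one on dx/dt = f(x):

```python
import numpy as np

def f(x, W):
    # Illustrative right-hand side f(x); any smooth layer works here.
    return np.tanh(W @ x)

def residual_block(x, W):
    # Standard residual connection: x_{k+1} = x_k + f(x_k).
    return x + f(x, W)

def forward_euler(x, W, num_steps, T=1.0):
    # Forward Euler on dx/dt = f(x) over [0, T] with step h = T/num_steps.
    # With num_steps = 1 (so h = 1), this is exactly the residual block.
    h = T / num_steps
    for _ in range(num_steps):
        x = x + h * f(x, W)
    return x
```

Taking more, smaller steps (num_steps > 1) corresponds to the finer discretizations that ODE-based networks explore.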
Towards Stability of Autoregressive Neural Operators
Neural operators have proven to be a promising approach for modeling spatiotemporal systems in the physical sciences. However, training these models for large systems can be quite challenging, as they incur significant computational and memory expense; these systems are therefore often forced to rely on autoregressive time-stepping of the neural network to predict future temporal states. While this is effective in managing costs, it can lead to uncontrolled error growth over time and eventual instability. We analyze the sources of this autoregressive error growth using prototypical neural operator models for physical systems and explore ways to mitigate it. We introduce architectural and application-specific improvements that allow for careful control of instability-inducing operations within these models without inflating the compute/memory expense. We present results on several scientific systems that include Navier-Stokes fluid flow, rotating shallow water, and a high-resolution global weather forecasting system. We demonstrate that applying our design principles to neural operators leads to significantly lower errors for long-term forecasts as well as longer time horizons without qualitative signs of divergence compared to the original models for these systems. We open-source our code for reproducibility.
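A toy linear surrogate illustrates the error-growth mechanism: if repeated application of the learned step operator amplifies any mode, autoregressive rollout diverges. The matrices and the spectral rescaling below are illustrative assumptions, not the paper's method:

```python
import numpy as np

def rollout(A, x0, steps):
    # Autoregressive prediction: feed each output back in as the next input.
    xs = [np.asarray(x0, float)]
    for _ in range(steps):
        xs.append(A @ xs[-1])
    return np.array(xs)

# A step operator with one mildly amplifying mode (eigenvalue 1.05).
A = np.array([[1.05, 0.0],
              [0.0,  0.5]])
x0 = np.array([1.0, 1.0])

unstable = rollout(A, x0, 100)        # the 1.05-mode compounds: 1.05**100 ~ 131

# One generic mitigation: constrain the operator's spectral radius to 1.
A_clipped = A / max(abs(np.linalg.eigvals(A)))
stable = rollout(A_clipped, x0, 100)  # all modes stay bounded
```

A 5% per-step amplification is invisible over a few steps but dominates a 100-step forecast, which is why long rollouts expose instabilities that short training horizons do not.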
- Award ID(s):
- 1835825
- PAR ID:
- 10491192
- Publisher / Repository:
- OpenReview
- Date Published:
- Journal Name:
- Transactions on Machine Learning Research
- ISSN:
- 2835-8856
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- Abstract While data-driven approaches demonstrate great potential in atmospheric modeling and weather forecasting, ocean modeling poses distinct challenges due to complex bathymetry, land, vertical structure, and flow non-linearity. This study introduces OceanNet, a principled neural operator-based digital twin for regional sea-surface height emulation. OceanNet uses a Fourier neural operator and a predictor-evaluate-corrector integration scheme to mitigate autoregressive error growth and enhance stability over extended time scales. A spectral regularizer counteracts spectral bias at smaller scales. OceanNet is applied to the northwest Atlantic Ocean western boundary current (the Gulf Stream), focusing on the task of seasonal prediction for Loop Current eddies and the Gulf Stream meander. Trained using historical sea surface height (SSH) data, OceanNet demonstrates competitive forecast skill compared to a state-of-the-art dynamical ocean model forecast, reducing computation by 500,000 times. These accomplishments demonstrate initial steps for physics-inspired deep neural operators as cost-effective alternatives to high-resolution numerical ocean models.
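For intuition, a Fourier-neural-operator-style layer applies learned weights in spectral space and retains only the lowest modes; that truncation is also one source of the spectral bias at small scales that the regularizer counteracts. The 1-D sketch below is a generic illustration, not OceanNet's implementation:

```python
import numpy as np

def spectral_layer(x, weights, modes):
    # FNO-style layer in 1-D: FFT, scale the lowest `modes` coefficients
    # by learned weights, drop the rest, inverse FFT.
    x_hat = np.fft.rfft(x)
    out_hat = np.zeros_like(x_hat)
    k = min(modes, x_hat.size)
    out_hat[:k] = weights[:k] * x_hat[:k]
    return np.fft.irfft(out_hat, n=x.size)
```

With unit weights and all modes retained the layer is the identity; shrinking `modes` low-pass filters the field, removing small-scale energy, which is the behavior a spectral regularizer is meant to counteract.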
- Epilepsy affects approximately 50 million people worldwide. Despite its prevalence, the recurrence of seizures can be mitigated only 70% of the time through medication. Furthermore, surgery success rates range from 30% to 70% because of our limited understanding of how a seizure starts. However, one leading hypothesis suggests that a seizure starts because of a critical transition due to an instability. Unfortunately, we lack a meaningful way to quantify this notion that would allow physicians to not only better predict seizures but also to mitigate them. Hence, in this paper, we develop a method to not only characterize the instability of seizures but also to leverage these conditions to stabilize the system underlying these seizures. Remarkably, evidence suggests that such critical transitions are associated with long-term memory dynamics, which can be captured by considering linear fractional-order systems. Subsequently, we provide for the first time tractable necessary and sufficient conditions for the global asymptotic stability of discrete-time linear fractional-order systems. Next, we propose a method to obtain a stabilizing control strategy for these systems using linear matrix inequalities. Finally, we apply our methodology to a real-world epileptic patient dataset to provide insight into mitigating epilepsy and designing future cyber-neural systems.
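To make the "long-term memory dynamics" concrete, the sketch below simulates a discrete-time linear fractional-order system under the Grunwald-Letnikov difference, in which every past state enters each update through slowly decaying binomial weights. This is a common textbook formulation assumed here for illustration, not necessarily the paper's exact definition:

```python
import numpy as np

def gbinom(alpha, j):
    # Generalized binomial coefficient C(alpha, j) for fractional alpha.
    out = 1.0
    for i in range(j):
        out *= (alpha - i) / (i + 1)
    return out

def simulate(A, alpha, x0, steps):
    # Grunwald-Letnikov form of Delta^alpha x[k+1] = A x[k]:
    #   x[k+1] = A x[k] - sum_{j=1}^{k+1} (-1)^j C(alpha, j) x[k+1-j]
    # For alpha = 1 this reduces to the ordinary system x[k+1] = (A + I) x[k];
    # for 0 < alpha < 1 the entire history contributes (long-term memory).
    xs = [np.asarray(x0, float)]
    for k in range(steps):
        nxt = A @ xs[-1]
        for j in range(1, k + 2):
            nxt = nxt - ((-1) ** j) * gbinom(alpha, j) * xs[k + 1 - j]
        xs.append(nxt)
    return np.array(xs)
```

The ever-growing history sum is exactly what makes stability analysis of these systems harder than for ordinary linear difference equations, motivating the tractable conditions the paper derives.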
- Abstract A linear two-layer model is used to elucidate the role of prognostic moisture on quasigeostrophic (QG) motions in the presence of a mean thermal wind. Solutions to the basic equations reveal two instabilities that can explain the growth of moist QG systems. The well-documented baroclinic instability is characterized by growth at the synoptic scale (horizontal scale of ~1000 km), and systems that grow from this instability tilt against the shear. Moisture–vortex instability, an instability that occurs when moisture and lower-tropospheric vorticity exhibit an in-phase component, exists only when moisture is prognostic. The instability is also strongest at the synoptic scale, but systems that grow from it exhibit a vertically stacked structure. When moisture is prognostic and the mean thermal wind is easterly, baroclinic instability exhibits a pronounced weakening while moisture–vortex instability is amplified. The strengthening of moisture–vortex instability at the expense of baroclinic instability is due to the baroclinic component of the lower-tropospheric flow. In westward-propagating systems, lower-tropospheric westerlies associated with an easterly mean thermal wind advect anomalous moisture and the associated convection toward the low-level vortex. The advected convection causes the vertical structure of the wave to shift away from one that favors baroclinic instability to one that favors moisture–vortex instability. On the other hand, a westerly mean thermal wind reinforces the phasing between moisture and vorticity necessary for baroclinic instability to occur. Based on these results, it is hypothesized that moisture–vortex instability is an important instability in humid regions of easterly mean thermal wind such as the South Asian and West African monsoons.
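In this linear setting, "growth from an instability" carries the usual normal-mode meaning: substituting a plane-wave solution into the linearized equations and examining the imaginary part of the frequency. A standard statement of the criterion (general to linear stability analysis, not specific to this model's dispersion relation):

```latex
q'(x,t) = \hat{q}\, e^{i(kx - \omega t)}, \qquad
\omega = \omega_r + i\,\omega_i
\;\Longrightarrow\; |q'| \propto e^{\omega_i t},
```

so a mode amplifies, and the flow is unstable, precisely when $\omega_i > 0$, with $\omega_i$ the growth rate.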
- Abstract Recurrent neural networks have seen widespread use in modeling dynamical systems in varied domains such as weather prediction, text prediction and several others. Often one wishes to supplement the experimentally observed dynamics with prior knowledge or intuition about the system. While the recurrent nature of these networks allows them to model arbitrarily long memories in the time series used in training, it makes it harder to impose prior knowledge or intuition through generic constraints. In this work, we present a path sampling approach based on the principle of Maximum Caliber that allows us to include generic thermodynamic or kinetic constraints into recurrent neural networks. We show the method here for a widely used type of recurrent neural network known as the long short-term memory network, in the context of supplementing time series collected from different application domains. These include classical Molecular Dynamics of a protein and Monte Carlo simulations of an open quantum system continuously losing photons to the environment and displaying Rabi oscillations. Our method can be easily generalized to other generative artificial intelligence models and to generic time series in different areas of the physical and social sciences, where one wishes to supplement limited data with intuition or theory-based corrections.
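As a minimal illustration of the Maximum Caliber idea (a generic sketch, not the paper's path-sampling algorithm): each sampled trajectory receives a weight exp(lambda * A) for some path observable A, and the Lagrange multiplier lambda is tuned so the reweighted ensemble average of A matches a known constraint:

```python
import numpy as np

def maxcal_lambda(obs, target, grid=np.linspace(-5.0, 5.0, 2001)):
    # obs[i] is the value of a path observable A on sampled trajectory i.
    # Scan the Lagrange multiplier and return the value for which the
    # exp(lambda * A)-reweighted average of A best matches `target`.
    best_lam, best_err = 0.0, np.inf
    for lam in grid:
        w = np.exp(lam * (obs - obs.max()))  # shift for numerical stability
        w /= w.sum()
        err = abs((w * obs).sum() - target)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

def reweight(obs, lam):
    # Normalized Maximum-Caliber weights for the sampled trajectories.
    w = np.exp(lam * (obs - obs.max()))
    return w / w.sum()
```

In spirit, such weights bias which sampled network trajectories are retained, steering the generated ensemble toward the imposed thermodynamic or kinetic constraint without retraining from scratch.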