skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Learning nonparametric ordinary differential equations from noisy data
Learning nonparametric systems of Ordinary Differential Equations (ODEs) $$\dot x = f(t,x)$$ from noisy data is an emerging machine learning topic. We use the well-developed theory of Reproducing Kernel Hilbert Spaces (RKHS) to define candidates for $$f$$ for which the solution of the ODE exists and is unique. Learning $$f$$ consists of solving a constrained optimization problem in an RKHS. We propose a penalty method that iteratively uses the Representer theorem and Euler approximations to provide a numerical solution. We prove a generalization bound for the $L^2$ distance between $$x$$ and its estimator. Experiments are provided for the FitzHugh–Nagumo oscillator, the Lorenz system, and for predicting the Amyloid level in the cortex of aging subjects. In all cases, we show competitive results compared with the state-of-the-art.  more » « less
Award ID(s):
2136228
PAR ID:
10499694
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Elsevier
Date Published:
Journal Name:
Journal of Computational Physics
Volume:
507
Issue:
C
ISSN:
0021-9991
Page Range / eLocation ID:
112971
Subject(s) / Keyword(s):
Nonlinear dynamical systems System identification Kernel methods Penalty method Amyloid accumulation
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Distributionally robust optimization (DRO) is a powerful tool for decision making under uncertainty. It is particularly appealing because of its ability to leverage existing data. However, many practical problems call for decision- making with some auxiliary information, and DRO in the context of conditional distributions is not straightforward. We propose a conditional kernel distributionally robust optimiza- tion (CKDRO) method that enables robust decision making under conditional distributions through kernel DRO and the conditional mean operator in the reproducing kernel Hilbert space (RKHS). In particular, we consider problems where there is a correlation between the unknown variable y and an auxiliary observable variable x. Given past data of the two variables and a queried auxiliary variable, CKDRO represents the conditional distribution P(y|x) as the conditional mean operator in the RKHS space and quantifies the ambiguity set in the RKHS as well, which depends on the size of the dataset as well as the query point. To justify the use of RKHS, we demonstrate that the ambiguity set defined in RKHS can be viewed as a ball under a metric that is similar to the Wasserstein metric. The DRO is then dualized and solved via a finite dimensional convex program. The proposed CKDRO approach is applied to a generation scheduling problem and shows that the result of CKDRO is superior to common benchmarks in terms of quality and robustness. 
    more » « less
  2. Data augmentation is critical to the empirical success of modern self-supervised representation learning, such as contrastive learning and masked language modeling. However, a theoretical understanding of the exact role of augmentation remains limited. Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator, suggesting that learning a linear probe atop such representation can be connected to RKHS regression. Building on this insight, this work delves into a statistical analysis of augmentation-based pretraining. Starting from the isometry property, a geometric characterization of the target function given by the augmentation, we disentangle the effects of the model and the augmentation, and prove two generalization bounds that are free of model complexity. Our first bound works for an arbitrary encoder, where the prediction error is decomposed as the sum of an estimation error incurred by fitting a linear probe with RKHS regression, and an approximation error entailed by RKHS approximation. Our second bound specifically addresses the case where the encoder is near-optimal, that is it approximates the top-d eigenspace of the RKHS induced by the augmentation. A key ingredient in our analysis is the augmentation complexity, which we use to quantitatively compare different augmentations and analyze their impact on downstream performance. 
    more » « less
  3. Data augmentation is critical to the empirical success of modern self-supervised representation learning, such as contrastive learning and masked language modeling. However, a theoretical understanding of the exact role of augmentation remains limited. Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator, suggesting that learning a linear probe atop such representation can be connected to RKHS regression. Building on this insight, this work delves into a statistical analysis of augmentation-based pretraining. Starting from the isometry property, a geometric characterization of the target function given by the augmentation, we disentangle the effects of the model and the augmentation, and prove two generalization bounds that are free of model complexity. Our first bound works for an arbitrary encoder, where the prediction error is decomposed as the sum of an estimation error incurred by fitting a linear probe with RKHS regression, and an approximation error entailed by RKHS approximation. Our second bound specifically addresses the case where the encoder is near-optimal, that is it approximates the top-d eigenspace of the RKHS induced by the augmentation. A key ingredient in our analysis is the augmentation complexity, which we use to quantitatively compare different augmentations and analyze their impact on downstream performance. 
    more » « less
  4. A function f∶{0,1}n→ {0,1} is called an approximate AND-homomorphism if choosing x,y∈n uniformly at random, we have that f(x∧ y) = f(x)∧ f(y) with probability at least 1−ε, where x∧ y = (x1∧ y1,…,xn∧ yn). We prove that if f∶ {0,1}n → {0,1} is an approximate AND-homomorphism, then f is δ-close to either a constant function or an AND function, where δ(ε) → 0 as ε→ 0. This improves on a result of Nehama, who proved a similar statement in which δ depends on n. Our theorem implies a strong result on judgement aggregation in computational social choice. In the language of social choice, our result shows that if f is ε-close to satisfying judgement aggregation, then it is δ(ε)-close to an oligarchy (the name for the AND function in social choice theory). This improves on Nehama’s result, in which δ decays polynomially with n. Our result follows from a more general one, in which we characterize approximate solutions to the eigenvalue equation f = λ g, where is the downwards noise operator f(x) = y[f(x ∧ y)], f is [0,1]-valued, and g is {0,1}-valued. We identify all exact solutions to this equation, and show that any approximate solution in which f and λ g are close is close to an exact solution. 
    more » « less
  5. We present a new algorithm for learning unknown gov- erning equations from trajectory data, using a family of neural net- works. Given samples of solutions x(t) to an unknown dynamical system x ̇ (t) = f (t, x(t)), we approximate the function f using a family of neural networks. We express the equation in integral form and use Euler method to predict the solution at every successive time step using at each iter- ation a different neural network as a prior for f. This procedure yields M-1 time-independent networks, where M is the number of time steps at which x(t) is observed. Finally, we obtain a single function f(t,x(t)) by neural network interpolation. Unlike our earlier work, where we numer- ically computed the derivatives of data, and used them as target in a Lipschitz regularized neural network to approximate f, our new method avoids numerical differentiations, which are unstable in presence of noise. We test the new algorithm on multiple examples in a high-noise setting. We empirically show that generalization and recovery of the governing equation improve by adding a Lipschitz regularization term in our loss function and that this method improves our previous one especially in the high-noise regime, when numerical differentiation provides low qual- ity target data. Finally, we compare our results with other state of the art methods for system identification. 
    more » « less