

This content will become publicly available on January 21, 2026

Title: Data-driven model discovery and model selection for noisy biological systems
Biological systems exhibit complex dynamics that differential equations can often adeptly represent. Ordinary differential equation models are widespread; until recently their construction has required extensive prior knowledge of the system. Machine learning methods offer alternative means of model construction: differential equation models can be learnt from data via model discovery using sparse identification of nonlinear dynamics (SINDy). However, SINDy struggles with realistic levels of biological noise and is limited in its ability to incorporate prior knowledge of the system. We propose a data-driven framework for model discovery and model selection using hybrid dynamical systems: partial models containing missing terms. Neural networks are used to approximate the unknown dynamics of a system, enabling the denoising of the data while simultaneously learning the latent dynamics. Simulations from the fitted neural network are then used to infer models using sparse regression. We show, via model selection, that model discovery using hybrid dynamical systems outperforms alternative approaches. We find it possible to infer models correctly up to high levels of biological noise of different types. We demonstrate the potential to learn models from sparse, noisy data in application to a canonical cell state transition using data derived from single-cell transcriptomics. Overall, this approach provides a practical framework for model discovery in biology in cases where data are noisy and sparse, of particular utility when the underlying biological mechanisms are partially but incompletely known.
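A hybrid dynamical system, as described in the abstract, pairs a partial mechanistic model with a neural network that supplies the missing terms. The following is a minimal sketch of that structure and is not the authors' implementation. Several choices are assumptions made for illustration. The known part (linear growth of x, linear decay of y) and the Lotka-Volterra ground truth are hypothetical examples, not the biological systems from the paper. The neural network is replaced by a random-feature network: its hidden weights stay fixed and only the linear readout is fitted, by ridge regression. The network is fitted to noise-free derivative values rather than to noisy trajectories. The names `known_part`, `true_rhs`, `RandomFeatureNet` and `hybrid_rhs` are hypothetical.

```python
import numpy as np

def known_part(X):
    """Hypothetical known mechanism: linear growth of x and linear decay of y."""
    return np.column_stack([X[:, 0], -X[:, 1]])

def true_rhs(X):
    """Hypothetical ground truth (Lotka-Volterra), used only to generate targets."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([x - 0.5 * x * y, -y + 0.2 * x * y])

class RandomFeatureNet:
    """Single tanh hidden layer with fixed random weights; only the linear readout is fitted.

    Assumption: stands in for the trained neural network of the paper.
    """

    def __init__(self, n_in, n_hidden, rng):
        self.W = rng.normal(size=(n_in, n_hidden))
        self.b = rng.uniform(-2.0, 2.0, size=n_hidden)
        self.C = None

    def features(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, targets, ridge=1e-6):
        # Ridge regression for the readout: (H^T H + ridge * I) C = H^T targets.
        H = self.features(X)
        A = H.T @ H + ridge * np.eye(H.shape[1])
        self.C = np.linalg.solve(A, H.T @ targets)
        return self

    def __call__(self, X):
        return self.features(X) @ self.C

def hybrid_rhs(X, net):
    """Hybrid model: known partial model plus a learned correction for the missing terms."""
    return known_part(X) + net(X)

rng = np.random.default_rng(1)
X_train = rng.uniform(0.5, 3.0, size=(500, 2))
# The network only has to learn what the partial model leaves out (here the x*y terms).
missing = true_rhs(X_train) - known_part(X_train)
net = RandomFeatureNet(n_in=2, n_hidden=100, rng=rng).fit(X_train, missing)

X_test = rng.uniform(0.5, 3.0, size=(200, 2))
err = np.max(np.abs(hybrid_rhs(X_test, net) - true_rhs(X_test)))
print(f"max |hybrid - true| on held-out states: {err:.2e}")
```

Because the known terms are hard-coded, the network only has to represent the interaction terms. This is the sense in which a hybrid model lets prior knowledge constrain what is learned. In the framework the abstract describes, the fitted network would next be simulated and its output passed to sparse regression.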
Award ID(s):
2045327
PAR ID:
10573324
Author(s) / Creator(s):
; ;
Editor(s):
Alber, Mark
Publisher / Repository:
PLOS
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
21
Issue:
1
ISSN:
1553-7358
Page Range / eLocation ID:
e1012762
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
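The abstract's later steps are sparse regression on simulations from the fitted network, followed by model selection. Both rest on the SINDy idea that also runs through the related records listed below. The following is a minimal sketch of that idea, not the authors' code, and it relies on several assumptions:

- The candidate library holds polynomials up to degree 2.
- The optimizer is sequentially thresholded least squares (STLSQ), the optimizer of the original SINDy formulation; the paper may use a different one.
- The selection criterion is BIC, chosen here for illustration.
- Randomly sampled states with slightly noisy derivatives stand in for the network simulations.
- The Lotka-Volterra system is a hypothetical test case.

```python
import numpy as np

def library(X):
    """Candidate terms up to degree 2 for a two-state system (assumed library)."""
    x, y = X[:, 0], X[:, 1]
    names = ["1", "x", "y", "x^2", "x*y", "y^2"]
    theta = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])
    return theta, names

def stlsq(theta, dxdt, threshold, max_iter=10):
    """Sequentially thresholded least squares: zero small coefficients, refit the rest."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                # Refit only the surviving terms of equation k.
                xi[keep, k] = np.linalg.lstsq(theta[:, keep], dxdt[:, k], rcond=None)[0]
    return xi

def bic(theta, dxdt, xi):
    """Gaussian BIC: n*log(RSS/n) + k*log(n), with k the number of active terms."""
    n = dxdt.size
    rss = np.sum((dxdt - theta @ xi) ** 2)
    return n * np.log(rss / n) + np.count_nonzero(xi) * np.log(n)

# Hypothetical stand-in for simulations from a fitted network: states sampled
# at random, derivatives from dx/dt = x - 0.5xy, dy/dt = -y + 0.2xy plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 3.0, size=(200, 2))
true_dxdt = np.column_stack([
    X[:, 0] - 0.5 * X[:, 0] * X[:, 1],
    -X[:, 1] + 0.2 * X[:, 0] * X[:, 1],
])
dxdt = true_dxdt + rng.normal(scale=1e-3, size=true_dxdt.shape)

theta, names = library(X)
# Threshold 0.0 gives the dense least-squares model; larger thresholds give sparser ones.
candidates = {t: stlsq(theta, dxdt, t) for t in (0.0, 0.05, 0.3)}
best_t = min(candidates, key=lambda t: bic(theta, dxdt, candidates[t]))
xi = candidates[best_t]
for k, lhs in enumerate(["dx/dt", "dy/dt"]):
    terms = [f"{xi[i, k]:+.3f} {names[i]}" for i in np.flatnonzero(xi[:, k])]
    print(lhs, "=", " ".join(terms))
```

With these settings the selected model should contain only the x, y and x*y terms, with coefficients close to the generating values. The dense fit (threshold 0) matches the noise slightly better, but BIC charges it for eight extra terms. The largest threshold removes the genuine 0.2 x*y term and fits far worse.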
More Like this
  1. As computational resources have become more widely available, data-driven methods have emerged as powerful means for equation discovery and model construction. Sparse regression methods such as SINDy (Sparse Identification of Nonlinear Dynamics) can be used to develop reduced-order models of nonlinear systems. In this study, the authors examine how SINDy can be used to develop low-dimensional models of airfoil systems, which experience unsteady aerodynamic loads and flutter instabilities. For a system of multiple closely spaced airfoil oscillators, analytical models for determining flutter instabilities are not readily available, and one must rely on experimental and numerical methods. As a starting point, this work considers data from simulations of the unsteady aerodynamics of a single airfoil oscillator and constructs a reduced-order model from these data. 
  2. Introduction: The moment quantities associated with the nonlinear Schrödinger equation offer important insights into the evolution dynamics of such dispersive wave partial differential equation (PDE) models. The effective dynamics of the moment quantities are amenable to both analytical and numerical treatment. Methods: In this paper, we present a data-driven approach based on “Sparse Identification of Nonlinear Dynamics” (SINDy) to capture the evolution of such moment quantities numerically. Results and Discussion: Our method is applied first to some well-known closed systems of ordinary differential equations (ODEs) that describe the evolution dynamics of relevant moment quantities. Our examples increase progressively in complexity, and our findings explore different choices within the SINDy library. We also consider the potential discovery of coordinate transformations that lead to closure of the moment system. Finally, we extend our considerations to settings where a closed analytical form of the moment dynamics is not available. 
  3. Accurately modeling the nonlinear dynamics of a system from measurement data is an open challenge. Over the past few years, various tools such as SINDy and DySMHO have emerged as approaches to distill dynamics from data. However, challenges persist in accurately capturing the dynamics of a system, especially when physical knowledge about the system is unavailable. A promising solution is a hybrid paradigm that combines mechanistic and black-box models to leverage their respective strengths. In this study, we combine a hybrid modeling paradigm with sparse regression to develop and identify models simultaneously. Two methods are explored across case studies that differ in complexity, data quality, and data availability. In the first approach, we integrate SINDy-discovered models with neural ODE structures to model unknown physics. In the second approach, we employ Multifidelity Surrogate Models (MFSMs) to construct composite models that combine SINDy-discovered models with error-correction models. 
  4. The SINDy algorithm has been successfully used to identify the governing equations of dynamical systems from time series data. In this paper, we argue that this makes SINDy a potentially useful tool for causal discovery, and that existing tools for causal discovery can be used to dramatically improve the performance of SINDy as a tool for robust sparse modeling and system identification. We then demonstrate empirically that augmenting the SINDy algorithm with tools from causal discovery can provide engineers with a tool for learning causally robust governing equations. 
  5. Identifying the governing equations of a nonlinear dynamical system is key to both understanding the physical features of the system and constructing an accurate model of the dynamics that generalizes well beyond the available data. Achieving this kind of interpretable system identification is even more difficult for partially observed systems. We propose a machine learning framework for discovering the governing equations of a dynamical system using only partial observations, combining an encoder for state reconstruction with a sparse symbolic model. The entire architecture is trained end-to-end by matching the higher-order symbolic time derivatives of the sparse symbolic model with finite difference estimates from the data. Our tests show that this method can successfully reconstruct the full system state and identify the equations of motion governing the underlying dynamics for a variety of ordinary differential equation (ODE) and partial differential equation (PDE) systems. 
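Item 5 fits its symbolic model to finite-difference derivative estimates, and the PLOS abstract notes that SINDy struggles with realistic levels of biological noise. The short calculation here is a hedged illustration of one reason why, not code from any of these works. For a central difference with step Δt, independent measurement noise of standard deviation σ gives derivative noise of standard deviation σ/(√2 Δt). Finer sampling therefore makes the derivative estimate noisier, not cleaner. The sine signal, the step size and the noise level are hypothetical choices.

```python
import numpy as np

def rms(e):
    """Root-mean-square of an error vector."""
    return float(np.sqrt(np.mean(e ** 2)))

t = np.linspace(0.0, 10.0, 1001)  # uniform grid, dt = 0.01
dt = t[1] - t[0]
x_clean = np.sin(t)
dxdt_true = np.cos(t)

rng = np.random.default_rng(0)
sigma = 0.01  # hypothetical measurement-noise standard deviation
x_noisy = x_clean + rng.normal(scale=sigma, size=t.shape)

# np.gradient uses second-order central differences in the interior and
# first-order one-sided differences at the two end points (edge_order=1).
err_clean = rms(np.gradient(x_clean, dt) - dxdt_true)
err_noisy = rms(np.gradient(x_noisy, dt) - dxdt_true)
predicted = sigma / (np.sqrt(2.0) * dt)

print(f"clean data: RMS derivative error {err_clean:.1e}")
print(f"noisy data: RMS derivative error {err_noisy:.3f} (predicted about {predicted:.3f})")
```

Noise at 1% of the signal amplitude thus produces derivative errors comparable to the signal itself. This is why the data are denoised before regression, for example by fitting a neural network to the data as the PLOS abstract describes.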