Machine learning presents opportunities to improve the scale-specific accuracy of mechanistic models in a data-driven manner. Here we demonstrate the use of a machine learning technique called Sparse Identification of Nonlinear Dynamics (SINDy) to improve a simple mechanistic model of algal growth. Time-series measurements of the microalga Chlorella vulgaris were generated under controlled photobioreactor conditions at the University of Technology Sydney. A simple mechanistic growth model based on light intensity and temperature was integrated over time and compared to the time-series data. While the mechanistic model broadly captured the overall growth trend, discrepancies remained between the model and the data due to the model's simplicity and the non-ideal behavior of real-world measurements. SINDy was applied to model the residual error by identifying an error derivative correction term. Adding this SINDy-informed error dynamics term improves model accuracy while maintaining the interpretability of the underlying mechanistic framework. This work demonstrates the potential for machine learning techniques like SINDy to improve the scale-specific predictive accuracy of simple mechanistic models.
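The residual-correction idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the data are synthetic (a logistic curve standing in for measured biomass), the "simple mechanistic model" is plain exponential growth with a hypothetical rate `mu`, and the sparse regression is a hand-written sequentially thresholded least squares, which is the regression commonly used in SINDy. The authors' actual growth model, candidate library, and error formulation may differ.

```python
import numpy as np

def stlsq(theta, target, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares (the sparse regression used in SINDy)."""
    xi = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold      # candidate terms judged negligible
        xi[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        # Refit using only the surviving candidate terms.
        xi[keep] = np.linalg.lstsq(theta[:, keep], target, rcond=None)[0]
    return xi

# Synthetic "measurements" (assumption): logistic growth with carrying capacity K.
mu, K, x0 = 1.0, 2.0, 0.1
t = np.linspace(0.0, 8.0, 401)
x = K / (1.0 + (K / x0 - 1.0) * np.exp(-mu * t))

# Simple mechanistic rate (assumption): exponential growth, dx/dt = mu * x.
dxdt = np.gradient(x, t, edge_order=2)      # numerical derivative of the data
rate_residual = dxdt - mu * x               # what the simple model fails to explain

# Candidate library for the correction term: [1, x, x^2].
theta = np.column_stack([np.ones_like(x), x, x**2])
xi = stlsq(theta, rate_residual)

# Corrected model: mechanistic rate plus the sparse, data-driven correction.
corrected_rate = mu * x + theta @ xi
rms_before = np.sqrt(np.mean(rate_residual**2))
rms_after = np.sqrt(np.mean((dxdt - corrected_rate)**2))
print("correction coefficients [1, x, x^2]:", xi)
print("RMS rate error before/after:", rms_before, rms_after)
```

In this toy setting the regression should retain only the quadratic term, so the corrected model stays readable as "mechanistic rate plus one identified term", which is the interpretability argument made above.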
The misleading certainty of uncertain data in biological network processes
Mathematical models are often used to study the structure and dynamics of network-driven cellular processes. In cell biology, models representing biochemical reaction networks have provided significant insights but are often plagued by a dearth of available quantitative data necessary for simulation and analysis. This has in turn led to questions about the usefulness of biochemical network models with unidentifiable parameters and a high degree of parameter sloppiness. In response, approaches to incorporate highly available non-quantitative data and use this data to improve model certainty have been undertaken with various degrees of success. Here we employ a Bayesian inference and Machine Learning approach to first explore how quantitative and non-quantitative data can constrain a mechanistic model of apoptosis execution, in which all models can be identified. We find that two orders of magnitude more ordinal data measurements than those typically collected are necessary to achieve the same accuracy as that obtained from a quantitative dataset. We also find that ordinal and nominal non-quantitative data on their own can be combined to reduce model uncertainty and thus improve model accuracy. Further analysis demonstrates that the accuracy and certainty of model predictions depend strongly on accurate formulations of the measurement as well as the size and make-up of the non-quantitative datasets. Finally, we demonstrate the potential of a data-driven Machine Learning measurement model to identify informative mechanistic features that predict or define non-quantitative cellular phenotypes, from a systems perspective.
- Award ID(s): 1942255
- PAR ID: 10302780
- Date Published:
- Journal Name: bioRxiv
- ISSN: 2692-8205
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Cellular service carriers often employ reactive strategies to assist customers who experience non-outage related individual service degradation issues (e.g., service performance degradations that do not impact customers at scale and are likely caused by network provisioning issues for individual devices). Customers need to contact customer care to request assistance before these issues are resolved. This paper presents our experience with PACE (ProActive customer CarE), a novel, proactive system that monitors, troubleshoots and resolves individual service issues, without having to rely on customers to first contact customer care for assistance. PACE seeks to improve customer experience and care operation efficiency by automatically detecting individual (non-outage related) service issues, prioritizing repair actions by predicting customers who are likely to contact care to report their issues, and proactively triggering actions to resolve these issues. We develop three machine learning-based prediction models, and implement a fully automated system that integrates these prediction models and takes resolution actions for individual customers. We conduct a large-scale trace-driven evaluation using real-world data collected from a major cellular carrier in the US, and demonstrate that PACE is able to predict customers who are likely to contact care due to non-outage related individual service issues with high accuracy. We further deploy PACE into this cellular carrier network. Our field trial results show that PACE is effective in proactively resolving non-outage related individual customer service issues, improving customer experience, and reducing the need for customers to report their service issues.
-
The study of complex biological systems necessitates computational modeling approaches that are currently underutilized in plant biology. Many plant biologists have trouble identifying or adopting modeling methods to their research, particularly mechanistic mathematical modeling. Here we address challenges that limit the use of computational modeling methods, particularly mechanistic mathematical modeling. We divide computational modeling techniques into either pattern models (e.g., bioinformatics, machine learning, or morphology) or mechanistic mathematical models (e.g., biochemical reactions, biophysics, or population models), which both contribute to plant biology research at different scales to answer different research questions. We present arguments and recommendations for the increased adoption of modeling by plant biologists interested in incorporating more modeling into their research programs. As some researchers find math and quantitative methods to be an obstacle to modeling, we provide suggestions for easy-to-use tools for non-specialists and for collaboration with specialists. This may especially be the case for mechanistic mathematical modeling, and we spend some extra time discussing this. Through a more thorough appreciation and awareness of the power of different kinds of modeling in plant biology, we hope to facilitate interdisciplinary, transformative research.
-
Calculation of protein–ligand binding affinity is a cornerstone of drug discovery. Classic implicit solvent models, which have been widely used to accomplish this task, lack accuracy compared to experimental references. Emerging data-driven models, on the other hand, are often accurate yet not fully interpretable and are also prone to overfitting. In this research, we explore the application of Theory-Guided Data Science in studying protein–ligand binding. A hybrid model is introduced by integrating a Graph Convolutional Network (data-driven model) with the GBNSR6 implicit solvent (physics-based model). The proposed physics-data model is tested on a dataset of 368 complexes from the PDBbind refined set and 72 host–guest systems. Results demonstrate that the proposed Physics-Guided Neural Network can successfully improve the “accuracy” of the pure data-driven model. In addition, the “interpretability” and “transferability” of our model have improved compared to the purely data-driven model. Further analyses include evaluating model robustness and understanding relationships between the physical features.
-
Milling is a critical manufacturing process to produce high-value components in aerospace, tooling, and automotive industries. However, milling is prone to chatter, a severe vibration that damages surface quality, cutting tools, and machines. Traditional experimental and mechanistic methods of chatter prediction have significant limitations. This study presents a data-driven machine learning (ML) model to predict and quantify milling chatter directly based on time-series vibration data. Three ML models, namely a hybrid long short-term memory (LSTM) and fully convolutional network (FCN) model, a gated recurrent unit (GRU) and FCN model, and a temporal convolutional network (TCN) model, have been developed and verified by incorporating milling parameters to enhance prediction accuracy and stability. Among the proposed models, the best-performing ML model (GRU-FCN) demonstrates strong performance in chatter prediction and severity quantification, providing actionable insights with improved computational efficiency. The integration of milling parameters into the ML model notably enhances the prediction accuracy and stability, proving particularly effective in real-time monitoring scenarios.

