Title: The misleading certainty of uncertain data in biological network processes
Mathematical models are often used to study the structure and dynamics of network-driven cellular processes. In cell biology, models representing biochemical reaction networks have provided significant insights but are often plagued by a dearth of the quantitative data necessary for simulation and analysis. This has in turn raised questions about the usefulness of biochemical network models with unidentifiable parameters and a high degree of parameter sloppiness. In response, approaches that incorporate highly available non-quantitative data and use these data to improve model certainty have been undertaken with varying degrees of success. Here we employ a Bayesian inference and machine learning approach to first explore how quantitative and non-quantitative data can constrain a mechanistic model of apoptosis execution, in which all models can be identified. We find that two orders of magnitude more ordinal data measurements than are typically collected are necessary to achieve the same accuracy as that obtained from a quantitative dataset. We also find that ordinal and nominal non-quantitative data can be combined on their own to reduce model uncertainty and thus improve model accuracy. Further analysis demonstrates that the accuracy and certainty of model predictions strongly depend on accurate formulations of the measurement, as well as on the size and make-up of the non-quantitative datasets. Finally, we demonstrate the potential of a data-driven machine learning measurement model to identify, from a systems perspective, informative mechanistic features that predict or define non-quantitative cellular phenotypes.
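The abstract does not specify how ordinal measurements enter the Bayesian likelihood. One common choice, assumed here and not taken from the paper, is a cumulative-logit measurement model: a simulated model output `x` is mapped to probabilities over ordered categories through increasing cutpoints and a positive slope. The function names, cutpoints, and slope below are hypothetical; this is a minimal sketch of how such a likelihood could be computed, not the authors' implementation.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ordinal_probs(x: float, cutpoints: Sequence[float], slope: float = 1.0) -> list[float]:
    """Probability of each ordinal category given a simulated model output x.

    Assumed cumulative-logit form (hypothetical, not the paper's stated model):
    P(y <= k | x) = sigmoid(slope * (cutpoints[k] - x)), so larger x shifts
    probability mass toward higher categories.
    """
    if slope <= 0:
        raise ValueError("slope must be positive")
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    cumulative = [sigmoid(slope * (c - x)) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for cdf in cumulative:
        probs.append(cdf - previous)  # P(y = k) = P(y <= k) - P(y <= k - 1)
        previous = cdf
    return probs


def ordinal_log_likelihood(
    outputs: Sequence[float],
    observed: Sequence[int],
    cutpoints: Sequence[float],
    slope: float = 1.0,
) -> float:
    """Sum of log-probabilities of the observed categories (0-based indices)."""
    return sum(
        math.log(ordinal_probs(x, cutpoints, slope)[k])
        for x, k in zip(outputs, observed)
    )


# Hypothetical example: three ordered categories (e.g. "low", "medium", "high").
cuts = [0.3, 0.7]
outputs = [0.1, 0.5, 0.9]
matched = ordinal_log_likelihood(outputs, [0, 1, 2], cuts, slope=20.0)
mismatched = ordinal_log_likelihood(outputs, [2, 1, 0], cuts, slope=20.0)
```

Model outputs that agree with the observed categories give a higher log-likelihood. In a Bayesian setting, this term would be added to a log-prior over the model parameters (and possibly over the cutpoints and slope) and explored with a sampler such as MCMC.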
Award ID(s):
1942255
PAR ID:
10302780
Author(s) / Creator(s):
Date Published:
Journal Name:
bioRxiv
ISSN:
2692-8205
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More like this
  1. Machine learning presents opportunities to improve the scale-specific accuracy of mechanistic models in a data-driven manner. Here we demonstrate the use of a machine learning technique called Sparse Identification of Nonlinear Dynamics (SINDy) to improve a simple mechanistic model of algal growth. Time-series measurements of the microalga Chlorella vulgaris were generated under controlled photobioreactor conditions at the University of Technology Sydney. A simple mechanistic growth model based on light intensity and temperature was integrated over time and compared to the time-series data. While the mechanistic model broadly captured the overall growth trend, discrepancies remained between the model and the data due to the model's simplicity and the non-ideal behavior of real-world measurements. SINDy was applied to model the residual error by identifying a correction term for the error derivative. Adding this SINDy-informed error-dynamics term improves model accuracy while maintaining the interpretability of the underlying mechanistic framework. This work demonstrates the potential for machine learning techniques like SINDy to improve the scale-specific predictive accuracy of simple mechanistic models.
  2. Cellular service carriers often employ reactive strategies to assist customers who experience non-outage-related individual service degradation issues (e.g., service performance degradations that do not impact customers at scale and are likely caused by network provisioning issues for individual devices). Customers must contact customer care to request assistance before these issues are resolved. This paper presents our experience with PACE (ProActive customer CarE), a novel, proactive system that monitors, troubleshoots, and resolves individual service issues without relying on customers to first contact customer care. PACE seeks to improve customer experience and care operation efficiency by automatically detecting individual (non-outage-related) service issues, prioritizing repair actions by predicting which customers are likely to contact care to report their issues, and proactively triggering actions to resolve these issues. We develop three machine learning-based prediction models and implement a fully automated system that integrates these prediction models and takes resolution actions for individual customers. We conduct a large-scale trace-driven evaluation using real-world data collected from a major cellular carrier in the US, and demonstrate that PACE predicts with high accuracy which customers are likely to contact care due to non-outage-related individual service issues. We further deploy PACE in this carrier's network. Our field trial results show that PACE is effective in proactively resolving non-outage-related individual customer service issues, improving customer experience, and reducing the need for customers to report their service issues.
  3. The study of complex biological systems necessitates computational modeling approaches that are currently underutilized in plant biology. Many plant biologists have trouble identifying or adopting modeling methods in their research, particularly mechanistic mathematical modeling. Here we address the challenges that limit the use of computational modeling methods. We divide computational modeling techniques into either pattern models (e.g., bioinformatics, machine learning, or morphology) or mechanistic mathematical models (e.g., biochemical reactions, biophysics, or population models), both of which contribute to plant biology research at different scales to answer different research questions. We present arguments and recommendations for the increased adoption of modeling by plant biologists interested in incorporating more modeling into their research programs. Because some researchers find math and quantitative methods to be an obstacle to modeling, we provide suggestions for easy-to-use tools for non-specialists and for collaboration with specialists. This may especially be the case for mechanistic mathematical modeling, which we discuss at greater length. Through a more thorough appreciation and awareness of the power of different kinds of modeling in plant biology, we hope to facilitate interdisciplinary, transformative research.
  4. Calculation of protein–ligand binding affinity is a cornerstone of drug discovery. Classic implicit solvent models, which have been widely used for this task, lack accuracy compared to experimental references. Emerging data-driven models, on the other hand, are often accurate but not fully interpretable, and are also prone to overfitting. In this research, we explore the application of Theory-Guided Data Science to the study of protein–ligand binding. A hybrid model is introduced that integrates a Graph Convolutional Network (data-driven model) with the GBNSR6 implicit solvent model (physics-based model). The proposed physics-data model is tested on a dataset of 368 complexes from the PDBbind refined set and 72 host–guest systems. Results demonstrate that the proposed Physics-Guided Neural Network can successfully improve the “accuracy” of the pure data-driven model. In addition, the “interpretability” and “transferability” of our model are improved compared to the purely data-driven model. Further analyses include evaluating model robustness and understanding relationships between the physical features.
  5. The Science Demilitarized Zone (Science DMZ) is a network environment optimized for scientific applications. The Science DMZ model provides a reference set of network design patterns, tuned hosts and protocol stacks dedicated to large data transfers, and streamlined security postures that significantly improve data transfer performance, accelerating scientific collaboration and discovery. Over the past decade, many universities and organizations have adopted this model for their research computing. Although the model has become increasingly popular, there is a lack of quantitative studies comparing such a specialized network to conventional production networks in terms of network characteristics and data transfer performance. Does a Science DMZ exhibit significantly different behavior than a general-purpose campus network? Does it improve application performance compared to a general-purpose network? Through a two-year-long quantitative network measurement study, we find that a Science DMZ exhibits lower latency, higher throughput, and lower jitter. We also see several non-intuitive results. For example, a DMZ may take a longer route to external destinations and experience higher latency than the campus network. While the DMZ model benefits researchers, the benefits are not automatic; careful network tuning based on specific use cases is required to realize the full potential of Science DMZs.
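Item 1 in the list above describes using SINDy to identify a correction term for the derivative of a mechanistic model's residual error. That abstract gives no implementation details, so the following is a minimal, hypothetical sketch of the standard sequentially thresholded least-squares (STLSQ) form of SINDy applied to a residual series. The candidate library `[1, e, e^2]`, the threshold, the finite-difference derivative, and the use of the residual itself as the regressor are all assumptions. A small pure-Python solver is included to keep the example self-contained. The PySINDy package provides a full-featured implementation.

```python
from typing import Sequence


def candidate_library(e: float) -> list[float]:
    """Hypothetical candidate terms [1, e, e^2] evaluated at one residual value."""
    return [1.0, e, e * e]


def finite_difference(values: Sequence[float], dt: float) -> list[float]:
    """Central differences in the interior, one-sided differences at the ends."""
    n = len(values)
    deriv = [0.0] * n
    deriv[0] = (values[1] - values[0]) / dt
    deriv[-1] = (values[-1] - values[-2]) / dt
    for i in range(1, n - 1):
        deriv[i] = (values[i + 1] - values[i - 1]) / (2.0 * dt)
    return deriv


def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a small square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(theta: list[list[float]], y: Sequence[float]) -> list[float]:
    """Ordinary least squares via the normal equations (theta^T theta) xi = theta^T y."""
    k = len(theta[0])
    ata = [[sum(row[i] * row[j] for row in theta) for j in range(k)] for i in range(k)]
    aty = [sum(row[i] * yi for row, yi in zip(theta, y)) for i in range(k)]
    return solve_linear(ata, aty)


def stlsq(theta: list[list[float]], y: Sequence[float], threshold: float,
          iterations: int = 10) -> list[float]:
    """Sequentially thresholded least squares: fit, zero small terms, refit on the rest."""
    k = len(theta[0])
    xi = least_squares(theta, y)
    active = list(range(k))
    for _ in range(iterations):
        new_active = [j for j in active if abs(xi[j]) >= threshold]
        if not new_active:
            return [0.0] * k
        if new_active == active:
            break  # active set is stable, so the sparse model has converged
        active = new_active
        coeffs = least_squares([[row[j] for j in active] for row in theta], y)
        xi = [0.0] * k
        for j, c in zip(active, coeffs):
            xi[j] = c
    return xi
```

In use, the residual series would be computed as `data - model` at each sample time, its derivative estimated with `finite_difference`, and the sparse coefficients returned by `stlsq` would define the error-dynamics term added back to the mechanistic model. Thresholding is what keeps the correction interpretable: only the library terms with non-negligible coefficients survive.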