Title: Key challenges facing data-driven multicellular systems biology
Abstract Increasingly sophisticated experiments, coupled with large-scale computational models, have the potential to systematically test biological hypotheses to drive our understanding of multicellular systems. In this short review, we explore key challenges that must be overcome to achieve robust, repeatable data-driven multicellular systems biology. If these challenges can be solved, we can grow beyond the current state of isolated tools and datasets to a community-driven ecosystem of interoperable data, software utilities, and computational modeling platforms. Progress is within our grasp, but it will take community (and financial) commitment.
Award ID(s):
1720625
NSF-PAR ID:
10188159
Journal Name:
GigaScience
Volume:
8
Issue:
10
ISSN:
2047-217X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Machine learning is increasingly recognized as a promising technology in the biological, biomedical, and behavioral sciences. There can be no argument that this technique is incredibly successful in image recognition with immediate applications in diagnostics including electrophysiology, radiology, or pathology, where we have access to massive amounts of annotated data. However, machine learning often performs poorly in prognosis, especially when dealing with sparse data. This is a field where classical physics-based simulation seems to remain irreplaceable. In this review, we identify areas in the biomedical sciences where machine learning and multiscale modeling can mutually benefit from one another: Machine learning can integrate physics-based knowledge in the form of governing equations, boundary conditions, or constraints to manage ill-posed problems and robustly handle sparse and noisy data; multiscale modeling can integrate machine learning to create surrogate models, identify system dynamics and parameters, analyze sensitivities, and quantify uncertainty to bridge the scales and understand the emergence of function. With a view towards applications in the life sciences, we discuss the state of the art of combining machine learning and multiscale modeling, identify applications and opportunities, raise open questions, and address potential challenges and limitations. This review serves as an introduction to a special issue on Uncertainty Quantification, Machine Learning, and Data-Driven Modeling of Biological Systems that will help identify current roadblocks and areas where computational mechanics, as a discipline, can play a significant role.
We anticipate that it will stimulate discussion within the community of computational mechanics and reach out to other disciplines including mathematics, statistics, computer science, artificial intelligence, biomedicine, systems biology, and precision medicine to join forces towards creating robust and efficient models for biological systems. 
  2. ABSTRACT Single mutations frequently alter several aspects of cell behavior but rarely reveal whether a particular statistically significant change is biologically significant. To determine which behavioral changes are most important for multicellular self-organization, we devised a new methodology using Myxococcus xanthus as a model system. During development, myxobacteria coordinate their movement to aggregate into spore-filled fruiting bodies. We investigate how aggregation is restored in two mutants, csgA and pilC, that cannot aggregate unless mixed with wild-type (WT) cells. To this end, we use cell tracking to follow the movement of fluorescently labeled cells in combination with data-driven agent-based modeling. The results indicate that just like WT cells, both mutants bias their movement toward aggregates and reduce motility inside aggregates. However, several aspects of mutant behavior remain uncorrected by WT, demonstrating that perfect recreation of WT behavior is unnecessary. In fact, synergies between errant behaviors can make aggregation robust. IMPORTANCE Self-organization into spatial patterns is evident in many multicellular phenomena. Even for the best-studied systems, our ability to dissect the mechanisms driving coordinated cell movement is limited. While genetic approaches can identify mutations perturbing multicellular patterns, the diverse nature of the signaling cues coupled to significant heterogeneity of individual cell behavior impedes our ability to mechanistically connect genes with phenotype. Small differences in the behaviors of mutant strains could be irrelevant or could sometimes lead to large differences in the emergent patterns. Here, we investigate rescue of multicellular aggregation in two mutant strains of Myxococcus xanthus mixed with wild-type cells.
The results demonstrate how careful quantification of cell behavior coupled to data-driven modeling can identify specific motility features responsible for cell aggregation and thereby reveal important synergies and compensatory mechanisms. Notably, mutant cells do not need to precisely recreate wild-type behaviors to achieve complete aggregation. 
  3. Abstract

    The potential energy of molecular species and their conformers can be computed with a wide range of computational chemistry methods, from molecular mechanics to ab initio quantum chemistry. However, choosing a computational approach that balances computational cost against the reliability of the calculated energies is a dilemma, especially for large molecules. This dilemma proves even more problematic for studies that require hundreds or thousands of calculations, such as drug discovery. Meanwhile, driven by their pattern recognition capabilities, neural networks have gained popularity in the computational chemistry community. During the last decade, many neural network potentials have been developed to predict a variety of chemical information for different systems. Neural network potentials have been shown to predict chemical properties with accuracy comparable to quantum mechanical approaches but at a cost approaching that of molecular mechanics calculations. As a result, the development of more reliable, transferable, and extensible neural network potentials has become an attractive field of study. In this review, we give an overview of the status of current neural network potentials and strategies to improve their accuracy. We provide recent examples of studies that demonstrate the applicability of these potentials. We also discuss the capabilities and shortcomings of the current models and the challenges and future prospects of their development and application. We expect this review to provide guidance for the development of neural network potentials and the exploitation of their applicability.

    This article is categorized under:

    Data Science > Artificial Intelligence/Machine Learning

    Molecular and Statistical Mechanics > Molecular Interactions

    Software > Molecular Modeling

     
  4. There are more than 7,000 public transit agencies in the U.S. (and many more private agencies), and together, they are responsible for serving 60 billion passenger miles each year. A well-functioning transit system fosters the growth and expansion of businesses, distributes social and economic benefits, and links the capabilities of community members, thereby enhancing what they can accomplish as a society. Since affordable public transit services are the backbones of many communities, this work investigates ways in which Artificial Intelligence (AI) can improve efficiency and increase utilization from the perspective of transit agencies. This book chapter discusses the primary requirements, objectives, and challenges related to the design of AI-driven smart transportation systems. We focus on three major topics. First, we discuss data sources and data. Second, we provide an overview of how AI can aid decision-making with a focus on transportation. Lastly, we discuss computational problems in the transportation domain and AI approaches to these problems. 
  5. Abstract

    The discovery of new drugs is a time-consuming and expensive process. Methods such as virtual screening, which can filter out ineffective compounds from drug libraries prior to expensive experimental study, have become popular research topics. As the computational drug discovery community has grown, organizations such as the Drug Design Data Resource have begun hosting blinded grand challenges to benchmark advances in methodology, seeking to identify the best methods for ligand pose-prediction, ligand affinity ranking, and free energy calculations. Such open challenges offer a unique opportunity for researchers to partner with junior students (e.g., high school and undergraduate) to validate basic yet fundamental hypotheses considered to be uninteresting to domain experts. Here, we, a group of high school-aged students and their mentors, present the results of our participation in Grand Challenge 4, where we predicted ligand affinity rankings for the Cathepsin S protease, an important protein target for autoimmune diseases. To investigate the effect of incorporating receptor dynamics on ligand affinity rankings, we employed the Relaxed Complex Scheme, a molecular docking method paired with molecular dynamics-generated receptor conformations. We found that Cathepsin S is a difficult target for molecular docking, and we explored some advanced methods such as distance-restrained docking to try to improve the correlation with experiments. This project exemplifies the capabilities of high school students when supported with a rigorous curriculum, and demonstrates the value of community-driven competitions for beginners in computational drug discovery.

     