Title: Key challenges facing data-driven multicellular systems biology
Abstract: Increasingly sophisticated experiments, coupled with large-scale computational models, have the potential to systematically test biological hypotheses to drive our understanding of multicellular systems. In this short review, we explore key challenges that must be overcome to achieve robust, repeatable data-driven multicellular systems biology. If these challenges can be solved, we can grow beyond the current state of isolated tools and datasets to a community-driven ecosystem of interoperable data, software utilities, and computational modeling platforms. Progress is within our grasp, but it will take community (and financial) commitment.
Award ID(s):
1720625
NSF-PAR ID:
10188159
Author(s) / Creator(s):
Date Published:
Journal Name:
GigaScience
Volume:
8
Issue:
10
ISSN:
2047-217X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. ABSTRACT Single mutations frequently alter several aspects of cell behavior but rarely reveal whether a particular statistically significant change is biologically significant. To determine which behavioral changes are most important for multicellular self-organization, we devised a new methodology using Myxococcus xanthus as a model system. During development, myxobacteria coordinate their movement to aggregate into spore-filled fruiting bodies. We investigate how aggregation is restored in two mutants, csgA and pilC, that cannot aggregate unless mixed with wild-type (WT) cells. To this end, we use cell tracking to follow the movement of fluorescently labeled cells in combination with data-driven agent-based modeling. The results indicate that just like WT cells, both mutants bias their movement toward aggregates and reduce motility inside aggregates. However, several aspects of mutant behavior remain uncorrected by WT, demonstrating that perfect recreation of WT behavior is unnecessary. In fact, synergies between errant behaviors can make aggregation robust. IMPORTANCE Self-organization into spatial patterns is evident in many multicellular phenomena. Even for the best-studied systems, our ability to dissect the mechanisms driving coordinated cell movement is limited. While genetic approaches can identify mutations perturbing multicellular patterns, the diverse nature of the signaling cues coupled to significant heterogeneity of individual cell behavior impedes our ability to mechanistically connect genes with phenotype. Small differences in the behaviors of mutant strains could be irrelevant or could sometimes lead to large differences in the emergent patterns. Here, we investigate rescue of multicellular aggregation in two mutant strains of Myxococcus xanthus mixed with wild-type cells. The results demonstrate how careful quantification of cell behavior coupled to data-driven modeling can identify specific motility features responsible for cell aggregation and thereby reveal important synergies and compensatory mechanisms. Notably, mutant cells do not need to precisely recreate wild-type behaviors to achieve complete aggregation.
  2. Summary

    Data‐driven applications are essential to handle the ever‐increasing volume, velocity, and veracity of data generated by sources such as the Web and Internet of Things (IoT) devices. Simultaneously, an event‐driven computational paradigm is emerging as the core of modern systems designed for database queries, data analytics, and on‐demand applications. Modern big data processing runtimes and asynchronous many task (AMT) systems from the high performance computing (HPC) community have adopted a dataflow event‐driven model. Services are increasingly moving to an event‐driven model in the form of Function as a Service (FaaS) to compose services. An event‐driven runtime designed for data processing consists of well‐understood components such as communication, scheduling, and fault tolerance. The design choices adopted by these components determine the types of applications a system can support efficiently. We find that modern systems are limited to specific sets of applications because they have been designed with fixed choices that cannot be changed easily. In this paper, we present a loosely coupled component‐based design of a big data toolkit where each component can have different implementations to support various applications. Such a polymorphic design would allow services and data analytics to be integrated seamlessly and expand from edge to cloud to HPC environments.

     
  3. Machine learning is increasingly recognized as a promising technology in the biological, biomedical, and behavioral sciences. There can be no argument that this technique is incredibly successful in image recognition with immediate applications in diagnostics including electrophysiology, radiology, or pathology, where we have access to massive amounts of annotated data. However, machine learning often performs poorly in prognosis, especially when dealing with sparse data. This is a field where classical physics-based simulation seems to remain irreplaceable. In this review, we identify areas in the biomedical sciences where machine learning and multiscale modeling can mutually benefit from one another: Machine learning can integrate physics-based knowledge in the form of governing equations, boundary conditions, or constraints to manage ill-posed problems and robustly handle sparse and noisy data; multiscale modeling can integrate machine learning to create surrogate models, identify system dynamics and parameters, analyze sensitivities, and quantify uncertainty to bridge the scales and understand the emergence of function. With a view towards applications in the life sciences, we discuss the state of the art of combining machine learning and multiscale modeling, identify applications and opportunities, raise open questions, and address potential challenges and limitations. This review serves as an introduction to a special issue on Uncertainty Quantification, Machine Learning, and Data-Driven Modeling of Biological Systems that will help identify current roadblocks and areas where computational mechanics, as a discipline, can play a significant role. We anticipate that it will stimulate discussion within the community of computational mechanics and reach out to other disciplines including mathematics, statistics, computer science, artificial intelligence, biomedicine, systems biology, and precision medicine to join forces towards creating robust and efficient models for biological systems.
  4. High-fidelity blood flow modelling is crucial for enhancing our understanding of cardiovascular disease. Despite significant advances in computational and experimental characterization of blood flow, the knowledge that we can acquire from such investigations remains limited by the presence of uncertainty in parameters, low resolution, and measurement noise. Additionally, extracting useful information from these datasets is challenging. Data-driven modelling techniques have the potential to overcome these challenges and transform cardiovascular flow modelling. Here, we review several data-driven modelling techniques, highlight the common ideas and principles that emerge across numerous such techniques, and provide illustrative examples of how they could be used in the context of cardiovascular fluid mechanics. In particular, we discuss principal component analysis (PCA), robust PCA, compressed sensing, the Kalman filter for data assimilation, low-rank data recovery, and several additional methods for reduced-order modelling of cardiovascular flows, including the dynamic mode decomposition and the sparse identification of nonlinear dynamics. All techniques are presented in the context of cardiovascular flows with simple examples. These data-driven modelling techniques have the potential to transform computational and experimental cardiovascular research, and we discuss challenges and opportunities in applying these techniques in the field, looking ultimately towards data-driven patient-specific blood flow modelling.
  5. There are more than 7,000 public transit agencies in the U.S. (and many more private agencies), and together, they are responsible for serving 60 billion passenger miles each year. A well-functioning transit system fosters the growth and expansion of businesses, distributes social and economic benefits, and links the capabilities of community members, thereby enhancing what they can accomplish as a society. Since affordable public transit services are the backbones of many communities, this work investigates ways in which Artificial Intelligence (AI) can improve efficiency and increase utilization from the perspective of transit agencies. This book chapter discusses the primary requirements, objectives, and challenges related to the design of AI-driven smart transportation systems. We focus on three major topics. First, we discuss data sources and data. Second, we provide an overview of how AI can aid decision-making with a focus on transportation. Lastly, we discuss computational problems in the transportation domain and AI approaches to these problems. 