skip to main content

Title: Key challenges facing data-driven multicellular systems biology
Abstract Increasingly sophisticated experiments, coupled with large-scale computational models, have the potential to systematically test biological hypotheses to drive our understanding of multicellular systems. In this short review, we explore key challenges that must be overcome to achieve robust, repeatable data-driven multicellular systems biology. If these challenges can be solved, we can grow beyond the current state of isolated tools and datasets to a community-driven ecosystem of interoperable data, software utilities, and computational modeling platforms. Progress is within our grasp, but it will take community (and financial) commitment.
Authors:
Award ID(s):
1720625
Publication Date:
NSF-PAR ID:
10188159
Journal Name:
GigaScience
Volume:
8
Issue:
10
ISSN:
2047-217X
Sponsoring Org:
National Science Foundation
More Like this
  1. Machine learning is increasingly recognized as a promising technology in the biological, biomedical, and behavioral sciences. There can be no argument that this technique is incredibly successful in image recognition with immediate applications in diagnostics including electrophysiology, radiology, or pathology, where we have access to massive amounts of annotated data. However, machine learning often performs poorly in prognosis, especially when dealing with sparse data. This is a field where classical physics-based simulation seems to remain irreplaceable. In this review, we identify areas in the biomedical sciences where machine learning and multiscale modeling can mutually benefit from one another: Machine learning can integrate physics-based knowledge in the form of governing equations, boundary conditions, or constraints to manage ill-posted problems and robustly handle sparse and noisy data; multiscale modeling can integrate machine learning to create surrogate models, identify system dynamics and parameters, analyze sensitivities, and quantify uncertainty to bridge the scales and understand the emergence of function. With a view towards applications in the life sciences, we discuss the state of the art of combining machine learning and multiscale modeling, identify applications and opportunities, raise open questions, and address potential challenges and limitations. This review serves as introduction to a special issuemore »on Uncertainty Quantification, Machine Learning, and Data-Driven Modeling of Biological Systems that will help identify current roadblocks and areas where computational mechanics, as a discipline, can play a significant role. We anticipate that it will stimulate discussion within the community of computational mechanics and reach out to other disciplines including mathematics, statistics, computer science, artificial intelligence, biomedicine, systems biology, and precision medicine to join forces towards creating robust and efficient models for biological systems.« less
  2. ABSTRACT Single mutations frequently alter several aspects of cell behavior but rarely reveal whether a particular statistically significant change is biologically significant. To determine which behavioral changes are most important for multicellular self-organization, we devised a new methodology using Myxococcus xanthus as a model system. During development, myxobacteria coordinate their movement to aggregate into spore-filled fruiting bodies. We investigate how aggregation is restored in two mutants, csgA and pilC , that cannot aggregate unless mixed with wild-type (WT) cells. To this end, we use cell tracking to follow the movement of fluorescently labeled cells in combination with data-driven agent-based modeling. The results indicate that just like WT cells, both mutants bias their movement toward aggregates and reduce motility inside aggregates. However, several aspects of mutant behavior remain uncorrected by WT, demonstrating that perfect recreation of WT behavior is unnecessary. In fact, synergies between errant behaviors can make aggregation robust. IMPORTANCE Self-organization into spatial patterns is evident in many multicellular phenomena. Even for the best-studied systems, our ability to dissect the mechanisms driving coordinated cell movement is limited. While genetic approaches can identify mutations perturbing multicellular patterns, the diverse nature of the signaling cues coupled to significant heterogeneity of individual cellmore »behavior impedes our ability to mechanistically connect genes with phenotype. Small differences in the behaviors of mutant strains could be irrelevant or could sometimes lead to large differences in the emergent patterns. Here, we investigate rescue of multicellular aggregation in two mutant strains of Myxococcus xanthus mixed with wild-type cells. The results demonstrate how careful quantification of cell behavior coupled to data-driven modeling can identify specific motility features responsible for cell aggregation and thereby reveal important synergies and compensatory mechanisms. Notably, mutant cells do not need to precisely recreate wild-type behaviors to achieve complete aggregation.« less
  3. Abstract

    The discovery of new drugs is a time consuming and expensive process. Methods such as virtual screening, which can filter out ineffective compounds from drug libraries prior to expensive experimental study, have become popular research topics. As the computational drug discovery community has grown, in order to benchmark the various advances in methodology, organizations such as the Drug Design Data Resource have begun hosting blinded grand challenges seeking to identify the best methods for ligand pose-prediction, ligand affinity ranking, and free energy calculations. Such open challenges offer a unique opportunity for researchers to partner with junior students (e.g., high school and undergraduate) to validate basic yet fundamental hypotheses considered to be uninteresting to domain experts. Here, we, a group of high school-aged students and their mentors, present the results of our participation in Grand Challenge 4 where we predicted ligand affinity rankings for the Cathepsin S protease, an important protein target for autoimmune diseases. To investigate the effect of incorporating receptor dynamics on ligand affinity rankings, we employed the Relaxed Complex Scheme, a molecular docking method paired with molecular dynamics-generated receptor conformations. We found that Cathepsin S is a difficult target for molecular docking and we explore some advancedmore »methods such as distance-restrained docking to try to improve the correlation with experiments. This project has exemplified the capabilities of high school students when supported with a rigorous curriculum, and demonstrates the value of community-driven competitions for beginners in computational drug discovery.

    « less
  4. Abstract. Although the concepts of nonuniform sampling (NUS​​​​​​​) and non-Fourier spectral reconstruction in multidimensional NMR began to emerge 4 decades ago (Bodenhausen and Ernst, 1981; Barna and Laue, 1987), it is only relatively recently that NUS has become more commonplace. Advantages of NUS include the ability to tailor experiments to reduce data collection time and to improve spectral quality, whether through detection of closely spaced peaks (i.e., “resolution”) or peaks of weak intensity (i.e., “sensitivity”). Wider adoption of these methods is the result of improvements in computational performance, a growing abundance and flexibility of software, support from NMR spectrometer vendors, and the increased data sampling demands imposed by higher magnetic fields. However, the identification of best practices still remains a significant and unmet challenge. Unlike the discrete Fourier transform, non-Fourier methods used to reconstruct spectra from NUS data are nonlinear, depend on the complexity and nature of the signals, and lack quantitative or formal theory describing their performance. Seemingly subtle algorithmic differences may lead to significant variabilities in spectral qualities and artifacts. A community-based critical assessment of NUS challenge problems has been initiated, called the “Nonuniform Sampling Contest” (NUScon), with the objective of determining best practices for processing and analyzing NUS experiments.more »We address this objective by constructing challenges from NMR experiments that we inject with synthetic signals, and we process these challenges using workflows submitted by the community. In the initial rounds of NUScon our aim is to establish objective criteria for evaluating the quality of spectral reconstructions. We present here a software package for performing the quantitative analyses, and we present the results from the first two rounds of NUScon. We discuss the challenges that remain and present a roadmap for continued community-driven development with the ultimate aim of providing best practices in this rapidly evolving field. The NUScon software package and all data from evaluating the challenge problems are hosted on the NMRbox platform.« less
  5. Adoption of data and compute-intensive research in geosciences is hindered by the same social and technological reasons as other science disciplines - we're humans after all. As a result, many of the new opportunities to advance science in today's rapidly evolving technology landscape are not approachable by domain geoscientists. Organizations must acknowledge and actively mitigate these intrinsic biases and knowledge gaps in their users and staff. Over the past ten years, CyVerse (www.cyverse.org) has carried out the mission "to design, deploy, and expand a national cyberinfrastructure for life sciences research, and to train scientists in its use." During this time, CyVerse has supported and enabled transdisciplinary collaborations across institutions and communities, overseen many successes, and encountered failures. Our lessons learned in user engagement, both social and technical, are germane to the problems facing the geoscience community today. A key element of overcoming social barriers is to set up an effective education, outreach, and training (EOT) team to drive initial adoption as well as continued use. A strong EOT group can reach new users, particularly those in under-represented communities, reduce power distance relationships, and mitigate users' uncertainty avoidance toward adopting new technology. Timely user support across the life of a project,more »based on mutual respect between the developers' and researchers' different skill sets, is critical to successful collaboration. Without support, users become frustrated and abandon research questions whose technical issues require solutions that are 'simple' from a developer's perspective, but are unknown by the scientist. At CyVerse, we have found there is no one solution that fits all research challenges. Our strategy has been to maintain a system of systems (SoS) where users can choose 'lego-blocks' to build a solution that matches their problem. This SoS ideology has allowed CyVerse users to extend and scale workflows without becoming entangled in problems which reduce productivity and slow scientific discovery. Likewise, CyVerse addresses the handling of data through its entire lifecycle, from creation to publication to future reuse, supporting community driven big data projects and individual researchers.« less