Title: Deploying Big Data to Crack the Genotype to Phenotype Code
Synopsis: Mechanistically connecting genotypes to phenotypes is a longstanding and central mission of biology. Deciphering these connections will unite questions and datasets across all scales from molecules to ecosystems. Although high-throughput sequencing has provided a rich platform on which to launch this effort, tools for deciphering mechanisms further along the genome to phenome pipeline remain limited. Machine learning approaches and other emerging computational tools hold the promise of augmenting human efforts to overcome these obstacles. This vision paper is the result of a Reintegrating Biology Workshop, bringing together the perspectives of integrative and comparative biologists to survey challenges and opportunities in cracking the genotype to phenotype code and thereby generating predictive frameworks across biological scales. Key recommendations include promoting the development of minimum “best practices” for the experimental design and collection of data; fostering sustained and long-term data repositories; promoting programs that recruit, train, and retain a diversity of talent; and providing funding to effectively support these highly cross-disciplinary efforts. We follow this discussion by highlighting a few specific transformative research opportunities that will be advanced by these efforts.
Award ID(s):
1927470
NSF-PAR ID:
10196274
Journal Name:
Integrative and Comparative Biology
Volume:
60
Issue:
2
ISSN:
1540-7063
Page Range / eLocation ID:
385 to 396
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Estimating multiple sequence alignments (MSAs) and inferring phylogenies are essential for many aspects of comparative biology. Yet, many bioinformatics tools for such analyses have focused on specific clades, with greatest attention paid to plants, animals, and fungi. The rapid increase in high-throughput sequencing (HTS) data from diverse lineages now provides opportunities to estimate evolutionary relationships and gene family evolution across the eukaryotic tree of life. At the same time, these types of data are known to be error-prone (e.g., substitutions, contamination). To address these opportunities and challenges, we have refined a phylogenomic pipeline, now named PhyloToL, to allow easy incorporation of data from HTS studies, to automate production of both MSAs and gene trees, and to identify and remove contaminants. PhyloToL is designed for phylogenomic analyses of diverse lineages across the tree of life (i.e., at scales of >100 My). We demonstrate the power of PhyloToL by assessing stop codon usage in Ciliophora, identifying contamination in a taxon- and gene-rich database and exploring the evolutionary history of chromosomes in the kinetoplastid parasite Trypanosoma brucei, the causative agent of African sleeping sickness. Benchmarking PhyloToL’s homology assessment against that of OrthoMCL and a published paper on superfamilies of bacterial and eukaryotic organellar outer membrane pore-forming proteins demonstrates the power of our approach for determining gene family membership and inferring gene trees. PhyloToL is highly flexible and allows users to easily explore HTS data, test hypotheses about phylogeny and gene family evolution and combine outputs with third-party tools (e.g., PhyloChromoMap, iGTP). 
  2. Abstract

    Many research and monitoring networks in recent decades have provided publicly available data documenting environmental and ecological change, but little is known about the status of efforts to synthesize this information across networks. We convened a working group to assess ongoing and potential cross‐network synthesis research and outline opportunities and challenges for the future, focusing on the US‐based research network (the US Long‐Term Ecological Research network, LTER) and monitoring network (the National Ecological Observatory Network, NEON). LTER‐NEON cross‐network research synergies arise from the potentials for LTER measurements, experiments, models, and observational studies to provide context and mechanisms for interpreting NEON data, and for NEON measurements to provide standardization and broad scale coverage that complement LTER studies. Initial cross‐network syntheses at co‐located sites in the LTER and NEON networks are addressing six broad topics: how long‐term vegetation change influences C fluxes; how detailed remotely sensed data reveal vegetation structure and function; aquatic‐terrestrial connections of nutrient cycling; ecosystem response to soil biogeochemistry and microbial processes; population and species responses to environmental change; and disturbance, stability and resilience. This initial study offers exciting potentials for expanded cross‐network syntheses involving multiple long‐term ecosystem processes at regional or continental scales. These potential syntheses could provide a pathway for the broader scientific community, beyond LTER and NEON, to engage in cross‐network science. These examples also apply to many other research and monitoring networks in the US and globally, and can guide scientists and research administrators in promoting broad‐scale research that supports resource management and environmental policy.

     
  3. The Engineering Research Centers (ERCs), funded by the National Science Foundation (NSF), play an important role in improving engineering education, bridging engineering academia and broad communities, and promoting a culture of diversity and inclusion. Each ERC must partner with an independent evaluation team to annually assess its performance and impact on advancing education, connecting communities, and building a diverse culture. This evaluation is currently performed independently (and in isolation), which leads to inconsistent evaluations and a redundant investment of ERCs’ resources into such tasks (e.g., developing evaluation instruments). These isolated efforts by ERCs to quantitatively evaluate their education programs also typically lack adequate sample size within a single center, which limits the validity and reliability of the quantitative analyses. Three ERCs, all associated with a large southwest university in the United States, worked collaboratively to overcome sample size and measure inconsistency concerns by developing a common quantitative instrument that is capable of evaluating any ERC’s education and diversity impacts. The instrument is the result of a systematic process of comparing and contrasting each ERC’s existing evaluation tools, including surveys and interview protocols. This new, streamlined tool captures participants’ overall experience as part of the ERC by measuring various constructs including skillset development, perception of diversity and inclusion, future plans after participating in the ERC, and mentorship received from the ERC. Scales and embedded items were designed broadly for possible use with both yearlong (e.g., graduate and undergraduate students and postdoctoral scholars) and summer program (Research Experience for Undergraduates, Research Experience for Teachers, and Young Scholar Program) participants.

    The instrument was distributed and tested during Summer 2019 with participants in the summer programs from all three ERCs. The forthcoming paper will present the new common cross-ERC evaluation instrument, demonstrate the effort of collecting data across all three ERCs, present preliminary findings, and discuss collaborative processes and challenges. A preliminary implication of this work is the ability to directly compare educational programs across ERCs. The authors also believe that this tool can give new ERCs a fast start in evaluating their educational programs.
  4. Abstract

    Observing the environment in the vast regions of Earth through remote sensing platforms provides the tools to measure ecological dynamics. The Arctic tundra biome, one of the largest inaccessible terrestrial biomes on Earth, requires remote sensing across multiple spatial and temporal scales, from towers to satellites, particularly those equipped for imaging spectroscopy (IS). We describe a rationale for using IS derived from advances in our understanding of Arctic tundra vegetation communities and their interaction with the environment. To best leverage ongoing and forthcoming IS resources, including National Aeronautics and Space Administration’s Surface Biology and Geology mission, we identify a series of opportunities and challenges based on intrinsic spectral dimensionality analysis and a review of current data and literature that illustrates the unique attributes of the Arctic tundra biome. These opportunities and challenges include thematic vegetation mapping, complicated by low‐stature plants and very fine‐scale surface composition heterogeneity; development of scalable algorithms for retrieval of canopy and leaf traits; nuanced variation in vegetation growth and composition that complicates detection of long‐term trends; and rapid phenological changes across brief growing seasons that may go undetected due to low revisit frequency or be obscured by snow cover and clouds. We recommend improvements to future field campaigns and satellite missions, advocating for research that combines multi‐scale spectroscopy, from lab studies to satellites that enable frequent and continuous long‐term monitoring, to inform statistical and biophysical approaches to model vegetation dynamics.

     
  5. Abstract

    It is a critical time to reflect on the National Ecological Observatory Network (NEON) science to date as well as envision what research can be done right now with NEON (and other) data and what training is needed to enable a diverse user community. NEON became fully operational in May 2019 and has pivoted from planning and construction to operation and maintenance. In this overview, the history of and foundational thinking around NEON are discussed. A framework of open science is described with a discussion of how NEON can be situated as part of a larger data constellation—across existing networks and different suites of ecological measurements and sensors. Next, a synthesis of early NEON science, based on >100 existing publications, funded proposal efforts, and emergent science at the very first NEON Science Summit (hosted by Earth Lab at the University of Colorado Boulder in October 2019) is provided. Key questions that the ecology community will address with NEON data in the next 10 yr are outlined, from understanding drivers of biodiversity across spatial and temporal scales to defining complex feedback mechanisms in human–environmental systems. Last, the essential elements needed to engage and support a diverse and inclusive NEON user community are highlighted: training resources and tools that are openly available, funding for broad community engagement initiatives, and a mechanism to share and advertise those opportunities. NEON users require both the skills to work with NEON data and the ecological or environmental science domain knowledge to understand and interpret them. This paper synthesizes early directions in the community’s use of NEON data, and opportunities for the next 10 yr of NEON operations in emergent science themes, open science best practices, education and training, and community building.

     