Title: Toward FAIR Representations of Microbial Interactions
Despite an ever-growing number of data sets that catalog and characterize interactions between microbes in different environments and conditions, many of these data are neither easily accessible nor intercompatible. These limitations present a major challenge to microbiome research by hindering the streamlined drawing of inferences across studies.
Award ID(s):
2019589
PAR ID:
10559554
Author(s) / Creator(s):
; ; ;
Editor(s):
Wolfe, Benjamin E
Publisher / Repository:
ASM Journals
Date Published:
Journal Name:
mSystems
Volume:
7
Issue:
5
ISSN:
2379-5077
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Aim: Biogeographers have used three primary data types to examine shifts in tree ranges in response to past climate change: fossil pollen, genetic data and contemporary occurrences. Although recent efforts have explored formal integration of these types of data, we have limited understanding of how integration affects estimates of range shift rates and their uncertainty. We compared estimates of biotic velocity (i.e. rate of species' range shifts) using each data type independently to estimates obtained using integrated models. Location: Eastern North America. Taxon: Fraxinus pennsylvanica Marshall (green ash). Methods: Using fossil pollen, genomic data and modern occurrence data, we estimated biotic velocities directly from 24 species distribution models (SDMs) and 200 pollen surfaces created with a novel Bayesian spatio‐temporal model. We compared biotic velocity from these analyses to estimates based on coupled demographic‐coalescent simulations and Approximate Bayesian Computation that combined fossil pollen and SDMs with population genomic data collected across the F. pennsylvanica range. Results: Patterns and magnitude of biotic velocity over time varied by the method used to estimate past range dynamics. Estimates based on fossil pollen yielded the highest rates of range movement. Overall, integrating genetic data with other data types in our simulation‐based framework reduced apparent uncertainty in biotic velocity estimates and resulted in greater similarity in estimates between SDM‐ and pollen‐integrated analyses. Main Conclusions: By reducing uncertainty in our assessments of range shifts, integration of data types improves our understanding of the past distribution of species. Based on these results, we propose further steps to reach the integration of these three lines of biogeographical evidence into a unified analytical framework.
  2. Abstract Stable hydrogen and oxygen isotopic compositions (δ2H and δ18O, respectively) of animal tissues have been used to infer geographical origin or mobility based on the premise that the isotopic composition of tissue is systematically related to that of local water sources. Isotopic data for known‐origin samples are required to quantify these tissue–environment relationships. Although many such data have been published and could be reused by researchers, differences in the standards used for calibration and analytical procedures for different datasets limit the comparability of these data. We develop an algorithm that uses results from comparative analysis of secondary standards to transform data among reference scales and estimate the uncertainty inherent in these transformations. We apply the algorithm to a compilation of known‐origin keratin data published over the past ~20 years. We show that transformation improves the comparability of data from different laboratories, and that the transformed data suggest ecophysiologically meaningful differences in keratin–water relationships among different animal groups and taxa. The compiled data and algorithms are freely available in the assignR R‐package to support geographical provenance research, and more generally offer a methodology overcoming several challenges in geochemical data integration and reuse.
  3. Abstract Functional data analysis is an evolving field focused on analyzing data that reveals insights into curves, surfaces, or entities within a continuous domain. This type of data is typically distinguished by the inherent dependence and smoothness observed within each data curve. Traditional functional data analysis approaches have predominantly relied on linear models, which, while foundational, often fall short in capturing the intricate, nonlinear relationships within the data. This paper seeks to bridge this gap by reviewing the integration of deep neural networks into functional data analysis. Deep neural networks present a transformative approach to navigating these complexities, excelling particularly in high‐dimensional spaces and demonstrating unparalleled flexibility in managing diverse data constructs. This review aims to advance functional data regression, classification, and representation by integrating deep neural networks with functional data analysis, fostering a harmonious and synergistic union between these two fields. The remarkable ability of deep neural networks to adeptly navigate the intricate functional data highlights a wealth of opportunities for ongoing exploration and research across various interdisciplinary areas. This article is categorized under: Data: Types and Structure > Time Series, Stochastic Processes, and Functional Data; Statistical Learning and Exploratory Methods of the Data Sciences > Deep Learning; Statistical Learning and Exploratory Methods of the Data Sciences > Neural Networks.
  4. Abstract Premise: Plant trait data are essential for quantifying biodiversity and function across Earth, but these data are challenging to acquire for large studies. Diverse strategies are needed, including the liberation of heritage data locked within specialist literature such as floras and taxonomic monographs. Here we report FloraTraiter, a novel approach using rule‐based natural language processing (NLP) to parse computable trait data from biodiversity literature. Methods: FloraTraiter was implemented through collaborative work between programmers and botanical experts and customized for both online floras and scanned literature. We report a strategy spanning optical character recognition, recognition of taxa, iterative building of traits, and establishing linkages among all of these, as well as curational tools and code for turning these results into standard morphological matrices. Results: Over 95% of treatment content was successfully parsed for traits with <1% error. Data for more than 700 taxa are reported, including a demonstration of common downstream uses. Conclusions: We identify strategies, applications, tips, and challenges that we hope will facilitate future similar efforts to produce large open‐source trait data sets for broad community reuse. Largely automated tools like FloraTraiter will be an important addition to the toolkit for assembling trait data at scale.
  5. Abstract Applications of observational data to understand smallholder farming systems have increased as the accuracy of these data has improved. The more precise observational methods become, the more discrepancies between observational data and farmer perceptions arise. These discrepancies demonstrate the prevalence of heuristics and biases, highlighting the problematic ways we study fundamental decisions around agricultural input use, production outcomes, and perceptions about weather and climate. 