Title: Standardized NEON organismal data for biodiversity research
Abstract

Understanding patterns and drivers of species distribution and abundance, and thus biodiversity, is a core goal of ecology. Despite advances in recent decades, research into these patterns and processes is currently limited by a lack of standardized, high‐quality, empirical data that span large spatial scales and long time periods. The National Ecological Observatory Network (NEON) fills this gap by providing freely available observational data generated through robust and consistent organismal sampling of several sentinel taxonomic groups at 81 sites distributed across the United States; these data will be collected for at least 30 years. The breadth and scope of these data provide a unique resource for advancing biodiversity research. To maximize the potential of this opportunity, however, it is critical that NEON data be maximally accessible and easily integrated into investigators' workflows and analyses. To facilitate their use for biodiversity research and synthesis, we created a workflow to process and format NEON organismal data into the ecocomDP (ecological community data design pattern) format, so that they are available through the ecocomDP R package; we then provided the standardized data as an R data package (neonDivData). We briefly summarize sampling designs and data wrangling decisions for the major taxonomic groups included in this effort. Our workflows are open‐source so that the biodiversity community may add taxonomic groups, modify the workflow to produce datasets appropriate for their own analytical needs, and regularly update the data packages as more observations become available. Finally, we provide two simple examples of how the standardized data may be used for biodiversity research. By providing a standardized data package, we hope to enhance the utility of NEON organismal data in advancing biodiversity research and encourage the use of the harmonized ecocomDP data design pattern for community ecology data from other ecological observatory networks.

 
Award ID(s):
1926341 1926568 1926567 1724433 1926598
NSF-PAR ID:
10396582
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Ecosphere
Volume:
13
Issue:
7
ISSN:
2150-8925
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Two programs that provide high-quality long-term ecological data, the Environmental Data Initiative (EDI) and the National Ecological Observatory Network (NEON), have recently teamed up with data users interested in synthesizing biodiversity data, such as ecological synthesis working groups supported by the US Long Term Ecological Research (LTER) Network Office, to make their data more Findable, Accessible, Interoperable, and Reusable (FAIR). To this end, we have developed a flexible intermediate data design pattern for ecological community data (L1 formatted data in Fig. 1, see Fig. 2 for design details) called "ecocomDP" (O'Brien et al. 2021), and we provide tools to work with data packages in which this design pattern has been implemented. The ecocomDP format provides a data pattern commonly used for reporting community-level data, such as repeated observations of species-level measures of biomass, abundance, percent cover, or density across multiple locations. The ecocomDP library for R includes tools to search for data packages, to download or import data packages into an R (programming language) session in a standard format, and to visualize data during the exploration steps that are recommended for data users prior to any cross-study synthesis work. To date, EDI has created 70 ecocomDP data packages derived from their holdings, which include data from the US Long Term Ecological Research (US LTER) program, Long Term Research in Environmental Biology (LTREB) program, and other projects, which are now discoverable and accessible using the ecocomDP library. Similarly, NEON data products for 12 taxonomic groups are discoverable using the ecocomDP search tool. 
Input from data users provided guidance for the ecocomDP developers in mapping the NEON data products to the ecocomDP format to facilitate interoperability with the ecocomDP data packages available from the EDI repository. The standardized data design pattern allows common data visualizations across data packages, and has the potential to facilitate the development of new tools and workflows for biodiversity synthesis. The broader impacts of this collaboration are intended to lower the barriers for researchers in ecology and the environmental sciences to access and work with long-term biodiversity data and provide a hub around which data providers and data users can develop best practices that will build a diverse and inclusive community of practice. 
  2. Abstract

    Soil microbial communities play critical roles in various ecosystem processes, but studies at a large spatial and temporal scale have been challenging due to the difficulty in finding the relevant samples in available data sets as well as the lack of standardization in sample collection and processing. The National Ecological Observatory Network (NEON) has been collecting soil microbial community data multiple times per year for 47 terrestrial sites in 20 eco‐climatic domains, producing one of the most extensive standardized sampling efforts for soil microbial biodiversity to date. Here, we introduce the neonMicrobe R package, a suite of downloading, preprocessing, data set assembly, and sensitivity analysis tools for NEON’s newly published 16S and ITS amplicon sequencing data products, which characterize soil bacterial and fungal communities, respectively. neonMicrobe is designed to make these data more accessible to ecologists without assuming prior experience with bioinformatic pipelines. We describe quality control steps used to remove quality‐flagged samples, report on sensitivity analyses used to determine appropriate quality filtering parameters for the DADA2 workflow, and demonstrate the immediate usability of the output data by conducting standard analyses of soil microbial diversity. The sequence abundance tables produced by neonMicrobe can be linked to NEON’s other data products (e.g., soil physical and chemical properties, plant community composition) and soil subsamples archived in the NEON Biorepository. We provide recommendations for incorporating neonMicrobe into reproducible scientific workflows, discuss technical considerations for large‐scale amplicon sequence analysis, and outline future directions for NEON‐enabled microbial ecology. 
In particular, we believe that NEON marker gene sequence data will allow researchers to answer outstanding questions about the spatial and temporal dynamics of soil microbial communities while explicitly accounting for scale dependence. We expect that the data produced by NEON and the neonMicrobe R package will act as a valuable ecological baseline to inform and contextualize future experimental and modeling endeavors.

     
  3. Abstract

    DNA‐based aquatic biomonitoring methods show promise to provide rapid, standardized, and efficient biodiversity assessment to supplement and in some cases replace current morphology‐based approaches that are often less efficient and can produce inconsistent results. Despite this potential, broad‐scale adoption of DNA‐based approaches by end‐users remains limited, and studies on how these two approaches differ in detecting aquatic biodiversity across large spatial scales are lacking. Here, we present a comparison of DNA metabarcoding and morphological identification, leveraging national‐scale, open‐source, ecological datasets from the National Ecological Observatory Network (NEON). Across 24 wadeable streams in North America with 179 paired sample comparisons, we found that DNA metabarcoding detected twice as many unique taxa as morphological identification overall. The two approaches showed poor congruence in detecting the same taxa, averaging 59%, 35%, and 23% of shared taxa detected at the order, family, and genus levels, respectively. Importantly, the two approaches detected different proportions of indicator taxa like %EPT and %Chironomidae. DNA metabarcoding detected far fewer Chironomid and Trichopteran taxa than morphological identification, but more Ephemeropteran and Plecopteran taxa, a result likely due to primer choice. Overall, our results showed that DNA metabarcoding and morphological identification detected different benthic macroinvertebrate communities. Despite these differences, we found that the same environmental variables were correlated with invertebrate community structure, suggesting that both approaches can accurately detect biodiversity patterns across environmental gradients. 
Further refinement of DNA metabarcoding protocols, primers, and reference libraries, as well as more standardized, large‐scale comparative studies, may improve our understanding of the taxonomic agreement and data linkages between DNA metabarcoding and morphological approaches.

     
  4. Research on floral volatiles has grown substantially in the last 20 years, which has generated insights into their diversity and prevalence. These studies have paved the way for new research that explores the evolutionary origins and ecological consequences of different types of variation in floral scent, including community-level, functional, and environmentally induced variation. However, to address these types of questions, novel approaches are needed that can handle large sample sizes, provide quality control measures, and make volatile research more transparent and accessible, particularly for scientists without prior experience in this field. Drawing upon a literature review and our own experiences, we present a set of best practices for next-generation research in floral scent. We outline methods for data collection (experimental designs, methods for conducting field collections, analytical chemistry, compound identification) and data analysis (statistical analysis, database integration) that will facilitate the generation and interpretation of quality data. For the intermediate step of data processing, we created the R package bouquet, which provides a data analysis pipeline. The package contains functions that enable users to convert chromatographic peak integrations to a filtered data table that can be used in subsequent statistical analyses. This package includes default settings for filtering out non-floral compounds, including background contamination, based on our best-practice guidelines, but functions and workflows can be easily customized as necessary. Next-generation research into the ecology and evolution of floral scent has the potential to generate broadly relevant insights into how complex traits evolve, their genomic architecture, and their consequences for ecological interactions. In order to fulfill this potential, the methodology of floral scent studies needs to become more transparent and reproducible. 
By outlining best practices throughout the lifecycle of a project, from experimental design to statistical analysis, and providing an R package that standardizes the data processing pipeline, we provide a resource for new and seasoned researchers in this field and in adjacent fields, where high-throughput and multi-dimensional datasets are common. 
  5. Abstract

    Comprehensive, time‐scaled phylogenies provide a critical resource for many questions in ecology, evolution and biodiversity. Methodological advances have increased the breadth of taxonomic coverage in phylogenetic data; however, accessing and reusing these data remain challenging.

    We introduce the Fish Tree of Life website and associated R package fishtree to provide convenient access to sequences, phylogenies, fossil calibrations and diversification rate estimates for the most diverse group of vertebrate organisms, the ray‐finned fishes. The Fish Tree of Life website presents subsets and visual summaries of phylogenetic and comparative data, and is complemented by the R package, which provides flexible programmatic access to the same underlying data source for advanced users wishing to extend or reanalyse the data.

    We demonstrate functionality with an overview of the website, and show three examples of advanced usage through the R package. First, we test for the presence of long branch attraction artefacts across the fish tree of life. The second example examines the effects of habitat on diversification rate in the pufferfishes. The final example demonstrates how a community phylogenetic analysis could be conducted with the package.

    This resource makes a large comparative vertebrate dataset easily accessible via the website, while the R package enables the rapid reuse and reproducibility of research results via its ability to easily integrate with other R packages and software for molecular biology and comparative methods.

     