Title: An analytical pipeline to support robust research on the ecology, evolution, and function of floral volatiles
Research on floral volatiles has grown substantially in the last 20 years, which has generated insights into their diversity and prevalence. These studies have paved the way for new research that explores the evolutionary origins and ecological consequences of different types of variation in floral scent, including community-level, functional, and environmentally induced variation. However, to address these types of questions, novel approaches are needed that can handle large sample sizes, provide quality control measures, and make volatile research more transparent and accessible, particularly for scientists without prior experience in this field. Drawing upon a literature review and our own experiences, we present a set of best practices for next-generation research in floral scent. We outline methods for data collection (experimental designs, methods for conducting field collections, analytical chemistry, compound identification) and data analysis (statistical analysis, database integration) that will facilitate the generation and interpretation of quality data. For the intermediate step of data processing, we created the R package bouquet, which provides a data analysis pipeline. The package contains functions that enable users to convert chromatographic peak integrations to a filtered data table that can be used in subsequent statistical analyses. This package includes default settings for filtering out non-floral compounds, including background contamination, based on our best-practice guidelines, but functions and workflows can be easily customized as necessary. Next-generation research into the ecology and evolution of floral scent has the potential to generate broadly relevant insights into how complex traits evolve, their genomic architecture, and their consequences for ecological interactions. In order to fulfill this potential, the methodology of floral scent studies needs to become more transparent and reproducible. By outlining best practices throughout the lifecycle of a project, from experimental design to statistical analysis, and providing an R package that standardizes the data processing pipeline, we provide a resource for new and seasoned researchers in this field and in adjacent fields, where high-throughput and multi-dimensional datasets are common.
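The data processing step described in the abstract (peak integrations in, filtered data table out) can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the bouquet API: the sample names, the `floral`/`ambient` labels, the function names, and the `min_ratio=3.0` threshold are all assumptions, not the package's functions or defaults. The sketch keeps a compound only when its mean peak area in floral samples is at least `min_ratio` times its mean area in ambient controls, then builds a sample-by-compound table.

```python
from collections import defaultdict

# Hypothetical peak integrations: (sample, sample_type, compound, peak_area).
PEAKS = [
    ("flower_1", "floral", "linalool", 900.0),
    ("flower_2", "floral", "linalool", 1100.0),
    ("blank_1", "ambient", "linalool", 50.0),
    ("flower_1", "floral", "toluene", 400.0),
    ("flower_2", "floral", "toluene", 420.0),
    ("blank_1", "ambient", "toluene", 390.0),
    ("flower_1", "floral", "benzaldehyde", 300.0),
]


def mean_area_by_type(peaks):
    """Mean peak area per (compound, sample type); absent peaks count as zero."""
    samples = defaultdict(set)
    totals = defaultdict(float)
    for sample, sample_type, compound, area in peaks:
        samples[sample_type].add(sample)
        totals[(compound, sample_type)] += area
    # Divide by the number of samples of that type, not the number of peaks.
    return {key: total / len(samples[key[1]]) for key, total in totals.items()}


def floral_compounds(peaks, min_ratio=3.0):
    """Compounds whose floral mean is at least min_ratio times the ambient mean.

    min_ratio is an illustrative assumption, not a published default.
    """
    means = mean_area_by_type(peaks)
    kept = set()
    for compound in {row[2] for row in peaks}:
        floral = means.get((compound, "floral"), 0.0)
        ambient = means.get((compound, "ambient"), 0.0)
        if floral > 0 and floral >= min_ratio * ambient:
            kept.add(compound)
    return kept


def filtered_table(peaks, min_ratio=3.0):
    """Sample-by-compound table of peak areas for floral samples only."""
    kept = sorted(floral_compounds(peaks, min_ratio))
    table = defaultdict(lambda: dict.fromkeys(kept, 0.0))
    for sample, sample_type, compound, area in peaks:
        if sample_type == "floral" and compound in kept:
            table[sample][compound] += area
    return dict(table)
```

In this toy data, toluene is dropped because it is nearly as abundant in the ambient control as in the floral samples, while linalool and benzaldehyde are retained. A real workflow would add further filters and quality control checks.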
Award ID(s):
2135270 1654655 1624073
NSF-PAR ID:
10379278
Journal Name:
Frontiers in Ecology and Evolution
Volume:
10
ISSN:
2296-701X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Record, Sydne (Ed.)
    1. LiDAR data are being increasingly used to provide a detailed characterization of the vertical profile of forests. This characterization enables the generation of new insights on the influence of environmental drivers and anthropogenic disturbances on forest structure as well as on how forest structure influences important ecosystem functions and services. Unfortunately, extracting information from LiDAR data in a way that enables the spatial visualization of forest structure, as well as its temporal changes, is challenging due to the high-dimensionality of these data.
    2. We show how the Latent Dirichlet Allocation model applied to LiDAR data (LidarLDA) can be used to identify forest structural types and how the relative abundance of these forest types changes throughout the landscape. The code to fit this model is made available through the open-source R package LidarLDA on GitHub. We illustrate the use of LidarLDA both with simulated data and data from a large-scale fire experiment in the Brazilian Amazon region.
    3. Using simulated data, we demonstrate that LidarLDA accurately identifies the number of forest types as well as their spatial distribution and absorptance probabilities. For the empirical data, we found that LidarLDA detects both landscape-level patterns in forest structure as well as the strong interacting effect of fire and forest fragmentation on forest structure based on the experimental fire plots. More specifically, LidarLDA reveals that proximity to forest edge exacerbates the impact of fires, and that burned forests remain structurally different from unburned areas for at least seven years, even when burned only once. Importantly, LidarLDA generates insights on the 3D structure of forest that cannot be obtained using more standard approaches that just focus on top-of-the-canopy information (e.g., canopy height models based on LiDAR data).
    4. By enabling the mapping of forest structure and its temporal changes, we believe that LidarLDA will be of broad utility to the ecological research community.
  2.
    Coastal salt marshes are distributed widely across the globe and are considered essential habitat for many fish and crustacean species. Yet, the literature on fishery support by salt marshes has largely been based on a few geographically distinct model systems, and as a result, inadequately captures the hierarchical nature of salt marsh pattern, process, and variation across space and time. A better understanding of geographic variation and drivers of commonalities and differences across salt marsh systems is essential to informing future management practices. Here, we address the key drivers of geographic variation in salt marshes: hydroperiod, seascape configuration, geomorphology, climatic region, sediment supply and riverine input, salinity, vegetation composition, and human activities. Future efforts to manage, conserve, and restore these habitats will require consideration of how environmental drivers within marshes affect the overall structure and subsequent function for fisheries species. We propose a future research agenda that provides both the consistent collection and reporting of sources of variation in small-scale studies and collaborative networks running parallel studies across large scales and geographically distinct locations to provide analogous information for data-poor locations. These comparisons are needed to identify and prioritize restoration or conservation efforts, identify sources of variation among regions, and best manage fisheries and food resources across the globe.

    Introduction

    Understanding the drivers of geographic variation in the condition and composition of habitats is crucial to our capacity to generalize management plans across space and time and to clarify and perhaps challenge assumptions of functional equivalence among sites. Broadly defined wetland types such as salt marshes are often assumed to provide similar functions throughout their global range, such as providing nursery habitat for fishery species. However, a growing body of evidence suggests substantial geographic variation in the functioning of salt marsh and other coastal ecosystems (Bradley et al. 2020; Whalen et al. 2020). Variation in ecological patterns and processes within habitat types can alter community structure and dynamics. Local-scale patterns and processes (e.g., patch [10s of meters], local [100s of meters]) can be influenced by processes that occur at larger spatial scales (e.g., regional [kms], global), thereby causing geographic differences in the function and ecosystem service delivery of a given habitat type. Salt marshes (which include vegetated platform, interconnected tidal creeks, fringing mudflats, ponds, and pools) are widely distributed (Fig. 1) and function as valuable nursery habitats by providing key resources for many estuarine species that transition to marine or aquatic habitats as adults (Beck et al. 2001; Minello et al. 2003; Sheaves et al. 2015). However, factors that underlie variability in the delivery of ecological functions are still inadequately understood. Previous studies have explored geographic variation in the function of salt marshes for fish and mobile crustaceans (“nekton”; e.g., Minello et al. 2012, Baker et al. 2013). However, field studies that compare multiple sites across a geographical gradient are typically limited in duration and scale. In addition, the explanatory variables (e.g., elevation, flooding duration, plant structure) collected by smaller scale studies are often inconsistent and therefore limit generalizations across sites.
  3. Abstract

    Background

    Recent development of bioinformatics tools for Next Generation Sequencing data has facilitated complex analyses and prompted large scale experimental designs for comparative genomics. When combined with the advances in network inference tools, this can lead to powerful methodologies for mining genomics data, allowing development of pipelines that stretch from sequence reads mapping to network inference. However, integrating various methods and tools available over different platforms requires a programmatic framework to fully exploit their analytic capabilities. Integrating multiple genomic analysis tools faces challenges from standardization of input and output formats, normalization of results for performing comparative analyses, to developing intuitive and easy to control scripts and interfaces for the genomic analysis pipeline.

    Results

    We describe here NetSeekR, a network analysis R package that includes the capacity to analyze time series of RNA-Seq data, to perform correlation and regulatory network inferences and to use network analysis methods to summarize the results of a comparative genomics study. The software pipeline includes alignment of reads, differential gene expression analysis, correlation network analysis, regulatory network analysis, gene ontology enrichment analysis and network visualization of differentially expressed genes. The implementation provides support for multiple RNA-Seq read mapping methods and allows comparative analysis of the results obtained by different bioinformatics methods.

    Conclusion

    Our methodology increases the level of integration of genomics data analysis tools to network inference, facilitating hypothesis building, functional analysis and genomics discovery from large scale NGS data. When combined with network analysis and simulation tools, the pipeline allows for developing systems biology methods using large scale genomics data.
  4. Abstract

    Pressing environmental research questions demand the integration of increasingly diverse and large‐scale ecological datasets as well as complex analytical methods, which require specialized tools and resources.

    Computational training for ecological and evolutionary sciences has become more abundant and accessible over the past decade, but tool development has outpaced the availability of specialized training. Most training for scripted analyses focuses on individual analysis steps in one script rather than creating a scripted pipeline, where modular functions comprise an ecosystem of interdependent steps. Although current computational training creates an excellent starting place, linear styles of scripting can risk becoming labor‐ and time‐intensive and less reproducible by often requiring manual execution. Pipelines, however, can be easily automated or tracked by software to increase efficiency and reduce potential errors. Ecology and evolution would benefit from techniques that reduce these risks by managing analytical pipelines in a modular, readily parallelizable format with clear documentation of dependencies.

    Workflow management software (WMS) can aid in the reproducibility, intelligibility and computational efficiency of complex pipelines. To date, WMS adoption in ecology and evolutionary research has been slow. We discuss the benefits and challenges of implementing WMS and illustrate its use through a case study with the targets R package to further highlight WMS benefits through workflow automation, dependency tracking and improved clarity for reviewers.

    Although WMS requires familiarity with function‐oriented programming and careful planning for more advanced applications and pipeline sharing, investment in training will enable access to the benefits of WMS and impart transferable computing skills that can facilitate ecological and evolutionary data science at large scales.

     
  5. Abstract

    Soil microbial communities play critical roles in various ecosystem processes, but studies at a large spatial and temporal scale have been challenging due to the difficulty in finding the relevant samples in available data sets as well as the lack of standardization in sample collection and processing. The National Ecological Observatory Network (NEON) has been collecting soil microbial community data multiple times per year for 47 terrestrial sites in 20 eco‐climatic domains, producing one of the most extensive standardized sampling efforts for soil microbial biodiversity to date. Here, we introduce the neonMicrobe R package—a suite of downloading, preprocessing, data set assembly, and sensitivity analysis tools for NEON’s newly published 16S and ITS amplicon sequencing data products which characterize soil bacterial and fungal communities, respectively. neonMicrobe is designed to make these data more accessible to ecologists without assuming prior experience with bioinformatic pipelines. We describe quality control steps used to remove quality‐flagged samples, report on sensitivity analyses used to determine appropriate quality filtering parameters for the DADA2 workflow, and demonstrate the immediate usability of the output data by conducting standard analyses of soil microbial diversity. The sequence abundance tables produced by neonMicrobe can be linked to NEON’s other data products (e.g., soil physical and chemical properties, plant community composition) and soil subsamples archived in the NEON Biorepository. We provide recommendations for incorporating neonMicrobe into reproducible scientific workflows, discuss technical considerations for large‐scale amplicon sequence analysis, and outline future directions for NEON‐enabled microbial ecology. In particular, we believe that NEON marker gene sequence data will allow researchers to answer outstanding questions about the spatial and temporal dynamics of soil microbial communities while explicitly accounting for scale dependence. We expect that the data produced by NEON and the neonMicrobe R package will act as a valuable ecological baseline to inform and contextualize future experimental and modeling endeavors.

     