skip to main content

Title: SOils DAta Harmonization database (SoDaH): an open-source synthesis of soil data from research networks
This SOils DAta Harmonization (SoDaH) database is designed to bring together soil carbon data from diverse research networks into a harmonized dataset that can be used for synthesis activitiesMore>>
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; more » ; ; ; ; ; ; ; ; ; ; « less
Environmental Data Initiative
Publication Year:
Award ID(s):
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Abstract. Data collected from research networks presentopportunities to test theories and develop models about factors responsiblefor the long-term persistence and vulnerability of soil organic matter(SOM). Synthesizing datasets collected by different research networkspresents opportunities to expand the ecological gradients and scientificbreadth of information available for inquiry. Synthesizing these data ischallenging, especially considering the legacy of soil data that havealready been collected and an expansion of new network science initiatives.To facilitate this effort, here we present the SOils DAta Harmonizationdatabase (SoDaH;, last access: 22 December 2020), a flexible database designed to harmonize diverse SOM datasets frommultiple research networks. SoDaH is built on several network scienceefforts in the United States, but the tools built for SoDaH aim to providean open-access resource to facilitate synthesis of soil carbon data.Moreover, SoDaH allows for individual locations to contribute results fromexperimental manipulations, repeated measurements from long-term studies,and local- to regional-scale gradients across ecosystems or landscapes.Finally, we also provide data visualization and analysis tools that can beused to query and analyze the aggregated database. The SoDaH v1.0 dataset isarchived and availableat (Wieder et al., 2020).
  2. Abstract. In the age of big data, soil data are more available and richer than ever, but – outside of a few large soil survey resources – they remain largely unusable for informing soil management and understanding Earth system processes beyond the original study.Data science has promised a fully reusable research pipeline where data from past studies are used to contextualize new findings and reanalyzed for new insight.Yet synthesis projects encounter challenges at all steps of the data reuse pipeline, including unavailable data, labor-intensive transcription of datasets, incomplete metadata, and a lack of communication between collaborators.Here, using insights from a diversity of soil, data, and climate scientists, we summarize current practices in soil data synthesis across all stages of database creation: availability, input, harmonization, curation, and publication.We then suggest new soil-focused semantic tools to improve existing data pipelines, such as ontologies, vocabulary lists, and community practices.Our goal is to provide the soil data community with an overview of current practices in soil data and where we need to go to fully leverage big data to solve soil problems in the next century.
  3. Abstract The National Ecological Observatory Network (NEON) is a multidecadal and continental-scale observatory with sites across the United States. Having entered its operational phase in 2018, NEON data products, software, and services become available to facilitate research on the impacts of climate change, land-use change, and invasive species. An essential component of NEON are its 47 tower sites, where eddy-covariance (EC) sensors are operated to determine the surface–atmosphere exchange of momentum, heat, water, and CO 2 . EC tower networks such as AmeriFlux, the Integrated Carbon Observation System (ICOS), and NEON are vital for providing the distributed observations to address interactions at the soil–vegetation–atmosphere interface. NEON represents the largest single-provider EC network globally, with standardized observations and data processing explicitly designed for intersite comparability and analysis of feedbacks across multiple spatial and temporal scales. Furthermore, EC is tightly integrated with soil, meteorology, atmospheric chemistry, isotope, phenology, and rich contextual observations such as airborne remote sensing and in situ sampling bouts. Here, we present an overview of NEON’s observational design, field operation, and data processing that yield community resources for the study of surface–atmosphere interactions. Near-real-time data products become available from the NEON Data Portal, and EC and meteorological data aremore »ingested into AmeriFlux and FLUXNET globally harmonized data releases. Open-source software for reproducible, extensible, and portable data analysis includes the eddy4R family of R packages underlying the EC data product generation. These resources strive to integrate with existing infrastructures and networks, to suggest novel systemic solutions, and to synergize ongoing research efforts across science communities.« less
  4. Abstract

    Understanding patterns and drivers of species distribution and abundance, and thus biodiversity, is a core goal of ecology. Despite advances in recent decades, research into these patterns and processes is currently limited by a lack of standardized, high‐quality, empirical data that span large spatial scales and long time periods. The NEON fills this gap by providing freely available observational data that are generated during robust and consistent organismal sampling of several sentinel taxonomic groups within 81 sites distributed across the United States and will be collected for at least 30 years. The breadth and scope of these data provide a unique resource for advancing biodiversity research. To maximize the potential of this opportunity, however, it is critical that NEON data be maximally accessible and easily integrated into investigators' workflows and analyses. To facilitate its use for biodiversity research and synthesis, we created a workflow to process and format NEON organismal data into the ecocomDP (ecological community data design pattern) format that were available through the ecocomDP R package; we then provided the standardized data as an R data package (neonDivData). We briefly summarize sampling designs and data wrangling decisions for the major taxonomic groups included in this effort. Our workflowsmore »are open‐source so the biodiversity community may: add additional taxonomic groups; modify the workflow to produce datasets appropriate for their own analytical needs; and regularly update the data packages as more observations become available. Finally, we provide two simple examples of how the standardized data may be used for biodiversity research. By providing a standardized data package, we hope to enhance the utility of NEON organismal data in advancing biodiversity research and encourage the use of the harmonized ecocomDP data design pattern for community ecology data from other ecological observatory networks.

    « less
  5. Pickett, Brett E. ; Jurado, Kellie (Ed.)
    ABSTRACT Data that catalogue viral diversity on Earth have been fragmented across sources, disciplines, formats, and various degrees of open sharing, posing challenges for research on macroecology, evolution, and public health. Here, we solve this problem by establishing a dynamically maintained database of vertebrate-virus associations, called The Global Virome in One Network (VIRION). The VIRION database has been assembled through both reconciliation of static data sets and integration of dynamically updated databases. These data sources are all harmonized against one taxonomic backbone, including metadata on host and virus taxonomic validity and higher classification; additional metadata on sampling methodology and evidence strength are also available in a harmonized format. In total, the VIRION database is the largest open-source, open-access database of its kind, with roughly half a million unique records that include 9,521 resolved virus “species” (of which 1,661 are ICTV ratified), 3,692 resolved vertebrate host species, and 23,147 unique interactions between taxonomically valid organisms. Together, these data cover roughly a quarter of mammal diversity, a 10th of bird diversity, and ∼6% of the estimated total diversity of vertebrates, and a much larger proportion of their virome than any previous database. We show how these data can be used to testmore »hypotheses about microbiology, ecology, and evolution and make suggestions for best practices that address the unique mix of evidence that coexists in these data. IMPORTANCE Animals and their viruses are connected by a sprawling, tangled network of species interactions. Data on the host-virus network are available from several sources, which use different naming conventions and often report metadata in different levels of detail. VIRION is a new database that combines several of these existing data sources, reconciles taxonomy to a single consistent backbone, and reports metadata in a format designed by and for virologists. Researchers can use VIRION to easily answer questions like “Can any fish viruses infect humans?” or “Which bats host coronaviruses?” or to build more advanced predictive models, making it an unprecedented step toward a full inventory of the global virome.« less