Title: Design, development, and implementation of IsoBank: A centralized repository for isotopic data
Stable isotope data have made pivotal contributions to nearly every discipline of the physical and natural sciences. As the generation and application of stable isotope data continue to grow exponentially, so does the need for a unifying data repository to improve accessibility and promote collaborative engagement. This paper provides an overview of the design, development, and implementation of IsoBank (www.isobank.org), a community-driven initiative to create an open-access repository for stable isotope data implemented online in 2021. A central goal of IsoBank is to provide a web-accessible database supporting interdisciplinary stable isotope research and educational opportunities. To achieve this goal, we convened a multi-disciplinary group of over 40 analytical experts, stable isotope researchers, database managers, and web developers to collaboratively design the database. This paper outlines the main features of IsoBank and provides a focused description of the core metadata structure. We present plans for future database and tool development and engagement across the scientific community. These efforts will help facilitate interdisciplinary collaboration among the many users of stable isotopic data while also offering useful data resources and standardization of metadata reporting across eco-geoinformatics landscapes.
Award ID(s):
1759937
PAR ID:
10659615
Editor(s):
Becker, Daniel
Publisher / Repository:
PLOS ONE
Date Published:
Journal Name:
PLOS ONE
Volume:
19
Issue:
9
ISSN:
1932-6203
Page Range / eLocation ID:
e0295662
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The goal of the Multispecies Ovary Tissue Histology Electronic Repository (MOTHER) project is to establish a collection of nonhuman ovary histology images for multiple species as a resource for researchers and educators. An important component of sharing scientific data is the inclusion of the contextual metadata that describes the data. MOTHER extends the Ecological Metadata Language (EML) for documenting research data, leveraging its data provenance and usage license with the inclusion of metadata for ovary histology images. The design of the MOTHER metadata includes information on the donor animal, including reproductive cycle status, the slide and its preparation. MOTHER also extends the ezEML tool, called ezEML+MOTHER, for the specification of the metadata. The design of the MOTHER database (MOTHER-DB) captures the metadata about the histology images, providing a searchable resource for discovering relevant images. MOTHER also defines a curation process for the ingestion of a collection of images and its metadata, verifying the validity of the metadata before its inclusion in the MOTHER collection. A Web search provides the ability to identify relevant images based on various characteristics in the metadata itself, such as genus and species, using filters. 
  2. Abstract. The hydrogen and oxygen stable isotope ratios of water have been used to identify sources, transport pathways, and phase-change processes within the water cycle, supporting hydrologic, forensic, ecologic, and hydroclimatic investigations. Here, we introduce a unique, open-access, global database of stable water isotope ratios (δ18O, δ17O, and δ2H) from various water types. This database facilitates data preservation, supports standardized metadata collection, and decreases the time investment for meta-analytic research and reference dataset discovery. As of July 2019, the database includes 231 586 samples from 52 210 sites, associated with 218 projects, spanning 1949 through 2019. Key information stored includes the hydrogen and oxygen isotope ratios, water type, collection date and time, site location, and project information. To promote rapid data discovery and collaboration, the database exposes metadata such as data owner contact information of embargoed data, but only permits downloads of public data. The database is supported by two companion apps, one for processing and upload of analytical data from laboratories and the other an iOS application that supports the digital collection of sample metadata. 
  3. Background: The advancement of sequencing technology has led to a rapid increase in the amount of DNA and protein sequence data; consequently, the size of genomic and proteomic databases is constantly growing. As a result, database searches need to be continually updated to account for the new data being added. However, continually re-searching the entire existing dataset wastes resources. Incremental database search can address this problem. Methods: One recently introduced incremental search method is iBlast, which wraps the BLAST sequence search method with an algorithm to reuse previously processed data and thereby increase search efficiency. The iBlast wrapper, however, must be generalized to support better-performing DNA/protein sequence search methods that have been developed, namely MMseqs2 and Diamond. To address this need, we propose iSeqsSearch, which extends iBlast by incorporating support for MMseqs2 (iMMseqs2) and Diamond (iDiamond), thereby providing a more generalized and broadly effective incremental search framework. Moreover, the previously published iBlast wrapper has to be revised to be more robust and usable by the general community. Results: iMMseqs2 and iDiamond, which apply the incremental approach, perform nearly identically to MMseqs2 and Diamond. Notably, when the rankings are compared using measures such as the Pearson correlation, we observe a high concordance of over 0.9, indicating similar results. Moreover, in some cases, our incremental approach, iSeqsSearch, which extends the iBlast merge function to iMMseqs2 and iDiamond, provides more hits than the conventional MMseqs2 and Diamond methods. Conclusion: The incremental approach using iMMseqs2 and iDiamond demonstrates efficiency in terms of reusing previously processed data while maintaining high accuracy and concordance in search results. This method can reduce resource waste in continually growing genomic and proteomic database searches.
The sample codes and data are available at GitHub and Zenodo (https://github.com/EESI/Incremental-Protein-Search; DOI:10.5281/zenodo.14675319). 
  4. In paleoceanography, carbon and oxygen stable isotope ratios from benthic foraminifera are used as tracers of physical and biogeochemical properties of the deep ocean. We present the first version of the Ocean Carbon Cycling working group database of stable isotope ratios of oxygen and carbon from benthic foraminifera from deep ocean sediment cores from the Last Glacial Maximum (LGM, 23-20 ky before present (BP)) to the Holocene (<10 ky BP) with a particular focus on the early last deglaciation (20-15 ky BP). It includes 287 globally distributed coring sites, with metadata, isotopic and chronostratigraphic information, and age models. A quality check was performed for all data and age models. Sites with at least millennial resolution were preferred, because the main goal is to resolve ocean changes associated with the last deglaciation on at least millennial timescales. Software tools were produced to access and analyze the data, and are included with this publication. Deep water mass structure as well as differences between the early deglaciation and LGM are captured by the data in the compilation, even though its coverage is still sparse in many ocean regions. We find high correlations among time series calculated with different age models at sites that allow such analysis. The database provides a useful dynamical approach to map physical and biogeochemical changes of the ocean throughout the last deglaciation. Custom Python scripts to read and analyze the database may be found in https://github.com/juanmuglia/OC3-python-scripts and in OC3-python-scripts.zip in this repository. plots_d13c.pdf and plots_d18o.pdf contain time series for all sites and available age models.
  5. Abstract. It has become common for researchers to make their data publicly available to meet the data management and accessibility requirements of funding agencies and scientific publishers. However, many researchers face the challenge of determining what data to preserve and share and where to preserve and share those data. This can be especially challenging for those who run dynamical models, which can produce complex, voluminous data outputs, and have not considered what outputs may need to be preserved and shared as part of the project design. This manuscript presents findings from the NSF EarthCube Research Coordination Network project titled “What About Model Data? Best Practices for Preservation and Replicability” (https://modeldatarcn.github.io/). These findings suggest that if the primary goal of sharing data is to communicate knowledge, most simulation-based research projects only need to preserve and share selected model outputs along with the full simulation experiment workflow. One major result of this project has been the development of a rubric, designed to provide guidance for making decisions on what simulation output needs to be preserved and shared in trusted community repositories to achieve the goal of knowledge communication. This rubric, along with use cases for selected projects, provides scientists with guidance on data accessibility requirements in the planning process of research, allowing for more thoughtful development of data management plans and funding requests. Additionally, publishers can refer to this rubric for what is expected in terms of data accessibility for publication.