Title: Geochemical databases
Geochemistry is a data-driven discipline. Modern laboratories produce highly diverse data, and the recent exponential increase in data volumes is challenging established practices and capabilities for organizing, analyzing, preserving, and accessing these data. At the same time, sophisticated computational techniques, including machine learning, are increasingly applied to geochemical research questions, which require easy access to large volumes of high-quality, well-organized, and standardized data. Data management has been important since the beginning of geochemistry but has recently become a necessity for the discipline to thrive in the age of digitalization and artificial intelligence. This paper summarizes the landscape of geochemical databases, distinguishing different types of data systems by purpose and tracing their evolution in historical context. We apply the life cycle model of geochemical data; explain the relevance of current standards, practices, and policies that determine the design of modern geochemical databases and data management; discuss the ethics of data reuse, including data ownership, data attribution, and data citation; and finally present a vision for the future of geochemical databases: data being born digital, connected to agreed community standards, and contributing to the global democratization of geochemical data.
Award ID(s):
2148939
PAR ID:
10588376
Author(s) / Creator(s):
; ;
Publisher / Repository:
Elsevier
Date Published:
ISBN:
9780323997638
Page Range / eLocation ID:
97 to 135
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
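
The abstract above envisions geochemical data that are born digital and connected to agreed community standards. As a rough, hedged illustration (the field names, identifiers, and vocabulary below are assumptions made for this sketch, not a published community schema), a standardized, machine-readable measurement record might look like this:

```python
# Illustrative sketch only: a "born digital" geochemical measurement record.
# Every field name and value here is a placeholder, not an official standard.
import json

record = {
    "sample_igsn": "10.58052/EXAMPLE001",      # hypothetical persistent sample identifier (IGSN-style)
    "analyte": "SiO2",
    "value": 47.3,
    "unit": "wt%",
    "method": "XRF",
    "laboratory": "Example Geochemistry Lab",  # placeholder
    "analysis_date": "2024-05-01",
    "dataset_doi": "10.5281/zenodo.0000000",   # placeholder DOI supporting data citation
    "license": "CC-BY-4.0",
}

# Serializing to JSON keeps the record machine-readable and ready for exchange
# with repositories or analysis pipelines.
print(json.dumps(record, indent=2))
```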
More Like this
  1.
    Abstract Sampling the natural world and built environment underpins much of science, yet systems for managing material samples and associated (meta)data are fragmented across institutional catalogs, practices for identification, and discipline-specific (meta)data standards. The Internet of Samples (iSamples) is a standards-based collaboration to uniquely, consistently, and conveniently identify material samples, record core metadata about them, and link them to other samples, data, and research products. iSamples extends existing resources and best practices in data stewardship to render a cross-domain cyberinfrastructure that enables transdisciplinary research, discovery, and reuse of material samples in 21st century natural science. 
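
The iSamples entry above centers on uniquely identifying material samples, recording core metadata, and linking samples to related samples and data products. As a hedged sketch only (the class, field names, and identifier values are illustrative assumptions, not the actual iSamples metadata model), such a record could be represented as:

```python
# Minimal sketch of a linked material-sample record; not the iSamples schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SampleRecord:
    pid: str                       # persistent identifier (placeholder values below)
    label: str
    material: str                  # core descriptive metadata
    collection_site: str
    related_samples: List[str] = field(default_factory=list)  # identifiers of parent/child samples
    related_data: List[str] = field(default_factory=list)     # DOIs of datasets or papers

core_section = SampleRecord(
    pid="igsn:10.58052/EXAMPLE1234",                 # placeholder identifier
    label="Core section 12R-3, 45-47 cm",            # placeholder label
    material="marine sediment",
    collection_site="hypothetical drill site",
    related_samples=["igsn:10.58052/EXAMPLE1200"],   # placeholder parent core
    related_data=["doi:10.5281/zenodo.0000000"],     # placeholder dataset DOI
)
print(core_section)
```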
  2. Abstract Over the last couple of decades, there has been rapid growth in the number and scope of agricultural genetics, genomics and breeding (GGB) databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources (https://www.agbiodata.org/databases) covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as ‘databases’ throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets, which requires data sharing along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies, respectively, conducted a Consortium-wide survey to assess the current status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. Results suggest that data-sharing practices by AgBioData databases are in a fairly healthy state, although it is not clear whether this is true for all metadata and data types across all databases, and that ontology use has not substantially changed since a similar survey was conducted in 2017. Based on our evaluation of the survey results, we recommend (i) providing training for database personnel in specific data-sharing techniques, as well as in ontology use; (ii) further study on what metadata is shared, and how well it is shared among databases; (iii) promoting an understanding of data sharing and ontologies in the stakeholder community; (iv) improving data sharing and ontologies for specific phenotypic data types and formats; and (v) lowering specific barriers to data sharing and ontology use by identifying sustainability solutions and by identifying, promoting, or developing data standards. Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use and data sharing via programmatic means. Database URL: https://www.agbiodata.org/databases
  3. Geochemical data from ancient marine sediments are crucial for studying palaeo-environments, palaeo-climates, and elemental cycles. With increased accessibility to geochemical data, many databases have emerged. However, there remains a need for a more comprehensive database that focuses on deep-time marine sediment records. Here, we introduce the Deep-Time Marine Sedimentary Element Database (DM-SED). The DM-SED has been built upon the Sedimentary Geochemistry and Paleoenvironments Project (SGP) database with a new compilation of 34 874 data entries from 433 studies, totalling 63 627 entries. The DM-SED contains 2 522 255 discrete marine sedimentary data points, including major and trace elements and some stable isotopes. It includes 9207 entries from the Precambrian and 54 420 entries from the Phanerozoic, thus providing significant references for reconstructing deep-time Earth system evolution. The data files described in this paper are available at https://doi.org/10.5281/zenodo.14771859 (Lai et al., 2025). 
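
For the DM-SED entry above, a minimal sketch of loading the data files and splitting entries by era might look as follows; the file name and column names are assumptions for illustration, and the actual layout should be taken from the Zenodo archive:

```python
# Hedged sketch: filter a sedimentary geochemistry table by geologic era.
# "DM-SED.csv" and the column names are placeholders, not the archive's actual files.
import pandas as pd

df = pd.read_csv("DM-SED.csv")                 # placeholder file name

boundary_ma = 541                              # approximate Precambrian-Phanerozoic boundary (Ma)
precambrian = df[df["age_ma"] > boundary_ma]   # assumed numeric age column in Ma
phanerozoic = df[df["age_ma"] <= boundary_ma]

print(len(precambrian), "Precambrian entries")
print(len(phanerozoic), "Phanerozoic entries")
```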
  4. Accurately reconstructing original Total Organic Carbon (TOC) in thermally mature rocks is essential for the correct application of geochemical proxies and for understanding organic carbon burial through time. To reconstruct original TOC using empirical methods, it is vital to have an accurate estimate of the original Hydrogen Index (HI). The two most common methods estimate original HI either from kerogen type or from average HI values of immature rocks elsewhere in the basin. This study tests the ability to use inorganic geochemical data to reconstruct original HI, using the Upper Cretaceous-Paleogene Moreno Formation from the San Joaquin Basin, California, USA as a case study. The study utilized cores from the Moreno Formation that are thermally immature, thus preserving original HI values, and that span a range in initial HI. First, inorganic geochemical data (elemental abundances and iron speciation) were produced for samples previously analyzed for organic geochemistry. These data suggest that bottom-water conditions during deposition of the Moreno Formation were ferruginous (anoxic and non-sulfidic), without development of sustained euxinia (anoxic and sulfidic). Next, a random forest machine learning analysis was implemented to determine which inorganic geochemical variables best predict HI in the Moreno Formation. The most important proxies were those for detrital input (Ti, Th), marine export productivity (Cu, Ni), and redox proxies for suboxic conditions (Se, Cr, iron speciation). Finally, the random forest framework was used to predict HI values for three main study cores based on their inorganic geochemistry. These predictions were compared stratigraphically and statistically against the measured values and against the kerogen-type and average-HI methods for reconstructing HI, and they show that this new method has better predictive power than approaches based on single values. This indicates strong promise for using inorganic geochemistry, which is relatively immune to thermal maturation, to reconstruct organic geochemical parameters that are modified during burial and diagenetic processes.
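
The random-forest workflow in the entry above can be sketched as follows; the file name, proxy column names, and train/test split are illustrative assumptions rather than the study's exact procedure:

```python
# Hedged sketch: predict Hydrogen Index (HI) from inorganic proxies and rank
# variable importance with a random forest. All column names are placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

data = pd.read_csv("moreno_geochem.csv")                     # placeholder file name
proxies = ["Ti", "Th", "Cu", "Ni", "Se", "Cr", "FeHR_FeT"]   # assumed proxy columns
X, y = data[proxies], data["HI"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=500, random_state=0)
model.fit(X_train, y_train)

print("R^2 on held-out samples:", model.score(X_test, y_test))
# Importance ranking, analogous to identifying the most predictive proxies.
for name, importance in sorted(zip(proxies, model.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {importance:.3f}")
```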
  5. Abstract Increased use and improved methodology of carbonate clumped isotope thermometry have greatly enhanced our ability to interrogate a suite of Earth-system processes. However, interlaboratory discrepancies in quantifying carbonate clumped isotope (Δ47) measurements persist, and their specific sources remain unclear. To address interlaboratory differences, we first provide consensus values from the clumped isotope community for four carbonate standards relative to heated and equilibrated gases, based on 1,819 individual analyses from 10 laboratories. Then we analyzed the four carbonate standards along with three additional standards, spanning a broad range of δ47 and Δ47 values, for a total of 5,329 analyses on 25 individual mass spectrometers from 22 different laboratories. Treating three of the materials as known standards and the other four as unknowns, we find that the use of carbonate reference materials is a robust method for standardization that yields interlaboratory discrepancies entirely consistent with intralaboratory analytical uncertainties. Carbonate reference materials, along with the measurement and data processing practices described herein, provide the carbonate clumped isotope community with a robust approach to achieve interlaboratory agreement as we continue to use and improve this powerful geochemical tool. We propose that carbonate clumped isotope data normalized to the carbonate reference materials described in this publication should be reported as Δ47 (I-CDES) values, for the InterCarb-Carbon Dioxide Equilibrium Scale.
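
The standardization approach in the entry above can be reduced to a simplified sketch: fit a transfer function between measured and accepted Δ47 values of carbonate reference materials, then project unknowns onto that scale. The numbers below are placeholders, not the consensus values reported in the paper, and real workflows include additional corrections:

```python
# Simplified sketch of standardizing Δ47 measurements with carbonate reference
# materials; all values are placeholders, and production workflows add further
# corrections beyond this single linear transfer function.
import numpy as np

measured_standards = np.array([0.610, 0.325, 0.480])  # measured Δ47 of three standards (placeholder)
accepted_standards = np.array([0.620, 0.330, 0.490])  # accepted values on the reference scale (placeholder)

# Linear transfer function mapping measured values onto the standards-anchored scale.
slope, intercept = np.polyfit(measured_standards, accepted_standards, 1)

measured_unknowns = np.array([0.551, 0.402])
standardized = slope * measured_unknowns + intercept
print(standardized)  # unknowns expressed on the reference-material scale
```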