Geochemistry is a data-driven discipline. Modern laboratories produce highly diverse data, and the recent exponential increase in data volumes is challenging established practices and capabilities for organizing, analyzing, preserving, and accessing these data. At the same time, sophisticated computational techniques, including machine learning, are increasingly applied to geochemical research questions, and these techniques require easy access to large volumes of high-quality, well-organized, and standardized data. Data management has been important since the beginning of geochemistry but has recently become a necessity for the discipline to thrive in the age of digitalization and artificial intelligence. This paper summarizes the landscape of geochemical databases, distinguishing different types of data systems by purpose and tracing their evolution in historical context. We apply the life cycle model of geochemical data; explain the relevance of current standards, practices, and policies that determine the design of modern geochemical databases and data management; discuss the ethics of data reuse, including data ownership, data attribution, and data citation; and finally present a vision for the future of geochemical databases: data that are born digital, connected to agreed community standards, and contributing to the global democratization of geochemical data.
The future low-temperature geochemical data-scape as envisioned by the U.S. geochemical community
More Like this
Accurately reconstructing original Total Organic Carbon (TOC) in thermally mature rocks is essential for the correct application of geochemical proxies and understanding organic carbon burial through time. To reconstruct original TOC using empirical methods, it is vital to have an accurate estimate of the original Hydrogen Index (HI). The two most common methods are estimating original HI using kerogen type or using average HI values from immature rocks elsewhere in the basin. This study tests the ability to use inorganic geochemical data to reconstruct original HI using the Upper Cretaceous-Paleogene Moreno Formation from the San Joaquin Basin, California, USA as a case study. The study utilized cores from the Moreno Formation that are thermally immature, thus preserving original HI values, and that span a range in initial HI. First, inorganic geochemical data were produced (elemental abundances and iron speciation) for samples previously analyzed for organic geochemistry. These data suggest that bottom water conditions during deposition of the Moreno Formation were ferruginous (anoxic and non-sulfidic), without development of sustained euxinia (anoxic and sulfidic). Next, a random forest machine learning analysis was implemented to analyze which inorganic geochemical variables best predict HI in the Moreno Formation. The most important proxies were those for detrital input (Ti, Th), marine export productivity (Cu, Ni), and redox proxies for suboxic conditions (Se, Cr, iron speciation). Finally, the random forest framework was used to predict HI values for three main study cores based on their inorganic geochemistry. These predictions were compared stratigraphically and statistically against the measured values and the kerogen type and average HI methods for reconstructing HI and show this new method has better predictive power than approaches based on single values. 
This indicates strong promise for using inorganic geochemistry, which is relatively immune to thermal maturation, to reconstruct organic geochemical parameters that are modified during burial and diagenetic processes.
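
The random forest workflow described in the abstract above (rank inorganic variables by importance, predict HI, compare against a single-value estimate) can be illustrated with a minimal sketch. This is not the study's code or data: it assumes scikit-learn's `RandomForestRegressor`, borrows element names from the abstract only as column labels, and uses entirely synthetic values with an arbitrary made-up relationship between the elements and HI.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Column labels borrowed from the abstract; every value below is SYNTHETIC.
ELEMENTS = ["Ti", "Th", "Cu", "Ni", "Se", "Cr"]

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, len(ELEMENTS)))

# Arbitrary, made-up relationship used only to give the model something to learn.
# It is an assumption for illustration, not a result from the study.
hi = 400.0 - 150.0 * X[:, 0] + 20.0 * X[:, 2] + rng.normal(0.0, 5.0, size=200)

# Simple split: first 150 samples for training, last 50 held out.
X_train, X_test = X[:150], X[150:]
hi_train, hi_test = hi[:150], hi[150:]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, hi_train)

# Rank variables by impurity-based feature importance (importances sum to 1).
ranked = sorted(
    zip(ELEMENTS, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")

# Compare the forest against a single-value estimate (mean training HI),
# loosely analogous to the "average HI" approach mentioned in the abstract.
hi_pred = model.predict(X_test)
rmse_forest = float(np.sqrt(np.mean((hi_pred - hi_test) ** 2)))
rmse_single = float(np.sqrt(np.mean((hi_train.mean() - hi_test) ** 2)))
print(f"RMSE random forest: {rmse_forest:.1f}")
print(f"RMSE single value:  {rmse_single:.1f}")
```

On real data, the held-out comparison would more plausibly be done per core or with cross-validation, and impurity-based importances could be cross-checked with permutation importance; both choices are left out here to keep the sketch short.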

