Abstract. The open data movement has brought revolutionary changes to the field of mineralogy. With a growing number of datasets made available through community efforts, researchers are now able to explore new scientific topics such as mineral ecology, mineral evolution and new classification systems. Recent results have shown that the necessary open data, coupled with data science skills and expertise in mineralogy, will lead to impressive new scientific discoveries. Yet feedback from researchers also reflects the need for better FAIRness of open data, that is, data that are findable, accessible, interoperable and reusable for both humans and machines. In this paper, we present our recent work on building the open data service of Mindat, one of the largest mineral databases in the world. In past years, Mindat has supported numerous scientific studies, but a machine interface for data access has never been established. Through the OpenMindat project we have achieved solid progress on two activities: (1) cleansing data and improving data quality, and (2) building a data sharing platform and establishing a machine interface for data query and access. We hope OpenMindat will help address the increasing need among researchers in mineralogy for an internationally recognized authoritative database that is fully compliant with the FAIR guiding principles and helps accelerate scientific discoveries.
This content will become publicly available on January 1, 2026
The OpenMindat v1.0.0 R package: a machine interface to Mindat open data to facilitate data-intensive geoscience discoveries
Abstract. Technologies such as machine learning and deep learning are powering the discovery of meaningful patterns in Earth science big data. In the field of mineralogy, Mindat (“mindat.org”) is one of the largest databases. Although its front-end website is open and free, a machine interface for bulk data query and download had never been set up before 2022. Through a project called OpenMindat, an application programming interface (API) to enable open data query and access from Mindat was set up in 2023. To further lower the barrier between Mindat open data and geoscientists with limited coding skills, we developed an R package (OpenMindat v1.0.0) on top of the API. The Mindat API includes multiple data subjects such as geomaterials (e.g., rocks, minerals, synonyms, variety, mixture, and commodity), localities, and the IMA-approved (International Mineralogical Association) mineral list. The OpenMindat v1.0.0 package wraps the capabilities of the Mindat API and is designed to be user-friendly and extensible. In addition to providing functions for querying data subjects on the API, the package supports exporting data to various formats. In real-world applications, these functions only require minor coding for users to get desired datasets, and various other packages in the R environment can be used to analyze and visualize the data. The OpenMindat v1.0.0 package, which includes detailed tutorials and examples, is available on GitHub under the MIT license. The field of mineralogy and many other geoscience disciplines are facing opportunities enabled by open data. Various research topics such as mineral network analysis, mineral association rule mining, mineral ecology, mineral evolution, and critical minerals have already benefited from Mindat's open data efforts in recent years. We hope this R package can help accelerate those data-intensive studies and lead to more scientific discoveries.
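The query-then-export workflow described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than R and does not reproduce the OpenMindat package's functions. The base URL, the `build_query_url` and `records_to_csv` helpers, and the filter and field names are all assumptions made for illustration; the sketch makes no network request and should be checked against the Mindat API documentation before real use.

```python
import csv
import io
from urllib.parse import urlencode

# Assumption: illustrative base URL, not confirmed by the abstract.
API_BASE = "https://api.mindat.org"


def build_query_url(subject, **filters):
    """Build a query URL for one data subject, e.g. 'geomaterials'.

    Hypothetical helper: the filter names passed in are placeholders.
    """
    url = f"{API_BASE}/{subject}/"
    # Sort the filters so the same query always yields the same URL.
    query = urlencode(sorted(filters.items()))
    return f"{url}?{query}" if query else url


def records_to_csv(records, fields):
    """Serialize a list of record dicts to CSV text with the given columns.

    Keys not listed in `fields` are dropped; missing values become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,
        extrasaction="ignore",
        restval="",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

In this sketch, `build_query_url("geomaterials", name="quartz")` would produce the request URL, and the records returned by such a request (a list of dictionaries) could be passed to `records_to_csv` together with the columns to keep.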
- Award ID(s): 2126315
- PAR ID: 10618801
- Publisher / Repository: Copernicus
- Date Published:
- Journal Name: Geoscientific Model Development
- Volume: 18
- Issue: 14
- ISSN: 1991-9603
- Page Range / eLocation ID: 4455 to 4467
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- The rare earth elements (REE) are essential for the high-tech and green technology industries and are used, for example, in computers, smartphones, and wind turbines. The REE are considered critical minerals and can be highly enriched in certain magmatic-hydrothermal systems, including alkaline complexes and carbonatites. Almost all of the critical mineral deposits show a complex overprint by hydrothermal processes during their genesis. However, our understanding of REE mobility in these ore-forming systems and our knowledge of the stability of REE minerals are still very limited. The MINES thermodynamic database is an open-access database that is continuously updated with the most up-to-date thermodynamic data for REE aqueous species and minerals. This database also includes rock-forming minerals and permits simulating the mineralogy and alteration geochemistry that relate to the formation of these critical mineral deposits. This study gives a short overview of the MINES thermodynamic database and the GEMS code package for simulating the formation of hydrothermal calcite, fluorite and bastnäsite-(Ce) veins relevant to interpreting critical mineral deposits.
- The Mindat open data service, encompassing data from over 6,000 mineral species and 400,000 localities, has great potential to support mineral exploration by providing insights into mineral associations, paragenetic modes, and visual network analyses through labeled photographs. These tools enable geologists to identify indicator minerals, understand mineral formation sequences, and visually assess mineral assemblages. Mineral association analysis highlights minerals commonly found together, while paragenetic studies offer clues to formation environments. Visual networks of mineral relationships provide rapid identification references. Together, these resources create new opportunities for data-driven strategies that can enhance the efficiency and accuracy of mineral exploration.
- Understanding the mineralogy and geochemistry of the subsurface is key when assessing and exploring for mineral deposits. To achieve this goal, rapid acquisition and accurate interpretation of drill core data are essential. Hyperspectral shortwave infrared imaging is a rapid and non-destructive analytical method widely used in the minerals industry to map minerals with diagnostic features in core samples. In this paper, we present an automated method to interpret hyperspectral shortwave infrared data on drill core to decipher major felsic rock-forming minerals using supervised machine learning techniques for processing, masking, and extracting mineralogical and textural information. This study utilizes a co-registered training dataset that integrates hyperspectral data with quantitative scanning electron microscopy data instead of spectrum matching using a spectral library. Our methodology overcomes previous limitations in hyperspectral data interpretation for the full mineralogy (i.e., quartz and feldspar) caused by the need to identify spectral features of minerals; in particular, it detects the presence of minerals that are considered invisible in traditional shortwave infrared hyperspectral analysis.
- Abstract. Premise: Digitized biodiversity data offer extensive information; however, obtaining and processing biodiversity data can be daunting. Complexities arise during data cleaning, such as identifying and removing problematic records. To address these issues, we created the R package Geographic And Taxonomic Occurrence R‐based Scrubbing (gatoRs). Methods and Results: The gatoRs workflow includes functions that streamline downloading records from the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio). We also created functions to clean downloaded specimen records. Unlike previous R packages, gatoRs accounts for differences in download structure between GBIF and iDigBio and allows for user control via interactive cleaning steps. Conclusions: Our pipeline enables the scientific community to process biodiversity data efficiently and is accessible to the R coding novice. We anticipate that gatoRs will be useful for both established and beginning users. Furthermore, we expect our package will facilitate the introduction of biodiversity‐related concepts into the classroom via the use of herbarium specimens.
