Abstract. Technologies such as machine learning and deep learning are powering the discovery of meaningful patterns in Earth science big data. In the field of mineralogy, Mindat (“mindat.org”) is one of the largest databases. Although its front-end website is open and free, a machine interface for bulk data query and download had never been set up before 2022. Through a project called OpenMindat, an application programming interface (API) to enable open data query and access from Mindat was set up in 2023. To further lower the barrier between Mindat open data and geoscientists with limited coding skills, we developed an R package (OpenMindat v1.0.0) on top of the API. The Mindat API includes multiple data subjects such as geomaterials (e.g., rocks, minerals, synonyms, variety, mixture, and commodity), localities, and the IMA-approved (International Mineralogical Association) mineral list. The OpenMindat v1.0.0 package wraps the capabilities of the Mindat API and is designed to be user-friendly and extensible. In addition to providing functions for querying data subjects on the API, the package supports exporting data to various formats. In real-world applications, these functions only require minor coding for users to get desired datasets, and various other packages in the R environment can be used to analyze and visualize the data. The OpenMindat v1.0.0 package, which includes detailed tutorials and examples, is available on GitHub under the MIT license. The field of mineralogy and many other geoscience disciplines are facing opportunities enabled by open data. Various research topics such as mineral network analysis, mineral association rule mining, mineral ecology, mineral evolution, and critical minerals have already benefited from Mindat's open data efforts in recent years. We hope this R package can help accelerate those data-intensive studies and lead to more scientific discoveries. 
                    
                            
OpenMindat: Open and FAIR mineralogy data from the Mindat database
                        
                    
    
Abstract. The open data movement has brought revolutionary changes to the field of mineralogy. With a growing number of datasets made available through community efforts, researchers are now able to explore new scientific topics such as mineral ecology, mineral evolution, and new classification systems. Recent results have shown that open data, coupled with data science skills and expertise in mineralogy, can lead to impressive new scientific discoveries. Yet feedback from researchers also reflects the need for better FAIRness of open data, that is, data that are findable, accessible, interoperable, and reusable for both humans and machines. In this paper, we present our recent work on building the open data service of Mindat, one of the largest mineral databases in the world. Over the years, Mindat has supported numerous scientific studies, but a machine interface for data access had never been established. Through the OpenMindat project we have made solid progress on two activities: (1) cleansing data and improving data quality, and (2) building a data-sharing platform and establishing a machine interface for data query and access. We hope OpenMindat will help address the increasing need among researchers in mineralogy for an internationally recognized, authoritative database that is fully compliant with the FAIR guiding principles and that helps accelerate scientific discoveries.
        
    
- Award ID(s): 2126315
- PAR ID: 10416648
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Geoscience Data Journal
- Volume: 11
- Issue: 1
- ISSN: 2049-6060
- Format(s): Medium: X Size: p. 94-104
- Size(s): p. 94-104
- Sponsoring Org: National Science Foundation
More Like this
- 
            Abstract. Minerals are information-rich materials that offer researchers a glimpse into the evolution of planetary bodies. Thus, it is important to extract, analyze, and interpret this abundance of information to improve our understanding of the planetary bodies in our solar system and the role our planet’s geosphere played in the origin and evolution of life. Over the past several decades, data-driven efforts in mineralogy have gradually increased. The development and application of data science and analytics methods in mineralogy, while extremely promising, have also been somewhat ad hoc in nature. To systematize and synthesize the direction of these efforts, we introduce the concept of “Mineral Informatics,” which is the next frontier for researchers working with mineral data. In this paper, we present our vision for Mineral Informatics and the X-Informatics underpinnings that led to its conception, as well as the needs, challenges, opportunities, and future directions of the field. The intention of this paper is not to create a new specific field or sub-field as a separate silo, but to document the needs of researchers studying minerals in various contexts and fields of study, and to demonstrate how the systemization of and enhanced access to mineralogical data will increase cross- and interdisciplinary studies, and how data science and informatics methods are a key next step in integrative mineralogical studies.
- 
            Hummer, Daniel (Ed.) Abstract. The mindat.org website (Mindat) has been operating since October 2000 as a free, crowd-sourced, and expert-curated database particularly focused on mineral species and their occurrences worldwide. The project has transformed from a hobbyist site into a resource that is used in various scientific research projects and educational programs. Together with other open data resources, Mindat has helped accelerate scientific discoveries in many fields, such as mineral evolution, mineral ecology, and the co-evolution of the geosphere and biosphere. Recently, through open data efforts, machine interfaces and software packages have been established to enable flexible data discovery and download from Mindat. We expect data access and usage to scale up further in the coming years. Although Mindat is curated by a team of geoscience and database experts across the world, the crowd-sourced records in Mindat possess some bias. In this paper, we first present an overview of the primary data subjects in Mindat and then give extensive details about the characteristics and partiality of three of the most popular data subjects: locality, mineral species, and mineral occurrence. In the discussion, we also give an outlook on appropriate data usage and future extension of data records. We hope users can obtain a more comprehensive view of the Mindat database through this paper and thus better plan their data use. We also hope more people will be inspired to contribute to the data curation work to make Mindat a sustained data ecosystem for geoscience research.
- 
            Abstract. During the past few years, distributed acoustic sensing (DAS) has become an invaluable tool for recording high-fidelity seismic wavefields with high spatiotemporal resolution. However, the considerable amount of data generated during DAS experiments limits their sharing with the broader scientific community. Such a bottleneck inherently slows down the pursuit of new scientific discoveries in geosciences. Here, we introduce PubDAS, the first large-scale open-source repository where several DAS datasets from multiple experiments are publicly shared. PubDAS currently hosts eight datasets covering a variety of geological settings (e.g., urban centers, underground mines, and seafloor), spanning from several days to several years, offering both continuous and triggered active source recordings, and totaling up to ∼90 TB of data. This article describes these datasets, their metadata, and how to access and download them. Some of these datasets have only been shallowly explored, leaving the door open for new discoveries in Earth sciences and beyond.
- 
            New technologies have led to vast troves of large and complex data sets across many scientific domains and industries. People routinely use machine learning techniques not only to process, visualize, and make predictions from these big data, but also to make data-driven discoveries. These discoveries are often made using interpretable machine learning, or machine learning models and techniques that yield human-understandable insights. In this article, we discuss and review the field of interpretable machine learning, focusing especially on the techniques, as they are often employed to generate new knowledge or make discoveries from large data sets. We outline the types of discoveries that can be made using interpretable machine learning in both supervised and unsupervised settings. Additionally, we focus on the grand challenge of how to validate these discoveries in a data-driven manner, which promotes trust in machine learning systems and reproducibility in science. We discuss validation both from a practical perspective, reviewing approaches based on data-splitting and stability, and from a theoretical perspective, reviewing statistical results on model selection consistency and uncertainty quantification via statistical inference. Finally, we conclude by highlighting open challenges in using interpretable machine learning techniques to make discoveries, including gaps between theory and practice for validating data-driven discoveries.
 An official website of the United States government
