Abstract The open data movement has brought revolutionary changes to the field of mineralogy. With a growing number of datasets made available through community efforts, researchers are now able to explore new scientific topics such as mineral ecology, mineral evolution and new classification systems. Recent results have shown that open data, coupled with data science skills and expertise in mineralogy, can lead to impressive new scientific discoveries. Yet feedback from researchers also reflects the need for better FAIRness of open data, that is, data that are findable, accessible, interoperable and reusable for both humans and machines. In this paper, we present our recent work on building the open data service of Mindat, one of the largest mineral databases in the world. In past years, Mindat has supported numerous scientific studies, but a machine interface for data access has never been established. Through the OpenMindat project we have achieved solid progress on two activities: (1) cleansing data and improving data quality, and (2) building a data sharing platform and establishing a machine interface for data query and access. We hope OpenMindat will help address researchers' increasing need in mineralogy for an internationally recognized authoritative database that is fully compliant with the FAIR guiding principles and helps accelerate scientific discoveries.
PubDAS: A PUBlic Distributed Acoustic Sensing Datasets Repository for Geosciences
Abstract During the past few years, distributed acoustic sensing (DAS) has become an invaluable tool for recording high-fidelity seismic wavefields at high spatiotemporal resolution. However, the considerable amount of data generated during DAS experiments limits their distribution to the broader scientific community. Such a bottleneck inherently slows down the pursuit of new scientific discoveries in geosciences. Here, we introduce PubDAS, the first large-scale open-source repository where several DAS datasets from multiple experiments are publicly shared. PubDAS currently hosts eight datasets covering a variety of geological settings (e.g., urban centers, underground mines, and seafloor), spanning from several days to several years, offering both continuous and triggered active source recordings, and totaling up to ∼90 TB of data. This article describes these datasets, their metadata, and how to access and download them. Some of these datasets have only been shallowly explored, leaving the door open for new discoveries in Earth sciences and beyond.
- Award ID(s): 2022716
- PAR ID: 10437086
- Date Published:
- Journal Name: Seismological Research Letters
- Volume: 94
- Issue: 2A
- ISSN: 0895-0695
- Page Range / eLocation ID: 983 to 998
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
In the past decade, distributed acoustic sensing (DAS) has enabled many new monitoring applications in diverse fields including hydrocarbon exploration and extraction; induced, local, regional, and global seismology; infrastructure and urban monitoring; and several others. However, to date, the open-source software ecosystem for handling DAS data is relatively immature. Here we introduce DASCore, a Python library for analyzing, visualizing, and managing DAS data. DASCore implements an object-oriented interface for performing common data processing and transformations, reading and writing various DAS file types, creating simple visualizations, and managing file system-based DAS archives. DASCore also integrates with other Python-based tools which enable the processing of massive data sets in cloud environments. DASCore is the foundational package for the broader DAS data analysis ecosystem (DASDAE), and as such its main goal is to facilitate the development of other DAS libraries and applications.
-
Material characterization techniques are widely used to characterize the physical and chemical properties of materials at the nanoscale and thus play central roles in discoveries in materials science. However, the large and complex datasets generated by these techniques often require significant human effort to interpret and extract meaningful physicochemical insights. Artificial intelligence (AI) techniques such as machine learning (ML) have the potential to improve the efficiency and accuracy of surface analysis by automating data analysis and interpretation. In this perspective paper, we review the current role of AI in surface analysis and discuss its future potential to accelerate discoveries in surface science, materials science, and interface science. We highlight several applications where AI has already been used to interpret surface analysis data, including the identification of crystal structures from XRD data, analysis of XPS spectra for surface composition, and the interpretation of TEM and SEM images for particle morphology and size. We also discuss the challenges and opportunities associated with the integration of AI into surface analysis workflows. These include the need for large and diverse datasets for training ML models, the importance of feature selection and representation, and the potential for ML to enable new insights and discoveries by identifying patterns and relationships in complex datasets. Most importantly, AI-based analysis must not only find the best mathematical description of the data but also yield the most physically and chemically meaningful results. In addition, the need for reproducibility in scientific research has become increasingly important in recent years. Advances in AI, including both conventional machine learning and increasingly popular deep learning, show promise in addressing these challenges by enabling the execution and verification of scientific progress.
By training models on large experimental datasets and providing automated analysis and data interpretation, AI can help to ensure that scientific results are reproducible and reliable. Although the integration of domain knowledge with AI models must be considered to ensure model transparency and interpretability, incorporating AI into the data collection and processing workflow will significantly enhance the efficiency and accuracy of various surface analysis techniques and deepen our understanding at an accelerated pace.
-
The study of seabirds can provide a fascinating subject for the integration of datasets and data practices with scientific phenomena. Workshop participants will examine trends and correlations in several decades of National Audubon Society data about puffins, using an accessible open-source education data tool (CODAP). They will examine relationships among variables including sea surface temperature, fish in the puffin diet, fledgling weight, and survival to breeding age. They will use present-day data from puffin webcams and sound recordings to supplement their work with historical datasets. They will train an artificial intelligence (AI) system to differentiate puffin vocalizations from those of other birds and puffin images from other bird images.
-
Abstract Geolocalization of distributed acoustic sensing (DAS) array channels represents a crucial step whenever the technology is deployed in the field. Commonly, the geolocalization is performed using point-wise active-source experiments, known as tap tests, conducted in the vicinity of the recording fiber. However, these controlled-source experiments are time-consuming and greatly diminish the ability to promptly deploy such systems, especially for large-scale DAS experiments. We present a geolocalization methodology for DAS instrumentation that relies on seismic signals generated by a geotracked vehicle. We demonstrate the efficacy of our workflow by geolocating the channels of two DAS systems recording data on dark fibers stretching approximately 100 km within the Long Valley caldera area in eastern California. Our procedure permits the prompt calibration of DAS channel locations for seismic-related applications such as seismic hazard assessment, urban-noise monitoring, wavespeed inversion, and earthquake engineering. We share the developed set of codes along with a tutorial guiding users through the entire mapping process.

