Abstract Data‐driven discovery in geoscience requires an enormous amount of FAIR (findable, accessible, interoperable and reusable) data derived from a multitude of sources. Many geology resources include data based on the geologic time scale, a system of dating that relates layers of rock (strata) to times in Earth history. The terminology of this geologic time scale, including the names of the strata and time intervals, is heterogeneous across data resources, hindering effective and efficient data integration. To address that issue, we created a deep‐time knowledge base that consists of knowledge graphs correlating international and regional geologic time scales, an online service of the knowledge graphs, and an R package to access the service. The knowledge base uses temporal topology to enable comparison and reasoning between various intervals and points in the geologic time scale. This work unifies and allows the querying of age‐related geologic information across the entirety of Earth history, resulting in a platform from which researchers can address complex deep‐time questions spanning numerous types of data and fields of study. 
                            The Deep-Time Digital Earth program: data-driven discovery in geosciences
                        
                    
    
Abstract Current barriers hindering data-driven discoveries in deep-time Earth (DE) include the following: substantial volumes of DE data are not digitized; many DE databases do not adhere to FAIR (findable, accessible, interoperable and reusable) principles; a systematic knowledge graph for DE is lacking; existing DE databases are geographically heterogeneous; a significant fraction of DE data is not in open-access formats; and tailored tools are needed. These challenges motivate the Deep-Time Digital Earth (DDE) program, initiated by the International Union of Geological Sciences and developed in cooperation with national geological surveys, professional associations, academic institutions and scientists around the world. DDE’s mission is to build on previous research to develop a systematic DE knowledge graph, a FAIR data infrastructure that links existing databases and makes dark data visible, and tailored tools for DE data, which are universally accessible. DDE aims to harmonize DE data, share global geoscience knowledge and facilitate data-driven discovery in the understanding of Earth's evolution.
- Award ID(s): 1835717
- PAR ID: 10299877
- Date Published:
- Journal Name: National Science Review
- Volume: 8
- Issue: 9
- ISSN: 2095-5138
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Biology today is heavily data-driven and knowledge-centric, and its data and knowledge are stored across the linked open web in numerous heterogeneous deep web databases. The FAIR data principles have been proposed to improve searching, finding, accessing, and interoperating among these diverse information sources and so increase their usability. Unfortunately, FAIR compliance is extremely low and linked open data does not guarantee FAIRness, leaving biologists on a solo hunt for information on the open network. In this paper, we propose SoDa for intelligent data foraging on the internet. SoDa helps biologists discover resources based on analysis requirements, generate resource access plans, and store cleaned data and knowledge for community use. A secondary search index is also supported for community members to find archived information conveniently.
- 
            Abstract Graph databases capture richly linked domain knowledge by integrating heterogeneous data and metadata into a unified representation. Here, we present the use of bespoke, interactive data graphics (bar charts, scatter plots, etc.) for visual exploration of a knowledge graph. By modeling a chart as a set of metadata that describes semantic context (SPARQL query) separately from visual context (Vega-Lite specification), we leverage the high-level, declarative nature of the SPARQL and Vega-Lite grammars to concisely specify web-based, interactive data graphics synchronized to a knowledge graph. Resources with dereferenceable URIs (uniform resource identifiers) can employ the hyperlink encoding channel or image marks in Vega-Lite to amplify the information content of a given data graphic, and published charts populate a browsable gallery of the database. We discuss design considerations that arise in relation to portability, persistence, and performance. Altogether, this pairing of SPARQL and Vega-Lite (demonstrated here in the domain of polymer nanocomposite materials science) offers an extensible approach to FAIR (findable, accessible, interoperable, reusable) scientific data visualization within a knowledge graph framework.
- 
            The FAIR Hackathon Workshop for Mathematics and the Physical Sciences (MPS), held February 27-28, 2019, in Alexandria, Virginia, brought together forty-four stakeholders in the physical sciences community to share skills, tools and techniques to FAIRify research data. As one of the first efforts of its kind in the US, the workshop offered participants a way to engage with the FAIR (Findable, Accessible, Interoperable and Reusable) data principles and metrics in the context of a hackathon. The workshop was designed to address issues of public access to data and to give researchers hands-on experience with FAIR tools. Existing FAIR tools and infrastructure were introduced. Hands-on hackathon breakout time was devoted to testing FAIR metrics and tools against physical sciences data. The hackathon invited MPS research data management stakeholders to react to the FAIR principles and to jointly consider gaps in the MPS data sharing ecosystem in the context of researchers' actual projects. FAIR Gap analysis was introduced as a way to identify community-specific tools or infrastructure that could dramatically enhance the ability of domain scientists to make their data more FAIR.
- 
            A vast proportion of scientific data remains locked behind dynamic web interfaces, often called the deep web, where it is inaccessible to conventional search engines and standard crawlers. This gap between data availability and machine usability hampers the goals of open science and automation. While registries like FAIRsharing offer structured metadata describing data standards, repositories, and policies aligned with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, they do not enable seamless, programmatic access to the underlying datasets. We present FAIRFind, a system designed to bridge this accessibility gap. FAIRFind autonomously discovers, interprets, and operationalizes access paths to biological databases on the deep web, regardless of their FAIR compliance. Central to our approach is the Deep Web Communication Protocol (DWCP), a resource description language that represents web forms, HyperText Markup Language (HTML) tables, and file-based data interfaces in a machine-actionable format. Leveraging large language models (LLMs), FAIRFind combines a specialized deep web crawler and web-form comprehension engine to transform passive web metadata into executable workflows. By indexing and embedding these workflows, FAIRFind enables natural language querying over diverse biological data sources and returns structured, source-resolved results. Evaluation across multiple open-source LLMs and database types demonstrates over 90% success in structured data extraction and high semantic retrieval accuracy. FAIRFind advances existing registries by turning linked resources from static references into actionable endpoints, laying a foundation for intelligent, autonomous data discovery across scientific domains.
 An official website of the United States government