Paleoscience data are extremely heterogeneous; hundreds of different types of measurements and reconstructions are routinely made by scientists on a variety of types of physical samples. This heterogeneity is one of the biggest barriers to finding paleoclimatic records, to building large‐scale data products, and to the use of paleoscience data beyond the community of specialists. Here, we document the Paleoenvironmental Standard Terms (PaST) thesaurus, the first authoritative vocabulary of standardized variable names for paleoclimatic and paleoenvironmental data developed in a formal knowledge organization structure. This structure is designed to improve data set discovery, support automated processing of data, and provide connectivity to other vocabularies. PaST is now used operationally at the World Data Service for Paleoclimatology (WDS‐Paleo), one of the largest repositories of paleoscience information. Terms from the PaST thesaurus standardize a broad array of paleoenvironmental and paleoclimatic measured and inferred variables, providing enough detail for accurate and precise data discovery and thereby promoting data reuse. We describe the main design decisions and features of the thesaurus, the governance structure for ongoing maintenance, and WDS‐Paleo services that now employ PaST. These services include an advanced search by variable name, an interface for thesaurus navigation, and a machine‐readable representation in the Simple Knowledge Organization System (SKOS) standard. This overview is designed for developers of thesauri, data contributors, and users of the WDS‐Paleo, and serves as a building block for future efforts within the broader paleoscience community to improve how data are described for long‐term findability, accessibility, interoperability, and reusability.
- Award ID(s): 1832184
- NSF-PAR ID: 10312715
- Date Published:
- Journal Name: Glycobiology
- Volume: 31
- Issue: 9
- ISSN: 1460-2423
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Background: The proliferation of metagenomic sequencing technologies has enabled novel insights into the functional genomic potentials and taxonomic structure of microbial communities. However, cyberinfrastructure efforts to manage and enable the reproducible analysis of sequence data have not kept pace. Thus, there is increasing recognition of the need to make metagenomic data discoverable within machine-searchable frameworks compliant with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for data stewardship. Although a variety of metagenomic web services exist, none currently leverage the hierarchically structured terminology encoded within common life science ontologies to programmatically discover data.
Results: Here, we integrate large-scale marine metagenomic datasets with community-driven life science ontologies into a novel FAIR web service. This approach enables the retrieval of data discovered by intersecting the knowledge represented within ontologies against the functional genomic potential and taxonomic structure computed from marine sequencing data. Our findings highlight various microbial functional and taxonomic patterns relevant to the ecology of prokaryotes in various aquatic environments.
Conclusions: In this work, we present and evaluate a novel Semantic Web architecture that can be used to ask new biological questions of existing marine metagenomic datasets. Finally, the FAIR, ontology-searchable data products provided by our API can be leveraged by future research efforts.
-
The FaceBase Consortium was established by the National Institute of Dental and Craniofacial Research in 2009 as a ‘big data’ resource for the craniofacial research community. Over the past decade, researchers have deposited hundreds of annotated and curated datasets on both normal and disordered craniofacial development in FaceBase, all freely available to the research community on the FaceBase Hub website. The Hub has developed numerous visualization and analysis tools designed to promote integration of multidisciplinary data while remaining dedicated to the FAIR principles of data management (findability, accessibility, interoperability and reusability) and providing a faceted search infrastructure for locating desired data efficiently. Summaries of the datasets generated by the FaceBase projects from 2014 to 2019 are provided here. FaceBase 3 now welcomes contributions of data on craniofacial and dental development in humans, model organisms and cell lines. Collectively, the FaceBase Consortium, along with other NIH-supported data resources, provides a continuously growing, dynamic and current resource for the scientific community while improving data reproducibility and fulfilling data sharing requirements.
-
Summary: High‐quality microbiome research relies on the integrity, management and quality of supporting data. Currently, biobanks and culture collections have different formats and approaches to data management. This necessitates a standard data format to underpin research, particularly in line with the FAIR data standards of findability, accessibility, interoperability and reusability. We address the importance of a unified, coordinated approach that ensures compatibility between the data needed by biobanks and by culture collections, and that also ensures linkage between bioinformatic databases and the wider research community.
-
Digital publishing platforms and internet resources enable openness of access to scientific findings and data at scales never before realized. Unfortunately, researchers sometimes embrace lock-in systems for data generation and analysis out of necessity because meaningful alternatives do not exist. Scientific advances still take place when this occurs, but they become fragmented with discordant quality control, interoperability, reproducibility, and democratization of access. To maximize the value of these often publicly funded resources, disciplines are turning to the FAIR Guiding Principles for data stewardship. FAIR (Findability, Accessibility, Interoperability, and Reuse) promotes the added value of widespread data sharing that is transparent, equitable, and inclusive. Here we present NoCTURN, an NSF-funded FAIR Open Science Research Coordination Network for computed tomography users. NoCTURN (the Non-clinical Computed Tomography Users Research Network) aims to address the fragmentation of tomography toolkits stemming from proprietary software, non-uniform metadata formats, and repeatability limits. In this presentation, we outline how we will achieve this aim together by 1) developing a community committed to information sharing; 2) coordinating data analysis, storage, and reporting requirements; 3) highlighting underrepresented voices in the field; 4) developing community standards inclusive of industry, research, education, and outreach stakeholders; and 5) modeling FAIR open science strategies for our colleagues and students. NoCTURN is recruiting undergraduates through established investigators from X-ray-, neutron-, and synchrotron-beam computed tomography communities, and we want to hear from you.