This content will become publicly available on January 17, 2025

Title: Enhancing the FAIRness of Arctic Research Data Through Semantic Annotation
The National Science Foundation’s Arctic Data Center is the primary data repository for NSF-funded research conducted in the Arctic. There are major challenges in discovering and interpreting resources in a repository containing data as heterogeneous and interdisciplinary as those in the Arctic Data Center. This paper reports on advances in cyberinfrastructure at the Arctic Data Center that help address these issues by leveraging semantic technologies that enhance the repository’s adherence to the FAIR data principles and improve the Findability, Accessibility, Interoperability, and Reusability of digital resources in the repository. We describe the Arctic Data Center’s improvements, which use semantic annotation to bind metadata about Arctic data sets to concepts in web-accessible ontologies. The Arctic Data Center’s implementation of a semantic annotation mechanism is accompanied by the development of an extended search interface that increases the findability of data by allowing users to search for specific, broader, and narrower meanings of measurement descriptions, as well as for their potential synonyms. Based on research carried out by the DataONE project, we evaluated the potential impact of this approach on the accessibility, interoperability, and reusability of measurement data. Arctic research often benefits from having additional data, typically from multiple, heterogeneous sources, that complement and extend the bases – spatially, temporally, or thematically – for understanding Arctic phenomena. These relevant data resources must be 'found' and 'harmonized' prior to integration and analysis. The findings of a case study indicated that the semantic annotation of measurement data enhances the capabilities of researchers to accomplish these tasks.
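The search behavior summarized in the abstract (matching a measurement term by its synonyms and by its narrower meanings) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny ontology, the concept identifiers, the dataset names, and the functions `resolve`, `descendants`, and `search` are hypothetical and do not reflect the Arctic Data Center's actual ontologies, annotations, or search implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Concept:
    label: str
    synonyms: Set[str] = field(default_factory=set)
    broader: Optional[str] = None  # id of the parent (broader) concept, if any


# Hypothetical mini-ontology of measurement concepts (illustration only).
ONTOLOGY: Dict[str, Concept] = {
    "temperature": Concept("temperature"),
    "water_temperature": Concept(
        "water temperature", {"water temp"}, broader="temperature"
    ),
    "sea_surface_temperature": Concept(
        "sea surface temperature", {"SST"}, broader="water_temperature"
    ),
    "air_temperature": Concept("air temperature", broader="temperature"),
}

# Hypothetical semantic annotations: dataset id -> annotated concept ids.
ANNOTATIONS: Dict[str, Set[str]] = {
    "dataset-A": {"sea_surface_temperature"},
    "dataset-B": {"air_temperature"},
    "dataset-C": {"water_temperature"},
}


def resolve(term: str) -> Optional[str]:
    """Map a search term to a concept id via its label or a synonym."""
    wanted = term.strip().lower()
    for concept_id, concept in ONTOLOGY.items():
        names = {concept.label.lower()} | {s.lower() for s in concept.synonyms}
        if wanted in names:
            return concept_id
    return None


def descendants(concept_id: str) -> Set[str]:
    """Return the concept plus all transitively narrower concepts."""
    found = {concept_id}
    frontier = [concept_id]
    while frontier:
        current = frontier.pop()
        for cid, concept in ONTOLOGY.items():
            if concept.broader == current and cid not in found:
                found.add(cid)
                frontier.append(cid)
    return found


def search(term: str) -> List[str]:
    """Find datasets annotated with the term's concept or a narrower one."""
    concept_id = resolve(term)
    if concept_id is None:
        return []
    wanted = descendants(concept_id)
    return sorted(ds for ds, ids in ANNOTATIONS.items() if ids & wanted)
```

In this sketch, a query for a broad term such as "temperature" also retrieves datasets annotated only with narrower concepts, and a query for a synonym such as "SST" resolves to the same concept as its preferred label; a plain keyword match on the dataset metadata would miss both cases.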
Award ID(s):
2042102 1831937
PAR ID:
10523942
Publisher / Repository:
Ubiquity Press
Journal Name:
Data Science Journal
Volume:
23
ISSN:
1683-1470
Subject(s) / Keyword(s):
Arctic research data data discovery FAIR knowledge modeling semantic annotation data repository
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Since 1971, the Protein Data Bank (PDB) has served as the single global archive for experimentally determined 3D structures of biological macromolecules made freely available to the global community according to the FAIR principles of Findability–Accessibility–Interoperability–Reusability. During the first 50 years of continuous PDB operations, standards for data representation have evolved to better represent rich and complex biological phenomena. Carbohydrate molecules present in more than 14,000 PDB structures have recently been reviewed and remediated to conform to a new standardized format. This machine-readable data representation for carbohydrates occurring in the PDB structures and the corresponding reference data improves the findability, accessibility, interoperability and reusability of structural information pertaining to these molecules. The PDB Exchange MacroMolecular Crystallographic Information File data dictionary now supports (i) standardized atom nomenclature that conforms to International Union of Pure and Applied Chemistry-International Union of Biochemistry and Molecular Biology (IUPAC-IUBMB) recommendations for carbohydrates, (ii) uniform representation of branched entities for oligosaccharides, (iii) commonly used linear descriptors of carbohydrates developed by the glycoscience community and (iv) annotation of glycosylation sites in proteins. For the first time, carbohydrates in PDB structures are consistently represented as collections of standardized monosaccharides, which precisely describe oligosaccharide structures and enable improved carbohydrate visualization, structure validation, robust quantitative and qualitative analyses, search for dendritic structures and classification. The uniform representation of carbohydrate molecules in the PDB described herein will facilitate broader usage of the resource by the glycoscience community and researchers studying glycoproteins. 
  2. Summary

    High‐quality microbiome research relies on the integrity, management and quality of supporting data. Currently, biobanks and culture collections have different formats and approaches to data management. This necessitates a standard data format to underpin research, particularly in line with the FAIR data standards of findability, accessibility, interoperability and reusability. We address the importance of a unified, coordinated approach that ensures compatibility of data between biobanks and culture collections and also ensures linkage between bioinformatic databases and the wider research community.

     
  3.
    ABSTRACT The FaceBase Consortium was established by the National Institute of Dental and Craniofacial Research in 2009 as a ‘big data’ resource for the craniofacial research community. Over the past decade, researchers have deposited hundreds of annotated and curated datasets on both normal and disordered craniofacial development in FaceBase, all freely available to the research community on the FaceBase Hub website. The Hub has developed numerous visualization and analysis tools designed to promote integration of multidisciplinary data while remaining dedicated to the FAIR principles of data management (findability, accessibility, interoperability and reusability) and providing a faceted search infrastructure for locating desired data efficiently. Summaries of the datasets generated by the FaceBase projects from 2014 to 2019 are provided here. FaceBase 3 now welcomes contributions of data on craniofacial and dental development in humans, model organisms and cell lines. Collectively, the FaceBase Consortium, along with other NIH-supported data resources, provide a continuously growing, dynamic and current resource for the scientific community while improving data reproducibility and fulfilling data sharing requirements. 
  4. Abstract Background

    The proliferation of metagenomic sequencing technologies has enabled novel insights into the functional genomic potentials and taxonomic structure of microbial communities. However, cyberinfrastructure efforts to manage and enable the reproducible analysis of sequence data have not kept pace. Thus, there is increasing recognition of the need to make metagenomic data discoverable within machine-searchable frameworks compliant with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for data stewardship. Although a variety of metagenomic web services exist, none currently leverage the hierarchically structured terminology encoded within common life science ontologies to programmatically discover data.

    Results

    Here, we integrate large-scale marine metagenomic datasets with community-driven life science ontologies into a novel FAIR web service. This approach enables the retrieval of data discovered by intersecting the knowledge represented within ontologies against the functional genomic potential and taxonomic structure computed from marine sequencing data. Our findings highlight various microbial functional and taxonomic patterns relevant to the ecology of prokaryotes in various aquatic environments.

    Conclusions

    In this work, we present and evaluate a novel Semantic Web architecture that can be used to ask novel biological questions of existing marine metagenomic datasets. Finally, the FAIR ontology searchable data products provided by our API can be leveraged by future research efforts.

     
  5. Abstract

    This paper reports on the development of a metadata application profile (AP), MetaFAIR, designed to support research data management (RDM) to make research data findable, accessible, interoperable, and reusable. The development of MetaFAIR followed a three‐step process that included learning about the characteristics of datasets from researchers to establish their context and requirements, as well as iterative design and testing with researchers' feedback. Guided by the FAIR principles (Findability, Accessibility, Interoperability, and Reusability), MetaFAIR focuses on accommodating description needs particular to computational social science datasets while seeking to provide general enough elements to describe data collections across many different domains. In this paper, MetaFAIR is placed in the context of historical and recent developments in the areas of RDM and application profile creation; following this contextualization, the paper describes the central considerations and challenges of the MetaFAIR development process and discusses its significance for future work in RDM.

     