Abstract Since 1971, the Protein Data Bank (PDB) has served as the single global archive for experimentally determined 3D structures of biological macromolecules made freely available to the global community according to the FAIR principles of Findability–Accessibility–Interoperability–Reusability. During the first 50 years of continuous PDB operations, standards for data representation have evolved to better represent rich and complex biological phenomena. Carbohydrate molecules present in more than 14,000 PDB structures have recently been reviewed and remediated to conform to a new standardized format. This machine-readable data representation for carbohydrates occurring in the PDB structures and the corresponding reference data improves the findability, accessibility, interoperability and reusability of structural information pertaining to these molecules. The PDB Exchange MacroMolecular Crystallographic Information File data dictionary now supports (i) standardized atom nomenclature that conforms to International Union of Pure and Applied Chemistry-International Union of Biochemistry and Molecular Biology (IUPAC-IUBMB) recommendations for carbohydrates, (ii) uniform representation of branched entities for oligosaccharides, (iii) commonly used linear descriptors of carbohydrates developed by the glycoscience community and (iv) annotation of glycosylation sites in proteins. For the first time, carbohydrates in PDB structures are consistently represented as collections of standardized monosaccharides, which precisely describe oligosaccharide structures and enable improved carbohydrate visualization, structure validation, robust quantitative and qualitative analyses, search for dendritic structures and classification. The uniform representation of carbohydrate molecules in the PDB described herein will facilitate broader usage of the resource by the glycoscience community and researchers studying glycoproteins. 
                            Enhancing the FAIRness of Arctic Research Data Through Semantic Annotation
                        
                    
    
The National Science Foundation’s Arctic Data Center is the primary data repository for NSF-funded research conducted in the Arctic. There are major challenges in discovering and interpreting resources in a repository containing data as heterogeneous and interdisciplinary as those in the Arctic Data Center. This paper reports on advances in cyberinfrastructure at the Arctic Data Center that help address these issues by leveraging semantic technologies that enhance the repository’s adherence to the FAIR data principles and improve the Findability, Accessibility, Interoperability, and Reusability of digital resources in the repository. We describe the Arctic Data Center’s improvements, which use semantic annotation to bind metadata about Arctic data sets to concepts in web-accessible ontologies. The Arctic Data Center’s implementation of a semantic annotation mechanism is accompanied by the development of an extended search interface that increases the findability of data by allowing users to search for specific, broader, and narrower meanings of measurement descriptions, as well as for their potential synonyms. Based on research carried out by the DataONE project, we evaluated the potential impact of this approach on the accessibility, interoperability, and reusability of measurement data. Arctic research often benefits from having additional data, typically from multiple, heterogeneous sources, that complement and extend the bases – spatially, temporally, or thematically – for understanding Arctic phenomena. These relevant data resources must be 'found' and 'harmonized' prior to integration and analysis. The findings of a case study indicated that the semantic annotation of measurement data enhances the capabilities of researchers to accomplish these tasks.
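The search behavior described in the abstract can be illustrated with a minimal sketch. It assumes a toy concept hierarchy in which each concept has a label, optional synonyms, and at most one broader concept. The concept identifiers, data set names, and function names below are all hypothetical and are not taken from the Arctic Data Center's implementation or from any real ontology. A query for a concept returns data sets annotated with that concept or with any of its narrower concepts, and a synonym resolves to the same concept as its preferred label.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Concept:
    """A toy ontology concept; labels and identifiers are illustrative only."""
    label: str
    synonyms: Set[str] = field(default_factory=set)
    broader: Optional[str] = None  # id of the parent (broader) concept, if any


# Hypothetical mini-ontology of measurement types (not a real vocabulary).
ONTOLOGY: Dict[str, Concept] = {
    "temperature": Concept("temperature"),
    "water_temperature": Concept("water temperature", broader="temperature"),
    "sea_surface_temperature": Concept(
        "sea surface temperature", {"SST"}, broader="water_temperature"
    ),
    "air_temperature": Concept("air temperature", broader="temperature"),
}

# Hypothetical data sets whose measured variables are annotated with concept ids.
ANNOTATIONS: Dict[str, Set[str]] = {
    "dataset-A": {"sea_surface_temperature"},
    "dataset-B": {"air_temperature"},
    "dataset-C": {"water_temperature"},
}


def resolve(term: str) -> Optional[str]:
    """Map a label or synonym to a concept id, ignoring case."""
    wanted = term.casefold()
    for concept_id, concept in ONTOLOGY.items():
        names = {concept.label, *concept.synonyms}
        if wanted in {name.casefold() for name in names}:
            return concept_id
    return None


def with_narrower(concept_id: str) -> Set[str]:
    """Return the concept together with all of its transitive narrower concepts."""
    found = {concept_id}
    frontier = [concept_id]
    while frontier:
        parent = frontier.pop()
        for child_id, child in ONTOLOGY.items():
            if child.broader == parent and child_id not in found:
                found.add(child_id)
                frontier.append(child_id)
    return found


def search(term: str) -> List[str]:
    """Find data sets annotated with the queried concept or a narrower one."""
    concept_id = resolve(term)
    if concept_id is None:
        return []
    targets = with_narrower(concept_id)
    return sorted(name for name, ids in ANNOTATIONS.items() if ids & targets)
```

In this sketch, searching for the broad term "temperature" matches all three data sets, while searching for the synonym "SST" matches only the data set annotated with sea surface temperature. A plain keyword search over free-text variable names would miss both kinds of match.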
    
- PAR ID: 10523942
- Publisher / Repository: Ubiquity Press
- Date Published:
- Journal Name: Data Science Journal
- Volume: 23
- ISSN: 1683-1470
- Subject(s) / Keyword(s): Arctic research data; data discovery; FAIR; knowledge modeling; semantic annotation; data repository
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- ABSTRACT The FaceBase Consortium was established by the National Institute of Dental and Craniofacial Research in 2009 as a ‘big data’ resource for the craniofacial research community. Over the past decade, researchers have deposited hundreds of annotated and curated datasets on both normal and disordered craniofacial development in FaceBase, all freely available to the research community on the FaceBase Hub website. The Hub has developed numerous visualization and analysis tools designed to promote integration of multidisciplinary data while remaining dedicated to the FAIR principles of data management (findability, accessibility, interoperability and reusability) and providing a faceted search infrastructure for locating desired data efficiently. Summaries of the datasets generated by the FaceBase projects from 2014 to 2019 are provided here. FaceBase 3 now welcomes contributions of data on craniofacial and dental development in humans, model organisms and cell lines. Collectively, the FaceBase Consortium, along with other NIH-supported data resources, provides a continuously growing, dynamic and current resource for the scientific community while improving data reproducibility and fulfilling data sharing requirements.
- Summary High‐quality microbiome research relies on the integrity, management and quality of supporting data. Currently biobanks and culture collections have different formats and approaches to data management. This necessitates a standard data format to underpin research, particularly in line with the FAIR data standards of findability, accessibility, interoperability and reusability. We address the importance of a unified, coordinated approach that ensures compatibility of data between biobanks and culture collections and also ensures linkage between bioinformatic databases and the wider research community.
- Digital publishing platforms and internet resources enable openness of access to scientific findings and data at scales never before realized. Unfortunately, researchers sometimes embrace lock-in systems for data generation and analysis out of necessity because meaningful alternatives do not exist. Scientific advances still take place when this occurs, but they become fragmented with discordant quality control, interoperability, reproducibility, and democratization of access. To maximize the value of these often publicly funded resources, disciplines are turning to FAIR Guiding Principles for data stewardship. FAIR (Findability, Accessibility, Interoperability, and Reuse) promotes the added value of widespread data sharing that is transparent, equitable, and inclusive. Here we present NoCTURN, an NSF-funded FAIR Open Science Research Coordination Network for computed tomography users. NoCTURN (the Non-clinical Computed Tomography Users Research Network) aims to address the fragmentation of tomography toolkits stemming from proprietary software, non-uniform metadata formats, and repeatability limits. In this presentation, we outline how we will achieve this aim together by 1) developing a community committed to information sharing; 2) coordinating data analysis, storage, and reporting requirements; 3) highlighting underrepresented voices in the field; 4) developing community standards inclusive of industry, research, education, and outreach stakeholders; and 5) modeling FAIR open science strategies for our colleagues and students. NoCTURN is recruiting undergraduates through established investigators from X-ray-, neutron-, and synchrotron-beam computed tomography communities, and we want to hear from you.
- Marine animal forests are benthic communities dominated by sessile suspension feeders (such as sponges, corals, and bivalves) able to generate three-dimensional (3D) frameworks with high structural complexity. The biodiversity and functioning of marine animal forests are strictly related to their 3D complexity. The present paper aims at providing new perspectives in underwater optical surveys. Starting from the current gaps in data collection and analysis that critically limit the study and conservation of marine animal forests, we discuss the main technological and methodological needs for the investigation of their 3D structural complexity at different spatial and temporal scales. Despite recent technological advances, several issues in data acquisition and processing still need to be solved in order to properly map the different benthic habitats in which marine animal forests are present, assess their health status, and measure their structural complexity. Proper precision and accuracy should be chosen and assured in relation to the biological and ecological processes investigated. In addition, standardized methods and protocols are strictly necessary to meet the FAIR (findability, accessibility, interoperability, and reusability) data principles for the stewardship of habitat mapping and biodiversity, biomass, and growth data.
 An official website of the United States government