
Title: Building Community Consensus for Scientific Metadata with YAMZ
ABSTRACT This paper reports on a demonstration of YAMZ (Yet Another Metadata Zoo) as a mechanism for building community consensus around metadata terms. The demonstration is motivated by the complexity of the metadata standards environment and the need for more user-friendly approaches for researchers to achieve vocabulary consensus. The paper reviews a series of metadata standardization challenges, explores crowdsourcing factors that offer possible solutions, and introduces the YAMZ system. A YAMZ demonstration is presented with members of the Toberer materials science laboratory at the Colorado School of Mines, where there is a need to confirm and maintain a shared understanding of the vocabulary supporting research documentation, data management, and their larger metadata infrastructure. The demonstration involves three key steps: 1) Sampling terms for the demonstration, 2) Engaging graduate student researchers in the demonstration, and 3) Reflecting on the demonstration. The results of these steps, including examples of the dialog provenance among lab members and voting, show the ease with which YAMZ can facilitate building metadata vocabulary consensus. The conclusion discusses implications and highlights next steps.
Award ID(s):
2118201
NSF-PAR ID:
10411850
Author(s) / Creator(s):
Date Published:
Journal Name:
Data Intelligence
Volume:
5
Issue:
1
ISSN:
2641-435X
Page Range / eLocation ID:
242 to 260
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Persistent identifiers for research objects, researchers, organizations, and funders are the key to creating unambiguous and persistent connections across the global research infrastructure (GRI). Many repositories are implementing mechanisms to collect and integrate these identifiers into their submission and record curation processes. This bodes well for a well-connected future, but metadata for resources submitted in the past lack these identifiers and therefore lack the connections required for inclusion in the connected infrastructure. Re-curation of these metadata is required to make these connections. This paper introduces the global research infrastructure and demonstrates how repositories, and their user communities, can contribute to and benefit from connections to the global research infrastructure.

    The Dryad Data Repository has existed since 2008 and has successfully re-curated the repository metadata several times, adding identifiers for research organizations, funders, and researchers. Understanding and quantifying these successes depends on measuring repository and identifier connectivity. Metrics are described and applied to the entire repository here.

    Identifiers (Digital Object Identifiers, DOIs) for papers connected to datasets in Dryad have long been a critical part of the Dryad metadata creation and curation processes. Since 2019, the portion of datasets with connected papers has decreased from 100% to less than 40%. This decrease has significant ramifications for the re-curation efforts described above as connected papers have been an important source of metadata. In addition, missing connections to papers make understanding and re-using datasets more difficult.

    Connections between datasets and papers can be difficult to make because of time lags between submission and publication, lack of clear mechanisms for citing datasets and other research objects from papers, changing focus of researchers, and other obstacles. The Dryad community of members, i.e., users, research institutions, publishers, and funders, has vested interests in identifying these connections and plays critical roles in the curation and re-curation efforts. Their engagement will be critical in building on the successes Dryad has already achieved and ensuring sustainable connectivity in the future.

  2. This paper describes Group Model Building (GMB) as an effective tool to bring together teams of researchers from different disciplines in theory‐building efforts. We propose that the simulation models, as well as other artefacts used during the modelling process, work as boundary objects that facilitate conversations among researchers of different disciplines, uncover insights, and build consensus on causal connections and actionable insights. In addition to providing a more robust theoretical basis for participatory system modelling as an approach to theory development in interdisciplinary work, we describe a study using GMB that illustrates its use. The assessment of the case suggests that system models provide interdisciplinary teams with the opportunity to combine the strengths of qualitative and quantitative approaches to express theoretical issues, using an analytical meta‐language that permits iteratively building theory and testing its internal consistency. Moreover, the GMB process helps researchers navigate the tension between achieving interdisciplinary consensus (which often involves adding details) and building a parsimonious theory of the phenomenon under study. © 2018 John Wiley & Sons, Ltd.

  3. Garoufallou, E. (Ed.)
    Flexible metadata pipelines are crucial for supporting the FAIR data principles. Despite this need, researchers seldom report their approaches for identifying metadata standards and protocols that support optimal flexibility. This paper reports on an initiative targeting the development of a flexible metadata pipeline for a collection containing over 300,000 digital fish specimen images, harvested from multiple data repositories and fish collections. The images and their associated metadata are being used for AI-related scientific research involving automated species identification, segmentation, and trait extraction. The paper provides contextual background, followed by the presentation of a four-phased approach involving: 1. Assessment of the Problem, 2. Investigation of Solutions, 3. Implementation, and 4. Refinement. The work is part of the NSF Harnessing the Data Revolution, Biology Guided Neural Networks (NSF/HDR-BGNN) project and the HDR Imageomics Institute. An RDF graph prototype pipeline is presented, followed by a discussion of research implications and a conclusion summarizing the results.
  4. Abstract

    Interdisciplinary teams are on the rise as scientists attempt to address complex environmental issues. While the benefits of team science approaches are clear, researchers often struggle with their implementation, particularly new team members. The challenges of large projects often weigh on the most vulnerable members of a team: trainees, including undergraduate students, graduate students, and post‐doctoral researchers. Trainees on big projects have to navigate their role on the team and learn project policies, procedures, and goals, all while also training in key scientific tasks such as co‐authoring papers. To address these challenges, we created and participated in a project‐specific, graduate‐level team science course. The purposes of this course were to: (1) introduce students to the goals of the project, (2) build trainees' understanding of how big projects operate, and (3) allow trainees to explore how their research interests dovetailed with the overall project. Additionally, trainees received training regarding: (1) diversity, equity & inclusion, (2) giving and receiving feedback, and (3) effective communication. Onboarding through the team science course cultivated psychological safety and a collaborative student community across disciplines and institutions. Thus, we recommend a team science course for onboarding students to big projects to help students establish the skills necessary for collaborative research. Project‐based team science classes can benefit student advancement, enhance the productivity of the project, and accelerate the discovery of solutions to ecological issues by building community, establishing a shared project vocabulary, and building a workforce with collaborative skills to better answer ecological research questions.

  5. Improved understanding of RNA viruses is critical to studying animal and plant health and environmental processes. However, the continuous and rapid evolution of RNA viruses makes their identification and characterization challenging. While recent sequence-based advances have led to extensive RNA virus discovery, there is growing variation in how RNA viruses are identified, analyzed, characterized, and reported. To this end, an RdRp Summit was organized as a hybrid meeting in Valencia, Spain, in May 2023 to convene leading experts, with emphasis on early career researchers (ECRs), across diverse scientific communities. Here we synthesize key insights and recommendations and offer these as a first effort to establish a consensus framework for advancing RNA virus discovery. First, we need interoperability through standardized methodologies, data-sharing protocols, metadata provision, and interdisciplinary collaborations; we offer specific examples as starting points. Second, as an emergent field, we recognize the need to incorporate cutting-edge technologies and knowledge early and often to improve omic-based viral detection and annotation as novel capabilities reveal new biology. Third, we underscore the significance of ECRs in fostering international partnerships to promote inclusivity and equity in virus discovery efforts. The proposed consensus framework serves as a roadmap for the scientific community to collectively contribute to the tremendous challenge of unveiling the RNA virosphere.
