Title: The Value of a Data and Digital Object Management Plan (D(DO)MP) in Fostering Sharing Practices in a Multidisciplinary Multinational Project
Data Management Plans (DMPs) are now a routine part of research proposals but are generally not referred to after funding is granted. The Belmont Forum requires an extensive document, a ‘Data and Digital Object Management Plan’ (D(DO)MP), for its awarded projects and expects it to be kept current over the life of the project. The D(DO)MP is intended to record team decisions about the major tools and practices to be used over the life of the project for data and software stewardship and for preservation of data and software products, aligned with the desired Open Science outcomes relevant to the project. Here we present one of the first instances of the use of Belmont’s D(DO)MP through a case study of the PARSEC project, a multinational and multidisciplinary investigation of the socioeconomic impacts of protected areas. We describe the development and revision of our interpretation of the D(DO)MP and discuss its adoption and acceptance by our research group. We periodically assessed the data management sophistication of team members and their use of the various nominated tools and practices. As a result, for example, we included summaries that let researchers readily view the key components of the D(DO)MP. To meet the Open Science outcomes in a complex project like PARSEC, a comprehensive and appropriately structured D(DO)MP helps project leaders ensure (a) that team members are committed to the collaboration goals of the project, (b) that there is regular and effective feedback within the team, (c) that training in new tools is provided as and when needed, and (d) that there is easy access to a short reference to the tools and descriptions of the nominated practices.
Award ID(s):
1929464, 1831937
PAR ID:
10525546
Publisher / Repository:
CODATA
Journal Name:
Data Science Journal
Volume:
22
ISSN:
1683-1470
Subject(s) / Keyword(s):
Digital object management; DMP; collaboration; transnational; inter/multidisciplinary; Open Science
Sponsoring Org:
National Science Foundation
More Like this
  1. Automated experimentation methods are unlocking a new data-rich research paradigm in materials science that promises to accelerate the pace of materials discovery. However, if our data management practices do not keep pace with progress in automation, this revolution threatens to drown us in unusable data. In this perspective, we highlight the need to update data management practices to track, organize, process, and share data collected from laboratories with deeply integrated automation equipment. We argue that a holistic approach to data management that integrates multiple scales (experiment, group, and community scales) is needed. We propose a vision for what this integrated data future could look like and compare existing work against this vision to find gaps in currently available data management tools. To realize this vision, we believe that the development of standard protocols for communicating with equipment and for data sharing, the development of new open-source software tools for managing data in research groups, and leadership and direction from funding agencies and other organizations are needed.
  2. There is strong agreement across the sciences that replicable workflows are needed for computational modeling. Open and replicable workflows not only strengthen public confidence in the sciences but also result in more efficient community science. However, the massive size and complexity of geoscience simulation outputs, as well as the large cost to produce and preserve these outputs, present problems related to data storage, preservation, duplication, and replication. The simulation workflows themselves present additional challenges related to usability, understandability, documentation, and citation. These challenges make it difficult for researchers to meet the bewildering variety of data management requirements and recommendations across research funders and scientific journals. This paper introduces initial outcomes and emerging themes from the EarthCube Research Coordination Network project titled “What About Model Data? - Best Practices for Preservation and Replicability,” which is working to develop tools to assist researchers in determining what elements of geoscience modeling research should be preserved and shared to meet evolving community open science expectations. Specifically, the paper offers approaches to address the following key questions:
    • How should preservation of model software and outputs differ for projects that are oriented toward knowledge production vs. projects oriented toward data production?
    • What components of dynamical geoscience modeling research should be preserved and shared?
    • What curation support is needed to enable sharing and preservation for geoscience simulation models and their output?
    • What cultural barriers impede geoscience modelers from making progress on these topics?
  3. Reproducible open science with FAIR data sharing principles requires research to be disseminated with open data and standardised metadata. Researchers in the geographic sciences may benefit from authoring and maintaining metadata from the earliest phases of the research life cycle, rather than waiting until the data dissemination phase. Fully open and reproducible research should be conducted within a version-controlled executable research compendium with registered pre-analysis plans, and may also involve research proposals, data management plans, and protocols for research with human subjects. We review metadata standards and research documentation needs through each phase of the research process to distil a list of features for software to support a metadata-rich open research life cycle. The review is based on open science and reproducibility literature and on our own work developing a template research compendium for conducting reproduction and replication studies. We then review available open source geographic metadata software against these requirements, finding each software program to offer a partial solution. We conclude with a vision for software-supported metadata-rich open research practices intended to reduce redundancies in open research work while expanding transparency and reproducibility in geographic research.
  4. In Open Source Software (OSS) projects, pre-built tools dominate DevOps-oriented pipelines. In practice, a multitude of configuration management, cloud-based continuous integration, and automated deployment tools exist, often more than one for each task. OSS projects regularly adopt (and abandon) tools. Prior work has shown that some tool adoptions are preceded by discussions, and that tool adoptions can benefit the project. But important questions remain: how do teams decide to adopt a tool? What is discussed before the adoption, and for how long? And which team characteristics determine adoption? In this paper, we conduct a large-scale empirical study to characterize team discussions and to discern the team-level determinants of tool adoption into OSS projects' development pipelines. Guided by theories of team and individual motivations and dynamics, we perform exploratory data analyses, conduct deep-dive case studies, and develop regression models to learn the determinants of adoption and discussion length, and the direction of their effect on adoption. Using commit and comment traces from large-scale GitHub projects, our models find that prior exposure to a tool and member involvement are positively associated with tool adoption, while longer discussions and the number of newer team members are negatively associated with it. Beyond technical appropriateness, these results can provide guidance on the timing of tool adoptions in diverse programmer teams. Our data and code are available at https://github.com/lkyin/tool_adoptions.
  5. The Reproducible Software Environment (Resen) is an open-source software tool enabling computationally reproducible scientific results in the geospace science community. Resen was developed as part of a larger project called the Integrated Geoscience Observatory (InGeO), which aims to help geospace researchers bring together diverse datasets from disparate instruments and data repositories, with software tools contributed by instrument providers and community members. The main goals of InGeO are to remove barriers to accessing, processing, and visualizing geospatially resolved data from multiple sources, using methodologies and tools that are reproducible. The architecture of Resen combines two mainstream open-source software tools, Docker and JupyterHub, to produce a software environment that not only facilitates computationally reproducible research results but also facilitates effective collaboration among researchers. In this technical paper, we discuss some challenges for performing reproducible science and a potential solution via Resen, which is demonstrated using a case study of a geospace event. Finally, we discuss how the use of mainstream, open-source technologies seems to provide a more sustainable path toward enabling reproducible science than proprietary and closed-source software.