What is the relationship between Data Management Plans (DMPs), DMP guidance documents, and the reality of end-of-project data preservation and access? In this short paper we report preliminary findings of a 3-year investigation into the impact of DMPs on federally funded science in the United States. We investigated a small sample of publicly accessible DMPs (N=14) published using DMPTool. We found that while the DMPs followed the National Science Foundation's guidelines, the pathways to the resulting research data were often obscure, vague, or difficult to trace. We define “data pathways” as the search tactics and strategies deployed in order to find datasets.
Evaluating Tools for Data Management Plans: A Comparative Study of the DART Rubric and the Belmont Scorecard
Data management plans (DMPs) are required from researchers seeking funding from federal agencies in the United States. Ideally, DMPs disclose how research outputs will be managed and shared. How well DMPs communicate those plans is less understood. Evaluation tools such as the DART rubric and the Belmont scorecard assess the completeness of DMPs and offer one view into what DMPs communicate. This paper compares the evaluation criteria of the two tools by applying them to the same corpus of 150 DMPs from five different NSF programs. Findings suggest that the DART rubric and the Belmont scorecard overlap significantly, but the Belmont scorecard provides a better method to assess completeness. We find that most DMPs fail to address many of the best practices articulated by librarians and information professionals in the different evaluation tools. However, the evaluation methodology of both tools relies on a rating scale that does not account for the interaction of key areas of data management. This work contributes to the improvement of evaluation tools for data management planning.
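To make the rating-scale critique concrete, the sketch below shows a hypothetical rubric-style scoring routine. The criteria, the 0–2 scale, and the example ratings are illustrative assumptions only; they are not taken from the DART rubric or the Belmont scorecard.

```python
# Minimal sketch of rubric-style DMP scoring. Criteria names and the 0-2
# rating scale are hypothetical, not the actual DART or Belmont items.
CRITERIA = [
    "describes data types and formats",
    "names a preservation repository",
    "states access and sharing conditions",
    "assigns roles and responsibilities",
    "addresses metadata standards",
]

def score_dmp(ratings: dict) -> tuple:
    """Sum per-criterion ratings (0 = absent, 1 = partial, 2 = complete)
    and return the raw score plus a completeness fraction."""
    raw = sum(ratings.get(criterion, 0) for criterion in CRITERIA)
    return raw, raw / (2 * len(CRITERIA))

# Example: a plan that names a repository but never states access conditions
# still accumulates points item by item -- a flat scale cannot register
# that these two areas of data management interact.
example = {
    "describes data types and formats": 2,
    "names a preservation repository": 2,
    "states access and sharing conditions": 0,
    "assigns roles and responsibilities": 1,
    "addresses metadata standards": 1,
}
raw, fraction = score_dmp(example)
print(f"raw score {raw}/{2 * len(CRITERIA)}, completeness {fraction:.0%}")
```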
- PAR ID: 10546297
- Editor(s): Sserwanga, I
- Publisher / Repository: Springer Nature Switzerland AG
- Date Published:
- ISBN: 978-3-031-28032-0
- Format(s): Medium: X
- Location: https://doi.org/10.1007/978-3-031-28032-0_3
- Sponsoring Org: National Science Foundation
More Like this
- It has become common for researchers to make their data publicly available to meet the data management and accessibility requirements of funding agencies and scientific publishers. However, many researchers face the challenge of determining what data to preserve and share and where to preserve and share those data. This can be especially challenging for those who run dynamical models, which can produce complex, voluminous outputs, and who may not have considered at the project design stage which outputs need to be preserved and shared. This manuscript presents findings from the NSF EarthCube Research Coordination Network project titled “What About Model Data? Best Practices for Preservation and Replicability” (https://modeldatarcn.github.io/). These findings suggest that if the primary goal of sharing data is to communicate knowledge, most simulation-based research projects only need to preserve and share selected model outputs along with the full simulation experiment workflow. One major result of this project has been the development of a rubric designed to provide guidance on what simulation output needs to be preserved and shared in trusted community repositories to achieve the goal of knowledge communication. This rubric, along with use cases for selected projects, provides scientists with guidance on data accessibility requirements during research planning, allowing for more thoughtful development of data management plans and funding requests. Additionally, publishers can refer to this rubric for what is expected in terms of data accessibility for publication.
- Data Management Plans (DMPs) are now a routine part of research proposals but are generally not referred to after funding is granted. The Belmont Forum requires an extensive document, a ‘Data and Digital Object Management Plan’ (D(DO)MP), for its awarded projects, which is expected to be kept current over the life of the project. The D(DO)MP is intended to record team decisions about the major tools and practices to be used over the life of the project for data and software stewardship, and for preservation of data and software products, aligned with the Open Science outcomes relevant to the project. Here we present one of the first instances of the use of Belmont’s D(DO)MP through a case study of the PARSEC project, a multinational and multidisciplinary investigation of the socioeconomic impacts of protected areas. We describe the development and revision of our interpretation of the D(DO)MP and discuss its adoption and acceptance by our research group. We periodically assessed the data management sophistication of team members and their use of the various nominated tools and practices. As a result, for example, we included summaries so that researchers could readily view the key components of the D(DO)MP. To meet the Open Science outcomes in a complex project like PARSEC, a comprehensive and appropriately structured D(DO)MP helps project leaders ensure that (a) team members are committed to the collaboration goals of the project, (b) there is regular and effective feedback within the team, (c) training in new tools is provided as and when needed, and (d) there is easy access to a short reference on the nominated tools and practices.
- This poster reports on ongoing research into the National Science Foundation’s Data Management Plan guidelines and their impact on science data lifecycles. We ask two research questions (RQs): 1) How does guidance about the formulation of DMPs vary across different research areas? And 2) How has guidance about the management of data changed since the first DMP policies were published in 2011? To this end, we collected, examined, and compared 37 DMP guidance policies from 15 different research areas. We identified the following three themes during document analysis: 1) Responsibility for the future of data; 2) Data maintenance changes over time; and 3) The use of data repositories. Based on these preliminary findings, we believe that National Science Foundation guidance policies represent a unique view into changes in data management practices over the last decade.
- The increasing amount of data, and its growing use in the information era, has raised questions about data quality and its impact on decision-making. The importance of high-quality data is now widely recognized by researchers and decision-makers. Sewer inspection data have been collected for over three decades, but their reliability has been questionable: an estimated 25% to 50% of sewer inspection data are not usable due to data quality problems. To address these reliability problems, a data quality evaluation framework is developed. Data quality evaluation is a multi-dimensional concept that includes both subjective perceptions and objective measurements. Five data quality metrics were defined to assess different quality dimensions of the sewer inspection data: Accuracy, Consistency, Completeness, Uniqueness, and Validity. These metrics were calculated for the collected sewer inspection data, and it was found that consistency and uniqueness are the major problems in current sewer pipeline inspection practice. This paper contributes to the overall body of knowledge by providing, for the first time, a robust data quality evaluation framework for sewer system data, which will result in higher-quality data for sewer asset management.
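As a rough illustration of how such per-record metrics can be computed, the sketch below scores completeness, uniqueness, and validity over a handful of hypothetical inspection records. The field names, validity rules, and example values are assumptions made for illustration, not the framework defined in the paper; accuracy and consistency generally require reference data and cross-record rules, so they are omitted here.

```python
# Minimal sketch of per-record data quality checks for sewer inspection
# records. Field names, validity rules, and example values are hypothetical
# illustrations, not the framework developed in the paper.
from dataclasses import dataclass

@dataclass
class InspectionRecord:
    pipe_id: str          # identifier of the inspected pipe segment
    material: str         # e.g. "PVC", "VCP", "RCP"
    diameter_mm: float    # nominal pipe diameter
    condition_grade: int  # assumed 1 (good) to 5 (failed) coding

REQUIRED_FIELDS = ("pipe_id", "material", "diameter_mm", "condition_grade")

def completeness(records):
    """Fraction of required fields populated across all records."""
    filled = sum(1 for r in records for f in REQUIRED_FIELDS
                 if getattr(r, f) not in (None, ""))
    return filled / (len(records) * len(REQUIRED_FIELDS)) if records else 0.0

def uniqueness(records):
    """Fraction of pipe_id values that are distinct."""
    ids = [r.pipe_id for r in records]
    return len(set(ids)) / len(ids) if ids else 0.0

def validity(records):
    """Fraction of records whose values fall within the assumed domains."""
    ok = sum(1 for r in records if r.diameter_mm > 0 and 1 <= r.condition_grade <= 5)
    return ok / len(records) if records else 0.0

records = [
    InspectionRecord("P-001", "PVC", 300.0, 2),
    InspectionRecord("P-001", "PVC", 300.0, 2),  # duplicate entry lowers uniqueness
    InspectionRecord("P-002", "", 450.0, 7),     # missing material, out-of-range grade
]
print(f"completeness={completeness(records):.2f} "
      f"uniqueness={uniqueness(records):.2f} "
      f"validity={validity(records):.2f}")
```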