Title: MAINSTREAMING METADATA INTO RESEARCH WORKFLOWS TO ADVANCE REPRODUCIBILITY AND OPEN GEOGRAPHIC INFORMATION SCIENCE
Abstract. Reproducible open science with FAIR data sharing principles requires research to be disseminated with open data and standardised metadata. Researchers in the geographic sciences may benefit from authoring and maintaining metadata from the earliest phases of the research life cycle, rather than waiting until the data dissemination phase. Fully open and reproducible research should be conducted within a version-controlled executable research compendium with registered pre-analysis plans, and may also involve research proposals, data management plans, and protocols for research with human subjects. We review metadata standards and research documentation needs through each phase of the research process to distil a list of features for software to support a metadata-rich open research life cycle. The review is based on open science and reproducibility literature and on our own work developing a template research compendium for conducting reproduction and replication studies. We then review available open source geographic metadata software against these requirements, finding each software program to offer a partial solution. We conclude with a vision for software-supported metadata-rich open research practices intended to reduce redundancies in open research work while expanding transparency and reproducibility in geographic research.
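To make the idea of early metadata authoring concrete, the following is a minimal sketch (our illustration, not taken from the paper) of creating a machine-readable metadata record at the project design phase, using a schema.org Dataset description serialized as JSON-LD. Every field value is a hypothetical placeholder, meant to be revised as the study evolves under version control.

```python
# Minimal sketch: author a machine-readable metadata record at the start of a
# project rather than at dissemination. Uses a schema.org "Dataset" JSON-LD
# record; all field values below are hypothetical placeholders.
import json

metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example study dataset",  # hypothetical working title
    "description": "Placeholder description written at the design phase.",
    "creator": [{"@type": "Person", "name": "TBD"}],
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "spatialCoverage": {  # geographic extent, if already known
        "@type": "Place",
        "geo": {
            "@type": "GeoShape",
            "box": "39.0 -77.1 39.1 -77.0",  # south west north east
        },
    },
    "version": "0.1.0",  # bump as the study evolves under version control
}

with open("metadata.json", "w", encoding="utf-8") as f:
    json.dump(metadata, f, indent=2)
```

Committing a file like this alongside the earliest project materials lets the metadata record grow with the research instead of being reconstructed from memory at dissemination time.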
Award ID(s):
2049837
PAR ID:
10412652
Author(s) / Creator(s):
Date Published:
Journal Name:
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Volume:
XLVIII-4/W1-2022
ISSN:
2194-9034
Page Range / eLocation ID:
201 to 208
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Thanks to increasing awareness of the importance of reproducibility in computer science research, initiatives such as artifact review and badging have been introduced to encourage reproducible research in this field. However, making "practical reproducibility" truly widespread requires more than just incentives. It demands an increase in capacity for reproducible research among computer scientists: more tools, workflows, and exemplar artifacts, and more human resources trained in best practices for reproducibility. In this paper, we describe our experiences in the first two years of the Summer of Reproducibility (SoR), a mentoring program that seeks to build global capacity by enabling students around the world to work with expert mentors while producing reproducibility artifacts, tools, and educational materials. We give an overview of the program, report preliminary outcomes, and discuss plans to evolve this program.
  2. Despite recent calls to make geographical analyses more reproducible, formal attempts to reproduce or replicate published work remain largely absent from the geographic literature. The reproductions of geographic research that do exist typically focus on computational reproducibility—whether results can be recreated using data and code provided by the authors—rather than on evaluating the conclusion validity, internal validity, and evidential value of the original analysis. However, knowing whether a study is computationally reproducible is insufficient if the goal of a reproduction is to identify and correct errors in our knowledge. We argue that reproductions of geographic work should focus on assessing whether the findings and claims made in existing empirical studies are well supported by the evidence presented. We aim to facilitate this transition by introducing a model framework for conducting reproduction studies, demonstrating its use, and reporting the findings of three exemplar studies. We present three model reproductions of geographical analyses of COVID‐19 based on a common, open access template. Each reproduction attempt is published as an open access repository, complete with pre-analysis plan, data, code, and final report. We find each study to be partially reproducible, but moving past computational reproducibility, our assessments reveal conceptual and methodological concerns that raise questions about the predictive value and the magnitude of the associations presented in each study. Collectively, these reproductions and our template materials offer a practical framework others can use to reproduce and replicate empirical spatial analyses and ultimately facilitate the identification and correction of errors in the geographic literature.
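As a rough illustration of the repository structure this abstract describes, the sketch below scaffolds a reproduction compendium with the components named (pre-analysis plan, data, code, final report). The directory names are our own illustrative assumptions, not necessarily the authors' exact template.

```python
# Sketch: scaffold a reproduction-study repository with the components named
# above. Folder names are illustrative assumptions, not the authors' template.
from pathlib import Path

COMPONENTS = [
    "procedure/pre_analysis_plan",  # registered plan, written before analysis
    "data/raw",                     # original data as obtained
    "data/derived",                 # data produced by the analysis code
    "code",                         # analysis scripts and notebooks
    "results",                      # figures and tables the report cites
    "report",                       # final reproduction report
]

def scaffold(root: str = "reproduction-study") -> None:
    """Create the compendium skeleton with a placeholder README in each part."""
    for part in COMPONENTS:
        d = Path(root, part)
        d.mkdir(parents=True, exist_ok=True)
        (d / "README.md").write_text(f"# {part}\n\nDocument contents here.\n")

if __name__ == "__main__":
    scaffold()
```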
  3. Within the field of education technology, learning analytics has increased in popularity over the past decade. Researchers conduct experiments and develop software, building on each other's work to create more intricate systems. In parallel, open science — which describes a set of practices to make research more open, transparent, and reproducible — has exploded in recent years, resulting in more open data, code, and materials for researchers to use. However, without prior knowledge of open science, many researchers do not make their datasets, code, and materials openly available, and those that are available are often difficult, if not impossible, to reproduce. The purpose of the current study was to take a close look at our field by examining previous papers within the proceedings of the International Conference on Learning Analytics and Knowledge, documenting the rate of open science adoption (e.g., preregistration, open data) as well as how well available data and code could be reproduced. Specifically, we examined 133 research papers, allowing ourselves 15 minutes for each paper to identify open science practices and attempt to reproduce the results according to their provided specifications. Our results showed that less than half of the research adopted standard open science principles, with approximately 5% fully meeting some of the defined principles. Further, we were unable to reproduce any of the papers successfully in the given time period. We conclude by providing recommendations on how to improve the reproducibility of our research as a field moving forward. All openly accessible work can be found in an Open Science Foundation project.
  4. Open science and open data within scholarly research programs are growing both in popularity and by requirement from grant funding agencies and journal publishers. A central component of open data management, especially on collaborative, multidisciplinary, and multi-institutional science projects, is documentation of complete and accurate metadata, workflow, and source code, in addition to access to raw data and data products, to uphold FAIR (Findable, Accessible, Interoperable, Reusable) principles. Although best practice in data/metadata management is to use established, internationally accepted metadata schemata, many of these standards are discipline-specific, making it difficult to catalog multidisciplinary data and data products in a way that is easily findable and accessible. Consequently, scattered and incompatible metadata records create a barrier to scientific innovation, as researchers are burdened to find and link multidisciplinary datasets. One possible solution for increasing data findability, accessibility, interoperability, reproducibility, and integrity within multi-institutional and interdisciplinary projects is a centralized and integrated data management platform. Overall, this type of interoperable framework supports reproducible open science and its dissemination to various stakeholders and the public in a FAIR manner by providing direct access to raw data and by linking protocols, metadata, and supporting workflow materials.
  5. Despite increased efforts to assess the adoption rates of open science and the robustness of reproducibility in sub-disciplines of education technology, there is a lack of understanding of why some research is not reproducible. Prior work has taken the first step toward assessing the reproducibility of research, but has assumed certain constraints which hinder its discovery. Thus, the purpose of this study was to replicate previous work on papers within the proceedings of the International Conference on Educational Data Mining and to develop metrics that accurately report which papers are reproducible and why. Specifically, we examined 208 papers, attempted to reproduce them, documented reasons for reproducibility failures, and asked authors to provide additional information needed to reproduce their study. Our results showed that out of 12 papers that were potentially reproducible, only one reproduced all of its analyses successfully, and another two reproduced most of their analyses. The most common cause of failure was omission of the libraries needed to run the code, followed by unseeded randomness. All openly accessible work can be found in an Open Science Foundation project.
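The two failure modes named above have straightforward remedies, sketched below: declare and pin the libraries an analysis depends on, and seed every source of randomness. The specific library, version, and seed value here are illustrative assumptions, not drawn from the paper.

```python
# Sketch addressing the two failure modes named above: record the libraries an
# analysis needs, and seed every source of randomness before any stochastic
# step. Library names, versions, and the seed are illustrative.
import random

import numpy as np  # assumed dependency; pin it (e.g., numpy==1.26.4) in a
                    # requirements.txt so others can recreate the environment

SEED = 42  # fixed seed so stochastic steps give identical results on rerun

random.seed(SEED)
np.random.seed(SEED)

# Example stochastic step: a train/test split is now reproducible.
indices = np.random.permutation(100)
train, test = indices[:80], indices[80:]
print(train[:5])  # prints the same five indices on every run
```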