Title: Reproducible Software Environment: a tool enabling computational reproducibility in geospace sciences and facilitating collaboration
The Reproducible Software Environment (Resen) is an open-source software tool enabling computationally reproducible scientific results in the geospace science community. Resen was developed as part of a larger project called the Integrated Geoscience Observatory (InGeO), which aims to help geospace researchers bring together diverse datasets from disparate instruments and data repositories, with software tools contributed by instrument providers and community members. The main goals of InGeO are to remove barriers in accessing, processing, and visualizing geospatially resolved data from multiple sources using methodologies and tools that are reproducible. The architecture of Resen combines two mainstream open-source software tools, Docker and JupyterHub, to produce a software environment that not only facilitates computationally reproducible research results, but also facilitates effective collaboration among researchers. In this technical paper, we discuss some challenges for performing reproducible science and a potential solution via Resen, which is demonstrated using a case study of a geospace event. Finally, we discuss how the usage of mainstream, open-source technologies seems to provide a sustainable path towards enabling reproducible science compared to proprietary and closed-source software.
Award ID(s):
1835573 1933013
PAR ID:
10189291
Journal Name:
Journal of Space Weather and Space Climate
Volume:
10
ISSN:
2115-7251
Page Range / eLocation ID:
12
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. One of the biggest barriers to conducting ocean science around the globe is limited access to computational tools and resources, including software, computing infrastructure, and data. Open tools, such as open-source software, open data, and online computing resources, offer promising solutions toward more equitable access to scientific resources. Here, we discuss the enabling power of these tools in under-resourced and non-English speaking regions, based on experience gained in the organization of three independent programs in West African, Latin American, and Indian Ocean nations. These programs have embraced the “hackweek” learning model that bridges the gap between data science and domain applications. Hackweeks function as knowledge exchange forums and foster meaningful international and regional connections among scientists. Lessons learned across the three case studies include the importance of using open computational and data resources, tailoring programs to regional and cultural differences, and the benefits and challenges of using cloud-based infrastructure. Sharing capacity in marine open data science through the regional hackweek approach can expand the participation of more diverse scientific communities and help incorporate different perspectives and broader solutions to threats to marine ecosystems and communities. 
  2. Science Gateways provide an easily accessible and powerful computing environment for researchers. These are built around a set of software tools that are frequently and heavily used by a large number of researchers in specific domains. Science Gateways have been catering to a growing need of researchers for easy-to-use computational tools; however, their usage model is typically single-user-centric. As scientific research becomes ever more team oriented, the user-driven need to support integrated collaborative capabilities in Science Gateways is a natural progression. The ability to share data and results with others in an integrated manner is an important and frequently requested capability. In this article we describe and discuss our work to provide a rich environment for data organization and data sharing by integrating the SeedMeLab (formerly SeedMe2) platform with two Science Gateways: CIPRES and GenApp. With this integration we also demonstrate SeedMeLab’s extensible features and how Science Gateways may incorporate and realize FAIR data principles in practice and transform into community data hubs.
  3. Abstract. Reproducible open science with FAIR data sharing principles requires research to be disseminated with open data and standardised metadata. Researchers in the geographic sciences may benefit from authoring and maintaining metadata from the earliest phases of the research life cycle, rather than waiting until the data dissemination phase. Fully open and reproducible research should be conducted within a version-controlled executable research compendium with registered pre-analysis plans, and may also involve research proposals, data management plans, and protocols for research with human subjects. We review metadata standards and research documentation needs through each phase of the research process to distil a list of features for software to support a metadata-rich open research life cycle. The review is based on open science and reproducibility literature and on our own work developing a template research compendium for conducting reproduction and replication studies. We then review available open source geographic metadata software against these requirements, finding each software program to offer a partial solution. We conclude with a vision for software-supported metadata-rich open research practices intended to reduce redundancies in open research work while expanding transparency and reproducibility in geographic research. 
  4. Software engineering has long studied how software developers work, building a body of work which forms the foundation of many software engineering best practices, tools, and theories. Recently, some developers have begun recording videos of themselves engaged in programming tasks while contributing to open source projects, enabling them to share knowledge and socialize with other developers. We believe that these videos offer an important opportunity for both software engineering research and education. In this paper, we discuss the potential use of these videos as well as open questions for how to best enable this envisioned use. We propose creating a central repository of programming videos in which videos can be analyzed and annotated to illustrate specific behaviors of interest, such as asking and answering questions, employing strategies, and applying software engineering theories. Such a repository would offer an important new way in which both software engineering researchers and students can understand how software developers work.
  5. The remarkable pace of genomic data generation is rapidly transforming our understanding of life at the micron scale. Yet this data stream also creates challenges for team science. A single microbe can have multiple versions of genome architecture, functional gene annotations, and gene identifiers; additionally, the lack of mechanisms for collating and preserving advances in this knowledge raises barriers to community coalescence around shared datasets. “Digital Microbes” are frameworks for interoperable and reproducible collaborative science through open source, community-curated data packages built on a (pan)genomic foundation. Housed within an integrative software environment, Digital Microbes ensure real-time alignment of research efforts for collaborative teams and facilitate novel scientific insights as new layers of data are added. Here we describe two Digital Microbes: 1) the heterotrophic marine bacterium Ruegeria pomeroyi DSS-3 with >100 transcriptomic datasets from lab and field studies, and 2) the pangenome of the cosmopolitan marine heterotroph Alteromonas containing 339 genomes. Examples demonstrate how an integrated framework collating public (pan)genome-informed data can generate novel and reproducible findings.