Title: An Open-Source Workflow for Spatiotemporal Studies with COVID-19 as an Example
Many previous studies have shown that open-source technologies help democratize information and foster collaboration, enabling communities to address global physical and societal challenges. The outbreak of the novel coronavirus has imposed unprecedented challenges on human society, affecting every aspect of livelihood, including health, the environment, transportation, and the economy. Open-source technologies offer a new ray of hope for tackling the pandemic collaboratively. The role of open source is not limited to sharing source code; rather, open-source projects can be adopted as a software development approach that encourages collaboration among researchers. Open collaboration has a positive impact on society and helps combat the pandemic effectively. Open-source technology integrated with geospatial information allows decision-makers to make strategic, informed decisions and helps them determine, from that geospatial information, the type of intervention needed. The novelty of this paper lies in standardizing the open-source workflow for spatiotemporal research. The highlights of this workflow include sharing data, analytical tools, spatiotemporal applications, and results, as well as formalizing open-source software development. The workflow consists of (i) developing open-source spatiotemporal applications, (ii) opening and sharing spatiotemporal resources, and (iii) replicating the research in a plug-and-play fashion. Open data, open analytical tools and source code, and publicly accessible results form the foundation of this workflow. This paper also presents a case study of open-source spatiotemporal application development for air quality analysis in California, USA. In addition to developing the application, we shared the spatiotemporal data, source code, and research findings through a GitHub repository.
Award ID(s): 1841520
PAR ID: 10398242
Journal Name: ISPRS International Journal of Geo-Information
Volume: 11
Issue: 1
ISSN: 2220-9964
Page Range / eLocation ID: 13
Sponsoring Org: National Science Foundation
More Like this
  1. The reproducibility and replicability (R&R) crisis poses a significant challenge across disciplines, particularly in spatiotemporal studies. This paper focuses on the unique R&R challenges within spatiotemporal research, including data availability, transparency of methodological conception, the complexities of interdisciplinary collaboration, the balance between R&R and innovation, and R&R education. Recognizing the potential of Scientific Workflow Management Systems (SWMS) to enhance R&R, we introduce a pioneering SWMS-based integrated spatiotemporal research approach (SISRA) that uses KNIME, an open-source SWMS, to tackle these challenges. First, we developed a set of KNIME extensions, including Geospatial and Dataverse extensions, to improve the availability of spatiotemporal software in SWMS. Then, we created a spatial data virtual laboratory architecture to support multidisciplinary collaboration. Finally, we proposed a geographical research lifecycle that integrates SWMS-based methods to improve practices, efficiency, and innovation in R&R research and education. Our approach exemplifies how executable workflows can not only alleviate the R&R burden on researchers but also strengthen R&R education in geographical research, illustrating its benefits for training, teaching, and multidisciplinary collaboration.
  2. The Spatial Data Lab (SDL) project is a collaborative initiative by the Center for Geographic Analysis at Harvard University, KNIME, Future Data Lab, China Data Institute, and George Mason University. Co-sponsored by the NSF IUCRC Spatiotemporal Innovation Center, SDL aims to advance applied research in spatiotemporal studies across domains such as business, environment, health, and mobility. The project focuses on developing an open-source infrastructure for data linkage, analysis, and collaboration. Key objectives include building spatiotemporal data services, a reproducible, replicable, and expandable (RRE) platform, and workflow-driven data analysis tools to support research case studies. Additionally, SDL promotes spatiotemporal data science training, cross-party collaboration, and the creation of geospatial tools that foster inclusivity, transparency, and ethical practices. Guided by an academic advisory committee of world-renowned scholars, the project is laying the foundation for a more open, effective, and robust scientific enterprise.
  3. The Reproducible Software Environment (Resen) is an open-source software tool that enables computationally reproducible scientific results in the geospace science community. Resen was developed as part of a larger project, the Integrated Geoscience Observatory (InGeO), which aims to help geospace researchers bring together diverse datasets from disparate instruments and data repositories with software tools contributed by instrument providers and community members. The main goals of InGeO are to remove barriers to accessing, processing, and visualizing geospatially resolved data from multiple sources, using methodologies and tools that are reproducible. The architecture of Resen combines two mainstream open-source software tools, Docker and JupyterHub, to produce a software environment that facilitates both computationally reproducible research results and effective collaboration among researchers. In this technical paper, we discuss some challenges of performing reproducible science and a potential solution via Resen, demonstrated with a case study of a geospace event. Finally, we discuss how using mainstream open-source technologies appears to provide a more sustainable path toward reproducible science than proprietary, closed-source software.
  4. Open-source software (OSS) is ubiquitous, serving both as specialized applications nurtured by devoted user communities and as digital infrastructure underlying platforms used by millions of people. OSS is developed, maintained, and extended through the contributions of independent developers as well as people from businesses, universities, government research institutions, and nonprofits. Despite its prevalence, the scope and impact of OSS are not currently well measured. Recent policies of the U.S. Federal Government promote the sharing of software code developed by or for the Federal Government. While the policy of promoting the reuse and sharing of software created with public funding is relatively new, public funding plays an important but not fully accounted-for role in the creation of OSS. This paper aims to measure the scope and value of OSS development in the U.S. Federal Government. We collect data from Code.gov, the government's platform for sharing OSS projects, and study the contributions of agencies. The dataset contains 17K repositories from 21 agencies, with the majority of contributions originating from the DOE, NASA, and GSA. In addition, we collect data on the development activity (e.g., lines of code, contributors) of these repositories on GitHub, the largest code-hosting platform worldwide. Adopting a cost estimation model from software engineering, we generate estimates of investment in OSS that are consistent with the U.S. national accounting methods used to measure software investment. Finally, we construct and analyze the collaboration network that results from cross-agency contributions to repositories and explore the centrality of agencies in that network.
  5. The systemic challenges of the COVID-19 pandemic require cross-disciplinary collaboration in a global and timely fashion. Such collaboration requires open research practices and the sharing of research outputs, such as data and code, thereby facilitating research, research reproducibility, and timely collaboration across borders. The Research Data Alliance COVID-19 Working Group recently published a set of recommendations and guidelines on data sharing and related best practices for COVID-19 research. These guidelines include recommendations for researchers, policymakers, funders, publishers, and infrastructure providers from the perspective of different domains (Clinical Medicine, Omics, Epidemiology, Social Sciences, Community Participation, Indigenous Peoples, Research Software, Legal and Ethical Considerations). Several overarching themes have emerged from this document, such as the need to balance the creation of data adherent to the FAIR principles (findable, accessible, interoperable, and reusable) with the need for quick data release; the use of trustworthy research data repositories; the use of well-annotated data with meaningful metadata; and practices for documenting methods and software. The resulting document marks an unprecedented cross-disciplinary, cross-sectoral, and cross-jurisdictional effort authored by over 160 experts from around the globe. This letter summarises key points of the Recommendations and Guidelines, highlights the relevant findings, shines a spotlight on the process, and suggests how these developments can be leveraged by the wider scientific community.