This study focuses on the process of updating and upgrading a large-scale legacy software system to ensure its compatibility with modern computing environments. The evolution and maintenance of legacy software pose significant challenges in software engineering, especially given the rapid advancements in technology, computing platforms, and dependent libraries. These challenges become even more pronounced when new systems are built upon existing open-source software, which may become outdated due to discontinued maintenance or lack of community support. In this work, we examine the problem from a sustainable computing perspective through the case study of the CyberWater project—an innovative cyberinfrastructure framework designed to support open data access and open model integration in water science and engineering. CyberWater is built on top of VisTrails, an open-source scientific workflow system. VisTrails has not been actively maintained since 2017, requiring an upgrade to ensure CyberWater’s continued functionality, compatibility, and long-term sustainability. This paper presents our work on upgrading VisTrails, including the complete upgrade process, tools developed and utilized, testing strategies, and the final outcomes. We also share key experiences and lessons learned, with a focus on the sustainability challenges and considerations that arise when maintaining and evolving large-scale open-source software systems in scientific computing environments.
The Big Effects of Short-term Efforts: Mentorship and Code Integration in Open Source Scientific Software
Scientific progress relies crucially on software, yet in practice there are significant challenges to scientific software production and maintenance. We conducted a case study of a bioinformatics software library called Biopython to investigate the promise of Google Summer of Code (GSoC), a program that pays students to work on open-source projects for the summer, for addressing these challenges. We find three positive outcomes of GSoC in the Biopython community: the addition of new features to the Biopython codebase, training, and personal development. We also find, however, that mentors face several challenges related to GSoC project selection and ranking. We believe that because GSoC provides an occasion to extend the software with capabilities that can be used to produce new knowledge, and to train successive generations of potential contributors to the software, it can play a vital role in the sustainability of open-source scientific software.
- PAR ID: 10038298
- Date Published:
- Journal Name: Journal of Open Research Software
- ISSN: 2049-9647
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Reproducibility of results is a cornerstone of the scientific method. Scientific computing encounters two challenges when aiming for this goal. Firstly, reproducibility should not depend on details of the runtime environment, such as the compiler version or computing environment, so that results are verifiable by third parties. Secondly, different versions of software code executed in the same runtime environment should produce consistent numerical results for physical quantities. In this manuscript, we test the feasibility of reproducing scientific results obtained using the IllinoisGRMHD code that is part of an open-source community software for simulation in relativistic astrophysics, the Einstein Toolkit. We verify that numerical results of simulating a single isolated neutron star with IllinoisGRMHD can be reproduced, and compare them to results reported by the code authors in 2015. We use two different supercomputers: Expanse at SDSC, and Stampede2 at TACC. By compiling the source code archived along with the paper on both Expanse and Stampede2, we find that IllinoisGRMHD reproduces results published in its announcement paper up to errors comparable to round-off level changes in initial data parameters. We also verify that a current version of IllinoisGRMHD reproduces these results once we account for bug fixes which have occurred since the original publication.
Science policy makers are looking for approaches to increase the extent of collaboration in the production of scientific software, looking to open collaborations in open source software for inspiration. We examine the software ecosystem surrounding BLAST, a key bioinformatics tool, identifying outside improvements and interviewing their authors. We find that academic credit is a powerful motivator for the production and revealing of improvements. Yet surprisingly, we also find that improvements motivated by academic credit are less likely to be integrated than those with other motivations, including financial gain. We argue that this is because integration makes it harder to see who has contributed what and thereby undermines the ability of reputation to function as a reward for collaboration. We consider how open source avoids these issues and conclude with policy approaches to promoting wider collaboration by addressing incentives for integration.
Scientific software is essential to scientific innovation and in many ways it is distinct from other types of software. Abandoned (or unmaintained), buggy, and hard-to-use software, a perception often associated with scientific software, can hinder scientific progress; yet, in contrast to other types of software, its longevity is poorly understood. Existing data curation efforts are fragmented by science domain and/or are small in scale and lack key attributes. We use large language models to classify public software repositories in World of Code into distinct scientific domains and layers of the software stack, curating a large and diverse collection of over 18,000 scientific software projects. Using this data, we estimate survival models to understand how the domain, infrastructural layer, and other attributes of scientific software affect its longevity. We further obtain a matched sample of non-scientific software repositories and investigate the differences. We find that infrastructural layers, downstream dependencies, mentions of publications, and participants from government are associated with a longer lifespan, while newer projects with participants from academia have a shorter lifespan. Against common expectations, scientific projects have a longer lifetime than matched non-scientific open-source software projects. We expect our curated attribute-rich collection to support future research on scientific software and provide insights that may help extend the longevity of both scientific and other projects.
In computational physics, chemistry, and biology, the implementation of new techniques in shared and open-source software lowers barriers to entry and promotes rapid scientific progress. However, effectively training new software users presents several challenges. Common methods like direct knowledge transfer and in-person workshops are limited in reach and comprehensiveness. Furthermore, while the COVID-19 pandemic highlighted the benefits of online training, traditional online tutorials can quickly become outdated and may not cover all the software’s functionalities. To address these issues, here we introduce “PLUMED Tutorials,” a collaborative model for developing, sharing, and updating online tutorials. This initiative utilizes repository management and continuous integration to ensure compatibility with software updates. Moreover, the tutorials are interconnected to form a structured learning path and are enriched with automatic annotations to provide broader context. This paper illustrates the development, features, and advantages of PLUMED Tutorials, aiming to foster an open community for creating and sharing educational resources.

