skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on November 1, 2026

Title: Sustaining CyberWater-VisTrails: A Case Study in Software Upgrades and Reengineering
This study focuses on the process of updating and upgrading a large-scale legacy software system to ensure its compatibility with modern computing environments. The evolution and maintenance of legacy software pose significant challenges in software engineering, especially given the rapid advancements in technology, computing platforms, and dependent libraries. These challenges become even more pronounced when new systems are built upon existing open-source software, which may become outdated due to discontinued maintenance or lack of community support. In this work, we examine the problem from a sustainable computing perspective through the case study of the CyberWater project—an innovative cyberinfrastructure framework designed to support open data access and open model integration in water science and engineering. CyberWater is built on top of VisTrails, an open-source scientific workflow system. VisTrails has not been actively maintained since 2017, requiring an upgrade to ensure CyberWater’s continued functionality, compatibility, and long-term sustainability. This paper presents our work on upgrading VisTrails, including the complete upgrade process, tools developed and utilized, testing strategies, and the final outcomes. We also share key experiences and lessons learned, with a focus on the sustainability challenges and considerations that arise when maintaining and evolving large-scale open-source software systems in scientific computing environments.  more » « less
Award ID(s):
2209833 1835785
PAR ID:
10657137
Author(s) / Creator(s):
; ; ; ; ; ;
Publisher / Repository:
MDPI
Date Published:
Journal Name:
Information
Volume:
16
Issue:
11
ISSN:
2078-2489
Page Range / eLocation ID:
988
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Nowadays, there is a fast-paced shift from legacy telecommunication systems to novel software-defined network (SDN) architectures that can support on-the-fly network reconfiguration, therefore, empowering advanced traffic engineering mechanisms. Despite this momentum, migration to SDN cannot be realized at once especially in high-end networks of Internet service providers (ISPs). It is expected that ISPs will gradually upgrade their networks to SDN over a period that spans several years. In this paper, we study the SDN upgrading problem in an ISP network: which nodes to upgrade and when we consider a general model that captures different migration costs and network topologies, and two plausible ISP objectives: 1) the maximization of the traffic that traverses at least one SDN node, and 2) the maximization of the number of dynamically selectable routing paths enabled by SDN nodes. We leverage the theory of submodular and supermodular functions to devise algorithms with provable approximation ratios for each objective. Using realworld network topologies and traffic matrices, we evaluate the performance of our algorithms and show up to 54% gains over state-of-the-art methods. Moreover, we describe the interplay between the two objectives; maximizing one may cause a factor of 2 loss to the other. We also study the dual upgrading problem, i.e., minimizing the upgrading cost for the ISP while ensuring specific performance goals. Our analysis shows that our proposed algorithm can achieve up to 2.5 times lower cost to ensure performance goals over state-of-the-art methods. 
    more » « less
  2. Scientific progress relies crucially on software, yet in practice there are significant challenges to scientific software production and maintenance. We conducted a case study of a bioinformatics software library called Biopython to investigate the promise of Google Summer of Code (GSoC), a program that pays students to work on open-source projects for the summer, for addressing these challenges. We find three positive outcomes of GSoC in the Biopython community: the addition of new features to the Biopython codebase, training, and personal development. We also find, however, that mentors face several challenges related to GSoC project selection and ranking. We believe that because GSoC provides an occasion to extend the software with capabilities that can be used to produce new knowledge, and to train successive generations of potential contributors to the software, it can play a vital role in the sustainability of open-source scientific software. 
    more » « less
  3. Managing and preparing complex data for deep learning, a prevalent approach in large-scale data science can be challenging. Data transfer for model training also presents difficulties, impacting scientific fields like genomics, climate modeling, and astronomy. A large-scale solution like Google Pathways with a distributed execution environment for deep learning models exists but is proprietary. Integrating existing open-source, scalable runtime tools and data frameworks on high-performance computing (HPC) platforms is crucial to address these challenges. Our objective is to establish a smooth and unified method of combining data engineering and deep learning frameworks with diverse execution capabilities that can be deployed on various high-performance computing platforms, including cloud and supercomputers. We aim to support heterogeneous systems with accelerators, where Cylon and other data engineering and deep learning frameworks can utilize heterogeneous execution. To achieve this, we propose Radical-Cylon, a heterogeneous runtime system with a parallel and distributed data framework to execute Cylon as a task of Radical Pilot. We thoroughly explain Radical-Cylon’s design and development and the execution process of Cylon tasks using Radical Pilot. This approach enables the use of heterogeneous MPI-Communicators across multiple nodes. Radical-Cylon achieves better performance than Bare-Metal Cylon with minimal and constant overhead. Radical-Cylon achieves (4 15)% faster execution time than batch execution while performing similar join and sort operations with 35 million and 3.5 billion rows with the same resources. The approach aims to excel in both scientific and engineering research HPC systems while demonstrating robust performance on cloud infrastructures. This dual capability fosters collaboration and innovation within the open-source scientific research community.Not Available 
    more » « less
  4. Abstract Developing sustainable software for the scientific community requires expertise in software engineering and domain science. This can be challenging due to the unique needs of scientific software, the insufficient resources for software engineering practices in the scientific community, and the complexity of developing for evolving scientific contexts. While open‐source software can partially address these concerns, it can introduce complicating dependencies and delay development. These issues can be reduced if scientists and software developers collaborate. We present a case study wherein scientists from the SuperNova Early Warning System collaborated with software developers from the Scalable Cyberinfrastructure for Multi‐Messenger Astrophysics project. The collaboration addressed the difficulties of open‐source software development, but presented additional risks to each team. For the scientists, there was a concern of relying on external systems and lacking control in the development process. For the developers, there was a risk in supporting a user‐group while maintaining core development. These issues were mitigated by creating a second Agile Scrum framework in parallel with the developers' ongoing Agile Scrum process. This Agile collaboration promoted communication, ensured that the scientists had an active role in development, and allowed the developers to evaluate and implement the scientists' software requirements. The collaboration provided benefits for each group: the scientists actuated their development by using an existing platform, and the developers utilized the scientists' use‐case to improve their systems. This case study suggests that scientists and software developers can avoid scientific computing issues by collaborating and that Agile Scrum methods can address emergent concerns. 
    more » « less
  5. In computational physics, chemistry, and biology, the implementation of new techniques in shared and open-source software lowers barriers to entry and promotes rapid scientific progress. However, effectively training new software users presents several challenges. Common methods like direct knowledge transfer and in-person workshops are limited in reach and comprehensiveness. Furthermore, while the COVID-19 pandemic highlighted the benefits of online training, traditional online tutorials can quickly become outdated and may not cover all the software’s functionalities. To address these issues, here we introduce “PLUMED Tutorials,” a collaborative model for developing, sharing, and updating online tutorials. This initiative utilizes repository management and continuous integration to ensure compatibility with software updates. Moreover, the tutorials are interconnected to form a structured learning path and are enriched with automatic annotations to provide broader context. This paper illustrates the development, features, and advantages of PLUMED Tutorials, aiming to foster an open community for creating and sharing educational resources. 
    more » « less