skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: The Big Effects of Short-term Efforts: Mentorship and Code Integration in Open Source Scientific Software
Scientific progress relies crucially on software, yet in practice there are significant challenges to scientific software production and maintenance. We conducted a case study of a bioinformatics software library called Biopython to investigate the promise of Google Summer of Code (GSoC), a program that pays students to work on open-source projects for the summer, for addressing these challenges. We find three positive outcomes of GSoC in the Biopython community: the addition of new features to the Biopython codebase, training, and personal development. We also find, however, that mentors face several challenges related to GSoC project selection and ranking. We believe that because GSoC provides an occasion to extend the software with capabilities that can be used to produce new knowledge, and to train successive generations of potential contributors to the software, it can play a vital role in the sustainability of open-source scientific software.  more » « less
Award ID(s):
1111750 0943168 1064209
PAR ID:
10038298
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Journal of open research software
ISSN:
2049-9647
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Reproducibility of results is a cornerstone of the scientific method. Scientific computing encounters two challenges when aiming for this goal. Firstly, reproducibility should not depend on details of the runtime environment, such as the compiler version or computing environment, so results are verifiable by third-parties. Secondly, different versions of software code executed in the same runtime environment should produce consistent numerical results for physical quantities. In this manuscript, we test the feasibility of reproducing scientific results obtained using the IllinoisGRMHD code that is part of an open-source community software for simulation in relativistic astrophysics, the Einstein Toolkit. We verify that numerical results of simulating a single isolated neutron star with IllinoisGRMHD can be reproduced, and compare them to results reported by the code authors in 2015. We use two different supercomputers: Expanse at SDSC, and Stampede2 at TACC. By compiling the source code archived along with the paper on both Expanse and Stampede2, we find that IllinoisGRMHD reproduces results published in its announcement paper up to errors comparable to round-off level changes in initial data parameters. We also verify that a current version of IllinoisGRMHD reproduces these results once we account for bug fixes which have occurred since the original publication. 
    more » « less
  2. Science policy makers are looking for approaches to increase the extent of collaboration in the production of scientific software, looking to open collaborations in open source software for inspiration. We examine the software ecosystem surrounding BLAST, a key bioinformatics tool, identifying outside improvements and interviewing their authors. We find that academic credit is a powerful motivator for the production and revealing of improvements. Yet surprisingly, we also find that improvements motivated by academic credit are less likely to be integrated than those with other motivations, including financial gain. We argue that this is because integration makes it harder to see who has contributed what and thereby undermines the ability of reputation to function as a reward for collaboration. We consider how open source avoids these issues and conclude with policy approaches to promoting wider collaboration by addressing incentives for integration. 
    more » « less
  3. In computational physics, chemistry, and biology, the implementation of new techniques in shared and open-source software lowers barriers to entry and promotes rapid scientific progress. However, effectively training new software users presents several challenges. Common methods like direct knowledge transfer and in-person workshops are limited in reach and comprehensiveness. Furthermore, while the COVID-19 pandemic highlighted the benefits of online training, traditional online tutorials can quickly become outdated and may not cover all the software’s functionalities. To address these issues, here we introduce “PLUMED Tutorials,” a collaborative model for developing, sharing, and updating online tutorials. This initiative utilizes repository management and continuous integration to ensure compatibility with software updates. Moreover, the tutorials are interconnected to form a structured learning path and are enriched with automatic annotations to provide broader context. This paper illustrates the development, features, and advantages of PLUMED Tutorials, aiming to foster an open community for creating and sharing educational resources. 
    more » « less
  4. The Reproducible Software Environment (Resen) is an open-source software tool enabling computationally reproducible scientific results in the geospace science community. Resen was developed as part of a larger project called the Integrated Geoscience Observatory (InGeO), which aims to help geospace researchers bring together diverse datasets from disparate instruments and data repositories, with software tools contributed by instrument providers and community members. The main goals of InGeO are to remove barriers in accessing, processing, and visualizing geospatially resolved data from multiple sources using methodologies and tools that are reproducible. The architecture of Resen combines two mainstream open source software tools, Docker and JupyterHub, to produce a software environment that not only facilitates computationally reproducible research results, but also facilitates effective collaboration among researchers. In this technical paper, we discuss some challenges for performing reproducible science and a potential solution via Resen, which is demonstrated using a case study of a geospace event. Finally we discuss how the usage of mainstream, open-source technologies seems to provide a sustainable path towards enabling reproducible science compared to proprietary and closed-source software. 
    more » « less
  5. Abstract Developing sustainable software for the scientific community requires expertise in software engineering and domain science. This can be challenging due to the unique needs of scientific software, the insufficient resources for software engineering practices in the scientific community, and the complexity of developing for evolving scientific contexts. While open‐source software can partially address these concerns, it can introduce complicating dependencies and delay development. These issues can be reduced if scientists and software developers collaborate. We present a case study wherein scientists from the SuperNova Early Warning System collaborated with software developers from the Scalable Cyberinfrastructure for Multi‐Messenger Astrophysics project. The collaboration addressed the difficulties of open‐source software development, but presented additional risks to each team. For the scientists, there was a concern of relying on external systems and lacking control in the development process. For the developers, there was a risk in supporting a user‐group while maintaining core development. These issues were mitigated by creating a second Agile Scrum framework in parallel with the developers' ongoing Agile Scrum process. This Agile collaboration promoted communication, ensured that the scientists had an active role in development, and allowed the developers to evaluate and implement the scientists' software requirements. The collaboration provided benefits for each group: the scientists actuated their development by using an existing platform, and the developers utilized the scientists' use‐case to improve their systems. This case study suggests that scientists and software developers can avoid scientific computing issues by collaborating and that Agile Scrum methods can address emergent concerns. 
    more » « less