Title: The Four Pillars of Research Software Engineering
We present four elements we believe are key to providing comprehensive and sustainable support for research software engineering: software development, community, training, and policy. We also show how the wider developer community can learn from, and engage with, these activities.
Award ID(s):
1743188
PAR ID:
10540136
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
IEEE Software
Volume:
38
Issue:
1
ISSN:
0740-7459
Page Range / eLocation ID:
97 to 105
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Computational modeling occupies a unique niche in Earth and environmental sciences. Models serve not just as scientific technology and infrastructure but also as digital containers of the scientific community's understanding of the natural world. As this understanding improves, so too must the associated software. This dual nature – models as both infrastructure and hypotheses – means that modeling software must be designed to evolve continually as geoscientific knowledge itself evolves. Here we describe design principles, protocols, and tools developed by the Community Surface Dynamics Modeling System (CSDMS) to promote a flexible, interoperable, and ever-improving research software ecosystem. These include a community repository for model sharing and metadata, interface and ontology standards for model interoperability, language-bridging tools, a modular programming library for model construction, modular software components for data access, and a Python-based execution and model-coupling framework. Methods of community support and engagement that help create a community-centered software ecosystem are also discussed. 
  2. Drawing from a longitudinal case study, we inspect the activities of an expanding team of scientists and their collaborators as they sought to develop a novel software pipeline that worked both for themselves and for their wider community. We argue that these two tasks - making the software work for themselves and also for their wider scientific community - could not be differentiated from each other at the beginning of the software development process. Rather, this division of labor and software capacities emerged over time, articulated by the actors themselves as they went about their tasks. The work of making the novel software function at all and the extra work of making that software repurposable or reusable could not be distinguished until near the end of the development process; they were not defined or structured in advance. We discuss implications for the trajectory of software development, and the practical work of making software repurposable. 
  3. The rapid growth of open source software necessitates a deeper understanding of moderation and governance methods currently used within these projects. The code of conduct, a set of rules articulating standard behavior and responsibilities for participation within a community, is becoming an increasingly common policy document in open source software projects for setting project norms of behavior and discouraging negative or harassing comments and conversation. This study describes the conversations around adopting and crafting a code of conduct as well as those utilizing code of conduct for community governance. We conduct a qualitative analysis of a random sample of GitHub issues that involve the code of conduct. We find that codes of conduct are used both proactively and reactively to govern community behavior in project issues. Oftentimes, the initial addition of a code of conduct does not involve much community participation and input. However, a controversial moderation act is capable of inciting mass community feedback and backlash. Project maintainers balance the tension between disciplining potentially offensive forms of speech and encouraging broad and inclusive participation. These results have implications for the design of inclusive and effective governance practices for open source software communities. 
  4. The Reproducible Software Environment (Resen) is an open-source software tool enabling computationally reproducible scientific results in the geospace science community. Resen was developed as part of a larger project called the Integrated Geoscience Observatory (InGeO), which aims to help geospace researchers bring together diverse datasets from disparate instruments and data repositories, with software tools contributed by instrument providers and community members. The main goals of InGeO are to remove barriers in accessing, processing, and visualizing geospatially resolved data from multiple sources using methodologies and tools that are reproducible. The architecture of Resen combines two mainstream open source software tools, Docker and JupyterHub, to produce a software environment that not only facilitates computationally reproducible research results, but also facilitates effective collaboration among researchers. In this technical paper, we discuss some challenges for performing reproducible science and a potential solution via Resen, which is demonstrated using a case study of a geospace event. Finally we discuss how the usage of mainstream, open-source technologies seems to provide a sustainable path towards enabling reproducible science compared to proprietary and closed-source software. 
  5. Talks at practitioner-focused open-source software conferences are a valuable source of information for software engineering researchers. They provide a pulse of the community and are valuable source material for grey literature analysis. We curated a dataset of 24,669 talks from 87 open-source conferences between 2010 and 2021. We stored all relevant metadata from these conferences and provide scripts to collect the transcripts. We believe this data is useful for answering many kinds of questions, such as: What are the important/highly discussed topics within practitioner communities? How do practitioners interact? And how do they present themselves to the public? We demonstrate the usefulness of this data by reporting our findings from two small studies: a topic model analysis providing an overview of open-source community dynamics since 2011 and a qualitative analysis of a smaller community-oriented sample within our dataset to gain a better understanding of why contributors leave open-source software. 