Developers of open-source software projects tend to collaborate in bursts of activity over a few days at a time, rather than at an even pace. A project might find its productivity suffering if bursts of activity occur when a key person with the right role or right expertise is not available to participate. Open-source projects could benefit from monitoring the way they orchestrate attention among key developers, finding ways to make themselves available to one another when needed. In commercial software development, Sociotechnical Congruence (STC) has been used as a measure to assess whether coordination among developers is sufficient for a given task. However, STC has not previously been successfully applied to open-source projects, in which some industrial assumptions do not apply: managementchosen targets, mandated steady work hours, and top-down task allocation of inputs and targets. In this work we propose an operationalization of STC for open-source software development. We use temporal bursts of activity as a unit of analysis more suited to the natural rhythms of open-source work, as well as open source analogues of other component measures needed for calculating STC. As an illustration, we demonstrate that opensource development on PyPI projects in GitHub is indeed bursty, that activities in the bursts have topical coherence, and we apply our operationalization of STC. We argue that a measure of socio-technical congruence adapted to open source could provide projects with a better way of tracking how effectively they are collaborating when they come together to collaborate.
more »
« less
Novelty Begets Popularity, But Curbs Participation - A Macroscopic View of the Python Open-Source Ecosystem
Who creates the most innovative open-source software projects? And what fate do these projects tend to have? Building on a long history of research to understand innovation in business and other domains, as well as recent advances towards modeling innovation in scientific research from the science of science field, in this paper we adopt the analogy of innovation as emerging from the novel recombination of existing bits of knowledge. As such, we consider as innovative the software projects that recombine existing software libraries in novel ways, i.e., those built on top of atypical combinations of packages as extracted from import statements. We then report on a large-scale quantitative study of innovation in the Python open-source software ecosystem. Our results show that higher levels of innovativeness are statistically associated with higher GitHub star counts, i.e., novelty begets popularity. At the same time, we find that controlling for project size, the more innovative projects tend to involve smaller teams of contributors, as well as be at higher risk of becoming abandoned in the long term. We conclude that innovation and open source sustainability are closely related and, to some extent, antagonistic.
more »
« less
- PAR ID:
- 10552525
- Publisher / Repository:
- ACM
- Date Published:
- ISBN:
- 9798400702174
- Page Range / eLocation ID:
- 1 to 11
- Format(s):
- Medium: X
- Location:
- Lisbon Portugal
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Reusable software libraries, frameworks, and components, such as those provided by open source ecosystems and third-party suppliers, accelerate digital innovation. However, recent years have shown almost exponential growth in attackers leveraging these software artifacts to launch software supply chain attacks. Past well-known software supply chain attacks include the SolarWinds, log4j, and xz utils incidents. Supply chain attacks are considered to have three major attack vectors: through vulnerabilities and malware accidentally or intentionally injected into open source and third-partydependencies/components/containers; by infiltrating thebuild infrastructureduring the build and deployment processes; and through targeted techniques aimed at thehumansinvolved in software development, such as through social engineering. Plummeting trust in the software supply chain could decelerate digital innovation if the software industry reduces its use of open source and third-party artifacts to reduce risks. This article contains perspectives and knowledge obtained from intentional outreach with practitioners to understand their practical challenges and from extensive research efforts. We then provide an overview of current research efforts to secure the software supply chain. Finally, we propose a future research agenda to close software supply chain attack vectors and support the software industry.more » « less
-
TaxonWorks is an open-source workbench for biodiversity researchers. With several years of development behind it, we highlight its present status, and discuss if and when it makes sense to release a version 1.0, i.e. software completed to specific stage. TaxonWorks' scope is broad; it seeks to touch nearly all areas that might be of interest to taxonomists, i.e. those who integrate everything that is known about a taxon into a single resource. Its role as a software platform is placed in a broader context, where many instances of TaxonWorks each can support multiple research projects. Instances may be supported by individuals or organizations. A suite of technical tools including containerization and unit tests facilitate collaboration at many different levels. TaxonWorks is a research tool, mechanisms for analyzing the results of data curation including its application programing interface are described. The long-term development of TaxonWorks is supported by an endowment to the Species File Group. Its source is available on Github.more » « less
-
Open-source projects do not exist in a vacuum. They benefit from reusing other projects and themselves are being reused by others, creating complex networks of interdependencies, i.e., software ecosystems. Therefore, the sustainability of projects comprising ecosystems may no longer by determined solely by factors internal to the project, but rather by the ecosystem context as well. In this paper we report on a mixed-methods study of ecosystem-level factors affecting the sustainability of open-source Python projects. Quantitatively, using historical data from 46,547 projects in the PyPI ecosystem, we modeled the chances of project development entering a period of dormancy (limited activity) as a function of the projects' position in their dependency networks, organizational support, and other factors. Qualitatively, we triangulated the revealed effects and further expanded on our models through interviews with project maintainers. Results show that the number of project ties and the relative position in the dependency network have significant impact on sustained project activity, with nuanced effects early in a project's life cycle and later on.more » « less
-
The incubator and research projects sponsored by the Center for Research in Open Source Software (CROSS, cross.ucsc.edu) at UC Santa Cruz have been very effective at promoting the professional and technical development of research software engineers. Carlos Maltzahn founded CROSS in 2015 with a generous gift of $2,000,000 from UC Santa Cruz alumnus Dr. Sage Weil and founding memberships of Toshiba America Electronic Components, SK Hynix Memory Solutions, and Micron Technology. Over the past five years, CROSS funding has enabled PhD students to not only create re- search software projects but also learn how to draw in new contributors and leverage established open source software communities. This position paper will present CROSS fellowships as case studies for how university-led open source projects can create a real- world, reproducible model for effectively training, funding and sup- porting research software engineers.more » « less
An official website of the United States government

