Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
In Open Source Software, resources of any project are open for reuse by introducing dependencies or copying the resource itself. In contrast to dependency-based reuse, the infrastructure to systematically support copy-based reuse appears to be entirely missing. Our aim is to enable future research and tool development to increase efficiency and reduce the risks of copy-based reuse. We seek a better understanding of such reuse by measuring its prevalence and identifying factors affecting the propensity to reuse. To identify reused artifacts and trace their origins, our method exploits World of Code infrastructure. We begin with a set of theory-derived factors related to the propensity to reuse, sample instances of different reuse types, and survey developers to better understand their intentions. Our results indicate that copy-based reuse is common, with many developers being aware of it when writing code. The propensity for a file to be reused varies greatly among languages and between source code and binary files, consistently decreasing over time. Files introduced by popular projects are more likely to be reused, but at least half of reused resources originate from “small” and “medium” projects. Developers had various reasons for reuse but were generally positive about using a package manager.more » « lessFree, publicly-accessible full text available January 31, 2026
-
In Open Source Software, the source code and any other resources available in a project can be viewed or reused by anyone subject to often permissive licensing restrictions. In contrast to some studies of dependency-based reuse supported via package managers, no studies of OSS-wide copy-based reuse exist. This dataset seeks to encourage the studies of OSS-wide copy-based reuse by providing copying activity data that captures whole-file reuse in nearly all OSS. To accomplish that, we develop approaches to detect copybased reuse by developing an efficient algorithm that exploits World of Code infrastructure: a curated and cross referenced collection of nearly all open source repositories. We expect this data will enable future research and tool development that support such reuse and minimize associated risks.more » « less
-
Who creates the most innovative open-source software projects? And what fate do these projects tend to have? Building on a long history of research to understand innovation in business and other domains, as well as recent advances towards modeling innovation in scientific research from the science of science field, in this paper we adopt the analogy of innovation as emerging from the novel recombination of existing bits of knowledge. As such, we consider as innovative the software projects that recombine existing software libraries in novel ways, i.e., those built on top of atypical combinations of packages as extracted from import statements. We then report on a large-scale quantitative study of innovation in the Python open-source software ecosystem. Our results show that higher levels of innovativeness are statistically associated with higher GitHub star counts, i.e., novelty begets popularity. At the same time, we find that controlling for project size, the more innovative projects tend to involve smaller teams of contributors, as well as be at higher risk of becoming abandoned in the long term. We conclude that innovation and open source sustainability are closely related and, to some extent, antagonistic.more » « less
-
While lots of research has explored howto prevent maintainers from abandoning the open-source projects that serve as our digital infrastructure, there are very few insights on addressing abandonment when it occurs. We argue open-source sustainability research must expand its focus beyond trying to keep particular projects alive, to also cover the sustainable use of open source by supporting users when they face potential or actual abandonment.We interviewed 33 developers who have experienced open-source dependency abandonment. Often, they used multiple strategies to cope with abandonment, for example, first reaching out to the community to find potential alternatives, then switching to a community-accepted alternative if one exists. We found many developers felt they had little to no support or guidance when facing abandonment, leaving them to figure out what to do through a trial-and-error process on their own. Abandonment introduces cost for otherwise seemingly free dependencies, but users can decide whether and how to prepare for abandonment through a number of different strategies, such as dependency monitoring, building abstraction layers, and community involvement. In many cases, community members can invest in resources that help others facing the same abandoned dependency, but often do not because of the many other competing demands on their time – a form of the volunteer’s dilemma. We discuss cost reduction strategies and ideas to overcome this volunteer’s dilemma. Our findings can be used directly by open-source users seeking resources on dealing with dependency abandonment, or by researchers to motivate future work supporting the sustainable use of open source.more » « less
-
Attracting and retaining new developers is often at the heart of open-source project sustainability and success. Previous research found many intrinsic (or endogenous) project characteristics asso- ciated with the attractiveness of projects to new developers, but the impact of factors external to the project itself have largely been overlooked. In this work, we focus on one such external factor, a project’s labor pool, which is dened as the set of contributors active in the overall open-source ecosystem that the project could plausibly attempt to recruit from at a given time. How are the size and characteristics of the labor pool associated with a project’s attractiveness to new contributors? Through an empirical study of over 516,893 Python projects, we found that the size of the project’s labor pool, the technical skill match, and the social connection be- tween the project’s labor pool and members of the focal project all signicantly inuence the number of new developers that the focal project attracts, with the competition between projects with overlapping labor pools also playing a role. Overall, the labor pool factors add considerable explanatory power compared to models with only project-level characteristics.more » « less
-
The diffusion of information about open-source projects is a key factor influencing the adoption of projects and the allocation of developer efforts. Developers learn about new projects, and evaluate their quality and importance by accessing the related information. Social media is an important channel for information diffusion about open-source projects, with previous research suggesting the existence of a social media ecosystem that consists of multiple platforms and collectively supports information diffusion in open source. With different features supporting information diffusion, the same piece of information likely reaches different developer communities on different platforms, which attracts the attention and contribution of different developers and thus influences the success of open-source projects. Despite its importance, few works looked at the identity of the developer community that projectrelated information reaches on social media platforms and its associated impact on the discussed project. In this work, we track social media discussions on open-source projects on three different platforms: Twitter, HackerNews, and Reddit. We first describe the dynamics of project-related information diffusion across platforms, and we analyze the association between the number of posts on each platform, and the number of developers attracted to the discussed project from different communities. We find that posts about open-source projects first appear on Twitter and HackerNews, then move more towards Reddit. The number of project-related posts on Twitter mostly associate with the attracted developers from communities that are close to the project’s main contributor, while posts on other platforms associate more with the attention from remote communities.more » « less
-
Twitter is widely used by software developers. But how effective are tweets at promoting open source projects? How could one use Twitter to increase a project’s popularity or attract new contributors? In this paper we report on a mixed-methods empirical study of 44,544 tweets containing links to 2,370 open-source GitHub repositories, looking for evidence of causal effects of these tweets on the projects attracting new GitHub stars and contributors, as well as characterizing the high-impact tweets, the people likely being attracted by them, and how they differ from contributors attracted otherwise. Among others, we find that tweets have a statistically significant and practically sizable effect on obtaining new stars and a small average effect on attracting new contributors. The popularity, content of the tweet, as well as the identity of tweet authors all affect the scale of the attraction effect. In addition, our qualitative analysis suggests that forming an active Twitter community for an open source project plays an important role in attracting new committers via tweets. We also report that developers who are new to GitHub or have a long history of Twitter usage but few tweets posted are most likely to be attracted as contributors to the repositories mentioned by tweets. Our work contributes to the literature on open source sustainability.more » « less
-
Online toxicity is ubiquitous across the internet and its negative impact on the people and that online communities that it effects has been well documented. However, toxicity manifests differently on various platforms and toxicity in open source communities, while frequently discussed, is not well understood. We take a first stride at understanding the characteristics of open source toxicity to better inform future work on designing effective intervention and detection methods. To this end, we curate a sample of 100 toxic GitHub issue discussions combining multiple search and sampling strategies. We then qualitatively analyze the sample to gain an understanding of the characteristics of open-source toxicity. We find that the pervasive forms of toxicity in open source differ from those observed on other platforms like Reddit or Wikipedia. In our sample, some of the most prevalent forms of toxicity are entitled, demanding, and arrogant comments from project users as well as insults arising from technical disagreements. In addition, not all toxicity was written by people external to the projects; project members were also common authors of toxicity. We also discuss the implications of our findings. Among others we hope that our findings will be useful for future detection work.more » « less
-
null (Ed.)Open source software projects often rely on package management systems that help projects discover, incorporate, and maintain dependencies on other packages, maintained by other people. Such systems save a great deal of effort over ad hoc ways of advertising, packaging, and transmitting useful libraries, but coordination among project teams is still needed when one package makes a breaking change affecting other packages. Ecosystems differ in their approaches to breaking changes, and there is no general theory to explain the relationships between features, behavioral norms, ecosystem outcomes, and motivating values. We address this through two empirical studies. In an interview case study, we contrast Eclipse, NPM, and CRAN, demonstrating that these different norms for coordination of breaking changes shift the costs of using and maintaining the software among stakeholders, appropriate to each ecosystem’s mission. In a second study, we combine a survey, repository mining, and document analysis to broaden and systematize these observations across 18 ecosystems. We find that all ecosystems share values such as stability and compatibility, but differ in other values. Ecosystems’ practices often support their espoused values, but in surprisingly diverse ways. The data provides counterevidence against easy generalizations about why ecosystem communities do what they do.more » « less
An official website of the United States government

Full Text Available