skip to main content


Title: Insights from open source software supply chains
Open Source Software (OSS) forms an infrastructure on which numerous (often critical) software applications are based. Substantial research was done to investigate central projects such as Linux kernel but we have only a limited understanding of how the periphery of the larger OSS ecosystem is interconnected through technical dependencies, code sharing, and knowledge flows. We aim to close this gap by a) creating a nearly complete and rapidly updateable collection of version control data for FLOSS projects; b) by cleaning, correcting, and augmenting the data to measure several types of dependencies among code, developers, and projects; c) by creating models that rely on the resulting supply chains to investigate structural and dynamic properties of the entire OSS. The current implementation is capable of being updated each month, occupies over 300Tb of disk space with 1.5B commits and 12B git objects. Highly accurate algorithms to correct identity data and extract dependencies from the source code are used to characterize the current structure of OSS and the way it has evolved. In particular, models of technology spread demonstrate the implicit factors developers use when choosing software components. We expect the resulting research platform will both spur investigations on how the huge periphery in OSS both sustains and is sustained by the central OSS projects and, as a result, will increase resiliency and effectiveness of the OSS.  more » « less
Award ID(s):
1633437
NSF-PAR ID:
10177381
Author(s) / Creator(s):
Date Published:
Journal Name:
ESEC/FSE '19: 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Page Range / eLocation ID:
3
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Open source software (OSS) is essential for modern society and, while substantial research has been done on individual (typically central) projects, only a limited understanding of the periphery of the entire OSS ecosystem exists. For example, how are tens of millions of projects in the periphery interconnected through technical dependencies, code sharing, or knowledge flows? To answer such questions we a) create a very large and frequently updated collection of version control data for FLOSS projects named World of Code (WoC) and b) provide basic tools for conducting research that depends on measuring interdependencies among all FLOSS projects. Our current WoC implementation is capable of being updated on a monthly basis and contains over 12B git objects. To evaluate its research potential and to create vignettes for its usage, we employ WoC in conducting several research tasks. In particular, we find that it is capable of supporting trend evaluation, ecosystem measurement, and the determination of package usage. We expect WoC to spur investigation into global properties of OSS development leading to increased resiliency of the entire OSS ecosystem. Our infrastructure facilitates the discovery of key technical dependencies, code flow, and social networks that provide the basis to determine the structure and evolution of the relationships that drive FLOSS activities and innovation. 
    more » « less
  2. Open Source Software (OSS) is forming the spines of technology infrastructures, attracting millions of talents to contribute. Notably, it is challenging and critical to consider both the developers’ interests and the semantic features of the project code to recommend appropriate development tasks to OSS developers. In this paper, we formulate the novel problem of code recommendation, whose purpose is to predict the future contribution behaviors of developers given their interaction history, the semantic features of source code, and the hierarchical file structures of projects. We introduce CODER, a novel graph-based CODE Recommendation framework for open source software developers, which accounts for the complex interactions among multiple parties within the system. CODER jointly models microscopic user-code interactions and macroscopic user-project interactions via a heterogeneous graph and further bridges the two levels of information through aggregation on filestructure graphs that reflect the project hierarchy. Moreover, to overcome the lack of reliable benchmarks, we construct three largescale datasets to facilitate future research in this direction. Extensive experiments show that our CODER framework achieves superior performance under various experimental settings, including intraproject, cross-project, and cold-start recommendation. 
    more » « less
  3. Women are underrepresented in Open Source Software (OSS) projects, as a result of which, not only do women lose career and skill development opportunities, but the projects themselves suffer from a lack of diversity of perspectives. Practitioners and researchers need to understand more about the phenomenon; however, studies about women in open source are spread across multiple fields, including information systems, software engineering, and social science. This paper systematically maps, aggregates, and synthesizes the state-of-the-art on women’s participation in OSS. It focuses on women contributors’ representation and demographics, how they contribute, their motivations and challenges, and strategies employed by communities to attract and retain women. We identified 51 articles (published between 2000 and 2021) that investigated women’s participation in OSS. We found evidence in these papers about who are the women who contribute, what motivates them to contribute, what types of contributions they make, challenges they face, and strategies proposed to support their participation. According to these studies, only about 5% of projects were reported to have women as core developers, and women authored less than 5% of pull-requests, but had similar or even higher rates of pull request acceptances than men. Women make both code and non-code contributions and their motivations to contribute include, learning new skills, altruism, reciprocity, and kinship. Challenges that women face in OSS are mainly social, including lack of peer parity and non-inclusive communication from a toxic culture. We found ten strategies reported in the literature, which we mapped to the reported challenges. Based on these results, we provide guidelines for future research and practice. 
    more » « less
  4. Free and/or open-source software (or F/OSS) projects now play a major and dominant role in society, constituting critical digital infrastructure relied upon by companies, academics, non-profits, activists, and more. As F/OSS has become larger and more established, we investigate the labor of maintaining and sustaining those projects at various scales. We report findings from an interview-based study with contributors and maintainers working in a wide range of F/OSS projects. Maintainers of F/OSS projects do not just maintain software code in a more traditional software engineering understanding of the term: fixing bugs, patching security vulnerabilities, and updating dependencies. F/OSS maintainers also perform complex and often-invisible interpersonal and organizational work to keep their projects operating as active communities of users and contributors. We particularly focus on how this labor of maintaining and sustaining changes as projects and their software grow and scale across many dimensions. In understanding F/OSS to be as much about maintaining a communal project as it is maintaining software code, we discuss broadly applicable considerations for peer production communities and other socio-technical systems more broadly. 
    more » « less
  5. Abstract

    Software bots have been facilitating several development activities in Open Source Software (OSS) projects, including code review. However, these bots may bring unexpected impacts to group dynamics, as frequently occurs with new technology adoption. Understanding and anticipating such effects is important for planning and management. To analyze these effects, we investigate how several activity indicators change after the adoption of a code review bot. We employed a regression discontinuity design on 1,194 software projects from GitHub. We also interviewed 12 practitioners, including open-source maintainers and contributors. Our results indicate that the adoption of code review bots increases the number of monthly merged pull requests, decreases monthly non-merged pull requests, and decreases communication among developers. From the developers’ perspective, these effects are explained by the transparency and confidence the bot comments introduce, in addition to the changes in the discussion focused on pull requests. Practitioners and maintainers may leverage our results to understand, or even predict, bot effects on their projects.

     
    more » « less