skip to main content


Title: I'm leaving you, Travis: a continuous integration breakup story
Continuous Integration (CI) services, which can automatically build, test, and deploy software projects, are an invaluable asset in distributed teams, increasing productivity and helping to maintain code quality. Prior work has shown that CI pipelines can be sophisticated, and choosing and configuring a CI system involves tradeoffs. As CI technology matures, new CI tool offerings arise to meet the distinct wants and needs of software teams, as they negotiate a path through these tradeoffs, depending on their context. In this paper, we begin to uncover these nuances, and tell the story of open-source projects falling out of love with Travis, the earliest and most popular cloud-based CI system. Using logistic regression, we quantify the effects that open-source community factors and project technical factors have on the rate of Travis abandonment. We find that increased build complexity reduces the chances of abandonment, that larger projects abandon at higher rates, and that a project's dominant language has significant but varying effects. Finally, we find the surprising result that metrics of configuration attempts and knowledge dispersion in the project do not affect the rate of abandonment.  more » « less
Award ID(s):
1717415
NSF-PAR ID:
10063057
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the 15th International Conference on Mining Software Repositories
Page Range / eLocation ID:
165 to 169
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Sustainable Open Source Software (OSS) forms much of the fabric of our digital society, especially successful and sustainable ones. But many OSS projects do not become sustainable, resulting in abandonment and even risks for the world's digital infrastructure. Prior work has looked at the reasons for this mainly from two very different perspectives. In software engineering, the focus has been on understanding success and sustainability from the socio-technical perspective: the OSS programmers' day-to-day activities and the artifacts they create. In institutional analysis, on the other hand, emphasis has been on institutional designs (e.g., policies, rules, and norms) that structure project governance. Even though each is necessary for a comprehensive understanding of OSS projects, the connection and interaction between the two approaches have been barely explored.

    In this paper, we make the first effort toward understanding OSS project sustainability using a dual-view analysis, by combining institutional analysis with socio-technical systems analysis. In particular, we (i) use linguistic approaches to extract institutional rules and norms from OSS contributors' communications to represent the evolution of their governance systems, and (ii) construct socio-technical networks based on longitudinal collaboration records to represent each project's organizational structure. We combined the two methods and applied them to a dataset of developer digital traces from 253 nascent OSS projects within the Apache Software Foundation (ASF) incubator. We find that the socio-technical and institutional features relate to each other, and provide complimentary views into the progress of the ASF's OSS projects. Refining these combined analyses can help provide a more precise understanding of the synchronization between the evolution of institutional governance and organizational structure.

     
    more » « less
  2. null (Ed.)
    Team communication is essential for the development of modern software systems. For distributed software development teams, such as those found in many open source projects, this communication usually takes place using electronic tools. Among these, modern chat platforms such as Gitter are becoming the de facto choice for many software projects due to their advanced features geared towards software development and effective team communication. Gitter channels contain numerous messages exchanged by developers regarding the state of the project, issues and features of the system, team logistics, etc. These messages can contain important information to researchers studying open source software systems, developers new to a particular project and trying to get familiar with the software, etc. Therefore, uncovering what developers are communicating about through Gitter is an essential first step towards successfully understanding and leveraging this information. We present a new dataset, called GitterCom, which aims to enable research in this direction and represents the largest manually labeled and curated dataset of Gitter developer messages. The dataset is comprised of 10,000 messages collected from 10 Gitter communities associated with the development of open source software. Each message was manually annotated and verified by two of the authors, capturing the purpose of the communication expressed by the message. While the dataset has not yet been used in any publication, we discuss how it can enable interesting research opportunities. 
    more » « less
  3. Most students readily see the value of studying computing as a path to a good job and know that the application of computing can generate business value. At the same time, we now face a world where pervasive computing is enabling a misalignment between business value and social value. However, computing also holds the potential to drive positive social change and to serve the greater good. This talk will discuss how some of the recent revisions to the computer science major at Dickinson College elevate this potential. A thread emphasizing computing for the greater good that now runs through our courses engages students in Free and Open Source Software (FOSS) with explicit humanitarian goals (HFOSS). The missions of these HFOSS communities provide real examples of ways in which computing can be focused intentionally on creating a positive social impact. At the same time, learning and working with HFOSS tools, processes, artifacts and community members builds the hard and soft skills that students and employers want. Students are exposed to FOSS concepts and examples of HFOSS in our first course. A sequence of two ½ courses at the intermediate level familiarize students with FOSS communities and build their technical skills by engaging them in an authentic HFOSS project. This project is FarmData2, which we manage in collaboration with the Dickinson College organic farm. In a year-long senior capstone students research FOSS and HFOSS projects/communities that are of interest to them. They then form teams, choose projects and engage with their selected project communities “in the wild.” Details on some of the activities that students complete and examples of the types of contributions they have made both to FarmData2 and to the projects they have chosen in the capstone will be given. Analyses of pre/post course survey data and the types of projects selected in the capstone will be presented. These analyses suggest that (1) students gain an appreciation that they can use computing to contribute to the greater good, (2) that they become more likely to continue contributing to FOSS or HFOSS projects and (3) that engaging students in HFOSS holds potential for broadening participation in computing. 
    more » « less
  4. null (Ed.)
    In Open Source Software (OSS) projects, pre-built tools dominate DevOps-oriented pipelines. In practice, a multitude of configuration management, cloud-based continuous integration, and automated deployment tools exist, and often more than one for each task. Tools are adopted (and given up) by OSS projects regularly. Prior work has shown that some tool adoptions are preceded by discussions, and that tool adoptions can result in benefits to the project. But important questions remain: how do teams decide to adopt a tool? What is discussed before the adoption and for how long? And, what team characteristics are determinant of the adoption? In this paper, we employ a large-scale empirical study in order to characterize the team discussions and to discern the teamlevel determinants of tool adoption into OSS projects' development pipelines. Guided by theories of team and individual motivations and dynamics, we perform exploratory data analyses, do deep-dive case studies, and develop regression models to learn the determinants of adoption and discussion length, and the direction of their effect on the adoption. From data of commit and comment traces of large-scale GitHub projects, our models find that prior exposure to a tool and member involvement are positively associated with the tool adoption, while longer discussions and the number of newer team members associate negatively. These results can provide guidance beyond the technical appropriateness for the timeliness of tool adoptions in diverse programmer teams. Our data and code is available at https://github.com/lkyin/tool_adoptions. 
    more » « less
  5. Continuous Integration (CI) practices encourage developers to frequently integrate code into a shared repository. Each integration is validated by automatic build and testing such that errors are revealed as early as possible. When CI failures or integration errors are reported, existing techniques are insufficient to automatically locate the root causes for two reasons. First, a CI failure may be triggered by faults in source code and/or build scripts, while current approaches consider only source code. Second, a tentative integration can fail because of build failures and/or test failures, while existing tools focus on test failures only. This paper presents UniLoc, the first unified technique to localize faults in both source code and build scripts given a CI failure log, without assuming the failure’s location (source code or build scripts) and nature (a test failure or not). Adopting the information retrieval (IR) strategy, UniLoc locates buggy files by treating source code and build scripts as documents to search and by considering build logs as search queries. However, instead of naïvely applying an off-the-shelf IR technique to these software artifacts, for more accurate fault localization, UniLoc applies various domain-specific heuristics to optimize the search queries, search space, and ranking formulas. To evaluate UniLoc, we gathered 700 CI failure fixes in 72 open-source projects that are built with Gradle. UniLoc could effectively locate bugs with the average MRR (Mean Reciprocal Rank) value as 0.49, MAP (Mean Average Precision) value as 0.36, and NDCG (Normalized Discounted Cumulative Gain) value as 0.54. UniLoc outperformed the state-of-the-art IR-based tool BLUiR and Locus. UniLoc has the potential to help developers diagnose root causes for CI failures more accurately and efficiently. 
    more » « less