Title: How Has Forking Changed in the Last 20 Years? A Study of Hard Forks on GitHub
The notion of forking has changed with the rise of distributed version control systems and social coding environments like GitHub. Traditionally, forking refers to splitting off an independent development branch (which we call hard forks); research on hard forks, conducted mostly in pre-GitHub days, showed that hard forks were often viewed critically because they may fragment a community. Today, in social coding environments, open-source developers are encouraged to fork a project in order to contribute to the community (which we call social forks), which may have also influenced perceptions and practices around hard forks. To revisit hard forks, we identify, study, and classify 15,306 hard forks on GitHub and interview 18 owners of hard forks or forked repositories. We find that, among other things, hard forks often evolve out of social forks rather than being planned deliberately, and that perceptions of hard forks have indeed changed dramatically: they are now often seen as a positive, non-competitive alternative to the original project.
Journal Name:
Proceedings of the 42nd International Conference on Software Engineering (ICSE)
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Forking and pull requests have been widely used in open-source communities as a uniform development and contribution mechanism, giving developers the flexibility to modify their own fork without affecting others before attempting to contribute back. However, not all projects use forks efficiently; many experience lost and duplicate contributions and fragmented communities. In this paper, we explore how open-source projects on GitHub differ with regard to forking inefficiencies. First, we observed that different communities experience these inefficiencies to widely different degrees and interviewed practitioners to understand why. Then, using multiple regression modeling, we analyzed which context factors correlate with fewer inefficiencies. We found that better modularity and centralized management are associated with more contributions and a higher fraction of accepted pull requests, suggesting specific best practices that project maintainers can adopt to reduce forking-related inefficiencies in their communities.
  2. While open-source software has become ubiquitous, its sustainability is in question: without a constant supply of contributor effort, open-source projects are at risk. While prior work has extensively studied the motivations of open-source contributors in general, relatively little is known about how people choose which project to contribute to, beyond personal interest. This question is especially relevant in transparent social coding environments like GitHub, where visible cues on personal profile and repository pages, known as signals, are known to impact impression formation and decision making. In this paper, we report on a mixed-methods empirical study of the signals that influence contributors' decisions to join a GitHub project. We first interviewed 15 GitHub contributors about their project evaluation processes and identified the important signals they used, including the structure of the README and the amount of recent activity. Then, we quantitatively tested the impact of each signal using data from 9,977 GitHub projects. We reveal that many important pieces of information lack easily observable signals, and that some signals may be both attractive and unattractive. Our findings have direct implications for open-source maintainers and the design of social coding environments, e.g., features to be added to facilitate a better project search experience.
  3. Transparent environments and social-coding platforms such as GitHub help developers stay abreast of changes during the development and maintenance phases of a project. In particular, notification feeds can help developers learn about relevant changes in other projects. Unfortunately, transparent environments can quickly overwhelm developers with too many notifications, such that they lose the important ones in a sea of noise. Complementing existing prioritization and filtering strategies based on binary compatibility and code ownership, we develop an anomaly detection mechanism to identify unusual commits in a repository, which stand out with respect to other changes in the same repository or by the same developer. Among others, we detect exceptionally large commits, commits at unusual times, and commits touching rarely changed file types given the characteristics of a particular repository or developer. We automatically flag unusual commits on GitHub through a browser plug-in. In an interactive survey with 173 active GitHub users, rating commits in a project of their interest, we found that, although our unusualness score is only a weak predictor of whether developers want to be notified about a commit, information about unusual characteristics of a commit changes how developers regard commits. Our anomaly detection mechanism is a building block for scaling transparent environments.
  4. Open source software projects often rely on code contributions from a wide variety of developers to extend the capabilities of their software. Project members evaluate these contributions and often engage in extended discussions to decide whether to integrate changes. These discussions have important implications for project management regarding new contributors and evolution of project requirements and direction. We present a study of how developers in open work environments evaluate and discuss pull requests, a primary method of contribution in GitHub, analyzing a sample of extended discussions around pull requests and interviews with GitHub developers. We found that developers raised issues around contributions over both the appropriateness of the problem that the submitter attempted to solve and the correctness of the implemented solution. Both core project members and third-party stakeholders discussed and sometimes implemented alternative solutions to address these issues. Different stakeholders also influenced the outcome of the evaluation by eliciting support from different communities such as dependent projects or even companies. We also found that evaluation outcomes may be more complex than simply acceptance or rejection. In some cases, although a submitter's contribution was rejected, the core team fulfilled the submitter's technical goals by implementing an alternative solution. We found that the level of a submitter's prior interaction on a project changed how politely developers discussed the contribution and the nature of proposed alternative solutions. 
  5. In 2017, the report Undergraduate Research Experiences for STEM Students from the National Academies of Sciences, Engineering, and Medicine (NASEM) invited research programs to develop experiences that extend beyond disciplinary knowledge and skills education. This call to action asks programs to include social responsibility learning goals in ethical development, cultural issues in research, and the promotion of inclusive learning environments. Moreover, the Accreditation Board for Engineering and Technology (ABET), the National Academy of Engineering (NAE), and the National Science Foundation (NSF) all agree that social responsibility is a significant component of an engineer's professional formation and must be a guiding force in their education. Social responsibility involves the ethical obligation engineers have to society and the environment, including responsible conduct of research (RCR), ethical decision-making, human safety, sustainability, pro bono work, social justice, and diversity. For this work, we explored the views of social responsibility in engineering students that could provide insight into developing formal and informal educational activities for future summer programs. In this exploratory multi-methods study, we investigated the following research question: What views of social responsibility are important for engineering students conducting scientific research in an NSF Research Experiences for Undergraduates (REU)? The REU Site selected for this study was a college of engineering located at a major, public, comprehensive, land-grant research university. The Views of Social Responsibility of Scientists and Engineers (VSRoSE) was used to guide our research design.
This validated instrument considers the following major social responsibility elements: 1) Consideration of societal consequences, 2) Protection of human welfare and safety, 3) Promotion of environmental sustainability, 4) Efforts to minimize risks, 5) Communication with the public, and 6) Service and community engagement. Data collection was conducted at the end of the students' 10-week-long experience in Summer 2022 using Qualtrics. REU students were invited to complete an IRB-approved questionnaire that collected demographic data and included the validated VSRoSE survey and open-ended questions. Open-ended questions were used to explore what experiences have influenced positive student views of social responsibility and to provide rich information beyond the six elements of the VSRoSE instrument. The quantitative data from the VSRoSE is analyzed using SPSS. The qualitative data is analyzed by the research team using an inductive coding approach. In this coding process, the researchers derive codes from the data, allowing the narrative or theory to emerge from the raw data itself, which is well suited to exploratory research. The results from this exploratory study will help to strategically initiate a formal and informal research education curriculum at the selected university. In addition, the results may serve as a way for REU administrators and faculty to create metrics of impact on their research activities regarding social responsibility. Finally, this work intends to provoke the ethics and research community to have a deeper conversation about the needs and strategies to educate this unique population of students.