skip to main content


Title: The Secret Life of Hackathon Code Where does it come from and where does it go?
Background: Hackathons have become popular events for teams to collaborate on projects and develop software prototypes. Most existing research focuses on activities during an event with limited attention to the evolution of the code brought to or created during a hackathon. Aim: We aim to understand the evolution of hackathon-related code, specifically, how much hackathon teams rely on pre-existing code or how much new code they develop during a hackathon. Moreover, we aim to understand if and where that code gets reused, and what factors affect reuse. Method: We collected information about 22,183 hackathon projects from DEVPOST– a hackathon database – and obtained related code (blobs), authors, and project characteristics from the WORLD OF CODE. We investigated if code blobs in hackathon projects were created before, during, or after an event by identifying the original blob creation date and author, and also checked if the original author was a hackathon project member. We tracked code reuse by first identifying all commits containing blobs created during an event before determining all projects that contain those commits. Result: While only approximately 9.14% of the code blobs are created during hackathons, this amount is still significant considering time and member constraints of such events. Approximately a third of these code blobs get reused in other projects. The number of associated technologies and the number of participants in a project increase reuse probability. Conclusion: Our study demonstrates to what extent pre-existing code is used and new code is created during a hackathon and how much of it is reused elsewhere afterwards. Our findings help to better understand code reuse as a phenomenon and the role of hackathons in this context and can serve as a starting point for further studies in this area.  more » « less
Award ID(s):
1901311 1925520 1633437 1901102 1925615
NSF-PAR ID:
10284447
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Mining Software Repositories
Page Range / eLocation ID:
68 to 79
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Context

    Hackathons have become popular events for teams to collaborate on projects and develop software prototypes. Most existing research focuses on activities during an event with limited attention to the evolution of the hackathon code.

    Objective

    We aim to understand the evolution of code used in and created during hackathon events, with a particular focus on the code blobs, specifically, how frequently hackathon teams reuse pre-existing code, how much new code they develop, if that code gets reused afterwards, and what factors affect reuse.

    Method

    We collected information about 22,183 hackathon projects from Devpost and obtained related code blobs, authors, project characteristics, original author, code creation time, language, and size information from World of Code. We tracked the reuse of code blobs by identifying all commits containing blobs created during hackathons and identifying all projects that contain those commits. We also conducted a series of surveys in order to gain a deeper understanding of hackathon code evolution that we sent out to hackathon participants whose code was reused, whose code was not reused, and developers who reused some hackathon code.

    Result

    9.14% of the code blobs in hackathon repositories and 8% of the lines of code (LOC) are created during hackathons and around a third of the hackathon code gets reused in other projects by both blob count and LOC. The number of associated technologies and the number of participants in hackathons increase reuse probability.

    Conclusion

    The results of our study demonstrates hackathons are not always “one-off” events as the common knowledge dictates and it can serve as a starting point for further studies in this area.

     
    more » « less
  2. null (Ed.)
    Time-bounded events such as hackathons have become a global phenomenon. Scientific communities in particular show growing interest in organizing them to attract newcomers and develop technical artifacts to expand their code base. Current hackathon approaches presume that participants have sufficient expertise to work on projects on their own. They only provide occasional support by domain experts serving as mentors which might not be sufficient for newcomers. Drawing from work on workplace and educational mentoring, we developed and evaluated an approach where each hackathon team is supported by a community member who serves in a mentor role that goes beyond providing occasional support. Evaluating this approach, we found that teams who took ownership of their projects, set achievable goals early while building social ties with their mentor and receiving learning-oriented support reported positive perceptions related to their project and an increased interest in the scientific community that organized the hackathon. Our work thus contributes to our understanding of mentoring in hackathons, an area which has not been extensively studied. It also proposes a feasible approach for scientific communities to attract and integrate newcomers which is crucial for their long-term survival. 
    more » « less
  3. Hackathons and similar time-bounded events have become a popular form of collaboration in various domains. They are commonly organized as in-person events during which teams engage in intense collaboration over a short period of time to complete a project that is of interest to them. Most research to date has thus consequently focused on studying how teams collaborate in a co-located setting, pointing towards the advantages of radical co-location. The global pandemic of 2020, however, has led to many hackathons moving online, which challenges our current understanding of how they function. In this paper, we address this gap by presenting findings from a multiple-case study of 10 hackathon teams that participated in 4 hackathon events across two continents. By analyzing the collected data, we found that teams merged synchronous and asynchronous means of communication to maintain a common understanding of work progress as well as to maintain awareness of each other's tasks. Task division was self-assigned based on individual skills or interests, while leaders emerged from different strategies (e.g., participant experience, the responsibility of registering the team in an event). Some of the affordances of in-person hackathons, such as the radical co-location of team members, could be partially reproduced in teams that kept open synchronous communication channels while working (i.e., shared audio territories), in a sort of "radical virtual co-location". However, others, such as interactions with other teams, easy access to mentors, and networking with other participants, decreased. In addition, the technical constraints of the different communication tools and platforms brought technical problems and were overwhelming to participants. Our work contributes to understanding the virtual collaboration of small teams in the context of online hackathons and how technologies and event structures proposed by organizers imply this collaboration. 
    more » « less
  4. Background: Some developer activity traditionally performed manually, such as making code commits, opening, managing, or closing issues is increasingly subject to automation in many OSS projects. Specifically, such activity is often performed by tools that react to events or run at specific times. We refer to such automation tools as bots and, in many software mining scenarios related to developer productivity or code quality it is desirable to identify bots in order to separate their actions from actions of individuals. Aim: Find an automated way of identifying bots and code committed by these bots, and to characterize the types of bots based on their activity patterns. Method and Result: We propose BIMAN, a systematic approach to detect bots using author names, commit messages, files modified by the commit, and projects associated with the ommits. For our test data, the value for AUC-ROC was 0.9. We also characterized these bots based on the time patterns of their code commits and the types of files modified, and found that they primarily work with documentation files and web pages, and these files are most prevalent in HTML and JavaScript ecosystems. We have compiled a shareable dataset containing detailed information about 461 bots we found (all of whom have more than 1000 commits) and 13,762,430 commits they created. 
    more » « less
  5. For two days in February 2018, 17 cybersecurity ed- ucators and professionals from government and in- dustry met in a “hackathon” to refine existing draft multiple-choice test items, and to create new ones, for a Cybersecurity Concept Inventory (CCI) and Cyber- security Curriculum Assessment (CCA) being devel- oped as part of the Cybersecurity Assessment Tools (CATS) Project. We report on the results of the CATS Hackathon, discussing the methods we used to develop test items, highlighting the evolution of a sample test item through this process, and offer- ing suggestions to others who may wish to organize similar hackathons. Each test item embodies a scenario, question stem, and five answer choices. During the Hackathon, par- ticipants organized into teams to (1) Generate new scenarios and question stems, (2) Extend CCI items into CCA items, and generate new answer choices for new scenarios and stems, and (3) Review and refine draft CCA test items. The CATS Project provides rigorous evidence- based instruments for assessing and evaluating educa- tional practices; these instruments can help identify pedagogies and content that are effective in teach- ing cybersecurity. The CCI measures how well stu- dents understand basic concepts in cybersecurity— especially adversarial thinking—after a first course in the field. The CCA measures how well students understand core concepts after completing a full cy- bersecurity curriculum. 
    more » « less