Search for: All records

Award ID contains: 1633437

« Prev Next »

Total Resources

30

Resource Type
Conference Paper

22

Conference Proceeding

0

Dataset

0

Journal Article

8

Workshop Report

0

Availability
Full Text / Resource Available

30

Citation Only

0

Save Results
Excel (limit 2000)
CSV (limit 5000)
XML (limit 5000)

Have feedback or suggestions for a way to improve these results?
!

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Companies’ Participation in OSS Development–An Empirical Study of OpenStack

https://doi.org/10.1109/TSE.2019.2946156

Zhang, Yuxia ; Zhou, Minghui ; Mockus, Audris ; Jin, Zhi ( October 2021 , IEEE Transactions on Software Engineering)
Representation of Developer Expertise in Open Source Software

https://doi.org/10.1109/ICSE43902.2021.00094

Dey, Tapajit ; Karnauch, Andrey ; Mockus, Audris ( June 2021 , 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE))
null (Ed.)
Full Text Available
An Exploratory Study of Project Activity Changepoints in Open Source Software Evolution

https://doi.org/10.1109/MSR52588.2021.00088

Walden, James ; Burgin, Noah ; Kaur, Kuljit ( May 2021 , 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR))
null (Ed.)
To explore the prevalence of abrupt changes (changepoints) in open source project activity, we assembled a dataset of 8,919 projects from the World of Code. Projects were selected based on age, number of commits, and number of authors. Using the nonparametric PELT algorithm, we identified changepoints in project activity time series, finding that more than 90% of projects had between one and six changepoints. Increases and decreases in project activity occurred with roughly equal frequency. While most changes are relatively small, on the order of a few authors or few dozen commits per month, there were long tails of much larger project activity changes. In future work, we plan to focus on larger changes to search for common open source lifecycle patterns as well as common responses to external events.
more » « less
Full Text Available
The Secret Life of Hackathon Code Where does it come from and where does it go?

https://doi.org/10.1109/MSR52588.2021.00020

Imam, Ahmed ; Dey, Tapajit ; Nolte, Alexander ; Mockus, Audris ; Herbsleb, James D. ( May 2021 , Mining Software Repositories)
null (Ed.)
Background: Hackathons have become popular events for teams to collaborate on projects and develop software prototypes. Most existing research focuses on activities during an event with limited attention to the evolution of the code brought to or created during a hackathon. Aim: We aim to understand the evolution of hackathon-related code, specifically, how much hackathon teams rely on pre-existing code or how much new code they develop during a hackathon. Moreover, we aim to understand if and where that code gets reused, and what factors affect reuse. Method: We collected information about 22,183 hackathon projects from DEVPOST– a hackathon database – and obtained related code (blobs), authors, and project characteristics from the WORLD OF CODE. We investigated if code blobs in hackathon projects were created before, during, or after an event by identifying the original blob creation date and author, and also checked if the original author was a hackathon project member. We tracked code reuse by first identifying all commits containing blobs created during an event before determining all projects that contain those commits. Result: While only approximately 9.14% of the code blobs are created during hackathons, this amount is still significant considering time and member constraints of such events. Approximately a third of these code blobs get reused in other projects. The number of associated technologies and the number of participants in a project increase reuse probability. Conclusion: Our study demonstrates to what extent pre-existing code is used and new code is created during a hackathon and how much of it is reused elsewhere afterwards. Our findings help to better understand code reuse as a phenomenon and the role of hackathons in this context and can serve as a starting point for further studies in this area.
more » « less
Full Text Available
Building the Collaboration Graph of Open-Source Software Ecosystem

https://doi.org/10.1109/MSR52588.2021.00086

Lyulina, Elena ; Jahanshahi, Mahmoud ( May 2021 , 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR))
null (Ed.)
The Open-Source Software community has become the center of attention for many researchers, who are investigating various aspects of collaboration in this extremely large ecosystem. Due to its size, it is difficult to grasp whether or not it has structure, and if so, what it may be. Our hackathon project aims to facilitate the understanding of the developer collaboration structure and relationships among projects based on the bi-graph of what projects developers contribute to by providing an interactive collaboration graph of this ecosystem, using the data obtained from World of Code [1] infrastructure. Our attempts to visualize the entirety of projects and developers were stymied by the inability of the layout and visualization tools to process the exceedingly large scale of the full graph. We used WoC to filter the nodes (developers and projects) and edges (developer contributions to a project) to reduce the scale of the graph that made it amenable to an interactive visualization and published the resulting visualizations. We plan to apply hierarchical approaches to be able to incorporate the entire data in the interactive visualizations and also to evaluate the utility of such visualizations for several tasks.
more » « less
Full Text Available
World of code: enabling a research workflow for mining and analyzing the universe of open source VCS data

https://doi.org/10.1007/s10664-020-09905-9

Ma, Yuxing ; Dey, Tapajit ; Bogart, Chris ; Amreen, Sadika ; Valiev, Marat ; Tutko, Adam ; Kennard, David ; Zaretzki, Russell ; Mockus, Audris ( March 2021 , Empirical Software Engineering)
null (Ed.)
Full Text Available
Coordinating interdependencies in an open source software project: A replication of lindberg, et al.

https://doi.org/10.17705/1atrr.00057

Bradley, R. ; A., Ma ; Zaretzki, R. ; Bichescu, B. ( October 2020 , AIS transactions on replication research)
null (Ed.)
Full Text Available
Effect of technical and social factors on pull request quality for the npm ecosystem

Dey, Tapajit ; Mockus, Audris ( October 2020 , Empirical software engineering)

ackground: Pull Request (PR) Integrators often face challenges in terms of multiple concurrent PRs, so the ability to gauge which of the PRs will get accepted can help them balance their workload. PR creators would benefit from knowing if certain characteristics of their PRs may increase the chances of acceptance. Aim: We modeled the probability that a PR will be accepted within a month after creation using a Random Forest model utilizing 50 predictors representing properties of the author, PR, and the project to which PR is submitted. Method: 483,988 PRs from 4218 popular NPM packages were analysed and we selected a subset of 14 predictors sufficient for a tuned Random Forest model to reach high accuracy. Result: An AUC-ROC value of 0.95 was achieved predicting PR acceptance. The model excluding PR properties that change after submission gave an AUC-ROC value of 0.89. We tested the utility of our model in practical scenarios by training it with historical data for the NPM package \textit{bootstrap} and predicting if the PRs submitted in future will be accepted. This gave us an AUC-ROC value of 0.94 with all 14 predictors, and 0.77 excluding PR properties that change after its creation. Conclusion: PR integrators can use our model for a highly accurate assessment of the quality of the open PRs and PR creators may benefit from the model by understanding which characteristics of their PRs may be undesirable from the integrators' perspective. The model can be implemented as a tool, which we plan to do as a future work
more » « less
Full Text Available
A Complete Set of Related Git Repositories Identified viaCommunity Detection Approaches Based on Shared Commits

Mockus, A ; Spinellis, D. ; Kotti, Z ; Dusing, G ( June 2020 , IEEE International Working Conference on Mining Software Repositories)

In order to understand the state and evolution of the entirety of open source software we need to get a handle on the set of distinct software projects. Most of open source projects presently utilize Git, which is a distributed version control system allowing easy creation of clones and resulting in numerous repositories that are almost entirely based on some parent repository from which they were cloned. Git commits are unlikely to get produce and represent a way to group cloned repositories. We use World of Code infrastructure containing approximately 2B commits and 100M repositories to create and share such a map. We discover that the largest group contains almost 14M repositories most of which are unrelated to each other. As it turns out, the developers can push git object to an arbitrary repository or pull objects from unrelated repositories, thus linking unrelated repositories. To address this, we apply Louvain community detection algorithm to this very large graph consisting of links between commits and projects. The approach successfully reduces the size of the megacluster with the largest group of highly interconnected projects containing under 400K repositories. We expect that the resulting map of related projects as well as tools and methods to handle the very large graph will serve as a reference set for mining software projects and other applications. Further work is needed to determine different types of relationships among projects induced by shared commits and other relationships, for example, by shared source code or similar filenames.
more » « less
Full Text Available
Detecting and Characterizing Bots that Commit Code

https://doi.org/10.1145/3379597.3387478

Dey, Tapajit ; Mousavi, Sara ; Ponce, Eduardo ; Fry, Tanner ; Vasilescu, Bogdan ; Filippova, Anna ; Mockus, Audris ( June 2020 , IEEE International Working Conference on Mining Software Repositories)

Background: Some developer activity traditionally performed manually, such as making code commits, opening, managing, or closing issues is increasingly subject to automation in many OSS projects. Specifically, such activity is often performed by tools that react to events or run at specific times. We refer to such automation tools as bots and, in many software mining scenarios related to developer productivity or code quality it is desirable to identify bots in order to separate their actions from actions of individuals. Aim: Find an automated way of identifying bots and code committed by these bots, and to characterize the types of bots based on their activity patterns. Method and Result: We propose BIMAN, a systematic approach to detect bots using author names, commit messages, files modified by the commit, and projects associated with the ommits. For our test data, the value for AUC-ROC was 0.9. We also characterized these bots based on the time patterns of their code commits and the types of files modified, and found that they primarily work with documentation files and web pages, and these files are most prevalent in HTML and JavaScript ecosystems. We have compiled a shareable dataset containing detailed information about 461 bots we found (all of whom have more than 1000 commits) and 13,762,430 commits they created.
more » « less
Full Text Available

« Prev Next »