Replication can improve prior results: a GitHub study of pull request acceptance

Chen, Di; Stolee, Kathryn T.; Menzies, Tim

doi:10.1109/ICPC.2019.00037

Crowdsourcing and data mining can be used to effectively reduce the effort associated with the partial replication and enhancement of qualitative studies. For example, in a primary study, other researchers explored factors influencing the fate of GitHub pull requests using an extensive qualitative analysis of 20 pull requests. Guided by their findings, we mapped some of their qualitative insights onto quantitative questions. To determine how well their findings generalize, we collected much more data (170 additional pull requests from 142 GitHub projects). Using crowdsourcing, that data was augmented with subjective qualitative human opinions about how pull requests extended the original issue. The crowd’s answers were then combined with quantitative features and, using data mining, used to build a predictor for whether code would be merged. That predictor was far more accurate than the one built from the primary study’s qualitative factors (F1=90 vs 68%), illustrating the value of a mixed-methods approach and replication to improve prior results. To test the generality of this approach, the next step in future work is to conduct other studies that extend qualitative studies with crowdsourcing and data mining.
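The F1 figures quoted in the abstract are harmonic means of precision and recall. As a minimal sketch of how such a score is computed for a merged/not-merged predictor, the following uses a hypothetical helper `f1_score` and made-up confusion counts; none of the numbers come from the study, and the study's own tooling is not described in this record.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    The counts are illustrative inputs, e.g. pull requests correctly
    predicted as merged (true_pos), wrongly predicted as merged
    (false_pos), and merged but missed by the predictor (false_neg).
    """
    if true_pos == 0:
        # No correct positive predictions: F1 is defined as 0 here
        # to avoid dividing by zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    print(f"F1 = {f1_score(true_pos=9, false_pos=1, false_neg=1):.2f}")
```

Because F1 rewards a predictor only when precision and recall are both high, it is a common choice when the two classes (merged vs. not merged) are unevenly represented.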

Award ID(s): 1714699

PAR ID: 10100321

Author(s) / Creator(s): Chen, Di; Stolee, Kathryn T.; Menzies, Tim

Date Published: 2019-01-01

Journal Name: Proceedings of the 27th International Conference on Program Comprehension

Page Range / eLocation ID: 179-190

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0

Conference Paper: https://doi.org/10.1109/ICPC.2019.00037
