World of code: an infrastructure for mining the universe of open source VCS data

Ma, Yuxing; Bogart, Chris; Amreen, Sadika; Zaretzki, Russell; Mockus, Audris

doi:10.1109/MSR.2019.00031

Citation Details

Open source software (OSS) is essential for modern society and, while substantial research has been done on individual (typically central) projects, only a limited understanding of the periphery of the entire OSS ecosystem exists. For example, how are tens of millions of projects in the periphery interconnected through technical dependencies, code sharing, or knowledge flows? To answer such questions we a) create a very large and frequently updated collection of version control data for FLOSS projects named World of Code (WoC) and b) provide basic tools for conducting research that depends on measuring interdependencies among all FLOSS projects. Our current WoC implementation is capable of being updated on a monthly basis and contains over 12B git objects. To evaluate its research potential and to create vignettes for its usage, we employ WoC in conducting several research tasks. In particular, we find that it is capable of supporting trend evaluation, ecosystem measurement, and the determination of package usage. We expect WoC to spur investigation into global properties of OSS development leading to increased resiliency of the entire OSS ecosystem. Our infrastructure facilitates the discovery of key technical dependencies, code flow, and social networks that provide the basis to determine the structure and evolution of the relationships that drive FLOSS activities and innovation.

Award ID(s): 1633437

NSF-PAR ID: 10106629

Author(s) / Creator(s): Ma, Yuxing; Bogart, Chris; Amreen, Sadika; Zaretzki, Russell; Mockus, Audris

Date Published: 2019-07-26

Journal Name: MSR '19 Proceedings of the 16th International Conference on Mining Software Repositories

Page Range / eLocation ID: 143-154

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1109/MSR.2019.00031
