Boosting Coverage-Based Fault Localization via Graph-Based Representation Learning

Lou, Yiling; Zhu, Qihao; Dong, Jinhao; Li, Xia; Sun, Zeyu; Hao, Dan; Zhang, Lu; Zhang, Lingming

doi:10.1145/3468264.3468580

Coverage-based fault localization has been extensively studied in the literature due to its effectiveness and lightweightness for real-world systems. However, existing techniques often utilize coverage in an oversimplified way by abstracting detailed coverage into numbers of tests or boolean vectors, thus limiting their effectiveness in practice. In this work, we present a novel coverage-based fault localization technique, Grace, which fully utilizes detailed coverage information with graph-based representation learning. Our intuition is that coverage can be regarded as connective relationships between tests and program entities, which can be inherently and integrally represented by a graph structure: with tests and program entities as nodes, while with coverage and code structures as edges. Therefore, we first propose a novel graph-based representation to reserve all detailed coverage information and fine-grained code structures into one graph. Then we leverage Gated Graph Neural Network to learn valuable features from the graph-based coverage representation and rank program entities in a listwise way. Our evaluation on the widely used benchmark Defects4J (V1.2.0) shows that Grace significantly outperforms state-of-the-art coverage-based fault localization: Grace localizes 195 bugs within Top-1 whereas the best compared technique can at most localize 166 bugs within Top-1. We further investigate the impact of each Grace component and find that they all positively contribute to Grace. In addition, our results also demonstrate that Grace has learnt essential features from coverage, which are complementary to various information used in existing learning-based fault localization. Finally, we evaluate Grace in the cross-project prediction scenario on extra 226 bugs from Defects4J (V2.0.0), and find that Grace consistently outperforms state-of-the-art coverage-based techniques.

More Like this