TECCD: A Tree Embedding Approach for Code Clone Detection

Gao, Yi; Wang, Zan; Liu, Shuang; Yang, Lin; Sang, Wei; Cai, Yuanfang

doi:10.1109/ICSME.2019.00025

Clone detection techniques have been explored for decades. Recently, deep learning techniques have been adopted to improve code representation capability and to advance the state of the art in code clone detection. These approaches usually require transforming the AST into a binary tree to incorporate syntactic information, which introduces overhead. Moreover, they rely on term embedding, which requires large training datasets. In this paper, we introduce a tree embedding technique for clone detection. Our approach first performs tree embedding to obtain a node vector for each intermediate node in the AST, capturing the structural information of the AST. We then compose a tree vector from its constituent node vectors using a lightweight method. Lastly, the Euclidean distances between tree vectors are measured to determine code clones. We implement our approach in a tool called TECCD and evaluate it on BigCloneBench (BCB) and seven other large-scale Java projects. The results show that our approach achieves good accuracy and recall and outperforms existing approaches.
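The pipeline described above (node vectors for intermediate AST nodes, composed into a tree vector, compared by Euclidean distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the TECCD implementation: the node vectors come from a hypothetical lookup table keyed by AST node label rather than being learned by tree embedding, plain averaging stands in for the paper's unspecified "lightweight" composition method, and the clone threshold is a hypothetical parameter.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical AST node representation: (label, [children]).
# A node with an empty child list is a leaf.
Node = Tuple[str, list]


def intermediate_labels(node: Node) -> List[str]:
    """Collect the labels of intermediate (non-leaf) nodes in pre-order."""
    label, children = node
    if not children:
        return []
    labels = [label]
    for child in children:
        labels.extend(intermediate_labels(child))
    return labels


def tree_vector(node: Node, node_vecs: Dict[str, List[float]]) -> List[float]:
    """Compose a tree vector from the node vectors of its intermediate nodes.

    ASSUMPTION: averaging is used as a stand-in for the paper's
    lightweight composition method; node_vecs is a hypothetical
    label -> vector table instead of learned tree embeddings.
    """
    labels = intermediate_labels(node)
    if not labels:
        raise ValueError("tree has no intermediate nodes")
    dim = len(node_vecs[labels[0]])
    total = [0.0] * dim
    for label in labels:
        for i, x in enumerate(node_vecs[label]):
            total[i] += x
    return [x / len(labels) for x in total]


def is_clone(t1: Node, t2: Node, node_vecs: Dict[str, List[float]],
             threshold: float) -> bool:
    """Report a clone pair when the Euclidean distance between the two
    tree vectors is at most a (hypothetical) threshold."""
    return math.dist(tree_vector(t1, node_vecs),
                     tree_vector(t2, node_vecs)) <= threshold


# Toy example with made-up 2-dimensional node vectors.
node_vecs = {"MethodDecl": [1.0, 0.0], "Block": [0.0, 1.0], "IfStmt": [1.0, 1.0]}
t1 = ("MethodDecl", [("Name", []),
                     ("Block", [("IfStmt", [("Cond", []), ("Return", [])])])])
t2 = ("MethodDecl", [("Name", []), ("Block", [("Return", [])])])
t3 = ("IfStmt", [("Cond", []), ("Return", [])])

print(is_clone(t1, t2, node_vecs, threshold=0.3))
print(is_clone(t2, t3, node_vecs, threshold=0.3))
```

With these toy vectors, `t1` and `t2` are about 0.24 apart and are reported as clones at a threshold of 0.3, while `t2` and `t3` are about 0.71 apart and are not. Averaging keeps the tree vector in the same space as the node vectors regardless of tree size, which is one reason a cheap composition can work for trees of different sizes.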

Award ID(s): 1816594

PAR ID: 10194570

Author(s) / Creator(s):: Gao, Yi; Wang, Zan; Liu, Shuang; Yang, Lin; Sang, Wei; Cai, Yuanfang

Date Published: 2019-09-01

Conference: IEEE International Conference on Software Maintenance and Evolution (ICSME)

Page Range: 145-156

Format(s):: Medium: X

Sponsoring Org: National Science Foundation