SANN: Programming Code Representation Using Attention Neural Network with Optimized Subtree Extraction

Hoq, Muntasir; Chilla, Sushanth Reddy; Ahmadi Ranjbar, Melika; Brusilovsky, Peter; Akram, Bita

doi:10.1145/3583780.3615047

Citation Details

Automated analysis of programming data using code representation methods offers valuable services for programmers, from code completion to clone detection to bug detection. Recent studies show the effectiveness of Abstract Syntax Trees (AST), pre-trained Transformer-based models, and graph-based embeddings in programming code representation. However, pre-trained large language models lack interpretability, while other embedding-based approaches struggle with extracting important information from large ASTs. This study proposes a novel Subtree-based Attention Neural Network (SANN) to address these gaps by integrating different components: an optimized sequential subtree extraction process using Genetic algorithm optimization, a two-way embedding approach, and an attention network. We investigate the effectiveness of SANN by applying it to two different tasks: program correctness prediction and algorithm detection on two educational datasets containing both small and large-scale code snippets written in Java and C, respectively. The experimental results show SANN's competitive performance against baseline models from the literature, including code2vec, ASTNN, TBCNN, CodeBERT, GPT-2, and MVG, regarding accurate predictive power. Finally, a case study is presented to show the interpretability of our model prediction and its application for an important human-centered computing application, student modeling. Our results indicate the effectiveness of the SANN model in capturing important syntactic and semantic information from students' code, allowing the construction of accurate student models, which serve as the foundation for generating adaptive instructional support such as individualized hints and feedback.
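The abstract names an attention network over subtree embeddings but this record does not say how it is computed. As a rough illustration only, the sketch below assumes a code2vec-style soft attention: each subtree vector is scored against a learned attention vector, the scores are normalized with a softmax, and the code vector is the weighted sum. The function name `attention_pool`, the toy vectors, and the scoring rule are all hypothetical and are not taken from the SANN paper.

```python
import math
from typing import List, Tuple


def attention_pool(
    subtree_vectors: List[List[float]],
    attention_vector: List[float],
) -> Tuple[List[float], List[float]]:
    """Soft-attention pooling (assumed, code2vec-style; not the SANN implementation).

    Returns (weights, pooled), where weights[i] is the normalized attention
    on subtree i and pooled is the weighted sum of the subtree vectors.
    """
    if not subtree_vectors:
        raise ValueError("need at least one subtree vector")

    # Score each subtree by its dot product with the attention vector.
    scores = [
        sum(x * a for x, a in zip(vec, attention_vector))
        for vec in subtree_vectors
    ]

    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of subtree vectors gives a single code vector.
    dim = len(attention_vector)
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, subtree_vectors))
        for d in range(dim)
    ]
    return weights, pooled


if __name__ == "__main__":
    # Toy 2-dimensional embeddings for three hypothetical subtrees.
    vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, code_vector = attention_pool(vectors, [1.0, 1.0])
    print(weights, code_vector)
```

Under this assumption, the per-subtree weights are what one would inspect for interpretability: a larger weight means that subtree contributed more to the pooled code vector.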

Award ID(s): 2213789

PAR ID: 10518256

Author(s) / Creator(s): Hoq, Muntasir; Chilla, Sushanth Reddy; Ahmadi Ranjbar, Melika; Brusilovsky, Peter; Akram, Bita

Publisher / Repository: ACM

Date Published: 2023-10-21

Journal Name: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

ISBN: 9798400701245

Page Range / eLocation ID: 783 to 792

Subject(s) / Keyword(s): program analysis; code representation; static analysis; algorithm detection; program correctness prediction

Format(s): Medium: X

Location: Birmingham, United Kingdom

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1145/3583780.3615047