DIRECT: A Transformer-based Model for Decompiled Identifier Renaming

Nitin, Vikram; Saieva, Anthony; Ray, Baishakhi; Kaiser, Gail

Citation Details

Decompiling binary executables to high-level code is an important step in reverse engineering scenarios, such as malware analysis and legacy code maintenance. However, the generated high-level code is difficult to understand since the original variable names are lost. In this paper, we leverage transformer models to reconstruct the original variable names from decompiled code. Inherent differences between code and natural language present certain challenges in applying conventional transformer-based architectures to variable name recovery. We propose DIRECT, a novel transformer-based architecture customized specifically for the task at hand. We evaluate our model on a dataset of decompiled functions and find that DIRECT outperforms the previous state-of-the-art model by up to 20%. We also present ablation studies evaluating the impact of each of our modifications. We make the source code of DIRECT available to encourage reproducible research.

Award ID(s): 1815494; 1563555

PAR ID: 10281285

Author(s) / Creator(s): Nitin, Vikram; Saieva, Anthony; Ray, Baishakhi; Kaiser, Gail

Date Published: 2021-01-01

Journal Name: 1st Workshop on Natural Language Processing for Programming (NLP4Prog)

Page Range / eLocation ID: 48 to 57

Format(s): Medium: X

Sponsoring Org: National Science Foundation
