DESCG: data encoding scheme classification with GNN in binary analysis

Dai, Xushu; Luo, Nanqing; Wang, Haizhou; Wang, Zhilong; Cao, Chen; Liu, Peng

doi:10.1007/s10515-025-00538-0

Citation Details

This content will become publicly available on July 18, 2026

DESCG: data encoding scheme classification with GNN in binary analysis

Abstract Binary analysis, the process of examining software without its source code, plays a crucial role in understanding program behavior, e.g., evaluating the security properties of commercial software, and analyzing malware. One challenging aspect of this process is to classify data encoding schemes, such as encryption and compression, due to the absence of high-level semantic information. Existing approaches either rely on code similarity, which only works for known schemes, or heuristic rules, which lack scalability. In this paper, we propose DESCG, a novel deep learning-based method for automatically classifying four widely employed kinds of data encoding schemes in binary programs: encryption, compression, decompression, and hashing. Our approach leverages dynamic analysis to extract execution traces from binary programs, builds data dependency graphs from these traces, and incorporates critical feature engineering. By combining the specialized graph representation with the Graph Neural Network (GNN), our approach enables accurate classification without requiring prior knowledge of specific encoding schemes. The Evaluation result shows that DESCG achieves 97.7% accuracy and an F1 score of 97.67%, outperforming baseline models. We also conducted an extensive evaluation of DESCG to explore which feature is more important for it and examine its performance and overhead. more »

Award ID(s):: 2140175 2019340

PAR ID:: 10632072

Author(s) / Creator(s):: Dai, Xushu; Luo, Nanqing; Wang, Haizhou; Wang, Zhilong; Cao, Chen; Liu, Peng

Publisher / Repository:: Springer

Date Published:: 2025-07-18

Journal Name:: Automated Software Engineering

Volume:: 32

Issue:: 2

ISSN:: 0928-8910

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
This content will become publicly available on July 18, 2026
Journal Article:
https://doi.org/10.1007/s10515-025-00538-0

More Like this