NSF PAR Search | NSF Public Access Repository

DESCG: data encoding scheme classification with GNN in binary analysis

https://doi.org/10.1007/s10515-025-00538-0

Dai, Xushu; Luo, Nanqing; Wang, Haizhou; Wang, Zhilong; Cao, Chen; Liu, Peng (July 2025, Automated Software Engineering)

Abstract Binary analysis, the process of examining software without its source code, plays a crucial role in understanding program behavior, e.g., evaluating the security properties of commercial software and analyzing malware. One challenging aspect of this process is classifying data encoding schemes, such as encryption and compression, due to the absence of high-level semantic information. Existing approaches rely either on code similarity, which only works for known schemes, or on heuristic rules, which lack scalability. In this paper, we propose DESCG, a novel deep learning-based method for automatically classifying four widely employed kinds of data encoding schemes in binary programs: encryption, compression, decompression, and hashing. Our approach leverages dynamic analysis to extract execution traces from binary programs, builds data dependency graphs from these traces, and incorporates critical feature engineering. By combining the specialized graph representation with a Graph Neural Network (GNN), our approach enables accurate classification without requiring prior knowledge of specific encoding schemes. The evaluation results show that DESCG achieves 97.7% accuracy and an F1 score of 97.67%, outperforming baseline models. We also conducted an extensive evaluation of DESCG to explore which features are most important to it and to examine its performance and overhead.
Free, publicly-accessible full text available July 18, 2026
Tackling imbalanced data in cybersecurity with transfer learning: a case with ROP payload detection

https://doi.org/10.1186/s42400-022-00135-8

Wang, Haizhou; Singhal, Anoop; Liu, Peng (January 2023, Cybersecurity)

Abstract In recent years, deep learning has gained growing popularity in the cybersecurity application domain, since, compared to traditional machine learning methods, it usually involves less human effort, produces better results, and provides better generalizability. However, the imbalanced data issue is very common in cybersecurity and can substantially degrade the performance of deep learning models. This paper introduces a transfer learning based method to tackle the imbalanced data issue in cybersecurity, using return-oriented programming payload detection as a case study. We achieved an average false positive rate of 0.0290, an average F1 score of 0.9705, and an average detection rate of 0.9521 on 3 different target domain programs using 2 different source domain programs, with zero benign training data samples in the target domain. The performance improvement compared to the baseline is a trade-off between false positive rate and detection rate. Using our approach, the total number of false positives is reduced by 23.16%, and as a trade-off, the number of detected malicious samples decreases by 0.68%.
Position paper: GPT conjecture: understanding the trade-offs between granularity, performance and timeliness in control-flow integrity

https://doi.org/10.1186/s42400-021-00098-2

Wang, Zhilong; Liu, Peng (September 2021, Cybersecurity)

Abstract The performance/security trade-off is widely noticed in CFI research; however, we observe that not every CFI scheme is subject to the trade-off. Motivated by this key observation, we ask three questions: ➊ does the trade-off really exist in different CFI schemes? ➋ if the trade-off does exist, how do previous works comply with it? ➌ how can it inspire future research? Although the three questions probably cannot be directly answered, they are inspiring. We find that a deeper understanding of the nature of the trade-off will help answer the three questions. Accordingly, we propose the GPT conjecture to pinpoint the trade-off in designing CFI schemes, which says that at most two out of three properties (fine granularity, acceptable performance, and preventive protection) can be achieved.
Adversarial Evasion through Semantics-Preserving Reinforcement Learning in Graph-Based Malware Detection

Zhang, Lan; Liu, Peng (December 2025, IntechOpen)

The deployment of deep learning-based malware detection systems has transformed cybersecurity, offering sophisticated pattern recognition capabilities that surpass traditional signature-based approaches. However, these systems introduce new vulnerabilities requiring systematic investigation. This chapter examines adversarial attacks against graph neural network-based malware detection systems, focusing on semantics-preserving methodologies that evade detection while maintaining program functionality. We introduce a reinforcement learning (RL) framework that formulates the attack as a sequential decision-making problem, optimizing the insertion of no-operation (NOP) instructions to manipulate graph structure without altering program behavior. Comparative analysis includes three baseline methods: random insertion, hill-climbing, and gradient-approximation attacks. Our experimental evaluation on real-world malware datasets reveals significant differences in effectiveness, with the reinforcement learning approach achieving perfect evasion rates against both Graph Convolutional Network and Deep Graph Convolutional Neural Network architectures while requiring minimal program modifications. Our findings reveal three critical research gaps: transitioning from abstract Control Flow Graph representations to executable binary manipulation, developing universal vulnerability discovery across different architectures, and systematically translating adversarial insights into defensive enhancements. This work contributes to understanding adversarial vulnerabilities in graph-based security systems while establishing frameworks for evaluating machine learning-based malware detection robustness.
Free, publicly-accessible full text available December 1, 2026
To Protect the LLM Agent Against the Prompt Injection Attack with Polymorphic Prompt

https://doi.org/10.1109/DSN-S65789.2025.00037

Wang, Zhilong; Nagaraja, Neha; Zhang, Lan; Bahsi, Hayretdin; Patil, Pawan; Liu, Peng (June 2025, IEEE)

Free, publicly-accessible full text available June 23, 2026
Can AI Fix Buggy Code? Exploring the Use of Large Language Models in Automated Program Repair

https://doi.org/10.1109/MC.2025.3527407

Zhang, Lan; Singhal, Anoop; Zou, Qingtian; Sun, Xiaoyan; Liu, Peng; Lin, Hsiao-Ying (May 2025, Computer)

This article reviews the current approach to bug fixing based on collaboration between humans and large language models and points out research directions toward the development of autonomous program repair artificial intelligence agents.
Free, publicly-accessible full text available May 1, 2026
IoT Firmware Emulation and Its Security Application in Fuzzing: A Critical Revisit

https://doi.org/10.3390/fi17010019

Zhou, Wei; Shen, Shandian; Liu, Peng (January 2025, Future Internet)

As IoT devices with microcontroller (MCU)-based firmware become more common in our lives, memory corruption vulnerabilities in their firmware are increasingly targeted by adversaries. Fuzzing is a powerful method for detecting these vulnerabilities, but it poses unique challenges when applied to IoT devices. Direct fuzzing on these devices is inefficient, and recent efforts have shifted towards creating emulation environments for dynamic firmware testing. However, unlike traditional software, firmware interacts with peripherals that are significantly more diverse, which presents new challenges for achieving scalable full-system emulation and effective fuzzing. This paper reviews 27 state-of-the-art works in MCU-based firmware emulation and its applications in fuzzing. Instead of classifying existing techniques based on their capabilities and features, we first identify the fundamental challenges faced by firmware emulation and fuzzing. We then revisit recent studies, organizing them according to the specific challenges they address and discussing how each challenge is addressed. We compare the emulation fidelity and bug detection capabilities of various techniques to clearly demonstrate their strengths and weaknesses, aiding users in selecting or combining tools to meet their needs. Finally, we highlight the remaining technical gaps and point out important future research directions in firmware emulation and fuzzing.
Free, publicly-accessible full text available January 1, 2026
Evaluating Large Language Models for Real-World Vulnerability Repair in C/C++ Code

https://doi.org/10.1145/3643651.3659892

Zhang, Lan; Zou, Qingtian; Singhal, Anoop; Sun, Xiaoyan; Liu, Peng (June 2024, ACM)

Full Text Available
Analysis of neural network detectors for network attacks

https://doi.org/10.3233/JCS-230031

Zou, Qingtian; Zhang, Lan; Singhal, Anoop; Sun, Xiaoyan; Liu, Peng (June 2024, Journal of Computer Security)

While network attacks play a critical role in many advanced persistent threat (APT) campaigns, an arms race exists between network defenders and the adversary: to make APT campaigns stealthy, the adversary is strongly motivated to evade the detection system. However, new studies have shown that neural networks are likely a game-changer in the arms race: neural networks could be applied to achieve accurate, signature-free, and low-false-alarm-rate detection. In this work, we investigate whether the adversary could fight back during the next phase of the arms race. In particular, noticing that none of the existing adversarial example generation methods can generate malicious packets (and sessions) that simultaneously compromise the target machine and evade the neural network detection model, we propose a novel attack method to achieve this goal. We have designed and implemented the new attack. We have also used Address Resolution Protocol (ARP) Poisoning and Domain Name System (DNS) Cache Poisoning as case studies to demonstrate the effectiveness of the proposed attack.
Full Text Available
Using Explainable AI for Neural Network-Based Network Attack Detection

https://doi.org/10.1109/MC.2023.3342602

Zou, Qingtian; Zhang, Lan; Sun, Xiaoyan; Singhal, Anoop; Liu, Peng (May 2024, Computer)

Full Text Available
