NSF PAR Search | NSF Public Access Repository

A Progressive Transformer for Unifying Binary Code Embedding and Knowledge Transfer

Lu, Hanxiao; Cai, Hongyu; Liang, Yiming; Bianchi, Antonio; Celik, Z Berkay (March 2025, Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER))

Language model approaches have recently been integrated into binary analysis tasks, such as function similarity detection and function signature recovery. These models typically employ a two-stage training process: pre-training via Masked Language Modeling (MLM) on machine code and fine-tuning for specific tasks. While MLM helps to understand binary code structures, it ignores essential code characteristics, including control and data flow, which negatively affects model generalization. Recent work leverages domain-specific features (e.g., control flow graphs and dynamic execution traces) in transformer-based approaches to improve binary code semantic understanding. However, this approach involves complex feature engineering, a cumbersome and time-consuming process that can introduce predictive uncertainty when dealing with stripped or obfuscated code, leading to a performance drop. In this paper, we introduce PROTST, a novel transformer-based methodology for binary code embedding. PROTST employs a hierarchical training process based on a unique tree-like structure, where knowledge progressively flows from fundamental tasks at the root to more specialized tasks at the leaves. This progressive teacher-student paradigm allows the model to build upon previously learned knowledge, resulting in high-quality embeddings that can be effectively leveraged for diverse downstream binary analysis tasks. The effectiveness of PROTST is evaluated on seven binary analysis tasks, and the results show that PROTST yields an average validation score (F1, MRR, and Recall@1) improvement of 14.8% compared to traditional two-stage training and an average validation score improvement of 10.7% compared to multimodal two-stage frameworks.
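The root-to-leaf flow of knowledge described in the abstract can be pictured with a small scheduling sketch. This is only an illustration under stated assumptions, not the PROTST implementation: the task names in `TASK_TREE` and the helpers `progressive_schedule` and `inherited_knowledge` are hypothetical, and the abstract does not specify the actual tree or training procedure.

```python
from collections import deque

# Hypothetical task tree: the names are illustrative, not taken from the paper.
# Each key is a task; its list holds the more specialized tasks trained after it.
TASK_TREE = {
    "masked_language_modeling": ["instruction_boundary", "control_flow"],
    "instruction_boundary": [],
    "control_flow": ["function_similarity", "signature_recovery"],
    "function_similarity": [],
    "signature_recovery": [],
}


def progressive_schedule(tree, root):
    """Return (teacher, student) pairs in root-to-leaf (breadth-first) order."""
    schedule = [(None, root)]  # the root task has no teacher
    queue = deque([root])
    while queue:
        teacher = queue.popleft()
        for student in tree[teacher]:
            schedule.append((teacher, student))
            queue.append(student)
    return schedule


def inherited_knowledge(tree, root):
    """Map each task to the chain of ancestor tasks it builds upon."""
    lineage = {}
    for teacher, student in progressive_schedule(tree, root):
        lineage[student] = [] if teacher is None else lineage[teacher] + [teacher]
    return lineage


if __name__ == "__main__":
    for teacher, student in progressive_schedule(TASK_TREE, "masked_language_modeling"):
        print(f"{teacher or '(scratch)'} -> {student}")
```

In a real system each pair would mean "initialize or distill the student model from the teacher, then train it on the student task"; the sketch only captures the ordering constraint that a task is trained after every task above it in the tree.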
Free, publicly-accessible full text available March 4, 2026
Finding Traceability Attacks in the Bluetooth Low Energy Specification and Its Implementations

Wu, Jianliang; Traynor, Patrick; Xu, Dongyan; Tian, Dave Jing; Bianchi, Antonio (August 2024, USENIX)

Full Text Available
D-Helix: A Generic Decompiler Testing Framework Using Symbolic Differentiation

Zou, Muqi; Khan, Arslan; Wu, Ruoyu; Gao, Han; Bianchi, Antonio; Tian, Dave Jing (August 2024, USENIX)

Full Text Available
ATTention Please! An Investigation of the App Tracking Transparency Permission

Mohamed, Reham; Arunasalam, Arjun; Farrukh, Habiba; Tong, Jason; Bianchi, Antonio; Celik, Z Berkay (August 2024, USENIX Security Symposium)

Full Text Available
SoK: The Long Journey of Exploiting and Defending the Legacy of King Harald Bluetooth

https://doi.org/10.1109/SP54263.2024.00023

Wu, Jianliang; Wu, Ruoyu; Xu, Dongyan; Tian, Dave Jing; Bianchi, Antonio (May 2024, IEEE)

Full Text Available
Wear's my Data? Understanding the Cross-Device Runtime Permission Model in Wearables

Yeke, Doguhan; Ibrahim, Muhammad; Tuncay, Guliz Seray; Farrukh, Habiba; Imran, Abdullah; Bianchi, Antonio; Celik, Z Berkay (May 2024, IEEE Symposium on Security and Privacy (IEEE S&P))

Full Text Available
Making Sense of Constellations: Methodologies for Understanding Starlink's Scheduling Algorithms

https://doi.org/10.1145/3624354.3630586

Tanveer, Hammas Bin; Puchol, Mike; Singh, Rachee; Bianchi, Antonio; Nithyanand, Rishab (December 2023, ACM)

Full Text Available
LocIn: Inferring Semantic Location from Spatial Maps in Mixed Reality

Farrukh, Habiba; Mohamed, Reham; Nare, Aniket; Bianchi, Antonio; Celik, Z. Berkay (August 2023, USENIX Security Symposium)

Full Text Available
ARGUS: a framework for staged static taint analysis of GitHub workflows and actions

Muralee, Siddharth; Koishybayev, Igibek; Nahapetyan, Aleksandr; Tystahl, Greg; Reaves, Brad; Bianchi, Antonio; Enck, William; Kapravelos, Alexandros; Machiry, Aravind (August 2023, USENIX Security)
ARGUS: A Framework for Staged Static Taint Analysis of GitHub Workflows and Actions

Muralee, Siddharth; Koishybayev, Igibek; Nahapetyan, Aleksandr; Tystahl, Greg; Reaves, Brad; Bianchi, Antonio; Enck, William; Kapravelos, Alexandros; Machiry, Aravind (August 2023, Proceedings of the USENIX conference)

Millions of software projects leverage automated workflows, like GitHub Actions, for performing common build and deploy tasks. While GitHub Actions have greatly improved the software build process for developers, they pose significant risks to the software supply chain by adding more dependencies and code complexity that may introduce security bugs. This paper presents ARGUS, the first static taint analysis system for identifying code injection vulnerabilities in GitHub Actions. We used ARGUS to perform a large-scale evaluation on 2,778,483 Workflows referencing 31,725 Actions and discovered critical code injection vulnerabilities in 4,307 Workflows and 80 Actions. We also directly compared ARGUS to two existing pattern-based GitHub Actions vulnerability scanners, demonstrating that our system exhibits a marked improvement in terms of vulnerability detection, with a discovery rate more than seven times (7x) higher than the state-of-the-art approaches. These results demonstrate that command injection vulnerabilities in the GitHub Actions ecosystem are not only pervasive but also require taint analysis to be detected.
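To make concrete why the abstract argues that pattern matching alone is insufficient, the following is a minimal sketch of taint propagation over a workflow, not ARGUS itself. It assumes a workflow already parsed into plain dictionaries, a small illustrative list of attacker-controlled contexts (`UNTRUSTED_SOURCES`), and crude substring matching on `${{ ... }}` expressions; the helper names and the sample `WORKFLOW` are hypothetical.

```python
import re

# Contexts an outside contributor can control; a small illustrative subset.
UNTRUSTED_SOURCES = (
    "github.event.issue.title",
    "github.event.pull_request.title",
    "github.head_ref",
)

EXPR = re.compile(r"\$\{\{\s*(.*?)\s*\}\}")


def expressions(text):
    """Return the contents of every ${{ ... }} expression in text."""
    return EXPR.findall(text)


def find_injections(workflow):
    """Return (job_id, step_index) pairs whose run script interpolates tainted data."""
    tainted = set(UNTRUSTED_SOURCES)
    jobs = workflow["jobs"]
    # Propagate taint through job outputs until a fixed point is reached.
    changed = True
    while changed:
        changed = False
        for job_id, job in jobs.items():
            for name, value in job.get("outputs", {}).items():
                key = f"needs.{job_id}.outputs.{name}"
                if key not in tainted and any(
                    src in expr for expr in expressions(value) for src in tainted
                ):
                    tainted.add(key)
                    changed = True
    # Sinks: run scripts, where expressions are expanded before the shell runs.
    findings = []
    for job_id, job in jobs.items():
        for index, step in enumerate(job.get("steps", [])):
            script = step.get("run", "")
            if any(src in expr for expr in expressions(script) for src in tainted):
                findings.append((job_id, index))
    return findings


# Hypothetical workflow: the issue title reaches a shell only via a job output.
WORKFLOW = {
    "jobs": {
        "triage": {
            "outputs": {"title": "${{ github.event.issue.title }}"},
            "steps": [{"run": "echo triaging"}],
        },
        "report": {
            "steps": [
                {"run": "echo 'safe step'"},
                {"run": "echo ${{ needs.triage.outputs.title }}"},
            ],
        },
    }
}

if __name__ == "__main__":
    print(find_injections(WORKFLOW))
```

A scanner that only looks for untrusted contexts written directly inside a `run` script would report nothing here, because the attacker-controlled title arrives indirectly through the `triage` job's output; following that flow is what flags the second step of `report`.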
Full Text Available
