skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on April 30, 2026

Title: reAnalyst: Scalable annotation of reverse engineering activities
This paper introduces reAnalyst, a framework designed to facilitate the study of reverse engineering (RE) practices through the semi-automated annotation of RE activities across various RE tools. By integrating tool-agnostic data collection of screenshots, keystrokes, active processes, and other types of data during RE experiments with semi-automated data analysis and generation of annotations, reAnalyst aims to overcome the limitations of traditional RE studies that rely heavily on manual data collection and subjective analysis. The framework enables more efficient data analysis, which will in turn allow researchers to explore the effectiveness of protection techniques and strategies used by reverse engineers more comprehensively and efficiently. Experimental evaluations validate the framework’s capability to identify RE activities from a diverse range of screenshots with varied complexities. Observations on past experiments with our framework as well as a survey among reverse engineers provide further evidence of the acceptability and practicality of our approach.  more » « less
Award ID(s):
2040206
PAR ID:
10637799
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Elsevier
Date Published:
Journal Name:
Journal of Systems and Software
ISSN:
0164-1212
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Reverse engineering (RE) in Integrated Circuits (IC) is a process in which one will attempt to extract the internals of an IC, extract the circuit structure, and determine the gate-level information of an IC. In general, the RE process can be done for validation as well as Intellectual Property (IP) stealing intentions. In addition, RE also facilitates different illicit activities such as the insertion of hardware Trojan, pirating, or counterfeiting a design, or developing an attack. In this work, we propose an approach to introduce cognitive perturbations, with the aid of adversarial machine learning, to the IC layout that could prevent the RE process from succeeding. We first construct a layer-by-layer image dataset of 45 nm predictive technology. With this dataset, we propose a conventional neural network model called RecoG-Net to recognize the logic gates, which is the first step in RE. RecoG-Net is successful in recognizing the gates with more than 99.7% accuracy. Our thwarting approach utilizes the concept of adversarial attack generation algorithms to generate perturbation. Unlike traditional adversarial attacks in machine learning, the perturbation generation needs to be highly constrained to meet the fab rules such as Design Rule Checking (DRC) Layout vs. Schematic (LVS) checks. Hence, we propose CAPTIVE as a constrained perturbation generation satisfying the DRC. The experiments show that the accuracy of reverse engineering using machine learning techniques can decrease from 100% to approximately 30% based on the adversary generator. 
    more » « less
  2. Human analysts must reverse engineer binary programs as a prerequisite for a number of security tasks, such as vulnerability analysis, malware detection, and firmware re-hosting. Existing studies of human reversers and the processes they follow are limited in size and often use qualitative metrics that require subjective evaluation. In this paper, we reframe the problem of reverse engineering binaries as the problem of perfect decompilation, which is the process of recovering, from a binary program, source code that, when compiled, produces binary code that is identical to the original binary. This gives us a quantitative measure of understanding, and lets us examine the reversing process programmatically. We developed a tool, called Decomperson, that supported a group of reverse engineers during a large-scale security competition designed to collect information about the participants' reverse engineering process, with the well-defined goal of achieving perfect decompilation. Over 150 people participated, and we collected more than 35,000 code submissions, the largest manual reverse engineering dataset to date. This includes snapshots of over 300 successful perfect decompilation attempts. In this paper, we show how perfect decompilation allows programmatic analysis of such large datasets, providing new insights into the reverse engineering process. 
    more » « less
  3. This study describes a hybrid framework for post-hazard building performance assessments. The framework relies upon rapid imaging data collected by regional scout teams being integrated into broader data platforms that are parsed by virtual teams of hazards engineers to efficiently create robust performance assessment datasets. The study also pilots a machine-in-the-loop approach whereby deep learning and computer vision-based models are used to automatically define common building attributes, enabling hazard engineers to focus more of their efforts on precise damage quantification and other more nuanced elements of performance assessments. The framework shows promise, but to achieve optimal accuracy of the automated methods requires regional tuning. 
    more » « less
  4. This study describes a hybrid framework for post-hazard building performance assessments. The framework relies upon rapid imaging data collected by regional scout teams being integrated into broader data platforms that are parsed by virtual teams of hazards engineers to efficiently create robust performance assessment datasets. The study also pilots a machine-in-the-loop approach whereby deep learning and computer vision-based models are used to automatically define common building attributes, enabling hazard engineers to focus more of their efforts on precise damage quantification and other more nuanced elements of performance assessments. The framework shows promise, but to achieve optimal accuracy of the automated methods requires regional tuning. 
    more » « less
  5. In software development, many documents (e.g., tutorials for tools and mobile application websites) contain screenshots of graphical user interfaces (GUIs) to illustrate functionalities. Although screenshots are critical in such documents, screenshots can become outdated, especially if document developers forget to update them. Outdated screenshots can mislead users and diminish the credibility of documentation. Identifying screenshots manually is tedious and error-prone, especially when documents are numerous. However, no existing tools are proposed to detect outdated screenshots in GUI documents. To mitigate manual efforts, we propose DOSUD, a novel approach for detecting outdated screenshots. It is challenging to identify outdated screenshots since the differences are subtle and only specific areas are useful to identify such screenshots. To address the challenges, DOSUD automatically extracts and labels screenshots and trains a classification model to identify outdated screenshots. As the first exploration, we focus on Android applications and the most popular IDE, VS Code. We evaluated DOSUD on a benchmark comprising 10 popular applications, achieving high F1-scores. When applied in the wild, DOSUD identified 20 outdated screenshots across 50 Android application websites and 17 outdated screenshots in VS Code documentation. VS Code developers have confirmed and fixed all our bug reports. 
    more » « less