This content will become publicly available on April 30, 2026

Title: reAnalyst: Scalable annotation of reverse engineering activities
This paper introduces reAnalyst, a framework designed to facilitate the study of reverse engineering (RE) practices through the semi-automated annotation of RE activities across various RE tools. By integrating tool-agnostic collection of screenshots, keystrokes, active processes, and other data during RE experiments with semi-automated analysis and annotation generation, reAnalyst aims to overcome the limitations of traditional RE studies, which rely heavily on manual data collection and subjective analysis. The framework enables more efficient data analysis, which in turn allows researchers to explore the effectiveness of protection techniques and the strategies used by reverse engineers more comprehensively. Experimental evaluations validate the framework's capability to identify RE activities from a diverse range of screenshots of varied complexity. Observations from past experiments with our framework, together with a survey of reverse engineers, provide further evidence of the acceptability and practicality of our approach.
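The abstract describes the annotation step only at a high level. As a loose illustration of the semi-automated idea, the sketch below matches OCR'd screenshot text against hand-written keyword rules to emit activity annotations; the rule names, patterns, and labels here are invented for illustration and are not reAnalyst's actual rule set.

```python
# Hypothetical sketch of reAnalyst-style screenshot annotation. The real
# framework is more sophisticated; rules and labels below are invented.
import re
from dataclasses import dataclass

@dataclass
class Annotation:
    timestamp: float
    activity: str
    evidence: str

# Illustrative keyword rules mapping on-screen text (e.g., from OCR) to
# RE activities; a deployment would derive rules per tool (Ghidra, x64dbg, ...).
RULES = {
    "disassembly_view": re.compile(r"\b(mov|jmp|call|push|ret)\b"),
    "decompiler_view": re.compile(r"\b(undefined\d*|__fastcall|local_[0-9a-f]+)\b"),
    "debugging": re.compile(r"\b(breakpoint|step into|step over|registers)\b", re.I),
}

def annotate(timestamp: float, ocr_text: str) -> list[Annotation]:
    """Return one annotation per rule that matches the screenshot's OCR text."""
    return [
        Annotation(timestamp, activity, pattern.search(ocr_text).group(0))
        for activity, pattern in RULES.items()
        if pattern.search(ocr_text)
    ]

if __name__ == "__main__":
    sample = "00401000 push ebp ... Breakpoint hit at 0x401020"
    for ann in annotate(12.5, sample):
        print(ann)
```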
Award ID(s):
2040206
PAR ID:
10637799
Author(s) / Creator(s):
Publisher / Repository:
Elsevier
Date Published:
Journal Name:
Journal of Systems and Software
ISSN:
0164-1212
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Reverse engineering (RE) of Integrated Circuits (ICs) is a process in which one attempts to extract the internals of an IC, recover its circuit structure, and determine its gate-level information. The RE process can be performed for validation as well as with Intellectual Property (IP) theft in mind. In addition, RE facilitates illicit activities such as inserting a hardware Trojan, pirating or counterfeiting a design, or developing an attack. In this work, we propose an approach that introduces cognitive perturbations, with the aid of adversarial machine learning, into the IC layout to prevent the RE process from succeeding. We first construct a layer-by-layer image dataset of a 45 nm predictive technology. With this dataset, we propose a convolutional neural network model called RecoG-Net to recognize the logic gates, which is the first step in RE. RecoG-Net recognizes the gates with more than 99.7% accuracy. Our thwarting approach utilizes adversarial attack generation algorithms to generate the perturbation. Unlike traditional adversarial attacks in machine learning, the perturbation generation must be highly constrained to meet fab rules such as Design Rule Checking (DRC) and Layout vs. Schematic (LVS) checks. Hence, we propose CAPTIVE as a constrained perturbation generator that satisfies DRC. The experiments show that the accuracy of reverse engineering using machine learning techniques can decrease from 100% to approximately 30%, depending on the adversarial generator.
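The abstract does not detail CAPTIVE's internals. The sketch below illustrates only the general pattern of constrained adversarial perturbation with a toy linear recognizer: take gradient-sign steps that push the score across the decision boundary, then re-project the perturbation onto a stand-in "DRC" constraint after each step. The model, mask, and budget are assumptions, not the paper's method.

```python
# Illustrative sketch of a CAPTIVE-style constrained adversarial perturbation.
# The model, the "DRC" projection, and all parameters are stand-ins; the actual
# generator works on real layout layers with full DRC/LVS checks.
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "gate recognizer": score = w . x; sign(score) is the predicted class.
w = rng.normal(size=64)

def project_drc(delta: np.ndarray, mask: np.ndarray, budget: float) -> np.ndarray:
    """Stand-in for DRC-constrained projection: perturb only pixels the fab
    rules allow (mask) and bound the total change (L-inf budget)."""
    return np.clip(delta * mask, -budget, budget)

def captive_like_attack(x, steps=20, step_size=0.05, budget=0.2, mask=None):
    mask = np.ones_like(x) if mask is None else mask
    delta = np.zeros_like(x)
    for _ in range(steps):
        grad = w  # gradient of w.(x + delta) with respect to delta
        # Step against the current prediction's sign to push the score
        # toward (and past) the decision boundary.
        delta = delta - step_size * np.sign(grad) * np.sign(w @ (x + delta))
        delta = project_drc(delta, mask, budget)  # re-impose constraints each step
    return x + delta

x = rng.normal(size=64)
x_adv = captive_like_attack(x, mask=(rng.random(64) > 0.5))
print("clean score:", w @ x, "perturbed score:", w @ x_adv)
```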
  2. Human analysts must reverse engineer binary programs as a prerequisite for a number of security tasks, such as vulnerability analysis, malware detection, and firmware re-hosting. Existing studies of human reversers and the processes they follow are limited in size and often use qualitative metrics that require subjective evaluation. In this paper, we reframe the problem of reverse engineering binaries as the problem of perfect decompilation, which is the process of recovering, from a binary program, source code that, when compiled, produces binary code that is identical to the original binary. This gives us a quantitative measure of understanding, and lets us examine the reversing process programmatically. We developed a tool, called Decomperson, that supported a group of reverse engineers during a large-scale security competition designed to collect information about the participants' reverse engineering process, with the well-defined goal of achieving perfect decompilation. Over 150 people participated, and we collected more than 35,000 code submissions, the largest manual reverse engineering dataset to date. This includes snapshots of over 300 successful perfect decompilation attempts. In this paper, we show how perfect decompilation allows programmatic analysis of such large datasets, providing new insights into the reverse engineering process. 
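Perfect decompilation has a naturally programmatic check: compile the candidate source and compare the resulting machine code byte-for-byte with the target. The sketch below is one minimal way to express that check, assuming a gcc/objcopy toolchain and that comparing .text sections suffices; the competition infrastructure described in the paper is certainly more involved.

```python
# Hedged sketch of a Decomperson-style "perfect decompilation" check.
import subprocess
import tempfile
from pathlib import Path

def text_section(obj_path: Path) -> bytes:
    """Extract the raw .text bytes with objcopy (assumed to be on PATH)."""
    raw = obj_path.with_suffix(".text.bin")
    subprocess.run(
        ["objcopy", "-O", "binary", "--only-section=.text", str(obj_path), str(raw)],
        check=True,
    )
    return raw.read_bytes()

def is_perfect_decompilation(candidate_c: str, target_obj: Path) -> bool:
    """True iff the candidate source compiles to the same .text bytes as the target."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "candidate.c"
        obj = Path(tmp) / "candidate.o"
        src.write_text(candidate_c)
        # Flags must mirror whatever produced the target binary.
        subprocess.run(["gcc", "-O2", "-c", str(src), "-o", str(obj)], check=True)
        return text_section(obj) == text_section(target_obj)
```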
  3. This study describes a hybrid framework for post-hazard building performance assessments. The framework relies on rapid imaging data, collected by regional scout teams, being integrated into broader data platforms that virtual teams of hazards engineers parse to efficiently create robust performance-assessment datasets. The study also pilots a machine-in-the-loop approach in which deep learning and computer vision models automatically define common building attributes, enabling hazards engineers to focus more of their effort on precise damage quantification and other more nuanced elements of the assessments. The framework shows promise, but achieving optimal accuracy with the automated methods requires regional tuning.
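One way to picture the machine-in-the-loop step is confidence-based triage: the vision model's high-confidence attribute predictions are accepted automatically, while low-confidence ones are queued for a hazards engineer. The sketch below is a minimal illustration; the attributes, threshold, and data structures are assumptions rather than the study's implementation.

```python
# Minimal sketch of the machine-in-the-loop idea: a vision model proposes
# common building attributes, and only low-confidence cases go to the human
# hazards engineer. Attributes, threshold, and values are illustrative.
from dataclasses import dataclass

@dataclass
class AttributePrediction:
    attribute: str       # e.g., "roof_shape"
    value: str           # e.g., "gable"
    confidence: float    # model score in [0, 1]

def triage(predictions, threshold=0.9):
    """Split predictions into auto-accepted labels and a human review queue."""
    accepted = {p.attribute: p.value for p in predictions if p.confidence >= threshold}
    review = [p for p in predictions if p.confidence < threshold]
    return accepted, review

preds = [
    AttributePrediction("roof_shape", "hip", 0.97),
    AttributePrediction("number_of_stories", "2", 0.62),  # needs regional tuning
]
accepted, review = triage(preds)
print("auto-labeled:", accepted)
print("needs engineer review:", [p.attribute for p in review])
```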
  4. Automated or semi-automated pavement condition data collection is replacing manual data collection in many state and local highway agencies because it reduces labor, time, and cost. However, the practical experience of highway agencies indicates that there are still data quality issues with pavement condition data collected using existing image- and sensor-based technologies. This study investigates the implementation experiences and issues of automated and semi-automated pavement condition surveys. An online questionnaire survey was conducted, along with scheduled virtual/phone interviews, to gather information from government, industry, and academia about the state of the practice and the state of the art. Open questions about data quality and quality control and quality assurance (QC/QA) were used to obtain first-hand input from highway agencies and pavement experts. The study compiled the following observations: (1) highway agencies urgently need a uniform data collection protocol for automated data collection; (2) the current QA process requires too much human intervention; (3) cost ($100–$200 per mile) is a significant burden for state and local agencies; (4) the main data quality issues are inconsistencies and discrepancies; (5) agencies expect greater accuracy once image processing algorithms are improved using artificial intelligence technologies; and (6) existing automated data collection methods are not available for project-level data collection.