skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: REMaQE: Reverse Engineering Math Equations from Executables
Cybersecurity attacks on embedded devices for industrial control systems and cyber-physical systems may cause catastrophic physical damage as well as economic loss. This could be achieved by infecting device binaries with malware that modifies the physical characteristics of the system operation. Mitigating such attacks benefits from reverse engineering tools that recover sufficient semantic knowledge in terms of mathematical equations of the implemented algorithm. Conventional reverse engineering tools can decompile binaries to low-level code, but offer little semantic insight. This article proposes the REMaQE automated framework for reverse engineering of math equations from binary executables. Improving over state-of-the-art, REMaQE handles equation parameters accessed via registers, the stack, global memory, or pointers, and can reverse engineer equations from object-oriented implementations such as C++ classes. Using REMaQE, we discovered a bug in the Linux kernel thermal monitoring tool “tmon.” To evaluate REMaQE, we generate a dataset of 25,096 binaries with math equations implemented in C and Simulink. REMaQE successfully recovers a semantically matching equation for all 25,096 binaries. REMaQE executes in 0.48 seconds on average and in up to 2 seconds for complex equations. Real-time execution enables integration in an interactive math-oriented reverse engineering workflow.  more » « less
Award ID(s):
2039615
PAR ID:
10646434
Author(s) / Creator(s):
 ;  ;  ;  ;  
Publisher / Repository:
ACM
Date Published:
Journal Name:
ACM Transactions on Cyber-Physical Systems
Volume:
8
Issue:
4
ISSN:
2378-962X
Page Range / eLocation ID:
1 to 25
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Semantic communication is of crucial importance for the next-generation wireless communication networks. The existing works have developed semantic communication frameworks based on deep learning. However, systems powered by deep learning are vulnerable to threats such as backdoor attacks and adversarial attacks. This paper delves into backdoor attacks targeting deep learning-enabled semantic communication systems. Since current works on backdoor attacks are not tailored for semantic communication scenarios, a new backdoor attack paradigm on semantic symbols (BASS) is introduced, based on which the corresponding defense measures are designed. Specifically, a training framework is proposed to prevent BASS. Additionally, reverse engineering-based and pruning-based defense strategies are designed to protect against backdoor attacks in semantic communication. Simulation results demonstrate the effectiveness of both the proposed attack paradigm and the defense strategies. 
    more » « less
  2. Much software, whether beneficent or malevolent, is distributed only as binaries, sans source code. Absent source code, understanding binaries' behavior can be quite challenging, especially when compiled under higher levels of compiler optimization. These optimizations can transform comprehensible, ``natural" source constructions into something entirely unrecognizable. Reverse engineering binaries, especially those suspected of being malevolent or guilty of intellectual property theft, are important and time-consuming tasks. There is a great deal of interest in tools to ``decompile" binaries back into more natural source code to aid reverse engineering. Decompilation involves several desirable steps, including recreating source-language constructions, variable names, and perhaps even comments. One central step in creating binaries is optimizing function calls, using steps such as inlining. Recovering these (possibly inlined) function calls from optimized binaries is an essential task that most state-of-the-art decompiler tools try to do but do not perform very well. In this paper, we evaluate a supervised learning approach to the problem of recovering optimized function calls. We leverage open-source software and develop an automated labeling scheme to generate a reasonably large dataset of binaries labeled with actual function usages. We augment this large but limited labeled dataset with a pre-training step, which learns the decompiled code statistics from a much larger unlabeled dataset. Thus augmented, our learned labeling model can be combined with an existing decompilation tool, Ghidra, to achieve substantially improved performance in function call recovery, especially at higher levels of optimization. 
    more » « less
  3. Static binary analysis is critical to various security tasks such as vulnerability discovery and malware detection. In recent years, binary analysis has faced new challenges as vendors of the Internet of Things (IoT) and Industrial Control Systems (ICS) continue to introduce customized or non-standard binary formats that existing tools cannot readily process. Reverse-engineering each of the new formats is costly as it requires extensive expertise and analysts’ time. In this paper, we investigate the first step to automate the analysis of non-standard binaries, which is to recognize the bytes representing “code” from “data” (i.e., data-code separation). We propose Loadstar, and its key idea is to use the abundant labeled data from standard binaries to train a classifier and adapt it for processing unlabeled non-standard binaries. We use a pseudo-label-based method for domain adaption and leverage knowledge-inspired rules for pseudo-label correction, which serves as the guardrail for the adaption process. A key advantage of the system is that it does not require labeling any non-standard binaries. Using three datasets of non-standard PLC binaries, we evaluate Loadstar and show it outperforms existing tools in terms of both accuracy and processing speed. We will share the tool (open source) with the community. 
    more » « less
  4. Abstract. Geoscientific models are facing increasing challenges to exploit growing datasets coming from remote sensing. Universal differential equations (UDEs), aided by differentiable programming, provide a new scientific modelling paradigm enabling both complex functional inversions to potentially discover new physical laws and data assimilation from heterogeneous and sparse observations. We demonstrate an application of UDEs as a proof of concept to learn the creep component of ice flow, i.e. a nonlinear diffusivity differential equation, of a glacier evolution model. By combining a mechanistic model based on a two-dimensional shallow-ice approximation partial differential equation with an embedded neural network, i.e. a UDE, we can learn parts of an equation as nonlinear functions that then can be translated into mathematical expressions. We implemented this modelling framework as ODINN.jl, a package in the Julia programming language, providing high performance, source-to-source automatic differentiation (AD) and seamless integration with tools and global datasets from the Open Global Glacier Model in Python. We demonstrate this concept for 17 different glaciers around the world, for which we successfully recover a prescribed artificial law describing ice creep variability by solving ∼ 500 000 ordinary differential equations in parallel. Furthermore, we investigate which are the best tools in the scientific machine learning ecosystem in Julia to differentiate and optimize large nonlinear diffusivity UDEs. This study represents a proof of concept for a new modelling framework aiming at discovering empirical laws for large-scale glacier processes, such as the variability in ice creep and basal sliding for ice flow, and new hybrid surface mass balance models. 
    more » « less
  5. Abstract We report the discovery of three ultracompact binary white dwarf systems hosting accretion disks, with orbital periods of 7.95, 8.68, and 13.15 minutes. This significantly augments the population of mass-transferring binaries at the shortest periods, and provides the first evidence that accretors in ultracompacts can be dense enough to host accretion disks even below 10 minutes (where previously only direct-impact accretors were known). In the two shortest-period systems, we measured changes in the orbital periods driven by the combined effect of gravitational-wave emission and mass transfer. We find P ̇ is negative in one case, and positive in the other. This is only the second system measured with a positive P ̇ , and it is the most compact binary known that has survived a period minimum. Using these systems as examples, we show how the measurement of P ̇ is a powerful tool in constraining the physical properties of binaries, e.g., the mass and mass–radius relation of the donor stars. We find that the chirp masses of ultracompact binaries at these periods seem to cluster around M c 0.3 M , perhaps suggesting a common origin for these systems or a selection bias in electromagnetic discoveries. Our new systems are among the highest-amplitude known gravitational-wave sources in the millihertz regime, providing an exquisite opportunity for multimessenger study with future space-based observatories such as LISA and TianQin. We discuss how such systems provide fascinating laboratories to study the unique regime where the accretion process is mediated by gravitational waves. 
    more » « less