Obfuscation is an important technique to protect software from adversary analysis. Control flow obfuscation effectively prevents attackers from understanding the program structure, hence impeding a broad set of reverse engineering efforts. In this paper, we propose a novel control flow obfuscation method which employs Turing machines to simulate the computation of branch conditions. By weaving the orig- inal program with Turing machine components, program control flow graph and call graph can become much more complicated. In addition, due to the runtime computation complexity of a Turing machine, pro- gram execution flow would be highly obfuscated and become resilient to advanced reverse engineering approaches via symbolic execution and concolic testing. We have implemented a prototype tool for Turing obfuscation. Compar- ing with previous work, our control flow obfuscation technique delivers three distinct advantages. 1). Complexity: the complicated structure of a Turing machine makes it difficult for attackers to understand the program control flow. 2). Universality: Turing machines can encode any compu- tation and hence applicable to obfuscate any program component. 3). Resiliency: Turing machine brings in complex execution model, which is shown to withstand automated reverse engineering efforts. Our evalua- tion obfuscates control flow predicates of two widely-used applications, and the experimental results show that the proposed technique can ob- fuscate programs in stealth with good performance and robustness.
more »
« less
Obfuscation resilient search through executable classification
Android applications are usually obfuscated before release, making it difficult to analyze them for malware presence or intellectual property violations. Obfuscators might hide the true intent of code by renaming variables and/or modifying program structures. It is challenging to search for executables relevant to an obfuscated application for developers to analyze efficiently. Prior approaches toward obfuscation resilient search have relied on certain structural parts of apps remaining as landmarks, un-touched by obfuscation. For instance, some prior approaches have assumed that the structural relationships between identifiers are not broken by obfuscators; others have assumed that control flow graphs maintain their structures. Both approaches can be easily defeated by a motivated obfuscator. We present a new approach, MACNETO, to search for programs relevant to obfuscated executables leveraging deep learning and principal components on instructions. MACNETO makes few assumptions about the kinds of modifications that an obfuscator might perform. We show that it has high search precision for executables obfuscated by a state-of-the-art obfuscator that changes control flow. Further, we also demonstrate the potential of MACNETO to help developers understand executables, where MACNETO infers keywords (which are from relevant un-obfuscated programs) for obfuscated executables.
more »
« less
- PAR ID:
- 10110096
- Date Published:
- Journal Name:
- Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages
- Page Range / eLocation ID:
- 20 to 30
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
One of the most popular location privacy-preserving mechanisms applied in location-based services (LBS) is location obfuscation, where mobile users are allowed to report obfuscated locations instead of their real locations to services. Many existing obfuscation approaches consider mobile users that can move freely over a region. However, this is inadequate for protecting the location privacy of vehicles, as their mobility is restricted by external factors, such as road networks and traffic flows. This auxiliary information about external factors helps an attacker to shrink the search range of vehicles' locations, increasing the risk of location exposure. In this paper, we propose a vehicle traffic flow aware attack that leverages public traffic flow information to recover a vehicle's real location from obfuscated location. As a countermeasure, we then develop an adaptive strategy to obfuscate a vehicle's location by a "fake" trajectory that follows a realistic traffic flow. The fake trajectory is designed to not only hide the vehicle's real location but also guarantee the quality of service (QoS) of LBS. Our experimental results demonstrate that 1) the new threat model can accurately track vehicles' real locations, which have been obfuscated by two state-of-the-art algorithms, and 2) the proposed obfuscation method can effectively protect vehicles' location privacy under the new threat model without compromising QoS.more » « less
-
Cryptographic functions have been commonly abused by malware developers to hide malicious behaviors, disguise destructive payloads, and bypass network-based fire- walls. Now-infamous crypto-ransomware even encrypts victim’s computer documents until a ransom is paid. Therefore, de- tecting cryptographic functions in binary code is an appealing approach to complement existing malware defense and forensics. However, pervasive control and data obfuscation schemes make cryptographic function identification a challenging work. Existing detection methods are either brittle to work on obfuscated binaries or ad hoc in that they can only identify specific cryp- tographic functions. In this paper, we propose a novel technique called bit-precise symbolic loop mapping to identify cryptographic functions in obfuscated binary code. Our trace-based approach captures the semantics of possible cryptographic algorithms with bit-precise symbolic execution in a loop. Then we perform guided fuzzing to efficiently match boolean formulas with known reference implementations. We have developed a prototype called CryptoHunt and evaluated it with a set of obfuscated synthetic examples, well-known cryptographic libraries, and malware. Compared with the existing tools, CryptoHunt is a general approach to detecting commonly used cryptographic functions such as TEA, AES, RC4, MD5, and RSA under different control and data obfuscation scheme combinations.more » « less
-
Machine learning techniques are widely used in addition to signatures and heuristics to increase the detection rate of anti-malware software, as they automate the creation of detection models, making it possible to handle an ever-increasing number of new malware samples. In order to foil the analysis of anti-malware systems and evade detection, malware uses packing and other forms of obfuscation. However, few realize that benign applications use packing and obfuscation as well, to protect intellectual property and prevent license abuse. In this paper, we study how machine learning based on static analysis features operates on packed samples. Malware researchers have often assumed that packing would prevent machine learning techniques from building effective classifiers. However, both industry and academia have published results that show that machine-learning-based classifiers can achieve good detection rates, leading many experts to think that classifiers are simply detecting the fact that a sample is packed, as packing is more prevalent in malicious samples. We show that, different from what is commonly assumed, packers do preserve some information when packing programs that is “useful” for malware classification. However, this information does not necessarily capture the sample’s behavior. We demonstrate that the signals extracted from packed executables are not rich enough for machine-learning-based models to (1) generalize their knowledge to operate on unseen packers, and (2) be robust against adversarial examples. We also show that a na¨ıve application of machine learning techniques results in a substantial number of false positives, which, in turn, might have resultedmore » « less
-
Modern semiconductor manufacturing often leverages a fabless model in which design and fabrication are partitioned. This has led to a large body of work attempting to secure designs sent to an untrusted third party through obfuscation methods. On the other hand, efficient de-obfuscation attacks have been proposed, such as Boolean Satisfiability attacks (SAT attacks). However, there is a lack of frameworks to validate the security and functionality of obfuscated designs. Additionally, unconventional obfuscated design flows, which vary from one obfuscation to another, have been key impending factors in realizing logic locking as a mainstream approach for securing designs. In this work, we address these two issues for Lookup Table-based obfuscation. We study both Volatile and Non-volatile versions of LUT-based obfuscation and develop a framework to validate SAT runtime using machine learning. We can achieve unparallel SAT-resiliency using LUT-based obfuscation while incurring 7% area and less than 1% power overheads. Following this, we discuss and implement a validation flow for obfuscated designs. We then fabricate a chip consisting of several benchmark designs and a RISC-V CPU in TSMC 65nm for post functionality validation. We show that the design flow and SAT-runtime validation can easily integrate LUT-based obfuscation into existing CAD tools while adding minimal verification overhead. Finally, we justify SAT-resilient LUT-based obfuscation as a promising candidate for securing designs.more » « less
An official website of the United States government

