skip to main content


Title: Evaluating Security of Executable Steganography for Digital Software Watermarking
Man-at-the-end (MATE) attacks against software programs are difficult to protect. Adversaries have complete access to the binary program and can run it under both static and dynamic analysis to find and break any software protection mechanisms put in place. Even though full-proof protection is not possible practically or theoretically, the goal of software protection should be to make it more difficult for an adversary to find program secrets by increasing either their monetary cost or time. Protection mechanisms must be easy to integrate into the software development lifecycle, or else they are of little to no use. In this paper, we evaluate the practical security of a watermarking technique known as Weaver, which is intended to support software watermarking based on a new transformation technique called executable steganography. Weaver allows hiding of identification marks directly into a program binary in a way that makes it difficult for an adversary to find and remove. We performed instruction frequency analysis on 106 programs from the GNU coreutils package to understand and define Weaver’s limitations and strengths as a watermarking technique. Our evaluation revealed that the initial prototype version of Weaver suffers from limitations in terms of standard benchmarks for steganography evaluation, such as its stealth. We found that this initial prototype of Weaver relied heavily on one type of instruction that does not frequently occur in standard programs, namely the mov instruction with an 8-byte immediate operand. Our instruction frequency analysis revealed a negative impact due to Weaver’s over-reliance on this mov instruction.  more » « less
Award ID(s):
1811560
NSF-PAR ID:
10176504
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
7th Annual Cybersecurity Symposium
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Man-at-the-end (MATE) attacks against software programs are difficult to protect. Adversaries have complete access to the binary program and can run it under both static and dynamic analysis to find and break any software protection mechanisms put in place. Even though full-proof protection is not possible practically or theoretically, the goal of software protection should be to make it more difficult for an adversary to find program secrets by increasing either their monetary cost or time. Protection mechanisms must be easy to integrate into the software development lifecycle, or else they are of little to no use. In this paper, we evaluate the practical security of a watermarking technique known as Weaver, which is intended to support software watermarking based on a new transformation technique called executable steganography. Weaver allows hiding of identification marks directly into a program binary in a way that makes it difficult for an adversary to find and remove. We performed instruction frequency analysis on 106 programs from the GNU coreutils package to understand and define Weaver’s limitations and strengths as a watermarking technique. Our evaluation revealed that the initial prototype version of Weaver suffers from limitations in terms of standard benchmarks for steganography evaluation, such as its stealth. We found that this initial prototype of Weaver relied heavily on one type of instruction that does not frequently occur in standard programs, namely the mov instruction with an 8-byte immediate operand. Our instruction frequency analysis revealed a negative impact due to Weaver’s over-reliance on this mov instruction. 
    more » « less
  2. Software watermarking schemes allow a user to embed an identifier into a piece of code such that the resulting program is nearly functionally-equivalent to the original program, and yet, it is difficult to remove the identifier without destroying the functionality of the program. Such schemes are often considered for proving software ownership or for digital rights management. Existing constructions of watermarking have focused primarily on watermarking pseudorandom functions (PRFs). In this work, we revisit the definitional foundations of watermarking, and begin by highlighting a major flaw in existing security notions. Existing security notions for watermarking only require that the identifier be successfully extracted from programs that preserve the exact input/output behavior of the original program. In the context of PRFs, this means that an adversary that constructs a program which computes a quarter of the output bits of the PRF or that is able to distinguish the outputs of the PRF from random are considered to be outside the threat model. However, in any application (e.g., watermarking a decryption device or an authentication token) that relies on PRF security, an adversary that manages to predict a quarter of the bits or distinguishes the PRF outputs from random would be considered to have defeated the scheme. Thus, existing watermarking schemes provide very little security guarantee against realistic adversaries. None of the existing constructions of watermarkable PRFs would be able to extract the identifier from a program that only outputs a quarter of the bits of the PRF or one that perfectly distinguishes. To address the shortcomings in existing watermarkable PRF definitions, we introduce a new primitive called a traceable PRF. Our definitions are inspired by similar definitions from public-key traitor tracing, and aim to capture a very robust set of adversaries: namely, any adversary that produces a useful distinguisher (i.e., a program that can break PRF security), can be traced to a specific identifier. We provide a general framework for constructing traceable PRFs via an intermediate primitive called private linear constrained PRFs. Finally, we show how to construct traceable PRFs from a similar set of assumptions previously used to realize software watermarking. Namely, we obtain a single-key traceable PRF from standard lattice assumptions and a fully collusion-resistant traceable PRF from indistinguishability obfuscation (together with injective one-way functions). 
    more » « less
  3. Obfuscation is an important technique to protect software from adversary analysis. Control flow obfuscation effectively prevents attackers from understanding the program structure, hence impeding a broad set of reverse engineering efforts. In this paper, we propose a novel control flow obfuscation method which employs Turing machines to simulate the computation of branch conditions. By weaving the orig- inal program with Turing machine components, program control flow graph and call graph can become much more complicated. In addition, due to the runtime computation complexity of a Turing machine, pro- gram execution flow would be highly obfuscated and become resilient to advanced reverse engineering approaches via symbolic execution and concolic testing. We have implemented a prototype tool for Turing obfuscation. Compar- ing with previous work, our control flow obfuscation technique delivers three distinct advantages. 1). Complexity: the complicated structure of a Turing machine makes it difficult for attackers to understand the program control flow. 2). Universality: Turing machines can encode any compu- tation and hence applicable to obfuscate any program component. 3). Resiliency: Turing machine brings in complex execution model, which is shown to withstand automated reverse engineering efforts. Our evalua- tion obfuscates control flow predicates of two widely-used applications, and the experimental results show that the proposed technique can ob- fuscate programs in stealth with good performance and robustness. 
    more » « less
  4. Perumalla, Kalyan ; Lopez Jr., Juan ; Siraj, Ambareen (Ed.)
    Executable steganography, the hiding of software machine code inside of a larger program, is a potential approach to introduce new software protection constructs such as watermarks or fingerprints. Software fingerprinting is, therefore, a process similar to steganography, hiding data within other data. The goal of fingerprinting is to hide a unique secret message, such as a serial number, into copies of an executable program in order to provide proof of ownership of that program. Fingerprints are a special case of watermarks, with the difference being that each fingerprint is unique to each copy of a program. Traditionally, researchers describe four aims that a software fingerprint should achieve. These include the fingerprint should be difficult to remove, it should not be obvious, it should have a low false positive rate, and it should have negligible impact on performance. In this research, we propose to extend these objectives and introduce a fifth aim: that software fingerprints should be machine independent. As a result, the same fingerprinting method can be used regardless of the architecture used to execute the program. Hence, this paper presents an approach towards the realization of machine-independent fingerprinting of executable programs. We make use of Low-Level Virtual Machine (LLVM) intermediate representation during the software compilation process to demonstrate both a simple static fingerprinting method as well as a dynamic method, which displays our aim of hardware independent fingerprinting. The research contribution includes a realization of the approach using the LLVM infrastructure and provides a proof of concept for both simple static and dynamic watermarks that are architecture neutral. 
    more » « less
  5. Securing applications on untrusted platforms can involve protection against legitimate end-users who act in the role of malicious reverse engineers and hackers. Such adversaries have access to the full execution environment of programs, whether the program comes in the form of software or hardware. In this paper, we consider the nature of obfuscating algorithms that perform iterative, step-wise transformation of programs into more complex forms that are intended to increase the complexity (time, resources) for malicious reverse engineers. We consider simple Boolean logic programs as the domain of interest and examine a specific transformation technique known as iterative sub-circuit selection and replacement (ISR), which represents a practical, syntactic approach for obfuscation. Specifically, we focus on improving the security of ISR by maximizing the flexibility and potential security of the replacement step of the algorithm which can be formulated in the following question: given a selection of Boolean logic gates (i.e., a sub-circuit), how can we produce a semantically equivalent (polymorphic) version of the sub-circuit such that the distribution of potential replacements represents a random, uniform distribution from the set of all possible replacements. This practical question is related to the theoretic study of indistinguishability obfuscation, where a transformer for a class of circuits guarantees that given any two semantically equivalent circuits from the class, the distribution of variants from their obfuscation are computationally indistinguishable. Ideally, polymorphic circuits that follow a random, uniform distribution provide stronger protection against malicious analyzers that target identification of distinct patterns as a basis for deobfuscation and simplification. In this paper, we introduce a novel approach for polymorphic circuit replacement called random Boolean logic expansion (RBLE), which applies Boolean logic laws (of reduction) in reverse. We compare this approach against another proposed method of polymorphic replacement that relies on static circuit libraries. As a contribution, we show the strengths and weaknesses of each approach, examine initial results from empirical studies to estimate the uniformity of polymorphic distributions, and provide the argument for how such algorithms can be readily applied in software contexts. RBLE provides a unique method to generate polymorphic variants of arbitrary input, output, and gate size. We report initial findings for studying variants produced by this method and, from empirical evaluation, show that RBLE has promise for generating distributions of unique, uniform circuits when size is unconstrained, but for targeted size distributions, the approach requires some adjustment in order to reach potential circuit variants. 
    more » « less