skip to main content


Title: Software Fingerprinting in LLVM
Executable steganography, the hiding of software machine code inside of a larger program, is a potential approach to introduce new software protection constructs such as watermarks or fingerprints. Software fingerprinting is, therefore, a process similar to steganography, hiding data within other data. The goal of fingerprinting is to hide a unique secret message, such as a serial number, into copies of an executable program in order to provide proof of ownership of that program. Fingerprints are a special case of watermarks, with the difference being that each fingerprint is unique to each copy of a program. Traditionally, researchers describe four aims that a software fingerprint should achieve. These include the fingerprint should be difficult to remove, it should not be obvious, it should have a low false positive rate, and it should have negligible impact on performance. In this research, we propose to extend these objectives and introduce a fifth aim: that software fingerprints should be machine independent. As a result, the same fingerprinting method can be used regardless of the architecture used to execute the program. Hence, this paper presents an approach towards the realization of machine-independent fingerprinting of executable programs. We make use of Low-Level Virtual Machine (LLVM) intermediate representation during the software compilation process to demonstrate both a simple static fingerprinting method as well as a dynamic method, which displays our aim of hardware independent fingerprinting. The research contribution includes a realization of the approach using the LLVM infrastructure and provides a proof of concept for both simple static and dynamic watermarks that are architecture neutral.  more » « less
Award ID(s):
1811560
NSF-PAR ID:
10211139
Author(s) / Creator(s):
; ; ;
Editor(s):
Perumalla, Kalyan; Lopez Jr., Juan; Siraj, Ambareen
Date Published:
Journal Name:
Journal of popular television
ISSN:
2046-987X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Man-at-the-end (MATE) attacks against software programs are difficult to protect. Adversaries have complete access to the binary program and can run it under both static and dynamic analysis to find and break any software protection mechanisms put in place. Even though full-proof protection is not possible practically or theoretically, the goal of software protection should be to make it more difficult for an adversary to find program secrets by increasing either their monetary cost or time. Protection mechanisms must be easy to integrate into the software development lifecycle, or else they are of little to no use. In this paper, we evaluate the practical security of a watermarking technique known as Weaver, which is intended to support software watermarking based on a new transformation technique called executable steganography. Weaver allows hiding of identification marks directly into a program binary in a way that makes it difficult for an adversary to find and remove. We performed instruction frequency analysis on 106 programs from the GNU coreutils package to understand and define Weaver’s limitations and strengths as a watermarking technique. Our evaluation revealed that the initial prototype version of Weaver suffers from limitations in terms of standard benchmarks for steganography evaluation, such as its stealth. We found that this initial prototype of Weaver relied heavily on one type of instruction that does not frequently occur in standard programs, namely the mov instruction with an 8-byte immediate operand. Our instruction frequency analysis revealed a negative impact due to Weaver’s over-reliance on this mov instruction. 
    more » « less
  2. Man-at-the-end (MATE) attacks against software programs are difficult to protect. Adversaries have complete access to the binary program and can run it under both static and dynamic analysis to find and break any software protection mechanisms put in place. Even though full-proof protection is not possible practically or theoretically, the goal of software protection should be to make it more difficult for an adversary to find program secrets by increasing either their monetary cost or time. Protection mechanisms must be easy to integrate into the software development lifecycle, or else they are of little to no use. In this paper, we evaluate the practical security of a watermarking technique known as Weaver, which is intended to support software watermarking based on a new transformation technique called executable steganography. Weaver allows hiding of identification marks directly into a program binary in a way that makes it difficult for an adversary to find and remove. We performed instruction frequency analysis on 106 programs from the GNU coreutils package to understand and define Weaver’s limitations and strengths as a watermarking technique. Our evaluation revealed that the initial prototype version of Weaver suffers from limitations in terms of standard benchmarks for steganography evaluation, such as its stealth. We found that this initial prototype of Weaver relied heavily on one type of instruction that does not frequently occur in standard programs, namely the mov instruction with an 8-byte immediate operand. Our instruction frequency analysis revealed a negative impact due to Weaver’s over-reliance on this mov instruction. 
    more » « less
  3. Malware authors make use of several techniques to obfuscate code from reverse engineering tools such as IdaPro. Typically, these techniques tend to be effective for about three to six instructions, but eventually the tools can properly disassemble the remaining code once the tool is again synchronized with the operation codes. But this loss of synchronization can be used to hide information within the instructions – steganography. Our research explores an approach to this by presenting “Weaver”, a framework for executable steganography. “Weaver” differs from other techniques in how it hides malicious instructions: the hiding instructions are prepared by generating an assembly listing of the program and finding candidate hiding locations, the steganography instructions are prepared by creating an assembly listing of the program to obtain the operation codes to be hidden, and the “weaving” process merges the two. This “weaving” attempts to place all the steganography instructions into candidate locations found in the hiding instructions. 
    more » « less
  4. Securing applications on untrusted platforms can involve protection against legitimate end-users who act in the role of malicious reverse engineers and hackers. Such adversaries have access to the full execution environment of programs, whether the program comes in the form of software or hardware. In this paper, we consider the nature of obfuscating algorithms that perform iterative, step-wise transformation of programs into more complex forms that are intended to increase the complexity (time, resources) for malicious reverse engineers. We consider simple Boolean logic programs as the domain of interest and examine a specific transformation technique known as iterative sub-circuit selection and replacement (ISR), which represents a practical, syntactic approach for obfuscation. Specifically, we focus on improving the security of ISR by maximizing the flexibility and potential security of the replacement step of the algorithm which can be formulated in the following question: given a selection of Boolean logic gates (i.e., a sub-circuit), how can we produce a semantically equivalent (polymorphic) version of the sub-circuit such that the distribution of potential replacements represents a random, uniform distribution from the set of all possible replacements. This practical question is related to the theoretic study of indistinguishability obfuscation, where a transformer for a class of circuits guarantees that given any two semantically equivalent circuits from the class, the distribution of variants from their obfuscation are computationally indistinguishable. Ideally, polymorphic circuits that follow a random, uniform distribution provide stronger protection against malicious analyzers that target identification of distinct patterns as a basis for deobfuscation and simplification. In this paper, we introduce a novel approach for polymorphic circuit replacement called random Boolean logic expansion (RBLE), which applies Boolean logic laws (of reduction) in reverse. We compare this approach against another proposed method of polymorphic replacement that relies on static circuit libraries. As a contribution, we show the strengths and weaknesses of each approach, examine initial results from empirical studies to estimate the uniformity of polymorphic distributions, and provide the argument for how such algorithms can be readily applied in software contexts. RBLE provides a unique method to generate polymorphic variants of arbitrary input, output, and gate size. We report initial findings for studying variants produced by this method and, from empirical evaluation, show that RBLE has promise for generating distributions of unique, uniform circuits when size is unconstrained, but for targeted size distributions, the approach requires some adjustment in order to reach potential circuit variants. 
    more » « less
  5. This research paper introduces a unique system called ZORQ that is a combination of a game development frame- work and a gamification framework (GDGF). The ZORQ GDGF acts as a catalyst to help motivate students by increasing student engagement and success within undergraduate Computer Science (CS) education, regardless of student experience and background. The dynamic gamification elements utilized within the GDGF make it an attractive learning method for students. After col- laborative game space customization, ZORQ gameplay sees each student tasked with designing a ship movement philosophy and then implementing their own code to autonomously control the ship in an interstellar game space filled with supplies, obstacles, and enemy ships. The particulars of engagements between ships can vary greatly by semester, along with the resources/objects present in the game, depending on the collaborative customization and the independent ship strategies implemented. A preliminary Z O R Q trial was conducted over five years in an undergraduate Data Structures and Algorithms (DSA) course. The ZORQ trial is designed to fulfill the following objectives: 1) implement DSA concepts discussed within the course, 2) identify appropriate problem-solving approaches, 3) apply one or more solutions, 4) build depth with a coding language, 5) bridge the gap between limited concept assignments and large, multi-developer software systems by allowing students to build code within a larger architecture, 6) introduce students to version control, 7) illustrate the use of prior mathematics coursework in practical applications, and 8) introduce unit testing in software systems.In exit surveys, students expressed overwhelming satisfaction with this approach. More than 84% of the students surveyed found the system useful in their educational experience and saw benefit to inspecting a completed software project. 82% of the students found that Z O R Q increased software development com- prehension. 80% enjoyed using their own personal creativity in designing a ship controller, 76% found ZORQ helped them learn how to implement and use DSAs. 71% found the system engaging and found the system interaction to be clear and understandable. Observations of student performance in later courses suggest better student maturity and comprehension in preparation for proposing and implementing their own independent projects. 
    more » « less