Man-at-the-end (MATE) attacks against software programs are difficult to protect. Adversaries have complete access to the binary program and can run it under both static and dynamic analysis to find and break any software protection mechanisms put in place. Even though full-proof protection is not possible practically or theoretically, the goal of software protection should be to make it more difficult for an adversary to find program secrets by increasing either their monetary cost or time. Protection mechanisms must be easy to integrate into the software development lifecycle, or else they are of little to no use. In this paper, we evaluate the practical security of a watermarking technique known as Weaver, which is intended to support software watermarking based on a new transformation technique called executable steganography. Weaver allows hiding of identification marks directly into a program binary in a way that makes it difficult for an adversary to find and remove. We performed instruction frequency analysis on 106 programs from the GNU coreutils package to understand and define Weaver’s limitations and strengths as a watermarking technique. Our evaluation revealed that the initial prototype version of Weaver suffers from limitations in terms of standard benchmarks for steganography evaluation, such as its stealth. We found that this initial prototype of Weaver relied heavily on one type of instruction that does not frequently occur in standard programs, namely the mov instruction with an 8-byte immediate operand. Our instruction frequency analysis revealed a negative impact due to Weaver’s over-reliance on this mov instruction.
more »
« less
Software Fingerprinting in LLVM
Executable steganography, the hiding of software machine code inside of a larger program, is a potential approach to introduce new software protection constructs such as watermarks or fingerprints. Software fingerprinting is, therefore, a process similar to steganography, hiding data within other data. The goal of fingerprinting is to hide a unique secret message, such as a serial number, into copies of an executable program in order to provide proof of ownership of that program. Fingerprints are a special case of watermarks, with the difference being that each fingerprint is unique to each copy of a program. Traditionally, researchers describe four aims that a software fingerprint should achieve. These include the fingerprint should be difficult to remove, it should not be obvious, it should have a low false positive rate, and it should have negligible impact on performance. In this research, we propose to extend these objectives and introduce a fifth aim: that software fingerprints should be machine independent. As a result, the same fingerprinting method can be used regardless of the architecture used to execute the program. Hence, this paper presents an approach towards the realization of machine-independent fingerprinting of executable programs. We make use of Low-Level Virtual Machine (LLVM) intermediate representation during the software compilation process to demonstrate both a simple static fingerprinting method as well as a dynamic method, which displays our aim of hardware independent fingerprinting. The research contribution includes a realization of the approach using the LLVM infrastructure and provides a proof of concept for both simple static and dynamic watermarks that are architecture neutral.
more »
« less
- Award ID(s):
- 1811560
- PAR ID:
- 10211139
- Editor(s):
- Perumalla, Kalyan; Lopez Jr., Juan; Siraj, Ambareen
- Date Published:
- Journal Name:
- Journal of popular television
- ISSN:
- 2046-987X
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Man-at-the-end (MATE) attacks against software programs are difficult to protect. Adversaries have complete access to the binary program and can run it under both static and dynamic analysis to find and break any software protection mechanisms put in place. Even though full-proof protection is not possible practically or theoretically, the goal of software protection should be to make it more difficult for an adversary to find program secrets by increasing either their monetary cost or time. Protection mechanisms must be easy to integrate into the software development lifecycle, or else they are of little to no use. In this paper, we evaluate the practical security of a watermarking technique known as Weaver, which is intended to support software watermarking based on a new transformation technique called executable steganography. Weaver allows hiding of identification marks directly into a program binary in a way that makes it difficult for an adversary to find and remove. We performed instruction frequency analysis on 106 programs from the GNU coreutils package to understand and define Weaver’s limitations and strengths as a watermarking technique. Our evaluation revealed that the initial prototype version of Weaver suffers from limitations in terms of standard benchmarks for steganography evaluation, such as its stealth. We found that this initial prototype of Weaver relied heavily on one type of instruction that does not frequently occur in standard programs, namely the mov instruction with an 8-byte immediate operand. Our instruction frequency analysis revealed a negative impact due to Weaver’s over-reliance on this mov instruction.more » « less
-
Malware authors make use of several techniques to obfuscate code from reverse engineering tools such as IdaPro. Typically, these techniques tend to be effective for about three to six instructions, but eventually the tools can properly disassemble the remaining code once the tool is again synchronized with the operation codes. But this loss of synchronization can be used to hide information within the instructions – steganography. Our research explores an approach to this by presenting “Weaver”, a framework for executable steganography. “Weaver” differs from other techniques in how it hides malicious instructions: the hiding instructions are prepared by generating an assembly listing of the program and finding candidate hiding locations, the steganography instructions are prepared by creating an assembly listing of the program to obtain the operation codes to be hidden, and the “weaving” process merges the two. This “weaving” attempts to place all the steganography instructions into candidate locations found in the hiding instructions.more » « less
-
Limited data availability is a challenging problem in the latent fingerprint domain. Synthetically generated fingerprints are vital for training data-hungry neural network-based algorithms. Conventional methods distort clean fingerprints to generate synthetic latent fingerprints. We propose a simple and effective approach using style transfer and image blending to synthesize realistic latent fingerprints. Our evaluation criteria and experiments demonstrate that the generated synthetic latent fingerprints preserve the identity information from the input contact- based fingerprints while possessing similar characteristics as real latent fingerprints. Additionally, we show that the generated fingerprints exhibit several qualities and styles, suggesting that the proposed method can generate multiple samples from a single fingerprint.more » « less
-
Abstract A machine learning-based drug screening technique has been developed and optimized using convolutional neural network-derived fingerprints. The optimization of weights in the neural network-based fingerprinting technique was compared with fixed Morgan fingerprints in regard to binary classification on drug-target binding affinity. The assessment was carried out using six different target proteins using randomly chosen small molecules from the ZINC15 database for training. This new architecture proved to be more efficient in screening molecules that less favorably bind to specific targets and retaining molecules that favorably bind to it. Scientific contribution We have developed a new neural fingerprint-based screening model that has a significant ability to capture hits. Despite using a smaller dataset, this model is capable of mapping chemical space similar to other contemporary algorithms designed for molecular screening. The novelty of the present algorithm lies in the speed with which the models are trained and tuned before testing its predictive capabilities and hence is a significant step forward in the field of machine learning-embedded computational drug discovery.more » « less
An official website of the United States government

