Geometric Analysis and Metric Learning of Instruction Embeddings

Biswas, Sajib; Barao, Timothy; Lazzari, John; McCoy, Jeret; Liu, Xiuwen; Kostandarithes, Alexander

doi:10.1109/IJCNN55064.2022.9892426

Citation Details

Embeddings for instructions have been shown to be essential for software reverse engineering and automated program analysis. However, due to the complexity of dependencies and the inherent variability of instructions, instruction embeddings produced by models that are successful for natural language processing may not be effective. In this paper, we perform a geometric analysis of instruction embeddings at the token level and the instruction family level, showing much greater variability, which leads to degraded performance on intrinsic analyses. We then propose metric learning with a triplet loss to improve the relationships among instructions. Our results on a large dataset of instruction groups show significant improvements. We also provide a theoretical analysis of the instruction embeddings by examining the BERT components and the characteristics of the inner-product matrices for attention in the transformer blocks. The code will be available publicly after the paper is accepted for publication.
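
The abstract does not state the exact loss formulation, so the following is only a minimal sketch of the standard triplet margin loss that metric learning approaches of this kind typically build on. The Euclidean distance, the margin of 1.0, the function names, and the toy 2-D vectors are all assumptions made for illustration; none of them are taken from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the positive is closer to the anchor than
    the negative by at least `margin` (an assumed hyperparameter).
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Toy 2-D "embeddings" (made-up values, not data from the paper).
anchor = [0.0, 0.0]
far_point = [3.0, 4.0]   # distance 5 from the anchor
near_point = [0.0, 2.0]  # distance 2 from the anchor

# Positive is farther than the negative: the constraint is violated.
loss_violated = triplet_loss(anchor, far_point, near_point)

# Positive is closer than the negative by more than the margin.
loss_satisfied = triplet_loss(anchor, near_point, far_point)
```

In a setting like the one the abstract describes, the anchor and positive would presumably be embeddings of instructions from the same group and the negative an embedding from a different group, with the loss driving gradient updates to the embedding model.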

Award ID(s): 1910486, 2146354

PAR ID: 10376712

Author(s) / Creator(s): Biswas, Sajib; Barao, Timothy; Lazzari, John; McCoy, Jeret; Liu, Xiuwen; Kostandarithes, Alexander

Date Published: 2022-07-18

Journal Name: International Joint Conference on Neural Networks

Page Range / eLocation ID: 1 to 8

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.1109/IJCNN55064.2022.9892426
