Lost in Translation: How Intermediate Language Representations Affect Malware Classification

Cannan, Logan; Morris, Tommy

doi:10.5772/acrt.20250021

Machine-learning-assisted binary analysis is an area of great interest in cybersecurity research. Training accurate machine learning models requires binary lifting, in which binaries are translated through an intermediate language representation. This study postulates that different intermediate language representations change the performance characteristics of these machine learning models. Taking a published machine learning framework as a control and modifying the input methodology to include different intermediate language representation transforms, this study compared the performance of models for malware classification. The contributions of this study are: verification and replication of a published machine learning framework; novel transforms and usage of a public malware dataset; a comparative study of the impact of different intermediate language representations on the performance of opcode-based malware classification; and a set of heatmaps that can be used as a reference lookup table to inform the choice of binary lifting method.

Award ID(s): 1753900

PAR ID: 10642698

Author(s) / Creator(s): Cannan, Logan; Morris, Tommy

Publisher / Repository: IntechOpen

Date Published: 2025-01-01

Journal Name: AI, Computer Science and Robotics Technology

Volume: 4

ISSN: 2754-6292

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article:
https://doi.org/10.5772/acrt.20250021
