Cross-Regional Malware Detection via Model Distilling and Federated Learning

Botacin, Marcus; Gomes, Heitor

doi:10.1145/3678890.3678893

Machine Learning (ML) is a key part of modern malware detection pipelines, but its application is not straightforward. It involves multiple practical challenges that the literature frequently leaves unaddressed. A key challenge is the heterogeneity of scenarios. Antivirus (AV) companies, for instance, operate under different performance constraints on the backend and on the endpoint, and with datasets that differ according to the country they operate in. In this paper, we evaluate the impact of these heterogeneous aspects by developing a classification pipeline for three datasets of 10K malware samples each, collected by an AV company in the USA, Brazil, and Japan in the same period. We characterize the different requirements for these datasets and show that a different number of features is required to reach the optimal detection rate in each scenario. We show that a global model combining the three datasets increases the detection rate on each of the three individual datasets. We propose using Federated Learning (FL) to build the global model and a distilling process to generate the local versions. We order the samples temporally to show that although retraining on concept drift detection helps recover the detection rate, only an FL approach can increase the detection rate.

Award ID(s): 2327427

PAR ID: 10546486

Author(s) / Creator(s): Botacin, Marcus; Gomes, Heitor

Publisher / Repository: ACM

Date Published: 2024-09-30

ISBN: 9798400709593

Page Range / eLocation ID: 97 to 113

Subject(s) / Keyword(s): malware; antivirus; federated learning; model distillation; machine learning; intrusion detection

Format(s): Medium: X

Location: Padua, Italy

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1145/3678890.3678893
