Distilling Contextual Embeddings Into A Static Word Embedding For Improving Hacker Forum Analytics

Ampel, Benjamin; Chen, Hsinchun

doi:10.1109/ISI53945.2021.9624848

Hacker forums provide malicious actors with a large database of tutorials, goods, and assets to leverage for cyber-attacks. Careful research of these forums can provide tremendous benefit to the cybersecurity community through trend identification and exploit categorization. This study aims to provide a novel static word embedding, Hack2Vec, to improve performance on hacker forum classification tasks. Our proposed Hack2Vec model distills contextual representations from the seminal pre-trained language model BERT to a continuous bag-of-words model to create a highly targeted hacker forum static word embedding. The results of our experimental design indicate that Hack2Vec improves performance over prominent embeddings in accuracy, precision, recall, and F1-score for a benchmark hacker forum classification task.
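
The abstract does not spell out the distillation objective, so the following is only a minimal sketch of one plausible CBOW-style setup, assumed for illustration and not the Hack2Vec training code. A frozen "teacher" supplies a contextual vector for each token occurrence. A static "student" table is trained so that the average of the context words' static vectors regresses onto the teacher's vector for the center word under a squared-error loss. Everything named here is hypothetical: the toy `CORPUS`, the `teacher_vector` placeholder (a deterministic stand-in for BERT, not a real model), and the helpers `context_indices`, `distill_cbow`, and `cosine`.

```python
import math
import random

# Hypothetical toy corpus standing in for tokenized hacker forum posts.
CORPUS = [
    ["download", "the", "exploit", "kit"],
    ["sell", "the", "exploit", "now"],
    ["download", "the", "crypter", "tool"],
]
DIM = 4      # embedding size (toy value)
WINDOW = 2   # CBOW context window on each side of the center word


def teacher_vector(sentence, position, dim=DIM):
    """Hypothetical stand-in for a contextual encoder such as BERT.

    Returns a deterministic vector that depends on the whole sentence and the
    token position, so one word gets different vectors in different contexts.
    """
    rng = random.Random(" ".join(sentence) + "#" + str(position))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def context_indices(length, center, window=WINDOW):
    """Positions within `window` of `center`, excluding the center itself."""
    lo, hi = max(0, center - window), min(length, center + window + 1)
    return [j for j in range(lo, hi) if j != center]


def distill_cbow(corpus, epochs=200, lr=0.05, seed=0):
    """Assumed objective: mean of static context vectors ~ teacher vector of center."""
    rng = random.Random(seed)
    vocab = sorted({w for sent in corpus for w in sent})
    static = {w: [rng.uniform(-0.1, 0.1) for _ in range(DIM)] for w in vocab}
    # The teacher stays frozen, so its targets are computed once up front.
    targets = [[teacher_vector(s, i) for i in range(len(s))] for s in corpus]
    history = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for s, sent_targets in zip(corpus, targets):
            for i, target in enumerate(sent_targets):
                ctx = [s[j] for j in context_indices(len(s), i)]
                if not ctx:
                    continue
                # CBOW step: average the static vectors of the context words.
                h = [sum(static[w][d] for w in ctx) / len(ctx) for d in range(DIM)]
                err = [h[d] - target[d] for d in range(DIM)]
                epoch_loss += sum(e * e for e in err)
                # d||h - target||^2 / d(context vector) = 2 * err / len(ctx)
                for w in ctx:
                    for d in range(DIM):
                        static[w][d] -= lr * 2.0 * err[d] / len(ctx)
        history.append(epoch_loss)
    return static, history


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


if __name__ == "__main__":
    vectors, losses = distill_cbow(CORPUS)
    print(f"loss: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
    print("cos(exploit, crypter) =", round(cosine(vectors["exploit"], vectors["crypter"]), 3))
```

After training, each word has a single context-independent vector that can be looked up like any static embedding and fed to a downstream classifier. A real pipeline would replace `teacher_vector` with actual BERT hidden states over forum text.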

Award ID(s): 1921485, 1917117

PAR ID: 10344545

Author(s) / Creator(s): Ampel, Benjamin; Chen, Hsinchun

Date Published: 2021-11-01

Journal Name: Proceedings of 2021 IEEE International Conference on Intelligence and Security Informatics (IEEE ISI 2021)

Page Range / eLocation ID: 1-3

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript (Conference Paper), https://doi.org/10.1109/ISI53945.2021.9624848

More Like this