Title: Distilling Contextual Embeddings Into A Static Word Embedding For Improving Hacker Forum Analytics
Hacker forums provide malicious actors with a large database of tutorials, goods, and assets to leverage for cyber-attacks. Careful research of these forums can provide tremendous benefit to the cybersecurity community through trend identification and exploit categorization. This study aims to provide a novel static word embedding, Hack2Vec, to improve performance on hacker forum classification tasks. Our proposed Hack2Vec model distills contextual representations from the seminal pre-trained language model BERT to a continuous bag-of-words model to create a highly targeted hacker forum static word embedding. The results of our experimental design indicate that Hack2Vec improves performance over prominent embeddings in accuracy, precision, recall, and F1-score for a benchmark hacker forum classification task.
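The abstract does not spell out how Hack2Vec's distillation works, so the sketch below only illustrates the general idea of turning contextual representations (one vector per word occurrence) into a static embedding (one vector per word type). It uses plain mean pooling, a simpler swapped-in technique that involves no continuous bag-of-words training and is not the authors' method. The function name `pool_static_embeddings` and the 3-dimensional vectors are hypothetical toy values standing in for encoder outputs such as BERT's.

```python
from collections import defaultdict


def pool_static_embeddings(occurrences):
    """Collapse per-occurrence (contextual) vectors into one static vector per word.

    Hypothetical helper for illustration only (mean pooling, not Hack2Vec).
    occurrences: iterable of (word, vector) pairs; each vector is the
    contextual representation of a single occurrence of `word`.
    Returns a dict mapping each word to the element-wise mean of its vectors.
    """
    sums = {}
    counts = defaultdict(int)
    for word, vec in occurrences:
        if word not in sums:
            sums[word] = [0.0] * len(vec)
        acc = sums[word]
        if len(vec) != len(acc):
            raise ValueError(f"dimension mismatch for {word!r}")
        for i, x in enumerate(vec):
            acc[i] += x
        counts[word] += 1
    # Divide each accumulated sum by the number of occurrences of that word.
    return {w: [x / counts[w] for x in acc] for w, acc in sums.items()}


# Hypothetical 3-dimensional contextual vectors: two occurrences of
# "exploit" in different posts and one occurrence of "payload".
toy = [
    ("exploit", [1.0, 0.0, 2.0]),
    ("exploit", [3.0, 2.0, 0.0]),
    ("payload", [0.5, 0.5, 0.5]),
]
static = pool_static_embeddings(toy)
print(static["exploit"])  # [2.0, 1.0, 1.0]
```

Mean pooling discards the per-context variation that a contextual model captures and trains nothing, so it should be read only as an illustration of the static-from-contextual idea, not as a stand-in for the CBOW-based distillation the abstract describes.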
Award ID(s):
1921485, 1917117
Journal Name:
Proceedings of 2021 IEEE International Conference on Intelligence and Security Informatics (IEEE ISI 2021)
Page Range / eLocation ID:
1 to 3
Sponsoring Org:
National Science Foundation
