NSF PAR Search | NSF Public Access Repository


Identifying and Categorizing Malicious Content on Paste Sites: A Neural Topic Modeling Approach

https://doi.org/10.1109/ISI53945.2021.9624765

Vahedi, T; Ampel, B; Samtani, S (November 2021, 2021 IEEE International Conference on Intelligence and Security Informatics)

Malicious cyber activities impose substantial costs on the U.S. economy and global markets. Cyber-criminals often use information-sharing social media platforms such as paste sites (e.g., Pastebin) to share vast amounts of plain text content related to Personally Identifiable Information (PII), credit card numbers, exploit code, malware, and other sensitive content. Paste sites can provide targeted Cyber Threat Intelligence (CTI) about potential threats and prior breaches. In this research, we propose a novel model that combines Bidirectional Encoder Representations from Transformers (BERT) with Latent Dirichlet Allocation (LDA) to categorize pastes automatically. Our proposed BERT-LDA model leverages a neural network transformer architecture to capture sequential dependencies when representing each sentence in a paste. BERT-LDA replaces the Bag-of-Words (BoW) approach in conventional LDA with a Bag-of-Labels (BoL) that encompasses class labels at the sequence level. We compared the performance of the proposed BERT-LDA against conventional LDA and BERT-LDA variants (e.g., GPT2-LDA) on 4,254,453 pastes from three paste sites. Experimental results indicate that the proposed BERT-LDA outperformed standard LDA and each BERT-LDA variant in terms of perplexity on each paste site. Results of our BERT-LDA case study suggest that a significant amount of content relating to hacker community activities, malicious code, network and website vulnerabilities, and PII is shared on paste sites. The insights provided by this study could be used by organizations to proactively mitigate potential damage to their infrastructure.
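The abstract compares topic models by perplexity over pastes represented as bags of sequence-level labels. As a rough illustration only, the sketch below computes the standard held-out perplexity of an LDA-style model from a document-topic matrix and a topic-token matrix. The function name, the label vocabulary, and all numbers are hypothetical and are not taken from the paper; the authors' actual pipeline is not described in this record.

```python
import math

def perplexity(docs, theta, phi):
    """Perplexity of an LDA-style model on tokenized documents.

    docs  : list of token lists (tokens may be words or sequence-level labels)
    theta : theta[d][k] = probability of topic k in document d
    phi   : phi[k][token] = probability of token under topic k
    Assumes every token has nonzero probability under the model.
    """
    log_lik = 0.0
    n_tokens = 0
    for d, tokens in enumerate(docs):
        for tok in tokens:
            # Marginalize over topics: p(tok | d) = sum_k theta[d][k] * phi[k][tok]
            p = sum(theta[d][k] * phi[k].get(tok, 0.0) for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    # Lower perplexity means the model assigns higher probability to the data
    return math.exp(-log_lik / n_tokens)

# Hypothetical toy data: each paste is a bag of labels rather than words
docs = [["credentials", "credentials", "exploit"], ["exploit", "malware"]]
theta = [[0.9, 0.1], [0.2, 0.8]]
phi = [
    {"credentials": 0.8, "exploit": 0.1, "malware": 0.1},
    {"credentials": 0.1, "exploit": 0.5, "malware": 0.4},
]
score = perplexity(docs, theta, phi)
print(f"perplexity: {score:.3f}")
```

A model that spreads probability uniformly over a vocabulary of V tokens has perplexity V, which gives a simple baseline for reading such scores.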
Full Text Available
Labeling Hacker Exploits for Proactive Cyber Threat Intelligence: A Deep Transfer Learning Approach

Ampel, B.; Samtani, S.; Zhu, H.; Chen, H. (January 2020, IEEE Intelligence and Security Informatics (ISI) 2020)
Full Text Available
Labeling Hacker Exploits for Proactive Cyber Threat Intelligence: A Deep Transfer Learning Approach

Ampel, B; Samtani, S; Zhu, H; Ullman, S; Chen, H. (January 2020, IEEE International Conference on Intelligence and Security Informatics (ISI))
Full Text Available
Smart Vulnerability Assessment for Scientific Cyberinfrastructure: An Unsupervised Graph Embedding Approach

Ullman, S.; Samtani, S.; Lazarine, B.; Zhu, H.; Ampel, B.; Patton, M.; Chen, H. (January 2020, IEEE Intelligence and Security Informatics (ISI) 2020)
Full Text Available
Smart Vulnerability Assessment for Scientific Cyberinfrastructure: An Unsupervised Graph Embedding Approach

Ullman, S; Samtani, S; Lazarine, B; Zhu, H; Ampel, B; Patton, M; Chen, H. (January 2020, IEEE International Conference on Intelligence and Security Informatics (ISI))
Full Text Available
