NSF PAR Search | NSF Public Access Repository

The Art of Cybercrime Community Research

https://doi.org/10.1145/3639362

Hughes, Jack; Pastrana, Sergio; Hutchings, Alice; Afroz, Sadia; Samtani, Sagar; Li, Weifeng; Santana Marin, Ericsson (February 2024, ACM Computing Surveys)

In the last decade, cybercrime has risen considerably. One key factor is the proliferation of online cybercrime communities, where actors trade products and services, and also learn from each other. Accordingly, understanding the operation and behavior of these communities is of great interest, and they have been explored across multiple disciplines with different, often quite novel, approaches. This survey explores the challenges inherent to the field and the methodological approaches researchers used to understand this space. We note that, in many cases, cybercrime research is more of an art than a science. We highlight the good practices and propose a list of recommendations for future cybercrime community scholars, including taking steps to verify and validate results, establishing privacy and ethical research practices, and mitigating the challenge of ground truth data.
Full Text Available
Evading Deep Learning-Based Malware Detectors via Obfuscation: A Deep Reinforcement Learning Approach

https://doi.org/10.1109/ICDM58522.2023.00019

Etter, Brian; Hu, James Lee; Ebrahimi, Mohammadreza; Li, Weifeng; Li, Xin; Chen, Hsinchun (December 2023, IEEE)

Adversarial Malware Generation (AMG), the generation of adversarial malware variants to strengthen Deep Learning (DL)-based malware detectors, has emerged as a crucial tool in the development of proactive cyberdefense. However, the majority of extant works offer subtle perturbations or additions to executable files and do not explore full-file obfuscation. In this study, we show that an open-source encryption tool coupled with a Reinforcement Learning (RL) framework can successfully obfuscate malware to evade state-of-the-art malware detection engines and outperform techniques that use advanced modification methods. Our results show that the proposed method improves the evasion rate from 27%-49% compared to widely used state-of-the-art reinforcement learning-based methods.
Full Text Available
Counteracting Dark Web Text-Based CAPTCHA with Generative Adversarial Learning for Proactive Cyber Threat Intelligence

https://doi.org/10.1145/3505226

Zhang, Ning; Ebrahimi, Mohammadreza; Li, Weifeng; Chen, Hsinchun (June 2022, ACM Transactions on Management Information Systems)

Automated monitoring of dark web (DW) platforms on a large scale is the first step toward developing proactive Cyber Threat Intelligence (CTI). While there are efficient methods for collecting data from the surface web, large-scale dark web data collection is often hindered by anti-crawling measures. In particular, text-based CAPTCHA serves as the most prevalent and prohibitive type of these measures in the dark web. Text-based CAPTCHA identifies and blocks automated crawlers by forcing the user to enter a combination of hard-to-recognize alphanumeric characters. In the dark web, CAPTCHA images are meticulously designed with additional background noise and variable character length to prevent automated CAPTCHA breaking. Existing automated CAPTCHA breaking methods have difficulty overcoming these dark web challenges. As such, solving dark web text-based CAPTCHA has relied heavily on human involvement, which is labor-intensive and time-consuming. In this study, we propose a novel framework for automated breaking of dark web CAPTCHA to facilitate dark web data collection. This framework encompasses a novel generative method to recognize dark web text-based CAPTCHA with noisy background and variable character length. To eliminate the need for human involvement, the proposed framework utilizes a Generative Adversarial Network (GAN) to counteract dark web background noise and leverages an enhanced character segmentation algorithm to handle CAPTCHA images with variable character length. Our proposed framework, DW-GAN, was systematically evaluated on multiple dark web CAPTCHA testbeds. DW-GAN significantly outperformed the state-of-the-art benchmark methods on all datasets, achieving a success rate of over 94.4% on a carefully collected real-world dark web dataset. We further conducted a case study on an emergent Dark Net Marketplace (DNM) to demonstrate that DW-GAN eliminated human involvement by automatically solving CAPTCHA challenges with no more than three attempts. Our research enables the CTI community to develop advanced, large-scale dark web monitoring. We make DW-GAN code available to the community as an open-source tool on GitHub.
Full Text Available
Binary Black-Box Attacks Against Static Malware Detectors with Reinforcement Learning in Discrete Action Spaces

https://doi.org/10.1109/SPW53761.2021.00021

Ebrahimi, Mohammadreza; Pacheco, Jason; Li, Weifeng; Hu, James Lee; Chen, Hsinchun (May 2021, IEEE S&P Workshop on Deep Learning and Security (DLS))
Full Text Available
Detecting Cyber-Adversarial Videos in Traditional Social Media

https://doi.org/10.1109/ISI49825.2020.9280476

Du, Bingyan; Singhal, Pranay; Benjamin, Victor; Li, Weifeng (November 2020, IEEE International Conference on Intelligence and Security Informatics (IEEE ISI 2020))
Full Text Available
A Generative Adversarial Learning Framework for Breaking Text-Based CAPTCHA in the Dark Web

https://doi.org/10.1109/ISI49825.2020.9280537

Zhang, Ning; Ebrahimi, Mohammadreza; Li, Weifeng; Chen, Hsinchun (November 2020, IEEE International Conference on Intelligence and Security Informatics (IEEE ISI 2020))
Full Text Available
Linking Personally Identifiable Information from the Dark Web to the Surface Web: A Deep Entity Resolution Approach

https://doi.org/10.1109/ICDMW51313.2020.00072

Lin, Fangyu; Liu, Yizhi; Ebrahimi, Mohammadreza; Ahmad-Post, Zara; Hu, James Lee; Xin, Jingyu; Samtani, Sagar; Li, Weifeng; Chen, Hsinchun (November 2020, International Conference on Data Mining Workshops (ICDMW))
The information privacy of Internet users has become a major societal concern. The rapid growth of online services increases the risk of unauthorized access to Personally Identifiable Information (PII) of at-risk populations, who are unaware of their PII exposure. To proactively identify online at-risk populations and increase their privacy awareness, it is crucial to conduct a holistic privacy risk assessment across the Internet. Current privacy risk assessment studies are limited to a single platform within either the surface web or the dark web. A comprehensive privacy risk assessment requires matching exposed PII on heterogeneous online platforms across the surface web and the dark web. However, due to the incompleteness and inaccuracy of PII records in each platform, linking the exposed PII to users is a non-trivial task. While Entity Resolution (ER) techniques can be used to facilitate this task, they often require ad-hoc, manual rule development and feature engineering. Recently, Deep Learning (DL)-based ER has outperformed manual entity matching rules by automatically extracting prominent features from incomplete or inaccurate records. In this study, we enhance the existing privacy risk assessment with a DL-based ER method, namely Multi-Context Attention (MCA), to comprehensively evaluate individuals’ PII exposure across the different online platforms in the dark web and surface web. Evaluation against benchmark ER models indicates the efficacy of MCA. Using MCA on a random sample of data breach victims in the dark web, we are able to identify 4.3% of the victims on the surface web platforms and calculate their privacy risk scores.
Full Text Available
Identifying, Collecting, and Monitoring Personally Identifiable Information: From the Dark Web to the Surface Web

https://doi.org/10.1109/ISI49825.2020.9280540

Liu, Yizhi; Lin, Fang Yu; Ahmad-Post, Zara; Ebrahimi, Mohammadreza; Zhang, Ning; Hu, James Lee; Xin, Jingyu; Li, Weifeng; Chen, Hsinchun (November 2020, IEEE International Conference on Intelligence and Security Informatics (IEEE ISI 2020))
Full Text Available
Supervised Topic Modeling Using Hierarchical Dirichlet Process-Based Inverse Regression: Experiments on E-Commerce Applications

https://doi.org/10.1109/TKDE.2017.2786727

Li, Weifeng; Yin, Junming; Chen, Hsinchun (June 2018, IEEE Transactions on Knowledge and Data Engineering)

Full Text Available
The rapid but “invisible” changes in urban greenspace: A comparative study of nine Chinese cities

https://doi.org/10.1016/j.scitotenv.2018.01.335

Zhou, Weiqi; Wang, Jing; Qian, Yuguo; Pickett, Steward T.A.; Li, Weifeng; Han, Lijian (June 2018, Science of The Total Environment)

Full Text Available
