

Award ID contains: 1936370


  1. Automated monitoring of dark web (DW) platforms on a large scale is the first step toward developing proactive Cyber Threat Intelligence (CTI). While there are efficient methods for collecting data from the surface web, large-scale dark web data collection is often hindered by anti-crawling measures. In particular, text-based CAPTCHA is the most prevalent and prohibitive of these measures in the dark web. Text-based CAPTCHA identifies and blocks automated crawlers by forcing the user to enter a combination of hard-to-recognize alphanumeric characters. In the dark web, CAPTCHA images are meticulously designed with additional background noise and variable character length to prevent automated CAPTCHA breaking. Existing automated CAPTCHA-breaking methods have difficulty overcoming these dark web challenges. As such, solving dark web text-based CAPTCHA has relied heavily on human involvement, which is labor-intensive and time-consuming. In this study, we propose a novel framework for automated breaking of dark web CAPTCHA to facilitate dark web data collection. This framework encompasses a novel generative method to recognize dark web text-based CAPTCHA with noisy backgrounds and variable character length. To eliminate the need for human involvement, the proposed framework utilizes a Generative Adversarial Network (GAN) to counteract dark web background noise and leverages an enhanced character segmentation algorithm to handle CAPTCHA images with variable character length. Our proposed framework, DW-GAN, was systematically evaluated on multiple dark web CAPTCHA testbeds. DW-GAN significantly outperformed the state-of-the-art benchmark methods on all datasets, achieving over a 94.4% success rate on a carefully collected real-world dark web dataset.
We further conducted a case study on an emergent Dark Net Marketplace (DNM) to demonstrate that DW-GAN eliminated human involvement by automatically solving CAPTCHA challenges in no more than three attempts. Our research enables the CTI community to develop advanced, large-scale dark web monitoring. We make the DW-GAN code available to the community as an open-source tool on GitHub.
    Free, publicly-accessible full text available June 30, 2023
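The variable-length handling described in the abstract above rests on splitting a denoised CAPTCHA image into per-character regions. A minimal sketch of one common approach, vertical-projection segmentation, is shown below; the function name and toy image are illustrative assumptions, not DW-GAN's actual implementation, and the GAN denoising step is assumed to have already produced a clean binary image.

```python
def segment_columns(image):
    """Split a binary image (list of rows of 0/1 pixels) into character spans.

    Returns a list of (start_col, end_col) column intervals that contain at
    least one foreground pixel, so a variable number of characters is
    recovered without assuming a fixed CAPTCHA length.
    """
    width = len(image[0])
    # Vertical projection: count foreground pixels in each column.
    profile = [sum(row[c] for row in image) for c in range(width)]
    spans, start = [], None
    for c, count in enumerate(profile):
        if count and start is None:
            start = c                      # entering a character region
        elif not count and start is not None:
            spans.append((start, c))       # leaving a character region
            start = None
    if start is not None:
        spans.append((start, width))
    return spans

# Toy 4x10 binary image with two "characters" separated by blank columns.
img = [
    [0, 1, 1, 0, 0, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(segment_columns(img))  # → [(1, 3), (6, 9)]
```

Each recovered span can then be cropped and passed to a per-character recognizer; the paper's "enhanced" algorithm presumably also handles touching characters, which this plain projection sketch does not.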
  2. Events such as the Facebook-Cambridge Analytica scandal and data aggregation efforts by technology providers have illustrated how vulnerable modern society is to privacy violations. Internationally recognized entities such as the National Science Foundation (NSF) have indicated that Artificial Intelligence (AI)-enabled models, artifacts, and systems can efficiently and effectively sift through large quantities of data from legal documents, social media, Dark Web sites, and other sources to curb privacy violations. Yet considerable effort is still required to understand prevailing data sources, systematically develop AI-enabled privacy analytics to tackle emerging challenges, and deploy systems to address critical privacy needs. To this end, we provide an overview of prevailing data sources that can support AI-enabled privacy analytics; a multi-disciplinary research framework that connects data, algorithms, and systems to tackle emerging AI-enabled privacy analytics challenges such as entity resolution, privacy assistance systems, privacy risk modeling, and more; a summary of selected funding sources to support high-impact privacy analytics research; and an overview of prevailing conference and journal venues that can be leveraged to share and archive privacy analytics research. We conclude this paper with an introduction of the papers included in this special issue.
  3. Cybersecurity has rapidly emerged as a grand societal challenge of the 21st century. Innovative solutions to proactively tackle emerging cybersecurity challenges are essential to ensuring a safe and secure society. Artificial Intelligence (AI) has rapidly emerged as a viable approach for sifting through terabytes of heterogeneous cybersecurity data to execute fundamental cybersecurity tasks, such as asset prioritization, control allocation, vulnerability management, and threat detection, with unprecedented efficiency and effectiveness. Despite its initial promise, AI and cybersecurity have been traditionally siloed disciplines that relied on disparate knowledge and methodologies. Consequently, the AI for Cybersecurity discipline is in its nascency. In this article, we aim to provide an important step to progress the AI for Cybersecurity discipline. We first provide an overview of prevailing cybersecurity data, summarize extant AI for Cybersecurity application areas, and identify key limitations in the prevailing landscape. Based on these key issues, we offer a multi-disciplinary AI for Cybersecurity roadmap that centers on major themes such as cybersecurity applications and data, advanced AI methodologies for cybersecurity, and AI-enabled decision making. To help scholars and practitioners make significant headway in tackling these grand AI for Cybersecurity issues, we summarize promising funding mechanisms from the National Science Foundation (NSF) that can support long-term, systematic research programs. We conclude this article with an introduction of the articles included in this special issue.
  4. The information privacy of Internet users has become a major societal concern. The rapid growth of online services increases the risk of unauthorized access to Personally Identifiable Information (PII) of at-risk populations, who are unaware of their PII exposure. To proactively identify online at-risk populations and increase their privacy awareness, it is crucial to conduct a holistic privacy risk assessment across the internet. Current privacy risk assessment studies are limited to a single platform within either the surface web or the dark web. A comprehensive privacy risk assessment requires matching exposed PII on heterogeneous online platforms across the surface web and the dark web. However, due to the incompleteness and inaccuracy of PII records in each platform, linking the exposed PII to users is a non-trivial task. While Entity Resolution (ER) techniques can be used to facilitate this task, they often require ad-hoc, manual rule development and feature engineering. Recently, Deep Learning (DL)-based ER has outperformed manual entity matching rules by automatically extracting prominent features from incomplete or inaccurate records. In this study, we enhance the existing privacy risk assessment with a DL-based ER method, namely Multi-Context Attention (MCA), to comprehensively evaluate individuals' PII exposure across the different online platforms in the dark web and surface web. Evaluation against benchmark ER models indicates the efficacy of MCA. Using MCA on a random sample of data breach victims in the dark web, we are able to identify 4.3% of the victims on surface web platforms and calculate their privacy risk scores.
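The core idea behind attention-based entity resolution, as in the abstract above, is to softly align tokens of one record against another instead of applying hand-written matching rules. The toy sketch below illustrates only that soft-alignment idea; the function names, the character-overlap similarity, and the sharpening constant are all illustrative assumptions and do not reflect MCA's actual deep architecture.

```python
import math

def tok_sim(a, b):
    """Crude character-overlap (Jaccard) similarity between two tokens, 0..1.
    A real model would use learned embeddings instead."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def attention_score(rec_a, rec_b):
    """Score a record pair by softmax-attention alignment.

    Each token of one record attends over the tokens of the other; the
    attended similarities are averaged, and the score is symmetrized over
    both directions.
    """
    def one_way(xs, ys):
        total = 0.0
        for x in xs:
            sims = [tok_sim(x, y) for y in ys]
            weights = [math.exp(4 * s) for s in sims]  # sharpen the alignment
            z = sum(weights)
            total += sum(w * s for w, s in zip(weights, sims)) / z
        return total / len(xs)
    return 0.5 * (one_way(rec_a, rec_b) + one_way(rec_b, rec_a))

# Hypothetical PII records: a noisy duplicate vs. an unrelated person.
a = ["john", "doe", "jdoe@mail.com"]
b = ["jon", "doe", "jdoe@mail.com"]
c = ["alice", "kim", "akim@web.net"]
assert attention_score(a, b) > attention_score(a, c)
```

In a DL-based ER model the same alignment happens over learned vector representations, which is what lets it absorb typos and missing fields without manual feature engineering.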
  5. The regularity of devastating cyber-attacks has made cybersecurity a grand societal challenge. Many cybersecurity professionals are closely examining the international Dark Web to proactively pinpoint potential cyber threats. Despite its potential, the Dark Web contains hundreds of thousands of non-English posts. While machine translation (MT) is the prevailing approach to processing non-English text, applying MT to hacker forum text results in mistranslations. In this study, we draw upon Long Short-Term Memory (LSTM), Cross-Lingual Knowledge Transfer (CLKT), and Generative Adversarial Network (GAN) principles to design a novel Adversarial CLKT (A-CLKT) approach. A-CLKT operates on untranslated text to retain the original semantics of the language and leverages the collective knowledge about cyber threats across languages to create a language-invariant representation without any manual feature engineering or external resources. Three experiments demonstrate how A-CLKT outperforms state-of-the-art machine learning, deep learning, and CLKT algorithms in identifying cyber-threats in French and Russian forums.
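A common mechanism for learning language-invariant representations in adversarial transfer setups like the one above is a gradient reversal layer between the shared encoder and a language discriminator. The sketch below shows only that mechanism in miniature; it is our reading of the general technique, not the paper's code, and every name and number in it is an illustrative assumption.

```python
def grl_forward(h):
    """Gradient reversal layer: identity in the forward pass."""
    return h

def grl_backward(grad_from_discriminator, lam=1.0):
    """Backward pass: negate (and scale by lam) the language discriminator's
    gradient before it reaches the encoder, turning the discriminator's
    minimization into maximization from the encoder's point of view."""
    return [-lam * g for g in grad_from_discriminator]

def encoder_gradient(task_grad, lang_grad, lam=1.0):
    """The shared encoder's total gradient: follow the threat-classification
    task while opposing the language discriminator, so the learned features
    stop revealing whether a post is, e.g., French or Russian."""
    reversed_lang = grl_backward(lang_grad, lam)
    return [t + r for t, r in zip(task_grad, reversed_lang)]

# Toy 2-dimensional gradients for one update step.
g = encoder_gradient(task_grad=[0.2, -0.1], lang_grad=[0.5, 0.5], lam=1.0)
print(g)  # → [-0.3, -0.6]
```

Trained this way, the discriminator's accuracy is driven toward chance on the shared features, which is what lets threat knowledge transfer across languages without translating the text first.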