The rapid proliferation of complex information systems has been met by an ever-increasing quantity of exploits that can cause irreparable cyber breaches. To mitigate these cyber threats, academia and industry have placed a significant focus on proactively identifying and labeling exploits developed by the international hacker community. However, prevailing approaches for labeling exploits in hacker forums do not leverage metadata from exploit darknet markets or public exploit repositories to enhance labeling performance. In this study, we adopted the computational design science paradigm to develop a novel information technology artifact, the deep transfer learning exploit labeler (DTL-EL). DTL-EL incorporates a pre-initialization design, multi-layer deep transfer learning (DTL), and a self-attention mechanism to automatically label exploits in hacker forums. We rigorously evaluated the proposed DTL-EL against state-of-the-art non-DTL benchmark methods based in classical machine learning and deep learning. Results suggest that the proposed DTL-EL significantly outperforms benchmark methods based on accuracy, precision, recall, and F1-score. Our proposed DTL-EL framework provides important practical implications for key stakeholders such as cybersecurity managers, analysts, and educators.
SYSML: StYlometry with Structure and Multitask Learning: Implications for Darknet Forum Migrant Analysis.
Darknet market forums are frequently used to exchange illegal goods and services between parties who use encryption to conceal their identities. These markets are hosted on the Tor network, which guarantees additional anonymization from IP and location tracking, making it challenging to link malicious users who operate multiple accounts (sybils). Additionally, users migrate to new forums when one is closed, further increasing the difficulty of linking users across multiple forums. We develop a novel stylometry-based multitask learning approach for natural language and model interactions using graph embeddings to construct low-dimensional representations of short episodes of user activity for authorship attribution. We provide a comprehensive evaluation of our methods across four different darknet forums, demonstrating their efficacy over the state of the art, with a lift of up to 2.5X on Mean Retrieval Rank and 2X on Recall@10.
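The abstract reports results on Recall@10 and a rank-based retrieval score without defining them. As a minimal sketch, assuming Recall@k is the fraction of query episodes whose true author appears among the top k ranked candidates, and assuming a reciprocal-rank reading of the rank-based score (an assumption, not confirmed by the abstract), the two quantities could be computed as follows. The function names, episode IDs, and user IDs are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def recall_at_k(rankings: Dict[str, List[str]], truth: Dict[str, str], k: int = 10) -> float:
    """Fraction of queries whose true match appears among the top-k candidates."""
    hits = sum(1 for query, ranked in rankings.items() if truth[query] in ranked[:k])
    return hits / len(rankings)


def mean_reciprocal_rank(rankings: Dict[str, List[str]], truth: Dict[str, str]) -> float:
    """Average of 1/rank of the true match; a query with no match contributes 0.

    Assumption: this is one common rank-based retrieval score, not necessarily
    the exact metric used in the paper.
    """
    total = 0.0
    for query, ranked in rankings.items():
        if truth[query] in ranked:
            total += 1.0 / (ranked.index(truth[query]) + 1)
    return total / len(rankings)


# Hypothetical toy data: each query episode maps to candidates ranked by similarity.
rankings = {
    "episode_1": ["user_b", "user_a", "user_c"],
    "episode_2": ["user_c", "user_b", "user_a"],
}
truth = {"episode_1": "user_a", "episode_2": "user_a"}

print(recall_at_k(rankings, truth, k=2))
print(mean_reciprocal_rank(rankings, truth))
```

Under these assumptions, a "lift" compares the score of one method against a baseline on the same queries, so higher values of either metric indicate that the true author is ranked closer to the top.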
- PAR ID: 10317135
- Journal Name: Empirical Methods in Natural Language Processing21
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Scripting is a widely used way to automate the execution of tasks. Despite its popularity, scripting remains difficult for both beginners and experts: beginners struggle with cryptic commands, and experts with incompatible syntaxes across different systems. The authors introduce Natural Shell, an assistant that enables end users to generate commands and scripts for various purposes. Natural Shell automatically synthesizes scripts for different shell systems based on natural language descriptions. By interacting with Natural Shell, new users can learn the basics of scripting languages without the obstacle of incomprehensible syntax. The tool also frees more advanced users from consulting manuals when they switch shell systems. The authors have developed a prototype system and demonstrate its effectiveness with a benchmark of 50 examples of popular shell commands collected from online forums. In addition, the authors analyzed the usage of Natural Shell in a lab study involving 10 participants with different scripting skill levels. Natural Shell effectively helps users generate commands in assigned syntaxes and greatly streamlines their learning and usage experience.
-
Online underground forums have been widely used by cybercriminals to trade illicit products, resources, and services, and have played a central role in the cybercriminal ecosystem. Unfortunately, due to the number of forums, their size, and the expertise required, it is infeasible to perform manual exploration to understand their behavioral processes. In this paper, we propose a novel framework named iDetector to automate the analysis of underground forums for the detection of cybercrime-suspected threads. In iDetector, to detect whether given threads are cybercrime-suspected, we not only analyze the content of the threads, but also utilize the relations among threads, users, replies, and topics. To model these rich semantic relationships (i.e., thread-user, thread-reply, thread-topic, reply-user, and reply-topic relations), we introduce a structured heterogeneous information network (HIN) for representation, which can be composed of different types of entities and relations. To capture the complex relationships (e.g., two threads are relevant if they were posted by the same user and discussed the same topic), we use a meta-structure based approach to characterize the semantic relatedness over threads. As different meta-structures depict the relatedness over threads from different views, we then build a classifier using Laplacian scores to aggregate the similarities formulated by different meta-structures to make predictions. To the best of our knowledge, this is the first work to use structural HIN to automate underground forum analysis. Comprehensive experiments on real data collections from underground forums (e.g., Hack Forums) are conducted to validate the effectiveness of our developed system iDetector in cybercrime-suspected thread detection by comparison with alternative methods.
-
Background: The increasing volume of health-related social media activity, where users connect, collaborate, and engage, has increased the significance of analyzing how people use health-related social media. Objective: The aim of this study was to classify the content (eg, posts that share experiences and seek support) of users who write health-related social media posts and study the effect of user demographics on post content. Methods: We analyzed two different types of health-related social media: (1) health-related online forums—WebMD and DailyStrength—and (2) general online social networks—Twitter and Google+. We identified several categories of post content and built classifiers to automatically detect these categories. These classifiers were used to study the distribution of categories for various demographic groups. Results: We achieved an accuracy of at least 84% and a balanced accuracy of at least 0.81 for half of the post content categories in our experiments. In addition, 70.04% (4741/6769) of posts by male WebMD users asked for advice, and male users’ WebMD posts were more likely to ask for medical advice than female users’ posts. The majority of posts on DailyStrength shared experiences, regardless of the gender, age group, or location of their authors. Furthermore, health-related posts on Twitter and Google+ were used to share experiences less frequently than posts on WebMD and DailyStrength. Conclusions: We studied and analyzed the content of health-related social media posts. Our results can guide health advocates and researchers to better target patient populations based on the application type. Given a research question or an outreach goal, our results can be used to choose the best online forums to answer the question or disseminate a message.
-
Online discussion forums have become an integral component of news, entertainment, information, and video-streaming websites, where people all over the world actively engage in discussions on a wide range of topics including politics, sports, music, business, health, and world affairs. Yet, little is known about their usability for blind users, who aurally interact with the forum conversations using screen reader assistive technology. In an interview study, blind users stated that they often had an arduous and frustrating interaction experience while consuming conversation threads, mainly due to the highly redundant content and the absence of customization options to selectively view portions of the conversations. As an initial step towards addressing these usability concerns, we designed PView, a browser extension that enables blind users to customize the content of forum threads in real time as they interact with these threads. Specifically, PView allows blind users to explicitly hide any post that is irrelevant to them, and then PView automatically detects and filters out all subsequent posts that are substantially similar to the hidden post in real time, before the users navigate to those portions of the thread. In a user study with blind participants, we observed that compared to the status quo, PView significantly improved the usability, workload, and satisfaction of the participants while interacting with the forums.

