skip to main content


Title: Linking Personally Identifiable Information from the Dark Web to the Surface Web: A Deep Entity Resolution Approach
The information privacy of the Internet users has become a major societal concern. The rapid growth of online services increases the risk of unauthorized access to Personally Identifiable Information (PII) of at-risk populations, who are unaware of their PII exposure. To proactively identify online at-risk populations and increase their privacy awareness, it is crucial to conduct a holistic privacy risk assessment across the internet. Current privacy risk assessment studies are limited to a single platform within either the surface web or the dark web. A comprehensive privacy risk assessment requires matching exposed PII on heterogeneous online platforms across the surface web and the dark web. However, due to the incompleteness and inaccuracy of PII records in each platform, linking the exposed PII to users is a non-trivial task. While Entity Resolution (ER) techniques can be used to facilitate this task, they often require ad-hoc, manual rule development and feature engineering. Recently, Deep Learning (DL)-based ER has outperformed manual entity matching rules by automatically extracting prominent features from incomplete or inaccurate records. In this study, we enhance the existing privacy risk assessment with a DL-based ER method, namely Multi-Context Attention (MCA), to comprehensively evaluate individuals’ PII exposure across the different online platforms in the dark web and surface web. Evaluation against benchmark ER models indicates the efficacy of MCA. Using MCA on a random sample of data breach victims in the dark web, we are able to identify 4.3% of the victims on the surface web platforms and calculate their privacy risk scores.  more » « less
Award ID(s):
1719477 1936370
PAR ID:
10218323
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Date Published:
Journal Name:
International Conference on Data Mining Workshops (ICDMW)
Page Range / eLocation ID:
488 to 495
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract—Personal Identifiable Information (PII) is any information that permits the identity of an individual to be directly or indirectly inferred. It should be protected against random access. This paper studies the extent of PII exposure on the Internet. It is hoped that the results of this study can help raise the Internet users’ awareness on privacy protection. 
    more » « less
  2. Dark patterns are user interface elements that can influence a person's behavior against their intentions or best interests. Prior work identified these patterns in websites and mobile apps, but little is known about how the design of platforms might impact dark pattern manifestations and related human vulnerabilities. In this paper, we conduct a comparative study of mobile application, mobile browser, and web browser versions of 105 popular services to investigate variations in dark patterns across modalities. We perform manual tests, identify dark patterns in each service, and examine how they persist or differ by modality. Our findings show that while services can employ some dark patterns equally across modalities, many dark patterns vary between platforms, and that these differences saddle people with inconsistent experiences of autonomy, privacy, and control. We conclude by discussing broader implications for policymakers and practitioners, and provide suggestions for furthering dark patterns research. 
    more » « less
  3. Privacy policies, despite the important information they provide about the collection and use of one's data, tend to be skipped over by most Internet users. In this paper, we seek to make privacy policies more accessible by automatically classifying web privacy. We use natural language processing techniques and multiple machine learning models to determine the effectiveness of each method in the classification method. We also explore the effectiveness of these methods to classify privacy policies of Internet of Things (IoT) devices. 
    more » « less
  4. Personally Identifiable Information (PII) leakage can lead to identity theft, financial loss, reputation damage, and anxiety. However, individuals remain largely unaware of their PII exposure on the Internet, and whether providing individuals with information about the extent of their PII exposure can trigger privacy protection actions requires further investigation. In this pilot study, grounded by Protection Motivation Theory (PMT), we examine whether receiving privacy alerts in the form of threat and countermeasure information will trigger senior citizens to engage in protective behaviors. We also examine whether providing personalized information moderates the relationship between information and individuals' perceptions. We contribute to the literature by shedding light on the determinants and barriers to adopting privacy protection behaviors. 
    more » « less
  5. This report will analyze issues related to web browser security and privacy. The web browser applications that will be looked at are Google Chrome, Bing, Mozilla Firefox, Internet Explorer, Microsoft Edge, Safari, and Opera. In recent months web browsers have increased the number of daily users. With the increase in daily users who may not be as well versed in data security and privacy, comes an increase in attacks. This study will discuss the pros and cons of each web browser, how many have been hacked, how often they have been hacked, why they have been hacked, security flaws, and more. The study utilizes research and a user survey to make a proper analysis and provide recommendations on the topic. 
    more » « less