skip to main content


Title: “It Feels Like Whack-a-mole”: User Experiences of Data Removal from People Search Websites
People Search Websites aggregate and publicize users’ Personal Identifiable Information (PII), previously sourced from data brokers. This paper presents a qualitative study of the perceptions and experiences of 18 participants who sought information removal by hiring a removal service or requesting removal from the sites. The users we interviewed were highly motivated and had sophisticated risk perceptions. We found that they encountered obstacles during the removal process, resulting in a high cost of removal, whether they requested it themselves or hired a service. Participants perceived that the successful monetization of users PII motivates data aggregators to make the removal more difficult. Overall, self management of privacy by attempting to keep information off the internet is difficult and its’ success is hard to evaluate. We provide recommendations to users, third parties, removal services and researchers aiming to improve the removal process.  more » « less
Award ID(s):
2016061
NSF-PAR ID:
10348963
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings on Privacy Enhancing Technologies
Volume:
2022
Issue:
3
ISSN:
2299-0984
Page Range / eLocation ID:
159 to 178
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Research on keystroke dynamics has the good potential to offer continuous authentication that complements conventional authentication methods in combating insider threats and identity theft before more harm can be done to the genuine users. Unfortunately, the large amount of data required by free-text keystroke authentication often contain personally identifiable information, or PII, and personally sensitive information, such as a user's first name and last name, username and password for an account, bank card numbers, and social security numbers. As a result, there are privacy risks associated with keystroke data that must be mitigated before they are shared with other researchers. We conduct a systematic study to remove PII's from a recent large keystroke dataset. We find substantial amounts of PII's from the dataset, including names, usernames and passwords, social security numbers, and bank card numbers, which, if leaked, may lead to various harms to the user, including personal embarrassment, blackmails, financial loss, and identity theft. We thoroughly evaluate the effectiveness of our detection program for each kind of PII. We demonstrate that our PII detection program can achieve near perfect recall at the expense of losing some useful information (lower precision). Finally, we demonstrate that the removal of PII's from the original dataset has only negligible impact on the detection error tradeoff of the free-text authentication algorithm by Gunetti and Picardi. We hope that this experience report will be useful in informing the design of privacy removal in future keystroke dynamics based user authentication systems. 
    more » « less
  2. Furnell, Steven (Ed.)
    A huge amount of personal and sensitive data is shared on Facebook, which makes it a prime target for attackers. Adversaries can exploit third-party applications connected to a user’s Facebook profile (i.e., Facebook apps) to gain access to this personal information. Users’ lack of knowledge and the varying privacy policies of these apps make them further vulnerable to information leakage. However, little has been done to identify mismatches between users’ perceptions and the privacy policies of Facebook apps. We address this challenge in our work. We conducted a lab study with 31 participants, where we received data on how they share information in Facebook, their Facebook-related security and privacy practices, and their perceptions on the privacy aspects of 65 frequently-used Facebook apps in terms of data collection, sharing, and deletion. We then compared participants’ perceptions with the privacy policy of each reported app. Participants also reported their expectations about the types of information that should not be collected or shared by any Facebook app. Our analysis reveals significant mismatches between users’ privacy perceptions and reality (i.e., privacy policies of Facebook apps), where we identified over-optimism not only in users’ perceptions of information collection, but also on their self-efficacy in protecting their information in Facebook despite experiencing negative incidents in the past. To the best of our knowledge, this is the first study on the gap between users’ privacy perceptions around Facebook apps and the reality. The findings from this study offer directions for future research to address that gap through designing usable, effective, and personalized privacy notices to help users to make informed decisions about using Facebook apps. 
    more » « less
  3. null (Ed.)
    The information privacy of the Internet users has become a major societal concern. The rapid growth of online services increases the risk of unauthorized access to Personally Identifiable Information (PII) of at-risk populations, who are unaware of their PII exposure. To proactively identify online at-risk populations and increase their privacy awareness, it is crucial to conduct a holistic privacy risk assessment across the internet. Current privacy risk assessment studies are limited to a single platform within either the surface web or the dark web. A comprehensive privacy risk assessment requires matching exposed PII on heterogeneous online platforms across the surface web and the dark web. However, due to the incompleteness and inaccuracy of PII records in each platform, linking the exposed PII to users is a non-trivial task. While Entity Resolution (ER) techniques can be used to facilitate this task, they often require ad-hoc, manual rule development and feature engineering. Recently, Deep Learning (DL)-based ER has outperformed manual entity matching rules by automatically extracting prominent features from incomplete or inaccurate records. In this study, we enhance the existing privacy risk assessment with a DL-based ER method, namely Multi-Context Attention (MCA), to comprehensively evaluate individuals’ PII exposure across the different online platforms in the dark web and surface web. Evaluation against benchmark ER models indicates the efficacy of MCA. Using MCA on a random sample of data breach victims in the dark web, we are able to identify 4.3% of the victims on the surface web platforms and calculate their privacy risk scores. 
    more » « less
  4. Mobile devices have access to personal, potentially sensitive data, and there is a growing number of mobile apps that have access to it and often transmit this personally identifiable information (PII) over the network. In this paper, we present an approach for detecting such PII “leaks” in network packets going out of the device, by first monitoring network packets on the device itself and then applying classifiers that can predict with high accuracy whether a packet contains a PII leak and of which type. We evaluate the performance of our classifiers using datasets that we collected and analyzed from scratch. We also report preliminary results that show that collaboration among users can further improve classification accuracy, thus motivating crowdsourcing and/or distributed learning of privacy leaks. 
    more » « less
  5. Abstract We present the design, implementation and evaluation of a system, called MATRIX, developed to protect the privacy of mobile device users from location inference and sensor side-channel attacks. MATRIX gives users control and visibility over location and sensor (e.g., Accelerometers and Gyroscopes) accesses by mobile apps. It implements a PrivoScope service that audits all location and sensor accesses by apps on the device and generates real-time notifications and graphs for visualizing these accesses; and a Synthetic Location service to enable users to provide obfuscated or synthetic location trajectories or sensor traces to apps they find useful, but do not trust with their private information. The services are designed to be extensible and easy for users, hiding all of the underlying complexity from them. MATRIX also implements a Location Provider component that generates realistic privacy-preserving synthetic identities and trajectories for users by incorporating traffic information using historical data from Google Maps Directions API, and accelerations using statistical information from user driving experiments. These mobility patterns are generated by modeling/solving user schedule using a randomized linear program and modeling/solving for user driving behavior using a quadratic program. We extensively evaluated MATRIX using user studies, popular location-driven apps and machine learning techniques, and demonstrate that it is portable to most Android devices globally, is reliable, has low-overhead, and generates synthetic trajectories that are difficult to differentiate from real mobility trajectories by an adversary. 
    more » « less