People Search Websites aggregate and publicize users' Personally Identifiable Information (PII), previously sourced from data brokers. This paper presents a qualitative study of the perceptions and experiences of 18 participants who sought information removal by hiring a removal service or requesting removal from the sites themselves. The users we interviewed were highly motivated and had sophisticated risk perceptions. We found that they encountered obstacles during the removal process, resulting in a high cost of removal, whether they requested it themselves or hired a service. Participants perceived that the successful monetization of users' PII motivates data aggregators to make removal more difficult. Overall, self-management of privacy by attempting to keep information off the internet is difficult, and its success is hard to evaluate. We provide recommendations to users, third parties, removal services, and researchers aiming to improve the removal process.
What to Expect When You’re Accessing: An Exploration of User Privacy Rights in People Search Websites
People Search Websites, a category of data brokers, collect, catalog, monetize, and often publicly display individuals' personally identifiable information (PII). We present a study of user privacy rights in 20 such websites, assessing the usability of data access and data removal mechanisms. We combine insights from these two processes to determine connections between sites, such as shared access mechanisms or removal effects. We find that data access requests are mostly unsuccessful: sites instead cite a variety of legal exceptions or misinterpret the nature of the requests. By purchasing reports, we find that only one set of connected sites provided access to the same report they sell to customers. We leverage a multi-step removal process to investigate removal effects between suspected connected sites. In general, data removal is more streamlined than data access, but not very transparent; questions about the scope of removal and reappearance of information remain. Confirming and expanding the connections observed in prior phases, we find that four main groups are behind 14 of the sites studied, indicating the need to further catalog these connections to simplify removal.
- PAR ID:
- 10549421
- Publisher / Repository:
- Proceedings on Privacy Enhancing Technologies
- Date Published:
- Journal Name:
- Proceedings on Privacy Enhancing Technologies
- Volume:
- 2024
- Issue:
- 4
- ISSN:
- 2299-0984
- Page Range / eLocation ID:
- 311 to 326
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Tor provides low-latency anonymous and uncensored network access against a local or network adversary. Due to the design choice to minimize traffic overhead (and increase the pool of potential users), Tor allows some information about the client's connections to leak. Attacks using (features extracted from) this information to infer the website a user visits are called Website Fingerprinting (WF) attacks. We develop a methodology and tools to measure the amount of leaked information about a website. We apply this tool to a comprehensive set of features extracted from a large set of websites and WF defense mechanisms, allowing us to make more fine-grained observations about WF attacks and defenses.
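The leakage the abstract measures can be framed as the mutual information between the visited website and an observed traffic feature. A minimal sketch, assuming a toy discrete feature (the data and bucketing below are hypothetical, not the paper's feature set):

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of a count distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def feature_leakage(observations):
    """Mutual information I(website; feature) in bits, estimated from
    (website, feature_value) pairs: the information about the visited
    site that a WF attacker can extract from this one feature."""
    total = len(observations)
    site_counts = Counter(w for w, _ in observations)
    feat_counts = Counter(f for _, f in observations)
    joint = Counter(observations)
    h_site = entropy(site_counts.values())
    # Conditional entropy H(website | feature), weighted by P(feature).
    h_cond = 0.0
    for f, fc in feat_counts.items():
        cond = [joint[(w, f)] for w in site_counts if (w, f) in joint]
        h_cond += (fc / total) * entropy(cond)
    return h_site - h_cond

# Hypothetical traces: (site, bucketed total bytes transferred).
obs = [("a", 1), ("a", 1), ("b", 2), ("b", 2)]
print(feature_leakage(obs))  # fully distinguishing feature -> 1.0 bit
```

A feature that perfectly separates the two sites leaks the full 1 bit of site identity; an effective WF defense would push this estimate toward zero.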
-
Cyber attacks continue to pose significant threats to individuals and organizations, stealing sensitive data such as personally identifiable information, financial information, and login credentials. Hence, detecting malicious websites before they cause any harm is critical to preventing fraud and monetary loss. To address the increasing number of phishing attacks, protective mechanisms must be highly responsive, adaptive, and scalable. Fortunately, advances in the field of machine learning, coupled with access to vast amounts of data, have led to the adoption of various deep learning models for timely detection of these cyber crimes. This study focuses on the detection of phishing websites using deep learning models such as Multi-Head Attention, Temporal Convolutional Network (TCN), BI-LSTM, and LSTM, where URLs of the phishing websites are treated as sequences. The results demonstrate that the Multi-Head Attention and BI-LSTM models outperform other deep learning-based algorithms such as TCN and LSTM, producing better precision, recall, and F1-scores.
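All four model families share the same front end: each URL is treated as a sequence of characters and encoded as a fixed-length vector of integer indices. A minimal sketch of that preprocessing step (the vocabulary construction and `max_len` value are illustrative assumptions, not the study's exact pipeline):

```python
def build_vocab(urls):
    """Map every character seen in the corpus to a positive integer;
    index 0 is reserved for padding."""
    chars = sorted({c for u in urls for c in u})
    return {c: i + 1 for i, c in enumerate(chars)}

def encode(url, vocab, max_len=60):
    """Encode a URL as a fixed-length integer sequence, truncating
    long URLs and right-padding short ones with 0."""
    seq = [vocab.get(c, 0) for c in url[:max_len]]
    return seq + [0] * (max_len - len(seq))

urls = ["http://paypa1-login.example", "https://example.com"]
vocab = build_vocab(urls)
x = encode(urls[0], vocab, max_len=30)
print(len(x))  # 30
```

The resulting integer matrix is what an embedding layer followed by an LSTM, BI-LSTM, TCN, or attention stack would consume.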
-
We report the first wide-scale measurement study of server-side geographic restriction, or geoblocking, a phenomenon in which server operators intentionally deny access to users from particular countries or regions. Many sites practice geoblocking due to legal requirements or other business reasons, but excessive blocking can needlessly deny valuable content and services to entire national populations. To help researchers and policymakers understand this phenomenon, we develop a semi-automated system to detect instances where whole websites were rendered inaccessible due to geoblocking. By focusing on detecting geoblocking capabilities offered by large CDNs and cloud providers, we can reliably distinguish the practice from dynamic anti-abuse mechanisms and network-based censorship. We apply our techniques to test for geoblocking across the Alexa Top 10K sites from thousands of vantage points in 177 countries. We then expand our measurement to a sample of CDN customers in the Alexa Top 1M. We find that geoblocking occurs across a broad set of countries and sites. We observe geoblocking in nearly all countries we study, with Iran, Syria, Sudan, Cuba, and Russia experiencing the highest rates. These countries experience particularly high rates of geoblocking for finance and banking sites, likely as a result of US economic sanctions. We also verify our measurements with data provided by Cloudflare, and find our observations to be accurate.
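Detecting CDN-level geoblocking in this way amounts to matching responses against provider-specific block-page markers rather than treating every failure as censorship. A hedged sketch of that classification idea, assuming a small hand-written signature table (Cloudflare's error 1009 is its documented country-ban code; the "generic" phrases are illustrative):

```python
# Provider-specific markers of an intentional geoblock page.
GEOBLOCK_SIGNATURES = {
    "cloudflare": ["error code: 1009"],  # "country or region banned"
    "generic": ["not available in your country",
                "unavailable in your region"],
}

def classify_response(status, body):
    """Return the suspected geoblocking provider, or None if the
    response does not look like an intentional geographic block."""
    text = body.lower()
    # 403 Forbidden and 451 Unavailable For Legal Reasons are the
    # statuses geoblock pages typically ride on.
    if status in (403, 451):
        for provider, needles in GEOBLOCK_SIGNATURES.items():
            if any(n in text for n in needles):
                return provider
    return None

print(classify_response(403, "Error code: 1009 - access denied"))
# -> cloudflare
```

Matching on known provider signatures is what lets the measurement distinguish deliberate geoblocking from transient anti-abuse challenges or network-level blocking, which return different pages or no page at all.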
-
User recommendation in healthcare social media by assessing user similarity in heterogeneous networks
Objective: The rapid growth of online health social websites has captured a vast amount of healthcare information and made it easy for health consumers to access. E-patients often use these social websites for informational and emotional support, but they can easily be overwhelmed by the overloaded information. Searching for healthcare information is difficult for consumers, most of whom are not skilled information searchers. In this work, we investigate approaches for measuring user similarity in online health social websites. By recommending similar users to consumers, we can help them seek informational and emotional support more efficiently.
Methods: We propose to represent healthcare social media data as a heterogeneous healthcare information network and introduce local and global structural approaches for measuring user similarity in such a network. We compare the proposed structural approaches with a content-based approach.
Results: Experiments were conducted on a data set collected from a popular online health social website. The results showed that the content-based approach performed better for inactive users, while the structural approaches performed better for active users. Moreover, the global structural approach outperformed the local structural approach for all user groups. In addition, we conducted experiments on the local and global structural approaches using different weight schemas for the edges in the network; leverage performed best for both. Finally, we integrated the different approaches and demonstrated that the hybrid method yielded better performance than any individual approach.
Conclusion: The results indicate that content-based methods can effectively capture the similarity of inactive users, who usually have focused interests, while structural methods achieve better performance when rich structural information is available. The local structural approach considers only direct connections between nodes in the network, while the global structural approach also takes indirect connections into account; the global approach can therefore deal with sparse networks and capture implicit similarity between two users. Different approaches may capture different aspects of the similarity relationship between users, and combining the methods achieves better performance than any individual one.
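The "local structural" idea described above scores two users by the overlap of their direct neighbours in the network. A minimal sketch using unweighted Jaccard overlap (the graph, node labels, and choice of Jaccard are illustrative assumptions, not the paper's exact weight schemas):

```python
def jaccard_similarity(neighbors, u, v):
    """Local structural similarity of users u and v: the Jaccard
    overlap of their sets of directly connected nodes."""
    a, b = neighbors[u], neighbors[v]
    union = len(a | b)
    return len(a & b) / union if union else 0.0

# Hypothetical heterogeneous network: each user maps to the set of
# nodes (topics, posts, ...) it is directly connected to.
neighbors = {
    "alice": {"t:diabetes", "t:diet", "p:101"},
    "bob":   {"t:diabetes", "t:diet", "p:202"},
    "carol": {"t:yoga"},
}
print(jaccard_similarity(neighbors, "alice", "bob"))  # 0.5
```

A global structural approach would additionally propagate similarity through indirect paths (e.g. SimRank-style iteration), which is why it can still score user pairs that share no direct neighbour in a sparse network.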

