skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: PRIVEE: A Visual Analytic Workflow for Proactive Privacy Risk Inspection of Open Data
Open data sets that contain personal information are susceptible to adversarial attacks even when anonymized. By performing low-cost joins on multiple datasets with shared attributes, malicious users of open data portals might get access to information that violates individuals’ privacy. However, open data sets are primarily published using a release-and-forget model, whereby data owners and custodians have little to no cognizance of these privacy risks. We address this critical gap by developing a visual analytic solution that enables data defenders to gain awareness about the disclosure risks in local, joinable data neighborhoods. The solution is derived through a design study with data privacy researchers, where we initially play the role of a red team and engage in an ethical data hacking exercise based on privacy attack scenarios. We use this problem and domain characterization to develop a set of visual analytic interventions as a defense mechanism and realize them in PRIVEE, a visual risk inspection workflow that acts as a proactive monitor for data defenders. PRIVEE uses a combination of risk scores and associated interactive visualizations to let data defenders explore vulnerable joins and interpret risks at multiple levels of data granularity. We demonstrate how PRIVEE can help emulate the attack strategies and diagnose disclosure risks through two case studies with data privacy experts.  more » « less
Award ID(s):
2027789
PAR ID:
10420568
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
2022 IEEE Symposium on Visualization for Cyber Security (VizSec)
Page Range / eLocation ID:
1 to 11
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. To quantify trade-offs between increasing demand for open data sharing and concerns about sensitive information disclosure, statistical data privacy (SDP) methodology analyzes data release mechanisms that sanitize outputs based on confidential data. Two dominant frameworks exist: statistical disclosure control (SDC) and the more recent differential privacy (DP). Despite framing differences, both SDC and DP share the same statistical problems at their core. For inference problems, either we may design optimal release mechanisms and associated estimators that satisfy bounds on disclosure risk measures, or we may adjust existing sanitized output to create new statistically valid and optimal estimators. Regardless of design or adjustment, in evaluating risk and utility, valid statistical inferences from mechanism outputs require uncertainty quantification that accounts for the effect of the sanitization mechanism that introduces bias and/or variance. In this review, we discuss the statistical foundations common to both SDC and DP, highlight major developments in SDP, and present exciting open research problems in private inference. 
    more » « less
  2. Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis. Their profound capabilities in processing and interpreting complex language data, however, bring to light pressing concerns regarding data privacy, especially the risk of unintentional training data leakage. Despite the critical nature of this issue, there has been no existing literature to offer a comprehensive assessment of data privacy risks in LLMs. Addressing this gap, our paper introduces LLM-PBE, a toolkit crafted specifically for the systematic evaluation of data privacy risks in LLMs. LLM-PBE is designed to analyze privacy across the entire lifecycle of LLMs, incorporating diverse attack and defense strategies, and handling various data types and metrics. Through detailed experimentation with multiple LLMs, LLM-PBE facilitates an in-depth exploration of data privacy concerns, shedding light on influential factors such as model size, data characteristics, and evolving temporal dimensions. This study not only enriches the understanding of privacy issues in LLMs but also serves as a vital resource for future research in the field. Aimed at enhancing the breadth of knowledge in this area, the findings, resources, and our full technical report are made available at https://llm-pbe.github.io/, providing an open platform for academic and practical advancements in LLM privacy assessment. 
    more » « less
  3. This work models the costs and benefits of per- sonal information sharing, or self-disclosure, in online social networks as a networked disclosure game. In a networked population where edges rep- resent visibility amongst users, we assume a leader can influence network structure through content promotion, and we seek to optimize social wel- fare through network design. Our approach con- siders user interaction non-homogeneously, where pairwise engagement amongst users can involve or not involve sharing personal information. We prove that this problem is NP-hard. As a solution, we develop a Mixed-integer Linear Programming algorithm, which can achieve an exact solution, and also develop a time-efficient heuristic algo- rithm that can be used at scale. We conduct nu- merical experiments to demonstrate the properties of the algorithms and map theoretical results to a dataset of posts and comments in 2020 and 2021 in a COVID-related Subreddit community where privacy risks and sharing tradeoffs were particularly pronounced. 
    more » « less
  4. Voluntary sharing of personal information is at the heart of user engagement on social media and central to platforms' business models. From the users' perspective, so-called self-disclosure is closely connected with both privacy risks and social rewards. Prior work has studied contextual influences on self-disclosure, from platform affordances and interface design to user demographics and perceived social capital. Our work takes a mixed-methods approach to understand the contextual information which might be integrated in the development of privacy-enhancing technologies. Through observational study of several Reddit communities, we explore the ways in which topic of discussion, group norms, peer effects, and audience size are correlated with personal information sharing. We then build and test a prototype privacy-enhancing tool that exposes these contextual factors. Our work culminates in a browser extension that automatically detects instances of self-disclosure in Reddit posts at the time of posting and provides additional context to users before they post to support enhanced privacy decision-making. We share this prototype with social media users, solicit their feedback, and outline a path forward for privacy-enhancing technologies in this space. 
    more » « less
  5. The rise in online health information seeking among older adults promises significant benefits but also presents potentially serious privacy risks. In light of these risks, we argue that ongoing research and advocacy aimed at promoting online health information seeking among older adults must be coupled with efforts to identify and address threats to their online privacy. We first detail how internet users reveal sensitive health information to third parties through seemingly innocuous web browsing. We then describe ethical concerns raised by the inadvertent disclosure of health information, which include the potential for dignitary harms, subjective injuries, online health scams, and discrimination. After reviewing ways in which existing privacy laws fail to meet the needs of older adults, we provide recommendations for individual and collective action to protect the online privacy of older adults. 
    more » « less