Title: Protecting Personally Identifiable Information (PII) in Critical Infrastructure Data Using Differential Privacy
In critical infrastructure (CI) sectors such as emergency management or healthcare, researchers can analyze data to detect useful patterns, helping emergency management personnel allocate limited resources effectively or detect epidemic spread patterns. However, this data contains personally identifiable information (PII) that must be safeguarded for legal and ethical reasons. Traditional safeguarding techniques, such as anonymization, have been shown to be ineffective. Differential privacy is a technique that protects individual privacy while still allowing datasets to be analyzed for societal benefit. This paper motivates the use of differential privacy to answer a wide range of queries about CI data containing PII with better privacy guarantees than are possible with traditional techniques. Moreover, it introduces a new technique based on Multiple-attribute Workload Partitioning, which does not depend on the nature of the underlying dataset and provides better privacy protection than current differential privacy approaches.
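The abstract does not detail the Multiple-attribute Workload Partitioning technique itself, but the differential-privacy building block it rests on can be illustrated. A minimal sketch of the standard Laplace mechanism for a sensitivity-1 counting query follows; the function names and the toy age data are illustrative, not from the paper:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling from Laplace(0, scale); u is clamped away
    # from +/-0.5 to avoid log(0) at the distribution's tails.
    u = random.random() - 0.5
    u = max(min(u, 0.4999999999), -0.4999999999)
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(records, predicate, epsilon: float) -> float:
    # A counting query has sensitivity 1: adding or removing one
    # individual's record changes the count by at most 1, so Laplace
    # noise with scale 1/epsilon yields epsilon-differential privacy.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical CI dataset: ages of individuals in an emergency registry.
ages = [34, 29, 51, 62, 45, 38, 70, 23]
noisy = dp_count(ages, lambda a: a >= 40, epsilon=1.0)
```

The noisy answer remains useful for aggregate resource planning while bounding what any single record can reveal.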
Award ID(s):
1922169 1433736
PAR ID:
10170816
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
2019 IEEE International Symposium on Technologies for Homeland Security (HST)
Page Range / eLocation ID:
1 to 6
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The integration of connected autonomous vehicles (CAVs) has significantly enhanced driving convenience, but it has also raised serious privacy concerns, particularly regarding the personally identifiable information (PII) stored on infotainment systems. Recent advances in connected and autonomous vehicle control, such as multi-agent system (MAS)-based hierarchical architectures and privacy-preserving strategies for mixed-autonomy platoon control, underscore the increasing complexity of privacy management within these environments. Rental cars with infotainment systems pose substantial challenges, as renters often fail to delete their data, leaving it accessible to subsequent renters. This study investigates the risks associated with PII in connected vehicles and emphasizes the necessity of automated solutions to ensure data privacy. We introduce the Vehicle Inactive Profile Remover (VIPR), an innovative automated solution designed to identify and delete PII left on infotainment systems. The efficacy of VIPR is evaluated through surveys, hands-on experiments with rental vehicles, and a controlled laboratory environment. VIPR achieved a 99.5% success rate in removing user profiles, with an average deletion time of 4.8 s or less, demonstrating its effectiveness in mitigating privacy risks. This solution highlights VIPR as a critical tool for enhancing privacy in connected vehicle environments, promoting a safer, more responsible use of connected vehicle technology in society.
  2.
    Differential privacy protects an individual's privacy by perturbing data on an aggregated level (DP) or individual level (LDP). We report four online human-subject experiments investigating the effects of using different approaches to communicate differential privacy techniques to laypersons in a health app data collection setting. Experiments 1 and 2 investigated participants' data disclosure decisions for low-sensitive and high-sensitive personal information when given different DP or LDP descriptions. Experiments 3 and 4 uncovered reasons behind participants' data sharing decisions, and examined participants' subjective and objective comprehensions of these DP or LDP descriptions. When shown descriptions that explain the implications instead of the definition/processes of DP or LDP technique, participants demonstrated better comprehension and showed more willingness to share information with LDP than with DP, indicating their understanding of LDP's stronger privacy guarantee compared with DP. 
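The DP/LDP distinction this study communicates to laypersons can be made concrete with the classic local-DP mechanism, randomized response: each user perturbs their own answer before it ever leaves their device, so no aggregator sees raw data. A minimal sketch (parameter names and the simulated survey are illustrative, not from the study):

```python
import math
import random

def randomized_response(true_bit: bool, epsilon: float) -> bool:
    # Report truthfully with probability e^eps / (e^eps + 1), otherwise
    # flip the answer. This satisfies epsilon-local differential privacy:
    # any single report is almost equally likely under either true value.
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return true_bit if random.random() < p_truth else not true_bit

def estimate_rate(reports, epsilon: float) -> float:
    # Debias the aggregate. If t is the true rate and p the truth
    # probability, E[observed rate] = t * (2p - 1) + (1 - p); invert it.
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

random.seed(7)
# Simulated population where 30% hold a sensitive health attribute.
truth = [random.random() < 0.3 for _ in range(20000)]
reports = [randomized_response(b, epsilon=1.0) for b in truth]
est = estimate_rate(reports, epsilon=1.0)
```

The aggregator recovers an accurate population rate even though each individual report is deniable, which matches the stronger individual-level guarantee of LDP that participants preferred.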
  3.
    The information privacy of the Internet users has become a major societal concern. The rapid growth of online services increases the risk of unauthorized access to Personally Identifiable Information (PII) of at-risk populations, who are unaware of their PII exposure. To proactively identify online at-risk populations and increase their privacy awareness, it is crucial to conduct a holistic privacy risk assessment across the internet. Current privacy risk assessment studies are limited to a single platform within either the surface web or the dark web. A comprehensive privacy risk assessment requires matching exposed PII on heterogeneous online platforms across the surface web and the dark web. However, due to the incompleteness and inaccuracy of PII records in each platform, linking the exposed PII to users is a non-trivial task. While Entity Resolution (ER) techniques can be used to facilitate this task, they often require ad-hoc, manual rule development and feature engineering. Recently, Deep Learning (DL)-based ER has outperformed manual entity matching rules by automatically extracting prominent features from incomplete or inaccurate records. In this study, we enhance the existing privacy risk assessment with a DL-based ER method, namely Multi-Context Attention (MCA), to comprehensively evaluate individuals’ PII exposure across the different online platforms in the dark web and surface web. Evaluation against benchmark ER models indicates the efficacy of MCA. Using MCA on a random sample of data breach victims in the dark web, we are able to identify 4.3% of the victims on the surface web platforms and calculate their privacy risk scores. 
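MCA itself is a deep attention model; as a much simpler baseline that illustrates what entity resolution over incomplete or typo-ridden PII records must accomplish, a character-level similarity matcher can be sketched as follows (the function names and the toy breach/surface-web records are illustrative, not from the study):

```python
from difflib import SequenceMatcher

def field_sim(a: str, b: str) -> float:
    # Character-level similarity in [0, 1]; tolerant of small typos,
    # which are common in breach dumps and scraped profiles.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec_a: dict, rec_b: dict, fields=("name", "email")) -> float:
    # Average per-field similarity over the fields both records carry,
    # so missing attributes do not zero out the score.
    sims = [field_sim(rec_a[f], rec_b[f])
            for f in fields if f in rec_a and f in rec_b]
    return sum(sims) / len(sims) if sims else 0.0

# Hypothetical records: a dark-web breach entry with a typo vs. two
# surface-web profiles.
breach = {"name": "Jon Smiht", "email": "j.smith@example.com"}
surface = {"name": "John Smith", "email": "j.smith@example.com"}
unrelated = {"name": "Ada Lovelace", "email": "ada@example.org"}
score = match_score(breach, surface)
```

A DL-based model like MCA learns such similarity features automatically instead of relying on hand-picked fields and thresholds, which is what lets it outperform manually engineered rules.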
  4. Garcia-Alfaro, J; Kozik, R; Choraś, M; Katsikas, S (Ed.)
Several prominent privacy regulations (e.g., CCPA and GDPR) require service providers to let consumers request access to, correct, or delete their personal data. Compliance necessitates verification of consumer identity. This is not a problem for consumers who already have an account with a service provider, since they can authenticate themselves via a successful account log-in. However, there are no such methods for accountless consumers, even though service providers routinely collect data about casual consumers, i.e., those without accounts. Currently, in order to access their collected data, accountless consumers are asked to provide Personally Identifiable Information (PII) to service providers, which is privacy-invasive. To address this problem, we propose PIVA: Privacy-Preserving Identity Verification for Accountless Users, a technique based on Private List Intersection (PLI) and its variants. First, we introduce PLI, a close relative of private set intersection (PSI), a well-known cryptographic primitive that allows two or more mutually suspicious parties to compute the intersection of their private input sets. PLI takes advantage of the (ordered and fixed) list structure of each party's private set; as a result, PLI is more efficient than PSI. We also explore PLI variants: PLI-cardinality (PLI-CA), threshold-PLI (t-PLI), and threshold-PLI-cardinality (t-PLI-CA), all of which yield less information than PLI. These variants are progressively better suited for addressing the accountless consumer authentication problem. We prototype PIVA and compare its performance against techniques based on regular PSI and garbled circuits (GCs). Results show that the proposed PLI and PLI-CA constructions are more efficient than GC-based techniques in terms of both computation and communication overheads. While GC-based t-PLI and t-PLI-CA execute faster, the proposed constructs greatly outperform them in terms of bandwidth, e.g., our t-PLI protocol consumes less bandwidth than its GC-based counterpart.
We also show that proposed protocols can be made secure against malicious adversaries, with only moderate increases in overhead. These variants outperform their GC-based counterparts by at least one order of magnitude. 
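The functional difference between PLI and set-based PSI — that list position matters — can be illustrated with a non-cryptographic toy. This sketch binds each value to its index via a keyed hash and compares position-wise; it conveys only the semantics, not the security, of a real PLI protocol (a real construction would use an oblivious key agreement so neither party learns the other's commitments):

```python
import hashlib

def commit(value: str, index: int, salt: bytes) -> str:
    # Position-bound digest: the same value at a different list index
    # yields a different digest, reflecting PLI's use of the ordered,
    # fixed list structure (unlike set-based PSI).
    data = salt + index.to_bytes(4, "big") + value.encode()
    return hashlib.sha256(data).hexdigest()

def pli_intersect(list_a, list_b, salt=b"shared-demo-salt"):
    # Both parties commit position-wise; equal digests reveal the
    # indices at which their lists agree, and nothing about mismatches.
    a = [commit(v, i, salt) for i, v in enumerate(list_a)]
    b = [commit(v, i, salt) for i, v in enumerate(list_b)]
    return [i for i, (x, y) in enumerate(zip(a, b)) if x == y]

# Hypothetical fixed-format identity lists: email, phone, ZIP code.
consumer = ["alice@mail.com", "555-0100", "90210"]
provider = ["alice@mail.com", "555-0199", "90210"]
matches = pli_intersect(consumer, provider)
```

Because only aligned positions are ever compared, a PLI protocol performs O(n) comparisons where PSI must consider all cross-pairs, which is the source of the efficiency gain the abstract describes.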
  5. Aggregating data is fundamental to data analytics, data exploration, and OLAP. Approximate query processing (AQP) techniques are often used to accelerate computation of aggregates using samples, for which confidence intervals (CIs) are widely used to quantify the associated error. CIs used in practice fall into two categories: techniques that are tight but not correct, i.e., they yield tight intervals but only offer asymptotic guarantees, making them unreliable; or techniques that are correct but not tight, i.e., they offer rigorous guarantees but are overly conservative, leading to confidence intervals that are too loose to be useful. In this paper, we develop a CI technique that is both correct and tighter than traditional approaches. Starting from conservative CIs, we identify two issues they often face: pessimistic mass allocation (PMA) and phantom outlier sensitivity (PHOS). By developing a novel range-trimming technique for eliminating PHOS and pairing it with known CI techniques without PMA, we develop a technique for computing CIs with strong guarantees that requires fewer samples for the same width. We implement our techniques underneath a sampling-optimized in-memory column store and show how they accelerate queries involving aggregates on real datasets, with typical speedups on the order of 10× over both traditional AQP-with-guarantees and exact methods, all while obeying accuracy constraints.
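The tight-but-not-correct versus correct-but-not-tight trade-off can be seen by placing the two standard CI families side by side: a normal-approximation (CLT) interval, which is tight but only asymptotically valid, and a Hoeffding interval, which is rigorous for any bounded distribution but conservative. A minimal sketch (the data and 95% confidence level are illustrative; this is background, not the paper's PMA/PHOS technique):

```python
import math
import random

def clt_ci95(sample):
    # Tight but only asymptotically correct: normal approximation
    # at the 95% level (z = 1.96).
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    half = 1.96 * math.sqrt(var / n)
    return mean - half, mean + half

def hoeffding_ci(sample, lo, hi, delta):
    # Correct but conservative: valid for any distribution bounded in
    # [lo, hi]; half-width is (hi-lo) * sqrt(log(2/delta) / (2n)).
    n = len(sample)
    mean = sum(sample) / n
    half = (hi - lo) * math.sqrt(math.log(2 / delta) / (2 * n))
    return mean - half, mean + half

random.seed(1)
# Hypothetical aggregate column with values known to lie in [0, 100].
sample = [random.uniform(0, 100) for _ in range(1000)]
clt = clt_ci95(sample)
hoef = hoeffding_ci(sample, 0, 100, delta=0.05)
```

On this data the Hoeffding interval strictly contains the CLT one — the gap between the two is the slack that range-trimming and PMA-free CI techniques aim to recover while keeping rigorous guarantees.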