

This content will become publicly available on November 30, 2024

Title: Measures of Information Leakage for Incomplete Statistical Information: Application to a Binary Privacy Mechanism

Information leakage is usually defined as the logarithmic increment in the adversary's probability of correctly guessing the legitimate user's private data, or some arbitrary function of the private data, when presented with the legitimate user's publicly disclosed information. However, this definition of information leakage implicitly assumes that both the privacy mechanism and the prior probability of the original data are entirely known to the attacker. In reality, assuming that the attacker has complete knowledge of the privacy mechanism is often impractical. The attacker usually has access only to an approximate version of the correct privacy mechanism, computed from a limited set of the disclosed data for which the corresponding undistorted data are available. In this scenario, the conventional definition of leakage no longer has an operational meaning. To address this problem, in this article we propose novel, meaningful information-theoretic metrics for information leakage when the attacker has incomplete information about the privacy mechanism, which we call the average subjective leakage, the average confidence boost, and the average objective leakage, respectively. For the simplest, binary scenario, we demonstrate how to find an optimized privacy mechanism that minimizes the worst-case value of any of these leakages.
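The conventional, fully informed leakage that the article contrasts against can be sketched for a binary mechanism. The code below is an illustrative min-entropy-style computation, not the article's exact formulation; as the abstract notes, the proposed subjective and objective variants would substitute the attacker's estimated mechanism (learned from limited data) for the true `channel` used here.

```python
import math

def min_entropy_leakage(prior, channel):
    """Log-ratio of the adversary's best guessing probability after vs.
    before observing the disclosed value Y (illustrative sketch; the
    article's average subjective/objective leakages would replace
    `channel` with the attacker's estimate of the mechanism).

    prior:   [P(X=0), P(X=1)] over the private bit X
    channel: channel[x][y] = P(Y=y | X=x), the privacy mechanism
    """
    p_before = max(prior)  # best blind guess with no observation
    # Bayes-optimal guess for each observation y, averaged over y
    p_after = sum(max(prior[x] * channel[x][y] for x in (0, 1))
                  for y in (0, 1))
    return math.log2(p_after / p_before)
```

For a uniform prior, the identity mechanism gives 1 bit of leakage, while a completely randomizing mechanism gives 0 bits.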

 
Award ID(s): 2030249
NSF-PAR ID: 10488083
Author(s) / Creator(s): ; ;
Publisher / Repository: Association for Computing Machinery
Date Published:
Journal Name: ACM Transactions on Privacy and Security
Volume: 26
Issue: 4
ISSN: 2471-2566
Page Range / eLocation ID: 1 to 31
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Impostors are attackers who take over a smartphone and gain access to the legitimate user's confidential and private information. This paper proposes a defense-in-depth mechanism to detect impostors quickly with simple Deep Learning algorithms, which achieve better detection accuracy than the best prior work, which used Machine Learning algorithms requiring the computation of multiple features. Unlike previous work, we then consider protecting the privacy of a user's behavioral (sensor) data by not exposing it outside the smartphone. For this scenario, we propose a Recurrent Neural Network (RNN) based Deep Learning algorithm that uses only the legitimate user's sensor data to learn his/her normal behavior. We propose to use the Prediction Error Distribution (PED) to enhance the detection accuracy. We also show how a minimalist hardware module, dubbed SID for Smartphone Impostor Detector, can be designed and integrated into smartphones for self-contained impostor detection. Experimental results show that SID can support real-time impostor detection at a very low hardware cost and energy consumption compared to other RNN accelerators.
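The prediction-error-distribution idea can be sketched in a few lines: collect the legitimate user's one-step prediction errors during training, then flag a new window whose errors fall outside that distribution. This is a toy stand-in for the paper's RNN-based detector; `prediction_errors`, `is_impostor`, the generic predictor, and the quantile threshold are all illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def prediction_errors(series, predictor):
    # One-step-ahead absolute prediction errors for a sensor time series.
    # `predictor` maps the series prefix to predictions of the next values
    # (an RNN in the paper; any callable here).
    preds = predictor(series[:-1])
    return np.abs(series[1:] - preds)

def is_impostor(train_errors, test_errors, quantile=0.95):
    # Flag an impostor if the test window's median prediction error
    # exceeds a high quantile of the legitimate user's error distribution.
    threshold = np.quantile(train_errors, quantile)
    return float(np.median(test_errors)) > threshold
```

A last-value predictor (`lambda s: s`) already separates a smooth legitimate trace from a faster-varying impostor trace in this toy setup.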
  2. Recently, the ubiquity of mobile devices has led to an increasing demand for public network services, e.g., WiFi hot spots. As part of this trend, modern transportation systems are equipped with public WiFi devices to provide Internet access for passengers, as people spend a large amount of time on public transportation in their daily lives. However, one of the key issues in public WiFi spots is the privacy concern due to their open-access nature. Existing works either studied location privacy risk in human traces or privacy leakage in private networks such as cellular networks based on data from cellular carriers. To the best of our knowledge, none of these works has focused on bus WiFi privacy based on large-scale real-world data. In this paper, to explore the privacy risk in bus WiFi systems, we focus on two key questions: how likely it is that bus WiFi users can be uniquely re-identified if partial usage information is leaked, and how we can protect users from the leaked information. To understand these questions, we conduct a case study in a large-scale bus WiFi system, which contains 20 million connection records and 78 million location records from 770 thousand bus WiFi users during a two-month period. Technically, we design two models for our uniqueness analyses and protection: a PB-FIND model to identify the probability that a user can be uniquely re-identified from leaked information, and a PB-HIDE model to protect users from potentially leaked information. Specifically, we systematically measure user uniqueness on users' finger traces (i.e., connection URL and domain), foot traces (i.e., locations), and hybrid traces (i.e., both finger and foot traces).
Our measurement results reveal that (i) 97.8% of users can be uniquely re-identified by 4 random domain records of their finger traces, and 96.2% of users can be uniquely re-identified by 5 random locations on buses; (ii) 98.1% of users can be uniquely re-identified by only 2 random records if both their connection records and locations are leaked to attackers. Moreover, the evaluation results show that our PB-HIDE algorithm protects more than 95% of users from the potentially leaked information by inserting only 1.5% synthetic records into the original dataset while preserving data utility.
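A Monte-Carlo uniqueness estimate of the kind these measurements describe can be sketched as follows. This is an illustrative stand-in, not the actual PB-FIND model; `uniqueness` and its record representation (one set of domains or stops per user) are assumptions for the sketch.

```python
import random

def uniqueness(user_records, k, trials=200, seed=0):
    """Estimate the probability that k randomly chosen records of a user
    match that user alone (illustrative PB-FIND-style measurement).

    user_records: dict user_id -> set of records (e.g. domains or stops)
    """
    rng = random.Random(seed)
    users = [u for u, recs in user_records.items() if len(recs) >= k]
    hits = 0
    for _ in range(trials):
        u = rng.choice(users)
        sample = set(rng.sample(sorted(user_records[u]), k))
        # A user is uniquely re-identified if no one else's records
        # also contain the leaked sample.
        matches = sum(1 for recs in user_records.values() if sample <= recs)
        hits += (matches == 1)
    return hits / trials
```

With fully disjoint records per user the estimate is 1.0 even for k=1; with identical records for everyone it is 0.0.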
  3. Context-based pairing solutions increase the usability of IoT device pairing by eliminating any human involvement in the pairing process. This is possible by utilizing on-board sensors (with the same sensing modalities) to capture a common physical context (e.g., ambient sound via each device's microphone). However, in a smart home scenario, it is impractical to assume that all devices will share a common sensing modality. For example, a motion detector is equipped only with an infrared sensor, while an Amazon Echo has only microphones. In this paper, we develop a new context-based pairing mechanism called Perceptio that uses time as the common factor across differing sensor types. By focusing on event timing, rather than the specific event sensor data, Perceptio creates event fingerprints that can be matched across a variety of IoT devices. We propose Perceptio based on the idea that devices co-located within a physically secure boundary (e.g., a single-family house) can observe more events in common over time than devices outside. Devices make use of the observed contextual information to provide entropy for Perceptio's pairing protocol. We design and implement Perceptio and evaluate its effectiveness as an autonomous secure pairing solution. Our implementation demonstrates the ability to sufficiently distinguish between legitimate devices (placed within the boundary) and attacker devices (placed outside) by imposing a threshold on fingerprint similarity. Perceptio demonstrates an average fingerprint similarity of 94.9% between legitimate devices, while even a hypothetical, impossibly well-performing attacker yields only 68.9% between itself and a valid device.
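The event-timing fingerprint idea can be sketched with per-bin event counts and a similarity threshold. This is a minimal sketch under assumed details (fixed time bins, cosine similarity); it is not Perceptio's actual fingerprinting or key-agreement protocol, and both function names are hypothetical.

```python
import numpy as np

def event_fingerprint(event_times, window, n_bins):
    # Count events per time bin. Co-located devices should see similar
    # event timing even when their sensing modalities differ.
    counts, _ = np.histogram(event_times, bins=n_bins, range=(0.0, window))
    return counts

def fingerprint_similarity(fp_a, fp_b):
    # Cosine similarity between two devices' event-count fingerprints;
    # pairing would proceed only above a chosen threshold.
    denom = np.linalg.norm(fp_a) * np.linalg.norm(fp_b)
    return float(fp_a @ fp_b) / denom if denom else 0.0
```

Two devices observing the same events at slightly offset times land in the same bins and score near 1.0, while a device observing unrelated events scores near 0.0.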
  4. While security technology can be nearly impenetrable, the people behind the computer screens are often easily manipulated, which makes the human factor the biggest threat to cybersecurity. This study examined whether college students disclosed private information about themselves and what type of information they shared. The study utilized pretexting, in which attackers impersonate individuals in certain roles, a technique that often involves extensive research to ensure credibility. The goal of pretexting is to create situations in which individuals feel safe releasing information that they otherwise might not. The pretexts used for this study were based on the natural inclination to help, where people tend to want to help those in need, and reciprocity, where people tend to return favors given to them. Participants (N=51) answered survey questions that they thought were for a good cause or that would result in a reward. The survey asked for increasingly sensitive information that could be used maliciously to gain access to identification, passwords, or security questions. Upon completing the survey, participants were debriefed on the true nature of the study and were interviewed about why they were willing to share information via the survey. Some of the most commonly skipped questions included "Student ID number" and "What is your mother's maiden name?". General themes identified from the interviews included the importance of similarities between the researcher and the subject, the researcher's adherence to the character role, the subject's awareness of question sensitivity, and the overall differences between online and offline disclosure. Findings suggest that college students are more likely to disclose private information if the attacker shares a similar trait with the target or adheres to the character role they are impersonating. Additionally, this study sheds light on the research limitations, emphasizes the relevance of the human factor in security and privacy, and offers recommendations for future research.
  5. Distribution inference, sometimes called property inference, infers statistical properties about a training set from access to a model trained on that data. Distribution inference attacks can pose serious risks when models are trained on private data, but they are difficult to distinguish from the intrinsic purpose of statistical machine learning, namely, producing models that capture statistical properties of a distribution. Motivated by Yeom et al.'s membership inference framework, we propose a formal definition of distribution inference attacks that is general enough to describe a broad class of attacks distinguishing between possible training distributions. We show how our definition captures previous ratio-based inference attacks as well as new kinds of attacks, including revealing the average node degree or clustering coefficient of training graphs. To understand distribution inference risks, we introduce a metric that quantifies observed leakage by relating it to the leakage that would occur if samples from the training distribution were provided directly to the adversary. We report on a series of experiments across a range of different distributions using both novel black-box attacks and improved versions of state-of-the-art white-box attacks. Our results show that inexpensive attacks are often as effective as expensive meta-classifier attacks, and that there are surprising asymmetries in the effectiveness of attacks.
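The metric described, relating observed leakage to the leakage from direct samples, can be caricatured as a ratio of distinguishing advantages over a blind-guess baseline. This is a simplified illustration only; the paper's precise definition differs, and `leakage_ratio` is a hypothetical helper.

```python
def leakage_ratio(acc_observed, acc_direct, baseline=0.5):
    """Fraction of the direct-sample adversary's distinguishing advantage
    that the model-access attack achieves (illustrative simplification of
    a metric relating observed leakage to direct-sample leakage).

    acc_observed: attack accuracy distinguishing the two training
                  distributions from model access
    acc_direct:   accuracy achievable from direct distribution samples
    baseline:     blind-guess accuracy (0.5 for two equiprobable
                  candidate distributions)
    """
    if acc_direct <= baseline:
        raise ValueError("direct-sample attack must beat the baseline")
    return max(0.0, (acc_observed - baseline) / (acc_direct - baseline))
```

An attack at 75% accuracy against a perfect direct-sample distinguisher recovers half the available advantage; an attack at chance level recovers none.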

     