Title: Formal specification and verification of user-centric privacy policies for ubiquitous systems
As our society has become more information oriented, each individual is expressed, defined, and impacted by information and information technology. While valuable, most state-of-the-art privacy management systems are designed to protect enterprise and organizational privacy requirements, leaving the main actor, i.e., the user, uninvolved or with only limited control over his or her information-sharing practices. To overcome these limitations, algorithms and tools that provide a user-centric privacy management system to individuals with different privacy concerns must take into consideration the dynamic nature of privacy policies, which change constantly with the information-sharing context and environmental variables. This paper extends the concept of contextual integrity to provide mathematical models and algorithms that enable the creation and management of privacy norms for individual users. The extension augments privacy norms with environmental variables, e.g., time and date, while introducing an abstraction and a partial relation over information attributes. Further, a formal verification technique is proposed to ensure privacy norms are enforced for each information-sharing action.
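To make the extended norm model concrete, here is a minimal illustrative sketch in Python (not the paper's formal model): a norm carries a sender, recipient, information attribute, context, and an environmental time window, and a partial relation over attributes lets a norm on an abstract attribute govern its refinements. All names, the attribute hierarchy, and the deny-on-conflict policy are assumptions of this sketch.

```python
# A minimal sketch (not the paper's formal model) of contextual-integrity
# style privacy norms extended with environmental variables. All names
# (Norm, ATTRIBUTE_PARENT, covers, is_permitted) are illustrative.
from dataclasses import dataclass
from datetime import time

# Partial relation over information attributes: child -> parent.
# A norm on an abstract attribute also governs its refinements.
ATTRIBUTE_PARENT = {
    "gps_coordinates": "location",
    "city": "location",
    "heart_rate": "health_data",
}

def covers(norm_attr: str, shared_attr: str) -> bool:
    """True if norm_attr equals shared_attr or is an ancestor of it."""
    a = shared_attr
    while a is not None:
        if a == norm_attr:
            return True
        a = ATTRIBUTE_PARENT.get(a)
    return False

@dataclass
class Norm:
    sender: str
    recipient: str
    attribute: str          # possibly abstract, e.g. "location"
    context: str            # e.g. "healthcare", "commerce"
    start: time             # environmental variable: allowed time window
    end: time
    allow: bool

def is_permitted(norms, sender, recipient, attribute, context, now: time) -> bool:
    """Deny by default; an action is permitted only if at least one norm
    matches and no matching norm denies (one possible conflict policy)."""
    matches = [n for n in norms
               if n.sender == sender and n.recipient == recipient
               and covers(n.attribute, attribute)
               and n.context == context
               and n.start <= now <= n.end]
    return bool(matches) and all(n.allow for n in matches)

norms = [Norm("alice", "doctor", "health_data", "healthcare",
              time(8, 0), time(18, 0), allow=True)]
print(is_permitted(norms, "alice", "doctor", "heart_rate",
                   "healthcare", time(9, 30)))   # True
```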
Award ID(s):
1657774
NSF-PAR ID:
10222641
Author(s) / Creator(s):
Date Published:
Journal Name:
the 23rd International Database Applications & Engineering Symposium
Page Range / eLocation ID:
1 to 10
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. To create user-centric and personalized privacy management tools, the underlying models must account for individual users' privacy expectations and preferences and their ability to control their information-sharing activities. Existing studies of user privacy behavior modeling frame the problem from the requester's perspective, which lacks the crucial involvement of the information owner and results in limited or no control over policy management. Moreover, very few take into consideration the correctness, explainability, usability, and acceptance of the methodology for each user of the system. In this paper, we present a methodology to formally model, validate, and verify personalized privacy disclosure behavior based on an analysis of the user's situational decision-making process. We use the model checking tool UPPAAL to represent users' self-reported privacy disclosure behavior as an extended form of finite state automata (FSA), and we perform reachability analysis to verify privacy properties expressed as computation tree logic (CTL) formulas. We also describe practical use cases of the methodology, depicting the potential of formal techniques for the design and development of user-centric behavioral modeling. Through extensive experimental results, this paper contributes several insights to the areas of formal methods and user-tailored privacy behavior modeling.
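As a rough illustration of the verification step (not UPPAAL itself, which models extended timed automata and writes reachability queries in its own syntax, e.g. E<> violated), the sketch below runs breadth-first reachability over a hypothetical disclosure FSA, which corresponds to checking a CTL formula of the form EF target. The state and action names are invented for this example.

```python
# A toy illustration (not UPPAAL) of reachability analysis: states of a
# hypothetical disclosure automaton are explored with BFS, which matches
# checking a CTL formula of the form EF target ("some execution path
# eventually reaches the target state").
from collections import deque

# Hypothetical disclosure FSA: state -> {action: next_state}
TRANSITIONS = {
    "idle":      {"request": "reviewing"},
    "reviewing": {"approve": "disclosed", "deny": "idle"},
    "disclosed": {"revoke": "idle"},
}

def reachable(start: str, target: str) -> bool:
    """BFS reachability: True iff EF(target) holds from start."""
    seen, frontier = {start}, deque([start])
    while frontier:
        state = frontier.popleft()
        if state == target:
            return True
        for nxt in TRANSITIONS.get(state, {}).values():
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

# Property check: can data ever reach the "disclosed" state from "idle"?
print(reachable("idle", "disclosed"))  # True; a policy forbidding this
                                       # would report a violation here
```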
  2. The computer science literature on identifying people from personal information spans a wide spectrum, from aggregate information that contains no information about individual people to information that itself identifies a person. However, privacy laws and regulations often distinguish between only two types, commonly called personally identifiable information and de-identified information. We show that collapsing this technological spectrum of identifiability into only two legal definitions fails to encourage privacy-preserving practices, and we propose a set of legal definitions that spans the spectrum.

We start with anonymous information. Computer science has created anonymization algorithms, including differential privacy, that provide mathematical guarantees that a person cannot be identified. Although the California Consumer Privacy Act (CCPA) defines aggregate information, it treats aggregate information the same as de-identified information. We propose a definition of anonymous information based on the technological possibility of logical association of the information with other information, and we argue for excluding anonymous information from notice and consent requirements.

We next consider de-identified information. Computer science has created de-identification algorithms, including generalization, that minimize (but do not eliminate) the risk of re-identification. GDPR defines anonymous information but not de-identified information, and CCPA defines de-identified information but not anonymous information; the definitions do not align. We propose a definition of de-identified information based on the reasonableness of association with other information, along with legal controls to protect against re-identification. We argue for including de-identified information in notice requirements but excluding it from choice requirements.

We next address the distinction between trackable and non-trackable information. Computer science has shown how one-time identifiers can protect reasonably linkable information from being tracked over time. Although both GDPR and CCPA discuss profiling, neither formally defines it as a form of personal information, and thus both fail to adequately protect against it. We propose definitions of trackable and non-trackable information based on the likelihood of association with information from other contexts, along with legal controls to protect against tracking. We argue for requiring stronger forms of user choice for trackable information, which will encourage the use of non-trackable information.

Finally, we address the distinction between pseudonymous and reasonably identifiable information. Computer science has shown how pseudonyms can reduce identification. Neither GDPR nor CCPA distinguishes between pseudonymous and reasonably identifiable information. We propose definitions based on the reasonableness of identifiability of the information, along with legal controls to protect against identification. We argue for requiring stronger forms of user choice for reasonably identifiable information, which will encourage the use of pseudonymous information.

Our definitions of anonymous, de-identified, non-trackable, trackable, and reasonably identifiable information can replace the over-simplified dichotomy of personally identifiable versus de-identified information. We hope that this full spectrum of definitions can be used in a comprehensive privacy law to tailor notice and consent requirements to the characteristics of each type of information.
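As an example of the mathematical guarantee the anonymous-information category relies on, the hedged sketch below applies the Laplace mechanism, a standard differentially private primitive, to a counting query. The dataset, predicate, and epsilon value are illustrative choices, not taken from the article.

```python
# A minimal sketch of a differentially private release: the Laplace
# mechanism adds calibrated noise to an aggregate count, so the output
# carries a formal epsilon-DP guarantee about any individual record.
import numpy as np

def private_count(records, predicate, epsilon: float) -> float:
    """Counting queries have L1 sensitivity 1, so adding Laplace(1/epsilon)
    noise satisfies epsilon-differential privacy."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Illustrative data and query: how many people are 35 or older?
ages = [34, 29, 41, 52, 38, 27]
print(private_count(ages, lambda a: a >= 35, epsilon=0.5))
```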
  3. Cities have circumvented privacy norms and deployed sensors to track vehicles via toll transponders (such as E-ZPass tags). The ethical problems with these practices have been highlighted by various privacy advocacy groups, yet the industry has not implemented a standard privacy-protection regime to protect users' data. Further, existing risk-management models do not adequately address user-controlled data-sharing requirements. In this paper, we consider the challenges of protecting private data in the Internet of Vehicles (IoV) and mobile edge networks. Specifically, we present a privacy risk reduction model for electronic toll transponder data that seeks to preserve driver privacy while contributing to intelligent transportation infrastructure congestion-automation schemes. We propose TollsOnly, a fully homomorphic encryption protocol expected to be a post-quantum privacy-preservation scheme. TollsOnly enables users to share specific data with smart cities via blockchain technology, and it protects driver privacy in compliance with the European General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).
  4. A large amount of data is often needed to train machine learning algorithms with confidence. One way to achieve the necessary data volume is to share and combine data from multiple parties, yet protecting sensitive personal information during data sharing remains a challenge. We focus on data sharing where parties have overlapping attributes but non-overlapping individuals. One approach to privacy protection is sharing differentially private synthetic data: each party generates synthetic data at its own preferred privacy budget, and the data are then released and horizontally merged across the parties. The total privacy cost of this approach is capped at the maximum individual budget employed by any party. We derive mean-squared-error bounds for parameter estimation in common regression analyses based on the merged sanitized data. Through theoretical analysis, we identify the conditions under which the utility of sharing and merging sanitized data outweighs the perturbation introduced to satisfy differential privacy and surpasses the utility obtainable from individual-party data. The experiments suggest that sanitized HOMM data obtained at a practically reasonable, small privacy cost can lead to smaller prediction and estimation errors than individual parties achieve alone, demonstrating the benefits of data sharing while protecting privacy.
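A hedged sketch of the sharing pattern described above: each party sanitizes its own records at its own budget, the sanitized rows are merged horizontally, and a regression is fit to the merged data. Per-record Laplace perturbation of clipped values (a local-DP-style mechanism) stands in for the paper's synthetic-data generation, which this sketch does not reproduce; the attenuation visible in the slope estimate is an instance of the privacy-utility trade-off the paper's MSE bounds analyze.

```python
# Sketch: parties with non-overlapping individuals sanitize (x, y) records
# at their own budgets, the rows are merged horizontally, and ordinary
# least squares is fit to the merged sanitized data. The sanitization here
# is an illustrative local-DP-style mechanism, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

def sanitize(data: np.ndarray, epsilon: float, bound: float) -> np.ndarray:
    """Clip each value to [-bound, bound], then add Laplace noise scaled to
    the clipped range so each record's release is epsilon-private."""
    clipped = np.clip(data, -bound, bound)
    scale = 2 * bound / epsilon        # L1 sensitivity of a clipped value
    return clipped + rng.laplace(0.0, scale, size=data.shape)

def make_party(n: int) -> np.ndarray:  # true model: y = 2x + 1 + noise
    x = rng.normal(0, 1, n)
    return np.column_stack([x, 2 * x + 1 + rng.normal(0, 0.1, n)])

# Each party chooses its own budget; over disjoint individuals the total
# privacy cost is capped at max(eps) across parties.
parties = [(make_party(500), 1.0), (make_party(500), 2.0), (make_party(500), 0.5)]
merged = np.vstack([sanitize(d, eps, bound=5.0) for d, eps in parties])

X = np.column_stack([np.ones(len(merged)), merged[:, 0]])
beta = np.linalg.lstsq(X, merged[:, 1], rcond=None)[0]
print(beta)  # intercept/slope estimates; noise in x attenuates the slope
```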
  5. Determining which species are at greatest risk, where they are most vulnerable, and what the trajectories of their communities and populations are is critical for conservation and management. Globally distributed, wide-ranging whales and dolphins present a particular data-collection challenge because no single research team can record data over biologically meaningful areas. Flukebook.org is an open-source web platform that addresses these gaps by providing researchers with the latest computational tools. It integrates photo-identification algorithms with data management, sharing, and privacy infrastructure for whale and dolphin research, enabling the collaborative study of these species at a global scale. With seven automatic identification algorithms trained for 15 different species, yielding 37 species-specific identification pipelines, Flukebook is an extensible foundation that continually incorporates emerging AI techniques and applies them to cetacean photo identification through ongoing collaboration among computer vision researchers, software engineers, and biologists. With more than 2 million photos of over 52,000 identified individual animals submitted by over 250 researchers, the platform enables a comprehensive understanding of cetacean populations, fostering international and cross-institutional collaboration while respecting data ownership and privacy. We outline the technology stack and architecture of Flukebook, its performance on real-world cetacean imagery, and its development as an example of scalable, extensible, and reusable open-source conservation software. Flukebook is a step change in our ability to conduct large-scale research on cetaceans across biologically meaningful geographic ranges, to rapidly iterate population assessments and abundance trajectories, and to engage the public in actions to protect them.
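For readers unfamiliar with photo identification, the conceptual sketch below (not Flukebook's actual code, models, or API) shows the shape of such a pipeline: embed an image into a feature vector with a species-specific model, then rank catalog individuals by similarity. The embed_image function is a hypothetical placeholder for a trained network.

```python
# A conceptual sketch (not Flukebook's actual pipeline) of individual
# photo identification: embed a fluke/fin image into a feature vector,
# then rank known individuals in a catalog by cosine similarity.
import numpy as np

def embed_image(pixels: np.ndarray) -> np.ndarray:
    """Placeholder embedding; real pipelines use a trained CNN per species."""
    v = np.resize(pixels.astype(float).ravel(), 128)
    return v / (np.linalg.norm(v) + 1e-9)

def identify(query: np.ndarray, catalog: dict, top_k: int = 3):
    """Return the top_k catalog individuals most similar to the query photo."""
    q = embed_image(query)
    scores = {name: float(q @ emb) for name, emb in catalog.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

rng = np.random.default_rng(1)
catalog = {f"whale_{i:03d}": embed_image(rng.random((64, 64)))
           for i in range(5)}
print(identify(rng.random((64, 64)), catalog))  # ranked candidate matches
```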

     