Title: PReVer: Towards Private Regulated Verified Data
Abstract: Data privacy has garnered significant attention recently due to diverse applications that store sensitive data in untrusted infrastructure. From a data management point of view, the focus has been on the privacy of stored data and the privacy of querying data at a large scale. However, databases are not solely query engines on static data; they must also support updates on dynamically evolving datasets. In this paper, we lay out a vision for privacy-preserving dynamic data. In particular, we focus on dynamic data that might be stored remotely on untrusted providers. Updates arrive at a provider and are verified and incorporated into the database based on predefined constraints. Depending on the application, the content of the stored data, the content of the updates, and the constraints may be private or public. We then propose PReVer, a universal framework for managing regulated dynamic data in a privacy-preserving manner. We explore a set of research challenges that PReVer needs to address in order to guarantee the privacy of data, updates, and/or constraints while supporting the consistent and verifiable execution of updates. This opens the space of privacy-preserving data management from the narrow perspective of private queries on static datasets to the larger space of private management of dynamic data.
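To make the regulated-update model concrete, below is a minimal Python sketch of a provider that checks each incoming update against predefined constraints before incorporating it. Every name here (RegulatedStore, Constraint, apply_update) is a hypothetical illustration, and the sketch operates on plaintext; PReVer's actual design targets encrypted or otherwise protected data, updates, and constraints.

    # Hedged sketch, not PReVer's implementation: a store that verifies each
    # update against predefined constraints before applying it. A real
    # deployment would run these checks over protected data with proofs.
    from dataclasses import dataclass, field
    from typing import Callable, Optional

    @dataclass
    class Constraint:
        name: str
        # (old record or None, proposed new record) -> is the update allowed?
        check: Callable[[Optional[dict], dict], bool]

    @dataclass
    class RegulatedStore:
        constraints: list[Constraint]
        records: dict = field(default_factory=dict)

        def apply_update(self, key: str, new_record: dict) -> bool:
            old = self.records.get(key)
            if not all(c.check(old, new_record) for c in self.constraints):
                return False  # reject updates that violate any constraint
            self.records[key] = new_record
            return True

    # Example constraint: an account balance may never go negative.
    store = RegulatedStore([Constraint(
        "non_negative_balance",
        lambda old, new: new.get("balance", 0) >= 0,
    )])
    store.records["alice"] = {"balance": 10}
    assert store.apply_update("alice", {"balance": 5})        # accepted
    assert not store.apply_update("alice", {"balance": -20})  # rejected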
Award ID(s): 1703560
NSF-PAR ID: 10331213
Journal Name: Proceedings of the 25th International Conference on Extending Database Technology (EDBT 2022)
Page Range / eLocation ID: 454-461
Sponsoring Org: National Science Foundation
More Like this
  1. Database-driven dynamic spectrum sharing is one of the most promising dynamic spectrum access (DSA) solutions to the spectrum scarcity issue. In such a database-driven DSA system, the centralized spectrum management infrastructure, called the spectrum access system (SAS), makes its spectrum allocation decisions for secondary users (SUs) according to sensitive operational data of incumbent users (IUs). Since neither the SAS nor the SUs are necessarily fully trusted, privacy protection against an untrusted SAS and SUs becomes critical for IUs that have high operational privacy requirements. To address this problem, many IU privacy-preserving solutions have emerged recently. However, there is little understanding of, or comparison between, how well these existing approaches protect IU operational privacy. In this paper, we thus fill the void by providing a comparative study that investigates existing solutions and explores several existing metrics for evaluating the strength of privacy protection. Moreover, we propose two general metrics for evaluating the level of privacy preservation and use them to evaluate existing works.
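As a hedged illustration of what a privacy metric for this setting can look like (a generic entropy-style metric, not one of the two metrics the paper proposes), consider quantifying the adversary's remaining uncertainty about an IU's location:

    # Generic illustration only: Shannon entropy of the adversary's posterior
    # over candidate IU locations. Higher entropy means more residual
    # uncertainty, i.e., stronger operational privacy.
    import math

    def location_entropy(posterior: dict[str, float]) -> float:
        return -sum(p * math.log2(p) for p in posterior.values() if p > 0)

    # Posterior beliefs an adversary might hold after observing SAS decisions:
    weak_protection   = {"cell_a": 0.90, "cell_b": 0.05, "cell_c": 0.05}
    strong_protection = {"cell_a": 0.34, "cell_b": 0.33, "cell_c": 0.33}
    print(location_entropy(weak_protection))    # ~0.57 bits: nearly pinpointed
    print(location_entropy(strong_protection))  # ~1.58 bits: close to uniform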
  2. Organizations often collect private data and release aggregate statistics for the public's benefit. If no steps toward preserving privacy are taken, adversaries may use released statistics to deduce unauthorized information about the individuals described in the private dataset. Differentially private algorithms address this challenge by slightly perturbing underlying statistics with noise, thereby mathematically limiting the amount of information that may be deduced from each data release. Properly calibrating these algorithms (and in turn the disclosure risk for people described in the dataset) requires a data curator to choose a value for a privacy budget parameter, ɛ. However, there is little formal guidance for choosing ɛ, a task that requires reasoning about the probabilistic privacy-utility tradeoff. Furthermore, choosing ɛ in the context of statistical inference requires reasoning about accuracy trade-offs in the presence of both measurement error and differential privacy (DP) noise. We present Visualizing Privacy (ViP), an interactive interface that visualizes relationships between ɛ, accuracy, and disclosure risk to support setting and splitting ɛ among queries. As a user adjusts ɛ, ViP dynamically updates visualizations depicting expected accuracy and risk. ViP also has an inference setting, allowing a user to reason about the impact of DP noise on statistical inferences. Finally, we present results of a study where 16 research practitioners with little to no DP background completed a set of tasks related to setting ɛ using both ViP and a control. We find that ViP helps participants more correctly answer questions related to judging the probability of where a DP-noised release is likely to fall and comparing between DP-noised and non-private confidence intervals.
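For readers unfamiliar with how ɛ governs this tradeoff, here is a small, hedged sketch of the standard Laplace mechanism (not ViP's code) showing how the expected error of a DP-noised release shrinks as ɛ grows:

    # Standard Laplace mechanism, for intuition only (this is not ViP's code).
    # Expected absolute error is sensitivity/epsilon, so smaller epsilon
    # (stronger privacy) widens the range where a noised release likely falls.
    import numpy as np

    def laplace_release(true_value, sensitivity, epsilon, rng):
        return true_value + rng.laplace(scale=sensitivity / epsilon)

    rng = np.random.default_rng(0)
    true_count = 1000.0
    for eps in (0.1, 1.0, 10.0):
        errs = [abs(laplace_release(true_count, 1.0, eps, rng) - true_count)
                for _ in range(10_000)]
        print(f"epsilon={eps:>4}: mean absolute error ~ {np.mean(errs):.2f}")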
  3. In this paper, we consider privacy-preserving update strategies for secure outsourced growing databases. Such databases allow append-only data updates on the outsourced data structure while analysis is ongoing. Despite a plethora of solutions to securely outsource database computation, existing techniques do not consider the information that can be leaked via update patterns. To address this problem, we design a novel secure outsourced database framework for growing data, DP-Sync, which interoperates with a large class of existing encrypted databases and supports efficient updates while providing differentially-private guarantees for any single update. We demonstrate DP-Sync's practical feasibility in terms of performance and accuracy with extensive empirical evaluations on real-world datasets.
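The leak in question is the pattern (timing and volume) of updates, not just their content. A hedged toy sketch of one way to blunt it (DP-Sync's actual strategies are more involved) is to let the server observe only noised, dummy-padded batch sizes:

    # Toy illustration, not DP-Sync's algorithm: flush a Laplace-noised number
    # of records per sync, padding with dummy rows, so the batch size the
    # server observes is decoupled from the true number of pending updates.
    import numpy as np

    def flush_batch(pending, epsilon, rng):
        """Return (records to send now, updates still pending)."""
        noised = len(pending) + rng.laplace(scale=1.0 / epsilon)
        k = max(0, int(round(float(noised))))
        sent = pending[:k] + [{"dummy": True}] * max(0, k - len(pending))
        return sent, pending[k:]

    rng = np.random.default_rng(1)
    sent, remaining = flush_batch([{"op": "insert"}] * 7, epsilon=0.5, rng=rng)
    print(len(sent), len(remaining))  # leftovers wait for a later sync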
  4. Due to the large computational cost of data classification using deep learning, resource-limited devices such as smartphones and PCs offload their classification tasks to a cloud server, which offers extensive hardware resources. Unfortunately, since the cloud is an untrusted third party, users may be reluctant to share their private data with the cloud for data classification. Differential privacy has been proposed as a way of securely classifying data at the cloud using deep learning. In this approach, users conceal their data before uploading it to the cloud using a local obfuscation deep learning model, which is based on a data classification model hosted by the cloud. However, because the obfuscation model assumes that the pre-trained model at the cloud is static, it suffers significant performance degradation under realistic classification models that are constantly being updated. In this paper, we investigate the performance of differentially-private data classification under a dynamic pre-trained model and a constant obfuscation model. We find that the classification performance decreases as the pre-trained model evolves. We then investigate the classification performance under an obfuscation model that is updated alongside the pre-trained model. We find that, with modest computational effort, the obfuscation model can be updated to significantly improve the classification performance under a dynamic pre-trained model.
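A hedged toy sketch of the takeaway follows. The paper's obfuscation model is a learned deep network; here a linear feature extractor plus Laplace noise stands in for it, and every name below is hypothetical:

    # Illustrative only, not the paper's model: a local obfuscator that must be
    # refreshed whenever the cloud's pre-trained model evolves, or accuracy
    # degrades. A toy linear extractor stands in for a deep network.
    import numpy as np

    class LocalObfuscator:
        def __init__(self, extractor: np.ndarray, epsilon: float):
            self.extractor = extractor  # derived from the cloud's current model
            self.epsilon = epsilon

        def obfuscate(self, x: np.ndarray, rng) -> np.ndarray:
            z = self.extractor @ x      # local feature extraction
            return z + rng.laplace(scale=1.0 / self.epsilon, size=z.shape)

        def resync(self, new_extractor: np.ndarray) -> None:
            # The paper's key point: update the obfuscator alongside the
            # evolving pre-trained model to recover classification accuracy.
            self.extractor = new_extractor

    rng = np.random.default_rng(2)
    obf = LocalObfuscator(np.eye(4), epsilon=1.0)
    print(obf.obfuscate(np.arange(4.0), rng))
    obf.resync(2 * np.eye(4))  # cloud model updated -> refresh obfuscator too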
  5. Federated learning (FL) is an increasingly popular approach for machine learning (ML) when the training dataset is highly distributed. Clients perform local training on their datasets, and the updates are then aggregated into the global model. Existing protocols for aggregation are either inefficient or don't consider the case of malicious actors in the system. This is a major barrier to making FL an ideal solution for privacy-sensitive ML applications. In this talk, I will present ELSA, a secure aggregation protocol for FL that breaks this barrier: it is efficient and addresses the existence of malicious actors (clients and servers) at the core of its design. Similar to the prior works Prio and Prio+, ELSA provides a novel secure aggregation protocol built out of distributed trust across two servers that keeps individual client updates private as long as one server is honest, defends against malicious clients, and is efficient end-to-end. Compared to prior works, the distinguishing theme in ELSA is that instead of the servers generating cryptographic correlations interactively, the clients act as untrusted dealers of these correlations without compromising the protocol's security. This leads to a much faster protocol that also achieves stronger security at that efficiency compared to prior work. We introduce new techniques that retain privacy even when a server is malicious, at a small added cost of 7-25% in runtime and a negligible increase in communication over the case of a semi-honest server. ELSA improves end-to-end runtime over prior work with similar security guarantees by large margins: up to 305x over the single-aggregator RoFL (for the models we consider) and up to 8x over the distributed-trust Prio (with an up to 16x faster server-side protocol). Additionally, ELSA can run in a bandwidth-saver mode for clients who are geographically bandwidth-constrained, an important property that is missing from prior works.
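To ground the two-server, distributed-trust idea, here is a hedged minimal sketch of the additive secret sharing that Prio-style protocols (and, with far more machinery, ELSA) build on. Malicious-security checks, client-dealt correlations, and all of ELSA's optimizations are omitted:

    # Minimal sketch of two-server additive secret sharing: each client splits
    # its update into two random-looking shares, one per server. Neither server
    # alone learns anything; summing the per-server aggregates recovers only
    # the total. This is the privacy core, not ELSA's full protocol.
    import secrets

    P = 2**61 - 1  # prime modulus for the share field

    def share(update: list[int]) -> tuple[list[int], list[int]]:
        s0 = [secrets.randbelow(P) for _ in update]
        s1 = [(u - r) % P for u, r in zip(update, s0)]
        return s0, s1

    def aggregate(shares: list[list[int]]) -> list[int]:
        # Each server sums the shares it received, coordinate-wise.
        return [sum(col) % P for col in zip(*shares)]

    clients = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    server0, server1 = zip(*(share(u) for u in clients))
    # Servers exchange only aggregates; individual updates never appear.
    total = [(a + b) % P
             for a, b in zip(aggregate(list(server0)), aggregate(list(server1)))]
    assert total == [111, 222, 333]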