Employing Differential Privacy (DP), the state-of-the-art privacy standard, to answer aggregate database queries poses new challenges for users to understand the trends and anomalies observed in the query results: Is the unexpected answer due to the data itself, or is it due to the extra noise that must be added to preserve DP? We propose to demonstrate DPXPlain, the first system for explaining group-by aggregate query answers with DP. DPXPlain allows users to compare values of two groups and receive a validity check, and further provides an explanation table with an interactive visualization, containing the approximately 'top-k' explanation predicates along with their relative influences and ranks in the form of confidence intervals, while guaranteeing DP in all steps.
more »
« less
Statistical Privacy and Consent in Data Aggregation
As new laws governing management of personal data are introduced, e.g., the European Union’s General Data Protection Regulation of 2016 and the California Consumer Privacy Act of 2018, compliance with data governance legislation is becoming an increasingly important aspect of data management. An important component of many data privacy laws is that they require companies to only use an individual’s data for a purpose the individual has explicitly consented to. Prior methods for enforcing consent for aggregate queries either use access control to eliminate data without consent from query evaluation or apply differential privacy algorithms to inject synthetic noise into the outcomes of queries (or input data) to ensure that the anonymity of non-consenting individuals is preserved with high probability. Both approaches return query results that differ from the ground truth results corresponding to the full input containing data from both consenting and non-consenting individuals. We present an alternative frame- work for group-by aggregate queries, tailored for applications, e.g., medicine, where even a small deviation from the correct answer to a query cannot be tolerated. Our approach uses provenance to determine, for each output tuple of a group-by aggregate query, which individual’s data was used to derive the result for this group. We then use statistical tests to determine how likely it is that the presence of data for a non-consenting individual will be revealed by such an output tuple. We filter out tuples for which this test fails, i.e., which are deemed likely to reveal non-consenting data. Thus, our approach always returns a subset of the ground truth query answers. Our experiments successfully return only 100% accurate results in instances where access control or differential privacy would have either returned less total or less accurate results.
more »
« less
- Award ID(s):
- 2016548
- PAR ID:
- 10528465
- Publisher / Repository:
- The 36th International Conference on Scientific and Statistical Database Management (SSDBM)
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)A private data federation is a set of autonomous databases that share a unified query interface offering in-situ evaluation of SQL queries over the union of the sensitive data of its members. Owing to privacy concerns, these systems do not have a trusted data collector that can see all their data and their member databases cannot learn about individual records of other engines. Federations currently achieve this goal by evaluating queries obliviously using secure multiparty computation. This hides the intermediate result cardinality of each query operator by exhaustively padding it. With cascades of such operators, this padding accumulates to a blow-up in the output size of each operator and a proportional loss in query performance. Hence, existing private data federations do not scale well to complex SQL queries over large datasets. We introduce Shrinkwrap, a private data federation that offers data owners a differentially private view of the data held by others to improve their performance over oblivious query processing. Shrinkwrap uses computational differential privacy to minimize the padding of intermediate query results, achieving up to a 35X performance improvement over oblivious query processing. When the query needs differentially private output, Shrinkwrap provides a trade-off between result accuracy and query evaluation performance.more » « less
-
null (Ed.)Differential privacy offers a formal privacy guarantee for individuals, but many deployments of differentially private systems require a trusted third party (the data curator). We propose DuetSGX, a system that uses secure hardware (Intel’s SGX) to eliminate the need for a trusted data curator. Data owners submit encrypted data that can be decrypted only within a secure enclave running the DuetSGX system, ensuring that sensitive data is never available to the data curator. Analysts submit queries written in the Duet language, which is specifically designed for verifying that programs satisfy differential privacy; DuetSGX uses the Duet typechecker to verify that each query satisfies differential privacy before running it. DuetSGX therefore provides the benefits of local differential privacy and central differential privacy simultaneously: noise is only added to final results, and there is no trusted third party. We have implemented a proof-of-concept implementation of DuetSGX and we release it as open-source.more » « less
-
Data privacy policy requirements are a quickly evolving part of the data management domain. Healthcare (e.g., HIPAA), financial (e.g., GLBA), and general laws such as GDPR or CCPA impose controls on how personal data should be managed. Relational databases do not offer built-in features to support data management features to comply with such laws. As a result, many organizations implement ad-hoc solutions or use third party tools to ensure compliance with privacy policies. However, external compliance framework can conflict with the internal activity in a database (e.g., trigger side-effects or aborted transactions). In our prior work, we introduced a framework that integrates data retention and data purging compliance into the database itself, requiring only the support for triggers and encryption, which are already available in any mainstream database engine. In this demonstration paper, we introduce DBCompliant – a tool that demonstrates how our approach can seamlessly integrate comprehensive policy compliance (defined via SQL queries). Although we use PostgreSQL as our back-end, DBCompliant could be adapted to any other relational database. Finally, our approach imposes low (less than 5%) user query overhead.more » « less
-
Differential privacy is the dominant standard for formal and quantifiable privacy and has been used in major deployments that impact millions of people. Many differentially private algorithms for query release and synthetic data contain steps that reconstruct answers to queries from answers to other queries that have been measured privately. Reconstruction is an important subproblem for such mecha- nisms to economize the privacy budget, minimize error on reconstructed answers, and allow for scalability to high-dimensional datasets. In this paper, we introduce a principled and efficient postprocessing method ReM (Residuals-to-Marginals) for reconstructing answers to marginal queries. Our method builds on recent work on efficient mechanisms for marginal query release, based on making measurements using a residual query basis that admits efficient pseudoinversion, which is an important primitive used in reconstruction. An extension GReM-LNN (Gaussian Residuals-to-Marginals with Local Non-negativity) reconstructs marginals under Gaussian noise satisfying consistency and non-negativity, which often reduces error on reconstructed answers. We demonstrate the utility of ReM and GReM-LNN by applying them to improve existing private query answering mechanisms.more » « less
An official website of the United States government

