skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Thursday, February 13 until 2:00 AM ET on Friday, February 14 due to maintenance. We apologize for the inconvenience.


Title: DP-Sync: Hiding Update Patterns in Secure Outsourced Databases with Differential Privacy
In this paper, we consider privacy-preserving update strategies for secure outsourced growing databases. Such databases allow appendonly data updates on the outsourced data structure while analysis is ongoing. Despite a plethora of solutions to securely outsource database computation, existing techniques do not consider the information that can be leaked via update patterns. To address this problem, we design a novel secure outsourced database framework for growing data, DP-Sync, which interoperate with a large class of existing encrypted databases and supports efficient updates while providing differentially-private guarantees for any single update. We demonstrate DP-Sync's practical feasibility in terms of performance and accuracy with extensive empirical evaluations on real world datasets.  more » « less
Award ID(s):
2029853 2016393
PAR ID:
10340690
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
Page Range / eLocation ID:
1892 to 1905
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we consider secure outsourced growing databases (SOGDB) that support view-based query answering. These databases allow untrusted servers to privately maintain a materialized view. This allows servers to use only the materialized view for query processing instead of accessing the original data from which the view was derived. To tackle this, we devise a novel view-based SOGDB framework, Incshrink. The key features of this solution are: (i) Incshrink maintains the view using incremental MPC operators which eliminates the need for a trusted third party upfront, and (ii) to ensure high performance, Incshrink guarantees that the leakage satisfies DP in the presence of updates. To the best of our knowledge, there are no existing systems that have these properties. We demonstrate Incshrink's practical feasibility in terms of efficiency and accuracy with extensive experiments on real-world datasets and the TPC-ds benchmark. The evaluation results show that Incshrink provides a 3-way trade-off in terms of privacy, accuracy and efficiency, and offers at least a 7,800x performance advantage over standard SOGDB that do not support view-based query paradigm. 
    more » « less
  2. In this work, we propose Longshot, a novel design for secure outsourced database systems that supports ad-hoc queries through the use of secure multi-party computation and differential privacy. By combining these two techniques, we build and maintain data structures (i.e., synopses, indexes, and stores) that improve query execution efficiency while maintaining strong privacy and security guarantees. As new data records are uploaded by data owners, these data structures are continually updated by Longshot using novel algorithms that leverage bounded information leakage to minimize the use of expensive cryptographic protocols. Furthermore, Long-shot organizes the data structures as a hierarchical tree based on when the update occurred, allowing for update strategies that provide logarithmic error over time. Through this approach, Longshot introduces a tunable three-way trade-off between privacy, accuracy, and efficiency. Our experimental results confirm that our optimizations are not only asymptotic improvements but also observable in practice. In particular, we see a 5x efficiency improvement to update our data structures even when the number of updates is less than 200. Moreover, the data structures significantly improve query runtimes over time, about ~103x faster compared to the baseline after 20 updates.

     
    more » « less
  3. Many applications deployed to public clouds are concerned about the confidentiality of their outsourced data, such as financial services and electronic patient records. A plausible solution to this problem is homomorphic encryption (HE), which supports certain algebraic operations directly over the ciphertexts. The downside of HE schemes is their significant, if not prohibitive, performance overhead for data-intensive workloads that are very common for outsourced databases, or database-as-a-serve in cloud computing. The objective of this work is to mitigate the performance overhead incurred by the HE module in outsourced databases. To that end, this paper proposes a radix-based parallel caching optimization for accelerating the performance of homomorphic encryption (HE) of outsourced databases in cloud computing. The key insight of the proposed optimization is caching selected radix-ciphertexts in parallel without violating existing security guarantees of the primitive/base HE scheme. We design the radix HE algorithm and apply it to both batch- and incremental-HE schemes; we demonstrate the security of those radix-based HE schemes by showing that the problem of breaking them can be reduced to the problem of breaking their base HE schemes that are known IND-CPA (i.e. Indistinguishability under Chosen-Plaintext Attack). We implement the radix-based schemes as middleware of a 10-node Cassandra cluster on CloudLab; experiments on six workloads show that the proposed caching can boost state-of-the-art HE schemes, such as Paillier and Symmetria, by up to five orders of magnitude.

     
    more » « less
  4. Vector databases have recently gained significant attention due to the emergence of large language models that produce vector embeddings for text. Existing vector databases can be broadly categorized into two types: specialized and generalized. Specialized vector databases are explicitly designed and optimized for managing vector data, while generalized ones support vector data management within a general purpose database. While specialized vector databases are interesting, there is a substantial customer base interested in generalized vector databases for various reasons, e.g., a reluctance to move data out of relational databases to reduce data silos and costs, the desire to use SQL, and the need for more sophisticated query processing of vector and non-vector data. However, generalized vector databases face two main challenges: performance and interoperability of vector search with SQL, such as combining vector search with filters, joins, or even fulltext search.

    In this paper, we present SingleStore-V, a full-fledged generalized vector database integrated into SingleStore, a modern distributed relational database optimized for both OLAP and OLTP workloads. SingleStore-V achieves high performance and interoperability via a suite of optimizations. Experiments on standard vector benchmarks show that SingleStore-V performs comparably to Milvus, a highly-optimized specialized vector database, and significantly outperforms pgvector, a popular generalized vector database in PostgreSQL. We believe this paper will shed light on integrating vector search into relational databases in general, as many design concepts and optimizations apply to other databases.

     
    more » « less
  5. There is great demand for scalable, secure, and efficient privacy-preserving machine learning models that can be trained over distributed data. While deep learning models typically achieve the best results in a centralized non-secure setting, different models can excel when privacy and communication constraints are imposed. Instead, tree-based approaches such as XGBoost have attracted much attention for their high performance and ease of use; in particular, they often achieve state-of-the-art results on tabular data. Consequently, several recent works have focused on translating Gradient Boosted Decision Tree (GBDT) models like XGBoost into federated settings, via cryptographic mechanisms such as Homomorphic Encryption (HE) and Secure Multi-Party Computation (MPC). However, these do not always provide formal privacy guarantees, or consider the full range of hyperparameters and implementation settings. In this work, we implement the GBDT model under Differential Privacy (DP). We propose a general framework that captures and extends existing approaches for differentially private decision trees. Our framework of methods is tailored to the federated setting, and we show that with a careful choice of techniques it is possible to achieve very high utility while maintaining strong levels of privacy. 
    more » « less