skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Secrecy: Secure collaborative analytics in untrusted clouds
We present Secrecy, a system for privacy-preserving collaborative analytics as a service. Secrecy allows multiple data holders to contribute their data towards a joint analysis in the cloud, while keeping the data siloed even from the cloud providers. At the same time, it enables cloud providers to offer their services to clients who would have otherwise refused to perform a computation altogether or insisted that it be done on private infrastructure. Secrecy ensures no information leakage and provides provable security guarantees by employing cryptographically secure Multi-Party Computation (MPC). In Secrecy we take a novel approach to optimizing MPC execution by co-designing multiple layers of the system stack and exposing the MPC costs to the query engine. To achieve practical performance, Secrecy applies physical optimizations that amortize the inherent MPC overheads along with logical optimizations that dramatically reduce the computation, communication, and space requirements during query execution. Our multi-cloud experiments demonstrate that Secrecy improves query performance by over 1000x compared to existing approaches and computes complex analytics on millions of data records with modest use of resources.  more » « less
Award ID(s):
1718135 1801564 1931714
PAR ID:
10358603
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
20th USENIX Symposium on Networked Systems Design and Implementation
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present TVA, a multi-party computation (MPC) system for secure analytics on secret-shared time series data. TVA achieves strong security guarantees in the semi-honest and malicious settings, and high expressivity by enabling complex analytics on inputs with unordered and irregular timestamps. TVA is the first system to support arbitrary composition of oblivious window operators, keyed aggregations, and multiple filter predicates, while keeping all data attributes private, including record timestamps and user-defined values in query predicates. At the core of the TVA system lie novel protocols for secure window assignment: (i) a tumbling window protocol that groups records into fixed-length time buckets and (ii) two session window protocols that identify periods of activity followed by periods of inactivity. We also contribute a new protocol for secure division with a public divisor, which may be of independent interest. We evaluate TVA on real LAN and WAN environments and show that it can efficiently compute complex window-based analytics on inputs of 2^22 records with modest use of resources. When compared to the state-of-the-art, TVA achieves up to 5.8× lower latency in queries with multiple filters and two orders of magnitude better performance in window aggregation. 
    more » « less
  2. Harguess, Joshua D; Bastian, Nathaniel D; Pace, Teresa L (Ed.)
    Outsourcing computational tasks to the cloud offers numerous advantages, such as availability, scalability, and elasticity. These advantages are evident when outsourcing resource-demanding Machine Learning (ML) applications. However, cloud computing presents security challenges. For instance, allocating Virtual Machines (VMs) with varying security levels onto commonly shared servers creates cybersecurity and privacy risks. Researchers proposed several cryptographic methods to protect privacy, such as Multi-party Computation (MPC). Attackers unfortunately can still gain unauthorized access to users’ data if they successfully compromise a specific number of the participating MPC nodes. Cloud Service Providers (CSPs) can mitigate the risk of such attacks by distributing the MPC protocol over VMs allocated to separate physical servers (i.e., hypervisors). On the other hand, underutilizing cloud servers increases operational and resource costs, and worsens the overhead of MPC protocols. In this ongoing work, we address the security, communication and computation overheads, and performance limitations of MPC. We model this multi-objective optimization problem using several approaches, including but not limited to, zero-sum and non-zero-sum games. For example, we investigate Nash Equilibrium (NE) allocation strategies that reduce potential security risks, while minimizing response time and performance overhead, and/or maximizing resource usage. 
    more » « less
  3. Secure Multi-Party Computation (MPC) allows mutually distrusting parties to run joint computations without revealing private data. Current MPC algorithms scale poorly with data size, which makes MPC on "big data" prohibitively slow and inhibits its practical use. Many relational analytics queries can maintain MPC's end-to-end security guarantee without using cryptographic MPC techniques for all operations. Conclave is a query compiler that accelerates such queries by transforming them into a combination of data-parallel, local cleartext processing and small MPC steps. When parties trust others with specific subsets of the data, Conclave applies new hybrid MPC-cleartext protocols to run additional steps outside of MPC and improve scalability further. Our Conclave prototype generates code for cleartext processing in Python and Spark, and for secure MPC using the Sharemind and Obliv-C frameworks. Conclave scales to data sets between three and six orders of magnitude larger than state-of-the-art MPC frameworks support on their own. Thanks to its hybrid protocols, Conclave also substantially outperforms SMCQL, the most similar existing system. 
    more » « less
  4. Cloud computing has become a major approach to help reproduce computational experiments. Yet there are still two main difficulties in reproducing batch based big data analytics (including descriptive and predictive analytics) in the cloud. The first is how to automate end-to-end scalable execution of analytics including distributed environment provisioning, analytics pipeline description, parallel execution, and resource termination. The second is that an application developed for one cloud is difficult to be reproduced in another cloud, a.k.a. vendor lock-in problem. To tackle these problems, we leverage serverless computing and containerization techniques for automated scalable execution and reproducibility, and utilize the adapter design pattern to enable application portability and reproducibility across different clouds. We propose and develop an open-source toolkit that supports 1) fully automated end-to-end execution and reproduction via a single command, 2) automated data and configuration storage for each execution, 3) flexible client modes based on user preferences, 4) execution history query, and 5) simple reproduction of existing executions in the same environment or a different environment. We did extensive experiments on both AWS and Azure using four big data analytics applications that run on virtual CPU/GPU clusters. The experiments show our toolkit can achieve good execution performance, scalability, and efficient reproducibility for cloud-based big data analytics. 
    more » « less
  5. Modern cloud-native OLAP databases adopt a storage-disaggregation architecture that separates the management of compu- tation and storage. A major bottleneck in such an architecture is the network connecting the computation and storage layers. Computation pushdown is a promising solution to tackle this issue, which offloads some computation tasks to the storage layer to reduce network traffic. This paper presents FlexPushdownDB (FPDB), where we revisit the design of computation pushdown in a storage-disaggregation architecture, and then introduce several optimizations to further accelerate query pro- cessing. First, FPDB supports hybrid query execution, which combines local computation on cached data and computation pushdown to cloud storage at a fine granularity. Within the cache, FPDB uses a novel Weighted-LFU cache replacement policy that takes into account the cost of pushdown computation. Second, we design adaptive pushdown as a new mecha- nism to avoid throttling the storage-layer computation during pushdown, which pushes the request back to the computation layer at runtime if the storage-layer computational resource is insufficient. Finally, we derive a general principle to identify pushdown-amenable computational tasks, by summarizing common patterns of pushdown capabilities in existing systems, and further propose two new pushdown operators, namely, selection bitmap and distributed data shuffle. Evaluation on SSB and TPC-H shows each optimization can improve the performance by 2.2×, 1.9×, and 3× respectively. 
    more » « less