skip to main content


Title: IncShrink: Architecting Efficient Outsourced Databases using Incremental MPC and Differential Privacy
In this paper, we consider secure outsourced growing databases (SOGDB) that support view-based query answering. These databases allow untrusted servers to privately maintain a materialized view. This allows servers to use only the materialized view for query processing instead of accessing the original data from which the view was derived. To tackle this, we devise a novel view-based SOGDB framework, Incshrink. The key features of this solution are: (i) Incshrink maintains the view using incremental MPC operators which eliminates the need for a trusted third party upfront, and (ii) to ensure high performance, Incshrink guarantees that the leakage satisfies DP in the presence of updates. To the best of our knowledge, there are no existing systems that have these properties. We demonstrate Incshrink's practical feasibility in terms of efficiency and accuracy with extensive experiments on real-world datasets and the TPC-ds benchmark. The evaluation results show that Incshrink provides a 3-way trade-off in terms of privacy, accuracy and efficiency, and offers at least a 7,800x performance advantage over standard SOGDB that do not support view-based query paradigm.  more » « less
Award ID(s):
2029853 2016393
NSF-PAR ID:
10340691
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
Page Range / eLocation ID:
818 to 832
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    We introduce theReverseSpatial Top-kKeyword (RSK)query, which is defined as:given a query term q, an integer k and a neighborhood size find all the neighborhoods of that size where q is in the top-k most frequent terms among the social posts in those neighborhoods. An obvious approach would be to partition the dataset with a uniform grid structure of a given cell size and identify the cells where this term is in the top-k most frequent keywords. However, this answer would be incomplete since it only checks for neighborhoods that are perfectly aligned with the grid. Furthermore, for every neighborhood (square) that is an answer, we can define infinitely more result neighborhoods by minimally shifting the square without including more posts in it. To address that, we need to identify contiguous regions where any point in the region can be the center of a neighborhood that satisfies the query. We propose an algorithm to efficiently answer an RSK query using an index structure consisting of a uniform grid augmented by materialized lists of term frequencies. We apply various optimizations that drastically improve query latency against baseline approaches. We also provide a theoretical model to choose the optimal cell size for the index to minimize query latency. We further examine a restricted version of the problem (RSKR) that limits the scope of the answer and propose efficientapproximatealgorithms. Finally, we examine how parallelism can improve performance by balancing the workload using a smartload slicingtechnique. Extensive experimental performance evaluation of the proposed methods using real Twitter datasets and crime report datasets, shows the efficiency of our optimizations and the accuracy of the proposed theoretical model.

     
    more » « less
  2. null (Ed.)
    Deep learning now offers state-of-the-art accuracy for many prediction tasks. A form of deep learning called deep convolutional neural networks (CNNs) are especially popular on image, video, and time series data. Due to its high computational cost, CNN inference is often a bottleneck in analytics tasks on such data. Thus, a lot of work in the computer architecture, systems, and compilers communities study how to make CNN inference faster. In this work, we show that by elevating the abstraction level and re-imagining CNN inference as queries , we can bring to bear database-style query optimization techniques to improve CNN inference efficiency. We focus on tasks that perform CNN inference repeatedly on inputs that are only slightly different . We identify two popular CNN tasks with this behavior: occlusion-based explanations (OBE) and object recognition in videos (ORV). OBE is a popular method for “explaining” CNN predictions. It outputs a heatmap over the input to show which regions (e.g., image pixels) mattered most for a given prediction. It leads to many re-inference requests on locally modified inputs. ORV uses CNNs to identify and track objects across video frames. It also leads to many re-inference requests. We cast such tasks in a unified manner as a novel instance of the incremental view maintenance problem and create a comprehensive algebraic framework for incremental CNN inference that reduces computational costs. We produce materialized views of features produced inside a CNN and connect them with a novel multi-query optimization scheme for CNN re-inference. Finally, we also devise novel OBE-specific and ORV-specific approximate inference optimizations exploiting their semantics. We prototype our ideas in Python to create a tool called Krypton that supports both CPUs and GPUs. Experiments with real data and CNNs show that Krypton reduces runtimes by up to 5× (respectively, 35×) to produce exact (respectively, high-quality approximate) results without raising resource requirements. 
    more » « less
  3. Advances in technology coupled with the availability of low-cost sensors have resulted in the continuous generation of large time series from several sources. In order to visually explore and compare these time series at different scales, analysts need to execute online analytical processing (OLAP) queries that include constraints and group-by's at multiple temporal hierarchies. Effective visual analysis requires these queries to be interactive. However, while existing OLAP cube-based structures can support interactive query rates, the exponential memory requirement to materialize the data cube is often unsuitable for large data sets. Moreover, none of the recent space-efficient cube data structures allow for updates. Thus, the cube must be re-computed whenever there is new data, making them impractical in a streaming scenario. We propose Time Lattice, a memory-efficient data structure that makes use of the implicit temporal hierarchy to enable interactive OLAP queries over large time series. Time Lattice is a subset of a fully materialized cube and is designed to handle fast updates and streaming data. We perform an experimental evaluation which shows that the space efficiency of the data structure does not hamper its performance when compared to the state of the art. In collaboration with signal processing and acoustics research scientists, we use the Time Lattice data structure to design the Noise Profiler, a web-based visualization framework that supports the analysis of noise from cities. We demonstrate the utility of Noise Profiler through a set of case studies. 
    more » « less
  4. null (Ed.)
    Deep Convolutional Neural Networks (CNNs) now match human accuracy in many image prediction tasks, resulting in a growing adoption in e-commerce, radiology, and other domains. Naturally, "explaining" CNN predictions is a key concern for many users. Since the internal workings of CNNs are unintuitive for most users, occlusion-based explanations (OBE) are popular for understanding which parts of an image matter most for a prediction. One occludes a region of the image using a patch and moves it around to produce a heatmap of changes to the prediction probability. This approach is computationally expensive due to the large number of re-inference requests produced, which wastes time and raises resource costs. We tackle this issue by casting the OBE task as a new instance of the classical incremental view maintenance problem. We create a novel and comprehensive algebraic framework for incremental CNN inference combining materialized views with multi-query optimization to reduce computational costs. We then present two novel approximate inference optimizations that exploit the semantics of CNNs and the OBE task to further reduce runtimes. We prototype our ideas in a tool we call Krypton. Experiments with real data and CNNs show that Krypton reduces runtimes by up to 5x (resp. 35x) to produce exact (resp. high-quality approximate) results without raising resource requirements. 
    more » « less
  5. The recent development of three-dimensional graphic statics (3DGS) has greatly increased the ease of designing complex and efficient spatial funicular structural forms [1]. The reciprocal diagram based 3DGS approaches not only generate highly efficient funicular structures [2], but also result in planarity constraints due to the polyhedron nature of the reciprocal diagrams [3]. Our previous research has shown the feasibility of leveraging this planarity by using planar glass sheets to materialize the 10m-span, double-layer glass bridge [3]. This paper is framed as a proof of concept for the 10m bridge and explores the form-finding, detail configuration, fabrication constraints, and assembly logic by designing and constructing a small-scale bridge prototype with a span of 2.5m. The prototype is designed in a modular approach, where each polyhedral cell of the form is materialized using a hollow glass unit (HGU) (Figure 1a), which can be prefabricated and preassembled, and therefore, greatly simplifies the assembly of the whole bridge. The compression-only form of the prototype is generated using the PolyFrame beta [4] plug-in for Rhinoceros [5]. The form-finding is carried out with a comprehensive consideration of a variety of parameters, including fabrication constraints, assembly ease, construction cost, and practicality. To start the form-finding process, a group of closed convex force polyhedrons is aggregated, controlling the topology of the form diagram and the orientations of the form elements. By manipulating the face tilting angles of the force diagram, the supported edges at the end of the bridge are all made horizontal, reducing the difficulty of the support design. Then, vertex locations and edge lengths of the form diagram are constrained, determining the final dimensions of both the bridge and the cells. After getting the geometry of the bridge, the detail developments are streamlined. Each of the 13 HGUs consists of two flat deck plates and a series of side plates (Figure 1b). To interlock the adjacent cells and prevent possible sliding, a male-female connection mechanism is introduced to the conjoint side plates of the HGUs (Figure 1b). Additionally, to eliminate the direct contact of the glass parts and prevent the stress concentration, two softer transparent materials are involved for connecting purposes. Within each HGU, silicon-based binding agent is used to hold the glass parts together; between the neighboring HGUs, plastic sheets are placed as interface materials (Figure 1b). Figure 1. a) The 2.5m-span small-scale prototype dome, b) Exploded view showing deck plates, side plates, male-female connection, and interface material For the fabrication of the glass parts, 5-axis Waterjet cutting techniques are applied. While the glass sheets for the deck plates can be purchased from the market, the irregular side plates with male-female connections need to be made from kiln-cast glass. In terms of the Waterjet cutting constraints, there is a max cutting angle of 60 degrees from vertical. With respect to this, all the glass parts are examined during the design process to ensure they all satisfy the cutting angle requirements. Aiming to achieve a fast and precise assembly, several assistant techniques are developed. On the local HGU level, assembly connectors are designed and 3D-printed to help locate the glass parts. On the global prototype level, the assembly sequence of the HGUs are simulated to avoid interference. Besides, a labeling system is also established to organize the fabricated parts and guide the entire assembly process. The design and construction of this small-scale prototype provide important information for the future development of the full-scale bridge regarding the interlocking detail design, the fabrication constraints, and assembly logic. The actual structural performance of the prototype awaits further investigation through-loading experiments. 
    more » « less