<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>IncShrink: Architecting Efficient Outsourced Databases using Incremental MPC and Differential Privacy</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>06/10/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10340691</idno>
					<idno type="doi">10.1145/3514221.3526151</idno>
					<title level='j'>SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Chenghong Wang</author><author>Johes Bater</author><author>Kartik Nayak</author><author>Ashwin Machanavajjhala</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[In this paper, we consider secure outsourced growing databases (SOGDB) that support view-based query answering. These databases allow untrusted servers to privately maintain a materialized view. This allows servers to use only the materialized view for query processing instead of accessing the original data from which the view was derived. To realize this, we devise a novel view-based SOGDB framework, IncShrink. The key features of this solution are: (i) IncShrink maintains the view using incremental MPC operators which eliminates the need for a trusted third party upfront, and (ii) to ensure high performance, IncShrink guarantees that the leakage satisfies DP in the presence of updates. To the best of our knowledge, there are no existing systems that have these properties. We demonstrate IncShrink's practical feasibility in terms of efficiency and accuracy with extensive experiments on real-world datasets and the TPC-ds benchmark. The evaluation results show that IncShrink provides a 3-way trade-off in terms of privacy, accuracy and efficiency, and offers at least a 7,800x performance advantage over standard SOGDB designs that do not support the view-based query paradigm.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Outsourced databases are designed to help organizations outsource their data to untrusted cloud servers while providing secure query functionality without sacrificing data confidentiality and privacy. The main idea is to have the data owners upload the encrypted or secret-shared data to the outsourcing servers. Moreover, the servers are empowered with secure protocols which allow them to process queries over such securely provisioned data. A series of works such as CryptDB <ref type="bibr">[69]</ref>, Cipherbase <ref type="bibr">[4]</ref>, EnclaveDB <ref type="bibr">[70]</ref>, and HardIDX <ref type="bibr">[33]</ref> took the first step in the exploration of this scope by leveraging strong cryptographic primitives or secure hardware to accomplish the aforementioned design goals. Unfortunately, these solutions fail to provide strong security guarantees, as recent works on leakage-abuse attacks <ref type="bibr">[10,</ref><ref type="bibr">19,</ref><ref type="bibr">49,</ref><ref type="bibr">88]</ref> have found that they are vulnerable to a variety of reconstruction attacks that exploit side-channel leakage. For instance, an adversary can fully reconstruct the data distribution after observing the query processing transcripts.</p><p>Although some recent efforts, such as <ref type="bibr">[7,</ref><ref type="bibr">11,</ref><ref type="bibr">28,</ref><ref type="bibr">32,</ref><ref type="bibr">46,</ref><ref type="bibr">65,</ref><ref type="bibr">67,</ref><ref type="bibr">68,</ref><ref type="bibr">87,</ref><ref type="bibr">89]</ref>, have shown potential countermeasures against leakage-abuse attacks, the majority of these works focus primarily on static databases. A more practical system often requires support for updates to the outsourced data <ref type="bibr">[1,</ref><ref type="bibr">35,</ref><ref type="bibr">78,</ref><ref type="bibr">83]</ref>, which opens up new challenges. Wang et al. 
<ref type="bibr">[83]</ref> formulate a new type of leakage called the update pattern, which affects many existing outsourced database designs when the underlying data is dynamically growing. To mitigate this weakness, their solution constrains the data owners' update behavior to private record synchronization strategies, which perturb the owners' logical update pattern. However, their solution only considers a na&#239;ve query answering mode in which each query is processed independently and evaluated directly over the entire outsourced data. This inevitably leads to a substantial amount of redundant computation by the servers. For example, consider the following use case where a courier company partners with a local retail store to help deliver products. The retail store has its sales data, and the courier company has its delivery records, both of which are considered the private property of each. Now assume the retail store owner wants to know how many of her products are delivered on time (i.e., within 48 hours of the courier accepting the package). With secure outsourced databases, the store owner and the courier company have the option to securely outsource their data and its corresponding computations to cloud servers. However, in a na&#239;ve query processing mode, the servers have to recompute the entire join relation between the outsourced data whenever a query is posted, which raises performance concerns.</p><p>In this work, we take the next step towards designing a secure outsourced growing database (SOGDB) architecture with a more efficient query answering mechanism. Our proposed framework employs a novel secure query processing method in which the servers maintain a growing-size materialized view corresponding to the owner's outsourced data. 
The upcoming queries are then answered using only the materialized view object. This brings the inherent advantages of the view-based query answering <ref type="bibr">[77]</ref> paradigm, such as allowing the servers to cache important intermediate outputs, thus preventing duplicated computation. For instance, with our view-based SOGDB architecture, one can address the performance issues in the aforementioned use case by requiring the servers to maintain a materialized join table between the outsourced sales and delivery data. Moreover, whenever the underlying data changes, the materialized join table is updated accordingly. To this end, the servers only need to perform secure filtering over the materialized join table to process queries, which avoids duplicated computation of join relations.</p><p>There is no doubt one can benefit in many aspects from the view-based query answering paradigm. However, designing a practical view-based SOGDB is fraught with challenges. First, the servers that maintain the materialized view are considered potentially untrusted. Hence, we must explore the possibility of updating such a materialized view without a trusted curator. A typical way is to leverage secure multi-party computation (MPC). However, na&#239;vely applying MPC to update view instances over growing data would expose extra information leakage (i.e., the update pattern <ref type="bibr">[83]</ref>). For example, consider the use case we mentioned before where the servers maintain a join table over the sales and the delivery data. Even with MPC, one can still obtain the true cardinality of newly inserted entries to the join table by looking at the output size from MPC. This would allow the adversary to learn the exact number of packages requested for delivery by the local retail store at any given time. 
Na&#239;ve methods, such as always padding newly generated view tuples to the maximum possible size or choosing never to update the materialized view, could prevent the aforementioned leakage. However, such approaches either introduce a large performance burden or do not provide the functionality of database updates. To combat this, we propose a novel view update methodology that leverages incremental MPC and differential privacy (DP), which hides the corresponding update leakage using DP while balancing efficiency and accuracy.</p><p>This design pattern helps us to address the extra leakage, but it also raises new challenges. The transformation from outsourced data to a view instance may have unbounded stability, i.e., an input record may contribute to the generation of multiple rows in the transformed output, which could cause unbounded privacy loss. To address this, we enforce that any individual data outsourced by the owner only contributes to the generation of a fixed number of view tuples. As the transformation after applying this constraint has bounded stability, we obtain a fixed privacy loss with respect to each insertion (logical update) to the owner's logical data.</p><p>Putting all these building blocks together, a novel view-based SOGDB framework, IncShrink, falls into place. We summarize our contributions as follows:</p><p>&#8226; IncShrink is a first-of-its-kind secure outsourced growing database framework that supports the view-based query processing paradigm. 
Compared with standard SOGDB <ref type="bibr">[83]</ref>, which employs a na&#239;ve query answering setting, IncShrink improves query efficiency while striking a balance among the guarantees of privacy, efficiency, and accuracy.</p><p>&#8226; IncShrink integrates incremental MPC and DP to construct the view update functionality, which (i) allows untrusted entities to securely build and maintain the materialized view instance, (ii) helps to reduce the performance overhead of view maintenance, and (iii) provides a rigorous DP guarantee on the leakage revealed to the untrusted servers. &#8226; IncShrink imposes constraints on the record contribution to view tuples, which ensures the entire transformation from outsourced data to the view object over time has bounded stability. This further implies a bounded privacy loss with respect to each individual logical update.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#8226; We evaluate IncShrink on use cases inspired by the Chicago</head><p>Police Data and the TPC-ds benchmark. The evaluation results show at least 7800&#215; and up to 1.5e+5&#215; query e ciency improvement over standard SOGDB. Moreover, our evaluation shows that IncShrink provides a 3-way trade-o between privacy, e ciency, and utility while allowing users to adjust the con guration to obtain their desired guarantees.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">OVERVIEW</head><p>We design IncShrink to meet three main goals:</p><p>&#8226; View-based query answering. IncShrink enables view-based query answering for a class of specified queries over secure outsourced growing data. &#8226; Privacy against untrusted servers. Our framework allows untrusted servers to continuously update the materialized view while ensuring that the privacy of the owners' data is preserved against the outsourcing servers. &#8226; Bounded privacy loss. The framework guarantees an unlimited number of updates under a fixed privacy loss.</p><p>In this section, we first outline the key ideas that allow IncShrink to support our research goals (Section 2.1). We then briefly review the framework components in Section 2.2, and provide a running example in Section 2.3 to illustrate the overall framework workflow.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Key Ideas</head><p>KI-1. View-based query processing over secure outsourced growing data. IncShrink employs a materialized view for answering pre-specified queries over secure outsourced growing data. The framework allows untrusted outsourcing servers to securely build and maintain a growing-size materialized view corresponding to the selected view definition. A typical materialized view can be transformed either solely from the data provisioned by the owners, e.g., a join table over the outsourced data, or in combination with public information, e.g., a join table between the outsourced data and public relations. Queries posed to the servers are rewritten as queries over the defined view and answered using only the view object. Due to this materialization, the outsourcing servers are exempted from performing redundant computations.</p></div>
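To make the view-based answering idea concrete, here is a minimal plaintext sketch (no encryption or MPC; all function names, the tuple layout, and the toy data are ours, not the paper's) of rewriting the running on-time-delivery query so that it touches only a materialized join view:

```python
def materialize_join(sales, delivery):
    """Build the materialized view: join (order_id, ship_hour) sales rows
    with (order_id, arrive_hour) delivery rows on order_id."""
    arrived = dict(delivery)
    return [(oid, t0, arrived[oid]) for oid, t0 in sales if oid in arrived]

def on_time_count(view, window=48):
    """The registered query, rewritten to filter only the view --
    no re-join of the base tables is needed at query time."""
    return sum(1 for _, t0, t1 in view if t1 - t0 <= window)

sales = [(1, 0), (2, 10), (3, 20)]       # order_id, hour the courier accepted
delivery = [(1, 40), (2, 100), (3, 60)]  # order_id, hour delivered
view = materialize_join(sales, delivery)
```

When new sales or delivery rows arrive, only the view is appended to; the query keeps reading the view, which is exactly the redundant-computation saving KI-1 describes.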
<div xmlns="http://www.tei-c.org/ns/1.0"><head>KI-2. Incremental MPC with DP update leakage.</head><p>A key design goal of IncShrink is to allow the untrusted servers to privately update the view instance while also ensuring the privacy of the owners' data. As mentioned before, compiling the view update functionality into an MPC protocol is not sufficient to ensure data privacy, as it still leaks the true cardinality of newly inserted view entries at each time step, which is directly tied to the owners' record update patterns <ref type="bibr">[83]</ref>. Although na&#239;ve approaches such as exhaustive padding (EP) of MPC outputs or maintaining a one-time materialized view (OTM) could alleviate the aforementioned privacy risk, they are known to either incorporate a large amount of dummy data or provide poor query accuracy due to the lack of updates to the materialized view. This motivates our second key idea: an incremental MPC protocol for view updates that balances the privacy, accuracy, and efficiency guarantees.</p><p>In our design, we adopt an innovative "Transform-and-Shrink" paradigm, where the protocol is composed of two sub-protocols, Transform and Shrink, that coordinate with each other. Transform generates the corresponding view entries based on newly outsourced data and places them in an exhaustively padded secure cache to avoid information leakage. Shrink, running independently, periodically synchronizes the cached data to the materialized view. To prevent the inclusion of a large amount of dummy data, Shrink shrinks the cached data to a DP-sized secure array such that a subset of the dummy data is removed while the true cardinality is still protected. As a result, the resulting protocol ensures any entity's knowledge with respect to the view instance is bounded by DP. KI-3. Fixed privacy loss through constraints on record contributions. When IncShrink releases noisy cardinalities, it ensures &#949;-DP with respect to the view instance. 
However, this does not imply &#949;-DP with respect to the logical data from which the view is derived. This is because an individual data point in the logical database may contribute to generating multiple view entries. As a result, the framework either incurs an unbounded privacy loss or has to stop updating the materialized view after sufficiently many synchronizations. This leads us to our third key idea, where we bound the privacy loss by imposing constraints on the contributions made by each individual record to the generation of the view object. Each data point in the logical database is allocated a contribution budget, which is consumed whenever the data is used to generate a new view entry. Once the contribution budget for a certain record is exhausted, IncShrink retires this data and will no longer use it to generate view entries. With such techniques, IncShrink is able to continually update the materialized view with a bounded privacy loss. On the other hand, despite such constraints, IncShrink is still able to support a rich class of queries with small errors (Section 7).</p></div>
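The record-retirement rule of KI-3 can be sketched as a small plaintext ledger (class and record names are illustrative; in the real system the budgets would be tracked inside the secure protocol, not in the clear):

```python
class ContributionLedger:
    """Track a per-record contribution budget. Each use of a record as
    Transform input charges a fixed cost k; once the budget is exhausted
    the record is retired and never fed to the transformation again."""

    def __init__(self, total_budget):
        self.total = total_budget
        self.spent = {}  # record id -> budget consumed so far

    def usable(self, rec_id):
        return self.spent.get(rec_id, 0) < self.total

    def charge(self, rec_id, k):
        self.spent[rec_id] = self.spent.get(rec_id, 0) + k

ledger = ContributionLedger(total_budget=4)
ledger.charge("bob_row_7", k=2)  # first use: 2 of 4 consumed
```

Because each record can be charged only a bounded number of times, its lifetime influence on the view (and hence the privacy loss attributable to it) stays fixed.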
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Framework Components</head><p>Underlying database. IncShrink does not create a new secure outsourced database but rather builds on top of one. Therefore, as one of the major components, we assume the existence of an underlying secure outsourced database scheme. Typically, secure outsourced databases can be implemented according to different architectural settings, such as models utilizing server-aided MPC <ref type="bibr">[6,</ref><ref type="bibr">7,</ref><ref type="bibr">47,</ref><ref type="bibr">64,</ref><ref type="bibr">79]</ref>, homomorphic encryption <ref type="bibr">[22]</ref>, symmetric searchable encryption <ref type="bibr">[3,</ref><ref type="bibr">9,</ref><ref type="bibr">20,</ref><ref type="bibr">26,</ref><ref type="bibr">35,</ref><ref type="bibr">48,</ref><ref type="bibr">78]</ref> or trusted hardware <ref type="bibr">[32,</ref><ref type="bibr">70,</ref><ref type="bibr">81,</ref><ref type="bibr">89]</ref>. For ease of demonstration, we focus exclusively on outsourced databases built upon the server-aided MPC model, where a set of data owners secretly share their data with two untrusted but non-colluding servers S_0 and S_1. The two outsourcing servers are able to perform computations (i.e., query processing) over the secret-shared data by jointly evaluating a 2-party secure computation protocol. More details about this setting and its corresponding security definitions are provided in Section 4. We stress that, although the protocols described in this paper assume an underlying database architected under the server-aided MPC setup, these protocols can be adapted to other settings as well.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Materialized view.</head><p>A materialized view is a subset of a secure outsourced database, which is typically generated from a query and stored as an independent object (i.e., an encrypted or secret-shared data structure). The servers can process queries over the view instance just as they would in a persistent secure database. Additionally, changes to the underlying data are reflected in the entries shown in subsequent invocations of the materialized view.</p><p>View update protocol. The view update protocol is an incremental MPC protocol jointly evaluated by the outsourcing servers. It allows untrusted servers to privately update the materialized view with bounded leakage. More details can be found in Section 5.</p><p>Secure outsourced cache. The secure outsourced cache is a secure array (i.e., memory blocks that are encrypted, secret-shared, or stored inside trusted hardware), denoted as C = [C[1], C[2], C[3], . . .], which is used to temporarily store newly added view entries that will later be synchronized to the materialized view. In this work, as we focus on the server-aided MPC model, C is considered a secret-shared memory block across two non-colluding servers. Each C[i] represents a (secret-shared) view entry or a dummy tuple. Details on how our view update protocol interacts with the secure cache (i.e., read, write, and flush cache) are provided in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">IncShrink Workflow</head><p>We now briefly review the framework architecture and its workflow with a running example (Figure <ref type="figure">1</ref>), where an analyst is interested in a join query over the outsourced data from two data owners. Initially, the analyst obtains authorization from the owners and registers the query with the outsourcing servers. The servers decide the view definition as a join table; set up the initial materialization structure and the secure cache; and compile the corresponding secure protocols Transform and Shrink for maintaining the view instance. From then on, the owners periodically update the outsourced data, securely provisioning the data newly received since the last update (through the functionality defined by the underlying database). For demonstration purposes, we assume the owners submit a fixed-size data block (possibly padded with dummy records) at predetermined intervals. We discuss potential extensions to support other update behaviors in a later section. Whenever owners submit new data, the servers invoke Transform to securely compute new join tuples. The outputs are padded to the maximum size and then placed in the secure cache. Next, Shrink periodically synchronizes data from the cache to the join table with DP-resized cardinalities. Note that the DP noise used to distort the true cardinality can be either positive or negative. If the noise is negative, some of the real tuples in the cache are not fetched by Shrink. We refer to these tuples as the "deferred data". Conversely, if the noise is positive, some deferred data or additional dummy tuples will be included in the synchronization with the view. Meanwhile, the analyst can issue query requests to the servers, which process the issued queries over the materialized join table and return the resulting outputs to the analyst.</p></div>
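The deferred-data bookkeeping above can be illustrated with a plaintext sketch (Laplace noise drawn by inverse-CDF sampling; the DUMMY marker, the shrink signature, and the tuple layout are our illustrative choices, not the paper's actual algorithm):

```python
import math
import random

DUMMY = ("dummy",)

def laplace(scale):
    # Inverse-CDF sample of a Laplace(0, scale) random variable.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def shrink(cache, deferred, epsilon):
    """One synchronization step: release a noisy batch size. Negative noise
    defers real tuples; positive noise pulls in deferred data and then pads
    with dummies, so the released size never equals the true count exactly."""
    real = [t for t in cache if t is not DUMMY]
    noisy = max(0, int(round(len(real) + laplace(1.0 / epsilon))))
    pool = real + deferred                    # fresh tuples first, then deferred
    batch = pool[:noisy]
    batch += [DUMMY] * (noisy - len(batch))   # pad if noise exceeds the pool
    return batch, pool[noisy:]                # (synced batch, new deferred set)
```

Each call consumes the cache and updates the deferred set, matching the positive/negative-noise cases described in the workflow.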
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">PRELIMINARIES</head><p>Multi-party secure computation (MPC). MPC utilizes cryptographic primitives to enable a set of participants P_1, P_2, ..., P_n to jointly compute a function f over the private input data x_i supplied by each party P_i, without using a trusted third party. The theory of MPC offers a strong security guarantee similar to what can be achieved with a trusted third party <ref type="bibr">[36]</ref>, i.e., absolutely no information leaks to any participant beyond the desired output of f(x_1, x_2, ..., x_n) and their own input x_i. In this work we focus mainly on the 2-party secure computation setting.</p><p>(t, n)-secret sharing. Given a ring Z_q, where q = 2^&#8467;, a (t, n)-secret sharing (t-out-of-n) over Z_q shares a secret value s &#8712; Z_q with n parties such that the sharing satisfies the following properties:</p><p>&#8226; Availability. Any m of the parties such that m &#8805; t can recover the secret value s. &#8226; Confidentiality. Any m of the parties such that m &lt; t have no information about s.</p><p>For any value s &#8712; Z_q, we denote its secret sharing as &#10216;s&#10217; &#8592; (s_1, s_2, ..., s_n). There are many existing efforts to implement such secret sharing designs <ref type="bibr">[8]</ref>; we focus on XOR-based (2, 2)-secret sharing over Z_{2^32} with the following specifications.</p></div>
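A minimal sketch of the XOR-based (2, 2)-secret sharing just described (function names are ours; the scheme itself is standard):

```python
import secrets

MOD = 2 ** 32  # values live in Z_{2^32}

def share(x):
    """Split a 32-bit value into two XOR shares: one uniformly random,
    the other x XOR r. Either share alone is uniformly distributed,
    so a single server learns nothing about x (confidentiality)."""
    r = secrets.randbelow(MOD)
    return r, x ^ r

def reconstruct(s0, s1):
    # Both shares together recover the secret (2-out-of-2 availability).
    return s0 ^ s1
```

Reconstruction works because (x ^ r) ^ r = x; any single share is just a uniform 32-bit value.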
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">PRIVACY MODEL</head><p>In general, our framework supports dynamic updating of the materialized view while hiding the corresponding update leakage. More specifically, the participants involved in the outsourcing phase are a set of data owners and two servers S_0 and S_1. We assume there exists a semi-honest probabilistic polynomial time (p.p.t.) adversary A who can corrupt any subset of the owners and at most one of the two servers. Previous work <ref type="bibr">[64]</ref> refers to this type of adversary as the admissible adversary, which captures the property of two non-colluding servers, i.e., if one is compromised by the adversary, the other one behaves honestly. Our privacy definition requires that the knowledge A can obtain about any single datum of the remaining honest owners, by observing the view updates, is bounded by differential privacy. In this section, we first provide key terminology and notation (Section 4.1) and then formalize our privacy model (Section 4.2) using simulation-based computational differential privacy (SIM-CDP) <ref type="bibr">[63]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Notations</head><p>Growing database. A growing database is a dynamic relational dataset with insertion-only updates; thus we define it as a collection of (logical) updates, D = {u_t}_{t&#8805;0}, where u_t is a time-stamped record. We write D = {D_t}_{t&#8805;0}, such that D_t denotes the instance of the growing database D at time t, and &#8704;D_t, D_t &#8838; D.</p><p>Outsourced data. The outsourced data is denoted as DS, which stores the secret-shared entries corresponding to the records in the logical database, possibly including additional dummy data. Similarly, we write DS = {DS_t}_{t&#8805;0}, where DS_t &#8838; DS is the outsourced data at time t.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Materialized view.</head><p>We use V to denote the materialized view instance, which is a collection of secret-shared tuples. Each tuple in V is transformed from the outsourced data DS, possibly in combination with public information. We define V = {V_t}_{t&#8805;0}, where V_t denotes the materialized view structure at time t, and &#916;V_t denotes the changes (newly generated view entries) between V_t and V_{t-1}.</p><p>Query. Given a growing database D and a corresponding materialized view V, we define the logical query posted at time t as q_t(D_t) and the re-written view-based query as q&#732;_t(V_t). We refer to the L1 norm of the difference between q&#732;_t(V_t) and q_t(D_t) as the L1 query error, denoted as ||q&#732;_t(V_t) - q_t(D_t)||_1, which measures the difference between the server-responded outputs and their corresponding logical results. Additionally, we call the elapsed time for processing q&#732;_t(V_t) the query execution time (QET) of q_t. In this work, we use the L1 error and QET as the main metrics to evaluate the accuracy and efficiency of our framework, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Privacy Definition</head><p>Based on the formalization of the update pattern in <ref type="bibr">[83]</ref>, we first provide a more generalized definition of the update pattern that captures updates to view instances.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 1 (Update Pattern).</head><p>Given a growing database D, the update pattern for outsourcing D is the function family UpdtPa_f(D) = {UpdtPa_f^t(D)}_{t&#8712;N+}, with UpdtPa_f^t(D) = f_t(D), where f_t is a transformation function that outputs the set of tuples (i.e., new view entries) outsourced to the server at time t.</p><p>In general, Definition 1 defines the transcript of the entire update history for outsourcing a growing database D. It may include information about the volume of the outsourced data and the corresponding insertion times. Moreover, if f_t(D) &#8592; D_t - D_{t-1}, then this simply indicates the record insertion pattern <ref type="bibr">[83]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 2 (Neighboring Growing Databases).</head><p>Given a pair of growing databases D and D&#8242;, we call them neighboring if there exists some parameter t* &#8805; 0 such that: (i) &#8704;t &#8804; t*, D_t = D&#8242;_t; (ii) &#8704;t &gt; t*, D_t and D&#8242;_t differ by the addition or removal of a single logical update.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 3 (Differentially Private Update Mechanism). Let M be a mechanism applied over a growing database. M is said to satisfy &#949;-DP if for any neighboring growing databases D and D&#8242;, and any</head><p>o &#8712; O, where O is the universe of all possible outputs, it satisfies Pr[M(D) = o] &#8804; e^&#949; &#183; Pr[M(D&#8242;) = o].</p><p>Definition 3 ensures that, by observing the output of M, the information revealed about any single logical update posted by the owner is differentially private. Moreover, if the logical updates correspond to different entities' (owners') data, the same holds for M over each owner's logical database (privacy is guaranteed for each entity). Additionally, in this work, we treat each logical update u &#8712; D as a secret event; therefore such a mechanism achieves &#949;-event level DP <ref type="bibr">[30]</ref>. However, due to the group-privacy property <ref type="bibr">[53,</ref><ref type="bibr">58,</ref><ref type="bibr">86]</ref> of DP, one can achieve privacy for properties across multiple updates as well as at the user level, as long as the property depends on a finite number of updates. An overall &#949;-user level DP guarantee can be achieved by setting the privacy parameter in Definition 3 to &#949;/&#8467;, where &#8467; is the maximum number of tuples in the growing database owned by a single user. In practice, if the number of tuples owned by each user is unknown, a pessimistically large value can be chosen as &#8467;. Moreover, recent works <ref type="bibr">[18,</ref><ref type="bibr">76]</ref> have provided methods for deriving an &#949;&#8242; with &#949; &lt; &#949;&#8242; &#8804; &#8467; &#215; &#949;, such that &#949;-event DP algorithms provide an &#949;&#8242; bound on privacy loss when data are correlated. For certain correlations, &#949;&#8242; can even be close to &#949; and much smaller than &#8467; &#215; &#949;. In general, we emphasize that for the remainder of the paper, we focus exclusively on developing algorithms that ensure event-level privacy with parameter &#949;, while simultaneously satisfying all the aforementioned privacy guarantees, possibly with a different privacy parameter.</p></div>
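The event-to-user-level scaling can be seen in a small Laplace-count sketch (our illustration, not the paper's mechanism; it assumes, as a worst case, that one user owns up to user_bound records, so the count's user-level sensitivity is user_bound):

```python
import math
import random

def laplace(scale):
    # Inverse-CDF sample of a Laplace(0, scale) random variable.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(n_records, epsilon, user_bound=1):
    """Noisy count: sensitivity 1 gives event-level DP; for user-level DP
    when one user may own up to user_bound records, the noise scale grows
    to user_bound / epsilon (the group-privacy factor)."""
    return n_records + laplace(user_bound / epsilon)
```

With user_bound=1 this is the plain &#949;-event-level mechanism; with user_bound set to the per-user record cap it gives the user-level guarantee at the cost of proportionally more noise.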
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 4 (SIM-CDP View Update Protocol). A view update protocol &#928; is said to satisfy &#949;-SIM-CDP if there exists a p.p.t. simulator S with access only to a set of public parameters pp and the output of a mechanism L that satisfies Definition 3, such that for any growing database instance D, and any p.p.t. adversary A, the adversary's advantage satisfies:</head><p>|Pr[A(VIEW_&#928;(D)) = 1] - Pr[A(VIEW_S(D)) = 1]| &#8804; negl(&#954;), where VIEW_&#928; and VIEW_S denote the adversary's view of the protocol execution and the simulator's outputs, respectively, and negl(&#954;) is a negligible function of a security parameter &#954;.</p><p>Definition 4 defines the secure protocol for maintaining the materialized view such that, as long as there exists at least one honest owner, the privacy of her data's individual records is guaranteed. In addition, the remaining entities' knowledge about the honest owner's data is bounded by differential privacy. In other words, any p.p.t. adversary's knowledge of such a protocol &#928; is restricted to the outputs of an &#949;-DP mechanism L. We refer to the mechanism L as the leakage profile of protocol &#928;; it is a function of the update pattern, i.e., L(D) = L(UpdtPa(D)). In the rest of this paper, we focus mainly on developing view update protocols that satisfy this definition. Moreover, Definition 4 is derived from the SIM-CDP definition, which is formulated under the Universal Composability (UC) framework <ref type="bibr">[17]</ref>. Thus Definition 4 leads to a composability property such that if other protocols (e.g., the query protocol) defined by the underlying databases also satisfy UC security, then the privacy guarantee holds under the composed system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">PROTOCOL DESIGN</head><p>In general, our view update protocol is implemented as an incremental MPC across two non-colluding outsourcing servers. Specifically, this incremental MPC is composed of two sub-protocols, Transform and Shrink, that operate independently but collaborate with each other. The reason for this design pattern is that we can decouple the view transformation functionality from its update behavior, which provides flexibility in the choice of different view update strategies. For example, one may want to update the materialized view at a fixed interval, or update it when there are enough new view entries. Each time one needs to switch between these two strategies, she only needs to recompile the Shrink protocol without making any changes to the Transform protocol. In this section, we discuss the implementation details of these two protocols in Sections 5.1 and 5.2, respectively. Due to space concerns, we defer the proofs of the theorems in this section to the appendix.</p></div>
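The decoupling described above means the update strategy is just a swappable predicate consumed by the Shrink side; a plaintext sketch (the policy names and the (time, cached-count) signature are our illustrative choices):

```python
def fixed_interval_policy(interval):
    """Synchronize every `interval` time steps, regardless of volume."""
    return lambda t, n_cached: t % interval == 0

def threshold_policy(min_entries):
    """Synchronize once enough new view entries have accumulated."""
    return lambda t, n_cached: n_cached >= min_entries

# Switching strategies changes only the policy handed to the Shrink side;
# the Transform side is untouched.
policy = fixed_interval_policy(3)
```

This mirrors the paper's point that only Shrink needs recompiling when the update strategy changes.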
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Transform Protocol</head><p>Whenever owners submit new data, Transform is invoked to convert the newly outsourced data to its corresponding view instance based on a predefined view definition. Although one could simply reuse the query capability of the underlying database to generate the corresponding view tuples, there are certain challenges in achieving our targeted design objectives. Here are two examples: (i) the view transformation might have unbounded stability, which further leads to an unbounded privacy loss; (ii) while existing work <ref type="bibr">[7]</ref> implements a technique similar to first padding the output and then reducing its size, it compiles the functionality as a one-time MPC protocol, which makes it difficult to handle dynamic data. Our design of constructing Transform and Shrink as independent MPC protocols overcomes this problem and introduces flexibility in the choice of view update policy, but it raises a new challenge in that the two independently operating protocols still need to collaborate with each other. By default, the Shrink protocol is unaware of how much data can be removed from the secure cache; therefore Transform must privately inform Shrink how to eliminate the dummy records while ensuring DP. To address these challenges, we employ the following techniques:</p><p>&#8226; We adopt a truncated view transformation functionality to ensure that each outsourced data record contributes to a bounded number of rows in the transformed view instance. &#8226; We track important parameters in the Transform phase, secretly share them inside the Transform protocol, and store the corresponding shares on each server. The parameters are later passed to the Shrink protocol as auxiliary input and used to generate the DP-resized cardinalities.</p><p>Algorithm 1 provides an overview of the Transform protocol. 
At the very outset, Transform is tasked to: (i) convert the newly submitted data into its corresponding view entries from time to time; (ii) cache the new view entries in an exhaustively padded secure array; and (iii) maintain a cardinality counter cnt of how many new view entries have been cached since the last view update. This counter is then privately passed (through secret shares) to the Shrink protocol.</p><p>At the very beginning, Transform initializes the cardinality counter cnt = 0 and secret shares it to both servers (Alg 1:1-2). Transform uses the trans_truncate operator (Alg 1:3) to obliviously compute the new view tuples and truncate the contribution of each record at the same time. More specifically, we assume the output of this operator is stored in an exhaustively padded secure array ΔV, where each ΔV[i] is a transformed tuple with an extra isView bit to indicate whether the corresponding tuple is a view entry (isView=1) or a dummy tuple (isView=0). Additionally, the operator ensures that each input record affects at most τ rows in the truncated ΔV, where the per-record output is computed as truncate(new_entry(·), τ). Once the truncated outputs are available, Transform updates and re-shares the cardinality counter cnt, then appends the truncated outputs to the secure cache (Alg 1:5-7).</p><p>τ-stable transformation. We now provide the following lemmas with respect to the τ-stable transformation.</p><p>Given a τ-stable transformation f and an ε-DP mechanism M, the composite computation M ∘ f satisfies (τ · ε)-DP <ref type="bibr">[62,</ref><ref type="bibr">Theorem 2]</ref>.</p><p>According to Lemma 1, it is clear that protocol Transform is τ-stable, and thus by Lemma 2, applying an ε-DP mechanism over the outputs of the Transform protocol (the cached data) implies (τ · ε)-DP over the original data. Therefore, if τ is constant, the total privacy loss with respect to the input of Transform is bounded.</p></div>
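As a plaintext illustration of the behavior described above (the real operator runs obliviously inside MPC, and the names `trans_truncate`, `new_entry`, and `tau` are ours), the trans_truncate step can be modeled as emitting exactly τ padded slots per input record while counting the real view entries:

```python
DUMMY = (None, 0)  # (payload, isView bit)

def trans_truncate(batch, new_entry, tau):
    """Plaintext model of the trans_truncate operator (Alg 1:3).

    For each input record, compute its view entries, truncate them to at most
    tau, and pad to exactly tau slots, so every record visibly contributes the
    same number of rows. Returns the padded array and the number of real view
    entries, i.e. the increment applied to the cardinality counter cnt.
    """
    delta_v, cnt = [], 0
    for record in batch:
        entries = new_entry(record)[:tau]              # truncate: at most tau rows
        cnt += len(entries)
        delta_v.extend((e, 1) for e in entries)        # real view entries
        delta_v.extend([DUMMY] * (tau - len(entries))) # exhaustive padding
    return delta_v, cnt
```

Because each record yields exactly `tau` slots, the array's size reveals only the (public) batch size, while `cnt` is the secret that gets re-shared to the servers.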
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Contribution over time.</head><p>According to the overall architecture, Transform is invoked repeatedly to transform outsourced data into view entries at each time step. Thus, having a τ-stable Transform does not immediately imply bounded privacy loss with respect to the logical database. There are certain cases where one record may contribute multiple times over time as the input to Transform. For example, suppose the servers maintain a join table over both Alice's and Bob's data. When Alice submits new data, the servers need to compute new join tuples between her new data and Bob's entire database, which results in some of Bob's data being used multiple times. This could eventually lead to unbounded privacy loss. Theorem 3 shows that the overall privacy loss may still be infinite when applying DP mechanisms over a composition of τ-stable transformations (i.e., repeatedly invoking Transform). However, if the composed transformation is itself stable with a bounded constant, then one can still obtain bounded privacy loss. Inspired by this, the following steps help to obtain fixed privacy loss over time: First, we assign a total contribution budget B to each outsourced record r ∈ DS. Whenever a record is used as input to Transform (regardless of whether it contributes to generating a real view entry), it is charged a fixed amount of budget (equal to the truncation limit τ). Then Transform keeps track of the available contribution budget for each record over time and ensures that only data with a remaining budget is used. Under this design, the "lifetime" contribution of each outsourced record to the materialized view object is bounded by B.</p><p>Implementation of the trans_truncate operator. Naïvely, this operator can be implemented via two separate steps. For example, one may first apply an oblivious transformation (i.e. 
oblivious filter <ref type="bibr">[7,</ref><ref type="bibr">22]</ref>, join <ref type="bibr">[32,</ref><ref type="bibr">89]</ref>, etc.) without truncation over the input data. The results are stored in an exhaustively padded array. Next, truncation can be implemented by linearly scanning the array and resetting the isView bit from 1 to 0 for a subset of the data in the array such that the resulting output satisfies Eq 3. In practice, truncation can be integrated with the view transformation so that the protocol does not have to run an extra round of linear scans. In what follows, we provide an instantiated implementation of an oblivious sort-merge join whose output is truncated with a contribution bound τ; we provide further implementations for other operators such as filter, nested-loop join, etc. in our complete full version. Example 5.1 (Truncated sort-merge join). Assume two tables R1, R2 to be joined; the algorithm outputs the join table between R1 and R2 such that each record owns at most τ rows in the resulting join table. The first step is to union the two tables and then obliviously sort <ref type="bibr">[5]</ref> them based on the join attributes. To break ties, we consider R1 records to be ordered before R2 records. Then, similar to a normal sort-merge join, the operator linearly scans the sorted merged table and joins R1 records with the corresponding records in R2. There are some variations to ensure obliviousness and bounded contribution. First, the operator keeps track of the contribution of each individual tuple. If a tuple has already produced τ join entries, then any subsequent joins with this tuple will be automatically discarded. Second, during the linear scan, the operator outputs τ entries after accessing each tuple in the merged table, regardless of how many true joins are generated. If there are fewer joins, they are padded with additional dummy data; otherwise the true joins are truncated and only τ tuples are kept. 
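A minimal plaintext sketch of this truncated sort-merge join (obliviousness omitted; tuple representation and names are our illustrative assumptions): the merged table is sorted with R1 before R2 on ties, per-tuple contribution counters enforce the bound, and exactly `tau` output slots are emitted per scanned tuple:

```python
DUMMY = ("dummy", None, 0)  # (join_key, payload, isView bit)

def truncated_sort_merge_join(r1, r2, tau):
    # Union both tables, tagging the source (0 = R1, 1 = R2), and sort by
    # join key with R1 records ordered before R2 records on ties.
    merged = sorted(
        [(k, v, 0) for k, v in r1] + [(k, v, 1) for k, v in r2],
        key=lambda t: (t[0], t[2]),
    )
    contrib = {}   # per-tuple contribution counters, keyed by (src, key, val)
    window = []    # R1 tuples sharing the current join key
    out = []
    for key, val, src in merged:
        real = []
        if src == 0:
            if window and window[0][0] != key:
                window = []            # new join key: reset the R1 window
            window.append((key, val))
        else:
            for k1, v1 in window:
                if k1 != key:
                    continue
                left, right = (0, k1, v1), (1, key, val)
                # Discard any join that would exceed either tuple's bound tau.
                if contrib.get(left, 0) < tau and contrib.get(right, 0) < tau:
                    contrib[left] = contrib.get(left, 0) + 1
                    contrib[right] = contrib.get(right, 0) + 1
                    real.append((key, (v1, val), 1))
        # Emit exactly tau slots per scanned tuple: truncate or pad with dummies.
        real = real[:tau]
        out.extend(real + [DUMMY] * (tau - len(real)))
    return out
```

The output size is always `tau * (len(r1) + len(r2))`, so it leaks nothing beyond the (public) input sizes, mirroring the padding argument in the text.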
Figure <ref type="figure">2</ref> illustrates the aforementioned computation workflow.</p><p>Secret-sharing inside MPC. When re-sharing the new cardinalities (Alg 1:5-6), we must ensure that neither of the two servers can tamper with or predict the randomness used to generate the secret shares. This can be done with the following approach: each outsourcing server S_b chooses a value r_b uniformly at random from the ring Z_{2^32} and contributes it as input to Transform. The protocol then computes the shares s0 ← r0 ⊕ r1 and s1 ← s0 ⊕ cnt internally. By applying this, S0's knowledge of the secret-shared value is subject to the two random values r0, r1, while S1's knowledge is bounded to cnt ⊕ s0, which is masked with a random value unknown to S1. A complete security proof of this technique can be found in our full version.</p></div>
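The re-sharing step above can be sketched in plaintext as follows (a model only: in the real protocol these XORs are evaluated under 2PC, and `r0`, `r1` stand for the servers' contributed randomness):

```python
import secrets

RING = 1 << 32  # shares live in Z_{2^32}

def reshare_counter(cnt, r0=None, r1=None):
    """XOR-based (2,2) re-sharing of the counter cnt (Alg 1:5-6 sketch).

    r0 and r1 model the random ring elements each server contributes; since
    neither server can predict the other's value, neither can predict the
    resulting shares.
    """
    r0 = secrets.randbelow(RING) if r0 is None else r0
    r1 = secrets.randbelow(RING) if r1 is None else r1
    s0 = r0 ^ r1        # share handed to server S0
    s1 = s0 ^ cnt       # share handed to server S1: masked counter
    return s0, s1

def recover(s0, s1):
    # Only the protocol (holding both shares) can reconstruct the counter.
    return s0 ^ s1
```

Note that `s0` is independent of `cnt`, and `s1` reveals only `cnt ⊕ s0`, matching the masking argument in the text.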
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Shrink Protocol</head><p>We propose two secure protocols that synchronize tuples from the secure cache to the materialized view. Our main idea originates from the private synchronization strategy proposed in DP-Sync <ref type="bibr">[83]</ref>, but with non-trivial variations and additional design. Typically, DP-Sync relies on trusted entities to execute private synchronization strategies, whereas in our scenario the framework must allow untrusted servers to evaluate the view update protocol. Therefore, naïvely adopting their techniques could lead to additional leakage and exposure to untrusted servers, such as the internal states (i.e., randomness) during protocol execution. Furthermore, DP-Sync assumes that the subjects evaluating the synchronization strategies have direct access to a local cache in the clear, whereas in our setup the protocol must synchronize cached view tuples from an exhaustively padded (with dummy entries) secure array without knowing how the real data is distributed. To address these problems, we incorporate the following techniques:</p><p>&#8226; We utilize a joint noise-adding mechanism to generate DP noise, which ensures that no admissible adversary can obtain or tamper with the randomness used to generate the noise. &#8226; We implement a secure cache read operation that ensures the real data is always fetched before the dummy tuples whenever a cache read is performed. In what follows, we review the technical details of the two view update protocols, sDPTimer and sDPANT.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.1">Timer-based approach (sDPTimer)</head><p>sDPTimer is a 2PC protocol between S0 and S1, parameterized by T and ε, which updates the materialized view every T time units with a batch of DP-sized tuples. Algorithm 2 shows the corresponding details.</p><p>Every T time steps, sDPTimer obtains the secret-shared cardinality counter cnt from both servers and recovers the counter internally. The protocol then distorts this cardinality with Laplace noise sampled from Lap(1/ε). To prevent information leakage, we must ensure that no entity can control or predict the randomness used to generate this noise. To this end, inspired by the idea in <ref type="bibr">[29]</ref>, we implement a joint noise generation approach (Alg 2:4-6), where each server S_b samples a value r_b ∈ Z_{2^32} uniformly at random and contributes it as an input to sDPTimer. The protocol computes r ← r0 ⊕ r1 internally and converts r to a fixed-point random seed u ∈ (0, 1). Finally, sDPTimer computes Lap(1/ε) ← (1/ε) ln u × sign, using one extra bit of randomness, i.e., the most significant bit of r, to determine the sign. By applying this, as long as one server honestly chooses its value uniformly at random and does not share it with any other party (which is captured by the non-colluding server setting), it can be sure that no other entity knows anything about the resulting noise. In our design, this joint noise adding technique is used whenever the protocol involves DP noise generation. For ease of notation, we denote this approach as η̃ ← JointNoise(S0, S1, Δ, ε), where η̃ ← Lap(Δ/ε). Once the DP read size sz is derived from the noisy counter, sDPTimer obliviously sorts the secure cache by the isView bit, moving all real tuples to the head and all dummy data to the tail, then cuts off the first sz elements from the sorted array and stores them as a separate structure o. 
sDPTimer then updates the materialized view V by appending o to the old view instance. Figure <ref type="figure">3</ref> shows an example of the aforementioned operation. Such a secure array operation ensures that the real data is always fetched before the dummy elements, which allows us to eliminate a subset of the dummy data and shrink the size of the updated view entries. Finally, after each update, sDPTimer resets cnt to 0 and re-shares it secretly to both participating servers.</p><p>Note that the query errors of IncShrink are caused by two factors, namely, the valid data discarded by Transform due to truncation constraints and the total amount of unsynchronized data in the cache. When the truncation bound is set too small, a large number of valid view entries are dropped by Transform, resulting in inaccurate query answers. We further investigate how the truncation bound affects query errors in a later section (Section 7.4).</p><p>However, one could still choose a relatively large truncation bound to ensure that no data is discarded by Transform. As a result, query errors under such circumstances are caused primarily by unsynchronized cached view tuples. Typically, less unsynchronized cached data satisfying the requested query implies better accuracy, and vice versa. For instance, when IncShrink has just completed a new round of view updates, the amount of cached data tends to be relatively small, and thus less data is missing from the materialized view. Queries issued at this time usually have better accuracy. Conversely, queries issued long after the last update face larger amounts of unsynchronized cached data, and large amounts of cached data can lead to inaccurate query results. One may also adopt strategies such as returning the entire cache to the client, or scanning the cache while processing the query so that no data is missing. This does not break the security promise, but it increases either the communication cost or the runtime overhead, as the secure cache is exhaustively padded with dummy data. 
Moreover, in later evaluations (Section 7.1), we observe that even without using the cache for query processing, the relative query errors are small (i.e., bounded by 0.04). It is not hard to bound the number of unsynchronized cached entries by picking a smaller interval T; however, the upper bound on deferred data accumulates as the number of updates grows. In addition, at each update, the server only fetches a batch of DP-sized data, leaving a large number of dummy tuples in the cache. Thus, to ensure bounded query errors and prevent the cache from growing too large, we apply an independent cache flushing mechanism to periodically clean the cache. To flush the cache, the protocol first sorts it, then fetches a set of data by cutting off a fixed number of tuples from the head of the sorted array. The fetched data is updated to the materialized view immediately, and the remaining array is recycled (i.e., freeing the memory space). As per Theorem 4, we can set a proper flush size such that non-dummy data is recycled with at most a (relatively small) probability δ. Theorem 5. Given T, ε, and f ≥ (4/ε) log(1/δ), where δ ∈ (0, 1), suppose the cache flushes every F time steps with size f. Then the number of dummy data inserted into the view after the k-th update is bounded by O((2/ε)√k) + f.</p></div>
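The joint noise generation and the secure cache read described for sDPTimer can be sketched together in plaintext (a model only: the real protocol evaluates these steps under 2PC with an oblivious sort; `joint_laplace` and `timer_update` are our illustrative names, and the seed-to-(0,1] mapping is one simple choice consistent with the description):

```python
import math
import secrets

RING_BITS = 32

def joint_laplace(delta, eps, r0=None, r1=None):
    # JointNoise(S0, S1, delta, eps): XOR the servers' random values into a
    # seed no single party controls, map it to u in (0, 1], and return
    # sign * (delta/eps) * ln(u), a Laplace(delta/eps) sample; the sign comes
    # from the seed's most-significant bit.
    r0 = secrets.randbits(RING_BITS) if r0 is None else r0
    r1 = secrets.randbits(RING_BITS) if r1 is None else r1
    r = r0 ^ r1
    sign = 1 if (r >> (RING_BITS - 1)) & 1 else -1
    u = ((r & ((1 << (RING_BITS - 1)) - 1)) + 1) / float(1 << (RING_BITS - 1))
    return sign * (delta / eps) * math.log(u)

def timer_update(view, cache, cnt, eps):
    # Distort the true cardinality into the DP read size sz, sort the padded
    # cache so real tuples (isView=1) precede dummies, and fetch sz entries.
    sz = max(0, round(cnt + joint_laplace(1.0, eps)))
    cache.sort(key=lambda t: t[1], reverse=True)  # stable: real tuples first
    fetched, rest = cache[:sz], cache[sz:]
    return view + fetched, rest
```

Because `r0 ⊕ r1` is uniform whenever at least one input is uniform and the parties do not collude, neither server can bias or predict the sampled noise.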
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.2">Above noisy threshold (sDPANT)</head><p>The above-noisy-threshold protocol (Algorithm 3) takes θ and ε as parameters and updates the materialized view whenever the number of new view entries is approximately equal to the threshold θ.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Algorithm 3 sDPANT</head><p>Input: cnt; ε; θ (sync threshold); cache; V. Key steps (abridged): sz ← JointNoise(S0, S1, 1, ε2); obliviously sort the cache on the isView bit; θ̃ ← share(θ̃) ⇒ (S0, S1); reset cnt = 0 and re-share it to both servers.</p><p>Initially, the protocol splits the overall privacy budget ε into two parts, ε1 and ε2, where ε1 is used to construct the noisy condition check (Alg 3:7) and ε2 is used to distort the true cardinalities (Alg 3:8). The two servers then engage in a joint noise adding protocol that securely distorts θ with noise Lap(2/ε1). This noisy threshold remains unchanged until sDPANT issues a new view update. An important requirement of this protocol is that the noisy threshold must remain hidden from untrusted entities. Therefore, to cache this value, sDPANT generates secret shares of θ̃ internally and disseminates the corresponding shares to each server (Alg 3:3).</p><p>From then on, at each time step, the protocol obtains the secret shares of cnt (the true cardinality) and θ̃ from S0 and S1, which are subsequently recovered inside the protocol. sDPANT distorts the recovered cardinality as c̃ ← cnt + Lap(4/ε1) and compares the noisy cardinality c̃ with θ̃. A view update is posted if c̃ ≥ θ̃. When issuing an update, sDPANT distorts cnt with another Laplace noise Lap(1/ε2) to obtain the read size sz. Similar to sDPTimer, it obliviously sorts the secure cache and fetches as many tuples as specified by sz from the head of the sorted cache. Note that each time an update is posted, sDPANT must re-generate the noisy threshold with fresh randomness. Therefore, after each update, sDPANT resets cnt = 0, produces a new θ̃, and updates the corresponding secret shares stored on the two servers (Alg 3:11-13). 
In addition, the same cache flush method used in sDPTimer can be adopted by sDPANT as well. The following theorem provides an upper bound on the cached data at any time, which can be used to determine the cache flush size. Theorem 6. Given θ and ε, and letting the cache flush every F time steps with a fixed flushing size f, the number of deferred data at any time is bounded by O((16/ε) log t) and the total number of dummy data inserted into the materialized view is bounded by O((16/ε) log t) + f.</p></div>
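The sDPANT control flow can be sketched in plaintext as a sparse-vector-style loop (our model, with illustrative names; the budget is assumed to split evenly into ε1 = ε2 = ε/2, and `laplace` stands in for the joint noise-adding protocol):

```python
import math
import random

def laplace(b, rng):
    # Inverse-CDF sample from Laplace(0, b), driven by the supplied RNG.
    u = rng.random() - 0.5
    return -b * (1 if u >= 0 else -1) * math.log(1 - 2 * abs(u))

def sdp_ant(counts, theta, eps, rng=None):
    """counts[t] = number of new view entries cached at time step t.

    Returns the update schedule [(t, sz), ...]: at each posted update, the
    noisy read size sz and a freshly re-noised threshold, as in Algorithm 3.
    """
    rng = rng or random.Random(0)
    eps1, eps2 = eps / 2, eps / 2                    # assumed even split
    noisy_theta = theta + laplace(2 / eps1, rng)     # hidden noisy threshold
    cnt, updates = 0, []
    for t, new_entries in enumerate(counts):
        cnt += new_entries
        if cnt + laplace(4 / eps1, rng) >= noisy_theta:   # noisy check
            sz = max(0, round(cnt + laplace(1 / eps2, rng)))  # DP read size
            updates.append((t, sz))
            cnt = 0
            noisy_theta = theta + laplace(2 / eps1, rng)  # fresh randomness
    return updates
```

In the real protocol, `cnt` and `noisy_theta` live only as secret shares, and every `laplace` call is a JointNoise invocation, so no server sees the comparison inputs.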
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">SECURITY PROOF</head><p>We provide the security and privacy proofs in this section. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Proof.</head><p>We prove this theorem by first providing a mechanism M that simulates the update pattern leakage of the view update protocol and proving that M satisfies ε-DP. Second, we construct a p.p.t. simulator that accepts as input only the output of M and simulates outputs that are computationally indistinguishable from the real protocol execution. In what follows, we provide the mechanism M timer that simulates the update pattern of sDPTimer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Mechanism M timer</head><p>For every time step t, M timer returns count(σ(t−T, t](D)) + Lap(1/ε) if t ≡ 0 (mod T), and returns 0 otherwise, where each tuple is stamped with the time it is inserted into D, and σ(t−T, t] is a filter operator that selects all tuples inserted within the time interval (t−T, t]. In general, M timer can be formulated as a series of ε-DP Laplace mechanisms applied over disjoint data (tuples inserted in non-overlapping intervals). Thus, by the parallel composition theorem <ref type="bibr">[31]</ref>, M timer satisfies ε-DP. Moreover, by Lemma 2, given a τ-stable transformation f̂ (i.e., the Transform protocol), M timer(f̂(D)) achieves (τ × ε)-DP. We abstract M timer's output as {(t, c_t)} for t ≥ 0, where c_t is the number released by M timer at time t. We also assume the following parameters are publicly available: the domain Z, the batch size of owner-uploaded data, the contribution bound τ, the cache flush size f, the cache flush rate F, and the view update interval T. In what follows, we construct a p.p.t. simulator S that simulates the protocol execution with access only to M timer's outputs and the public parameters (Table <ref type="table">1</ref>).</p><p>Initially, S initializes its internal storage. Then, for each time step, S randomly samples two batches of data d1, d2 from Z, where the cardinalities of d1 and d2 equal the public upload batch size and the padded transform output size, respectively.</p><p>Simulator S: 1. Initialize internal storage C ← {∅}.</p></div>
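The mechanism above can be sketched directly (a plaintext model with illustrative names; `insert_times` lists the timestamps at which tuples entered D):

```python
import math
import random

def m_timer(insert_times, horizon, T, eps, rng=None):
    """Release a Laplace-noised count of the last window every T steps, 0 otherwise.

    The windows (t-T, t] are disjoint, so each tuple is touched by exactly one
    Laplace release; by parallel composition the whole sequence is eps-DP.
    """
    rng = rng or random.Random(0)
    out = []
    for t in range(1, horizon + 1):
        if t % T == 0:
            window = [x for x in insert_times if t - T < x <= t]
            u = rng.random() - 0.5   # inverse-CDF Laplace(0, 1/eps) sample
            noise = -(1 / eps) * (1 if u >= 0 else -1) * math.log(1 - 2 * abs(u))
            out.append((t, len(window) + noise))
        else:
            out.append((t, 0))
    return out
```

With a very large `eps` the noise vanishes, making the windowed counting behavior easy to verify; with realistic `eps` each released value is the true window count plus Lap(1/eps).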
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">&#8704; t &gt; 0</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1: Simulator construction (sDPTimer)</head><p>These two batches simulate the secret-shared data uploaded by owners and the transformed tuples placed in the cache at time t. Next, S appends d2 to its internal storage C and then samples d3 from C such that the resulting cardinality of d3 equals the released count c_t (if c_t = 0 then S sets d3 = {&#8709;}). This step simulates the data synchronized by the Shrink protocol. Then, if t &#8801; 0 (mod T), S generates one additional random value to simulate the secret share of the cardinality counter and reveals it together with the generated data batches to the adversary (2.iii). If t &#8801; 0 (mod F), S performs another random sample from the internal storage to fetch d4 such that |d4| = f, followed by resetting C to empty; S reveals d1, d2, d3, d4, and one additional random value to the adversary. These steps (2.iv) simulate the cache flush. Otherwise, S reveals only d1, d2 and d3. The computational indistinguishability between d1, d2, d3, d4 and the messages the adversary can obtain from the real protocol execution follows from the security of the (2,2)-secret-sharing scheme and the security of secure 2PC <ref type="bibr">[57]</ref>. </p></div>
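One step of this simulator can be sketched as follows (a toy plaintext model with assumed names: the simulator sees only the DP release `c_t` and public parameters, and emits uniformly random values whose counts match what the adversary would observe in a real execution):

```python
import secrets

def simulate_step(t, c_t, store, T, F, b, pad, flush_size):
    """One time step of the Table 1 simulator (sDPTimer), in the clear.

    b    = public upload batch size; pad = padded transform output size;
    T    = view update interval;     F   = cache flush interval.
    """
    rand = lambda n: [secrets.randbits(32) for _ in range(n)]
    d1, d2 = rand(b), rand(pad)     # simulated upload shares / cached tuples
    store.extend(d2)
    d3 = [store.pop() for _ in range(min(c_t, len(store)))]  # simulated sync
    revealed = {"d1": d1, "d2": d2, "d3": d3}
    if t % T == 0:                  # view update: fake counter share
        revealed["cnt_share"] = secrets.randbits(32)
    if t % F == 0:                  # cache flush: fake flushed batch + share
        revealed["d4"] = [store.pop() for _ in range(min(flush_size, len(store)))]
        store.clear()
        revealed["flush_share"] = secrets.randbits(32)
    return revealed
```

Every revealed value is a fresh uniform sample, which is exactly what XOR secret shares look like to a single server, so distinguishing this from a real transcript would break the sharing scheme or the 2PC.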
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Proof.</head><p>Following the same proof paradigm, we first provide M ant, which simulates the view update pattern under sDPANT. Here, σ(t*, t] is a filter that selects all data received since the last update at time t*. In general, M ant is a mechanism that utilizes the sparse vector technique (SVT) to periodically release a noisy count. According to <ref type="bibr">[83]</ref> (Theorem 11), this mechanism satisfies ε-DP (interested readers may refer to our full version for the complete privacy proof of M ant). As per Lemma 2, we obtain the same conclusion that M ant(f̂(D)) achieves (τ × ε)-DP if f̂ is τ-stable. Similarly, we abstract M ant's output as {(t, c_t)} for t ≥ 0, and we assume the following parameters are publicly available: the domain Z, the batch size, the contribution bound τ, the cache flush size f, the cache flush rate F, and the threshold θ. For the simulator construction, one can reuse most of the components in Table <ref type="table">1</ref> with the following modifications: (i) for step 2.iii, one should replace the condition check with whether c_t &gt; 0; (ii) for steps 2.iii and 2.iv, the simulator outputs one additional random value rd sampled uniformly from Z, which simulates the secret shares of the refreshed noisy threshold. Similarly, the indistinguishability follows from the security property of the XOR-based secret-sharing scheme.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">EXPERIMENTS</head><p>In this section, we present the evaluation results of our proposed framework. Specifically, we address the following questions:</p><p>&#8226; Question-1: Do the view-based query answering approaches have efficiency advantages over the non-materialization (NM) approach? Also, how does the DP-based view update protocol compare with the na&#239;ve ones? &#8226; Question-2: For DP protocols, is there a trade-off between privacy, efficiency and accuracy? Can we adjust the privacy parameters to achieve different efficiency or accuracy goals? &#8226; Question-3: How does sDPTimer compare to sDPANT? Under what circumstances is one better than the other?</p><p>Implementation and configuration. We build the prototype IncShrink based on Shrinkwrap <ref type="bibr">[7]</ref>, a typical secure outsourced database under the server-aided MPC setting. Shrinkwrap only supports static data and a standard query answering mode; IncShrink extends it to a view-based SOGDB. In addition, we also implement client programs that consume data from a given dataset and outsource it to the servers. This simulates how real-world data owner devices would receive and outsource new data. We implement all secure 2PC protocols using EMP-Toolkit-0.2.1 and conduct all experiments on a GCP instance with a 3.8 GHz Xeon CPU and 32 GB RAM. Data. We evaluate the system using two datasets: TPC Data Stream (TPC-ds) <ref type="bibr">[71]</ref> and the Chicago Police Database (CPDB) <ref type="bibr">[23]</ref>. TPC-ds collects the retail records of several product suppliers over a five-year period. In our evaluation, we mainly use two relational tables, the Sales and the Return table. After eliminating invalid data points with incomplete or missing values, the Sales and Return tables contain 2.2 million and 270,000 records, respectively. CPDB is a living repository of public data about Chicago's police officers and their interactions with the public. 
We primarily use two relations: the Allegation table, which documents the results of investigations into allegations of police misconduct, and the Award table, which collects awards given to certain officers. The cleaned data contains 206,000 and 656,000 records for Allegation and Award, respectively. Execution scenario &amp; testing queries. For the TPC-ds data, we delegate each relational table to a client program, which then independently outsources the data to the servers. We multiplex the sales time (Sales table) or return time (Return table) associated with each record as an indication of when the client received it. In addition, we assume that the client program uploads a batch of data every single day and that the uploaded data is padded to the maximum size. We pick the following query for evaluating with TPC-ds.</p><p>&#8226; Q1-Count the total number of products returned within 10 days after purchasing: "SELECT COUNT(*) FROM Sales INNER JOIN Returns ON Sales.PID = Returns.PID WHERE Returns.ReturnDate - Sales.SaleDate &lt;= 10" Moreover, we set the materialized view as a join table of all products returned within 10 days. As Q1 has join multiplicity 1, we set τ = 1 (truncation bound) and B = 10 (total contribution budget).</p><p>For the CPDB data, we consider only the Allegation table to be private and delegated to a client program; the Award table is treated as a public relation. Again, we use the investigation case end time to indicate when the client program received a record, and we assume that the client outsources data once every 5 days (the minimum time span is 5 days) and that the data is likewise padded to the maximum possible size. For evaluation, we select the following query.</p><p>&#8226; Q2-Count how many times an officer received an award from the department despite the fact that the officer had been found to have committed misconduct in the past 10 days: "SELECT  Similarly, the materialized view is a join table that processes Q2. 
We set the truncation bound τ = 10 and the budget for each record as B = 20. Default settings. Unless otherwise specified, we assume the following default configurations. For both DP protocols, we set the default privacy parameter ε = 1.5, and the cache flush parameters as F = 2000 (flush interval) and f = 15 (flush size). We fix the sDPANT threshold as θ = 30 for evaluating both datasets. Since the average number of new view entries added at each time step is 2.7 and 9.8 for TPC-ds and CPDB, respectively, for consistency purposes we set the timer to T = 10 ≈ 30/2.7 and T = 3 ≈ 30/9.8. For each test group, we issue one test query at each time step and report the average L1 error and query execution time (QET) over all issued testing queries.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.1">End-to-end Comparison</head><p>We address Question-1 by performing a comparative analysis between the DP (sDPTimer, sDPANT), na&#239;ve (one-time materialization and exhaustive padding), and the non-materialization <ref type="bibr">[83]</ref> approaches. The results are summarized in Table <ref type="table">2</ref> and Figure <ref type="figure">4</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Observation 1. View-based query answering provides a significant performance improvement over the NM method.</head><p>As per Table <ref type="table">2</ref>, we observe that the non-materialization method is the least efficient among all groups. In terms of average QET, the DP protocols achieve performance improvements of up to 1.5e+5&#215; and 7888&#215; on the TPC-ds and CPDB data, respectively, in contrast to the NM approach. Even the EP method provides a performance edge of up to 1366&#215; over the NM approach. This result further demonstrates the necessity of adopting a view-based query answering mechanism. A similar observation can be drawn from Figure <ref type="figure">4</ref> as well, where in each figure we compare all test candidates along the two dimensions of accuracy (x-axis) and efficiency (y-axis). In all figures, the view-based query answering groups lie beneath the NM approach, which indicates better performance. Observation 2. DP protocols provide a balance between the two dimensions of accuracy and efficiency. According to Table 2, the DP protocols demonstrate at least 50&#215; and 107&#215; accuracy advantages (in terms of L1 error), respectively for TPC-ds and CPDB, over the OTM. Meanwhile, in terms of performance, the DP protocols show a significant improvement in contrast to the EP. For example, in the TPC-ds group, the average QETs of both sDPTimer (0.051s) and sDPANT (0.052s) are almost 120&#215; smaller than that of the EP method (5.84s). This performance advantage is even more evident (up to 302&#215;) over the CPDB data, as the testing query Q2 has join multiplicity greater than 1. Although the DP approaches cannot achieve a complete accuracy guarantee, the average relative errors of all tested queries under the DP protocols are below 4.3%. These results are sufficient to show that the DP approaches do provide a balance between accuracy and efficiency. 
This conclusion is better illustrated by Figure <ref type="figure">4</ref>, where EP and OTM are located in the upper-left and lower-right corners of each plot, respectively, which indicates that they either completely sacrifice efficiency (EP) or accuracy (OTM). In contrast, both DP methods lie at the bottom-middle position of both figures, which further reveals that the DP protocols are optimized for the dual objectives of accuracy and efficiency.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.2">3-Way Trade-off</head><p>We address Question-2 by evaluating the DP protocols with different ε ranging from 0.01 to 50. The L1 error of sDPTimer is determined by the cached new entries since the last update together with the upper bound for the deferred data, O((2/ε)√k) (Theorem 4). As sDPTimer has a fixed update frequency, the amount of cached new entries is independent of ε. However, the amount of deferred data is bounded by O((2/ε)√k), which leads to a decreasing trend in the L1 error of sDPTimer as ε increases. On the other hand, the update frequency of sDPANT is variable and is affected accordingly when ε changes. For example, a relatively small ε (large noise) will result in more frequent updates, as large noise can cause sDPANT to trigger an update early, before enough data has been placed in the secure cache. As a result, a relatively small ε leads to correspondingly few cached new entries, which essentially produces smaller query errors. Additionally, when ε reaches a relatively large level, its effect on sDPANT's update frequency becomes less significant. Increasing ε does not affect the cached new entries much, but causes a decrease in the amount of deferred data (bounded by O((16/ε) log t), as shown in Theorem 6). This explains why there is a decreasing trend in sDPANT's L1 error after ε reaches a relatively large level. Nevertheless, both protocols show a privacy-accuracy trade-off, meaning that users can adjust the privacy parameters to achieve their desired accuracy goals. Observation 4. DP protocols have a similar privacy-efficiency trade-off. Both DP protocols show similar trends in terms of efficiency metrics (Figure <ref type="figure">5b</ref> and 5d): when ε increases, the QET decreases. This is because with a relatively large ε, the amount of dummy data included in the view is reduced, resulting in a subsequent improvement in query efficiency. As such, the DP protocols also allow users to tune the privacy parameter to obtain their desired performance goals.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.3">Comparison Between DP Protocols</head><p>We address Question-3 by comparing the two DP protocols over different types of workloads. In addition to the standard one, for each dataset we create two additional datasets. First, we sample data from the original data to create a Sparse one, where the total number of view entries is 10% of the standard one. Second, we produce Burst data by adding data points to the original dataset, where the resulting data has 2&#215; more view entries. Observation 5. sDPTimer and sDPANT show accuracy advantages in processing Sparse and Burst data, respectively. According to Figure <ref type="figure">6</ref>, sDPTimer shows a relatively lower L1 error on the Sparse group than sDPANT. This is because it can take a relatively long time for a new view entry to appear when processing Sparse data; applying sDPANT will leave some data in the secure cache for a relatively long time. However, sDPTimer's update schedule is independent of the workload type, so when the load becomes very sparse, the data will still be synchronized on time. This explains why sDPTimer shows a better accuracy guarantee than sDPANT for sparse data. On the contrary, when the data becomes very dense, i.e., under a burst workload, the fixed update rate of sDPTimer causes a large amount of data to stagnate in the secure cache, and thus causes significant degradation of the accuracy guarantee. However, sDPANT can adjust the update frequency according to the workload, i.e., the denser the data, the faster the updates. This feature gives sDPANT an accuracy edge over sDPTimer when dealing with burst workloads. On the other hand, both methods show similar efficiency for all types of test datasets.</p><p>Figure <ref type="figure">7</ref> shows the DP approaches under different workloads. 
Additionally, we compare the two protocols under varying non-privacy parameters, namely the update interval and the update threshold: we fix ε, vary the timer interval from 1 to 100, and set the threshold correspondingly (as mentioned before, the average number of new view entries per time step is 2.7 for TPC-ds and 9.8 for CPDB, so we set the thresholds to 3 and 10, respectively). We test the protocols at three privacy levels, ε = 0.1, 1 and 10, and report the comparison results in Figure <ref type="figure">7</ref>. Observation 6. When ε is small, the two DP protocols have different biases in terms of accuracy and performance. According to Figure <ref type="figure">7a</ref> and <ref type="figure">7d</ref>, when ε = 0.1 the data points for sDPANT sit in the upper-left corner of both figures, while the sDPTimer results fall on the opposite side, in the lower-right corner. This implies that when ε is relatively small (the privacy level is high), sDPANT favors the accuracy guarantee, at the expense of a certain level of efficiency, whereas sDPTimer is biased toward efficiency. As per this observation, users with strong demands on privacy and accuracy should adopt sDPANT, while users with restrictive requirements on both privacy and performance are better served by sDPTimer. Moreover, the aforementioned deviations shrink as ε increases (Figure <ref type="figure">7b</ref>), and when ε reaches a relatively large value, i.e., ε = 10, both DP protocols offer essentially the same level of accuracy and efficiency guarantees.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.4">Evaluation with Different Truncation Bounds</head><p>In this section, we investigate the effect of truncation bounds by evaluating IncShrink under different bound values. Since the multiplicity of Q1 is 1, the truncation bound for answering Q1 is fixed to 1; hence, in this evaluation we focus on Q2 over the CPDB data. We pick truncation bounds from the range of 2 to 32 and set the contribution budget to twice the truncation bound. The result is reported in Figure <ref type="figure">8</ref>: the average L1 error decreases as the truncation bound grows from 2. This is because when the bound is small, many true view entries are dropped by the Transform protocol due to the truncation constraint, which leads to larger L1 query errors. However, once the bound reaches a relatively large value, i.e., greater than the maximum record contribution, no real entries are discarded; at this point, increasing the bound only increases the injected DP noise. As we analyzed before, accuracy under sDPANT can improve with relatively large noise, while under sDPTimer it worsens. On the other hand, dropping a large number of real entries (when the bound is small) leads to a smaller materialized view, which consequently improves query efficiency. When the bound exceeds the maximum record contribution, then based on our analysis in Observation 4, continuing to increase it leads both methods to introduce more dummy data into the view and causes its size to keep growing; as such, efficiency continues to decrease. Observation 8. The average Shrink execution time increases with the truncation bound, while the average execution time of Transform stays approximately the same. The reason for this tendency is fairly straightforward: the execution time of both Transform and Shrink is dominated by the oblivious input sorting. The input size of Transform relates only to the size of the data batches submitted by the users, so changing the truncation bound does not affect the efficiency of Transform execution. However, the input size of Shrink is tied to the truncation bound, so as the bound grows, the execution time increases.</p></div>
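As a reading aid, the truncation constraint described above can be sketched in a few lines of Python; the per-record cap and the function name are illustrative assumptions, not IncShrink's actual Transform implementation.

```python
from collections import defaultdict


def truncate_view_entries(entries, bound):
    """Keep at most `bound` view entries per contributing base record;
    the excess entries are dropped. Dropped true entries are the source
    of the extra L1 error observed when the bound is smaller than the
    maximum record contribution."""
    kept, contributed = [], defaultdict(int)
    for record_id, entry in entries:
        if contributed[record_id] < bound:
            contributed[record_id] += 1
            kept.append((record_id, entry))
    return kept
```

With a bound of 2, a record contributing three view entries loses one of them; with a bound at or above the maximum record contribution, nothing is dropped and only the injected DP noise grows, matching the two regimes discussed above.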
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.5">Scaling Experiments</head><p>We continue to evaluate our framework with scaling experiments. To generate data at different scales, we randomly sample or replicate the original TPC-ds and CPDB data (assigning new primary-key values to replicated rows to prevent conflicts). According to Figure <ref type="figure">9</ref>, for the largest datasets, i.e., the 4&#215; groups, the total MPC time is around 24 hours for TPC-ds and 6 hours for CPDB. It is worth noting that for the 4&#215; group, TPC-ds has 8.8 million and 1.08 million records in its two testing tables, and CPDB has 800K and 2.6 million records in its two tables, respectively. This shows the practical scalability of our framework. In addition, the total query times for the 4&#215; TPC-ds and 4&#215; CPDB groups are within 400 and 630 seconds, respectively. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">EXTENSIONS</head><p>We discuss potential extensions of the original IncShrink design.</p><p>Connecting with DP-Sync. For ease of demonstration, the prototype design assumes that data owners submit a fixed amount of data at fixed intervals. However, IncShrink is not tied to this particular record synchronization strategy: owners can choose other private update policies, such as the ones proposed in DP-Sync, and still adopt our framework. Additionally, the view update protocol requires no changes or recompilation as long as the view definition does not change. Privacy remains ensured under the composed system that connects IncShrink with DP-Sync. For example, assume the owner adopts a record synchronization strategy that ensures ε1-DP and the server is deployed with IncShrink, which guarantees ε2-DP with respect to the owner's data. By the sequential composition theorem <ref type="bibr">[31]</ref>, revealing their combined leakage ensures (ε1 + ε2)-DP over the owner's data. Similar composability can also be obtained for the accuracy guarantee. For instance, denote the error bound of the selected record synchronization policy (the total number of records not uploaded in time) as E. Then by Theorems 4 and 6, the combined system ensures error bounds of (E + 2√T) and (E + 16 log T) under the sDPTimer and sDPANT protocols, respectively. Support for complex query workloads. We now discuss how to generalize the view update protocol to complex query workloads, i.e., queries that can be written as a composite of multiple relational algebra operators. One can directly replicate the design of this paper to support complex queries by re-designing the Transform protocol so that it computes the view tuples based on the specified query plan. However, there exists another design pattern that utilizes a multi-level "Transform-and-Shrink" protocol. 
For example, we can disassemble a query into a series of operators and then construct an independent "Transform-and-Shrink" protocol for each operator. The output of one "Transform-and-Shrink" protocol can serve as the input of another, which eventually forms a multi-level view update protocol. The multi-level design has certain benefits; for instance, one can optimize system efficiency via operator-level privacy allocation <ref type="bibr">[7]</ref>. Recall from Section 5.2 that the choice of privacy budget affects the number of dummy records processed by the system, with a higher proportion of dummy records reducing overall performance, and vice versa. To maximize performance, one can formulate an optimization problem that maximizes the efficiency of all operators in a given query while maintaining the desired accuracy level. With a privacy-budget allocation determined by this optimization problem, each operator can carry out its own instance of IncShrink, minimizing the overall computation cost while satisfying the desired privacy and accuracy constraints. The optimization details are beyond the scope of this paper but may be of independent interest; we leave the design of these techniques to future work.</p></div>
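The multi-level pattern and its sequential-composition accounting can be sketched as follows; the stage interface and the per-stage epsilon bookkeeping are hypothetical simplifications for exposition, not the system's API.

```python
def run_pipeline(stages, batch):
    """Chain per-operator 'Transform-and-Shrink' stages. Each stage is a
    (transform_fn, eps) pair: the output of one stage feeds the next, and
    by sequential composition the total privacy loss is the sum of the
    per-stage epsilons."""
    total_eps = 0.0
    for transform_fn, eps in stages:
        batch = transform_fn(batch)  # stand-in for one Transform-and-Shrink level
        total_eps += eps
    return batch, total_eps
```

For example, a selection stage followed by a projection stage with budgets 0.5 and 0.25 yields a total privacy loss of 0.75; an optimizer could instead split the budget unevenly to favor the operator that would otherwise generate more dummy records.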
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Expanding to multiple servers.</head><p>In what follows, we summarize the major modifications that would extend our current design to an n-server setup with n &#8805; 2. First, the owners need to share their local data using an (n, n)-secret-sharing scheme and disseminate one share to each participating server. In addition, all outsourced objects, such as the secure cache, the materialized view, and the parameters passed between view update protocols, must be stored on the servers in a secret-shared manner. Second, both the Transform and Shrink protocols will be compiled as general MPC protocols in which all n parties (servers) provide their confidential inputs and evaluate the protocol together. Finally, when generating DP noise, each server contributes a random bit string to the MPC protocol, which then aggregates the random strings to obtain the randomness used for noise generation. Note that our joint noise-addition mechanism produces only one instance of DP noise, so expanding to the n-server setting does not lead to injecting more noise. According to <ref type="bibr">[51,</ref><ref type="bibr">52]</ref>, such a design can tolerate up to n &#8722; 1 server corruptions.</p></div>
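The sharing and joint-randomness steps can be illustrated with the sketch below; the field modulus, the XOR aggregation, and the function names are assumptions chosen for exposition, not the concrete construction of the cited schemes.

```python
import secrets

PRIME = 2**61 - 1  # illustrative modulus for additive sharing


def share(value, n):
    """(n, n) additive secret sharing: all n shares are required to
    reconstruct, so any n - 1 corrupted servers learn nothing."""
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recover the secret by summing all shares modulo the field size."""
    return sum(shares) % PRIME


def joint_randomness(contributions):
    """XOR-aggregate one random string per server into a single seed, so
    only one instance of DP noise is derived regardless of n."""
    out = bytes(len(contributions[0]))
    for c in contributions:
        out = bytes(a ^ b for a, b in zip(out, c))
    return out
```

Because the aggregated seed is uniformly random as long as at least one contribution is, no single server can bias the noise, which is the property the joint noise-addition step relies on.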
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="9">RELATED WORK</head><p>Secure outsourced databases and leakage-abuse attacks. There has been a series of efforts in the literature on secure outsourced databases. Existing solutions utilize bucketization <ref type="bibr">[40]</ref><ref type="bibr">[41]</ref><ref type="bibr">[42]</ref>, predicate encryption <ref type="bibr">[59,</ref><ref type="bibr">75]</ref>, property- and order-preserving encryption <ref type="bibr">[2,</ref><ref type="bibr">9,</ref><ref type="bibr">12,</ref><ref type="bibr">13,</ref><ref type="bibr">66,</ref><ref type="bibr">68,</ref><ref type="bibr">69]</ref>, symmetric searchable encryption (SSE) <ref type="bibr">[3,</ref><ref type="bibr">11,</ref><ref type="bibr">20,</ref><ref type="bibr">26,</ref><ref type="bibr">35,</ref><ref type="bibr">45,</ref><ref type="bibr">46,</ref><ref type="bibr">48,</ref><ref type="bibr">50,</ref><ref type="bibr">67,</ref><ref type="bibr">78]</ref>, functional encryption <ref type="bibr">[15,</ref><ref type="bibr">74]</ref>, oblivious RAM <ref type="bibr">[6,</ref><ref type="bibr">24,</ref><ref type="bibr">28,</ref><ref type="bibr">43,</ref><ref type="bibr">65,</ref><ref type="bibr">89]</ref>, multi-party secure computation (MPC) <ref type="bibr">[6,</ref><ref type="bibr">7,</ref><ref type="bibr">14,</ref><ref type="bibr">79]</ref>, trusted execution environments (TEE) <ref type="bibr">[32,</ref><ref type="bibr">70,</ref><ref type="bibr">81,</ref><ref type="bibr">87]</ref> and homomorphic encryption <ref type="bibr">[16,</ref><ref type="bibr">22,</ref><ref type="bibr">34,</ref><ref type="bibr">72]</ref>. These designs differ in the types of queries they support and the security guarantees they provide. 
Although the initial goal of these systems was to conceal record values <ref type="bibr">[2,</ref><ref type="bibr">6,</ref><ref type="bibr">9,</ref><ref type="bibr">12,</ref><ref type="bibr">13,</ref><ref type="bibr">15,</ref><ref type="bibr">40,</ref><ref type="bibr">42,</ref><ref type="bibr">59,</ref><ref type="bibr">66,</ref><ref type="bibr">69,</ref><ref type="bibr">75]</ref>, researchers soon discovered the shortcomings of this security assurance. Recent works have revealed that these methods may be subject to leakage through query patterns <ref type="bibr">[84,</ref><ref type="bibr">88]</ref>, access patterns <ref type="bibr">[27,</ref><ref type="bibr">49]</ref> and query response volume <ref type="bibr">[37]</ref><ref type="bibr">[38]</ref><ref type="bibr">[39]</ref><ref type="bibr">[49]</ref>, which makes them vulnerable to leakage-abuse attacks <ref type="bibr">[10,</ref><ref type="bibr">19]</ref>. Thus, more recent works on secure outsourced databases consider not only concealing record values but also hiding the associated leakage <ref type="bibr">[3,</ref><ref type="bibr">6,</ref><ref type="bibr">7,</ref><ref type="bibr">11,</ref><ref type="bibr">20,</ref><ref type="bibr">24,</ref><ref type="bibr">28,</ref><ref type="bibr">32,</ref><ref type="bibr">35,</ref><ref type="bibr">43,</ref><ref type="bibr">45,</ref><ref type="bibr">46,</ref><ref type="bibr">50,</ref><ref type="bibr">65,</ref><ref type="bibr">67,</ref><ref type="bibr">78,</ref><ref type="bibr">87,</ref><ref type="bibr">89]</ref>. Unfortunately, few of the aforementioned efforts consider the potential leakage when data is dynamic <ref type="bibr">[3,</ref><ref type="bibr">20,</ref><ref type="bibr">35,</ref><ref type="bibr">50]</ref>. Wang et al. <ref type="bibr">[83]</ref> formalize a general form of leakage, named the update pattern, that may affect many existing secure databases when outsourcing dynamic data.</p><p>Differentially-private leakage. 
Existing studies on hiding database leakage with DP fall into two main categories: (i) safeguarding the query results from revealing sensitive information <ref type="bibr">[1,</ref><ref type="bibr">22,</ref><ref type="bibr">25,</ref><ref type="bibr">56,</ref><ref type="bibr">60]</ref>, and (ii) obscuring side-channel leakages such as access patterns <ref type="bibr">[7,</ref><ref type="bibr">21,</ref><ref type="bibr">50,</ref><ref type="bibr">61,</ref><ref type="bibr">73,</ref><ref type="bibr">82]</ref>, query volume <ref type="bibr">[11,</ref><ref type="bibr">67]</ref> and update patterns <ref type="bibr">[83]</ref>. The first category consists of works that enable DP query answering over securely provisioned (and potentially dynamic) data. Since these efforts typically focus solely on query outputs, side-channel leakages are either not considered or assumed to be eliminable by existing techniques. Works in the second group focus on hiding side-channel information with DP, which is pertinent to our study. Among those, <ref type="bibr">[7]</ref> and <ref type="bibr">[83]</ref> are the two works most relevant to ours. <ref type="bibr">[7]</ref> extends the work of <ref type="bibr">[6]</ref>, both of which use MPC as the main tool to architect secure outsourced databases. However, <ref type="bibr">[6]</ref> fails to address some important leakage associated with intermediate computation results (i.e., the size of some intermediate outputs may leak sensitive information about the underlying data); <ref type="bibr">[7]</ref> was proposed to fill this gap. <ref type="bibr">[7]</ref> implements a resizing technique similar to IncShrink's, ensuring that the volume leakage per secure operator is bounded by differential privacy; however, their system is restricted to processing static data. <ref type="bibr">[83]</ref> considers hiding update patterns when outsourcing growing data with private update strategies. 
However, they mandate that the update strategies be enforced by trusted entities, whereas IncShrink allows untrusted servers to privately synchronize the materialized view. Additionally, <ref type="bibr">[83]</ref> considers the standard mode that processes queries directly over the outsourced data, which inevitably incurs additional performance overhead. Interested readers may refer to Sections 5.1 and 5.2, where we provide more in-depth comparisons between IncShrink and <ref type="bibr">[7,</ref><ref type="bibr">83]</ref> and highlight our technical contributions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Bounding privacy loss.</head><p>There is a series of works investigating approaches to constrain the privacy loss of queries or transformations with unbounded stability <ref type="bibr">[44,</ref><ref type="bibr">54,</ref><ref type="bibr">55,</ref><ref type="bibr">62,</ref><ref type="bibr">80,</ref><ref type="bibr">85]</ref>. However, these works are conducted in the scope of standard databases rather than secure outsourced databases. Moreover, most of them consider bounding the privacy loss of a single query or a one-time transformation <ref type="bibr">[44,</ref><ref type="bibr">62,</ref><ref type="bibr">80,</ref><ref type="bibr">85]</ref>. In this work, we consider constraining the privacy loss of a composed transformation, which may contain an infinite number of sub-transformations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="10">CONCLUSION</head><p>In this paper, we present IncShrink, a framework for outsourcing growing data to untrusted servers while retaining query functionality over the outsourced data. IncShrink not only supports an efficient view-based query answering paradigm but also ensures bounded leakage in the maintenance of the materialized view. This is achieved by (i) utilizing incremental MPC and differential privacy to architect the view update protocol and (ii) imposing constraints on record contributions to the transformation of view instances.</p></div>
		</text>
</TEI>
