skip to main content


Title: Coeus: A System for Oblivious Document Ranking and Retrieval
Given a private string q and a remote server that holds a set of public documents D, how can one of the K most relevant documents to q in D be selected and viewed without anyone (not even the server) learning anything about q or the document? This is the oblivious document ranking and retrieval problem. In this paper, we describe Coeus, a system that solves this problem. At a high level, Coeus composes two cryptographic primitives: secure matrix-vector product for scoring document relevance using the widely-used term frequency-inverse document frequency (tf-idf) method, and private information retrieval (PIR) for obliviously retrieving documents. However, Coeus reduces the time to run these protocols, thereby improving the user-perceived latency, which is a key performance metric. Coeus first reduces the PIR overhead by separating out private metadata retrieval from document retrieval, and it then scales secure matrix-vector product to tf-idf matrices with several hundred billion elements through a series of novel cryptographic refinements. For a corpus of English Wikipedia containing 5 million documents, a keyword dictionary with 64K keywords, and on a cluster of 143 machines on AWS, Coeus enables a user to obliviously rank and retrieve a document in 3.9 seconds---a 24x improvement over a baseline system.  more » « less
Award ID(s):
1815733 1703560
NSF-PAR ID:
10316017
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Many relational data in our daily life are represented as graphs, making graph application an important workload. Because of the large scale of graph datasets, moving graph data to the cloud becomes a popular option. To keep the confidential and private graph secure from an untrusted cloud server, many cryptographic techniques are leveraged to hide the content of the data. However, protecting only the data content is not enough for a graph database. Because the structural information of the graph can be revealed through the database accessing track.

    In this work, we study the graph neural network (GNN), an important graph workload to mine information from a graph database. We find that the server is able to infer which node is processing during the edge retrieving phase and also learn its neighbor indices during GNN's aggregation phase. This leads to the leakage of the information of graph structure data. In this work, we present SPG, a structure-private graph database with SqueezePIR. Our SPG is built on top of Private Information Retrieval (PIR), which securely hides which nodes/neighbors are accessed. In addition, we propose SqueezePIR, a compression technique to overcome the computation overhead of PIR. Based on our evaluation, our SqueezePIR achieves 11.85× speedup on average with less than 2% accuracy loss when compared to the state-of-the-art FastPIR protocol.

     
    more » « less
  2. Abstract We study both the practical and theoretical efficiency of private information retrieval (PIR) protocols in a model wherein several untrusted servers work to obliviously service remote clients’ requests for data and yet no pair of servers colludes in a bid to violate said obliviousness. In exchange for such a strong security assumption, we obtain new PIR protocols exhibiting remarkable efficiency with respect to every cost metric—download, upload, computation, and round complexity—typically considered in the PIR literature. The new constructions extend a multiserver PIR protocol of Shah, Rashmi, and Ramchandran (ISIT 2014), which exhibits a remarkable property of its own: to fetch a b -bit record from a collection of r such records, the client need only download b + 1 bits total. We find that allowing “a bit more” download (and optionally introducing computational assumptions) yields a family of protocols offering very attractive trade-offs. In addition to Shah et al.’s protocol, this family includes as special cases (2-server instances of) the seminal protocol of Chor, Goldreich, Kushilevitz, and Sudan (FOCS 1995) and the recent DPF-based protocol of Boyle, Gilboa, and Ishai (CCS 2016). An implicit “folklore” axiom that dogmatically permeates the research literature on multiserver PIR posits that the latter protocols are the “most efficient” protocols possible in the perfectly and computationally private settings, respectively. Yet our findings soundly refute this supposed axiom: These special cases are (by far) the least performant representatives of our family, with essentially all other parameter settings yielding instances that are significantly faster. 
    more » « less
  3. We show that it is possible to achieve information theoretic location privacy for secondary users (SUs) in database-driven cognitive radio networks (CRNs) with an end-to-end delay less than a second, which is significantly better than that of the existing alternatives offering only a computational privacy. This is achieved based on a keen observation that, by the requirement of Federal Communications Commission (FCC), all certified spectrum databases synchronize their records. Hence, the same copy of spectrum database is available through multiple (distinct) providers. We harness the synergy between multi-server private information retrieval (PIR) and database-driven CRN architecture to offer an optimal level of privacy with high efficiency by exploiting this observation. We demonstrated, analytically and experimentally with deployments on actual cloud systems that, our adaptations of multi-server PIR outperform that of the (currently) fastest single-server PIR by a magnitude of times with information-theoretic security, collusion resiliency, and fault-tolerance features. Our analysis indicates that multi-server PIR is an ideal cryptographic tool to provide location privacy in database-driven CRNs, in which the requirement of replicated databases is a natural part of the system architecture, and therefore SUs can enjoy all advantages of multi-server PIR without any additional architectural and deployment costs. 
    more » « less
  4. Matthieu Bloch (Ed.)
    Motivated by an open problem and a conjecture, this work studies the problem of single server private information retrieval with private coded side information (PIR-PCSI) that was recently introduced by Heidarzadeh et al. The goal of PIR-PCSI is to allow a user to efficiently retrieve a desired message Wθ, which is one of K independent messages that are stored at a server, while utilizing private side information of a linear combination of a uniformly chosen size-M subset (S ⊂ [K]) of messages. The settings PIR-PCSI-I and PIR-PCSI-II correspond to the constraints that θ is generated uniformly from [K]\S, and S, respectively. In each case, (θ, S) must be kept private from the server. The capacity is defined as the supremum over message and field sizes, of achievable rates (number of bits of desired message retrieved per bit of download) and is characterized by Heidarzadeh et al. for PIR-PCSI-I in general, and for PIR- PCSI-II for M > (K + 1)/2 as (K − M + 1)−1. For 2 ≤ M ≤ (K + 1)/2 the capacity of PIR-PCSI-II remains open, and it is conjectured that even in this case the capacity is (K − M + 1)−1. We show the capacity of PIR-PCSI-II is equal to 2/K for 2 ≤ M ≤ K+1, which is strictly larger 2 than the conjectured value, and does not depend on M within this parameter regime. Remarkably, half the side-information is found to be redundant. We also characterize the infimum capacity (infimum over fields instead of supremum), and the capacity with private coefficients. The results are generalized to PIR-PCSI-I (θ ∈ [K] \ S) and PIR-PCSI (θ ∈ [K]) settings. 
    more » « less
  5. In the coded caching literature, the notion of privacy is considered only against demands. On the motivation that multi-round transmissions almost appear everywhere in real communication systems, this paper formulates the coded caching problem with private demands and caches. Only one existing private caching scheme, which is based on introducing virtual users, can preserve the privacy of demands and caches simultaneously, but at the cost of an extremely large subpacketization exponential in the product of the number of users (K) and files (N) in the system. In order to reduce the subpacketization while satisfying the privacy constraints, we propose a novel approach which constructs private coded caching schemes through private information retrieval (PIR). Based on this approach, we propose novel schemes with private demands and caches which have a subpacketization level in the order exponential with K instead of NK in the virtual user scheme. As a by-product, for the coded caching problem with private demands, a private coded caching scheme could be obtained from the proposed approach, which generally improves the memory-load tradeoff of the private coded caching scheme by Yan and Tuninetti. 
    more » « less