skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Encapsulated Search Index: Public-Key, Sub-linear, Distributed, and Delegatable
We build the first sub-linear (in fact, potentially constant-time) public-key searchable encryption system: server can publish a public key PK. anybody can build an encrypted index for document D under PK. client holding the index can obtain a token ๐‘ง๐‘ค from the server to check if a keyword w belongs to D. search using ๐‘ง๐‘ค is almost as fast (e.g., sub-linear) as the non-private search. server granting the token does not learn anything about the document D, beyond the keyword w. yet, the token ๐‘ง๐‘ค is specific to the pair (D, w): the client does not learn if other keywords ๐‘คโ€ฒโ‰ ๐‘ค belong to D, or if w belongs to other, freshly indexed documents ๐ทโ€ฒ. server cannot fool the client by giving a wrong token ๐‘ง๐‘ค. We call such a primitive Encapsulated Search Index (ESI). Our ESI scheme can be made (t, n)-distributed among n servers in the best possible way: non-interactive, verifiable, and resilient to any coalition of up to (๐‘กโˆ’1) malicious servers. We also introduce the notion of delegatable ESI and show how to extend our construction to this setting. Our solution โ€” including public indexing, sub-linear search, delegation, and distributed token generation โ€” is deployed as a commercial application by a real-world company.  more » « less
Award ID(s):
1925288
PAR ID:
10353480
Author(s) / Creator(s):
; ; ; ; ;
Editor(s):
Hanaoka, Goichiro; Shikata, Junji; Watanabe, Yohei
Date Published:
Journal Name:
PKC 2022 - 25th IACR International Conference on Practice and Theory of Public-Key Cryptography
Volume:
13178
Page Range / eLocation ID:
256-285
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper introduces protocols for authenticated private information retrieval. These schemes enable a client to fetch a record from a remote database server such that (a) the server does not learn which record the client reads, and (b) the client either obtains the "authentic" record or detects server misbehavior and safely aborts. Both properties are crucial for many applications. Standard private-information-retrieval schemes either do not ensure this form of output authenticity, or they require multiple database replicas with an honest majority. In contrast, we offer multi-server schemes that protect security as long as at least one server is honest. Moreover, if the client can obtain a short digest of the database out of band, then our schemes require only a single server. Performing an authenticated private PGP-public-key lookup on an OpenPGP key server's database of 3.5 million keys (3 GiB), using two non-colluding servers, takes under 1.2 core-seconds of computation, essentially matching the time taken by unauthenticated private information retrieval. Our authenticated single-server schemes are 30-100 more costly than state-of-the-art unauthenticated single-server schemes, though they achieve incomparably stronger integrity properties. 
    more » « less
  2. Given a private string q and a remote server that holds a set of public documents D, how can one of the K most relevant documents to q in D be selected and viewed without anyone (not even the server) learning anything about q or the document? This is the oblivious document ranking and retrieval problem. In this paper, we describe Coeus, a system that solves this problem. At a high level, Coeus composes two cryptographic primitives: secure matrix-vector product for scoring document relevance using the widely-used term frequency-inverse document frequency (tf-idf) method, and private information retrieval (PIR) for obliviously retrieving documents. However, Coeus reduces the time to run these protocols, thereby improving the user-perceived latency, which is a key performance metric. Coeus first reduces the PIR overhead by separating out private metadata retrieval from document retrieval, and it then scales secure matrix-vector product to tf-idf matrices with several hundred billion elements through a series of novel cryptographic refinements. For a corpus of English Wikipedia containing 5 million documents, a keyword dictionary with 64K keywords, and on a cluster of 143 machines on AWS, Coeus enables a user to obliviously rank and retrieve a document in 3.9 seconds---a 24x improvement over a baseline system. 
    more » « less
  3. R-tree is a foundational data structure used in spatial databases and scientific databases. With the advancement of networks and computer architectures, in-memory data processing for R-tree in distributed systems has become a common platform. We have observed new performance challenges to process R-tree as the amount of multidimensional datasets become increasingly high. Specifically, an R-tree server can be heavily overloaded while the network and client CPU are lightly loaded, and vice versa. In this article, we present the design and implementation of Catfish, an RDMA-enabled R-tree for low latency and high throughput by adaptively utilizing the available network bandwidth and computing resources to balance the workloads between clients and servers. We design and implement two basic mechanisms of using RDMA for a client-server R-tree data processing system. First, in the fast messaging design, we use RDMA writes to send R-tree requests to the server and let server threads process R-tree requests to achieve low query latency. Second, in the RDMA offloading design, we use RDMA reads to offload tree traversal from the server to the client, which rescues the server as it is overloaded. We further develop an adaptive scheme to effectively switch an R-tree search between fast messaging and RDMA offloading, maximizing the overall performance. Our experiments show that the adaptive solution of Catfish on InfiniBand significantly outperforms R-tree that uses only fast messaging or only RDMA offloading in both latency and throughput. Catfish can also deliver up to one order of magnitude performance over the traditional schemes using TCP/IP on 1 and 40 Gbps Ethernet. We make a strong case to use RDMA to effectively balance workloads in distributed systems for low latency and high throughput. 
    more » « less
  4. R-tree is a foundational data structure used in spatial databases and scientific databases. With the advancement of Internet and computer architectures, in-memory data processing for R-tree in distributed systems has become a common platform. We have observed new performance challenges to process R-tree as the amount of multidimensional datasets become increasingly huge. Specifically, an R-tree server can be heavily overloaded while the network and client CPU are lightly loaded, and vice versa. In this paper, we present the design and implementation of Catfish, an RDMA enabled R-tree for low latency and high throughput by adaptively utilizing the available network bandwidth and computing resources to balance the workloads between clients and servers. We design and implement two basic mechanisms of using RDMA for the client-server R-tree. First, in the fast messaging design, we use RDMA writes to send R-tree requests to the server and let server threads process R-tree requests to achieve low query latency. Second, in the RDMA offloading design, we use RDMA reads to offload tree traversal from the server to the client, which rescues the server as it is overloaded. We further develop an adaptive scheme to effectively switch an R-tree search between fast messaging and RDMA offloading, maximizing the overall performance. Our experiments show that the adaptive solution of Catfish on InfiniBand significantly outperforms R-tree that uses only fast messaging or only RDMA offloading in both latency and throughput. Catfish can also deliver up to one order of magnitude performance over the traditional schemes using TCP/IP on 1 Gbps and 40 Gbps Ethernet. We make a strong case to use RDMA to effectively balance workloads in distributed systems for low latency and high throughput. 
    more » « less
  5. Keyword spotting (KWS) is a key technology in smart devices. However, privacy issues in these devices have been constantly raised. To solve this problem, this paper applies homomorphic encryption (HE) to a previous small-footprint convolutional neural network (CNN)-based KWS algorithm. This allows for a trustless system in which a command word can be securely identified by a remote cloud server without exposing client data. To alleviate the burden on an edge device of a client, a novel packing technique is proposed that reduces the number of ciphertexts for an input keyword to one. Our HE-based KWS shows a prediction accuracy of 72% for Google's Speech Commands Dataset with 12 labels. This is almost identical to the accuracy of the non-HE-based implementation that has the same CNN layers and approximates a rectified linear unit in the same manner. On a workstation, it takes 19 seconds to process one keyword on average, which can be improved in the future through parallelization, HE parameter optimization, and/or the use of custom hardware accelerators. 
    more » « less