

Title: Amortizing Rate-1 OT and Applications to PIR and PSI
Recent constructions of rate-1 OT [Döttling, Garg, Ishai, Malavolta, Mour, and Ostrovsky, CRYPTO 2019] have brought this primitive under the spotlight, and the underlying techniques have led to new feasibility results for private information retrieval and homomorphic encryption for branching programs. The receiver communication of this construction consists of a quadratic (in the sender's input size) number of group elements for a single instance of rate-1 OT. Recently, [Garg, Hajiabadi, Ostrovsky, TCC 2020] improved the receiver communication to a linear number of group elements for a single string-OT. However, most applications of rate-1 OT require executing it multiple times, resulting in large communication costs for the receiver. In this work, we introduce a new technique for amortizing the cost of multiple rate-1 OTs. Specifically, based on standard pairing assumptions, we obtain a two-message rate-1 OT protocol for which the amortized cost per string-OT is asymptotically reduced to only four group elements. Our results lead to significant communication improvements in PSI and PIR, special cases of SFE for branching programs.
- PIR: We obtain a rate-1 PIR scheme with client communication cost of $O(\lambda\cdot\log N)$ group elements for security parameter $\lambda$ and database size $N$. Notably, after a one-time setup (or one PIR instance), any subsequent PIR instance requires communication of only $O(\log N)$ group elements.
- PSI with unbalanced inputs: We apply our techniques to private set intersection with unbalanced set sizes (where the receiver has the smaller set) and achieve receiver communication of $O((m+\lambda)\log N)$ group elements, where $m$ and $N$ are the sizes of the receiver and sender sets, respectively. Similarly, after a one-time setup (or one PSI instance), any subsequent PSI instance requires communication of only $O(m\cdot\log N)$ group elements. All previous sublinear-communication non-FHE-based PSI protocols for this unbalanced setting were also based on rate-1 OT, but incurred at least $O(\lambda^2 m \log N)$ group elements.
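For a rough sense of the scale of this improvement (a back-of-the-envelope comparison using only the asymptotic bounds above, ignoring the constants hidden in the $O(\cdot)$ notation and using illustrative parameters not taken from the paper), take $\lambda = 128$, $m = 2^{10}$, and $N = 2^{20}$:
$$O(\lambda^2 m \log N) \;\approx\; 128^2 \cdot 2^{10} \cdot 20 \;\approx\; 3.4\times 10^{8} \qquad\text{vs.}\qquad O((m+\lambda)\log N) \;\approx\; (2^{10}+128)\cdot 20 \;\approx\; 2.3\times 10^{4}$$
group elements of receiver communication, i.e., a reduction of roughly four orders of magnitude.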
Award ID(s):
2055358
NSF-PAR ID:
10323377
Author(s) / Creator(s):
Editor(s):
Nissim, K.; Waters, B.
Date Published:
Journal Name:
19th Theory of Cryptography Conference (TCC)
Volume:
13044
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Binder is a publicly accessible online service for executing interactive notebooks based on Git repositories. Binder dynamically builds and deploys containers following a recipe stored in the repository, then gives the user a browser-based notebook interface. The Binder group periodically releases a log of container launches from the public Binder service, and archives of these launch records are publicly available. The records do not include identifiable information such as IP addresses, but they do give the source repo being launched along with some other metadata. The main content of this dataset is in the binder.sqlite file. This SQLite database includes launch records from 2018-11-03 to 2021-06-06 in the events table, which has the following schema (an example query against it is sketched after the field list below).

    CREATE TABLE events (
        version INTEGER,
        timestamp TEXT,
        provider TEXT,
        spec TEXT,
        origin TEXT,
        ref TEXT,
        guessed_ref TEXT
    );
    CREATE INDEX idx_timestamp ON events(timestamp);
    • version indicates the version of the record as assigned by Binder. The origin field became available with version 3, and the ref field with version 4. Older records where this information was not recorded will have the corresponding fields set to null.
    • timestamp is the ISO timestamp of the launch.
    • provider gives the type of source repo being launched ("GitHub" is by far the most common). The rest of the explanations assume GitHub; other providers may differ.
    • spec gives the particular branch/release/commit being built. It consists of <github-id>/<repo>/<branch>.
    • origin indicates which backend was used. Each has its own storage, compute, etc. so this info might be important for evaluating caching and performance. Note that only recent records include this field. May be null.
    • ref specifies the git commit that was actually used, rather than the named branch referenced by spec. Note that this was not recorded from the beginning, so only the more recent entries include it. May be null.
    • For records where ref is not available, we attempted to clone the named reference given by spec rather than the specific commit (see below). The guessed_ref field records the commit found at the time of cloning. If the branch was updated since the container was launched, this will not be the exact version that was used, and instead will refer to whatever was available at the time (early 2021). Depending on the application, this might still be useful information. Selecting only records with version 4 (or non-null ref) will exclude these guessed commits. May be null.
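    As a quick illustration (a minimal sketch, not part of the released documentation), the events table can be queried directly with the sqlite3 shell, for example to count launches per provider and to list the most frequently launched GitHub specs:

    -- Launches per provider.
    SELECT provider, COUNT(*) AS launches
    FROM events
    GROUP BY provider
    ORDER BY launches DESC;

    -- Top 20 most frequently launched GitHub specs (branch-level granularity).
    SELECT spec, COUNT(*) AS launches
    FROM events
    WHERE provider = 'GitHub'
    GROUP BY spec
    ORDER BY launches DESC
    LIMIT 20;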

    The Binder launch dataset identifies the source repos that were used, but doesn't give any indication of their contents. We crawled GitHub to get the actual specification files in the repos which were fed into repo2docker when preparing the notebook environments, as well as filesystem metadata of the repos. Some repos were deleted/made private at some point, and were thus skipped. This is indicated by the absence of any row for the given commit (or absence of both ref and guessed_ref in the events table). The schema is as follows.

    CREATE TABLE spec_files (
        ref TEXT NOT NULL PRIMARY KEY,
        ls TEXT,
        runtime BLOB,
        apt BLOB,
        conda BLOB,
        pip BLOB,
        pipfile BLOB,
        julia BLOB,
        r BLOB,
        nix BLOB,
        docker BLOB,
        setup BLOB,
        postbuild BLOB,
        start BLOB
    );

    Here ref corresponds to ref and/or guessed_ref from the events table. For each repo, we collected spec files into the following fields (see the repo2docker docs for details on what these are); an example query joining this table back to events is sketched after the field list below. The records in the database are simply the verbatim file contents, with no parsing or further processing performed.

    • runtime: runtime.txt
    • apt: apt.txt
    • conda: environment.yml
    • pip: requirements.txt
    • pipfile: Pipfile.lock or Pipfile
    • julia: Project.toml or REQUIRE
    • r: install.R
    • nix: default.nix
    • docker: Dockerfile
    • setup: setup.py
    • postbuild: postBuild
    • start: start
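    The spec files can be joined back to launch records via ref (matching either ref or guessed_ref in events). A minimal sketch, assuming a column is NULL when the corresponding file was absent from the repo:

    -- For each crawled commit, check which spec files were present
    -- (assuming NULL columns mean the file was absent).
    SELECT
        COUNT(*)                AS commits_crawled,
        SUM(conda IS NOT NULL)  AS with_environment_yml,
        SUM(pip IS NOT NULL)    AS with_requirements_txt,
        SUM(docker IS NOT NULL) AS with_dockerfile
    FROM spec_files;

    -- Join back to launches: how many launches used a Dockerfile-based repo?
    SELECT COUNT(*) AS dockerfile_launches
    FROM events e
    JOIN spec_files s ON s.ref = COALESCE(e.ref, e.guessed_ref)
    WHERE s.docker IS NOT NULL;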

    The ls field gives a metadata listing of the repo contents (excluding the .git directory). This field is JSON encoded with the following structure based on JSON types:

    • Object: filesystem directory. Keys are file names within it. Values are the contents, which can be regular files, symlinks, or subdirectories.
    • String: symlink. The string value gives the link target.
    • Number: regular file. The number value gives the file size in bytes.
    For example (an illustrative listing, not taken from the dataset), {"README.md": 2048, "notebooks": {"analysis.ipynb": 51200}, "latest": "notebooks/analysis.ipynb"} describes a 2048-byte README.md, a notebooks directory containing one 51200-byte notebook, and a symlink named latest pointing to it.

    CREATE TABLE clean_specs (
        ref TEXT NOT NULL PRIMARY KEY,
        conda_channels TEXT,
        conda_packages TEXT,
        pip_packages TEXT,
        apt_packages TEXT
    );

    The clean_specs table provides parsed and validated specifications for some of the specification files (currently Pip, Conda, and APT packages). Each column gives either a JSON encoded list of package requirements, or null. APT packages have been validated using a regex adapted from the repo2docker source. Pip packages have been parsed and normalized using the Requirement class from the pkg_resources package of setuptools. Conda packages have been parsed and normalized using the conda.models.match_spec.MatchSpec class included with the library form of Conda (distinct from the command line tool). Users might want to use these parsers when working with the package data, as the specifications can become fairly complex.
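    Because each of these columns holds either a JSON-encoded list or null, SQLite's built-in JSON functions can expand them without any external parsing. A minimal sketch (assuming an SQLite build with the JSON1 functions available):

    -- Most common (normalized) pip requirements across all crawled commits;
    -- COALESCE treats missing requirement lists as empty.
    SELECT j.value AS requirement, COUNT(*) AS n
    FROM clean_specs AS c, json_each(COALESCE(c.pip_packages, '[]')) AS j
    GROUP BY j.value
    ORDER BY n DESC
    LIMIT 20;

    Note that this counts exact normalized requirement strings, so different version pins of the same package are counted separately; for anything finer-grained, the parsers mentioned above are the better tool.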

    The missing table gives the repos that were not accessible, and event_logs records which log files have already been added. These tables are used for updating the dataset and should not be of interest to users.

     
  2. Error-correcting codes that admit {\em local} decoding and correcting algorithms have been the focus of much recent research due to their numerous theoretical and practical applications. An important goal is to obtain the best possible tradeoffs between the number of queries the algorithm makes to its oracle (the {\em locality} of the task), and the amount of redundancy in the encoding (the {\em information rate}). In Hamming's classical adversarial channel model, the current tradeoffs are dramatic, allowing either small locality, but superpolynomial blocklength, or small blocklength, but high locality. However, in the computationally bounded, adversarial channel model, proposed by Lipton (STACS 1994), constructions of locally decodable codes suddenly exhibit small locality and small blocklength, but these constructions require strong trusted setup assumptions; e.g., Ostrovsky, Pandey, and Sahai (ICALP 2007) construct private locally decodable codes in the setting where the sender and receiver already share a symmetric key. We study variants of locally decodable and locally correctable codes in computationally bounded, adversarial channels, in a setting with no public-key or private-key cryptographic setup. The only setup assumption we require is the selection of the {\em public} parameters (seed) for a collision-resistant hash function. Specifically, we provide constructions of {\em relaxed locally correctable} and {\em relaxed locally decodable codes} over the binary alphabet, with constant information rate, and poly-logarithmic locality. Our constructions, which compare favorably with their classical analogues in the computationally unbounded Hamming channel, crucially employ {\em collision-resistant hash functions} and {\em local expander graphs}, extending ideas from recent cryptographic constructions of memory-hard functions.
  3. In a recent work (Ghazi et al., SODA 2016), the authors with Komargodski and Kothari initiated the study of communication with contextual uncertainty, a setup aiming to understand how efficient communication is possible when the communicating parties imperfectly share a huge context. In this setting, Alice is given a function f and an input string x, and Bob is given a function g and an input string y. The pair (x,y) comes from a known distribution mu, and f and g are guaranteed to be close under this distribution. Alice and Bob wish to compute g(x,y) with high probability. The lack of agreement between Alice and Bob on the function that is being computed captures the uncertainty in the context. The previous work showed that any problem with one-way communication complexity k in the standard model (i.e., without uncertainty, in other words, under the promise that f=g) has public-coin communication at most O(k(1+I)) bits in the uncertain case, where I is the mutual information between x and y. Moreover, a lower bound of Omega(sqrt{I}) bits on the public-coin uncertain communication was also shown. However, an important question that was left open is related to the power that public randomness brings to uncertain communication. Can Alice and Bob achieve efficient communication amid uncertainty without using public randomness? And how powerful are public-coin protocols in overcoming uncertainty? Motivated by these two questions:
     - We prove the first separation between private-coin uncertain communication and public-coin uncertain communication. Namely, we exhibit a function class for which the communication in the standard model and the public-coin uncertain communication are O(1) while the private-coin uncertain communication is a growing function of n (the length of the inputs). This lower bound (proved with respect to the uniform distribution) is in sharp contrast with the case of public-coin uncertain communication, which was shown by the previous work to be within a constant factor from the certain communication. This lower bound also implies the first separation between public-coin uncertain communication and deterministic uncertain communication. Interestingly, we also show that if Alice and Bob imperfectly share a sequence of random bits (a setup weaker than public randomness), then achieving a constant blow-up in communication is still possible.
     - We improve the lower bound of the previous work on public-coin uncertain communication. Namely, we exhibit a function class and a distribution (with mutual information I approx n) for which the one-way certain communication is k bits but the one-way public-coin uncertain communication is at least Omega(sqrt{k}*sqrt{I}) bits.
     Our proofs introduce new problems in the standard communication complexity model and prove lower bounds for these problems. Both the problems and the lower bound techniques may be of general interest.
  4. We present gOTzilla, a protocol for interactive zero-knowledge proofs for very large disjunctive statements of the following format: given a publicly known circuit C and a set of values Y = {y1, . . . , yn}, prove knowledge of a witness x such that C(x) = y1 ∨ C(x) = y2 ∨ · · · ∨ C(x) = yn. These types of statements are extremely important for the proof of assets (PoA) problem in cryptocurrencies, where a prover wants to prove knowledge of a secret key sk that is associated with the hash of a public key H(pk) posted on the ledger. We note that the size of n in popular cryptocurrencies, such as Bitcoin, is estimated at 80 million. For the construction of gOTzilla, we start by observing that if we restructure the proof statement to an equivalent one of proving knowledge of (x, y) such that (C(x) = y) ∧ (y = y1 ∨ · · · ∨ y = yn), then we can reduce the disjunction of equalities to 1-out-of-N oblivious transfer (OT). Our overall protocol is based on the MPC-in-the-head (MPCitH) paradigm. We additionally provide a concrete, efficient extension of our protocol for the case where C combines algebraic and non-algebraic statements (which is the case in the PoA application). We achieve an asymptotic communication cost of O(log n) plus the proof size of the underlying MPCitH protocol. While related work has similar asymptotic complexity, our approach results in concrete performance improvements. We implement our protocol and provide benchmarks. Concretely, for a set of size 1 million entries, the total run-time of our protocol is 14.89 seconds using 48 threads, with 6.18 MB total communication, which is about 4x faster than the state of the art when considering a disjunctive statement with algebraic and non-algebraic elements.
  5. We obtain new quantitative estimates on Weyl Law remainders under dynamical assumptions on the geodesic flow. On a smooth compact Riemannian manifold $(M, g)$ of dimension $n$, let $\Pi_\lambda$ denote the kernel of the spectral projector for the Laplacian, $\mathbb{1}_{[0,\lambda^2]}(-\Delta_g)$. Assuming only that the set of near periodic geodesics over $W \subset M$ has small measure, we prove that as $\lambda \rightarrow \infty$, $$\int_{W} \Pi_\lambda(x,x)\,dx = (2\pi)^{-n}\operatorname{vol}_{\mathbb{R}^n}(B)\,\operatorname{vol}_g(W)\,\lambda^n + O\Big(\frac{\lambda^{n-1}}{\log\lambda}\Big),$$ where $B$ is the unit ball. One consequence of this result is that the improved remainder holds on all product manifolds, in particular giving improved estimates for the eigenvalue counting function in the product setup. Our results also include logarithmic gains on asymptotics for the off-diagonal spectral projector $\Pi_\lambda(x,y)$ under the assumption that the set of geodesics that pass near both $x$ and $y$ has small measure, and quantitative improvements for Kuznecov sums under non-looping type assumptions. The key technique used in our study of the spectral projector is that of geodesic beams.