- Award ID(s):
- 2055358
- NSF-PAR ID:
- 10323377
- Editor(s):
- Nissim, K.; Waters, B.
- Date Published:
- Journal Name:
- 19th Theory of Cryptography Conference (TCC)
- Volume:
- 13044
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Error-correcting codes that admit {\em local} decoding and correcting algorithms have been the focus of much recent research due to their numerous theoretical and practical applications. An important goal is to obtain the best possible tradeoffs between the number of queries the algorithm makes to its oracle (the {\em locality} of the task), and the amount of redundancy in the encoding (the {\em information rate}). In Hamming's classical adversarial channel model, the current tradeoffs are dramatic, allowing either small locality, but superpolynomial blocklength, or small blocklength, but high locality. However, in the computationally bounded, adversarial channel model, proposed by Lipton (STACS 1994), constructions of locally decodable codes suddenly exhibit small locality and small blocklength, but these constructions require strong trusted setup assumptions e.g., Ostrovsky, Pandey and Sahai (ICALP 2007) construct private locally decodable codes in the setting where the sender and receiver already share a symmetric key. We study variants of locally decodable and locally correctable codes in computationally bounded, adversarial channels, in a setting with no public-key or private-key cryptographic setup. The only setup assumption we require is the selection of the {\em public} parameters (seed) for a collision-resistant hash function. Specifically, we provide constructions of {\em relaxed locally correctable} and {\em relaxed locally decodable codes} over the binary alphabet, with constant information rate, and poly-logarithmic locality. Our constructions, which compare favorably with their classical analogues in the computationally unbounded Hamming channel, crucially employ {\em collision-resistant hash functions} and {\em local expander graphs}, extending ideas from recent cryptographic constructions of memory-hard functions.more » « less
-
Binder is a publicly accessible online service for executing interactive notebooks based on Git repositories. Binder dynamically builds and deploys containers following a recipe stored in the repository, then gives the user a browser-based notebook interface. The Binder group periodically releases a log of container launches from the public Binder service. Archives of launch records are available here. These records do not include identifiable information like IP addresses, but do give the source repo being launched along with some other metadata. The main content of this dataset is in the
binder.sqlite
file. This SQLite database includes launch records from 2018-11-03 to 2021-06-06 in theevents
table, which has the following schema.CREATE TABLE events( version INTEGER, timestamp TEXT, provider TEXT, spec TEXT, origin TEXT, ref TEXT, guessed_ref TEXT ); CREATE INDEX idx_timestamp ON events(timestamp);
version
indicates the version of the record as assigned by Binder. Theorigin
field became available with version 3, and theref
field with version 4. Older records where this information was not recorded will have the corresponding fields set to null.timestamp
is the ISO timestamp of the launchprovider
gives the type of source repo being launched ("GitHub" is by far the most common). The rest of the explanations assume GitHub, other providers may differ.spec
gives the particular branch/release/commit being built. It consists of<github-id>/<repo>/<branch>
.origin
indicates which backend was used. Each has its own storage, compute, etc. so this info might be important for evaluating caching and performance. Note that only recent records include this field. May be null.ref
specifies the git commit that was actually used, rather than the named branch referenced byspec
. Note that this was not recorded from the beginning, so only the more recent entries include it. May be null.- For records where
ref
is not available, we attempted to clone the named reference given byspec
rather than the specific commit (see below). Theguessed_ref
field records the commit found at the time of cloning. If the branch was updated since the container was launched, this will not be the exact version that was used, and instead will refer to whatever was available at the time (early 2021). Depending on the application, this might still be useful information. Selecting only records with version 4 (or non-nullref
) will exclude these guessed commits. May be null.
The Binder launch dataset identifies the source repos that were used, but doesn't give any indication of their contents. We crawled GitHub to get the actual specification files in the repos which were fed into repo2docker when preparing the notebook environments, as well as filesystem metadata of the repos. Some repos were deleted/made private at some point, and were thus skipped. This is indicated by the absence of any row for the given commit (or absence of both
ref
andguessed_ref
in theevents
table). The schema is as follows.CREATE TABLE spec_files ( ref TEXT NOT NULL PRIMARY KEY, ls TEXT, runtime BLOB, apt BLOB, conda BLOB, pip BLOB, pipfile BLOB, julia BLOB, r BLOB, nix BLOB, docker BLOB, setup BLOB, postbuild BLOB, start BLOB );
Here
ref
corresponds toref
and/orguessed_ref
from theevents
table. For each repo, we collected spec files into the following fields (see the repo2docker docs for details on what these are). The records in the database are simply the verbatim file contents, with no parsing or further processing performed.runtime
:runtime.txt
apt
:apt.txt
conda
:environment.yml
pip
:requirements.txt
pipfile
:Pipfile.lock
orPipfile
julia
:Project.toml
orREQUIRE
r
:install.R
nix
:default.nix
docker
:Dockerfile
setup
:setup.py
postbuild
:postBuild
start
:start
The
ls
field gives a metadata listing of the repo contents (excluding the.git
directory). This field is JSON encoded with the following structure based on JSON types:- Object: filesystem directory. Keys are file names within it. Values are the contents, which can be regular files, symlinks, or subdirectories.
- String: symlink. The string value gives the link target.
- Number: regular file. The number value gives the file size in bytes.
CREATE TABLE clean_specs ( ref TEXT NOT NULL PRIMARY KEY, conda_channels TEXT, conda_packages TEXT, pip_packages TEXT, apt_packages TEXT );
The
clean_specs
table provides parsed and validated specifications for some of the specification files (currently Pip, Conda, and APT packages). Each column gives either a JSON encoded list of package requirements, or null. APT packages have been validated using a regex adapted from the repo2docker source. Pip packages have been parsed and normalized using the Requirement class from the pkg_resources package of setuptools. Conda packages have been parsed and normalized using theconda.models.match_spec.MatchSpec
class included with the library form of Conda (distinct from the command line tool). Users might want to use these parsers when working with the package data, as the specifications can become fairly complex.The
missing
table gives the repos that were not accessible, andevent_logs
records which log files have already been added. These tables are used for updating the dataset and should not be of interest to users. -
In a recent work (Ghazi et al., SODA 2016), the authors with Komargodski and Kothari initiated the study of communication with contextual uncertainty, a setup aiming to understand how efficient communication is possible when the communicating parties imperfectly share a huge context. In this setting, Alice is given a function f and an input string x, and Bob is given a function g and an input string y. The pair (x,y) comes from a known distribution mu and f and g are guaranteed to be close under this distribution. Alice and Bob wish to compute g(x,y) with high probability. The lack of agreement between Alice and Bob on the function that is being computed captures the uncertainty in the context. The previous work showed that any problem with one-way communication complexity k in the standard model (i.e., without uncertainty, in other words, under the promise that f=g) has public-coin communication at most O(k(1+I)) bits in the uncertain case, where I is the mutual information between x and y. Moreover, a lower bound of Omega(sqrt{I}) bits on the public-coin uncertain communication was also shown. However, an important question that was left open is related to the power that public randomness brings to uncertain communication. Can Alice and Bob achieve efficient communication amid uncertainty without using public randomness? And how powerful are public-coin protocols in overcoming uncertainty? Motivated by these two questions: - We prove the first separation between private-coin uncertain communication and public-coin uncertain communication. Namely, we exhibit a function class for which the communication in the standard model and the public-coin uncertain communication are O(1) while the private-coin uncertain communication is a growing function of n (the length of the inputs). This lower bound (proved with respect to the uniform distribution) is in sharp contrast with the case of public-coin uncertain communication which was shown by the previous work to be within a constant factor from the certain communication. This lower bound also implies the first separation between public-coin uncertain communication and deterministic uncertain communication. Interestingly, we also show that if Alice and Bob imperfectly share a sequence of random bits (a setup weaker than public randomness), then achieving a constant blow-up in communication is still possible. - We improve the lower-bound of the previous work on public-coin uncertain communication. Namely, we exhibit a function class and a distribution (with mutual information I approx n) for which the one-way certain communication is k bits but the one-way public-coin uncertain communication is at least Omega(sqrt{k}*sqrt{I}) bits. Our proofs introduce new problems in the standard communication complexity model and prove lower bounds for these problems. Both the problems and the lower bound techniques may be of general interest.more » « less
-
We present gOTzilla, a protocol for interactive zero-knowledge proofs for very large disjunctive statements of the following format: given publicly known circuit C, and set of values Y = {y1 , . . . , yn }, prove knowledge of a witness x such that C(x) = y1 ∨ C(x) = y2 ∨ · · · ∨ C(x) = yn . These type of statements are extremely important for the proof of assets (PoA) problem in cryptocurrencies where a prover wants to prove the knowledge of a secret key sk that associates with the hash of a public key H(pk) posted on the ledger. We note that the size of n in popular cryptocurrencies, such as Bitcoin, is estimated to 80 million. For the construction of gOTzilla, we start by observing that if we restructure the proof statement to an equivalent of proving knowledge of (x, y) such that (C(x) = y) ∧ (y = y1 ∨ · · · ∨ y = yn )), then we can reduce the disjunction of equalities to 1-out-of-N oblivious transfer (OT). Our overall protocol is based on the MPC in the head (MPCitH) paradigm. We additionally provide a concrete, efficient extension of our protocol for the case where C combines algebraic and non-algebraic statements (which is the case in the PoA application). We achieve an asymptotic communication cost of O(log n) plus the proof size of the underlying MPCitH protocol. While related work has similar asymptotic complexity, our approach results in concrete performance improvements. We implement our protocol and provide benchmarks. Concretely, for a set of size 1 million entries, the total run-time of our protocol is 14.89 seconds using 48 threads, with 6.18 MB total communication, which is about 4x faster compared to the state of the art when considering a disjunctive statement with algebraic and non-algebraic elements.more » « less
-
Abstract We obtain new quantitative estimates on Weyl Law remainders under dynamical assumptions on the geodesic flow. On a smooth compact Riemannian manifold ( M , g ) of dimension n , let $$\Pi _\lambda $$ Π λ denote the kernel of the spectral projector for the Laplacian, $$\mathbb {1}_{[0,\lambda ^2]}(-\Delta _g)$$ 1 [ 0 , λ 2 ] ( - Δ g ) . Assuming only that the set of near periodic geodesics over $${W}\subset M$$ W ⊂ M has small measure, we prove that as $$\lambda \rightarrow \infty $$ λ → ∞ $$\begin{aligned} \int _{{W}} \Pi _\lambda (x,x)dx=(2\pi )^{-n}{{\,\textrm{vol}\,}}_{_{{\mathbb {R}}^n}}\!(B){{\,\textrm{vol}\,}}_g({W})\,\lambda ^n+O\Big (\frac{\lambda ^{n-1}}{\log \lambda }\Big ), \end{aligned}$$ ∫ W Π λ ( x , x ) d x = ( 2 π ) - n vol R n ( B ) vol g ( W ) λ n + O ( λ n - 1 log λ ) , where B is the unit ball. One consequence of this result is that the improved remainder holds on all product manifolds, in particular giving improved estimates for the eigenvalue counting function in the product setup. Our results also include logarithmic gains on asymptotics for the off-diagonal spectral projector $$\Pi _\lambda (x,y)$$ Π λ ( x , y ) under the assumption that the set of geodesics that pass near both x and y has small measure, and quantitative improvements for Kuznecov sums under non-looping type assumptions. The key technique used in our study of the spectral projector is that of geodesic beams.more » « less