skip to main content


Title: Stronger Calibration Lower Bounds via Sidestepping
We consider an online binary prediction setting where a forecaster observes a sequence of T bits one by one. Before each bit is revealed, the forecaster predicts the probability that the bit is 1. The forecaster is called well-calibrated if for each p in [0,1], among the n_p bits for which the forecaster predicts probability p, the actual number of ones, m_p, is indeed equal to p*n_p. The calibration error, defined as \sum_p |m_p - p n_p|, quantifies the extent to which the forecaster deviates from being well-calibrated. It has long been known that an O(T^(2/3)) calibration error is achievable even when the bits are chosen adversarially, and possibly based on the previous predictions. However, little is known on the lower bound side, except an sqrt(T) bound that follows from the trivial example of independent fair coin flips. In this paper, we prove an T^(0.528) bound on the calibration error, which is the first bound above the trivial sqrt(T) lowerbound for this setting. The technical contributions of our work include two lower bound techniques, early stopping and sidestepping, which circumvent the obstacles that have previously hindered strong calibration lower bounds. We also propose an abstraction of the prediction setting, termed the Sign-Preservation game, which may be of independent interest. This game has a much smaller state space than the full prediction setting and allows simpler analyses. The T^0.528 lower bound follows from a general reduction theorem that translates lower bounds on the game value of Sign-Preservation into lower bounds on the calibration error.  more » « less
Award ID(s):
1813049 1704417
NSF-PAR ID:
10275104
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the Annual ACM Symposium on Theory of Computing
ISSN:
0737-8017
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    We consider the communication complexity of a number of distributed optimization problems. We start with the problem of solving a linear system. Suppose there is a coordinator together with s servers P1, …, Ps, the i-th of which holds a subset A(i) x = b(i) of ni constraints of a linear system in d variables, and the coordinator would like to output an x ϵ ℝd for which A(i) x = b(i) for i = 1, …, s. We assume each coefficient of each constraint is specified using L bits. We first resolve the randomized and deterministic communication complexity in the point-to-point model of communication, showing it is (d2 L + sd) and (sd2L), respectively. We obtain similar results for the blackboard communication model. As a result of independent interest, we show the probability a random matrix with integer entries in {–2L, …, 2L} is invertible is 1–2−Θ(dL), whereas previously only 1 – 2−Θ(d) was known. When there is no solution to the linear system, a natural alternative is to find the solution minimizing the ℓp loss, which is the ℓp regression problem. While this problem has been studied, we give improved upper or lower bounds for every value of p ≥ 1. One takeaway message is that sampling and sketching techniques, which are commonly used in earlier work on distributed optimization, are neither optimal in the dependence on d nor on the dependence on the approximation ε, thus motivating new techniques from optimization to solve these problems. Towards this end, we consider the communication complexity of optimization tasks which generalize linear systems, such as linear, semi-definite, and convex programming. For linear programming, we first resolve the communication complexity when d is constant, showing it is (sL) in the point-to-point model. For general d and in the point-to-point model, we show an Õ(sd3L) upper bound and an (d2 L + sd) lower bound. In fact, we show if one perturbs the coefficients randomly by numbers as small as 2−Θ(L), then the upper bound is Õ(sd2L) + poly(dL), and so this bound holds for almost all linear programs. Our study motivates understanding the bit complexity of linear programming, which is related to the running time in the unit cost RAM model with words of O(log(nd)) bits, and we give the fastest known algorithms for linear programming in this model. Read More: https://epubs.siam.org/doi/10.1137/1.9781611975994.106 
    more » « less
  2. In a recent work (Ghazi et al., SODA 2016), the authors with Komargodski and Kothari initiated the study of communication with contextual uncertainty, a setup aiming to understand how efficient communication is possible when the communicating parties imperfectly share a huge context. In this setting, Alice is given a function f and an input string x, and Bob is given a function g and an input string y. The pair (x,y) comes from a known distribution mu and f and g are guaranteed to be close under this distribution. Alice and Bob wish to compute g(x,y) with high probability. The lack of agreement between Alice and Bob on the function that is being computed captures the uncertainty in the context. The previous work showed that any problem with one-way communication complexity k in the standard model (i.e., without uncertainty, in other words, under the promise that f=g) has public-coin communication at most O(k(1+I)) bits in the uncertain case, where I is the mutual information between x and y. Moreover, a lower bound of Omega(sqrt{I}) bits on the public-coin uncertain communication was also shown. However, an important question that was left open is related to the power that public randomness brings to uncertain communication. Can Alice and Bob achieve efficient communication amid uncertainty without using public randomness? And how powerful are public-coin protocols in overcoming uncertainty? Motivated by these two questions: - We prove the first separation between private-coin uncertain communication and public-coin uncertain communication. Namely, we exhibit a function class for which the communication in the standard model and the public-coin uncertain communication are O(1) while the private-coin uncertain communication is a growing function of n (the length of the inputs). This lower bound (proved with respect to the uniform distribution) is in sharp contrast with the case of public-coin uncertain communication which was shown by the previous work to be within a constant factor from the certain communication. This lower bound also implies the first separation between public-coin uncertain communication and deterministic uncertain communication. Interestingly, we also show that if Alice and Bob imperfectly share a sequence of random bits (a setup weaker than public randomness), then achieving a constant blow-up in communication is still possible. - We improve the lower-bound of the previous work on public-coin uncertain communication. Namely, we exhibit a function class and a distribution (with mutual information I approx n) for which the one-way certain communication is k bits but the one-way public-coin uncertain communication is at least Omega(sqrt{k}*sqrt{I}) bits. Our proofs introduce new problems in the standard communication complexity model and prove lower bounds for these problems. Both the problems and the lower bound techniques may be of general interest. 
    more » « less
  3. The epsilon-approximate degree, deg_epsilon(f), of a Boolean function f is the least degree of a real-valued polynomial that approximates f pointwise to within epsilon. A sound and complete certificate for approximate degree being at least k is a pair of probability distributions, also known as a dual polynomial, that are perfectly k-wise indistinguishable, but are distinguishable by f with advantage 1 - epsilon. Our contributions are: - We give a simple, explicit new construction of a dual polynomial for the AND function on n bits, certifying that its epsilon-approximate degree is Omega (sqrt{n log 1/epsilon}). This construction is the first to extend to the notion of weighted degree, and yields the first explicit certificate that the 1/3-approximate degree of any (possibly unbalanced) read-once DNF is Omega(sqrt{n}). It draws a novel connection between the approximate degree of AND and anti-concentration of the Binomial distribution. - We show that any pair of symmetric distributions on n-bit strings that are perfectly k-wise indistinguishable are also statistically K-wise indistinguishable with at most K^{3/2} * exp (-Omega (k^2/K)) error for all k < K <= n/64. This bound is essentially tight, and implies that any symmetric function f is a reconstruction function with constant advantage for a ramp secret sharing scheme that is secure against size-K coalitions with statistical error K^{3/2} * exp (-Omega (deg_{1/3}(f)^2/K)) for all values of K up to n/64 simultaneously. Previous secret sharing schemes required that K be determined in advance, and only worked for f=AND. Our analysis draws another new connection between approximate degree and concentration phenomena. As a corollary of this result, we show that for any d <= n/64, any degree d polynomial approximating a symmetric function f to error 1/3 must have coefficients of l_1-norm at least K^{-3/2} * exp ({Omega (deg_{1/3}(f)^2/d)}). We also show this bound is essentially tight for any d > deg_{1/3}(f). These upper and lower bounds were also previously only known in the case f=AND. 
    more » « less
  4. Mikolaj Bojanczyk ; Emanuela Merelli ; David P. Woodruff (Ed.)
    Two equal length strings are a parameterized match (p-match) iff there exists a one-to-one function that renames the symbols in one string to those in the other. The Parameterized Suffix Tree (PST) [Baker, STOC' 93] is a fundamental data structure that handles various string matching problems under this setting. The PST of a text T[1,n] over an alphabet Σ of size σ takes O(nlog n) bits of space. It can report any entry in (parameterized) (i) suffix array, (ii) inverse suffix array, and (iii) longest common prefix (LCP) array in O(1) time. Given any pattern P as a query, a position i in T is an occurrence iff T[i,i+|P|-1] and P are a p-match. The PST can count the number of occurrences of P in T in time O(|P|log σ) and then report each occurrence in time proportional to that of accessing a suffix array entry. An important question is, can we obtain a compressed version of PST that takes space close to the text’s size of nlogσ bits and still support all three functionalities mentioned earlier? In SODA' 17, Ganguly et al. answered this question partially by presenting an O(nlogσ) bit index that can support (parameterized) suffix array and inverse suffix array operations in O(log n) time. However, the compression of the (parameterized) LCP array and the possibility of faster suffix array and inverse suffix array queries in compact space were left open. In this work, we obtain a compact representation of the (parameterized) LCP array. With this result, in conjunction with three new (parameterized) suffix array representations, we obtain the first set of PST representations in o(nlog n) bits (when logσ = o(log n)) as follows. Here ε > 0 is an arbitrarily small constant. - Space O(n logσ) bits and query time O(log_σ^ε n); - Space O(n logσlog log_σ n) bits and query time O(log log_σ n); and - Space O(n logσ log^ε_σ n) bits and query time O(1). The first trade-off is an improvement over Ganguly et al.’s result, whereas our third trade-off matches the optimal time performance of Baker’s PST while squeezing the space by a factor roughly log_σ n. We highlight that our trade-offs match the space-and-time bounds of the best-known compressed text indexes for exact pattern matching and further improvement is highly unlikely. 
    more » « less
  5. Abstract

    We study the distribution over measurement outcomes of noisy random quantum circuits in the regime of low fidelity, which corresponds to the setting where the computation experiences at least one gate-level error with probability close to one. We model noise by adding a pair of weak, unital, single-qubit noise channels after each two-qubit gate, and we show that for typical random circuit instances, correlations between the noisy output distribution$$p_{\text {noisy}}$$pnoisyand the corresponding noiseless output distribution$$p_{\text {ideal}}$$pidealshrink exponentially with the expected number of gate-level errors. Specifically, the linear cross-entropy benchmarkFthat measures this correlation behaves as$$F=\text {exp}(-2s\epsilon \pm O(s\epsilon ^2))$$F=exp(-2sϵ±O(sϵ2)), where$$\epsilon $$ϵis the probability of error per circuit location andsis the number of two-qubit gates. Furthermore, if the noise is incoherent—for example, depolarizing or dephasing noise—the total variation distance between the noisy output distribution$$p_{\text {noisy}}$$pnoisyand the uniform distribution$$p_{\text {unif}}$$punifdecays at precisely the same rate. Consequently, the noisy output distribution can be approximated as$$p_{\text {noisy}}\approx Fp_{\text {ideal}}+ (1-F)p_{\text {unif}}$$pnoisyFpideal+(1-F)punif. In other words, although at least one local error occurs with probability$$1-F$$1-F, the errors are scrambled by the random quantum circuit and can be treated as global white noise, contributing completely uniform output. Importantly, we upper bound the average total variation error in this approximation by$$O(F\epsilon \sqrt{s})$$O(Fϵs). Thus, the “white-noise approximation” is meaningful when$$\epsilon \sqrt{s} \ll 1$$ϵs1, a quadratically weaker condition than the$$\epsilon s\ll 1$$ϵs1requirement to maintain high fidelity. The bound applies if the circuit size satisfies$$s \ge \Omega (n\log (n))$$sΩ(nlog(n)), which corresponds to onlylogarithmic depthcircuits, and if, additionally, the inverse error rate satisfies$$\epsilon ^{-1} \ge {\tilde{\Omega }}(n)$$ϵ-1Ω~(n), which is needed to ensure errors are scrambled faster thanFdecays. The white-noise approximation is useful for salvaging the signal from a noisy quantum computation; for example, it was an underlying assumption in complexity-theoretic arguments that noisy random quantum circuits cannot be efficiently sampled classically, even when the fidelity is low. Our method is based on a map from second-moment quantities in random quantum circuits to expectation values of certain stochastic processes for which we compute upper and lower bounds.

     
    more » « less