skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Stronger Calibration Lower Bounds via Sidestepping
We consider an online binary prediction setting where a forecaster observes a sequence of T bits one by one. Before each bit is revealed, the forecaster predicts the probability that the bit is 1. The forecaster is called well-calibrated if for each p in [0,1], among the n_p bits for which the forecaster predicts probability p, the actual number of ones, m_p, is indeed equal to p*n_p. The calibration error, defined as \sum_p |m_p - p n_p|, quantifies the extent to which the forecaster deviates from being well-calibrated. It has long been known that an O(T^(2/3)) calibration error is achievable even when the bits are chosen adversarially, and possibly based on the previous predictions. However, little is known on the lower bound side, except an sqrt(T) bound that follows from the trivial example of independent fair coin flips. In this paper, we prove an T^(0.528) bound on the calibration error, which is the first bound above the trivial sqrt(T) lowerbound for this setting. The technical contributions of our work include two lower bound techniques, early stopping and sidestepping, which circumvent the obstacles that have previously hindered strong calibration lower bounds. We also propose an abstraction of the prediction setting, termed the Sign-Preservation game, which may be of independent interest. This game has a much smaller state space than the full prediction setting and allows simpler analyses. The T^0.528 lower bound follows from a general reduction theorem that translates lower bounds on the game value of Sign-Preservation into lower bounds on the calibration error.  more » « less
Award ID(s):
1813049 1704417
PAR ID:
10275104
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the Annual ACM Symposium on Theory of Computing
ISSN:
0737-8017
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study the problem of making calibrated probabilistic forecasts for a binary sequence generated by an adversarial nature. Following the seminal paper of Foster and Vohra (1998), nature is often modeled as an adaptive adversary who sees all activity of the forecaster except the randomization that the forecaster may deploy. A number of papers have proposed randomized forecasting strategies that achieve an ϵ-calibration error rate of O(1/sqrt T), which we prove is tight in general. On the other hand, it is well known that it is not possible to be calibrated without randomization, or if nature also sees the forecaster's randomization; in both cases the calibration error could be Ω(1). Inspired by the equally seminal works on the "power of two choices" and imprecise probability theory, we study a small variant of the standard online calibration problem. The adversary gives the forecaster the option of making two nearby probabilistic forecasts, or equivalently an interval forecast of small width, and the endpoint closest to the revealed outcome is used to judge calibration. This power of two choices, or imprecise forecast, accords the forecaster with significant power -- we show that a faster ϵ-calibration rate of O(1/T) can be achieved even without deploying any randomization. 
    more » « less
  2. Santhanam, Rahul (Ed.)
    Depth-3 circuit lower bounds and k-SAT algorithms are intimately related; the state-of-the-art Σ^k_3-circuit lower bound (Or-And-Or circuits with bottom fan-in at most k) and the k-SAT algorithm of Paturi, Pudlák, Saks, and Zane (J. ACM'05) are based on the same combinatorial theorem regarding k-CNFs. In this paper we define a problem which reveals new interactions between the two, and suggests a concrete approach to significantly stronger circuit lower bounds and improved k-SAT algorithms. For a natural number k and a parameter t, we consider the Enum(k, t) problem defined as follows: given an n-variable k-CNF and an initial assignment α, output all satisfying assignments at Hamming distance t(n) of α, assuming that there are no satisfying assignments of Hamming distance less than t(n) of α. We observe that an upper bound b(n, k, t) on the complexity of Enum(k, t) simultaneously implies depth-3 circuit lower bounds and k-SAT algorithms: - Depth-3 circuits: Any Σ^k_3 circuit computing the Majority function has size at least binom(n,n/2)/b(n, k, n/2). - k-SAT: There exists an algorithm solving k-SAT in time O(∑_{t=1}^{n/2}b(n, k, t)). A simple construction shows that b(n, k, n/2) ≥ 2^{(1 - O(log(k)/k))n}. Thus, matching upper bounds for b(n, k, n/2) would imply a Σ^k_3-circuit lower bound of 2^Ω(log(k)n/k) and a k-SAT upper bound of 2^{(1 - Ω(log(k)/k))n}. The former yields an unrestricted depth-3 lower bound of 2^ω(√n) solving a long standing open problem, and the latter breaks the Super Strong Exponential Time Hypothesis. In this paper, we propose a randomized algorithm for Enum(k, t) and introduce new ideas to analyze it. We demonstrate the power of our ideas by considering the first non-trivial instance of the problem, i.e., Enum(3, n/2). We show that the expected running time of our algorithm is 1.598ⁿ, substantially improving on the trivial bound of 3^{n/2} ≃ 1.732ⁿ. This already improves Σ^3_3 lower bounds for Majority function to 1.251ⁿ. The previous bound was 1.154ⁿ which follows from the work of Håstad, Jukna, and Pudlák (Comput. Complex.'95). By restricting ourselves to monotone CNFs, Enum(k, t) immediately becomes a hypergraph Turán problem. Therefore our techniques might be of independent interest in extremal combinatorics. 
    more » « less
  3. Calibration measures quantify how much a forecaster’s predictions violate calibration, which requires that forecasts are unbiased conditioning on the forecasted probabilities. Two important desiderata for a calibration measure are its decision-theoretic implications (i.e., downstream decision-makers that best respond to the forecasts are always no-regret) and its truthfulness (i.e., a forecaster approximately minimizes error by always reporting the true probabilities). Existing measures satisfy at most one of the properties, but not both. We introduce a new calibration measure termed subsampled step calibration, StepCEsub, that is both decision-theoretic and truthful. In particular, on any product distribution, StepCEsub is truthful up to an O(1) factor whereas prior decision-theoretic calibration measures suffer from an e−Ω(T)–Ω(T−−√) truthfulness gap. Moreover, in any smoothed setting where the conditional probability of each event is perturbed by a noise of magnitude c>0, StepCEsub is truthful up to an O(log(1/c)−−−−−−−√) factor, while prior decision-theoretic measures have an e−Ω(T)–Ω(T1/3) truthfulness gap. We also prove a general impossibility result for truthful decision-theoretic forecasting: any complete and decision-theoretic calibration measure must be discontinuous and non-truthful in the non-smoothed setting. 
    more » « less
  4. We study calibration measures in a sequential prediction setup. In addition to rewarding accurate predictions (completeness) and penalizing incorrect ones (soundness), an important desideratum of calibration measures is truthfulness, a minimal condition for the forecaster not to be incentivized to exploit the system. Formally, a calibration measure is truthful if the forecaster (approximately) minimizes the expected penalty by predicting the conditional expectation of the next outcome, given the prior distribution of outcomes. We conduct a taxonomy of existing calibration measures. Perhaps surprisingly, all of them are far from being truthful. We introduce a new calibration measure termed the Subsampled Smooth Calibration Error (SSCE), which is complete and sound, and under which truthful prediction is optimal up to a constant multiplicative factor. In contrast, under existing calibration measures, there are simple distributions on which a polylogarithmic (or even zero) penalty is achievable, while truthful prediction leads to a polynomial penalty. 
    more » « less
  5. We study calibration measures in a sequential prediction setup. In addition to rewarding accurate predictions (completeness) and penalizing incorrect ones (soundness), an important desideratum of calibration measures is truthfulness, a minimal condition for the forecaster not to be incentivized to exploit the system. Formally, a calibration measure is truthful if the forecaster (approximately) minimizes the expected penalty by predicting the conditional expectation of the next outcome, given the prior distribution of outcomes. We conduct a taxonomy of existing calibration measures. Perhaps surprisingly, all of them are far from being truthful. We introduce a new calibration measure termed the Subsampled Smooth Calibration Error (SSCE), which is complete and sound, and under which truthful prediction is optimal up to a constant multiplicative factor. In contrast, under existing calibration measures, there are simple distributions on which a polylogarithmic (or even zero) penalty is achievable, while truthful prediction leads to a polynomial penalty. 
    more » « less