skip to main content


The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 5:00 PM ET until 11:00 PM ET on Friday, June 21 due to maintenance. We apologize for the inconvenience.

Title: Faster online calibration without randomization: interval forecasts and the power of two choices
We study the problem of making calibrated probabilistic forecasts for a binary sequence generated by an adversarial nature. Following the seminal paper of Foster and Vohra (1998), nature is often modeled as an adaptive adversary who sees all activity of the forecaster except the randomization that the forecaster may deploy. A number of papers have proposed randomized forecasting strategies that achieve an ϵ-calibration error rate of O(1/sqrt T), which we prove is tight in general. On the other hand, it is well known that it is not possible to be calibrated without randomization, or if nature also sees the forecaster's randomization; in both cases the calibration error could be Ω(1). Inspired by the equally seminal works on the "power of two choices" and imprecise probability theory, we study a small variant of the standard online calibration problem. The adversary gives the forecaster the option of making two nearby probabilistic forecasts, or equivalently an interval forecast of small width, and the endpoint closest to the revealed outcome is used to judge calibration. This power of two choices, or imprecise forecast, accords the forecaster with significant power -- we show that a faster ϵ-calibration rate of O(1/T) can be achieved even without deploying any randomization.  more » « less
Award ID(s):
Author(s) / Creator(s):
Date Published:
Journal Name:
35th Annual Conference on Learning Theory
Page Range / eLocation ID:
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background West Nile virus (WNV) is the leading cause of mosquito-borne illness in the continental USA. WNV occurrence has high spatiotemporal variation, and current approaches to targeted control of the virus are limited, making forecasting a public health priority. However, little research has been done to compare strengths and weaknesses of WNV disease forecasting approaches on the national scale. We used forecasts submitted to the 2020 WNV Forecasting Challenge, an open challenge organized by the Centers for Disease Control and Prevention, to assess the status of WNV neuroinvasive disease (WNND) prediction and identify avenues for improvement. Methods We performed a multi-model comparative assessment of probabilistic forecasts submitted by 15 teams for annual WNND cases in US counties for 2020 and assessed forecast accuracy, calibration, and discriminatory power. In the evaluation, we included forecasts produced by comparison models of varying complexity as benchmarks of forecast performance. We also used regression analysis to identify modeling approaches and contextual factors that were associated with forecast skill. Results Simple models based on historical WNND cases generally scored better than more complex models and combined higher discriminatory power with better calibration of uncertainty. Forecast skill improved across updated forecast submissions submitted during the 2020 season. Among models using additional data, inclusion of climate or human demographic data was associated with higher skill, while inclusion of mosquito or land use data was associated with lower skill. We also identified population size, extreme minimum winter temperature, and interannual variation in WNND cases as county-level characteristics associated with variation in forecast skill. Conclusions Historical WNND cases were strong predictors of future cases with minimal increase in skill achieved by models that included other factors. Although opportunities might exist to specifically improve predictions for areas with large populations and low or high winter temperatures, areas with high case-count variability are intrinsically more difficult to predict. Also, the prediction of outbreaks, which are outliers relative to typical case numbers, remains difficult. Further improvements to prediction could be obtained with improved calibration of forecast uncertainty and access to real-time data streams (e.g. current weather and preliminary human cases). Graphical Abstract 
    more » « less
  2. Actor-critic style two-time-scale algorithms are one of the most popular methods in reinforcement learning, and have seen great empirical success. However, their performance is not completely understood theoretically. In this paper, we characterize the global convergence of an online natural actor-critic algorithm in the tabular setting using a single trajectory of samples. Our analysis applies to very general settings, as we only assume ergodicity of the underlying Markov decision process. In order to ensure enough exploration, we employ an ϵ-greedy sampling of the trajectory. For a fixed and small enough exploration parameter ϵ, we show that the two-time-scale natural actor-critic algorithm has a rate of convergence of O~(1/T1/4), where T is the number of samples, and this leads to a sample complexity of O~(1/δ8) samples to find a policy that is within an error of δ from the global optimum. Moreover, by carefully decreasing the exploration parameter ϵ as the iterations proceed, we present an improved sample complexity of O~(1/δ6) for convergence to the global optimum. 
    more » « less
  3. null (Ed.)
    We consider an online binary prediction setting where a forecaster observes a sequence of T bits one by one. Before each bit is revealed, the forecaster predicts the probability that the bit is 1. The forecaster is called well-calibrated if for each p in [0,1], among the n_p bits for which the forecaster predicts probability p, the actual number of ones, m_p, is indeed equal to p*n_p. The calibration error, defined as \sum_p |m_p - p n_p|, quantifies the extent to which the forecaster deviates from being well-calibrated. It has long been known that an O(T^(2/3)) calibration error is achievable even when the bits are chosen adversarially, and possibly based on the previous predictions. However, little is known on the lower bound side, except an sqrt(T) bound that follows from the trivial example of independent fair coin flips. In this paper, we prove an T^(0.528) bound on the calibration error, which is the first bound above the trivial sqrt(T) lowerbound for this setting. The technical contributions of our work include two lower bound techniques, early stopping and sidestepping, which circumvent the obstacles that have previously hindered strong calibration lower bounds. We also propose an abstraction of the prediction setting, termed the Sign-Preservation game, which may be of independent interest. This game has a much smaller state space than the full prediction setting and allows simpler analyses. The T^0.528 lower bound follows from a general reduction theorem that translates lower bounds on the game value of Sign-Preservation into lower bounds on the calibration error. 
    more » « less
  4. Abstract

    When forecasts for a major weather event begin days in advance, updates may be more accurate but inconsistent with the original forecast. Evidence suggests that resulting inconsistency may reduce user trust. However, adding an uncertainty estimate to the forecast may attenuate any loss of trust due to forecast inconsistency, as has been shown with forecast inaccuracy. To evaluate this hypothesis, this experiment tested the impact on trust of adding probabilistic snow-accumulation forecasts to single-value forecasts in a series of original and revised forecast pairs (based on historical records) that varied in both consistency and accuracy. Participants rated their trust in the forecasts and used them to make school-closure decisions. One-half of the participants received single-value forecasts, and one-half also received the probability of 6 in. or more (decision threshold in the assigned task). As with previous research, forecast inaccuracy was detrimental to trust, although probabilistic forecasts attenuated the effect. Moreover, the inclusion of probabilistic forecasts allowed participants to make economically better decisions. Surprisingly, in this study inconsistency increased rather than decreased trust, perhaps because it alerted participants to uncertainty and led them to make more cautious decisions. Furthermore, the positive effect of inconsistency on trust was enhanced by the inclusion of probabilistic forecast. This work has important implications for practical settings, suggesting that both probabilistic forecasts and forecast inconsistency provide useful information to decision-makers. Therefore, members of the public may benefit from well-calibrated uncertainty estimates and newer, more reliable information.

    Significance Statement

    The purpose of this study was to clarify how explicit uncertainty information and forecast inconsistency impact trust and decision-making in the context of sequential forecasts from the same source. This is important because trust is critical for effective risk communication. In the absence of trust, people may not use available information and subsequently may put themselves and others at greater-than necessary risk. Our results suggest that updating forecasts when newer, more reliable information is available and providing reliable uncertainty estimates can support user trust and decision-making.

    more » « less
  5. We consider the vulnerability of fairness-constrained learning to malicious noise in the training data. Konstantinov and Lampert (2021) initiated the study of this question and proved that any proper learner can exhibit high vulnerability when group sizes are imbalanced. Here, we present a more optimistic view, showing that if we allow randomized classifiers, then the landscape is much more nuanced. For example, for Demographic Parity we need only incur a Θ(α) loss in accuracy, where α is the malicious noise rate, matching the best possible even without fairness constraints. For Equal Opportunity, we show we can incur an O(sqrt(α)) loss, and give a matching Ω(sqrt(α)) lower bound. For Equalized Odds and Predictive Parity, however, and adversary can indeed force an Ω(1) loss. The key technical novelty of our work is how randomization can bypass simple 'tricks' an adversary can use to amplify its power. These results provide a more fine-grained view of the sensitivity of fairness-constrained learning to adversarial noise in training data. 
    more » « less