Title: A Theory of Selective Prediction
We consider a model of selective prediction, where the prediction algorithm is given a data sequence in an online fashion and asked to predict a pre-specified statistic of the upcoming data points. The algorithm is allowed to choose when to make the prediction as well as the length of the prediction window, possibly depending on the observations so far. We prove that, even without any distributional assumption on the input data stream, a large family of statistics can be estimated to non-trivial accuracy. To give one concrete example, suppose that we are given access to an arbitrary binary sequence x_1, ..., x_n of length n. Our goal is to accurately predict the average observation, and we are allowed to choose the window over which the prediction is made: for some t < n and m < n − t, after seeing t observations we predict the average of x_{t+1}, ..., x_{t+m}. This particular problem was first studied in [Dru13] and referred to as the "density prediction game". We show that the expected squared error of our prediction can be bounded by O(1/log n) and prove a matching lower bound, which resolves an open question raised in [Dru13]. This result holds for any sequence (that is not adaptive to when the prediction is made, or the predicted value), and the expectation of the error is with respect to the randomness of the prediction algorithm. Our results apply to more general statistics of a sequence of observations, and we highlight several open directions for future work.
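To make the density prediction game concrete, here is a minimal sketch of one natural randomized strategy (illustrative only, and not necessarily the algorithm analyzed in the paper): pick a scale uniformly at random, watch a block of that length, then predict its empirical average for the next block of the same length.

```python
import random

def selective_density_prediction(stream, n):
    """Toy randomized strategy for the density prediction game (illustrative
    sketch only; not necessarily the algorithm from the paper). Assumes n >= 2.

    Picks a scale k uniformly at random, observes a block of length 2**k,
    then predicts that block's average for the following block of the same
    length. Returns (t, m, prediction): after t observations, the predicted
    average of x_{t+1}, ..., x_{t+m}.
    """
    # Number of available scales; a block of length 2**k must fit twice in n.
    max_k = max(0, n.bit_length() - 2)
    k = random.randint(0, max_k)
    block = 2 ** k

    observed = [next(stream) for _ in range(block)]   # watch x_1, ..., x_block
    prediction = sum(observed) / block                # predict the same average
    return block, block, prediction                   # t = m = block


# Example: an arbitrary (here deterministic) binary sequence of length n.
n = 1024
xs = [(i // 100) % 2 for i in range(n)]
t, m, pred = selective_density_prediction(iter(xs), n)
true_avg = sum(xs[t:t + m]) / m
print(f"after t={t}, predicted {pred:.3f} for next m={m}; true average {true_avg:.3f}")
```

Roughly speaking, averaging over the random choice among the ~log n scales is the kind of argument behind bounds of the O(1/log n) type: an adversarial sequence cannot make the running average shift sharply at every scale at once.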
Award ID(s):
1813049; 1704417
PAR ID:
10098974
Author(s) / Creator(s):
Date Published:
Journal Name:
Conference on Learning Theory (COLT)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Santhanam, Rahul (Ed.)
    {"Abstract":["A long-standing open problem dating back to the 1960s is whether there exists a search-to-decision reduction for the time-bounded Kolmogorov complexity problem - that is, the problem of determining whether the length of the shortest time-t program generating a given string x is at most s.\r\nIn this work, we consider the more "robust" version of the time-bounded Kolmogorov complexity problem, referred to as the GapMINKT problem, where given a size bound s and a running time bound t, the goal is to determine whether there exists a poly(t,|x|)-time program of length s+O(log |x|) that generates x. We present the first non-trivial search-to-decision reduction R for the GapMINKT problem; R has a running-time bound of 2^{ε n} for any ε > 0 and additionally only queries its oracle on "thresholds" s of size s+O(log |x|). As such, we get that any algorithm with running-time (resp. circuit size) 2^{α s} poly(|x|,t,s) for solving GapMINKT (given an instance (x,t,s), yields an algorithm for finding a witness with running-time (resp. circuit size) 2^{(α+ε) s} poly(|x|,t,s).\r\nOur second result is a polynomial-time search-to-decision reduction for the time-bounded Kolmogorov complexity problem in the average-case regime. Such a reduction was recently shown by Liu and Pass (FOCS'20), heavily relying on cryptographic techniques. Our reduction is more direct and additionally has the advantage of being length-preserving, and as such also applies in the exponential time/size regime.\r\nA central component in both of these results is the use of Kolmogorov and Levin’s Symmetry of Information Theorem."]} 
  2. In this work we study d-dimensional majorant properties. We prove that a set of frequencies in $\mathbb{Z}^d$ satisfies the strict majorant property on $L^p([0,1]^d)$ for all p > 0 if and only if the set is affinely independent. We further construct three types of violations of the strict majorant property. Any set of at least d + 2 frequencies in $\mathbb{Z}^d$ violates the strict majorant property on $L^p([0,1]^d)$ for an open interval of $p \not\in 2\mathbb{N}$ of length 2. Any infinite set of frequencies in $\mathbb{Z}^d$ violates the strict majorant property on $L^p([0,1]^d)$ for an infinite sequence of open intervals of $p \not\in 2\mathbb{N}$ of length 2. Finally, given any p > 0 with $p \not\in 2\mathbb{N}$, we exhibit a set of d + 2 frequencies on the moment curve in $\mathbb{R}^d$ that violates the strict majorant property on $L^p([0,1]^d)$.
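For context, one standard formulation of the strict (upper) majorant property, paraphrased here and possibly differing in details from the paper's definition: a set of frequencies $\Lambda \subset \mathbb{Z}^d$ has the strict majorant property on $L^p([0,1]^d)$ if, whenever the coefficients satisfy $|a_\lambda| \le b_\lambda$ for all $\lambda \in \Lambda$,
$$\Big\| \sum_{\lambda \in \Lambda} a_\lambda e^{2\pi i \lambda \cdot x} \Big\|_{L^p([0,1]^d)} \;\le\; \Big\| \sum_{\lambda \in \Lambda} b_\lambda e^{2\pi i \lambda \cdot x} \Big\|_{L^p([0,1]^d)},$$
i.e. the inequality holds with constant exactly 1; the non-strict version allows a constant $C_p$ on the right-hand side.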
  3. For a graph G on n vertices, naively sampling the position of a random walk at time t requires work Ω(t). We desire local access algorithms supporting position_G(t) queries, which return the position of a random walk from some fixed start vertex s at time t, where the joint distribution of returned positions is 1/poly(n)-close to that of a uniformly random walk in ℓ_1 distance. We first give an algorithm for local access to random walks on a given undirected d-regular graph with Õ(√n/(1−λ)) runtime per query, where λ is the second-largest eigenvalue of the random walk matrix of the graph in absolute value. Since random d-regular graphs G(n, d) are expanders with high probability, this gives an Õ(√n) algorithm for a graph drawn from G(n, d) whp, which improves on the naive method for small numbers of queries. We then prove that no algorithm with subconstant error given probe access to an input d-regular graph can have runtime better than Ω(√n/log n) per query in expectation when the input graph is drawn from G(n, d), obtaining a nearly matching lower bound. We further show an Ω(n^{1/4}) runtime per query lower bound even with an oblivious adversary (i.e. when the query sequence is fixed in advance). We then show that for families of graphs with additional group-theoretic structure, dramatically better results can be achieved. We give local access to walks on small-degree abelian Cayley graphs, including cycles and hypercubes, with runtime polylog(n) per query. This also allows for efficient local access to walks on polylog-degree expanders. We show that our techniques apply to graphs with high degree by extending our results to graphs constructed using the tensor product (giving fast local access to walks on degree-n^ε graphs for any ε ∈ (0, 1]) and Cartesian product.
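To make the "local access" model concrete, here is a minimal sketch (not the paper's algorithm) of lazily consistent position(t) queries for a simple ±1 random walk on an n-cycle: sampled times are cached, and the number of +1 steps between two already-sampled times is filled in with a hypergeometric bridge draw, which keeps the joint distribution of answers exact.

```python
import bisect
import numpy as np

class CycleWalkLocalAccess:
    """Lazily consistent position(t) queries for a simple +/-1 random walk
    on the cycle Z_n (illustrative sketch; not the algorithm from the paper).

    We cache S_t = number of +1 steps among the first t steps, at every
    queried time. Between two cached times the conditional law of S_t is
    hypergeometric; beyond the largest cached time it is binomial. The
    position at time t is (start + 2*S_t - t) mod n.
    """

    def __init__(self, n, start=0, seed=None):
        self.n, self.start = n, start
        self.rng = np.random.default_rng(seed)
        self.times = [0]          # sorted list of cached times
        self.plus = {0: 0}        # cached S_t values

    def _sample_S(self, t):
        if t in self.plus:
            return self.plus[t]
        i = bisect.bisect_left(self.times, t)
        t0, s0 = self.times[i - 1], self.plus[self.times[i - 1]]
        if i == len(self.times):
            # No cached time after t: unconditioned Binomial(t - t0, 1/2).
            s = s0 + self.rng.binomial(t - t0, 0.5)
        else:
            # Bridge between cached (t0, s0) and (t1, s1): hypergeometric.
            t1, s1 = self.times[i], self.plus[self.times[i]]
            k = s1 - s0                       # +1 steps in (t0, t1]
            s = s0 + self.rng.hypergeometric(k, (t1 - t0) - k, t - t0)
        bisect.insort(self.times, t)
        self.plus[t] = s
        return s

    def position(self, t):
        s = self._sample_S(t)
        return (self.start + 2 * s - t) % self.n


walk = CycleWalkLocalAccess(n=101, seed=7)
print([walk.position(t) for t in (1000, 10, 500)])  # queries in any order
```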
  4. We present a general, efficient technique for providing contextual predictions that are "multivalid" in various senses, against an online sequence of adversarially chosen examples (x, y). This means that the resulting estimates correctly predict various statistics of the labels y not just marginally, as averaged over the sequence of examples, but also conditionally on x ∈ G for any G belonging to an arbitrary intersecting collection of groups. We provide three instantiations of this framework. The first is mean prediction, which corresponds to an online algorithm satisfying the notion of multicalibration from Hébert-Johnson et al. The second is variance and higher moment prediction, which corresponds to an online algorithm satisfying the notion of mean-conditioned moment multicalibration from Jung et al. Finally, we define a new notion of prediction interval multivalidity, and give an algorithm for finding prediction intervals which satisfy it. Because our algorithms handle adversarially chosen examples, they can equally well be used to predict statistics of the residuals of arbitrary point prediction methods, giving rise to very general techniques for quantifying the uncertainty of predictions of black-box algorithms, even in an online adversarial setting. When instantiated for prediction intervals, this solves a problem similar to that addressed by conformal prediction, but in an adversarial environment and with multivalidity guarantees stronger than simple marginal coverage guarantees.
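As a concrete reading of "multivalid" mean predictions, here is a diagnostic sketch (not the paper's online algorithm): bucket the predictions, and ask that within every group G and every bucket the average prediction is close to the average label. The group names and synthetic data below are purely for illustration.

```python
import numpy as np

def multicalibration_gaps(preds, labels, groups, n_buckets=10):
    """Diagnostic sketch (not the paper's algorithm): for each group G and
    each prediction bucket, report |avg prediction - avg label| over the
    examples falling in both. Small gaps on every (group, bucket) pair with
    enough mass is the flavor of guarantee multicalibration asks for.

    preds, labels: 1-D arrays of equal length with values in [0, 1].
    groups: dict mapping a group name to a boolean membership mask.
    """
    preds, labels = np.asarray(preds, float), np.asarray(labels, float)
    buckets = np.minimum((preds * n_buckets).astype(int), n_buckets - 1)
    gaps = {}
    for name, mask in groups.items():
        mask = np.asarray(mask, bool)
        for b in range(n_buckets):
            sel = mask & (buckets == b)
            if sel.sum() > 0:
                gaps[(name, b)] = abs(preds[sel].mean() - labels[sel].mean())
    return gaps


# Toy usage: two overlapping groups defined by features of x.
rng = np.random.default_rng(0)
x = rng.uniform(size=(1000, 2))
labels = rng.binomial(1, 0.2 + 0.6 * x[:, 0])
preds = np.clip(0.2 + 0.6 * x[:, 0] + rng.normal(0, 0.05, 1000), 0, 1)
groups = {"x0>0.5": x[:, 0] > 0.5, "x1>0.5": x[:, 1] > 0.5}
print(max(multicalibration_gaps(preds, labels, groups).values()))
```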
  5. Lysyanskaya, Anna; Handschuh, Helena (Ed.)
    We study the black-box function inversion problem, which is the problem of finding x ∈ [N] such that f(x) = y, given as input some challenge point y in the image of a function f : [N] → [N], using T oracle queries to f and preprocessed advice σ ∈ {0,1}^S depending on f. We prove a number of new results about this problem, as follows. 1. We show an algorithm that works for any T and S satisfying T·S^2·max{S,T} = Θ̃(N^3). In the important setting when S < T, this improves on the celebrated algorithm of Fiat and Naor [STOC, 1991], which requires T·S^3 ≳ N^3. E.g., Fiat and Naor's algorithm is only non-trivial for S ≳ N^{2/3}, while our algorithm gives a non-trivial tradeoff for any S ≳ N^{1/2}. (Our algorithm and analysis are quite simple. As a consequence of this, we also give a self-contained and simple proof of Fiat and Naor's original result, with certain optimizations left out for simplicity.) 2. We show a non-adaptive algorithm (i.e., an algorithm whose i-th query x_i is chosen based entirely on the advice σ and the challenge y, and not on f(x_1), ..., f(x_{i−1})) that works for any T and S satisfying S = Θ̃(N·log(N/T)), giving the first non-trivial non-adaptive algorithm for this problem. E.g., setting T = N/polylog(N) gives S = Θ̃(N·log log N). This answers a question due to Corrigan-Gibbs and Kogan [TCC, 2019], who asked whether it was possible for a non-adaptive algorithm to work with parameters T and S satisfying T + S/log N = o(N). We also observe that our non-adaptive algorithm is what we call a guess-and-check algorithm, that is, it is non-adaptive and its final output is always one of the oracle queries x_1, ..., x_T. For guess-and-check algorithms, we prove a matching lower bound, therefore completely characterizing the achievable parameters (S, T) for this natural class of algorithms. (Corrigan-Gibbs and Kogan showed that any such lower bound for arbitrary non-adaptive algorithms would imply new circuit lower bounds.) 3. We show equivalence between function inversion and a natural decision version of the problem in both the worst case and the average case, and similarly for functions f : [N] → [M] with different ranges. All of the above results are most naturally described in a model with shared randomness (i.e., random coins shared between the preprocessing algorithm and the online algorithm). However, as an additional contribution, we show (using a technique from communication complexity due to Newman [IPL, 1991]) how to generically convert any algorithm that uses shared randomness into one that does not.
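For intuition about preprocessing/query tradeoffs of this kind, here is the classic Hellman-style chain idea that Fiat–Naor builds on (a sketch for intuition only, not the algorithm from this paper): the advice stores endpoints of length-T chains of iterated f, and the online phase walks forward from y for at most T steps looking for a stored endpoint, then replays that chain to recover a preimage. It is cleanest when f is a permutation.

```python
import random

def build_advice(f, N, num_chains, chain_len, seed=0):
    """Preprocessing: store (endpoint -> start) for chains of iterated f.
    Advice size is ~num_chains entries; online time is ~chain_len queries.
    Classic Hellman-style sketch, for intuition only."""
    rng = random.Random(seed)
    advice = {}
    for _ in range(num_chains):
        start = rng.randrange(N)
        x = start
        for _ in range(chain_len):
            x = f(x)
        advice[x] = start
    return advice

def invert(f, y, advice, chain_len):
    """Online phase: walk forward from y; if we hit a stored endpoint,
    replay its chain to find a preimage of y. Returns x with f(x) == y,
    or None if y is not covered by any stored chain."""
    z = y
    for _ in range(chain_len):
        if z in advice:                 # candidate chain found
            x = advice[z]
            for _ in range(chain_len):  # replay the chain from its start
                if f(x) == y:
                    return x
                x = f(x)
        z = f(z)
    return None

# Toy demo on a random permutation of [N].
N = 1 << 12
perm = list(range(N))
random.Random(1).shuffle(perm)
f = lambda x: perm[x]
advice = build_advice(f, N, num_chains=200, chain_len=64)
y = f(1234)
x = invert(f, y, advice, chain_len=64)
print(x, x is not None and f(x) == y)
```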