This content will become publicly available on September 30, 2025

Title: Bandits with Stochastic Experts: Constant Regret, Empirical Experts and Episodes
We study a variant of the contextual bandit problem where an agent can intervene through a set of stochastic expert policies. Given a fixed context, each expert samples actions from a fixed conditional distribution. The agent seeks to remain competitive with the “best” among the given set of experts. We propose the Divergence-based Upper Confidence Bound (D-UCB) algorithm, which uses importance sampling to share information across experts, and provide horizon-independent constant regret bounds that only scale linearly in the number of experts. We also provide the Empirical D-UCB (ED-UCB) algorithm that can function with only approximate knowledge of expert distributions. Further, we investigate the episodic setting where the agent interacts with an environment that changes over episodes. Each episode can have different context and reward distributions, resulting in the best expert changing across episodes. We show that by bootstrapping from \(\mathcal{O}(N\log(NT^2\sqrt{E}))\) samples, ED-UCB guarantees a regret that scales as \(\mathcal{O}(E(N+1) + \frac{N\sqrt{E}}{T^2})\) for \(N\) experts over \(E\) episodes, each of length \(T\). Finally, we empirically validate our findings through simulations.
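The idea of sharing information across experts through importance sampling can be illustrated with a minimal sketch. It is not the D-UCB estimator from the paper (the abstract does not specify it); it is a plain, unclipped importance-weighted mean on a hypothetical toy problem. The two contexts, two actions, the expert tables `expert_a` and `expert_b`, and the `mean_reward` table are all assumptions made for illustration. Samples are collected while following `expert_a`, and reweighting them by the ratio of action probabilities gives an estimate of the mean reward of `expert_b` without ever playing it.

```python
import random

def importance_weighted_mean(samples, target, behavior):
    """Estimate the target expert's mean reward from samples drawn under
    the behavior expert. Each sample is (context, action, reward); each
    expert maps context -> {action: probability}."""
    total = 0.0
    for x, a, r in samples:
        # Reweight the observed reward by how much more (or less) likely
        # the target expert was to choose this action in this context.
        total += r * target[x][a] / behavior[x][a]
    return total / len(samples)

def draw_action(expert, x, rng):
    """Sample an action from the expert's conditional distribution given x."""
    actions = list(expert[x])
    weights = [expert[x][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

# Hypothetical toy problem: contexts are uniform over {0, 1}, rewards are Bernoulli.
expert_a = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.5, 1: 0.5}}
expert_b = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.1, 1: 0.9}}
mean_reward = {(0, 0): 0.2, (0, 1): 0.9, (1, 0): 0.4, (1, 1): 0.6}

def true_value(expert):
    """Exact mean reward of an expert under the toy model above."""
    return sum(0.5 * expert[x][a] * mean_reward[(x, a)]
               for x in (0, 1) for a in (0, 1))

rng = random.Random(0)
samples = []
for _ in range(20000):
    x = rng.randrange(2)                   # uniform context
    a = draw_action(expert_a, x, rng)      # only expert_a is ever played
    r = 1.0 if rng.random() < mean_reward[(x, a)] else 0.0
    samples.append((x, a, r))

estimate_b = importance_weighted_mean(samples, expert_b, expert_a)
print("estimated value of expert_b:", estimate_b)
print("exact value of expert_b:    ", true_value(expert_b))
```

The estimate is unbiased only when the behavior expert gives positive probability to every action the target expert can take. Its variance grows with the size of the probability ratios, which is one reason a practical algorithm would control these weights, for example through a divergence measure between experts.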
Award ID(s): 2107037
PAR ID: 10567978
Author(s) / Creator(s): ; ; ; ;
Publisher / Repository: ACM
Date Published:
Journal Name: ACM Transactions on Modeling and Performance Evaluation of Computing Systems
Volume: 9
Issue: 3
ISSN: 2376-3639
Page Range / eLocation ID: 1 to 33
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We prove novel algorithmic guarantees for several online problems in the smoothed analysis model. In this model, at each time step an adversary chooses an input distribution with density function bounded above pointwise by \(\tfrac{1}{\sigma}\) times that of the uniform distribution; nature then samples an input from this distribution. Here, σ is a parameter that interpolates between the extremes of worst-case and average-case analysis. Crucially, our results hold for adaptive adversaries that can base their choice of input distribution on the decisions of the algorithm and the realizations of the inputs in the previous time steps. An adaptive adversary can nontrivially correlate inputs at different time steps with each other and with the algorithm’s current state; this appears to rule out the standard proof approaches in smoothed analysis. This paper presents a general technique for proving smoothed algorithmic guarantees against adaptive adversaries, in effect reducing the setting of an adaptive adversary to the much simpler case of an oblivious adversary (i.e., an adversary that commits in advance to the entire sequence of input distributions). We apply this technique to prove strong smoothed guarantees for three different problems: (1) Online learning: We consider the online prediction problem, where instances are generated from an adaptive sequence of σ-smooth distributions and the hypothesis class has VC dimension \(d\). We bound the regret by \(\tilde{O}(\sqrt{T d\ln(1/\sigma)} + d\ln(T/\sigma))\) and provide a near-matching lower bound. Our result shows that under smoothed analysis, learnability against adaptive adversaries is characterized by the finiteness of the VC dimension. This is as opposed to the worst-case analysis, where online learnability is characterized by Littlestone dimension (which is infinite even in the extremely restricted case of one-dimensional threshold functions). Our results fully answer an open question of Rakhlin et al. [64]. (2) Online discrepancy minimization: We consider the setting of the online Komlós problem, where the input is generated from an adaptive sequence of σ-smooth and isotropic distributions on the \(\ell_2\) unit ball. We bound the \(\ell_\infty\) norm of the discrepancy vector by \(\tilde{O}(\ln^2(\frac{nT}{\sigma}))\). This is as opposed to the worst-case analysis, where the tight discrepancy bound is \(\Theta(\sqrt{T/n})\). We show such \(\mathrm{polylog}(nT/\sigma)\) discrepancy guarantees are not achievable for non-isotropic σ-smooth distributions. (3) Dispersion in online optimization: We consider online optimization with piecewise Lipschitz functions where functions with ℓ discontinuities are chosen by a smoothed adaptive adversary and show that the resulting sequence is \(({\sigma}/{\sqrt{T\ell}}, \tilde{O}(\sqrt{T\ell}))\)-dispersed. That is, every ball of radius \({\sigma}/{\sqrt{T\ell}}\) is split by \(\tilde{O}(\sqrt{T\ell})\) of the partitions made by these functions. This result matches the dispersion parameters of Balcan et al. [13] for oblivious smooth adversaries, up to logarithmic factors. On the other hand, worst-case sequences are trivially \((0, T)\)-dispersed.
  2. A constraint satisfaction problem (CSP), \(\textsf{Max-CSP}(\mathcal{F})\), is specified by a finite set of constraints \(\mathcal{F}\subseteq \lbrace [q]^k \rightarrow \lbrace 0,1\rbrace \rbrace\) for positive integers \(q\) and \(k\). An instance of the problem on \(n\) variables is given by \(m\) applications of constraints from \(\mathcal{F}\) to subsequences of the \(n\) variables, and the goal is to find an assignment to the variables that satisfies the maximum number of constraints. In the (γ, β)-approximation version of the problem for parameters 0 ≤ β ≤ γ ≤ 1, the goal is to distinguish instances where at least γ fraction of the constraints can be satisfied from instances where at most β fraction of the constraints can be satisfied. In this work, we consider the approximability of this problem in the context of sketching algorithms and give a dichotomy result. Specifically, for every family \(\mathcal{F}\) and every β < γ, we show that either a linear sketching algorithm solves the problem in polylogarithmic space or the problem is not solvable by any sketching algorithm in \(o(\sqrt{n})\) space. In particular, we give non-trivial approximation algorithms using polylogarithmic space for infinitely many constraint satisfaction problems. We also extend previously known lower bounds for general streaming algorithms to a wide variety of problems, and in particular the case of \(q=k=2\), where we get a dichotomy, and the case when the satisfying assignments of the constraints of \(\mathcal{F}\) support a distribution on \([q]^k\) with uniform marginals. Prior to this work, other than sporadic examples, the only systematic classes of CSPs that were analyzed considered the setting of Boolean variables \(q=2\), binary constraints \(k=2\), and singleton families \(|\mathcal{F}|=1\) and only considered the setting where constraints are placed on literals rather than variables. Our positive results show wide applicability of bias-based algorithms used previously by [47] and [41], which we extend to include richer norm estimation algorithms, by giving a systematic way to discover biases. Our negative results combine the Fourier analytic methods of [56], which we extend to a wider class of CSPs, with a rich collection of reductions among communication complexity problems that lie at the heart of the negative results. In particular, previous works used Fourier analysis over the Boolean cube to initiate their results and the results seemed particularly tailored to functions on Boolean literals (i.e., with negations). Our techniques surprisingly allow us to get to general \(q\)-ary CSPs without negations by appealing to the same Fourier analytic starting point over Boolean hypercubes.
  3. This paper introduces a new data-structural object that we call the tiny pointer. In many applications, traditional \(\log n\)-bit pointers can be replaced with \(o(\log n)\)-bit tiny pointers at the cost of only a constant-factor time overhead and a small probability of failure. We develop a comprehensive theory of tiny pointers, and give optimal constructions for both fixed-size tiny pointers (i.e., settings in which all of the tiny pointers must be the same size) and variable-size tiny pointers (i.e., settings in which the average tiny-pointer size must be small, but some tiny pointers can be larger). If a tiny pointer references an item in an array filled to load factor \(1-\delta\), then the optimal tiny-pointer size is \(\Theta(\log\log\log n+\log\delta^{-1})\) bits in the fixed-size case, and \(\Theta(\log\delta^{-1})\) expected bits in the variable-size case. Our tiny-pointer constructions also require us to revisit several classic problems having to do with balls and bins; these results may be of independent interest. We apply tiny pointers to five classic data-structure problems. We show that: A data structure storing \(n\) \(v\)-bit values for \(n\) keys with constant-factor time modifications/queries can be implemented to take space \(nv+O(n\log^{(r)}n)\) bits, for any constant \(r > 0\), as long as the user stores a tiny pointer of expected size \(O(1)\) with each key; here, \(\log^{(r)}n\) is the \(r\)-th iterated logarithm. Any binary search tree can be made succinct, meaning that it achieves \((1+o(1))\) times the optimal space, with constant-factor time overhead, and can even be made to be within \(O(n)\) bits of optimal if we allow for \(O(\log^{*}n)\)-time modifications; this holds even for rotation-based trees such as the splay tree and the red-black tree. Any fixed-capacity key-value dictionary can be made stable (i.e., items do not move once inserted) with constant-factor time overhead and \((1+o(1))\)-factor space overhead. Any key-value dictionary that requires uniform-size values can be made to support arbitrary-size values with constant-factor time overhead and with an additional space consumption of \(\log^{(r)}n+O(\log j)\) bits per \(j\)-bit value for an arbitrary constant \(r > 0\) of our choice. Given an external-memory array \(A\) of size \((1+\varepsilon)n\) containing a dynamic set of up to \(n\) key-value pairs, it is possible to maintain an internal-memory stash of size \(O(n\log\varepsilon^{-1})\) bits so that the location of any key-value pair in \(A\) can be computed in constant time (and with no IOs). In each case, tiny pointers allow us to take a natural space-inefficient solution that uses pointers and make it space-efficient for free.
  4. Given a weighted, ordered query set \(Q\) and a partition of \(Q\) into classes, we study the problem of computing a minimum-cost decision tree that, given any query \(q\in Q\), uses equality tests and less-than tests to determine \(q\)'s class. Such a tree can be faster and smaller than a conventional search tree and smaller than a lookup table (both of which must identify \(q\), not just its class). We give the first polynomial-time algorithm for the problem. The algorithm extends naturally to the setting where each query has multiple allowed classes.
  5. Current computing device authentication often presents accessibility barriers for people with upper extremity impairments (UEI). In this article, we present a framework called Accessible image-Association-based Authentication for Computing devices (A3C), a novel recognition-based graphical authentication framework specifically designed for people with UEI to authenticate to their computing devices. A3C requires users to provide a set of primary images that are recognizable to them and subsequently associate each primary image with a secondary image. To evaluate the efficacy of the A3C framework, we instantiated the framework by implementing a version of A3C called A3C-FA, which uses images of faces of people the user knows as the primary image and animal images as the secondary image. We then performed three studies to evaluate A3C-FA: a shoulder-surfing attack study (\(N = 319\)), a close-adversary attack study (\(N = 268\)), and a usability study with people with UEI (\(N = 14\)). We found that A3C was robust against both shoulder-surfing and close-adversary attacks. We also performed a detailed study to evaluate the accessibility of A3C-FA. Our participants reported that A3C-FA was more usable and more secure than the authentication approaches with which they were familiar. Based on these findings, we suggest four areas of future research to further improve the design of the A3C framework.
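
To make the decision-tree problem in item 4 concrete, the sketch below evaluates a hand-built tree that uses one equality test and one less-than test. It does not implement the polynomial-time algorithm mentioned in that abstract; it only shows how such a tree classifies a query and how a weighted cost (here assumed to be the weighted number of tests performed) could be computed. The query set, class labels, weights, and the `Leaf`/`Test` types are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    label: str               # class reported for every query reaching this leaf

@dataclass
class Test:
    op: str                  # "==" for an equality test, "<" for a less-than test
    key: int                 # value the query is compared against
    yes: "Node"              # subtree followed when the test succeeds
    no: "Node"               # subtree followed when the test fails

Node = Union[Leaf, Test]

def classify(tree, q):
    """Return (class label, number of tests performed) for query q."""
    node, tests = tree, 0
    while isinstance(node, Test):
        tests += 1
        passed = (q == node.key) if node.op == "==" else (q < node.key)
        node = node.yes if passed else node.no
    return node.label, tests

def weighted_cost(tree, weights):
    """Sum over queries of weight times the number of tests used."""
    return sum(w * classify(tree, q)[1] for q, w in weights.items())

# Hypothetical instance: ordered queries 1..6, partitioned into classes A, B, C.
classes = {1: "A", 2: "A", 3: "B", 4: "C", 5: "C", 6: "C"}
weights = {1: 0.1, 2: 0.1, 3: 0.5, 4: 0.1, 5: 0.1, 6: 0.1}

# Test the heavy query 3 first with an equality test, then split the rest.
tree = Test("==", 3, Leaf("B"), Test("<", 3, Leaf("A"), Leaf("C")))

for q in sorted(classes):
    print(q, classify(tree, q))
print("weighted cost:", weighted_cost(tree, weights))
```

The tree never distinguishes query 4 from query 5 or 6, since it only needs to report their shared class. This is the sense in which such a tree can be smaller than a search tree that must identify the query itself.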