skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Active Search using Meta-Bandits
There are many applications where positive instances are rare but important to identify. For example, in NLP, positive sentences for a given relation are rare in a large corpus. Positive data are more informative for learning in these applications, but before one labels a certain amount of data, it is unknown where to find the rare positives. Since random sampling can lead to significant waste in labeling effort, previous ”active search” methods use a single bandit model to learn about the data distribution (exploration) while sampling from the regions potentially containing more positives (exploitation). Many bandit models are possible and a sub-optimal model reduces labeling efficiency, but the optimal model is unknown before any data are labeled. We propose Meta-AS (Meta Active Search) that uses a meta-bandit to evaluate a set of base bandits and aims to label positive examples efficiently, comparing to the optimal base bandit with hindsight. The meta-bandit estimates the mean and variance of the performance of the base bandits, and selects a base bandit to propose what data to label next for exploration or exploitation. The feedback in the labels updates both the base bandits and the meta-bandit for the next round. Meta-AS can accommodate a diverse set of base bandits to explore assumptions about the dataset, without over-committing to a single model before labeling starts. Experiments on five datasets for relation extraction demonstrate that Meta-AS labels positives more efficiently than the base bandits and other bandit selection strategies.  more » « less
Award ID(s):
1931042 1757787
PAR ID:
10184358
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
The 29th ACM International Conference on Information and Knowledge Management (CIKM'2020)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The Prophet Inequality and Pandora's Box problems are fundamental stochastic problem with applications in Mechanism Design, Online Algorithms, Stochastic Optimization, Optimal Stopping, and Operations Research. A usual assumption in these works is that the probability distributions of the n underlying random variables are given as input to the algorithm. Since in practice these distributions need to be learned under limited feedback, we initiate the study of such stochastic problems in the Multi-Armed Bandits model. In the Multi-Armed Bandits model we interact with n unknown distributions over T rounds: in round t we play a policy x(t) and only receive the value of x(t) as feedback. The goal is to minimize the regret, which is the difference over T rounds in the total value of the optimal algorithm that knows the distributions vs. the total value of our algorithm that learns the distributions from the limited feedback. Our main results give near-optimal Õ (poly (n) √T) total regret algorithms for both Prophet Inequality and Pandora's Box. Our proofs proceed by maintaining confidence intervals on the unknown indices of the optimal policy. The exploration-exploitation tradeoff prevents us from directly refining these confidence intervals, so the main technique is to design a regret upper bound function that is learnable while playing low-regret Bandit policies. 
    more » « less
  2. In this paper, we propose and study opportunistic contextual bandits - a special case of contextual bandits where the exploration cost varies under different environmental conditions, such as network load or return variation in recommendations. When the exploration cost is low, so is the actual regret of pulling a sub-optimal arm (e.g., trying a suboptimal recommendation). Therefore, intuitively, we could explore more when the exploration cost is relatively low and exploit more when the exploration cost is relatively high. Inspired by this intuition, for opportunistic contextual bandits with Linear payoffs, we propose an Adaptive Upper-Confidence-Bound algorithm (AdaLinUCB) to adaptively balance the exploration-exploitation trade-off for opportunistic learning. We prove that AdaLinUCB achieves O((log T)^2) problem-dependent regret upper bound, which has a smaller coefficient than that of the traditional LinUCB algorithm. Moreover, based on both synthetic and real-world dataset, we show that AdaLinUCB significantly outperforms other contextual bandit algorithms, under large exploration cost fluctuations. 
    more » « less
  3. We study model selection in linear bandits, where the learner must adapt to the dimension (denoted by 𝑑⋆ ) of the smallest hypothesis class containing the true linear model while balancing exploration and exploitation. Previous papers provide various guarantees for this model selection problem, but have limitations; i.e., the analysis requires favorable conditions that allow for inexpensive statistical testing to locate the right hypothesis class or are based on the idea of “corralling” multiple base algorithms, which often performs relatively poorly in practice. These works also mainly focus on upper bounds. In this paper, we establish the first lower bound for the model selection problem. Our lower bound implies that, even with a fixed action set, adaptation to the unknown dimension 𝑑⋆ comes at a cost: There is no algorithm that can achieve the regret bound 𝑂˜(𝑑⋆𝑇‾‾‾‾√) simultaneously for all values of 𝑑⋆ . We propose Pareto optimal algorithms that match the lower bound. Empirical evaluations show that our algorithm enjoys superior performance compared to existing ones. 
    more » « less
  4. We study a nonparametric contextual bandit problem in which the expected reward functions belong to a Hölder class with smoothness parameter β. We show how this interpolates between two extremes that were previously studied in isolation: nondifferentiable bandits (β at most 1), with which rate-optimal regret is achieved by running separate noncontextual bandits in different context regions, and parametric-response bandits (infinite [Formula: see text]), with which rate-optimal regret can be achieved with minimal or no exploration because of infinite extrapolatability. We develop a novel algorithm that carefully adjusts to all smoothness settings, and we prove its regret is rate-optimal by establishing matching upper and lower bounds, recovering the existing results at the two extremes. In this sense, our work bridges the gap between the existing literature on parametric and nondifferentiable contextual bandit problems and between bandit algorithms that exclusively use global or local information, shedding light on the crucial interplay of complexity and regret in contextual bandits. 
    more » « less
  5. Peng, Jie (Ed.)
    Outcome labeling ambiguity and subjectivity are ubiquitous in real-world datasets. While practitioners commonly combine ambiguous outcome labels for all data points (instances) in an ad hoc way to improve the accuracy of multi-class classification, there lacks a principled approach to guide the label combination for all data points by any optimality criterion. To address this problem, we propose the information-theoretic classification accuracy (ITCA), a criterion that balances the trade-off between prediction accuracy (how well do predicted labels agree with actual labels) and classification resolution (how many labels are predictable), to guide practitioners on how to combine ambiguous outcome labels. To find the optimal label combination indicated by ITCA, we propose two search strategies: greedy search and breadth-first search. Notably, ITCA and the two search strategies are adaptive to all machine-learning classification algorithms. Coupled with a classification algorithm and a search strategy, ITCA has two uses: improving prediction accuracy and identifying ambiguous labels. We first verify that ITCA achieves high accuracy with both search strategies in finding the correct label combinations on synthetic and real data. Then we demonstrate the effectiveness of ITCA in diverse applications, including medical prognosis, cancer survival prediction, user demographics prediction, and cell type classification. We also provide theoretical insights into ITCA by studying the oracle and the linear discriminant analysis classification algorithms. Python package itca (available at https://github.com/JSB-UCLA/ITCA) implements ITCA and the search strategies. 
    more » « less