Title: Pareto Optimal Model Selection in Linear Bandits
We study model selection in linear bandits, where the learner must adapt to the dimension (denoted by d⋆) of the smallest hypothesis class containing the true linear model while balancing exploration and exploitation. Previous papers provide various guarantees for this model selection problem, but they have limitations: either the analysis requires favorable conditions that allow for inexpensive statistical testing to locate the right hypothesis class, or it is based on the idea of "corralling" multiple base algorithms, which often performs relatively poorly in practice. These works also mainly focus on upper bounds. In this paper, we establish the first lower bound for the model selection problem. Our lower bound implies that, even with a fixed action set, adaptation to the unknown dimension d⋆ comes at a cost: there is no algorithm that can achieve the regret bound Õ(√(d⋆T)) simultaneously for all values of d⋆. We propose Pareto optimal algorithms that match the lower bound. Empirical evaluations show that our algorithm enjoys superior performance compared to existing ones.
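To make the cost of dimension adaptation concrete, the following minimal simulation sketch (illustrative only; this is not the paper's Pareto optimal algorithm, and all parameter values are arbitrary choices) contrasts a standard LinUCB learner run in the full ambient dimension d with an oracle that knows the true dimension d⋆, exhibiting the gap between the Õ(√(dT)) and Õ(√(d⋆T)) regret regimes that model selection tries to close.

```python
# Minimal simulation sketch (illustrative only; not the paper's algorithm).
import numpy as np

def linucb_regret(features, theta, horizon, lam=1.0, alpha=1.0, seed=0):
    """Run LinUCB on a fixed action set and return cumulative pseudo-regret."""
    rng = np.random.default_rng(seed)
    dim = features.shape[1]
    A = lam * np.eye(dim)        # regularized Gram matrix
    b = np.zeros(dim)            # running sum of reward-weighted features
    means = features @ theta
    opt = means.max()
    regret = 0.0
    for _ in range(horizon):
        A_inv = np.linalg.inv(A)
        theta_hat = A_inv @ b
        # optimistic index: estimated mean plus a confidence width
        widths = np.sqrt(np.einsum("ij,jk,ik->i", features, A_inv, features))
        a = int(np.argmax(features @ theta_hat + alpha * widths))
        x = features[a]
        reward = x @ theta + rng.normal(0.0, 0.1)
        A += np.outer(x, x)
        b += reward * x
        regret += opt - means[a]
    return regret

d, d_star, T = 50, 5, 2000
rng = np.random.default_rng(1)
X = rng.normal(size=(100, d)) / np.sqrt(d)   # fixed action set
theta = np.zeros(d)
theta[:d_star] = rng.normal(size=d_star)     # true model uses only d_star coordinates

full = linucb_regret(X, theta, T)                         # ambient dimension d
oracle = linucb_regret(X[:, :d_star], theta[:d_star], T)  # oracle knows d_star
print(f"regret at d={d}: {full:.1f};  regret at oracle d_star={d_star}: {oracle:.1f}")
```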
Award ID(s):
2023239
PAR ID:
10533126
Author(s) / Creator(s):
Publisher / Repository:
International Conference on Artificial Intelligence and Statistics
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The model selection problem in the pure exploration linear bandit setting is introduced and studied in both the fixed confidence and fixed budget settings. The model selection problem considers a nested sequence of hypothesis classes of increasing complexity. Our goal is to automatically adapt to the instance-dependent complexity measure of the smallest hypothesis class containing the true model, rather than suffering the complexity measure associated with the largest hypothesis class. We provide evidence that a standard doubling trick over dimension fails to achieve the optimal instance-dependent sample complexity. Our algorithms define a new optimization problem based on experimental design that leverages the geometry of the action set to efficiently identify a near-optimal hypothesis class. Our fixed budget algorithm uses a novel application of a selection-validation trick in bandits. This provides a new method for the understudied fixed budget setting in linear bandits (even without the added challenge of model selection). We further generalize the model selection problem to the misspecified regime, adapting our algorithms in both the fixed confidence and fixed budget settings.
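As a concrete illustration of the experimental design ingredient mentioned above, the sketch below computes a G-optimal design over a fixed action set via a simple Frank-Wolfe iteration. This is a generic building block for pure exploration in linear bandits, not the paper's full model selection algorithm, and the iteration count is an arbitrary choice. By the Kiefer-Wolfowitz theorem, the optimal value (the maximum leverage) equals the dimension d.

```python
# G-optimal design via Frank-Wolfe (a generic ingredient, not the paper's algorithm).
import numpy as np

def g_optimal_design(X, iters=500):
    """Minimize max_x x^T A(lam)^{-1} x over designs lam on the rows of X."""
    n, d = X.shape
    lam = np.full(n, 1.0 / n)
    for t in range(iters):
        A_inv = np.linalg.inv(X.T @ (lam[:, None] * X))
        leverage = np.einsum("ij,jk,ik->i", X, A_inv, X)
        i = int(np.argmax(leverage))      # most under-covered action
        step = 2.0 / (t + 2)              # standard Frank-Wolfe step size
        lam *= 1.0 - step
        lam[i] += step
    return lam

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
lam = g_optimal_design(X)
A_inv = np.linalg.inv(X.T @ (lam[:, None] * X))
# Kiefer-Wolfowitz: the optimal max leverage equals the dimension d = 4.
print("max leverage:", np.einsum("ij,jk,ik->i", X, A_inv, X).max())
```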
  2. Boosting is a celebrated machine learning approach which is based on the idea of combining weak and moderately inaccurate hypotheses into a strong and accurate one. We study boosting under the assumption that the weak hypotheses belong to a class of bounded capacity. This assumption is inspired by the common convention that weak hypotheses are "rules of thumb" from an "easy-to-learn class". (Schapire and Freund '12, Shalev-Shwartz and Ben-David '14.) Formally, we assume the class of weak hypotheses has a bounded VC dimension. We focus on two main questions: (i) Oracle Complexity: How many weak hypotheses are needed in order to produce an accurate hypothesis? We design a novel boosting algorithm and demonstrate that it circumvents a classical lower bound by Freund and Schapire ('95, '12). Whereas the lower bound shows that Ω(1/γ²) weak hypotheses with γ-margin are sometimes necessary, our new method requires only Õ(1/γ) weak hypotheses, provided that they belong to a class of bounded VC dimension. Unlike previous boosting algorithms, which aggregate the weak hypotheses by majority votes, the new boosting algorithm uses more complex ("deeper") aggregation rules. We complement this result by showing that complex aggregation rules are in fact necessary to circumvent the aforementioned lower bound. (ii) Expressivity: Which tasks can be learned by boosting weak hypotheses from a bounded VC class? Can complex concepts that are "far away" from the class be learned? Towards answering the first question we identify a combinatorial-geometric parameter which captures the expressivity of base-classes in boosting. As a corollary we provide an affirmative answer to the second question for many well-studied classes, including half-spaces and decision stumps. Along the way, we establish and exploit connections with Discrepancy Theory.
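For orientation, the sketch below implements classical AdaBoost over decision stumps, i.e., the majority-vote-style aggregation that this work contrasts with its deeper aggregation rules; it is a baseline illustration, not the paper's new algorithm. The target is a half-space, one of the classes covered by the expressivity result, and the sample size, quantile grid, and round count are arbitrary choices.

```python
# Classical AdaBoost over decision stumps (baseline illustration only).
import numpy as np

def stump_predict(X, feat, thresh, sign):
    return sign * np.where(X[:, feat] > thresh, 1, -1)

def fit_stump(X, y, w):
    """Best weighted stump over a quantile grid (the bounded-VC weak learner here)."""
    best = (np.inf, 0, 0.0, 1)
    for feat in range(X.shape[1]):
        for thresh in np.quantile(X[:, feat], np.linspace(0.05, 0.95, 19)):
            for sign in (1, -1):
                err = w @ (stump_predict(X, feat, thresh, sign) != y)
                if err < best[0]:
                    best = (err, feat, thresh, sign)
    return best

def adaboost(X, y, rounds=20):
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        err, feat, thresh, sign = fit_stump(X, y, w)
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * stump_predict(X, feat, thresh, sign))
        w /= w.sum()                       # reweight toward misclassified points
        ensemble.append((alpha, feat, thresh, sign))
    return ensemble

def predict(ensemble, X):
    # weighted majority vote over the stumps
    return np.sign(sum(a * stump_predict(X, f, t, s) for a, f, t, s in ensemble))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)   # a half-space target
model = adaboost(X, y)
print("train accuracy:", (predict(model, X) == y).mean())
```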
  3. Random dimensionality reduction is a versatile tool for speeding up algorithms for high-dimensional problems. We study its application to two clustering problems: the facility location problem, and the single-linkage hierarchical clustering problem, which is equivalent to computing the minimum spanning tree. We show that if we project the input pointset X onto a random d = O(d_X)-dimensional subspace (where d_X is the doubling dimension of X), then the optimum facility location cost in the projected space approximates the original cost up to a constant factor. We show an analogous statement for the minimum spanning tree, but with the dimension d having an extra log log n term and the approximation factor being arbitrarily close to 1. Furthermore, we extend these results to approximating solutions instead of just their costs. Lastly, we provide experimental results to validate the quality of solutions and the speedup due to the dimensionality reduction. Unlike several previous papers studying this approach in the context of k-means and k-medians, our dimension bound does not depend on the number of clusters but only on the intrinsic dimensionality of X.
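A minimal empirical illustration of the phenomenon, assuming a Gaussian (Johnson-Lindenstrauss style) random projection rather than the paper's doubling-dimension-based construction: project the pointset and compare the minimum spanning tree cost before and after. The ambient and target dimensions below are arbitrary choices.

```python
# Random projection vs. MST cost (empirical sketch; Gaussian projection assumed).
import numpy as np

def mst_cost(P):
    """Prim's algorithm on the complete Euclidean graph; returns total MST weight."""
    n = len(P)
    in_tree = np.zeros(n, dtype=bool)
    dist = np.full(n, np.inf)
    dist[0] = 0.0
    total = 0.0
    for _ in range(n):
        i = int(np.argmin(np.where(in_tree, np.inf, dist)))  # nearest outside node
        total += dist[i]
        in_tree[i] = True
        dist = np.minimum(dist, np.linalg.norm(P - P[i], axis=1))
    return total

rng = np.random.default_rng(0)
n, D, d = 500, 100, 20                      # ambient and target dimensions (illustrative)
X = rng.normal(size=(n, D))
G = rng.normal(size=(D, d)) / np.sqrt(d)    # scaled Gaussian projection matrix
print("MST cost, original :", round(mst_cost(X), 2))
print("MST cost, projected:", round(mst_cost(X @ G), 2))
```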
  4. In this paper, we study the online multidimensional knapsack problem (called OMdKP), in which there is a knapsack whose capacity is represented in m dimensions, each of which may have a different capacity. Then, n items with different scalar profit values and m-dimensional weights arrive in an online manner, and the goal is to admit or decline items upon their arrival such that the total profit obtained by admitted items is maximized and the capacity of the knapsack is respected across all dimensions. This is a natural generalization of the classic single-dimension knapsack problem and finds several relevant applications, such as virtual machine allocation, job scheduling, and all-or-nothing flow maximization over a graph. We develop two algorithms for OMdKP that use linear and exponential reservation functions to make online admission decisions. Our competitive analysis shows that the linear and exponential algorithms achieve competitive ratios of O(θα) and O(log(θα)), respectively, where α is the ratio between the aggregate knapsack capacity and the minimum capacity over a single dimension and θ is the ratio between the maximum and minimum item unit values. We also characterize a lower bound for the competitive ratio of any online algorithm solving OMdKP and show that the competitive ratio of our algorithm with exponential reservation function matches the lower bound up to a constant factor.
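The sketch below shows the general shape of threshold-based online admission with an exponential reservation (pricing) function; the exact functional form, its parameters, and the instance generator here are illustrative assumptions, not the paper's algorithm or analysis. The idea is that the price per unit of each dimension grows exponentially with that dimension's utilization, so cheap items are admitted early and only high-value items are admitted once capacity becomes scarce.

```python
# Threshold-based online admission with an exponential reservation price
# (the functional form and all parameters below are illustrative assumptions).
import numpy as np

def online_mdknapsack(items, capacity, theta_min, theta_max):
    m = len(capacity)
    used = np.zeros(m)
    profit = 0.0
    for value, weight in items:        # items arrive one at a time
        z = used / capacity            # per-dimension utilization in [0, 1]
        # reservation price per unit grows exponentially with utilization
        price = theta_min * (theta_max / theta_min) ** z
        if value >= price @ weight and np.all(used + weight <= capacity):
            used += weight
            profit += value
    return profit

rng = np.random.default_rng(0)
m, n = 3, 500
capacity = np.full(m, 50.0)
items = []
for _ in range(n):
    w = rng.uniform(0.1, 1.0, size=m)
    unit_value = rng.uniform(1.0, 10.0)     # theta = 10: max/min unit-value ratio
    items.append((unit_value * w.sum(), w))
print("online profit:", round(online_mdknapsack(items, capacity, 1.0, 10.0), 2))
```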
  5. Chambers, Erin W.; Gudmundsson, Joachim (Ed.)
    We present a (combinatorial) algorithm with running time close to O(n^d) for computing the minimum directed L∞ Hausdorff distance between two sets of n points under translations in any constant dimension d. This substantially improves the best previous time bound near O(n^{5d/4}) by Chew, Dor, Efrat, and Kedem from more than twenty years ago. Our solution is obtained by a new generalization of Chan's algorithm [FOCS'13] for Klee's measure problem. To complement this algorithmic result, we also prove a nearly matching conditional lower bound close to Ω(n^d) for combinatorial algorithms, under the Combinatorial k-Clique Hypothesis.
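For small inputs, a naive baseline helps fix ideas: evaluate the directed L∞ Hausdorff distance over candidate translations of the form b − a. This heuristic is not guaranteed to find the exact optimum and is nowhere near the paper's near-O(n^d) algorithm; it only illustrates the objective being minimized.

```python
# Naive baseline for the directed L_inf Hausdorff distance under translation.
import numpy as np

def directed_hausdorff_linf(A, B):
    """max over a in A of min over b in B of ||a - b||_inf."""
    pairwise = np.abs(A[:, None, :] - B[None, :, :]).max(axis=2)
    return pairwise.min(axis=1).max()

def min_hausdorff_under_translation(A, B):
    """Try every pairwise difference b - a as a candidate translation.
    A heuristic O(n^4) baseline, not the exact optimum in general."""
    return min(directed_hausdorff_linf(A + (b - a), B) for a in A for b in B)

rng = np.random.default_rng(0)
A = rng.uniform(size=(15, 2))
B = rng.uniform(size=(20, 2))
print("min directed Hausdorff over candidates:",
      round(min_hausdorff_under_translation(A, B), 4))
```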