The model selection problem in the pure exploration linear bandit setting is introduced and studied in both the fixed confidence and fixed budget settings. The model selection problem considers a nested sequence of hypothesis classes of increasing complexities. Our goal is to automatically adapt to the instance-dependent complexity measure of the smallest hypothesis class containing the true model, rather than suffering from the complexity measure related to the largest hypothesis class. We provide evidence showing that a standard doubling trick over dimension fails to achieve the optimal instance-dependent sample complexity. Our algorithms define a new optimization problem based on experimental design that leverages the geometry of the action set to efficiently identify a near-optimal hypothesis class. Our fixed budget algorithm uses a novel application of a selection-validation trick in bandits. This provides a new method for the understudied fixed budget setting in linear bandits (even without the added challenge of model selection). We further generalize the model selection problem to the misspecified regime, adapting our algorithms in both fixed confidence and fixed budget settings.
more »
« less
Pareto Optimal Model Selection in Linear Bandits
We study model selection in linear bandits, where the learner must adapt to the dimension (denoted by ๐โ
) of the smallest hypothesis class containing the true linear model while balancing exploration and exploitation. Previous papers provide various guarantees for this model selection problem, but have limitations; i.e., the analysis requires favorable conditions that allow for inexpensive statistical testing to locate the right hypothesis class or are based on the idea of โcorrallingโ multiple base algorithms, which often performs relatively poorly in practice. These works also mainly focus on upper bounds. In this paper, we establish the first lower bound for the model selection problem. Our lower bound implies that, even with a fixed action set, adaptation to the unknown dimension ๐โ
comes at a cost: There is no algorithm that can achieve the regret bound ๐ห(๐โ๐โพโพโพโพโ)
simultaneously for all values of ๐โ
. We propose Pareto optimal algorithms that match the lower bound. Empirical evaluations show that our algorithm enjoys superior performance compared to existing ones.
more »
« less
- Award ID(s):
- 2023239
- PAR ID:
- 10533126
- Publisher / Repository:
- International Conference on Artificial Intelligence and Statistics
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Random dimensionality reduction is a versatile tool for speeding up algorithms for high-dimensional problems. We study its application to two clustering problems: the facility location problem, and the single-linkage hierarchical clustering problem, which is equivalent to computing the minimum spanning tree. We show that if we project the input pointset ๐ onto a random ๐=๐(๐๐)-dimensional subspace (where ๐๐ is the doubling dimension of ๐), then the optimum facility location cost in the projected space approximates the original cost up to a constant factor. We show an analogous statement for minimum spanning tree, but with the dimension ๐ having an extra loglog๐ term and the approximation factor being arbitrarily close to 1. Furthermore, we extend these results to approximating solutions instead of just their costs. Lastly, we provide experimental results to validate the quality of solutions and the speedup due to the dimensionality reduction. Unlike several previous papers studying this approach in the context of ๐-means and ๐-medians, our dimension bound does not depend on the number of clusters but only on the intrinsic dimensionality of ๐.more » « less
-
We consider the bandit problem of selecting K out of N arms at each time step. The joint reward can be a non-linear function of the rewards of the selected individual arms. The direct use of a multi-armed bandit algorithm requires choosing among all possible combinations, making the action space large. To simplify the problem, existing works on combinatorial bandits typically assume feedback as a linear function of individual rewards. In this paper, we prove the lower bound for top-K subset selection with bandit feedback with possibly correlated rewards. We present a novel algorithm for the combinatorial setting without using individual arm feedback or requiring linearity of the reward function. Additionally, our algorithm works on correlated rewards of individual arms. Our algorithm, aDaptive Accept RejecT (DART), sequentially finds good arms and eliminates bad arms based on confidence bounds. DART is computationally efficient and uses storage linear in N. Further, DART achieves a regret bound of ร(KโKNT) for a time horizon T, which matches the lower bound in bandit feedback up to a factor of โlog 2NT. When applied to the problem of cross-selling optimization and maximizing the mean of individual rewards, the performance of the proposed algorithm surpasses that of state-of-the-art algorithms. We also show that DART significantly outperforms existing methods for both linear and non-linear joint reward environments.more » « less
-
In this paper, we study the online multidimensional knapsack problem (called OMdKP) in which there is a knapsack whose capacity is represented in m dimensions, each dimension could have a different capacity. Then, n items with different scalar profit values and m-dimensional weights arrive in an online manner and the goal is to admit or decline items upon their arrival such that the total profit obtained by admitted items is maximized and the capacity of knapsack across all dimensions is respected. This is a natural generalization of the classic single-dimension knapsack problem and finds several relevant applications such as in virtual machine allocation, job scheduling, and all-or-nothing flow maximization over a graph. We develop two algorithms for OMdKP that use linear and exponential reservation functions to make online admission decisions. Our competitive analysis shows that the linear and exponential algorithms achieve the competitive ratios of O(ฮธฮฑ ) and O(ลogล(ฮธฮฑ)), respectively, where ฮฑ is the ratio between the aggregate knapsack capacity and the minimum capacity over a single dimension and ฮธ is the ratio between the maximum and minimum item unit values. We also characterize a lower bound for the competitive ratio of any online algorithm solving OMdKP and show that the competitive ratio of our algorithm with exponential reservation function matches the lower bound up to a constant factor.more » « less
-
Boosting is a celebrated machine learning approach which is based on the idea of combining weak and moderately inaccurate hypotheses to a strong and accurate one. We study boosting under the assumption that the weak hypotheses belong to a class of bounded capacity. This assumption is inspired by the common convention that weak hypotheses are โrules-of-thumbsโ from an โeasy-to-learn classโ. (Schapire and Freund โ12, Shalev-Shwartz and Ben-David โ14.) Formally, we assume the class of weak hypotheses has a bounded VC dimension. We focus on two main questions: (i) Oracle Complexity: How many weak hypotheses are needed in order to produce an accurate hypothesis? We design a novel boosting algorithm and demonstrate that it circumvents a classical lower bound by Freund and Schapire (โ95, โ12). Whereas the lower bound shows that ฮฉ(1/ฮณ2) weak hypotheses with ฮณ-margin are sometimes necessary, our new method requires only ร(1/ฮณ) weak hypothesis, provided that they belong to a class of bounded VC dimension. Unlike previous boosting algorithms which aggregate the weak hypotheses by majority votes, the new boosting algorithm uses more complex (โdeeperโ) aggregation rules. We complement this result by showing that complex aggregation rules are in fact necessary to circumvent the aforementioned lower bound. (ii) Expressivity: Which tasks can be learned by boosting weak hypotheses from a bounded VC class? Can complex concepts that are โfar awayโ from the class be learned? Towards answering the first question we identify a combinatorial-geometric parameter which captures the expressivity of base-classes in boosting. As a corollary we provide an affirmative answer to the second question for many well-studied classes, including half-spaces and decision stumps. Along the way, we establish and exploit connections with Discrepancy Theory.more » « less