The model selection problem in the pure exploration linear bandit setting is introduced and studied in both the fixed confidence and fixed budget settings. The model selection problem considers a nested sequence of hypothesis classes of increasing complexity. Our goal is to adapt automatically to the instance-dependent complexity measure of the smallest hypothesis class containing the true model, rather than suffering the complexity measure associated with the largest hypothesis class. We provide evidence that a standard doubling trick over dimension fails to achieve the optimal instance-dependent sample complexity. Our algorithms define a new optimization problem based on experimental design that leverages the geometry of the action set to efficiently identify a near-optimal hypothesis class. Our fixed budget algorithm uses a novel application of a selection-validation trick in bandits, providing a new method for the understudied fixed budget setting in linear bandits (even without the added challenge of model selection). We further generalize the model selection problem to the misspecified regime, adapting our algorithms in both the fixed confidence and fixed budget settings.
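The abstract's experimental-design machinery is not spelled out on this page, but the classical G-optimal design subproblem that geometry-aware pure exploration algorithms typically build on can be sketched in a few lines. The function name, the Frank-Wolfe solver, and the regularization below are illustrative assumptions, not the paper's actual optimization problem.

```python
import numpy as np

def g_optimal_design(X, n_iters=1000):
    """Frank-Wolfe sketch of a G-optimal design over a finite action set.

    X: (n, d) array of actions. Returns weights lam that approximately
    minimize max_x x^T A(lam)^{-1} x, where A(lam) = sum_i lam[i] x_i x_i^T.
    Classical building block only; not the paper's model-selection objective.
    """
    n, d = X.shape
    lam = np.ones(n) / n
    for t in range(n_iters):
        A = X.T @ (lam[:, None] * X) + 1e-9 * np.eye(d)  # regularized design matrix
        A_inv = np.linalg.inv(A)
        var = np.einsum("ij,jk,ik->i", X, A_inv, X)  # x_i^T A^{-1} x_i per action
        i = int(np.argmax(var))        # direction currently worst covered
        gamma = 1.0 / (t + 2)          # standard Frank-Wolfe step size
        lam *= 1.0 - gamma
        lam[i] += gamma
    return lam
```

Weights of this kind tell the learner how often to probe each action so that every direction of the unknown parameter is estimated uniformly well, which is the geometric leverage the abstract refers to.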
This content will become publicly available on July 13, 2026
Provably Efficient Algorithm for Best Scoring Rule Identification in Online Principal-Agent Information Acquisition
We investigate the problem of identifying the optimal scoring rule within the principal-agent framework for the online information acquisition problem. We focus on the principal's perspective, seeking to determine the desired scoring rule through interactions with the agent. To address this challenge, we propose two algorithms, OIAFC and OIAFB, tailored to the fixed confidence and fixed budget settings, respectively. Our theoretical analysis demonstrates that OIAFC can extract the desired $(\epsilon, \delta)$-scoring rule with an efficient instance-dependent or instance-independent sample complexity. Our analysis also shows that OIAFB matches the instance-independent performance bound of OIAFC, so the two algorithms achieve the same sample complexity across the fixed confidence and fixed budget settings.
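The OIAFC and OIAFB procedures themselves are not reproduced on this page. As a rough illustration of what a fixed-confidence identification loop looks like in general, here is a generic successive-elimination template; the `pull` oracle, the 1-subgaussian noise assumption, and all constants are hypothetical and are not taken from the paper.

```python
import numpy as np

def identify_fixed_confidence(pull, K, delta, eps, max_rounds=100_000):
    """Generic (eps, delta) successive-elimination sketch; illustrative only,
    not the OIAFC algorithm. pull(k) returns a noisy score for candidate k,
    assumed 1-subgaussian around its true value.
    """
    active = list(range(K))
    means = np.zeros(K)
    for t in range(1, max_rounds + 1):
        for k in active:                    # one fresh sample per survivor
            means[k] += (pull(k) - means[k]) / t
        # Anytime confidence radius from a union bound over candidates and rounds.
        rad = np.sqrt(2.0 * np.log(4.0 * K * t * t / delta) / t)
        best = max(active, key=lambda k: means[k])
        active = [k for k in active if means[k] >= means[best] - 2.0 * rad]
        if len(active) == 1 or 2.0 * rad <= eps:   # survivor is eps-optimal w.h.p.
            return best
    return best
```

A fixed-budget counterpart would instead split a given number of interactions across elimination phases and output the final survivor, which is the flavor of guarantee a fixed-budget algorithm such as OIAFB targets.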
- Award ID(s):
- 2403401
- PAR ID:
- 10620920
- Publisher / Repository:
- The 42nd International Conference on Machine Learning (ICML 2025)
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Koyejo, S.; Mohamed, S.; Agarwal, A.; Belgrave, D.; Cho, K.; Oh, A. (Ed.) While much progress has been made in understanding the minimax sample complexity of reinforcement learning (RL)—the complexity of learning on the “worst-case” instance—such measures of complexity often do not capture the true difficulty of learning. In practice, on an “easy” instance, we might hope to achieve a complexity far better than that achievable on the worst-case instance. In this work we seek to understand the “instance-dependent” complexity of learning near-optimal policies (PAC RL) in the setting of RL with linear function approximation. We propose an algorithm, Pedel, which achieves a fine-grained instance-dependent measure of complexity, the first of its kind in the RL with function approximation setting, thereby capturing the difficulty of learning on each particular problem instance. Through an explicit example, we show that Pedel yields provable gains over low-regret, minimax-optimal algorithms and that such algorithms are unable to hit the instance-optimal rate. Our approach relies on a novel online experiment design-based procedure which focuses the exploration budget on the “directions” most relevant to learning a near-optimal policy, and may be of independent interest.
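As a small, assumed illustration of the budget-focusing step that such design-based exploration implies (the rounding scheme and names below are not from the paper):

```python
import numpy as np

def allocate_budget(design_weights, budget):
    """Turn experiment-design weights into integer sample counts, so the
    exploration budget concentrates on the most informative directions.
    Hypothetical rounding scheme for illustration only.
    """
    counts = np.floor(design_weights * budget).astype(int)
    counts[np.argmax(design_weights)] += budget - counts.sum()  # absorb rounding slack
    return counts
```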
-
Most linear experimental design problems assume homogeneous variance, yet heteroskedastic noise is present in many realistic settings. Let a learner have access to a finite set of measurement vectors that can be probed to receive noisy linear responses. We propose, analyze, and empirically evaluate a novel design for uniformly bounding the estimation error of the variance parameters. We demonstrate this method on two adaptive experimental design problems under heteroskedastic noise, fixed confidence transductive best-arm identification and level-set identification, and prove the first instance-dependent lower bounds in these settings. Lastly, we construct near-optimal algorithms and empirically demonstrate the large improvements in sample complexity gained from accounting for heteroskedastic variance in these designs.
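As a small illustration of why heteroskedasticity changes the design problem: the information an arm contributes scales inversely with its noise variance, so errors in the variance estimates distort the whole design. The function below is an assumed textbook form for exposition, not the paper's algorithm.

```python
import numpy as np

def weighted_information_matrix(X, lam, sigma2):
    """Information matrix under heteroskedastic noise: arm i, sampled with
    weight lam[i] and noise variance sigma2[i], contributes
    (lam[i] / sigma2[i]) * x_i x_i^T. Assumed notation, for illustration.
    """
    W = lam / sigma2
    return X.T @ (W[:, None] * X)

# The prediction variance along a direction x is x^T A(lam)^{-1} x, so a bad
# estimate of sigma2 skews the error bound in every direction at once; this is
# why uniformly bounding variance-estimation error (as the abstract proposes)
# matters before running the downstream best-arm or level-set design.
```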
-
Peer prediction mechanisms incentivize agents to truthfully report their signals even in the absence of verification by comparing agents’ reports with those of their peers. In the detail-free multi-task setting, agents are asked to respond to multiple independent and identically distributed tasks, and the mechanism does not know the prior distribution of agents’ signals. The goal is to provide an epsilon-strongly truthful mechanism where truth-telling rewards agents “strictly” more than any other strategy profile (with epsilon additive error) even for heterogeneous agents, and to do so while requiring as few tasks as possible. We design a family of mechanisms with a scoring function that maps a pair of reports to a score. The mechanism is strongly truthful if the scoring function is “prior ideal”. Moreover, the mechanism is epsilon-strongly truthful as long as the scoring function used is sufficiently close to the ideal scoring function. This reduces the above mechanism design problem to a learning problem: specifically, learning an ideal scoring function. Because learning the prior distribution is sufficient (but not necessary) to learn the scoring function, we can apply standard learning theory techniques that leverage side information about the prior (e.g., that it is close to some parametric model). Furthermore, we derive a variational representation of an ideal scoring function and reduce the learning problem to an empirical risk minimization. We leverage this reduction to obtain very general results for peer prediction in the multi-task setting. Specifically: Sample Complexity. We show how to derive good bounds on the number of tasks required for different types of priors, in some cases exponentially improving previous results. In particular, we can upper bound the required number of tasks for parametric models with bounded learning complexity. Furthermore, our reduction applies to myriad continuous signal space settings. To the best of our knowledge, this is the first peer-prediction mechanism on continuous signals designed for the multi-task setting. Connection to Machine Learning. We show how to turn a soft-predictor of an agent’s signals (given the other agents’ signals) into a mechanism. This allows the practical use of machine learning algorithms that give good results even when many agents provide noisy information. Stronger Properties. In the finite setting, we obtain epsilon-strongly truthful mechanisms for any stochastically relevant prior. Prior works either apply only to more restrictive settings, or achieve a weaker notion of truthfulness (informed truthfulness).
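For intuition about scoring functions that map a pair of reports to a score, here is one classical concrete instance on finite signal spaces (a simplified correlated-agreement score; the full mechanism also contrasts bonus tasks with penalty tasks, omitted here). The abstract's mechanism instead learns a near-ideal scoring function via empirical risk minimization, so everything below is an assumed illustration.

```python
import numpy as np

def correlated_agreement_scores(reports_a, reports_b, n_signals):
    """Score each report pair (i, j) by the sign of
    Delta[i, j] = P(i, j) - P(i) * P(j), estimated from the empirical joint
    distribution of the two agents' reports over shared tasks.
    Simplified illustration; not the paper's learned scoring function.
    """
    joint = np.zeros((n_signals, n_signals))
    for i, j in zip(reports_a, reports_b):
        joint[i, j] += 1
    joint /= joint.sum()
    delta = joint - np.outer(joint.sum(axis=1), joint.sum(axis=0))
    score = np.sign(delta)      # reward correlated reports, punish uncorrelated
    return np.array([score[i, j] for i, j in zip(reports_a, reports_b)])
```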
-
In many real-world reinforcement learning (RL) problems, in addition to maximizing the objective, the learning agent has to maintain some necessary safety constraints. We formulate the problem of learning a safe policy as an infinite-horizon discounted Constrained Markov Decision Process (CMDP) with an unknown transition probability matrix, where the safety requirements are modeled as constraints on expected cumulative costs. We propose two model-based constrained reinforcement learning (CRL) algorithms for learning a safe policy: (i) the GM-CRL algorithm, which has access to a generative model, and (ii) the UC-CRL algorithm, which learns the model using an upper confidence style online exploration method. We characterize the sample complexity of these algorithms, i.e., the number of samples needed to ensure a desired level of accuracy with high probability, with respect to both objective maximization and constraint satisfaction.
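In symbols, the formulation the abstract describes is the standard discounted CMDP (notation assumed here): find a policy $\pi$ maximizing $\mathbb{E}_\pi\big[\sum_{t=0}^{\infty} \gamma^t r(s_t, a_t)\big]$ subject to $\mathbb{E}_\pi\big[\sum_{t=0}^{\infty} \gamma^t c(s_t, a_t)\big] \le \bar{C}$, where $r$ is the reward, $c$ the cost, $\gamma \in (0, 1)$ the discount factor, and $\bar{C}$ the safety budget. Since the transition matrix is unknown, both expectations must be controlled from samples, which is what the sample-complexity bounds quantify.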