NSF PAR Search | NSF Public Access Repository

Function Design for Improved Competitive Ratio in Online Resource Allocation with Procurement Costs

https://doi.org/10.1287/ijoo.2021.0012

Ray, Mitas; Sadeghi, Omid; Ratliff, Lillian J; Fazel, Maryam (December 2024, INFORMS Journal on Optimization)

We study the problem of online resource allocation, where customers arrive sequentially and the seller must irrevocably allocate resources to each incoming customer while also facing a prespecified procurement cost function over the total allocation. The objective is to maximize the reward obtained from fulfilling the customers’ requests minus the cumulative procurement cost. We analyze the competitive ratio of a primal-dual algorithm in this setting and develop an optimization framework for designing a surrogate function for the procurement cost that the algorithm uses to improve its competitive ratio. We use the optimal surrogate function for polynomial procurement cost functions to improve on previous bounds. For general procurement cost functions, our design method uses quasiconvex optimization to find optimal design parameters. We then implement the design techniques and show the improved performance of the algorithm in numerical examples. Finally, we extend the analysis by devising a posted pricing mechanism in which the algorithm does not require the customers’ preferences to be revealed.
Funding: M. Fazel’s work was supported in part by the National Science Foundation [Awards 2023166, 2007036, and 1740551].
Supplemental Material: The online appendix is available at https://doi.org/10.1287/ijoo.2021.0012.
Full Text Available
Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL

Wagenmaker, Andrew; Huang, Kevin; Ke, Liyiming; Boots, Byron; Jamieson, Kevin; Gupta, Abhishek (December 2024, Conference on Neural Information Processing Systems)

Full Text Available
CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning

Wang, Yiping; Chen, Yifang; Yan, Wendan; Fang, Alex; Zhou, Wenjin; Du, Simon; Jamieson, Kevin (December 2024, Conference on Neural Information Processing Systems)

Full Text Available
Fair Active Learning in Low-Data Regimes

Camilleri, Romain; Jain, Lalit; Jamieson, Kevin; Morgenstern, Jamie (July 2024, Uncertainty in Artificial Intelligence)

Full Text Available
Optimal Exploration is no harder than Thompson Sampling

Li, Zhaoqi; Jamieson, Kevin; Jain, Lalit (May 2024, International Conference on Artificial Intelligence and Statistics)

Full Text Available
A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

Xiong, Zhihan; Camilleri, Romain; Fazel, Maryam; Jain, Lalit; Jamieson, Kevin (May 2024, International Conference on Artificial Intelligence and Statistics)

Full Text Available
Optimal Exploration for Model-Based RL in Nonlinear Systems

Wagenmaker, Andrew; Shi, Guanya; Jamieson, Kevin (December 2023, Advances in Neural Information Processing Systems)

Full Text Available
Stochastic Contextual Bandits with Long Horizon Rewards

https://doi.org/10.1609/aaai.v37i8.26140

Qin, Yuzhen; Li, Yingcong; Pasqualetti, Fabio; Fazel, Maryam; Oymak, Samet (June 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

The growing interest in complex decision-making and language modeling problems highlights the importance of sample-efficient learning over very long horizons. This work takes a step in this direction by investigating contextual linear bandits where the current reward depends on at most s prior actions and contexts (not necessarily consecutive), up to a time horizon of h. To avoid polynomial dependence on h, we propose new algorithms that leverage sparsity to discover the dependence pattern and arm parameters jointly. We consider both the data-poor (T < h) and data-rich (T ≥ h) regimes and derive respective regret upper bounds O(d√(sT) + min(q, T)) and O(√(sdT)), with sparsity s, feature dimension d, total time horizon T, and q that is adaptive to the reward dependence pattern. Complementing the upper bounds, we also show that learning over a single trajectory brings inherent challenges: While the dependence pattern and arm parameters form a rank-1 matrix, circulant matrices are not isometric over rank-1 manifolds and sample complexity indeed benefits from the sparse reward dependence structure. Our results necessitate a new analysis to address long-range temporal dependencies across data and avoid polynomial dependence on the reward horizon h. Specifically, we utilize connections to the restricted isometry property of circulant matrices formed by dependent sub-Gaussian vectors and establish new guarantees that are also of independent interest.
Full Text Available
Near-Optimal Randomized Exploration for Tabular Markov Decision Processes

Xiong, Zhihan; Shen, Ruoqi (December 2022, Proceedings of Machine Learning Research)

Full Text Available
Beyond No Regret: Instance-Dependent PAC Reinforcement Learning

Wagenmaker, Andrew; Simchowitz, Max; Jamieson, Kevin (January 2022, Proceedings of Machine Learning Research)

Full Text Available
