Search for: All records
Author / Contributor: Bartlett, Peter L.; Ghavamzadeh, Mohammad; Jiang, Heinrich; Pacchiano, Aldo
Banerjee, Arindam; Fukumizu, Kenji (Ed.)
We study a constrained contextual linear bandit setting in which the agent must produce a sequence of policies that maximizes expected cumulative reward over multiple rounds while keeping the expected cost of each policy below a given threshold. We propose an upper-confidence-bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB), and prove a sublinear bound on its regret that is inversely proportional to the difference between the constraint threshold and the cost of a known feasible action. Our algorithm balances exploration and constraint satisfaction using a novel idea that …
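The balance between optimism and pessimism mentioned in the abstract can be illustrated with a minimal sketch. This is not the OPLB algorithm itself, whose details are truncated here; it is a simplified stand-in that assumes a finite action set, ridge-regression estimates of the reward and cost parameters, a single shared confidence width, and a deterministic choice of one action rather than a policy. The names `ridge_estimate`, `select_action`, `beta`, `lam`, and `safe_index` are hypothetical and chosen only for illustration.

```python
import numpy as np


def ridge_estimate(X, y, lam=1.0):
    """Regularized least-squares estimate and the regularized Gram matrix."""
    d = X.shape[1]
    gram = lam * np.eye(d) + X.T @ X
    theta = np.linalg.solve(gram, X.T @ y)
    return theta, gram


def select_action(actions, X, rewards, costs, threshold, safe_index, beta=1.0):
    """Pick the action with the highest optimistic reward among actions
    whose pessimistic cost estimate stays below the threshold.

    Assumption (illustrative only): the action at `safe_index` is known to be
    feasible, so it is always allowed as a fallback.
    """
    theta_r, gram = ridge_estimate(X, rewards)
    theta_c, _ = ridge_estimate(X, costs)
    gram_inv = np.linalg.inv(gram)

    # Confidence width per action: beta * sqrt(a^T V^{-1} a).
    widths = beta * np.sqrt(np.einsum("ij,jk,ik->i", actions, gram_inv, actions))

    reward_ucb = actions @ theta_r + widths  # optimistic about reward
    cost_ucb = actions @ theta_c + widths    # pessimistic about cost

    feasible = cost_ucb <= threshold
    feasible[safe_index] = True              # known feasible action

    masked = np.where(feasible, reward_ucb, -np.inf)
    return int(np.argmax(masked))
```

With no observations yet, the cost widths are large, so a tight threshold leaves only the known feasible action available; as data accumulates and the widths shrink, more actions pass the pessimistic cost check. The sketch uses the same width for reward and cost purely for brevity.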