Search for: All records

Award ID contains: 2141781

« Prev Next »

Total Resources

13

Resource Type
Conference Paper

13

Conference Proceeding

0

Dataset

0

Journal Article

0

Workshop Report

0

Availability
Full Text / Resource Available

7

Citation Only

6

Save Results
Excel (limit 2000)
CSV (limit 5000)
XML (limit 5000)

Have feedback or suggestions for a way to improve these results?
!

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Reinforcement Learning in Low-rank MDPs with Density Features

Huang, A ; Chen, J ; Jiang, N. ( July 2024 , Proceedings of the 40th International Conference on Machine Learning)

Free, publicly-accessible full text available July 23, 2025
Adversarial Model for Offline Reinforcement Learning

Bhardwaj, M ; Xie, T ; Boots, B ; Jiang, N ; Cheng, C. ( December 2023 , Advances in neural information processing systems)

Free, publicly-accessible full text available December 10, 2024
Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

Uehara, M ; Kiyohara, H ; Bennett, A ; Chernozhukov, V ; Jiang, N ; Kallus, N ; Shi, C ; Sun, W. ( December 2023 , Advances in neural information processing systems)

Free, publicly-accessible full text available December 10, 2024
Offline Learning in Markov Games with General Function Approximation

Zhang, Y ; Bai, Y ; Jiang, N. ( July 2023 , Proceedings of the 40th International Conference on Machine Learning)

Free, publicly-accessible full text available July 23, 2024
The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation

Amortila, P ; Jiang, N ; Szepesvári, C. ( July 2023 , Proceedings of the 40th International Conference on Machine Learning)

Free, publicly-accessible full text available July 23, 2024
THE ROLE OF COVERAGE IN ONLINE REINFORCEMENT LEARNING

Xie, T ; Foster, D ; Bai, Y ; Jiang, N ; Kakade, S. ( May 2023 , Proceedings of the Eleventh International Conference on Learning Representations)

Free, publicly-accessible full text available May 1, 2024
Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions

Huang, Audrey ; Jiang, Nan ( December 2022 , Advances in neural information processing systems)

Off-policy evaluation often refers to two related tasks: estimating the expected return of a policy and estimating its value function (or other functions of interest, such as density ratios). While recent works on marginalized importance sampling (MIS) show that the former can enjoy provable guarantees under realizable function approximation, the latter is only known to be feasible under much stronger assumptions such as prohibitively expressive discriminators. In this work, we provide guarantees for off-policy function estimation under only realizability, by imposing proper regularization on the MIS objectives. Compared to commonly used regularization in MIS, our regularizer is much more flexible and can account for an arbitrary user-specified distribution, under which the learned function will be close to the ground truth. We provide exact characterization of the optimal dual solution that needs to be realized by the discriminator class, which determines the data-coverage assumption in the case of value-function learning. As another surprising observation, the regularizer can be altered to relax the data-coverage requirement, and completely eliminate it in the ideal case with strong side information.
more » « less
Full Text Available
Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret

Huang, Jiawei ; Zhao, Li ; Qin, Tao ; Chen, Wei ; Jiang, Nan ; Liu, Tie-Yan ( December 2022 , Advances in neural information processing systems)

We propose a new learning framework that captures the tiered structure of many real-world user-interaction applications, where the users can be divided into two groups based on their different tolerance on exploration risks and should be treated separately. In this setting, we simultaneously maintain two policies π^O and πE: π^O ("O" for "online") interacts with more risk-tolerant users from the first tier and minimizes regret by balancing exploration and exploitation as usual, while π^E ("E" for "exploit") exclusively focuses on exploitation for risk-averse users from the second tier utilizing the data collected so far. An important question is whether such a separation yields advantages over the standard online setting (i.e., π^E=π^O) for the risk-averse users. We individually consider the gap-independent vs. gap-dependent settings. For the former, we prove that the separation is indeed not beneficial from a minimax perspective. For the latter, we show that if choosing Pessimistic Value Iteration as the exploitation algorithm to produce π^E, we can achieve a constant regret for risk-averse users independent of the number of episodes K, which is in sharp contrast to the Ω(logK) regret for any online RL algorithms in the same setting, while the regret of π^O (almost) maintains its online regret optimality and does not need to compromise for the success of π^E.
more » « less
Full Text Available
Offline reinforcement learning under value and density-ratio realizability: The power of gaps

Chen, Jinglin ; Jiang, Nan ( August 2022 , Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence)

We consider a challenging theoretical problem in offline reinforcement learning (RL): obtaining sample-efficiency guarantees with a dataset lacking sufficient coverage, under only realizability-type assumptions for the function approximators. While the existing theory has addressed learning under realizability and under non-exploratory data separately, no work has been able to address both simultaneously (except for a concurrent work which we compare in detail). Under an additional gap assumption, we provide guarantees to a simple pessimistic algorithm based on a version space formed by marginalized importance sampling (MIS), and the guarantee only requires the data to cover the optimal policy and the function classes to realize the optimal value and density-ratio functions. While similar gap assumptions have been used in other areas of RL theory, our work is the first to identify the utility and the novel mechanism of gap assumptions in offline RL with weak function approximation.
more » « less
Full Text Available
Offline Reinforcement Learning with Realizability and Single-policy Concentrability

Zhan, Wenhao ; Huang, Baihe ; Huang, Audrey ; Jiang, Nan ; Lee, Jason ( July 2022 , Proceedings of Thirty Fifth Conference on Learning Theory)

Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data coverage (e.g., all-policy concentrability). Despite the recent efforts on relaxing these assumptions, existing works are only able to relax one of the two factors, leaving the strong assumption on the other factor intact. As an important open problem, can we achieve sample-efficient offline RL with weak assumptions on both factors? In this paper we answer the question in the positive. We analyze a simple algorithm based on the primal-dual formulation of MDPs, where the dual variables (discounted occupancy) are modeled using a density-ratio function against offline data. With proper regularization, the algorithm enjoys polynomial sample complexity, under only realizability and single-policy concentrability. We also provide alternative analyses based on different assumptions to shed light on the nature of primal-dual algorithms for offline RL.
more » « less
Full Text Available

« Prev Next »