Search for: All records

Award ID contains: 2134080

« Prev Next »

Total Resources

7

Resource Type
Conference Paper

4

Conference Proceeding

0

Dataset

0

Journal Article

3

Workshop Report

0

Availability
Full Text / Resource Available

5

Citation Only

2

Save Results
Excel (limit 2000)
CSV (limit 5000)
XML (limit 5000)

Have feedback or suggestions for a way to improve these results?
!

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis

https://doi.org/10.1287/opre.2023.2450

Li, Gen ; Cai, Changxiao ; Chen, Yuxin ; Wei, Yuting ; Chi, Yuejie ( January 2024 , Operations Research)

This paper investigates a model-free algorithm of broad interest in reinforcement learning, namely, Q-learning. Whereas substantial progress had been made toward understanding the sample efficiency of Q-learning in recent years, it remained largely unclear whether Q-learning is sample-optimal and how to sharpen the sample complexity analysis of Q-learning. In this paper, we settle these questions: (1) When there is only a single action, we show that Q-learning (or, equivalently, TD learning) is provably minimax optimal. (2) When there are at least two actions, our theory unveils the strict suboptimality of Q-learning and rigorizes the negative impact of overestimation in Q-learning. Our theory accommodates both the synchronous case (i.e., the case in which independent samples are drawn) and the asynchronous case (i.e., the case in which one only has access to a single Markovian trajectory).

more » « less
Free, publicly-accessible full text available January 1, 2025
Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence

https://doi.org/10.1137/21M1456789

Zhan, Wenhao ; Cen, Shicong ; Huang, Baihe ; Chen, Yuxin ; Lee, Jason D. ; Chi, Yuejie ( June 2023 , SIAM Journal on Optimization)

Free, publicly-accessible full text available June 30, 2024
Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning

https://doi.org/10.1093/imaiai/iaac034

Li, Gen ; Shi, Laixi ; Chen, Yuxin ; Chi, Yuejie ( February 2023 , Information and Inference: A Journal of the IMA)

Abstract Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic Markov decision process with $S$ states, $A$ actions and horizon length $H$, substantial progress has been achieved toward characterizing the minimax-optimal regret, which scales on the order of $\sqrt{H^2SAT}$ (modulo log factors) with $T$ the total number of samples. While several competing solution paradigms have been proposed to minimize regret, they are either memory-inefficient, or fall short of optimality unless the sample size exceeds an enormous threshold (e.g. $S^6A^4 \,\mathrm{poly}(H)$ for existing model-free methods). To overcome such a large sample size barrier to efficient RL, we design a novel model-free algorithm, with space complexity $O(SAH)$, that achieves near-optimal regret as soon as the sample size exceeds the order of $SA\,\mathrm{poly}(H)$. In terms of this sample size requirement (also referred to the initial burn-in cost), our method improves—by at least a factor of $S^5A^3$—upon any prior memory-efficient algorithm that is asymptotically regret-optimal. Leveraging the recently introduced variance reduction strategy (also called reference-advantage decomposition), the proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences with upper and lower confidence bounds. The design principle of our early-settled variance reduction method might be of independent interest to other RL settings that involve intricate exploration–exploitation trade-offs.
more » « less
Full Text Available
Active Heterogeneous Graph Neural Networks with Per-step Meta-Q-Learning

https://doi.org/10.1109/ICDM54844.2022.00176

Zhang, Yuheng ; Xia, Yinglong ; Zhu, Yan ; Chi, Yuejie ; Ying, Lei ; Tong, Hanghang ( November 2022 , 2022 IEEE International Conference on Data Mining (ICDM))

Recent years have witnessed the superior performance of heterogeneous graph neural networks (HGNNs) in dealing with heterogeneous information networks (HINs). Nonetheless, the success of HGNNs often depends on the availability of sufficient labeled training data, which can be very expensive to obtain in real scenarios. Active learning provides an effective solution to tackle the data scarcity challenge. For the vast majority of the existing work regarding active learning on graphs, they mainly focus on homogeneous graphs, and thus fall in short or even become inapplicable on HINs. In this paper, we study the active learning problem with HGNNs and propose a novel meta-reinforced active learning framework MetRA. Previous reinforced active learning algorithms train the policy network on labeled source graphs and directly transfer the policy to the target graph without any adaptation. To better exploit the information from the target graph in the adaptation phase, we propose a novel policy transfer algorithm based on meta-Q-learning termed per-step MQL. Empirical evaluations on HINs demonstrate the effectiveness of our proposed framework. The improvement over the best baseline is up to 7% in Micro-F1.
more » « less
Full Text Available
Batch Active Learning with Graph Neural Networks via Multi-Agent Deep Reinforcement Learning

Zhang, Yuheng ; Tong, Hanghang ; Xia, Yinglong ; Zhu, Yan ; Chi, Yuejie ; Ying, Lei ( January 2022 , 36th AAAI Conference on Artificial Intelligence)

Full Text Available
SoteriaFL: A Unified Framework for Private Federated Learning with Communication Compression

Li, Z. ; Zhao, H. ; Li, B. ; Chi, Y. ( January 2022 , Advances in neural information processing systems)

Full Text Available
Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning

Li, Gen ; Shi, Laixi ; Chen, Yuxin ; Gu, Yuantao ; Chi, Yuejie ( January 2021 , Advances in Neural Information Processing Systems 34)

Full Text Available