
Creators/Authors contains: "Xu, Tengyu"


Total Resources: 13

Resource Type:
- Conference Paper: 11
- Conference Proceeding: 0
- Dataset: 0
- Journal Article: 2
- Workshop Report: 0

Availability:
- Full Text / Resource Available: 13
- Citation Only: 0


Model-Based Offline Meta-Reinforcement Learning with Regularization

Lin, Sen; Wan, Jialin; Xu, Tengyu; Liang, Yingbin; Zhang, Junshan. (April 2022, The Tenth International Conference on Learning Representations)

Existing offline reinforcement learning (RL) methods face a few major challenges, particularly the distributional shift between the learned policy and the behavior policy. Offline Meta-RL is emerging as a promising approach to address these challenges, aiming to learn an informative meta-policy from a collection of tasks. Nevertheless, as shown in our empirical studies, offline Meta-RL could be outperformed by offline single-task RL methods on tasks with good-quality datasets, indicating that the right balance must be carefully calibrated between "exploring" the out-of-distribution state-actions by following the meta-policy and "exploiting" the offline dataset by staying close to the behavior policy. Motivated by this empirical analysis, we propose model-based offline Meta-RL with regularized policy optimization (MerPO), which learns a meta-model for efficient task structure inference and an informative meta-policy for safe exploration of out-of-distribution state-actions. In particular, we devise a new meta-Regularized model-based Actor-Critic (RAC) method for within-task policy optimization, as a key building block of MerPO, using both conservative policy evaluation and regularized policy improvement; the intrinsic tradeoff therein is achieved by striking the right balance between two regularizers, one based on the behavior policy and the other on the meta-policy. We theoretically show that the learned policy offers guaranteed improvement over both the behavior policy and the meta-policy, thus ensuring performance improvement on new tasks via offline Meta-RL. Our experiments corroborate the superior performance of MerPO over existing offline Meta-RL methods.
Full Text Available
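The MerPO abstract above describes a policy-improvement step that trades off two regularizers, one pulling toward the behavior policy and one toward the meta-policy. As a rough illustration only, the sketch below assumes both regularizers are KL divergences, a single state with a discrete action set, and a fixed trade-off weight `omega` with temperature `lam` (all hypothetical choices, not taken from the paper). Under those assumptions the objective E_pi[Q] - lam * (omega * KL(pi || pi_b) + (1 - omega) * KL(pi || pi_m)) is strictly concave in pi and has the closed-form maximizer pi(a) ∝ pi_b(a)^omega * pi_m(a)^(1 - omega) * exp(Q(a) / lam). This is not the MerPO/RAC algorithm itself, which the abstract describes as an actor-critic method with conservative policy evaluation.

```python
import math

def kl(p, r):
    """KL(p || r) for discrete distributions with full support (assumed)."""
    return sum(a * math.log(a / b) for a, b in zip(p, r))

def regularized_policy(q, pi_b, pi_m, omega, lam):
    """Closed-form maximizer of the hypothetical two-regularizer objective.

    omega = 1 regularizes only toward the behavior policy pi_b;
    omega = 0 regularizes only toward the meta-policy pi_m.
    """
    logits = [
        omega * math.log(b) + (1.0 - omega) * math.log(m) + qa / lam
        for qa, b, m in zip(q, pi_b, pi_m)
    ]
    top = max(logits)  # subtract the max before exp for numerical stability
    weights = [math.exp(x - top) for x in logits]
    z = sum(weights)
    return [w / z for w in weights]

def objective(pi, q, pi_b, pi_m, omega, lam):
    """Expected Q-value minus the weighted KL penalties (hypothetical form)."""
    expected_q = sum(p * qa for p, qa in zip(pi, q))
    penalty = omega * kl(pi, pi_b) + (1.0 - omega) * kl(pi, pi_m)
    return expected_q - lam * penalty

if __name__ == "__main__":
    q = [1.0, 0.5, 0.0]        # illustrative Q-values for three actions
    pi_b = [0.7, 0.2, 0.1]     # illustrative behavior policy
    pi_m = [0.2, 0.3, 0.5]     # illustrative meta-policy
    for omega in (0.0, 0.5, 1.0):
        pi = regularized_policy(q, pi_b, pi_m, omega, lam=1.0)
        print(omega, [round(p, 3) for p in pi])
```

Sweeping `omega` from 0 to 1 shifts the improved policy from the meta-policy's neighborhood toward the behavior policy's, which is one simple way to picture the "exploring" versus "exploiting" balance the abstract refers to.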
PER-ETD: A polynomially efficient emphatic temporal difference learning method

Guan, Ziwei; Xu, Tengyu; Liang, Yingbin. (January 2022, International Conference on Learning Representations (ICLR))

Full Text Available
A unified off-policy evaluation approach for general value function

Xu, Tengyu; Yang, Zhuoran; Wang, Zhaoran; Liang, Yingbin. (January 2022, Advances in Neural Information Processing Systems (NeurIPS))

Full Text Available
Model-Based Offline Meta-Reinforcement Learning with Regularization

Lin, Sen; Wan, Jialin; Xu, Tengyu; Liang, Yingbin; Zhang, Junshan. (January 2022, International Conference on Learning Representations (ICLR))

Full Text Available
Deterministic policy gradient: convergence analysis

Xiong, Huaqing; Xu, Tengyu; Zhao, Lin; Liang, Yingbin; Zhang, Wei. (January 2022, Proc. 38th Conference on Uncertainty in Artificial Intelligence (UAI))

Full Text Available
Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry

Chen, Ziyi; Zhou, Yi; Xu, Tengyu; Liang, Yingbin. (September 2021, International Conference on Learning Representations)

Full Text Available
Sample complexity bounds for two timescale value-based reinforcement learning algorithms

Xu, Tengyu; Liang, Yingbin. (January 2021, Proc. International Conference on Artificial Intelligence and Statistics (AISTATS))
Full Text Available
CRPO: A new approach for safe reinforcement learning with convergence guarantee

Xu, Tengyu; Liang, Yingbin; Lan, Guanghui. (January 2021, Proc. International Conference on Machine Learning (ICML))
Full Text Available
When will generative adversarial imitation learning algorithms attain global convergence

Guan, Ziwei; Xu, Tengyu; Liang, Yingbin. (January 2021, Proc. International Conference on Artificial Intelligence and Statistics (AISTATS))
Full Text Available
Proximal gradient descent-ascent: Variable convergence under KŁ geometry

Chen, Ziyi; Zhou, Yi; Xu, Tengyu; Liang, Yingbin. (January 2021, Proc. International Conference on Learning Representations (ICLR))
Full Text Available
