Search for: All records

Creators/Authors contains: "Shi, Laixi"

Total Resources: 7

Resource Type: Conference Paper (5), Conference Proceeding (0), Dataset (0), Journal Article (2), Workshop Report (0)

Availability: Full Text / Resource Available (7), Citation Only (0)

Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning

https://doi.org/10.1093/imaiai/iaac034

Li, Gen; Shi, Laixi; Chen, Yuxin; Chi, Yuejie (February 2023, Information and Inference: A Journal of the IMA)

Abstract: Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balancing exploration and exploitation. For a finite-horizon episodic Markov decision process with $S$ states, $A$ actions and horizon length $H$, substantial progress has been made toward characterizing the minimax-optimal regret, which scales on the order of $\sqrt{H^2SAT}$ (modulo log factors), where $T$ is the total number of samples. While several competing solution paradigms have been proposed to minimize regret, they are either memory-inefficient or fall short of optimality unless the sample size exceeds an enormous threshold (e.g. $S^6A^4 \,\mathrm{poly}(H)$ for existing model-free methods). To overcome such a large sample size barrier to efficient RL, we design a novel model-free algorithm, with space complexity $O(SAH)$, that achieves near-optimal regret as soon as the sample size exceeds the order of $SA\,\mathrm{poly}(H)$. In terms of this sample size requirement (also referred to as the initial burn-in cost), our method improves by at least a factor of $S^5A^3$ upon any prior memory-efficient algorithm that is asymptotically regret-optimal. Leveraging the recently introduced variance reduction strategy (also called reference-advantage decomposition), the proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences with upper and lower confidence bounds. The design principle of our early-settled variance reduction method might be of independent interest to other RL settings that involve intricate exploration–exploitation trade-offs.
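As a quick worked check of the burn-in comparison in this abstract (an illustrative sketch, not taken from the paper), the two $\mathrm{poly}(H)$ factors can be treated as the same polynomial. The abstract does not specify them, so this is an assumption made only for the arithmetic. Under it, the ratio of the prior model-free threshold to the new one is:

```latex
\frac{S^{6}A^{4}\,\mathrm{poly}(H)}{SA\,\mathrm{poly}(H)} \;=\; S^{5}A^{3}
```

For example, with $S = 100$ states and $A = 10$ actions this ratio is $10^{10}\cdot 10^{3} = 10^{13}$. This shows why an improvement of at least $S^5A^3$ in burn-in cost matters even for modest state and action spaces.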
Full Text Available
Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity

Shi, Laixi; Li, Gen; Wei, Yuting; Chen, Yuxin; Chi, Yuejie (January 2022, Proceedings of the 39th International Conference on Machine Learning)

Offline or batch reinforcement learning seeks to learn a near-optimal policy using historical data without active exploration of the environment. To counter the insufficient coverage and sample scarcity of many offline datasets, the principle of pessimism has recently been introduced to mitigate the high bias of estimated values. While pessimistic variants of model-based algorithms (e.g., value iteration with lower confidence bounds) have been theoretically investigated, their model-free counterparts, which do not require explicit model estimation, have not been adequately studied, especially in terms of sample efficiency. To address this gap, we study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes, and characterize its sample complexity under the single-policy concentrability assumption, which does not require full coverage of the state-action space. In addition, a variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity. Altogether, this work highlights the efficiency of model-free algorithms in offline RL when used in conjunction with pessimism and variance reduction.
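To make the idea of pessimism in model-free Q-learning concrete, here is a minimal tabular sketch under stated assumptions. It is a generic illustration, not the paper's algorithm. The learning rate $\eta_n = (H+1)/(H+n)$, the Hoeffding-style penalty $c_b\sqrt{H^3\iota/n}$, the constants `c_b` and `delta`, and the function name `pessimistic_q_learning` are all hypothetical choices made for this example. Each logged visit to $(h, s, a)$ moves $Q_h(s,a)$ toward the usual target $r + V_{h+1}(s')$ minus a penalty that shrinks as the pair is observed more often. As a result, rarely covered actions are valued conservatively.

```python
import math


def pessimistic_q_learning(episodes, num_states, num_actions, horizon, c_b=1.0, delta=0.1):
    """Hypothetical sketch of offline tabular Q-learning with a lower-confidence penalty.

    episodes: list of logged episodes; each is a list of `horizon` tuples
        (state, action, reward, next_state) with rewards assumed in [0, 1].
    Returns (Q, policy): Q[h][s][a] is a pessimistic value estimate and
    policy[h][s] is the greedy action with respect to it.
    """
    H, S, A = horizon, num_states, num_actions
    # Pessimistic initialization: estimates start at 0, the minimum possible return.
    Q = [[[0.0] * A for _ in range(S)] for _ in range(H)]
    V = [[0.0] * S for _ in range(H + 1)]  # V[H] is the terminal value, fixed at 0
    counts = [[[0] * A for _ in range(S)] for _ in range(H)]
    # Log factor inside the penalty (assumed form, loosely a union bound over visits).
    log_term = math.log(S * A * H * max(len(episodes), 1) / delta)

    for episode in episodes:
        for h, (s, a, r, s_next) in enumerate(episode):
            counts[h][s][a] += 1
            n = counts[h][s][a]
            eta = (H + 1) / (H + n)  # assumed rescaled linear learning rate
            penalty = c_b * math.sqrt(H ** 3 * log_term / n)  # assumed Hoeffding-style penalty
            target = r + V[h + 1][s_next] - penalty
            Q[h][s][a] = (1 - eta) * Q[h][s][a] + eta * target
            # Back up the greedy pessimistic value, clipped at 0.
            V[h][s] = max(0.0, max(Q[h][s]))

    policy = [[max(range(A), key=lambda a: Q[h][s][a]) for s in range(S)] for h in range(H)]
    return Q, policy


# Usage: action 0 is logged often, action 1 only once (with a higher reward).
episodes = [[(0, 0, 0.8, 0)] for _ in range(100)] + [[(0, 1, 1.0, 0)]]
Q, policy = pessimistic_q_learning(episodes, num_states=1, num_actions=2, horizon=1)
print(policy[0][0], Q[0][0])
```

On this toy data, the penalized estimate prefers the well-covered action 0. Setting `c_b=0.0` (plain Q-learning on the same data) instead picks action 1 on the strength of a single sample. That contrast is the failure mode that pessimism is meant to guard against when the dataset covers some actions poorly.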
Full Text Available
Curriculum Reinforcement Learning using Optimal Transport via Gradual Domain Adaptation

Huang, Peide; Xu, Mengdi; Zhu, Jiacheng; Shi, Laixi; Fang, Fei; Zhao, Ding (January 2022, Conference on Neural Information Processing Systems (NeurIPS))

Full Text Available
Manifold Gradient Descent Solves Multi-Channel Sparse Blind Deconvolution Provably and Efficiently

https://doi.org/10.1109/TIT.2021.3075148

Shi, Laixi; Chi, Yuejie (July 2021, IEEE Transactions on Information Theory)
Full Text Available
Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning

Li, Gen; Shi, Laixi; Chen, Yuxin; Gu, Yuantao; Chi, Yuejie (January 2021, Advances in Neural Information Processing Systems 34)

Full Text Available
Manifold Gradient Descent Solves Multi-Channel Sparse Blind Deconvolution Provably and Efficiently

https://doi.org/10.1109/ICASSP40776.2020.9054356

Shi, Laixi; Chi, Yuejie (May 2020, 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))

Full Text Available
Device-free Multiple People Localization through Floor Vibration

https://doi.org/10.1145/3360773.3360887

Shi, Laixi; Mirshekari, Mostafa; Fagert, Jonathon; Chi, Yuejie; Noh, Hae Young; Zhang, Pei; Pan, Shijia (November 2019, Proceedings of the 1st ACM International Workshop on Device-Free Human Sensing)

Full Text Available