Search for: All records

Creators/Authors contains: "Li, Gen"

Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

https://doi.org/10.1287/opre.2023.2451

Li, Gen ; Wei, Yuting ; Chi, Yuejie ; Chen, Yuxin ( January 2024 , Operations Research)

This paper studies a central issue in modern reinforcement learning, the sample efficiency, and makes progress toward solving an idealistic scenario that assumes access to a generative model or a simulator. Despite a large number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy has yet to be determined. In particular, all prior results suffer from a severe sample size barrier in the sense that their claimed statistical guarantees hold only when the sample size exceeds some enormous threshold. The current paper overcomes this barrier and fully settles this problem; more specifically, we establish the minimax optimality of the model-based approach for any given target accuracy level. To the best of our knowledge, this work delivers the first minimax-optimal guarantees that accommodate the entire range of sample sizes (beyond which finding a meaningful policy is information theoretically infeasible).

Free, publicly-accessible full text available January 1, 2025
Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis

https://doi.org/10.1287/opre.2023.2450

Li, Gen ; Cai, Changxiao ; Chen, Yuxin ; Wei, Yuting ; Chi, Yuejie ( January 2024 , Operations Research)

This paper investigates a model-free algorithm of broad interest in reinforcement learning, namely, Q-learning. Whereas substantial progress had been made toward understanding the sample efficiency of Q-learning in recent years, it remained largely unclear whether Q-learning is sample-optimal and how to sharpen the sample complexity analysis of Q-learning. In this paper, we settle these questions: (1) When there is only a single action, we show that Q-learning (or, equivalently, TD learning) is provably minimax optimal. (2) When there are at least two actions, our theory unveils the strict suboptimality of Q-learning and rigorizes the negative impact of overestimation in Q-learning. Our theory accommodates both the synchronous case (i.e., the case in which independent samples are drawn) and the asynchronous case (i.e., the case in which one only has access to a single Markovian trajectory).
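The synchronous setting described in this abstract can be illustrated with a minimal sketch. The abstract does not state the update rule or step size, so the code below uses the textbook synchronous Q-learning update with a rescaled linear step size; the function name `synchronous_q_learning`, the inputs `P` and `R`, and the step-size schedule are all assumptions made for illustration, not the authors' implementation.

```python
import random


def synchronous_q_learning(P, R, gamma, num_iters, seed=0):
    """Textbook synchronous Q-learning with a generative model (illustrative).

    P[s][a] is a list of next-state probabilities and R[s][a] a deterministic
    reward in [0, 1]. Both are hypothetical inputs, not taken from the paper.
    """
    rng = random.Random(seed)
    num_states, num_actions = len(P), len(P[0])
    states = list(range(num_states))
    Q = [[0.0] * num_actions for _ in range(num_states)]

    for t in range(1, num_iters + 1):
        # Rescaled linear step size: an assumed, commonly used schedule.
        eta = 1.0 / (1.0 + (1.0 - gamma) * t)
        # Greedy value of the current estimate, V(s) = max_a Q(s, a).
        V = [max(row) for row in Q]
        new_Q = [row[:] for row in Q]
        for s in range(num_states):
            for a in range(num_actions):
                # One independent sample per (s, a) pair in every iteration.
                s_next = rng.choices(states, weights=P[s][a])[0]
                target = R[s][a] + gamma * V[s_next]
                new_Q[s][a] = (1.0 - eta) * Q[s][a] + eta * target
        Q = new_Q
    return Q


# Single-state, single-action example: the max over actions is trivial,
# so the update reduces to TD learning, and Q* = 1 / (1 - gamma).
Q = synchronous_q_learning([[[1.0]]], [[1.0]], gamma=0.5, num_iters=2000)
print(Q[0][0])
```

With a single action, the maximization is vacuous, which is the sense in which the abstract equates Q-learning with TD learning in that case.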

Free, publicly-accessible full text available January 1, 2025
CRISPR Empowers Tree Bioengineering for a Sustainable Future

https://doi.org/10.1089/crispr.2023.29161.gli

Li, Gen ; Qi, Yiping ( August 2023 , The CRISPR Journal)

Free, publicly-accessible full text available August 1, 2024
Approximate message passing from random initialization with applications to $\mathbb{Z}_2$ synchronization

https://doi.org/10.1073/pnas.2302930120

Li, Gen ; Fan, Wei ; Wei, Yuting ( August 2023 , Proceedings of the National Academy of Sciences)

This paper is concerned with the problem of reconstructing an unknown rank-one matrix with prior structural information from noisy observations. While computing the Bayes optimal estimator is intractable in general due to the requirement of computing high-dimensional integrations/summations, Approximate Message Passing (AMP) emerges as an efficient first-order method to approximate the Bayes optimal estimator. However, the theoretical underpinnings of AMP remain largely unavailable when it starts from random initialization, a scheme of critical practical utility. Focusing on a prototypical model called $\mathbb{Z}_2$ synchronization, we characterize the finite-sample dynamics of AMP from random initialization, uncovering its rapid global convergence. Our theory, which is nonasymptotic in nature, unveils in this model the non-necessity of a careful initialization for the success of AMP.
Free, publicly-accessible full text available August 1, 2024
CRISPR–Cas12a base editors confer efficient multiplexed genome editing in rice

https://doi.org/10.1016/j.xplc.2023.100601

Cheng, Yanhao ; Zhang, Yingxiao ; Li, Gen ; Fang, Hong ; Sretenovic, Simon ; Fan, Avery ; Li, Jiang ; Xu, Jianping ; Que, Qiudeng ; Qi, Yiping ( July 2023 , Plant Communications)

Free, publicly-accessible full text available July 1, 2024
Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting

https://doi.org/10.1109/CVPR52729.2023.00989

Li, Gen ; Ji, Jie ; Qin, Minghai ; Niu, Wei ; Ren, Bin ; Afghah, Fatemeh ; Guo, Linke ; Ma, Xiaolong ( June 2023 , IEEE)

Free, publicly-accessible full text available June 1, 2024
Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning

https://doi.org/10.1093/imaiai/iaac034

Li, Gen ; Shi, Laixi ; Chen, Yuxin ; Chi, Yuejie ( February 2023 , Information and Inference: A Journal of the IMA)

Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic Markov decision process with $S$ states, $A$ actions and horizon length $H$, substantial progress has been achieved toward characterizing the minimax-optimal regret, which scales on the order of $\sqrt{H^2SAT}$ (modulo log factors) with $T$ the total number of samples. While several competing solution paradigms have been proposed to minimize regret, they are either memory-inefficient, or fall short of optimality unless the sample size exceeds an enormous threshold (e.g. $S^6A^4 \,\mathrm{poly}(H)$ for existing model-free methods). To overcome such a large sample size barrier to efficient RL, we design a novel model-free algorithm, with space complexity $O(SAH)$, that achieves near-optimal regret as soon as the sample size exceeds the order of $SA\,\mathrm{poly}(H)$. In terms of this sample size requirement (also referred to as the initial burn-in cost), our method improves, by at least a factor of $S^5A^3$, upon any prior memory-efficient algorithm that is asymptotically regret-optimal. Leveraging the recently introduced variance reduction strategy (also called reference-advantage decomposition), the proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences with upper and lower confidence bounds. The design principle of our early-settled variance reduction method might be of independent interest to other RL settings that involve intricate exploration–exploitation trade-offs.
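The stated improvement factor can be checked with a short piece of arithmetic. The abstract does not specify the $\mathrm{poly}(H)$ factors, so the sketch below ignores them, and the values $S = 10$, $A = 5$ are purely hypothetical numbers chosen for illustration.

```latex
\[
\frac{S^{6}A^{4}}{SA} = S^{5}A^{3},
\qquad \text{e.g. } S = 10,\ A = 5:\quad
\frac{10^{6}\cdot 5^{4}}{10\cdot 5}
= \frac{6.25\times 10^{8}}{50}
= 1.25\times 10^{7}
= 10^{5}\cdot 5^{3}.
\]
```

Under these assumed values, the burn-in threshold (up to horizon-dependent factors) would drop from roughly $6.25\times 10^{8}$ samples to roughly $50$.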
Full Text Available
Guide RNA library-based CRISPR screens in plants: opportunities and challenges

https://doi.org/10.1016/j.copbio.2022.102883

Pan, Changtian ; Li, Gen ; Bandyopadhyay, Anindya ; Qi, Yiping ( February 2023 , Current Opinion in Biotechnology)

Full Text Available
Provable Identifiability of Two-Layer ReLU Neural Networks via LASSO Regularization

https://doi.org/10.1109/TIT.2023.3274152

Li, Gen ; Wang, Ganghua ; Ding, Jie ( January 2023 , IEEE Transactions on Information Theory)

Full Text Available
Softmax policy gradient methods can take exponential time to converge

https://doi.org/10.1007/s10107-022-01920-6

Li, Gen ; Wei, Yuting ; Chi, Yuejie ; Chen, Yuxin ( January 2023 , Mathematical Programming)

The softmax policy gradient (PG) method, which performs gradient ascent under softmax policy parameterization, is arguably one of the de facto implementations of policy optimization in modern reinforcement learning. For $\gamma$-discounted infinite-horizon tabular Markov decision processes (MDPs), remarkable progress has recently been achieved towards establishing global convergence of softmax PG methods in finding a near-optimal policy. However, prior results fall short of delineating clear dependencies of convergence rates on salient parameters such as the cardinality of the state space $\mathcal{S}$ and the effective horizon $\frac{1}{1-\gamma}$, both of which could be excessively large. In this paper, we deliver a pessimistic message regarding the iteration complexity of softmax PG methods, despite assuming access to exact gradient computation. Specifically, we demonstrate that the softmax PG method with stepsize $\eta$ can take $\frac{1}{\eta} |\mathcal{S}|^{2^{\Omega\left(\frac{1}{1-\gamma}\right)}}$ iterations to converge, even in the presence of a benign policy initialization and an initial state distribution amenable to exploration (so that the distribution mismatch coefficient is not exceedingly large). This is accomplished by characterizing the algorithmic dynamics over a carefully-constructed MDP containing only three actions. Our exponential lower bound hints at the necessity of carefully adjusting update rules or enforcing proper regularization in accelerating PG methods.
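For reference, the method named in this abstract is usually written as follows. The abstract itself gives no formulas, so this is the standard textbook form rather than a quotation from the paper; the symbols $\theta$ (policy parameters), $\mu$ (initial state distribution) and $V^{\pi}(\mu)$ (expected discounted value under $\mu$) are notational assumptions.

```latex
\[
\pi_{\theta}(a \mid s) = \frac{\exp\big(\theta(s,a)\big)}{\sum_{a'} \exp\big(\theta(s,a')\big)},
\qquad
\theta^{(t+1)} = \theta^{(t)} + \eta \, \nabla_{\theta} V^{\pi_{\theta^{(t)}}}(\mu).
\]
```

Here $\eta$ is the stepsize appearing in the lower bound, and the gradient is assumed to be computed exactly, as in the abstract.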
