Abstract Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balancing exploration and exploitation. For a finite-horizon episodic Markov decision process with $S$ states, $A$ actions and horizon length $H$, substantial progress has been made toward characterizing the minimax-optimal regret, which scales on the order of $\sqrt{H^2SAT}$ (modulo log factors) with $T$ the total number of samples. While several competing solution paradigms have been proposed to minimize regret, they are either memory-inefficient or fall short of optimality unless the sample size exceeds an enormous threshold (e.g. $S^6A^4 \,\mathrm{poly}(H)$ for existing model-free methods). To overcome this large sample size barrier to efficient RL, we design a novel model-free algorithm, with space complexity $O(SAH)$, that achieves near-optimal regret as soon as the sample size exceeds the order of $SA\,\mathrm{poly}(H)$. In terms of this sample size requirement (also referred to as the initial burn-in cost), our method improves upon any prior memory-efficient algorithm that is asymptotically regret-optimal by at least a factor of $S^5A^3$. Leveraging the recently introduced variance reduction strategy (also called reference-advantage decomposition), the proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences with upper and lower confidence bounds. The design principle …
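For context, and as a hedged sketch rather than the paper's exact algorithm: model-free methods in this family keep an $O(SAH)$ table of optimistic Q-estimates and, upon the $t$-th visit to $(s,a)$ at step $h$ with next state $s'$, perform an update of the schematic form
$$\begin{aligned} Q_h(s,a) \;\leftarrow\; (1-\alpha_t)\,Q_h(s,a) + \alpha_t\Big(r_h(s,a) + V_{h+1}(s') + b_t\Big), \qquad \alpha_t = \frac{H+1}{H+t}, \end{aligned}$$
where $b_t$ is an exploration bonus. The reference-advantage decomposition mentioned above splits the next-step value as $V_{h+1}(s') = V^{\mathrm{ref}}_{h+1}(s') + \big(V_{h+1}(s') - V^{\mathrm{ref}}_{h+1}(s')\big)$ and estimates the two parts separately, so that the reference part can be averaged over many samples (making its noise small) while the advantage part has small magnitude and tolerates a cruder estimate; the paper's early-settled rule decides, with the aid of the paired upper- and lower-confidence-bound Q-sequences, when to fix this reference.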
Abstract Background: Cas12a (formerly known as Cpf1), the class II type V CRISPR nuclease, has been widely used for genome editing in mammalian cells and plants due to its distinct characteristics from Cas9. Despite being one of the most robust Cas12a nucleases, LbCas12a is in general less efficient than SpCas9 for genome editing in human cells, animals, and plants. Results: To improve the editing efficiency of LbCas12a, we conduct saturation mutagenesis in E. coli and identify 1977 positive point mutations of LbCas12a. We selectively assess the editing efficiency of 56 LbCas12a variants in human cells, identifying an optimal LbCas12a variant (RVQ: G146R/R182V/E795Q) with the most robust editing activity. We further test LbCas12a-RV, LbCas12a-RRV, and LbCas12a-RVQ in plants and find that LbCas12a-RV has robust editing activity in rice and tomato protoplasts. Interestingly, LbCas12a-RRV, resulting from the stacking of RV and D156R, displays improved editing efficiency in stably transformed rice and poplar plants, leading to up to 100% editing efficiency in T0 plants of both plant species. Moreover, this high-efficiency editing occurs even at the non-canonical TTV PAM sites. Conclusions: Our results demonstrate that LbCas12a-RVQ is a powerful tool for genome editing in human cells while LbCas12a-RRV confers robust genome editing in plants. Our study reveals the tremendous potential of …
Abstract The softmax policy gradient (PG) method, which performs gradient ascent under softmax policy parameterization, is arguably one of the de facto implementations of policy optimization in modern reinforcement learning. For $\gamma$-discounted infinite-horizon tabular Markov decision processes (MDPs), remarkable progress has recently been achieved towards establishing global convergence of softmax PG methods in finding a near-optimal policy. However, prior results fall short of delineating clear dependencies of convergence rates on salient parameters such as the cardinality of the state space $\mathcal{S}$ and the effective horizon $\frac{1}{1-\gamma}$, both of which could be excessively large. In this paper, we deliver a pessimistic message regarding the iteration complexity of softmax PG methods, despite assuming access to exact gradient computation. Specifically, we demonstrate that the softmax PG method with stepsize $\eta$ can take
$$\begin{aligned} \frac{1}{\eta } |{\mathcal {S}}|^{2^{\Omega \big (\frac{1}{1-\gamma }\big )}} ~\text {iterations} \end{aligned}$$
to converge, even in the presence of a benign policy initialization and an initial state distribution amenable to exploration (so that the distribution mismatch coefficient is not exceedingly large). This is accomplished by characterizing the algorithmic dynamics over a carefully constructed MDP containing only three actions. Our exponential lower bound hints at the necessity of carefully adjusting update rules or enforcing proper regularization in …
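For reference, under the standard softmax parameterization (generic definitions, not quoted from the paper), the policy and the exact policy-gradient iteration analyzed above read
$$\begin{aligned} \pi_{\theta}(a\mid s) \;=\; \frac{\exp\big(\theta(s,a)\big)}{\sum_{a'}\exp\big(\theta(s,a')\big)}, \qquad \theta^{(t+1)} \;=\; \theta^{(t)} + \eta\,\nabla_{\theta} V^{\pi_{\theta^{(t)}}}(\rho), \end{aligned}$$
where $V^{\pi}(\rho)$ is the $\gamma$-discounted value of $\pi$ under the initial state distribution $\rho$.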
We study a noisy tensor completion problem of broad practical interest, namely, the reconstruction of a low-rank tensor from highly incomplete and randomly corrupted observations of its entries. Whereas a variety of prior work has been dedicated to this problem, prior algorithms either are computationally too expensive for large-scale applications or come with suboptimal statistical guarantees. Focusing on "incoherent" and well-conditioned tensors of a constant canonical polyadic rank, we propose a two-stage nonconvex algorithm, (vanilla) gradient descent following a rough initialization, that achieves the best of both worlds. Specifically, the proposed nonconvex algorithm faithfully completes the tensor and retrieves all individual tensor factors within nearly linear time, while at the same time enjoying near-optimal statistical guarantees (i.e., minimal sample complexity and optimal estimation accuracy). The estimation errors are evenly spread out across all entries, thus achieving optimal $\ell_\infty$ statistical accuracy. We also discuss how to extend our approach to accommodate asymmetric tensors. The insight conveyed through our analysis of nonconvex optimization might have implications for other tensor estimation problems.
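To make the two-stage recipe concrete (illustrative notation only, not taken from the paper): for a symmetric order-three tensor of canonical polyadic rank $r$ observed with noise on a random index set $\Omega$, the gradient-descent stage operates on the natural nonconvex least-squares objective over the factor matrix,
$$\begin{aligned} \min_{U\in\mathbb{R}^{d\times r}} \; f(U) \;=\; \sum_{(i,j,k)\in\Omega}\Big(\sum_{l=1}^{r} U_{il}\,U_{jl}\,U_{kl} \;-\; T^{\mathrm{obs}}_{ijk}\Big)^{2}, \end{aligned}$$
initialized by a rough (e.g. spectral) estimate; the asymmetric extension replaces the single factor $U$ with three separate factor matrices.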