Search for: All records

Creators/Authors contains: "Agarwal, Alekh"


Provable Benefits of Representational Transfer in Reinforcement Learning

Agarwal, Alekh ; Song, Yuda ; Sun, Wen ; Wang, Kaiwen ; Wang, Mengdi ; Zhang, Xuezhou ( July 2023 , The Conference on Learning Theory)

Free, publicly-accessible full text available July 12, 2024
Adversarially Trained Actor Critic for Offline Reinforcement Learning

Cheng, Ching-An ; Xie, Tengyang ; Jiang, Nan ; Agarwal, Alekh ( July 2022 , Proceedings of the 39th International Conference on Machine Learning)

We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism. ATAC is designed as a two-player Stackelberg game: a policy actor competes against an adversarially trained value critic, who finds data-consistent scenarios where the actor is inferior to the data-collection behavior policy. We prove that, when the actor attains no regret in the two-player game, running ATAC produces a policy that provably 1) outperforms the behavior policy over a wide range of hyperparameters that control the degree of pessimism, and 2) competes with the best policy covered by data with appropriately chosen hyperparameters. Notably, compared with existing work, our framework offers both theoretical guarantees for general function approximation and a deep RL implementation scalable to complex environments and large datasets. In the D4RL benchmark, ATAC consistently outperforms state-of-the-art offline RL algorithms on a range of continuous control tasks.
Full Text Available
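
The two-player game described in the ATAC abstract above can be sketched as a bilevel problem. The notation is assumed here for illustration and does not appear in the abstract: $\mu$ is the data distribution, $\Pi$ and $\mathcal{F}$ are the policy and critic classes, $\beta \ge 0$ is the hyperparameter controlling pessimism, $\mathcal{T}^\pi$ is the Bellman operator of $\pi$, and $f(s, \pi)$ denotes $\mathbb{E}_{a' \sim \pi(\cdot \mid s)}[f(s, a')]$.

```latex
\begin{align*}
\hat{\pi} &\in \arg\max_{\pi \in \Pi} \; \mathcal{L}_\mu(\pi, f^\pi)
\quad \text{s.t.} \quad
f^\pi \in \arg\min_{f \in \mathcal{F}} \; \mathcal{L}_\mu(\pi, f) + \beta\, \mathcal{E}_\mu(\pi, f), \\
\mathcal{L}_\mu(\pi, f) &= \mathbb{E}_{(s,a) \sim \mu}\big[ f(s, \pi) - f(s, a) \big], \\
\mathcal{E}_\mu(\pi, f) &= \mathbb{E}_{(s,a) \sim \mu}\big[ \big( (f - \mathcal{T}^\pi f)(s, a) \big)^2 \big].
\end{align*}
```

Under this reading, $\mathcal{L}_\mu$ scores the actor relative to the behavior policy, and the critic minimizes that score subject to a Bellman-consistency penalty, which is one way to express "data-consistent scenarios where the actor is inferior".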
Towards a Dimension-Free Understanding of Adaptive Linear Control

Perdomo, Juan C ; Simchowitz, Max ; Agarwal, Alekh ; Bartlett, Peter ( October 2021 , Proceedings of Thirty Fourth Conference on Learning Theory)
Full Text Available
Towards a Dimension-Free Understanding of Adaptive Linear Control

Perdomo, Juan ; Simchowitz, Max ; Agarwal, Alekh ; Bartlett, Peter L. ( July 2021 , Proceedings of the 34th Conference on Learning Theory (COLT2021))
Full Text Available
Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal

Agarwal, Alekh ; Kakade, Sham ; Yang, Lin F. ( July 2020 , Proceedings of Machine Learning Research)

This work considers the sample and computational complexity of obtaining an $\epsilon$-optimal policy in a discounted Markov Decision Process (MDP), given only access to a generative model. In this model, the learner accesses the underlying transition model via a sampling oracle that provides a sample of the next state, when given any state-action pair as input. We are interested in a basic and unresolved question in model-based planning: is this naïve “plug-in” approach, where we build the maximum likelihood estimate of the transition model in the MDP from observations and then find an optimal policy in this empirical MDP, non-asymptotically minimax optimal? Our main result answers this question positively. With regard to computation, our result provides a simpler approach towards minimax optimal planning: in comparison to prior model-free results, we show that using any high accuracy, black-box planning oracle in the empirical model suffices to obtain the minimax error rate. The key proof technique uses a leave-one-out analysis, in a novel “absorbing MDP” construction, to decouple the statistical dependency issues that arise in the analysis of model-based planning; this construction may be helpful more generally.
Full Text Available
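
The "plug-in" approach described in the abstract above can be illustrated on a toy problem. The following is a minimal sketch, not the paper's code: the two-state MDP, the reward table, the sample count, and all function names are assumptions made for illustration, and value iteration stands in for the black-box planning oracle.

```python
import random

# Hypothetical toy MDP (an assumption for this sketch): 2 states, 2 actions.
# Action a moves to state a with probability 0.9, regardless of the current state.
TRUE_P = [[[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]]]
REWARD = [[0.0, 0.0], [1.0, 1.0]]  # reward 1 for acting in state 1, else 0

rng = random.Random(0)


def sample_next_state(s, a):
    """Generative model: return one sampled next state for the pair (s, a)."""
    return rng.choices(range(2), weights=TRUE_P[s][a])[0]


def estimate_model(sampler, n_states, n_actions, n_samples):
    """Maximum likelihood estimate: empirical next-state frequencies per (s, a)."""
    p_hat = [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                p_hat[s][a][sampler(s, a)] += 1.0 / n_samples
    return p_hat


def value_iteration(p, reward, gamma, n_iters=500):
    """Plan in the given model; returns the value estimate and a greedy policy."""
    n_states, n_actions = len(p), len(p[0])
    v = [0.0] * n_states
    for _ in range(n_iters):
        q = [
            [
                reward[s][a] + gamma * sum(p[s][a][t] * v[t] for t in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        v = [max(row) for row in q]
    policy = [max(range(n_actions), key=lambda a: q[s][a]) for s in range(n_states)]
    return v, policy


# Step 1: query the generative model the same number of times for every (s, a).
p_hat = estimate_model(sample_next_state, n_states=2, n_actions=2, n_samples=200)
# Step 2: treat the empirical MDP as if it were the true one and plan in it.
v_hat, policy = value_iteration(p_hat, REWARD, gamma=0.9)
print(policy)
```

Any sufficiently accurate planner could replace `value_iteration` here; the sketch only shows the estimate-then-plan structure and says nothing about the sample complexity the paper analyzes.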
Taking a Hint: How to Leverage Loss Predictors in Contextual Bandits?

Wei, Chen-Yu ; Luo, Haipeng ; Agarwal, Alekh ( July 2020 , Conference on Learning Theory)

Full Text Available
PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

Agarwal, Alekh ; Henaff, Mikael ; Kakade, Sham ; Sun, Wen ( January 2020 , Advances in neural information processing systems)
Direct policy gradient methods for reinforcement learning are a successful approach for a variety of reasons: they are model free, they directly optimize the performance metric of interest, and they allow for richly parameterized policies. Their primary drawback is that, by being local in nature, they fail to adequately explore the environment. In contrast, while model-based approaches and Q-learning can, at least in theory, directly handle exploration through the use of optimism, their ability to handle model misspecification and function approximation is far less evident. This work introduces the Policy Cover Guided Policy Gradient (PC-PG) algorithm, which provably balances the exploration vs. exploitation tradeoff using an ensemble of learned policies (the policy cover). PC-PG enjoys polynomial sample complexity and run time for both tabular MDPs and, more generally, linear MDPs in an infinite dimensional RKHS. Furthermore, PC-PG also has strong guarantees under model misspecification that go beyond the standard worst-case $\ell_\infty$ assumptions; these include approximation guarantees for state aggregation under an average case error assumption, along with guarantees under a more general assumption where the approximation error under distribution shift is controlled. We complement the theory with empirical evaluation across a variety of domains in both reward-free and reward-driven settings.
Full Text Available
FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs

Agarwal, Alekh ; Kakade, Sham ; Krishnamurthy, Akshay ; Sun, Wen ( January 2020 , Advances in neural information processing systems)

To deal with the curse of dimensionality in reinforcement learning (RL), it is common practice to make parametric assumptions where values or policies are functions of some low-dimensional feature space. This work focuses on the representation learning question: how can we learn such features? Under the assumption that the underlying (unknown) dynamics correspond to a low-rank transition matrix, we show how the representation learning question is related to a particular non-linear matrix decomposition problem. Structurally, we make precise connections between these low-rank MDPs and latent variable models, showing how they significantly generalize prior formulations, such as block MDPs, for representation learning in RL. Algorithmically, we develop FLAMBE, which engages in exploration and representation learning for provably efficient RL in low-rank transition models. On a technical level, our analysis eliminates reachability assumptions that appear in prior results on the simpler block MDP model and may be of independent interest.
Full Text Available
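
To make the low-rank structure in the FLAMBE abstract above concrete, here is a minimal sketch of a latent variable model whose transition matrix factors through a small latent space. The names `phi` and `mu`, the sizes, and the helper functions are assumptions for illustration; this shows only the structural assumption, not the FLAMBE algorithm.

```python
import random


def random_distribution(rng, n):
    """Return a random probability vector of length n."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def low_rank_transitions(phi, mu):
    """P[(s, a)][s'] = sum_z phi[(s, a)][z] * mu[z][s'] (latent variable model)."""
    d, n_next = len(mu), len(mu[0])
    return [
        [sum(phi_sa[z] * mu[z][t] for z in range(d)) for t in range(n_next)]
        for phi_sa in phi
    ]


def matrix_rank(rows, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in rows]
    n_cols = len(m[0])
    rank = 0
    for c in range(n_cols):
        if rank == len(m):
            break
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][c] / m[rank][c]
            for k in range(c, n_cols):
                m[r][k] -= factor * m[rank][k]
        rank += 1
    return rank


rng = random.Random(0)
n_states, n_actions, d = 4, 2, 2  # hypothetical sizes; d is the latent dimension

# phi: one distribution over d latent states per (state, action) pair.
phi = [random_distribution(rng, d) for _ in range(n_states * n_actions)]
# mu: one distribution over next states per latent state.
mu = [random_distribution(rng, n_states) for _ in range(d)]

P = low_rank_transitions(phi, mu)
print(matrix_rank(P), "<=", d)
```

The transition matrix has 8 rows and 4 columns, yet its rank is at most `d`; representation learning in this setting amounts to recovering features like `phi` from data without being given them.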
Efficient Contextual Bandits in Non-stationary Worlds

Luo, Haipeng ; Wei, Chen-Yu ; Agarwal, Alekh ; Langford, John ( July 2018 , Proceedings of Machine Learning Research)

Full Text Available
Practical Contextual Bandits with Regression Oracles

Foster, Dylan ; Agarwal, Alekh ; Dudik, Miroslav ; Luo, Haipeng ; Schapire, Robert ( July 2018 , Proceedings of Machine Learning Research)

A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded. We present a new technique that has the empirical and computational advantages of realizability-based approaches combined with the flexibility of agnostic methods. Our algorithms leverage the availability of a regression oracle for the value-function class, a more realistic and reasonable oracle than the classification oracles over policies typically assumed by agnostic methods. Our approach generalizes both UCB and LinUCB to far more expressive model classes and achieves low regret under certain distributional assumptions. In an extensive empirical evaluation, we find that our approach typically matches or outperforms both realizability-based and agnostic baselines.
Full Text Available
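
The idea of driving a UCB-style rule with a regression oracle, as described in the abstract above, can be sketched on a toy problem. This is a simplified illustration and not the paper's algorithm: the three-function class, the brute-force oracle, the threshold `beta`, and the reward model are all assumptions chosen so that the example runs in a few lines.

```python
import random


def f_match(x, a):
    """Predicts high reward when the action equals the context."""
    return 0.8 if a == x else 0.2


def f_mismatch(x, a):
    """Predicts high reward when the action differs from the context."""
    return 0.8 if a != x else 0.2


def f_always0(x, a):
    """Predicts high reward for action 0 in every context."""
    return 0.8 if a == 0 else 0.2


FUNCTION_CLASS = [f_match, f_mismatch, f_always0]  # hypothetical finite class
true_mean = f_match  # realizability: the true mean reward lies in the class


def squared_loss(f, history):
    """Cumulative squared error of f on the (context, action, reward) history."""
    return sum((f(x, a) - r) ** 2 for x, a, r in history)


def regression_oracle(function_class, history):
    """Least-squares fit, done here by brute force over the finite class."""
    return min(function_class, key=lambda f: squared_loss(f, history))


def optimistic_action(function_class, history, x, actions, beta):
    """Pick the action with the highest prediction among near-optimal regressors."""
    best = squared_loss(regression_oracle(function_class, history), history)
    version_space = [
        f for f in function_class if squared_loss(f, history) <= best + beta
    ]
    return max(actions, key=lambda a: max(f(x, a) for f in version_space))


rng = random.Random(0)
history = []
late_correct = 0  # rounds among the last 100 where the best action was chosen
for t in range(300):
    x = rng.randrange(2)
    a = optimistic_action(FUNCTION_CLASS, history, x, [0, 1], beta=2.0)
    r = 1.0 if rng.random() < true_mean(x, a) else 0.0  # Bernoulli reward
    history.append((x, a, r))
    if t >= 200 and a == x:
        late_correct += 1

print(late_correct)
```

With a richer function class the brute-force search would be replaced by calls to an actual least-squares solver, which is the sense in which such a method relies only on regression rather than on a classification oracle over policies.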