We propose a novel parameterized skill-learning algorithm that aims to learn transferable parameterized skills and synthesize them into a new action space that supports efficient learning in long-horizon tasks. We leverage off-policy Meta-RL combined with a trajectory-centric smoothness term to learn a set of parameterized skills. Our agent can use these learned skills to construct a three-level hierarchical framework that models a Temporally-extended Parameterized Action Markov Decision Process. We empirically demonstrate that the proposed algorithms enable an agent to solve a set of difficult long-horizon (obstacle-course and robot manipulation) tasks.
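As a rough illustration of what a temporally-extended parameterized action looks like from the high-level agent's perspective, the sketch below rolls out a chosen skill with chosen continuous parameters until it terminates. The Skill interface (policy, terminated) and the environment step signature are assumptions for illustration, not the paper's implementation.

```python
class TemporallyExtendedParameterizedActions:
    """Illustrative sketch: a high-level action is a (skill index, continuous
    parameter vector) pair, executed until the skill signals termination."""

    def __init__(self, skills):
        self.skills = skills  # learned parameterized skills (assumed interface)

    def step(self, env, state, skill_idx, skill_params):
        # Run the chosen skill's low-level policy to completion, accumulating
        # reward; the high-level agent only observes the resulting state.
        skill = self.skills[skill_idx]
        total_reward, done = 0.0, False
        while not skill.terminated(state) and not done:
            low_level_action = skill.policy(state, skill_params)
            state, reward, done, _ = env.step(low_level_action)
            total_reward += reward
        return state, total_reward, done
```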
-
In the Hidden-Parameter MDP (HiP-MDP) framework, a family of reinforcement learning tasks is generated by varying hidden parameters specifying the dynamics and reward function for each individual task. The HiP-MDP is a natural model for families of tasks in which meta- and lifelong reinforcement learning approaches can succeed. Given a learned context encoder that infers the hidden parameters from previous experience, most existing algorithms fall into two categories: model transfer and policy transfer, depending on which function the hidden parameters are used to parameterize. We characterize the robustness of model and policy transfer algorithms with respect to hidden parameter estimation error. We first show that the value function of HiP-MDPs is Lipschitz continuous under certain conditions. We then derive regret bounds for both settings through the lens of Lipschitz continuity. Finally, we empirically corroborate our theoretical analysis by varying the hyper-parameters governing the Lipschitz constants of two continuous control problems; the resulting performance is consistent with our theoretical results.
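As a hedged illustration of the kind of condition involved (the paper's exact statement, norm, and constants may differ), Lipschitz continuity of the optimal value function in the hidden parameter $\theta$ can be written as

$$ \lvert Q^*_{\theta}(s,a) - Q^*_{\theta'}(s,a) \rvert \;\le\; L_Q \, \lVert \theta - \theta' \rVert \quad \text{for all } s, a, $$

so that an estimation error $\lVert \hat{\theta} - \theta \rVert \le \epsilon$ from the context encoder translates into a value gap, and hence a regret contribution, of at most $L_Q\,\epsilon$ at each decision.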
-
Principled decision-making in continuous state-action spaces is impossible without some assumptions. A common approach is to assume Lipschitz continuity of the Q-function. We show that, unfortunately, this property fails to hold in many typical domains. We propose a new coarse-grained smoothness definition that generalizes the notion of Lipschitz continuity, is more widely applicable, and allows us to compute significantly tighter bounds on Q-functions, leading to improved learning. We provide a theoretical analysis of our new smoothness definition, and discuss its implications and impact on control and exploration in continuous domains.
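For context, standard Lipschitz continuity of the Q-function requires $\lvert Q(s,a) - Q(s',a') \rvert \le L\, d\big((s,a),(s',a')\big)$ for every pair of state-action points. One illustrative way to coarsen such a condition (written here only as a plausible form, not necessarily the paper's exact definition) is to demand the bound only between points that are sufficiently separated:

$$ d\big((s,a),(s',a')\big) \ge \varepsilon \;\Longrightarrow\; \lvert Q(s,a) - Q(s',a') \rvert \le L\, d\big((s,a),(s',a')\big), $$

which tolerates sharp local variation (for example, discontinuities at contact or boundary events) while still yielding usable bounds at resolution $\varepsilon$ and above.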
-
We present Q-functionals, an alternative architecture for continuous control deep reinforcement learning. Instead of returning a single value for a state-action pair, our network transforms a state into a function that can be rapidly evaluated in parallel for many actions, allowing us to efficiently choose high-value actions through sampling. This contrasts with the typical architecture of off-policy continuous control, where a policy network is trained for the sole purpose of selecting actions from the Q-function. We represent our action-dependent Q-function as a weighted sum of basis functions (e.g., Fourier, polynomial) over the action space, where the weights are state-dependent and output by the Q-functional network. Fast sampling makes practical a variety of techniques that require Monte-Carlo integration over Q-functions, and enables action-selection strategies besides simple value-maximization. We characterize our framework, describe various implementations of Q-functionals, and demonstrate strong performance on a suite of continuous control tasks.
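A minimal numpy sketch of the core representation, assuming a Fourier cosine basis over a one-dimensional action bounded in [0, 1]; the state-dependent weights would normally come from a neural network, and the placeholder values, basis order, and sample count here are assumptions.

```python
import numpy as np

def fourier_basis(actions, order=3):
    """Fourier cosine features phi(a) for scalar actions assumed to lie in [0, 1]."""
    return np.cos(np.pi * np.arange(order + 1)[None, :] * actions[:, None])

def q_values(state_weights, actions):
    """Q(s, a) = w(s) . phi(a): one dot product per action, evaluated in parallel."""
    return fourier_basis(actions) @ state_weights

# Placeholder output of the state-conditioned weight network for a single state.
state_weights = np.array([0.2, -0.5, 0.1, 0.05])

# Sample many candidate actions, score them all at once, and act greedily.
candidate_actions = np.random.uniform(0.0, 1.0, size=256)
best_action = candidate_actions[np.argmax(q_values(state_weights, candidate_actions))]
```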
-
Optimistic initialization underpins many theoretically sound exploration schemes in tabular domains; however, in the deep function approximation setting, optimism can quickly disappear if initialized naively. We propose a framework for more effectively incorporating optimistic initialization into reinforcement learning for continuous control. Our approach uses metric information about the state-action space to estimate which transitions are still unexplored, and explicitly maintains the initial Q-value optimism for the corresponding state-action pairs. We also develop methods for efficiently approximating these training objectives, and for incorporating domain knowledge into the optimistic envelope to improve sample efficiency. We empirically evaluate these approaches on a variety of hard exploration problems in continuous control, where our method outperforms existing exploration techniques.
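A hedged numpy sketch of the general flavor (not the paper's training objective): keep the Q-value at an optimistic ceiling for state-action pairs that are far, under a chosen metric, from anything visited so far, and trust the learned estimate only where the data supports it. The distance metric, radius, and ceiling are illustrative placeholders.

```python
import numpy as np

def optimistic_q(query_sa, visited_sa, learned_q, q_max, radius):
    """Return an optimistic value for unexplored (state, action) points,
    falling back to the learned estimate near visited data."""
    if len(visited_sa) == 0:
        return q_max
    nearest = np.linalg.norm(visited_sa - query_sa, axis=1).min()
    if nearest > radius:   # still unexplored: preserve the initial optimism
        return q_max
    return learned_q       # locally explored: trust the learned estimate

visited = np.random.randn(100, 4)   # previously seen (s, a) pairs, stacked
print(optimistic_q(np.zeros(4), visited, learned_q=1.3, q_max=10.0, radius=0.5))
```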
-
A fundamental assumption of reinforcement learning in Markov decision processes (MDPs) is that the relevant decision process is, in fact, Markov. However, when MDPs have rich observations, agents typically learn by way of an abstract state representation, and such representations are not guaranteed to preserve the Markov property. We introduce a novel set of conditions and prove that they are sufficient for learning a Markov abstract state representation. We then describe a practical training procedure that combines inverse model estimation and temporal contrastive learning to learn an abstraction that approximately satisfies these conditions. Our novel training objective is compatible with both online and offline training: it does not require a reward signal, but agents can capitalize on reward information when available. We empirically evaluate our approach on a visual gridworld domain and a set of continuous control benchmarks. Our approach learns representations that capture the underlying structure of the domain and lead to improved sample efficiency over state-of-the-art deep reinforcement learning with visual features, often matching or exceeding the performance achieved with hand-designed compact state information.
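A PyTorch sketch of how the two training signals might be combined; all network sizes, the continuous-action inverse loss, and the negative-sampling scheme are invented for illustration and may differ from the paper's actual procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MarkovAbstraction(nn.Module):
    """Illustrative: an encoder trained with an inverse-model loss (predict the
    action from consecutive abstract states) plus a temporal contrastive loss
    (classify whether two abstract states are actually consecutive)."""

    def __init__(self, obs_dim, latent_dim, action_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        self.inverse_model = nn.Sequential(nn.Linear(2 * latent_dim, 64), nn.ReLU(), nn.Linear(64, action_dim))
        self.contrastive_head = nn.Sequential(nn.Linear(2 * latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def loss(self, obs, next_obs, actions):
        z, z_next = self.encoder(obs), self.encoder(next_obs)
        # Inverse model: recover the (continuous) action that connected z to z_next.
        inverse_loss = F.mse_loss(self.inverse_model(torch.cat([z, z_next], dim=1)), actions)
        # Temporal contrastive term: real (z, z_next) pairs vs. shuffled negatives.
        negatives = z_next[torch.randperm(z_next.shape[0])]
        logits_pos = self.contrastive_head(torch.cat([z, z_next], dim=1))
        logits_neg = self.contrastive_head(torch.cat([z, negatives], dim=1))
        labels = torch.cat([torch.ones_like(logits_pos), torch.zeros_like(logits_neg)])
        contrastive_loss = F.binary_cross_entropy_with_logits(torch.cat([logits_pos, logits_neg]), labels)
        return inverse_loss + contrastive_loss
```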
-
We propose an approach to multi-modal grasp detection that jointly predicts the probabilities that several types of grasps succeed at a given grasp pose. Given a partial point cloud of a scene, the algorithm proposes a set of feasible grasp candidates, then estimates the probabilities that a grasp of each type would succeed at each candidate pose. Predicting grasp success probabilities directly from point clouds makes our approach agnostic to the number and placement of depth sensors at execution time. We evaluate our system both in simulation and on a real robot with a Robotiq 3-Finger Adaptive Gripper and compare our network against several baselines that perform fewer types of grasps. Our experiments show that a system that explicitly models grasp type achieves an object retrieval rate 8.5% higher in a complex cluttered environment than our highest-performing baseline.
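A hedged PyTorch sketch of the architectural idea (features describing a candidate grasp pose mapped to per-grasp-type success probabilities); the feature extractor, layer sizes, and number of grasp types are placeholders, not the paper's network.

```python
import torch
import torch.nn as nn

class MultiModalGraspHead(nn.Module):
    """Illustrative: given a feature vector for a candidate grasp pose, output
    an independent success probability for each grasp type."""

    def __init__(self, feature_dim=256, num_grasp_types=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feature_dim, 128), nn.ReLU(),
                                 nn.Linear(128, num_grasp_types))

    def forward(self, candidate_features):
        # One sigmoid per grasp type: the probabilities need not sum to one,
        # since several grasp types can succeed at the same pose.
        return torch.sigmoid(self.net(candidate_features))

head = MultiModalGraspHead()
probs = head(torch.randn(10, 256))   # 10 candidate poses -> (10, 4) probabilities
```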
-
Hierarchical reinforcement learning (HRL) is only effective for long-horizon problems when high-level skills can be reliably executed in sequence. Unfortunately, learning reliably composable skills is difficult because all the components of every skill are constantly changing during learning. We propose three methods for improving the composability of learned skills: representing skill initiation regions using a combination of pessimistic and optimistic classifiers; learning re-targetable policies that are robust to non-stationary subgoal regions; and learning robust option policies using model-based RL. We test these improvements on four sparse-reward maze navigation tasks involving a simulated quadrupedal robot. Each method successively improves the robustness of a baseline skill discovery method, substantially outperforming state-of-the-art flat and hierarchical methods.
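A hedged sketch of the first idea: a pessimistic classifier fit on strict success labels and an optimistic one fit on more permissive labels, consulted together when deciding whether a skill can be initiated. The classifier choice (scikit-learn SVMs), label conventions, and decision rule are assumptions for illustration.

```python
from sklearn.svm import SVC

class InitiationRegion:
    """Illustrative: pessimistic classifier trained on states where the skill
    actually succeeded; optimistic classifier trained on states where it was
    merely worth attempting."""

    def __init__(self):
        self.pessimistic = SVC()
        self.optimistic = SVC()

    def fit(self, states, succeeded_labels, attempted_labels):
        self.pessimistic.fit(states, succeeded_labels)   # strict 0/1 labels
        self.optimistic.fit(states, attempted_labels)    # permissive 0/1 labels

    def can_initiate(self, state, explore=False):
        clf = self.optimistic if explore else self.pessimistic
        return clf.predict(state.reshape(1, -1))[0] == 1
```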
-
We introduce a new skill-discovery algorithm that builds a discrete graph representation of large continuous MDPs, where nodes correspond to skill subgoals and the edges to skill policies. The agent constructs this graph during an unsupervised training phase where it interleaves discovering skills and planning using them to gain coverage over ever-increasing portions of the state-space. Given a novel goal at test time, the agent plans with the acquired skill graph to reach a nearby state, then switches to learning to reach the goal. We show that the resulting algorithm, Deep Skill Graphs, outperforms both flat and existing hierarchical reinforcement learning methods on four difficult continuous control tasks.
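A hedged sketch of the test-time planning step over such a graph, using networkx purely for illustration; the node names, edge attributes, and the idea of storing a skill policy per edge are assumptions rather than the paper's data structures.

```python
import networkx as nx

# Nodes stand in for skill subgoal regions; each directed edge stores the skill
# assumed to reliably move the agent between its endpoints.
skill_graph = nx.DiGraph()
skill_graph.add_edge("start_region", "doorway", policy="skill_0")
skill_graph.add_edge("doorway", "far_room", policy="skill_1")

def plan_skill_sequence(graph, current_node, target_node):
    """Chain learned skills along a shortest path through the graph; a fresh
    policy would then be learned from the final node to the novel goal."""
    node_path = nx.shortest_path(graph, current_node, target_node)
    return [graph.edges[u, v]["policy"] for u, v in zip(node_path, node_path[1:])]

print(plan_skill_sequence(skill_graph, "start_region", "far_room"))
```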
-
We introduce Wasserstein Adversarial Proximal Policy Optimization (WAPPO), a novel algorithm for visual transfer in reinforcement learning that explicitly learns to align the distributions of extracted features between a source and target task. WAPPO approximates and minimizes the Wasserstein-1 distance between the distributions of features from source and target domains via a novel Wasserstein Confusion objective. WAPPO outperforms the prior state-of-the-art in visual transfer and successfully transfers policies across Visual Cartpole and both the easy and hard settings of 16 OpenAI Procgen environments.
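A hedged PyTorch sketch of WGAN-style feature alignment in the spirit of the Wasserstein Confusion objective (the exact networks, losses, and Lipschitz-constraint enforcement in WAPPO may differ): a critic's score gap estimates the Wasserstein-1 distance between source and target feature distributions, and the feature extractor is updated to shrink it while the critic is trained to widen it.

```python
import torch
import torch.nn as nn

feature_extractor = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
critic = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))  # kept ~1-Lipschitz in practice

def wasserstein_confusion(source_obs, target_obs):
    """Critic score gap between source and target features: the extractor
    minimizes this quantity, the (Lipschitz-constrained) critic maximizes it."""
    src_feats = feature_extractor(source_obs)
    tgt_feats = feature_extractor(target_obs)
    return critic(src_feats).mean() - critic(tgt_feats).mean()

gap = wasserstein_confusion(torch.randn(64, 32), torch.randn(64, 32))
```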