Goal-conditioned reinforcement learning (RL) is a powerful approach for learning general-purpose skills by reaching diverse goals. However, it has limitations when it comes to task-conditioned policies, where goals are specified by temporally extended instructions written in the Linear Temporal Logic (LTL) formal language. Existing approaches for finding LTL-satisfying policies rely on sampling a large set of LTL instructions during training to adapt to unseen tasks at inference time. However, these approaches do not guarantee generalization to out-of-distribution LTL objectives, which may have increased complexity. In this paper, we propose a novel approach to address this challenge. We show that simple goal-conditioned RL agents can be instructed to follow arbitrary LTL specifications without additional training over the LTL task space. Unlike existing approaches that focus on LTL specifications expressible as regular expressions, our technique is unrestricted and generalizes to ω-regular expressions. Experiment results demonstrate the effectiveness of our approach in adapting goal-conditioned RL agents to satisfy complex temporal logic task specifications zero-shot.
more »
« less
CLUTR: Curriculum Learning via Unsupervised Task Representation Learning
Reinforcement Learning (RL) algorithms are often known for sample inefficiency and difficult generalization. Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks. This is a non-stationary process where the task distribution evolves along with agent policies; creating an instability over time. While past works demonstrated the potential of such approaches, sampling effectively from the task space remains an open challenge, bottlenecking these approaches. To this end, we introduce CLUTR: a novel unsupervised curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization. It first trains a recurrent variational autoencoder on randomly generated tasks to learn a latent task manifold. Next, a teacher agent creates a curriculum by maximizing a minimax REGRET-based objective on a set of latent tasks sampled from this manifold. Using the fixed-pretrained task manifold, we show that CLUTR successfully overcomes the non-stationarity problem and improves stability. Our experimental results show CLUTR outperforms PAIRED, a principled and popular UED method, in the challenging CarRacing and navigation environments: achieving 10.6X and 45% improvement in zero-shot generalization, respectively. CLUTR also performs comparably to the non-UED state-of-the-art for CarRacing, while requiring 500X fewer environment interactions. We open source our code at https://github.com/clutr/clutr.
more »
« less
- Award ID(s):
- 1730628
- PAR ID:
- 10523933
- Publisher / Repository:
- PMLR
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
While dense retrieval has been shown to be effective and efficient across tasks and languages, it remains difficult to create effective fully zero-shot dense retrieval systems when no relevance labels are available. In this paper, we recognize the difficulty of zero-shot learning and encoding relevance. Instead, we propose to pivot through Hypothetical Document Embeddings (HyDE). Given a query, HyDE first zero-shot prompts an instruction-following language model (e.g., InstructGPT) to generate a hypothetical document. The document captures relevance patterns but is “fake” and may contain hallucinations. Then, an unsupervised contrastively learned encoder (e.g., Contriever) encodes the document into an embedding vector. This vector identifies a neighborhood in the corpus embedding space, from which similar real documents are retrieved based on vector similarity. This second step grounds the generated document to the actual corpus, with the encoder’s dense bottleneck filtering out the hallucinations. Our experiments show that HyDE significantly outperforms the state-of-the-art unsupervised dense retriever Contriever and shows strong performance comparable to fine-tuned retrievers across various tasks (e.g. web search, QA, fact verification) and in non-English languages (e.g., sw, ko, ja, bn).more » « less
-
Long-horizon tasks in unstructured environments are notoriously challenging for robots because they require the prediction of extensive action plans with thousands of steps while adapting to ever-changing conditions by reasoning among multimodal sensing spaces. Humans can efficiently tackle such compound problems by breaking them down into easily reachable abstract sub-goals, significantly reducing complexity. Inspired by this ability, we explore how we can enable robots to acquire sub-goal formulation skills for long-horizon tasks and generalize them to novel situations and environments. To address these challenges, we propose the Zero-shot Abstract Sub-goal Framework (ZAS-F), which empowers robots to decompose overarching action plans into transferable abstract sub-goals, thereby providing zero-shot capability in new task conditions. ZAS-F is an imitation-learning-based method that efficiently learns a task policy from a few demonstrations. The learned policy extracts abstract features from multimodal and extensive temporal observations and subsequently uses these features to predict task-agnostic sub-goals by reasoning about their latent relations. We evaluated ZAS-F in radio frequency identification (RFID) inventory tasks across various dynamic environments, a typical long-horizon task requiring robots to handle unpredictable conditions, including unseen objects and structural layouts. Ourexperiments demonstrated that ZAS-F achieves a learning efficiency 30 times higher than previous methods, requiring only 8k demonstrations. Compared to prior approaches, ZAS-F achieves a 98.3% scanning accuracy while significantly reducing the training data requirement. Further, ZAS-F demonstrated strong generalization, maintaining a scan success rate of 99.4% in real-world deployment without additional finetuning. In long-term operations spanning 100 rooms, ZAS-F maintained consistent performance compared to short-term tasks, highlighting its robustness against compounding errors. These results establish ZAS-F as an efficient and adaptable solution for long-horizon robotic tasks in unstructured environments.more » « less
-
UniT is an approach to tactile representation learn¬ing, using VQGAN to learn a compact latent space and serve as the tactile representation. It uses tactile images obtained from a single simple object to train the representation with generalizability. This tactile representation can be zero-shot transferred to various downstream tasks, including perception tasks and manipulation policy learning. Our benchmarkings on in-hand 3D pose and 6D pose estimation tasks and a tactile classifcation task show that UniT outperforms existing visual and tactile representation learning methods. Additionally, UniT’s effectiveness in policy learning is demonstrated across three real-world tasks involving diverse manipulated objects and complex robot-object-environment interactions. Through extensive experi¬mentation, UniT is shown to be a simple-to-train, plug-and-play, yet widely effective method for tactile representation learning. For more details, please refer to our open-source repository https://github.com/ZhengtongXu/UniT and the project website https://zhengtongxu.github.io/unit-website/.more » « less
-
The fast development in Deep Learning (DL) has made it a promising technique for various autonomous robotic systems. Recently, researchers have explored deploying DL models, such as Reinforcement Learning and Imitation Learning, to enable robots for Radio-frequency Identification (RFID) based inventory tasks. However, the existing methods are either focused on a single field or need tremendous data and time to train. To address these problems, this paper presents a Cross-Modal Reasoning Model (CMRM), which is designed to extract high-dimension information from multiple sensors and learn to reason from spatial and historical features for latent crossmodal relations. Furthermore, CMRM aligns the learned tasking policy to high-level features to offer zero-shot generalization to unseen environments. We conduct extensive experiments in several virtual environments as well as in indoor settings with robots for RFID inventory. The experimental results demonstrate that the proposed CMRM can significantly improve learning efficiency by around 20 times. It also demonstrates a robust zero-shot generalization for deploying a learned policy in unseen environments to perform RFID inventory tasks successfully.more » « less