
Title: CMRM: A Cross-Modal Reasoning Model to Enable Zero-Shot Imitation Learning for Robotic RFID Inventory in Unstructured Environments
The rapid development of Deep Learning (DL) has made it a promising technique for various autonomous robotic systems. Recently, researchers have explored deploying DL models, such as Reinforcement Learning and Imitation Learning, to enable robots to perform Radio-Frequency Identification (RFID) based inventory tasks. However, existing methods either focus on a single field or require tremendous amounts of data and time to train. To address these problems, this paper presents a Cross-Modal Reasoning Model (CMRM), designed to extract high-dimensional information from multiple sensors and to reason over spatial and historical features for latent cross-modal relations. Furthermore, CMRM aligns the learned task policy with high-level features to offer zero-shot generalization to unseen environments. We conduct extensive experiments in several virtual environments as well as in indoor settings with robots performing RFID inventory. The experimental results demonstrate that the proposed CMRM improves learning efficiency by around 20 times. It also demonstrates robust zero-shot generalization, successfully performing RFID inventory tasks when a learned policy is deployed in unseen environments.
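This record includes no implementation details; the sketch below only illustrates one plausible form of the cross-modal reasoning the abstract describes. The module names, sensor dimensions, and attention-plus-recurrence design are assumptions, not the paper's architecture.

```python
# Minimal sketch of a cross-modal fusion policy of the kind the abstract
# describes. All module names, dimensions, and the attention-plus-recurrence
# design are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Fuses per-sensor features and reasons over spatial and historical context."""
    def __init__(self, rfid_dim=64, lidar_dim=256, hidden=128, n_actions=6):
        super().__init__()
        # Per-modality encoders map raw sensor features into a shared space.
        self.rfid_enc = nn.Linear(rfid_dim, hidden)
        self.lidar_enc = nn.Linear(lidar_dim, hidden)
        # Cross-modal attention lets one modality query the other for
        # latent cross-modal relations.
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        # A GRU accumulates historical features across time steps.
        self.history = nn.GRU(hidden, hidden, batch_first=True)
        self.policy_head = nn.Linear(hidden, n_actions)

    def forward(self, rfid_seq, lidar_seq):
        # rfid_seq: (B, T, rfid_dim); lidar_seq: (B, T, lidar_dim)
        q = self.rfid_enc(rfid_seq)
        kv = self.lidar_enc(lidar_seq)
        fused, _ = self.attn(q, kv, kv)       # reason across modalities
        out, _ = self.history(fused)          # reason across time
        return self.policy_head(out[:, -1])   # action logits at the last step
```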
Award ID(s):
2245607 2245608 2300955 1923717
PAR ID:
10519879
Publisher / Repository:
IEEE
ISSN:
2576-6813
ISBN:
979-8-3503-1090-0
Page Range / eLocation ID:
5354 to 5359
Subject(s) / Keyword(s):
Imitation Learning, RFID inventory, Long-horizon tasks, Cross-modal reasoning, multiple sensing spaces
Format(s):
Medium: X
Location:
Kuala Lumpur, Malaysia
Sponsoring Org:
National Science Foundation
More Like this
  1. Long-horizon tasks in unstructured environments are notoriously challenging for robots because they require predicting extensive action plans with thousands of steps while adapting to ever-changing conditions by reasoning across multimodal sensing spaces. Humans efficiently tackle such compound problems by breaking them down into easily reachable abstract sub-goals, significantly reducing complexity. Inspired by this ability, we explore how to enable robots to acquire sub-goal formulation skills for long-horizon tasks and generalize them to novel situations and environments. To address these challenges, we propose the Zero-shot Abstract Sub-goal Framework (ZAS-F), which empowers robots to decompose overarching action plans into transferable abstract sub-goals, thereby providing zero-shot capability in new task conditions. ZAS-F is an imitation-learning-based method that efficiently learns a task policy from a few demonstrations. The learned policy extracts abstract features from multimodal and extensive temporal observations and subsequently uses these features to predict task-agnostic sub-goals by reasoning about their latent relations. We evaluated ZAS-F on radio frequency identification (RFID) inventory tasks across various dynamic environments, a typical long-horizon task requiring robots to handle unpredictable conditions, including unseen objects and structural layouts. Our experiments demonstrated that ZAS-F achieves a learning efficiency 30 times higher than previous methods, requiring only 8k demonstrations. Compared to prior approaches, ZAS-F achieves a 98.3% scanning accuracy while significantly reducing the training data requirement. Further, ZAS-F demonstrated strong generalization, maintaining a scan success rate of 99.4% in real-world deployment without additional fine-tuning. In long-term operations spanning 100 rooms, ZAS-F maintained performance consistent with short-term tasks, highlighting its robustness against compounding errors. These results establish ZAS-F as an efficient and adaptable solution for long-horizon robotic tasks in unstructured environments.
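For intuition only, here is a toy two-level policy of the kind the sub-goal decomposition suggests: a high-level head predicts an abstract sub-goal, and a low-level head conditions actions on it. Every name, shape, and the specific heads are assumptions for illustration, not the paper's implementation.

```python
# Toy two-level policy in the spirit of the sub-goal decomposition above.
# All names and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class SubGoalPolicy(nn.Module):
    """Predicts a task-agnostic abstract sub-goal, then an action conditioned on it."""
    def __init__(self, obs_dim=512, subgoal_dim=32, n_actions=6):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        # High-level head: abstract sub-goal in a task-agnostic latent space.
        self.subgoal_head = nn.Linear(256, subgoal_dim)
        # Low-level head: action conditioned on observation and sub-goal.
        self.action_head = nn.Linear(256 + subgoal_dim, n_actions)

    def forward(self, obs):
        h = self.encoder(obs)
        subgoal = self.subgoal_head(h)
        logits = self.action_head(torch.cat([h, subgoal], dim=-1))
        return subgoal, logits
```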
  2. This project introduces a framework that enables robots to recognize human hand signals, a reliable and practical device-free means of communication in noisy environments such as construction sites and airport ramps, to facilitate efficient human-robot collaboration. Various hand signal systems are adopted by small groups for specific purposes, such as marshalling on airport ramps and crane operation on construction sites. Robots must be robust to unpredictable conditions, including varied backgrounds and human appearances, an extreme challenge imposed by open environments. To address these challenges, we propose Instant Hand Signal Recognition (IHSR), a learning-based framework with world knowledge of human gestures embedded, which allows robots to learn novel hand signals from a few samples. It also offers robust zero-shot generalization, recognizing learned signals in novel scenarios. Extensive experiments show that IHSR can learn a novel hand signal from only 50 samples, which is 30+ times more efficient than the state-of-the-art method. It also demonstrates robust zero-shot generalization when deploying a learned model in unseen environments to recognize hand signals from unseen human users.
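A plausible few-shot recipe consistent with this description is to freeze a pretrained visual backbone (the embedded "world knowledge") and fit only a small classification head on the ~50 labeled samples. The sketch below assumes a ResNet-18 backbone and 10 signal classes, neither of which is specified by the source.

```python
# Freeze a pretrained backbone and train only a small head on ~50 samples.
# Backbone choice and class count are assumptions, not IHSR's actual design.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()            # expose 512-d features
backbone.eval()                        # deterministic frozen features
for p in backbone.parameters():
    p.requires_grad = False

head = nn.Linear(512, 10)              # e.g., 10 hand signals (illustrative)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(images, labels):
    with torch.no_grad():
        feats = backbone(images)       # (B, 512) frozen features
    loss = loss_fn(head(feats), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```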
  3. Mobile robot navigation is a critical aspect of robotics, with applications spanning from service robots to industrial automation. However, navigating complex and dynamic environments poses many challenges, such as avoiding obstacles, making decisions in real time, and adapting to new situations. Reinforcement Learning (RL) has emerged as a promising approach for enabling robots to learn navigation policies from their interactions with the environment. However, applying RL methods to real-world tasks such as mobile robot navigation, and evaluating their performance under various training-testing settings, has not been sufficiently researched. In this paper, we design an evaluation framework that investigates an RL algorithm's generalization capability to unseen scenarios, in terms of learning convergence and success rates, by transferring policies learned in simulation to physical environments. To achieve this, we designed a simulated environment in Gazebo for training the robot over a large number of episodes. The training environment closely mimics the typical indoor scenarios that a mobile robot can encounter, replicating real-world challenges. For evaluation, we designed physical environments with and without unforeseen indoor scenarios. This evaluation framework outputs statistical metrics, which we then use to conduct an extensive study of a deep RL method, namely proximal policy optimization (PPO). The results provide valuable insights into the strengths and limitations of the method for mobile robot navigation. Our experiments demonstrate that the model trained in simulation can be deployed to the previously unseen physical world with a success rate of over 88%. The insights gained from our study can assist practitioners and researchers in selecting suitable RL approaches and training-testing settings for their specific robotic navigation tasks.
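A minimal end-to-end sketch of the train-then-evaluate loop such a study implies, using stable-baselines3 PPO; the environment id is hypothetical and stands in for the paper's Gazebo setup.

```python
# Train PPO in simulation, then roll out the policy, e.g., to estimate the
# success rate before sim-to-real transfer. "IndoorNav-v0" is a hypothetical
# environment id; the paper's Gazebo world is not distributed with this record.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("IndoorNav-v0")              # placeholder navigation env
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)      # long simulated training run
model.save("ppo_indoor_nav")

obs, _ = env.reset()
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```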
  4.
    The problem of learning to generalize to classes unseen during training, also known as few-shot classification, has attracted considerable attention. Initialization-based methods, such as gradient-based model-agnostic meta-learning (MAML) [1], tackle the few-shot learning problem by "learning to fine-tune". The goal of these approaches is to learn a model initialization such that classifiers for new classes can be learned from a few labeled examples with a small number of gradient update steps. Few-shot meta-learning is well known for its fast adaptation and accurate generalization to unseen tasks [2]. Learning fairly, with unbiased outcomes, is another significant hallmark of human intelligence, yet it is rarely addressed in few-shot meta-learning. In this work, we propose a novel Primal-Dual Fair Meta-learning framework, PDFM, which learns to train fair machine learning models from only a few examples, based on data from related tasks. The key idea is to learn a good initialization of a fair model's primal and dual parameters so that it can adapt to a new fair learning task via a few gradient update steps. Instead of manually tuning the dual parameters as hyperparameters via a grid search, PDFM jointly optimizes the initialization of the primal and dual parameters for fair meta-learning via a subgradient primal-dual approach. We further instantiate an example of bias control using decision boundary covariance (DBC) [3] as the fairness constraint for each task, and demonstrate the versatility of our proposed approach by applying it to classification on three real-world datasets. Our experiments show substantial improvements over the best prior work for this setting.
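The subgradient primal-dual idea can be made concrete with a single update step on the task Lagrangian; the sketch below is a generic textbook form with illustrative step sizes and callables, not PDFM's code.

```python
# One subgradient primal-dual step on the Lagrangian L(theta) + lam * g(theta),
# where g(theta) <= 0 encodes a fairness constraint such as a bound on
# decision boundary covariance (DBC). Step sizes and the callables are
# illustrative assumptions, not PDFM's implementation.
def primal_dual_step(theta, lam, grad_loss, g, grad_g, eta_p=0.01, eta_d=0.01):
    # Primal descent on the Lagrangian with the current dual variable.
    theta = theta - eta_p * (grad_loss(theta) + lam * grad_g(theta))
    # Dual subgradient ascent, projected onto lam >= 0.
    lam = max(0.0, lam + eta_d * g(theta))
    return theta, lam
```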
  5. Humans have the remarkable ability to recognize and acquire novel visual concepts in a zero-shot manner. Given a high-level, symbolic description of a novel concept in terms of previously learned visual concepts and their relations, humans can recognize the novel concept without seeing any examples. Moreover, they can acquire new concepts by parsing and communicating symbolic structures built from learned visual concepts and relations. Endowing machines with these capabilities is pivotal to improving their generalization at inference time. We introduced Zero-shot Concept Recognition and Acquisition (ZeroC), a neuro-symbolic architecture that can recognize and acquire novel concepts in a zero-shot way. ZeroC represents concepts as graphs of constituent concept models (nodes) and their relations (edges). To allow inference-time composition, we employed energy-based models (EBMs) to model concepts and relations. We designed the ZeroC architecture to allow a one-to-one mapping between the symbolic graph structure of a concept and its corresponding EBM, which, for the first time, allows acquiring a new concept, communicating its graph structure, and applying it to classification and detection tasks (even across domains) at inference time. We introduced algorithms for learning and inference with ZeroC. We evaluated ZeroC on a challenging grid-world dataset designed to probe zero-shot concept recognition and acquisition, and demonstrated its capability.
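The graph-structured energy composition can be summarized in a few lines: the energy of a composed concept is the sum of its constituent concept (node) and relation (edge) energies. The interfaces below are assumptions; the paper's actual EBM signatures are not given here.

```python
# Composite energy of a concept graph: sum of node (concept) and edge
# (relation) energies. Low total energy means the input realizes the concept.
# The EBM callables and per-node binding of input regions are assumptions.
def composite_energy(parts, nodes, edges, concept_ebms, relation_ebms):
    """parts[i]: input region bound to node i; nodes: {node index: concept name};
    edges: iterable of (i, j, relation name) triples."""
    energy = sum(concept_ebms[c](parts[i]) for i, c in nodes.items())
    energy += sum(relation_ebms[r](parts[i], parts[j]) for i, j, r in edges)
    return energy
```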