

Title: FuseBot: RF-Visual Mechanical Search
Mechanical search is a robotic problem where a robot needs to retrieve a target item that is partially or fully occluded from its camera. State-of-the-art approaches for mechanical search either require an expensive search process to find the target item, or they require the item to be tagged with a radio frequency identification (RFID) tag, making their approach beneficial only to tagged items in the environment. We present FuseBot, the first robotic system for RF-Visual mechanical search that enables efficient retrieval of both RF-tagged and untagged items in a pile. Rather than requiring all target items in a pile to be RF-tagged, FuseBot leverages the mere existence of an RF-tagged item in the pile to benefit both tagged and untagged items. Our design introduces two key innovations. The first is RF-Visual Mapping, a technique that identifies and locates RF-tagged items in a pile and uses this information to construct an RF-Visual occupancy distribution map. The second is RF-Visual Extraction, a policy formulated as an optimization problem that minimizes the number of actions required to extract the target object by accounting for the probabilistic occupancy distribution, the expected grasp quality, and the expected information gain from future actions. We built a real-time end-to-end prototype of our system on a UR5e robotic arm with in-hand vision and RF perception modules. We conducted over 180 real-world experimental trials to evaluate FuseBot and compare its performance to a state-of-the-art vision-based system named X-Ray. Our experimental results demonstrate that FuseBot outperforms X-Ray’s efficiency by more than 40% in terms of the number of actions required for successful mechanical search. Furthermore, in comparison to X-Ray’s success rate of 84%, FuseBot achieves a success rate of 95% in retrieving untagged items, demonstrating for the first time that the benefits of RF perception extend beyond tagged objects in the mechanical search problem.
Award ID(s):
1844280
NSF-PAR ID:
10393541
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Robotics: Science and Systems 2022
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
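To make the RF-Visual Extraction policy above concrete, the following is a minimal greedy sketch in Python. It is not the paper's implementation: the grid discretization, weights, and function names are assumptions, and it scores one action at a time rather than solving the full optimization over future actions. It only combines the three quantities the abstract names: the occupancy probability from the RF-visual map, the expected grasp quality, and the expected information gain.

```python
import numpy as np

def best_grasp_cell(occupancy, grasp_quality, info_gain,
                    w_occ=1.0, w_q=0.5, w_ig=0.5):
    """Greedy one-step scoring of candidate grasp cells on a grid over the pile.

    occupancy     : (H, W) probability that the target lies under each cell
                    (the RF-visual occupancy distribution).
    grasp_quality : (H, W) expected grasp success at each cell.
    info_gain     : (H, W) expected reduction in uncertainty about the target's
                    location if the item at that cell is removed.
    """
    score = w_occ * occupancy + w_q * grasp_quality + w_ig * info_gain
    r, c = np.unravel_index(np.argmax(score), score.shape)
    return int(r), int(c)

# Toy 4x4 pile: the RF prior concentrates probability near the bottom-right.
occupancy = np.array([[0.01, 0.02, 0.02, 0.05],
                      [0.02, 0.05, 0.10, 0.10],
                      [0.02, 0.10, 0.15, 0.15],
                      [0.05, 0.10, 0.15, 0.23]])
grasp_quality = np.full((4, 4), 0.8)   # uniform graspability in this toy example
info_gain = 0.5 * occupancy            # removing likely cells is most informative

print(best_grasp_cell(occupancy, grasp_quality, info_gain))  # -> (3, 3)
```

In a full system one would presumably execute the chosen grasp, update the map from the new RF and visual observations, and repeat until the target is exposed; the sketch above only illustrates how the three terms can be traded off in a single score.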
More Like this
  1. We present the design, implementation, and evaluation of RFusion, a robotic system that can search for and retrieve RFID-tagged items in line-of-sight, non-line-of-sight, and fully-occluded settings. RFusion consists of a robotic arm that has a camera and antenna strapped around its gripper. Our design introduces two key innovations: the first is a method that geometrically fuses RF and visual information to reduce uncertainty about the target object's location, even when the item is fully occluded. The second is a novel reinforcement-learning network that uses the fused RF-visual information to efficiently localize, maneuver toward, and grasp target items. We built an end-to-end prototype of RFusion and tested it in challenging real-world environments. Our evaluation demonstrates that RFusion localizes target items with centimeter-scale accuracy and achieves a 96% success rate in retrieving fully occluded objects, even if they are under a pile. The system paves the way for novel robotic retrieval tasks in complex environments such as warehouses, manufacturing plants, and smart homes.
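As an illustration of the geometric RF-visual fusion described in (1), here is a small hypothetical sketch; the function name, tolerance, and coordinates are made up for illustration and are not from the paper. Candidate 3D points from the camera are filtered by their consistency with an RF-derived tag-to-antenna range, which shrinks the uncertainty about the target's location.

```python
import numpy as np

def fuse_rf_visual(candidates, antenna_pos, rf_range, tol=0.05):
    """Keep visual candidate points consistent with the RF range measurement.

    candidates  : (N, 3) candidate 3D points from the camera (metres).
    antenna_pos : (3,) antenna position in the same frame.
    rf_range    : tag-to-antenna distance estimated from RF (metres).
    tol         : tolerance reflecting RF ranging uncertainty.
    """
    d = np.linalg.norm(candidates - antenna_pos, axis=1)
    return candidates[np.abs(d - rf_range) < tol]

# Toy example: three candidate points, one consistent with a 0.60 m RF range.
pts = np.array([[0.10, 0.0, 0.30],
                [0.40, 0.1, 0.45],
                [0.55, 0.2, 0.60]])
print(fuse_rf_visual(pts, antenna_pos=np.zeros(3), rf_range=0.60))  # keeps the middle point
```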
  2. We present the design, implementation, and evaluation of RF-Grasp, a robotic system that can grasp fully-occluded objects in unknown and unstructured environments. Unlike prior systems that are constrained by the line-of-sight perception of vision and infrared sensors, RF-Grasp employs RF (Radio Frequency) perception to identify and locate target objects through occlusions, and perform efficient exploration and complex manipulation tasks in non-line-of-sight settings. RF-Grasp relies on an eye-in-hand camera and batteryless RFID tags attached to objects of interest. It introduces two main innovations: (1) an RF-visual servoing controller that uses the RFID’s location to selectively explore the environment and plan an efficient trajectory toward an occluded target, and (2) an RF-visual deep reinforcement learning network that can learn and execute efficient, complex policies for decluttering and grasping. We implemented and evaluated an end-to-end physical prototype of RF-Grasp. We demonstrate it improves success rate and efficiency by up to 40-50% over a state-of-the-art baseline. We also demonstrate RF-Grasp in novel tasks such as mechanical search of fully-occluded objects behind obstacles, opening up new possibilities for robotic manipulation. Qualitative results (videos) are available at rfgrasp.media.mit.edu.
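The RF-visual servoing idea in (2) can be sketched, under assumptions, as a simple waypoint-scoring rule: the RFID's coarse location pulls exploration toward the occluded target while a clearance term keeps the trajectory safe. The names, weights, and values below are illustrative, not the paper's controller.

```python
import numpy as np

def servo_waypoint_cost(waypoints, tag_location, obstacle_dists,
                        w_goal=1.0, w_clear=0.5):
    """Score candidate end-effector waypoints for exploring toward an occluded,
    RFID-located target; lower cost = closer to the tag with more clearance.

    waypoints      : (N, 3) candidate end-effector positions (metres).
    tag_location   : (3,) coarse target location from the RFID reading.
    obstacle_dists : (N,) clearance of each waypoint from the nearest obstacle.
    """
    goal_term = np.linalg.norm(waypoints - tag_location, axis=1)
    return w_goal * goal_term - w_clear * obstacle_dists

wps = np.array([[0.2, 0.0, 0.4],
                [0.5, 0.1, 0.3],
                [0.6, 0.3, 0.2]])
cost = servo_waypoint_cost(wps,
                           tag_location=np.array([0.6, 0.3, 0.1]),
                           obstacle_dists=np.array([0.10, 0.05, 0.08]))
print(wps[np.argmin(cost)])  # the waypoint nearest the tag with adequate clearance
```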
  3. Locating RFID-tagged items in the environment and guiding humans to retrieve the tagged items is an important problem in the RFID community. This paper explores how to exploit synergies between Augmented Reality (AR) headsets and RFID localization to help solve this problem by improving both user experience and localization accuracy. Using fundamental mathematical formulations for RFID localization, we derive confidence metrics and display guidance to the user to improve their experience and enable them to retrieve items faster. We build our primitives into an end-to-end system, RF-AR, and show that it achieves 8.6 cm median localization accuracy within 76 seconds and enables 55% faster retrieval than past state-of-the-art systems. Our results demonstrate that AR-based “human-in-the-loop” designs can make the localization task more accurate and efficient, and thus hold the potential to improve processes where items need to be retrieved quickly, such as in manufacturing, retail, and warehousing.
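One way a confidence metric like the one described in (3) could be derived, purely as an assumed sketch rather than the paper's formulation, is from the spread of the RFID location posterior; the guidance text and thresholds below are hypothetical.

```python
import numpy as np

def localization_confidence(location_samples):
    """Simple confidence metric: inverse of the spatial spread of the posterior
    location samples (larger spread -> lower confidence)."""
    spread = np.sqrt(np.trace(np.cov(location_samples, rowvar=False)))  # metres
    return 1.0 / (1.0 + spread)

def guidance(confidence, threshold=0.9):
    """Text an AR headset could display for the current confidence level."""
    if confidence >= threshold:
        return "Item located: follow the highlighted marker."
    return "Keep moving: collecting more RFID measurements."

# Toy posterior: 50 samples tightly clustered around a single location.
samples = np.random.default_rng(0).normal([1.0, 2.0, 0.5], 0.03, size=(50, 3))
c = localization_confidence(samples)
print(round(c, 2), guidance(c))
```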
  4. Sequential recommendation is the task of predicting the next items for users based on their interaction history. Modeling the dependence of the next action on the past actions accurately is crucial to this problem. Moreover, sequential recommendation often faces serious sparsity of item-to-item transitions in a user's action sequence, which limits the practical utility of such solutions. To tackle these challenges, we propose a Category-aware Collaborative Sequential Recommender. Our preliminary statistical tests demonstrate that the in-category item-to-item transitions are often much stronger indicators of the next items than the general item-to-item transitions observed in the original sequence. Our method makes use of item category in two ways. First, the recommender utilizes item category to organize a user's own actions to enhance dependency modeling based on her own past actions. It utilizes self-attention to capture in-category transition patterns, and determines which of the in-category transition patterns to consider based on the categories of recent actions. Second, the recommender utilizes the item category to retrieve users with similar in-category preferences to enhance collaborative learning across users, and thus conquer sparsity. It utilizes attention to incorporate in-category transition patterns from the retrieved users for the target user. Extensive experiments on two large datasets prove the effectiveness of our solution against an extensive list of state-of-the-art sequential recommendation models.
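A toy sketch of the in-category transition signal exploited in (4), using simple counting for brevity rather than the self-attention model the paper describes: candidates sharing the category of the user's most recent action are scored by how often the user previously made that in-category transition. All names and data are illustrative.

```python
from collections import Counter, defaultdict

def in_category_transition_scores(history, category, candidates):
    """Score next-item candidates using in-category transitions only.

    history    : list of item ids the user interacted with, in order.
    category   : dict mapping item id -> category id.
    candidates : iterable of candidate next items.
    """
    # Count transitions between consecutive actions that share a category.
    trans = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        if category[prev] == category[nxt]:
            trans[prev][nxt] += 1
    last = history[-1]
    return {c: trans[last][c] for c in candidates if category[c] == category[last]}

category = {"phone": "tech", "case": "tech", "charger": "tech", "novel": "books"}
history = ["phone", "case", "phone", "charger", "novel", "phone"]
print(in_category_transition_scores(history, category, ["case", "charger", "novel"]))
# -> {'case': 1, 'charger': 1}; 'novel' is dropped as out-of-category
```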
  5. The most common sensing modalities found in a robot perception system are vision and touch, which together can provide global and highly localized data for manipulation. However, these sensing modalities often fail to adequately capture the behavior of target objects during the critical moments as they transition out of static, controlled contact with an end-effector to dynamic and uncontrolled motion. In this work, we present a novel multimodal visuotactile sensor that provides simultaneous visuotactile and proximity depth data. The sensor integrates an RGB camera and an air pressure sensor to sense touch, together with an infrared time-of-flight (ToF) camera to sense proximity, leveraging a selectively transmissive soft membrane to enable the dual sensing modalities. We present the mechanical design, fabrication techniques, algorithm implementations, and evaluation of the sensor's tactile and proximity modalities. The sensor is demonstrated in three open-loop robotic tasks: approaching and contacting an object, catching, and throwing. The fusion of tactile and proximity data could be used to capture key information about a target object's transition behavior for sensor-based control in dynamic manipulation.
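A minimal sketch of how the fused tactile and proximity readings in (5) might be used to track an object's transition out of contact, e.g., during a throw. The thresholds, units, and state names are assumptions for illustration, not the sensor's actual processing pipeline.

```python
def contact_state(pressure_kpa, proximity_m, contact_thresh=2.0, near_thresh=0.05):
    """Classify the object's state from fused tactile and proximity readings."""
    if pressure_kpa > contact_thresh:
        return "in_contact"   # membrane deformed -> object held against the sensor
    if proximity_m < near_thresh:
        return "near"         # no contact, but object still within close ToF range
    return "free"             # object has left the sensor's vicinity

# Toy trace of a throw: contact -> near -> free.
for pressure, depth in [(5.0, 0.00), (0.3, 0.02), (0.1, 0.30)]:
    print(contact_state(pressure, depth))
```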