Title: FuseBot: RF-Visual Mechanical Search
Mechanical search is a robotic problem where a robot needs to retrieve a target item that is partially or fully occluded from its camera. State-of-the-art approaches for mechanical search either require an expensive search process to find the target item, or they require the item to be tagged with a radio frequency identification tag (e.g., RFID), making their approach beneficial only to tagged items in the environment. We present FuseBot, the first robotic system for RF-Visual mechanical search that enables efficient retrieval of both RF-tagged and untagged items in a pile. Rather than requiring all target items in a pile to be RF-tagged, FuseBot leverages the mere existence of an RF-tagged item in the pile to benefit both tagged and untagged items. Our design introduces two key innovations. The first is RF-Visual Mapping, a technique that identifies and locates RF-tagged items in a pile and uses this information to construct an RF-Visual occupancy distribution map. The second is RF-Visual Extraction, a policy formulated as an optimization problem that minimizes the number of actions required to extract the target object by accounting for the probabilistic occupancy distribution, the expected grasp quality, and the expected information gain from future actions. We built a real-time end-to-end prototype of our system on a UR5e robotic arm with in-hand vision and RF perception modules. We conducted over 180 real-world experimental trials to evaluate FuseBot and compare its performance to a state-of-the-art vision-based system named X-Ray. Our experimental results demonstrate that FuseBot outperforms X-Ray’s efficiency by more than 40% in terms of the number of actions required for successful mechanical search. Furthermore, in comparison to X-Ray’s success rate of 84%, FuseBot achieves a success rate of 95% in retrieving untagged items, demonstrating for the first time that the benefits of RF perception extend beyond tagged objects in the mechanical search problem.
Award ID(s):
1844280
PAR ID:
10393541
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Robotics: Science and Systems 2022
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present the design, implementation, and evaluation of RFusion, a robotic system that can search for and retrieve RFID-tagged items in line-of-sight, non-line-of-sight, and fully-occluded settings. RFusion consists of a robotic arm that has a camera and antenna strapped around its gripper. Our design introduces two key innovations: the first is a method that geometrically fuses RF and visual information to reduce uncertainty about the target object's location, even when the item is fully occluded. The second is a novel reinforcement-learning network that uses the fused RF-visual information to efficiently localize, maneuver toward, and grasp target items. We built an end-to-end prototype of RFusion and tested it in challenging real-world environments. Our evaluation demonstrates that RFusion localizes target items with centimeter-scale accuracy and achieves 96% success rate in retrieving fully occluded objects, even if they are under a pile. The system paves the way for novel robotic retrieval tasks in complex environments such as warehouses, manufacturing plants, and smart homes. 
  2. We present the design, implementation, and evaluation of RF-Grasp, a robotic system that can grasp fully-occluded objects in unknown and unstructured environments. Unlike prior systems that are constrained by the line-of-sight perception of vision and infrared sensors, RF-Grasp employs RF (Radio Frequency) perception to identify and locate target objects through occlusions, and perform efficient exploration and complex manipulation tasks in non-line-of-sight settings. RF-Grasp relies on an eye-in-hand camera and batteryless RFID tags attached to objects of interest. It introduces two main innovations: (1) an RF-visual servoing controller that uses the RFID’s location to selectively explore the environment and plan an efficient trajectory toward an occluded target, and (2) an RF-visual deep reinforcement learning network that can learn and execute efficient, complex policies for decluttering and grasping. We implemented and evaluated an end-to-end physical prototype of RF-Grasp. We demonstrate it improves success rate and efficiency by up to 40-50% over a state-of-the-art baseline. We also demonstrate RF-Grasp in novel tasks such as mechanical search of fully-occluded objects behind obstacles, opening up new possibilities for robotic manipulation. Qualitative results (videos) are available at rfgrasp.media.mit.edu
  3. Sequential recommendation is the task of predicting the next items for users based on their interaction history. Modeling the dependence of the next action on the past actions accurately is crucial to this problem. Moreover, sequential recommendation often faces serious sparsity of item-to-item transitions in a user's action sequence, which limits the practical utility of such solutions. To tackle these challenges, we propose a Category-aware Collaborative Sequential Recommender. Our preliminary statistical tests demonstrate that the in-category item-to-item transitions are often much stronger indicators of the next items than the general item-to-item transitions observed in the original sequence. Our method makes use of item category in two ways. First, the recommender utilizes item category to organize a user's own actions to enhance dependency modeling based on her own past actions. It utilizes self-attention to capture in-category transition patterns, and determines which of the in-category transition patterns to consider based on the categories of recent actions. Second, the recommender utilizes the item category to retrieve users with similar in-category preferences to enhance collaborative learning across users, and thus overcome sparsity. It utilizes attention to incorporate in-category transition patterns from the retrieved users for the target user. Extensive experiments on two large datasets demonstrate the effectiveness of our solution against an extensive list of state-of-the-art sequential recommendation models.
  4. Viral marketing on social networks, also known as Influence Maximization (IM), aims to select k users for the promotion of a target item by maximizing the total spread of their influence. However, most previous works on IM do not explore the dynamic user perception of promoted items in the process. In this paper, by exploiting the knowledge graph (KG) to capture dynamic user perception, we formulate the problem of Influence Maximization based on Dynamic Personal Perception (IMDPP) that considers user preferences and social influence reflecting the impact of relevant item adoptions. We prove the hardness of IMDPP and design an approximation algorithm, named Dynamic perception for seeding in target markets (Dysim), by exploring the concepts of dynamic reachability, target markets, and substantial influence to select and promote a sequence of relevant items. We evaluate the performance of Dysim in comparison with the state-of-the-art approaches using real social networks with real KGs. The experimental results show that Dysim achieves at least 6 times the influence spread of the state-of-the-art approaches on large datasets.
  5. Unconscious neural activity has been shown to precede both motor and cognitive acts. In the present study, we investigated the neural antecedents of overt attention during visual search, where subjects make voluntary saccadic eye movements to search a cluttered stimulus array for a target item. Building on studies of both overt self-generated motor actions (Lau et al., 2004, Soon et al., 2008) and self-generated cognitive actions (Bengson et al., 2014, Soon et al., 2013), we hypothesized that brain activity prior to the onset of a search array would predict the direction of the first saccade during unguided visual search. Because both spatial attention and gaze are coordinated during visual search, both cognition and motor actions are coupled during visual search. A well-established finding in fMRI studies of willed action is that neural antecedents of the intention to make a motor act (e.g., reaching) can be identified seconds before the action occurs. Studies of the volitional control of covert spatial attention in EEG have shown that predictive brain activity is limited to only a few hundred milliseconds before a voluntary shift of covert spatial attention. In the present study, the visual search task and stimuli were designed so that subjects could not predict the onset of the search array. Perceptual task difficulty was high, such that they could not locate the target using covert attention alone, thus requiring overt shifts of attention (saccades) to carry out the visual search. If the first saccade to the array onset in unguided visual search shares mechanisms with willed shifts of covert attention, we expected predictive EEG alpha-band activity (8-12 Hz) immediately prior to the array onset (within 1 sec) (Bengson et al., 2014; Nadra et al., 2023). Alternatively, if they follow the principles of willed motor actions, predictive neural signals should be reflected in broadband EEG activity (Libet et al., 1983) and would likely emerge earlier (Soon et al., 2008). Applying support vector machine decoding, we found that the direction of the first saccade in an unguided visual search could be predicted up to two seconds preceding the search array’s onset in the broadband but not alpha-band EEG. These findings suggest that self-directed eye movements in visual search emerge from early preparatory neural activity more akin to willed motor actions than to covert willed attention. This highlights a distinct role for unconscious neural dynamics in shaping visual search behavior.