M2T2: Multi-Task Masked Transformer for Object-centric Pick and Place

Yuan, Wentao; Murali, Adithyavairavan; Mousavian, Arsalan


This content will become publicly available on November 6, 2024

With the advent of large language models and large-scale robotic datasets, there has been tremendous progress in high-level decision-making for object manipulation [1, 2, 3, 4]. These generic models can interpret complex tasks from language commands, but they often have difficulty generalizing to out-of-distribution objects because of the limitations of their low-level action primitives. In contrast, existing task-specific models [5, 6] excel at low-level manipulation of unknown objects but work for only a single type of action. To bridge this gap, we present M2T2, a single model that supplies different types of low-level actions that work robustly on arbitrary objects in cluttered scenes. M2T2 is a transformer model that reasons about contact points and predicts valid gripper poses for different action modes given a raw point cloud of the scene. Trained on a large-scale synthetic dataset with 128K scenes, M2T2 achieves zero-shot sim2real transfer on a real robot, outperforming a baseline system built from state-of-the-art task-specific models by about 19% in overall performance and by 37.5% in challenging scenes where the object needs to be re-oriented for collision-free placement. M2T2 also achieves state-of-the-art results on a subset of language-conditioned tasks in RLBench [7]. Videos of robot experiments on unseen objects in both the real world and simulation are available on our project website, https://m2-t2.github.io.

Award ID(s): 2024057

NSF-PAR ID: 10476834

Author(s) / Creator(s): Yuan, Wentao; Murali, Adithyavairavan; Mousavian, Arsalan

Editor(s): Tan, Jie; Toussaint, Marc

Publisher / Repository: https://openreview.net/forum?id=6zGpfOBImD

Date Published: 2023-11-06

Journal Name: 7th Annual Conference on Robot Learning

Subject(s) / Keyword(s): Robot manipulation; multi-task transformer

Format(s): Medium: X

Location: Atlanta, GA

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Conference Paper:
The DOI is not currently available.
