
Title: FabricFlowNet: Bimanual Cloth Manipulation with a Flow-based Policy
We address the problem of goal-directed cloth manipulation, a challenging task due to the deformability of cloth. Our insight is that optical flow, a technique normally used for motion estimation in video, can also provide an effective representation for corresponding cloth poses across observation and goal images. We introduce FabricFlowNet (FFN), a cloth manipulation policy that leverages flow both as an input and as an action representation to improve performance. FabricFlowNet also elegantly switches between dual-arm and single-arm actions based on the desired goal. We show that FabricFlowNet significantly outperforms state-of-the-art model-free and model-based cloth manipulation policies. We also present real-world experiments on a bimanual system, demonstrating effective sim-to-real transfer. Finally, we show that our method, trained on a single square cloth, generalizes to other cloth shapes, such as T-shirts and rectangular cloths. Video and other supplementary materials are available at:
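As a rough illustration of how flow can serve as both the policy input and the action representation, the minimal sketch below reads the flow vector at each predicted pick pixel to obtain the corresponding place pixel, and switches to a single-arm action when the two picks nearly coincide. Here `flow_net` and `pick_net` are hypothetical stand-ins for the learned networks; this is a sketch of the idea, not the paper's implementation.

```python
# Minimal sketch of a flow-based pick-and-place policy in the spirit of
# FabricFlowNet. `flow_net` and `pick_net` are hypothetical stand-ins.
import numpy as np

def flow_policy(obs_depth, goal_depth, flow_net, pick_net, min_pick_dist=20.0):
    """Map an (observation, goal) image pair to a (bi)manual action."""
    # Dense flow field of shape (H, W, 2): per-pixel displacement that
    # corresponds the observed cloth to the goal configuration.
    flow = flow_net(obs_depth, goal_depth)

    # Two pick pixels predicted from the flow image (assumed integer
    # (row, col) coordinates, one per arm).
    pick1, pick2 = pick_net(flow)

    # Flow doubles as the action representation: the flow vector at a
    # pick pixel points to the corresponding place pixel.
    place1 = np.asarray(pick1) + flow[pick1[0], pick1[1]]
    place2 = np.asarray(pick2) + flow[pick2[0], pick2[1]]

    # Fall back to a single-arm action when the two picks nearly coincide,
    # mirroring the dual-arm/single-arm switching described above.
    if np.linalg.norm(np.asarray(pick1, float) - np.asarray(pick2, float)) < min_pick_dist:
        return [(pick1, place1)]
    return [(pick1, place1), (pick2, place2)]
```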
Award ID(s): 2046491, 1849154
Publication Date:
Journal Name: Conference on Robot Learning
Sponsoring Org: National Science Foundation
More Like This
  1. Robotic manipulation of cloth remains challenging due to the complex dynamics of cloth, the lack of a low-dimensional state representation, and self-occlusions. In contrast to previous model-based approaches that learn a pixel-based dynamics model or a compressed latent-vector dynamics, we propose to learn a particle-based dynamics model from a partial point cloud observation. To overcome the challenges of partial observability, we infer which visible points are connected on the underlying cloth mesh, and we then learn a dynamics model over this visible connectivity graph. Compared to previous learning-based approaches, our model imposes a strong inductive bias through its particle-based representation of the underlying cloth physics; it can generalize to cloths with novel shapes; it is invariant to visual features; and its predictions are more easily visualized. We show that our method greatly outperforms previous state-of-the-art model-based and model-free reinforcement learning methods in simulation. Furthermore, we demonstrate zero-shot sim-to-real transfer: we deploy the model trained in simulation on a Franka arm and show that it can successfully smooth cloths of different materials, geometries, and colors from crumpled configurations. (A minimal sketch of one dynamics step over such a graph appears after this list.)
  2. We present a framework for deformable object manipulation that interleaves planning and control, enabling complex manipulation tasks without relying on high-fidelity modeling or simulation. The key question we address is: when should we use planning, and when should we use control, to achieve the task? Planners are designed to find paths through complex configuration spaces, but for highly underactuated systems such as deformable objects, achieving a specific configuration is very difficult even with high-fidelity models. Conversely, controllers can be designed to achieve specific configurations, but they can be trapped in undesirable local minima owing to obstacles. Our approach consists of three components: (1) a global motion planner to generate gross motion of the deformable object; (2) a local controller to refine the configuration of the deformable object; and (3) a novel deadlock prediction algorithm to determine when to use planning versus control (the second sketch after this list outlines this interleaving loop). By separating planning from control, we are able to use different representations of the deformable object, reducing overall complexity and enabling efficient computation of motion. We provide a detailed proof of probabilistic completeness for our planner, which is valid despite the fact that our system is underactuated and we do not have a steering function. We then demonstrate that our framework successfully performs several manipulation tasks with rope and cloth in simulation that cannot be performed using either our controller or planner alone. These experiments suggest that our planner can generate paths efficiently, taking under a second on average to find a feasible path in three out of four scenarios. We also show that our framework is effective on a 16-degree-of-freedom physical robot, where reachability and dual-arm constraints make planning more difficult.
  3. Tracking the 6D pose of objects in video sequences is important for robot manipulation. This task, however, introduces multiple challenges: (i) robot manipulation involves significant occlusions; (ii) data and annotations for 6D poses are troublesome and difficult to collect, which complicates machine learning solutions; and (iii) incremental error drift often accumulates in long-term tracking, necessitating re-initialization of the object's pose. This work proposes a data-driven optimization approach for long-term 6D pose tracking. It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object's model. The key contributions in this context are a novel neural network architecture, which appropriately disentangles the feature encoding to help reduce domain shift, and an effective 3D orientation representation via the Lie algebra (illustrated in the third sketch after this list). Consequently, even when the network is trained only with synthetic data, it works effectively on real images. Comprehensive experiments over benchmarks, both existing ones and a new dataset with significant occlusions related to object manipulation, show that the proposed approach achieves consistently robust estimates and outperforms alternatives, even though those have been trained with real images. The approach is also the most computationally efficient among the alternatives, achieving a tracking frequency of 90.9 Hz.
  4. In multi-agent domains (MADs), an agent's action may not just change the world and the agent's knowledge and beliefs about the world; it may also change other agents' knowledge and beliefs about the world, and their knowledge and beliefs about other agents' knowledge and beliefs about the world. The goals of an agent in a multi-agent world may involve manipulating the knowledge and beliefs of other agents, and again, not just their knowledge and beliefs about the world, but also their knowledge about other agents' knowledge about the world. Our goal is to present an action language (mA+) that has the necessary features to address the above aspects of representing and reasoning about actions and change (RAC) in MADs. mA+ allows the representation of, and reasoning about, different types of actions that an agent can perform in a domain where many other agents might be present, such as world-altering actions, sensing actions, and announcement/communication actions. It also allows the specification of agents' dynamic awareness of action occurrences, which has future implications on what agents know about the world and about other agents' knowledge of the world. mA+ considers three different types of awareness: full awareness, partial awareness, and complete oblivion of an action occurrence and its effects. This keeps the language simple, yet powerful enough to address a large variety of knowledge manipulation scenarios in MADs. The semantics of mA+ relies on the notion of state, which is described by a pointed Kripke model and is used to encode the agent's knowledge and the real state of the world; the semantics is defined by a transition function that maps pairs of actions and states into sets of states (a toy rendering appears after this list). We illustrate properties of the action theories, including properties that guarantee finiteness of the set of initial states and their practical implementability. Finally, we relate mA+ to other formalisms that contribute to RAC in MADs.
  5. We build a system that enables any human to control a robot hand and arm, simply by demonstrating motions with their own hand. The robot observes the human operator via a single RGB camera and imitates their actions in real-time. Human hands and robot hands differ in shape, size, and joint structure, and performing this translation from a single uncalibrated camera is a highly underconstrained problem. Moreover, the retargeted trajectories must effectively execute tasks on a physical robot, which requires them to be temporally smooth and free of self-collisions. Our key insight is that while paired human-robot correspondence data is expensive to collect, the internet contains a massive corpus of rich and diverse human hand videos. We leverage this data to train a system that understands human hands and retargets a human video stream into a robot hand-arm trajectory that is smooth, swift, safe, and semantically similar to the guiding demonstration. We demonstrate that it enables previously untrained people to teleoperate a robot on various dexterous manipulation tasks. Our low-cost, glove-free, marker-free remote teleoperation system makes robot teaching more accessible, and we hope that it can aid robots that learn to act autonomously in the real world. (The last sketch after this list outlines the retargeting pipeline.)
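For item 1, the following minimal sketch shows one forward step of a particle-based dynamics model over a visible connectivity graph. The learned edge classifier and message/update networks are replaced by simple numeric stand-ins (a radius test and a spring-like term), so this illustrates the structure of the model, not the learned cloth physics.

```python
# One step of a particle-based dynamics model over a "visible connectivity
# graph" built from a partial point cloud. All learned components are
# replaced by simple numeric stand-ins for illustration.
import numpy as np

def build_visible_graph(points, radius=0.02):
    """Connect visible points inferred to be neighbors on the cloth mesh.
    Here a radius test stands in for the learned edge classifier."""
    n = len(points)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and np.linalg.norm(points[i] - points[j]) < radius]

def dynamics_step(points, velocities, edges, dt=0.01):
    """One forward prediction: aggregate neighbor messages, update state."""
    accel = np.zeros_like(points)
    for i, j in edges:
        # Message: a spring-like term standing in for the learned edge MLP.
        accel[i] += points[j] - points[i]
    new_velocities = velocities + dt * accel   # node update (stand-in)
    new_points = points + dt * new_velocities  # integrate positions
    return new_points, new_velocities
```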
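For item 2, this sketch shows the plan/control interleaving loop, with the deadlock predictor deciding when to fall back from local control to the global planner. The planner, controller, and predictor are hypothetical callables standing in for the components described in the abstract.

```python
# Sketch of interleaved planning and control with deadlock prediction.
# `planner`, `controller`, and `predict_deadlock` are hypothetical.
def manipulate(state, goal, planner, controller, predict_deadlock,
               max_steps=1000):
    path = []  # gross-motion waypoints from the global planner
    for _ in range(max_steps):
        if controller.at_goal(state, goal):
            return state
        if path:
            state = controller.track(state, path.pop(0))  # follow the plan
        elif predict_deadlock(state, goal):
            # Local control is predicted to get stuck (e.g., in a local
            # minimum behind an obstacle): invoke the global planner.
            path = planner.plan(state, goal)
        else:
            state = controller.step(state, goal)  # local refinement
    return state
```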
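For item 3, the Lie-algebra orientation representation can be made concrete with the standard exponential map from so(3) to SO(3) (Rodrigues' formula), which converts a 3-vector such as a network's rotation output into a rotation matrix. This is standard math, shown as a self-contained sketch rather than the paper's exact code.

```python
# Exponential map from so(3) (axis-angle 3-vector) to SO(3).
import numpy as np

def so3_exp(w):
    """Map w in R^3 to a rotation matrix via Rodrigues' formula."""
    theta = np.linalg.norm(w)
    if theta < 1e-8:
        return np.eye(3)  # near-zero rotation: identity
    k = w / theta  # unit rotation axis
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])  # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

# Usage: compose the previous pose estimate with a predicted relative
# rotation, here 90 degrees about the z-axis.
R_prev = np.eye(3)
R_new = R_prev @ so3_exp(np.array([0.0, 0.0, np.pi / 2]))
```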
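For item 4, here is a toy rendering of the state notion, a pointed Kripke model, together with a transition for a world-altering action under full awareness. The names and the simplified update are illustrative only; they are not mA+ syntax or semantics.

```python
# Toy pointed Kripke model and a world-altering transition. Illustrative
# only; the actual mA+ semantics is richer (awareness levels, beliefs).
from dataclasses import dataclass

@dataclass(frozen=True)
class PointedKripke:
    worlds: tuple   # ((world_name, frozenset_of_true_fluents), ...)
    access: tuple   # ((agent, from_world, to_world), ...) accessibility
    real: str       # the designated ("pointed") real world

def world_altering(state, add=frozenset(), delete=frozenset()):
    """Apply a world-altering action under full awareness: update fluents
    in every possible world; return a set of successor states."""
    new_worlds = tuple((w, (fluents - delete) | add)
                       for w, fluents in state.worlds)
    return {PointedKripke(new_worlds, state.access, state.real)}
```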
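For item 5, a pipeline sketch of the retargeting loop: RGB frame to human hand pose to robot hand-arm joint targets, with temporal smoothing and a self-collision check. Every callable and the robot interface here are hypothetical placeholders for the components the abstract describes.

```python
# Teleoperation pipeline sketch: all callables are hypothetical.
import numpy as np

def teleoperate(camera, estimate_hand_pose, retarget, robot, alpha=0.8):
    q_prev = np.asarray(robot.joint_positions())
    for frame in camera.stream():              # single uncalibrated RGB camera
        hand_pose = estimate_hand_pose(frame)  # 3D human hand keypoints
        q = np.asarray(retarget(hand_pose))    # robot hand-arm joint targets
        q = alpha * q_prev + (1 - alpha) * q   # temporal smoothing
        if not robot.self_collides(q):         # keep the trajectory safe
            robot.command(q)
            q_prev = q
```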