Title: AdaAfford: Learning to Adapt Manipulation Affordance for 3D Articulated Objects via Few-shot Interactions
Perceiving and interacting with 3D articulated objects, such as cabinets, doors, and faucets, poses particular challenges for future home-assistant robots performing daily tasks in human environments. Beyond parsing the articulated parts and joint parameters, researchers have recently advocated learning manipulation affordance over the input shape geometry, which is more task-aware and geometrically fine-grained. However, because these methods take only passive observations as inputs, they ignore many hidden but important kinematic constraints (e.g., joint location and limits) and dynamic factors (e.g., joint friction and restitution), and therefore lose significant accuracy on test cases with such uncertainties. In this paper, we propose a novel framework, named AdaAfford, that learns to perform very few test-time interactions to quickly adapt affordance priors into more accurate, instance-specific posteriors. Large-scale experiments on the PartNet-Mobility dataset show that our system outperforms the baselines.
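As a rough illustration of the prior-to-posterior idea described above, the minimal sketch below conditions a per-point affordance predictor on a permutation-invariant summary of a handful of test-time interaction records. The module names, feature sizes, and the 10-dimensional interaction encoding (3-D contact point, 6-D action, 1-D outcome) are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """Toy per-point feature extractor standing in for a PointNet-style backbone."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))

    def forward(self, xyz):            # xyz: (N, 3)
        return self.mlp(xyz)           # (N, feat_dim)

class AdaptationEncoder(nn.Module):
    """Summarizes a few (contact point, action, outcome) records into a single code."""
    def __init__(self, code_dim=32):
        super().__init__()
        self.code_dim = code_dim
        self.mlp = nn.Sequential(nn.Linear(3 + 6 + 1, 64), nn.ReLU(), nn.Linear(64, code_dim))

    def forward(self, interactions):   # interactions: (K, 10), K may be zero
        if interactions.numel() == 0:  # no interactions yet -> neutral code (the prior)
            return torch.zeros(self.code_dim)
        return self.mlp(interactions).mean(dim=0)  # permutation-invariant pooling

class AffordanceHead(nn.Module):
    """Per-point affordance score conditioned on the adaptation code."""
    def __init__(self, feat_dim=64, code_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim + code_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, point_feats, code):
        code = code.expand(point_feats.shape[0], -1)   # broadcast the code to every point
        return self.mlp(torch.cat([point_feats, code], dim=-1)).squeeze(-1)

# Usage: a prior from geometry alone, then a posterior after K=3 probing interactions.
enc, ada, head = PointEncoder(), AdaptationEncoder(), AffordanceHead()
points = torch.rand(1024, 3)                          # point cloud sampled from the shape
prior = head(enc(points), ada(torch.empty(0, 10)))    # affordance prior, no interactions yet
probes = torch.rand(3, 10)                            # few test-time interaction records
posterior = head(enc(points), ada(probes))            # instance-specific affordance posterior
```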
Award ID(s): 1763268
NSF-PAR ID: 10381767
Journal Name: European Conference on Computer Vision 2022
Sponsoring Org: National Science Foundation
More Like this
  1. Perceiving and manipulating 3D articulated objects (e.g., cabinets, doors) in human environments is an important yet challenging task for future home-assistant robots. The space of 3D articulated objects is exceptionally rich in its myriad semantic categories, diverse shape geometry, and complicated part functionality. Previous works mostly abstract the kinematic structure, using estimated joint parameters and part poses as the visual representations for manipulating 3D articulated objects. In this paper, we propose object-centric actionable visual priors as a novel perception-interaction handshaking point at which the perception system outputs guidance that is more actionable than kinematic structure estimates, predicting dense geometry-aware, interaction-aware, and task-aware visual action affordance and trajectory proposals. We design an interaction-for-perception framework, VAT-Mart, to learn such actionable visual representations by simultaneously training a curiosity-driven reinforcement learning policy that explores diverse interaction trajectories and a perception module that summarizes and generalizes the explored knowledge into pointwise predictions across diverse shapes. Experiments on the large-scale PartNet-Mobility dataset in the SAPIEN environment demonstrate the effectiveness of the proposed approach and show promising generalization to novel test shapes, unseen object categories, and real-world data.
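    A hypothetical sketch of the interaction-for-perception loop described in this abstract is given below: an exploration policy proposes interaction trajectories, a perception module learns to predict their outcomes, and the perception error acts as a curiosity bonus that steers exploration toward interactions the perception module cannot yet explain. The network shapes, the simulate() stand-in, and the 8-dimensional trajectory parameterization are assumptions for illustration, not VAT-Mart's actual design.

```python
import torch
import torch.nn as nn

perception = nn.Sequential(nn.Linear(3 + 8, 64), nn.ReLU(), nn.Linear(64, 1))  # (contact point, trajectory) -> outcome logit
policy = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 8))          # contact point -> trajectory parameters
p_opt = torch.optim.Adam(perception.parameters(), lr=1e-3)

def simulate(point, traj):
    """Stand-in for a SAPIEN rollout: returns 1.0 if the interaction moved the target part."""
    return (torch.sin(point.sum() + traj.sum()) > 0).float().unsqueeze(0)

for step in range(1000):
    point = torch.rand(3)                                    # sampled contact point on the shape
    traj = (policy(point) + 0.1 * torch.randn(8)).detach()   # exploratory trajectory proposal
    outcome = simulate(point, traj)

    pred = perception(torch.cat([point, traj]))
    loss = nn.functional.binary_cross_entropy_with_logits(pred, outcome)
    curiosity_reward = loss.detach()   # large where perception is wrong; would drive the RL update

    p_opt.zero_grad()
    loss.backward()
    p_opt.step()
    # (policy update using curiosity_reward via an RL algorithm such as PPO is omitted here)
```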
  2. Highly articulated organisms serve as blueprints for incredibly dexterous mechanisms, but building similarly capable robotic counterparts has been hindered by the difficulty of developing electromechanical actuators with both the high strength and the compactness of biological muscle. We develop a stackable electrostatic brake with specific tension and weight comparable to those of muscle and integrate it into a robotic joint. High degree-of-freedom mechanisms composed of such electrostatic-brake-enabled joints can then employ established control algorithms to achieve hybrid motor-brake actuated dexterous manipulation. Specifically, our joint design enables a ten degree-of-freedom robot equipped with only one motor to manipulate multiple objects simultaneously. We also show that the use of brakes allows a two-fingered robot to perform in-hand repositioning of an object 45% more quickly and with 53% lower positioning error than without brakes. Relative to fully actuated robots, robots equipped with such electrostatic brakes will have lower weight, volume, and power consumption, yet retain the ability to reach arbitrary joint configurations.
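    The single-motor, many-brakes idea can be pictured with the toy sketch below (an assumed control scheme, not the paper's controller): per-joint brakes are released one at a time so a single shared motor can position each joint sequentially and then lock it in place.

```python
from dataclasses import dataclass

@dataclass
class Joint:
    angle: float = 0.0
    brake_engaged: bool = True     # electrostatic brake holds the joint by default

def move_to_configuration(joints, targets, step=0.01, tol=1e-3):
    """Reach an arbitrary joint configuration with a single shared (simulated) motor."""
    for i, target in enumerate(targets):
        for j, joint in enumerate(joints):
            joint.brake_engaged = (j != i)             # release only the brake on joint i
        while abs(joints[i].angle - target) > tol:
            delta = max(-step, min(step, target - joints[i].angle))
            joints[i].angle += delta                   # motor torque moves the one free joint
        joints[i].brake_engaged = True                 # re-engage the brake to hold position

joints = [Joint() for _ in range(10)]                  # ten degrees of freedom, one motor
move_to_configuration(joints, targets=[0.1 * k for k in range(10)])
```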

  3. In contrast to the vast literature on modeling, perceiving, and understanding agent-object (e.g., human-object, hand-object, robot-object) interaction in computer vision and robotics, very few past works have studied the task of object-object interaction, which also plays an important role in robotic manipulation and planning. There is a rich space of object-object interaction scenarios in our daily life, such as placing an object on a messy tabletop, fitting an object inside a drawer, and pushing an object using a tool. In this paper, we propose a unified affordance learning framework to learn object-object interaction for various tasks. By constructing four object-object interaction task environments using physical simulation (SAPIEN) and thousands of ShapeNet models with rich geometric diversity, we are able to conduct large-scale object-object affordance learning without the need for human annotations or demonstrations. At the core of our technical contribution, we propose an object-kernel point convolution network to reason about detailed interactions between two objects. Experiments on large-scale synthetic data and real-world data demonstrate the effectiveness of the proposed approach.
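    One plausible reading of the object-kernel point convolution is sketched below, where points sampled from the acting object serve as kernel points that are correlated against the target object's point cloud; the Gaussian weighting, feature sizes, and kernel subsampling are illustrative assumptions rather than the paper's exact operator.

```python
import torch
import torch.nn as nn

class ObjectKernelConv(nn.Module):
    """Point convolution whose kernel points come from the acting object's geometry."""
    def __init__(self, in_dim=1, out_dim=32, n_kernel=16, sigma=0.3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_kernel, in_dim, out_dim) * 0.1)
        self.n_kernel, self.sigma = n_kernel, sigma

    def forward(self, target_xyz, target_feat, actor_xyz):
        # Use (subsampled) actor points, centered at the origin, as kernel point offsets.
        kernel = actor_xyz[: self.n_kernel] - actor_xyz[: self.n_kernel].mean(0)   # (K, 3)
        # Gaussian correlation between every target point and every kernel point.
        d2 = ((target_xyz[:, None, :] - kernel[None, :, :]) ** 2).sum(-1)          # (N, K)
        corr = torch.exp(-d2 / (2 * self.sigma ** 2))                              # (N, K)
        # Aggregate kernel-specific linear maps of the target features.
        return torch.einsum("nk,nc,kcd->nd", corr, target_feat, self.weight)       # (N, out_dim)

conv = ObjectKernelConv()
target_xyz, target_feat = torch.rand(2048, 3), torch.ones(2048, 1)   # e.g. a tabletop surface
actor_xyz = torch.rand(512, 3)                                       # e.g. the object being placed
per_point = conv(target_xyz, target_feat, actor_xyz)                 # per-point interaction features
```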
  4. Synopsis

    Seaweeds inhabiting wave-battered coastlines are generally flexible, bending with the waves to adopt more streamlined shapes and reduce drag. Coralline algae, however, are firmly calcified, existing largely as crusts that avoid drag altogether or as upright branched forms with uncalcified joints (genicula) that confer flexibility to otherwise rigid thalli. Upright corallines have evolved from crustose ancestors independently multiple times, and the repeated evolution of genicula has contributed to the ecological success of articulated corallines worldwide. Structure and development of genicula are significantly different across evolutionary lineages, and yet biomechanical performance is broadly similar. Because chemical composition plays a central role in both calcification and biomechanics, we explored evolutionary trends in cell wall chemistry across crustose and articulated taxa. We compared the carbohydrate content of genicula across convergently evolved articulated species, as well as the carbohydrate content of calcified tissues from articulated and crustose species, to search for phylogenetic trends in cell wall chemistry during the repeated evolution of articulated taxa. We also analyzed the carbohydrate content of one crustose coralline species that evolved from articulated ancestors, allowing us to examine trends in chemistry during this evolutionary reversal and loss of genicula. We found several key differences in carbohydrate content between calcified and uncalcified coralline tissues, though the significance of these differences in relation to the calcification process requires more investigation. Comparisons across a range of articulated and crustose species indicated that carbohydrate chemistry of calcified tissues was generally similar, regardless of morphology or phylogeny; conversely, chemical composition of genicular tissues was different across articulated lineages, suggesting that significantly different biochemical trajectories have led to remarkably similar biomechanical innovations.

     
  5. We propose a visually grounded library-of-behaviors approach for learning to manipulate diverse objects across varying initial and goal configurations and camera placements. Our key innovation is to disentangle the standard image-to-action mapping into two separate modules that use different types of perceptual input: (1) a behavior selector, which conditions on intrinsic and semantically rich object appearance features to select the behaviors that can successfully perform the desired task on the object at hand, and (2) a library of behaviors, each of which conditions on extrinsic and abstract object properties, such as object location and pose, to predict actions to execute over time. The selector uses a semantically rich 3D object feature representation extracted from images in a differentiable, end-to-end manner. This representation is trained to be view-invariant and affordance-aware using self-supervision, by predicting varying views and successful object manipulations. We test our framework on pushing and grasping diverse objects in simulation, as well as on transporting rigid, granular, and liquid food ingredients in a real robot setup. Our model outperforms image-to-action mappings that do not factorize static and dynamic object properties. We further ablate the contribution of the selector's inputs and show the benefits of the proposed view-predictive, affordance-aware 3D visual object representations.
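    The selector/library factorization can be sketched as follows (assumed interfaces and sizes, not the authors' code): a selector scores behaviors from an intrinsic, appearance-based object feature, and the chosen behavior maps extrinsic object state such as pose to an action.

```python
import torch
import torch.nn as nn

class Behavior(nn.Module):
    """One library entry: extrinsic object state (e.g. pose) -> action."""
    def __init__(self, state_dim=7, action_dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim))

    def forward(self, state):
        return self.net(state)

class BehaviorSelector(nn.Module):
    """Scores each behavior from an intrinsic, appearance-based object feature."""
    def __init__(self, feat_dim=128, n_behaviors=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, n_behaviors))

    def forward(self, object_feat):
        return self.net(object_feat).softmax(-1)

library = nn.ModuleList([Behavior() for _ in range(4)])   # e.g. push, grasp, pour, transport
selector = BehaviorSelector(n_behaviors=len(library))

object_feat = torch.rand(128)        # view-invariant 3D appearance feature (assumed given)
object_state = torch.rand(7)         # extrinsic location/pose of the object
scores = selector(object_feat)                           # which behavior suits this object?
action = library[int(scores.argmax())](object_state)     # run the selected behavior
```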