Title: Learning 3D Part Assembly from a Single Image
Autonomous assembly is a crucial capability for robots in many applications. For this task, several problems such as obstacle avoidance, motion planning, and actuator control have been extensively studied in robotics. However, when it comes to task specification, the space of possibilities remains underexplored. Towards this end, we introduce a novel problem, single-image-guided 3D part assembly, along with a learning-based solution. We study this problem in the setting of furniture assembly from a given complete set of parts and a single image depicting the entire assembled object. Multiple challenges exist in this setting, including handling ambiguity among parts (e.g., slats in a chair back and leg stretchers) and 3D pose prediction for parts and part subassemblies, whether visible or occluded. We address these issues by proposing a two-module pipeline that leverages strong 2D-3D correspondences and assembly-oriented graph message-passing to infer part relationships. In experiments with a PartNet-based synthetic benchmark, we demonstrate the effectiveness of our framework as compared with three baseline approaches (code and data available at https://github.com/AntheaLi/3DPartAssembly).
Award ID(s):
1763268
NSF-PAR ID:
10285236
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
European Conference on Computer Vision
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Tan, Jie ; Toussaint, Marc ; Darvish, Kourosh (Ed.)
    Most successes in autonomous robotic assembly have been restricted to a single target or category. We propose to investigate general part assembly, the task of creating novel target assemblies with unseen part shapes. As a fundamental step towards a general part assembly system, we tackle the task of determining the precise poses of the parts in the target assembly, which we term “rearrangement planning”. We present General Part Assembly Transformer (GPAT), a transformer-based model architecture that accurately predicts part poses by inferring how each part shape corresponds to the target shape. Our experiments on both 3D CAD models and real-world scans demonstrate GPAT’s generalization abilities to novel and diverse target and part shapes.
  2. The value of electronic waste is estimated to be increasing rapidly year after year and, with rapid advances in electronics, shows no signs of slowing down. Storage devices such as SATA Hard Disks and Solid State Devices are electronic devices with high-value recyclable raw materials, which often go unrecovered. Most of the e-waste currently generated, including HDDs, is either managed by the informal recycling sector or is improperly landfilled with municipal solid waste, primarily due to insufficient recovery infrastructure and a labor shortage in the recycling industry. This emphasizes the importance of developing modern advanced recycling technologies such as robotic disassembly. Performing smooth robotic disassembly operations on precision electronics necessitates fast and accurate geometric 3D profiling to provide a quick and precise location of key components. Fringe Projection Profilometry (FPP), a variation of the well-known structured light technology, provides both the high speed and high accuracy needed to accomplish this. However, using FPP for disassembly of high-precision electronics such as hard disks can be especially challenging, given that the hard disk platter is almost completely reflective. Furthermore, the metallic nature of its various components makes it difficult to render an accurate 3D reconstruction. To address this challenge, we have developed a single-shot approach to predict the 3D point cloud of these devices using a combination of computer graphics, fringe projection, and deep learning. We calibrate a physical FPP-based 3D shape measurement system and set up its digital twin using computer graphics. We capture HDD and SSD CAD models at various orientations to generate virtual training datasets consisting of fringe images and their point cloud reconstructions. These datasets are used to train a U-Net, which is then found to predict the depth of the parts with high accuracy from only a single-shot fringe image. This proposed technology has the potential to serve as a valuable fast 3D vision tool for robotic re-manufacturing and is a stepping stone for building a completely automated assembly system.
  3. Monocular 3D object parsing is highly desirable in various scenarios including occlusion reasoning and holistic scene interpretation. We present a deep convolutional neural network (CNN) architecture to localize semantic parts in the 2D image and in 3D space while inferring their visibility states, given a single RGB image. Our key insight is to exploit domain knowledge to regularize the network by deeply supervising its hidden layers, in order to sequentially infer intermediate concepts associated with the final task. To acquire training data in desired quantities with ground truth 3D shape and relevant concepts, we render 3D object CAD models to generate large-scale synthetic data and simulate challenging occlusion configurations between objects. We train the network only on synthetic data and demonstrate state-of-the-art performance on real image benchmarks including an extended version of KITTI, PASCAL VOC, PASCAL3D+ and IKEA for 2D and 3D keypoint localization and instance segmentation. The empirical results substantiate the utility of our deep supervision scheme by demonstrating effective transfer of knowledge from synthetic data to real images, resulting in less overfitting compared to standard end-to-end training.
  4. Human pose estimation (HPE) is inherently a homogeneous multi-task learning problem, with the localization of each body part as a different task. Recent HPE approaches universally learn a shared representation for all parts, from which their locations are linearly regressed. However, our statistical analysis indicates that not all parts are related to each other. As a result, such a sharing mechanism can lead to negative transfer and degrade performance. This potential issue leads us to ask: can we identify related parts and learn specific features for them to improve pose estimation? Since unrelated tasks no longer share a high-level representation, we expect to avoid the adverse effect of negative transfer. In addition, more explicit structural knowledge (e.g., that ankles and knees are highly related) is incorporated into the model, which helps resolve ambiguities in HPE. To answer this question, we first propose a data-driven approach to group related parts based on how much information they share. Then a part-based branching network (PBN) is introduced to learn representations specific to each part group. We further present a multi-stage version of this network to repeatedly refine intermediate features and pose estimates. Ablation experiments indicate that learning specific features significantly improves the localization of occluded parts and thus benefits HPE. Our approach also outperforms all state-of-the-art methods on two benchmark datasets, with an outstanding advantage when occlusion occurs.
  5. Additive Manufacturing (AM), also known as 3D printing, has been highlighted as a complementary method to traditional (subtractive and formative) manufacturing. This mainly results from its distinctive ability to directly produce complex shapes and assemblies without an assembly process. Because of these aspects, AM has affected the way products are designed and formed, which has led to a dedicated research area known as Design for AM (DfAM). As a step towards addressing DfAM, this paper reviews the literature on re-designing an original model into assemblies produced in AM, termed Part Decomposition (PD). Although PD has received less attention in DfAM than Part Consolidation (PC), which is the re-design of assemblies into a single consolidated part, PD has been studied with various motives and challenges for AM. To investigate the research trend in PD, 37 main publications are categorized under five motives: printability, productivity, functionality, artistry and flexibility. Additionally, from technical and methodological aspects, relevant studies are organized into decomposition issues (automatic, semi-automatic and manual decompositions), buildup issues (orientation decision for single- and multi-part builds and the packing problem), and assembly issues (connection design and assembly process planning). As witnessed in this comprehensive review, the concept of PD leaves further research challenges spanning several disciplines. Along this line, we further elaborate future research directions of PD under three main categories: (1) enhancing AM productivity for mass customization; (2) developing novel decomposition methods and guidelines; and (3) applying conventional design methodologies to PD.