Title: Learning 3D Part Assembly from a Single Image
Autonomous assembly is a crucial capability for robots in many applications. For this task, several problems such as obstacle avoidance, motion planning, and actuator control have been extensively studied in robotics. However, when it comes to task specification, the space of possibilities remains underexplored. To this end, we introduce a novel problem, single-image-guided 3D part assembly, along with a learning-based solution. We study this problem in the setting of furniture assembly from a given complete set of parts and a single image depicting the entire assembled object. Multiple challenges exist in this setting, including handling ambiguity among parts (e.g., slats in a chair back and leg stretchers) and 3D pose prediction for parts and part subassemblies, whether visible or occluded. We address these issues by proposing a two-module pipeline that leverages strong 2D-3D correspondences and assembly-oriented graph message-passing to infer part relationships. In experiments with a PartNet-based synthetic benchmark, we demonstrate the effectiveness of our framework as compared with three baseline approaches (code and data available at https://github.com/AntheaLi/3DPartAssembly).
Award ID(s):
1763268
PAR ID:
10285236
Journal Name:
European Conference on Computer Vision
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Tan, Jie; Toussaint, Marc; Darvish, Kourosh (Ed.)
    Most successes in autonomous robotic assembly have been restricted to a single target or category. We propose to investigate general part assembly, the task of creating novel target assemblies with unseen part shapes. As a fundamental step to a general part assembly system, we tackle the task of determining the precise poses of the parts in the target assembly, which we term “rearrangement planning”. We present General Part Assembly Transformer (GPAT), a transformer-based model architecture that accurately predicts part poses by inferring how each part shape corresponds to the target shape. Our experiments on both 3D CAD models and real-world scans demonstrate GPAT’s generalization abilities to novel and diverse target and part shapes.
  2. Given a part design, the task of manufacturing process selection chooses an appropriate manufacturing process to fabricate it. Prior research has traditionally determined manufacturing processes through direct classification. However, an alternative approach to select a manufacturing process for a new design involves identifying previously produced parts with comparable shapes and materials and learning from them. Finding similar designs from a large dataset of previously manufactured parts is a challenging problem. To solve this problem, researchers have proposed different spatial and spectral shape descriptors to extract shape features including the D2 distribution, spherical harmonics (SH), and the Fast Fourier Transform (FFT), as well as the application of different machine learning methods on various representations of 3D part models like multi-view images, voxel, triangle mesh, and point cloud. However, there has not been a comprehensive analysis of these different shape descriptors, especially for part similarity search aimed at manufacturing process selection. To remedy this gap, this paper presents an in-depth comparative study of these shape descriptors for part similarity search. While we acknowledge the importance of factors like part size, tolerance, and cost in manufacturing process selection, this paper focuses on part shape and material properties only. Our findings show that SH performs the best among non-machine learning methods for manufacturing process selection, yielding 97.96% testing accuracy using the proposed quantitative evaluation metric. For machine learning methods, deep learning on multi-view image representations is best, yielding 99.85% testing accuracy when rotational invariance is not a primary concern. Deep learning on point cloud representations excels, yielding 99.44% testing accuracy when considering rotational invariance. 
  3. Monocular 3D object parsing is highly desirable in various scenarios including occlusion reasoning and holistic scene interpretation. We present a deep convolutional neural network (CNN) architecture to localize semantic parts in the 2D image and in 3D space while inferring their visibility states, given a single RGB image. Our key insight is to exploit domain knowledge to regularize the network by deeply supervising its hidden layers, in order to sequentially infer intermediate concepts associated with the final task. To acquire training data in desired quantities with ground truth 3D shape and relevant concepts, we render 3D object CAD models to generate large-scale synthetic data and simulate challenging occlusion configurations between objects. We train the network only on synthetic data and demonstrate state-of-the-art performance on real image benchmarks including an extended version of KITTI, PASCAL VOC, PASCAL3D+ and IKEA for 2D and 3D keypoint localization and instance segmentation. The empirical results substantiate the utility of our deep supervision scheme by demonstrating effective transfer of knowledge from synthetic data to real images, resulting in less overfitting compared to standard end-to-end training.
  4. The use of computer-aided manufacturing (CAM) software is essential in the rapid production of high-quality computer numerical control (CNC) machining toolpaths for complex parts. Typical CAM software relies on analytical representations of part geometry, where curves and surfaces are described by parametric functions. This paper proposes the use of a novel way to represent part geometry known as a voxel model. A voxel model uses a three-dimensional array of small cubes to represent a part volume; these cubes, or voxels, are the three-dimensional analog of two-dimensional pixels in an image. The use of voxels for a CAM application enables higher surface complexity, simplified collision checking, and more robust analysis of material removal than would be possible with typical parametric CAM. The unique capabilities of the voxel-based CAM approach described in this paper enable rapid production of high-quality 5-axis toolpaths for machining complex parts, such as the centrifugal compressor assembly that is presented in this work. 
  5. Human pose estimation (HPE) is inherently a homogeneous multi-task learning problem, with the localization of each body part as a different task. Recent HPE approaches universally learn a shared representation for all parts, from which their locations are linearly regressed. However, our statistical analysis indicates not all parts are related to each other. As a result, such a sharing mechanism can lead to negative transfer and deteriorate the performance. This potential issue drives us to raise an interesting question. Can we identify related parts and learn specific features for them to improve pose estimation? Since unrelated tasks no longer share a high-level representation, we expect to avoid the adverse effect of negative transfer. In addition, more explicit structural knowledge, e.g., ankles and knees are highly related, is incorporated into the model, which helps resolve ambiguities in HPE. To answer this question, we first propose a data-driven approach to group related parts based on how much information they share. Then a part-based branching network (PBN) is introduced to learn representations specific to each part group. We further present a multi-stage version of this network to repeatedly refine intermediate features and pose estimates. Ablation experiments indicate learning specific features significantly improves the localization of occluded parts and thus benefits HPE. Our approach also outperforms all state-of-the-art methods on two benchmark datasets, with an outstanding advantage when occlusion occurs. 