Monocular 3D object parsing is highly desirable in various scenarios including occlusion reasoning and holistic scene interpretation. We present a deep convolutional neural network (CNN) architecture that, given a single RGB image, localizes semantic parts in the 2D image and in 3D space while inferring their visibility states. Our key insight is to exploit domain knowledge to regularize the network by deeply supervising its hidden layers, in order to sequentially infer intermediate concepts associated with the final task. To acquire training data in the desired quantities with ground-truth 3D shape and relevant concepts, we render 3D object CAD models to generate large-scale synthetic data and simulate challenging occlusion configurations between objects. We train the network only on synthetic data and demonstrate state-of-the-art performance on real-image benchmarks, including an extended version of KITTI, PASCAL VOC, PASCAL3D+ and IKEA, for 2D and 3D keypoint localization and instance segmentation. The empirical results substantiate the utility of our deep supervision scheme by demonstrating effective transfer of knowledge from synthetic data to real images, resulting in less overfitting compared to standard end-to-end training.
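The deep-supervision scheme described in this abstract can be pictured with a minimal sketch: the final-task loss is combined with weighted auxiliary losses attached to hidden layers, each tied to an intermediate concept (e.g. 2D keypoints before 3D structure). The function name and weights below are hypothetical, not the paper's exact formulation:

```python
# Minimal sketch of a deep-supervision objective: the loss on the final
# task is summed with weighted auxiliary losses computed at hidden layers,
# each supervising an intermediate concept.

def deep_supervision_loss(main_loss, aux_losses, weights):
    """Weighted sum of the final-task loss and per-concept auxiliary losses."""
    assert len(aux_losses) == len(weights)
    return main_loss + sum(w * l for w, l in zip(weights, aux_losses))

# Example: a final 3D-keypoint loss plus 2D-keypoint and visibility losses,
# each weighted at 0.5 (made-up numbers for illustration).
total = deep_supervision_loss(1.0, [0.4, 0.2], [0.5, 0.5])
```

In standard end-to-end training the auxiliary weights would be zero; deep supervision keeps them positive so hidden layers are pushed toward the intermediate concepts.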
Learning to Infer Kinematic Hierarchies for Novel Object Instances.
Manipulating an articulated object requires perceiving its kinematic hierarchy: its parts, how each can move, and how those motions are coupled. Previous work has explored perception for kinematics, but none infers a complete kinematic hierarchy on never-before-seen object instances without relying on a schema or template. We present a novel perception system that achieves this goal. Our system infers the moving parts of an object and the kinematic couplings that relate them. To infer parts, it uses a point-cloud instance segmentation neural network; to infer kinematic hierarchies, it uses a graph neural network to predict the existence, direction, and type of edges (i.e., joints) that relate the inferred parts. We train these networks using simulated scans of synthetic 3D models. We evaluate our system on simulated scans of 3D objects, and we demonstrate a proof-of-concept use of our system to drive real-world robotic manipulation.
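One way to picture the hierarchy-assembly step is as follows (an illustrative sketch with made-up part indices, scores, and joint types, not the paper's actual network output): given per-pair joint scores from a graph network, attach each part to its highest-scoring candidate parent to form a kinematic tree:

```python
# Hypothetical sketch: assemble a kinematic tree from per-pair edge scores.
# edge_scores maps (parent, child) -> (score, joint_type); part 0 is taken
# as the root, and every other part is attached to its best-scoring parent.

def assemble_hierarchy(num_parts, edge_scores):
    """Returns a dict mapping child -> (parent, joint_type)."""
    tree = {}
    for child in range(1, num_parts):
        parent, (score, jtype) = max(
            ((p, edge_scores[(p, child)]) for p in range(num_parts) if p != child),
            key=lambda kv: kv[1][0])
        tree[child] = (parent, jtype)
    return tree

# Toy scores for a 3-part object (e.g. cabinet body, door, drawer).
scores = {(0, 1): (0.9, "revolute"), (2, 1): (0.1, "prismatic"),
          (0, 2): (0.2, "fixed"),    (1, 2): (0.8, "prismatic")}
tree = assemble_hierarchy(3, scores)
```

This greedy per-child choice is only a stand-in for whatever structured decoding the real system uses; the point is that edge existence, direction, and type jointly determine the hierarchy.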
- Award ID(s): 1844960
- PAR ID: 10321089
- Date Published:
- Journal Name: Proceedings of the 2022 International Conference on Robotics and Automation
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
As autonomous robots interact and navigate around real-world environments such as homes, it is useful to reliably identify and manipulate articulated objects, such as doors and cabinets. Many prior works in object articulation identification require manipulation of the object, either by the robot or a human. While recent works have addressed predicting articulation types from visual observations alone, they often assume prior knowledge of category-level kinematic motion models or sequences of observations where the articulated parts move according to their kinematic constraints. In this work, we propose FormNet, a neural network that identifies the articulation mechanisms between pairs of object parts from a single frame of an RGB-D image and segmentation masks. The network is trained on 100k synthetic images of 149 articulated objects from 6 categories. Synthetic images are rendered via a photorealistic simulator with domain randomization. Our proposed model predicts motion residual flows of object parts, and these flows are used to determine the articulation type and parameters. The network achieves an articulation type classification accuracy of 82.5% on novel object instances in trained categories. Experiments also show how this method enables generalization to novel categories and can be applied to real-world images without fine-tuning.
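A toy illustration of reading an articulation type off an estimated part motion (the function, thresholds, and inputs below are hypothetical, not FormNet's actual classifier): reduce the motion fitted from residual flow to rotation and translation magnitudes, then pick the dominant component:

```python
# Illustrative only: classify a part's articulation from the magnitudes of
# the rotational and translational components of its fitted rigid motion.

def classify_articulation(rotation_mag, translation_mag, tol=1e-3):
    """Return 'fixed', 'revolute', or 'prismatic' for one part pair."""
    if rotation_mag < tol and translation_mag < tol:
        return "fixed"          # part does not move relative to its neighbor
    if rotation_mag >= translation_mag:
        return "revolute"       # rotation dominates (e.g. a door hinge)
    return "prismatic"          # translation dominates (e.g. a drawer slide)
```

The real model regresses dense residual flows and infers both the type and the joint parameters; this sketch only shows how the classification decision could fall out of a motion decomposition.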
Abstract: Social inequality is a consistent feature of animal societies, often manifesting as dominance hierarchies, in which each individual is characterized by a dominance rank denoting its place in the network of competitive relationships among group members. Most studies treat dominance hierarchies as static entities despite their true longitudinal, and sometimes highly dynamic, nature.

To guide study of the dynamics of dominance, we propose the concept of a longitudinal hierarchy: the characterization of a single, latent hierarchy and its dynamics over time. Longitudinal hierarchies describe the hierarchy position (r) and dynamics (∆) associated with each individual as a property of its interaction data, the periods into which these data are divided based on a period delineation rule (p), and the method chosen to infer the hierarchy. Hierarchy dynamics result from both active (∆a) and passive (∆p) processes. Methods that infer longitudinal hierarchies should optimize accuracy of rank dynamics as well as of the rank orders themselves, but no studies have yet evaluated the accuracy with which different methods infer hierarchy dynamics.

We modify three popular ranking approaches to make them better suited for inferring longitudinal hierarchies. Our three "informed" methods assign ranks that are informed by data from the prior period, rather than calculating ranks de novo in each observation period, and use prior knowledge of dominance correlates to inform placement of new individuals in the hierarchy. These methods are provided in an R package.

Using both a simulated dataset and a long-term empirical dataset from a species with two distinct sex-based dominance structures, we compare the performance of these methods and their unmodified counterparts. We show that choice of method has dramatic impacts on inference of hierarchy dynamics via differences in estimates of ∆a. Methods that calculate ranks de novo in each period overestimate hierarchy dynamics, but incorporation of prior information leads to more accurately inferred ∆a. Of the modified methods, Informed MatReorder infers the most conservative estimates of hierarchy dynamics and Informed Elo infers the most dynamic hierarchies. This work provides crucially needed conceptual framing and methodological validation for studying social dominance and its dynamics.
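The "informed" idea of carrying information across observation periods can be sketched with a standard Elo update (illustrative Python, not the authors' R package): the second period starts from the first period's ratings rather than re-initializing, so estimated dynamics reflect genuine rank changes instead of re-estimation noise:

```python
# Sketch of informed (carried-over) ratings across observation periods.

def elo_update(ratings, winner, loser, k=32):
    """Standard Elo update applied in place to a dict of ratings."""
    expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
    ratings[winner] += k * (1.0 - expected)
    ratings[loser] -= k * (1.0 - expected)
    return ratings

# Period 1: two evenly matched individuals; "a" wins one interaction.
period1 = {"a": 1000.0, "b": 1000.0}
elo_update(period1, "a", "b")

# Informed initialization: period 2 inherits period-1 ratings, whereas a
# de novo method would reset both individuals to 1000 and overestimate
# how much the hierarchy changed.
period2 = dict(period1)
```

The package's informed methods additionally use dominance correlates to place new individuals; that step is omitted here.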
Most real-world 3D sensors such as LiDARs perform fixed scans of the entire environment, while being decoupled from the recognition system that processes the sensor data. In this work, we propose a method for 3D object recognition using light curtains, a resource-efficient controllable sensor that measures depth at user-specified locations in the environment. Crucially, we propose using the prediction uncertainty of a deep-learning-based 3D point cloud detector to guide active perception. Given a neural network's uncertainty, we derive an optimization objective to place light curtains using the principle of maximizing information gain. Then, we develop a novel and efficient optimization algorithm to maximize this objective by encoding the physical constraints of the device into a constraint graph and optimizing with dynamic programming. We show how a 3D detector can be trained to detect objects in a scene by sequentially placing uncertainty-guided light curtains to successively improve detection accuracy.
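A toy version of the dynamic-programming placement idea (the discretization and constraint below are hypothetical, not the paper's actual constraint graph): choose one depth bin per camera ray so as to maximize summed detector uncertainty, with neighboring rays limited by a physical slope constraint on how fast the curtain's depth can change:

```python
# Illustrative DP for curtain placement: uncertainty[ray][depth] holds the
# detector's uncertainty at each (ray, depth) cell; adjacent rays may differ
# by at most max_step depth bins. Returns the best achievable total gain.

def place_curtain(uncertainty, max_step):
    n_rays, n_depths = len(uncertainty), len(uncertainty[0])
    best = list(uncertainty[0])  # best[d]: max gain ending at depth d
    for r in range(1, n_rays):
        best = [uncertainty[r][d]
                + max(best[max(0, d - max_step):d + max_step + 1])
                for d in range(n_depths)]
    return max(best)

# Made-up 3-ray x 3-depth uncertainty map.
u = [[0.1, 0.9, 0.2],
     [0.8, 0.1, 0.3],
     [0.2, 0.7, 0.9]]
gain = place_curtain(u, max_step=1)
```

The real method also recovers the arg-max placement and encodes richer device constraints; this sketch only shows why the problem decomposes ray-by-ray for dynamic programming.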
Creating soft robots with sophisticated, autonomous capabilities requires these systems to possess reliable, on-line proprioception of their 3D configuration through integrated soft sensors. We present a framework for predicting a soft robot's 3D configuration via deep learning, using feedback from a soft, proprioceptive sensor skin. Our framework introduces a kirigami-enabled strategy for rapidly sensorizing soft robots using off-the-shelf materials, a general kinematic description for soft robot geometry, and an investigation of neural network designs for predicting soft robot configuration. Even with hysteretic, non-monotonic feedback from the piezoresistive sensors, recurrent neural networks show potential for predicting our new kinematic parameters and, thus, the robot's configuration. One trained neural network closely predicts steady-state configuration during operation, though complete dynamic behavior is not fully captured. We validate our methods on a trunk-like arm with 12 discrete actuators and 12 proprioceptive sensors. As an essential advance in soft robotic perception, we anticipate our framework will open new avenues toward closed-loop control in soft robotics.
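Why recurrence helps with hysteretic sensors can be seen in a minimal single-unit recurrent step (purely illustrative; the weights and inputs are made up and bear no relation to the trained networks in the paper): the prediction depends on a hidden state that carries history, not just the current reading, which is exactly what a non-monotonic sensor requires:

```python
import math

# One-unit recurrent step: the new hidden state mixes the previous state
# with the current sensor reading, so identical readings can map to
# different outputs depending on the path taken to reach them.

def rnn_step(h, x, w_h=0.5, w_x=1.0, b=0.0):
    return math.tanh(w_h * h + w_x * x + b)

h = 0.0
for x in [0.2, 0.4, 0.4]:  # a short sequence of sensor readings
    h = rnn_step(h, x)
```

A feedforward network sees only the last reading (0.4) and cannot disambiguate hysteresis; the recurrent state h summarizes the whole sequence.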