A major challenge in monocular 3D object detection is the limited diversity and quantity of objects in real datasets. While augmenting real scenes with virtual objects holds promise for improving both the diversity and quantity of objects, it has remained elusive due to the lack of an effective 3D object insertion method for complex, real-captured scenes. In this work, we study augmenting complex real indoor scenes with virtual objects for monocular 3D object detection. The main challenge is to automatically identify plausible physical properties for virtual assets (e.g., locations, appearances, and sizes) in cluttered real scenes. To address this challenge, we propose a physically plausible indoor 3D object insertion approach that automatically copies virtual objects and pastes them into real scenes. The resulting objects carry 3D bounding boxes with plausible physical locations and appearances. In particular, our method first identifies physically feasible locations and poses for the inserted objects to prevent collisions with the existing room layout. It then estimates spatially-varying illumination at the insertion location, enabling immersive blending of the virtual objects into the original scene with plausible appearances and cast shadows. We show that our augmentation method significantly improves existing monocular 3D object detection models and achieves state-of-the-art performance. For the first time, we demonstrate that physically plausible 3D object insertion, serving as a generative data augmentation technique, can lead to significant improvements on discriminative downstream tasks such as monocular 3D object detection.
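To make the collision-avoidance step concrete, here is a minimal rejection-sampling sketch in Python. It assumes axis-aligned bounding boxes and hypothetical inputs (`floor_extent`, `scene_boxes`); the paper's actual method additionally reasons about object pose and estimates illumination for rendering.

```python
import numpy as np

def aabb_overlap(box_a, box_b):
    """Axis-aligned overlap test; each box is (center_xyz, size_xyz)."""
    ca, sa = box_a
    cb, sb = box_b
    return np.all(np.abs(ca - cb) * 2.0 < (sa + sb))

def sample_placement(floor_extent, asset_size, scene_boxes, rng, max_tries=100):
    """Rejection-sample a collision-free floor location for a virtual asset.

    floor_extent: (xmin, xmax, ymin, ymax) of walkable floor, in meters.
    asset_size:   (dx, dy, dz) bounding-box size of the virtual object.
    scene_boxes:  list of (center, size) boxes for existing furniture/layout.
    """
    xmin, xmax, ymin, ymax = floor_extent
    dx, dy, dz = asset_size
    for _ in range(max_tries):
        x = rng.uniform(xmin + dx / 2, xmax - dx / 2)
        y = rng.uniform(ymin + dy / 2, ymax - dy / 2)
        center = np.array([x, y, dz / 2.0])       # rest the object on the floor
        cand = (center, np.array(asset_size))
        if not any(aabb_overlap(cand, b) for b in scene_boxes):
            yaw = rng.uniform(0.0, 2 * np.pi)     # free yaw once the spot is clear
            return center, yaw
    return None  # scene too cluttered for this asset
```

Using the axis-aligned box of the rotated asset would keep the test conservative; a full implementation would check the oriented box for each sampled yaw.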
Roominoes: Generating Novel 3D Floor Plans From Existing 3D Rooms
Realistic 3D indoor scene datasets have enabled significant recent progress in computer vision, scene understanding, autonomous navigation, and 3D reconstruction. But the scale, diversity, and customizability of existing datasets are limited, and it is time-consuming and expensive to scan and annotate more. Fortunately, combinatorics is on our side: there are enough individual rooms in existing 3D scene datasets, if there were but a way to recombine them into new layouts. In this paper, we propose the task of generating novel 3D floor plans from existing 3D rooms. We identify three sub-tasks of this problem: generation of a 2D layout, retrieval of compatible 3D rooms, and deformation of 3D rooms to fit the layout. We then discuss different strategies for solving the problem and design two representative pipelines: one uses available 2D floor plans to guide the selection and deformation of 3D rooms; the other learns to retrieve a set of compatible 3D rooms and combine them into novel layouts. We design a set of metrics that evaluate the generated results with respect to each of the three sub-tasks and show that different methods trade off performance on these sub-tasks. Finally, we survey downstream tasks that benefit from generated 3D scenes and discuss strategies for selecting the methods most appropriate for the demands of these tasks.
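As a rough illustration of the retrieval-and-deformation idea, the sketch below scores candidate 3D rooms by how much anisotropic scaling they would need to fit a slot in a generated 2D layout. The cost function and the greedy loop are illustrative assumptions, not the paper's learned retrieval.

```python
import numpy as np

def deformation_cost(target_wh, candidate_wh):
    """Anisotropic-scaling cost for fitting a candidate 3D room footprint
    (w, h) into a target 2D layout slot; 0 means a perfect fit."""
    sx = target_wh[0] / candidate_wh[0]
    sy = target_wh[1] / candidate_wh[1]
    # Penalize both absolute rescaling and non-uniform stretch.
    return abs(np.log(sx)) + abs(np.log(sy)) + abs(np.log(sx / sy))

def retrieve_rooms(slots, room_db):
    """Greedy retrieval sketch: for each slot in the generated 2D layout,
    pick the type-matching 3D room whose footprint deforms the least."""
    picks = []
    for slot in slots:  # slot = {"wh": (w, h), "type": "bedroom" | ...}
        candidates = [r for r in room_db if r["type"] == slot["type"]]
        picks.append(min(candidates,
                         key=lambda r: deformation_cost(slot["wh"], r["wh"])))
    return picks
```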
- Award ID(s):
- 1907547
- PAR ID:
- 10448133
- Publisher / Repository:
- Wiley-Blackwell
- Date Published:
- Journal Name:
- Computer Graphics Forum
- Volume:
- 40
- Issue:
- 5
- ISSN:
- 0167-7055
- Page Range / eLocation ID:
- p. 57-69
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
In this paper, we study the 3D volumetric modeling problem by adopting the Wasserstein introspective neural networks method (WINN) that was previously applied to 2D static images. We name our algorithm 3DWINN; it enjoys the same properties as WINN in the 2D case, being simultaneously generative and discriminative. Compared to existing 3D volumetric modeling approaches, 3DWINN demonstrates competitive results on several benchmarks in both generation and classification tasks. Beyond the standard Inception score, the Fréchet Inception Distance (FID) metric is also adopted to measure the quality of 3D volumetric generations. In addition, we study adversarial attacks on volumetric data and demonstrate the robustness of 3DWINN against adversarial examples while achieving appealing results in both classification and generation within a single model. 3DWINN is a general framework and can be applied to emerging tasks for 3D object and scene modeling.
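A minimal sketch of the introspective sampling idea, assuming a PyTorch classifier `model` over voxel grids: the same network that discriminates also generates, by gradient ascent on its own "realness" score with Langevin-style noise. This simplifies away WINN's Wasserstein formulation and training loop.

```python
import torch

def synthesize_voxels(model, steps=60, step_size=0.02,
                      shape=(1, 1, 32, 32, 32)):
    """Refine random voxels so the discriminator scores them as 'real'."""
    x = torch.randn(shape, requires_grad=True)
    for _ in range(steps):
        score = model(x).sum()                  # discriminator's realness score
        grad, = torch.autograd.grad(score, x)   # direction that looks more real
        with torch.no_grad():
            x += step_size * grad + 0.01 * torch.randn_like(x)  # Langevin noise
    return torch.sigmoid(x.detach())            # map to soft occupancy in [0, 1]
```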
-
Although deep learning approaches have achieved promising success in 2D optical flow estimation, it remains challenging to accurately estimate scene flow in 3D space because point clouds inherently lack topological information. In this paper, we address the problem of self-supervised 3D scene flow estimation with dynamic graph convolutional neural networks (GCNNs), in a model we call 3D SceneFlowNet. To better learn geometric relationships among points, we introduce EdgeConv to learn multi-level features from point clouds in a pyramid and a self-attention mechanism that applies the multi-level features to predict the final scene flow. Our trained model efficiently processes a pair of adjacent point clouds as input and accurately predicts 3D scene flow without any supervision. The proposed approach achieves superior performance on both the synthetic ModelNet40 dataset and real LiDAR scans from the KITTI Scene Flow 2015 dataset.
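For reference, a minimal PyTorch sketch of the EdgeConv operator the abstract builds on: edge features pair each point's features with its neighbors' offsets, pass through a shared MLP, and max-pool per neighborhood. The shapes and kNN construction follow the standard EdgeConv formulation, not code from the paper.

```python
import torch

def edge_conv(points, feats, mlp, k=16):
    """One EdgeConv layer over a kNN graph.

    points: (N, 3) xyz used to build the graph; feats: (N, C) point features.
    mlp:    any module mapping (..., 2C) -> (..., C_out), e.g. nn.Linear.
    """
    dist = torch.cdist(points, points)                    # (N, N) pairwise dists
    idx = dist.topk(k + 1, largest=False).indices[:, 1:]  # kNN, excluding self
    neighbors = feats[idx]                                # (N, k, C)
    center = feats.unsqueeze(1).expand(-1, k, -1)         # (N, k, C)
    edge_feats = torch.cat([center, neighbors - center], dim=-1)  # (N, k, 2C)
    return mlp(edge_feats).max(dim=1).values              # (N, C_out)
```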
-
The 3D world limits the human body pose, and the human body pose conveys information about the surrounding objects. Indeed, from a single image of a person placed in an indoor scene, we as humans are adept at resolving ambiguities of the human pose and room layout through our knowledge of physical laws and prior perception of plausible object and human poses. However, few computer vision models fully leverage this fact. In this work, we propose a holistically trainable model that perceives the 3D scene from a single RGB image, estimates the camera pose and the room layout, and reconstructs both human body and object meshes. By imposing a set of comprehensive and sophisticated losses on all aspects of the estimations, we show that our model outperforms existing human body mesh methods and indoor scene reconstruction methods. To the best of our knowledge, this is the first model that outputs both object and human predictions at the mesh level and performs joint optimization on the scene and human poses.
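The "comprehensive losses" amount to a weighted multi-task objective over all estimated quantities; a schematic sketch (with illustrative term names, not the paper's exact losses) might look like:

```python
def holistic_loss(pred, weights):
    """Weighted multi-task loss: each term constrains one aspect of the
    joint human/scene estimate. All names here are hypothetical."""
    terms = {
        "body_2d":   pred["body_reproj_err"],  # body mesh vs. 2D keypoints
        "layout":    pred["layout_err"],       # room-layout corner error
        "camera":    pred["camera_err"],       # camera-pose consistency
        "collision": pred["collision_pen"],    # human/object interpenetration
        "support":   pred["support_pen"],      # objects should rest on surfaces
    }
    return sum(weights[name] * value for name, value in terms.items())
```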
-
The placement of vegetation plays a central role in the realism of virtual scenes. We introduce procedural placement models (PPMs) for vegetation in urban layouts. PPMs are environmentally sensitive to city geometry and allow identifying plausible plant positions based on structural and functional zones in an urban layout. PPMs can either be used directly by defining their parameters or learned from satellite images and land register data. This allows us to populate urban landscapes with complex 3D vegetation and enhance existing approaches to generating urban landscapes. Our framework's effectiveness is shown through examples of large-scale city scenes and close-ups of individually grown tree models. We validate the generated results with a perceptual user study and assess the framework's usability through urban scene design sessions with expert users.
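One way to read "environmentally sensitive placement" is as density-weighted rejection sampling over city geometry; the sketch below is an illustrative stand-in for a learned PPM, with hypothetical distance-field callables.

```python
import numpy as np

def place_plants(n, extent, dist_to_building, dist_to_road, rng,
                 min_setback=1.0, road_falloff=8.0):
    """Sample plant positions whose acceptance probability depends on
    distances to nearby city geometry.

    dist_to_building / dist_to_road: callables (x, y) -> distance in meters.
    The density model here is illustrative, not the paper's learned PPM.
    """
    placed = []
    while len(placed) < n:
        x, y = rng.uniform(0, extent, size=2)
        if dist_to_building(x, y) < min_setback:
            continue  # never inside, or too close to, a building
        p = np.exp(-dist_to_road(x, y) / road_falloff)  # trees cluster near streets
        if rng.random() < p:
            placed.append((x, y))
    return np.array(placed)
```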
