
Title: Towards Efficient 3D Point Cloud Scene Completion via Novel Depth View Synthesis
3D point cloud completion has been a long-standing challenge at scale, and the corresponding per-point supervised training strategies suffer from cumbersome annotations. 2D supervision has recently emerged as a promising alternative for 3D tasks, but specific approaches for 3D point cloud completion remain to be explored. To overcome these limitations, we propose an end-to-end method that directly lifts a single depth map to a completed point cloud. With one depth map as input, a multi-way novel depth view synthesis network (NDVNet) infers coarsely completed depth maps under various viewpoints. Meanwhile, a geometric depth perspective rendering module uses the raw input depth map to generate a reprojected depth map for each view. The two depth maps generated in parallel for each view are then concatenated and refined by a depth completion network (DCNet). The final completed point cloud is fused from all refined depth views. Experimental results demonstrate that the proposed approach, composed of the aforementioned components, produces high-quality, state-of-the-art results on the popular SUNCG benchmark.
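As a rough illustration of the pipeline shape described above (not the authors' released code), the sketch below wires together hypothetical NDVNet and DCNet stand-ins: NDVNet maps the single input depth to several coarse per-view depths, a placeholder `reproject` function stands in for the geometric depth perspective rendering module, and DCNet refines each concatenated pair before fusion. All module internals, names, and tensor shapes are assumptions.

```python
# Minimal sketch of the paper's pipeline shape (hypothetical stand-ins;
# NDVNet/DCNet internals and all tensor shapes are assumptions).
import torch
import torch.nn as nn

class NDVNet(nn.Module):
    """Multi-way novel depth view synthesis: one input depth -> K coarse depths."""
    def __init__(self, num_views=4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_views, 3, padding=1),  # one coarse depth per view
        )

    def forward(self, depth):            # depth: (B, 1, H, W)
        return self.backbone(depth)      # (B, K, H, W) coarse per-view depths

class DCNet(nn.Module):
    """Depth completion: refine a [coarse, reprojected] depth pair for one view."""
    def __init__(self):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, coarse, reprojected):
        return self.refine(torch.cat([coarse, reprojected], dim=1))

def reproject(depth, view_idx):
    # Placeholder for the geometric depth perspective rendering module, which
    # would warp the raw input depth into viewpoint `view_idx`; identity here.
    return depth

def complete(depth, ndv_net, dc_net):
    coarse_all = ndv_net(depth)                       # (B, K, H, W)
    refined = []
    for k in range(coarse_all.shape[1]):
        coarse_k = coarse_all[:, k:k + 1]             # keep the channel dim
        refined.append(dc_net(coarse_k, reproject(depth, k)))
    # A final fusion step would back-project every refined depth map to 3D
    # and merge the views into one completed point cloud.
    return refined

views = complete(torch.rand(1, 1, 64, 64), NDVNet(), DCNet())
print(len(views), views[0].shape)                     # 4 refined depth views
```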
Award ID(s): 2041307
PAR ID: 10279865
Author(s) / Creator(s):
Date Published:
Journal Name: International Conference on Pattern Recognition (ICPR), 2020
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We present 3DVNet, a novel multi-view stereo (MVS) depth-prediction method that combines the advantages of previous depth-based and volumetric MVS approaches. Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions, resulting in highly accurate predictions which agree on the underlying scene geometry. Unlike existing depth-prediction techniques, our method uses a volumetric 3D convolutional neural network (CNN) that operates in world space on all depth maps jointly. The network can therefore learn meaningful scene-level priors. Furthermore, unlike existing volumetric MVS techniques, our 3D CNN operates on a feature-augmented point cloud, allowing for effective aggregation of multi-view information and flexible iterative refinement of depth maps. Experimental results show our method exceeds state-of-the-art accuracy in both depth-prediction and 3D-reconstruction metrics on the ScanNet dataset, as well as a selection of scenes from the TUM-RGBD and ICL-NUIM datasets, showing that our method is effective and generalizes to new settings.
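A minimal sketch of the joint iterative scheme this abstract describes, under heavy assumptions: `backproject` and `scene_network` are placeholders for the real back-projection and the volumetric 3D CNN, and the loop simply applies the predicted residual to all depth maps jointly.

```python
# Sketch of 3DVNet-style joint iterative refinement (all functions are
# placeholders; the real method runs a 3D CNN over a feature-augmented
# point cloud built from every view).
import torch

def backproject(depths):
    # Placeholder: lift all depth maps into one world-space point cloud.
    # A real version multiplies pixel rays by depth and applies camera poses.
    return torch.rand(depths.shape[0], 1024, 3)

def scene_network(points, features):
    # Placeholder for the volumetric 3D CNN: consumes the joint point cloud
    # and returns one residual correction per depth map.
    return 0.01 * torch.randn(features.shape[0], features.shape[1], 1, 1)

def refine_depths(depths, features, iters=3):
    """Iteratively update all coarse depth maps so they agree on geometry."""
    for _ in range(iters):
        points = backproject(depths)
        residual = scene_network(points, features)  # (B, V, 1, 1) offsets
        depths = depths + residual                  # broadcast over H and W
    return depths

depths = torch.rand(1, 5, 64, 64)                   # 5 coarse per-view depths
print(refine_depths(depths, depths).shape)
```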
  2.
    In this paper, we address the problem of estimating dense depth from a sequence of images using deep neural networks. Specifically, we employ a dense-optical-flow network to compute correspondences and then triangulate a point cloud from them to obtain an initial depth map. Parts of the point cloud, however, may be less accurate than others due to a lack of common observations or small parallax. To further increase the triangulation accuracy, we introduce a depth-refinement network (DRN) that optimizes the initial depth map based on the image's contextual cues. In particular, the DRN contains an iterative refinement module (IRM) that improves the depth accuracy over iterations by refining the deep features. Lastly, the DRN also predicts the uncertainty in the refined depths, which is desirable in applications such as measurement selection for scene reconstruction. We show experimentally that our algorithm outperforms state-of-the-art approaches in terms of depth accuracy, and we verify that our predicted uncertainty is highly correlated with the actual depth error.
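The triangulation step mentioned above can be illustrated with standard linear (DLT) two-view triangulation; the sketch below is a textbook version rather than the paper's implementation, and it assumes the projection matrices and matched pixels are given.

```python
# Textbook linear (DLT) two-view triangulation, illustrating the step where
# correspondences become a point cloud; P1, P2 and the matches are assumed
# given, and this is not the paper's implementation.
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Triangulate one correspondence. P1, P2: 3x4 projection matrices;
    x1, x2: matched pixel coordinates (u, v) in the two images."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]                       # inhomogeneous 3D point

# Toy check: two cameras one meter apart, a point at depth 5 m.
K = np.array([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])
X_true = np.array([0.5, 0.2, 5.0, 1.0])
x1 = P1 @ X_true; x1 = x1[:2] / x1[2]
x2 = P2 @ X_true; x2 = x2[:2] / x2[2]
print(triangulate(P1, P2, x1, x2))            # ~ [0.5, 0.2, 5.0]
```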
    We describe an improvement to the recently developed view-independent rendering (VIR) and apply it to dynamic cube-mapped reflections. Standard multiview rendering (MVR) renders a scene six times for each cube map. VIR traverses the geometry once per frame to generate a point cloud optimized for many cube maps, using it to render reflected views in parallel. Our improvement, eye-resolution point rendering (EPR), is faster than VIR and renders cube maps faster than MVR, with comparable visual quality. We are currently improving EPR's run time by reducing point cloud size and per-point processing.
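To make the cube-map idea concrete, the sketch below shows one common (OpenGL-style) convention for mapping a point, taken relative to the probe center, to a cube face and face-local coordinates; sign conventions vary across APIs, and this is illustrative only, not the paper's EPR renderer.

```python
# Illustrative cube-map face lookup using the OpenGL-style convention
# (sign conventions vary by API; this is not the paper's EPR renderer).
import numpy as np

def cube_face_uv(p):
    """Map a direction p, relative to the cube-map center, to (face, u, v).
    Faces: 0:+X 1:-X 2:+Y 3:-Y 4:+Z 5:-Z; u and v lie in [-1, 1]."""
    ax, ay, az = np.abs(p)
    if ax >= ay and ax >= az:                        # X-major direction
        face = 0 if p[0] > 0 else 1
        u, v = -np.sign(p[0]) * p[2] / ax, -p[1] / ax
    elif ay >= az:                                   # Y-major direction
        face = 2 if p[1] > 0 else 3
        u, v = p[0] / ay, np.sign(p[1]) * p[2] / ay
    else:                                            # Z-major direction
        face = 4 if p[2] > 0 else 5
        u, v = np.sign(p[2]) * p[0] / az, -p[1] / az
    return face, u, v

# One traversal of a point cloud can feed all six faces in parallel:
for p in np.random.randn(8, 3):
    print(cube_face_uv(p))
```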
  4. Occlusion is a critical problem in autonomous driving systems. Solving it requires robust collaboration among autonomous vehicles traveling on the same roads. However, transferring the entirety of raw sensor data among autonomous vehicles is expensive and can cause communication delays. This paper proposes a method called Realtime Collaborative Vehicular Communication based on a Bird's-Eye-View (BEV) map. The BEV map holds the accurate depth information from the point cloud, while its 2D representation enables the method to use a novel, well-trained image-based backbone network. Most importantly, we encode the object-detection results into the BEV representation to reduce the volume of data transmission and make real-time collaboration between autonomous vehicles possible. The output of this process, the BEV map, can also be used as direct input to most route-planning modules. Numerical results show that this method increases the accuracy of object detection by cross-verifying the results from multiple points of view, and in the process it also reduces the object-detection challenges that stem from occlusion and partial occlusion. Additionally, unlike many existing methods, it significantly reduces the data that must be transferred between vehicles, achieving a rate of 21.92 Hz for both object detection and data transmission, which is sufficiently fast for a real-time system.
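A toy version of the encoding idea, with grid size, resolution, and the detection tuple format all assumed rather than taken from the paper: detection results are rasterized into a small BEV grid, which is far cheaper to transmit than raw sensor data.

```python
# Toy BEV encoding of detection results (grid size, resolution, and the
# detection tuple format are assumptions, not taken from the paper).
import numpy as np

GRID = 128     # cells per side
RES = 0.5      # meters per cell: the grid spans +/- 32 m around the ego car

def detections_to_bev(detections):
    """detections: list of (x, y, class_id) in ego coordinates (meters).
    Returns a (GRID, GRID) map with class_id + 1 written at each object cell."""
    bev = np.zeros((GRID, GRID), dtype=np.uint8)
    for x, y, cls in detections:
        col = int(x / RES + GRID / 2)
        row = int(y / RES + GRID / 2)
        if 0 <= row < GRID and 0 <= col < GRID:
            bev[row, col] = cls + 1            # 0 is reserved for "empty"
    return bev

# A short list of detections is orders of magnitude smaller than a raw
# LiDAR sweep, which is what makes real-time vehicle-to-vehicle sharing
# feasible in this scheme.
bev = detections_to_bev([(4.0, -2.5, 1), (12.0, 7.0, 0)])
print(bev.shape, np.count_nonzero(bev))
```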
  5. In this work, we tackle the problem of category-level online pose tracking of objects from point cloud sequences. For the first time, we propose a unified framework that can handle 9DoF pose tracking for novel rigid object instances as well as per-part pose tracking for articulated objects from known categories. Here the 9DoF pose, comprising the 6D pose and 3D size, is equivalent to a 3D amodal bounding box representation with a free 6D pose. Given the depth point cloud at the current frame and the estimated pose from the last frame, our novel end-to-end pipeline learns to accurately update the pose. The pipeline is composed of three modules: 1) a pose canonicalization module that normalizes the pose of the input depth point cloud; 2) RotationNet, a module that directly regresses small interframe delta rotations; and 3) CoordinateNet, a module that predicts the normalized coordinates and segmentation, enabling analytical computation of the 3D size and translation. Leveraging the small pose regime in the pose-canonicalized point clouds, our method integrates the best of both worlds by combining dense coordinate prediction and direct rotation regression, yielding an end-to-end differentiable pipeline optimized for 9DoF pose accuracy (without using non-differentiable RANSAC). Our extensive experiments demonstrate that our method achieves new state-of-the-art performance on category-level rigid object pose (NOCS-REAL275 [29]) and articulated object pose benchmarks (SAPIEN [34], BMVC [18]), while also running fastest at ∼12 FPS.
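One tracking step might look like the sketch below, where `rotation_net` and `coordinate_net` are stubs standing in for the learned RotationNet and CoordinateNet; the pose-composition algebra is standard, and everything else is an assumption.

```python
# Sketch of one 9DoF tracking step; rotation_net and coordinate_net are
# stubs standing in for the learned RotationNet and CoordinateNet, and the
# analytical size/translation recovery here is only schematic.
import numpy as np

def canonicalize(points, R_prev, t_prev):
    """Undo the previous-frame pose so the networks see near-identity motion."""
    return (points - t_prev) @ R_prev       # == R_prev.T @ (p - t) per point

def rotation_net(canon_points):
    # Stub: regress a small interframe delta rotation (identity here).
    return np.eye(3)

def coordinate_net(canon_points):
    # Stub: predict normalized object coordinates in [-0.5, 0.5].
    extent = np.abs(canon_points).max()
    return canon_points / (2 * extent + 1e-8)

def track_step(points, R_prev, t_prev):
    canon = canonicalize(points, R_prev, t_prev)
    R_new = R_prev @ rotation_net(canon)     # compose with the delta rotation
    nocs = coordinate_net(canon)
    # Size and translation follow from the coordinate prediction:
    size = (canon.max(0) - canon.min(0)) / (nocs.max(0) - nocs.min(0) + 1e-8)
    t_new = t_prev + canon.mean(0) @ R_prev.T  # centroid shift back in world
    return R_new, t_new, size

pts = np.random.randn(256, 3) * 0.1 + np.array([1.0, 0.0, 2.0])
R, t, s = track_step(pts, np.eye(3), np.array([1.0, 0.0, 2.0]))
print(R.shape, t.round(3), s.round(3))
```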