skip to main content


The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 10:00 PM ET on Friday, December 8 until 2:00 AM ET on Saturday, December 9 due to maintenance. We apologize for the inconvenience.

Title: PointConv: Deep Convolutional Networks on 3D Point Clouds
Unlike images which are represented in regular dense grids, 3D point clouds are irregular and unordered, hence applying convolution on them can be difficult. In this paper, we extend the dynamic filter to a new convolution operation, named PointConv. PointConv can be applied on point clouds to build deep convolutional networks. We treat convolution kernels as nonlinear functions of the local coordinates of 3D points comprised of weight and density functions. With respect to a given point, the weight functions are learned with multi-layer perceptron networks and the density functions through kernel density estimation. A novel reformulation is proposed for efficiently computing the weight functions, which allowed us to dramatically scale up the network and significantly improve its performance. The learned convolution kernel can be used to compute translation-invariant and permutation-invariant convolution on any point set in the 3D space. Besides, PointConv can also be used as deconvolution operators to propagate features from a subsampled point cloud back to its original resolution. Experiments on ModelNet40, ShapeNet, and ScanNet show that deep convolutional neural networks built on PointConv are able to achieve state-of-the-art on challenging semantic segmentation benchmarks on 3D point clouds. Besides, our experiments converting CIFAR-10 into a point cloud showed that networks built on PointConv can match the performance of convolutional networks in 2D images of a similar structure.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Page Range / eLocation ID:
9613 to 9622
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. 3D CT point clouds reconstructed from the original CT images are naturally represented in real-world coordinates. Compared with CT images, 3D CT point clouds contain invariant geometric features with irregular spatial distributions from multiple viewpoints. This paper rethinks pulmonary nodule detection in CT point cloud representations. We first extract the multi-view features from a sparse convolutional (SparseConv) encoder by rotating the point clouds with different angles in the world coordinate. Then, to simultaneously learn the discriminative and robust spatial features from various viewpoints, a nodule proposal optimization schema is proposed to obtain coarse nodule regions by aggregating consistent nodule proposals prediction from multi-view features. Last, the multi-level features and semantic segmentation features extracted from a SparseConv decoder are concatenated with multi-view features for final nodule region regression. Experiments on the benchmark dataset (LUNA16) demonstrate the feasibility of applying CT point clouds in lung nodule detection task. Furthermore, we observe that by combining multi-view predictions, the performance of the proposed framework is greatly improved compared to single-view, while the interior texture features of nodules from images are more suitable for detecting nodules in small sizes. 
    more » « less
  2. null (Ed.)
    The success of supervised learning requires large-scale ground truth labels which are very expensive, time- consuming, or may need special skills to annotate. To address this issue, many self- or un-supervised methods are developed. Unlike most existing self-supervised methods to learn only 2D image features or only 3D point cloud features, this paper presents a novel and effective self-supervised learning approach to jointly learn both 2D image features and 3D point cloud features by exploiting cross-modality and cross-view correspondences without using any human annotated labels. Specifically, 2D image features of rendered images from different views are extracted by a 2D convolutional neural network, and 3D point cloud features are extracted by a graph convolution neural network. Two types of features are fed into a two-layer fully connected neural network to estimate the cross-modality correspondence. The three networks are jointly trained (i.e. cross-modality) by verifying whether two sampled data of different modalities belong to the same object, meanwhile, the 2D convolutional neural network is additionally optimized through minimizing intra-object distance while maximizing inter-object distance of rendered images in different views (i.e. cross-view). The effectiveness of the learned 2D and 3D features is evaluated by transferring them on five different tasks including multi-view 2D shape recognition, 3D shape recognition, multi-view 2D shape retrieval, 3D shape retrieval, and 3D part-segmentation. Extensive evaluations on all the five different tasks across different datasets demonstrate strong generalization and effectiveness of the learned 2D and 3D features by the proposed self-supervised method. 
    more » « less
  3. To alleviate the cost of collecting and annotating large-scale "3D object" point cloud data, we propose an unsupervised learning approach to learn features from an unlabeled point cloud dataset by using part contrasting and object clustering with deep graph convolutional neural networks (GCNNs). In the contrast learning step, all the samples in the 3D object dataset are cut into two parts and put into a "part" dataset. Then a contrast learning GCNN (ContrastNet) is trained to verify whether two randomly sampled parts from the part dataset belong to the same object. In the cluster learning step, the trained ContrastNet is applied to all the samples in the original 3D object dataset to extract features, which are used to group the samples into clusters. Then another GCNN for clustering learning (ClusterNet) is trained from the orignal 3D data to predict the cluster IDs of all the training samples. The contrasting learning forces the ContrastNet to learn semantic features of objects, while the ClusterNet improves the quality of learned features by being trained to discover objects that belong to the same semantic categories by using cluster IDs. We have conducted extensive experiments to evaluate the proposed framework on point cloud classification tasks. The proposed unsupervised learning approach obtains comparable performance to the state-of-the-art with heavier shape auto-encoding unsupervised feature extraction methods. We have also tested the networks on object recognition using partial 3D data, by simulating occlusions and perspective views, and obtained practically useful results. The code of this work is publicly available at: 
    more » « less
  4. Medical image analysis using deep learning has recently been prevalent, showing great performance for various downstream tasks including medical image segmentation and its sibling, volumetric image segmentation. Particularly, a typical volumetric segmentation network strongly relies on a voxel grid representation which treats volumetric data as a stack of individual voxel `slices', which allows learning to segment a voxel grid to be as straightforward as extending existing image-based segmentation networks to the 3D domain. However, using a voxel grid representation requires a large memory footprint, expensive test-time and limiting the scalability of the solutions. In this paper, we propose Point-Unet, a novel method that incorporates the eciency of deep learning with 3D point clouds into volumetric segmentation. Our key idea is to rst predict the regions of interest in the volume by learning an attentional probability map, which is then used for sampling the volume into a sparse point cloud that is subsequently segmented using a point-based neural network. We have conducted the experiments on the medical volumetric segmentation task with both a small-scale dataset Pancreas and large-scale datasets BraTS18, BraTS19, and BraTS20 challenges. A comprehensive benchmark on di erent metrics has shown that our context-aware Point-Unet robustly outperforms the SOTA voxel-based networks at both accuracies, memory usage during training, and time consumption during testing. 
    more » « less
  5. 3D LiDAR scanners are playing an increasingly important role in autonomous driving as they can generate depth information of the environment. However, creating large 3D LiDAR point cloud datasets with point-level labels requires a significant amount of manual annotation. This jeopardizes the efficient development of supervised deep learning algorithms which are often data-hungry. We present a framework to rapidly create point clouds with accurate pointlevel labels from a computer game. To our best knowledge, this is the first publication on LiDAR point cloud simulation framework for autonomous driving. The framework supports data collection from both auto-driving scenes and user-configured scenes. Point clouds from auto-driving scenes can be used as training data for deep learning algorithms, while point clouds from user-configured scenes can be used to systematically test the vulnerability of a neural network, and use the falsifying examples to make the neural network more robust through retraining. In addition, the scene images can be captured simultaneously in order for sensor fusion tasks, with a method proposed to do automatic registration between the point clouds and captured scene images. We show a significant improvement in accuracy (+9%) in point cloud segmentation by augmenting the training dataset with the generated synthesized data. Our experiments also show by testing and retraining the network using point clouds from user-configured scenes, the weakness/blind spots of the neural network can be fixed. 
    more » « less