
Title: Rethinking Pulmonary Nodule Detection in Multi-view 3D CT Point Cloud Representation
3D CT point clouds reconstructed from the original CT images are naturally represented in real-world coordinates. Compared with CT images, 3D CT point clouds contain invariant geometric features with irregular spatial distributions from multiple viewpoints. This paper rethinks pulmonary nodule detection in CT point cloud representations. We first extract multi-view features with a sparse convolutional (SparseConv) encoder by rotating the point clouds by different angles in world coordinates. Then, to simultaneously learn discriminative and robust spatial features from various viewpoints, we propose a nodule proposal optimization scheme that obtains coarse nodule regions by aggregating consistent nodule proposal predictions from the multi-view features. Finally, the multi-level features and semantic segmentation features extracted from a SparseConv decoder are concatenated with the multi-view features for final nodule region regression. Experiments on the benchmark dataset LUNA16 demonstrate the feasibility of applying CT point clouds to the lung nodule detection task. Furthermore, we observe that combining multi-view predictions greatly improves the performance of the proposed framework compared with a single view, while the interior texture features of nodules in images are better suited for detecting small nodules.
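The two geometric steps described above, rotating the cloud into several views and keeping the proposals that those views agree on, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' method. The rotation axis (world z), the angles, and the consensus rule are all illustrative choices: `aggregate_proposals`, with its hypothetical `radius` and `min_views` parameters, greedily groups proposal centers after undoing each view's rotation. The SparseConv encoder and decoder are omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def rotate_z(points: Sequence[Point], angle_deg: float) -> List[Point]:
    """Rotate 3D points about the world z-axis by angle_deg degrees."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def make_views(points: Sequence[Point], angles: Sequence[float]) -> Dict[float, List[Point]]:
    """One rotated copy of the point cloud per (assumed) viewing angle."""
    return {a: rotate_z(points, a) for a in angles}


def aggregate_proposals(
    proposals: Dict[float, List[Tuple[Point, float]]],
    radius: float,
    min_views: int,
) -> List[Tuple[Point, float, int]]:
    """Hypothetical consensus rule for multi-view nodule proposals.

    proposals maps each view angle to (center, score) pairs expressed in
    that view's rotated frame. Centers are rotated back to world
    coordinates and greedily grouped around the highest-scoring
    unassigned proposal. A group is kept only if it contains proposals
    from at least min_views distinct views. Returns a list of
    (mean_center, mean_score, n_views).
    """
    # Undo each view's rotation so all centers share the world frame.
    pool = [
        (rotate_z([center], -angle)[0], score, angle)
        for angle, items in proposals.items()
        for center, score in items
    ]
    pool.sort(key=lambda item: -item[1])  # highest score first (stable sort)
    used = [False] * len(pool)
    kept = []
    for i, (seed, _, _) in enumerate(pool):
        if used[i]:
            continue
        # All still-unassigned proposals near the seed (includes the seed itself).
        members = [j for j in range(len(pool))
                   if not used[j] and math.dist(seed, pool[j][0]) <= radius]
        for j in members:
            used[j] = True
        views = {pool[j][2] for j in members}  # count distinct views, not proposals
        if len(views) >= min_views:
            n = len(members)
            center = tuple(sum(pool[j][0][k] for j in members) / n for k in range(3))
            score = sum(pool[j][1] for j in members) / n
            kept.append((center, score, len(views)))
    return kept


# Made-up proposals: one nodule near (10, 0, 0) in world coordinates is seen
# in three views; a spurious detection at (50, 50, 0) appears in one view only.
views = make_views([(1.0, 0.0, 0.0)], [0, 90, 180])
props = {
    0: [((10.0, 0.0, 0.0), 0.9), ((50.0, 50.0, 0.0), 0.8)],
    90: [((0.0, 10.0, 0.0), 0.7)],
    180: [((-10.2, 0.0, 0.0), 0.8)],
}
result = aggregate_proposals(props, radius=1.0, min_views=2)
print(result)
```

Mapping every proposal back to world coordinates before grouping is what makes agreement across views measurable. Counting distinct views rather than raw proposals keeps a single view with duplicate detections from passing the `min_views` threshold on its own.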
Journal Name:
The Machine Learning in Medical Imaging (MLMI) Workshop in conjunction with MICCAI
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we present a model to obtain prior knowledge for organ localization in CT thorax images using three-dimensional convolutional neural networks (3D CNNs). Specifically, we use the knowledge obtained from CNNs in a Bayesian detector to establish the presence and location of a given target organ defined within a spherical coordinate system. We train a CNN to perform a soft detection of the target organ potentially present at any point x = [r, Θ, Φ]^T. This probability outcome is used as a prior in a Bayesian model whose posterior probability provides a more accurate solution to the target organ detection problem. The likelihoods for the Bayesian model are obtained by performing a spatial analysis of the organs in annotated training volumes. Thoracic CT images from the NSCLC-Radiomics dataset are used in our case study, which demonstrates the enhancement in robustness and accuracy of organ identification. After the CNN stage, the average detector accuracies for the right lung, left lung, and heart were 94.87%, 95.37%, and 90.76%, respectively. Introducing spatial relationships through a Bayes classifier improved the detector accuracies to 95.14%, 96.20%, and 95.15%, respectively, showing a marked improvement in heart detection. This workflow improves the detection rate because the decision uses both lower-level features (edges, contours, etc.) and complex higher-level features (spatial relationships between organs). This strategy also presents a new application of CNNs and a novel methodology for introducing higher-level context features, such as spatial relationships between objects present at different locations in images, into real-world object detection problems.
  2. 3D object recognition accuracy can be improved by learning multi-scale spatial features from 3D spatial geometric representations of objects such as point clouds, 3D models, surfaces, and RGB-D data. Current deep learning approaches learn such features either from structured data representations (voxel grids and octrees) or from unstructured representations (graphs and point clouds). Learning features from structured representations is limited by restrictions on resolution and tree depth, while unstructured representations create a challenge due to non-uniformity among data samples. In this paper, we propose an end-to-end multi-level learning approach on a multi-level voxel grid to overcome these drawbacks. To demonstrate the utility of the proposed multi-level learning, we use a multi-level voxel representation of 3D objects to perform object recognition. The multi-level voxel representation consists of a coarse voxel grid that contains volumetric information of the 3D object. In addition, each voxel in the coarse grid that contains a portion of the object boundary is subdivided into multiple fine-level voxel grids. The performance of our multi-level learning algorithm for object recognition is comparable to that of dense voxel representations while using significantly less memory.
  3. Unlike images, which are represented in regular dense grids, 3D point clouds are irregular and unordered, so applying convolution to them can be difficult. In this paper, we extend the dynamic filter to a new convolution operation, named PointConv. PointConv can be applied to point clouds to build deep convolutional networks. We treat convolution kernels as nonlinear functions of the local coordinates of 3D points, composed of weight and density functions. With respect to a given point, the weight functions are learned with multi-layer perceptron networks and the density functions through kernel density estimation. A novel reformulation is proposed for efficiently computing the weight functions, which allowed us to dramatically scale up the network and significantly improve its performance. The learned convolution kernel can be used to compute translation-invariant and permutation-invariant convolution on any point set in 3D space. In addition, PointConv can be used as a deconvolution operator to propagate features from a subsampled point cloud back to its original resolution. Experiments on ModelNet40, ShapeNet, and ScanNet show that deep convolutional neural networks built on PointConv achieve state-of-the-art performance on challenging semantic segmentation benchmarks on 3D point clouds. Moreover, our experiments converting CIFAR-10 into a point cloud showed that networks built on PointConv can match the performance of convolutional networks on 2D images of a similar structure.
  4. Flat surfaces captured by 3D point clouds are often used for localization, mapping, and modeling. Dense point cloud processing has high computation and memory costs, making low-dimensional representations of flat surfaces such as polygons desirable. We present Polylidar3D, a non-convex polygon extraction algorithm that takes as input unorganized 3D point clouds (e.g., LiDAR data), organized point clouds (e.g., range images), or user-provided meshes. Non-convex polygons represent flat surfaces in an environment, with interior cutouts representing obstacles or holes. The Polylidar3D front-end transforms input data into a half-edge triangular mesh. This representation provides a common level of abstraction for subsequent back-end processing. The Polylidar3D back-end is composed of four core algorithms: mesh smoothing, dominant plane normal estimation, planar segment extraction, and finally polygon extraction. Polylidar3D is shown to be quite fast, making use of CPU multi-threading and GPU acceleration when available. We demonstrate Polylidar3D's versatility and speed with real-world datasets including aerial LiDAR point clouds for rooftop mapping, autonomous driving LiDAR point clouds for road surface detection, and RGBD cameras for indoor floor/wall detection. We also evaluate Polylidar3D on a challenging planar segmentation benchmark dataset. Results consistently show excellent speed and accuracy.
  5. The success of supervised learning requires large-scale ground truth labels, which are very expensive and time-consuming to obtain or may require special skills to annotate. To address this issue, many self-supervised and unsupervised methods have been developed. Unlike most existing self-supervised methods, which learn only 2D image features or only 3D point cloud features, this paper presents a novel and effective self-supervised learning approach that jointly learns both 2D image features and 3D point cloud features by exploiting cross-modality and cross-view correspondences without using any human-annotated labels. Specifically, 2D image features of rendered images from different views are extracted by a 2D convolutional neural network, and 3D point cloud features are extracted by a graph convolutional neural network. The two types of features are fed into a two-layer fully connected neural network to estimate the cross-modality correspondence. The three networks are jointly trained (i.e., cross-modality) by verifying whether two sampled data of different modalities belong to the same object; meanwhile, the 2D convolutional neural network is additionally optimized by minimizing intra-object distance while maximizing inter-object distance of rendered images in different views (i.e., cross-view). The effectiveness of the learned 2D and 3D features is evaluated by transferring them to five different tasks, including multi-view 2D shape recognition, 3D shape recognition, multi-view 2D shape retrieval, 3D shape retrieval, and 3D part segmentation. Extensive evaluations on all five tasks across different datasets demonstrate the strong generalization and effectiveness of the 2D and 3D features learned by the proposed self-supervised method.
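For the first related item, using the CNN output as a prior in a Bayesian detector can be illustrated with Bayes' rule over a discrete set of candidate locations. This is a hypothetical sketch. The location bins, the prior values (standing in for CNN soft-detection scores), and the likelihood values (standing in for the spatial analysis of annotated training volumes) are made up for illustration. The paper's actual model may normalize or combine these quantities differently.

```python
from typing import Dict, Tuple

# Hypothetical candidate location: a bin (r, theta, phi) in the spherical
# coordinate system in which the target organ is defined.
Location = Tuple[float, float, float]


def bayes_posterior(
    prior: Dict[Location, float], likelihood: Dict[Location, float]
) -> Dict[Location, float]:
    """Return p(x | evidence) = p(evidence | x) p(x) / sum_x' p(evidence | x') p(x')."""
    unnormalized = {x: p * likelihood.get(x, 0.0) for x, p in prior.items()}
    evidence = sum(unnormalized.values())
    if evidence == 0.0:
        raise ValueError("prior and likelihood assign no shared mass")
    return {x: v / evidence for x, v in unnormalized.items()}


def most_probable(posterior: Dict[Location, float]) -> Location:
    """Pick the location with the highest posterior probability."""
    return max(posterior, key=posterior.get)


# Made-up example: the CNN prior cannot separate two candidate bins,
# but the spatial likelihood strongly favors the first one.
cnn_prior = {(60.0, 1.2, 0.4): 0.5, (60.0, 1.2, 2.8): 0.5}
spatial_likelihood = {(60.0, 1.2, 0.4): 0.9, (60.0, 1.2, 2.8): 0.1}
posterior = bayes_posterior(cnn_prior, spatial_likelihood)
print(most_probable(posterior), posterior)
```

With an uninformative prior, the spatial likelihood alone decides between the two bins. With a confident prior, the likelihood acts as a correction rather than as the deciding factor.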