skip to main content

Title: 3D-UCaps: 3D Capsules Unet for Volumetric Image Segmentation
Medical image segmentation has been so far achieving promising results with Convolutional Neural Networks (CNNs). However, it is arguable that in traditional CNNs, its pooling layer tends to discard important information such as positions. Moreover, CNNs are sensitive to rotation and ane transformation. Capsule network is a data-ecient network design proposed to overcome such limitations by replacing pooling layers with dynamic routing and convolutional strides, which aims to preserve the part-whole relationships. Capsule network has shown a great performance in image recognition and natural language processing, but applications for medical image segmentation, particularly volumetric image segmentation, has been limited. In this work, we propose 3D-UCaps, a 3D voxel-based Capsule network for medical volumetric image segmentation. We build the concept of capsules into a CNN by designing a network with two pathways: the rst pathway is encoded by 3D Capsule blocks, whereas the second pathway is decoded by 3D CNNs blocks. 3D-UCaps, therefore inherits the merits from both Capsule network to preserve the spatial relationship and CNNs to learn visual representation. We conducted experiments on various datasets to demonstrate the robustness of 3D-UCaps including iSeg-2017, LUNA16, Hippocampus, and Cardiac, where our method outperforms previous Capsule networks and 3D-Unets.
Award ID(s):
Publication Date:
Journal Name:
Lecture notes in computer science
Sponsoring Org:
National Science Foundation
More Like this
  1. The newly discovered Coronavirus Disease 2019 (COVID-19) has been globally spreading and causing hundreds of thousands of deaths around the world as of its first emergence in late 2019. The rapid outbreak of this disease has overwhelmed health care infrastructures and arises the need to allocate medical equipment and resources more efficiently. The early diagnosis of this disease will lead to the rapid separation of COVID-19 and non-COVID cases, which will be helpful for health care authorities to optimize resource allocation plans and early prevention of the disease. In this regard, a growing number of studies are investigating the capability of deep learning for early diagnosis of COVID-19. Computed tomography (CT) scans have shown distinctive features and higher sensitivity compared to other diagnostic tests, in particular the current gold standard, i.e., the Reverse Transcription Polymerase Chain Reaction (RT-PCR) test. Current deep learning-based algorithms are mainly developed based on Convolutional Neural Networks (CNNs) to identify COVID-19 pneumonia cases. CNNs, however, require extensive data augmentation and large datasets to identify detailed spatial relations between image instances. Furthermore, existing algorithms utilizing CT scans, either extend slice-level predictions to patient-level ones using a simple thresholding mechanism or rely on a sophisticated infection segmentation tomore »identify the disease. In this paper, we propose a two-stage fully automated CT-based framework for identification of COVID-19 positive cases referred to as the “COVID-FACT”. COVID-FACT utilizes Capsule Networks, as its main building blocks and is, therefore, capable of capturing spatial information. In particular, to make the proposed COVID-FACT independent from sophisticated segmentations of the area of infection, slices demonstrating infection are detected at the first stage and the second stage is responsible for classifying patients into COVID and non-COVID cases. COVID-FACT detects slices with infection, and identifies positive COVID-19 cases using an in-house CT scan dataset, containing COVID-19, community acquired pneumonia, and normal cases. Based on our experiments, COVID-FACT achieves an accuracy of 90.82 % , a sensitivity of 94.55 % , a specificity of 86.04 % , and an Area Under the Curve (AUC) of 0.98, while depending on far less supervision and annotation, in comparison to its counterparts.« less
  2. Medical image analysis using deep learning has recently been prevalent, showing great performance for various downstream tasks including medical image segmentation and its sibling, volumetric image segmentation. Particularly, a typical volumetric segmentation network strongly relies on a voxel grid representation which treats volumetric data as a stack of individual voxel `slices', which allows learning to segment a voxel grid to be as straightforward as extending existing image-based segmentation networks to the 3D domain. However, using a voxel grid representation requires a large memory footprint, expensive test-time and limiting the scalability of the solutions. In this paper, we propose Point-Unet, a novel method that incorporates the eciency of deep learning with 3D point clouds into volumetric segmentation. Our key idea is to rst predict the regions of interest in the volume by learning an attentional probability map, which is then used for sampling the volume into a sparse point cloud that is subsequently segmented using a point-based neural network. We have conducted the experiments on the medical volumetric segmentation task with both a small-scale dataset Pancreas and large-scale datasets BraTS18, BraTS19, and BraTS20 challenges. A comprehensive benchmark on di erent metrics has shown that our context-aware Point-Unet robustly outperforms the SOTAmore »voxel-based networks at both accuracies, memory usage during training, and time consumption during testing.« less
  3. Capsule Networks (CapsNets) have demonstrated to be a promising alternative to Convolutional Neural Networks (CNNs). However, they often fall short of state-of-the-art accuracies on large-scale high-dimensional datasets. We propose a Detail-Oriented Capsule Network (DECAPS) that combines the strength of CapsNets with several novel techniques to boost its classification accuracies. First, DECAPS uses an Inverted Dynamic Routing (IDR) mechanism to group lowerlevel capsules into heads before sending them to higher-level capsules. This strategy enables capsules to selectively attend to small but informative details within the data which may be lost during pooling operations in CNNs. Second, DECAPS employs a Peekaboo training procedure, which encourages the network to focus on fine-grained information through a second-level attention scheme. Finally, the distillation process improves the robustness of DECAPS by averaging over the original and attended image region predictions. We provide extensive experiments on the CheXpert and RSNA Pneumonia datasets to validate the effectiveness of DECAPS. Our networks achieve state-of-the-art accuracies not only in classification (increasing the average area under ROC curves from 87.24% to 92.82% on the CheXpert dataset) but also in the weaklysupervised localization of diseased areas (increasing average precision from 41.7% to 80% for the RSNA Pneumonia detection dataset).
  4. We develop three efficient approaches for generating visual explanations from 3D convolutional neural networks (3D- CNNs) for Alzheimer’s disease classification. One approach conducts sensitivity analysis on hierarchical 3D image segmentation, and the other two visualize network activations on a spatial map. Visual checks and a quantitative localization benchmark indicate that all approaches identify important brain parts for Alzheimer’s disease diagnosis. Comparative analysis show that the sensitivity analysis based approach has difficulty handling loosely distributed cerebral cortex, and approaches based on visualization of activations are constrained by the resolution of the convo- lutional layer. The complementarity of these methods improves the understanding of 3D-CNNs in Alzheimer’s disease classification from different perspectives.
  5. Osteoarthritis (OA) is the most common form of arthritis and can often occur in the knee. While convolutional neural networks (CNNs) have been widely used to study medical images, the application of a 3-dimensional (3D) CNN in knee OA diagnosis is limited. This study utilizes a 3D CNN model to analyze sequences of knee magnetic resonance (MR) images to perform knee OA classification. An advantage of using 3D CNNs is the ability to analyze the whole sequence of 3D MR images as a single unit as opposed to a traditional 2D CNN, which examines one image at a time. Therefore, 3D features could be extracted from adjacent slices, which may not be detectable from a single 2D image. The input data for each knee were a sequence of double-echo steady-state (DESS) MR images, and each knee was labeled by the Kellgren and Lawrence (KL) grade of severity at levels 0–4. In addition to the 5-category KL grade classification, we further examined a 2-category classification that distinguishes non-OA (KL ≤ 1) from OA (KL ≥ 2) knees. Clinically, diagnosing a patient with knee OA is the ultimate goal of assigning a KL grade. On a dataset with 1100 knees, the 3Dmore »CNN model that classifies knees with and without OA achieved an accuracy of 86.5% on the validation set and 83.0% on the testing set. We further conducted a comparative study between MRI and X-ray. Compared with a CNN model using X-ray images trained from the same group of patients, the proposed 3D model with MR images achieved higher accuracy in both the 5-category classification (54.0% vs. 50.0%) and the 2-category classification (83.0% vs. 77.0%). The result indicates that MRI, with the application of a 3D CNN model, has greater potential to improve diagnosis accuracy for knee OA clinically than the currently used X-ray methods.« less