

Title: Enhanced U-Net Tool Segmentation using Hybrid Coordinate Representations of Endoscopic Images
This paper presents an approach to enhanced endoscopic tool segmentation that combines separate pathways operating on input images in two different coordinate representations. The proposed method examines U-Net convolutional neural networks with input endoscopic images represented in (1) the original rectangular coordinate format and (2) a morphological polar coordinate transformation. To maximize information and the breadth of the endoscope frustum, imaging sensors are often larger than the image circle, which leaves unused border regions; ideally, the region of interest lies near the image center. These two observations motivated the morphological polar transformation pathway as an augmentation to the typical rectangular input representation. Results indicate that neither of the two investigated coordinate representations consistently yielded better segmentation performance than the other. Improved segmentation can instead be achieved with a hybrid approach that selects, per input image, which of the two pathways to use. Toward that end, two binary classifiers were trained to identify, given an input endoscopic image, which coordinate-representation segmentation pathway (rectangular or polar) would yield better segmentation performance. The proposed hybrid method was evaluated on a dataset of 8360 endoscopic images from real surgery, with segmentation performance measured by Dice coefficient and Intersection over Union. Results are promising and suggest marked improvements from the hybrid pathway-selection approach over either pathway alone, indicating that on-the-fly polar transformation for tool segmentation is useful when paired with the proposed hybrid approach.
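Below is a minimal sketch (not the authors' code) of the hybrid pipeline described in the abstract, assuming OpenCV's warpPolar for the on-the-fly polar transform; the classifier and network handles (pathway_clf, unet_rect, unet_polar) are hypothetical placeholders for the trained models.

```python
import cv2

def to_polar(img, out_size=(256, 256)):
    """Morphological polar transform about the image center."""
    h, w = img.shape[:2]
    center = (w / 2.0, h / 2.0)
    max_radius = min(w, h) / 2.0  # stay inside the image circle
    return cv2.warpPolar(img, out_size, center, max_radius,
                         cv2.INTER_LINEAR | cv2.WARP_POLAR_LINEAR)

def from_polar(polar_img, orig_shape):
    """Invert the polar transform back to rectangular coordinates."""
    h, w = orig_shape[:2]
    center = (w / 2.0, h / 2.0)
    max_radius = min(w, h) / 2.0
    return cv2.warpPolar(polar_img, (w, h), center, max_radius,
                         cv2.INTER_LINEAR | cv2.WARP_POLAR_LINEAR
                         | cv2.WARP_INVERSE_MAP)

def hybrid_segment(img):
    # The binary classifier picks, per image, which pathway to trust.
    if pathway_clf(img) == "rect":           # hypothetical trained classifier
        return unet_rect(img)                # hypothetical rectangular U-Net
    polar_mask = unet_polar(to_polar(img))   # hypothetical polar U-Net
    return from_polar(polar_mask, img.shape)
```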
Award ID(s): 2101107
PAR ID: 10326946
Author(s) / Creator(s): ; ; ;
Date Published:
Journal Name: 2021 International Symposium on Medical Robotics (ISMR)
Page Range / eLocation ID: 1 to 7
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. This paper presents a tool-pose-informed variable-center morphological polar transform to enhance segmentation of endoscopic images. The representation, while not lossless, transforms rigid tool shapes into consistently more rectangular morphologies that may be more amenable to image segmentation networks. The proposed method was evaluated using the U-Net convolutional neural network, with input endoscopic images represented in one of four coordinate formats: (1) the original rectangular image representation, (2) the morphological polar coordinate transform, (3) the proposed variable-center transform about the tool-tip pixel, and (4) the proposed variable-center transform about the tool vanishing-point pixel. Previous work relied on the observations that endoscopic images typically exhibit unused border regions with content in the shape of a circle (since the image sensor is designed to be larger than the image circle to maximize available visual information in the constrained environment) and that the region of interest (ROI) is most often near the endoscopic image center. That work sought an intelligent method for, given an input image, carefully selecting between methods (1) and (2) for the best image segmentation prediction. In this extension, the image-center reference constraint for the polar transformation in method (2) is relaxed via the development of a variable-center morphological transformation. Transform-center selection leads to different spatial distributions of image loss, and the transform-center location can be informed by the robot kinematic model and endoscopic image data; in particular, this work examines the tool tip and tool vanishing point on the image plane as candidate centers. The experiments were conducted for each of the four image representations using a dataset of 8360 endoscopic images from real sinus surgery. Segmentation performance was evaluated with standard metrics, and some insight into the effects of loss and tool location on performance is provided. Overall, the results are promising, showing that selecting a transform center based on tool shape features using the proposed method can improve segmentation performance.
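A minimal sketch of a variable-center polar transform, under the same OpenCV assumption as the earlier sketch; center_px stands in for the tool-tip or vanishing-point pixel that the robot kinematic model and image data would supply, and is a hypothetical input rather than the authors' interface.

```python
import cv2

def variable_center_polar(img, center_px, out_size=(256, 256)):
    """Polar transform about an arbitrary pixel (e.g., the tool tip).

    Off-center transforms redistribute the spatial loss of the (non-lossless)
    resampling, trading background fidelity for a more rectangular tool shape.
    """
    h, w = img.shape[:2]
    cx, cy = center_px
    # The radius must reach the farthest image corner from the chosen center.
    max_radius = max(((cx - x) ** 2 + (cy - y) ** 2) ** 0.5
                     for x in (0, w) for y in (0, h))
    return cv2.warpPolar(img, out_size, (float(cx), float(cy)), max_radius,
                         cv2.INTER_LINEAR | cv2.WARP_POLAR_LINEAR)
```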
  2. This paper proposes a modified method for training tool-segmentation networks for endoscopic images by parsing the training images into two disjoint sets: one for rectangular representations of endoscopic images and one for polar. Previous work [1], [2] demonstrated that certain endoscopic images are better segmented by a U-Net trained on the original rectangular representation alone, while others are better served by polar representations. This work extends that observation to the training images and seeks to intelligently decompose the aggregate training data into disjoint image sets, one ideal for training a network to segment original, rectangular endoscopic images and the other for training a polar segmentation network. The training-set decomposition consists of three stages: (1) initial data split and models, (2) image reallocation and transition mechanisms with retraining, and (3) evaluation. In stage (2), two separate frameworks for parsing polar vs. rectangular training images were investigated, with three switching metrics utilized in both. Experiments comparatively evaluated the segmentation performance (via the Sørensen-Dice coefficient) of the in-group and out-of-group images between the set-decomposed models. Results are encouraging, showing improved aggregate in-group Dice scores as well as image sets trending toward convergence.
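One plausible form of the stage-(2) reallocation step is sketched below; the Dice-margin switching rule and all names are assumptions for illustration, not the paper's exact metrics, and to_polar reuses the helper sketched earlier.

```python
def reallocate(pairs, model_rect, model_polar, dice, margin=0.0):
    """Reassign each (image, mask) pair to the pathway that segments it better."""
    rect_set, polar_set = [], []
    for img, gt in pairs:
        d_rect = dice(model_rect(img), gt)
        d_polar = dice(model_polar(to_polar(img)), to_polar(gt))
        # Move an image only when the polar pathway wins by at least `margin`.
        if d_polar > d_rect + margin:
            polar_set.append((img, gt))
        else:
            rect_set.append((img, gt))
    return rect_set, polar_set  # retrain each model on its own set, then repeat
```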
  3. Accurate semantic image segmentation from medical imaging can enable intelligent vision-based assistance in robot-assisted minimally invasive surgery. The human body and surgical procedures are highly dynamic, and while machine vision presents a promising approach, sufficiently large training image sets for robust performance are either costly or unavailable. This work examines three novel generative adversarial network (GAN) methods for producing usable synthetic tool images using only surgical background images and a few real tool images. The best of these three approaches generates realistic tool textures while preserving local background content by incorporating both a style-preservation and a content-loss component into the proposed multi-level loss function. The approach is quantitatively evaluated, and the results suggest that the synthetically generated training tool images enhance UNet tool segmentation performance. More specifically, with a random set of 100 cadaver and live endoscopic images from the University of Washington Sinus Dataset, the UNet trained with synthetically generated images using the presented method yielded 35.7% and 30.6% improvements over using purely real images in mean Dice coefficient and Intersection over Union scores, respectively. This study is promising for the use of more widely available, routine screening endoscopy to preoperatively generate synthetic training tool images for intraoperative UNet tool segmentation.
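A hedged sketch of a multi-level generator loss in the spirit described above; Gram-matrix style preservation and feature-space content loss are standard choices standing in for the paper's exact formulation, and the weights are illustrative.

```python
import torch
import torch.nn.functional as F

def gram(feat):
    """Gram matrix of a feature map: (B, C, H, W) -> (B, C, C)."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def generator_loss(d_fake, feats_fake, feats_tool, feats_bg,
                   w_adv=1.0, w_style=10.0, w_content=1.0):
    # Adversarial term: fool the discriminator on synthetic tool images.
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    # Style preservation: match real-tool texture statistics at each level.
    style = sum(F.mse_loss(gram(ff), gram(ft))
                for ff, ft in zip(feats_fake, feats_tool))
    # Content term: keep local background structure outside the tool region.
    content = sum(F.mse_loss(ff, fb) for ff, fb in zip(feats_fake, feats_bg))
    return w_adv * adv + w_style * style + w_content * content
```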
  4. Segmentation of echocardiograms plays an essential role in the quantitative analysis of the heart and helps diagnose cardiac diseases. In the past decade, deep learning-based approaches have significantly improved the performance of echocardiogram segmentation. Most deep learning-based methods assume that the image to be processed is rectangular in shape. However, echocardiogram images are typically formed within a sector of a circle, leaving a significant region of the overall rectangular image with no data, a consequence of the ultrasound imaging methodology. This large non-imaging region can influence the training of deep neural networks. In this paper, we propose to use a polar transformation to help train deep learning algorithms. Using the r-θ transformation, a significant portion of the non-imaging background is removed, allowing the neural network to focus on the heart image. The segmentation model is trained on both x-y and r-θ images. During inference, the predictions from the x-y and r-θ images are combined using max-voting. We verify the efficacy of our method on the CAMUS dataset with a variety of segmentation networks, encoder networks, and loss functions. The experimental results demonstrate the effectiveness and versatility of the proposed method in improving segmentation results.
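A minimal sketch of the max-voting fusion described above, assuming per-class probability maps and reusing the to_polar/from_polar helpers sketched earlier; the model handles are hypothetical.

```python
import numpy as np

def fused_prediction(img, model_xy, model_rt):
    p_xy = model_xy(img)                           # (H, W, K) probabilities
    polar_probs = model_rt(to_polar(img))          # r-theta prediction
    # Map each class channel back to x-y coordinates before voting.
    p_rt = np.stack([from_polar(polar_probs[..., k], img.shape)
                     for k in range(polar_probs.shape[-1])], axis=-1)
    return np.maximum(p_xy, p_rt).argmax(axis=-1)  # per-pixel max-vote
```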
  5. Medical image segmentation has so far achieved promising results with Convolutional Neural Networks (CNNs). However, it is arguable that in traditional CNNs the pooling layers tend to discard important information such as position. Moreover, CNNs are sensitive to rotation and affine transformation. Capsule networks are a data-efficient network design proposed to overcome these limitations by replacing pooling layers with dynamic routing and convolutional strides, which aims to preserve part-whole relationships. Capsule networks have shown great performance in image recognition and natural language processing, but applications to medical image segmentation, particularly volumetric image segmentation, have been limited. In this work, we propose 3D-UCaps, a 3D voxel-based capsule network for medical volumetric image segmentation. We build the concept of capsules into a CNN by designing a network with two pathways: the first pathway is encoded by 3D capsule blocks, whereas the second pathway is decoded by 3D CNN blocks. 3D-UCaps therefore inherits the merits of both: the capsule network preserves spatial relationships, and the CNN learns visual representations. We conducted experiments on various datasets, including iSeg-2017, LUNA16, Hippocampus, and Cardiac, to demonstrate the robustness of 3D-UCaps, where our method outperforms previous capsule networks and 3D-UNets.
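For reference, a minimal sketch of the routing-by-agreement step that capsule layers use in place of pooling (after Sabour et al.'s dynamic routing); the tensor shapes and iteration count are illustrative and not taken from the 3D-UCaps paper.

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    """Nonlinearity that scales capsule vector length into [0, 1)."""
    n2 = (s ** 2).sum(dim, keepdim=True)
    return (n2 / (1.0 + n2)) * s / (n2.sqrt() + eps)

def dynamic_routing(u_hat, iters=3):
    """u_hat: prediction vectors of shape (B, in_caps, out_caps, dim)."""
    b = torch.zeros(u_hat.shape[:-1], device=u_hat.device)    # routing logits
    for _ in range(iters):
        c = F.softmax(b, dim=2)                   # couplings over output caps
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)  # weighted sum -> (B, out, dim)
        v = squash(s)                             # output capsule vectors
        b = b + (u_hat * v.unsqueeze(1)).sum(-1)  # agreement update
    return v
```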