- Award ID(s):
- 2101107
- PAR ID:
- 10552164
- Publisher / Repository:
- 2024 IEEE International Conference on Advanced Intelligent Mechatronics (AIM)
- Date Published:
- ISSN:
- 2159-6255
- ISBN:
- 979-8-3503-5536-9
- Page Range / eLocation ID:
- 393 to 398
- Format(s):
- Medium: X
- Location:
- Boston, MA, USA
- Sponsoring Org:
- National Science Foundation
More Like this
-
This paper presents an approach to enhanced endoscopic tool segmentation that combines separate pathways using input images in two different coordinate representations. The proposed method examines U-Net convolutional neural networks with input endoscopic images represented via (1) the original rectangular coordinate format and (2) a morphological polar coordinate transformation. To maximize the information captured across the breadth of the endoscope frustum, imaging sensors are often larger than the image circle, which leaves unused border regions, and the region of interest is ideally near the image center. These two observations motivated the morphological polar transformation pathway as an augmentation to the typical rectangular input representation. Results indicate that neither of the two investigated coordinate representations consistently yielded better segmentation performance than the other. Improved segmentation can be achieved with a hybrid approach that carefully selects which of the two pathways to use for each input image. Toward that end, two binary classifiers were trained to identify, given an input endoscopic image, which of the two coordinate-representation segmentation pathways (rectangular or polar) would yield better segmentation performance. Results are promising and suggest marked improvements using the hybrid pathway-selection approach compared with either pathway alone. The proposed hybrid method was evaluated on a dataset of 8360 endoscopic images from real surgery, with segmentation performance measured by Dice coefficient and Intersection over Union. The results suggest that on-the-fly polar transformation for tool segmentation is useful when paired with the proposed hybrid tool-segmentation approach.
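As a rough illustration of the on-the-fly polar pathway and classifier-based routing described above, the sketch below uses OpenCV's `warpPolar` to convert a frame to a polar representation about the image center and to map the predicted mask back. The helpers `segment_rect`, `segment_polar`, and `pathway_classifier` are hypothetical placeholders standing in for the two trained U-Nets and the binary pathway classifier; this is a sketch under those assumptions, not the authors' implementation.

```python
import cv2
import numpy as np

def to_polar(img: np.ndarray) -> np.ndarray:
    """Morphological polar transform about the image center (illustrative)."""
    h, w = img.shape[:2]
    center = (w / 2.0, h / 2.0)
    max_radius = float(np.hypot(w / 2.0, h / 2.0))
    # cv2.warpPolar maps (x, y) to (rho, phi); output size kept equal to input.
    return cv2.warpPolar(img, (w, h), center, max_radius, cv2.WARP_POLAR_LINEAR)

def hybrid_segment(img, pathway_classifier, segment_rect, segment_polar):
    """Route each frame to the rectangular or polar segmentation pathway.

    `pathway_classifier` is a hypothetical binary classifier returning 1 when
    the polar pathway is predicted to segment this frame better.
    """
    if pathway_classifier(img) == 1:
        mask_polar = segment_polar(to_polar(img))
        # Map the predicted mask back to rectangular coordinates.
        h, w = img.shape[:2]
        center = (w / 2.0, h / 2.0)
        max_radius = float(np.hypot(w / 2.0, h / 2.0))
        return cv2.warpPolar(mask_polar, (w, h), center, max_radius,
                             cv2.WARP_POLAR_LINEAR | cv2.WARP_INVERSE_MAP)
    return segment_rect(img)
```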
-
This paper presents a tool-pose-informed variable-center morphological polar transform to enhance segmentation of endoscopic images. The representation, while not lossless, transforms rigid tool shapes into morphologies that are consistently more rectangular and may be more amenable to image segmentation networks. The proposed method was evaluated using the U-Net convolutional neural network, with the input endoscopic images represented in one of four coordinate formats: (1) the original rectangular image representation, (2) the morphological polar coordinate transform, (3) the proposed variable-center transform about the tool-tip pixel, and (4) the proposed variable-center transform about the tool vanishing-point pixel. Previous work relied on the observations that endoscopic images typically exhibit unused border regions with content in the shape of a circle (since the image sensor is designed to be larger than the image circle to maximize available visual information in the constrained environment) and that the region of interest (ROI) is ideally near the endoscopic image center. That work sought an intelligent method for carefully selecting, given an input image, between methods (1) and (2) for the best segmentation prediction. In this extension, the image-center reference constraint on the polar transformation in method (2) is relaxed through the development of a variable-center morphological transformation. The choice of transform center leads to different spatial distributions of image loss, and the transform-center location can be informed by the robot kinematic model and endoscopic image data. In particular, this work examines the tool tip and the tool vanishing point on the image plane as candidate centers. Experiments were conducted for each of the four image representations using a dataset of 8360 endoscopic images from real sinus surgery. Segmentation performance was evaluated with standard metrics, and some insight into the effects of loss and tool location on performance is provided. Overall, the results are promising, showing that selecting a transform center based on tool shape features using the proposed method can improve segmentation performance.
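A minimal sketch of the variable-center idea, assuming the tool-tip (or vanishing-point) pixel is already available from the robot kinematic model projected onto the image plane or from image data; the only change relative to the fixed-center polar pathway is the `center` argument of the warp. Names and the radius choice are illustrative assumptions, not the paper's exact formulation.

```python
import cv2
import numpy as np

def variable_center_polar(img: np.ndarray, center_xy: tuple) -> np.ndarray:
    """Polar transform about an arbitrary center, e.g. the tool-tip pixel.

    `center_xy` is assumed to come from the kinematic model (tool tip projected
    onto the image plane) or from the estimated tool vanishing point.
    """
    h, w = img.shape[:2]
    cx, cy = center_xy
    # Use the farthest image corner so the whole frame stays inside the warp.
    corners = np.array([[0, 0], [w, 0], [0, h], [w, h]], dtype=np.float32)
    max_radius = float(np.max(np.hypot(corners[:, 0] - cx, corners[:, 1] - cy)))
    return cv2.warpPolar(img, (w, h), (cx, cy), max_radius, cv2.WARP_POLAR_LINEAR)
```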
-
Accurate semantic image segmentation from medical imaging can enable intelligent vision-based assistance in robot-assisted minimally invasive surgery. The human body and surgical procedures are highly dynamic. While machine vision presents a promising approach, sufficiently large training image sets for robust performance are either costly or unavailable. This work examines three novel generative adversarial network (GAN) methods for producing usable synthetic tool images using only surgical background images and a few real tool images. The best of these three approaches generates realistic tool textures while preserving local background content by incorporating both a style-preservation and a content-loss component into the proposed multi-level loss function. The approach is quantitatively evaluated, and the results suggest that the synthetically generated training tool images enhance UNet tool segmentation performance. More specifically, with a random set of 100 cadaver and live endoscopic images from the University of Washington Sinus Dataset, the UNet trained with synthetically generated images using the presented method achieved 35.7% and 30.6% improvements over using purely real images in mean Dice coefficient and Intersection over Union scores, respectively. These results are promising for using more widely available, routine screening endoscopy to preoperatively generate synthetic training tool images for intraoperative UNet tool segmentation.
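The style-preservation and content-loss components mentioned above can be illustrated with a standard Gram-matrix style term and a feature-space content term computed on features from a fixed encoder (e.g., VGG). The layer choices, loss weights, and function names below are placeholder assumptions for a sketch, not the paper's actual multi-level loss.

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Channel-wise Gram matrix used for the style term: (B, C, H, W) -> (B, C, C)."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def generator_loss(d_fake, feats_fake, feats_real_tool, feats_background,
                   w_adv=1.0, w_style=10.0, w_content=1.0):
    """Illustrative combined loss: adversarial + style preservation + content.

    `feats_*` are lists of feature maps from a fixed encoder for the generated
    image, a real tool image (style target), and the background image (content
    target). The weights are arbitrary placeholders.
    """
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    style = sum(F.mse_loss(gram_matrix(ff), gram_matrix(fr))
                for ff, fr in zip(feats_fake, feats_real_tool))
    content = sum(F.mse_loss(ff, fb) for ff, fb in zip(feats_fake, feats_background))
    return w_adv * adv + w_style * style + w_content * content
```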
-
Electron microscopy images of carbon nanotube (CNT) forests are difficult to segment due to the long and thin nature of the CNTs; the density of the CNT forests, which results in CNTs touching, crossing, and occluding each other; and the low signal-to-noise ratio of electron microscopy imagery. In addition, due to image complexity, it is not feasible to prepare training segmentation masks. In this paper, we propose CNTSegNet, a dual-loss, orientation-guided, self-supervised deep learning network for CNT forest segmentation in scanning electron microscopy (SEM) images. Our training labels consist of weak segmentation labels produced by intensity thresholding of the raw SEM images and self labels produced by estimating the orientation distribution of CNTs in these raw images. The proposed network extends a U-Net-like encoder-decoder architecture with a novel two-component loss function. The first component is a Dice loss computed between the predicted segmentation maps and the weak segmentation labels. The second component is a mean squared error (MSE) loss measuring the difference between the orientation histogram of the predicted segmentation map and that of the original raw image. A weighted sum of these two loss functions is used to train the proposed CNTSegNet network. The Dice loss forces the network to perform background-foreground segmentation using local intensity features, while the MSE loss guides the network with global orientation features and leads to refined segmentation results. The proposed system needs only a few-shot dataset for training and, thanks to its self-supervised nature, can easily be adapted to new datasets.
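A simplified sketch of the two-component loss described above: a soft Dice term against the weak thresholded labels plus an MSE term between orientation histograms. The soft gradient-orientation histogram below is one plausible differentiable stand-in for the paper's orientation-distribution estimate, and the weight `alpha`, bin count, and sharpness are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred, weak_label, eps=1e-6):
    """Soft Dice loss between predicted mask and weak intensity-threshold label."""
    inter = (pred * weak_label).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + weak_label.sum(dim=(1, 2, 3))
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()

def orientation_histogram(img, bins=36, sharpness=10.0):
    """Soft (differentiable) gradient-orientation histogram, magnitude-weighted."""
    gx = img[..., :, 1:] - img[..., :, :-1]
    gy = img[..., 1:, :] - img[..., :-1, :]
    gx, gy = gx[..., :-1, :], gy[..., :, :-1]            # align shapes
    mag = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8).reshape(-1)
    ang = torch.atan2(gy, gx).reshape(-1)                # angles in [-pi, pi]
    centers = torch.linspace(-3.1416, 3.1416, bins, device=img.device)
    # Soft-assign each pixel's angle to bins, weighted by gradient magnitude.
    weights = torch.softmax(-sharpness * (ang[:, None] - centers[None, :]) ** 2, dim=1)
    hist = (weights * mag[:, None]).sum(dim=0)
    return hist / (hist.sum() + 1e-8)

def cntseg_style_loss(pred_mask, weak_label, raw_img, alpha=0.5):
    """Weighted sum of Dice loss and orientation-histogram MSE (illustrative)."""
    d = dice_loss(pred_mask, weak_label)
    mse = F.mse_loss(orientation_histogram(pred_mask), orientation_histogram(raw_img))
    return alpha * d + (1.0 - alpha) * mse
```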
-
Training a semantic segmentation model requires large, densely annotated image datasets that are costly to obtain. Once training is done, it is also difficult to add new object categories to such segmentation models. In this paper, we tackle the few-shot semantic segmentation problem, which aims to perform image segmentation on unseen object categories based on only one or a few support example(s). The key to solving this few-shot segmentation problem lies in effectively utilizing object information from support examples to separate target objects from the background in a query image. While existing methods typically generate object-level representations by averaging local features in support images, we demonstrate that such object representations are typically noisy and less discriminative. To solve this problem, we design an object representation generator (ORG) module that effectively aggregates local object features from support image(s) and produces a better object-level representation. The ORG module can be embedded into the network and trained end-to-end in a weakly supervised fashion without extra human annotation. We incorporate this design into a modified encoder-decoder network to present a powerful and efficient framework for few-shot semantic segmentation. Experimental results on the Pascal-VOC and MS-COCO datasets show that our approach achieves better performance than existing methods under both one-shot and five-shot settings.
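For context, the masked average pooling that the abstract identifies as the common (and noisier) baseline can be written in a few lines; the ORG module itself is not specified in the abstract, so only the baseline it improves on is sketched here, with hypothetical function names and a simple cosine-similarity scoring step.

```python
import torch
import torch.nn.functional as F

def masked_average_pooling(support_feat, support_mask):
    """Baseline object prototype: average support features inside the object mask.

    support_feat: (B, C, H, W) features from the support-image encoder.
    support_mask: (B, 1, H, W) binary mask of the target object at feature resolution.
    """
    masked = support_feat * support_mask
    prototype = masked.sum(dim=(2, 3)) / (support_mask.sum(dim=(2, 3)) + 1e-6)
    return prototype  # (B, C) object-level representation

def similarity_map(query_feat, prototype):
    """Score each query location against the prototype to separate object from background."""
    proto = prototype[:, :, None, None]                      # (B, C, 1, 1)
    return F.cosine_similarity(query_feat, proto, dim=1)     # (B, H, W)
```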