Accurate semantic image segmentation from medical imaging can enable intelligent vision-based assistance in robot-assisted minimally invasive surgery. The human body and surgical procedures are highly dynamic. While machine-vision presents a promising approach, sufficiently large training image sets for robust performance are either costly or unavailable. This work examines three novel generative adversarial network (GAN) methods of providing usable synthetic tool images using only surgical background images and a few real tool images. The best of these three novel approaches generates realistic tool textures while preserving local background content by incorporating both a style preservation and a content loss component into the proposed multi-level loss function. The approach is quantitatively evaluated, and results suggest that the synthetically generated training tool images enhance UNet tool segmentation performance. More specifically, with a random set of 100 cadaver and live endoscopic images from the University of Washington Sinus Dataset, the UNet trained with synthetically generated images using the presented method resulted in 35.7% and 30.6% improvement over using purely real images in mean Dice coefficient and Intersection over Union scores, respectively. This study is promising towards the use of more widely available and routine screening endoscopy to preoperatively generate synthetic training tool images for intraoperative UNet tool segmentation.
more »
« less
Text-Guided Synthesis in Medical Multimedia Retrieval: A Framework for Enhanced Colonoscopy Image Classification and Segmentation
The lack of extensive, varied, and thoroughly annotated datasets impedes the advancement of artificial intelligence (AI) for medical applications, especially colorectal cancer detection. Models trained with limited diversity often display biases, especially when utilized on disadvantaged groups. Generative models (e.g., DALL-E 2, Vector-Quantized Generative Adversarial Network (VQ-GAN)) have been used to generate images but not colonoscopy data for intelligent data augmentation. This study developed an effective method for producing synthetic colonoscopy image data, which can be used to train advanced medical diagnostic models for robust colorectal cancer detection and treatment. Text-to-image synthesis was performed using fine-tuned Visual Large Language Models (LLMs). Stable Diffusion and DreamBooth Low-Rank Adaptation produce images that look authentic, with an average Inception score of 2.36 across three datasets. The validation accuracy of various classification models Big Transfer (BiT), Fixed Resolution Residual Next Generation Network (FixResNeXt), and Efficient Neural Network (EfficientNet) were 92%, 91%, and 86%, respectively. Vision Transformer (ViT) and Data-Efficient Image Transformers (DeiT) had an accuracy rate of 93%. Secondly, for the segmentation of polyps, the ground truth masks are generated using Segment Anything Model (SAM). Then, five segmentation models (U-Net, Pyramid Scene Parsing Network (PSNet), Feature Pyramid Network (FPN), Link Network (LinkNet), and Multi-scale Attention Network (MANet)) were adopted. FPN produced excellent results, with an Intersection Over Union (IoU) of 0.64, an F1 score of 0.78, a recall of 0.75, and a Dice coefficient of 0.77. This demonstrates strong performance in terms of both segmentation accuracy and overlap metrics, with particularly robust results in balanced detection capability as shown by the high F1 score and Dice coefficient. This highlights how AI-generated medical images can improve colonoscopy analysis, which is critical for early colorectal cancer detection.
more »
« less
- Award ID(s):
- 2131207
- PAR ID:
- 10671993
- Publisher / Repository:
- MDPI, Algorithms
- Date Published:
- Journal Name:
- Algorithms
- Volume:
- 18
- Issue:
- 3
- ISSN:
- 1999-4893
- Page Range / eLocation ID:
- 155
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Among the non-invasive Colorectal cancer (CRC) screening approaches, Computed Tomography Colonography (CTC) and Virtual Colonoscopy (VC), are much more accurate. This work proposes an AI-based polyp detection framework for virtual colonoscopy (VC). Two main steps are addressed in this work: automatic segmentation to isolate the colon region from its background, and automatic polyp detection. Moreover, we evaluate the performance of the proposed framework on low-dose Computed Tomography (CT) scans. We build on our visualization approach, Fly-In (FI), which provides “filet”-like projections of the internal surface of the colon. The performance of the Fly-In approach confirms its ability with helping gastroenterologists, and it holds a great promise for combating CRC. In this work, these 2D projections of FI are fused with the 3D colon representation to generate new synthetic images. The synthetic images are used to train a RetinaNet model to detect polyps. The trained model has a 94% f1-score and 97% sensitivity. Furthermore, we study the effect of dose variation in CT scans on the performance of the the FI approach in polyp visualization. A simulation platform is developed for CTC visualization using FI, for regular CTC and low-dose CTC. This is accomplished using a novel AI restoration algorithm that enhances the Low-Dose CT images so that a 3D colon can be successfully reconstructed and visualized using the FI approach. Three senior board-certified radiologists evaluated the framework for the peak voltages of 30 KV, and the average relative sensitivities of the platform were 92%, whereas the 60 KV peak voltage produced average relative sensitivities of 99.5%.more » « less
-
Traditionally, a high-performance microscope with a large numerical aperture is required to acquire high-resolution images. However, the images’ size is typically tremendous. Therefore, they are not conveniently managed and transferred across a computer network or stored in a limited computer storage system. As a result, image compression is commonly used to reduce image size resulting in poor image resolution. Here, we demonstrate custom convolution neural networks (CNNs) for both super-resolution image enhancement from low-resolution images and characterization of both cells and nuclei from hematoxylin and eosin (H&E) stained breast cancer histopathological images by using a combination of generator and discriminator networks so-called super-resolution generative adversarial network-based on aggregated residual transformation (SRGAN-ResNeXt) to facilitate cancer diagnosis in low resource settings. The results provide high enhancement in image quality where the peak signal-to-noise ratio and structural similarity of our network results are over 30 dB and 0.93, respectively. The derived performance is superior to the results obtained from both the bicubic interpolation and the well-known SRGAN deep-learning methods. In addition, another custom CNN is used to perform image segmentation from the generated high-resolution breast cancer images derived with our model with an average Intersection over Union of 0.869 and an average dice similarity coefficient of 0.893 for the H&E image segmentation results. Finally, we propose the jointly trained SRGAN-ResNeXt and Inception U-net Models, which applied the weights from the individually trained SRGAN-ResNeXt and inception U-net models as the pre-trained weights for transfer learning. The jointly trained model’s results are progressively improved and promising. We anticipate these custom CNNs can help resolve the inaccessibility of advanced microscopes or whole slide imaging (WSI) systems to acquire high-resolution images from low-performance microscopes located in remote-constraint settings.more » « less
-
In the medical sector, three-dimensional (3D) images are commonly used like computed tomography (CT) and magnetic resonance imaging (MRI). The 3D MRI is a non-invasive method of studying the soft-tissue structures in a knee joint for osteoarthritis studies. It can greatly improve the accuracy of segmenting structures such as cartilage, bone marrow lesion, and meniscus by identifying the bone structure first. U-net is a convolutional neural network that was originally designed to segment the biological images with limited training data. The input of the original U-net is a single 2D image and the output is a binary 2D image. In this study, we modified the U-net model to identify the knee bone structures using 3D MRI, which is a sequence of 2D slices. A fully automatic model has been proposed to detect and segment knee bones. The proposed model was trained, tested, and validated using 99 knee MRI cases where each case consists of 160 2D slices for a single knee scan. To evaluate the model’s performance, the similarity, dice coefficient (DICE), and area error metrics were calculated. Separate models were trained using different knee bone components including tibia, femur, patella, as well as a combined model for segmenting all the knee bones. Using the whole MRI sequence (160 slices), the method was able to detect the beginning and ending bone slices first, and then segment the bone structures for all the slices in between. On the testing set, the detection model accomplished 98.79% accuracy and the segmentation model achieved DICE 96.94% and similarity 93.98%. The proposed method outperforms several state-of-the-art methods, i.e., it outperforms U-net by 3.68%, SegNet by 14.45%, and FCN-8 by 2.34%, in terms of DICE score using the same dataset.more » « less
-
Machine learning (ML) based skin cancer detection tools are an example of a transformative medical technology that could potentially democratize early detection for skin cancer cases for everyone. However, due to the dependency of datasets for training, ML based skin cancer detection always suffers from a systemic racial bias. Racial communities and ethnicity not well represented within the training datasets will not be able to use these tools, leading to health disparities being amplified. Based on empirical observations we posit that skin cancer training data is biased as it’s dataset represents mostly communities of lighter skin tones, despite skin cancer being far more lethal for people of color. In this paper we use domain adaptation techniques by employing CycleGANs to mitigate racial biases existing within state of the art machine learning based skin cancer detection tools by adapting minority images to appear as the majority. Using our domain adaptation techniques to augment our minority datasets, we are able to improve the accuracy, precision, recall, and F1 score of typical image classification machine learning models for skin cancer classification from the biased 50% accuracy rate to a 79% accuracy rate when testing on minority skin tone images. We evaluate and demonstrate a proof-of-concept smartphone application.more » « less
An official website of the United States government

