Multiple-surface segmentation in optical coherence tomography (OCT) images is a challenging problem, further complicated by the frequent presence of weak image boundaries. Recently, many deep learning-based methods have been developed for this task and yield remarkable performance. Unfortunately, due to the scarcity of training data in medical imaging, it is challenging for deep learning networks to learn the global structure of the target surfaces, including surface smoothness. To bridge this gap, this study proposes to seamlessly unify a U-Net for feature learning with a constrained differentiable dynamic programming module, achieving end-to-end learning for retinal OCT surface segmentation while explicitly enforcing surface smoothness. The approach effectively utilizes feedback from the downstream model optimization module to guide feature learning, yielding better enforcement of the global structure of the target surfaces. Experiments on the Duke AMD (age-related macular degeneration) and JHU MS (multiple sclerosis) OCT datasets for retinal layer segmentation demonstrated that the proposed method achieves subvoxel accuracy on both datasets, with mean absolute surface distance (MASD) errors of 1.88 ± 1.96 μm and 2.75 ± 0.94 μm, respectively, over all segmented surfaces.
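The abstract does not spell out the constrained differentiable dynamic programming formulation, but the ingredients it names, a per-column prediction from a U-Net and an explicitly enforced, differentiable smoothness term, can be sketched. Below is a minimal PyTorch sketch under those assumptions: a soft-argmax stands in for the differentiable surface-position module and a squared-difference penalty stands in for the smoothness constraint; the shapes and the hypothetical U-Net output are illustrative only, not the authors' code.

```python
# Minimal sketch (not the paper's implementation) of differentiable
# surface regression with an explicit smoothness term.
import torch
import torch.nn.functional as F

def soft_surface_position(logits):
    """Differentiable surface height per A-scan column via soft-argmax.

    logits: (B, H, W) unnormalized scores from a (hypothetical) U-Net head;
            rows index depth, columns index A-scans.
    returns: (B, W) expected surface row position for each column.
    """
    prob = F.softmax(logits, dim=1)  # column-wise distribution over depth
    rows = torch.arange(logits.shape[1], dtype=logits.dtype,
                        device=logits.device).view(1, -1, 1)
    return (prob * rows).sum(dim=1)  # expected depth per column

def smoothness_penalty(surface, weight=1.0):
    """Penalize large jumps between neighboring columns: a soft stand-in
    for the hard smoothness constraints of graph-based segmentation."""
    diffs = surface[:, 1:] - surface[:, :-1]
    return weight * (diffs ** 2).mean()

# Toy usage with random logits in place of real U-Net output.
logits = torch.randn(2, 128, 64, requires_grad=True)  # (batch, depth, width)
surface = soft_surface_position(logits)
target = torch.full_like(surface, 60.0)               # dummy ground-truth rows
loss = F.l1_loss(surface, target) + smoothness_penalty(surface, weight=0.1)
loss.backward()  # gradients flow end to end into the feature learner
```

Because every step is differentiable, the smoothness term sends gradients back into the feature-learning network, which is the feedback loop the abstract describes.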
Octascope: A Lightweight Pre-Trained Model for Optical Coherence Tomography
Optical coherence tomography (OCT) imaging enables high-resolution visualization of sub-surface tissue microstructures. However, OCT image analysis using deep learning is hampered by a shortage of diverse training data and by inference latency too high for real-time applications. To address these challenges, we developed Octascope, a lightweight, domain-specific convolutional neural network (CNN)-based model designed for OCT image analysis. Octascope was pre-trained using a curriculum learning approach that involves sequential training, first on natural images (ImageNet) and then on OCT images from retinal, abdominal, and renal tissues, to progressively acquire transferable knowledge. This multi-domain pre-training enables Octascope to generalize across varied tissue types. In two downstream tasks, Octascope demonstrated notable improvements in predictive accuracy compared to alternative approaches. In the epidural tissue detection task, our method surpassed single-task learning with fine-tuning by 9.13% and OCT-specific transfer learning by 5.95% in accuracy. In a retinal diagnosis task, Octascope outperformed VGG16 and ResNet50 by 5.36% and 6.66%, respectively. Compared to RETFound, a Transformer-based OCT foundation model, Octascope delivered 2 to 4.4 times faster inference with slightly better predictive accuracy in both downstream tasks. Octascope represents a significant advance for OCT image analysis by providing an effective balance between computational efficiency and diagnostic accuracy for real-time clinical applications.
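As a rough illustration of the curriculum recipe described above (not the released Octascope code), the sketch below chains supervised training stages from generic to domain-specific data. ResNet-18 stands in for Octascope's architecture, which is not specified here, and synthetic loaders stand in for the OCT datasets; class counts, epochs, and learning rates are assumed placeholders.

```python
# Hedged sketch of curriculum pre-training: ImageNet -> pooled OCT -> task.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

def train_stage(model, loader, epochs, lr):
    """One curriculum stage: supervised training at a given learning rate."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss_fn(model(images), labels).backward()
            opt.step()

def make_loader(n, num_classes):
    """Synthetic stand-in for an OCT dataset; replace with real B-scans."""
    x = torch.randn(n, 3, 224, 224)
    y = torch.randint(0, num_classes, (n,))
    return DataLoader(TensorDataset(x, y), batch_size=4)

# Stage 1: generic visual features from ImageNet-pre-trained weights.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Stage 2: multi-domain OCT pre-training (retinal, abdominal, renal pooled).
model.fc = nn.Linear(model.fc.in_features, 3)
train_stage(model, make_loader(16, 3), epochs=1, lr=1e-4)

# Stage 3: fine-tune on a downstream task, e.g. epidural tissue detection.
model.fc = nn.Linear(model.fc.in_features, 2)
train_stage(model, make_loader(16, 2), epochs=1, lr=1e-5)
```

The design point is that each stage starts from the previous stage's weights, so later, smaller datasets only have to adapt already-transferable features rather than learn them from scratch.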
- Award ID(s): 2331409
- PAR ID: 10658497
- Publisher / Repository: IEEE
- Date Published:
- Journal Name: IEEE Access
- Volume: 13
- ISSN: 2169-3536
- Page Range / eLocation ID: 138005 to 138019
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Satellite imagery is being leveraged for many societally critical tasks across climate, economics, and public health. Yet, because of heterogeneity in landscapes (e.g., how a road looks in different places), models can show disparate performance across geographic areas. Given the potential impact of disparities in algorithmic systems used in societal contexts, we consider here the risk of urban-rural disparities in the identification of land-cover features. We study this through semantic segmentation (a common computer vision task in which image regions are labelled according to what they show), using pre-trained image representations generated via contrastive self-supervised learning. We propose fair dense representation with contrastive learning (FairDCL) as a method for de-biasing the multi-level latent space of a convolutional neural network. The method improves feature identification by removing spurious latent representations that are disparately distributed across urban and rural areas, and does so in an unsupervised way through contrastive pre-training. The pre-trained image representation mitigates downstream urban-rural prediction disparities and outperforms state-of-the-art baselines on real-world satellite images. Embedding-space evaluation and ablation studies further demonstrate FairDCL's robustness. As generalizability and robustness in geographic imagery is a nascent topic, our work motivates researchers to consider metrics beyond average accuracy in such applications.
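FairDCL's urban-rural de-biasing regularizer is not detailed in this summary, but the contrastive pre-training it builds on is standard. Below is a minimal sketch of an NT-Xent (SimCLR-style) contrastive loss over two augmented views of the same batch; the fairness term FairDCL adds on top is not reproduced here, and the embedding shapes are illustrative.

```python
# Minimal NT-Xent contrastive loss: matched views attract, all others repel.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: (N, D) projected embeddings of two augmentations of a batch."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit norm
    sim = z @ z.t() / temperature                       # pairwise similarities
    n = z1.shape[0]
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))          # drop self-similarity
    # The positive for sample i is its other view at index (i + n) mod 2n.
    targets = torch.cat([torch.arange(n, 2 * n),
                         torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Toy usage with random embeddings standing in for encoder outputs.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = nt_xent(z1, z2)
```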
-
AI-powered applications often involve multiple deep neural network (DNN)-based prediction tasks to support application-level functionalities. However, executing multiple DNNs can be challenging due to resource demands and computation costs that increase linearly with the number of DNNs. Multi-task learning (MTL) addresses this problem by designing a multi-task model that shares parameters across tasks based on a single backbone DNN. This paper explores an alternative approach called model fusion: rather than training a single multi-task model from scratch as MTL does, model fusion fuses multiple task-specific DNNs that are pre-trained separately, and that can have heterogeneous architectures, into a single multi-task model. We materialize model fusion in a software framework called GMorph to accelerate multi-DNN inference while maintaining task accuracy. GMorph features three main technical contributions: graph mutations that fuse multi-DNNs into resource-efficient multi-task models, search-space sampling algorithms, and predictive filtering to reduce the high search costs. Our experiments show that GMorph can outperform MTL baselines and reduce the inference latency of multi-DNNs by 1.1-3x while meeting the target task accuracy.
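This summary does not give enough detail to reproduce GMorph's graph-mutation search, but the end state fusion aims for can be illustrated with a toy example: two separately trained task networks come to share one backbone computation, so a single forward pass serves both tasks. Every layer size and head below is hypothetical.

```python
# Toy illustration (not GMorph itself) of a fused multi-task model.
import torch
import torch.nn as nn

class FusedMultiTask(nn.Module):
    def __init__(self, shared, head_a, head_b):
        super().__init__()
        self.shared = shared   # layers common to both source models
        self.head_a = head_a   # task-specific remainder of model A
        self.head_b = head_b   # task-specific remainder of model B

    def forward(self, x):
        h = self.shared(x)     # computed once, reused by both tasks
        return self.head_a(h), self.head_b(h)

shared = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                       nn.AdaptiveAvgPool2d(1), nn.Flatten())
fused = FusedMultiTask(shared,
                       head_a=nn.Linear(16, 10),  # e.g. a classification task
                       head_b=nn.Linear(16, 5))   # e.g. a second prediction task
out_a, out_b = fused(torch.randn(2, 3, 32, 32))   # one backbone pass, two outputs
```

The latency win comes from the shared trunk running once per input instead of once per model, which is why fusion can approach MTL's efficiency without retraining from scratch.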
-
In the field of materials science, microscopy is the first and often the only accessible method for structural characterization. There is growing interest in developing machine learning methods that can automate the analysis and interpretation of microscopy images. Typically, training machine learning models requires large numbers of images with associated structural labels; however, manual labeling of images requires domain knowledge and is prone to human error and subjectivity. To overcome these limitations, we present a semi-supervised transfer learning approach that uses a small number of labeled microscopy images for training and performs as effectively as methods trained on significantly larger image datasets. Specifically, we train an image encoder on unlabeled images using self-supervised learning methods and use that encoder for transfer learning on different downstream image tasks (classification and segmentation) with a minimal number of labeled training images. We test the transfer learning ability of two self-supervised learning methods, SimCLR and Barlow Twins, on transmission electron microscopy (TEM) images. We demonstrate in detail how this machine learning workflow, applied to TEM images of protein nanowires, enables automated classification of nanowire morphologies (e.g., single nanowires, nanowire bundles, phase separated) as well as segmentation tasks that can serve as groundwork for quantification of nanowire domain sizes and shape analysis. We also extend the workflow to classification of nanoparticle morphologies and identification of different types of viruses from TEM images.
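As a pointer to what the self-supervised pre-training step looks like, here is a compact sketch of the Barlow Twins objective mentioned above: it pushes the cross-correlation matrix of two views' embeddings toward the identity, making features invariant to augmentation while mutually decorrelated. The trade-off weight `lam` and the small epsilon are assumed values, not the paper's settings.

```python
# Compact sketch of the Barlow Twins loss over two augmented views.
import torch

def barlow_twins_loss(z1, z2, lam=5e-3):
    """z1, z2: (N, D) embeddings of two augmentations of the same images."""
    n = z1.shape[0]
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)  # standardize each feature
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = z1.t() @ z2 / n                          # (D, D) cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()               # invariance
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # redundancy
    return on_diag + lam * off_diag

# Toy usage with random embeddings standing in for encoder + projector output.
z1, z2 = torch.randn(32, 64), torch.randn(32, 64)
loss = barlow_twins_loss(z1, z2)
```

An encoder pre-trained this way on unlabeled TEM images can then be fine-tuned on the downstream classification or segmentation task with only a handful of labels, which is the transfer setting the abstract evaluates.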