Abstract Object tracking in microscopy videos is crucial for understanding biological processes. While existing methods often require fine-tuning tracking algorithms to fit the image dataset, here we explored an alternative paradigm: augmenting the image time-lapse dataset to fit the tracking algorithm. To test this approach, we evaluated whether generative video frame interpolation can augment the temporal resolution of time-lapse microscopy and facilitate object tracking in multiple biological contexts. We systematically compared the capacity of Latent Diffusion Model for Video Frame Interpolation (LDMVFI), Real-time Intermediate Flow Estimation (RIFE), Compression-Driven Frame Interpolation (CDFI), and Frame Interpolation for Large Motion (FILM) to generate synthetic microscopy images by interpolating between real images. Our test image time series ranged from fluorescently labeled nuclei to bacteria, yeast, cancer cells, and organoids. We showed that the off-the-shelf frame interpolation algorithms produced bio-realistic image interpolation even without dataset-specific retraining, as judged by high structural image similarity and the capacity to produce segmentations that closely resemble results from real images. Using a simple tracking algorithm based on mask overlap, we confirmed that frame interpolation significantly improved tracking across several datasets without requiring extensive parameter tuning, and that it captured complex trajectories that were difficult to resolve in the original image time series. Taken together, our findings highlight the potential of generative frame interpolation to improve tracking in time-lapse microscopy across diverse scenarios, suggesting that a generalist tracking algorithm for microscopy could be developed by combining deep learning segmentation models with generative frame interpolation.
Phasetime: Deep Learning Approach to Detect Nuclei in Time Lapse Phase Images
Time lapse microscopy is essential for quantifying the dynamics of cells, subcellular organelles and biomolecules. Biologists use different fluorescent tags to label and track the subcellular structures and biomolecules within cells. However, not all of these tags are compatible with time lapse imaging, and the labeling itself can perturb the cells in undesirable ways. We hypothesized that phase images contain the requisite information to identify and track nuclei within cells. By using traditional blob detection to generate binary mask labels from the stained channel images, and then training a Mask RCNN deep learning model for detection and segmentation on those labels, we managed to segment nuclei based only on phase images. The detection average precision is 0.82 when the IoU threshold is set to 0.5, and the mean IoU between masks generated from phase images and ground truth masks from experts is 0.735. Given that no ground truth mask labels were used during training, these results are sufficient to support our hypothesis. This result makes it possible to detect nuclei without the need for exogenous labeling.
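The IoU figures quoted above can be made concrete with a small sketch. This is only an illustration of the standard intersection-over-union definition for binary masks, not the authors' evaluation code; the function names `mask_iou` and `is_match`, the nested-list mask representation, and the toy masks are all assumptions introduced here.

```python
def mask_iou(pred, truth):
    """Intersection over union of two same-shaped binary masks.

    Masks are nested lists of 0/1 values (an assumption for this sketch).
    Returns 0.0 when both masks are empty, to avoid dividing by zero.
    """
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1  # pixel belongs to both masks
            if p or t:
                union += 1  # pixel belongs to at least one mask
    return inter / union if union else 0.0


def is_match(pred, truth, threshold=0.5):
    """A predicted mask counts as a detection if its IoU reaches the threshold."""
    return mask_iou(pred, truth) >= threshold


# Toy example: two 2x2 squares that overlap in one column.
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
truth = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 0, 0]]

iou = mask_iou(pred, truth)   # 2 shared pixels / 6 covered pixels
matched = is_match(pred, truth)
```

In this toy case the two masks share 2 of the 6 pixels they cover, so the IoU is 1/3 and the prediction would not count as a detection at a 0.5 threshold. Average precision at that threshold is then computed over the detections that do pass it.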
- Award ID(s): 1705464
- PAR ID: 10110676
- Date Published:
- Journal Name: Journal of Clinical Medicine
- Volume: 8
- Issue: 8
- ISSN: 2077-0383
- Page Range / eLocation ID: 1159
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
In this paper, we present an end-to-end instance segmentation method that regresses a polygonal boundary for each object instance. This sparse, vectorized boundary representation for objects, while attractive in many downstream computer vision tasks, quickly runs into issues of parity that need to be addressed: parity in supervision and parity in performance when compared to existing pixel-based methods. This is due in part to object instances being annotated with ground-truth in the form of polygonal boundaries or segmentation masks, yet being evaluated in a convenient manner using only segmentation masks. Our method, BoundaryFormer, is a Transformer based architecture that directly predicts polygons yet uses instance mask segmentations as the ground-truth supervision for computing the loss. We achieve this by developing an end-to-end differentiable model that solely relies on supervision within the mask space through differentiable rasterization. BoundaryFormer matches or surpasses the Mask R-CNN method in terms of instance segmentation quality on both COCO and Cityscapes while exhibiting significantly better transferability across datasets.
-
Training deep learning models for unbiased stereology requires a large data set with associated ground truth. However, manual ground truth annotation is tedious, time-consuming, and expert dependent. We propose an active deep learning method for automatic stereology counts using a snapshot ensemble approach. The method provides a confidence score for each mask in an unlabeled pool that reduces user verification to only images with high information content for training the deep learning model. The proposed method reduces the error rate to less than 1% for unbiased stereology cell counts on immunostained brain cells compared to manual stereology and requires ~25% less expert verification time compared to a previously proposed iterative deep learning approach.
-
Abstract Segmentation of structural components in infrastructure inspection images is crucial for automated and accurate condition assessment. While deep neural networks hold great potential for this task, existing methods typically require fully annotated ground truth masks, which are time‐consuming and labor‐intensive to create. This paper introduces Scribble‐supervised Structural Component Segmentation Network (ScribCompNet), the first weakly‐supervised method requiring only scribble annotations for multiclass structural component segmentation. ScribCompNet features a dual‐branch architecture with higher‐resolution refinement to enhance fine detail detection. It extends supervision from labeled to unlabeled pixels through a combined objective function, incorporating scribble annotation, dynamic pseudo label, semantic context enhancement, and scale‐adaptive harmony losses. Experimental results show that ScribCompNet outperforms other scribble‐supervised methods and most fully‐supervised counterparts, achieving 90.19% mean intersection over union (mIoU) with an 80% reduction in labeling time. Further evaluations confirm the effectiveness of the novel designs and robust performance, even with lower‐quality scribble annotations.
-
Observable reading behavior, the act of moving the eyes over lines of text, is highly stereotyped among the users of a language, and this has led to the development of reading detectors: methods that take windows of sequential fixations as input and predict whether the fixation behavior during those windows is reading or skimming. The present study introduces a new method for reading detection using Region Ranking SVM (RRSVM). An SVM-based classifier learns the local oculomotor features that are important for real-time reading detection while it is optimizing for the global reading/skimming classification, making it unnecessary to hand-label local fixation windows for model training. This RRSVM reading detector was trained and evaluated using eye movement data collected in a laboratory context, where participants viewed modified web news articles and had to either read them carefully for comprehension or skim them quickly for the selection of keywords (separate groups). Ground truth labels were known at the global level (the instructed reading or skimming task), and obtained at the local level in a separate rating task. The RRSVM reading detector accurately predicted 82.5% of the global (article-level) reading/skimming behavior, with accuracy in predicting local window labels ranging from 72% to 95%, depending on how the RRSVM was tuned for local and global weights. With this RRSVM reading detector, a method now exists for near real-time reading detection without the need for hand-labeling of local fixation windows. With real-time reading detection capability comes the potential for applications ranging from education and training to intelligent interfaces that learn what a user is likely to know based on previous detection of their reading behavior.

