Abstract The development of high-resolution microscopes has made it possible to investigate cellular processes in 3D and over time. However, observing fast cellular dynamics remains challenging because of photobleaching and phototoxicity. Here we report the implementation of two content-aware frame interpolation (CAFI) deep learning networks, Zooming SlowMo and Depth-Aware Video Frame Interpolation, that are highly suited for accurately predicting images in between image pairs, therefore improving the temporal resolution of image series post-acquisition. We show that CAFI is capable of understanding the motion context of biological structures and can perform better than standard interpolation methods. We benchmark CAFI’s performance on 12 different datasets, obtained from four different microscopy modalities, and demonstrate its capabilities for single-particle tracking and nuclear segmentation. CAFI potentially allows for reduced light exposure and phototoxicity on the sample for improved long-term live-cell imaging. The models and the training and testing data are available via the ZeroCostDL4Mic platform.
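The abstract does not say which standard interpolation methods serve as the comparison. As a minimal sketch, assuming pixel-wise linear blending is one such baseline, the example below shows why a method with no notion of motion struggles with moving structures. The function name `linear_midframe` and the toy frames are hypothetical and are not taken from the CAFI code.

```python
import numpy as np

def linear_midframe(frame_a, frame_b, t=0.5):
    """Blend two frames pixel by pixel; t=0 returns frame_a, t=1 returns frame_b."""
    a = frame_a.astype(np.float64)
    b = frame_b.astype(np.float64)
    return (1.0 - t) * a + t * b

# Toy example: a single bright spot moves from column 2 to column 6.
frame_a = np.zeros((1, 9))
frame_a[0, 2] = 1.0
frame_b = np.zeros((1, 9))
frame_b[0, 6] = 1.0

mid = linear_midframe(frame_a, frame_b)
# The blend keeps two half-intensity "ghost" spots at columns 2 and 6
# and leaves column 4, where the spot would actually be, empty.
print(mid)
```

A motion-aware interpolator would instead be expected to place one spot near column 4, which is the behavior the abstract attributes to CAFI.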
This content will become publicly available on March 26, 2026
Generative frame interpolation enhances tracking of biological objects in time-lapse microscopy
Abstract Object tracking in microscopy videos is crucial for understanding biological processes. While existing methods often require fine-tuning tracking algorithms to fit the image dataset, here we explored an alternative paradigm: augmenting the image time-lapse dataset to fit the tracking algorithm. To test this approach, we evaluated whether generative video frame interpolation can augment the temporal resolution of time-lapse microscopy and facilitate object tracking in multiple biological contexts. We systematically compared the capacity of Latent Diffusion Model for Video Frame Interpolation (LDMVFI), Real-time Intermediate Flow Estimation (RIFE), Compression-Driven Frame Interpolation (CDFI), and Frame Interpolation for Large Motion (FILM) to generate synthetic microscopy images derived from interpolating real images. Our testing image time series ranged from fluorescently labeled nuclei to bacteria, yeast, cancer cells, and organoids. We showed that the off-the-shelf frame interpolation algorithms produced bio-realistic image interpolation even without dataset-specific retraining, as judged by high structural image similarity and the capacity to produce segmentations that closely resemble results from real images. Using a simple tracking algorithm based on mask overlap, we confirmed that frame interpolation significantly improved tracking across several datasets without requiring extensive parameter tuning, and that it captured complex trajectories that were difficult to resolve in the original image time series. Taken together, our findings highlight the potential of generative frame interpolation to improve tracking in time-lapse microscopy across diverse scenarios, suggesting that a generalist tracking algorithm for microscopy could be developed by combining deep learning segmentation models with generative frame interpolation.
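The abstract describes the tracker only as "based on mask overlap". A minimal sketch of that idea, assuming labeled segmentation masks (0 for background) and intersection-over-union as the overlap score, might look like the following. The function `link_by_overlap`, the helper `square`, the `min_iou` threshold, and the toy frames are all hypothetical illustrations and not the authors' implementation.

```python
import numpy as np

def link_by_overlap(prev_labels, next_labels, min_iou=0.1):
    """Map each object in next_labels to the best-overlapping object in prev_labels.

    Returns {next_id: prev_id}, with None when no overlap reaches min_iou.
    """
    links = {}
    for nxt in np.unique(next_labels):
        if nxt == 0:  # skip background
            continue
        nxt_mask = next_labels == nxt
        best_id, best_iou = None, min_iou
        # Only previous objects that touch this mask can overlap it.
        for prv in np.unique(prev_labels[nxt_mask]):
            if prv == 0:
                continue
            prv_mask = prev_labels == prv
            inter = np.logical_and(nxt_mask, prv_mask).sum()
            union = np.logical_or(nxt_mask, prv_mask).sum()
            iou = inter / union
            if iou >= best_iou:
                best_id, best_iou = int(prv), iou
        links[int(nxt)] = best_id
    return links

def square(shape, col0, label, size=4):
    """Toy label image containing one square object (illustration only)."""
    img = np.zeros(shape, dtype=np.int32)
    img[2:2 + size, col0:col0 + size] = label
    return img

frame_a = square((8, 12), 0, label=1)    # object at columns 0-3
frame_b = square((8, 12), 4, label=1)    # object at columns 4-7, no overlap with frame_a
frame_mid = square((8, 12), 2, label=1)  # stand-in for an interpolated frame

print(link_by_overlap(frame_a, frame_b))    # {1: None}: the object moved too far
print(link_by_overlap(frame_a, frame_mid))  # {1: 1}
print(link_by_overlap(frame_mid, frame_b))  # {1: 1}
```

In this toy case the object moves too far between the two real frames for their masks to overlap, so the direct link fails, while an intermediate frame restores the chain. This is the mechanism by which higher temporal resolution would be expected to help an overlap-based tracker.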
- Award ID(s):
- 2235451
- PAR ID:
- 10612012
- Publisher / Repository:
- bioRxiv
- Date Published:
- Format(s):
- Medium: X
- Institution:
- bioRxiv
- Sponsoring Org:
- National Science Foundation
More Like this
Traffic intersections are prime locations for deployment of infrastructure sensors and edge computing nodes to realize the vision of a smart city. It is expected that the needs of a smart city, with regard to traffic and pedestrian traffic systems monitored by cameras/video, can be met by using state-of-the-art artificial-intelligence (AI) based object detectors and trackers. A critical component in designing an effective real-time object detection/tracking pipeline is understanding how object density (i.e., the number of objects in a scene), image resolution, and frame rate influence the performance metrics. This study explores the accuracy and speed metrics with the goal of supporting pipelines that meet the precision and latency needs of a real-time environment. We examine the impact of varying image resolution, frame rate, and object density on the object detection performance metrics. The experiments on the COSMOS testbed dataset show that varying the frame width from 416 pixels to 832 pixels, and cropping the images to a square resolution, result in an increase in average precision for all object classes. Decreasing the frame rate from 15 fps to 5 fps preserves more than 90% of the highest F1 score achieved for all object classes. The results inform the choice of video preprocessing stages and modifications to established AI-based object detection/tracking methods, and suggest optimal hyper-parameter values.
Index Terms: Object Detection, Smart City, Video Resolution, Deep Learning Models.
Comprehensive Underwater Object Tracking Benchmark Dataset and Underwater Image Enhancement With GAN
Abstract: Current state-of-the-art object tracking methods have largely benefited from the public availability of numerous benchmark datasets. However, the focus has been on open-air imagery and much less on underwater visual data. Inherent underwater distortions, such as color loss, poor contrast, and underexposure, caused by attenuation of light, refraction, and scattering, greatly affect the visual quality of underwater data, and as such, existing open-air trackers perform less efficiently on such data. To help bridge this gap, this article proposes a first comprehensive underwater object tracking (UOT100) benchmark dataset to facilitate the development of tracking algorithms well-suited for underwater environments. The proposed dataset consists of 104 underwater video sequences and more than 74 000 annotated frames derived from both natural and artificial underwater videos, with great varieties of distortions. We benchmark the performance of 20 state-of-the-art object tracking algorithms and further introduce a cascaded residual network for underwater image enhancement model to improve tracking accuracy and success rate of trackers. Our experimental results demonstrate the shortcomings of existing tracking algorithms on underwater data and how our generative adversarial network (GAN)-based enhancement model can be used to improve tracking performance. We also evaluate the visual quality of our model’s output against existing GAN-based methods using well-accepted quality metrics and demonstrate that our model yields better visual data.
Index Terms: Underwater benchmark dataset, underwater generative adversarial network (GAN), underwater image enhancement (UIE), underwater object tracking (UOT).
Image sensors with programmable region-of-interest (ROI) readout are a new sensing technology important for energy-efficient embedded computer vision. In particular, ROIs can subsample the number of pixels being read out while performing single object tracking in a video. In this paper, we develop adaptive sampling algorithms which perform joint object tracking and predictive video subsampling. We utilize an object detector consisting of either mean shift tracking or a neural network, coupled with a Kalman filter for prediction. We show that our algorithms achieve mean average precision of 0.70 or higher on a dataset of 20 videos in software. Further, we implement hardware acceleration of mean shift tracking with Kalman filter adaptive subsampling on an FPGA. Hardware results show a 23× improvement in clock cycles and latency as compared to baseline methods and achieve 38 FPS real-time performance. This research points to a new domain of hardware-software co-design for adaptive video subsampling in embedded computer vision.
Video object segmentation is challenging yet important in a wide variety of applications for video analysis. Recent works formulate video object segmentation as a prediction task using deep nets to achieve appealing state-of-the-art performance. Due to the formulation as a prediction task, most of these methods require fine-tuning during test time, such that the deep nets memorize the appearance of the objects of interest in the given video. However, fine-tuning is time-consuming and computationally expensive, hence the algorithms are far from real time. To address this issue, we develop a novel matching-based algorithm for video object segmentation. In contrast to memorization-based classification techniques, the proposed approach learns to match extracted features to a provided template without memorizing the appearance of the objects. We validate the effectiveness and the robustness of the proposed method on the challenging DAVIS-16, DAVIS-17, Youtube-Objects and JumpCut datasets. Extensive results show that our method achieves comparable performance without fine-tuning and is much more favorable in terms of computational time.