
Title: Real-Time Camera Localization during Robot-Assisted Telecystoscopy for Bladder Cancer Surveillance
Telecystoscopy can lower the barrier to accessing critical urologic diagnostics for patients around the world. A major challenge for robotic control of flexible cystoscopes and intuitive teleoperation is estimating the pose of the scope tip. We propose a novel real-time camera localization method that uses video recordings from a prior cystoscopy and a 3D bladder reconstruction to estimate the cystoscope pose within the bladder during follow-up telecystoscopy. We map prior video frames into a low-dimensional space as a dictionary, so that a new image can be mapped likewise to efficiently retrieve its nearest neighbor among the dictionary images. The cystoscope pose is then estimated from the correspondence among the new image, its nearest dictionary image, and the prior model from 3D reconstruction. We demonstrate the performance of our method using bladder phantoms of varying fidelity and a servo-controlled cystoscope that simulates the use case of bladder surveillance through telecystoscopy. The servo-controlled cystoscope, with 3 degrees of freedom (angulation, roll, and insertion axes), was developed for collecting cystoscope videos from bladder phantoms. Cystoscope videos were acquired in a 2.5D bladder phantom (a bladder-shaped cross-section plus height) with a panorama of a urothelium attached to the inner surface. Scans of the 2.5D phantom were performed in separate arc trajectories, each generated by actuating the angulation axis with a fixed roll and insertion length. We further included variation in moving speed and imaging distance and the presence of bladder tumors. Cystoscope videos were also acquired in a water-filled 3D silicone bladder phantom with hand-painted vasculature. Scans of the 3D phantom were performed in separate circular trajectories, each generated by actuating the roll axis with a fixed angulation and insertion length.
These videos were used to create 3D reconstructions, dictionary sets, and test data sets for evaluating the computational efficiency and accuracy of our proposed method in comparison with a method based on global Scale-Invariant Feature Transform (SIFT) features, named SIFT-only. Our method can retrieve the nearest dictionary image for 94–100% of test frames in under 55 ms per image, whereas the SIFT-only method finds the matching image for only 56–100% of test frames in 6000–40,000 ms per image, depending on the size of the dictionary set and the richness of SIFT features in the images. Our method, with a speed of around 20 Hz for the retrieval stage, is a promising tool for real-time image-based scope localization in robotic cystoscopy when prior cystoscopy images are available.
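The retrieval stage described above (mapping prior frames into a low-dimensional dictionary and finding a new frame's nearest neighbor in that space) can be sketched in a few lines. This is a minimal illustration assuming a plain PCA embedding and exhaustive Euclidean nearest-neighbor search; the paper's actual embedding, feature extraction, and search structure may differ.

```python
import numpy as np

def build_dictionary(frames, n_components=8):
    """Map flattened prior-cystoscopy frames into a low-dimensional
    space via PCA (SVD) and keep the projections as the dictionary."""
    X = frames.reshape(len(frames), -1).astype(np.float64)
    mean = X.mean(axis=0)
    # Right singular vectors give the principal axes of the frame set.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:n_components]
    codes = (X - mean) @ basis.T          # low-dimensional dictionary
    return mean, basis, codes

def retrieve_nearest(query, mean, basis, codes):
    """Project a new frame with the same mapping and return the index
    of its nearest dictionary image (Euclidean distance)."""
    q = (query.ravel().astype(np.float64) - mean) @ basis.T
    return int(np.argmin(np.linalg.norm(codes - q, axis=1)))

# Toy usage: 50 synthetic 'frames' of 16x16 pixels.
rng = np.random.default_rng(0)
frames = rng.random((50, 16, 16))
mean, basis, codes = build_dictionary(frames)
idx = retrieve_nearest(frames[17], mean, basis, codes)
```

A practical implementation would index `codes` with an approximate nearest-neighbor structure (e.g., a k-d tree) rather than a brute-force scan to stay within a tight per-frame time budget.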
Journal Name: Journal of Medical Robotics Research
Sponsoring Org: National Science Foundation
More Like this
  1. Simultaneous visualization of the teeth and periodontium is of significant clinical interest for image-based monitoring of periodontal health. We recently reported the application of a dual-modality photoacoustic-ultrasound (PA-US) imaging system for resolving periodontal anatomy and periodontal pocket depths in humans. This work utilized a linear array transducer attached to a stepper motor to generate 3D images via maximum intensity projection. This prior work also used a medical head immobilizer to reduce artifacts during volume rendering caused by motion from the subject (e.g., breathing, minor head movements). However, this solution does not completely eliminate motion artifacts while also complicating the imaging procedure and causing patient discomfort. To address this issue, we report the implementation of an image registration technique to correctly align B-mode PA-US images and generate artifact-free 2D cross-sections. Application of the deshaking technique to PA phantoms revealed 80% similarity to the ground truth when shaking was intentionally applied during stepper motor scans. Images from handheld sweeps could also be deshaken using an LED PA-US scanner. In ex vivo porcine mandibles, pigmentation of the enamel was well-estimated within 0.1 mm error. The pocket depth measured in a healthy human subject was also in good agreement with our prior study. This report demonstrates that a modality-independent registration technique can be applied to clinically relevant PA-US scans of the periodontium to reduce operator burden of skill and subject discomfort while showing potential for handheld clinical periodontal imaging.
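The modality-independent registration idea above can be illustrated with a translation-only phase-correlation sketch. This is a generic alignment technique, not the authors' full deshaking pipeline, and it assumes purely circular integer shifts between frames.

```python
import numpy as np

def phase_correlation_shift(ref, mov):
    """Estimate the integer (row, col) translation that aligns `mov`
    to `ref` via the Fourier shift theorem (phase correlation)."""
    cross = np.fft.fft2(ref) * np.conj(np.fft.fft2(mov))
    cross /= np.abs(cross) + 1e-12        # keep only the phase term
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap shifts larger than half the image back to negative values.
    return tuple(p - s if p > s // 2 else p for p, s in zip(peak, corr.shape))

# Toy usage: shift a random image by (3, -5) and recover the offset
# needed to undo that motion.
rng = np.random.default_rng(1)
img = rng.random((64, 64))
shifted = np.roll(img, (3, -5), axis=(0, 1))
dy, dx = phase_correlation_shift(img, shifted)
```

Applying the recovered `(dy, dx)` to the moving frame restores alignment; repeating this frame-to-reference step over a sweep is one simple way to build an artifact-free stack.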

  2. This study introduces a technique for simultaneous multislice (SMS) cardiac magnetic resonance fingerprinting (cMRF), which improves the slice coverage when quantifying myocardial T1, T2, and M0. The single‐slice cMRF pulse sequence was modified to use multiband (MB) RF pulses for SMS imaging. Different RF phase schedules were used to excite each slice, similar to POMP or CAIPIRINHA, which imparts tissues with a distinguishable and slice‐specific magnetization evolution over time. Because of the high net acceleration factor (R = 48 in plane combined with the slice acceleration), images were first reconstructed with a low rank technique before matching data to a dictionary of signal timecourses generated by a Bloch equation simulation. The proposed method was tested in simulations with a numerical relaxation phantom. Phantom and in vivo cardiac scans of 10 healthy volunteers were also performed at 3 T. With single‐slice acquisitions, the mean relaxation times obtained using the low rank cMRF reconstruction agree with reference values. The low rank method improves the precision in T1 and T2 for both single‐slice and SMS cMRF, and it enables the acquisition of maps with fewer artifacts when using SMS cMRF at higher MB factors. With this technique, in vivo cardiac maps were acquired from three slices simultaneously during a breathhold lasting 16 heartbeats. SMS cMRF improves the efficiency and slice coverage of myocardial T1 and T2 mapping compared with both single‐slice cMRF and conventional cardiac mapping sequences. Thus, this technique is a first step toward whole‐heart simultaneous T1 and T2 quantification with cMRF.
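The dictionary-matching step common to MR fingerprinting methods like the one above can be sketched as a normalized inner-product search. The mono-exponential "timecourses" below are a toy stand-in for the Bloch-simulated dictionary, and the low rank reconstruction step is omitted entirely.

```python
import numpy as np

def mrf_match(signal, dictionary, t1_values, t2_values):
    """Match a measured timecourse to the dictionary entry with the
    largest normalized inner product and return its (T1, T2)."""
    d = dictionary / np.linalg.norm(dictionary, axis=1, keepdims=True)
    s = signal / np.linalg.norm(signal)
    best = int(np.argmax(np.abs(d @ s)))
    return t1_values[best], t2_values[best]

# Toy dictionary: simple relaxation-shaped 'timecourses' over 20 time
# points for a small grid of (T1, T2) pairs in milliseconds.
t = np.arange(1, 21, dtype=float)
pairs = [(t1, t2) for t1 in (300, 800, 1400) for t2 in (40, 80, 120)]
dic = np.array([np.exp(-t / t2) * (1 - np.exp(-t / t1)) for t1, t2 in pairs])
t1s = np.array([p[0] for p in pairs])
t2s = np.array([p[1] for p in pairs])
t1, t2 = mrf_match(dic[4], dic, t1s, t2s)
```

Real cMRF dictionaries are generated by Bloch simulation of the actual pulse sequence, so entries are not simple exponentials, but the matching operation itself is this same inner-product maximization.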

  3. Purpose

    This work aims to develop an approach for simultaneous water–fat separation and myocardial T1 and T2 quantification based on the cardiac MR fingerprinting (cMRF) framework with rosette trajectories at 3 T and 1.5 T.


    Methods

    Two 15‐heartbeat cMRF sequences with different rosette trajectories designed for water–fat separation at 3 T and 1.5 T were implemented. Water T1 and T2 maps, a water image, and a fat image were generated with B0 inhomogeneity correction using a B0 map derived from the cMRF data themselves. The proposed water–fat separation rosette cMRF approach was validated in the International Society for Magnetic Resonance in Medicine/National Institute of Standards and Technology MRI system phantom and in water/oil phantoms. It was also applied for myocardial tissue mapping of healthy subjects at both 3 T and 1.5 T.


    Results

    Water T1 and T2 values measured using rosette cMRF in the International Society for Magnetic Resonance in Medicine/National Institute of Standards and Technology phantom agreed well with the reference values. In the water/oil phantom, oil was well suppressed in the water images and vice versa. Rosette cMRF yielded comparable T1 values but 2–3 ms higher T2 values in the myocardium of healthy subjects than the original spiral cMRF method. Epicardial fat deposition was also clearly shown in the fat images.


    Conclusions

    Rosette cMRF provides fat images along with myocardial T1 and T2 maps with significant fat suppression. This technique may improve visualization of the anatomical structure of the heart by separating water and fat and could provide value in diagnosing cardiac diseases associated with fibrofatty infiltration or epicardial fat accumulation. It also paves the way toward comprehensive myocardial tissue characterization in a single scan.
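For intuition, the core water–fat separation identity can be shown with a classical two-point Dixon sketch. This is a textbook simplification (ideal in-phase and opposed-phase echoes, no B0 inhomogeneity), named plainly as a stand-in: it is not the rosette cMRF reconstruction itself, which estimates and corrects B0 from the fingerprinting data.

```python
def dixon_separate(in_phase, opposed_phase):
    """Classical two-point Dixon: the in-phase echo measures W + F and
    the opposed-phase echo measures W - F, so water and fat follow by
    a sum and a difference."""
    water = (in_phase + opposed_phase) / 2
    fat = (in_phase - opposed_phase) / 2
    return water, fat

# Toy pixel with water fraction 0.7 and fat fraction 0.3.
W, F = 0.7, 0.3
water, fat = dixon_separate(W + F, W - F)
```

Off-resonance (B0) errors break the in-phase/opposed-phase assumption, which is why methods like the one above must fold a B0 estimate into the separation.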

  4. Purpose

    To enable rapid imaging with a scan time–efficient 3D cones trajectory with a deep‐learning off‐resonance artifact correction technique.


    Methods

    A residual convolutional neural network to correct off‐resonance artifacts (Off‐ResNet) was trained with a prospective study of pediatric MRA exams. Each exam acquired a short readout scan (1.18 ± 0.38 ms) and a long readout scan (3.35 ± 0.74 ms) at 3 T. Short readout scans, with longer scan times but negligible off‐resonance blurring, were used as reference images and augmented with additional off‐resonance for supervised training examples. Long readout scans, with greater off‐resonance artifacts but shorter scan time, were corrected by autofocus and Off‐ResNet and compared with short readout scans by normalized RMS error, structural similarity index, and peak SNR. Scans were also compared by scoring on 8 anatomical features by two radiologists, using analysis of variance with a post hoc Tukey's test and two one‐sided t‐tests. Reader agreement was determined with intraclass correlation.


    Results

    The total scan time for long readout scans was on average 59.3% shorter than for short readout scans. Images from Off‐ResNet had superior normalized RMS error, structural similarity index, and peak SNR compared with uncorrected images across ±1 kHz off‐resonance (P < .01). The proposed method had superior normalized RMS error over −677 Hz to +1 kHz and superior structural similarity index and peak SNR over ±1 kHz compared with autofocus (P < .01). Radiologic scoring demonstrated that long readout scans corrected with Off‐ResNet were noninferior to short readout scans (P < .05).


    Conclusion

    The proposed method can correct off‐resonance artifacts from rapid long‐readout 3D cones scans to a noninferior image quality compared with diagnostically standard short readout scans.
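Two of the scalar image-quality metrics used in comparisons like the one above (normalized RMS error and peak SNR) are easy to state precisely. The sketch below assumes one common normalization convention for nRMSE (the reference image's dynamic range); conventions vary between papers, so the exact definition should be checked against the study in question.

```python
import numpy as np

def nrmse(ref, test):
    """RMS error normalized by the reference image's dynamic range."""
    rmse = np.sqrt(np.mean((ref - test) ** 2))
    return rmse / (ref.max() - ref.min())

def psnr(ref, test, peak=1.0):
    """Peak SNR in dB for images scaled to [0, peak]."""
    mse = np.mean((ref - test) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

# Toy usage: a reference image with a small constant offset added.
rng = np.random.default_rng(2)
ref = rng.random((32, 32))
degraded = ref + 0.01
err = nrmse(ref, degraded)
quality = psnr(ref, degraded)
```

A constant error of 0.01 on a unit-range image gives a PSNR of 40 dB, which is a useful mental anchor when reading reported values.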

  5. SUMMARY An efficient method for tracking a target using a single Pan-Tilt-Zoom (PTZ) camera is proposed. The proposed Scale-Invariant Optical Flow (SIOF) method estimates the motion of the target and rotates the camera accordingly to keep the target at the center of the image. SIOF also estimates the scale of the target and adjusts the focal length to change the Field of View (FoV) so that the target appears at the same size in all captured frames. SIOF is a feature-based tracking method: feature points are extracted and tracked using Optical Flow (OF) and the Scale-Invariant Feature Transform (SIFT), then combined in groups to achieve robust tracking. The feature points in these groups are used within a twist model to recover the 3D free motion of the target. The merits of the proposed method are (i) an efficient scale-invariant tracking method that keeps the target in the camera's FoV at a constant apparent size, and (ii) tracking with prediction and correction to speed up the PTZ control and achieve smooth camera motion. Experiments on online video streams validated the efficiency of the proposed SIOF compared with OF, SIFT, and other tracking methods: SIOF has around 36% less average tracking error and around 70% less tracking overshoot than OF.
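The camera-control step (turning a pixel offset and a scale estimate into pan, tilt, and zoom commands) can be sketched under a small-angle pinhole approximation. The field-of-view values and the linear pixel-to-angle mapping below are illustrative assumptions, not SIOF's actual control law, which additionally uses prediction and correction.

```python
def ptz_correction(cx, cy, width, height, hfov_deg, vfov_deg, scale):
    """Pan/tilt angles (degrees) that re-center a target detected at
    pixel (cx, cy), plus the focal-length multiplier that keeps its
    apparent size fixed. Assumes angle varies linearly with pixel
    offset (valid for small offsets in a pinhole model)."""
    pan = (cx - width / 2) / width * hfov_deg
    tilt = (cy - height / 2) / height * vfov_deg
    # If the target's apparent scale grew by `scale`, shorten the
    # focal length by 1/scale to restore its original size.
    zoom = 1.0 / scale
    return pan, tilt, zoom

# Toy usage: target drifted to (480, 270) in a 640x360 frame with an
# assumed 60x34 degree FoV, and grew 25% larger.
pan, tilt, zoom = ptz_correction(480, 270, 640, 360, 60.0, 34.0, 1.25)
```

In a closed loop these commands would be sent each frame, so residual error from the linear approximation shrinks as the target approaches the image center.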