Title: Multicamera 3D Viewpoint Adjustment for Robotic Surgery via Deep Reinforcement Learning
While robot-assisted minimally invasive surgery (RMIS) procedures afford a variety of benefits over open surgery and manual laparoscopic operations (including increased tool dexterity; reduced patient pain, incision size, trauma, and recovery time; and lower infection rates [1]), lack of spatial awareness remains an issue. Typical laparoscopic imaging can lack sufficient depth cues, and haptic feedback, if provided, rarely reflects realistic tissue–tool interactions. This work is part of a larger ongoing research effort to reconstruct 3D surfaces from multiple viewpoints in RMIS to increase visual perception. The manual placement and adjustment of multicamera systems in RMIS are nonideal and prone to error [2], and other autonomous approaches focus on tool tracking and do not consider reconstruction of the surgical scene [3, 4, 5]. The group’s previous work investigated a novel, context-aware autonomous camera positioning method [6], which incorporated both tool location and scene coverage for multiple camera viewpoint adjustments. In this paper, the authors expand upon that prior work by implementing a streamlined deep reinforcement learning approach between the optimal viewpoints calculated by the prior method [6], which encourages discovery of additional, otherwise unobserved camera viewpoints. Combining the framework and robustness of the previous work with the efficiency and additional viewpoints of the augmentations presented here results in improved performance and scene coverage, showing promise for real-time implementation.
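The paper's code is not reproduced here; the sketch below only illustrates, in miniature, the kind of learning loop the abstract describes. The state, action, and reward definitions (a camera moving on a discretized hemisphere, with a reward mixing tool visibility and viewpoint novelty as a crude coverage proxy) are assumptions for illustration, and tabular Q-learning stands in for the deep policy.

# Minimal sketch (not the authors' code): camera viewpoint adjustment framed
# as a small reinforcement-learning problem. The camera moves on a discretized
# hemisphere above the scene; the reward is an assumed proxy that mixes tool
# visibility with viewpoint novelty. Tabular Q-learning stands in for the
# deep policy described in the paper.
import numpy as np

N_AZ, N_EL = 12, 4                                    # azimuth / elevation bins
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]  # move one bin or hold

def reward(state, tool_bin, visited):
    az, el = state
    az_dist = min(abs(az - tool_bin), N_AZ - abs(az - tool_bin))
    tool_visible = 1.0 if az_dist <= 1 else 0.0       # keep the tool in view
    novelty = 0.2 if state not in visited else 0.0    # encourage new viewpoints
    return tool_visible + novelty

def step(state, action):
    az, el = state
    daz, dele = action
    return ((az + daz) % N_AZ, int(np.clip(el + dele, 0, N_EL - 1)))

rng = np.random.default_rng(0)
Q = np.zeros((N_AZ, N_EL, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.95, 0.2

for episode in range(500):
    state, visited = (0, 0), set()
    tool_bin = int(rng.integers(N_AZ))                # tool moves between episodes
    for t in range(50):
        a = int(rng.integers(len(ACTIONS))) if rng.random() < eps \
            else int(np.argmax(Q[state[0], state[1]]))
        nxt = step(state, ACTIONS[a])
        r = reward(nxt, tool_bin, visited)
        visited.add(nxt)
        Q[state[0], state[1], a] += alpha * (
            r + gamma * Q[nxt[0], nxt[1]].max() - Q[state[0], state[1], a])
        state = nxt

In the actual system the observation would come from the tool poses and reconstructed scene coverage, and the policy would be a neural network, but the interaction loop has the same shape.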
Award ID(s):
2101107
NSF-PAR ID:
10326935
Author(s) / Creator(s):
Date Published:
Journal Name:
Journal of Medical Robotics Research
Volume:
06
Issue:
01n02
ISSN:
2424-905X
Page Range / eLocation ID:
2140003
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In modern industrial manufacturing processes, robotic manipulators are routinely used in assembly, packaging, and material handling operations. During production, changing end-of-arm tooling is frequently necessary for process flexibility and reuse of robotic resources. In conventional operation, a tool changer is sometimes employed to load and unload end-effectors; however, the robot must be manually taught by operators, via a teach pendant, to locate the tool changers. During tool change teaching, the operator takes considerable effort and time to align the master and tool sides of the coupler by adjusting the motion speed of the robotic arm and observing the alignment from different viewpoints. In this paper, a custom robotic system, the NeXus, was programmed to locate and change tools automatically via an RGB-D camera. The NeXus was configured as a multi-robot system for multiple tasks including assembly, bonding, and 3D printing of sensor arrays, solar cells, and microrobot prototypes. Thus, different tools are employed by an industrial robotic arm to position grippers, printers, and other types of end-effectors in the workspace. To improve the precision and cycle time of the robotic tool change, we mounted an eye-in-hand RGB-D camera and employed visual servoing to automate the tool change process. We then measured the tool-location teaching time using this system and compared the cycle time with those of six human operators in manual mode. We concluded that the tool location time in automated mode is, on average, more than two times lower than that of expert human operators.
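As a rough illustration of the eye-in-hand approach described above, the following sketch shows one iteration of position-based visual servoing. The control law, the gain, and the assumption that the RGB-D pipeline returns the tool-changer pose in the camera frame are ours, not details taken from the NeXus implementation.

# Rough sketch (assumed, not the NeXus code): one iteration of position-based
# visual servoing. The RGB-D pipeline is assumed to return T_cam_target, the
# tool-changer pose in the camera frame; a proportional law drives the
# remaining pose error toward zero until the coupler halves are aligned.
import numpy as np

def pose_error(T_cam_target):
    """Translation error and rotation error (axis-angle) from a 4x4 pose."""
    t_err = T_cam_target[:3, 3]
    R = T_cam_target[:3, :3]
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    r_err = 0.5 * axis if angle < 1e-6 else angle * axis / (2.0 * np.sin(angle))
    return t_err, r_err

def servo_step(T_cam_target, gain=0.5):
    """Return linear and angular velocity commands in the camera frame."""
    t_err, r_err = pose_error(T_cam_target)
    return gain * t_err, gain * r_err

# Example: tool changer 5 cm ahead of the camera, rotated 5 degrees about z.
theta = np.deg2rad(5.0)
T = np.eye(4)
T[:3, :3] = [[np.cos(theta), -np.sin(theta), 0.0],
             [np.sin(theta),  np.cos(theta), 0.0],
             [0.0, 0.0, 1.0]]
T[:3, 3] = [0.0, 0.0, 0.05]
v, w = servo_step(T)
print("linear cmd [m/s]:", v, "angular cmd [rad/s]:", w)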
  2. Abstract

    This paper presents coupled and decoupled multi-autonomous underwater vehicle (AUV) motion planning approaches for maximizing information gain. The work is motivated by applications in which multiple AUVs are tasked with obtaining video footage for the photogrammetric reconstruction of underwater archeological sites. Each AUV is equipped with a video camera and side-scan sonar. The side-scan sonar is used to initially collect low-resolution data to construct an information map of the site. Coupled and decoupled motion planning approaches with respect to this map are presented. Both planning methods seek to generate multi-AUV trajectories that capture close-up video footage of a site from a variety of different viewpoints, building on prior work in single-AUV rapidly exploring random tree (RRT) motion planning. The coupled and decoupled planners are compared in simulation. In addition, the multi-AUV trajectories constructed by each planner were executed at archeological sites located off the coast of Malta, albeit by a single AUV due to limited resources; specifically, each AUV trajectory in a plan was executed in sequence instead of simultaneously. Both planners also modify a baseline RRT algorithm. The results of the paper present a number of trade-offs between the two planning approaches and demonstrate a large improvement in map coverage efficiency and runtime.
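As a toy illustration of information-driven sampling-based planning of the kind described above, the sketch below grows a small RRT whose samples are biased toward high-information cells of a (randomly generated stand-in) information map, so the tree preferentially grows toward regions worth filming. Obstacle checking, vehicle dynamics, and multi-AUV coordination are omitted; none of this is the paper's actual planner.

# Illustrative sketch only (not the paper's planner): a tiny RRT with sampling
# biased by an information map.
import numpy as np

rng = np.random.default_rng(1)
SIZE = 50.0
info_map = rng.random((50, 50))          # stand-in for the side-scan sonar map
STEP = 2.0

def sample_point():
    """Rejection-sample a point with probability proportional to information."""
    while True:
        p = rng.random(2) * SIZE
        if rng.random() < info_map[int(p[1]), int(p[0])]:
            return p

def extend(tree, p):
    """Steer the node nearest to p one fixed step toward it."""
    nodes = np.array([n for n, _ in tree])
    i = int(np.argmin(np.linalg.norm(nodes - p, axis=1)))
    direction = p - nodes[i]
    new = nodes[i] + STEP * direction / (np.linalg.norm(direction) + 1e-9)
    tree.append((new, i))                # store the node with its parent index
    return new

tree = [(np.array([25.0, 25.0]), -1)]    # root the tree at the site's center
for _ in range(300):
    extend(tree, sample_point())
print(len(tree), "nodes grown")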

     
  3. Advancements in robot-assisted surgery have grown rapidly over the past two decades. More recently, the automation of robotic surgical tasks has become the focus of research. In this area, the detection and tracking of a surgical tool are crucial for an autonomous system to plan and perform a procedure. For example, knowing the position and posture of a needle is a prerequisite for an automatic suturing system to grasp it and perform suturing tasks. In this paper, we propose a novel method, based on deep learning and point-to-point registration, to track the 6-degree-of-freedom (DOF) pose of a metal suture needle from a robotic endoscope (an Endoscopic Camera Manipulator from the da Vinci Robotic Surgical Systems), without the help of any marker. The proposed approach was implemented and evaluated in a standard simulated surgical environment provided by the 2021–2022 AccelNet Surgical Robotics Challenge, thus demonstrating the potential to be translated into a real-world scenario. A customized dataset containing 836 images collected from the simulated scene, with ground-truth pose and keypoint information, was constructed to train the neural network model. The best pipeline achieved an average position error of 1.76 mm and an average orientation error of 8.55 degrees, and it can run at up to 10 Hz on a PC.
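The point-to-point registration step mentioned above can be illustrated with a standard Kabsch/Umeyama solve: given detected needle keypoints and their correspondences on a needle model, it recovers the rigid 6-DOF pose. This is a generic sketch with synthetic data, not the authors' pipeline; the function name and test values are assumptions.

# Minimal point-to-point registration sketch (illustration only): recover the
# rigid transform (R, t) mapping model keypoints onto observed keypoints.
import numpy as np

def register(model_pts, observed_pts):
    """Least-squares rigid transform with observed ~ R @ model + t."""
    mu_m, mu_o = model_pts.mean(axis=0), observed_pts.mean(axis=0)
    H = (model_pts - mu_m).T @ (observed_pts - mu_o)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflection
    R = Vt.T @ D @ U.T
    t = mu_o - R @ mu_m
    return R, t

# Synthetic check: pose a few model keypoints with a known transform.
rng = np.random.default_rng(0)
model = rng.random((6, 3))
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.01, -0.02, 0.10])
observed = model @ R_true.T + t_true
R_est, t_est = register(model, observed)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))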
  4. Laparoscopic surgery presents practical benefits over traditional open surgery, including reduced risk of infection, less discomfort, and shorter recovery time for patients. Introducing robot systems into surgical tasks provides additional enhancements, including improved precision, remote operation, and an intelligent software layer capable of filtering aberrant motion and scaling surgical maneuvers. However, the software interface in telesurgery also lends itself to potential adversarial cyber attacks. Such attacks can negatively affect both surgeon motion commands and the sensory information relayed to the operator. To combat cyber attacks on the latter, one method to enhance surgeon feedback through multiple sensory pathways is to incorporate reliable, complementary forms of information across different sensory modes. Built-in partial redundancies or inferences between perceptual channels, or perception complementarities, can be used both to detect and to recover from compromised operator feedback. In surgery, haptic sensations are extremely useful for surgeons to prevent undue and unwanted tissue damage from excessive tool–tissue force. Direct force sensing is not yet deployable due to the sterilization requirements of the operating room. Instead, combinations of other sensing methods may be relied upon, such as noncontact model-based force estimation. This paper presents the design of surgical simulator software that can be used for vision-based non-contact force sensing to inform the perception complementarity of vision and force feedback for telesurgery. A brief user study is conducted to verify the efficacy of graphical force feedback from vision-based force estimation; the results suggest that vision may effectively complement direct force sensing.
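As a minimal illustration of model-based, non-contact force estimation of the kind the simulator is meant to support, the sketch below treats tissue as a Kelvin-Voigt (spring-damper) element and converts a visually tracked indentation depth into an estimated force. The stiffness and damping values, frame rate, and model choice are assumptions, not details from the paper.

# Highly simplified sketch (assumed model and numbers): estimate tool-tissue
# force from a visually tracked indentation depth, with no force sensor.
import numpy as np

K_TISSUE = 150.0      # assumed effective tissue stiffness, N/m
C_TISSUE = 2.0        # assumed damping, N*s/m

def estimate_force(depth_m, depth_rate_mps):
    """Kelvin-Voigt style estimate from vision-derived indentation."""
    if depth_m <= 0.0:               # tool not in contact
        return 0.0
    return K_TISSUE * depth_m + C_TISSUE * depth_rate_mps

# Example: indentation depths (meters) extracted from consecutive frames.
depths = np.array([0.000, 0.001, 0.003, 0.004, 0.004])
dt = 1.0 / 30.0                      # assumed 30 Hz video
rates = np.gradient(depths, dt)
forces = [estimate_force(d, r) for d, r in zip(depths, rates)]
print(["%.2f N" % f for f in forces])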
  5. Obeid, Iyad; Selesnick, Ivan; Picone, Joseph (Eds.)
    The Temple University Hospital Seizure Detection Corpus (TUSZ) [1] has been in distribution since April 2017. It is a subset of the TUH EEG Corpus (TUEG) [2] and the most frequently requested corpus from our 3,000+ subscribers. It was recently featured as the challenge task in the Neureka 2020 Epilepsy Challenge [3]. A summary of the development of the corpus is shown below in Table 1. The TUSZ Corpus is a fully annotated corpus, which means every seizure event that occurs within its files has been annotated. The data is selected from TUEG using a screening process that identifies files most likely to contain seizures [1]. Approximately 7% of the TUEG data contains a seizure event, so it is important that we triage TUEG for high-yield data. One hour of EEG data requires approximately one hour of human labor to complete annotation using the pipeline described below, so it is important from a financial standpoint that we accurately triage data. A summary of the labels being used to annotate the data is shown in Table 2. Certain standards are put into place to optimize the annotation process while not sacrificing consistency. Due to the nature of EEG recordings, some records start off with a segment of calibration. This portion of the EEG is instantly recognizable and transitions from what resembles lead artifact to a flat line on all the channels. For the sake of seizure annotation, the calibration is ignored, and no time is wasted on it. During the identification of seizure events, a hard “3-second rule” is used to determine whether two events should be combined into a single larger event. This greatly reduces the time that it takes to annotate a file with multiple events occurring in succession. In addition to the required minimum 3-second gap between seizures, part of our standard dictates that no seizure shorter than 3 seconds be annotated. Although there is no universally accepted definition of how long a seizure must be, we find it difficult to discern with confidence between a seizure and burst suppression or other morphologically similar impressions when the event is only a couple of seconds long. This is due to several reasons, the most notable being the lack of evolution, which is oftentimes crucial for the determination of a seizure. After the EEG files have been triaged, a team of annotators at NEDC is provided with the files to begin data annotation. An example of an annotation is shown in Figure 1. A summary of the workflow for our annotation process is shown in Figure 2. Several passes are performed over the data to ensure the annotations are accurate. Each file undergoes three passes to ensure that no seizures were missed or misidentified. The first pass of TUSZ involves identifying which files contain seizures and annotating them using our annotation tool. The time it takes to fully annotate a file can vary drastically depending on the specific characteristics of each file; however, on average a file containing multiple seizures takes 7 minutes to fully annotate. This includes the time it takes to read the patient report as well as traverse the entire file. Once an event has been identified, the start and stop times for the seizure are stored in our annotation tool. This is done on a channel-by-channel basis, resulting in an accurate representation of the seizure spreading across different parts of the brain. Files that do not contain any seizures take approximately 3 minutes to complete.
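The two timing rules above amount to a simple post-processing step on candidate events. The sketch below is only an illustration of the stated standard, not NEDC's actual annotation tooling; the function name and event representation are assumptions.

# Sketch of the stated timing standard: candidate events separated by less
# than 3 seconds are merged, and events shorter than 3 seconds are dropped.
MIN_GAP_S = 3.0
MIN_DURATION_S = 3.0

def apply_timing_rules(events):
    """events: list of (start_s, stop_s) tuples, assumed sorted by start."""
    merged = []
    for start, stop in events:
        if merged and start - merged[-1][1] < MIN_GAP_S:
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))  # merge
        else:
            merged.append((start, stop))
    return [(s, e) for s, e in merged if e - s >= MIN_DURATION_S]

# Example: two close events merge; the isolated 2-second event is discarded.
print(apply_timing_rules([(10.0, 14.0), (15.5, 20.0), (40.0, 42.0)]))
# -> [(10.0, 20.0)]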
Even though there is no annotation being made, the file is still carefully examined to make sure that nothing was overlooked. In addition to solely scrolling through a file from start to finish, a file is often examined through different lenses. Depending on the situation, low-pass filters are used, as well as increasing the amplitude of certain channels. These techniques are never used in isolation and are meant to further increase our confidence that nothing was missed. Once each file in a given set has been looked at once, the annotators start the review process. The reviewer checks a file and comments on any changes that they recommend. This takes about 3 minutes per seizure-containing file, which is significantly less time than the first pass. After each file has been commented on, the third pass commences. This step takes about 5 minutes per seizure file and requires the reviewer to accept or reject the changes that the second reviewer suggested. Since tangible changes are made to the annotation using the annotation tool, this step takes a bit longer than the previous one. Assuming 18% of the files contain seizures, a set of 1,000 files takes roughly 127 work hours to annotate. Before an annotator contributes to the data interpretation pipeline, they are trained for several weeks on previous datasets. A new annotator can be trained using data that resembles what they would see under normal circumstances. An additional benefit of using released data for training is that it serves as a means of constantly checking our work. If a trainee stumbles across an event that was not previously annotated, it is promptly added, and the data release is updated. It takes about three months to train an annotator to the point where their annotations can be trusted. Even though we carefully screen potential annotators during the hiring process, only about 25% of the annotators we hire survive more than one year doing this work. To ensure that the annotators are consistent, the team periodically conducts an interrater agreement evaluation to confirm that there is consensus. The annotation standards are discussed in Ochal et al. [4]. An extended discussion of interrater agreement can be found in Shah et al. [5]. The most recent release of TUSZ, v1.5.2, represents our effort to review the quality of the annotations for two challenges we hosted: an internal deep learning challenge at IBM [6] and the Neureka 2020 Epilepsy Challenge [3]. One of the biggest changes made to the annotations was the imposition of a stricter standard for determining the start and stop time of a seizure. Although evolution is still included in the annotations, the start times were adjusted to begin when the spike-wave pattern becomes distinct, as opposed to merely when the signal starts to shift from background. This cuts down on background that was mislabeled as a seizure. For seizure end times, all postictal slowing that had been included was removed. The v1.5.2 release did not add any new data files, apart from two EEG files that were corrupted in v1.5.1 but were recovered and added for the latest release. The progression from v1.5.0 to v1.5.1 and later to v1.5.2 included the re-annotation of all of the EEG files in order to develop a confident dataset regarding seizure identification. Starting with v1.4.0, we have also developed a blind evaluation set that is withheld for use in competitions.
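For concreteness, the quoted 127-hour estimate can be reproduced from the per-pass times above under one assumption of ours that is not stated explicitly in the abstract: that the second (review) pass covers every file in the set, not only the seizure-containing ones. The small calculation below is illustrative only.

# Back-of-the-envelope reconstruction of the "roughly 127 work hours" figure.
# Per-file times come from the text; the assumption that the second pass
# covers every file (not just seizure files) is ours.
n_files = 1000
n_seizure = int(0.18 * n_files)               # 18% of files contain seizures
n_clean = n_files - n_seizure

pass1 = 7 * n_seizure + 3 * n_clean           # minutes: annotate vs. verify
pass2 = 3 * n_files                           # minutes: review pass (assumed all files)
pass3 = 5 * n_seizure                         # minutes: accept/reject changes

total_hours = (pass1 + pass2 + pass3) / 60.0
print(total_hours)                            # -> 127.0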
The annotation team is currently working on the next release for TUSZ, v1.6.0, which is expected to occur in August 2020. It will include new data from 2016 to mid-2019. This release will contain 2,296 files from 2016 as well as several thousand files representing the remaining data through mid-2019. In addition to files that were obtained with our standard triaging process, a part of this release consists of EEG files that do not have associated patient reports. Since actual seizure events are in short supply, we are mining a large chunk of data for which we have EEG recordings but no reports. Some of this data contains interesting seizure events collected during long-term EEG sessions or data collected from patients with a history of frequent seizures. It is being mined to increase the number of files in the corpus that have at least one seizure event. We expect v1.6.0 to be released before IEEE SPMB 2020. The TUAR Corpus is an open-source database that is currently available for use by any registered member of our consortium. To register and receive access, please follow the instructions provided at this web page: https://www.isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml. The data is located here: https://www.isip.piconepress.com/projects/tuh_eeg/downloads/tuh_eeg_artifact/v2.0.0/. 