Title: Digital twins as a unifying framework for surgical data science: the enabling role of geometric scene understanding
Surgical data science is devoted to enhancing the quality, safety, and efficacy of interventional healthcare. While the use of powerful machine learning algorithms is becoming the standard approach for surgical data science, the underlying end-to-end task models directly infer high-level concepts (e.g., surgical phase or skill) from low-level observations (e.g., endoscopic video). This end-to-end nature of contemporary approaches makes the models vulnerable to non-causal relationships in the data and requires the re-development of all components if new surgical data science tasks are to be solved. The digital twin (DT) paradigm, an approach to building and maintaining computational representations of real-world scenarios, offers a framework for separating low-level processing from high-level inference. In surgical data science, the DT paradigm would allow for the development of generalist surgical data science approaches on top of the universal DT representation, deferring DT model building to low-level computer vision algorithms. In this latter effort of DT model creation, geometric scene understanding plays a central role in building and updating the digital model. In this work, we visit existing geometric representations, geometric scene understanding tasks, and successful applications for building primitive DT frameworks. Although the development of advanced methods is still hindered in surgical data science by the lack of annotations, the complexity and limited observability of the scene, emerging works on synthetic data generation, sim-to-real generalization, and foundation models offer new directions for overcoming these challenges and advancing the DT paradigm.
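The separation of concerns the abstract describes, in which low-level vision writes into a universal digital-twin representation and high-level task models read only from that representation, can be sketched in a few lines. This is a minimal illustrative sketch; all class and function names are assumptions, not the paper's API, and the phase rule is a toy stand-in for a learned model:

```python
from dataclasses import dataclass, field

@dataclass
class DigitalTwin:
    """Universal scene representation shared by all downstream tasks."""
    instruments: dict = field(default_factory=dict)  # instrument id -> 6-DoF pose
    phase_cues: list = field(default_factory=list)

def update_twin(twin, frame_detections):
    """Low-level geometric scene understanding writes into the twin."""
    for inst_id, pose in frame_detections.items():
        twin.instruments[inst_id] = pose
    return twin

def infer_phase(twin):
    """High-level task model reads only the twin, never the raw video."""
    # Toy rule: two or more instruments present -> 'dissection' phase.
    return "dissection" if len(twin.instruments) >= 2 else "idle"

twin = DigitalTwin()
update_twin(twin, {"grasper": (0.1, 0.2, 0.3, 0.0, 0.0, 0.0)})
print(infer_phase(twin))  # idle
update_twin(twin, {"scissors": (0.4, 0.1, 0.2, 0.0, 0.0, 0.0)})
print(infer_phase(twin))  # dissection
```

Because `infer_phase` depends only on the twin's state, a new task model can be added without touching the perception code, which is the reusability argument the abstract makes.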
Award ID(s):
2239077
PAR ID:
10520973
Publisher / Repository:
OAE Publishing
Date Published:
Journal Name:
Artificial Intelligence Surgery
Volume:
4
Issue:
3
ISSN:
2771-0408
Page Range / eLocation ID:
109 to 38
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper proposes a low-cost interface and refined digital twin for the Raven-II surgical robot. Previous simulations of the Raven-II, e.g. via the Asynchronous Multibody Framework (AMBF), presented salient drawbacks, including control inputs inconsistent with Raven-II software and a lack of stable, high-fidelity physical contact simulations. This work bridges both gaps: (1) enabling robust, simulated contact mechanics for dynamic physical interactions with the Raven-II, and (2) developing a universal input format for both simulated and physical platforms. The method furthermore proposes a low-cost, commodity game-controller interface for controlling both virtual and real realizations of the Raven-II, thus greatly reducing the barrier to access for Raven-II research and collaboration. Overall, this work aims to eliminate the inconsistencies between simulated and real representations of the Raven-II. Such a development can expand the reach of surgical robotics research. Namely, providing end-to-end transparency between the simulated AMBF and physical Raven-II platforms enables a software testbed previously unavailable, e.g. for training real surgeons, for creating digital synthetic datasets, or for prototyping novel architectures like shared control strategies. Experiments validate this transparency by comparing joint trajectories between digital twin and physical testbed given identical inputs. This work may be extended and incorporated into recent efforts in developing modular or common software infrastructures for both simulation and control of real robotic devices, such as the Collaborative Robotics Toolkit (CRTK).
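The "universal input format" idea above, where one command record drives either the AMBF simulation or the physical robot and only the backend differs, can be illustrated as follows. The field names, scaling, and backend classes are assumptions for illustration, not the paper's actual interface:

```python
# Hypothetical sketch: a controller-agnostic command record consumed by
# interchangeable simulated and physical backends.

def controller_to_command(axes, scale=0.01):
    """Map normalized game-controller axes in [-1, 1] to Cartesian deltas."""
    dx, dy, dz = (scale * a for a in axes)
    return {"type": "cart_delta", "delta": (dx, dy, dz)}

class SimBackend:
    """Stand-in for the AMBF simulation transport."""
    def send(self, cmd):
        return ("sim", cmd["delta"])

class RobotBackend:
    """Stand-in for the physical Raven-II transport."""
    def send(self, cmd):
        return ("real", cmd["delta"])

cmd = controller_to_command((0.5, -0.2, 0.0))
for backend in (SimBackend(), RobotBackend()):
    print(backend.send(cmd))
```

The design point is that the joint-trajectory comparison the paper performs only makes sense once both platforms consume an identical command stream, as in this sketch.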
  2. While humans can successfully navigate using abstractions, ignoring details that are irrelevant to the task at hand, most of the existing approaches in robotics require detailed environment representations which consume a significant amount of sensing, computing, and storage; these issues become particularly important in resource-constrained settings with limited power budgets. Deep learning methods can learn from prior experience to abstract knowledge from novel environments, and use it to more efficiently execute tasks such as frontier exploration, object search, or scene understanding. We propose BoxMap, a Detection-Transformer-based architecture that takes advantage of the structure of the sensed partial environment to update a topological graph of the environment as a set of semantic entities (rooms and doors) and their relations (connectivity). The predictions from low-level measurements can be leveraged to achieve high-level goals with lower computational costs than methods based on detailed representations. As an example application, we consider a robot equipped with a 2-D laser scanner tasked with exploring a residential building. Our BoxMap representation scales quadratically with the number of rooms (with a small constant), resulting in significant savings over a full geometric map. Moreover, our high-level topological representation results in 30.9% shorter trajectories in the exploration task with respect to a standard method. Code is available at: bit.ly/3F6w2Yl. 
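A topological map of semantic entities and their connectivity, as BoxMap predicts, can be represented compactly without any dense grid. The sketch below assumes a plausible structure (rooms as boxes, doors as edges); it is illustrative only and not BoxMap's actual output format:

```python
# Minimal sketch of a rooms-and-doors topological map: semantic entities
# plus connectivity, instead of a detailed geometric representation.

class TopoMap:
    def __init__(self):
        self.rooms = {}     # room name -> bounding box (xmin, ymin, xmax, ymax)
        self.edges = set()  # frozenset({room_a, room_b}) connected by a door

    def add_room(self, name, box):
        self.rooms[name] = box

    def add_door(self, a, b):
        self.edges.add(frozenset((a, b)))

    def neighbors(self, room):
        """Rooms reachable from `room` through a single door."""
        return {next(iter(e - {room})) for e in self.edges if room in e}

m = TopoMap()
m.add_room("hall", (0, 0, 4, 2))
m.add_room("kitchen", (4, 0, 8, 4))
m.add_door("hall", "kitchen")
print(sorted(m.neighbors("hall")))  # ['kitchen']
```

Storage here grows with the number of rooms and doors rather than with the sensed area, which is the source of the savings over a full geometric map that the abstract reports.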
  3. Deep learning has improved state-of-the-art results in many important fields, and has been the subject of much research in recent years, leading to the development of several systems for facilitating deep learning. Current systems, however, mainly focus on model building and training phases, while the issues of data management, model sharing, and lifecycle management are largely ignored. The deep learning modeling lifecycle generates a rich set of data artifacts, e.g., learned parameters and training logs, and it comprises several frequently conducted tasks, e.g., to understand the model behaviors and to try out new models. Dealing with such artifacts and tasks is cumbersome and largely left to the users. This paper describes our vision and implementation of a data and lifecycle management system for deep learning. First, we generalize model exploration and model enumeration queries from commonly conducted tasks by deep learning modelers, and propose a high-level domain specific language (DSL), inspired by SQL, to raise the abstraction level and thereby accelerate the modeling process. Second, to manage the variety of data artifacts, especially the large amount of checkpointed float parameters, we design a novel model versioning system (dlv), and a read-optimized parameter archival storage system (PAS) that minimizes storage footprint and accelerates query workloads with minimal loss of accuracy. PAS archives versioned models using deltas in a multi-resolution fashion by separately storing the less significant bits, and features a novel progressive query (inference) evaluation algorithm. Third, we develop efficient algorithms for archiving versioned models using deltas under co-retrieval constraints. We conduct extensive experiments over several real datasets from the computer vision domain to show the efficiency of the proposed techniques.
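The delta idea behind the parameter archival described above is simple to state: store a base version in full and each subsequent version as its difference from the base. The toy below shows only that core idea; PAS additionally splits off less-significant bits and supports progressive queries, none of which is modeled here:

```python
import numpy as np

# Hedged sketch of delta-based archival of versioned model parameters.
# `archive` and `restore` are illustrative names, not the PAS API.

def archive(base, new):
    """Store a new parameter version as a delta against a base version."""
    return new - base

def restore(base, delta):
    """Reconstruct the archived version from base + delta."""
    return base + delta

base = np.array([0.50, 1.25, -0.75])   # base checkpoint
v2 = np.array([0.52, 1.20, -0.70])     # next checkpoint
delta = archive(base, v2)
assert np.allclose(restore(base, delta), v2)
print("ok")
```

When successive checkpoints are close, the deltas are small in magnitude and compress well, which is what makes delta storage attractive for checkpointed float parameters.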
  4. Large models have shown generalization across datasets for many low-level vision tasks, like depth estimation, but no such general models exist for scene flow. Even though scene flow prediction has wide potential, its practical use is limited because of the lack of generalization of current predictive models. We identify three key challenges and propose solutions for each. First, we create a method that jointly estimates geometry and motion for accurate prediction. Second, we alleviate scene flow data scarcity with a data recipe that affords us 1M annotated training samples across diverse synthetic scenes. Third, we evaluate different parameterizations for scene flow prediction and adopt a natural and effective parameterization. Our model outperforms existing methods as well as baselines built on large-scale models in terms of 3D end-point error, and shows zero-shot generalization to the casually captured videos from DAVIS and the robotic manipulation scenes from RoboTAP. Overall, our approach makes scene flow prediction more practical in the wild. Website: https://research.nvidia.com/labs/lpr/zero msf/
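The "parameterization" question the abstract raises concerns how per-point 3D motion is expressed, for example as a direct 3D displacement versus as 2D optical flow plus a depth change lifted through the camera model. The sketch below contrasts these two common options under an assumed pinhole camera; it is illustrative and does not reproduce the paper's chosen parameterization:

```python
import numpy as np

# Assumed pinhole intrinsics for illustration only.
def backproject(u, v, z, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Back-project a pixel (u, v) at depth z to a 3D camera-frame point."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

def flow_3d_direct(p0, displacement):
    """Parameterization (a): scene flow as a direct 3D displacement."""
    return p0 + displacement

def flow_from_2d(u0, v0, z0, du, dv, dz):
    """Parameterization (b): optical flow (du, dv) + depth change dz,
    converted to the equivalent 3D motion via back-projection."""
    p0 = backproject(u0, v0, z0)
    p1 = backproject(u0 + du, v0 + dv, z0 + dz)
    return p1 - p0

# A point at the principal point, 2 m deep, moving 10 px right and 0.1 m deeper.
sf = flow_from_2d(320.0, 240.0, 2.0, 10.0, 0.0, 0.1)
print(np.round(sf, 3))  # [0.042 0.    0.1  ]
```

The two parameterizations describe the same motion, but they distribute prediction error differently, which is why evaluating them separately, as the paper does, matters.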
  5. Rapid advances in Digital Twin (DT) technology provide an unprecedented opportunity to derive data-enabled intelligence for smart manufacturing. However, traditional DT work concentrates on real-time data streaming, dashboard visualization, and predictive analytics, with less attention to multi-agent intelligence. This limitation hampers the development of agentic intelligence for decentralized decision making in complex manufacturing environments. Therefore, this paper presents a Cognitive Digital Twin (CDT) approach for multi-objective production scheduling through decentralized, collaborative multi-agent learning. First, we propose to construct models of heterogeneous agents (e.g., machines, jobs, automated guided vehicles, and automated storage and retrieval systems) that interact with physical and digital twins. Second, multi-objective optimization is embedded in CDT to align production schedules with diverse and often conflicting objectives such as throughput, task transition efficiency, and workload balance. Third, we develop a multi-agent learning approach to enable decentralized decision making in response to unexpected disruptions and dynamic demands. Each agent operates independently and collaboratively with cognitive capabilities, including perception, learning, and reasoning, to optimize the individual agentic objective while contributing to overarching system-wide goals. Finally, the proposed CDT is evaluated and validated with experimental studies in a learning factory environment. Experimental results demonstrate that CDT improves operational performance in terms of task allocation, resource utilization, and system resilience compared to traditional centralized approaches. This initial study of CDT highlights the potential to bring multi-agent cognitive intelligence into next-generation smart manufacturing.
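Decentralized decision making of the kind described above can be illustrated with a toy auction: each machine agent independently bids on pending jobs using only its local workload, and the lowest bid wins. This shows agent-level decision making only; the paper's CDT adds learning, multi-objective optimization, and disruption handling, and all names below are illustrative:

```python
# Toy sketch of decentralized job scheduling via local bidding.

def bid(machine_load, job_time):
    """An agent's bid: its completion time if it takes the job."""
    return machine_load + job_time

def schedule(jobs, machines):
    """Assign each job to the machine with the lowest bid (ties -> first)."""
    loads = {m: 0.0 for m in machines}
    assignment = {}
    for job, t in jobs:
        winner = min(machines, key=lambda m: bid(loads[m], t))
        assignment[job] = winner
        loads[winner] += t
    return assignment, loads

jobs = [("j1", 3.0), ("j2", 2.0), ("j3", 1.0)]
assignment, loads = schedule(jobs, ["m1", "m2"])
print(assignment)  # {'j1': 'm1', 'j2': 'm2', 'j3': 'm2'}
```

No central scheduler sees the whole problem; each assignment emerges from per-agent bids, which is the decentralization property (here in greedy, non-learning form) that the CDT approach builds on.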