We survey applications of pretrained foundation models in robotics. Traditional deep learning models in robotics are trained on small datasets tailored to specific tasks, which limits their adaptability across diverse applications. In contrast, foundation models pretrained on internet-scale data appear to have superior generalization capabilities and, in some instances, display an emergent ability to find zero-shot solutions to problems absent from their training data. Foundation models hold the potential to enhance various components of the robot autonomy stack, from perception to decision-making and control. For example, large language models can generate code or provide common-sense reasoning, while vision-language models enable open-vocabulary visual recognition. However, significant open research challenges remain, particularly around the scarcity of robot-relevant training data, safety guarantees and uncertainty quantification, and real-time execution. In this survey, we study recent papers that have used or built foundation models to solve robotics problems. We explore how foundation models contribute to improving robot capabilities in the domains of perception, decision-making, and control. We discuss the challenges hindering the adoption of foundation models in robot autonomy and outline opportunities and potential pathways for future advances. The GitHub project corresponding to this paper can be found at https://github.com/robotics-survey/Awesome-Robotics-Foundation-Models.
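As a schematic illustration of the open-vocabulary recognition idea mentioned in the abstract above (not code from the survey itself), a vision-language model embeds an image and arbitrary text labels into a shared space and picks the label with the highest cosine similarity. The embeddings below are invented stand-ins for a real model's outputs.

```python
import numpy as np

def open_vocab_classify(image_emb, label_embs, labels):
    """Return the label whose text embedding is most similar
    (by cosine similarity) to the image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    sims = txt @ img  # cosine similarities, one per candidate label
    return labels[int(np.argmax(sims))]

# Toy embeddings standing in for a real vision-language model's outputs.
labels = ["coffee mug", "screwdriver", "apple"]
label_embs = np.array([[0.9, 0.1, 0.0],
                       [0.0, 1.0, 0.2],
                       [0.1, 0.0, 0.95]])
image_emb = np.array([0.85, 0.15, 0.05])  # an image that "looks like" a mug
print(open_vocab_classify(image_emb, label_embs, labels))  # → coffee mug
```

Because the label set is just a list of strings, new categories can be recognized at inference time without retraining — the property that makes such models attractive for open-world robot perception.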
Computer Vision Applications in Underwater Robotics and Oceanography
Real-time computer vision and remote visual sensing platforms are increasingly used in numerous underwater applications such as shipwreck mapping, subsea inspection, coastal water monitoring, surveillance, coral reef surveying, invasive fish tracking, and more. Recent advancements in robot vision and powerful single-board computers have paved the way for an imminent revolution in the next generation of subsea technologies. In this chapter, we present these exciting emerging applications and discuss relevant open problems and practical considerations. First, we delineate the specific environmental and operational challenges of underwater vision and highlight some prominent scientific and engineering solutions to ensure robust visual perception. We specifically focus on the characteristics of underwater light propagation from the perspective of image formation and photometry. We also discuss recent developments and trends in the underwater imaging literature to facilitate the restoration, enhancement, and filtering of inherently noisy visual data. Subsequently, we demonstrate how these ideas are extended and deployed in the perception pipelines of Autonomous Underwater Vehicles (AUVs) and Remotely Operated Vehicles (ROVs). In particular, we present several use cases for marine life monitoring and conservation, human-robot cooperative missions for inspecting submarine cables and archaeological sites, subsea structure or cave mapping, aquaculture, and marine ecology. We discuss in detail how breakthroughs in deep visual learning and on-device AI are transforming the perception, planning, localization, and navigation capabilities of visually guided underwater robots. Along this line, we also highlight prospective future research directions and open problems at the intersection of the computer vision and underwater robotics domains.
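The image-formation characteristics mentioned above are often summarized by the simplified underwater model I_c = J_c·e^(−β_c·d) + B_c·(1 − e^(−β_c·d)) per color channel c, where J is scene radiance, β the attenuation coefficient, B the veiling (background) light, and d the range. As an illustrative sketch (not specific to this chapter), inverting the model restores the radiance when β, B, and d are known or estimated; the coefficients below are invented for the round-trip demonstration.

```python
import numpy as np

def restore(I, beta, B, d, eps=1e-6):
    """Invert the simplified underwater image-formation model
    I = J * exp(-beta * d) + B * (1 - exp(-beta * d)), per channel."""
    t = np.exp(-beta * d)                  # transmission along the line of sight
    return (I - B * (1.0 - t)) / np.maximum(t, eps)

# Round trip: attenuate a known scene radiance, then restore it.
J = np.array([0.6, 0.4, 0.2])              # true radiance (R, G, B)
beta = np.array([0.8, 0.3, 0.1])           # red attenuates fastest underwater
B = np.array([0.1, 0.3, 0.4])              # bluish-green veiling light
d = 2.0                                    # range to the scene, meters
I = J * np.exp(-beta * d) + B * (1 - np.exp(-beta * d))
print(np.allclose(restore(I, beta, B, d), J))  # → True
```

In practice β, B, and d must themselves be estimated from the imagery, which is where much of the restoration literature surveyed in the chapter focuses.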
- Award ID(s):
- 2330416
- PAR ID:
- 10653547
- Publisher / Repository:
- Chapman and Hall/CRC
- Date Published:
- Page Range / eLocation ID:
- 173 to 204
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Localization in underwater environments is a fundamental problem for autonomous vehicles, with important applications such as underwater ecology monitoring, infrastructure maintenance, and conservation of marine species. However, several traditional sensing modalities used for localization in outdoor robotics (e.g., GPS, compasses, LIDAR, and vision) are compromised in underwater scenarios. In addition, problems such as aliasing, drifting, and dynamic changes in the environment also affect state estimation in aquatic settings. Motivated by these issues, we propose novel state estimation algorithms for underwater vehicles that read noisy sensor observations of spatio-temporally varying fields in water (e.g., temperature, pH, chlorophyll-A, and dissolved oxygen) and have access to a model of the evolution of the fields as a set of partial differential equations. We frame underwater robot localization in an optimization framework and formulate, study, and solve the state-estimation problem. First, we find the most likely position given a sequence of observations, and we prove upper and lower bounds for the estimation error given information about the error and the fields. Our methodology finds the actual location within a 95% confidence interval around the median in over 90% of cases across different conditions and extensions.
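As a minimal illustration of the idea in this abstract (not the paper's actual algorithm), one can grid-search for the position that best explains a sequence of noisy readings of a known scalar field, i.e., minimize the squared residual between observations and the field model — the maximum-likelihood estimate under i.i.d. Gaussian sensor noise. The toy field, noise level, and trajectory offsets below are invented; a sequence of readings along known displacements is used because a single snapshot can alias onto any point of the field's level set.

```python
import numpy as np

rng = np.random.default_rng(0)

def field(x, y):
    """A known field model; here a static toy temperature field in °C."""
    return 20.0 + 3.0 * np.sin(0.5 * x) + 2.0 * np.cos(0.4 * y)

def estimate_position(observations, offsets, grid):
    """Return the grid point minimizing the squared error between the
    observed readings and the model's predictions along the trajectory."""
    costs = [
        sum((o - field(x + dx, y + dy)) ** 2
            for o, (dx, dy) in zip(observations, offsets))
        for x, y in grid
    ]
    return grid[int(np.argmin(costs))]

# Robot at (2.0, 3.0) takes noisy readings at known relative offsets.
offsets = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (0.0, 0.5), (0.0, 1.0)]
true_pos = (2.0, 3.0)
observations = [field(true_pos[0] + dx, true_pos[1] + dy) + rng.normal(0.0, 0.02)
                for dx, dy in offsets]
grid = [(x, y) for x in np.linspace(0.0, 5.0, 51)
               for y in np.linspace(0.0, 5.0, 51)]
x_hat, y_hat = estimate_position(observations, offsets, grid)
print(x_hat, y_hat)  # should print a point near (2.0, 3.0)
```

The paper goes further by modeling the field's evolution with PDEs and bounding the estimation error, but the core optimization has this shape.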
-
Billard, A.; Asfour, T.; Khatib, O. (Eds.) In this paper, we discuss how to effectively map an underwater structure with a team of robots, considering the specific challenges posed by the underwater environment. The overarching goal of this work is to produce high-definition, accurate, photorealistic representations of underwater structures. Due to the many limitations of vision underwater, operating at a distance from the structure results in degraded images that lack detail, while operating close to the structure increases the accumulated uncertainty due to the limited viewing area, which causes drift. We propose a multi-robot mapping framework that utilizes two types of robots: proximal observers, which map close to the structure, and distal observers, which provide localization for the proximal observers and bird's-eye-view situational awareness. The paper presents the fundamental components necessary to enable the proposed framework, together with current results from real shipwrecks and simulations, including robust state estimation, real-time 3D mapping, and active perception navigation strategies for the two types of robots. Finally, the paper outlines interesting research directions and plans toward a completely integrated framework that allows robots to map in harsh environments.
-
The ocean is experiencing unprecedented rapid change, and visually monitoring marine biota at the spatiotemporal scales needed for responsible stewardship is a formidable task. As baselines are sought by the research community, the volume and rate of this required data collection rapidly outpace our ability to process and analyze them. Recent advances in machine learning enable fast, sophisticated analysis of visual data, but have had limited success in the ocean due to a lack of data standardization, insufficient formatting, and the demand for large, labeled datasets. To address this need, we built FathomNet, an open-source image database that standardizes and aggregates expertly curated labeled data. FathomNet has been seeded with existing iconic and non-iconic imagery of marine animals, underwater equipment, debris, and other concepts, and allows for future contributions from distributed data sources. We demonstrate how FathomNet data can be used to train and deploy models on other institutional video to reduce annotation effort, and to enable automated tracking of underwater concepts when integrated with robotic vehicles. As FathomNet continues to grow and incorporate more labeled data from the community, we can accelerate the processing of visual data toward a healthy and sustainable global ocean.
-
Modern social media platforms like Twitch, YouTube, etc., embody an open space for content creation and consumption. However, an unintended consequence of such content democratization is the proliferation of toxicity and abuse to which content creators are subjected. Commercial and volunteer content moderators play an indispensable role in identifying bad actors and minimizing the scale and degree of harmful content. Moderation tasks are often laborious and complex, and even if semi-automated, they involve high-consequence human decisions that affect the safety and popular perception of the platforms. In this paper, through an interdisciplinary collaboration among researchers from social science, human-computer interaction, and visualization, we present a systematic understanding of how visual analytics can help in human-in-the-loop content moderation. We contribute a characterization of the data-driven problems and needs for proactive moderation, and present a mapping between those needs and visual analytics tasks through a task abstraction framework. We discuss how the task abstraction framework can be used for transparent moderation, for designing interventions for moderators' well-being, and ultimately for creating futuristic human-machine interfaces for data-driven content moderation.