NSF PAR Search | NSF Public Access Repository

Foundation models in robotics: Applications, challenges, and the future

https://doi.org/10.1177/02783649241281508

Firoozi, Roya; Tucker, Johnathan; Tian, Stephen; Majumdar, Anirudha; Sun, Jiankai; Liu, Weiyu; Zhu, Yuke; Song, Shuran; Kapoor, Ashish; Hausman, Karol; et al (September 2024, The International Journal of Robotics Research)

We survey applications of pretrained foundation models in robotics. Traditional deep learning models in robotics are trained on small datasets tailored for specific tasks, which limits their adaptability across diverse applications. In contrast, foundation models pretrained on internet-scale data appear to have superior generalization capabilities, and in some instances display an emergent ability to find zero-shot solutions to problems that are not present in the training data. Foundation models may hold the potential to enhance various components of the robot autonomy stack, from perception to decision-making and control. For example, large language models can generate code or provide common sense reasoning, while vision-language models enable open-vocabulary visual recognition. However, significant open research challenges remain, particularly around the scarcity of robot-relevant training data, safety guarantees and uncertainty quantification, and real-time execution. In this survey, we study recent papers that have used or built foundation models to solve robotics problems. We explore how foundation models contribute to improving robot capabilities in the domains of perception, decision-making, and control. We discuss the challenges hindering the adoption of foundation models in robot autonomy and provide opportunities and potential pathways for future advancements. The GitHub project corresponding to this paper can be found here: https://github.com/robotics-survey/Awesome-Robotics-Foundation-Models .
Risk-Calibrated Human-Robot Interaction via Set-Valued Intent Prediction

https://doi.org/10.15607/RSS.2024.XX.027

Lidard, Justin; Pham, Hang; Bachman, Ariel; Boateng, Bryan; Majumdar, Anirudha (July 2024, Robotics: Science and Systems Foundation)

Full Text Available
MonoNav: MAV Navigation via Monocular Depth Estimation and Reconstruction

Simon, Nathaniel; Majumdar, Anirudha (November 2023, International Symposium on Experimental Robotics (ISER))

A major challenge in deploying the smallest of Micro Aerial Vehicle (MAV) platforms (< 100 g) is their inability to carry sensors that provide high-resolution metric depth information (e.g., LiDAR or stereo cameras). Current systems rely on end-to-end learning or heuristic approaches that directly map images to control inputs, and struggle to fly fast in unknown environments. In this work, we ask the following question: using only a monocular camera, optical odometry, and offboard computation, can we create metrically accurate maps to leverage the powerful path planning and navigation approaches employed by larger state-of-the-art robotic systems to achieve robust autonomy in unknown environments? We present MonoNav: a fast 3D reconstruction and navigation stack for MAVs that leverages recent advances in depth prediction neural networks to enable metrically accurate 3D scene reconstruction from a stream of monocular images and poses. MonoNav uses off-the-shelf pre-trained monocular depth estimation and fusion techniques to construct a map, then searches over motion primitives to plan a collision-free trajectory to the goal. In extensive hardware experiments, we demonstrate how MonoNav enables the Crazyflie (a 37 g MAV) to navigate fast (0.5 m/s) in cluttered indoor environments. We evaluate MonoNav against a state-of-the-art end-to-end approach, and find that the collision rate in navigation is significantly reduced (by a factor of 4). This increased safety comes at the cost of conservatism in terms of a 22% reduction in goal completion.
Full Text Available
AdaptSim: Task-Driven Simulation Adaptation for Sim-to-Real Transfer

Ren, Allen Z; Dai, Hongkai; Burchfiel, Benjamin; Majumdar, Anirudha (November 2023, Conference on Robot Learning)

Simulation parameter settings such as contact models and object geometry approximations are critical to training robust robotic policies capable of transferring from simulation to real-world deployment. Previous approaches typically handcraft distributions over such parameters (domain randomization), or identify parameters that best match the dynamics of the real environment (system identification). However, there is often an irreducible gap between simulation and reality: attempting to match the dynamics between simulation and reality across all states and tasks may be infeasible and may not lead to policies that perform well in reality for a specific task. Addressing this issue, we propose AdaptSim, a new task-driven adaptation framework for sim-to-real transfer that aims to optimize task performance in target (real) environments -- instead of matching dynamics between simulation and reality. First, we meta-learn an adaptation policy in simulation using reinforcement learning for adjusting the simulation parameter distribution based on the current policy's performance in a target environment. We then perform iterative real-world adaptation by inferring new simulation parameter distributions for policy training, using a small amount of real data. We perform experiments in three robotic tasks: (1) swing-up of linearized double pendulum, (2) dynamic table-top pushing of a bottle, and (3) dynamic scooping of food pieces with a spatula. Our extensive simulation and hardware experiments demonstrate AdaptSim achieving 1-3x asymptotic performance and ∼2x real data efficiency when adapting to different environments, compared to methods based on Sys-ID and directly training the task policy in target environments.
Full Text Available
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners

Ren, Allen Z; Dixit, Anushri; Bodrova, Alexandra; Singh, Sumeet; Tu, Stephen; Brown, Noah; Xu, Peng; Takayama, Leila; Xia, Fei; Varley, Jake; et al (November 2023, Conference on Robot Learning (CoRL))

Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that may provide utility for robots, but remain prone to confidently hallucinated predictions. In this work, we present KnowNo, which is a framework for measuring and aligning the uncertainty of LLM-based planners such that they know when they don't know and ask for help when needed. KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion while minimizing human help in complex multi-step planning settings. Experiments across a variety of simulated and real robot setups that involve tasks with different modes of ambiguity (e.g., from spatial to numeric uncertainties, from human preferences to Winograd schemas) show that KnowNo performs favorably over modern baselines (which may involve ensembles or extensive prompt tuning) in terms of improving efficiency and autonomy, while providing formal assurances. KnowNo can be used with LLMs out of the box without model-finetuning, and suggests a promising lightweight approach to modeling uncertainty that can complement and scale with the growing capabilities of foundation models.
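The abstract above says KnowNo builds on conformal prediction so that a planner asks for help when it is uncertain. The following is a minimal sketch of generic split conformal prediction over multiple-choice options, not the authors' implementation. The function names, the nonconformity score (1 minus the confidence assigned to an option), and the rule "ask for help unless exactly one option remains" are all assumptions made for illustration.

```python
import math


def conformal_threshold(cal_scores, epsilon):
    """Split-conformal quantile of calibration nonconformity scores.

    cal_scores: one score per calibration example, here assumed to be
    1 - (confidence the planner gave to the correct option).
    epsilon: target error rate, e.g. 0.25 for 75% coverage.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - epsilon))  # rank of the order statistic
    if k > n:
        # Too few calibration points for this epsilon: keep every option.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(option_confidences, qhat):
    """Keep each option whose nonconformity score is at most qhat."""
    return [
        name
        for name, conf in option_confidences.items()
        if 1.0 - conf <= qhat
    ]


def needs_help(pred_set):
    """Hypothetical rule: ask a human unless exactly one option remains."""
    return len(pred_set) != 1


# Illustrative numbers only; they are not taken from the paper.
calibration = [0.0625, 0.125, 0.25, 0.3125, 0.375, 0.5, 0.75]
qhat = conformal_threshold(calibration, epsilon=0.25)
options = {"A": 0.75, "B": 0.625, "C": 0.125}
chosen = prediction_set(options, qhat)
print(chosen, needs_help(chosen))
```

Under the usual exchangeability assumption between calibration and test examples, a set built this way contains the correct option with probability at least 1 - epsilon. A set with more than one option is the signal, in this sketch, that the planner should ask for help.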
Full Text Available
Sim-to-Lab-to-Real: Safe reinforcement learning with shielding and generalization guarantees

https://doi.org/10.1016/j.artint.2022.103811

Hsu, Kai-Chieh; Ren, Allen Z.; Nguyen, Duy P.; Majumdar, Anirudha; Fisac, Jaime F. (January 2023, Artificial Intelligence)

Full Text Available
Fundamental Tradeoffs in Learning with Prior Information

Majumdar, Anirudha (January 2023, Proceedings of the International Conference on Machine Learning)

Full Text Available
Switching Attention in Time-Varying Environments via Bayesian Inference of Abstractions

Booker, Meghan; Majumdar, Anirudha (January 2023, Proceedings IEEE International Conference on Robotics and Automation)

Full Text Available
Failure Prediction with Statistical Guarantees for Vision-Based Robot Control

https://doi.org/10.15607/RSS.2022.XVIII.042

Farid, Alec; Snyder, David; Ren, Allen Z.; Majumdar, Anirudha (June 2022, Robotics: Science and Systems (RSS))

Full Text Available
Fundamental Performance Limits for Sensor-Based Robot Control and Policy Learning

https://doi.org/10.15607/RSS.2022.XVIII.036

Majumdar, Anirudha; Pacelli, Vincent (June 2022, Robotics: Science and Systems (RSS))

Full Text Available
