Recent work by the authors showed that an asymptotically stabilizing controller can be learned from a small number of demonstrations performed by an expert on a feedback linearizable system. These results rely on knowledge of the plant dynamics to assemble the learned controller from the demonstrations. In this paper we show how to leverage recent results on data-driven control to dispense with the plant model. By bringing these two methodologies, learning from demonstrations and data-driven control, together, this paper provides a technique for controlling unknown nonlinear feedback linearizable systems based solely on a small number of expert demonstrations.
Watch and Learn: Learning to control feedback linearizable systems from expert demonstrations
In this paper, we revisit the problem of learning a stabilizing controller from a finite number of demonstrations by an expert. By focusing on feedback linearizable systems, we show how to combine expert demonstrations into a stabilizing controller, provided that demonstrations are sufficiently long and there are at least n+1 of them, where n is the number of states of the system being controlled. The results are experimentally demonstrated on a CrazyFlie 2.0 quadrotor.
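The role of the n+1 demonstrations can be illustrated with a minimal sketch: given one state/input sample from each of n+1 expert demonstrations, one can solve for affine weights that reproduce the current state and reuse those weights on the expert inputs. The function name `demo_controller` and the use of a single sample per demonstration are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

def demo_controller(x, demo_states, demo_inputs):
    """Hypothetical sketch: combine n+1 expert samples into a control input.

    demo_states: (n+1, n) array, one state sampled from each demonstration.
    demo_inputs: (n+1, m) array, the expert inputs at those states.
    """
    n_plus_1 = demo_states.shape[0]
    # Solve for affine weights a with demo_states.T @ a = x and sum(a) = 1.
    A = np.vstack([demo_states.T, np.ones(n_plus_1)])  # (n+1, n+1)
    b = np.append(x, 1.0)
    a = np.linalg.solve(A, b)
    # Reuse the same weights on the recorded expert inputs.
    return a @ demo_inputs
```

For a linear expert u = Kx, any affine combination of demonstration states maps to the same combination of inputs, so the sketch reproduces Kx exactly at unvisited states.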
- Award ID(s):
- 1705135
- PAR ID:
- 10411915
- Date Published:
- Journal Name:
- 2022 International Conference on Robotics and Automation (ICRA)
- Page Range / eLocation ID:
- 8577 to 8583
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible through supervised finetuning or RLHF, but requires prohibitively large datasets for new ad-hoc tasks. We argue that it is instead possible to align an LLM to a specific setting by leveraging a very small number (< 10) of demonstrations as feedback. Our method, Demonstration ITerated Task Optimization (DITTO), directly aligns language model outputs to a user's demonstrated behaviors. Derived using ideas from online imitation learning, DITTO cheaply generates online comparison data by treating users' demonstrations as preferred over output from the LLM and its intermediate checkpoints. Concretely, DITTO operates by having an LLM generate examples that are presumed to be inferior to expert demonstrations. The method iteratively constructs pairwise preference relationships between these LLM-generated samples and expert demonstrations, potentially including comparisons between different training checkpoints. These constructed preference pairs are then used to train the model with a preference optimization algorithm (e.g. DPO). We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts. Additionally, we conduct a user study soliciting a range of demonstrations from participants (N = 16). Across our benchmarks and user study, we find that win-rates for DITTO outperform few-shot prompting, supervised fine-tuning, and other self-play methods by an average of 19 percentage points. By using demonstrations as feedback directly, DITTO offers a novel method for effective customization of LLMs.
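The pairwise-preference construction described above can be sketched as follows. The helper `build_ditto_pairs` and its exact pairing scheme (every expert demonstration preferred over every model sample, later-checkpoint samples preferred over earlier ones) are assumptions for illustration, not the paper's implementation.

```python
def build_ditto_pairs(expert_demos, checkpoint_samples):
    """Hypothetical sketch of DITTO-style preference-pair construction.

    expert_demos: list of expert-written outputs.
    checkpoint_samples: list (ordered oldest -> newest checkpoint) of lists
        of model generations from each intermediate checkpoint.
    Returns (preferred, dispreferred) pairs for a DPO-style trainer.
    """
    pairs = []
    # Expert demonstrations are preferred over every model generation.
    for demo in expert_demos:
        for samples in checkpoint_samples:
            for s in samples:
                pairs.append((demo, s))
    # Samples from later checkpoints are preferred over earlier ones.
    for i in range(len(checkpoint_samples)):
        for j in range(i):
            for win in checkpoint_samples[i]:
                for lose in checkpoint_samples[j]:
                    pairs.append((win, lose))
    return pairs
```

The resulting pairs would be fed to a preference optimization algorithm such as DPO, which is the training step the abstract names.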
-
In this paper, a hybrid shared controller is proposed for assisting human novice users to emulate human expert users within a human-automation interaction framework. This work is motivated by the goal of letting human novice users learn the skills of human expert users, using automation as a medium. Automation interacts with human users in two ways: it learns how to optimally control the system from the expert's demonstrations via offline computation, and it assists the novice in real time, without an excessive amount of intervention, based on an inference of the novice's skill level within our properly designed shared controller. Automation takes more control authority when the novice's skill level is poor, or it allows the novice more control authority when his/her skill level is close to that of the expert, letting the novice learn from his/her own control experience. The proposed scheme is shown to improve system performance while minimizing intervention from the automation, which is demonstrated via an illustrative human-in-the-loop application example.
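The authority-sharing idea can be sketched as a simple convex blend of the two inputs. The function `shared_control` and the linear mapping from an estimated skill level to automation authority are illustrative assumptions, not the proposed controller.

```python
def shared_control(u_novice, u_expert, skill):
    """Hypothetical sketch of skill-based control-authority blending.

    skill: estimated novice skill level in [0, 1], where 1 means the
        novice's input already matches the expert's.
    The automation's authority shrinks as the novice's skill grows,
    so a skilled novice learns from his/her own control experience.
    """
    alpha = 1.0 - skill  # automation's share of control authority
    return alpha * u_expert + (1.0 - alpha) * u_novice
```

A low-skill novice (skill near 0) is overridden by the automation, while a near-expert novice (skill near 1) retains almost full authority, matching the behavior the abstract describes.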
-
Multi-robot cooperative control has been extensively studied using model-based distributed control methods. However, such control methods rely on sensing and perception modules in a sequential pipeline design, and the separation of perception and controls may cause processing latencies and compounding errors that affect control performance. End-to-end learning overcomes this limitation by implementing direct learning from onboard sensing data, with control commands output to the robots. Challenges exist in end-to-end learning for multi-robot cooperative control, and previous results are not scalable. We propose in this article a novel decentralized cooperative control method for multi-robot formations using deep neural networks, in which inter-robot communication is modeled by a graph neural network (GNN). Our method takes LiDAR sensor data as input, and the control policy is learned from demonstrations that are provided by an expert controller for decentralized formation control. Although it is trained with a fixed number of robots, the learned control policy is scalable. Evaluation in a robot simulator demonstrates the triangular formation behavior of multi-robot teams of different sizes under the learned control policy.
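One round of the neighbor aggregation that a GNN layer performs over the robot communication graph might look like the following NumPy sketch, using mean aggregation and a ReLU. The layer structure, weight names, and aggregation choice are assumptions for illustration, not the article's architecture; scalability to team size comes from the weights being shared across robots.

```python
import numpy as np

def gnn_layer(features, adjacency, W_self, W_neigh):
    """Hypothetical sketch of one GNN message-passing round.

    features: (num_robots, d) local feature embeddings (e.g. from LiDAR).
    adjacency: (num_robots, num_robots) 0/1 communication graph.
    W_self, W_neigh: (d, d') weights shared by all robots, so the same
        layer applies to teams of any size.
    """
    # Mean of neighbor features (guard against isolated robots).
    deg = np.maximum(adjacency.sum(axis=1, keepdims=True), 1)
    neigh = (adjacency @ features) / deg
    # Combine own features with aggregated neighbor messages, then ReLU.
    return np.maximum(features @ W_self + neigh @ W_neigh, 0.0)
```

Stacking such layers and decoding each robot's final embedding into a velocity command would give a decentralized policy of the kind the abstract describes.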
-
In-Context Learning (ICL) empowers Large Language Models (LLMs) to tackle various tasks by providing input-output examples as additional inputs, referred to as demonstrations. Nevertheless, the performance of ICL can be easily impacted by the quality of the selected demonstrations. Existing efforts generally learn a retriever model to score each demonstration for selecting suitable demonstrations; however, the effect is suboptimal due to the large search space and the noise from unhelpful demonstrations. In this study, we introduce MoD, which partitions the demonstration pool into groups, each governed by an expert, to reduce the search space. We further design an expert-wise training strategy to alleviate the impact of unhelpful demonstrations when optimizing the retriever model. During inference, experts collaboratively retrieve demonstrations for the input query to enhance ICL performance. We validate MoD via experiments across a range of NLP datasets and tasks, demonstrating its state-of-the-art performance and shedding new light on the future design of retrieval methods for ICL.
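The group-wise retrieval step described above can be sketched as follows. The function `mod_retrieve`, the per-group scorers, and the top-k selection are illustrative assumptions rather than the paper's actual retriever model; the point is only that scoring within each expert's group shrinks the search space relative to scoring the whole pool.

```python
def mod_retrieve(query, groups, scorers, k_per_group=1):
    """Hypothetical sketch of expert-wise demonstration retrieval.

    groups: list of demonstration lists, one per expert (a partition
        of the demonstration pool).
    scorers: one scoring function per group; score(query, demo) should
        be higher for demonstrations more helpful to the query.
    Returns the top-k demonstrations from each group, so the experts
    collaboratively assemble the in-context examples for the query.
    """
    retrieved = []
    for demos, score in zip(groups, scorers):
        ranked = sorted(demos, key=lambda d: score(query, d), reverse=True)
        retrieved.extend(ranked[:k_per_group])
    return retrieved
```

The retrieved demonstrations would then be concatenated into the prompt as the ICL examples for the input query.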