Title: Exploiting the experts: Learning to control unknown SISO feedback linearizable systems from expert demonstrations
It was shown, in recent work by the authors, that it is possible to learn an asymptotically stabilizing controller from a small number of demonstrations performed by an expert on a feedback linearizable system. These results rely on knowledge of the plant dynamics to assemble the learned controller from the demonstrations. In this paper we show how to leverage recent results on data-driven control to dispense with the need to use the plant model. By bringing together these two methodologies, learning from demonstrations and data-driven control, this paper provides a technique that enables the control of unknown nonlinear feedback linearizable systems solely based on a small number of expert demonstrations.
Award ID(s):
1705135
PAR ID:
10411916
Journal Name:
2021 60th IEEE Conference on Decision and Control (CDC)
Page Range / eLocation ID:
5789 to 5794
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we revisit the problem of learning a stabilizing controller from a finite number of demonstrations by an expert. By focusing on feedback linearizable systems, we show how to combine expert demonstrations into a stabilizing controller, provided that demonstrations are sufficiently long and there are at least n+1 of them, where n is the number of states of the system being controlled. The results are experimentally demonstrated on a CrazyFlie 2.0 quadrotor. 
  2. In this paper, a hybrid shared controller is proposed for assisting human novice users to emulate human expert users within a human-automation interaction framework. This work is motivated by the goal of letting human novice users learn the skills of human expert users using automation as a medium. Automation interacts with human users in two ways: it learns how to optimally control the system from the experts' demonstrations by offline computation, and it assists the novice in real time without excessive intervention, based on inference of the novice's skill level within our properly designed shared controller. Automation takes more control authority when the novice's skill level is poor, and it allows the novice more control authority when his/her skill level is close to that of the expert, so that the novice can learn from his/her own control experience. The proposed scheme is shown to improve system performance while minimizing intervention from the automation, as demonstrated via an illustrative human-in-the-loop application example.
  3. Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible through supervised finetuning or RLHF, but requires prohibitively large datasets for new ad-hoc tasks. We argue that it is instead possible to align an LLM to a specific setting by leveraging a very small number (< 10) of demonstrations as feedback. Our method, Demonstration ITerated Task Optimization (DITTO), directly aligns language model outputs to a user's demonstrated behaviors. Derived using ideas from online imitation learning, DITTO cheaply generates online comparison data by treating users' demonstrations as preferred over output from the LLM and its intermediate checkpoints. Concretely, DITTO operates by having an LLM generate examples that are presumed to be inferior to expert demonstrations. The method iteratively constructs pairwise preference relationships between these LLM-generated samples and expert demonstrations, potentially including comparisons between different training checkpoints. These constructed preference pairs are then used to train the model using a preference optimization algorithm (e.g. DPO). We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts. Additionally, we conduct a user study soliciting a range of demonstrations from participants (N = 16). Across our benchmarks and user study, we find that win-rates for DITTO outperform few-shot prompting, supervised fine-tuning, and other self-play methods by an average of 19 percentage points. By using demonstrations as feedback directly, DITTO offers a novel method for effective customization of LLMs.
  4. In this paper, we study the feedback synthesis problem for steering the joint state density or ensemble subject to multi-input state feedback linearizable dynamics. This problem is of interest to many practical applications including that of dynamically shaping a robotic swarm. Our results here show that it is possible to exploit the structural nonlinearities to derive the feedback controllers steering the joint density from a prescribed shape to another while minimizing the expected control effort to do so. The developments herein build on our previous work, and extend the theory of the Schrödinger bridge problem subject to feedback linearizable dynamics.