We develop an approach to improve the learning capabilities of robotic systems by combining learned predictive models with experience-based state-action policy mappings. Predictive models provide an understanding of the task and the dynamics, while experience-based (model-free) policy mappings encode favorable actions that override planned actions. We refer to our approach of systematically combining model-based and model-free learning methods as hybrid learning. Our approach efficiently learns motor skills and improves the performance of predictive models and experience-based policies. Moreover, our approach enables policies (both model-based and model-free) to be updated using any off-policy reinforcement learning method. We derive a deterministic method of hybrid learning by optimally switching between learning modalities. We adapt our method to a stochastic variation that relaxes some of the key assumptions in the original derivation. Our deterministic and stochastic variations are tested on a variety of robot control benchmark tasks in simulation as well as a hardware manipulation task. We extend our approach for use with imitation learning methods, where experience is provided through demonstrations, and we test the expanded capability with a real-world pick-and-place task. The results show that our method is capable of improving the performance and sample efficiency of learning motor skills in a variety of experimental domains.
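One way to picture switching between a planned action and an experience-based action is the minimal sketch below. It is an illustrative assumption, not the switching rule derived in the paper: the toy `model`, the `policy`, the random-shooting `plan` routine, and the comparison in `hybrid_action` are all hypothetical stand-ins. The idea shown is only that a learned predictive model can score both the planner's proposal and the policy's proposal, and the policy's action overrides the plan when the model predicts it is at least as good.

```python
import random

# NOTE: every function here is a hypothetical stand-in for illustration only.

def model(state, action):
    """Toy stand-in for a learned predictive model: returns (next_state, reward)."""
    next_state = state + 0.1 * action
    reward = -abs(next_state)  # toy task: drive the state toward zero
    return next_state, reward

def policy(state):
    """Toy stand-in for an experience-based (model-free) policy mapping."""
    return max(-1.0, min(1.0, -5.0 * state))

def predicted_return(state, actions):
    """Sum of model-predicted rewards along an open-loop action sequence."""
    total = 0.0
    for action in actions:
        state, reward = model(state, action)
        total += reward
    return total

def plan(state, horizon=5, samples=64, rng=random):
    """Random-shooting planner: keep the sampled sequence with the best predicted return."""
    best_seq, best_ret = None, float("-inf")
    for _ in range(samples):
        seq = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        ret = predicted_return(state, seq)
        if ret > best_ret:
            best_seq, best_ret = seq, ret
    return best_seq, best_ret

def hybrid_action(state, horizon=5, rng=random):
    """Return (action, source), letting the policy override the plan when the
    model predicts the policy's rollout is at least as good (assumed rule)."""
    planned_seq, planned_ret = plan(state, horizon, rng=rng)
    # Roll the policy forward through the model to get its action sequence.
    rollout_state, policy_seq = state, []
    for _ in range(horizon):
        action = policy(rollout_state)
        policy_seq.append(action)
        rollout_state, _ = model(rollout_state, action)
    policy_ret = predicted_return(state, policy_seq)
    if policy_ret >= planned_ret:
        return policy_seq[0], "policy"
    return planned_seq[0], "planner"

if __name__ == "__main__":
    action, source = hybrid_action(1.0, rng=random.Random(0))
    print(action, source)
```

In this toy setting the policy is already near-optimal, so it overrides the planner; with a poor policy, the same comparison would fall back to the planned action.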
Hybrid Control for Learning Motor Skills
We develop a hybrid control approach for robot learning based on combining learned predictive models with experience-based state-action policy mappings to improve the learning capabilities of robotic systems. Predictive models provide an understanding of the task and the physics (which improves sample-efficiency), while experience-based policy mappings are treated as “muscle memory” that encodes favorable actions as experiences that override planned actions. Hybrid control tools are used to create an algorithmic approach for combining learned predictive models with experience-based learning. Hybrid learning is presented as a method for efficiently learning motor skills by systematically combining and improving the performance of predictive models and experience-based policies. A deterministic variation of hybrid learning is derived and extended into a stochastic implementation that relaxes some of the key assumptions in the original derivation. Each variation is tested on experience-based learning methods (where the robot interacts with the environment to gain experience) as well as imitation learning methods (where experience is provided through demonstrations and tested in the environment). The results show that our method is capable of improving the performance and sample-efficiency of learning motor skills in a variety of experimental domains.
- Award ID(s):
- 1837515
- PAR ID:
- 10175970
- Date Published:
- Journal Name:
- Workshop on the Algorithmic Foundations of Robotics (WAFR)
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
We propose a new policy class, Composable Interaction Primitives (CIPs), specialized for learning sustained-contact manipulation skills like opening a drawer, pulling a lever, turning a wheel, or shifting gears. CIPs have two primary design goals: to minimize what must be learned by exploiting structure present in the world and the robot, and to support sequential composition by construction, so that learned skills can be used by a task-level planner. Using an ablation experiment in four simulated manipulation tasks, we show that the structure included in CIPs substantially improves the efficiency of motor skill learning. We then show that CIPs can be used for plan execution in a zero-shot fashion by sequencing learned skills. We validate our approach on real robot hardware by learning and sequencing two manipulation skills.
-
Model-based reinforcement learning (MBRL) is believed to have much higher sample efficiency compared with model-free algorithms by learning a predictive model of the environment. However, the performance of MBRL highly relies on the quality of the learned model, which is usually built in a black-box manner and may have poor predictive accuracy outside of the data distribution. The deficiencies of the learned model may prevent the policy from being fully optimized. Although some uncertainty analysis-based remedies have been proposed to alleviate this issue, model bias still poses a great challenge for MBRL. In this work, we propose to leverage the prior knowledge of underlying physics of the environment, where the governing laws are (partially) known. In particular, we developed a physics-informed MBRL framework, where governing equations and physical constraints are used to inform the model learning and policy search. By incorporating the prior information of the environment, the quality of the learned model can be notably improved, while the required interactions with the environment are significantly reduced, leading to better sample efficiency and learning performance. The effectiveness and merit have been demonstrated over a handful of classic control problems, where the environments are governed by canonical ordinary/partial differential equations.
-
In an efficient and flexible human-robot collaborative work environment, a robot team member must be able to recognize both explicit requests and implied actions from human users. Identifying “what to do” in such cases requires an agent to have the ability to construct associations between objects, their actions, and the effect of actions on the environment. In this regard, semantic memory is being introduced to understand the explicit cues and their relationships with available objects and required skills to make “tea” and “sandwich”. We have extended our previous hierarchical robot control architecture to add the capability to execute the most appropriate task based on both feedback from the user and the environmental context. To validate this system, two types of skills were implemented in the hierarchical task tree: 1) Tea making skills and 2) Sandwich making skills. During the conversation between the robot and the human, the robot was able to determine the hidden context using ontology and began to act accordingly. For instance, if the person says “I am thirsty” or “It is cold outside” the robot will start to perform the tea-making skill. In contrast, if the person says, “I am hungry” or “I need something to eat”, the robot will make the sandwich. A humanoid robot Baxter was used for this experiment. We tested three scenarios with objects at different positions on the table for each skill. We observed that in all cases, the robot used only objects that were relevant to the skill.
-
The authors present the design and implementation of an exploratory virtual learning environment that assists children with autism (ASD) in learning science, technology, engineering, and mathematics (STEM) skills along with improving social-emotional and communication skills. The primary contribution of this exploratory research is how educational research informs technological advances in triggering a virtual AI companion (AIC) for children in need of social-emotional and communication skills development. The AIC adapts to students’ varying levels of needed support. This project began by using puppetry control (human-in-the-loop) of the AIC, assisting students with ASD in learning basic coding, practicing their social skills with the AIC, and attaining emotional recognition and regulation skills for effective communication and learning. The student is given the challenge to program a robot, Dash™, to move in a square. Based on observed behaviors, the puppeteer controls the virtual agent’s actions to support the student in coding the robot. The virtual agent’s actions that inform the development of the AIC include speech, facial expressions, gestures, respiration, and heart color changes coded to indicate emotional state. The paper provides exploratory findings of the first 2 years of this 5-year scaling-up research study. The outcomes discussed align with a common approach of research design used for students with disabilities, called single case study research. This type of design does not involve random control trial research; instead, the student acts as her or his own control subject. Students with ASD have substantial individual differences in their social skill deficits, behaviors, communications, and learning needs, which vary greatly from the norm and from other individuals identified with this disability. Therefore, findings are reported as changes within subjects instead of across subjects. 
While these exploratory observations serve as a basis for longer term research on a larger population, this paper focuses less on student learning and more on evolving technology in AIC and supporting students with ASD in STEM environments.

