Title: How Are Learned Perception-Based Controllers Impacted by the Limits of Robust Control?
The difficulty of optimal control problems has classically been characterized in terms of system properties such as the minimum eigenvalues of controllability/observability gramians. We revisit these characterizations in the context of the increasing popularity of data-driven techniques like reinforcement learning (RL) in control settings where input observations are high-dimensional images and transition dynamics are not known beforehand. Specifically, we ask: to what extent are quantifiable control and perceptual difficulty metrics of a control task predictive of the performance of various families of data-driven controllers? We modulate two different types of partial observability in a cartpole “stick-balancing” problem: the height of one visible fixation point on the cartpole, which can be used to tune the fundamental limits of performance achievable by any controller, and the choice of depth or RGB image observations of the scene, which adds different levels of perception noise without affecting system dynamics. In these settings, we empirically study two popular families of controllers: RL and system identification-based H-infinity control, both using visually estimated system state. Our results show that the fundamental limits of robust control have corresponding implications for the sample efficiency and performance of learned perception-based controllers.
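To make concrete the kind of difficulty metric the abstract invokes, here is a minimal sketch that computes a finite-horizon observability gramian for a linearized cartpole whose outputs are the cart position and a fixation point at height h on the pole, and reports its minimum eigenvalue. The dynamics parameters, the linearization, and the output model are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from scipy.linalg import expm

# Linearized cartpole about the upright equilibrium,
# state x = [cart pos, cart vel, pole angle, pole ang. vel].
# Parameter values are illustrative assumptions.
g, m_c, m_p, l = 9.81, 1.0, 0.1, 0.5
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -m_p * g / m_c, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, (m_c + m_p) * g / (m_c * l), 0.0],
])

# Observe the cart position and the horizontal position of a fixation
# point at height h on the pole (small-angle approximation).
h = 0.25  # hypothetical fixation-point height
C = np.array([
    [1.0, 0.0, 0.0, 0.0],  # cart position
    [1.0, 0.0, h, 0.0],    # fixation point: cart pos + h * angle
])

# Finite-horizon observability gramian
#   W(T) = \int_0^T e^{A^T t} C^T C e^{A t} dt,
# approximated with the trapezoidal rule (valid even though A is unstable).
T, n = 2.0, 400
ts = np.linspace(0.0, T, n)
CtC = C.T @ C
vals = np.array([expm(A.T * t) @ CtC @ expm(A * t) for t in ts])
dt = ts[1] - ts[0]
W = dt * (vals[1:-1].sum(axis=0) + 0.5 * (vals[0] + vals[-1]))

# A smaller minimum eigenvalue means some state direction is harder to
# infer from the measurements, i.e., a harder estimation problem.
print("min eigenvalue of observability gramian:", np.linalg.eigvalsh(W).min())
```

Lowering h in this sketch generally shrinks the minimum eigenvalue, matching the intuition that a fixation point near the pivot makes the pole angle harder to observe.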
Award ID(s):
2038873
PAR ID:
10277350
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 3rd Conference on Learning for Dynamics and Control
Volume:
PMLR 144
Page Range / eLocation ID:
954-966
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. In standard reinforcement learning settings, agents typically assume they receive immediate feedback about the effects of their actions. In practice, however, this assumption may not hold due to physical constraints, which can significantly degrade the performance of learning algorithms. In this paper, we address observation delays in partially observable environments. We propose leveraging world models, which have shown success in integrating past observations and learning dynamics, to handle observation delays. By reducing delayed POMDPs to delayed MDPs with world models, our methods effectively handle partial observability where existing approaches achieve sub-optimal performance or degrade quickly as observability decreases. Experiments suggest that one of our methods can outperform a naive model-based approach by up to 250%. Moreover, we evaluate our methods in visually delayed environments, demonstrating for the first time delay-aware reinforcement learning for continuous control with visual observations.
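As a hedged illustration of this reduction, the sketch below keeps a buffer of the actions taken since the last (delayed) observation and rolls a one-step world model forward to estimate the current state. The `world_model` here is a toy linear stand-in; in the paper's setting it would be a learned dynamics model.

```python
from collections import deque
import numpy as np

def world_model(state, action):
    # Stand-in one-step predictor (toy linear dynamics); a learned latent
    # dynamics model would take its place in practice.
    A, B = np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([0.0, 0.1])
    return A @ state + B * action

class DelayedObsWrapper:
    """Turn a d-step observation delay into an undelayed decision problem
    by predicting forward from o_{t-d} through the last d actions."""

    def __init__(self, delay):
        self.actions = deque(maxlen=delay)  # actions not yet reflected in obs

    def record_action(self, action):
        self.actions.append(action)

    def estimate_current_state(self, delayed_obs):
        # Roll the model forward through the buffered actions.
        state = np.asarray(delayed_obs, dtype=float)
        for a in self.actions:
            state = world_model(state, a)
        return state

wrapper = DelayedObsWrapper(delay=3)
for a in [1.0, -0.5, 0.2]:
    wrapper.record_action(a)
print(wrapper.estimate_current_state([0.0, 0.0]))  # predicted current state
```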
  2. With the rapid advance of information technology, network systems have become increasingly complex, and hence the underlying system dynamics are often unknown or difficult to characterize. Finding a good network control policy is critical for achieving desirable network performance (e.g., high throughput or low delay). In this work, we consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized. Traditional approaches in RL, however, cannot handle the unbounded state spaces of the network control problem. To overcome this difficulty, we propose a new algorithm, called RL for Queueing Networks (RL-QN), which applies model-based RL methods over a finite subset of the state space while applying a known stabilizing policy for the rest of the states. We establish that the average queue backlog under RL-QN with an appropriately constructed subset can be arbitrarily close to the optimal result. We evaluate RL-QN in dynamic server allocation, routing, and switching problems. Simulation results show that RL-QN minimizes the average queue backlog effectively.
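A minimal sketch of the dispatch idea in RL-QN: use the learned policy when the queue state lies in a bounded subset, and fall back to a known stabilizing heuristic (longest-queue-first here) otherwise. The threshold and both policies are illustrative stand-ins, not the paper's construction.

```python
import numpy as np

THRESHOLD = 10  # finite subset: all queue lengths below this value

def stabilizing_policy(queues):
    # Serve the longest queue -- a standard stabilizing heuristic.
    return int(np.argmax(queues))

def learned_policy(queues):
    # Hypothetical stand-in for the policy learned by model-based RL
    # on the finite subset.
    return int(np.argmin(queues))

def rl_qn_policy(queues):
    if max(queues) < THRESHOLD:        # state inside the finite subset
        return learned_policy(queues)
    return stabilizing_policy(queues)  # stabilizing fallback outside it

print(rl_qn_policy([3, 7, 2]))   # learned-policy region
print(rl_qn_policy([3, 15, 2]))  # fallback region
```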
  3. This paper presents a comprehensive study on using deep reinforcement learning (RL) to create dynamic locomotion controllers for bipedal robots. Going beyond a single locomotion skill, we develop a general control solution that spans a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. Our RL-based controller incorporates a novel dual-history architecture, utilizing both a long-term and a short-term input/output (I/O) history of the robot. When trained through the proposed end-to-end RL approach, this control architecture consistently outperforms other methods across a diverse range of skills in both simulation and the real world. The study also examines the adaptivity and robustness introduced by the proposed RL system. We demonstrate that the architecture can adapt to both time-invariant dynamics shifts and time-variant changes, such as contact events, by effectively using the robot’s I/O history. Additionally, we identify task randomization as another key source of robustness, fostering better task generalization and compliance under disturbances. The resulting control policies can be successfully deployed on Cassie, a torque-controlled human-sized bipedal robot. This work pushes the limits of agility for bipedal robots through extensive real-world experiments, demonstrating a diverse range of locomotion skills: robust standing, versatile walking, fast running (including a 400-meter dash), and a diverse set of jumping skills such as standing long jumps and high jumps.
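The dual-history idea might look roughly like the sketch below: a 1D-CNN encodes a long I/O history, which is concatenated with a short recent history before the policy head. All layer sizes and history lengths are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DualHistoryPolicy(nn.Module):
    """Illustrative dual-history policy: long-term I/O history through a
    1D-CNN encoder, short-term history passed in directly."""

    def __init__(self, io_dim=16, long_len=50, short_len=4, act_dim=10):
        super().__init__()
        self.long_encoder = nn.Sequential(  # long-term history encoder
            nn.Conv1d(io_dim, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer the encoder's flattened output size
            enc_out = self.long_encoder(torch.zeros(1, io_dim, long_len)).shape[1]
        self.policy = nn.Sequential(
            nn.Linear(enc_out + io_dim * short_len, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, long_hist, short_hist):
        # long_hist: (B, io_dim, long_len); short_hist: (B, io_dim, short_len)
        z = self.long_encoder(long_hist)
        return self.policy(torch.cat([z, short_hist.flatten(1)], dim=1))

net = DualHistoryPolicy()
print(net(torch.randn(2, 16, 50), torch.randn(2, 16, 4)).shape)  # (2, 10)
```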
  4. Open Radio Access Network (O-RAN) has introduced an emerging RAN architecture that enables openness, intelligence, and automated control. The RAN Intelligent Controller (RIC) provides the platform to design and deploy network controllers. xApps are the applications that can leverage machine learning (ML) algorithms for near-real-time control. Despite the opportunities provided by this new architecture, progress on practical artificial intelligence (AI)-based solutions for network control and automation has been slow, and end-to-end solutions for designing, deploying, and testing AI-based xApps in production-like network settings are lacking. This paper introduces an end-to-end O-RAN design and evaluation procedure using the latest O-RAN architecture and interface releases. We detail the development of a reinforcement learning (RL)-based xApp, consider two RL approaches, and present numerical results to validate the xApp.
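As a rough sketch of the control logic such an xApp might run, the toy loop below applies tabular Q-learning to a discretized network state. The environment stub and all names are assumptions for illustration; no actual O-RAN RIC interface is modeled here.

```python
import random
from collections import defaultdict

# Tabular Q-learning over a discretized network state (e.g., a load level),
# choosing among candidate control actions (e.g., resource-allocation
# settings). Everything below is a hypothetical stand-in.
ACTIONS = [0, 1, 2]
Q = defaultdict(float)  # Q[(state, action)]
alpha, gamma, eps = 0.1, 0.9, 0.1

def env_step(state, action):
    # Stub environment: reward is higher when the action matches the load.
    reward = 1.0 if action == state else 0.0
    return random.randint(0, 2), reward  # next load level, reward

state = 0
for _ in range(5000):
    action = (random.choice(ACTIONS) if random.random() < eps
              else max(ACTIONS, key=lambda a: Q[(state, a)]))
    next_state, reward = env_step(state, action)
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

# Greedy action per load level after training.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(3)})
```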
  5. Real-time control (RTC) of stormwater systems can reduce flooding and improve water quality. Current industry RTC strategies use simple rules based on water-quantity parameters at a local scale. However, system-level control methods that also incorporate observations of water quality could provide improved control and performance. The objective of this research is therefore to evaluate the impact of local and system-level control approaches on flooding and sediment-related water quality in a stormwater system within the flood-prone coastal city of Norfolk, Virginia, USA. Deep reinforcement learning (RL), an emerging machine learning technique, is used to learn system-level control policies that attempt to balance flood mitigation and treatment of sediment. RL is compared to the conventional passive stormwater system and to two methods of local-scale rule-based control: (i) industry-standard predictive rule-based control with a fixed detention time and (ii) rules based on water quality observations. For the studied system, both methods of rule-based control improved water quality compared to the passive system but increased total system flooding due to uncoordinated releases of stormwater. An RL agent learned controls that maintained target pond levels while reducing total system flooding by 4% compared to the passive system. When pre-trained from the RL agent that learned to reduce flooding, another RL agent learned to decrease total suspended solids (TSS) export by an average of 52% compared to the passive system, with an average of 5% less flooding than the rule-based control methods. As the complexity of stormwater RTC implementations grows and climate change continues, system-level control approaches such as the RL used here will be needed to help mitigate flooding and protect water quality.
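One way to read the balancing objective described above is as a scalar reward trading off flooding, sediment export, and pond-level tracking. The sketch below is an illustrative guess at such a reward, with hypothetical weights and a linear form; it is not the study's actual formulation.

```python
def reward(flood_volume, tss_export, pond_level, target_level,
           w_flood=1.0, w_tss=0.5, w_level=0.1):
    """Penalize flooding and TSS export while tracking a target pond level.
    Weights are illustrative assumptions."""
    return -(w_flood * flood_volume
             + w_tss * tss_export
             + w_level * abs(pond_level - target_level))

# A coordinated release that avoids flooding scores higher than an
# uncoordinated one that floods downstream, even with more TSS export.
print(reward(flood_volume=0.0, tss_export=2.0, pond_level=1.2, target_level=1.0))
print(reward(flood_volume=5.0, tss_export=1.0, pond_level=1.0, target_level=1.0))
```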