We consider the problem of optimal control of district cooling energy plants (DCEPs) consisting of multiple chillers, a cooling tower, and a thermal energy storage (TES), in the presence of time-varying electricity prices. A straightforward application of model predictive control (MPC) requires solving a challenging mixed-integer nonlinear program (MINLP) because of the on/off decisions for the chillers and the complexity of the DCEP model. Reinforcement learning (RL) is an attractive alternative since its real-time control computation is much simpler. But designing an RL controller is challenging due to myriad design choices and computationally intensive training. In this paper, we propose an RL controller and an MPC controller for minimizing the electricity cost of a DCEP and compare them via simulations. The two controllers are designed to be comparable in terms of objective and information requirements. The RL controller uses a novel Q-learning algorithm that is based on least-squares policy iteration. We describe the design choices for the RL controller, including the choice of state space and basis functions, that are found to be effective. The proposed MPC controller does not need a mixed-integer solver for implementation, but only a nonlinear program (NLP) solver. A rule-based baseline controller is also proposed to aid in comparison. Simulation results show that the proposed RL and MPC controllers achieve similar savings over the baseline controller, about 17%.
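As a rough illustration of the least-squares policy iteration on which the proposed Q-learning controller is based, the sketch below implements generic LSTD-Q evaluation alternated with greedy policy improvement. The feature map, sample format, and two-state toy problem are illustrative assumptions, not the paper's DCEP formulation or basis functions.

```python
import numpy as np

def lstdq(samples, phi, policy, gamma=0.95, reg=1e-6):
    """One LSTD-Q evaluation: solve A w = b for a linear Q(s,a) = phi(s,a) @ w.

    samples: list of (s, a, r, s_next) transitions
    phi:     feature map phi(s, a) -> 1-D array of basis functions
    policy:  the policy being evaluated, policy(s_next) -> action
    """
    k = phi(*samples[0][:2]).shape[0]
    A = reg * np.eye(k)          # small ridge term keeps A invertible
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)

def lspi(samples, phi, actions, gamma=0.95, iters=20):
    """Least-squares policy iteration: alternate LSTD-Q and greedy improvement."""
    w = np.zeros(phi(*samples[0][:2]).shape[0])
    for _ in range(iters):
        policy = lambda s: max(actions, key=lambda a: phi(s, a) @ w)
        w_new = lstdq(samples, phi, policy, gamma)
        if np.allclose(w, w_new, atol=1e-8):
            break
        w = w_new
    return w
```

With one-hot features over state-action pairs, the learned weights are the Q-values themselves, which makes small sanity checks straightforward.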
Reinforcement Learning Assist-as-needed Control for Robot Assisted Gait Training
The primary goal of an assist-as-needed (AAN) controller is to maximize subjects' active participation during motor training tasks while allowing moderate tracking errors to encourage human learning of a target movement. Impedance control is typically employed by AAN controllers to create a compliant force-field around the desired motion trajectory. To accommodate different individuals with varying motor abilities, most of the existing AAN controllers require extensive manual tuning of the control parameters, resulting in a tedious and time-consuming process. In this paper, we propose a reinforcement learning AAN controller that can autonomously reshape the force-field in real-time based on subjects' training performances. The use of action-dependent heuristic dynamic programming enables a model-free implementation of the proposed controller. To experimentally validate the controller, a group of healthy individuals participated in a gait training session wherein they were asked to learn a modified gait pattern with the help of a powered ankle-foot orthosis. Results indicated the potential of the proposed control strategy for robot-assisted gait training.
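To make the impedance-control idea concrete, here is a minimal sketch of a compliant force-field with an error deadband, plus a crude performance-based stiffness update in the assist-as-needed spirit. All function names, gains, and the simple proportional adaptation rule are assumptions for illustration; the paper's controller learns the force-field with action-dependent heuristic dynamic programming rather than this hand-written rule.

```python
import numpy as np

def assist_torque(q, q_des, stiffness, deadband=0.05):
    """Compliant force-field: no assistance inside the error deadband
    (to allow moderate tracking errors), spring-like pull toward the
    desired trajectory outside it."""
    err = q_des - q
    if abs(err) <= deadband:
        return 0.0
    return stiffness * (err - np.sign(err) * deadband)

def adapt_stiffness(stiffness, tracking_error, target_error=0.1,
                    rate=0.5, k_min=5.0, k_max=200.0):
    """Crude performance-based update: raise stiffness when tracking error
    exceeds the target band, lower it (assist-as-needed) when the subject
    performs better than required."""
    stiffness += rate * (tracking_error - target_error) * stiffness
    return float(np.clip(stiffness, k_min, k_max))
```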
- Award ID(s): 1944203
- PAR ID: 10216891
- Date Published:
- Journal Name: 2020 8th IEEE RAS/EMBS International Conference for Biomedical Robotics and Biomechatronics (BioRob)
- Page Range / eLocation ID: 785 to 790
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Powered prostheses aim to mimic the missing biological limb with controllers that are finely tuned to replicate the nominal gait pattern of non-amputee individuals. Unfortunately, this control approach poses a problem with real-world ambulation, which includes tasks such as crossing over obstacles, where the prosthesis trajectory must be modified to provide adequate foot clearance and ensure timely foot placement. Here, we show an indirect volitional control approach that enables prosthesis users to walk at different speeds while smoothly and continuously crossing over obstacles of different sizes without explicit classification of the environment. At the high level, the proposed controller relies on a heuristic algorithm to continuously change the maximum knee flexion angle and the swing duration in harmony with the user's residual limb. At the low level, minimum-jerk planning is used to continuously adapt the swing trajectory while maximizing smoothness. Experiments with three individuals with above-knee amputation show that the proposed control approach allows for volitional control of foot clearance, which is necessary to negotiate environmental barriers. Our study suggests that a powered prosthesis controller with intrinsic, volitional adaptability may provide prosthesis users with functionality that is not currently available, facilitating real-world ambulation.
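The minimum-jerk planning mentioned above has a standard closed form: a fifth-order polynomial with zero velocity and acceleration at both endpoints. The sketch below shows that profile for a single coordinate; it is a generic illustration, not the paper's continuous replanning scheme, and the function name and signature are assumptions.

```python
import numpy as np

def min_jerk(x0, xf, T, t):
    """Minimum-jerk position profile from x0 to xf over duration T.
    Boundary conditions: zero velocity and zero acceleration at both ends."""
    tau = np.clip(t / T, 0.0, 1.0)          # normalized time in [0, 1]
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    return x0 + (xf - x0) * s
```

Because the polynomial depends only on the endpoints and duration, a controller can re-evaluate it every cycle with an updated target (e.g., a new maximum knee flexion angle) while keeping the trajectory smooth.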
Stable heteroclinic channels for controlling a simulated aquatic serpentine robot in narrow crevices
Stable Heteroclinic Channels (SHCs) are dynamical systems composed of connected saddle equilibria. This work demonstrates a control system that combines SHCs with movement primitives to enable swimming in a simulated six segment snake robot. We identify control system parameters for lateral undulation, where all joints oscillate with the same amplitude, and anguilliform swimming, where joint amplitudes increase linearly from the head to the tail. Swimming speed is improved by learning SHC movement primitive parameters. We also propose a method for adapting the gait amplitude and frequency with tactile sensor input to accommodate obstacles. Then, we evaluate the relationship between SHC movement primitive parameters and the resulting trajectories. The swimming speed and efficiency of SHC controllers for each gait are compared against a conventional serpenoid controller, which derives joint trajectories from sinusoids. Controllers are evaluated first in an unobstructed environment, then in straight passages of various widths, and finally in 65 randomly generated uneven channels. We find that the amplitudes of joint oscillations scale proportionally with the SHC controller parameters. Due to gait optimization, as well as adaptive amplitude and frequency in response to tactile input, the learned SHC control system exhibits an average 28.8% greater speed than a serpenoid controller that only adapts amplitude during contact. This research demonstrates that SHCs benefit from intuitive tuning like serpenoid control, while also effectively incorporating sensory information to generate smooth kinematic trajectories.
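The serpenoid baseline described above derives joint trajectories from sinusoids with a fixed phase lag between joints; the two gaits differ only in how amplitude varies along the body. A minimal sketch of such a reference generator follows. The parameter values and the linear head-to-tail amplitude ramp are illustrative assumptions, not the tuned values from the paper, and this is the sinusoidal baseline rather than the SHC dynamics.

```python
import numpy as np

def serpenoid_angles(t, n_joints=6, amp=0.5, freq=1.0, phase_lag=np.pi / 3,
                     anguilliform=False):
    """Reference joint angles for a snake robot at time t.
    Lateral undulation uses one amplitude for every joint; anguilliform
    swimming ramps the amplitude linearly from head (joint 0) to tail."""
    i = np.arange(n_joints)
    if anguilliform:
        a = amp * (0.5 + 0.5 * i / (n_joints - 1))   # half amplitude at head
    else:
        a = amp * np.ones(n_joints)
    return a * np.sin(2 * np.pi * freq * t - i * phase_lag)
```

Tactile adaptation would then scale `amp` and `freq` online in response to contact, as the abstract describes.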
For autonomous legged robots to be deployed in practical scenarios, they need to perform perception, motion planning, and locomotion control. Since robots have limited computing capabilities, it is important to realize locomotion control with simple controllers that require only modest computation. The goal of this paper is to create computationally simple controllers for locomotion control that can free up computational resources for more demanding tasks, such as perception and motion planning. The controller consists of a leg scheduler for sequencing a trot gait with a fixed step time; a reference trajectory generator for the feet in the Cartesian space, which is then mapped to the joint space using an analytical inverse; and a joint controller using a combination of feedforward torques based on static equilibrium and feedback torque. The resulting controller enables velocity command following in the forward, sideways, and turning directions. With these three command-following modes, a waypoint tracking controller is developed that can track a curve in global coordinates using feedback linearization. The command following and waypoint tracking controllers are demonstrated in simulation and on hardware.
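One common way to turn a waypoint error into forward and turning velocity commands via feedback linearization is to regulate a point a small distance ahead of the body frame, whose kinematics are linear in the commands. The sketch below shows that standard construction for a planar pose; the offset distance, gain, and function signature are assumptions, and this may differ from the paper's exact formulation.

```python
import numpy as np

def waypoint_tracking_cmd(pose, target, d=0.1, k=1.0):
    """Map a waypoint error to (forward, turn) velocity commands by
    feedback-linearizing a point at distance d ahead of the body frame."""
    x, y, th = pose
    px = x + d * np.cos(th)                 # offset point position
    py = y + d * np.sin(th)
    vx = k * (target[0] - px)               # desired offset-point velocity
    vy = k * (target[1] - py)
    v = np.cos(th) * vx + np.sin(th) * vy   # forward velocity command
    w = (-np.sin(th) * vx + np.cos(th) * vy) / d   # turning rate command
    return v, w
```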
Risk sensitivity is a fundamental aspect of biological motor control that accounts for both the expectation and variability of movement cost in the face of uncertainty. However, most computational models of biological motor control rely on model-based risk-sensitive optimal control, which requires an accurate internal representation in the central neural system to predict the outcomes of motor commands. In reality, the dynamics of human-environment interaction is too complex to be accurately modeled, and noise further complicates system identification. To address this issue, this paper proposes a novel risk-sensitive computational mechanism for biological motor control based on reinforcement learning (RL) and adaptive dynamic programming (ADP). The proposed ADP-based mechanism suggests that humans can directly learn an approximation of the risk-sensitive optimal feedback controller from noisy sensory data without the need for system identification. Numerical validation of the proposed mechanism is conducted on the arm-reaching task under a divergent force field. The preliminary computational results align with experimental observations reported in the computational neuroscience literature.
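A standard way to encode risk sensitivity is the exponential-of-cost criterion, which penalizes cost variability for a risk-averse parameter and recovers the expected cost in the risk-neutral limit. The sketch below evaluates that criterion from sampled costs; it is a generic illustration of the risk-sensitive objective, not the paper's ADP learning mechanism, and the function name and tolerance are assumptions.

```python
import numpy as np

def risk_sensitive_cost(costs, theta):
    """Exponential-of-cost criterion (1/theta) * log E[exp(theta * c)].
    theta > 0 is risk-averse (variability penalized), theta < 0 risk-seeking,
    and theta -> 0 recovers the expected cost. Uses a log-sum-exp shift
    for numerical stability."""
    costs = np.asarray(costs, dtype=float)
    if abs(theta) < 1e-12:
        return costs.mean()                 # risk-neutral limit
    z = theta * costs
    m = np.max(z)
    return (m + np.log(np.mean(np.exp(z - m)))) / theta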