

Title: Robust control of PDEs with disturbances using mobile actuators constrained over time-varying reachability sets
We design a practical mobile actuator guidance policy for linear parabolic equations in 2D: the guidance is chosen to minimize an H2 measure of uncertainty when the system is subject to a distributed disturbance. We first present a guidance policy in which the selected mobile actuator location remains fixed over a given time interval of interest. We then account for the dynamics of the mobile actuator over the 2D domain of interest under reachability constraints. The proposed approach is illustrated through numerical studies.
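A minimal sketch of the two ingredients described above: an interval-wise (continuous-discrete) choice of actuator location, and a reachability constraint on that choice. Several assumptions are made that are not from the paper. The domain is replaced by a hypothetical grid of candidate points. Reachability is reduced to a disk of radius `v_max * travel_time`, which assumes straight-line, omnidirectional motion. `placeholder_cost` is a hypothetical stand-in for the H2 uncertainty measure; in practice, that measure requires solving the discretized PDE with the actuator at each candidate location. All names are illustrative.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def reachable_candidates(current: Point, grid: List[Point],
                         v_max: float, travel_time: float) -> List[Point]:
    """Grid points reachable from `current` within `travel_time` at speed
    <= v_max, ASSUMING straight-line, omnidirectional motion."""
    radius = v_max * travel_time
    # Staying put is always feasible, so the candidate set is never empty.
    return [current] + [p for p in grid
                        if math.dist(current, p) <= radius + 1e-12]


def guide(start: Point, grid: List[Point],
          cost: Callable[[int, Point], float],
          n_intervals: int, v_max: float, travel_time: float) -> List[Point]:
    """Continuous-discrete guidance: at the start of interval k, pick the
    reachable location with the smallest cost; the actuator then dwells
    there for the remainder of the interval."""
    path = [start]
    pos = start
    for k in range(n_intervals):
        candidates = reachable_candidates(pos, grid, v_max, travel_time)
        pos = min(candidates, key=lambda p: cost(k, p))
        path.append(pos)
    return path


# HYPOTHETICAL placeholder cost: in the paper's setting this would be the
# H2 measure of the controlled PDE with the actuator placed at p.
def placeholder_cost(k: int, p: Point) -> float:
    return (p[0] - 0.8) ** 2 + (p[1] - 0.8) ** 2


grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
path = guide((0.0, 0.0), grid, placeholder_cost,
             n_intervals=5, v_max=0.1, travel_time=1.5)
print(path[-1])  # prints (0.5, 0.5)
```

With a travel radius of 0.15, each interval allows at most one diagonal grid step. The actuator therefore creeps toward the low-cost region instead of jumping there, which is the effect a reachability constraint is meant to capture.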
Award ID(s):
1825546
NSF-PAR ID:
10385852
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2021 60th IEEE Conference on Decision and Control
Page Range / eLocation ID:
428 to 433
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Employing mobile actuators and sensors for control and estimation of spatially distributed processes offers a significant advantage over immobile actuators and sensors. Besides improved control performance, there are economic advantages, since fewer devices are needed if they can be repositioned within the spatial domain. While simulation studies of mobile actuators report superb controller performance, they are far from reality, because the mobile platforms carrying actuators and sensors must satisfy motional constraints. Terrain platforms cannot behave as inertia-free point masses; instead, they must satisfy constraints that are adequately represented as path-dependent reachability sets. When the control algorithm commands a mobile platform to reposition itself to a different location within the spatial domain, the move is not instantaneous and, for the most part, not omnidirectional. This constraint is combined with a computationally feasible, suboptimal control policy for mobile actuators to arrive at a numerically viable control and guidance scheme. The feasible control decision comes from a continuous-discrete control policy whereby the mobile platform carrying the actuator is repositioned at discrete times and dwells at a specific position for a certain time interval. To move to a subsequent spatial location and compute its associated path over a physics-imposed time interval, a set of candidate positions and paths is derived using a path-dependent reachability set. Embedded into the path-dependent reachability sets that dictate the mobile actuator repositioning, a scheme is proposed that integrates collocated sensing measurements in order to reduce reliance on costly state estimation schemes. The proposed scheme is demonstrated with a 2D PDE having two sets of collocated actuator-sensor pairs onboard mobile platforms.
  2. The use of mobile actuators to control spatially distributed systems governed by PDEs poses both implementation and computational challenges. First, it requires solving both the actuator guidance problem and the control operator Riccati equation backward in time. One way to address this computational challenge is a continuous-discrete alternative whereby the mobile actuator is repositioned at discrete instants and resides at a specific spatial location for a certain time interval. To find optimal paths for a given time interval, a set of feasible locations is derived using the reachability set. These reachability sets are further constrained to account for the time it takes to travel to any spatial position at a prescribed maximum velocity. The proposed hybrid continuous-discrete control and actuator guidance scheme is demonstrated for a 2D diffusion PDE, both without constraints and with angular constraints on the actuator motion.
  3. This paper considers a class of distributed parameter systems that can be controlled by an actuator onboard a mobile platform. To avoid the computational cost and control-architecture complexity associated with jointly optimizing actuator guidance and the control law, a suboptimal policy is proposed that significantly reduces the computational cost. Using a continuous-discrete optimal control design, the mobile actuator moves to a new position at the beginning of each time interval and resides there for a prescribed time. By using the cost-to-go with a variable lower limit, the optimization reduces to solving algebraic Riccati equations instead of differential Riccati equations. Adding a hardware feature whereby the mobile sensors are constrained to stay within the proximity of the mobile actuator, a feedback kernel decomposition scheme is proposed that approximates a full-state feedback controller by a weighted sum of sensor measurements.
  4. This work incorporates the effects that hazardous environments have on sensing devices into the guidance of mobile platforms with onboard sensors. Mobile sensors are used in the state reconstruction of spatiotemporally varying processes, often described by advection-diffusion PDEs. A typical sensor guidance policy is based on a gradient ascent scheme that repositions the sensors to spatial regions with larger state estimation errors. If the cumulative measurements of the spatial process are used to represent the effects of hazardous environments on the sensors, then a sensor is considered inoperable the instant its cumulative measurements exceed a device-specific tolerance level. A binary guidance policy considered earlier repositioned the sensors to regions of larger state estimation errors, thus implementing an information-sensitive policy. The policy switched to information-averse guidance the instant the cumulative effects exceeded a certain tolerance level. Such a binary policy switches the sensor velocity abruptly from a positive to a negative value. To alleviate these discontinuity effects, a ternary guidance policy is considered, which inserts a third policy, the information-neutral policy, that smooths out the transition from information-sensitive to information-averse guidance. A novelty of this ternary guidance is its level-set approach, which shifts the guidance from large values of the state estimation error, to level sets of the state estimation error, and eventually to reduced values of the state estimation error. An example on a 2D advection-diffusion PDE employing a single interior mobile sensor under both the binary and ternary guidance policies demonstrates the effects of hazardous environments on both the sensor life expectancy and the performance of the state estimator.
  5. A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully. However, the lack of carefully designed, fine-grained feedback implies that most existing RL algorithms fail to learn an acceptable policy in a reasonable time frame. This is because of the large number of exploration actions that the policy has to perform before it gets any useful feedback that it can learn from. In this work, we address this challenging problem by developing an algorithm that exploits the offline demonstration data generated by a sub-optimal behavior policy for faster and more efficient online RL in such sparse reward settings. The proposed algorithm, which we call the Learning Online with Guidance Offline (LOGO) algorithm, merges a policy improvement step with an additional policy guidance step by using the offline demonstration data. The key idea is that by obtaining guidance from, rather than imitating, the offline data, LOGO orients its policy in the manner of the sub-optimal policy while still being able to learn beyond it and approach optimality. We provide a theoretical analysis of our algorithm, including a lower bound on the performance improvement in each learning episode. We also extend our algorithm to the even more challenging incomplete observation setting, where the demonstration data contains only a censored version of the true state observation. We demonstrate the superior performance of our algorithm over state-of-the-art approaches on a number of benchmark environments with sparse rewards and censored state. Further, we demonstrate the value of our approach by implementing LOGO on a mobile robot for trajectory tracking and obstacle avoidance, where it shows excellent performance.
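Items 1 and 2 in the list above both restrict the actuator's next position by travel time and maximum velocity, and item 2 also applies angular constraints. The following is a rough sketch of such a constrained candidate set. It assumes straight-line moves on a hypothetical grid and models the angular constraint as a bound `max_turn` on the change of heading. The path-dependent reachability sets described in those works are richer than this, and every name here is illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def heading_constrained_candidates(pos: Point, heading: float, grid: List[Point],
                                   v_max: float, travel_time: float,
                                   max_turn: float) -> List[Point]:
    """Grid points within v_max * travel_time of `pos` whose bearing differs
    from the current heading (radians) by at most `max_turn` (ASSUMED model
    of an angular constraint; straight-line moves only)."""
    radius = v_max * travel_time
    out = [pos]  # dwelling at the current position is always allowed
    for p in grid:
        dx, dy = p[0] - pos[0], p[1] - pos[1]
        d = math.hypot(dx, dy)
        if d == 0.0 or d > radius + 1e-12:
            continue
        # Change of bearing, wrapped to [-pi, pi).
        turn = (math.atan2(dy, dx) - heading + math.pi) % (2 * math.pi) - math.pi
        if abs(turn) <= max_turn + 1e-12:
            out.append(p)
    return out


grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
# Heading east (0 rad), reach radius 1.5, at most 45 degrees of turning.
forward = heading_constrained_candidates((2.0, 2.0), 0.0, grid, 1.0, 1.5, math.pi / 4)
# Same reach with no effective angular constraint.
free = heading_constrained_candidates((2.0, 2.0), 0.0, grid, 1.0, 1.5, math.pi)
print(sorted(forward))  # [(2.0, 2.0), (3.0, 1.0), (3.0, 2.0), (3.0, 3.0)]
print(len(free))        # 9
```

Tightening `max_turn` shrinks the candidate set from the full 3x3 neighborhood to a forward-facing wedge. A guidance step then searches only that smaller set.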
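Item 4 in the list above describes its binary and ternary guidance policies only qualitatively. The following is a minimal sketch of the switching logic. It assumes that the information-neutral mode moves the sensor along the level set of the estimation error (the gradient rotated by 90 degrees). It also assumes that the mode is selected by two hypothetical cumulative-dose thresholds, `t_neutral < tol`. The actual policy in that work may blend the modes differently.

```python
from typing import Tuple

Vec = Tuple[float, float]


def binary_velocity(grad: Vec, dose: float, tol: float, gain: float) -> Vec:
    """Information-sensitive (up the error gradient) until the cumulative
    dose reaches `tol`, then information-averse (down the gradient)."""
    s = gain if dose < tol else -gain
    return (s * grad[0], s * grad[1])


def ternary_velocity(grad: Vec, dose: float, t_neutral: float, tol: float,
                     gain: float) -> Vec:
    """Adds an ASSUMED information-neutral band t_neutral <= dose < tol in
    which the sensor moves along the level set (perpendicular to grad)."""
    gx, gy = grad
    if dose < t_neutral:             # information-sensitive
        return (gain * gx, gain * gy)
    if dose < tol:                   # information-neutral: grad rotated 90 deg
        return (-gain * gy, gain * gx)
    return (-gain * gx, -gain * gy)  # information-averse


g = (1.0, 1.0)  # hypothetical estimation-error gradient at the sensor
for dose in (0.2, 0.7, 1.2):
    print(dose, binary_velocity(g, dose, 1.0, 2.0),
          ternary_velocity(g, dose, 0.5, 1.0, 2.0))
```

Under the binary rule, the velocity reverses direction in a single switch. Under the ternary rule, it turns by 90 degrees at each of two switches, which is one simple reading of "smoothing out" the transition.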
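Item 3 in the list above replaces differential Riccati equations with algebraic ones. There is a standard intuition for why this is reasonable over long dwell intervals. For a stabilizable and detectable plant, the Riccati differential equation integrated backward from the terminal time converges to the stabilizing algebraic solution as the horizon grows. The sketch below checks this on a scalar ODE, dx/dt = a x + b u with cost q x^2 + r u^2, standing in for the discretized PDE. It is not the cost-to-go argument of that work, and the plant parameters are arbitrary.

```python
import math


def are_solution(a: float, b: float, q: float, r: float) -> float:
    """Stabilizing (positive) root of the scalar algebraic Riccati equation
    2*a*p - (b**2/r)*p**2 + q = 0."""
    return (r / b**2) * (a + math.sqrt(a**2 + b**2 * q / r))


def dre_at_start(a: float, b: float, q: float, r: float,
                 horizon: float, h: float = 1e-3) -> float:
    """Integrate -dp/dt = 2*a*p - (b**2/r)*p**2 + q backward from p(T) = 0
    using explicit Euler steps in reversed time; return p(0)."""
    p = 0.0
    for _ in range(round(horizon / h)):
        p += h * (2 * a * p - (b**2 / r) * p**2 + q)
    return p


a, b, q, r = 1.0, 1.0, 1.0, 1.0   # open-loop unstable scalar plant
p_inf = are_solution(a, b, q, r)  # equals 1 + sqrt(2)
for T in (0.1, 1.0, 10.0):
    print(T, dre_at_start(a, b, q, r, T), p_inf)
```

For short horizons, the differential solution at t = 0 is well below the algebraic value. By T = 10 the two agree to many digits, so the time-invariant gain K = b p / r is a good approximation except near the end of the interval.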