In this paper, we consider direct policy optimization for the linear-quadratic Gaussian (LQG) setting. Over the past few years, it has been recognized that the landscape of stabilizing output-feedback controllers of relevance to LQG has an intricate geometry, particularly as it pertains to the existence of spurious stationary points. To address such challenges, we first adopt a Riemannian metric for the space of stabilizing full-order minimal output-feedback controllers. We then prove that the orbit space of such controllers modulo coordinate transformation admits a Riemannian quotient manifold structure. This geometric structure is then used to develop a Riemannian gradient descent algorithm for direct LQG policy optimization. We prove a local convergence guarantee with linear rate and show that the proposed approach exhibits significantly faster and more robust numerical performance than ordinary gradient descent for LQG. Subsequently, we provide reasons for this observed behavior; in particular, we argue that optimizing over the orbit space of controllers is the right theoretical and computational setup for direct LQG policy optimization.
On Controller Reduction in Linear Quadratic Gaussian Control with Performance Bounds
The problem of controller reduction has a rich history in control theory. Yet, many questions remain open. In particular, there exist very few results on the order reduction of general non-observer based controllers and the subsequent quantification of the closed-loop performance. Recent developments in model-free policy optimization for Linear Quadratic Gaussian (LQG) control have highlighted the importance of this question. In this paper, we first propose a new set of sufficient conditions ensuring that a perturbed controller remains internally stabilizing. Based on this result, we illustrate how to perform order reduction of general (non-observer based) output feedback controllers using balanced truncation and modal truncation. We also provide explicit bounds on the LQG performance of the reduced-order controller. Furthermore, for single-input-single-output (SISO) systems, we introduce a new controller reduction technique by truncating unstable modes. We illustrate our theoretical results with numerical simulations. Our results will serve as valuable tools to design direct policy search algorithms for control problems with partial observations.
- PAR ID: 10443278
- Date Published:
- Journal Name: Proceedings of Machine Learning Research
- Volume: 211
- ISSN: 2640-3498
- Page Range / eLocation ID: 1008-1019
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
This work provides a general model of communication attacks on a networked infinite dimensional system. The system employs a network of inexpensive control units consisting of actuators, sensors and control processors. To replace a small number of expensive high-end actuating and sensing devices implementing observer-based feedback, the alternative is to use multiple inexpensive actuators/sensors with static output feedback. To emulate the performance of the high-end devices, the multiple actuator/sensor units implement controllers that render the system networked. In doing so, they become prone to communication attacks, arising from either accidental or deliberate actions on the connectivity of the control nodes. A single attack function is proposed which models all types of communication attacks, and an adaptive detection scheme is proposed in order to (i) detect the presence of an attack, (ii) diagnose the attack and (iii) accommodate the attack via an appropriate control reconfiguration. The reconfiguration employs the adaptive estimates of the controller gains and restructures the controller adaptively in order to minimize the detrimental effects of the attack on closed-loop performance. Numerical studies on a 1D diffusion PDE employing networked actuator/sensor pairs are included to further illustrate the special architecture of detection and accommodation for networked systems under communication attacks.
-
Objective: Fatigue-resistant and graded muscle forces can be evoked through asynchronous intrafascicular multi-electrode stimulation (aIFMS). Prior studies on controlled force generation using aIFMS employed either a feedback controller featuring a multiple-input single-output delayed-integral (MISO-$\delta$I) control law, or a feedforward controller with a non-predictive model-based policy. However, these controllers resulted in lagged responses, as stimulation was coordinated via intentional time delays and lacked immediate control corrections. To address these limitations, this paper presents an adaptive feedforward model predictive controller (aF-MPC) for isometric torque control. Methods: The aF-MPC was evaluated in experiments in anesthetized felines implanted with Utah Slanted Electrode Arrays in their sciatic nerves. This controller redesigned the existing aIFMS feedforward controller by enhancing it with a predictive policy and an online model learning algorithm to compensate for unaccounted aIFMS effects. Statistical comparisons of the aF-MPC and the (non-adaptive) F-MPC trials and observational comparisons of the aF-MPC and the MISO-$\delta$I controller were performed for different desired trajectories. Results: The aF-MPC exhibited significant performance improvements over the F-MPC across multiple metrics. Observationally, the aF-MPC showed improvements in all performance metrics over the MISO-$\delta$I controller. Conclusion: Despite unknown dynamics in the aIFMS system, this paper's aF-MPC outperformed alternate approaches, as it accurately tracked desired torque profiles even under high-frequency commands. Significance: The application of the aF-MPC in conjunction with aIFMS could provide a better avenue for developing naturalistic motor neuroprostheses than F-MPCs or MISO-$\delta$I controllers.
-
This paper addresses the end-to-end sample complexity bound for learning the H2 optimal controller (the Linear Quadratic Gaussian (LQG) problem) with unknown dynamics, for potentially unstable Linear Time Invariant (LTI) systems. The robust LQG synthesis procedure is performed by considering bounded additive model uncertainty on the coprime factors of the plant. The closed-loop identification of the nominal model of the true plant is performed by constructing a Hankel-like matrix from a single time series of noisy, finite-length input-output data, using the ordinary least squares algorithm from Sarkar and Rakhlin (2019). Next, an H∞ bound on the estimated model error is provided and the robust controller is designed via convex optimization, much in the spirit of Mania et al. (2019) and Zheng et al. (2020b), while allowing for bounded additive uncertainty on the coprime factors of the model. Our conclusions are consistent with previous results on learning the LQG and LQR controllers.
-
This paper revisits the design of compensator-based controllers for a class of infinite dimensional systems. To save computational time, a functional observer is employed to reconstruct a functional of the state which coincides with the full-state feedback control signal. Such full-state feedback corresponds to an idealized case wherein the state is available. Instead of reconstructing the entire state via a state observer and then using this state estimate in a controller expression, a functional observer is used to estimate the product of the state and the feedback operator, thus resulting in a significant reduction in computational load. This observer design is subsequently integrated with sensor selection in order to improve controller performance. An appropriate metric is used to optimize the sensor location, resulting in improved performance of the functional observer-based compensator. The integrated design is further extended to include a controller with an unknown input functional observer. The results are applied to 2D partial differential equations, and detailed numerical studies are included to demonstrate the significant savings in both operational and computational costs.

