 Award ID(s):
 1901134
 NSFPAR ID:
 10273949
 Date Published:
 Journal Name:
 Proceedings of IEEE Smartgridcom Conference
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

Deep reinforcement learning approaches are becoming appealing for the design of nonlinear controllers for voltage control problems, but the lack of stability guarantees hinders their realworld deployment. This letter constructs a decentralized RLbased controller for inverterbased realtime voltage control in distribution systems. It features two components: a transient control policy and a steadystate performance optimizer. The transient policy is parameterized as a neural network, and the steadystate optimizer represents the gradient of the longterm operating cost function. The two parts are synthesized through a safe gradient flow framework, which prevents the violation of reactive power capacity constraints. We prove that if the output of the transient controller is bounded and monotonically decreasing with respect to its input, then the closedloop system is asymptotically stable and converges to the optimal steadystate solution. We demonstrate the effectiveness of our method by conducting experiments with IEEE 13bus and 123bus distribution system test feeders.more » « less

Our goal is to learn control policies for robots that provably generalize well to novel environments given a dataset of example environments. The key technical idea behind our approach is to leverage tools from generalization theory in machine learning by exploiting a precise analogy (which we present in the form of a reduction) between generalization of control policies to novel environments and generalization of hypotheses in the supervised learning setting. In particular, we utilize the probably approximately correct (PAC)Bayes framework, which allows us to obtain upper bounds that hold with high probability on the expected cost of (stochastic) control policies across novel environments. We propose policy learning algorithms that explicitly seek to minimize this upper bound. The corresponding optimization problem can be solved using convex optimization (relative entropy programming in particular) in the setting where we are optimizing over a finite policy space. In the more general setting of continuously parameterized policies (e.g., neural network policies), we minimize this upper bound using stochastic gradient descent. We present simulated results of our approach applied to learning (1) reactive obstacle avoidance policies and (2) neural networkbased grasping policies. We also present hardware results for the Parrot Swing drone navigating through different obstacle environments. Our examples demonstrate the potential of our approach to provide strong generalization guarantees for robotic systems with continuous state and action spaces, complicated (e.g., nonlinear) dynamics, rich sensory inputs (e.g., depth images), and neural networkbased policies.

Feature representations from pretrained deep neural networks have been known to exhibit excellent generalization and utility across a variety of related tasks. Finetuning is by far the simplest and most widely used approach that seeks to exploit and adapt these feature representations to novel tasks with limited data. Despite the effectiveness of finetuning, itis often suboptimal and requires very careful optimization to prevent severe overfitting to small datasets. The problem of suboptimality and overfitting, is due in part to the large number of parameters used in a typical deep convolutional neural network. To address these problems, we propose a simple yet effective regularization method for finetuning pretrained deep networks for the task of kshot learning. To prevent overfitting, our key strategy is to cluster the model parameters while ensuring intracluster similarity and intercluster diversity of the parameters, effectively regularizing the dimensionality of the parameter search space. In particular, we identify groups of neurons within each layer of a deep network that shares similar activation patterns. When the network is to be finetuned for a classification task using only k examples, we propagate a single gradient to all of the neuron parameters that belong to the same group. The grouping of neurons is nontrivial as neuron activations depend on the distribution of the input data. To efficiently search for optimal groupings conditioned on the input data, we propose a reinforcement learning search strategy using recurrent networks to learn the optimal group assignments for each network layer. Experimental results show that our method can be easily applied to several popular convolutional neural networks and improve upon other stateoftheart finetuning based kshot learning strategies by more than10%more » « less

Abstract Shared control of mobile robots integrates manual input with auxiliary autonomous controllers to improve the overall system performance. However, prior work that seeks to find the optimal shared control ratio needs an accurate human model, which is usually challenging to obtain. In this study, the authors develop an extended Twin Delayed Deep Deterministic Policy Gradient (DDPG) (TD3X)‐based shared control framework that learns to assist a human operator in teleoperating mobile robots optimally. The robot's states, shared control ratio in the previous time step, and human's control input is used as inputs to the reinforcement learning (RL) agent, which then outputs the optimal shared control ratio between human input and autonomous controllers without knowing the human model. Noisy softmax policies are developed to make the TD3X algorithm feasible under the constraint of a shared control ratio. Furthermore, to accelerate the training process and protect the robot, a navigation demonstration policy and a safety guard are developed. A neural network (NN) structure is developed to maintain the correlation of sensor readings among heterogeneous input data and improve the learning speed. In addition, an extended DAGGER (DAGGERX) human agent is developed for training the RL agent to reduce human workload. Robot simulations and experiments with humans in the loop are conducted. The results show that the DAGGERX human agent can simulate real human inputs in the worst‐case scenarios with a mean square error of 0.0039. Compared to the original TD3 agent, the TD3X‐based shared control system decreased the average collision number from 387.3 to 44.4 in a simplistic environment and 394.2 to 171.2 in a more complex environment. The maximum average return increased from 1043 to 1187 with a faster converge speed in the simplistic environment, while the performance is equally good in the complex environment because of the use of an advanced human agent. In the human subject tests, participants' average perceived workload was significantly lower in shared control than that in exclusively manual control (26.90 vs. 40.07,
p = 0.013). 
o shift the computational burden from realtime to offline in delaycritical power systems applications, recent works entertain the idea of using a deep neural network (DNN) to predict the solutions of the AC optimal power flow (ACOPF) once presented load demands. As network topologies may change, training this DNN in a sampleefficient manner becomes a necessity. To improve data efficiency, this work utilizes the fact OPF data are not simple training labels, but constitute the solutions of a parametric optimization problem. We thus advocate training a sensitivityinformed DNN (SIDNN) to match not only the OPF optimizers, but also their partial derivatives with respect to the OPF parameters (loads). It is shown that the required Jacobian matrices do exist under mild conditions, and can be readily computed from the related primal/dual solutions. The proposed SIDNN is compatible with a broad range of OPF solvers, including a nonconvex quadratically constrained quadratic program (QCQP), its semidefinite program (SDP) relaxation, and MATPOWER; while SIDNN can be seamlessly integrated in other learningtoOPF schemes. Numerical tests on three benchmark power systems corroborate the advanced generalization and constraint satisfaction capabilities for the OPF solutions predicted by an SIDNN over a conventionally trained DNN, especially in lowdata setups.more » « less