Our goal is to learn control policies for robots that provably generalize well to novel environments given a dataset of example environments. The key technical idea behind our approach is to leverage tools from generalization theory in machine learning by exploiting a precise analogy (which we present in the form of a reduction) between generalization of control policies to novel environments and generalization of hypotheses in the supervised learning setting. In particular, we utilize the probably approximately correct (PAC)Bayes framework, which allows us to obtain upper bounds that hold with high probability on the expected cost of (stochastic) control policies across novel environments. We propose policy learning algorithms that explicitly seek to minimize this upper bound. The corresponding optimization problem can be solved using convex optimization (relative entropy programming in particular) in the setting where we are optimizing over a finite policy space. In the more general setting of continuously parameterized policies (e.g., neural network policies), we minimize this upper bound using stochastic gradient descent. We present simulated results of our approach applied to learning (1) reactive obstacle avoidance policies and (2) neural networkbased grasping policies. We also present hardware results for the Parrot Swing drone navigating through different obstaclemore »
Deep Policy Gradient for Reactive Power Control in Distribution Systems
Pronounced variability due to the growth of renewable energy sources, flexible loads, and distributed generation is challenging residential distribution systems. This context, motivates
well fast, efficient, and robust reactive power control. Optimal reactive power control is possible in theory by solving a nonconvex optimization problem based on the exact model of distribution flow. However, lack of highprecision instrumentation and reliable communications, as well as the heavy computational burden of nonconvex optimization solvers render computing and implementing the optimal control challenging in practice. Taking a statistical learning viewpoint, the inputoutput relationship between each grid state and the corresponding optimal reactive power control (a.k.a., policy) is parameterized in the present work by a deep neural network, whose unknown weights are updated by
minimizing the accumulated power loss over a number of historical and simulated training pairs, using the policy gradient method. In the inference phase, one just feeds the realtime state vector into the learned neural network to obtain the ‘optimal’ reactive power control decision with only several matrixvector multiplications. The merits of this novel deep policy gradient approach include its computational efficiency as well as robustness to random input perturbations. Numerical tests on a 47bus distribution network using real solar and consumption data more »
 Award ID(s):
 1901134
 Publication Date:
 NSFPAR ID:
 10273949
 Journal Name:
 Proceedings of IEEE Smartgridcom Conference
 Sponsoring Org:
 National Science Foundation
More Like this


Feature representations from pretrained deep neural networks have been known to exhibit excellent generalization and utility across a variety of related tasks. Finetuning is by far the simplest and most widely used approach that seeks to exploit and adapt these feature representations to novel tasks with limited data. Despite the effectiveness of finetuning, itis often suboptimal and requires very careful optimization to prevent severe overfitting to small datasets. The problem of suboptimality and overfitting, is due in part to the large number of parameters used in a typical deep convolutional neural network. To address these problems, we propose a simple yet effective regularization method for finetuning pretrained deep networks for the task of kshot learning. To prevent overfitting, our key strategy is to cluster the model parameters while ensuring intracluster similarity and intercluster diversity of the parameters, effectively regularizing the dimensionality of the parameter search space. In particular, we identify groups of neurons within each layer of a deep network that shares similar activation patterns. When the network is to be finetuned for a classification task using only k examples, we propagate a single gradient to all of the neuron parameters that belong to the same group. The grouping ofmore »

Mobile devices such as drones and autonomous vehicles increasingly rely on object detection (OD) through deep neural networks (DNNs) to perform critical tasks such as navigation, targettracking and surveillance, just to name a few. Due to their high complexity, the execution of these DNNs requires excessive time and energy. Lowcomplexity object tracking (OT) is thus used along with OD, where the latter is periodically applied to generate "fresh" references for tracking. However, the frames processed with OD incur large delays, which does not comply with realtime applications requirements. Offloading OD to edge servers can mitigate this issue, but existing work focuses on the optimization of the offloading process in systems where the wireless channel has a very large capacity. Herein, we consider systems with constrained and erratic channel capacity, and establish parallel OT (at the mobile device) and OD (at the edge server) processes that are resilient to large OD latency. We propose KatchUp, a novel tracking mechanism that improves the system resilience to excessive OD delay. We show that this technique greatly improves the quality of the reference available to tracking, and boosts performance up to 33%. However, while KatchUp significantly improves performance, it also increases the computing loadmore »

Deep reinforcement learning (RL) has led to encouraging successes in many challenging control tasks. However, a deep RL model lacks interpretability due to the difficulty of identifying how the model's control logic relates to its network structure. Programmatic policies structured in more interpretable representations emerge as a promising solution. Yet two shortcomings remain: First, synthesizing programmatic policies requires optimizing over the discrete and nondifferentiable search space of program architectures. Previous works are suboptimal because they only enumerate program architectures greedily guided by a pretrained RL oracle. Second, these works do not exploit compositionality, an important programming concept, to reuse and compose primitive functions to form a complex function for new tasks. Our first contribution is a programmatically interpretable RL framework that conducts program architecture search on top of a continuous relaxation of the architecture space defined by programming language grammar rules. Our algorithm allows policy architectures to be learned with policy parameters via bilevel optimization using efficient policygradient methods, and thus does not require a pretrained oracle. Our second contribution is improving programmatic policies to support compositionality by integrating primitive functions learned to grasp taskagnostic skills as a composite program to solve novel RL problems. Experiment results demonstrate that ourmore »

We propose a datadriven approach for deep convolutional neural network compression that achieves high accuracy with high throughput and low memory requirements. Current network compression methods either find a lowrank factorization of the features that requires more memory, or select only a subset of features by pruning entire filter channels. We propose the Cascaded Projection (CaP) compression method that projects the output and input filter channels of successive layers to a unified low dimensional space based on a lowrank projection. We optimize the projection to minimize classification loss and the difference between the next layer’s features in the compressed and uncompressed networks. To solve this nonconvex optimization problem we propose a new optimization method of a proxy matrix using back propagation and Stochastic Gradient Descent (SGD) with geometric constraints. Our cascaded projection approach leads to improvements in all critical areas of network compression: high accuracy, low memory consumption, low parameter count and high processing speed. The proposed CaP method demonstrates stateoftheart results compressing VGG16 and ResNet networks with over 4× reduction in the number of computations and excellent performance in top5 accuracy on the ImageNet dataset before and after finetuning.