Title: Model-Free Primal-Dual Methods for Network Optimization with Application to Real-Time Optimal Power Flow
This paper examines the problem of real-time optimization of networked systems and develops online algorithms that steer the system towards the optimal trajectory without explicit knowledge of the system model. The problem is modeled as a dynamic optimization problem with time-varying performance objectives and engineering constraints. The design of the algorithms leverages the online zero-order primal-dual projected-gradient method. In particular, the primal step that involves the gradient of the objective function (and hence requires a model of the networked system) is replaced by its zero-order approximation with two function evaluations using a deterministic perturbation signal. The evaluations are performed using measurements of the system output, giving rise to a feedback interconnection in which the optimization algorithm serves as a feedback controller. The paper provides insights into the stability and tracking properties of this interconnection. Finally, the paper applies this methodology to a real-time optimal power flow problem in power systems and shows its efficacy on the IEEE 37-node distribution test feeder for reference power tracking and voltage regulation.
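The two-point zero-order primal-dual loop described above can be illustrated on a toy problem. The sketch below is a hypothetical illustration under stated assumptions, not the paper's implementation: the plant (`measure_output`, with hidden gain `A_HIDDEN` and offset `C_HIDDEN`), the setpoint `R_REF`, the output limit `Y_MAX`, the box `X_LO`/`X_HI`, the sinusoidal probing signal in `probe`, and the step sizes are all invented for the example. It also assumes the gradient of the whole Lagrangian (objective plus multiplier-weighted constraint) is estimated from measurements, and it holds the objective fixed rather than time-varying. Each iteration applies x ± δξ_k to the plant, forms a central difference of the measured Lagrangian along ξ_k, takes a projected primal step, and updates the multiplier from the measured constraint violation.

```python
import math

# --- Hypothetical plant (assumption): the controller never reads A_HIDDEN or
# C_HIDDEN; it only sees the measured output y for an applied input x.
A_HIDDEN = [1.0, 1.0]
C_HIDDEN = 0.0


def measure_output(x):
    """Stand-in for a sensor reading y = a.x + c of an unknown system."""
    return sum(a * xj for a, xj in zip(A_HIDDEN, x)) + C_HIDDEN


R_REF = [2.0, 2.0]         # hypothetical setpoint to track
Y_MAX = 1.0                # hypothetical engineering limit: y <= Y_MAX
X_LO, X_HI = -5.0, 5.0     # box constraint on each input


def measured_lagrangian(x, lam):
    """L(x, lam) = f(x) + lam * (y - Y_MAX), with y taken from a measurement."""
    f = 0.5 * sum((xj - r) ** 2 for xj, r in zip(x, R_REF))
    return f + lam * (measure_output(x) - Y_MAX)


def probe(k, omegas):
    """Deterministic perturbation: xi_i(k) = sqrt(2) sin(omega_i k), distinct omegas."""
    return [math.sqrt(2.0) * math.sin(w * k) for w in omegas]


def model_free_primal_dual(steps=20000, alpha=0.01, delta=0.1, omegas=(1.0, 1.7)):
    x = [0.0] * len(omegas)
    lam = 0.0
    for k in range(steps):
        xi = probe(k, omegas)
        x_plus = [xj + delta * e for xj, e in zip(x, xi)]
        x_minus = [xj - delta * e for xj, e in zip(x, xi)]
        # Two measurements give a central-difference estimate of the
        # directional derivative of L along xi; scaling by xi gives a
        # rank-one gradient estimate whose time average approximates the gradient.
        d = (measured_lagrangian(x_plus, lam)
             - measured_lagrangian(x_minus, lam)) / (2.0 * delta)
        grad_est = [d * e for e in xi]
        # Primal step, projected onto the box [X_LO, X_HI]^n.
        x = [min(X_HI, max(X_LO, xj - alpha * g)) for xj, g in zip(x, grad_est)]
        # Dual step driven by the measured constraint violation, kept >= 0.
        lam = max(0.0, lam + alpha * (measure_output(x) - Y_MAX))
    return x, lam


x_final, lam_final = model_free_primal_dual()
print(x_final, lam_final)
```

For this toy problem the constrained optimum can be worked out by hand by projecting `R_REF` onto the half-plane x_1 + x_2 ≤ 1, which gives x* = (0.5, 0.5) with multiplier λ* = 1.5; the loop should settle there. Using distinct probing frequencies makes the time average of ξ_k ξ_kᵀ equal to the identity, which is why the rank-one estimates average out to the true gradient when the step size is small.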
Award ID(s):
1935389
NSF-PAR ID:
10348780
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
American Control Conference
Page Range / eLocation ID:
3140-3147
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More like this
  1. This paper presents one of the first real-life demonstrations of coordinated and distributed resource control for secondary frequency response in a power distribution grid. A series of tests involved up to 69 heterogeneous active distributed energy resources (air handling units, unidirectional and bidirectional electric vehicle charging stations, and a battery energy storage system) and 107 passive distributed energy resources (building loads and solar photovoltaic systems). The distributed control setup consists of a set of Raspberry Pi end-points exchanging messages via an Ethernet switch. Actuation commands for the distributed energy resources are obtained by solving a power allocation problem at every regulation instant using distributed ratio-consensus, primal-dual, and Newton-like algorithms. The problem formulation minimizes the sum of distributed energy resource costs while tracking the aggregate setpoint provided by the system operator. We demonstrate accurate and fast real-time distributed computation of the optimization solution and effective tracking of the regulation signal over 40-minute time horizons. An economic benefit analysis confirms eligibility to participate in an ancillary services market and demonstrates up to $53k of potential annual revenue for the selected population of distributed energy resources.
  2. A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully. However, the lack of carefully designed, fine-grained feedback implies that most existing RL algorithms fail to learn an acceptable policy in a reasonable time frame. This is because of the large number of exploration actions that the policy has to perform before it gets any useful feedback that it can learn from. In this work, we address this challenging problem by developing an algorithm that exploits offline demonstration data generated by a sub-optimal behavior policy for faster and more efficient online RL in such sparse reward settings. The proposed algorithm, which we call the Learning Online with Guidance Offline (LOGO) algorithm, merges a policy improvement step with an additional policy guidance step that uses the offline demonstration data. The key idea is that by obtaining guidance from, rather than imitating, the offline data, LOGO orients its policy in the manner of the sub-optimal policy while still being able to learn beyond it and approach optimality. We provide a theoretical analysis of our algorithm, including a lower bound on the performance improvement in each learning episode. We also extend our algorithm to the even more challenging incomplete observation setting, where the demonstration data contains only a censored version of the true state observation. We demonstrate the superior performance of our algorithm over state-of-the-art approaches on a number of benchmark environments with sparse rewards and censored state. Further, we demonstrate the value of our approach by implementing LOGO on a mobile robot for trajectory tracking and obstacle avoidance, where it shows excellent performance.
  3. We consider the periodic review dynamic pricing and inventory control problem with fixed ordering cost. Demand is random and price dependent, and unsatisfied demand is backlogged. With complete demand information, the celebrated [Formula: see text] policy is proved to be optimal, where s and S are the reorder point and order-up-to level for the ordering strategy, and [Formula: see text], a function of the on-hand inventory level, characterizes the pricing strategy. In this paper, we consider incomplete demand information and develop online learning algorithms whose average profit approaches that of the optimal [Formula: see text] with a tight [Formula: see text] regret rate. A number of salient features differentiate our work from existing online learning research in the operations management (OM) literature. First, computing the optimal [Formula: see text] policy requires solving a dynamic program (DP) over multiple periods involving unknown quantities, which is different from the majority of learning problems in OM that only require solving single-period optimization problems. It is hence challenging to establish stability results through DP recursions, which we accomplish by proving uniform convergence of the profit-to-go function. The necessity of analyzing action-dependent state transitions over multiple periods resembles a reinforcement learning problem and is considerably more difficult than existing bandit learning settings. Second, the pricing function [Formula: see text] is infinite-dimensional, and approaching it is much more challenging than approaching a finite number of parameters, as in existing research. The demand-price relationship is estimated using upper confidence bounds, but the confidence interval cannot be calculated explicitly due to the complexity of the DP recursion.
Finally, because of the multiperiod nature of [Formula: see text] policies, the actual distribution of the demand randomness, which is unknown to the learner a priori, plays an important role in determining the optimal pricing strategy [Formula: see text]. In this paper, the demand randomness is approximated by an empirical distribution constructed using dependent samples, and a novel Wasserstein metric-based argument is employed to prove convergence of the empirical distribution. This paper was accepted by J. George Shanthikumar, big data analytics.
  4. We present alfonso, an open-source Matlab package for solving conic optimization problems over nonsymmetric convex cones. The implementation is based on the authors’ corrected analysis of a method of Skajaa and Ye. It enables optimization over any convex cone as long as a logarithmically homogeneous self-concordant barrier is available for the cone or its dual. This includes many nonsymmetric cones, for example, hyperbolicity cones and their duals (such as sum-of-squares cones), semidefinite and second-order cone representable cones, power cones, and the exponential cone. Besides enabling the solution of problems that cannot be cast as optimization problems over a symmetric cone, algorithms for nonsymmetric conic optimization also offer performance advantages for problems whose symmetric cone programming representation requires a large number of auxiliary variables or has a special structure that can be exploited in the barrier computation. The worst-case iteration complexity of alfonso is the best known for nonsymmetric cone optimization: O(√ν log(1/ε)) iterations to reach an ε-optimal solution, where ν is the barrier parameter of the barrier function used in the optimization. Alfonso can be interfaced with a user-supplied Matlab function that computes the Hessian of a barrier function for the cone. A simplified interface is also available to optimize over the direct product of cones for which a barrier function has already been built into the software. This interface can be easily extended to include new cones. Both interfaces are illustrated by solving linear programs. The oracle interface and the efficiency of alfonso are also demonstrated using an optimal design of experiments problem in which the tailored barrier computation greatly decreases the solution time compared with state-of-the-art, off-the-shelf conic optimization software.
Summary of Contribution: The paper describes an open-source Matlab package for optimization over nonsymmetric cones. A particularly important feature of this software is that, unlike other conic optimization software, it enables optimization over any convex cone as long as a suitable barrier function is available for the cone or its dual, not limiting the user to a small number of specific cones. Nonsymmetric cones for which such barriers are already known include, for example, hyperbolicity cones and their duals (such as sum-of-squares cones), semidefinite and second-order cone representable cones, power cones, and the exponential cone. Thus, the scope of this software is far larger than most current conic optimization software. This does not come at the price of efficiency, as the worst-case iteration complexity of our algorithm matches the iteration complexity of the most successful interior-point methods for symmetric cones. Besides enabling the solution of problems that cannot be cast as optimization problems over a symmetric cone, our software can also offer performance advantages for problems whose symmetric cone programming representation requires a large number of auxiliary variables or has a special structure that can be exploited in the barrier computation. This is also demonstrated in this paper via an example in which our code significantly outperforms Mosek 9 and SCS 2. 
  5. We develop an optimization-based framework for joint real-time trajectory planning and feedback control of feedback-linearizable systems. To achieve this goal, we define a target trajectory as the optimal solution of a time-varying optimization problem. In general, however, such a trajectory may not be feasible due to, e.g., nonholonomic constraints. To solve this problem, we design a control law that generates feasible trajectories that asymptotically converge to the target trajectory. More precisely, for systems that are (dynamically) full-state linearizable, the proposed control law implicitly transforms the nonlinear system into an optimization algorithm of sufficiently high order. We prove global exponential convergence to the target trajectory for both the optimization algorithm and the original system. We illustrate the effectiveness of the proposed method on multi-target and multi-agent tracking problems with constraints.
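Item 1 above mentions distributed ratio consensus for splitting an aggregate setpoint among distributed energy resources. A minimal sketch of that primitive follows, under stated assumptions: it performs fair splitting (every resource runs at the same fraction ρ of its range), not the cost-minimizing allocation used in that work. The names `ratio_consensus` and `allocate`, the directed-ring topology, the single "leader" node that knows the setpoint, and the numeric limits are all hypothetical. Each node repeatedly splits its two running values equally among itself and its out-neighbors, and the ratio of the two values at every node converges to ρ = (setpoint − Σ p_min) / Σ (p_max − p_min).

```python
def ratio_consensus(out_neighbors, y0, z0, iters=200):
    """Run ratio consensus on a directed graph given as out-neighbor lists.

    Each node j splits its current (y_j, z_j) equally among itself and its
    out-neighbors (column-stochastic weights). If the graph is strongly
    connected and every z0 entry is positive, y_i / z_i -> sum(y0) / sum(z0).
    """
    n = len(y0)
    y, z = list(y0), list(z0)
    for _ in range(iters):
        y_next, z_next = [0.0] * n, [0.0] * n
        for j in range(n):
            share = 1.0 + len(out_neighbors[j])
            for i in [j] + list(out_neighbors[j]):
                y_next[i] += y[j] / share
                z_next[i] += z[j] / share
        y, z = y_next, z_next
    return [yi / zi for yi, zi in zip(y, z)]


def allocate(p_min, p_max, setpoint, out_neighbors, leader=0, iters=200):
    """Hypothetical fair-splitting allocation: only `leader` knows the setpoint."""
    y0 = [-lo for lo in p_min]
    y0[leader] += setpoint
    z0 = [hi - lo for lo, hi in zip(p_min, p_max)]
    rho = ratio_consensus(out_neighbors, y0, z0, iters)
    return [lo + r * (hi - lo) for lo, hi, r in zip(p_min, p_max, rho)]


ring = [[1], [2], [3], [0]]  # directed ring 0 -> 1 -> 2 -> 3 -> 0
p = allocate([0.0] * 4, [10.0, 20.0, 30.0, 40.0], 50.0, ring)
print(p)
```

With lower limits of zero, upper limits (10, 20, 30, 40), and a setpoint of 50, every node should agree on ρ = 50 / 100 = 0.5, so the allocation approaches (5, 10, 15, 20) and sums to the setpoint. Only local messages along the graph edges are needed, which is what makes the scheme attractive for a setup like the message-passing end-points described in item 1.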