The average-reward formulation of reinforcement learning (RL) has drawn increased interest in recent years for its ability to solve temporally-extended problems without relying on discounting. Meanwhile, in the discounted setting, algorithms with entropy regularization have been developed, leading to improvements over deterministic methods. Despite the distinct benefits of these approaches, deep RL algorithms for the entropy-regularized average-reward objective have not been developed. While policy-gradient-based approaches have recently been presented in the average-reward literature, the corresponding actor-critic framework remains less explored. In this paper, we introduce an average-reward soft actor-critic algorithm to address these gaps in the field. We validate our method by comparing it with existing average-reward algorithms on standard RL benchmarks, achieving superior performance for the average-reward criterion.
Comparing Deterministic and Soft Policy Gradients for Optimizing Gaussian Mixture Actors
Gaussian Mixture Models (GMMs) have been recently proposed for approximating actors in actor-critic reinforcement learning algorithms. Such GMM-based actors are commonly optimized using stochastic policy gradients along with an entropy maximization objective. In contrast to previous work, we define and study deterministic policy gradients for optimizing GMM-based actors. Similar to stochastic gradient approaches, our proposed method, denoted Gaussian Mixture Deterministic Policy Gradient (Gamid-PG), encourages policy entropy maximization. To this end, we define the GMM entropy gradient using Variational Approximation of the KL-divergence between the GMM’s component Gaussians. We compare Gamid-PG with common stochastic policy gradient methods on benchmark dense-reward MuJoCo tasks and sparse-reward Fetch tasks. We observe that Gamid-PG outperforms stochastic gradient-based methods in 3/6 MuJoCo tasks while performing similarly on the remaining 3 tasks. In the Fetch tasks, Gamid-PG outperforms single-actor deterministic gradient-based methods while performing worse than stochastic policy gradient methods. Consequently, we conclude that GMMs optimized using deterministic policy gradients (1) should be favorably considered over stochastic gradients in dense-reward continuous control tasks, and (2) improve upon single-actor deterministic gradients.
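A GMM has no closed-form entropy, which is why an approximation built from the KL-divergences between its component Gaussians is useful. The sketch below shows one common pairwise-KL estimate of that kind; it is not necessarily the exact expression used in Gamid-PG. It assumes diagonal covariances for brevity, and the function names (`diag_gaussian_entropy`, `diag_gaussian_kl`, `gmm_entropy_estimate`) are hypothetical.

```python
import math

def diag_gaussian_entropy(var):
    """Differential entropy of a Gaussian with diagonal covariance `var`."""
    d = len(var)
    return 0.5 * (d * (1.0 + math.log(2.0 * math.pi)) + sum(math.log(v) for v in var))

def diag_gaussian_kl(mu0, var0, mu1, var1):
    """KL(N(mu0, diag(var0)) || N(mu1, diag(var1))), summed over dimensions."""
    return 0.5 * sum(
        v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0)
        for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1)
    )

def gmm_entropy_estimate(weights, means, variances):
    """Pairwise-KL entropy estimate (an assumption, not the paper's formula):
    sum_i w_i * (H(p_i) - log sum_j w_j * exp(-KL(p_i || p_j)))."""
    total = 0.0
    for i, w_i in enumerate(weights):
        mix = sum(
            w_j * math.exp(-diag_gaussian_kl(means[i], variances[i], means[j], variances[j]))
            for j, w_j in enumerate(weights)
        )
        total += w_i * (diag_gaussian_entropy(variances[i]) - math.log(mix))
    return total

# Two well-separated unit-variance components with equal weights.
estimate = gmm_entropy_estimate(
    [0.5, 0.5], [[0.0, 0.0], [100.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]
)
```

The estimate behaves sensibly at both extremes: when all components coincide it reduces to the entropy of a single Gaussian, and when the components are far apart it approaches the component entropy plus the entropy of the mixing weights. Because every term is differentiable in the means, variances, and weights, an estimate of this form can be used inside a gradient-based entropy bonus.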
- Award ID(s): 2238979
- PAR ID: 10577268
- Editor(s): Larochelle, Hugo; Murray, Naila; Kamath, Gautam; Shah, Nihar B
- Publisher / Repository: Transactions on Machine Learning Research (TMLR)
- Date Published:
- Journal Name: Transactions on Machine Learning Research
- ISSN: 2835-8856
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Accurate knowledge of transmission line parameters is essential for a variety of power system monitoring, protection, and control applications. The use of phasor measurement unit (PMU) data for transmission line parameter estimation (TLPE) is well-documented. However, existing literature on PMU-based TLPE implicitly assumes the measurement noise to be Gaussian. Recently, it has been shown that the noise in PMU measurements (especially in the current phasors) is better represented by Gaussian mixture models (GMMs), i.e., the noises are non-Gaussian. We present a novel approach for TLPE that can handle non-Gaussian noise in the PMU measurements. The measurement noise is expressed as a GMM, whose components are identified using the expectation-maximization (EM) algorithm. Subsequently, noise and parameter estimation is carried out by solving a maximum likelihood estimation problem iteratively until convergence. The superior performance of the proposed approach over traditional approaches such as least squares and total least squares as well as the more recently proposed minimum total error entropy approach is demonstrated by performing simulations using the IEEE 118-bus system as well as proprietary PMU data obtained from a U.S. power utility.
-
Gaussian Mixture Models (GMM) are an effective representation of resource uncertainty in power systems planning, as they can be tractably incorporated within stochastic optimization models. However, the skewness, multimodality, and bounded physical support of long-term wind power forecasts can entail requiring a large number of mixture components to achieve a good fit, leading to complex optimization problems. We propose a probabilistic model for wind generation uncertainty to address this challenge, termed Discrete-Gaussian Mixture Model (DGMM), that combines continuous Gaussian components with discrete masses. The model generalizes classical GMMs that have been widely used to estimate wind power outputs. We employ a modified Expectation-Maximization algorithm (called FixedEM) to estimate the parameters of the DGMM. We provide empirical results on the ACTIVSg2000 synthetic wind generation dataset, where we demonstrate that the fitted DGMM is capable of capturing the high frequencies of time windows when wind generating units are either producing at maximum capacity or not producing any power at all. Furthermore, we find that the Bayesian Information Criterion of the DGMM is significantly lower compared to that of existing GMMs using the same number of Gaussian components. This improvement is particularly advantageous when the allowed number of Gaussian components is limited, facilitating the efficient solution to optimization problems for long-term planning.
-
We develop a measure for evaluating the performance of generative networks given two sets of images. A popular performance measure currently used to do this is the Fréchet Inception Distance (FID). FID assumes that images featurized using the penultimate layer of Inception-v3 follow a Gaussian distribution, an assumption which cannot be violated if we wish to use FID as a metric. However, we show that Inception-v3 features of the ImageNet dataset are not Gaussian; in particular, every single marginal is not Gaussian. To remedy this problem, we model the featurized images using Gaussian mixture models (GMMs) and compute the 2-Wasserstein distance restricted to GMMs. We define a performance measure, which we call WaM, on two sets of images by using Inception-v3 (or another classifier) to featurize the images, estimate two GMMs, and use the restricted 2-Wasserstein distance to compare the GMMs. We experimentally show the advantages of WaM over FID, including how FID is more sensitive than WaM to imperceptible image perturbations. By modelling the non-Gaussian features obtained from Inception-v3 as GMMs and using a GMM metric, we can more accurately evaluate generative network performance.
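For context on why the Gaussian assumption matters here: FID is the squared 2-Wasserstein (Fréchet) distance between two Gaussians fitted to the feature sets, which has the standard closed form below. The symbols m_1, m_2 (feature means) and \Sigma_1, \Sigma_2 (feature covariances) are notation introduced for this illustration, not taken from the abstract.

```latex
W_2^2\bigl(\mathcal{N}(m_1,\Sigma_1),\,\mathcal{N}(m_2,\Sigma_2)\bigr)
  = \lVert m_1 - m_2 \rVert_2^2
  + \operatorname{Tr}\!\Bigl(\Sigma_1 + \Sigma_2
  - 2\bigl(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\bigr)^{1/2}\Bigr)
```

This expression equals the true 2-Wasserstein distance only when both feature distributions are Gaussian. When the features are better described by mixtures, this formula still applies to each pair of mixture components, which is what makes a distance restricted to GMMs computable.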

