skip to main content

Title: Rapid Discrete Optimization via Simulation with Gaussian Markov Random Fields
Inference-based optimization via simulation, which substitutes Gaussian process (GP) learning for the structural properties exploited in mathematical programming, is a powerful paradigm that has been shown to be remarkably effective in problems of modest feasible-region size and decision-variable dimension. The limitation to “modest” problems is a result of the computational overhead and numerical challenges encountered in computing the GP conditional (posterior) distribution on each iteration. In this paper, we substantially expand the size of discrete-decision-variable optimization-via-simulation problems that can be attacked in this way by exploiting a particular GP—discrete Gaussian Markov random fields—and carefully tailored computational methods. The result is the rapid Gaussian Markov Improvement Algorithm (rGMIA), an algorithm that delivers both a global convergence guarantee and finite-sample optimality-gap inference for significantly larger problems. Between infrequent evaluations of the global conditional distribution, rGMIA applies the full power of GP learning to rapidly search smaller sets of promising feasible solutions that need not be spatially close. We carefully document the computational savings via complexity analysis and an extensive empirical study. Summary of Contribution: The broad topic of the paper is optimization via simulation, which means optimizing some performance measure of a system that may only be estimated by executing a stochastic, more » discrete-event simulation. Stochastic simulation is a core topic and method of operations research. The focus of this paper is on significantly speeding-up the computations underlying an existing method that is based on Gaussian process learning, where the underlying Gaussian process is a discrete Gaussian Markov Random Field. This speed-up is accomplished by employing smart computational linear algebra, state-of-the-art algorithms, and a careful divide-and-conquer evaluation strategy. Problems of significantly greater size than any other existing algorithm with similar guarantees can solve are solved as illustrations. « less
Authors:
; ; ;
Award ID(s):
1854562
Publication Date:
NSF-PAR ID:
10335104
Journal Name:
INFORMS Journal on Computing
Volume:
33
Issue:
3
Page Range or eLocation-ID:
915 to 930
ISSN:
1091-9856
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Satellite remote sensing provides a global view to processes on Earth that has unique benefits compared to making measurements on the ground, such as global coverage and enormous data volume. The typical downsides are spatial and temporal gaps and potentially low data quality. Meaningful statistical inference from such data requires overcoming these problems and developing efficient and robust computational tools.We design and implement a computationally efficient multi-scale Gaussian process (GP) software package, satGP, geared towards remote sensing applications. The software is able to handle problems of enormous sizes and to compute marginals and sample from the random field conditioningmore »on at least hundreds of millions of observations. This is achieved by optimizing the computation by, e.g., randomization and splitting the problem into parallel local subproblems which aggressively discard uninformative data. We describe the mean function of the Gaussian process by approximating marginals of a Markov random field (MRF). Variability around the mean is modeled with a multi-scale covariance kernel, which consists of Matérn, exponential, and periodic components. We also demonstrate how winds can be used to inform covariances locally.The covariance kernel parameters are learned by calculating an approximate marginal maximum likelihood estimate, and the validity of both the multi-scale approach and the method used to learn the kernel parameters is verified in synthetic experiments. We apply these techniques to a moderate size ozone data set produced by an atmospheric chemistry model and to the very large number of observations retrieved from the Orbiting Carbon Observatory 2 (OCO-2) satellite. The satGP software is released under an open-source license.« less
  2. It is desirable to combine the expressive power of deep learning with Gaussian Process (GP) in one expressive Bayesian learning model. Deep kernel learning showed success as a deep network used for feature extraction. Then, a GP was used as the function model. Recently, it was suggested that, albeit training with marginal likelihood, the deterministic nature of a feature extractor might lead to overfitting, and replacement with a Bayesian network seemed to cure it. Here, we propose the conditional deep Gaussian process (DGP) in which the intermediate GPs in hierarchical composition are supported by the hyperdata and the exposed GPmore »remains zero mean. Motivated by the inducing points in sparse GP, the hyperdata also play the role of function supports, but are hyperparameters rather than random variables. It follows our previous moment matching approach to approximate the marginal prior for conditional DGP with a GP carrying an effective kernel. Thus, as in empirical Bayes, the hyperdata are learned by optimizing the approximate marginal likelihood which implicitly depends on the hyperdata via the kernel. We show the equivalence with the deep kernel learning in the limit of dense hyperdata in latent space. However, the conditional DGP and the corresponding approximate inference enjoy the benefit of being more Bayesian than deep kernel learning. Preliminary extrapolation results demonstrate expressive power from the depth of hierarchy by exploiting the exact covariance and hyperdata learning, in comparison with GP kernel composition, DGP variational inference and deep kernel learning. We also address the non-Gaussian aspect of our model as well as way of upgrading to a full Bayes inference.« less
  3. We study a first-order primal-dual subgradient method to optimize risk-constrained risk-penalized optimization problems, where risk is modeled via the popular conditional value at risk (CVaR) measure. The algorithm processes independent and identically distributed samples from the underlying uncertainty in an online fashion and produces an η/√K-approximately feasible and η/√K-approximately optimal point within K iterations with constant step-size, where η increases with tunable risk-parameters of CVaR. We find optimized step sizes using our bounds and precisely characterize the computational cost of risk aversion as revealed by the growth in η. Our proposed algorithm makes a simple modification to a typical primal-dualmore »stochastic subgradient algorithm. With this mild change, our analysis surprisingly obviates the need to impose a priori bounds or complex adaptive bounding schemes for dual variables to execute the algorithm as assumed in many prior works. We also draw interesting parallels in sample complexity with that for chance-constrained programs derived in the literature with a very different solution architecture.« less
  4. Community detection in graphs can be solved via spectral methods or posterior inference under certain probabilistic graphical models. Focusing on random graph families such as the stochastic block model, recent research has unified both approaches and identified both statistical and computational detection thresholds in terms of the signal-to-noise ratio. By recasting community detection as a node-wise classification problem on graphs, we can also study it from a learning perspective. We present a novel family of Graph Neural Networks (GNNs) for solving community detection problems in a supervised learning setting. We show that, in a data-driven manner and without access tomore »the underlying generative models, they can match or even surpass the performance of the belief propagation algorithm on binary and multiclass stochastic block models, which is believed to reach the computational threshold in these cases. In particular, we propose to augment GNNs with the non-backtracking operator defined on the line graph of edge adjacencies. The GNNs are achieved good performance on real-world datasets. In addition, we perform the first analysis of the optimization landscape of using (linear) GNNs to solve community detection problems, demonstrating that under certain simplifications and assumptions, the loss value at any local minimum is close to the loss value at the global minimum/minima.« less
  5. Community detection in graphs can be solved via spectral methods or posterior inference under certain probabilistic graphical models. Focusing on random graph families such as the stochastic block model, recent research has unified both approaches and identified both statistical and computational detection thresholds in terms of the signal-to-noise ratio. By recasting community detection as a node-wise classification problem on graphs, we can also study it from a learning perspective. We present a novel family of Graph Neural Networks (GNNs) for solving community detection problems in a supervised learning setting. We show that, in a data-driven manner and without access tomore »the underlying generative models, they can match or even surpass the performance of the belief propagation algorithm on binary and multiclass stochastic block models, which is believed to reach the computational threshold in these cases. In particular, we propose to augment GNNs with the non-backtracking operator defined on the line graph of edge adjacencies. The GNNs are achieved good performance on real-world datasets. In addition, we perform the first analysis of the optimization landscape of using (linear) GNNs to solve community detection problems, demonstrating that under certain simplifications and assumptions, the loss value at any local minimum is close to the loss value at the global minimum/minima.« less