Bayesian optimization is a highly efficient approach to optimizing objective functions which are expensive to query. These objectives are typically represented by Gaussian process (GP) surrogate models which are easy to optimize and support exact inference. While standard GP surrogates have been well-established in Bayesian optimization, Bayesian neural networks (BNNs) have recently become practical function approximators, with many benefits over standard GPs such as the ability to naturally handle non-stationarity and learn representations for high-dimensional data. In this paper, we study BNNs as alternatives to standard GP surrogates for optimization. We consider a variety of approximate inference procedures for finite-width BNNs, including high-quality Hamiltonian Monte Carlo, low-cost stochastic MCMC, and heuristics such as deep ensembles. We also consider infinite-width BNNs, linearized Laplace approximations, and partially stochastic models such as deep kernel learning. We evaluate this collection of surrogate models on diverse problems with varying dimensionality, number of objectives, non-stationarity, and discrete and continuous inputs. We find: (i) the ranking of methods is highly problem dependent, suggesting the need for tailored inductive biases; (ii) HMC is the most successful approximate inference procedure for fully stochastic BNNs; (iii) full stochasticity may be unnecessary as deep kernel learning is relatively competitive; (iv) deep ensembles perform relatively poorly; (v) infinite-width BNNs are particularly promising, especially in high dimensions.
more »
« less
Rapid Discrete Optimization via Simulation with Gaussian Markov Random Fields
Inference-based optimization via simulation, which substitutes Gaussian process (GP) learning for the structural properties exploited in mathematical programming, is a powerful paradigm that has been shown to be remarkably effective in problems of modest feasible-region size and decision-variable dimension. The limitation to “modest” problems is a result of the computational overhead and numerical challenges encountered in computing the GP conditional (posterior) distribution on each iteration. In this paper, we substantially expand the size of discrete-decision-variable optimization-via-simulation problems that can be attacked in this way by exploiting a particular GP—discrete Gaussian Markov random fields—and carefully tailored computational methods. The result is the rapid Gaussian Markov Improvement Algorithm (rGMIA), an algorithm that delivers both a global convergence guarantee and finite-sample optimality-gap inference for significantly larger problems. Between infrequent evaluations of the global conditional distribution, rGMIA applies the full power of GP learning to rapidly search smaller sets of promising feasible solutions that need not be spatially close. We carefully document the computational savings via complexity analysis and an extensive empirical study. Summary of Contribution: The broad topic of the paper is optimization via simulation, which means optimizing some performance measure of a system that may only be estimated by executing a stochastic, discrete-event simulation. Stochastic simulation is a core topic and method of operations research. The focus of this paper is on significantly speeding-up the computations underlying an existing method that is based on Gaussian process learning, where the underlying Gaussian process is a discrete Gaussian Markov Random Field. This speed-up is accomplished by employing smart computational linear algebra, state-of-the-art algorithms, and a careful divide-and-conquer evaluation strategy. Problems of significantly greater size than any other existing algorithm with similar guarantees can solve are solved as illustrations.
more »
« less
- Award ID(s):
- 1854562
- PAR ID:
- 10335104
- Date Published:
- Journal Name:
- INFORMS Journal on Computing
- Volume:
- 33
- Issue:
- 3
- ISSN:
- 1091-9856
- Page Range / eLocation ID:
- 915 to 930
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)It is desirable to combine the expressive power of deep learning with Gaussian Process (GP) in one expressive Bayesian learning model. Deep kernel learning showed success as a deep network used for feature extraction. Then, a GP was used as the function model. Recently, it was suggested that, albeit training with marginal likelihood, the deterministic nature of a feature extractor might lead to overfitting, and replacement with a Bayesian network seemed to cure it. Here, we propose the conditional deep Gaussian process (DGP) in which the intermediate GPs in hierarchical composition are supported by the hyperdata and the exposed GP remains zero mean. Motivated by the inducing points in sparse GP, the hyperdata also play the role of function supports, but are hyperparameters rather than random variables. It follows our previous moment matching approach to approximate the marginal prior for conditional DGP with a GP carrying an effective kernel. Thus, as in empirical Bayes, the hyperdata are learned by optimizing the approximate marginal likelihood which implicitly depends on the hyperdata via the kernel. We show the equivalence with the deep kernel learning in the limit of dense hyperdata in latent space. However, the conditional DGP and the corresponding approximate inference enjoy the benefit of being more Bayesian than deep kernel learning. Preliminary extrapolation results demonstrate expressive power from the depth of hierarchy by exploiting the exact covariance and hyperdata learning, in comparison with GP kernel composition, DGP variational inference and deep kernel learning. We also address the non-Gaussian aspect of our model as well as way of upgrading to a full Bayes inference.more » « less
-
Multi-output Gaussian process (GP) regression has been widely used as a flexible nonparametric Bayesian model for predicting multiple correlated outputs given inputs. However, the cubic complexity in the sample size and the output dimensions for inverting the kernel matrix has limited their use in the large-data regime. In this paper, we introduce the factorial stochastic differential equation as a representation of multi-output GP regression, which is a factored state-space representation as in factorial hidden Markov models. We propose a structured mean-field variational inference approach that achieves a time complexity linear in the number of samples, along with its sparse variational inference counterpart with complexity linear in the number of inducing points. On simulated and real-world data, we show that our approach significantly improves upon the scalability of previous methods, while achieving competitive prediction accuracy.more » « less
-
Feng, B.; Pedrielli, G; Peng, Y.; Shashaani, S.; Song, E.; Corlu, C.; Lee, L.; Chew, E.; Roeder, T.; Lendermann, P. (Ed.)The Rapid Gaussian Markov Improvement Algorithm (rGMIA) solves discrete optimization via simulation problems by using a Gaussian Markov random field and complete expected improvement as the sampling and stopping criterion. rGMIA has been created as a sequential sampling procedure run on a single processor. In this paper, we extend rGMIA to a parallel computing environment when q+1 solutions can be simulated in parallel. To this end, we introduce the q-point complete expected improvement criterion to determine a batch of q+1 solutions to simulate. This new criterion is implemented in a new object-oriented rGMIA package.more » « less
-
This article considers a discrete optimization via simulation (DOvS) problem defined on a graph embedded in the high-dimensional integer grid. Several DOvS algorithms that model the responses at the solutions as a realization of a Gaussian Markov random field (GMRF) have been proposed exploiting its inferential power and computational benefits. However, the computational cost of inference increases exponentially in dimension. We propose the projected Gaussian Markov improvement algorithm (pGMIA), which projects the solution space onto a lower-dimensional space creating the region-layer graph to reduce the cost of inference. Each node on the region-layer graph can be mapped to a set of solutions projected to the node; these solutions form a lower-dimensional solution-layer graph. We define the response at each region-layer node to be the average of the responses within the corresponding solution-layer graph. From this relation, we derive the region-layer GMRF to model the region-layer responses. The pGMIA alternates between the two layers to make a sampling decision at each iteration. It first selects a region-layer node based on the lower-resolution inference provided by the region-layer GMRF, then makes a sampling decision among the solutions within the solution-layer graph of the node based on the higher-resolution inference from the solution-layer GMRF. To solve even higher-dimensional problems (e.g., 100 dimensions), we also propose the pGMIA+: a multi-layer extension of the pGMIA. We show that both pGMIA and pGMIA+ converge to the optimum almost surely asymptotically and empirically demonstrate their competitiveness against state-of-the-art high-dimensional Bayesian optimization algorithms.more » « less
An official website of the United States government

