
Title: Uncertainty quantification using martingales for misspecified Gaussian processes
We address uncertainty quantification for Gaussian processes (GPs) under misspecified priors, with an eye towards Bayesian optimization (BO). GPs are widely used in BO because they easily enable exploration based on posterior uncertainty bands. However, this convenience comes at the cost of robustness: a typical function encountered in practice is unlikely to have been drawn from the data scientist's prior, in which case uncertainty estimates can be misleading and the resulting exploration suboptimal. We present a frequentist approach to GP/BO uncertainty quantification. We utilize the GP framework as a working model, but do not assume correctness of the prior. Instead, we construct a confidence sequence (CS) for the unknown function using martingale techniques. There is a necessary cost to achieving robustness: if the prior were correct, posterior GP bands would be narrower than our CS. Nevertheless, when the prior is wrong, our CS is statistically valid and empirically outperforms standard GP methods, in terms of both coverage and utility for BO. Additionally, we demonstrate that powered likelihoods provide robustness against model misspecification.
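To make the construction concrete, below is a minimal Python sketch (our illustration under stated assumptions, not the paper's released code) of the membership test behind such a confidence sequence, specialized to the conjugate case of an RBF kernel and known Gaussian noise, with candidate functions represented by their values at the observed inputs. A candidate f stays in the CS exactly when the prior-to-posterior density ratio π(f)/π(f | D_t) is below 1/α; by Bayes' rule this ratio equals m(D_t)/p(D_t | f), a nonnegative martingale under the true f, so Ville's inequality gives time-uniform 1−α coverage with no assumption that the prior is correct.

```python
# Minimal sketch of a prior/posterior-ratio confidence sequence for a GP
# working model (illustrative kernel and names; not the authors' code).
import numpy as np
from scipy.stats import multivariate_normal

def rbf(A, B, ls=1.0, amp=1.0):
    return amp * np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls**2)

def gp_posterior(X, y, sigma2):
    """Finite-dimensional GP prior and posterior at the observed inputs X."""
    K = rbf(X, X) + 1e-8 * np.eye(len(X))    # jitter for numerical stability
    Kn = K + sigma2 * np.eye(len(X))
    mu = K @ np.linalg.solve(Kn, y)          # posterior mean at X
    Sigma = K - K @ np.linalg.solve(Kn, K)   # posterior covariance at X
    return K, mu, Sigma

def in_confidence_sequence(f_vals, X, y, sigma2, alpha=0.05):
    """Keep f iff prior(f) / posterior(f | D_t) < 1/alpha (Ville's inequality).
    A powered likelihood p(D|f)^eta, eta < 1, corresponds here to replacing
    sigma2 by sigma2/eta before forming the posterior."""
    K, mu, Sigma = gp_posterior(X, y, sigma2)
    lp_prior = multivariate_normal.logpdf(f_vals, mean=np.zeros(len(X)), cov=K)
    lp_post = multivariate_normal.logpdf(f_vals, mean=mu, cov=Sigma,
                                         allow_singular=True)
    return lp_prior - lp_post < np.log(1.0 / alpha)
```

In use, one can screen a bag of candidate functions (e.g., posterior samples) with this test and report the pointwise envelope of the survivors as the CS band that an acquisition rule then explores.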
Authors:
Award ID(s):
1916320
Publication Date:
NSF-PAR ID:
10251948
Journal Name:
Proceedings of Machine Learning Research
Volume:
132
Page Range or eLocation-ID:
963-982
ISSN:
2640-3498
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Bayesian optimization (BO) has been leveraged for guiding autonomous and high-throughput experiments in materials science. However, few have evaluated the efficiency of BO across a broad range of experimental materials domains. In this work, we quantify the performance of BO with a collection of surrogate model and acquisition function pairs across five diverse experimental materials systems. By defining acceleration and enhancement metrics for materials optimization objectives, we find that surrogate models such as Gaussian Process (GP) with anisotropic kernels and Random Forest (RF) have comparable performance in BO, and both outperform the commonly used GP with isotropic kernels. GP with anisotropic kernels has demonstrated the most robustness, yet RF is a close alternative and warrants more consideration because it is free from distribution assumptions, has smaller time complexity, and requires less effort in initial hyperparameter selection. We also raise awareness about the benefits of using GP with anisotropic kernels in future materials optimization campaigns.
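As a toy, self-contained illustration of this comparison (our stand-in, not the paper's benchmark or materials data), the snippet below fits the three surrogate families discussed — an isotropic RBF GP, an anisotropic (ARD) RBF GP, and a random forest — to a 2-D function with unequal length scales and reports test RMSE, a rough proxy for surrogate quality within one BO iteration.

```python
# Compare isotropic GP, anisotropic (ARD) GP, and random forest surrogates
# on a synthetic 2-D target whose two inputs act on different length scales.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
f = lambda X: np.sin(3 * X[:, 0]) + 0.1 * X[:, 1] ** 2  # anisotropic target
X_train = rng.uniform(-2, 2, (40, 2))
X_test = rng.uniform(-2, 2, (200, 2))

models = {
    "GP, isotropic RBF": GaussianProcessRegressor(RBF(length_scale=1.0)),
    "GP, anisotropic RBF": GaussianProcessRegressor(RBF(length_scale=[1.0, 1.0])),
    "Random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, f(X_train))
    rmse = np.sqrt(np.mean((model.predict(X_test) - f(X_test)) ** 2))
    print(f"{name}: test RMSE = {rmse:.3f}")
```

Passing a list (one value per input dimension) for `length_scale` is what makes the second kernel ARD/anisotropic in scikit-learn; the marginal-likelihood optimizer then learns a separate scale per input.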

  2. A Gaussian process (GP) emulator has been used as a surrogate model for predicting force fields and molecular potentials, to overcome the computational bottleneck of ab initio molecular dynamics simulation. Integrating both atomic forces and energies in predictions was found to be more accurate than using energies alone, yet it requires O((NM)^3) computational operations for computing the likelihood function and making predictions, where N is the number of atoms and M is the number of simulated configurations in the training sample, due to the inversion of a large covariance matrix. The high computational cost limits its applications to the simulation of small molecules. The computational challenge of using both gradient information and function values in GPs was recently noticed in machine learning communities, where conventional approximation methods may not work well. Here, we introduce a new approach, the atomized force field model, that integrates both force and energy in the emulator with many fewer computational operations. The drastic reduction in computation is achieved by utilizing the naturally sparse covariance structure that satisfies the constraints of energy conservation and permutation symmetry of atoms. The efficient machine learning algorithm extends the limits of its applications to larger molecules under the same computational budget, with nearly no loss of predictive accuracy. Furthermore, our approach contains an uncertainty assessment of predictions of atomic forces and energies, useful for developing a sequential design over the chemical input space.
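A minimal 1-D sketch (our illustration, not the paper's atomized force field model) of how energy and force enter a single GP: differentiating an RBF kernel produces the force and cross-covariance blocks, so one covariance matrix couples both observables. The paper's speedup comes from sparsity in this structure under energy conservation and permutation symmetry, which this dense toy does not attempt.

```python
# Joint covariance of [energy, force] for a 1-D GP with force F = -dE/dx.
# For k(x, x') = exp(-(x - x')^2 / (2 l^2)):
#   dk/dx'        = k * (x - x') / l^2
#   d^2k/dx dx'   = k * (1/l^2 - (x - x')^2 / l^4)
import numpy as np

def joint_cov(x, ls=1.0):
    K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ls**2)
    D = (x[:, None] - x[None, :]) / ls**2
    K_ef = -K * D                      # cov(E(x_i), F(x_j)) = -dk/dx_j
    K_ff = K * (1.0 / ls**2 - D**2)    # cov(F(x_i), F(x_j)) = d^2k/dx_i dx_j
    return np.block([[K, K_ef], [K_ef.T, K_ff]])

x = np.linspace(-1.0, 1.0, 5)
print(np.linalg.eigvalsh(joint_cov(x)).min())  # ≈ 0 up to round-off: valid covariance
```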
  3. We consider the problem of inferring the basal sliding coefficient field for an uncertain Stokes ice sheet forward model from synthetic surface velocity measurements. The uncertainty in the forward model stems from unknown (or uncertain) auxiliary parameters (e.g., rheology parameters). This inverse problem is posed within the Bayesian framework, which provides a systematic means of quantifying uncertainty in the solution. To account for the associated model uncertainty (error), we employ the Bayesian approximation error (BAE) approach to approximately premarginalize simultaneously over both the noise in measurements and the uncertainty in the forward model. We also carry out approximate posterior uncertainty quantification based on a linearization of the parameter-to-observable map centered at the maximum a posteriori (MAP) basal sliding coefficient estimate, i.e., by taking the Laplace approximation. The MAP estimate is found by minimizing the negative log posterior using an inexact Newton conjugate gradient method. The gradient and Hessian actions on vectors are efficiently computed using adjoints. Sampling from the approximate covariance is made tractable by invoking a low-rank approximation of the data misfit component of the Hessian. We study the performance of the BAE approach in the context of three numerical examples in two and three dimensions. In each example, the basal sliding coefficient field is the parameter of primary interest which we seek to infer, and the rheology parameters (e.g., the flow rate factor or the Glen's flow law exponent coefficient field) represent so-called nuisance (secondary uncertain) parameters. Our results indicate that accounting for model uncertainty stemming from the presence of nuisance parameters is crucial: using nominal values for these parameters, as is often done in practice, without taking the resulting modeling error into account can lead to overconfident and heavily biased results. We also show that the BAE approach can be used to account for the additional model uncertainty at no additional cost at the online stage.
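The linearized (Laplace) step above has a compact dense analogue, sketched below with small random stand-in matrices of our own (the paper's computation is matrix-free, with Hessian actions supplied by adjoints): eigendecompose the prior-preconditioned data-misfit Hessian, keep the dominant modes, and form the posterior covariance as a low-rank downdate of the prior.

```python
# Low-rank Laplace covariance: C_post = C_prior - W diag(l/(1+l)) W^T, where
# (l, V) come from the prior-preconditioned misfit Hessian S^T H S
# (with C_prior = S S^T) and W = S V. All matrices are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(1)
n = 50
C_prior = np.diag(1.0 / np.linspace(1.0, 4.0, n))  # stand-in prior covariance
J = rng.standard_normal((8, n))                    # stand-in linearized forward map
H = J.T @ (4.0 * J)                                # Gauss-Newton data-misfit Hessian

S = np.linalg.cholesky(C_prior)                    # C_prior = S S^T
lam, V = np.linalg.eigh(S.T @ H @ S)               # prior-preconditioned spectrum
r = 8                                              # H has rank 8, so 8 modes suffice
lam_r, V_r = lam[::-1][:r], V[:, ::-1][:, :r]      # dominant eigenpairs
W = S @ V_r
C_post = C_prior - W @ np.diag(lam_r / (1.0 + lam_r)) @ W.T

C_exact = np.linalg.inv(H + np.linalg.inv(C_prior))  # dense Laplace covariance
print(np.linalg.norm(C_post - C_exact) / np.linalg.norm(C_exact))  # ~ 1e-15
```

At scale the same identity is applied with randomized or Lanczos eigensolvers, so only Hessian-vector products are ever needed.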
  4. It is desirable to combine the expressive power of deep learning with Gaussian processes (GPs) in one expressive Bayesian learning model. Deep kernel learning showed success by using a deep network for feature extraction, with a GP as the function model. Recently, it was suggested that, despite training with the marginal likelihood, the deterministic nature of the feature extractor might lead to overfitting, and that replacing it with a Bayesian network seemed to cure the problem. Here, we propose the conditional deep Gaussian process (DGP), in which the intermediate GPs in the hierarchical composition are supported by hyperdata and the exposed GP remains zero mean. Motivated by the inducing points in sparse GPs, the hyperdata also play the role of function supports, but are hyperparameters rather than random variables. Following our previous moment-matching approach, we approximate the marginal prior of the conditional DGP with a GP carrying an effective kernel. Thus, as in empirical Bayes, the hyperdata are learned by optimizing the approximate marginal likelihood, which depends on the hyperdata implicitly via the kernel. We show the equivalence with deep kernel learning in the limit of dense hyperdata in latent space. However, the conditional DGP and the corresponding approximate inference enjoy the benefit of being more Bayesian than deep kernel learning. Preliminary extrapolation results demonstrate expressive power from the depth of the hierarchy by exploiting the exact covariance and hyperdata learning, in comparison with GP kernel composition, DGP variational inference, and deep kernel learning. We also address the non-Gaussian aspect of our model as well as a way of upgrading to full Bayesian inference.
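To isolate just the "hyperdata as hyperparameters" idea in runnable form (a loose toy of our own devising, not the conditional DGP itself): pseudo-observations (Z, u) shape an effective GP prior exactly the way inducing points would, but u is fitted by maximizing the marginal likelihood rather than being inferred as a random variable.

```python
# Toy: learn hyperdata outputs u at fixed inputs Z by empirical Bayes.
# Conditioning a zero-mean GP on (Z, u) yields an effective prior
# N(Kxz Kzz^-1 u, Kxx - Kxz Kzz^-1 Kzx); u maximizes the data marginal.
import numpy as np
from scipy.optimize import minimize

def rbf(A, B, ls=0.7):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls**2)

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, 30)
y = np.sign(X) + 0.1 * rng.standard_normal(30)   # step-like target
Z = np.linspace(-3, 3, 6)                        # hyperdata inputs (fixed here)
sigma2 = 0.01

def neg_log_marginal(u):
    Kzz = rbf(Z, Z) + 1e-8 * np.eye(len(Z))
    Kxz = rbf(X, Z)
    m = Kxz @ np.linalg.solve(Kzz, u)                      # effective prior mean
    C = (rbf(X, X) - Kxz @ np.linalg.solve(Kzz, Kxz.T)
         + sigma2 * np.eye(len(X)))                        # marginal covariance
    r = y - m
    return 0.5 * (r @ np.linalg.solve(C, r) + np.linalg.slogdet(C)[1])

u_hat = minimize(neg_log_marginal, np.zeros(len(Z))).x    # learned hyperdata
```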
  5. Mean-field Variational Bayes (MFVB) is an approximate Bayesian posterior inference technique that is increasingly popular due to its fast runtimes on large-scale data sets. However, even when MFVB provides accurate posterior means for certain parameters, it often mis-estimates variances and covariances. Furthermore, prior robustness measures have remained undeveloped for MFVB. By deriving a simple formula for the effect of infinitesimal model perturbations on MFVB posterior means, we provide both improved covariance estimates and local robustness measures for MFVB, thus greatly expanding the practical usefulness of MFVB posterior approximations. The estimates for MFVB posterior covariances rely on a result from the classical Bayesian robustness literature that relates derivatives of posterior expectations to posterior covariances and includes the Laplace approximation as a special case. Our key condition is that the MFVB approximation provides good estimates of a select subset of posterior means---an assumption that has been shown to hold in many practical settings. In our experiments, we demonstrate that our methods are simple, general, and fast, providing accurate posterior uncertainty estimates and robustness measures with runtimes that can be an order of magnitude faster than MCMC.
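The mechanism behind such linear-response corrections is a classical identity: tilting a posterior by exp(t·g(θ)) and differentiating the posterior mean at t = 0 recovers Cov(θ, g(θ)). The numerical check below uses an arbitrary 1-D density of our own choosing, not anything from the paper.

```python
# Verify d/dt E_t[theta] |_{t=0} = Cov(theta, g(theta)) for the tilt
# p_t(theta) ∝ p(theta) exp(t g(theta)), on a dense 1-D grid.
import numpy as np

theta = np.linspace(-8.0, 8.0, 4001)
dx = theta[1] - theta[0]
log_p = -0.5 * theta**2 + 0.3 * np.sin(theta)  # arbitrary non-Gaussian log density
g = theta**2                                   # perturbation functional

def tilted_mean(t):
    w = np.exp(log_p + t * g - (log_p + t * g).max())
    w /= w.sum() * dx                          # normalize on the grid
    return (theta * w).sum() * dx

eps = 1e-5
dmean_dt = (tilted_mean(eps) - tilted_mean(-eps)) / (2 * eps)

w0 = np.exp(log_p - log_p.max()); w0 /= w0.sum() * dx
E = lambda h: (h * w0).sum() * dx
print(dmean_dt, E(theta * g) - E(theta) * E(g))  # agree to finite-difference accuracy
```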