Title: Uncertainty quantification using martingales for misspecified Gaussian processes
We address uncertainty quantification for Gaussian processes (GPs) under misspecified priors, with an eye towards Bayesian optimization (BO). GPs are widely used in BO because they easily enable exploration based on posterior uncertainty bands. However, this convenience comes at the cost of robustness: a typical function encountered in practice is unlikely to have been drawn from the data scientist's prior, in which case uncertainty estimates can be misleading and the resulting exploration can be suboptimal. We present a frequentist approach to GP/BO uncertainty quantification. We utilize the GP framework as a working model, but do not assume correctness of the prior. Instead, we construct a confidence sequence (CS) for the unknown function using martingale techniques. There is a necessary cost to achieving robustness: if the prior is correct, posterior GP bands are narrower than our CS. Nevertheless, when the prior is wrong, our CS is statistically valid and empirically outperforms standard GP methods, in terms of both coverage and utility for BO. Additionally, we demonstrate that powered likelihoods provide robustness against model misspecification.
Journal Name: Proceedings of Machine Learning Research
Sponsoring Org: National Science Foundation
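
To make the powered-likelihood idea above concrete, here is a minimal sketch (ours, not the paper's code): for a GP with Gaussian observation noise, raising the likelihood to a power eta in (0, 1] is equivalent to inflating the noise variance by a factor 1/eta, which widens the posterior bands and hedges against a misspecified prior. The RBF kernel, toy data, and eta = 0.5 are illustrative assumptions.

    import numpy as np

    def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
        # Squared-exponential kernel matrix between the rows of A and B.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

    def powered_gp_posterior(X, y, Xstar, noise_var=0.1, eta=0.5):
        # Tempering the Gaussian likelihood by eta amounts to replacing the
        # noise variance sigma^2 with sigma^2 / eta; eta < 1 gives wider,
        # more conservative posterior bands.
        K = rbf_kernel(X, X) + (noise_var / eta) * np.eye(len(X))
        Ks = rbf_kernel(X, Xstar)
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        mean = Ks.T @ alpha
        v = np.linalg.solve(L, Ks)
        var = np.diag(rbf_kernel(Xstar, Xstar) - v.T @ v)
        return mean, var

    # Toy data from a function that need not match the prior.
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, (10, 1))
    y = np.sin(6 * X[:, 0]) + 0.1 * rng.standard_normal(10)
    Xs = np.linspace(0, 1, 50)[:, None]
    mu, var = powered_gp_posterior(X, y, Xs, eta=0.5)
    band = 2 * np.sqrt(var)  # wider than the standard (eta = 1) GP band

Note that this sketch only tempers the posterior; the martingale construction of the confidence sequence itself is not reproduced here.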
More Like this
  1. Bayesian optimization (BO) has well-documented merits for optimizing black-box functions with an expensive evaluation cost. Such functions emerge in applications as diverse as hyperparameter tuning, drug discovery, and robotics. BO hinges on a Bayesian surrogate model to sequentially select query points so as to balance exploration with exploitation of the search space. Most existing works rely on a single Gaussian process (GP) based surrogate model, where the kernel function form is typically preselected using domain knowledge. To bypass such a design process, this paper leverages an ensemble (E) of GPs to adaptively select the surrogate model fit on-the-fly, yielding a GP mixture posterior with enhanced expressiveness for the sought function. Acquisition of the next evaluation input using this EGP-based function posterior is then enabled by Thompson sampling (TS) that requires no additional design parameters. To endow function sampling with scalability, random feature-based kernel approximation is leveraged per GP model. The novel EGP-TS readily accommodates parallel operation. To further establish convergence of the proposed EGP-TS to the global optimum, analysis is conducted based on the notion of Bayesian regret for both sequential and parallel settings. Tests on synthetic functions and real-world applications showcase the merits of the proposed method. 
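
A minimal sketch of the Thompson-sampling step over an ensemble of GP surrogates, under our own simplifying assumptions: ensemble members are defined by different kernels and weighted by their marginal likelihood on the data so far (a simple stand-in for the paper's adaptive weighting), and the random-feature approximation is omitted in favor of exact posterior sampling on a candidate grid.

    import numpy as np
    from scipy.special import softmax
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, Matern

    def egp_thompson_step(X, y, candidates, rng):
        # Fit one GP per candidate kernel and weight by marginal likelihood.
        kernels = [RBF(0.2), Matern(0.2, nu=1.5), Matern(0.2, nu=2.5)]
        models, log_evidence = [], []
        for k in kernels:
            gp = GaussianProcessRegressor(kernel=k, alpha=1e-3).fit(X, y)
            models.append(gp)
            log_evidence.append(gp.log_marginal_likelihood_value_)
        weights = softmax(np.array(log_evidence))
        # Thompson sampling: draw an ensemble member, then a posterior
        # sample path, and query where the sampled path is largest.
        m = rng.choice(len(models), p=weights)
        path = models[m].sample_y(candidates,
                                  random_state=int(rng.integers(1 << 31)))
        return candidates[np.argmax(path)]

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, (5, 1))
    y = np.sin(6 * X[:, 0])
    cands = np.linspace(0, 1, 200)[:, None]
    x_next = egp_thompson_step(X, y, cands, rng)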
  2. Optimizing a black-box function that is expensive to evaluate emerges in a gamut of machine learning and artificial intelligence applications, including drug discovery, policy optimization in robotics, and hyperparameter tuning of learning models, to list a few. Bayesian optimization (BO) provides a principled framework to find the global optimum of such functions using a limited number of function evaluations. BO relies on a statistical surrogate model to actively select new query points, which is typically captured by a Gaussian process (GP). Unlike most existing approaches that hinge on a single GP surrogate model with a pre-selected kernel function, which may confine the expressiveness of the sought function especially under a limited evaluation budget, the present work puts forth a weighted ensemble of GPs as a surrogate model. Building on the advocated Gaussian mixture (GM) posterior, the EGP framework adapts to the most fitted surrogate model as data arrive on-the-fly, offering a richer function space. For the acquisition of the next evaluation points, the EGP-based posterior is coupled with an adaptive expected improvement (EI) criterion to balance exploration and exploitation of the search space. Numerical tests on a set of benchmark synthetic functions and two robotic tasks demonstrate the impressive benefits of the proposed approach.
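
One simple way to realize an EI criterion under the Gaussian-mixture posterior, sketched under our own assumptions (the paper's adaptive EI may combine members differently): compute closed-form EI per GP member and average with the posterior model weights.

    import numpy as np
    from scipy.stats import norm

    def expected_improvement(mu, sigma, best, xi=0.01):
        # Closed-form EI for maximization at points with posterior mean mu
        # and standard deviation sigma.
        sigma = np.maximum(sigma, 1e-12)
        z = (mu - best - xi) / sigma
        return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

    def mixture_ei(means, stds, weights, best):
        # EI under a GP-mixture posterior: weight-average the per-member EI.
        return sum(w * expected_improvement(m, s, best)
                   for w, m, s in zip(weights, means, stds))

    # Example: two ensemble members evaluated at 100 candidate points.
    mus = [np.zeros(100), 0.1 * np.ones(100)]
    sds = [np.ones(100), 0.5 * np.ones(100)]
    acq = mixture_ei(mus, sds, weights=[0.6, 0.4], best=0.2)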
  3. Abstract

    Parameters in climate models are usually calibrated manually, exploiting only small subsets of the available data. This precludes both optimal calibration and quantification of uncertainties. Traditional Bayesian calibration methods that allow uncertainty quantification are too expensive for climate models; they are also not robust in the presence of internal climate variability. For example, Markov chain Monte Carlo (MCMC) methods typically require a very large number of model runs and are sensitive to internal variability noise, rendering them infeasible for climate models. Here we demonstrate an approach to model calibration and uncertainty quantification that requires only a small number of model runs and can accommodate internal climate variability. The approach consists of three stages: (a) a calibration stage uses variants of ensemble Kalman inversion to calibrate a model by minimizing mismatches between model and data statistics; (b) an emulation stage emulates the parameter-to-data map with Gaussian processes (GP), using the model runs from the calibration stage for training; (c) a sampling stage approximates the Bayesian posterior distributions by sampling the GP emulator with MCMC. We demonstrate the feasibility and computational efficiency of this calibrate-emulate-sample (CES) approach in a perfect-model setting. Using an idealized general circulation model, we estimate parameters in a simple convection scheme from synthetic data generated with the model. The CES approach generates probability distributions of the parameters that are good approximations of the Bayesian posteriors, at a fraction of the computational cost usually required to obtain them. Sampling from this approximate posterior allows the generation of climate predictions with quantified parametric uncertainties.
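
A minimal sketch of the calibration stage (one ensemble Kalman inversion update with perturbed observations), under our own toy assumptions for the forward map G, data y, and noise covariance Gamma; in CES, the (theta, G(theta)) pairs collected here would then train the GP emulator, and MCMC would sample the emulated posterior.

    import numpy as np

    def eki_update(theta, G, y, Gamma, rng):
        # theta: (J, p) parameter ensemble; G: forward map from parameters
        # to data statistics; y: observed statistics; Gamma: noise covariance.
        g = np.array([G(t) for t in theta])       # (J, d) forward runs
        dtheta = theta - theta.mean(0)
        dg = g - g.mean(0)
        C_tg = dtheta.T @ dg / (len(theta) - 1)   # cross-covariance (p, d)
        C_gg = dg.T @ dg / (len(theta) - 1)       # data covariance (d, d)
        gain = C_tg @ np.linalg.inv(C_gg + Gamma)
        noise = rng.multivariate_normal(np.zeros(len(y)), Gamma,
                                        size=len(theta))
        return theta + (y + noise - g) @ gain.T   # Kalman-style update

    rng = np.random.default_rng(0)
    G = lambda t: np.array([t[0] + t[1], t[0] * t[1]])  # toy forward model
    y = G(np.array([1.0, 2.0]))
    Gamma = 0.01 * np.eye(2)
    theta = rng.normal(0.0, 2.0, (50, 2))
    for _ in range(10):
        theta = eki_update(theta, G, y, Gamma, rng)
    # The ensemble now concentrates near parameter values consistent with y.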

  4. Abstract

    The spatio-temporal properties of seismicity give us incisive insight into the stress state evolution and fault structures of the crust. Empirical models based on self-exciting point processes continue to provide an important tool for analysing seismicity, given the epistemic uncertainty associated with physical models. In particular, the epidemic-type aftershock sequence (ETAS) model acts as a reference model for studying seismicity catalogues. The traditional ETAS model uses simple parametric definitions for the background rate of triggering-independent seismicity. This reduces the effectiveness of the basic ETAS model in modelling the temporally complex seismicity patterns seen in seismic swarms that are dominated by aseismic tectonic processes such as fluid injection rather than aftershock triggering. In order to robustly capture time-varying seismicity rates, we introduce a deep Gaussian process (GP) formulation for the background rate as an extension to ETAS. GPs are a robust non-parametric model for function spaces with covariance structure. By conditioning the length-scale structure of a GP with another GP, we have a deep-GP: a probabilistic, hierarchical model that automatically tunes its structure to match data constraints. We show how the deep-GP-ETAS model can be efficiently sampled by making use of a Metropolis-within-Gibbs scheme, taking advantage of the branching process formulation of ETAS and a stochastic partial differential equation (SPDE) approximation for Matérn GPs. We illustrate our method using synthetic examples, and show that the deep-GP-ETAS model successfully captures multiscale temporal behaviour in the background forcing rate of seismicity. We then apply the results to two real-data catalogues: the Ridgecrest, CA 2019 July 5 Mw 7.1 event catalogue, showing that deep-GP-ETAS can successfully characterize a classical aftershock sequence; and the 2016–2019 Cahuilla, CA earthquake swarm, which shows two distinct phases of aseismic forcing concordant with a fluid injection-driven initial sequence, arrest of the fluid along a physical barrier and release following the largest Mw 4.4 event of the sequence.
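
For reference, the temporal conditional intensity of the classical ETAS model that the paper extends, sketched with illustrative parameter values; in deep-GP-ETAS the background rate mu(t) below would carry a deep-GP prior rather than being a simple parametric function.

    import numpy as np

    def etas_intensity(t, events, mu, K=0.1, alpha=1.0, M0=3.0, c=0.01, p=1.1):
        # events: array of (t_i, m_i) past event times and magnitudes;
        # mu: callable background rate mu(t), parametric in classical ETAS,
        # a (deep) Gaussian process in the deep-GP-ETAS extension.
        times, mags = events[:, 0], events[:, 1]
        past = times < t
        # Magnitude-dependent productivity times the Omori-Utsu time decay.
        trig = (K * np.exp(alpha * (mags[past] - M0))
                * (t - times[past] + c) ** (-p))
        return mu(t) + trig.sum()

    events = np.array([[0.5, 4.0], [1.2, 3.2], [1.3, 5.1]])  # toy catalogue
    lam = etas_intensity(2.0, events, mu=lambda t: 0.2)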

  5. Abstract

    Bayesian optimization (BO) has been leveraged for guiding autonomous and high-throughput experiments in materials science. However, few have evaluated the efficiency of BO across a broad range of experimental materials domains. In this work, we quantify the performance of BO with a collection of surrogate model and acquisition function pairs across five diverse experimental materials systems. By defining acceleration and enhancement metrics for materials optimization objectives, we find that surrogate models such as Gaussian Process (GP) with anisotropic kernels and Random Forest (RF) have comparable performance in BO, and both outperform the commonly used GP with isotropic kernels. GP with anisotropic kernels has demonstrated the most robustness, yet RF is a close alternative and warrants more consideration because it is free from distribution assumptions, has smaller time complexity, and requires less effort in initial hyperparameter selection. We also raise awareness about the benefits of using GP with anisotropic kernels in future materials optimization campaigns.
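
The anisotropic-versus-isotropic distinction is easy to reproduce in a few lines; the sketch below uses scikit-learn and synthetic data of our own choosing, not the paper's experimental systems. An anisotropic (ARD) kernel learns one length scale per input dimension, so it can discount inputs that barely affect the objective.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, (40, 3))
    y = np.sin(6 * X[:, 0]) + 0.1 * X[:, 1]  # third input is inert

    # Anisotropic (ARD) kernel: one length scale per input dimension.
    aniso = GaussianProcessRegressor(kernel=RBF(length_scale=[1.0, 1.0, 1.0]),
                                     alpha=1e-3).fit(X, y)
    # Isotropic kernel: a single shared length scale.
    iso = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                                   alpha=1e-3).fit(X, y)
    # Random forest: distribution-free alternative surrogate.
    rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

    print(aniso.kernel_)  # fitted per-dimension length scales reveal relevance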
