
Title: Hyperparameter Estimation in Bayesian MAP Estimation: Parameterizations and Consistency
The Bayesian formulation of inverse problems is attractive for three primary reasons: it provides a clear modelling framework; it allows for principled learning of hyperparameters; and it can provide uncertainty quantification. The posterior distribution may in principle be sampled by means of MCMC or SMC methods, but for many problems it is computationally infeasible to do so. In this situation, maximum a posteriori (MAP) estimators are often sought. Whilst these are relatively cheap to compute and have an attractive variational formulation, a key drawback is their lack of invariance under change of parameterization. It is nonetheless important to study MAP estimators, because they provide a link with classical optimization approaches to inverse problems, and the Bayesian link may be used to improve upon those approaches. The lack of invariance of MAP estimators under change of parameterization is a particularly significant issue when hierarchical priors are employed to learn hyperparameters. In this paper we study the effect of the choice of parameterization on MAP estimators when a conditionally Gaussian hierarchical prior distribution is employed. Specifically, we consider the centred parameterization, the natural parameterization in which the unknown state is solved for directly, and the noncentred parameterization, which works with a whitened Gaussian as the unknown state variable and arises naturally when considering dimension-robust MCMC algorithms; MAP estimation is well-defined in the nonparametric setting only for the noncentred parameterization. However, we show that MAP estimates based on the noncentred parameterization are not consistent as estimators of hyperparameters; conversely, we show that limits of finite-dimensional centred MAP estimators are consistent as the dimension tends to infinity.
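To make the difference between the two parameterizations concrete, here is a minimal sketch, assuming a finite-dimensional linear model y = A u + η with Gaussian noise of standard deviation gamma and a deliberately simple hypothetical hierarchical prior u | θ ~ N(0, θ I) (not the Whittle–Matérn or ARD priors studied in the paper). The hyperprior `log_hyperprior` (an Exp(1) density) and all function names are illustrative assumptions. Writing u = √θ ξ with ξ ~ N(0, I) changes variables with Jacobian θ^{n/2}, so the two negative log posterior densities differ by exactly (n/2) log θ at corresponding points, and their minimizers over (state, θ) generally differ.

```python
import numpy as np

def centred_objective(u, theta, y, A, gamma, log_hyperprior):
    """Negative log posterior of (u, theta), up to a constant, with the
    hypothetical prior u | theta ~ N(0, theta * I)."""
    n = u.size
    misfit = 0.5 * np.sum((y - A @ u) ** 2) / gamma**2
    # 0.5 * log det C(theta) = 0.5 * n * log(theta) appears in the centred form
    prior = 0.5 * np.dot(u, u) / theta + 0.5 * n * np.log(theta)
    return misfit + prior - log_hyperprior(theta)

def noncentred_objective(xi, theta, y, A, gamma, log_hyperprior):
    """Same model written in the whitened variable xi ~ N(0, I), u = sqrt(theta) * xi."""
    u = np.sqrt(theta) * xi
    misfit = 0.5 * np.sum((y - A @ u) ** 2) / gamma**2
    prior = 0.5 * np.dot(xi, xi)  # no theta-dependent normalizing term
    return misfit + prior - log_hyperprior(theta)

def log_hyperprior(theta):
    # Hypothetical Exp(1) hyperprior on theta, up to an additive constant
    return -theta

# Synthetic illustration (all sizes and values are arbitrary assumptions)
rng = np.random.default_rng(0)
n, m = 50, 20
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
gamma = 0.1

theta = 2.0
xi = rng.standard_normal(n)
u = np.sqrt(theta) * xi  # corresponding point in the centred variables

diff = (centred_objective(u, theta, y, A, gamma, log_hyperprior)
        - noncentred_objective(xi, theta, y, A, gamma, log_hyperprior))
print(diff, 0.5 * n * np.log(theta))
```

The extra (n/2) log θ term grows with the dimension n, which hints at why the centred objective has no direct infinite-dimensional analogue while the noncentred one does; the consistency results themselves are the paper's and are not reproduced by this toy.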
We also consider empirical Bayesian hyperparameter estimation, show consistency of these estimates, and demonstrate that they are more robust with respect to noise than centred MAP estimates. An underpinning concept throughout is that hyperparameters may only be recovered up to measure equivalence, a well-known phenomenon in the context of the Ornstein–Uhlenbeck process. The applicability of the results is demonstrated concretely with the study of hierarchical Whittle–Matérn and ARD priors.
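Empirical Bayes estimates the hyperparameter by integrating out the state and maximizing the marginal likelihood p(y | θ). The following is a minimal sketch under the same hypothetical linear-Gaussian toy model (u | θ ~ N(0, θ I), noise N(0, γ² I)), for which y | θ ~ N(0, θ A Aᵀ + γ² I). Diagonalizing A Aᵀ once makes each likelihood evaluation cheap; the grid search, the function names, and the "true" value θ = 4 are illustrative assumptions rather than the paper's procedure.

```python
import numpy as np

def neg_log_marginal_likelihood(theta, z, lam, gamma):
    """-log p(y | theta) up to a constant, with Cov(y | theta) = V diag(theta*lam + gamma^2) V^T,
    where A A^T = V diag(lam) V^T and z = V^T y."""
    s = theta * lam + gamma**2
    return 0.5 * np.sum(z**2 / s + np.log(s))

def empirical_bayes_theta(y, A, gamma, grid):
    """Select theta on a grid by maximizing the marginal likelihood (u integrated out)."""
    lam, V = np.linalg.eigh(A @ A.T)
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative round-off
    z = V.T @ y
    values = [neg_log_marginal_likelihood(t, z, lam, gamma) for t in grid]
    return grid[int(np.argmin(values))]

# Synthetic data drawn from the toy model with a hypothetical true theta
rng = np.random.default_rng(1)
m = n = 500
theta_true, gamma = 4.0, 0.1
A = rng.standard_normal((m, n)) / np.sqrt(n)
u_true = np.sqrt(theta_true) * rng.standard_normal(n)
y = A @ u_true + gamma * rng.standard_normal(m)

theta_hat = empirical_bayes_theta(y, A, gamma, np.logspace(-2, 2, 400))
print(theta_hat)
```

A log-spaced grid keeps the sketch dependency-free; in practice a one-dimensional optimizer over log θ would serve the same purpose.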
Journal Name:
The SMAI journal of computational mathematics
Sponsoring Org:
National Science Foundation
More Like this
  1. Although the governing equations of many systems, when derived from first principles, may be viewed as known, it is often too expensive to numerically simulate all the interactions they describe. Therefore, researchers often seek simpler descriptions that describe complex phenomena without numerically resolving all the interacting components. Stochastic differential equations (SDEs) arise naturally as models in this context. The growth in data acquisition, both through experiment and through simulations, provides an opportunity for the systematic derivation of SDE models in many disciplines. However, inconsistencies between SDEs and real data at short time scales often cause problems when standard statistical methodology is applied to parameter estimation. The incompatibility between SDEs and real data can be addressed by deriving sufficient statistics from the time-series data and learning parameters of SDEs based on these. Here, we study sufficient statistics computed from time averages, an approach that we demonstrate to lead to sufficient statistics on a variety of problems and that has the secondary benefit of obviating the need to match trajectories. Following this approach, we formulate the fitting of SDEs to sufficient statistics from real data as an inverse problem and demonstrate that this inverse problem can be solved by using ensemble Kalman inversion. Furthermore, we create a framework for non-parametric learning of drift and diffusion terms by introducing hierarchical, refinable parameterizations of unknown functions, using Gaussian process regression. We demonstrate the proposed methodology for the fitting of SDE models, first in a simulation study with a noisy Lorenz ’63 model, and then in other applications, including dimension reduction in deterministic chaotic systems arising in the atmospheric sciences, large-scale pattern modeling in climate dynamics and simplified models for key observables arising in molecular dynamics.
The results confirm that the proposed methodology provides a robust and systematic approach to fitting SDE models to real data.
  2. Many imaging problems can be formulated as inverse problems expressed as finite-dimensional optimization problems. These optimization problems generally consist of minimizing the sum of a data fidelity term and a regularization term. In Darbon (SIAM J. Imag. Sci. 8:2268–2293, 2015) and Darbon and Meng (On decomposition models in imaging sciences and multi-time Hamilton-Jacobi partial differential equations, arXiv preprint arXiv:1906.09502, 2019), connections between these optimization problems and (multi-time) Hamilton-Jacobi partial differential equations have been proposed under the assumption that both the data fidelity and regularization terms are convex. In particular, under these convexity assumptions, some representation formulas for a minimizer can be obtained. From a Bayesian perspective, such a minimizer can be seen as a maximum a posteriori estimator. In this chapter, we consider a certain class of non-convex regularizations and show that similar representation formulas for the minimizer can also be obtained. This is achieved by leveraging min-plus algebra techniques that were originally developed for solving certain Hamilton-Jacobi partial differential equations arising in optimal control. Note that connections between viscous Hamilton-Jacobi partial differential equations and Bayesian posterior mean estimators with Gaussian data fidelity terms and log-concave priors have been highlighted in Darbon and Langlois (On Bayesian posterior mean estimators in imaging sciences and Hamilton-Jacobi partial differential equations, arXiv preprint arXiv:2003.05572, 2020). We also present similar results for certain Bayesian posterior mean estimators with Gaussian data fidelity and certain non-log-concave priors using an analogue of min-plus algebra techniques.
  3. McCulloch, R. (Ed.)
    Varying coefficient models (VCMs) are widely used for estimating nonlinear regression functions for functional data. Their Bayesian variants using Gaussian process priors on the functional coefficients, however, have received limited attention in massive data applications, mainly due to the prohibitively slow posterior computations using Markov chain Monte Carlo (MCMC) algorithms. We address this problem using a divide-and-conquer Bayesian approach. We first create a large number of data subsamples with much smaller sizes. Then, we formulate the VCM as a linear mixed-effects model and develop a data augmentation algorithm for obtaining MCMC draws on all the subsets in parallel. Finally, we aggregate the MCMC-based estimates of subset posteriors into a single Aggregated Monte Carlo (AMC) posterior, which is used as a computationally efficient alternative to the true posterior distribution. Theoretically, we derive minimax optimal posterior convergence rates for the AMC posteriors of both the varying coefficients and the mean regression function. We also quantify the required orders of the subset sample sizes and the number of subsets. The empirical results show that the combination schemes that satisfy our theoretical assumptions, including the AMC posterior, have better estimation performance than their main competitors across diverse simulations and in a real data analysis.
  4. This paper develops manifold learning techniques for the numerical solution of PDE-constrained Bayesian inverse problems on manifolds with boundaries. We introduce graphical Matérn-type Gaussian field priors that enable flexible modeling near the boundaries, representing boundary values by superposition of harmonic functions with appropriate Dirichlet boundary conditions. We also investigate the graph-based approximation of forward models from PDE parameters to observed quantities. In the construction of graph-based prior and forward models, we leverage the ghost point diffusion map algorithm to approximate second-order elliptic operators with classical boundary conditions. Numerical results validate our graph-based approach and demonstrate the need to design prior covariance models that account for boundary conditions.
  5. In this article, we provide closed-form approximations of log-likelihood ratio (LLR) values for direct sequence spread spectrum (DS-SS) systems over three particular scenarios, which are commonly found in the Global Navigation Satellite System (GNSS) environment. Those scenarios are the open sky with smooth variation of the signal-to-noise ratio (SNR), the additive Gaussian interference, and pulsed jamming. In most current communication systems, block-wise estimators are considered. However, for some applications such as GNSSs, symbol-wise estimators are available due to the low data rate. Usually, the noise variance is considered either perfectly known or available through symbol-wise estimators, leading to possible mismatched demodulation, which could induce errors in the decoding process. In this contribution, we first derive two closed-form expressions for LLRs in additive white Gaussian and Laplacian noise channels, under noise uncertainty, based on conjugate priors. Then, for cases where the statistical knowledge about the estimation error is characterized by a noise variance following an inverse log-normal distribution, we derive the corresponding closed-form LLR approximations. The relevance of the proposed expressions is investigated in the context of the GPS L1C signal, where the clock and ephemeris data (CED) are encoded with low-density parity-check (LDPC) codes. The CED is then iteratively decoded based on the belief propagation (BP) algorithm. Simulation results show significant frame error rate (FER) improvement compared to classical approaches not accounting for such uncertainty.