skip to main content

Title: Variational inference with NoFAS: Normalizing flow with adaptive surrogate for computationally expensive models
Fast inference of numerical model parameters from data is an important prerequisite to generate predictive models for a wide range of applications. Use of sampling-based approaches such as Markov chain Monte Carlo may become intractable when each likelihood evaluation is computationally expensive. New approaches combining variational inference with normalizing flow are characterized by a computational cost that grows only linearly with the dimensionality of the latent variable space, and rely on gradient-based optimization instead of sampling, providing a more efficient approach for Bayesian inference about the model parameters. Moreover, the cost of frequently evaluating an expensive likelihood can be mitigated by replacing the true model with an offline trained surrogate model, such as neural networks. However, this approach might generate significant bias when the surrogate is insufficiently accurate around the posterior modes. To reduce the computational cost without sacrificing inferential accuracy, we propose Normalizing Flow with Adaptive Surrogate (NoFAS), an optimization strategy that alternatively updates the normalizing flow parameters and surrogate model parameters. We also propose an efficient sample weighting scheme for surrogate model training that preserves global accuracy while effectively capturing high posterior density regions. We demonstrate the inferential and computational superiority of NoFAS against various benchmarks, including cases where the underlying model lacks identifiability. The source code and numerical experiments used for this study are available at  more » « less
Award ID(s):
1918692 1942662
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Journal of computational physics
15 October 2022
Page Range / eLocation ID:
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract We consider Bayesian inference for large-scale inverse problems, where computational challenges arise from the need for repeated evaluations of an expensive forward model. This renders most Markov chain Monte Carlo approaches infeasible, since they typically require O ( 1 0 4 ) model runs, or more. Moreover, the forward model is often given as a black box or is impractical to differentiate. Therefore derivative-free algorithms are highly desirable. We propose a framework, which is built on Kalman methodology, to efficiently perform Bayesian inference in such inverse problems. The basic method is based on an approximation of the filtering distribution of a novel mean-field dynamical system, into which the inverse problem is embedded as an observation operator. Theoretical properties are established for linear inverse problems, demonstrating that the desired Bayesian posterior is given by the steady state of the law of the filtering distribution of the mean-field dynamical system, and proving exponential convergence to it. This suggests that, for nonlinear problems which are close to Gaussian, sequentially computing this law provides the basis for efficient iterative methods to approximate the Bayesian posterior. Ensemble methods are applied to obtain interacting particle system approximations of the filtering distribution of the mean-field model; and practical strategies to further reduce the computational and memory cost of the methodology are presented, including low-rank approximation and a bi-fidelity approach. The effectiveness of the framework is demonstrated in several numerical experiments, including proof-of-concept linear/nonlinear examples and two large-scale applications: learning of permeability parameters in subsurface flow; and learning subgrid-scale parameters in a global climate model. Moreover, the stochastic ensemble Kalman filter and various ensemble square-root Kalman filters are all employed and are compared numerically. The results demonstrate that the proposed method, based on exponential convergence to the filtering distribution of a mean-field dynamical system, is competitive with pre-existing Kalman-based methods for inverse problems. 
    more » « less
  2. Abstract

    Big datasets are gathered daily from different remote sensing platforms. Recently, statistical co‐kriging models, with the help of scalable techniques, have been able to combine such datasets by using spatially varying bias corrections. The associated Bayesian inference for these models is usually facilitated via Markov chain Monte Carlo (MCMC) methods which present (sometimes prohibitively) slow mixing and convergence because they require the simulation of high‐dimensional random effect vectors from their posteriors given large datasets. To enable fast inference in big data spatial problems, we propose the recursive nearest neighbor co‐kriging (RNNC) model. Based on this model, we develop two computationally efficient inferential procedures: (a) the collapsed RNNC which reduces the posterior sampling space by integrating out the latent processes, and (b) the conjugate RNNC, an MCMC free inference which significantly reduces the computational time without sacrificing prediction accuracy. An important highlight of conjugate RNNC is that it enables fast inference in massive multifidelity data sets by avoiding expensive integration algorithms. The efficient computational and good predictive performances of our proposed algorithms are demonstrated on benchmark examples and the analysis of the High‐resolution Infrared Radiation Sounder data gathered from two NOAA polar orbiting satellites in which we managed to reduce the computational time from multiple hours to just a few minutes.

    more » « less
  3. De Vico Fallani, Fabrizio (Ed.)
    The exponential family random graph modeling (ERGM) framework provides a highly flexible approach for the statistical analysis of networks (i.e., graphs). As ERGMs with dyadic dependence involve normalizing factors that are extremely costly to compute, practical strategies for ERGMs inference generally employ a variety of approximations or other workarounds. Markov Chain Monte Carlo maximum likelihood (MCMC MLE) provides a powerful tool to approximate the maximum likelihood estimator (MLE) of ERGM parameters, and is generally feasible for typical models on single networks with as many as a few thousand nodes. MCMC-based algorithms for Bayesian analysis are more expensive, and high-quality answers are challenging to obtain on large graphs. For both strategies, extension to the pooled case—in which we observe multiple networks from a common generative process—adds further computational cost, with both time and memory scaling linearly in the number of graphs. This becomes prohibitive for large networks, or cases in which large numbers of graph observations are available. Here, we exploit some basic properties of the discrete exponential families to develop an approach for ERGM inference in the pooled case that (where applicable) allows an arbitrarily large number of graph observations to be fit at no additional computational cost beyond preprocessing the data itself. Moreover, a variant of our approach can also be used to perform Bayesian inference under conjugate priors, again with no additional computational cost in the estimation phase. The latter can be employed either for single graph observations, or for observations from graph sets. As we show, the conjugate prior is easily specified, and is well-suited to applications such as regularization. Simulation studies show that the pooled method leads to estimates with good frequentist properties, and posterior estimates under the conjugate prior are well-behaved. We demonstrate the usefulness of our approach with applications to pooled analysis of brain functional connectivity networks and to replicated x-ray crystal structures of hen egg-white lysozyme. 
    more » « less
  4. Phylogenetic networks extend the phylogenetic tree structure and allow for modeling vertical and horizontal evolution in a single framework. Statistical inference of phylogenetic networks is prohibitive and currently limited to small networks. An approach that could significantly improve phylogenetic network space exploration is based on first inferring an evolutionary tree of the species under consideration, and then augmenting the tree into a network by adding a set of "horizontal" edges to better fit the data. In this paper, we study the performance of such an approach on networks generated under a birth-hybridization model and explore its feasibility as an alternative to approaches that search the phylogenetic network space directly (without relying on a fixed underlying tree). We find that the concatenation method does poorly at obtaining a "backbone" tree that could be augmented into the correct network, whereas the popular species tree inference method ASTRAL does significantly better at such a task. We then evaluated the tree-to-network augmentation phase under the minimizing deep coalescence and pseudo-likelihood criteria. We find that even though this is a much faster approach than the direct search of the network space, the accuracy is much poorer, even when the backbone tree is a good starting tree. Our results show that tree-based inference of phylogenetic networks could yield very poor results. As exploration of the network space directly in search of maximum likelihood estimates or a representative sample of the posterior is very expensive, significant improvements to the computational complexity of phylogenetic network inference are imperative if analyses of large data sets are to be performed. We show that a recently developed divide-and-conquer approach significantly outperforms tree-based inference in terms of accuracy, albeit still at a higher computational cost. 
    more » « less
  5. Abstract

    Phylogenetic and discrete-trait evolutionary inference depend heavily on an appropriate characterization of the underlying character substitution process. In this paper, we present random-effects substitution models that extend common continuous-time Markov chain models into a richer class of processes capable of capturing a wider variety of substitution dynamics. As these random-effects substitution models often require many more parameters than their usual counterparts, inference can be both statistically and computationally challenging. Thus, we also propose an efficient approach to compute an approximation to the gradient of the data likelihood with respect to all unknown substitution model parameters. We demonstrate that this approximate gradient enables scaling of sampling-based inference, namely Bayesian inference via Hamiltonian Monte Carlo, under random-effects substitution models across large trees and state-spaces.

    Applied to a dataset of 583 SARS-CoV-2 sequences, an HKY model with random-effects shows strong signals of nonreversibility in the substitution process, and posterior predictive model checks clearly show that it is a more adequate model than a reversible model. When analyzing the pattern of phylogeographic spread of 1441 influenza A virus (H3N2) sequences between 14 regions, a random-effects phylogeographic substitution model infers that air travel volume adequately predicts almost all dispersal rates. A random-effects state-dependent substitution model reveals no evidence for an effect of arboreality on the swimming mode in the tree frog subfamily Hylinae. Simulations reveal that random-effects substitution models can accommodate both negligible and radical departures from the underlying base substitution model. We show that our gradient-based inference approach is over an order of magnitude more time efficient than conventional approaches.

    more » « less