 Award ID(s):
 2053804
 NSFPAR ID:
 10430764
 Editor(s):
 Ruiz, F.; Dy, J.; Meent, J.W.
 Date Published:
 Journal Name:
 Proceedings of Machine Learning Research
 Volume:
 206
 ISSN:
 26403498
 Page Range / eLocation ID:
 29602974
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

Deep neural networks (DNNs) have surpassed humanlevel accuracy in various learning tasks. However, unlike humans who have a natural cognitive intuition for probabilities, DNNs cannot express their uncertainty in the output decisions. This limits the deployment of DNNs in mission critical domains, such as warfighter decisionmaking or medical diagnosis. Bayesian inference provides a principled approach to reason about model’s uncertainty by estimating the posterior distribution of the unknown parameters. The challenge in DNNs remains the multilayer stages of nonlinearities, which make the propagation of highdimensional distributions mathematically intractable. This paper establishes the theoretical and algorithmic foundations of uncertainty or belief propagation by developing new deep learning models named PremiUmCNNs (Propagating Uncertainty in Convolutional Neural Networks). We introduce a tensor normal distribution as a prior over convolutional kernels and estimate the variational posterior by maximizing the evidence lower bound (ELBO). We start by deriving the firstorder meancovariance propagation framework. Later, we develop a framework based on the unscented transformation (correct at least up to the secondorder) that propagates sigma points of the variational distribution through layers of a CNN. The propagated covariance of the predictive distribution captures uncertainty in the output decision. Comprehensive experiments conducted on diverse benchmark datasets demonstrate: 1) superior robustness against noise and adversarial attacks, 2) selfassessment through predictive uncertainty that increases quickly with increasing levels of noise or attacks, and 3) an ability to detect a targeted attack from ambient noise.more » « less

ABSTRACT Observations of the cosmic 21cm power spectrum (PS) are starting to enable precision Bayesian inference of galaxy properties and physical cosmology, during the first billion years of our Universe. Here we investigate the impact of common approximations about the likelihood used in such inferences, including: (i) assuming a Gaussian functional form; (ii) estimating the mean from a single realization; and (iii) estimating the (co)variance at a single point in parameter space. We compare ‘classical’ inference that uses an explicit likelihood with simulationbased inference (SBI) that estimates the likelihood from a training set. Our forward models include: (i) realizations of the cosmic 21cm signal computed with 21cmFAST by varying ultraviolet (UV) and Xray galaxy parameters together with the initial conditions; (ii) realizations of the telescope noise corresponding to a $1000 \, \mathrm{h}$ integration with the lowfrequency component of the Square Kilometre Array (SKA1Low); and (iii) the excision of Fourier modes corresponding to a foregrounddominated horizon ‘wedge’. We find that the 1D PS likelihood is well described by a Gaussian accounting for covariances between wave modes and redshift bins (higher order correlations are small). However, common approaches of estimating the forwardmodelled mean and (co)variance from a random realization or at a single point in parameter space result in biased and overconstrained posteriors. Our best results come from using SBI to fit a nonGaussian likelihood with a Gaussian mixture neural density estimator. Such SBI can be performed with up to an order of magnitude fewer simulations than classical, explicit likelihood inference. Thus SBI provides accurate posteriors at a comparably low computational cost.

Fast inference of numerical model parameters from data is an important prerequisite to generate predictive models for a wide range of applications. Use of samplingbased approaches such as Markov chain Monte Carlo may become intractable when each likelihood evaluation is computationally expensive. New approaches combining variational inference with normalizing flow are characterized by a computational cost that grows only linearly with the dimensionality of the latent variable space, and rely on gradientbased optimization instead of sampling, providing a more efficient approach for Bayesian inference about the model parameters. Moreover, the cost of frequently evaluating an expensive likelihood can be mitigated by replacing the true model with an offline trained surrogate model, such as neural networks. However, this approach might generate significant bias when the surrogate is insufficiently accurate around the posterior modes. To reduce the computational cost without sacrificing inferential accuracy, we propose Normalizing Flow with Adaptive Surrogate (NoFAS), an optimization strategy that alternatively updates the normalizing flow parameters and surrogate model parameters. We also propose an efficient sample weighting scheme for surrogate model training that preserves global accuracy while effectively capturing high posterior density regions. We demonstrate the inferential and computational superiority of NoFAS against various benchmarks, including cases where the underlying model lacks identifiability. The source code and numerical experiments used for this study are available at https://github.com/cedricwangyu/NoFAS.more » « less

Approximating probability distributions can be a challenging task, particularly when they are supported over regions of high geometrical complexity or exhibit multiple modes. Annealing can be used to facilitate this task which is often combined with constant a priori selected increments in inverse temperature. However, using constant increments limits the computational efficiency due to the inability to adapt to situations where smooth changes in the annealed density could be handled equally well with larger increments. We introduce AdaAnn, an adaptive annealing scheduler that automatically adjusts the temperature increments based on the expected change in the KullbackLeibler divergence between two distributions with a sufficiently close annealing temperature. AdaAnn is easy to implement and can be integrated into existing sampling approaches such as normalizing flows for variational inference and Markov chain Monte Carlo. We demonstrate the computational efficiency of the AdaAnn scheduler for variational inference with normalizing flows on a number of examples, including posterior estimation of parameters for dynamical systems and probability density approximation in multimodal and highdimensional settings.more » « less

Cussens, James ; Zhang, Kun (Ed.)Nonlinear monotone transformations are used extensively in normalizing flows to construct invertible triangular mappings from simple distributions to complex ones. In existing literature, monotonicity is usually enforced by restricting function classes or model parameters and the inverse transformation is often approximated by rootfinding algorithms as a closedform inverse is unavailable. In this paper, we introduce a new integralbased approach termed: Atomic Unrestricted Time Machine (AUTM), equipped with unrestricted integrands and easytocompute explicit inverse. AUTM offers a versatile and efficient way to the design of normalizing flows with explicit inverse and unrestricted function classes or parameters. Theoretically, we present a constructive proof that AUTM is universal: all monotonic normalizing flows can be viewed as limits of AUTM flows. We provide a concrete example to show how to approximate any given monotonic normalizing flow using AUTM flows with guaranteed convergence. The result implies that AUTM can be used to transform an existing flow into a new one equipped with explicit inverse and unrestricted parameters. The performance of the new approach is evaluated on high dimensional density estimation, variational inference and image generation. Experiments demonstrate superior speed and memory efficiency of AUTM.more » « less