Title: Structured Stochastic Gradient MCMC
Stochastic gradient Markov Chain Monte Carlo (SGMCMC) is a scalable algorithm for asymptotically exact Bayesian inference in parameter-rich models, such as Bayesian neural networks. However, since mixing can be slow in high dimensions, practitioners often resort to variational inference (VI). Unfortunately, VI makes strong assumptions on both the factorization and functional form of the posterior. To relax these assumptions, this work proposes a new non-parametric variational inference scheme that combines ideas from both SGMCMC and coordinate-ascent VI. The approach relies on a new Langevin-type algorithm that operates on a "self-averaged" posterior energy function, where parts of the latent variables are averaged over samples from earlier iterations of the Markov chain. This way, statistical dependencies between coordinates can be broken in a controlled way, allowing the chain to mix faster. This scheme can be further modified in a "dropout" manner, leading to even more scalability. We test our scheme for ResNet-20 on CIFAR-10, SVHN, and FMNIST. In all cases, we find improvements in convergence speed and/or final accuracy compared to SGMCMC and parametric VI.
Award ID(s): 2047418, 2007719, 2003237
NSF-PAR ID: 10347069
Journal Name: Proceedings of Machine Learning Research
Volume: 162
ISSN: 2640-3498
Sponsoring Org: National Science Foundation
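The "self-averaged" energy idea from the abstract can be illustrated on a toy problem: a Langevin chain on a correlated two-dimensional Gaussian in which the x-update sees a running average of earlier y samples instead of the current y. This is a minimal, hypothetical sketch of the mechanism only, not the paper's algorithm (which targets Bayesian neural network posteriors with stochastic gradients); all names and constants below are illustrative.

```python
import math
import random

RHO = 0.9  # correlation of the toy 2-D Gaussian posterior

def grad_energy(x, y):
    """Gradient of U(x, y) = (x^2 - 2*RHO*x*y + y^2) / (2*(1 - RHO^2))."""
    c = 1.0 / (1.0 - RHO * RHO)
    return c * (x - RHO * y), c * (y - RHO * x)

def self_averaged_langevin(steps=5000, eta=0.05, seed=1):
    """Langevin chain in which the x-update sees a running average of
    earlier y samples instead of the current y, breaking the x-y coupling
    in a controlled way.  Returns the list of x samples."""
    rng = random.Random(seed)
    x = y = 0.0
    y_sum, n_y = 0.0, 0
    xs = []
    for _ in range(steps):
        y_sum += y
        n_y += 1
        y_bar = y_sum / n_y              # average over earlier iterations
        gx, _ = grad_energy(x, y_bar)    # x-block: self-averaged energy
        _, gy = grad_energy(x, y)        # y-block: ordinary Langevin update
        x += -eta * gx + math.sqrt(2.0 * eta) * rng.gauss(0.0, 1.0)
        y += -eta * gy + math.sqrt(2.0 * eta) * rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs
```

Because x is driven by the averaged y rather than the strongly correlated current y, its update behaves more like an independent coordinate, which is the dependence-versus-mixing trade-off the abstract describes.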
More Like this
  1. Abstract

Survival models are used to analyze time-to-event data in a variety of disciplines. Proportional hazard models provide interpretable parameter estimates, but proportional hazard assumptions are not always appropriate. Non-parametric models are more flexible but often lack a clear inferential framework. We propose a Bayesian treed hazards partition model that is both flexible and inferential. Inference is obtained through the posterior tree structure and flexibility is preserved by modeling the log-hazard function in each partition using a latent Gaussian process. An efficient reversible jump Markov chain Monte Carlo algorithm is obtained by marginalizing the parameters in each partition element via a Laplace approximation. Consistency properties for the estimator are established. The method can be used to help determine subgroups as well as prognostic and/or predictive biomarkers in time-to-event data. The method is compared with some existing methods on simulated data and a liver cirrhosis dataset.

     
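The role a partition plays in a hazard model can be seen in its simplest form: the sketch below evaluates the survival log-likelihood for a piecewise-constant hazard over a fixed partition of the time axis. This is a toy stand-in assuming known cut points and rates; the paper's model instead places a tree prior on the partition and a latent Gaussian process on each element's log-hazard.

```python
import bisect
import math

def cumulative_hazard(t, cuts, rates):
    """Integral of a piecewise-constant hazard over [0, t].  `cuts` are the
    sorted interior cut points of the partition; `rates` holds one hazard
    rate per partition element (len(cuts) + 1 of them)."""
    total, prev = 0.0, 0.0
    for c, r in zip(cuts, rates):
        if t <= c:
            return total + r * (t - prev)
        total += r * (c - prev)
        prev = c
    return total + rates[-1] * (t - prev)

def log_likelihood(times, events, cuts, rates):
    """Right-censored survival log-likelihood:
    sum_i [ d_i * log h(t_i) - H(t_i) ], with d_i = 1 for an observed
    event and d_i = 0 for a censored time."""
    ll = 0.0
    for t, d in zip(times, events):
        k = bisect.bisect_left(cuts, t)   # partition element containing t
        ll += d * math.log(rates[k]) - cumulative_hazard(t, cuts, rates)
    return ll
```

With no cut points this reduces to the exponential model, e.g. `log_likelihood([1.0], [1], [], [2.0])` returns `log(2) - 2`.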
  2. Fast inference of numerical model parameters from data is an important prerequisite to generate predictive models for a wide range of applications. Use of sampling-based approaches such as Markov chain Monte Carlo may become intractable when each likelihood evaluation is computationally expensive. New approaches combining variational inference with normalizing flows are characterized by a computational cost that grows only linearly with the dimensionality of the latent variable space, and rely on gradient-based optimization instead of sampling, providing a more efficient approach for Bayesian inference about the model parameters. Moreover, the cost of frequently evaluating an expensive likelihood can be mitigated by replacing the true model with an offline-trained surrogate model, such as a neural network. However, this approach might generate significant bias when the surrogate is insufficiently accurate around the posterior modes. To reduce the computational cost without sacrificing inferential accuracy, we propose Normalizing Flow with Adaptive Surrogate (NoFAS), an optimization strategy that alternately updates the normalizing flow parameters and surrogate model parameters. We also propose an efficient sample weighting scheme for surrogate model training that preserves global accuracy while effectively capturing high posterior density regions. We demonstrate the inferential and computational superiority of NoFAS against various benchmarks, including cases where the underlying model lacks identifiability. The source code and numerical experiments used for this study are available at https://github.com/cedricwangyu/NoFAS.
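The alternating update at the heart of NoFAS can be sketched under heavy simplification: a cubic stands in for the expensive model, a two-term least-squares fit stands in for the neural-network surrogate, and a short Metropolis run stands in for the normalizing-flow update. Every name and number here is illustrative and not taken from the paper or its repository.

```python
import math
import random

def true_model(theta):
    """Stand-in for an expensive numerical model (just a cubic here)."""
    return theta ** 3 - theta

def fit_surrogate(xs, ys):
    """Least-squares fit of a*x^3 + b*x, a toy stand-in for the
    neural-network surrogate.  Returns (a, b) from the normal equations."""
    s11 = sum(x ** 6 for x in xs)
    s12 = sum(x ** 4 for x in xs)
    s22 = sum(x ** 2 for x in xs)
    r1 = sum((x ** 3) * y for x, y in zip(xs, ys))
    r2 = sum(x * y for x, y in zip(xs, ys))
    det = s11 * s22 - s12 * s12
    return (r1 * s22 - r2 * s12) / det, (s11 * r2 - s12 * r1) / det

def nofas_style_loop(y_obs=0.0, sigma=0.1, rounds=30, seed=3):
    """Alternate (i) cheap Metropolis moves targeting the posterior under
    the current surrogate with (ii) refitting the surrogate on true-model
    evaluations collected where the chain actually goes."""
    rng = random.Random(seed)
    xs = [-2.0, -1.0, 1.0, 2.0]              # initial design points
    ys = [true_model(x) for x in xs]
    theta = 2.0
    for _ in range(rounds):
        a, b = fit_surrogate(xs, ys)

        def log_post(t):
            return -(y_obs - (a * t ** 3 + b * t)) ** 2 / (2.0 * sigma ** 2)

        for _ in range(20):                  # surrogate-based MCMC is cheap
            prop = theta + rng.gauss(0.0, 0.3)
            if math.log(rng.random() + 1e-300) < log_post(prop) - log_post(theta):
                theta = prop
        xs.append(theta)                     # one new *expensive* evaluation,
        ys.append(true_model(theta))         # placed in high-density regions
    return theta
```

The key design choice mirrored here is that new expensive evaluations are collected where the posterior mass sits, so the surrogate stays accurate exactly where accuracy matters.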
  3.
    Electrification of vehicles is becoming one of the main avenues for decarbonization of the transportation market. To reduce stress on the energy grid, large-scale charging will require optimal scheduling of when electricity is delivered to vehicles. Coordinated electric-vehicle charging can produce optimal, flattened loads that would improve reliability of the power system as well as reduce system costs and emissions. However, a challenge for successful introduction of coordinated deadline-scheduling of residential charging comes from the demand side: customers would need to be willing both to defer charging their vehicles and to accept less than a 100% target for battery charge. Within a coordinated electric-vehicle charging pilot run by the local utility in upstate New York, this study analyzes the necessary incentives for customers to accept giving up control of when charging of their vehicles takes place. Using data from a choice experiment implemented in an online survey of electric-vehicle owners and lessees in upstate New York (N=462), we draw inferences about the willingness to pay for features of hypothetical coordinated electric-vehicle charging programs. To address unobserved preference heterogeneity, we apply Variational Bayes (VB) inference to a mixed logit model. Stochastic variational inference has recently emerged as a fast and computationally efficient alternative to Markov chain Monte Carlo (MCMC) methods for scalable Bayesian estimation of discrete choice models. Our results show that individuals negatively perceive the duration of the timeframe in which the energy provider would be allowed to defer charging, even though both the desired target for battery charge and deadline would be respected. This negative monetary valuation is evidenced by an expected average reduction in the annual fee of joining the charging program of $2.64 per hour of control yielded to the energy provider.
Our results also provide evidence of substantial heterogeneity in preferences. For example, the 25% quantile of the posterior distribution of the mean of the willingness to accept an additional hour of control yielded to the utility is $5.16. However, the negative valuation of the timeframe for deferring charging is compensated by positive valuation of emission savings coming from switching charging to periods of the day with a higher proportion of generation from renewable sources. Customers also positively valued discounts in the price of energy delivery. 
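The mixed logit model estimated in the study averages logit choice probabilities over a random taste coefficient. A minimal Monte Carlo simulation of that quantity, with a single illustrative attribute and a normal coefficient rather than the survey's actual specification, looks like:

```python
import math
import random

def mixed_logit_probs(attrs, beta_mean, beta_sd, draws=2000, seed=7):
    """Monte Carlo choice probabilities for a one-attribute mixed logit:
    the taste coefficient varies across people as beta ~ N(mean, sd), and
    the logit probabilities are averaged over draws of beta."""
    rng = random.Random(seed)
    probs = [0.0] * len(attrs)
    for _ in range(draws):
        beta = rng.gauss(beta_mean, beta_sd)
        utils = [beta * x for x in attrs]
        m = max(utils)                       # stabilize the softmax
        expu = [math.exp(u - m) for u in utils]
        z = sum(expu)
        for i, e in enumerate(expu):
            probs[i] += e / z / draws
    return probs
```

In a willingness-to-pay analysis like the one summarized above, monetary valuations such as the $2.64-per-hour figure come from ratios of estimated taste coefficients to the cost coefficient, not from the simulation itself.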
  4. Abstract

    We present a novel approach for the search of dark matter in the DarkSide-50 experiment, relying on Bayesian Networks. This method incorporates the detector response model into the likelihood function, explicitly maintaining the connection with the quantity of interest. No assumptions about the linearity of the problem or the shape of the probability distribution functions are required, and there is no need to morph signal and background spectra as a function of nuisance parameters. By expressing the problem in terms of Bayesian Networks, we have developed an inference algorithm based on a Markov Chain Monte Carlo to calculate the posterior probability. A clever description of the detector response model in terms of parametric matrices allows us to study the impact of systematic variations of any parameter on the final results. Our approach not only provides the desired information on the parameter of interest, but also potential constraints on the response model. Our results are consistent with recent published analyses and further refine the parameters of the detector response model.

     
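The core computation described in this abstract, folding true-bin rates through a parametric detector-response matrix and sampling the posterior with Markov Chain Monte Carlo, can be sketched on a two-bin toy. The response matrix and observed counts below are made up for illustration and have no connection to DarkSide-50 data.

```python
import math
import random

RESPONSE = [[0.8, 0.1],   # row i, column j: P(observed bin i | true bin j)
            [0.2, 0.9]]   # illustrative numbers only
OBSERVED = [45, 110]      # observed counts per bin (made up)

def expected_counts(true_rates):
    """Fold the true-bin rates through the detector response matrix."""
    return [sum(RESPONSE[i][j] * true_rates[j] for j in range(2))
            for i in range(2)]

def log_poisson_like(true_rates):
    """Poisson log-likelihood of the observed counts (up to a constant)."""
    return sum(n * math.log(mu) - mu
               for n, mu in zip(OBSERVED, expected_counts(true_rates)))

def metropolis(steps=20000, seed=5):
    """Random-walk Metropolis over the two true-bin rates (flat prior on
    positive rates).  Returns the chain of first-bin rates."""
    rng = random.Random(seed)
    rates = [50.0, 100.0]
    cur = log_poisson_like(rates)
    chain = []
    for _ in range(steps):
        prop = [max(1e-6, r + rng.gauss(0.0, 5.0)) for r in rates]
        lp = log_poisson_like(prop)
        if math.log(rng.random() + 1e-300) < lp - cur:
            rates, cur = prop, lp
        chain.append(rates[0])
    return chain
```

Making the response matrix itself a function of nuisance parameters, as the paper does, would simply add those parameters to the state sampled by the chain, which is how the posterior can constrain the response model as well.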
  5. Abstract

    The ambient solar wind plays a significant role in propagating interplanetary coronal mass ejections and is an important driver of space weather geomagnetic storms. A computationally efficient and widely used method to predict the ambient solar wind radial velocity near Earth involves coupling three models: Potential Field Source Surface, Wang‐Sheeley‐Arge (WSA), and Heliospheric Upwind eXtrapolation. However, the model chain has 11 uncertain parameters that are mainly non‐physical due to empirical relations and simplified physics assumptions. We, therefore, propose a comprehensive uncertainty quantification (UQ) framework that is able to successfully quantify and reduce parametric uncertainties in the model chain. The UQ framework utilizes variance‐based global sensitivity analysis followed by Bayesian inference via Markov chain Monte Carlo to learn the posterior densities of the most influential parameters. The sensitivity analysis results indicate that the five most influential parameters are all WSA parameters. Additionally, we show that the posterior densities of such influential parameters vary greatly from one Carrington rotation to the next. The influential parameters are trying to overcompensate for the missing physics in the model chain, highlighting the need to enhance the robustness of the model chain to the choice of WSA parameters. The ensemble predictions generated from the learned posterior densities significantly reduce the uncertainty in solar wind velocity predictions near Earth.

     
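Variance-based global sensitivity analysis of the kind used here to rank the 11 parameters can be illustrated with a pick-freeze estimator of a first-order Sobol index. The two-input model below is a deliberately trivial stand-in and assumes nothing about the actual PFSS, WSA, or HUX models.

```python
import random

def model(x1, x2):
    """Two-input stand-in: x1 dominates the output, x2 barely matters."""
    return x1 + 0.1 * x2

def first_order_sobol_x1(n=100000, seed=9):
    """Pick-freeze estimate of the first-order Sobol index of x1:
    S1 = Cov(f(x1, x2), f(x1, x2')) / Var(f), where x2' is an independent
    resample while x1 is held fixed.  Inputs are i.i.d. Uniform(0, 1)."""
    rng = random.Random(seed)
    ya, yab = [], []
    for _ in range(n):
        x1, x2, x2p = rng.random(), rng.random(), rng.random()
        ya.append(model(x1, x2))
        yab.append(model(x1, x2p))       # x1 frozen, x2 resampled
    mean_a = sum(ya) / n
    mean_b = sum(yab) / n
    var = sum((y - mean_a) ** 2 for y in ya) / n
    cov = sum(a * b for a, b in zip(ya, yab)) / n - mean_a * mean_b
    return cov / var
```

For this toy model the exact index is Var(x1) / (Var(x1) + 0.01 Var(x2)) = 1/1.01 ≈ 0.99, so nearly all output variance is attributed to x1; a screening like this is what lets the paper restrict Bayesian calibration to the handful of influential WSA parameters.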