NSF PAR Search | NSF Public Access Repository


Efficient Neural Network Approaches for Conditional Optimal Transport with Applications in Bayesian Inference

https://doi.org/10.1137/24M1678659

Wang, Zheyu Oliver; Baptista, Ricardo; Marzouk, Youssef; Ruthotto, Lars; Verma, Deepanshu (August 2025, SIAM Journal on Scientific Computing)

Free, publicly-accessible full text available August 31, 2026
Transport map unadjusted Langevin algorithms: Learning and discretizing perturbed samplers

https://doi.org/10.3934/fods.2024047

Zhang, Benjamin J; Marzouk, Youssef M; Spiliopoulos, Konstantinos (January 2025, Foundations of Data Science)

Full Text Available
Conditional simulation via entropic optimal transport: Toward non-parametric estimation of conditional Brenier maps

Baptista, Ricardo; Pooladian, Aram-Alexandre; Brennan, Michael; Marzouk, Youssef; Niles-Weed, Jonathan (November 2024, arXiv)

Full Text Available
Conditional Sampling with Monotone GANs: From Generative Models to Likelihood-Free Inference

https://doi.org/10.1137/23M1581546

Baptista, Ricardo; Hosseini, Bamdad; Kovachki, Nikola B; Marzouk, Youssef M (September 2024, SIAM/ASA Journal on Uncertainty Quantification)

Full Text Available
hIPPYlib-MUQ: A Bayesian Inference Software Framework for Integration of Data with Complex Predictive Models under Uncertainty

https://doi.org/10.1145/3580278

Kim, Ki-Tae; Villa, Umberto; Parno, Matthew; Marzouk, Youssef; Ghattas, Omar; Petra, Noemi (June 2023, ACM Transactions on Mathematical Software)

Bayesian inference provides a systematic framework for integration of data with mathematical models to quantify the uncertainty in the solution of the inverse problem. However, the solution of Bayesian inverse problems governed by complex forward models described by partial differential equations (PDEs) remains prohibitive with black-box Markov chain Monte Carlo (MCMC) methods. We present hIPPYlib-MUQ, an extensible and scalable software framework that contains implementations of state-of-the-art algorithms aimed at overcoming the challenges of high-dimensional, PDE-constrained Bayesian inverse problems. These algorithms accelerate MCMC sampling by exploiting the geometry and intrinsic low-dimensionality of parameter space via derivative information and low-rank approximation. The software integrates two complementary open-source software packages, hIPPYlib and MUQ. hIPPYlib solves PDE-constrained inverse problems using automatically generated adjoint-based derivatives, but it lacks full Bayesian capabilities. MUQ provides a spectrum of powerful Bayesian inversion models and algorithms, but expects forward models to come equipped with gradients and Hessians to permit large-scale solution. By combining these two complementary libraries, we created a robust, scalable, and efficient software framework that realizes the benefits of each and allows us to tackle complex large-scale Bayesian inverse problems across a broad spectrum of scientific and engineering disciplines. To illustrate the capabilities of hIPPYlib-MUQ, we present a comparison of a number of MCMC methods available in the integrated software on several high-dimensional Bayesian inverse problems. These include problems characterized by both linear and nonlinear PDEs, various noise models, and different parameter dimensions. The results demonstrate that large (∼50×) speedups over conventional black-box and gradient-based MCMC algorithms can be obtained by exploiting Hessian information (from the log-posterior), underscoring the power of the integrated hIPPYlib-MUQ framework.
Full Text Available
Geometry-informed irreversible perturbations for accelerated convergence of Langevin dynamics

https://doi.org/10.1007/s11222-022-10147-6

Zhang, Benjamin J.; Marzouk, Youssef M.; Spiliopoulos, Konstantinos (September 2022, Statistics and Computing)

We introduce a novel geometry-informed irreversible perturbation that accelerates convergence of the Langevin algorithm for Bayesian computation. It is well documented that there exist perturbations to the Langevin dynamics that preserve its invariant measure while accelerating its convergence. Irreversible perturbations and reversible perturbations (such as Riemannian manifold Langevin dynamics (RMLD)) have separately been shown to improve the performance of Langevin samplers. We consider these two perturbations simultaneously by presenting a novel form of irreversible perturbation for RMLD that is informed by the underlying geometry. Through numerical examples, we show that this new irreversible perturbation can improve estimation performance over irreversible perturbations that do not take the geometry into account. Moreover, we demonstrate that irreversible perturbations generally can be implemented in conjunction with the stochastic gradient version of the Langevin algorithm. Lastly, while continuous-time irreversible perturbations cannot impair the performance of a Langevin estimator, the situation can sometimes be more complicated when discretization is considered. To this end, we describe a discrete-time example in which irreversibility increases both the bias and variance of the resulting estimator.
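As a rough illustration of the kind of perturbation this abstract refers to, the sketch below discretizes the standard constant skew-symmetric perturbation dX = (I + δJ)∇log π(X) dt + √2 dW with an Euler-Maruyama step. It is not the geometry-informed perturbation proposed in the paper. The Gaussian target, the matrix J = [[0, 1], [-1, 0]], and all function names are assumptions made for this example.

```python
import math
import random


def grad_log_target(x, y):
    """Gradient of log pi for a toy zero-mean Gaussian with variances (1, 4)."""
    return (-x / 1.0, -y / 4.0)


def skew_drift(gx, gy, delta):
    """Extra drift delta * J @ g with the skew-symmetric J = [[0, 1], [-1, 0]].

    This term is orthogonal to the gradient g, and in continuous time it
    leaves the invariant measure of the Langevin dynamics unchanged.
    """
    return (delta * gy, -delta * gx)


def irreversible_langevin(n_steps, step, delta, seed=0):
    """Euler-Maruyama discretization of the perturbed Langevin dynamics.

    delta = 0 recovers the usual unadjusted Langevin algorithm. The
    discretization introduces bias for any positive step size.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    noise_scale = math.sqrt(2.0 * step)
    samples = []
    for _ in range(n_steps):
        gx, gy = grad_log_target(x, y)
        px, py = skew_drift(gx, gy, delta)
        x += step * (gx + px) + noise_scale * rng.gauss(0.0, 1.0)
        y += step * (gy + py) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append((x, y))
    return samples
```

Calling `irreversible_langevin(20000, 0.05, delta=1.0)` and comparing running averages against `delta=0.0` is one simple way to explore the effect of the perturbation on this toy target.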
Batch greedy maximization of non-submodular functions: Guarantees and applications to experimental design

Jagalur-Mohan, Jayanth; Marzouk, Youssef (October 2021, Journal of Machine Learning Research)

We propose and analyze batch greedy heuristics for cardinality-constrained maximization of non-submodular, non-decreasing set functions. We consider the standard greedy paradigm, along with its distributed greedy and stochastic greedy variants. Our theoretical guarantees are characterized by the combination of submodularity and supermodularity ratios. We argue how these parameters define tight modular bounds based on incremental gains, and provide a novel reinterpretation of the classical greedy algorithm using the minorize-maximize (MM) principle. Based on that analogy, we propose a new class of methods exploiting any plausible modular bound. In the context of optimal experimental design for linear Bayesian inverse problems, we bound the submodularity and supermodularity ratios when the underlying objective is based on mutual information. We also develop novel modular bounds for the mutual information in this setting, and describe certain connections to polyhedral combinatorics. We discuss how algorithms using these modular bounds relate to established statistical notions such as leverage scores and to more recent efforts such as volume sampling. We demonstrate our theoretical findings on synthetic problems and on a real-world climate monitoring example.
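The batch greedy idea in this abstract can be illustrated with a minimal sketch: in each round, marginal gains are computed once with respect to the current selection, and the top `batch_size` candidates are added together. The toy weighted-coverage objective, the tie-breaking rule, and all names below are assumptions made for illustration; the paper's objectives and analysis are more general.

```python
def coverage(selected, sets, weights):
    """Toy non-decreasing set function: total weight of the covered elements."""
    covered = set()
    for i in selected:
        covered |= sets[i]
    return sum(weights[e] for e in covered)


def batch_greedy(objective, ground, k, batch_size=1):
    """Select at most k items, adding `batch_size` items per round.

    Gains are evaluated once per round against the current selection, so
    items inside a batch do not see each other. batch_size=1 is the
    classical greedy algorithm. Ties are broken by order in `ground`.
    """
    selected = []
    remaining = list(ground)
    while len(selected) < k and remaining:
        base = objective(selected)
        gains = [(objective(selected + [c]) - base, c) for c in remaining]
        # Python's sort is stable, so equal gains keep their ground order.
        gains.sort(key=lambda pair: pair[0], reverse=True)
        take = min(batch_size, k - len(selected))
        for _, candidate in gains[:take]:
            selected.append(candidate)
            remaining.remove(candidate)
    return selected


SETS = {0: {"a", "b", "c"}, 1: {"c", "d"}, 2: {"d", "e"}, 3: {"a"}}
WEIGHTS = {e: 1.0 for e in "abcde"}


def toy_objective(selected):
    return coverage(selected, SETS, WEIGHTS)
```

On this toy instance, `batch_greedy(toy_objective, [0, 1, 2, 3], k=2, batch_size=1)` picks items 0 and 2 (covering five elements), while `batch_size=2` picks items 0 and 1 (covering four), which shows how batching trades solution quality for fewer rounds of gain evaluation.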
Full Text Available
Efficient multi-scale Gaussian process regression for massive remote sensing data with satGP v0.1.2

https://doi.org/10.5194/gmd-13-3439-2020

Susiluoto, Jouni; Spantini, Alessio; Haario, Heikki; Härkönen, Teemu; Marzouk, Youssef (January 2020, Geoscientific Model Development)
Satellite remote sensing provides a global view to processes on Earth that has unique benefits compared to making measurements on the ground, such as global coverage and enormous data volume. The typical downsides are spatial and temporal gaps and potentially low data quality. Meaningful statistical inference from such data requires overcoming these problems and developing efficient and robust computational tools. We design and implement a computationally efficient multi-scale Gaussian process (GP) software package, satGP, geared towards remote sensing applications. The software is able to handle problems of enormous sizes and to compute marginals and sample from the random field conditioning on at least hundreds of millions of observations. This is achieved by optimizing the computation by, e.g., randomization and splitting the problem into parallel local subproblems which aggressively discard uninformative data. We describe the mean function of the Gaussian process by approximating marginals of a Markov random field (MRF). Variability around the mean is modeled with a multi-scale covariance kernel, which consists of Matérn, exponential, and periodic components. We also demonstrate how winds can be used to inform covariances locally. The covariance kernel parameters are learned by calculating an approximate marginal maximum likelihood estimate, and the validity of both the multi-scale approach and the method used to learn the kernel parameters is verified in synthetic experiments. We apply these techniques to a moderate size ozone data set produced by an atmospheric chemistry model and to the very large number of observations retrieved from the Orbiting Carbon Observatory 2 (OCO-2) satellite. The satGP software is released under an open-source license.
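To make the multi-scale kernel described in this abstract concrete, the sketch below builds a one-dimensional covariance as a sum of Matérn, exponential, and periodic components. This is not the satGP API or its parameterization. Combining the components by addition, the choice of Matérn smoothness 3/2, and every name and parameter value are assumptions made for this example.

```python
import math


def matern32(r, length):
    """Matérn covariance with smoothness 3/2 and unit variance."""
    s = math.sqrt(3.0) * abs(r) / length
    return (1.0 + s) * math.exp(-s)


def exponential(r, length):
    """Exponential covariance (Matérn with smoothness 1/2), unit variance."""
    return math.exp(-abs(r) / length)


def periodic(r, length, period):
    """Standard periodic covariance with unit variance."""
    return math.exp(-2.0 * math.sin(math.pi * abs(r) / period) ** 2 / length ** 2)


def multiscale_kernel(r, params):
    """Sum of three components; a sum of valid kernels is a valid kernel."""
    return (
        params["var_matern"] * matern32(r, params["len_matern"])
        + params["var_exp"] * exponential(r, params["len_exp"])
        + params["var_per"] * periodic(r, params["len_per"], params["period"])
    )


# Hypothetical parameter values: a long smooth scale, a short rough scale,
# and a periodic component with period 365 (e.g. days).
PARAMS = {
    "var_matern": 1.0, "len_matern": 500.0,
    "var_exp": 0.3, "len_exp": 20.0,
    "var_per": 0.5, "len_per": 1.0, "period": 365.0,
}
```

At zero separation, `multiscale_kernel(0.0, PARAMS)` equals the sum of the three component variances, which gives a quick sanity check on the parameter values.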
Full Text Available
Students Tackle Bayesian Inverse Problems in the Colorado Rockies

Ghattas, Omar; Marzouk, Youssef; Parno, Matt; Petra, Noemi; Stadler, Georg; Villa, Umberto (January 2019, SIAM News)

Full Text Available
