There has been growing interest in the AI community in precise uncertainty quantification. Conditional density models f(y|x), where x represents potentially high-dimensional features, are an integral part of uncertainty quantification in prediction and Bayesian inference. However, it is challenging to assess conditional density estimates and to gain insight into their modes of failure. While existing diagnostic tools can determine whether an approximated conditional density is compatible overall with a data sample, they lack a principled framework for identifying, locating, and interpreting the nature of statistically significant discrepancies over the entire feature space. In this paper, we present rigorous and easy-to-interpret diagnostics: (i) the “Local Coverage Test” (LCT), which distinguishes an arbitrarily misspecified model from the true conditional density of the sample, and (ii) “Amortized Local P-P plots” (ALP), which quickly provide interpretable graphical summaries of distributional differences at any location x in the feature space. Our validation procedures scale to high dimensions and can be adapted to essentially any type of data at hand. We demonstrate the effectiveness of LCT and ALP through a simulated experiment and through applications to prediction and parameter inference for image data.
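The sketch below is a minimal illustration of the kind of local PIT-based diagnostic this abstract describes, assuming a fitted conditional CDF `cdf_hat(y, x)` and held-out pairs `(x, y)`; the function names, the nearest-neighbour localization, and the sup-norm statistic are simplifications for exposition, not the paper's exact LCT/ALP constructions (which amortize the local estimates via regression and calibrate significance formally).

```python
import numpy as np

def pit_values(cdf_hat, x, y):
    """Probability integral transform: PIT_i = F_hat(y_i | x_i) on held-out pairs."""
    return np.array([cdf_hat(yi, xi) for xi, yi in zip(x, y)])

def local_pp_curve(pit, x, x0, k=200, grid=np.linspace(0.0, 1.0, 21)):
    """Empirical CDF of PIT values among the k nearest neighbours of x0, versus the
    uniform CDF. If the model is locally well specified, the curve hugs the diagonal."""
    idx = np.argsort(np.linalg.norm(x - x0, axis=1))[:k]  # crude local neighbourhood
    local_pit = pit[idx]
    emp = np.array([(local_pit <= g).mean() for g in grid])
    return grid, emp

def local_coverage_discrepancy(grid, emp):
    """Simple sup-norm discrepancy; large values flag local misspecification at x0."""
    return np.max(np.abs(emp - grid))
```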
Novel Uncertainty Quantification through Perturbation-Assisted Sample Synthesis
This paper introduces a novel Perturbation-Assisted Inference (PAI) framework that utilizes synthetic data generated by the Perturbation-Assisted Sample Synthesis (PASS) method. The framework focuses on uncertainty quantification in complex data scenarios, particularly those involving unstructured data and deep learning models. On one hand, PASS employs a generative model to create synthetic data that closely mirrors the raw data while preserving its rank properties through data perturbation, thereby enhancing data diversity and bolstering privacy. By incorporating knowledge transfer from large pretrained generative models, PASS improves estimation accuracy, yielding refined distributional estimates of various statistics via Monte Carlo experiments. On the other hand, PAI offers statistically guaranteed validity. In pivotal inference, it enables precise conclusions even without prior knowledge of the pivotal quantity’s distribution. In non-pivotal situations, we enhance the reliability of synthetic data generation by training the generator on an independent holdout sample. We demonstrate the effectiveness of PAI in advancing uncertainty quantification in complex, data-driven tasks by applying it to diverse areas such as image synthesis, sentiment word analysis, multimodal inference, and the construction of prediction intervals.
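As a rough illustration of the Monte Carlo step the abstract alludes to, the sketch below approximates the sampling distribution of a statistic from synthetic data drawn from a fitted generator; `generator.sample(n)` is a hypothetical interface, and the rank-preserving perturbation and the validity conditions that make PAI inference rigorous are not reproduced here.

```python
import numpy as np

def pai_percentile_interval(generator, statistic, n, n_mc=1000, alpha=0.05):
    """Monte Carlo approximation of a statistic's sampling distribution using
    synthetic samples from a fitted generative model, followed by a simple
    percentile interval. `generator.sample(n)` is a hypothetical interface
    returning n synthetic observations that mimic the raw data."""
    draws = np.empty(n_mc)
    for b in range(n_mc):
        synthetic = generator.sample(n)   # one synthetic replicate of the sample
        draws[b] = statistic(synthetic)   # statistic recomputed on the replicate
    lo, hi = np.quantile(draws, [alpha / 2.0, 1.0 - alpha / 2.0])
    return lo, hi, draws
```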
- Award ID(s): 1952539
- PAR ID: 10511463
- Editor(s): Lee, Kyoung Mu
- Publisher / Repository: The IEEE Computer Society
- Date Published:
- Journal Name: IEEE Transactions on Pattern Analysis and Machine Intelligence
- Edition / Version: 4
- ISSN: 0162-8828
- Page Range / eLocation ID: 1 to 12
- Subject(s) / Keyword(s): Uncertainty Quantification, Diffusion, Normalizing Flows, Large Pre-trained Models, Multimodality, High Dimensionality
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Hicks, Michael (Ed.) This article presents GenSQL, a probabilistic programming system for querying probabilistic generative models of database tables. By augmenting SQL with only a few key primitives for querying probabilistic models, GenSQL enables complex Bayesian inference workflows to be implemented concisely. GenSQL’s query planner rests on a unified programmatic interface for interacting with probabilistic models of tabular data, which makes it possible to use models written in a variety of probabilistic programming languages tailored to specific workflows. Probabilistic models may be learned automatically via probabilistic program synthesis, hand-designed, or a combination of both. GenSQL is formalized using a novel type system and denotational semantics, which together enable us to establish proofs that precisely characterize its soundness guarantees. We evaluate our system on two real-world case studies (anomaly detection in clinical trials and conditional synthetic data generation for a virtual wet lab) and show that GenSQL captures the complexity of the data more accurately than common baselines. We also show that GenSQL’s declarative syntax is more concise and less error-prone than several alternatives. Finally, GenSQL delivers a 1.7-6.8x speedup over its closest competitor on a representative benchmark set and runs in comparable time to hand-written code, in part due to its reusable optimizations and code specialization.
- WISER: Multimodal variational inference for full-waveform inversion without dimensionality reduction. We develop a semiamortized variational inference (VI) framework designed for computationally feasible uncertainty quantification in full-waveform inversion, exploring the multimodal posterior distribution without dimensionality reduction. The framework is called full-waveform VI via subsurface extensions with refinements (WISER). WISER builds on a supervised generative artificial intelligence method that performs low-cost approximate amortized inference, albeit with an amortization gap. This gap is closed through nonamortized refinements that make frugal use of wave physics (a generic sketch of this refinement pattern appears after this list). Case studies illustrate that WISER produces full-resolution, computationally feasible, and reliable uncertainty estimates of velocity models and imaged reflectivities.
- In hydrology, modeling streamflow remains a challenging task due to the limited availability of basin characteristics such as soil geology and geomorphology. These characteristics may be noisy due to measurement errors or may be missing altogether. To overcome this challenge, we propose a knowledge-guided, probabilistic inverse modeling method for recovering physical characteristics from streamflow and weather data, which are more readily available. We compare our framework with state-of-the-art inverse models for estimating river basin characteristics. We also show that these estimates improve streamflow modeling relative to using the original basin characteristic values. Our framework offers a 3% improvement in R2 for the inverse model (basin characteristic estimation) and 6% for the forward model (streamflow prediction). It also offers improved explainability, since it can quantify uncertainty in both the inverse and the forward model. Uncertainty quantification plays a pivotal role in improving the explainability of machine learning models by providing additional insight into the reliability and limitations of model predictions. In our analysis, we assess the quality of the uncertainty estimates. Compared to baseline uncertainty quantification methods, our framework offers a 10% improvement in the dispersion of epistemic uncertainty and a 13% improvement in coverage rate. This information can help stakeholders understand the level of uncertainty associated with the predictions and provides a more comprehensive view of the potential outcomes.
- As we approach the High Luminosity Large Hadron Collider (HL-LHC), set to begin collisions by the end of this decade, it is clear that the computational demands of traditional collision simulations have become untenably high. Current methods, which rely heavily on first-principles Monte Carlo simulations of event showers in calorimeters, are estimated to require millions of CPU-years annually, a demand that far exceeds current capabilities. This bottleneck presents a unique opportunity for breakthroughs in computational physics through the integration of generative AI with quantum computing technologies. We propose a quantum-assisted deep generative model. In particular, we combine a variational autoencoder (VAE) with a Restricted Boltzmann Machine (RBM) embedded in its latent space as the prior. The RBM in latent space provides greater expressiveness than a standard VAE, whose prior is a fixed Gaussian distribution. By crafting the RBM couplings, we leverage D-Wave’s Quantum Annealer to significantly speed up the shower sampling time (a classical stand-in for this sampling step is sketched after this list). By combining classical and quantum computing, this framework sets a path toward using large-scale quantum simulations as priors in deep generative models and demonstrates their ability to generate high-quality synthetic data for the HL-LHC experiments.
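For the WISER item above, the sketch below illustrates the generic semiamortized pattern its abstract describes: an amortized encoder provides initial variational parameters, which are then refined per observation by maximizing a Monte Carlo ELBO. The names `encoder` and `log_joint`, and the diagonal Gaussian variational family, are assumptions for exposition; WISER's wave-physics forward model and subsurface extensions are not reproduced here.

```python
import torch

def semiamortized_refine(encoder, log_joint, observation, n_steps=50, lr=1e-2, n_samples=8):
    """Semiamortized VI sketch: start from an amortized Gaussian proposal and refine
    its parameters for this observation by stochastic gradient ascent on the ELBO.
    `encoder(observation)` returns (mu, log_sigma); `log_joint(observation, z)` returns
    per-sample log p(observation, z). Both are hypothetical callables."""
    mu, log_sigma = encoder(observation)                  # amortized initialization
    mu = mu.detach().clone().requires_grad_(True)
    log_sigma = log_sigma.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([mu, log_sigma], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        eps = torch.randn(n_samples, *mu.shape)
        z = mu + log_sigma.exp() * eps                    # reparameterized samples
        # Monte Carlo ELBO: E_q[log p(obs, z)] plus the Gaussian entropy (up to a constant)
        elbo = log_joint(observation, z).mean() + log_sigma.sum()
        (-elbo).backward()
        opt.step()
    return mu, log_sigma
```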
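For the last item, the sketch below is a purely classical stand-in for sampling an RBM prior over binary latent units (the role that abstract assigns to the quantum annealer): block Gibbs sampling between the two layers of latent units, whose samples would then be passed through a trained decoder to produce synthetic showers. The names `W`, `b_v`, `b_h`, and `decoder` are hypothetical; this is neither the paper's implementation nor the D-Wave interface.

```python
import numpy as np

def sample_rbm_prior(W, b_v, b_h, n_samples, n_gibbs=100, rng=None):
    """Block Gibbs sampling from an RBM prior over binary latent units.
    W has shape (n_v, n_h) and couples the two layers; b_v and b_h are biases."""
    rng = np.random.default_rng(rng)
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    v = rng.integers(0, 2, size=(n_samples, len(b_v))).astype(float)
    for _ in range(n_gibbs):
        h = (rng.random((n_samples, len(b_h))) < sigmoid(v @ W + b_h)).astype(float)
        v = (rng.random((n_samples, len(b_v))) < sigmoid(h @ W.T + b_v)).astype(float)
    return v, h

# Usage sketch: concatenated latent samples are decoded into synthetic showers
# by a trained decoder (hypothetical).
# z = np.concatenate(sample_rbm_prior(W, b_v, b_h, n_samples=64), axis=1)
# showers = decoder(z)
```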

