Due to the ease of modern data collection, applied statisticians often have access to a large set of covariates that they wish to relate to some observed outcome. Generalized linear models (GLMs) offer a particularly interpretable framework for such an analysis. In these high-dimensional problems, the number of covariates is often large relative to the number of observations, so we face non-trivial inferential uncertainty; a Bayesian approach allows coherent quantification of this uncertainty. Unfortunately, existing methods for Bayesian inference in GLMs require running times roughly cubic in parameter dimension, and so are limited to settings with at most tens of thousands of parameters. We propose to reduce time and memory costs with a low-rank approximation of the data in an approach we call LR-GLM. When used with the Laplace approximation or Markov chain Monte Carlo, LR-GLM provides a full Bayesian posterior approximation and admits running times reduced by a full factor of the parameter dimension. We rigorously establish the quality of our approximation and show how the choice of rank allows a tunable computational–statistical trade-off. Experiments support our theory and demonstrate the efficacy of LR-GLM on real large-scale datasets.
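The core idea — projecting the design matrix onto its top singular directions before fitting a Laplace approximation — can be sketched as follows. This is an illustrative reconstruction for logistic regression, not the authors' code; the function name, Newton scheme, and isotropic Gaussian prior are assumptions.

```python
import numpy as np

def lr_glm_laplace(X, y, rank, prior_var=1.0, iters=50):
    """Sketch of an LR-GLM-style fit for logistic regression: project the
    data onto the top-`rank` right singular vectors, then form a Laplace
    approximation in the reduced space (illustrative only)."""
    # Rank-r SVD of the design matrix; only Vt is needed for the projection.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vr = Vt[:rank].T                    # (p, rank) projection basis
    Z = X @ Vr                          # reduced design matrix, (n, rank)

    theta = np.zeros(rank)
    for _ in range(iters):              # Newton iterations for the MAP
        p = 1.0 / (1.0 + np.exp(-Z @ theta))
        g = Z.T @ (y - p) - theta / prior_var            # log-posterior gradient
        H = -(Z.T * (p * (1.0 - p))) @ Z - np.eye(rank) / prior_var  # Hessian
        theta = theta - np.linalg.solve(H, g)

    cov = np.linalg.inv(-H)             # Laplace covariance in the reduced space
    return Vr @ theta, Vr @ cov @ Vr.T  # lift mean and covariance back to R^p
```

Because the Newton solves happen in the rank-`r` space, each iteration costs O(n r^2 + r^3) rather than the O(p^3) of a full-dimensional Laplace fit, which is the computational–statistical trade-off the rank controls.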
A statistical framework for domain shape estimation in Stokes flows
Abstract We develop and implement a Bayesian approach for the estimation of the shape of a two dimensional annular domain enclosing a Stokes flow from sparse and noisy observations of the enclosed fluid. Our setup includes the case of direct observations of the flow field as well as the measurement of concentrations of a solute passively advected by and diffusing within the flow. Adopting a statistical approach provides estimates of uncertainty in the shape due both to the non-invertibility of the forward map and to error in the measurements. When the shape represents a design problem of attempting to match desired target outcomes, this ‘uncertainty’ can be interpreted as identifying remaining degrees of freedom available to the designer. We demonstrate the viability of our framework on three concrete test problems. These problems illustrate the promise of our framework for applications while providing a collection of test cases for recently developed Markov chain Monte Carlo algorithms designed to resolve infinite-dimensional statistical quantities.
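The statistical setup above can be illustrated with a toy random-walk Metropolis sampler over Fourier coefficients of the boundary. In the paper the likelihood would involve a Stokes (or advection–diffusion) solve; here a plain boundary evaluation stands in for that forward map, and all function names and prior/noise scales are illustrative assumptions.

```python
import numpy as np

def boundary_radius(coeffs, theta, r0=1.0):
    """Outer boundary of the annulus as a truncated Fourier series r(theta)."""
    K = len(coeffs) // 2
    r = np.full_like(theta, r0)
    for k in range(1, K + 1):
        r = r + coeffs[2*k-2] * np.cos(k * theta) + coeffs[2*k-1] * np.sin(k * theta)
    return r

def log_post(coeffs, data, angles, noise_sd, prior_sd):
    """Gaussian likelihood around the (toy) forward map plus a Gaussian prior."""
    pred = boundary_radius(coeffs, angles)   # stand-in for the PDE solve
    return (-0.5 * np.sum((data - pred) ** 2) / noise_sd**2
            - 0.5 * np.sum(coeffs ** 2) / prior_sd**2)

def metropolis_shape(data, angles, dim=4, n_steps=4000, step=0.02,
                     noise_sd=0.05, prior_sd=0.5, seed=0):
    """Random-walk Metropolis over the boundary's Fourier coefficients."""
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    lp = log_post(x, data, angles, noise_sd, prior_sd)
    samples = np.empty((n_steps, dim))
    for i in range(n_steps):
        prop = x + step * rng.normal(size=dim)
        lp_prop = log_post(prop, data, angles, noise_sd, prior_sd)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject
            x, lp = prop, lp_prop
        samples[i] = x
    return samples
```

The spread of the retained samples around the posterior mean is exactly the shape uncertainty the abstract describes; function-space (infinite-dimensional) MCMC schemes replace this finite Fourier truncation in the paper's setting.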
- PAR ID: 10427696
- Publisher / Repository: IOP Publishing
- Date Published:
- Journal Name: Inverse Problems
- Volume: 39
- Issue: 8
- ISSN: 0266-5611
- Page Range / eLocation ID: Article No. 085009
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract One‐dimensional (1D) cardiovascular models offer a non‐invasive method to answer medical questions, including predictions of wave‐reflection, shear stress, functional flow reserve, vascular resistance and compliance. This model type can predict patient‐specific outcomes by solving 1D fluid dynamics equations in geometric networks extracted from medical images. However, the inherent uncertainty in in vivo imaging introduces variability in network size and vessel dimensions, affecting haemodynamic predictions. Understanding the influence of variation in image‐derived properties is essential to assess the fidelity of model predictions. Numerous programs exist to render three‐dimensional surfaces and construct vessel centrelines. Still, there is no exact way to generate vascular trees from the centrelines while accounting for uncertainty in data. This study introduces an innovative framework employing statistical change point analysis to generate labelled trees that encode vessel dimensions and their associated uncertainty from medical images. To test this framework, we explore the impact of uncertainty on 1D haemodynamic predictions in a systemic and pulmonary arterial network. Simulations explore haemodynamic variations resulting from changes in vessel dimensions and segmentation; the latter is achieved by analysing multiple segmentations of the same images.
Results demonstrate the importance of accurately defining vessel radii and lengths when generating high‐fidelity patient‐specific haemodynamics models.
Key points:
- This study introduces novel algorithms for generating labelled directed trees from medical images, focusing on accurate junction node placement and radius extraction using change points to provide haemodynamic predictions with uncertainty within expected measurement error.
- Geometric features, such as vessel dimension (length and radius) and network size, significantly impact pressure and flow predictions in both pulmonary and aortic arterial networks.
- Standardizing networks to a consistent number of vessels is crucial for meaningful comparisons and decreases haemodynamic uncertainty.
- Change points are valuable for understanding structural transitions in vascular data, providing an automated and efficient way to detect shifts in vessel characteristics and ensure reliable extraction of representative vessel radii.
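The change-point step can be illustrated with a minimal least-squares split of a radius signal measured along a centreline. This single-change-point version (the simplest case of binary segmentation) is a sketch under assumed names, not the study's algorithm:

```python
import numpy as np

def best_split(r):
    """Single least-squares change point in a 1-D radius signal: the index
    that minimizes the within-segment sum of squared deviations."""
    n = len(r)
    best_i, best_cost = None, np.inf
    for i in range(2, n - 1):            # require at least 2 points per side
        left, right = r[:i], r[i:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i, best_cost

def segment_radii(r):
    """Split the signal once and report a representative radius per segment."""
    i, _ = best_split(r)
    return i, r[:i].mean(), r[i:].mean()
```

Applying the split recursively (and stopping via a penalty or significance test) yields the kind of labelled segments, with per-segment representative radii, that the framework encodes into the vascular tree.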
-
Abstract The rise of exascale supercomputing has motivated an increase in high‐fidelity computational fluid dynamics (CFD) simulations. The detail in these simulations, often involving shape‐dependent, time‐variant flow domains and low‐speed, complex, turbulent flows, is essential for fueling innovations in fields like wind, civil, automotive, or aerospace engineering. However, the massive amount of data these simulations produce can overwhelm storage systems and negatively affect conventional data management and postprocessing workflows, including iterative procedures such as design space exploration, optimization, and uncertainty quantification. This study proposes a novel sampling method harnessing the signed distance function (SDF) concept: SDF‐biased flow importance sampling (BiFIS) and implicit compression based on implicit neural network representations for transforming large‐size, shape‐dependent flow fields into reduced‐size shape‐agnostic images. Designed to alleviate the above‐mentioned problems, our approach achieves near‐lossless compression ratios, substantially reducing the size of a bridge aerodynamics forced‐vibration simulation while maintaining low reproduction errors, which is unachievable with other sampling approaches. Our approach also allows for real‐time analysis and visualization of these massive simulations and does not involve decompression preprocessing steps that yield full simulation data again. Given that image sampling is a fundamental step for any image‐based flow field prediction model, the proposed BiFIS method can significantly improve the accuracy and efficiency of such models, helping any application that relies on precise flow field predictions. The BiFIS code is available on GitHub.
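The SDF-biased sampling idea can be sketched in a few lines: compute a signed distance to the body, then draw grid points with probability concentrated near the surface, where the flow field varies fastest. The circle geometry, exponential weighting, and function names below are toy assumptions, not the BiFIS scheme itself:

```python
import numpy as np

def circle_sdf(x, y, cx=0.0, cy=0.0, r=1.0):
    """Signed distance to a circle: negative inside the body, positive outside."""
    return np.hypot(x - cx, y - cy) - r

def sdf_biased_sample(points, sdf_vals, n, tau=0.2, seed=0):
    """Draw n distinct sample indices with probability concentrated near the
    body surface, weighting each point by exp(-|sdf| / tau). A toy stand-in
    for SDF-biased flow importance sampling."""
    w = np.exp(-np.abs(sdf_vals) / tau)   # near-surface points get large weight
    p = w / w.sum()
    rng = np.random.default_rng(seed)
    return rng.choice(len(points), size=n, replace=False, p=p)
```

Because the SDF is defined for any body shape, sample locations expressed in SDF coordinates become shape-agnostic, which is what lets the compressed images transfer across shape-dependent flow domains.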
-
ABSTRACT Observations of gravitational waves emitted by merging compact binaries have provided tantalizing hints about stellar astrophysics, cosmology, and fundamental physics. However, the physical parameters describing the systems (mass, spin, distance) used to extract these inferences about the Universe are subject to large uncertainties. The most widely used method of performing these analyses requires performing many Monte Carlo integrals to marginalize over the uncertainty in the properties of the individual binaries and the survey selection bias. These Monte Carlo integrals are subject to fundamental statistical uncertainties. Previous treatments of this statistical uncertainty have focused on ensuring that the precision of the inference is unaffected; however, these works have neglected the question of whether sufficient accuracy can also be achieved. In this work, we provide a practical exploration of the impact of uncertainty in our analyses and provide a suggested framework for verifying that astrophysical inferences made with the gravitational-wave transient catalogue are accurate. Applying our framework to models used by the LIGO–Virgo–KAGRA collaboration and in the wider literature, we find that Monte Carlo uncertainty in estimating the survey selection bias is the limiting factor in our ability to probe narrow population models and this will rapidly grow more problematic as the size of the observed population increases.
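The selection-bias term is typically a Monte Carlo average over importance weights of found injections, and its statistical uncertainty is diagnosed with a variance estimate and an effective sample size. A minimal sketch of that standard diagnostic follows (the function name is an assumption; the formulas are the usual importance-sampling estimates, not this paper's specific framework):

```python
import numpy as np

def selection_mc(weights):
    """Monte Carlo estimate of a selection efficiency from importance
    weights w_i = p_pop(theta_i) / p_draw(theta_i), together with the
    variance of the estimator and the effective number of samples."""
    w = np.asarray(weights, dtype=float)
    n = len(w)
    mu = w.mean()                              # estimate of the selection integral
    var = w.var(ddof=1) / n                    # variance of the MC estimator
    n_eff = w.sum() ** 2 / (w ** 2).sum()      # effective sample size
    return mu, var, n_eff
```

When a narrow population model concentrates the weights on a few injections, `n_eff` collapses even though `n` is large, which is exactly the regime where Monte Carlo uncertainty in the selection term becomes the limiting factor described above.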
-
Abstract Identification of preferential flow paths in heterogeneous subsurface environments is key to assess early solute arrival times at environmentally sensitive targets. We propose a novel methodology that leverages the information contained in preferential flow paths to quantify early arrival times and their associated uncertainty. Our methodology is based on a two‐stage approach that combines Convolutional Neural Networks (CNN) and Multi‐Layer Perceptron (MLP) techniques. The CNN is used to identify preferential flow paths, the MLP being employed to map tortuosity of these paths and key geostatistical parameters of conductivities therein onto early arrival times. As such, our approach provides novel insights into the relationship between the geostatistical characterization of conductivities along preferential flow paths and early arrival times. The effectiveness of the approach is exemplified on synthetic two‐dimensional (randomly) heterogeneous hydraulic conductivity fields. In this context, we assess three distinct CNN architectures and two MLP architectures to determine the most effective combination of these to reliably and effectively quantify preferential flow paths and early arrival times of solutes. The resulting framework is robust and efficient. It enhances our ability to assess early solute arrival times in heterogeneous aquifers and offers valuable insights into connectivity patterns associated with preferential flow paths therein.
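One of the MLP's key inputs, path tortuosity, has a simple geometric definition that can be computed directly from a CNN-extracted path. A minimal sketch (the function name and path representation are assumptions):

```python
import numpy as np

def tortuosity(path):
    """Tortuosity of a flow path given as an (n, 2) array of coordinates:
    arc length along the path divided by the end-to-end chord length.
    A perfectly straight path has tortuosity 1; winding paths exceed 1."""
    p = np.asarray(path, dtype=float)
    arc = np.sum(np.linalg.norm(np.diff(p, axis=0), axis=1))  # total arc length
    chord = np.linalg.norm(p[-1] - p[0])                      # straight-line distance
    return arc / chord
```

Scalar features like this, together with geostatistical summaries of the conductivities along the path, form the low-dimensional input vector that the MLP maps onto early arrival times.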
