Title: Amortized simulation-based frequentist inference for tractable and intractable likelihoods
Abstract High-fidelity simulators that connect theoretical models with observations are indispensable tools in many sciences. If the likelihood is known, inference can proceed using standard techniques. However, when the likelihood is intractable or unknown, a simulator makes it possible to infer the parameters of a theoretical model directly from real and simulated observations when coupled with machine learning. We introduce an extension of the recently proposed likelihood-free frequentist inference (LF2I) approach that makes it possible to construct confidence sets with the p-value function and to use the same function to check the coverage explicitly at any given parameter point. Like LF2I, this extension yields provably valid confidence sets in parameter inference problems for which a high-fidelity simulator is available. The utility of our algorithm is illustrated by applying it to three pedagogically interesting examples: the first is from cosmology, the second from high-energy physics and astronomy, both with tractable likelihoods, while the third, with an intractable likelihood, is from epidemiology. Code to reproduce all of our results is available at https://github.com/AliAlkadhim/ALFFI.
Award ID(s):
2045843
PAR ID:
10510378
Author(s) / Creator(s):
; ;
Publisher / Repository:
Machine Learning: Science and Technology
Date Published:
Journal Name:
Machine Learning: Science and Technology
Volume:
5
Issue:
1
ISSN:
2632-2153
Page Range / eLocation ID:
015020
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Ground-based high-resolution cross-correlation spectroscopy (HRCCS; R ≳ 15,000) is a powerful complement to space-based studies of exoplanet atmospheres. By resolving individual spectral lines, HRCCS can precisely measure chemical abundance ratios, directly constrain atmospheric dynamics, and robustly probe multidimensional physics. But the subtleties of HRCCS data sets (e.g., the lack of exoplanetary spectra visible by eye and the statistically complex process of telluric removal) can make interpreting them difficult. In this work, we seek to clarify the uncertainty budget of HRCCS with a forward-modeling approach. We present an HRCCS observation simulator, scope (https://github.com/arjunsavel/scope), that incorporates spectral contributions from the exoplanet, star, tellurics, and instrument. This tool allows us to control the underlying data set, enabling controlled experimentation with complex HRCCS methods. Simulating a fiducial hot Jupiter data set (WASP-77Ab emission with IGRINS), we first confirm via multiple tests that the commonly used principal component analysis does not bias the planetary signal when few components are used. Furthermore, we demonstrate that mildly varying tellurics and moderate wavelength solution errors induce only mild decreases in HRCCS detection significance. However, limiting-case, strongly varying tellurics can bias the retrieved velocities and gas abundances. Additionally, in the low signal-to-noise ratio limit, constraints on gas abundances become highly non-Gaussian. Our investigation of the uncertainties and potential biases inherent in HRCCS data analysis enables greater confidence in scientific results from this maturing method.
  2. Abstract Key science questions, such as galaxy distance estimation and weather forecasting, often require knowing the full predictive distribution of a target variable Y given complex inputs X. Despite recent advances in machine learning and physics-based models, it remains challenging to assess whether an initial model is calibrated for all x, and when needed, to reshape the densities of y toward ‘instance-wise’ calibration. This paper introduces the local amortized diagnostics and reshaping of conditional densities (LADaR) framework and proposes a new computationally efficient algorithm (Cal-PIT) that produces interpretable local diagnostics and provides a mechanism for adjusting conditional density estimates (CDEs). Cal-PIT learns a single interpretable local probability–probability map from calibration data that identifies where and how the initial model is miscalibrated across feature space, which can be used to morph CDEs such that they are well-calibrated. We illustrate the LADaR framework on synthetic examples, including probabilistic forecasting from image sequences, akin to predicting storm wind speed from satellite imagery. Our main science application involves estimating the probability density functions of galaxy distances given photometric data, where Cal-PIT achieves better instance-wise calibration than all 11 other literature methods in a benchmark data challenge, demonstrating its utility for next-generation cosmological analyses. Code is available as a Python package at https://github.com/lee-group-cmu/Cal-PIT.
  3. Abstract Precise and accurate predictions of the halo mass function for cluster mass scales in wνCDM cosmologies are crucial for extracting robust and unbiased cosmological information from upcoming galaxy cluster surveys. Here, we present a halo mass function emulator for cluster mass scales (≳ 10¹³ M⊙/h) up to redshift z = 2 with comprehensive support for the parameter space of wνCDM cosmologies allowed by current data. Based on the Aemulus ν suite of simulations, the emulator marks a significant improvement in the precision of halo mass function predictions by incorporating both massive neutrinos and non-standard dark energy equation of state models. This allows for accurate modeling of the cosmology dependence in large-scale structure and galaxy cluster studies. We show that the emulator, designed using Gaussian Process Regression, has negligible theoretical uncertainties compared to dominant sources of error in future cluster abundance studies. Our emulator is publicly available (https://github.com/DelonShen/aemulusnu_hmf), providing the community with a crucial tool for upcoming cosmological surveys such as LSST and Euclid.
  4. Abstract While space-borne optical and near-infrared facilities have succeeded in delivering a precise and spatially resolved picture of our Universe, their small survey area is known to underrepresent the true diversity of galaxy populations. Ground-based surveys have reached comparable depths but at lower spatial resolution, resulting in source confusion that hampers accurate photometry extractions. What once was limited to the infrared regime has now begun to challenge ground-based ultradeep surveys, affecting detection and photometry alike. Failing to address these challenges will mean forfeiting a representative view into the distant Universe. We introduce The Farmer: an automated, reproducible profile-fitting photometry package that pairs a library of smooth parametric models from The Tractor with a decision tree that determines the best-fit model in concert with neighboring sources. Photometry is measured by fitting the models on other bands leaving brightness free to vary. The resulting photometric measurements are naturally total, and no aperture corrections are required. Supporting diagnostics (e.g., χ²) enable measurement validation. As fitting models is relatively time intensive, The Farmer is built with high-performance computing routines. We benchmark The Farmer on a set of realistic COSMOS-like images and find accurate photometry, number counts, and galaxy shapes. The Farmer is already being utilized to produce catalogs for several large-area deep extragalactic surveys where it has been shown to tackle some of the most challenging optical and near-infrared data available, with the promise of extending to other ultradeep surveys expected in the near future. The Farmer is available to download from GitHub (https://github.com/astroweaver/the_farmer) and Zenodo (https://doi.org/10.5281/zenodo.8205817).
  5. Abstract Copy number aberrations (CNAs) are ubiquitous in many types of cancer. Inferring CNAs from cancer genomic data could help shed light on the initiation, progression, and potential treatment of cancer. While such data have traditionally been available via “bulk sequencing,” the more recently introduced techniques for single-cell DNA sequencing (scDNAseq) provide the type of data that makes CNA inference possible at the single-cell resolution. We introduce a new birth-death evolutionary model of CNAs and a Bayesian method, NestedBD, for the inference of evolutionary trees (topologies and branch lengths with relative mutation rates) from single-cell data. We evaluated NestedBD’s performance using simulated data sets, benchmarking its accuracy against traditional phylogenetic tools as well as state-of-the-art methods. The results show that NestedBD infers more accurate topologies and branch lengths, and that the birth-death model can improve the accuracy of copy number estimation. And when applied to biological data sets, NestedBD infers plausible evolutionary histories of two colorectal cancer samples. NestedBD is available at https://github.com/Androstane/NestedBD.